Tom Ridge
April 12, 2005
Contents
1 Introduction
2 Requirements
3 Current Automation in Interactive Provers
4 Techniques
4.1 Proof Search
4.1.1 Logical System
4.1.2 Intro Rules
4.2 Equality
4.2.1 Rewriting
4.2.2 Conditional Simplification
4.2.3 Completion
4.2.4 Dynamic Completion
4.2.5 Equational Unification
5 Interface and Integration
6 Assessment
6.1 Assessment wrt. Requirements
6.2 Completeness
6.3 Efficiency
6.4 In Practice
7 Alternative
8 Conclusion
1 Introduction
Automation can be key to successful mechanisation. In some situations, mechanisation is feasible without automation. Indeed, in highly abstract mathematical areas, most mechanised reasoning consists of the user spelling out complicated arguments which are far beyond those which can currently be tackled by automation. In this setting, automation, if it is used at all, is directed at easily solvable, tightly defined subproblems. A typical example of such a mechanisation is our formalisation of Ramsey's Theorem [Rid04]. On the other hand, automation can be fruitfully applied in verification style proofs, where the reasoning is relatively restricted, but the sheer level of detail makes a non-automated mechanisation infeasible.
Many man-years have been spent developing fully automatic systems such as Vampire [VR] and Otter [McC]. It would be foolish to imagine that we could compete with such systems. Their performance is far beyond that of systems currently implemented in interactive theorem provers. Projects are underway [MP04] to link such systems to interactive theorem provers. This is extremely valuable work: if one knows that a first order statement is provable, then one should probably expect that the machine can provide a proof.
In this section, we outline some techniques we have applied in various case studies. Naturally we do not seek to solve the problem of automated reasoning once and for all. Rather we focus on the problems that typically arise in the case studies we have been involved with. We start by outlining the functionality we require of the automated engine. We then describe the techniques we applied, and how they were integrated. We evaluate the resulting engine qualitatively in terms of our requirements, and quantitatively with respect to a sizable case study. Few of these techniques are novel; rather, we seek to combine existing techniques in a suitable fashion.
These procedures were developed in the HOL Light theorem prover, which we found to be an excellent vehicle for prototyping different approaches.
2 Requirements
What do we require of our automation? Let us distinguish between automation for fully automatic use, and automation for interactive use, the requirements for each being considerably different.
Perhaps unexpectedly, failure of the automated proof engine is the norm, in the sense that when interactively developing complex proofs we spend most of our time on obligations that are "almost" provable. Thus we would like the prover to give us excellent feedback as to why obligations could not be discharged. [Sym98]
This quote emphasizes an important difference between automatic and interactive proof. In automatic proof, one typically knows that the goal is provable (or at least, suspects very strongly, and is prepared to wait a considerable amount of time before terminating a proof search). Indeed, automatic provers are judged on how many provable goals they can actually prove. In interactive proof, "we spend most of our time on obligations that are almost provable". This is the difference between interactive and automatic proof. If we spend most of the time trying to prove goals that are simply not provable, then completeness of the proof search becomes less important. This is not to say that it loses importance altogether: if a system lacks completeness, then it will fail to prove some provable goals. It is vitally important to know what sort of goals one is giving up on, in order that one can understand what it means when a prover fails to prove a goal. Such knowledge is also useful when combining systems: in order to understand the behaviour of the system as a whole one should first understand the behaviour of the parts.
What properties might be preferred, in an interactive setting, over completeness? For us, the most important aspect of automation is simplicity. By this we do not mean implementation simplicity (how many lines did it take to implement the system? etc.), but conceptual simplicity. For instance, simplification is used ubiquitously in interactive theorem proving. If the set of rewrite rules is not confluent, then to understand the behaviour of the simplifier, one has to understand the order in which the rules are applied. Needless to say, this is an extremely complex thing to understand, and proofs which depend on these properties are presumably extremely fragile. Conceptual simplicity for a simplifier is closely bound up with confluence and termination of the simpset. Conceptual simplicity is important if a user is to understand the system. If a system is conceptually simple, it will hopefully be simple to use.
In an interactive setting, we expect automation to fail. In order to make progress, we must understand why a proof attempt fails: the prover must provide feedback. Resolution based systems can provide feedback, but they are destructive (in the sense that the goal is converted into a normal form before the proof attempt starts, destroying the original logical structure), so that the feedback can be difficult to understand (the point where the proof fails may look very different to the original goal). A better approach is to conduct the proof in a way that is as close as possible to how a human might conduct the proof. We require the proof system to be natural in some sense. In this case, if a proof attempt fails, the failing branch can often be returned directly to the user for inspection.
Feedback is related to visibility. Often a user wishes to inspect a failed proof, but only a proof trace is available, which can cause a conceptual mismatch: the user is focused on sequents, whereas the trace may be of a different nature altogether. If there are many unproved branches, then a user might not inspect them all, but might wish to step through the proof. Automatic methods, such as John Harrison's implementation of model elimination [Har96], often search for a proof in a tree making use of global information about nodes visited previously. If this global information is not present in the sequent the user has access to, it will be difficult to step through the automatic proof by simply invoking the automatic prover a step at a time: the automatic prover will not make the same decisions it made when conducting the search using global information because it only has access to the local sequent.
Many methods currently employed by interactive theorem provers, such as Isabelle's blast, leave the goal unchanged if they fail to prove it. Natural methods of proof search expect to make at least some progress in all situations, so that they can assist even if the goal is not provable. For instance, safe steps (such as ∧E in many systems) should be performed, simplification steps applied and so on.
Automation should also be stable. In large proofs, one frequently mixes interactive and automatic proof. If the goals returned by automation are apt to change radically with slight variations in the goal, then the dependent interactive proofs can be rendered useless, and must be rewritten. For this reason, unsolvable subgoals returned by automation should be stable under small changes to the original goal.
If a proof can be found, then one begins to focus on aspects that make the proof more maintainable, such as robustness. Efficiency is the icing on the cake. Completeness is also important, although not necessarily completeness in the full first order sense, but rather there should be a clear notion of the class of problems a given procedure solves. At any rate, it is vital to have some theoretical understanding of the behaviour of the system, if it is to approach the goal of being simple.
Let us summarize these points:
• Automation should be simple: it should be theoretically well understood, and predictable in use.
• Automation should also provide feedback, so that the user can assess why a proof attempt failed.
• An even stronger requirement is that automation be visible, in that one can directly inspect the execution of the automation, e.g. by stepping through the proof search.
• Automation should be natural, in order to minimise the conceptual gap between the prover and the user. For instance, automation should execute in standard logical systems that are close to, or identical with, the tactic level at which the user conducts proofs.
• Often, there are many safe steps that the user would make to progress a proof. Automation that simply returns "provable" or "not provable" is less useful than automation that at least makes progress before returning.
• Automation should be stable, so that small changes to theories do not produce large changes in the behaviour of the automation. This is important because interactive proofs contain large sections of interleaved user/automation steps.
• Automation should also be robust, in the sense that small changes do not affect automatic provability.
• We would like automation to be efficient for obvious reasons.
• Finally, we would like automation to be complete wrt. well defined classes of problems.
3 Current Automation in Interactive Provers
The current methods of automated proof in interactive theorem provers do not meet the requirements of the previous section.
Isabelle/HOL is representative of current HOL implementations as regards automation. The main automatic techniques that are used are a tableaux prover blast, and the simplifier simp. These are combined in the single auto tactic. Isabelle also includes a model elimination procedure which is similar in use to blast.
Simp is widely used, but is not a complete first order proof method. However, the fact that it does meet many of the requirements of the previous section explains why it is so popular in interactive proof. The blast method performs well on simple problems of predicate logic and sets, but it is hard to understand exactly what its properties are. It is based on a prover, leanTAP [BP95], that is complete for first order logic. On the other hand, the following goal is trivially provable in first order logic, but blast fails.
f (g a) = a ⊢ ∃y. g (f y) = y
This is even more upsetting when one realises that the term g a, occurring in the sequent itself, is a witness to the existential. To make matters worse, when expressed in a typed logic such as HOL, g a is the only term occurring in the sequent of the correct type that could be used! This failure to deal adequately with equational logic is a general failing of tableaux style procedures incorporating unification. The auto method is almost never used in the middle of interactive proofs because it is so unconstrained. Moreover, it is unstable, in that the subgoals it generates can be wildly different under minor changes to the goal, which renders it unusable except in tightly controlled areas.
The work [MP04] to integrate Vampire, a modern resolution prover, into Isabelle will not necessarily rectify these failings. Resolution based methods provide little support for interactive theorem proving, because the reduction to clause form means that the user has little visibility into the proof search, and little understanding of why a particular lemma failed. The proof search is a local forward synthetic search as opposed to a global backwards analytic search typical of tableaux presentations and the tactic level of Isabelle. Thus, resolution is an unnatural system for the user. In general the feedback from such systems is poor or at worst non-existent. Furthermore, many steps should be considered safe (apply a terminating and confluent set of rewrites, perform a ∧E step). Yet because the system is being used as a black box, either the goal is solved outright or, more usually, not solved at all. The information contained in the system about which steps can be safely applied is lost, leaving the user to apply such steps manually. The problem here is that the system fails to make progress on goals that it is unable to prove outright.
Let us also make the following observation about first order theorem provers. These provers are designed to find large quantifier instantiations, optionally modulo equality reasoning. Yet the failure of automation in interactive theorem provers is not the failure to guess large quantifier instantiations.
4 Techniques
In the following sections, we describe how we built up the automation. First, we are trying to find proofs, so that we will need (at least) a proof search engine. Next, we wish to model, to a large extent, the way the proofs were done by hand. Apart from proof search, the other main method employed during the hand proofs was simplification. We describe how we augment the simplifier.
Conditional simplification is a form of simplification where the simplifier is invoked recursively in an attempt to solve conditions on rewrites that may be applicable to the current goal. We argue that invoking simplification on these goals is often not the best way to proceed, and suggest an alternative.
Next, when using simplification, it is a good idea to ensure that the simpset is confluent and terminating. We describe an implementation of completion. When using simplification, however, one also wants to utilise assumptions that arise during the course of a proof as rewrites. To ensure the set of rewrite rules are still confluent and terminating, one must provide some form of dynamic completion during the proof search itself. We discuss how we tackled these issues.
4.1 Proof Search
In this section, we describe our basic system for proof search. This is a single conclusion, intuitionistic system suitable for backwards proof search. We eschew unification in favour of term enumeration. This has several interesting consequences when combining proof search with other methods. Note that we are only interested in the automation of an essentially first order system.
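To illustrate term enumeration, the following is a minimal Python sketch, not the HOL Light implementation. It takes the goal from Section 3, f (g a) = a ⊢ ∃y. g (f y) = y, tries each ground term occurring in the sequent as a candidate for y, and checks the instantiated equation by normalising both sides with the assumption used as a left-to-right rewrite. The term encoding (strings for constants, tuples for applications) and the names subterms, normalise and witnesses are assumptions made purely for illustration.

```python
# Terms: a constant is a string; an application is a tuple (head, arg1, ...).

def subterms(term):
    """Yield term and all of its subterms."""
    yield term
    if isinstance(term, tuple):
        for arg in term[1:]:
            yield from subterms(arg)

def normalise(term, lhs, rhs):
    """Rewrite every occurrence of the ground term lhs to rhs, innermost first.
    Assumes rhs does not itself contain lhs, so one bottom-up pass suffices."""
    if isinstance(term, tuple):
        term = (term[0],) + tuple(normalise(a, lhs, rhs) for a in term[1:])
    return rhs if term == lhs else term

# Assumption f (g a) = a, used as the rewrite f (g a) -> a.
lhs, rhs = ("f", ("g", "a")), "a"

# Candidate witnesses: the ground terms occurring in the sequent, no duplicates.
candidates = list(dict.fromkeys(list(subterms(lhs)) + list(subterms(rhs))))

# y is a witness if g (f y) and y have the same normal form.
witnesses = [
    y for y in candidates
    if normalise(("g", ("f", y)), lhs, rhs) == normalise(y, lhs, rhs)
]
```

In this sketch the only candidate that survives is g a. A realistic enumeration would also generate terms up to some fixed depth rather than only those already present in the sequent.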
4.1.1 Logical System
Two main systems of proof search are resolution and tableaux. For automation in an interactive setting, resolution is too unnatural. For this reason, tableaux are more promising. We therefore restrict our attention to tableaux based systems. Tableaux systems typically proceed by negating the goal and searching for a contradiction. We consider that even this is too unnatural. We therefore focus on tableaux systems that execute directly in standard logical systems.
Using such a system has significant advantages in terms of implementation complexity, since we can simply search at the level of HOL goals using more-or-less standard tactics. This also has advantages in terms of naturality, since the user is already familiar with proof in such a system, and feedback, since we can present failing branches directly to the user, who does not have to translate the results from some alternative proof system.
Single conclusion systems are unsuited to classical proof, which is much better supported by multiple conclusion systems. However multiple conclusion systems are unnatural. If we are interested in classical proof, we must admit that our single conclusion system has disadvantages. On the other hand, we believe that the large scale structure of proofs is mostly intuitionistic: typically we are focussed on one goal, and our lemmas serve as intermediate points in the proof, i.e. ⊢ A ∧ B → C as a lemma is not typically equivalent to ⊢ ¬(A ∧ B) ∨ C, since we expect to reach a point in the proof where A ∧ B is provable, not to do a disjunctive split. Certainly for the domains we are interested in, all proofs were essentially intuitionistic, with classical reasoning restricted to small areas which could be dealt with by simplification or other constrained techniques.
4.1.2 Intro Rules
There is one more improvement we wish to make. Although the procedure outlined above is complete, it is rather one sided. Since most proofs make heavy use of lemmas, the antecedent of a corresponding goal tends to become rather crowded. Moreover, many lemmas are naturally viewed as introduction rules, that is, are intended to be used to refine the conclusion C rather than used to chain forward from Γ. For these reasons, we also utilize intro rules à la Paulson [PNW03]. These are typically unsafe, but again, completeness can be preserved if we backtrack, and take other precautions. These intro rules are implemented in the same way as lemmas, but are marked as intro rules. During proof search, the conclusions of such lemmas are matched against the current goal. If the conclusion matches the current goal, the goal is replaced with the conditions of the intro rule, and backtracking is employed in the event that these conditions cannot be proved. If an intro rule is marked safe, then backtracking does not occur. Note that this improvement is unnecessary from a theoretical view, and its sole motivation is to model natural forms of proof.
If completeness is considered in a certain way, then this can be seen to preserve completeness. In the presence of simplification, one needs equational unification if these rules are to behave as expected. However, currently we simulate uses of equational unification by hand.
Does completeness fail when a rule is used as an intro rule? At first sight it does not: either the rule is safe, in which case nothing is lost, or it is unsafe, in which case we backtrack. The requirement of unification means that we can avoid using the rule as an additional assumption. However, this only works if at some stage the rule can actually be applied as an intro rule. There are situations where an intro rule was intended to be used, but there was no connecting chain, whereas had the rule been considered as an extra assumption there would have been no problem. This is where equational unification becomes relevant.
Moreover, since we are not utilizing such a rule as a standard lemma, it is not even clear what notion of completeness to use. Let us define completeness wrt. an intro rule to mean that if there is a proof using the rule as a lemma, then there is a proof using the rule as an intro rule. In the presence of equality, we must therefore use equational unification to ensure that our intro rule can be applied wherever possible. Since at every stage we have a finite set of ground terms, and a terminating and confluent rewrite order, such an approach can be made feasible, although at this stage we manually simulate such steps.
The use of intro rules brings complications because it is not clear what notion of completeness is appropriate. If the intro rule were used as a normal lemma, then completeness is preserved, since the lemma is handled just as it would be in normal predicate logic. The idea behind an intro rule is to balance the proof search, which would otherwise operate mainly on the left. The difference between a rule ⊢ A → B as a lemma, and as an intro rule, is that the automation can apply →L with the rule as a lemma, whereas as an intro rule, we expect the goal to eventually become B, and to replace this with A.
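The behaviour of safe and unsafe intro rules can be sketched in a few lines of Python. This is a deliberately simplified, propositional illustration under our own assumptions: goals are atoms, "matching" is plain equality, and the names IntroRule and prove are hypothetical rather than taken from the implementation. A rule whose conclusion matches the goal replaces the goal by its premises; if the premises cannot be proved, an unsafe rule is abandoned and the next rule tried, whereas a safe rule commits the search.

```python
from typing import NamedTuple

class IntroRule(NamedTuple):
    premises: tuple       # atoms that replace the goal
    conclusion: str       # atom matched against the current goal
    safe: bool = False    # safe rules are applied without backtracking

def prove(goal, assumptions, rules, depth=5):
    """Backward search on atomic goals, limited to a fixed depth."""
    if goal in assumptions:
        return True
    if depth == 0:
        return False
    for rule in rules:
        if rule.conclusion != goal:
            continue
        if all(prove(p, assumptions, rules, depth - 1) for p in rule.premises):
            return True
        if rule.safe:
            return False  # committed: no backtracking past a safe rule
        # unsafe rule: backtrack and try the next matching rule
    return False

# Two ways of concluding C; only B is available as an assumption.
unsafe_rules = [IntroRule(("A",), "C"), IntroRule(("B",), "C")]
found_with_backtracking = prove("C", {"B"}, unsafe_rules)

# Marking the first rule safe commits the search to proving A, which fails.
committed_rules = [IntroRule(("A",), "C", safe=True), IntroRule(("B",), "C")]
found_when_committed = prove("C", {"B"}, committed_rules)
```

The second call shows why a rule should only be marked safe when applying it cannot lose provability: here the commitment discards the proof via B.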
4.2 Equality
4.2.1 Rewriting
Rewriting is the process of transforming a term by replacing subterms with equal subterms. For example, we might use the lemma f (a) = a to rewrite Q f (f (a)) to Q f (a) and then to Q a. In this section, we are informal about various rewriting notions. For more information the reader may consult the excellent [BN98].
A note on terminology: a simplification order is a restriction of a rewrite order, used to prove termination of a term rewriting system. However, the word "simplification" is often used informally to refer to rewriting, and the word "simpset" is often used to refer to the system of rewrite rules. We follow this informal usage.
Two key properties of simplification are termination and confluence. Without confluence, to understand the behaviour of the simplifier one must understand the order in which rewrites are applied. This is too much to expect of the user. Termination is useful in an interactive setting, for instance, to allow feedback. Moreover, termination allows simplification to be integrated with the main proof search. If we have both confluence and termination, then each term has a unique normal form, and equality testing becomes decidable. This is an extremely useful property of a simpset.
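The following minimal Python sketch illustrates these notions for ground rewrite rules. It is an illustration under our own assumptions (terms are strings or tuples, rules are tried in list order, and the function names are hypothetical), not a description of any prover's simplifier. It first replays the rewriting of Q f (f (a)) to Q a with the rule f (a) = a, and then shows that with the non-confluent rules a = b and a = c the result depends on the order in which the rules are tried.

```python
# Terms: a constant is a string; an application is a tuple (head, arg1, ...).

def rewrite_once(term, rules):
    """Apply one ground rule, at the root if possible, else in the leftmost
    argument that can be rewritten. Returns None if no rule applies."""
    for lhs, rhs in rules:
        if term == lhs:
            return rhs
    if isinstance(term, tuple):
        head, *args = term
        for i, arg in enumerate(args):
            new_arg = rewrite_once(arg, rules)
            if new_arg is not None:
                return (head, *args[:i], new_arg, *args[i + 1:])
    return None

def normalise(term, rules, max_steps=1000):
    """Rewrite until no rule applies; the step limit guards against
    non-terminating rule sets."""
    for _ in range(max_steps):
        nxt = rewrite_once(term, rules)
        if nxt is None:
            return term
        term = nxt
    raise RuntimeError("step limit reached; rule set may not terminate")

# With f(a) -> a, the term Q (f (f a)) normalises to Q a.
q_nf = normalise(("Q", ("f", ("f", "a"))), [(("f", "a"), "a")])

# A non-confluent pair of rules: the normal form depends on rule order.
nf_first = normalise("a", [("a", "b"), ("a", "c")])
nf_second = normalise("a", [("a", "c"), ("a", "b")])
```

With a terminating and confluent rule set, two terms can be tested for equality simply by comparing the results of normalise; with the non-confluent pair above, that test would be unreliable.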
Our basic strategy is to apply simplification eagerly, after each step of the proof search, using assumptions to simplify other assumptions and the conclusion. We work with a terminating and confluent simpset. We check termination and confluence manually using completion. We apply simplification at the types of individuals. If the simpset is terminating and confluent, then this preserves completeness of the proof search. We also apply simplification at boolean type, which is not a possibility in first order logic. For instance, we employ higher order rewrites to miniscope quantifiers. It is not too difficult to argue that this also preserves completeness of the proof search. Assumptions resembling rewrites often arise during proof, and these may be incorporated into the simpset. Given an assumption ∀x. P x, and an assumption P t, one could simplify P t = ⊤, which is again a simplification at boolean type, and so not a first order operation. We do not use assumptions of this form as rewrites: our basic approach to quantifier instantiation is via term enumeration, and using these simplifications would destroy the completeness of this approach. We do use assumptions of the form x = y (possibly quantified) as rewrites. In order to retain confluence and termination we complete the simpset wrt. these dynamically arising rewrites. Because completion is not guaranteed to terminate, completion is limited to a certain number of steps, although in our applications, completion always succeeds.
4.2.2 Conditional Simplification
In this section, we describe conditional simplification, and note that the standard approach to solving the conditions (recursive invocation of the simplifier) may not be desirable. We suggest some other approaches, and discuss other problems with conditional simplification. We conclude that conditional simplification is, at present, too complicated to count as a "simple" technique, and certainly too complicated to be linked with proof search in a reasonable way. We suggest the alternative of considering conditional rewrites just as ordinary lemmas.
Simplification works with rewrites of the form a = b, rewriting a to b anywhere in the goal. Conditional simplification works with conditional rewrites of the form ⊢ A → a = b (where A is typically a conjunction). In a sequent Γ ⊢ C, we are justified in rewriting a to b in the goal C if we can prove A from ¬C, Γ.
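The shape of conditional rewriting can be sketched as follows in Python. This is only an illustration under stated assumptions: rules are ground triples (condition, lhs, rhs), the function name rewrite_conditional is hypothetical, and the condition solver is passed in as a parameter. Here the solver is the weakest imaginable one (the condition must literally occur in Γ), which makes the point of the discussion that follows: how much gets rewritten depends entirely on how the conditions are solved.

```python
# Terms: a constant is a string; an application is a tuple (head, arg1, ...).

def rewrite_conditional(term, cond_rules, solve):
    """One bottom-up pass with ground conditional rules (cond, lhs, rhs).
    A rule fires only where the term equals lhs and solve(cond) succeeds."""
    if isinstance(term, tuple):
        term = (term[0],) + tuple(
            rewrite_conditional(a, cond_rules, solve) for a in term[1:]
        )
    for cond, lhs, rhs in cond_rules:
        if term == lhs and solve(cond):
            return rhs
    return term

# Hypothetical ground instance of the rule: a != 0 -> a / a = 1.
rules = [("a != 0", ("div", "a", "a"), "1")]
goal = ("P", ("div", "a", "a"))

# Simplest possible solver: the condition must be an assumption in Gamma.
gamma = {"a != 0"}
with_condition = rewrite_conditional(goal, rules, lambda c: c in gamma)

# Without the assumption the condition is unsolved and the goal is unchanged.
without_condition = rewrite_conditional(goal, rules, lambda c: False)
```

Replacing the solver argument by a depth-limited proof search, a propositional decision procedure, or a recursive call to the simplifier gives the alternatives compared in the remainder of this section.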
How should we attempt to prove the condition A? Suppose we are linking a complete proof search system with simplification. In this case, very good (from a theoretical standpoint) behaviour would be obtained by requiring that, if A is provable, then it is actually proved. Because the search for a proof of A is potentially non-terminating, we would then have to orchestrate a complicated process of interleaving the main proof search, with the subproof searches attempting to solve the conditions on conditional rewrites. Such an approach is unpalatable not only for efficiency reasons.
One of the benefits of simplification is that, unlike proof search, it is terminating. If we wish to preserve this terminating behaviour, whilst employing conditional simplification, we must tame the non-termination involved when attempting to prove the conditions. One way would be to employ our main proof search to prove the conditions, but limit the search to a particular depth. This is unpalatable too. In order to keep simplification running in a reasonable amount of time, it is likely that this depth would have to be very small, because conditions arise very frequently. It is likely that this small depth would not permit the solution of many more conditions than other simple approaches. Limiting proof search to a small depth means that its theoretical limitations also become practical limitations: any limited proof search will be incomplete, but with large depth we could hopefully ignore such incompleteness. Small depth means that we would probably encounter situations where it was clear to the user that the condition could be solved, but that the restricted depth of the search meant that the condition was not solved. In our experience, the depth would have to be at least 10 logical steps (∨E etc.), coupled with intermediate effective equality reasoning (we imagine a scenario where we have a proof search which handles equality effectively). In experiments, performance with this approach became impractical in terms of response times from the simplifier, after a depth of around 4. It is possible that this approach will be feasible for computers in the (distant?) future. At the moment it is not.
An alternative to using limited proof search to solve the conditions is to fix a decidable class of conditions, and accept that there will be some conditions that may be provable, but that will fall outside the given class. The obvious choice is propositional logic. However, the problems with conditional simplification extend much deeper than the occasional failure to prove a propositionally valid condition, as we discuss later, so that this approach would not yield any great practical benefit, although the behaviour would at least be theoretically well understood.
The current approach taken in HOL, HOL Light, and Isabelle, is to invoke the simplifier itself on the conditions. This recursive calling is potentially non-terminating (conditional simplification is extremely prone to looping), so that the number of times the simplifier is recursively invoked is limited. This restriction of the depth of search when solving conditions suffers from the same problems as those discussed above for depth-limited proof search. The approach is in many ways worse than that above because the exact properties of simplification, which are necessary if one is to understand its behaviour when solving conditions, may be unclear even on propositional conditions, let alone conditions involving predicates. For a typical goal, we only require that simplification makes progress in a way that is simple to understand. For conditions, our requirement is usually that the condition is solved if possible. Simplification is not a complete form of proof, so we are hoping that our conditions will be such that the simplifier usually performs well. However, the simplifier cannot prove all propositional conditions, so that its behaviour even at the propositional level is hard to understand. On quantifier reasoning, even at the most basic level, failure is the norm.
So, current approaches fail to deal adequately with the problem of non-termination when proving conditions. Moreover the class of conditions solvable by simplification is hard to characterize, and so will not qualify as "simple" for the user to understand.
4.2.3 Completion
Completion is a process that can ensure a simpset is confluent and terminating. Confluence and termination are the two main properties that we require of a simpset, so that completion is clearly a useful tool when employing simplification. We discuss our use of completion.
The standard example of using completion on the axioms of group theory, to derive a set of terminating and confluent rewrite rules that constitute a canonical rewrite system for this theory, shows that completion is a powerful technique. In our work, such power was never needed. We implemented basic completion which we found to be very useful. Because our needs were limited, we did not implement the full completion procedure of Huet, although in different settings this would be desirable. We leave this for future work.
Our completion procedure also permits conditional completion. When working in a typed setting, one often wishes to refer to a subset of the individuals at a given type. Then one introduces a predicate subtype, and restricts statements to talk about members of the subtype only.
4.2.4 Dynamic Completion
We mentioned previously that it is often the case that one wishes to use assumptions as rewrites. Clearly, the assumptions are dynamic and change throughout the course of a proof. Although we may start off with a confluent and terminating set of simplification rules, if we use assumptions as rewrites, it may very well be that our base simpset, augmented with assumptions, is no longer confluent and terminating. If we wish to preserve the properties that accompany a terminating and confluent simpset, then we must perform some sort of completion during the course of a proof. We call such completion dynamic, since it must be performed on the fly during the course of a proof.
Unfortunately, completion is a rather costly process. Moreover, it is in general nonterminating. Yet, when one looks at the kind of rules that are being used, typically they are of a very simple form. For instance, we often derive assumptions similar to
a = b ٧ a = c, then proceed to do a disjunctive elimination. The resulting goals have assumptions a = b and a = c which we wish to use as simplification rules. The majority of these simplification rules are ground. In this case, completion is guaranteed to terminate, and one feels that there should be a way of short-circuiting the completion procedure to get an equivalent procedure that perhaps runs faster. Hopefully, if we can reduce completion to mere rewriting of one rule by another, then one can hope that the rewriting procedures are relatively highly optimised (certainly this is the case for Isabelle's fantastically good simplifier, and HOL Light's simplifier is optimised to some extent also), and that the performance will be acceptable.
There are several papers covering completion with ground terms. We looked at [GNP+93]. This algorithm takes a set of ground rewriting rules, and produces a reduced canonical rewriting system in polynomial time. Unfortunately, although the procedure operates in polynomial time, it makes use of congruence closure. Congruence closure is not implemented in HOL Light or Isabelle, and it is likely that an unoptimised version would be relatively slow (that is, although the algorithm from [GNP+93] is polynomial in time, there would be a high constant of proportionality). Of course, during a proof one is making incremental changes to the set of equalities, so one might hope to improve on this somewhat, but still there was a feeling that we should be able to do better.
Looking at the assumptions we use as rewrites, it was clear that beyond being ground, they were of an even simpler nature: no subterm of a left hand side was used in any other rewrite rule (for instance, with assumption a = b, there were no proper subterms of the left hand side at all, and we never had occasion that another assumption of the form a = d occurred at the same time). In this instance, it sufficed to rewrite all the other simplification rules with this rule in order to maintain confluence and termination. If we required online completion where these conditions were not satisfied, we would fall back on our (time limited) completion procedure, but in our case study this does not happen. When considered in the context of the proof of correctness of basic completion, this result is clear. However, during our case studies there were a number of tricky scenarios in which, without this optimisation, our prover took considerably longer.
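The short-circuit described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the HOL Light implementation: ground terms are modelled as strings (constants) or tuples `(head, arg1, ...)`, the names `rewrite`, `add_ground_rule` and `normalise` are hypothetical, and the "simple case" test (the new left hand side is a constant that is not already the left hand side of another rule) is one plausible reading of the side condition in the text.

```python
def rewrite(term, lhs, rhs):
    """Replace every occurrence of the ground term lhs in term by rhs (bottom-up)."""
    if isinstance(term, tuple):  # application: (head, arg1, ..., argn)
        term = (term[0],) + tuple(rewrite(a, lhs, rhs) for a in term[1:])
    return rhs if term == lhs else term


def add_ground_rule(rules, lhs, rhs):
    """Add the ground rule lhs -> rhs to rules, a list of (l, r) pairs.

    Hypothetical reading of the simple case: lhs is a constant that is not
    already the left hand side of another rule. Anything else is signalled
    so that a caller can fall back on general (time limited) completion.
    """
    if not isinstance(lhs, str) or any(l == lhs for l, _ in rules):
        raise NotImplementedError("not the simple case: use full completion")
    updated = []
    for l, r in rules:
        l2, r2 = rewrite(l, lhs, rhs), rewrite(r, lhs, rhs)
        if l2 != r2:  # drop rules that have become trivial
            updated.append((l2, r2))
    return updated + [(lhs, rhs)]


def normalise(term, rules):
    """Rewrite term with rules until nothing changes (assumes termination)."""
    changed = True
    while changed:
        changed = False
        for l, r in rules:
            new = rewrite(term, l, r)
            if new != term:
                term, changed = new, True
    return term


base = [(("f", "a"), "c")]               # simpset: f(a) -> c
rules = add_ground_rule(base, "a", "b")  # assumption a = b used as a -> b
print(rules)
print(normalise(("g", ("f", "a")), rules))
```

Simply appending a -> b to the base simpset would leave f(a) with two distinct normal forms, c and f(b); rewriting the existing rule to f(b) -> c removes that divergence without running a general completion loop.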
4.2.5 Equational Unification
Our top level proof search is based on enumerating terms up to a fixed term depth. Unification is not present at all in such a procedure. We introduce intro rules, à la Paulson, into our search procedure. Currently this is accomplished simply by matching conclusions of intro rules against the current goal.
The intro rules are treated like other lemmas, in that they are incorporated into the sequent as additional assumptions. However, they are marked as intro rules. We instantiate them with terms from the enumeration just like other lemmas. The conclusions get rewritten by the simplifier. At each stage of the proof search, we check whether the conclusion of any intro rule matches the current goal, and if so we replace the goal with the condition of the intro rule, backtracking over unsafe intro rules. This has the effect of simulating equational unification.
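To illustrate how instantiation followed by simplification can stand in for equational unification, here is a minimal Python sketch. It is not the tactic itself: the term representation (strings and tuples), the convention that variables start with `?`, and the function names `substitute`, `simplify` and `apply_intro` are all assumptions made for the example, and the simpset is restricted to ground rules.

```python
from itertools import product


def substitute(term, subst):
    """Replace variables (keys of subst) in term by their values."""
    if isinstance(term, tuple):
        return (term[0],) + tuple(substitute(a, subst) for a in term[1:])
    return subst.get(term, term)


def rewrite(term, lhs, rhs):
    """Replace every occurrence of the ground term lhs in term by rhs."""
    if isinstance(term, tuple):
        term = (term[0],) + tuple(rewrite(a, lhs, rhs) for a in term[1:])
    return rhs if term == lhs else term


def simplify(term, rules):
    """Normalise term with ground rules (assumed terminating and confluent)."""
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            new = rewrite(term, lhs, rhs)
            if new != term:
                term, changed = new, True
    return term


def apply_intro(premises, conclusion, variables, goal, terms, rules):
    """Return the premise lists of all instances whose simplified conclusion is the goal."""
    goal = simplify(goal, rules)
    results = []
    for values in product(terms, repeat=len(variables)):
        subst = dict(zip(variables, values))
        if simplify(substitute(conclusion, subst), rules) == goal:
            results.append([simplify(substitute(p, subst), rules) for p in premises])
    return results


# Intro rule P ?x ==> Q (f ?x), simpset {f a -> b}, goal Q b.
new_goals = apply_intro(
    premises=[("P", "?x")],
    conclusion=("Q", ("f", "?x")),
    variables=["?x"],
    goal=("Q", "b"),
    terms=["a", "b"],
    rules=[(("f", "a"), "b")],
)
print(new_goals)
```

Purely syntactic matching of Q (f ?x) against Q b fails, but the instance ?x := a simplifies to the goal, so the goal can be replaced by P a. Each element of the returned list is one alternative; for an unsafe rule, a search procedure would backtrack over these alternatives.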
5 Interface and Integration
In this section we describe how the techniques outlined in the previous section are integrated into a single tactic, and the interface between this tactic and the user.
Our automation consists of a tableaux proof search, with quantifier instantiation to a fixed term depth, interspersed with calls to the simplifier. How should these be integrated? If simplification is terminating and confluent, then the obvious strategy of applying simplification after every step of proof search preserves completeness. This is the essential observation behind the integration of these two techniques.
The tactic provides an interface to the user. To use the procedure the user must specify a set of terminating and confluent rewrite rules and a set of introduction rules. In fact, the user also specifies a set of lemmas that should be considered during the proof, and these are simply incorporated into the goal statement as described previously. Proof search proceeds, with safe rules and simplification applied eagerly, and with backtracking over unsafe rules. If the proof is unsuccessful, the failing branch is returned to the user at the point where no safe rules, or simplification steps, could be applied. A downside to our approach is that the sequent can get rather large, since we are instantiating quantifiers with all (type-correct) terms below a certain depth. It would be trivial to adapt the user interface so that these instantiations are not directly presented to the user, but are viewable on demand. We leave this to further work.
6 Assessment
In this section, we discuss how our proof procedure fares in terms of the criteria outlined in Sect. 2. We then discuss the issue of completeness. We do not claim to have a formal proof of completeness wrt. some class of problems, but we do argue that the procedure has good properties. We assess the efficiency by locating our technique wrt. other theorem proving techniques such as resolution. We then assess the practical use of the procedure by giving examples of its success, and by describing the "feel" of the procedure.
6.1 Assessment wrt. Requirements
Our procedure was designed with the criteria of Sect. 2 in mind. We feel that it meets the requirement of simplicity. Conceptually, we are performing a standard intuitionistic proof search. We incorporate safe and unsafe rules. We use simplification, both at the logical level (type boolean) and at domain specific types. Our simplification rules should be confluent and terminating. We ensure that any assumptions used as rewrites do not destroy this important property. We use simplification, and conditional simplification rules only in restricted instances where they are well behaved. All these notions are readily comprehensible.
Conceptual simplicity leads to simplicity when using the procedure. We must only decide on our simpset, our set of intro rules, and the lemmas relevant to the goal. The tactic executes at the tactic level, so we can step through a failed proof attempt by invoking the tactic a step at a time. Since the procedure is non-destructive (in the sense that we search directly using the rules of natural deduction, rather than converting to normal form), we have a high level of visibility into the proof search. Failing branches are typically immediately comprehensible. The notion of safe steps means that even if we fail to find a proof, we return to the user having performed a considerable amount of work, i.e. we make progress. Incompleteness manifests itself as a failure to guess large quantifier instantiations. Incompleteness can also arise as a failure of completion, which the user invokes prior to proof search. Incompleteness also arises during proof search, with the failure of dynamic completion.
Simplicity leads to stability. One of the properties we would claim is that the prover is monotonic, in that adding lemmas, simplification rules, or unsafe rules does not cause the procedure to fail when before it had succeeded. Likewise the prover is robust in the face of minor changes to definitions and so on. We would not claim that the procedure is efficient.
6.2 Completeness
There are several sources of incompleteness in our automation. Provability in FOL is in general undecidable. We restrict to a depth limited subset of the term universe. Whether there is a proof involving only depth restricted terms is now decidable, but we will inevitably fail to find proofs that involve terms lying outside our restricted subset. This is a source of incompleteness. However, we may successively increase the term depth and in this way regain completeness at a cost to decidability.
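The depth-limited term universe, and the idea of successively increasing the depth, can be sketched as follows. This is a simplified illustration rather than our implementation: it ignores types (the tactic only uses type-correct terms), the signature is passed in explicitly, and the names `terms_up_to` and `first_sufficient_depth`, as well as the callback `succeeds_with` standing in for a bounded proof search, are hypothetical.

```python
from itertools import product


def terms_up_to(depth, constants, functions):
    """All terms of depth <= depth over the given signature.

    constants: iterable of constant names (terms of depth 1).
    functions: dict mapping a function name to its arity (>= 1).
    Applications are represented as tuples (name, arg1, ..., argn).
    """
    terms = set(constants) if depth >= 1 else set()
    for _ in range(depth - 1):
        grown = set(terms)
        for name, arity in functions.items():
            for args in product(terms, repeat=arity):
                grown.add((name,) + args)
        terms = grown
    return terms


def first_sufficient_depth(succeeds_with, max_depth, constants, functions):
    """Increase the term depth until the bounded search succeeds, or give up."""
    for depth in range(1, max_depth + 1):
        if succeeds_with(terms_up_to(depth, constants, functions)):
            return depth
    return None


signature = {"f": 1, "g": 2}
print(len(terms_up_to(3, {"a"}, signature)))

# Stand-in for a proof search that needs the witness f(f(a)).
witness = ("f", ("f", "a"))
print(first_sufficient_depth(lambda ts: witness in ts, 5, {"a"}, signature))
```

Each fixed depth gives a finite search space, which is what makes the bounded question decidable; the outer loop over depths is where decidability is traded for completeness.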
FOL can be embedded in equational reasoning [McK75], so that one expects that equality reasoning manifests similar problems to general proof search. Our approach is to restrict ourselves to terminating and confluent sets of rewrite rules. Since completion may fail, or may not terminate, this is a source of incompleteness. Moreover, we employ completion dynamically, which can similarly fail, although in practice this was not a problem. However, completion is largely handled interactively by the user. If the user ensures that the set of rewrite rules is confluent and terminating, and if dynamic completion during proof search always succeeds, then our equality handling will be complete.
6.3 Efficiency
The current success of automatic methods rests largely on unification, whereas we use unification only trivially, when handling intro rules. In terms of proof search, the performance of our procedure will be broadly similar to that of pre-unification procedures, such as Gilmore's procedure. For instance, the following goal is decidable:
(∀x y z. P x y → P y z → P x z)
→ (∀x y z. Q x y → Q y z → Q x z)
→ (∀x y. P x y → P y x)
→ (∀x y. P x y ∨ Q x y)
→ (∀x y. P x y) ∨ (∀x y. Q x y)
Indeed, the truth or falsity of such a formula can be evaluated in a model with just four elements. However, even though such a goal is decidable, even some modern resolution based provers struggle. Sadly, our procedure is doomed to take an extremely long time. In defense, such goals are rarely met in interactive theorem proving. If they are met, the user is of course free to call a resolution prover interactively to solve them.
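Evaluation in small finite models can be made concrete with a brute-force check. The sketch below assumes the goal is read with every bound variable universally quantified (P and Q transitive, P symmetric, every pair related by P or by Q, hence P total or Q total), and searches all interpretations over domains of a given size for a countermodel. The function names are hypothetical, and the search is exhaustive enumeration, not the method used by any particular prover.

```python
from itertools import product


def relations(n):
    """Yield every binary relation on range(n) as a frozenset of pairs."""
    pairs = list(product(range(n), repeat=2))
    for bits in product([False, True], repeat=len(pairs)):
        yield frozenset(p for p, keep in zip(pairs, bits) if keep)


def transitive(r):
    return all((x, z) in r for (x, y) in r for (w, z) in r if y == w)


def symmetric(r):
    return all((y, x) in r for (x, y) in r)


def countermodel(max_size, require_symmetry=True):
    """Search domains of size 1..max_size for P, Q falsifying the goal.

    Hypotheses: P and Q transitive, P symmetric (optional), P or Q holds
    of every pair. Conclusion: P holds of every pair or Q does.
    """
    for n in range(1, max_size + 1):
        everything = frozenset(product(range(n), repeat=2))
        trans = [r for r in relations(n) if transitive(r)]
        for p in trans:
            if require_symmetry and not symmetric(p):
                continue
            for q in trans:
                if p | q == everything and p != everything and q != everything:
                    return n, p, q
    return None


print(countermodel(3))                           # no countermodel found
print(countermodel(2, require_symmetry=False))   # symmetry is essential
```

Raising `max_size` to 4 covers the four-element case mentioned above, at the cost of enumerating 65536 candidate relations per predicate before filtering. Dropping the symmetry hypothesis yields a two-element countermodel, which is a useful sanity check that the search is not vacuous.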
In addition to basic proof search, we pay special attention to equality reasoning. Our approach to equality rests on completion, and so is suitable only in case various conditions are satisfied. However, if these conditions hold, this approach to equality is probably as effective as exists elsewhere. Modern resolution provers employ unfailing completion, which will presumably behave no more efficiently than completion in case a complete set of rewrites exists.
6.4 In Practice
Because it is conceptually simple, the prover is very easy to use. One typically moves from one theorem to the next, and invokes the prover with the simpset and the set of previous lemmas to use, often unchanged from the previous theorem. If a proof is not found, we see the failing branch. This branch is usually readily comprehensible. Almost always (in our case study, always) one needs to add either a simp rule, or a non-trivial lemma. If the lemma is not already proved one must break out and prove it. Having added the necessary rule, the process is repeated, usually with success. Generally, we do not tailor the sets of simplification rules and lemmas to each theorem: these rules are utilised depending on the context, and it is always a good idea to use them if available, by their very nature. Against this, too many rules can cause the performance to degrade, but generally one is working in a given theory, where the number of lemmas is relatively restricted.
Occasionally one must instantiate a quantifier with a clever term, but these are naturally difficult steps. The ability to step through the proof at the user level (rather than inspecting some representation of the failed branch) is very useful, and provides great visibility if something is not working correctly.
We argued that completeness was not of overriding importance, but clearly the aim of automation is to assist the user in proving lemmas. If the automation can prove the lemmas outright, we will not complain.
In our case study we carried out many proofs. One of the main results consisted of over 250 lines of interactive proof script. These proof scripts were not transcriptions of the original proofs, so that automation could feasibly reproduce them. Our automation was able to solve each of the lemmas. We give statistics for a section of the proof concerning a lemma theorem_3_nes_Mup, which is representative of the other lemmas. This lemma in turn makes use of a sublemma, theorem_3_nes_Mup_3. In the following table, we record the theorem name, the lines of tactic script without our automation, the lines with our automation, and the time taken. The sublemma theorem_3_nes_Mup_3 was fed to the automation when proving theorem_3_nes_Mup. We also record what happened when this sublemma was omitted, and the main lemma attempted without this assistance, theorem_3_nes_Mup'. This involved the prover reproving the sublemma during the proof of the main lemma. In this case, a proof was also found, although this took over 5 minutes. However, we believe this was due to inefficient implementation of the automation tactic, and we intend to reimplement this in future work. These lemmas are representative of the effect of the automation on the other lemmas in the case study. Moreover, we have applied such techniques in other areas, with similar success.
Theorem Name          Tactic Lines   Automated Tactic Lines   Automated Time/s
theorem_3_nes_Mup_3   28             1                        66
theorem_3_nes_Mup     12             1                        24
theorem_3_nes_Mup'    n/a            1                        305
Even though the performance is poor, coordinating these proof procedures so that such lemmas could be proved is a major achievement. The fact that we are able to do so in a general way provides evidence that the procedure has good theoretical properties.
7 Alternative
The automation we developed was motivated by the problems we met during our case study. However, the fact that we were always able to complete our simpset must be counted a lucky accident. In general we can expect completion to fail e.g. for rules expressing commutativity, so that an approach to testing equality of two terms via a canonical set of rewrites appears too strong. There are ways of handling problematic situations, such as rules for commutativity, e.g. by ordered rewriting, but we would like to take this opportunity to consider an alternative general approach to equality handling.
So far, our proof search limits the depth of terms considered. It seems very reasonable to likewise limit equality reasoning to a subset of the term universe, and employ congruence closure within this restricted subset. This introduces incompleteness of course.
Proving s = t using the axioms for equality requires, in general, terms of arbitrarily large depth. If we restrict the depth of the terms we consider, our equality reasoning will be incomplete. For instance, suppose we prove an equality s = t by a transitivity chain s = u = v = x = y = t. One can picture such a proof as shown in Fig. 1. If we restrict the depth of terms we consider to d or less, the picture might resemble Fig. 2. In such a situation, the equalities s = u = v will be derived, but the equalities x = y and y = t will not be. If no proof of s = t exists within the restricted term universe, we will not be able to deduce s = t.
Thus at any given stage of our proof search, our equality reasoning will be incomplete. However, if we successively increase the depth of terms we consider, we regain completeness for equality reasoning.
In this setting, equational unification, wrt. the restricted set of terms, becomes decidable, and moreover can be implemented very efficiently.
Another advantage of this approach is that the representative for the equivalence class can be chosen in a user specified manner. For example, the user might choose the smallest member of the equivalence class as representative, or the smallest wrt. some lexicographic path order. This can help to keep sequents readable.
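Since this alternative is left to future work, the following Python sketch is only one way the idea might look, under several assumptions: terms are strings or tuples `(head, arg1, ...)`, equations with a side deeper than the bound are simply discarded, congruence is computed by a naive quadratic fixpoint rather than an efficient algorithm, and the representative of each class is the smallest member under a user-supplied `key` (by default, depth and then printed form). All names are hypothetical.

```python
def depth(t):
    """Constants (strings) have depth 1; applications are (head, arg1, ...)."""
    return 1 if isinstance(t, str) else 1 + max(depth(a) for a in t[1:])


def subterms(t):
    yield t
    if not isinstance(t, str):
        for a in t[1:]:
            yield from subterms(a)


def closure(equations, max_depth, key=None):
    """Naive congruence closure over terms of depth <= max_depth.

    Returns a function mapping a term to the representative of its class.
    """
    key = key or (lambda t: (depth(t), repr(t)))
    kept = [(l, r) for l, r in equations if max(depth(l), depth(r)) <= max_depth]
    universe = {s for l, r in kept for side in (l, r) for s in subterms(side)}
    parent = {t: t for t in universe}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    def union(a, b):
        a, b = find(a), find(b)
        if a == b:
            return False
        if key(b) < key(a):  # keep the preferred representative as root
            a, b = b, a
        parent[b] = a
        return True

    for l, r in kept:
        union(l, r)
    apps = [t for t in universe if not isinstance(t, str)]
    changed = True
    while changed:  # congruence: equal heads and pairwise equal arguments
        changed = False
        for s in apps:
            for t in apps:
                if (s[0] == t[0] and len(s) == len(t)
                        and all(find(a) == find(b) for a, b in zip(s[1:], t[1:]))):
                    changed |= union(s, t)
    return lambda t: find(t) if t in parent else t


# Transitivity chain s = u = v = x = y = t, where x and y are deep terms.
x, y = ("f", ("f", "a")), ("g", ("g", "b"))
chain = [("s", "u"), ("u", "v"), ("v", x), (x, y), (y, "t")]
shallow, deep = closure(chain, 2), closure(chain, 3)
print(shallow("s") == shallow("v"), shallow("s") == shallow("t"))
print(deep("s") == deep("t"), deep("t"))
```

With the bound set to 2 only s = u = v is derived, mirroring the depth-restricted picture; raising the bound to 3 admits x and y and the chain closes. Supplying a different `key` changes which member of each class is shown to the user.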
Another advantage of this approach is that it is simple for the user to understand, and moreover dovetails well with our approach to proof search. The focus on a single measure of complexity, i.e. term depth, over several procedures, is a unifying step. Moreover, the disadvantages and incompleteness of the approach via completion are absent here. We leave the implementation of this alternative to future work.
[Figure 1: Equality Proof by Transitivity. Horizontal axis: the terms s, u, v, x, y, t; vertical axis: term depth.]
[Figure 2: Equality Proof by Transitivity, with Depth Restriction. Same axes, with the depth bound d marked on the term depth axis.]
8 Conclusion
We have presented the automation we developed to tackle a large case study. The automation is tailored to interactive use. The automation is based on the integration of a proof search engine and simplification, relying essentially on completion.
The contributions of this section are to address the failings of current automation in an interactive setting, in terms of basic proof search, and its integration with simplification. Our techniques do not step outside the realm of predicate logic with equality, but even in this very restricted domain, we encountered numerous problems. On the other hand, having solved these problems, we had no requirement for domain specific automation. We do not deny that in areas like linear arithmetic, such procedures are necessary for effective automation. However, our experience suggests that simplification and proof search are a very effective combination. This may be an instance of the phenomenon in logic that "a little goes a long way".
In the previous sections we noted several possibilities for future work, which we here summarize.
• The theoretical issue of completeness of intro rules should be addressed, although this is unlikely to have much of a practical impact.
• The issue of how lemmas are incorporated during proof search should be addressed. Craig's Interpolation Lemma suggests the naive approach based on syntactic features shared between the current goal and a lemma: if a lemma and the current goal are expressed in different languages, then the lemma is not useful in order to prove the current goal. Craig's Interpolation Lemma holds for FOL, but it may be possible to extend the idea to richer theories.
• Completion may be extended to full completion à la Huet. Conditional completion should also be implemented so that it copes with common cases, and supporting theory should be developed. Further, the implementation of completion should be parameterised by a notion of equality which is not identical with HOL's inbuilt equality, in order to support different equality notions. An additional task is to integrate good automatic termination provers, such as AProVE [GTSKF04], with the completion procedure, so that user interaction is minimised.
• The interface for the automation should be extended so that the user is not swamped with too much information.
• The automation as a whole needs reimplementing in order to increase the efficiency of the tactic.
• We intend to implement the alternative approach to handling equality via congruence closure and link this with proof search.
This covers future work arising as a direct consequence of this work. We now consider other opportunities which this work has uncovered.
Although we believe our approach largely solves the problem of proof for a given lemma, the choice of lemmas and definitions is still largely a matter of art. In the case of lemmas, we typically accumulate them on the boundaries of theories, then look for general lemmas that cover several of these as instances. This strengthening is reminiscent of the strengthening of inductive hypotheses.
We also found that in some areas, there was an explosion of non-trivial lemmas on the boundaries between theories, which could not be reduced by generalising. This typically occurs for common mathematical objects, such as trees. Many of the lemmas are intuitively immediately plausible, but their proofs often involve considerable effort. In this case, it seems hard to control the spread of the subtheory into the main theory, since the lemmas of the subtheory are used so pervasively in the main theory. So here we have three related problems, namely
• Explosion of non-trivial and independent lemmas required from a subtheory.
• Lemmas intuitively plausible, hard to prove.
• Lemmas permeate main theory, reducing modularity and making the main theory heavily dependent on the subtheory.
There seems to be no way to reduce the explosion in the number of lemmas. In terms of proof, we note that the difficulty arises from the need to state induction lemmas carefully, so that this is primarily a failure of automation of inductive proof. We also note that most of the definitions in this area are implicitly executable, so that there is scope for some form of model checking via execution of the definitions. This in turn could be used to tip off an inductive prover as to the truth or otherwise of conjectured lemmas.
We have not touched on the subject of automating induction, or other higher order features. We have more to say about induction in Sect. ??, but suffice it to say this seems like a difficult area.
We would also like to note the difficulty in reasoning about combinations of tactics. We feel that recent moves [?] towards a calculus of tactics could prove interesting. We suggest that relatively trivial steps, such as ensuring that tactic scripts are tree structured rather than linearly structured, and incorporating proper handling of parameters in proofs via lambda binding at the ML level, can improve robustness of scripts. However, beyond this the way is unclear.
There is much related work. Automatic theorem proving is already a vast field, and we cannot hope to survey it in a reasonable space. Our primary source was the two volumes of the Handbook of Automated Reasoning [RV01]. We note the prevalence of model theoretic methods over proof theoretic methods, which can make implementing some techniques problematic. Our work is aimed at addressing the needs of automation in an interactive setting. Relevant here is the PhD of Syme [Sym98], who addresses the needs of automation in a declarative setting. Many of his conclusions apply in an interactive setting. Our analysis of the problems of automation in an interactive setting draws on his work.
No.:
Graduation Project: Translation of an English Paper
Title: Automation
Department: Computer Science
Major: Automation
Student name: Li Lin
Student ID: 201300022
Supervisor: Wang Chao
Position: Laboratory Technician
June 3, 2005
Automation
Tom Ridge
April 12, 2005
Contents
1 Introduction
2 Requirements
3 Current Automation in Interactive Provers
4 Techniques
4.1 Proof Search
4.1.1 Logical System
4.1.2 Intro Rules
4.2 Equality
4.2.1 Rewriting
4.2.2 Conditional Simplification
4.2.3 Completion
4.2.4 Dynamic Completion
4.2.5 Equational Unification
5 Interface and Integration
6 Assessment
6.1 Assessment wrt. Requirements
6.2 Completeness
6.3 Efficiency
6.4 In Practice
7 Alternative
8 Conclusion
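1 Introduction
Automation can be the key to successful mechanisation. In some situations, mechanisation is feasible without automation. Indeed, in highly abstract areas of mathematics, mechanised reasoning consists largely of complex proofs spelled out by the user, which lie far beyond what automation can currently solve. In this setting automation, if it is used at all, is directed at tightly delimited subproblems that are easily solvable. A typical example of such a mechanisation is our formalisation of Ramsey's Theorem. On the other hand, where the reasoning is relatively restricted, automation can be applied fruitfully, as in verification-style proofs, where the sheer level of detail would make a non-automated mechanisation infeasible.
Many people have spent years developing fully automatic systems, such as Vampire [VR] and Otter [McC]. It would be foolish to imagine that we could compete with such systems. Their performance is far beyond that of the systems currently implemented in interactive theorem provers. Projects are under way to link these systems with interactive theorem provers. This is extremely valuable work: if a first-order statement is known to be provable, one should expect the machine to be able to supply a proof.
In this section we outline some techniques that we have applied in various case studies. Naturally we do not attempt to solve the problem of automated reasoning once and for all. Rather, we focus on the problems that typically arose in the case studies we were involved in. We first outline the properties we require of an automation engine. We then describe the techniques we applied, and how they are integrated. We assess the resulting engine qualitatively against our requirements, and quantitatively with respect to a sizeable case study. Few of these techniques are novel; rather, we attempt to combine existing techniques in a suitable way.
These procedures were developed in the HOL Light theorem prover, which is an excellent vehicle for prototyping different approaches.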
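2 Requirements
What are our requirements for automation? Let us distinguish between the use of automation in fully automatic proof and its use in interactive proof; the requirements in each case are very different.
Perhaps unexpectedly, failure of the automated proof engine is the norm, in the sense that when interactively developing complex proofs we spend most of our time on obligations that are "almost" provable. We therefore want the prover to give us excellent feedback as to why an obligation could not be discharged.
This quotation highlights an important difference between automated and interactive proof. In automated proof, one typically knows that the goal is provable (or at least strongly suspects so, and is prepared to wait a considerable time for the proof search to finish). Indeed, automatic provers are judged by how many provable goals they can actually prove. In interactive proof, "we spend most of our time on obligations that are almost provable". This is the difference between interactive and automated proof. If we spend most of our time trying to prove goals that are simple but unprovable, then completeness of the proof search becomes much less important. This is not to say that it loses all importance: if a system lacks completeness, it will fail to prove some provable goals. It is essential to know what kinds of goals are being given up, so that one can understand what it means when a prover fails to prove a goal. Such knowledge is also very useful when combining systems: to understand the behaviour of the system as a whole, one should first understand the behaviour of its parts.
In an interactive setting, properties other than completeness may be preferred. For us, the most important aspect of automation is simplicity. By this we do not mean simplicity of implementation (how many lines of code does it take to implement the system, and so on), but conceptual simplicity. For example, simplification is used everywhere in interactive theorem proving. If a set of rewrite rules is not confluent, then to understand the behaviour of the simplifier one must understand the order in which the rules are applied. Needless to say, this is extremely complicated to understand, and proofs that depend on such properties are presumably extremely fragile. For a simplifier, conceptual simplicity is closely tied to confluence and termination of the simpset. Conceptual simplicity is important if a user is to understand the system. If a system is conceptually simple, it will hopefully be simple to use.
In an interactive setting, we expect automation to fail. To make progress, we must understand why a proof attempt failed, so the prover must provide feedback. Resolution-based systems can provide feedback, but they are destructive (in the sense that the goal is converted to a normal form before the proof attempt begins, destroying the original logical structure), so the feedback is hard to understand (the point at which the proof fails may look markedly different from the original goal). A better approach is to conduct the proof in a manner as close as possible to how a human might conduct it. In some sense we require that the proof system be natural. In that case, if a proof attempt fails, the failing branch can often be returned directly to the user for inspection.
Feedback is related to visibility. Often a user wishes to inspect a failed proof, but if only a trace of the proof is available, this can cause a conceptual mismatch: the user is focused on the sequent, whereas the trace may be of an altogether different nature. If there are many unproved branches, a user cannot inspect them all, but may be willing to step through the proof. Automated methods such as John Harrison's implementation of model elimination often make use of global information about nodes in the search tree when searching for a proof. If such global information is not present in the sequent that the user has access to, it will be difficult to step through the automated proof by simply applying the automatic prover one step at a time. The automatic prover will not make the same decisions that it made when it was conducting a search using global information, because it has access only to the local sequent.
Many methods currently used in interactive theorem provers, such as Isabelle's blast, leave the goal unchanged if they fail to prove it. A natural approach to proof search would be expected to make at least some progress in all cases, so that it can help even when the goal is not provable. For example, safe steps (such as ∧E in many systems) should be carried out, simplification steps applied, and so on.
Automation should also be stable. In large proofs, a proof often mixes interactive and automatic steps. If the goals returned by the automation are liable to change radically under small changes to the goal, then the associated interactive proofs are rendered useless and must be rewritten. For this reason, the unsolved subgoals returned by the automation should be stable under small changes to the original goal.
Once a proof can be constructed at all, the focus turns to aspects that make proofs more maintainable, such as robustness. Efficiency is the icing on the cake. Completeness is also important, though not necessarily completeness for first-order logic; rather, there should be a clear notion of the class of problems that a given procedure solves. In any case, some theoretical understanding of a system's behaviour is important if it is to approach the goal of simplicity.
Let us summarise these points:
• Automation should be simple: it should be well understood in theory and predictable in use.
• Automation should also provide feedback, so that the user can assess why a proof attempt failed.
• An even stronger requirement is that automation be visible, in the sense that the workings of the automation can be inspected directly by stepping through the proof.
• Automation should be natural, so as to minimise the conceptual gap between the prover and the user. For instance, automation should operate in a standard logical system, or in a way that corresponds to how a user would conduct the proof.
• Often there are many safe steps that a user would take to advance a proof. Automation that simply returns "provable" or "not provable" is far less useful than automation that at least makes progress before returning.
• Automation should be stable, so that small changes to the theory do not produce large changes in the behaviour of the automation. This is important because interactive proofs contain many interleaved user and automation steps.
• Automation should also be robust, in the sense that small changes do not affect what the automation can prove.
• We would like automation to be efficient on obvious theorems.
• Finally, we would like automation to be complete with respect to some well-defined class of problems.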
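3 Current Automation in Interactive Provers
The automated proof methods in current interactive theorem provers do not meet the requirements of the previous section.
Isabelle/HOL is currently representative of the state of the art in HOL implementations. The main techniques used are a tableau prover, blast, and the simplifier, simp. These are combined in a single automatic tactic, auto. Isabelle also includes a model elimination procedure, which is used in a way somewhat similar to blast.
Simp is widely used, but it is not a complete first-order proof method. However, the fact that it satisfies many of the requirements of the previous section explains why it is so popular in interactive proof. The blast method works well on simple problems of predicate logic and sets, but it is hard to understand what its properties are. It is based on the prover leanTAP, which is complete for first-order logic. On the other hand, the following goal is trivially provable in first-order logic, yet blast fails to prove it.
f (g a) = a ⊢ ∃y. g (f y) = y
This is all the more puzzling when one realises that the term g a, which occurs in the sequent itself, is a witness for the existential. Worse, when the goal is expressed in a typed logic such as HOL, g a is the only term of the right type occurring in the sequent that could be used. This failure reflects the failure of unification-based tableau procedures to handle equality effectively. Because it is so unconstrained, the auto method is almost never used in the middle of an interactive proof. Moreover, it is unstable, in that the subgoals it produces may differ wildly under minor changes to the goal, which renders it useless except in tightly controlled areas.
Integrating a modern resolution prover such as Vampire into Isabelle will not necessarily remedy these faults. Resolution-based methods offer little support for interactive theorem proving, because reduction to clause form means that the user has little visibility into the proof search, and little idea why a particular lemma failed. The proof search is a local, forward, synthetic search, in contrast to the global, backward search typical of tableau presentations and of Isabelle's tactic level. For the user, therefore, resolution is an unnatural system. Feedback from such systems is typically poor or non-existent. Furthermore, there are many steps that should be considered safe (applying a terminating and confluent set of rewrites, performing an ∧E step). Because the system is used as a black box, either the goal is solved outright or, more usually, not solved at all, and the information the system had about safely applicable steps is lost, leaving the user to apply these steps manually. The problem here is that the system fails to make progress on goals that cannot be proved outright.
We also make the following observation about first-order theorem provers. These provers are designed to find large quantifier instantiations, optionally modulo equational reasoning. However, the failures of automation in interactive theorem provers are not failures to guess large quantifier instantiations.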
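4 Techniques
In the following sections we describe how we built our automation. First, we are trying to find proofs, so we will need (at least) a proof search engine. Beyond this, we would like to model, to a large extent, the way proofs are carried out by hand. Besides proof search, the other main method used in proofs by hand is simplification. We describe how we treat the simplifier.
Conditional simplification is a form of simplification in which the simplifier recursively attempts to solve the side conditions of rewrites that might be applied to the current goal. We argue that applying the simplifier to these conditions is not the best approach, and suggest alternatives.
Next, when using simplification, it is a good idea to ensure that the simpset is confluent and terminating. We describe our implementation of completion. When using simplification, however, one also wants to use as rewrites the assumptions that arise during the course of a proof. To ensure that the set of rewrite rules remains confluent and terminating, some form of dynamic completion should be provided during proof search. We discuss how we tackled these issues.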
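4.1 Proof Search
In this section we describe the basic system for our proof search. It is a single-conclusion intuitionistic system suited to backward proof search. We eschew the prevalent unification in favour of term enumeration. This has some interesting consequences when combining proof search with other methods. Note that we are interested only in automation for first-order systems.
4.1.1 Logical System
The two main systems for proof search are resolution and tableaux. For automation in an interactive setting, resolution is too unnatural. For this reason tableaux are more promising, and we restrict our attention to tableau-based systems. Tableau systems typically proceed by negating the goal and searching for a contradiction. We consider even this to be unnatural. We therefore focus on tableau systems that operate directly in a standard logical system.
Using such a system has significant advantages in terms of complexity of implementation, since we can simply search at the level of HOL goals using more or less standard tactics. It is also advantageous for naturality, since users are already familiar with proofs in such a system, and for feedback, since we can present the failing branch directly to the user without translating the sequent back from some other proof system.
Single-conclusion systems are ill-suited to classical proof, which is better supported by multiple-conclusion systems. Multiple-conclusion systems, however, are unnatural. If we are interested in classical proof, we must accept that our single-conclusion system has shortcomings. On the other hand, we believe that the large-scale structure of proofs is largely intuitionistic: we typically focus on a single goal, and our lemmas serve as intermediate points in the proof. That is, a lemma ⊢ A ∧ B → C is not typically treated as equivalent to ⊢ ¬(A ∧ B) ∨ C, so we expect to reach a point in the proof where A ∧ B is provable, rather than to perform a disjunctive case split. Certainly in the areas that interest us, all proofs are essentially intuitionistic, and classical reasoning is confined to small areas that can be handled by simplification or other restricted techniques.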
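4.1.2 Intro Rules
We wish to go further. Although the procedure outlined above is complete, it is rather one-sided. Since most proofs make heavy use of lemmas, the corresponding goal sequents can easily become rather crowded. Moreover, many lemmas are naturally viewed as introduction rules; that is, they tend to be used to refine the conclusion C rather than to chain forward from the assumptions Γ. For these reasons we also make use of intro rules à la Paulson [PNW03]. These are typically unsafe, but completeness can be preserved if we backtrack and try the other alternatives in turn. Intro rules are treated like lemmas, but are marked as intro rules. During proof search, the conclusions of these lemmas are matched against the current goal. If a conclusion matches the current goal, the goal is replaced by the premises of the chosen intro rule, and in the event that these premises cannot be proved, we backtrack. If an intro rule is marked as safe, no backtracking takes place. Note that this enhancement is not necessary from a theoretical point of view; its sole motivation is to model the natural form of proofs.
If completeness is considered in a particular way, one can see that completeness is preserved. If these rules are to behave as expected, equational unification is needed in the presence of simplification. At present, however, we simulate the use of equational unification by hand.
Does completeness fail for intro rules? If we use a rule as an intro rule, do we thereby fail to be complete? This is not the case: either the rule is safe, in which case all is well, or it is unsafe, in which case we backtrack. The need for unification means that we might avoid using the rule as an additional assumption. However, there may be a stage at which an intro rule is intended to be used but its conclusion does not match the goal, whereas had the rule been available as an additional assumption there would have been no problem with unification.
Moreover, since we are not using the rule as a standard lemma, it is not clear what notion of completeness applies. Let us define completeness with respect to an intro rule to mean that if there is a proof using the rule as a lemma, then there is a proof using the rule as an intro rule. In the presence of equality, we should therefore use unification to ensure that our intro rules are applied wherever possible. Since at each stage we have a finite set of ground terms, together with a terminating and confluent set of rewrites, such an approach is feasible, even though at present we can only simulate such steps by hand.
The use of intro rules brings complications, because it is not clear whether completeness holds. If an intro rule is also made available as a normal lemma, then completeness is retained, since the lemma is treated like any normal assumption. The idea behind an intro rule is to balance the proof search, which would otherwise operate mainly on the left. The difference between using a rule ⊢ A → B as a lemma and using it as an intro rule is that, with the rule as a lemma, the automation can apply →L to it. As an intro rule, we expect the goal eventually to become B, at which point it can be replaced by A.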
4.2 Equality
4.2.1 Rewriting
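Rewriting is the process of transforming a term by replacing subterms with equal subterms. For example, we might use the lemma f a = a to rewrite Q (f (f a)) to Q (f a) and then to Q a. In this section, our treatment of the various notions of rewriting is informal. For more information the reader may consult the excellent literature on the subject.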
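A note on terminology: a simplification ordering is a restricted kind of rewrite ordering, used to prove termination of term rewriting systems. However, the word "simplification" is often used informally to refer to rewriting, and "simpset" is often used to refer to a set of rewrite rules. We adopt these informal usages below.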
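The two main properties of simplification are termination and confluence. Without confluence, understanding the behaviour of the simplifier requires knowing the order in which rewrites are applied. This is more than can be expected of the user. Termination is very useful in an interactive setting; for example, it permits feedback. Moreover, termination allows simplification to be integrated with the main proof search. If we have confluence and termination, then every term has a unique normal form, and testing equality becomes decidable. This is an extremely useful property of a simpset.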
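To make the link between normal forms and decidable equality concrete, here is a minimal sketch for ground rewrite rules only. Terms are encoded as nested tuples, with the head symbol first; the encoding and the function names `normalize` and `equal_modulo` are assumptions made for illustration. The equality test is only meaningful when the rule set is terminating and confluent, which the code does not check.

```python
def rewrite_once(term, rules):
    """Normalise the arguments, then apply a rule at the root if one matches."""
    head, *args = term
    term = (head, *[normalize(arg, rules) for arg in args])
    return rules.get(term, term)

def normalize(term, rules):
    """Rewrite until nothing changes (assumes the rule set terminates)."""
    while True:
        new = rewrite_once(term, rules)
        if new == term:
            return term
        term = new

def equal_modulo(s, t, rules):
    """With termination and confluence, equal terms share one normal form."""
    return normalize(s, rules) == normalize(t, rules)

a = ("a",)
rules = {("f", a): a}                      # the lemma f a = a, oriented left to right
goal = ("Q", ("f", ("f", a)))              # Q (f (f a))
print(normalize(goal, rules))
```

The example follows the rewrite of Q (f (f a)) to Q a from the start of this section. Rules with variables would need matching rather than dictionary lookup, which is omitted here.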
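Our basic strategy is to apply simplification eagerly: after every step of proof search, we use the assumptions to simplify the other assumptions and the conclusion. We work with a simplifier that is terminating and confluent. We check termination and confluence by hand, using completion. We apply simplification at the types of individuals. If the simpset is terminating and confluent, this preserves the completeness of proof search. We also apply simplification at boolean type, which is not a possibility in first-order logic. For example, we employ higher-order rewrites for miniscoping of quantifiers. It is not too difficult to argue that this also preserves the completeness of proof search. Assumptions that resemble rewrites frequently arise during a proof, and these could be incorporated into the simpset. Given an assumption ∀x. P x and an assumption P t, one could simplify using P t = ⊤, which is again a simplification at boolean type rather than a first-order operation. We do not use assumptions of this form as rewrites: our basic approach to quantifier instantiation is via term enumeration, and using such simplifications would destroy the completeness of this approach. We do use assumptions of the form x = y (possibly quantified) as rewrites. To preserve confluence and termination, we complete the simpset together with the dynamically generated rewrites. Because completion is not guaranteed to terminate, it is restricted to a fixed number of steps, although in our applications completion always succeeded.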
4.2.2 Conditional Simplification
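In this section we describe conditional simplification, and note that the standard approach of recursively invoking the simplifier to solve conditions may not behave as expected. We suggest some alternatives and discuss other problems with conditional simplification. We conclude that conditional simplification is at present too complicated to count as a "simple" technique, and indeed too complicated to be combined with proof search in a reasonable way. We suggest the alternative of treating conditional rewrites just like ordinary lemmas.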
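Simplification works with rewrites of the form a = b, rewriting a to b anywhere in the goal. Conditional simplification works with conditional rewrites of the form ⊢ A → a = b (where A is typically a conjunction). In a sequent Γ ⊢ C, if we can prove ¬C, Γ ⊢ A, then we may rewrite a to b in the goal.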
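How should we attempt to prove the condition A? Suppose we are combining simplification with a complete proof search system. In this case, the ideal behaviour (from a theoretical standpoint) would be obtained by requiring that if A is provable, then it is in fact proved. Because the search for a proof of A is potentially non-terminating, we would then have to arrange a complicated interleaving of the main proof search with the sub-searches that attempt to discharge the conditions of conditional rewrites. Such an approach is unpalatable, and not only for reasons of efficiency.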
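One of the benefits of simplification is that, unlike proof search, it terminates. If we wish to retain this terminating behaviour when employing conditional simplification, we must address the non-termination involved in attempting to prove conditions. One approach is to use the main proof search to prove conditions, but to bound the search to a particular depth. This is also unpalatable. To keep simplification running in reasonable time, the depth must be very small, because conditions arise very frequently. Such small depths solve no more conditions than other, simpler methods. For a proof search with a small depth bound, the theoretical limitation also becomes a practical one: any bounded proof search is incomplete, but with a large depth we can hope to ignore the incompleteness. A small depth means that we may face a condition that is solvable, and obviously so to the user, yet remains unsolved because of the bound. In our experience, the depth must be at least 10 logical steps (∨E and so on), interleaved with effective equational reasoning (we imagine a scenario in which the proof search handles equality efficiently). Experimentally, the response time of the simplifier makes this approach impractical beyond depth 4. In future such approaches may become feasible on faster machines, although at present they are not.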
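An alternative to bounded proof search for solving conditions is to fix an acceptable class of conditions, accepting that some provable conditions will fail because they fall outside the given class. The obvious choice is propositional logic. However, as we discuss below, the problems with conditional simplification extend far beyond the occasional failure to prove a propositionally valid condition, so this approach would not yield any great practical benefit, although its behaviour would at least be well understood in theory.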
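The current approach, adopted by HOL, HOL Light and Isabelle, is to invoke the simplifier itself on the condition. These recursive invocations are potentially non-terminating (conditional simplification is extremely prone to looping), so the number of recursive simplifier invocations is bounded. When solving conditions, this bound suffers from the same problems as the depth-bounded search discussed in the previous paragraph. The approach is in many ways worse than those above, because additional properties of the simplifier are needed to understand its behaviour, and when solving conditions that behaviour may be unclear even for propositional conditions, let alone those involving predicates. For a typical goal, we only need simplification to make progress in a simply understood way. For conditions, our requirement is usually that they be solved whenever possible. Simplification is not a complete proof method, so we are left hoping that our conditions are such that simplification happens to work well. However, simplification cannot prove all propositional conditions, so its behaviour is hard to understand even at the propositional level. For quantifier reasoning, even at the most basic level, failure is the norm.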
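Thus the current approaches do not deal adequately with the problem of non-termination when proving conditions. Moreover, the class of conditions that simplification can solve is hard to characterise, and is not easy for the user to understand.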
4.2.3 Completion
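Completion is a procedure that can ensure that a simpset is confluent and terminating. Confluence and termination are the two main properties we require of a simpset, so completion is a useful tool when employing simplification. We discuss our use of completion.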
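The standard example of completion, applied to the axioms of group theory to derive a terminating and confluent set of rewrites that constitutes a canonical rewrite system for the theory, shows that completion is a powerful technique. In our work, this power was never needed. We implemented a basic form of completion, which we found very useful. Because our needs were limited, we did not implement Huet's full completion procedure, although in a different setting this would be desirable. We leave this for future work.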
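Our completion procedure also permits conditional completion. When working in a typed setting, one often wishes to refer to a subset of the individuals of a given type. One then introduces a predicate for the subtype, and restricts statements so that they talk only about members of the subtype.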
4.2.4 Dynamic Completion
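We mentioned previously that it is often desirable to use assumptions as rewrites. Clearly, assumptions are dynamic and change throughout the course of a proof. Although we may start with a confluent and terminating set of simplification rules as our base simpset, if we use assumptions it may well be that the base simpset, augmented with the assumptions, is no longer confluent and terminating. If we wish to maintain a simpset that is both confluent and terminating at all times, we must run some form of completion during the proof. We refer to this as dynamic completion, because it must run during the course of the proof.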
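Unfortunately, completion is a rather expensive process. Moreover, it is in general non-terminating. However, when one looks at the kinds of rules being used, they are typically of a very simple form. For example, we often obtain an assumption such as a = b ∨ a = c, and then perform a disjunction elimination. The resulting goals have assumptions a = b and a = c respectively, which we wish to use as simplification rules. Most of these simplification rules are ground. In this case completion is guaranteed to terminate, and there should be a way to short-circuit the completion procedure to obtain an equivalent procedure that runs faster. The hope is that if we can reduce completion to little more than rewriting one rule by another, then, provided the rewriting procedure is reasonably well optimised (this is certainly the case for Isabelle's simplifier, and the HOL Light simplifier is also optimised to some degree), performance will be acceptable.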
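Several papers cover ground completion; see, for example, [GNP+93]. This algorithm takes a set of ground rewrite rules and produces a reduced canonical rewrite system in polynomial time. Unfortunately, although the procedure runs in polynomial time, it makes use of congruence closure. Congruence closure is not implemented in HOL Light or Isabelle, and an unoptimised version would probably be relatively slow despite being polynomial (that is, although the algorithm of [GNP+93] is polynomial in time, there would be a high constant of proportionality). Of course, during a proof one makes incremental changes to the set of equations, so one might hope for some improvement here, but we still feel that it should be possible to do better.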
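Looking at the assumptions we use as rewrites, it is clear that, beyond being ground, they are of an even simpler nature: no subterm of the left-hand side is rewritten by any other rewrite rule (for example, for the assumption a = b, a has no proper subterms at all, and we never have another assumption of the form a = d at the same time). In this case, to maintain confluence and termination it suffices to rewrite all the other simplification rules with this rule. If we need to complete on the fly when these conditions are not met, we fall back on our (time-bounded) completion procedure, but in our case studies this did not occur. The result is clear when considered in the context of the correctness proof for ground completion. Nevertheless, there were several complicated scenarios during our case study in which, without this optimisation, our prover took a particularly long time.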
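The shortcut just described, rewriting every existing rule with the new ground rule instead of running full completion, might look like the following sketch. Terms are nested tuples with the head symbol first, a simpset is a list of (left, right) pairs, and the names `replace` and `add_ground_rule` are hypothetical. The sketch assumes the side conditions stated above hold (the new left-hand side has no proper subterms, no other rule rewrites it, and the new right-hand side does not contain it) and does not check them.

```python
def replace(term, lhs, rhs):
    """Replace every occurrence of the ground term `lhs` inside `term` by `rhs`."""
    if term == lhs:
        return rhs
    head, *args = term
    return (head, *[replace(arg, lhs, rhs) for arg in args])

def add_ground_rule(rules, lhs, rhs):
    """Add the rewrite lhs -> rhs, first rewriting all existing rules with it."""
    updated = []
    for left, right in rules:
        new_left = replace(left, lhs, rhs)
        new_right = replace(right, lhs, rhs)
        if new_left != new_right:          # drop rules that have become trivial
            updated.append((new_left, new_right))
    updated.append((lhs, rhs))
    return updated

a, b, c = ("a",), ("b",), ("c",)
simpset = [(("g", a), c)]                  # existing rule g a -> c
print(add_ground_rule(simpset, a, b))      # assumption a = b used as a -> b
```

Without the rewriting step, g a could be reduced both to c and to g b, which have no common reduct; after it, the rule reads g b -> c and the two reductions rejoin.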
4.2.5 Equational Unification
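Our top-level proof search is based on enumerating terms up to a fixed term depth. Unification does not appear in such a procedure. We introduce intro rules à la Paulson into our search procedure. At present this is done simply by matching the conclusions of the intro rules against the current goal.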
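Intro rules are treated like other lemmas: they are incorporated into the sequent as additional assumptions. However, they are tagged as intro rules. We instantiate them from the enumeration of terms, just like other lemmas. Their conclusions are rewritten by the simplifier. At every stage of proof search, we check whether the conclusion of any intro rule matches the current goal, and if so we replace the goal with the premises of the intro rule, backtracking over unsafe intro rules. This has the effect of simulating equational unification.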
5 Interface and Integration
In this section we describe how the techniques outlined in the previous sections are integrated into a single tactic, and the interface between this tactic and the user.
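Our automation comprises the proof search described above, with quantifiers instantiated by terms up to a fixed depth, interleaved with simplification. How should these be integrated? If simplification is terminating and confluent, then the obvious strategy of applying simplification after every step of proof search preserves completeness. This is the essential observation behind the integration of the two techniques.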
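The tactic provides an interface to the user. In use, the user must supply a terminating and confluent set of rewrite rules and a set of intro rules. In practice the user also supplies a set of lemmas that should be considered during the proof; these are simply incorporated into the goal statement as described previously. Proof search proceeds, with safe rules and simplification applied eagerly, and backtracking over unsafe rules. If the proof is unsuccessful, the failing branch is returned to the user at a point where no safe rule or simplification step can be applied. One drawback of our approach is that the resulting sequent is rather large, because we instantiate quantifiers with all (type-correct) terms below a particular depth. It would take little to adapt the user interface so that these instantiations are not presented to the user directly but are visible on request. We leave this for further work.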
6 Assessment
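In this section we discuss how our proof procedure measures up against the criteria outlined in Section 2. We then discuss the issue of completeness. We do not claim to have a formal proof of completeness for some class of problems, but we do argue that the procedure has good properties. We place our technique in the context of other theoretical proof techniques. We then assess the practical use of the procedure by giving examples of its success, and describe the feel of the procedure.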
6.1 Assessment against Requirements
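Our procedure was designed with the criteria of Section 2 in mind. We feel it meets the requirement of simplicity. Conceptually, we are performing a standard intuitionistic proof search. We incorporate safe and unsafe rules. We use simplification both at the logical level (boolean type) and at domain-specific types. Our simplification rules should be confluent and terminating. We ensure that any assumption can be used as a rewrite, provided it does not destroy these important properties. We use simplification and conditional simplification rules only in restricted cases where they are well behaved. All of these notions are easy to understand.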
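Conceptual simplicity leads to simplicity in use. We must determine our simpset, our set of intro rules, and the lemmas relevant to the goal. The tactic operates at the tactic level, so we can step through a failed proof attempt by invoking the tactic one step at a time. Since the procedure is non-destructive (in the sense that we search directly using natural deduction rules, rather than converting to a normal form), we have a high degree of visibility into the proof search. Failed branches are typically understood immediately. The notion of safe steps means that, even if we fail to find a proof, we return to the user having made considerable progress. Incompleteness manifests itself as a failure to guess a large quantifier instantiation. Incompleteness can also arise during proof search through a failure of dynamic completion.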
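Simplicity leads to stability. We call a prover monotonic if, whenever it has succeeded, adding further lemmas, simplification rules, or unsafe rules does not cause the procedure to fail. Likewise, the prover is robust with respect to minor changes to definitions and so on. We do not claim that the procedure is efficient.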
6.2 Completeness
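Our automation has several sources of incompleteness. Provability in FOL is in general undecidable. We restrict attention to a depth-bounded subset of the set of terms. Whether there is a proof involving only depth-bounded terms is now decidable, but we will inevitably fail to find proofs that involve terms lying outside our bounded subset. This is one source of incompleteness. However, we may successively increase the term depth and thereby recover completeness at the cost of decidability.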
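As an illustration of the depth-bounded set of terms, the following sketch enumerates all ground terms up to a given depth over a small signature. It is untyped, whereas the text restricts instantiation to type-correct terms; the function name `terms_up_to` and the signature encoding (a list of constants and a dictionary from function names to arities) are assumptions for the example.

```python
from itertools import product

def terms_up_to(depth, constants, functions):
    """List every ground term of depth <= `depth`; constants have depth 0."""
    terms = [(c,) for c in constants]
    seen = set(terms)
    for _ in range(depth):
        smaller = list(terms)              # all terms built in earlier rounds
        for name, arity in functions.items():
            for args in product(smaller, repeat=arity):
                term = (name, *args)
                if term not in seen:
                    seen.add(term)
                    terms.append(term)
    return terms

for d in range(3):
    print(d, len(terms_up_to(d, ["a", "b"], {"g": 2})))
```

Raising the bound one step at a time, as the paragraph above suggests, corresponds to calling `terms_up_to` with increasing `depth`; the printed counts show how quickly the candidate set grows, which is one reason the resulting sequents become large.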
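FOL can be embedded in equational reasoning [McK75], so one expects equational reasoning to exhibit problems similar to those of general proof search. Our approach is to restrict ourselves to terminating and confluent sets of rewrite rules. Completion may therefore fail, or may not terminate, and this is a source of incompleteness. Moreover, we employ dynamic completion, which can likewise fail, although in practice this is not a problem. However, completion can involve the user in a considerable amount of interaction. If the user ensures that the set of rewrite rules is confluent and terminating, and if dynamic completion always succeeds during proof search, then our handling of equality is complete.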
6.3 Efficiency
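The success of current automated methods rests largely on unification, yet we make only slight use of unification, when handling intro rules. In terms of proof search, our procedure can be expected to perform broadly like pre-unification procedures such as Gilmore's. For example, the following goal is decidable: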
(∀x y z. P x y → P y z → P x z)
→ (∀x y z. Q x y → Q y z → Q x z)
→ (∀x y. P x y → P y x)
→ (∀x y. P x y ∨ Q x y)
→ (∀x y. P x y) ∨ (∀x y. Q x y)
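Indeed, the truth or falsity of such a formula can be assessed in a model with only four elements. Yet even though such a goal is decidable, even some modern resolution-based provers struggle with it. Unfortunately, our procedure is bound to take a very long time. In its defence, such goals are rarely encountered in interactive theorem proving. If they are encountered, the user can of course call on a resolution prover to solve them.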
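The idea of assessing such a goal in small finite models can be sketched by brute force. The code below assumes that every quantifier in the goal above is universal, so the hypotheses say that P and Q are transitive, P is symmetric, and every pair is related by P or by Q. It searches all interpretations over a domain of n elements for one that satisfies the hypotheses but makes neither P nor Q total. The function names are hypothetical, and the sketch does not itself justify the four-element bound.

```python
from itertools import product

def relations(n):
    """Yield every binary relation on range(n) as a frozenset of pairs."""
    pairs = list(product(range(n), repeat=2))
    for bits in product([False, True], repeat=len(pairs)):
        yield frozenset(p for p, keep in zip(pairs, bits) if keep)

def transitive(r):
    return all((x, z) in r for (x, y) in r for (y2, z) in r if y == y2)

def symmetric(r):
    return all((y, x) in r for (x, y) in r)

def find_countermodel(n):
    """Return (P, Q) falsifying the goal on an n-element domain, or None."""
    full = frozenset(product(range(n), repeat=2))
    candidates = [r for r in relations(n) if transitive(r)]
    for p in candidates:
        if not symmetric(p) or p == full:
            continue
        for q in candidates:
            if q != full and (p | q) == full:
                return p, q
    return None

for n in range(1, 4):
    print(n, find_countermodel(n))
```

Domains of size 1 to 3 are checked almost instantly. Size 4 involves 65536 candidate relations and is noticeably slower in pure Python, though still feasible.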
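Beyond basic proof search, we pay particular attention to equational reasoning. Our approach to equality depends on completion terminating, and on various other conditions being met. However, if these conditions hold, our approach to equality is probably efficient. Modern resolution provers employ unfailing completion, which presumably behaves no more efficiently than rewriting with an already completed set of rewrites.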
6.4 Practical Use
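Because it is conceptually simple, the prover is very easy to use. Typically one moves from one theorem to the next, invoking the prover with the current simpset and the set of previously proved lemmas. If no proof is found, we see the failing branch. This branch is usually easy to understand. Almost always (in our case study, always) one needs to add a simplification rule, or a non-trivial lemma. If the lemma has not already been proved, it should be stated and proved separately. With the necessary rules added, the procedure is re-run and usually succeeds. In general, we did not tailor the simplification rules and lemma sets to each theorem: which rules are useful depends on context, and it is always a good idea to use them where possible. On the other hand, too many rules can degrade performance, but usually one is working within a given theory, where the number of lemmas is relatively limited.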
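Occasionally one must instantiate a quantifier with a clever term, but these are naturally the difficult steps. The ability to step through the proof under user control (rather than inspecting a representation of some failed branch) is very useful, and provides good visibility when something is not working correctly.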
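We have argued that completeness is not of primary importance, but clearly the goal of automation is to assist the user in proving lemmas. If the automation can prove lemmas directly, we have no complaints.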
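In our case study we carried out many proofs. One main result comprised more than 250 interactive proof scripts. These scripts were not the initial transcripts of the proofs, so the automation could feasibly reconstruct them. Our automation was able to solve every lemma. We give statistics for one section of the proof, concerning the lemma theorem_3_nes_Mup, which is representative of the other lemmas. This lemma in turn has a sub-lemma theorem_3_nes_Mup_3. In the table below we record the theorem name, the number of lines of tactic script without our automation, the number of lines with our automation, and the time taken. When proving theorem_3_nes_Mup, the sub-lemma theorem_3_nes_Mup_3 was supplied to the automation. We also record what happens when this sub-lemma is omitted and the main lemma is proved without its assistance (the primed entry in the table). This involves the prover re-proving the sub-lemma during the proof of the main lemma. In this case a proof is found, although it takes over 5 minutes. However, we believe this is due to the inefficient implementation of the automation tactic, which we intend to reimplement in future work. These lemmas are representative of the effect of the automation on the other lemmas of the case study. Moreover, we have applied these techniques with comparable success in other domains.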
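Even though performance is poor, tuning the proof procedure to the point where these lemmas can be proved is a major accomplishment. Indeed, we can offer this as practical evidence that the procedure has good theoretical properties.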
Theorem name           Tactic lines   Tactic lines with automation   Automation time (s)
theorem_3_nes_Mup_3    28             1                              66
theorem_3_nes_Mup      12             1                              24
theorem_3_nes_Mup'     n/a            1                              305
7 Alternatives
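The automation we developed was motivated by the problems we met in our case study. However, the fact that we were always able to complete our simpsets should be regarded as fortunate. In general we can expect completion to fail, for example for rules expressing commutativity, and with it the possibility of testing the equality of two terms via a canonical rewrite set. There are ways of working around such problems, for example ordered rewriting for such rules, but we want to take this opportunity to consider an alternative, general approach to handling equality.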
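So far, our proof search has bounded the depth of the terms considered. It seems reasonable to restrict equational reasoning to the same subset of terms, and to employ congruence closure restricted to this subset. This introduces incompleteness into the procedure.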
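Proving s = t from equational axioms requires, in general, terms of arbitrarily large depth. If we bound the depth of the terms we consider, our equational reasoning will be incomplete. For example, suppose we prove an equality s = t via the transitive chain s = u = v = x = y = t. Such a proof is depicted in Figure 1. If we restrict attention to terms of depth d or less, the picture might resemble Figure 2. In this case the equalities s = u = v will be obtained, but the equalities x = y and y = t will not. If no proof of s = t exists within the bounded set of terms, we will fail to deduce s = t.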
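Thus at any given stage of our proof search, our equational reasoning will be incomplete. However, if we successively increase the term depth we consider, we recover the completeness of equational reasoning.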
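The effect of the depth bound on a transitive chain can be sketched with a small union-find. Terms are represented only by name, and the depths assigned to s, u, v, x, y and t are invented for the illustration. An equation is used only when both sides lie within the bound. The sketch covers the symmetric and transitive closure alone; full congruence closure would additionally propagate equalities through function applications.

```python
def bounded_equal(s, t, equations, depth, bound):
    """True if s = t follows by transitivity using only terms of depth <= bound."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for a, b in equations:
        if depth[a] <= bound and depth[b] <= bound:
            parent[find(a)] = find(b)       # merge the two equivalence classes
    return find(s) == find(t)

# Hypothetical depths: x and y lie above a bound of 3.
depth = {"s": 1, "u": 2, "v": 1, "x": 4, "y": 5, "t": 1}
chain = [("s", "u"), ("u", "v"), ("v", "x"), ("x", "y"), ("y", "t")]

for bound in (3, 4, 5):
    print(bound, bounded_equal("s", "t", chain, depth, bound))
```

With the bound at 3 only s = u = v is obtained, as in the situation the text describes for Figure 2; raising the bound until it covers x and y recovers s = t.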
In this setting, equational unification over the bounded set of terms becomes decidable, and can be implemented very efficiently.
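Another advantage of this approach is that the representative of an equivalence class can be chosen in a user-specified manner. For example, the user might single out particular members of the equivalence classes as representatives, or take the least element under some lexicographic path ordering. This helps to keep the results readable.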
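A further advantage is that the approach is simple for the user to understand, and fits well with our approach to proof search. Focusing on a single measure of complexity, such as term depth, across several procedures is a unifying step. Moreover, the drawbacks and the incompleteness of the approach via completion do not arise here. We leave the implementation of this alternative to future work.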
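Figure 1: A transitive equality proof (term depth plotted against the terms s, u, v, x, y, t).
Figure 2: A transitive equality proof with a depth bound d (term depth plotted against the terms s, u, v, x, y, t).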
8 Conclusion
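We have presented the automation we developed to tackle a large case study. The automation is tailored to interactive use, and is based on the integration of a proof search engine with simplification, relying essentially on completion.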
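The contribution of this work is to address the failures of current automation in an interactive setting, in terms of basic proof search and its integration with simplification. Our techniques do not go beyond the realm of logic with equality, yet even in such a restricted area we encountered many problems. On the other hand, in solving these problems we had no need of domain-specific automation. We do not deny that in areas such as linear arithmetic, such procedures are essential for effective automation. However, our experience suggests that simplification and proof search are a very effective combination, which is perhaps an instance of the phenomenon in logic that "a little goes a long way".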
In previous sections we mentioned several possibilities for future work, which we summarise here.
• The theoretical issue of completeness for intro rules should be pursued, although this is unlikely to have much practical impact.
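• The issue of how to incorporate lemmas during proof search should be pursued. Craig's interpolation lemma suggests a novel approach based on the syntactic features shared between the current goal and a lemma: if a lemma and the current goal are expressed in different languages, the lemma is of no use in proving the current goal. Craig's interpolation lemma holds for FOL, but it may be possible to extend the idea to richer theories.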
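• Completion could be extended to full completion à la Huet. Conditional completion should also be implemented so that it copes with the general case, and the supporting theory should be developed. Further, to support different notions of equality, the implementation of completion should be parameterised by a notion of equality distinct from the built-in equality of HOL. A further piece of work is to integrate a good automatic termination prover, such as AProVE, with the completion procedure, so that user interaction is minimised.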
• The interface to the automation should be extended so that the user is not overwhelmed with too much information.
• The automation as a whole needs to be reimplemented to improve the efficiency of the tactic.
• We intend to implement the alternative approach of handling equality via congruence closure, and to integrate it with proof search.
This covers the future work that arises as a direct result of this work. We now consider other avenues that have not yet been explored.
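Although we believe our approach largely solves the problem of proving a given lemma, the choice of lemmas and definitions remains largely a matter of skill. In the case of lemmas, we typically accumulate them at the boundary of a theory, and then look for general lemmas that subsume several such instances. These strengthenings are reminiscent of the strengthening of induction hypotheses.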
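We also found that in some domains there is an explosion of non-trivial lemmas at the boundary between theories, which cannot be reduced by generalisation. This typically occurs with common mathematical objects, such as trees. Many of the lemmas are intuitively plausible, but their proofs often involve considerable effort. In this situation it seems hard to control the spread of the sub-theory into the main theory, because the lemmas of the sub-theory are used so pervasively in the main theory. We thus have three related problems here, namely: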
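• A large number of significant, independent lemmas are required from the sub-theory.
• The lemmas are intuitively plausible, yet hard to prove.
• The lemmas seep into the main theory, reducing modularity and making the main theory depend heavily on the sub-theory.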
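There seems to be no way to reduce the explosion in the number of lemmas. Regarding the proofs, we note that the difficulty arises from the need to state the induction lemmas carefully, so this is primarily a failure of automation for inductive proof. We also note that most of the definitions in this area are implicitly executable, so there is scope for some form of model checking via execution of the definitions. This in turn could be used to inform an inductive prover as to the truth or otherwise of conjectured lemmas.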
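We have not addressed the automation of induction, or other higher-order features. We have more to say about this in Section ?. Suffice it to say that this appears to be a difficult area.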
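We also note difficulties with combining reasoning at the tactic level. We feel that recent proposals for a calculus of tactics may prove interesting. We suggest relatively trivial steps, such as ensuring that tactic scripts are tree-structured rather than linear, and handling parameters shared across the proof via λ-binding at the ML level. Beyond this, however, the way forward is unclear.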
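There is a great deal of related work. Automated theorem proving is already a vast field, and we cannot hope to survey it in a reasonable space. Our main source is the two-volume Handbook of Automated Reasoning [RV01]. We note that the prevailing approach there is theoretical, concerned with techniques and the proof of their properties, whereas the aim of our work is to meet the needs of automation in an interactive setting. Also relevant is the PhD thesis of Syme [Sym98], which addresses the needs of automation in a declarative setting. Many of his conclusions apply in an interactive setting. We drew on his work in our analysis of the problems of automation in an interactive environment.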