First-Order Logic

views updated

FIRST-ORDER LOGIC

First-order logic is a bag of tools for studying the validity of arguments. At base it consists of a family of mathematically defined languages called first-order languages. Because these languages are constructed to be "logically perfect" (in Gottlob Frege's phrase), we can guarantee from their grammatical form that certain arguments written in these languages are valid. Separately from this we can study how arguments in English or any other natural language can be translated into an appropriate first-order language. It was Gottfried Wilhelm Leibniz who in the 1680s first proposed to divide the study of arguments into a mathematical part and a translational part, though his notion of mathematical languages was barely adequate for the purpose. First-order languages first came to light in the work of Charles S. Peirce in the 1880s; his name for them was "first-intentional logic of relatives." It took some time to develop a satisfactory mathematical description of these languages. David Hilbert achieved this in his lectures at Göttingen in 1917–1922, which appeared in an edited form in his book Grundzüge der Theoretischen Logik with Wilhelm Ackermann. Many logicians reckon that the appearance of this book in 1928 marked the true birth of first-order logic.

Logic and Arguments

For purposes of this article an argument consists of one or more sentences of English, then the word "Therefore," and then a sentence. The sentences before "Therefore" are called the premises of the argument and the sentence at the end is called its conclusion. We say that the argument is valid if the conclusion follows from the premises, and invalid otherwise. Logic is the study of valid arguments. Typical questions of logic are: Which arguments are valid and which are invalid? How can we construct valid arguments?

We shall study these questions with the help of first-order languages. First-order languages differ from natural languages in several ways. One is that their vocabulary and grammar are precisely defined. A second and equally important difference is that they contain expressions that we can interpret in a range of possible ways, and the range is determined by the grammatical form of the expressions.

Take for example the first-order sentence
(1) ∀x (P (x ) → Q (x )).
The part in brackets,
(2) (P (x ) → Q (x )),
is read "If P (x ) then Q (x )" and for the moment we can read "∀x " as "Whatever x may be." Here there is no object named "x." Rather, "x " is a variable ranging over a class of possible objects. We call this class the domain of quantification, or more briefly the domain. The domain is not supplied with the sentence itself; when somebody uses the sentence to make a statement, we have to be told what domain the user of the sentence intended.

We also have to be told how the expressions "P (x )" and "Q (x )" are interpreted. For the whole sentence (1) to make sense, each of these expressions must translate into a predicate, that is, an English sentence with the variable "x " standing where we could have put a name; for example
(3)      x is a town in Italy.
(4)      The number 5 + 3 is equal to x.
(5)      The father of x is a pianist.
But there is a further requirement. It must be possible to paraphrase the predicate into the form
(6)      x is a member of S
where S names a particular class of objects in the domain. For example the sentences (3)–(5) paraphrase as follows:

(3)' x is a member of the class of towns in Italy.
(4)' x is a member of the class of numbers that 5 + 3 is equal to.
(5)' x is a member of the class of individuals whose fathers are pianists.

To see why this restriction is needed, consider my father-in-law Marcus Ward, who was born on July 4th. He used to reason:
(7) Americans celebrate July 4th. July 4th is the birthday of Marcus Ward. Therefore: Americans celebrate the
birthday of Marcus Ward.
This conclusion gave him constant pleasure. Unfortunately the argument is unsound. The predicate
(8) Americans celebrate x
allows two paraphrases into class form:
(9) x is in the class of events celebrated
by Americans.
(10) x is in the class of days of the year celebrated by Americans.
On the first paraphrase, the second premise of the argument breaks down; the event that the Americans celebrate is not Marcus Ward's birthday. On the second paraphrase, the conclusion holds but not in the sense that pleased my father-in-law. Requiring a translation into classes is very effective for detecting ambiguities in arguments.

Returning to the sentence (1) we can interpret "P (x )" and "Q (x )" by saying what the classes in question are. So in an obvious notation, here is an interpretation of the first-order sentence above:
(11)
On this interpretation the sentence (1) expresses that among all people now living, the pianists are musicians. Under this interpretation the sentence is true. Another possible interpretation is
(12)
Under this second interpretation the sentence (1) is false; there are musicians who are not pianists. A crucial fact about first-order languages is that their sentences may change from true to false, or vice versa, when their interpretation is changed.

There are many first-order languages. Their common core consists of the following symbols, known as the logical symbols.
(13)

The third row of the table lists the commonest alternative notations, though in this entry we will use only the symbols on the top row.

Besides the logical symbols, each first-order language has its own collection of nonlogical symbols, sometimes known as primitives. These are symbols such as "P " and "Q " above. The nonlogical symbols need to be interpreted, and the language carries a rule specifying what kinds of interpretation are allowed. The set of all nonlogical symbols of a language, together with the information what kinds of interpretation are allowed, is called the signature of the language. An interpretation of the language consists of a domain and allowed interpretations of all the symbols in the signature of the language.

Below we shall see what kinds of nonlogical symbol a first-order language can have. But before we do that, we shall define a notion that brings us back to the difference between valid and invalid arguments.

Suppose L is a first-order language and ϕ₁, …, ϕ_n, ψ are sentences of L. Then the expression
(14) ϕ₁, …, ϕ_n ⊧ ψ
means that if I is any interpretation of L and all of ϕ₁, …, ϕ_n are true under I, then ψ is also true under I. We call the expression (14) a semantic sequent, or for short a sequent. We say that (14) is valid if it is true, and invalid otherwise. The sentences ϕ₁, …, ϕ_n are called the premises of the sequent, and ψ is called its conclusion.

An example of a valid sequent, using the kinds of symbol that we have already seen, is
(15) ∀x (P (x ) → Q (x )), ∀x (Q (x ) → R (x )) ⊧
∀x (P (x ) → R (x )).
In any interpretation I, the first sentence of (15) expresses that the class assigned to "P " is a subclass of the class assigned to "Q," and the second sentence expresses that the class assigned to "Q " is a subclass of the class assigned to "R." If these two sentences are true then it follows that the class assigned to "P " is a subclass of the class assigned to "R," and hence the conclusion of (15) is true under interpretation I. So (15) is valid.

Now suppose we have an argument written or spoken in English. Suppose also that we can find a suitable first-order language L, sentences ϕ₁,…, ϕ_n of L and an interpretation I of L, such that under the interpretation I the sentences ϕ₁, …, ϕ_n are translations of the premises of the English argument and ψ is a translation of its conclusion. Suppose finally that we have a proof that
(16) ϕ₁, …, ϕ_n ⊧ ψ.
is a valid sequent. Then via the translation from English to L our proof shows that if the premises of the argument are true, its conclusion must also be true. In short we have shown that the English argument is valid.

The following example appears as an exercise in Richard Whately's logic textbook of 1826:
(17) A negro is a man; therefore he who murders a
negro murders a man.

This argument seems to defy the logical tools available in 1826; in fact some years later Augustus De Morgan challenged the logicians of his age to develop a logic that does recognize such arguments. There is no record of how Whately expected his students to solve (17), but at least for first-order logic it is straightforward. The main step is to introduce symbols whose interpretations are predicates with two variables:
(18)
(We need the variables on the left side of (18) to distinguish between "x murders y " and "y murders x.") Again we insist that there is a translation into classes. For example we can paraphrase "x murders y " by
(19) The pair of objects (x, y ) lies in the class of all
ordered pairs of objects such that the first murders the
second.
Classes of this kind, whose members are ordered pairs, are called relations. Now Whately's argument translates into first-order sentences as follows:
(20) ∀x (N (x ) → M (x )) ⊧ ∀x (∃y (N (y ) ∧ R (x, y )) →
∃y (M (y ) ∧ R (x, y ))).
Here the premise has a form that we have already considered. We can read "∃y " as "There is something, call it y, such that," and we can read " ∧ " as "and." So the conclusion says: For every living being x, if there is a negro y such that x murders y, then there is a man y such that x murders y. Now one can show that under every interpretation if the premise of (20) is true then the conclusion is true. So Whately's argument is valid.

If we fail to find a translation of an English argument into a valid first-order sequent, this does not prove that the original argument was invalid. It could be that the argument is valid but a more powerful language than first-order is needed to show this. It could be that there is a suitable first-order sequent but we simply failed to find it.

Propositional Logic

Before we define the languages of first-order logic, we should examine a simpler logic called propositional logic or sentential logic. It uses the symbols ¬, ∧, ∨, → and ↔ but not ∀ or ∃ or =.

In "classical" propositional logic we consider meaningful sentences that are either true or false. We say that their truth value is Truth (T for short) if they are true, and Falsehood (F for short) if they are false. There are also "many-valued" propositional logics that allow three or more truth values; we shall not consider these.

If ϕ is any sentence, we can form a new sentence ¬ϕ by writing the negation sign "¬" immediately in front of ϕ. We stipulate that ¬ϕ is true if ϕ is false, and false if ϕ is true. For example
(21) ¬ Today is Tuesday.
expresses
(22) It is not true that today is Tuesday.
We read the symbol "¬" as "not," and ¬ϕ is called the negation of ϕ. Likewise if ϕ and ψ are sentences, we can form a sentence (ϕ ∧ ψ), and we stipulate that (ϕ ∧ ψ) is true if and only if both ϕ and ψ are true. For example we can form
(23) (Today is Tuesday ∧ The paper is not yet finished),
and this sentence expresses
(24) Today is Tuesday and the paper is not yet finished.
The sentence (ϕ ∧ ψ) is called the conjunction of ϕ and ψ, and the sentences ϕ and ψ are its conjuncts. We read the symbol " ∧ " as 'and.'

The remaining logical symbols of propositional logic have similar explanations. They all form new sentences from old ones, and in each case we stipulate the truth value of the new sentence in terms of the truth values of the old ones. The following table records these stipulations:
(25)
We read the table as follows. Suppose for example that ϕ is the sentence "Today is Tuesday" and ψ is the sentence "The paper is not yet finished." If ϕ is true and ψ is false, then we are in row (ii) of the table. In this row there is F below (ϕ ∧ ψ), and this tells us that the sentence
(26) (Today is Tuesday ∧ The paper is not yet finished)
is false. Likewise for the other rows and formulas.

We call (ϕ ∨ ψ) the disjunction of ϕ and ψ; the sentences ϕ and ψ are its disjuncts. The symbol '∨' can be read as 'or.' But notice that (ϕ ∨ ψ) is true when both ϕ and ψ are true; so in some circumstances a safer reading of (ϕ ∨ ψ) is 'Either ϕ or ψ, or both.'

The symbol " ↔ " can be read safely as "if and only if."

There remains the symbol " →," sometimes known as material implication. The one case where (ϕ → ψ ) is false is where ϕ is true and ψ is false, and this suggests reading (ϕ → ψ ) as "If ϕ then ψ." In mathematical contexts this reading generally works well. But note that we also have T in the bottom two rows of the table below (ϕ → ψ ), so that this sentence counts as true whenever ϕ is false, whether or not there is any connection between ϕ and ψ. For example the following sentence is true on any day of the week:
(27) (Three plus three is two → Today is Tuesday)
Likewise (ϕ → ψ ) is true whenever ψ is true, and so the following sentence is also true on any day of the week:
(28) (Today is Tuesday → Three plus three is six)
These properties of " → " are sometimes referred to as the paradoxes of material implication —though really they are not so much paradoxes as puzzles about how to translate " → " into English.

The symbols "¬," " ∧," " ∨," " →," and " ↔ " are known as the propositional operators. We can build up complex sentences by using the propositional operators any number of times. But first it is helpful to introduce so-called propositional symbols
(29)      p, q, r, p ₀, p ₁, p ₂, ….
which can stand as abbreviations of any sentence. Thus we can form sentences
(30)      ¬p, ((p → q ) ∧ r ), (p ∨ ¬q ), ¬¬p ₂
and so on. The sentences of propositional logic are the propositional symbols and all the complex sentences that can be built up from them using the propositional operators. The table (25) tells us when each of these sentences is true, as soon as we know what truth values to assign to the propositional symbols in them. Take ((p → q ) ∧ r ), for example. It uses three propositional symbols. Each of these three symbols could stand for a true sentence or a false one, and so there are eight possible cases that we can list as follows.
(31)
For each row we can evaluate the truth value of ((p → q ) ∧ r ) by starting at the propositional symbols and working upwards to more complex sentences, reading values from the table (25). Thus:
(32)
On the right the columns below the propositional symbols copy the truth values from the left side of the table. The column below a propositional operator gives the truth values of the sentence formed by introducing this operator; for example the table below " → " gives the truth values of (p → q ). The numbers at the bottom of the table show a possible order for working out the columns. The final column calculated, number (v), gives the truth value of the whole sentence in each of the eight cases listed on the left. The table
(33)
is called the truth table of the sentence ((p → q ) ∧ r ).

We say that a sentence of propositional logic is a tautology if its truth table has T in every row, and a contradiction if its truth table has F in every row.

Suppose ϕ is a tautology and suppose also that we replace each propositional symbol in ϕ by an English sentence (the same English sentence for each occurrence of the same propositional symbol), creating a sentence S. Then S must be true since the truth values of the English sentences will indicate a particular row of the truth table, and we know that the value in this row must be T since ϕ has T in every row. Generally S will be a mixture of English and propositional operators. But we can translate S into a sentence S' of English, for example translating the propositional operators as suggested above but with due caution. Since S' is a translation of S, it has the same truth value, and we saw that this value has to be Truth. In short S' will be a necessary truth in English.

Here are some tautologies.

(34) ((p ∧ q ) → p )
((p ∧ q ) → q )
(p → (p ∨ q ))
(q → (p ∨ q ))
(p → (q → p ))
((p → q ) → ((q → r ) → (p → r ))).
(((p → q ) ∧ p ) → q )
((p ↔ q ) → (p → q ))
(¬¬p → p )
((p ∧ ¬p ) → q )
(((p → q ) → p ) → p )

A possible translation of the first of these tautologies into English is "If the light is broken and the switch is on, the light is broken." This sentence has to be true in any situation in which each of the sentences "The light is broken" and "The switch is on" has a truth value, regardless of what these truth values are.

Suppose ϕ, ψ and χ are sentences of propositional logic. As above, the expression
(35) ϕ, ψ⊧χ
is called a (semantic) sequent. We say that it is valid if in every case where ϕ and ψ are true, χ is also true. It's easy to calculate from the truth tables of ϕ, ψ and χ whether or not the sequent (35) is valid. As with tautologies we can translate the sentences ϕ, ψ and χ simultaneously into sentences of English, by choosing sentences to replace the propositional symbols and then paraphrasing to remove the propositional operators. The result is an English argument, if we write "Therefore" in place of ⊧. Suppose we have a proof that the sequent (35) is valid (for example, by truth tables). Then this proof shows that if the premises of the English argument are true, its conclusion must be true too. In this way we justify the English argument.

For example we can justify the English argument
(36) If sending abusive e-mails is an offense, then Smith has just committed an offense. Sending abusive e-mails is an offense. Therefore: Smith has just committed an
offence.
by proving that the following sequent is valid:
(37) (p → q ), p ⊧ q
In fact it is valid, as truth tables quickly show. Since this sequent corresponds to indefinitely many other English arguments too, we should think of it as a rule of argument rather than an argument in itself. Logicians sometimes express this point by saying that in logic we study forms of argument rather than individual arguments.

Translating between English and First-Order Logic

Translations from first-order logic to English are generally straightforward; the problem is to make the English version digestible. But for assessing English arguments we need translation in the other direction, and this can be hazardous.

noun phrases

Proper names with singular meaning can go over into constants. For example the interpretation
(38)
(with any suitable domain supplied) allows us to make the translation
(39) The Pyrenees lie between France and Spain.
R (a, b, c )
Complex singular noun phrases such as "my father" are in general more complicated to translate. In first-order logic we are allowed to use function symbols, as F in the interpretation and translation
(40)
(41)      Lloyd George knew my father.
R (a, F (b ))
But there is a catch. The requirement that first-order languages should be "logically perfect" implies that if "a " names any element of the domain of an interpretation, the expression F (a ) should also name an element of the domain. So the domain to be supplied for (40) above must contain not only me but my father, my father's father, my father's father's father, and so on. Worse still, to adapt an example from Frege, if for any reason the domain contains the moon, it must also contain the father of the moon!

For such reasons, one hardly ever meets function symbols in first-order logic outside mathematical contexts. Even there caution is needed. For example in studying number fields one would like to have a "multiplicative inverse" function taking 2 to 1/2, 3 to 1/3, and so on; but 1/0 is undefined.

A common solution is to use, instead of a function symbol, a relation symbol with one more argument place:
(42)
(43) Lloyd George knew my father.
∃x (P (x, b ) ∧ R (a, x ))
This raises a further problem: If the implication that I have exactly one father plays any role in an argument using this sentence, then our translation by (42) fails to convey this implication. Here we need to call in Bertrand Russell's analysis of definite descriptions.

According to Russell's analysis, a sentence of the form "The X is a Y" paraphrases as
(44) At least one thing is an X, at most one thing is an X,
and everything that is an X is a Y.
We can translate this directly into first-order symbols, but the following paraphrase is neater:
(45) ∃z (∀x (x is an X ↔ x = z ) ∧ z is a Y).
A major problem with Russell's analysis is that it assumes we can choose the domain of the interpretation in such a way that it contains a unique x. But there are other requirements on the domain; all the quantifiers in the first-order sentence range over it. These requirements may clash. In everyday English we use the phrase "the X" in situations in which there is one "salient" X (see for example David Lewis), and to do justice to this in a first-order translation we need to make explicit what makes a certain individual "salient."

noun phrases that contain quantifier words like "every"

We can handle some cases by paraphrasing:
(46)
(47)      Every prime number greater than two is odd.
Every object, if it is a prime number greater than two, is
an odd number.
∀x (P (x ) → Q (x ))
(48) Some prime numbers greater than two are odd.
∃x (P (x ) ∧ Q (x )).
If we wanted "some" to imply "more than one" in this example, we would need a longer paraphrase using " = ":
(49)      ∃x ∃y (P (x ) ∧ Q (x ) ∧ P (y ) ∧ Q (y ) ∧ x ≠ y )
("x ≠ y " is a standard abbreviation for "¬(x = y ).") Likewise we can express that there are at least three odd numbers:
(50) ∃x ₁∃x ₂∃x ₃ (x ₁ ≠ x ₂ ∧ x ₁ ≠ x ₃ ∧ x ₂ ≠ x ₃ ∧ Q (x ₁) ∧
Q (x ₂) ∧ Q (x ₃))
Using " = " in analogous ways, first-order logic is equipped to express things like "There are exactly ten million Xs."

conditionals

The nub of the paradoxes of material implication may be that in real life as opposed to mathematics, sentences play many roles besides being true or false. Paul Grice argued that when we make the appropriate distinction between "what is said" and "what is implicated," there remains no difference in meaning between 'If ϕ then ψ' and (ϕ → ψ). On the other side, Dorothy Edgington pointed out that
(51) ((p → q ) ∨ (¬p → q ))
is a tautology, and drew the following consequence of reading " → " as "If … then":
(52) … if I reject the conditional "If the Conservatives lose, Thatcher [the leader of the Conservative party] will resign," I am committed to accepting "If the Conservatives win, Thatcher will resign"!

She found this consequence implausible.

adverbs

Consider the argument
(53) Nadia accidentally poisoned her father. Therefore:
Nadia poisoned her father.
The obvious first-order translations of "Nadia poisoned her father" don't allow us to add further information like "accidentally" or "last Wednesday" or "with strychnine." Peirce in 1892 suggested a way around this problem, namely to talk explicitly about actions. Thus:
(54)
(55) ∃x (A (x, n ) ∧ P (x, f ) ∧ U (x )) ⊧ ∃x (A (x, n ) ∧ P (x, f ))
Donald Davidson and others have independently revived Peirce's suggestion in connection with the semantics of natural languages.

Peirce comments that his translation consists in "catching one of the transient elements of thought upon the wing and converting it into one of the resting places of the mind." This is more than idle whimsy. Peirce's point is that in order to formalize arguments like (53), we sometimes need to introduce abstract objects—in his case, actions—into the domain.

A different kind of example to illustrate Peirce's point is the sentence
(56) For every drop of rain that falls a flower grows.
(George Boolos.) Taken literally, this statement implies that there at least as many flowers as raindrops. If we wanted to make this explicit in order to draw out consequences in an argument, we would need to incorporate some set-theoretic apparatus for talking about cardinalities.

Arguments about past, present and future are another example of the same general point. Since sentences of first-order logic lack tense, the normal way to handle such arguments in first-order logic is to add points of time (or sometimes intervals of time) to the domain. Then in general we need to add to the premises of an argument some basic facts about time, for example that the ordering of time into earlier and later is a linear ordering. (One can use the axioms for linear ordering, (76) below.)

non-indicative sentences

Sentences of first-order logic are all in the indicative. They are not designed for giving instructions or asking questions. One place where this matters is the formalization of mathematical reasoning. Mathematicians often use imperatives:
(57) "Draw a triangle ABC and consider the midpoint of
the side AB."

"Assume there is a greatest prime."
"Let x be a number between 0 and 5."

First-order logic has no straightforward way of expressing these instructions. In 1926 Jan Łukasiewicz suggested we should regard such instructions as moves in a proof. For example the instruction "Assume ϕ" is an indication that we are going to prove the sequent (84) below by proving the sequent (83).

Most English sentences can be translated into first-order sentences in many different ways. The translation that we choose should be guided by the arguments that we are trying to formalize. Some philosophers have speculated that for each unambiguous English sentence S there is a first-order translation ϕ that expresses the correct analysis of S into its most primitive components. If we knew these translations we could use them to formalize in first-order sentences any valid argument in English. A difficulty with this thesis is that (as we saw) analyses of English sentences for the purpose of justifying arguments may need to bring in whole ontologies of abstract objects: Sets, actions, points of time and space. Some philosophers would add possible worlds.

Most of the starred textbooks in the bibliography below give further advice about translating from English into first-order sentences.

First-Order Syntax

The signature of a first-order language consists of symbols of four kinds, though not every first-order language has all four kinds. The kinds are as follows:

(i) Propositional symbols as in section "Propositional Logic."
(ii) Relation symbols, usually

P, Q, R, R ₀, R ₁, R ₂,.…

(iii) Individual constant symbols, or more briefly constants, usually

a, b, c, c ₀, c ₁, c ₂,.…

(iv) Function symbols, usually symbols such as

F, G, H, F ₀, F ₁, F ₂,.…

Each relation symbol and each function symbol has an arity, which is a positive integer. One normally requires that no symbol occurs in more than one of these kinds, and that no relation or function symbol occurs with more than one arity. If a function or relation symbol has arity n, we describe the symbol as n -ary. Binary means 2-ary.

A first-order language also has an infinite set of symbols called variables. Variables are usually chosen as lower case letters near the end of the alphabet:
(58) u, v, w, x, y, z, v ₀, v ₁,.…
The variables are not in the signature.

Given any signature σ, we define a first-order language L (σ) in terms of σ. We begin with the terms of L (σ).

(a) Every constant of σ is a term of L (σ).
(b) Every variable of L is a term of L (σ).
(c) Suppose F is a function symbol of σ, n is the arity of F, and t ₁, …, t _n are terms of L (σ). Then the expression

F (t ₁, …, t _n)

is a term of L (σ).
(d) L (σ) has no terms except as given by (a)–(c).

This definition is an inductive definition. Clauses (a) and (b) together form its base clause ; they say outright that certain expressions are terms. Clause (c) is the inductive clause ; it says that if certain expressions are terms then certain other expressions are terms too. Clause (d) tells us that if t is a term of L (σ) then t can be generated in a finite number of steps by using the base and inductive clauses.

A metatheorem of first-order logic is a theorem about first-order logic (as opposed to a theorem proved by means of first-order logic).

metatheorem 1 (unique parsing of terms)

If t is a term of L (σ) then exactly one of the following holds:

(1) t is a constant of σ.
(2) t is a variable.
(3) t is F(t ₁, …, t _n) where F is a function symbol of σ, n is the arity of F and t ₁, …, t _n are terms of L(σ).

Moreover in case (3) if t is also G (s ₁, …, s _m) where G is a function symbol of σ and s ₁, …, s_m are terms of L (σ ), then F is G and n is m and t ₁ is s ₁ and … and t _n is s _n.

See Stephen Kleene §17 for the proof. Thanks to unique parsing, we can distinguish three types of term. The first two types are the constants and the variables, and together they form the atomic terms of L. The third type of term consists of those of the form F (t ₁,…, t _n); these are said to be compound terms. Broadly speaking the terms of L (σ) correspond grammatically to the singular definite noun phrases of a natural language.

The unique parsing lemma is used to justify certain types of definition and proof by induction. For example we can define, for each term t, the set V (t ) of variables that occur in t, as follows:

(a) If t is a constant then V (t ) is ∅ (the empty set).
(b) If t is a variable then V (t ) is the set {t}.
(c) If t is F (t ₁, …,t _n) then V (t ) is the union

V (t ₁) ∪ … ∪ V (t _n).

(This is the set of objects that are in at least one of V (t ₁), …, V (t _n).) We say that a term is closed if it contains no variables.

Along similar lines we can define t (s ₁, …, x _n/v ₁, …, v _n), which is the term got by taking the term t and putting the term s ₁ in place of each occurrence of the variable v ₁ in t, s ₂ in place of each occurrence of v ₂ and so on; the replacements should be made simultaneously. For example if t is F (x, G (y )) and s is G (z ), then t (s/x ) is F (G (z ),G (y )) and t (y, t/x, y ) is F (y, G (F (x, G (y )))).

Having defined the terms of L (σ), we define the formulas of L (σ) as follows.

(a) Every propositional symbol of σ is a formula of L (σ).
(b) If R is a relation symbol of σ, n is the arity of R and t ₁, …, t _n are terms of L (σ), then the expression

R (t ₁, …, t _n)

is a formula of L (σ).
(c) If s and t are terms of L (σ) then the expression

(s = t )

is a formula of L (σ).
(d) If ϕ is a formula of L (σ) then the expression

¬ϕ

is a formula of L (σ).
(e) If ϕ and ψ are formulas of L (σ) then the four expressions

(ϕ ∧ ψ), (ϕ ∨ ψ), (ϕ → ψ), (ϕ ↔ ψ)

are formulas of L (σ).
(f) If ϕ is a formula of L (σ) and v is a variable of L (σ), then the two expressions

∀v ϕ, ∃v ϕ

are formulas of L (σ). (∀v is called a universal quantifier and ∃v is an existential quantifier.)
(g) Nothing is a formula of L (σ) except as given by clauses (a)–(f) above.

The obvious counterpart to Metatheorem 1 is true for formulas. It allows us to say that a formula is atomic if it comes from clauses (a)–(c) and compound if has one of the forms described in clauses (d)–(f). Also no expression of L (σ) is both a term and a formula.

The next definition will get its full motivation when we come to discuss satisfaction of formulas. Roughly speaking, a variable x can serve to name an object, unless it appears in one of the contexts "For all objects x, … x …" and "There is an object x such that … x …." (Here we recall that in first-order logic, 'for all objects x ' is written ∀x and "there is an object x such that" is written ∃x.) An occurrence of a variable in one of these contexts is said to be bound ; an occurrence that is not bound is free. We say a variable is free in ϕ if it has a free occurrence in ϕ. A definition by induction of the set FV (ϕ) of variables free in the formula ϕ runs as follows:

(a) If ϕ is atomic then FV (ϕ) is the set of all variables that occur in ϕ.
(b) FV (¬ϕ) is FV (ϕ).
(c) FV ((ϕ ∧ ψ)), FV ((ϕ ∨ ψ)), FV ((ϕ → ψ)) and FV ((ϕ ↔ ψ)) are all equal to

FV (ϕ) ∪ FV (ψ).

(d) If ϕ is a formula and v is a variable, then FV (∀v ϕ) and FV (∃v ϕ) are both the set

FV (ϕ) \ {v }

of all the variables that are in FV (ϕ) and are not v.

A formula ϕ is said to be a sentence if FV (ϕ) is empty, in other words, if no variable is free in ϕ. For example ∀x (P (x ) → Q (x )) is a sentence, but (P (x ) → Q (x )) is not a sentence since x has two free occurrences in it.

Unique parsing also allows us to define by induction the complexity of a formula ϕ, comp(ϕ), as follows:

(1) If ϕ is an atomic formula then comp(ϕ) = 0.
(2) For every formula ϕ, comp(¬ϕ) = comp(ϕ) + 1.
(3) If ϕ and ψ are formulas and n is the maximum of comp(ϕ) and comp(ψ), then comp((ϕ ∧ ψ)), comp((ϕ ∨ ψ)), comp((ϕ → ψ)) and comp((ϕ ↔ ψ)) are all equal to n + 1.
(4) If ϕ is a formula and v is a variable then comp(∀v ϕ) = comp(∃v ϕ) = comp(ϕ) + 1.

There is a similar definition for the complexity of terms. The chief use of complexity is in proofs by induction on complexity, which run as follows. We want to show that all formulas of a first-order language have a certain property P. So we show first that all atomic formulas have the property P, and then we show that for every positive integer n, if all formulas of complexity <n have P then every formula of complexity n has P too.

One speaks of the subformulas of a formula ϕ in two senses. First a subformula of ϕ is a segment of ϕ that is a formula in its own right. Second a subformula of ϕ is a formula that occurs as a subformula of ϕ in the first sense. For example the formula (P (x ) ↔ P (x )) has two subformulas of the form P (x ) in the first sense, but only one in the second sense. It is easy to give a formal definition of the set of subformulas of ϕ in the second sense, by induction on the complexity of ϕ. Metatheorem 12 below uses subformulas in the first sense.

There are several useful conventions about how to write first-order formulas. For example
(59)      ϕ ∧ ψ ∧ χ
is strictly not a first-order formula, but we count it as an abbreviation of
(60)      ((ϕ ∧ ψ) ∧ χ).
In the same spirit the conjunction
(61)      (ϕ₁ ∧ ϕ₂ ∧ … ∧ ϕ_n)
is an abbreviation for
(62)      (…(ϕ₁ ∧ ϕ₂) ∧ … ∧ ϕ_n),
and the disjunction
(63)      (ϕ₁ ∨ ϕ₂ ∨ … ∨ ϕ_n)
is an abbreviation for
(64)      (…(ϕ₁ ∨ ϕ₂) ∨ … ∨ ϕ_n).
We count " ∧ " and " ∨ " as binding tighter than " → " or " ↔," in the sense that
(65)      (ϕ ∧ ψ ↔ χ), (ϕ → ψ ∨ χ)
are respectively abbreviations for
((ϕ ∧ ψ) ↔ χ), (ϕ → (ψ ∨ χ)).
Other useful abbreviations are
(66)      (x ≠ y ) for ¬(x = y ),
∀x ₁ … x _n ϕ for ∀x ₁ … ∀x _n ϕ,
∃x ₁ … x _n ϕ for ∃x ₁ … ∃x _n ϕ.
Also we allow ourselves to leave out a pair of brackets when they stand at the opposite ends of a formula.

Another useful convention is based on mathematical notation for functions. If ϕ is a formula and all the variables free in ϕ are included in the list v ₁, …, v _n, we introduce ϕ as ϕ(v ₁, …, v _n). Then if t ₁, …, t _n are terms, we write
(67) ϕ(t ₁, …, t _n)
for ϕ(t ₁, …,t _n/v ₁, …, v _n), which is the result of putting t _i in place of each free occurrence of v _i in ϕ, simultaneously for all I from 1 to n. (We shall revise this definition later.)

Interpretations of First-Order Languages

A first-order language is a language L (σ) for some signature σ. The signature σ determines the language L (σ), but equally if we know the formulas of L (σ) we can recover σ. So if L is the language L (σ), we could equally well say "formula of σ" or "formula of L." Likewise σ-structures, which we are about to define, can equally well be called L -structures.

We recall some set theory. If X is a set and n a positive integer, then an n -tuple from X is an ordered list (a ₁, …, a _n) where a ₁, …, a _n are members of X. We write X ⁿ for the set of all n -tuples from X. An n -ary relation on X is a subset of X ⁿ. An n -ary function on X is a function f : X ⁿ → Y, for some set Y, which assigns to each n -tuple (a ₁, …, a _n) from X an element f (a ₁, …, a _n) of Y.

Suppose σ is a signature. A σ-structure is a set-theoretic interpretation for the symbols in σ. More precisely a σ-structure A has the following ingredients:

(a) A set (usually required to be nonempty) which is the domain of a, in symbols dom (A).
(b) For each propositional symbol p of σ, a truth value (T or F) which we write as p _A.
(c) For each constant c of σ, an element c _A of dom (A ).
(d) For each relation symbol R of σ, an n -ary relation R _A on dom (A )ⁿ, where n is the arity of R.
(e) For each function symbol F of σ, an n -ary function F _A: dom(A )ⁿ → dom (A ), where n is the arity of F.

example 1 (arithmetic and its language)

For talking about the natural numbers
(68) 0, 1, 2,…,
we use a first-order language called the language of arithmetic. Its signature σ_ℕ consists of one constant symbol 0, two function symbols + and · of arity 2, a function symbol S of arity 1 and a relation symbol < of arity 2. The number structure ℕ is the following σ_ℕ-structure. The domain is the set of natural numbers. The constant 0 stands for the number zero (i.e., 0 _ℕ = 0 ). The function symbols + and · stand for addition and multiplication of natural numbers, and S stands for the function "plus one." The binary relation symbol < stands for the relation "less than" (i.e., <_ℕ is the set of all ordered pairs of natural numbers (m, n ) with m < n ). Following normal mathematical usage we write + (x, y ), · (x, y ) and < (x, y ) as x + y, x · y and x < y respectively.

The structure ℕ interprets the closed terms of L (σ_ℕ) as names of numbers. For example the term S (0 ) stands for the number 1, the term S (S (0 )) stands for the number 2, and so on; we write these terms as 0, 1, 2 and so on. Likewise the closed term 2 + 3 names the number 5.

We can use our earlier explanations of the first-order symbols in order to read any sentence of L (σ_ℕ) as making a statement about ℕ. The following sentences are all true in ℕ:
(69) PA1. ∀x (Sx ≠ 0 ).
PA2. ∀x ∀y (S (x ) = S (y ) → x = y ).
PA3. ∀x (x + 0 = x ).
PA4. ∀x ∀y (x + S (y ) = S (x + y )).
PA5. ∀x (x · 0 = 0 ).
PA6. ∀x ∀y (x · S (y ) = (x · y ) + x ).
PA7. ∀x ¬(x < 0 ).
PA8. ∀x ∀y (x < Sy ↔ x < y ∨ x = y ).

The following induction axiom is also true in ℕ, though it is not a first-order sentence because the variable 'X ' ranges over sets rather than numbers.
(70) For every set X of numbers,
((0∈X ) ∧ ∀x (x ∈X → S (x )∈X → ∀x (x ∈X )).
Within L (σ_ℕ) the closest we can come to this axiom is to give a separate axiom for each set X defined by a first-order formula. Namely let ϕ(x, y ₁, …, y _n) be any formula of L (σ_ℕ). Then we write the sentence
(PA.9)

The sentences of the form PA9 constitute the first-order induction schema for arithmetic. The infinite set of sentences PA1–PA9 is called first-order Peano arithmetic, or PA for short.

The situation with ℕ is typical. Given any signature σ, any σ-structure A and any sentence ϕ of L (σ), we can read ϕ as making a statement about A. If this statement is true we say that A is a model of ϕ, and we express this fact by writing
(71) A ⊧ ϕ.
If ϕ is false in a we write A ⊧̷ ϕ. It is an unfortunate fact of history that we use the symbol "⊧" both in (71) (which is not a sequent) and in semantic sequents such as (16) above and (72) below. One can avoid confusion by noting that in (71) there is a structure to the left of "⊧," whereas in sequents the space to the left of "⊧" is empty or contains sentences.

Theories and Their Models

Let L be a first-order language. A set of sentences of L is called a theory. An L -structure A is called a model of T if it is a model of every sentence in T. The semantic sequent
(72)      T ⊧ ψ
states that every model of T is a model of ψ; when this holds, we say the sequent is valid and that ψ is a (logical) consequence of T. When T is a finite set, say {ϕ₁, …, ϕ_n}, we write the sequent (72) as
(73)      ϕ₁, …, ϕ_n ⊧ ψ.
When T is empty, we also write the sequent as
(74)      ⊧ ψ.
(74) says that ψ is true in every L -structure; when it holds, we say that the sentence ψ is valid, and that it is a theorem. Finally the sequent
(75)      T ⊧
expresses that T has no models.

We say that T is consistent if it has models, in other words if T ⊭. We say that T is complete if for every sentence ϕ of L, at least one of ϕ and ¬ϕ is a consequence of T. So T is consistent and complete if and only if for every sentence ϕ of L, exactly one of ϕ and ¬ϕ is a consequence of T.

Since a sentence of L (σ) is also a sentence of L (τ) whenever τ includes σ, we should check that the validity of a sequent depends only on the sentences in it, and not on the signature—otherwise our notation for sequents would need to mention the signature. The following metatheorem assures this. (It requires that every structure has nonempty domain. In a structure with empty domain, ∃x (x = x ) is false; but adding a constant to the signature automatically excludes structures with empty domains and hence makes ∃x (x = x ) a theorem.)

metatheorem 2

If T and ψ are a theory and a sentence of L(σ), and τ is a signature extending σ, then (72) is valid for σ-structures if and only if it is valid for τ-structures.

A theory T is said to be deductively closed if T contains all its consequences. If S is any theory then the set T of all consequences of S is deductively closed and contains S ; we call T the deductive closure of S. When T is the deductive closure of S we say also that S is a set of axioms for T.

First-order theories commonly arise in one of two ways.

In the first way we have a first-order language L and an L -structure A, and we want to list the facts that we know about A. By the complete theory of A we mean the set T of all sentences of L that have A as a model. This set T is always deductively closed, consistent and complete. If we have a set S of sentences that are all true in A, then certainly S is consistent; if it is also complete, then S is a set of axioms for the complete theory of A. An ambition for logicians is to give sets of axioms for the complete theories of various mathematical structures. For many cases this is achieved. But in 1931 Kurt Gödel gave an indirect but astonishingly insightful proof that PA is not complete, so that it doesn't axiomatise the complete first-order theory of ℕ. (See entry on "Gödel's Theorem.")

The second common source of first-order theories is the definitions of classes of mathematical structures. For example a linearly ordered set is a structure in a signature containing a binary relation symbol <, which is a model of the three sentences:
(76) ∀x ∀y ∀z (x < y ∧ y < z → x < z )
∀x ¬ (x < x )
∀x ∀y (x < y ∨ y < x ∨ x = y )
The structure ℕ forms a linear ordering, since all of (76) is true in ℕ. This theory (76) is a direct translation into first-order notation of the usual informal definition of linear orderings.

Formulas and Satisfaction

The formula x < 3 is neither true nor false in ℕ, because the variable x lacks an interpretation. The interpretations of the symbols of σ_ℕ are fixed in ℕ, but the interpretations of the variables are not. The same holds for any first-order language L, any L -structure A and any formula ϕ of L in which some variables are free.

By an assignment in the structure A we mean a function α whose domain is a set of variables and which assigns to each variable in its domain an element of A. If t is a term and α is an assignment in A whose domain includes all the variables in t, then A and α together tell us how to treat t as the name of an element of A, and we write this element t _A[α]. For example if A is ℕ and α is an assignment in ℕ with α(x ) = 4, and t is the term x + 5, then t _ℕ[α] is 4 + 5, in other words 9.

When t is a closed term, t _A[α] is independent of α and we write it simply as t _A.

Similarly we can use assignments to interpret the free occurrences of variables in a formula. (The bound occurrences need no interpretation; they are part of the apparatus of quantification.) For example in ℕ, any assignment α with α(x ) = 4 interprets the formula x < 5 as making the statement that 4 is less than 5, which is true. We express the fact that this statement is true by writing
(77)      ℕ ⊧ (x < 5 )[α].
More generally if α is an assignment in A which makes assignments to all the variables free in ϕ, then
(78)      A ⊧ ϕ[α]
states that ϕ, interpreted in A by means of α, is true. When (78) holds we say that α satisfies ϕ in A. We write
(79)      A ⊧̷ ϕ[α]
if α fails to satisfy ϕ in A.

There are two reasons for introducing the notion of "satisfying." The first is that it allows us to use formulas with free variables in order to express properties of individual elements or sequences of elements in a structure. For this application another notation is helpful. Suppose ϕ(x ₁, …, x _n) is a formula of L and α is an assignment that assigns elements to at least the variables x ₁, …, x _n. Write a _i for α(x _i). Then instead of "A ⊧ ϕ[α]" we also write
(80) A ⊧ ϕ[a ₁, …, a _n].
We read this statement as "a ₁, …, a _n satisfy ϕ in A.'

The second reason for introducing satisfaction is that (as Alfred Tarski pointed out) it allows us to give a fully precise mathematical definition of the relation "A ⊧ ϕ," by first defining the relation "A ⊧ϕ[α]." The first step of the definition is to define t _A[α] by induction on the complexity of t ; we omit details. This done, the definition of "A ⊧ ϕ[α]" goes by induction on the complexity of ϕ. We give some typical cases and leave the remainder to the reader.

(a) For every propositional symbol p,

A ⊧ p if and only if p _A is T.

(b) If R is a relation symbol of arity n and ϕ has the form R (t ₁, …, t _n), then

A ⊧ ϕ[α ] if and only if ((t ₁)_A[α], …, (t _n)_A[α]) is in R _A.

(c) A ⊧ (ϕ ∧ ψ)[α] if and only if A ⊧ ϕ[α] and A ⊧ψ[α].

The clauses for quantifiers need some further notation. Suppose α is an assignment whose domain includes all the variables free in ϕ except perhaps v, and a is an element of the structure A. Then we write α(a/v ) for the assignment β whose domain is the domain of α together with v, and such that for each variable x in the domain of β,
(81)

(d) A ⊧ ∀v ϕ[α] if and only if for every element a of dom(A ), A ⊧ ϕ[α(a/v )].
(e) A ⊧ ∃v ϕ[α] if and only if there is an element a of dom(A ) such that A ⊧ ϕ[α(a/v )].

Tarski's definition of "A ⊧ ϕ[α]" goes by induction on the complexity of formulas, as above. But by standard set-theoretic methods we can convert it to an explicit set-theoretic definition and hence prove the following metatheorem:

metatheorem 3

There is a formula θ of set theory such that
θ(σ,A,ϕ,α)
is true in the universe of sets if and only if σ is a signature, A is a σ-structure, ϕ is a formula of L(σ) and A ⊧ ϕ[α].

We would like to know that ϕ(y/ x) says the same thing about an object y as ϕ(x ) says about an object x. More precisely, we would like to know the following:

metatheorem 4

Suppose ϕ(x) is a formula of the first-order language L, t(y) is a term of L, α is an assignment whose domain includes y, and β is an assignment with β(x) = t_A [α]. Then
A ⊧ ϕ(t/x )[α] if and only if A ⊧ ϕ[β].

Unfortunately this metatheorem is false unless we make certain adjustments. For example let ϕ(x ) be the formula ∃y (x ≠ y ) which says that there is something else besides x, and let t be the variable y. Then ϕ(t/x ) is the everywhere false sentence ∃y (y ≠ y ), not a formula saying that there is something else besides y. The quantifier ∃y has captured the variable y when we substituted t for x in ϕ(t/x ).

There is a remedy. Given any formula ϕ and any term t, we define ϕ(t/ /x ) as follows. For each variable v occurring in the term t, choose another variable v' that doesn't occur in either t or ϕ, taking different variables v' for different v. Form the formula ϕ' by replacing each bound occurrence of each variable v in ϕ by an occurrence of v'. Finally take ϕ(t//x ) to be ϕ'(t/x ). (A more precise account would say how we choose the variables v' and would explain the relevance of the logical equivalence (90)(i) below.) Then ϕ(t//x ) is said to come from ϕ by substituting t for x "without clash of variables." For simplicity of notation we now redefine ϕ(t/x ) to mean ϕ(t//x ), thus throwing away the ladder we climbed up. After this redefinition, Metatheorem 4 is true.

Some authors avoid this redefinition by forbidding the use of ϕ(t/x ) when ϕ contains a quantifier that captures a variable in t.

Some Metatheorems of First-Order Logic

The metatheorems in this section are mostly immediate from the definitions. We state them because they have useful applications.

metatheorem 5

If ϕ is a first-order sentence then the sequent ϕ ⊧ ϕ is valid.

metatheorem 6 (monotonicity)

If ψ is a sentence and T a theory in a first-order language, and U is a subset of T such that the sequent U ⊧ ψ is valid, then the sequent T ⊧ ψ is valid.

metatheorem 7 (cut)

If T is a theory and ϕ, ψ are sentences, all in a first-order language, and the sequents
(82) T ⊧ ϕ, T ∪ {ϕ} ⊧ ψ
are both valid, then the sequent T ⊧ ψ is valid. (The sentence ϕ is "cut.")

There are a number of metatheorems expressing properties of particular logical operators. The three below are only a sample.

metatheorem 8

Suppose T is a first-order theory and ϕ is a first-order sentence. Then the sequent T ⊧ϕ is valid if and only if the sequent T ∪ {¬ϕ} ⊧ is valid.

metatheorem 9

Suppose T is a first-order theory and ϕ and ψ are first-order sentences. If
(83)      T ∪ {ϕ} ⊧ ψ
is valid, then
(84)      T ⊧ (ϕ → ψ)
is valid. Also if
(85)      T ⊧ ϕ, T ⊧ (ϕ → ψ)
are both valid then
(86)      T ⊧ ψ
is valid.

The first half of Metatheorem 9 is sometimes called the Deduction Theorem. The second half is one form of a rule traditionally called Modus Ponens.

metatheorem 10

Suppose T is a first-order theory, ϕ(x₁, …, x_n) is a first-order formula and c₁, …, c_n are n distinct constants that occur nowhere in either T or ϕ. Then if either of the following sequents is valid, so is the other:
T ⊧ ϕ(c ₁, …, c _n).
T ⊧ ∀x ₁, … ∀x _nϕ.

Our remaining metatheorems describe important properties of first-order logic as a whole.

metatheorem 11

Let L be a first-order language and ϕ(v₁, …, v_n) and ψ(v₁, …, v_n) formulas of L. Then the following are equivalent:

(a) For every σ-structure A and all elements a₁, …, a_n of the domain of A,

A ⊧ ϕ[a ₁, …, a _n] if and only if A ⊧ ψ[a ₁, …, a _n].

(b) ⊧∀v ₁ …∀v _n (ϕ(v ₁, …, v _n)
↔ ψ(v ₁, …, v _n)).

When these conditions (a) and (b) hold, we say that ϕ and ψ are logically equivalent, and we write this as ϕ≡ψ.

Logical equivalence is an equivalence relation on formulas. Here are some logically equivalent pairs:
(87)
Equivalences (a), (b) and (c) are examples of a group of equivalences that go by the name of De Morgan's Laws.

metatheorem 12

Let L be a first-order language and suppose ϕ and ϕ' are logically equivalent formulas of L. Let ψ be a formula of L, and let ψ' come from ψ by replacing a subformula of ψ of the form ϕ by a copy of ϕ'. Then ψ' is logically equivalent to ψ.

Metatheorem 12 together with equivalences (a), (e), and (f) tells us that, given any first-order formula ϕ, we can remove all occurrences of the symbol " ↔ ", and then all occurrences of the symbols " ∨ " and " →," and so find a formula logically equivalent to ϕ in which none of these symbols occurs. So there would be no loss of expressive power if we removed these symbols from the language. By a similar argument we could make do with " ∨ " and "¬," discarding " ∧," " → " and " ↔."

Other choices of symbol are open to us. For example we can introduce the symbol "⊥" as an atomic formula; since it has no variables it is a sentence, and we stipulate that its truth value is F in all structures. This symbol "⊥" is logically equivalent to ¬∀x (x = x ), or more generally to ¬ϕ where ϕ is any valid sentence. We pronounce "⊥" as "absurdity"; some computer scientists read it as "bottom." Given the logical equivalences
(88)
we see that in the presence of "⊥" and " → " we can drop "¬," " ∧," " ∨," " ↔ " from the language.

A formula with no quantifiers is said to be quantifier-free. By a literal we mean a formula that is either atomic or the negation of an atomic formula. By a basic conjunction we mean either a literal or a conjunction of literals; likewise a basic disjunction is a literal or a disjunction of literals. A quantifier-free formula is said to be in disjunctive normal form if it is a basic conjunction or a disjunction of basic conjunctions; it is in conjunctive normal form if it is a basic disjunction or a conjunction of basic disjunctions.

metatheorem 13

Every quantifier-free formula ϕ(x₁, …, x_n) is logically equivalent to a quantifier-free formula ϕ ^dnf(x₁, …, x _n) in disjunctive normal form, and to a quantifier-free formula ϕ^cnf(x₁, …, x _n) in conjunctive normal form.

We illustrate Metatheorem 13:
(89)

Here follow some important logical equivalences involving quantifiers.
(90)

A formula is said to be prenex if it is quantifier-free or consists of a string of quantifiers followed by a quantifier-free formula. The (possibly empty) string of quantifiers at the front of a prenex formula is called its quantifier prefix.

metatheorem 14 (prenex form theorem)

Let L be a first-order language and ϕ(x₁, …, x_n) a formula of L. Then there is a prenex formula ψ(x₁, …, x_n) of L that is logically equivalent to ϕ.

To prove the prenex form theorem, one establishes ways of moving a quantifier in a formula "outwards." Equivalences (j) and (k) above are typical examples, and there are corresponding equivalences with ∃. If the variable x does occur free in ψ, we first use equivalence (i) to change x to another variable not occurring in ψ.

Construction of Models

One way to show that a theory T is consistent is to build a model of T. Depending on what T is, this can call for a good deal of ingenuity. A number of logicians have studied how, by analysing T itself, we can make the process more systematic. The approach described below follows suggestions of Jaakko Hintikka.

Let L be a first-order language and T a set of sentences of L. For simplicity we assume L doesn't have ∨, → or ↔. We say that T is a Hintikka set if it has the following properties:

(H1) If (ϕ ∧ ψ) is in T then both ϕ and ψ are in T ; if ¬(ϕ ∧ ψ) is in T then at least one of ¬ϕ and ¬ψ is in T.
(H2) If ¬¬ϕ is in T then ϕ is in T.
(H3) For every closed term t, (t = t ) is in T.
(H4) If (s = t ) and ϕ(s/x ) are both in T then ϕ(t/x ) is in T.
(H5) If ∃x ϕ(x ) is in T then for some constant c, ϕ(c ) is in T ; if ¬∃x ϕ(x ) is in T then ¬ϕ(t ) is in T for every closed term t.
(H6) If ∀x ϕ(x ) is in T then ϕ(t ) is in T for every closed term t ; if ¬∀x ϕ(x ) is in T then for some constant c, ¬ϕ(c ) is in T.
(H7) If ϕ is an atomic sentence then ϕ and ¬ϕ are not both in T.

metatheorem 15

If L is a first-order language, A is an L-structure and every element of A is named by a constant, then the set of all sentences of L that are true in A is a Hintikka set.

metatheorem 16

If the first-order language L has at least one constant and T is a Hintikka set in L then T has a model

We sketch the proof of Metatheorem 16. Let C be the set of all closed terms of L. Define a relation ∼ on C by: s ∼ t if and only if (s = t ) is in T. Then we can show, using (H3) and (H4), that ∼ is an equivalence relation on C. Write t ^∼ for the equivalence class of the closed term t, and C ^∼ for the set of equivalence classes t ^∼. Since L has at least one constant, C ^∼ is not empty. We shall build an L -structure A whose domain is C ^∼. For each constant c we take c _A to be c ^∼. If F is a function symbol of arity n and t ₁, …, t _n are closed terms, we define F _A(t ₁^∼, …, t _n^∼) to be the equivalence class
(91) F (t ₁, …, t _n)^∼.
We can use (H4) to justify this definition. An argument by induction on complexity shows that for each closed term t of L, t _A is t ^∼. If R is a relation symbol of arity n and t ₁, …, t _n are closed terms, we define
(92) (t ₁^∼, …, t _n^∼) is in R _A if and only if R (t ₁, …, t _n) is in T.
Again this definition is justified by an argument involving (H4). This completes the definition of the structure A.

Now we prove, by induction on the complexity of ψ, that for every sentence ϕ of L, if ϕ is in T then A ⊧ ψ, and if ¬ψ is in T then A ⊧ ¬ψ. A typical clause is where ψ is ∃x ϕ(x ). If ψ is in T then by (H5) there is a constant c such that ϕ(c ) is in T. Since ϕ(c ) has lower complexity than ∃x ϕ(x ), the induction hypothesis shows that A ⊧ ϕ(c ). So A ⊧ ∃x ϕ(x ). On the other hand if ¬∃x ϕ(x ) is in T, then by (H5) again and induction hypothesis, A ⊧ ¬ϕ(t ) for every closed term t, so that A ⊧¬ϕ[t _A]. Since all elements of A are of the form t _A, this shows that A ⊧ ¬∃x ϕ(x ). Thus A is a model of every sentence in T, proving the metatheorem.

As an example we shall show that the sequent
(93) ∀x ¬(P (x ) ∧ ¬Q (x )), ∀x ¬(Q (x ) ∧ ¬R (x )) ⊧ ∀x
¬(R (x ) ∧ ¬P (x ))
is not valid. ((93) is the sequent (15) but with the conclusion reversed and " → " removed in favour of "¬" and " ∧.") We begin by noting that by Metatheorem 8, the sequent is valid if and only if the theory consisting of the sentences
(94) ∀x ¬(P (x ) ∧ ¬Q (x )), ∀x ¬(Q (x ) ∧ ¬R (x )), ¬∀x
¬(R (x ) ∧ ¬P (x ))
is inconsistent. So we can show the invalidity of (93) by constructing a model of these three sentences. We aim to build a Hintikka set that contains the sentences.

Property (H5) of Hintikka sets and the hypothesis of Metatheorem 16 alert us that we may need to call on constants. Maybe L has no constants; maybe it has constants, but they are all used in sentences of T that make their use for (H5) and (H6) impossible. So as a first step we allow ourselves to add new constant symbols to the language when needed. Metatheorem 2 tells us that this expansion of L makes no difference to the consistency of T. The added constants are called witnesses, since in (H5) the sentence ϕ(c) serves as a witness to the truth of ∃x ϕ.

We begin by writing the sentences (94):
(95)      ∀x ¬(P (x ) ∧ ¬Q (x ))
∀x ¬(Q (x ) ∧ ¬R (x ))
¬∀x ¬(R (x ) ∧ ¬P (x ))
At this point we apply the second part of (H6) to the third sentence. This requires us to introduce a witness, say c. A Hintikka set containing ¬∀x ¬(R (x ) ∧ ¬P (x )) needs to contain ¬¬(R (c ) ∧ ¬P (c )), so we add this to the diagram. We notice that by (H2) a Hintikka set containing this new sentence must also contain (R (c ) ∧ ¬P (c )), so we add this too. Next (H1) tells us that a Hintikka set containing (R (c ) ∧ ¬P (c )) must also contain R (c ) and ¬P (c ), so we add these below. Then by the first part of (H6) we also need to add ¬(P (c ) ∧ ¬Q (c )) and ¬(Q (c ) ∧ ¬R (c )), so we add them.
(96)
Here we meet a problem. The Hintikka set that we are constructing contains ¬(P (c ) ∧ ¬Q (c )), so by the second part of (H1) it must contain either ¬P (c ) or ¬¬Q (c ). But we don't yet know which will work; so we try both possibilities. The diagram will branch. To the left we try to construct a Hintikka set containing ¬P (c ), and to the right, a Hintikka set containing ¬¬Q (c ) (and hence also Q (c ) by (H2)). The same rule (H1) applies to the sentence ¬(Q (c ) ∧ ¬R (c )), and tells us to add either ¬Q (c ) or ¬¬R (c ). We must try each of these choices in each branch, so we have a double branching. At this point we notice that the third branch from the left contains both the atomic sentence Q (c ) and its negation ¬Q (c ); so by (H7) there is no hope of extending this branch to a Hintikka set, and we close it with a horizontal line across the bottom.
(97)
The first, second, and fourth branches are still open. Inspection shows that all that is needed to turn the contents of each of these branches into a Hintikka set is to add the equation (c = c ); so we do that. Now choose one of the open branches, say the first. Since its contents (reckoning from the top of the diagram) form a Hintikka set, it has a model, and this model is a model of the first three sentences in particular. So we have shown that the sequent (93) is not valid.

What would happen if we replaced the conclusion of (93) by ∀x ¬(P (x ) ∧ ¬R (x ))? The resulting sequent is a paraphrase of (15) from section 1, and we claimed that that sequent was valid. Here is the tree for the revised sequent:
(98)
Here every branch contains an atomic sentence together with its negation, so they all close and we say that the diagram is closed. This closed diagram shows that there is no Hintikka set containing the top three sentences. Hence by Metatheorem 15 they have no model, and this shows that our revised sequent is valid, giving a formal justification of (15).

Proof Calculi

A proof calculus is a mathematical device for proving the validity of sequents. In any proof calculus a central notion is that of a formal proof of a sequent. For most proof calculi a formal proof is an array of symbols that can be written on a page. But some calculi are for use by computers and their formal proofs are not meant for visual inspection. On the other hand the proof calculus Hyperproof (Jon Barwise and John Etchemendy) allows formal proofs that consist of sequences of labelled pictures.

If a proof calculus 𝒞 contains a formal proof of the sequent
(99) ϕ₁, …, ϕ_n ⊨ ψ,
we express this fact by writing
(100) ϕ₁, …, ϕ_n ⊦_𝒞 ψ.
The expression (100) is called a syntactic sequent. When we are discussing a particular proof calculus 𝒞, we can drop the subscript from ⊦_𝒞 and write simply ⊦; the symbol "⊦" is read as "turnstile."

In many proof calculi a formal proof of a sequent is a formalisation of an argument that one might use in order to persuade someone that the sequent is valid. For example the natural deduction calculus proposed by Gerhard Gentzen in 1934 is designed to make the same moves as are used in "natural" mathematical arguments. But in general a formal proof need not have any visible connection with arguments. The main requirements on a proof calculus 𝒞 are as follows.

(a) Whenever a syntactic sequent (100) holds, the corresponding semantic sequent (99) is valid. A proof calculus satisfying this condition is said to be sound.
(b) Whenever a semantic sequent (99) is valid, the corresponding syntactic sequent (100) holds. A proof calculus satisfying this condition is said to be complete.
(c) A computer can identify those arrays of symbols that are formal proofs in 𝒞, and for each formal proof and each finite semantic sequent, the computer can calculate whether or not the proof is a formal proof of the sequent.

Soundness says that 𝒞 doesn't prove any sequent that it ought not to; completeness says that 𝒞 does prove any sequent that it ought to.

All the proof calculi commonly taught to undergraduates are both sound and complete. There is one main exception: the resolution calculus is limited to proofs of sequents of the form
(101) ∀x ₁ …∀x _m ϕ ⊨
where ϕ is quantifier-free and in conjunctive normal form. Computer science students who study this calculus also learn how to reduce more general proof problems to this form. Also some Hilbert-style calculi are only able to prove sequents of the form ⊧ψ.

If a proof calculus 𝒞 is sound and complete, then finite sequents with ⊧ are valid if and only if the corresponding sequents with ⊦_𝒞 are also valid. It follows that all the metatheorems of section 8 using ⊧ remain true when they are stated with ⊦_𝒞. But the versions with ⊢_𝒞 generally have direct proofs using syntactic properties of the proof calculus 𝒞. Sometimes these direct proofs (particularly proofs of the Deduction Theorem) play a role in proving that 𝒞 is complete. One deathtrap for unwary teachers is the Cut rule, Metatheorem 7. This is very easy to prove directly for some calculi, for example natural deduction. But for the truth tree calculus below, the truth of the Cut rule is a deep fact equivalent to Gentzen's cut elimination theorem; a syntactic proof of it is a major enterprise.

The entry "Proof Theory" contains much more information about proof calculi. For example it discusses how one can translate proofs in one proof calculus into proofs in another. If two proof calculi translate into each other in this way, then clearly soundness and completeness theorems for one calculus carry over to the other calculus too. This is probably the main reason why logicians often talk about "the completeness theorem" as if there was a single theorem for all proof calculi, when strictly each complete proof calculus has its own completeness theorem.

We introduced a kind of proof calculus when we discussed the construction of models. Suppose we have a tree diagram in the style of that section, showing that there is no Hintikka set containing the sentences
(102) ϕ₁, …, ϕ_n, ¬ψ.
Then in view of Metatheorem 8 we count this diagram as a formal proof of the sequent
(103) ϕ₁, …, ϕ_n ⊨ ψ.
The proof calculus based on Hintikka sets in this way is called truth trees.

The truth tree calculus is sound by Metatheorem 15. In order to establish that the calculus is complete, we must show the following: If it is not possible to construct a closed truth tree in a finite number of steps starting from ϕ₁, …, ϕ_n, ¬ψ, then it is possible to construct a truth tree starting with these sentences, in which one branch forms a Hintikka set (so that it has a model by Metatheorem 16). We have to bear in mind that a branch might go on forever. In fact if L has infinitely many closed terms, then (H3) implies that every Hintikka set in L must be infinite. In this case the conditions (H1)–(H7) impose infinitely many separate requirements on a Hintikka set, and we have to be sure that we construct our branches in such a way that each of these requirements will eventually be faced and met if possible. This can be arranged.

Decidability

When the signature σ is finite, we can assign natural number values to the symbols of L (σ) and thereby express each formula of L (σ) as a finite sequence of natural numbers. (In fact this is possible, though less tidy, when σ is countably infinite.) Hence it makes sense to apply notions of computability theory to L (σ). Thus we say that a set X of finite sequences of numbers is computably enumerable (abbreviated to c.e.) if a computer can be programmed to output all and only the sequences in X. Also we say that X is computable if a computer can be programmed to output Yes if a sequence in X is input, and No if a sequence not in X is input. These notions carry over immediately to theories in L (σ). We say also that a theory T is decidable if the set of its logical consequences is computable. A procedure for computing whether any given sentence is a consequence of T is called a decision procedure for T.

Now suppose we have a proof calculus 𝒞 for L that meets the conditions (a), (b) and (c) of section 10. Then 𝒞 is sound and complete, so that we have the equivalence
(104) ϕ is a consequence of T if and only if:
There is a finite subset U of T and there is P such that
[P is a formal proof in 𝒞 of the sequent U ⊧ ϕ].
By property (c) of 𝒞, the relation in square brackets is computable. It follows that if T is computably enumerable, we can program a computer so that it (i) lists all possible finite subsets U of T, all sentences ϕ of L (σ) and all formal proofs P in 𝒞, (ii) tests, for each U, ϕ and P, whether or not P is a proof of U ⊧ ϕ, and (iii) outputs ϕ whenever the answer to (ii) is Yes. This computer will output all and only the logical consequences of T. We have shown:

metatheorem 17

If T is a c.e. theory in a first-order language with finite signature, then the set of consequences of T is also c.e.

An important corollary is:

metatheorem 18

Suppose T is a complete, consistent, and c.e. theory in a first-order language with finite signature. Then T is decidable.

To compute whether ϕ is a consequence of T, list all the consequences of T ; eventually either ϕ or ¬ϕ will appear in the list. This is not a practical method. But when this abstract argument shows that T is decidable, one can usually find a much better decision procedure.

An important case is to determine whether a given sentence ϕ is a consequence of the empty theory, that is, whether ϕ is a theorem of first-order logic. By a result proved by Alonzo Church in 1936, the set of theorems of first-order logic (say, in a signature with at least one relation symbol of arity at least 2) is not computable. But if we restrict ϕ to come from some appropriate class of sentences, the picture changes. Suppose for example that ϕ is in propositional logic. Then we can construct a truth tree to determine whether ¬ϕ has a model, and after a finite number of steps the truth tree will have ground to a halt in the sense that there are no new sentences that we can add to it. If all its branches are closed, there is no model of ¬ϕ; if at least one branch remains open, it gives us a Hintikka set and hence a model of ¬ϕ. This provides a decision procedure for propositional sentences.

In fact the truth tree procedure allows us to test for theoremhood every first-order sentence of the form Q ₁Q ₂ϕ where Q ₁ is a string of universal quantifiers, Q ₂ is a string of existential quantifiers and ϕ is a quantifier-free formula with no function symbols.

A formula is said to be universal if it is quantifier-free or consists of a string of universal quantifiers followed by a quantifier-free formula. Thoralf Skolem proved that the problem of determining whether a given first-order sentence has a model can always be reduced to the problem of determining whether a certain universal first-order sentence has a model:

metatheorem 19

Let L be a first-order language with finite signature σ. There is a computational procedure which, given any sentence ϕ, will find a universal first-order sentence ϕ ^sk with the following properties:

(a) ϕ ^sk is in a signature got by adding function symbols and constants to σ.
(b) Every model of ϕ ^sk is a model of ϕ.
(c) Every σ-structure that is a model of ϕ can be made into a model of ϕ ^sk by adding suitable interpretations for the new function symbols and constants.

In particular ϕ has a model if and only if ϕ ^sk has a model.

We can illustrate Skolem's idea. Let ϕ be the sentence
(105) ∀x ∃y ∀z ∃w R (x, y, z, w ).
Then for ϕ^sk we can take the sentence
(106) ∀x ∀z R (x, F (x ), z, G (x, z )).
The function symbols F and G, and the functions they represent in a model of ϕ^sk, are called Skolem functions.

Thus if we want to determine whether ϕ is a first-order theorem, one way to proceed is to find (¬ϕ)^sk and determine whether it has a model or not. There is no guarantee that this approach will settle the question; the importance of Skolem's theorem is mainly theoretical.

Bibliography

In the citations below, modern introductory texts not requiring mathematical sophistication are marked with an *.

*Barwise, Jon, and John Etchemendy. Hyperproof. Stanford, CA: CSLI, 1994.

*Bergmann, Merrie, James Moor, and Jack Nelson. The Logic Book. New York: McGraw-Hill, 1997.

Boolos, George. "For Every A There is a B." Linguistic Inquiry 12 (1981): 465–466.

Church, Alonzo. Introduction to Mathematical Logic, Volume 1. Princeton, NJ: Princeton University Press, 1956.

Dalen, Dirk van. Logic and Structure. Berlin: Springer, 1980.

Davidson, Donald. "The Logical Form of Action Sentences." In Essays on Actions and Events, edited by Donald Davidson, 105–148. Oxford: Clarendon Press, 2001.

Edgington, Dorothy. "Do Conditionals Have Truth-Conditions?" Critica xviii, 52 (1986): 3–30. Reprinted in Conditionals, edited by Frank Jackson. Oxford: Oxford University Press, 1991.

*Forbes, Graeme. Modern Logic: A Text in Elementary Symbolic Logic. New York: Oxford University Press, 1994.

*Gabbay, Dov. Elementary Logics: A Procedural Perspective. Hemel Hempstead: Prentice Hall Europe, 1998.

Grice, Paul. Studies in the Way of Words. Cambridge, MA: Harvard University Press, 1989.

Hilbert, David, and Wilhelm Ackermann. Principles of Mathematical Logic. Translated from the second edition of Grundzüge der Theoretischen Logik. New York: Chelsea, 1950.

Hintikka, Jaakko. "Form and Content in Quantification Theory." Acta Philosophica Fennica 8 (1955): 11–55.

Hodges, Wilfrid. "Elementary Predicate Logic." In Handbook of Philosophical Logic. 2nd ed., vol. 1, edited by Dov Gabbay and Franz Guenthner. Dordrecht: Kluwer, 2001.

*Hodges, Wilfrid. Logic. 2nd ed. London: Penguin, 2001.

*Jeffrey, Richard C. Formal Logic: Its Scope and Limits. New York: McGraw-Hill, 1994.

Kalish, Donald, and Richard Montague. Logic, Techniques of Formal Reasoning. New York: Harcourt Brace, 1964.

Kleene, Stephen Cole. Introduction to Metamathematics. Amsterdam, North-Holland, 1964.

Lewis, David. "Scorekeeping in a Language Game." Journal of Philosophical Logic 8 (1979): 339–359.

*Mates, Benson. Elementary Logic. New York: Oxford University Press, 1965.

Mendelson, Elliott. Introduction to Mathematical Logic. 3rd ed. Belmont, CA: Wadsworth and Brooks/Cole, 1987.

Peirce, Charles S. "On the Algebra of Logic: A Contribution to the Philosophy of Notation." American Journal of Mathematics 7 (1885): 180–202.

Peirce, Charles S. "The Reader Is Introduced to Relatives." The Open Court 6 (1892): 3416–3418.

*Quine, Willard Van Orman. Methods of Logic. London: Routledge and Kegan Paul, 1952.

Shapiro, Stewart. "Classical Logic." Stanford Encyclopedia of Philosophy. Available at http://plato.stanford.edu.

Shoenfield, Joseph R. Mathematical Logic. Reading, MA: Addison-Wesley, 1967. Republished Urbana, IL: Association for Symbolic Logic and A. K. Peters, 2001.

Smullyan, Raymond M. First-Order Logic. New York: Dover Publications, 1995.

*Suppes, Patrick. Introduction to Logic. Princeton, NJ: Van Nostrand, 1957.

Velleman, Daniel J. How To Prove It. Cambridge, U.K.: Cambridge University Press, 1994.

Whately, Richard. Elements of Logic. London: Mawman, 1826. Reprinted Bologna: Editrice CLUEB, 1987.

Wilfrid Hodges (2005)

Encyclopedia of Philosophy