Computability Theory

views updated

COMPUTABILITY THEORY

0. The Informal Concept

Computability theory is the area of mathematics dealing with the concept of an effective procedure—a procedure that can be carried out by following specific rules. For example, one might ask whether there is some effective procedure—some algorithm—that, given a sentence about the positive integers, will decide whether that sentence is true or false. In other words, is the set of true sentences about the positive integers decidable? Or for a much simpler example, the set of prime numbers is certainly a decidable set. That is, there are mechanical procedures, that are taught in the schools, for deciding of any given positive integer whether or not it is a prime number.

More generally, consider a set S, which can be either a set of natural numbers (the natural numbers are 0, 1, 2, … ), or a set of strings of letters from a finite alphabet. (These two situations are entirely interchangeable. A set of natural numbers is much like a set of base-10 numerals, which are strings of digits. And in the other direction, a string of letters can be coded by a natural number in a variety of ways. The best way is, where the alphabet has k symbols, to utilize k -adic notation, which is like base-k numerals except that the k digits represent 1, 2, …, k, without a 0 digit.) One can say that S is a decidable set if there exists an effective procedure that, given any natural number (in the first case) or string of letters (in the second case), will eventually end by supplying the answer: "Yes" if the given object is a member of S and "No" if it is not a member of S.

And by an effective procedure here is meant a procedure for which one can give exact instructions—a program—for carrying out the procedure. Following these instructions should not demand brilliant insights on the part of the agent (human or machine) following them. It must be possible, at least in principle, to make the instructions so explicit that they can be executed by a diligent clerk (who is good at following directions but is not too clever) or even a machine (which does not think at all).

Although these instructions must of course be finite in length, no upper bound on their possible length is imposed. It is not ruled out that the instructions might even be absurdly long. Similarly, to obtain the most comprehensive concepts, no bounds are imposed on the time that the procedure might consume before it supplies the answer. Nor is a bound imposed on the amount of storage space (scratch paper) that the procedure might need to use. One merely insists that the procedure give an answer eventually, in some finite length of time.

Later, in section 7, more restrictive concepts will be considered, where the amount of time is limited in some way, so as to exclude the possibility of ridiculously long execution times. Initially, however, one wants to avoid such restrictions, to obtain the limiting case where practical limitations on execution time or memory space are removed.

This description of effective procedures, vague as it is, already shows how limiting the concept of decidability is. It is not hard to see that there are only countably many possible instructions of finite length that one can write out (using a standard keyboard, say). There are, however, uncountably many sets of natural numbers (by Cantor's diagonal argument). It follows that almost all sets, in a sense, are undecidable.

The following section will look at how the foregoing vague description of effective procedures can be made more precise—how it can be made into a mathematical concept. Nonetheless, the informal idea of what can be done by effective procedure, that is, what is calculable, can be useful.

For another example, consider what is required for a string of symbols to constitute an acceptable mathematical proof. Before one accepts a proof and adds the result being proved to the storehouse of mathematical knowledge, one insists that the proof be verifiable. That is, it should be possible for another mathematician, such as the referee of the paper containing the proof, to check, step by step, the correctness of the proof. Eventually, the referee concludes either that the proof is indeed correct or that the proof contains a gap or an error and is not yet acceptable. That is, the set of acceptable mathematical proofs should be decidable. This fact will be seen (in section 4) to have significant consequences for what can and cannot be proved. The conclusion follows that computability theory is relevant to the foundations of mathematics.

Before going on, one should broaden the canvas from considering decidable and undecidable sets to considering the more general situation of partial functions. Let U be either the set ℕ = {0,1,2, … } of natural numbers or the set Σ* of all strings of letters—all words—from a finite alphabet Σ. Then a k -place partial function on U is a function whose domain is included in U^k = U × U × … × U and whose range is included in U. And one can say that such a function is total if its domain is all of U^k.

For a k -place partial function f, one can say that f is an effectively calculable partial function if there exists an effective procedure with the following property:

Given a k -tuple x in the domain of f, the procedure eventually halts and returns the correct value for f (x )
Given a k -tuple x not in the domain of f, the procedure does not halt and return a value

(Strictly speaking, when U is ℕ, the procedure cannot be given numbers, it must be given numerals. Numerals are bits of language, which can be communicated. Numbers are not. Thus, the difference between U = ℕ and U = Σ* is even less than previously indicated.)

For example, the partial function for subtraction

(where ↑ indicates that the function is undefined) is effectively calculable, and procedures for calculating it, using base-10 numerals, are taught in the elementary schools.

The concept of decidability can then be described in terms of functions: For a subset S of U^k, one can say that S is decidable if its characteristic function

(which is always total) is effectively calculable. Here, "Yes" and "No" are some fixed members of U, such as 1 and 0 in the case of ℕ.

Here, if k = 1, then S is a set of numbers or a set of words. If k = 2, then one has the concept of a decidable binary relation on numbers or words, and so forth.

And it is natural to extend this concept to the situation where one has half of decidability: Say that S is semidecidable if its partial characteristic function

is an effectively calculable partial function. Thus, a set S of words—a language—is semidecidable if there is an effective procedure for recognizing members of S. One can think of S as the language that the procedure accepts.

The following is another example of a calculable partial function:
F (n ) = the smallest p > n such that both p and p + 2 are prime
Here, it is to be understood that F (n ) is undefined if there is no number p as described; thus F might not be total. For example, F (9) = 11. It is not known whether or not F is total. Nonetheless, one can be certain that F is effectively calculable. One procedure for calculating F (n ) proceeds as follows. "Given n, first put p = n + 1. Then check whether or not p and p + 2 are both prime. If they are, then stop and give output p. If not, increment p and continue." What if n = 10¹⁰⁰⁰? On the one hand, if there is a larger prime pair, then this procedure will find the first one, and halt with the correct output. On the other hand, if there is no larger prime pair, then the procedure never halts, so it never gives an answer. That is all right, because F (n ) is undefined—the procedure should not give any answer. (Of course, F is total if and only if (iff) the twin prime conjecture is true.)

Now suppose one modifies this example. Consider the total function:

Here, F (n ) ↓ means that F (n ) is defined so that n belongs to the domain of F. Then the function G is also effectively calculable. That is, there exists a program that calculates G correctly. That is not the same as saying that one knows that program. This example indicates the difference between knowing that a certain effective procedure exists and having the effective procedure in one's hands.

One person's program is another person's data. This is the principle behind operating systems (and behind the idea of a stored-program computer). One's favorite program is, to the operating system, another piece of data to be received as input and processed. The operating system is calculating the values of a two-place "universal" function, as in the following example.

Suppose one adopts a fixed method of encoding any set of instructions by a single natural number. (First, one converts the instructions to a string of 0s and 1s—one always does this with computer programs—and then one regards that string as naming a natural number under a suitable base-2 notation.) Then, the universal function
Φ(x, y ) = the result of applying the instructions coded by y to the input x
is an effectively calculable partial function (where it is understood that Φ(x, y ) is undefined whenever applying the instructions coded by y to the input x fails to halt and return an output). Here are the instructions for Φ: "Given x and y, decode y to see what it says to do with x, and then do it." Of course, the function Φ is not total.

Using this universal partial function, one can construct an undecidable binary relation, the halting relation H :

(x,y) ∈H ⇔ Φ (x,y)↓
⇔ applying the instructions coded byy to input x halts

To see that H is undecidable, one can argue as follows. Suppose that, to the contrary, H is decidable. Then the following function would be effectively calculable:

(Notice the use of the classical diagonal construction.) (To compute f (x ), one first would decide if (x, x ) ∈ H. If not, then f (x ) = Yes. If (x, x ) ∈ H, however, then the procedure for finding f (x ) should throw itself into an infinite loop, because f (x ) is undefined.) The function f cannot possibly be effectively calculable, however. Consider any set of instructions that might compute f. Those instructions have some code number k, but f has been constructed in such a way that f (k ) differs from the output from the result of applying instructions coded by k to the input k. (They differ because one is defined and one is not.) So these instructions cannot correctly compute f ; they produce the wrong result at the input k. And so one has a contradiction. That the previous relation H is undecidable is usually expressed by saying that "the halting problem is unsolvable"; that is, one cannot effectively determine, given x and y, whether applying the instructions coded by y to the input x will eventually terminate or will go on forever.

While the concept of effective calculability has been described in somewhat vague terms here, the following section will give a precise (mathematical) concept of a computable partial function. And then it will be argued that the mathematical concept of a computable partial function is the correct formalization of the informal concept of an effectively calculable partial function. This claim is known as Church's thesis or the Church-Turing thesis. Church's thesis, which relates an informal idea to a formal idea, is not itself a mathematical statement, capable of being given a proof, but one can look for evidence for or against Church's thesis; it all turns out to be evidence in favor.

One piece of evidence is the absence of counterexamples. That is, any function examined thus far that mathematicians have felt was effectively calculable, has been found to be computable.

Stronger evidence stems from the various attempts that different people made independently, trying to formalize the idea of effective calculability. Alonzo Church used λ-calculus, Alan M. Turing used an idealized computing agent (later called a Turing machine), and Emil Post developed a similar approach. Remarkably, all these attempts turned out to be equivalent, in that they all defined exactly the same class of functions, namely, the computable partial functions!

The study of effective calculability originated in the 1930s with work in mathematical logic. As noted earlier, the subject is related to the concept on an acceptable proof. Since the development of modern computers the study of effective calculability has formed an essential part of theoretical computer science. A prudent computer scientist would surely want to know that, apart from the difficulties the real world presents, there is a purely theoretical limit to calculability.

1. Formalizations

In the preceding section, the concept of effective calculability was described only informally. Now, these ideas will be made more precise (i.e., will be made part of mathematics). In fact, several approaches to doing this will be described: idealized computing devices, generative def-initions (i.e., the least class containing certain initialfunctions and closed under certain constructions), programming languages, and definability in formal languages. It is a significant fact that these different approaches all yield exactly equivalent concepts.

turing machines

In early 1935 Alan M. Turing was a twenty-two-year-old graduate student at King's College in Cambridge. Under the guidance of Max Newman, he was working on the problem of formalizing the concept of effective calculability. In 1936 he learned of the work of Alonzo Church at Princeton University. Church had also been working on this problem, and in his 1936 paper "An Unsolvable Problem of Elementary Number Theory" he presented a definite conclusion: that the class of effectively calculable functions should be identified with the class of functions definable in the λ-calculus, a formal language for specifying the construction of functions. Moreover, he showed that exactly the same class of functions could be characterized in terms of formal derivability from equations.

Turing then promptly completed writing his paper, in which he presented a different approach to characterizing the effectively calculable functions, but one that—as he proved—yielded once again the same class of functions as Church had proposed. With Newman's encouragement, Turing then went to Princeton for two years, where he wrote a doctoral dissertation under Church.

Turing's paper remains a readable introduction to his ideas. How might a diligent clerk carry out a calculation, following instructions? He might organize his work in a notebook. At any given moment his attention is focused on a particular page. Following his instructions, he might alter that page, and then he might turn to another page. And the notebook is large enough that he never comes to the last page.

The alphabet of symbols available to the clerk must be finite; if there were infinitely many symbols, then there would be two that were arbitrarily similar and so might be confused. One can then without loss of generality regard what can be written on one page of notebook as a single symbol. And one can envision the notebook pages as being placed side by side, forming a paper tape, consisting of squares, each square being either blank or printed with a symbol. At each stage of his work, the clerk—or the mechanical machine—can alter the square under examination, can turn attention to the next square or the previous one, and can look to the instructions to see what part of them to follow next. Turing described the latter part as a "change of state of mind."

Turing wrote, "We may now construct a machine to do the work" (1936–1937, p. 251). Of course, such a machine is now called a Turing machine, a phrase first used by Church in his review of Turing's paper in The Journal of Symbolic Logic. The machine has a potentially infinite tape, marked into squares. Initially, the given input numeral or word is written on the tape, but it is otherwise blank. The machine is capable of being in any one of finitely many states (the phrase "of mind" being inappropriate for a machine). At each step of calculation, depending on its state at the time, the machine can change the symbol in the square under examination at that time, can turn its attention to the square to the left or to the right, and can then change its state to another state.

The program for this Turing machine can be given by a table. Where the possible states of the machine are q ₁, …, q_r, each line of the table is a quintuple 〈q_i, S_j, S_k, D, q_m 〉, which is to be interpreted as directing that whenever the machine is in state q_i and the square under examination contains the symbol S_j, then that symbol should be altered to S_k and the machine should shift its attention to the square on the left (if D = L ) or on the right (if D = R ), and should change its state to q_m. For the program to be unambiguous, it should have no two different quintuples with the same first two components. (By relaxing this requirement regarding absence of ambiguity, one obtains the concept of a nondeterministic Turing machine, which will be useful later, in the discussion of feasible computability.) One of the states, say q ₁, is designated as the initial state—the state in which the machine begins its calculation. If one starts the machine running in this state and examining the first square of its input, it might (or might not), after some number of steps, reach a state and a symbol for which its table lacks a quintuple having that state and symbol for its first two components. At that point the machine halts, and one can look at the tape (starting with the square then under examination) to see what the output numeral or word is.

Now suppose that Σ is a finite alphabet and that f is a k -place partial function on the set Σ* of words. One says that f is Turing computable if there exists a Turing machine M that, when started in its initial state scanning the first symbol of a k -tuple w⃗ of words (written on the tape, with a blank square between words, and with everything to the right of w⃗ blank), behaves as follows:

If f (w⃗ ) ↓ (i.e., if w⃗ ∈ dom f ), then M eventually halts, and at that time it is scanning the leftmost symbol of the word f (w⃗ ) (which is followed by a blank square).
If f (w⃗ ) ↑ (i.e., if w⃗ ∉ dom f ), then M never halts.

This definition can be readily adapted to apply to k -place partial functions on ℕ.

Then Church's thesis, also called—particularly in the context of Turing machines—the Church-Turing thesis, is the claim that this concept of Turing computability is the correct formalization of the informal concept of effective calculability. Certainly, the definition reflects the ideas of following predetermined instructions, without limitation of the amount of time that might be required. (The name Church-Turing thesis obscures the fact that Church and Turing followed different paths in reaching equivalent conclusions.)

As will be explained shortly, Church's thesis has by now achieved universal acceptance. Kurt Gödel, writing in 1964 about the concept of a formal system in logic, involving the idea that the set of correct deductions must be a decidable set, said that "due to A. M. Turing's work, a precise and unquestionably adequate definition of the general concept of formal system can now be given" (Davis 1965, p. 71).

The robustness of the concept of Turing computability is evidenced by the fact that it is insensitive to certain modifications to the definition of a Turing machine. For example, one can impose limitations on the size of the alphabet, or one can insist that the machine never move to the left of its initial starting point. None of this will affect that class of Turing computable partial functions.

Turing developed these ideas before the introduction of modern digital computers. After World War II he played an active role in the development of early computers and in the emerging field of artificial intelligence. (During the war, he worked on deciphering the German battlefield code Enigma, work that remained classified until after Turing's death.) One can speculate whether Turing might have formulated his ideas somewhat differently, if his work had come after the introduction of digital computers.

primitive recursiveness and minimalization

For a second formalization of the calculability concept, a certain class of partial functions on ℕ will now be defined as the smallest class that contains certain initial function and is closed under certain constructions.

For the initial functions, one can take the following simple total functions:

The zero functions, that is, the constant functions f defined by the equation:
f (x ₁, …, x_k ) = 0

The successor function S, defined by the equation:
S (x ) = x + 1

The projection functions I^k_n from k -dimensions onto the n th coordinate,

where 1 ≤ n ≤ k.

One can form the closure of the class of initial functions under three constructions: composition, primitive recursion, and minimalization.

A k -place function h is said to be obtained by composition from the n -place function f and the k -place functions g ₁, …, g_n if the equation
h (x⃗ )=f (g ₁)(x⃗ ),…,g _n(x⃗ ))
holds for all x⃗. In the case of partial functions, it is to be understood here that h (x⃗ ) is undefined unless g ₁(x⃗ ), …, g_n (x⃗ ) are all defined and 〈g ₁(x⃗ ), …, g_n (x⃗ )〉 belongs to the domain of f.

A (k + 1)-place function h is said to be obtained by primitive recursion from the k -place function f and the (k + 2)-place function g (where k > 0) if the pair of equations
h (x⃗, 0) = f (x⃗ )
h (x⃗t +1) = g(t, h (x⃗, t ), x⃗ )
holds for all x⃗ and t.

Again, in the case of partial functions, it is to be understood that h (x⃗, t + 1) is undefined unless h (x⃗, t ) is defined and 〈t, h (x⃗, t ), x⃗ 〉 is in the domain of g.

For the k = 0 case, the one-place function h is obtained by primitive recursion from the two-place function g with the number m if the pair of equations
h (0) = m
h (t + 1) = g (t, h (t ))
holds for all t.

Postponing the matter of minimalization, one can define a function to be primitive recursive if it can be built up from zero, successor, and projection functions by use of composition and primitive recursion. In other words, the class of primitive recursive functions is the smallest class that includes the initial functions and is closed under composition and primitive recursion.

Clearly, all the primitive recursive functions are total. One can say that a k -ary relation R on ℕ is primitive recursive if its characteristic function is primitive recursive.

One can then show that a great many of the common functions on ℕ are primitive recursive: addition, multiplication, …, the function whose value at m is the (m + 1)st prime, …

On the one hand, it is clear that every primitive recursive function should be regarded as being effectively calculable. On the other hand, the class of primitive recursive functions cannot possibly comprehend all total calculable functions, because one can easily "diagonalize out" of the class. That is, by suitably indexing the "family tree" of the primitive recursive functions, one can make a list f ₀, f ₁, f ₂, … of all the one-place primitive recursive functions. One can then consider the diagonal function d (x ) = f_x (x ) + 1. Then d cannot be primitive recursive; it differs from each f_x at x. Nonetheless, if one makes the list tidely, the function d is effectively calculable. The conclusion is the class of primitive recursive functions is an extensive but proper subset of the total calculable functions.

Next, one can say that a k -place function h is obtained from the k + 1-place function g by minimalization and one writes
h (x⃗ )=μy [g (x⃗,y )=0]
if for each x⃗, the value h (x⃗ ) either is the number y such that g (x⃗, y ) = 0 and g (x⃗, s ) is defined and is nonzero for every s < y, if such a number y exists, or else is undefined, if no such number y exists. The idea behind this μ-operator is the idea of searching for the least number y that is the solution to an equation, by testing successively y = 0, 1, …

One can obtain the general recursive functions by adding minimalization to the closure methods. That is, a partial function is general recursive if it can be built up from the initial zero, successor, and projection functions by use of composition, primitive recursion, and minimalization.

The class of general recursive functions is (as Turing proved) exactly the same as the class of Turing computable functions. And Church's thesis therefore has the equivalent formulation that the concept of a general recursive function is the correct formalization of the informal concept of effective calculability.

What if one tries to diagonalize out of the class of general recursive functions, as one did for the primitive recursive functions? As will be argued later, one can again make a tidy list φ₀, φ₁, φ₂, … of all the one-place general recursive partial functions. And one can define the diagonal function d (x ) = φ_x (x ) + 1. In this equation, d (x ) is undefined unless φ_x (x ) is defined. The diagonal function d is indeed among the general recursive partial functions, and hence is φ_k for some k, but d (k ) must be undefined. No contradiction results.

The class of primitive recursive functions was defined by Gödel, in his 1931 paper on the incompleteness theorems. Of course, the idea of defining functions on ℕ by recursion is much older and reflects the idea that the natural numbers are built up from the number 0 by repeated application of the successor function. The theory of the general recursive functions was worked out primarily by Stephen Cole Kleene, a student of Church.

The use of the word recursive in the context of the primitive recursive functions is entirely reasonable. Gödel, writing in German, had used simply rekursiv for the primitive recursive functions. Retaining the word recursive for the general recursive functions was a, however, historical accident. The class of general recursive functions—as this section shows—has several characterizations in which recursion (i.e., defining a function in terms of its other values, or using routines that call themselves) plays no role at all.

Nonetheless, the terminology became standard. What are here called the computable partial functions were until the late 1990s standardly called the partial recursive functions. And for that matter, computability theory was called recursive function theory for many years, and then recursion theory. And relations on ℕ were said to be recursive if their characteristic functions were general recursive functions.

An effort is now being made, however, to change what had been the standard terminology. Accordingly, this entry speaks of computable partial functions. And it will call a relation computable if its characteristic function is a computable function. Thus, the concept of a computable relation corresponds to the informal notion of a decidable relation. In any case, there is definitely a need to have separate adjectives for the informal concept (here, calculable is used for functions, and decidable for relations) and the formally defined concept (here, computable ).

loop and while programs

The idea behind the concept of effective calculable functions is that one should be able to give explicit instructions—a program—for calculating such a function. What programming language would be adequate here? Actually, any of the commonly used programming languages would suffice, if freed from certain practical limitations, such as the size of the number denoted by a variable. One can give here a simple programming language with the property that the programmable functions are exactly the computable partial functions on ℕ.

The variables of the language are X ₀, X ₁, X ₂, … Although there are infinitely many variables in the language, any one program, being a finite string of commands, can have only finitely many of these variables. If one wants the language to consist of words over a finite alphabet, one can replace X ₃, say, by X ‴.

In running a program, each variable in the program gets assigned a natural number. There is no limit on how large this number can be. Initially, some of the variables will contain the input to the function; the language has no input commands. Similarly, the language has no output commands; when (and if) the program halts, the value of X ₀ is to be the function value.

The commands of the language come in five kinds:

(1) X_n ← 0. This is the clear command; its effect is to assign the value 0 to X_n.
(2) X_n ← X_n + 1. This is the increment command; its effect is to increase the value assigned to X_n by one.
(3) X_n ← X_m. This is the copy command; its effect is just what the name suggests; in particular it leaves the value of X_m unchanged.
(4) Loop X_n and endloop X_n. These are the loop commands, and they must be used in pairs. That is, if 𝒫 is a program—a syntactically correct string of commands—then so is the string:
loop X_n
𝒫
endloop X_n
What this program means is that 𝒫 is to be executed a certain number k of times. And that number k is the initial value of X_n, the value assigned to X_n before one starts executing 𝒫. Possibly, 𝒫 will change the value of X_n ; this has no effect at all on k.
(5) While X_n ≠ 0 and endwhile X_n ≠ 0. These are the while commands; again, they must be used in pairs, like the loop commands, but there is a difference. The program
while X_n ≠ 0
𝒫
endwhile X_n ≠ 0
also executes the program 𝒫 some number k of times. Now, however, k is not determined in advance; it matters very much how 𝒫 changes the value of X_n. The number k is the least number (if any) such that executing 𝒫 that many times causes X_n to be assigned the value 0. The program will run forever if there is no such k.

And those are the only commands. A while program is a sequence of commands, subject only to the requirement that the loop and while commands are used in pairs, as illustrated. Clearly, this programming language is simple enough to be simulated by any of the common programming language, if one ignores overflow problems.

A loop program is a while program with no while commands; that is, it has only clear, increment, copy, and loop commands. Note the important property: A loop program always halts, no matter what. It is easy, however, to make a while program that never halts.

One can say that a k -place partial function f on ℕ is while-computable if there exists a while program 𝒫 that, whenever started with a k -tuple x⃗ assigned to the variables X ₁, …, X_k and 0 assigned to the other variables, behaves as follows:

If f (x⃗ ) is defined, then the program eventually halts, with X ₀ assigned the value f (x⃗ ).
If f (x⃗ ) is undefined, then the program never halts.

The loop-computable functions are defined in the analogous way. There is the difference, however, that any loop-computable function is total.

Theorem

(a) A function on ℕ is loop-computable iff it is primitive recursive.

(b) A partial function on ℕ is while-computable iff it is general recursive.

The proof in one direction, to show that every primitive recursive functions is loop-computable, involves a series of programming exercises. The proof in the other direction involves coding the status of a program 𝒫 on input x⃗ after t steps, and showing that there are primitive recursive functions enabling one to determine the status after t + 1 steps, and the terminal status. Because the class of general recursive partial functions coincides with the class of Turing computable partial functions, one can conclude from the previous theorem that while-computability coincides with Turing computability.

definability in formal languages

In his 1936 paper in which he presented what is now known as Church's thesis, Church utilized a formal system, the λ-calculus. Church had developed this system as part of his study of the foundations of logic. In particular, for each natural number n there is a formula n of the system denoting n, that is, a numeral for n. More important, formulas could be used to represent the construction of functions. He defined a two-place function F to be λ-definable if there existed a formula F of the λ-calculus such wherever F (m, n ) = r then the formula {F }(m, n ) was convertible, following the rules of the system, to the formula r , and only then. An analogous definition applied to k -place functions.

Church's student, Stephen Cole Kleene, showed that a function was λ-definable iff it was general recursive. (Church and his student, J. B. Rosser, were also involved in the development of this result.) Church wrote in his paper, "The fact … that two such widely different and (in the opinion of the author) equally natural definitions of effective calculability turn out to be equivalent adds to the strength of reasons … for believing that they constitute as general a characterization of this notion as is consistent with the usual intuitive understanding of it" (Alonzo 1936, p. 346).

Earlier, in 1934, Gödel, in his lectures at Princeton, formulated a concept now referred to as Gödel-Herbrand computability. He did not, however, at the time propose the concept as a formalization of the concept of effective calculability. The concept involved a formal calculus of equations between terms built up from variables and function symbols. The calculus permitted the passage from an equation A = B to another equation obtained by substituting for a part C of A or B another term D where the equation C = D had been derived. If a set ℰ of equations allowed the derivation, in a suitable sense, of exactly the right values for a function f on ℕ, then ℰ was said to be a set of recursion equations for f. Once again, it turned out that a set of recursion equations existed for f iff f was a general recursive function.

A rather different approach to characterizing the effectively calculable functions involved definability in first-order logic over the structure of the natural numbers with addition and multiplication. Say that a k -place partial function f on ℕ is a Σ₁-function if the graph of f (i.e., the (k + 1)-ary relation {〈x ₁, …, x_k, y 〉 │ f (x ₁, …, x_k ) = y }) is definable in the structure with universe ℕ and with the operations of addition, multiplication, and exponentiation, by an existential formula (i.e., a formula consisting of a string of existential quantifiers, followed by a quantifier-free part). Then the class of partial Σ₁-functions coincides exactly with the class of partial functions given by the other formalizations of calculability described here. Moreover, Yuri Matijasevič showed in 1970 that the operation of exponentiation was not needed here.

Finally, say that a k -place partial function f on ℕ is representable if there exists some finitely axiomatizable theory T in a language having a suitable numerals n for each natural number n, and there exists a formula φ of that language such that (for any natural numbers) f (x ₁, …, x_k ) = y iff φ(x ₁, …, x _k, y ) is a sentence deducible in the theory T. Then once again the class of representable partial functions coincides exactly with the class of partial functions given by the other formalizations of calculability described here.

2. Basic Results

First, one has the remarkable fact that all the formalizations of the preceding section yield exactly the same class of partial functions on ℕ. And this fact is not only remarkable, it is also reassuring, indicating that the concept captured by the formalizations—the concept of a computable partial function—is natural and significant. Moreover, it gives evidence that the concept captured by the formalizations is actually the correct formalization of the informal concept of effective calculability. That is, it gives evidence for Church's thesis (or the Church-Turing thesis). This thesis was first set forth by Church in a 1935 abstract, and then published in full in his 1936 paper. (At the time, Church was unaware of Turing's approach, but he knew of several of the other formalizations described in the preceding section.) This assertion, that computability is the precise counterpart to effective calculability, is not really a mathematical statement susceptible of proof or disproof; rather, it is a judgment that one has found the correct formalization of the one's informal concept.

The situation can be compared to one encountered in calculus. An intuitively continuous function (defined on an interval) is one whose graph one can draw without lifting the pencil off the paper. To prove theorems, however, some formal counterpart of this notion is needed. And so one gives the usual definition of ℇ-δ-continuity. One should ask if the precise notion of ℇ-δ-continuity is an accurate formalization of intuitive continuity. If anything, the class of ℇ-δ-continuous functions is too broad. It includes nowhere differentiable functions, whose graphs cannot be drawn without lifting the pencil—there is no way to impart a velocity vector to the pencil. Nonetheless, the class of ℇ-δ-continuous functions has been found to be a natural and important class in mathematical analysis.

In a similar spirit, one can ask how accurately the formal concept of computability captures the informal concept of effective calculability. As with continuous functions, the precisely defined class (of computable functions) appears to be, if anything, too broad. It includes functions for which any procedure will, for large inputs, require so much computing time and memory (scratch paper) space as to make implementation absurd. Computability corresponds to calculability in an idealized world, where length of computation and amount of memory are disregarded. (This will be discussed further in section 7.) In any case, however, the class of computable partial functions has been found to be a natural and important class in mathematical logic.

Empirical evidence that the class of computable functions is not too narrow is provided both by the fact that the attempted formalizations (as described in section 1) have all yielded the equivalent concepts, and by the fact that no counterexample have arisen—the functions considered thus far that mathematicians have felt were effectively calculable have turned out to be computable. In the decades since 1936, Church's thesis has gained universal acceptance.

normal form

In each of the formalizations described in the preceding section, one can in a straightforward way code the instructions for any computable partial function by a natural number e. In the case of Turing machines,e encodes the set of quintuples that determine the machine's operation. In the case of a function built up from the zero, successor, and projection functions by primitive recursion and minimalization, e encodes the ancestral tree describing exactly how the function is built up. In the case of while programs, e encodes the program.

Normal form theorem

There is a ternary computable relation T and a total computable function U with the following property: For each 1-place computable partial function f on ℕ, there is a natural number e such that
f (x ) = U (μyT (e, x, y ))
for every number x.

Here (as elsewhere), equality has the natural meaning: Either both sides of the equation are defined and are the same, or else both sides are undefined.

One can construct the relation T (called the Kleene T -predicate) so that T (e, x, y ) expresses the idea that e encodes the instructions for f, and y encodes the entire history of the step-by-step computation of f with input x, from the beginning through the final step at which the computational procedure comes to a halt. Then the function U (the upshot function) extracts from y what the answer or output is.

The normal form theorem can be extended to k -place functions. One can make a (k + 2)-ary computable relation T_k such that for each k -place computable partial function f, there is a number e such that
f (x ₁, …, x_k ) = U (μyT_k (e, x ₁, …, x_k, y ))
for every x ₁, …, x_k. Moreover, one can construct T_k and U so that they are even primitive recursive.

The significance of the normal form theorem is that it allows one to form a universal partial computable function. One can define
φ_e (x ) = U (μyT (e, x, y ))
(where, of course, φ_e (x ) ↑ if the right side of the equation is undefined, which happens if there does not exist a y such that T (e, x, y )). Then on the one hand, φ_e (x ) is a computable partial 2-place function of x and e. And on the other hand, each 1-place computable partial function equals φ_e for some e. That is,
φ₀, φ₁, φ₂, …
is a complete list of all the computable partial 1-place functions.

Similarly, one can extend these ideas to k -place partial functions:

Then

is a complete list of all the computable partial k -place functions.

Whenever one has such a list, one can diagonalize out of it. One can define the set K by the condition
x ∈ K ⇔ φ_x (x ) ↓
so that a number x (thought of as encoding a program for computing a partial function) belongs to K iff that program, given x itself as input, halts and returns a value.

Then the diagonal function

is a total function, but it cannot equal φ_e for any e, because it differs from φ_e at e. So d is not a computable function. If K were computable, however, then d would be computable, because the partial function φ_x (x ) + 1 is computable.

One can conclude that K is not a computable set; its characteristic function C_K is not a computable function. But the partial characteristic function

is a computable partial function; c_k (x ) = 1 + 0·φ_x (x ).

Theorem

For a set A of numbers, the following are equivalent:

(1) The partial characteristic function of A is a computable partial function
(2) A is the domain of some computable partial function
(3) For some computable binary relation R,
x ∈ A ⇔ R (x, y ) for some y

(Here (2) ⇒ (3) because x ∈ dom φ_e ⇔ T (e, x, y ) for some y. And (3) ⇒ (1) because one can use the function 1 + 0·μyR (x, y ).)

A set A with the properties of this theorem is said to be computably enumerable (c.e.). The concept of a c.e. set is the formalization of the informal concept of a semidecidable set, discussed in section 0. And Church's thesis assures one that it is the correct formalization.

In the previously standard terminology mentioned earlier, a set A with the properties of the theorem was said to be recursively enumerable (r.e.). In fact, this terminology—especially the abbreviation—has become so well established that the prospects for reform are uncertain.

The theorem extends to the case where A is a k -ary relation on ℕ; now in part (3) the relation R is k + 1-ary. Thus, one may speak of c.e. (or r.e.) relations on ℕ.

Unsolvability of the halting problem

The binary relation {〈x, y 〉 │ φ_y (x ) ↓} is c.e. but not computable.

This relation—the halting relation—cannot be computable lest the previous diagonal function d be computable. It is c.e., because φ_y (x ) ↓ ⇔ ∃zT (y, x, z ).

If one defines W_e = domφ_e, then as a consequence of the normal form theorem, one has a complete list
W ₀, W ₁, W ₂, …
of all the c.e. sets. The set K can be described simply as {x │ x ∈ W_x }.

The following is not hard to see:

Kleene's theorem

A set is computable iff both it and its complement are c.e.

For example, the complement K̄ is not only noncomputable, it is not even c.e.

For another example of an undecidable set, take the set of programs that compute total functions:
Tot = {e │ φ_e is total}
The same argument used for K shows that Tot is not computable. Moreover, Tot is not c.e. In fact, a slightly stronger statement holds: There is no c.e. set P such that {φ_e │ e ∈ P } coincides with the class of total computable functions. Thus, if P is a c.e. set of programs that compute only total functions, then there must be some total computable function with no program in P.

Rice's theorem

Suppose that C is a collection of computable partial 1-place functions, and let I be {e │ φ_e ∈ C }. Then I is computable only in two trivial cases: when C is empty and when C is the collection of all computable partial functions.

For example, suppose one focuses attention on a particular computable partial function f. Rice's theorem asserts that one cannot always decide of a given program whether or not that program correctly computes f.

The name computably enumerable corresponds to yet another characterization: A set is c.e. iff there is a Turing machine (augmented with a suitable output tape) that can generate, in some order, the members of that set, one after another. More formally, a set S of natural numbers is c.e. iff it is either empty or is the range of some total computable function f, that is, S = {f (0), f (1), … }. In fact, one can even insist that f be primitive recursive. In general the function f will not enumerate the members of S in numerical order, however (i.e., f will not in general be an increasing function). The range of an increasing function (or even of a nondecreasing function) will always be a computable set.

It is easy to see that if f is a two-place computable partial function, then the result of holding one variable fixed (as a parameter)
g (x ) = f (36, x )
is a one-place computable partial function g. Often, one needs the more subtle fact: A program for g can be effectively found from the program for f and the value of the parameter.

Parameter theorem

There is a total computable function ρ such that
φ_e (t, x ) = φ_{ρ(e, t)} (x )
for all e, t, and x.

The analogous statement holds for more variables, that is, for an m -tuple t and an n -tuple x in place of t and x. The parameter theorem commonly goes by the cryptic name of the S-m-n theorem.

A deeper result is the following theorem, which is due to Kleene.

Recursion theorem

For any computable partial function g, one can find an e such that
φ_e (x ) = g (e, x )
for all x.

Again, x can be replaced by an n -tuple x. The proof of the recursion theorem is similar to the argument used to produce self-referential sentences in number theory, such as those used in proving Gödel's incompleteness theorem.

3. Axiomatizable Theories

The connection between computability theory and logic hinges on the fact that proofs must be effectively recognizable.

The concept of a proof is basic to logic. What exactly is a proof? As indicated in section 0, for a proof to be acceptable, it must be possible—in principle—to fill in enough steps that a hard-working graduate student (or a referee) can verify its correctness. One cannot demand that this student have the same brilliant insight that the proof's discoverer had. Nor can one demand that the student spend an infinite amount of time checking an infinite number of cases. What one can insist is that, given a correct proof (with all the steps filled in), the student will eventually complete the verification and stand up and say, "Yes, this proof is correct."

This is just to say, however, that the set of correct proofs must be at least semidecidable. And typically one expects that the set will even be decidable, lest the student work forever attempting to verify an incorrect proof. In an axiomatic theory, one expects to be able to tell (effectively) an axiom from a nonaxiom, and one expects to be able to determine (effectively) whether or not a rule of inference is being correctly applied.

Even with the weaker property of semidecidability, it follows that the set of theorems—the set of sentences that have proofs—is semidecidable. (Given a sentence, one could employ the brute-force procedure of going through all strings of symbols in a systematic way, spending more and more time on each, attempting to verify that it is a proof of that sentence.) That is, the set of theorems must be c.e.

More formally, assume one has a first-order language, such as the language for set theory. Formulas are (or can be made to be) strings over a finite alphabet, so the concepts of computability theory are applicable. (It is being assumed here that the language has a reasonably simple array of nonlogical symbols.) One can define a theory to be a set of sentences closed under logical consequence. In particular, for a set A of formulas adopted as axioms, one can obtain the theory T_A consisting of all sentences that are logical consequences of A.

Theorem

(a) If A is a computable set or a c.e. set of axioms, then the set T_A of logical consequences of A is c.e.

(b) (Craig's theorem.) Conversely, if a theory T is c.e., then there is a computable set A of axioms such that T is the set of logical consequences of A.

Part (a) follows from the Gödel completeness theorem for first-order logic. The set of logical consequences of A is the same as the set of sentences derivable from A in the predicate calculus. If one has a machine that can generate the axioms, then one can organize a machine to generate the theorems.

Part (b) utilizes the simple fact that if one can generate the members of T in some order,
T = {τ₀, τ₁, τ₂, … }
then one can generate a suitable set of axioms in increasing order:
A = {τ₀, τ₀ ∧ τ₁, τ₀ ∧ τ₁ ∧ τ₂, … }
So A is computable.

If one defines a theory T to be axiomatizable if there exists a computable set of axioms for it (or equivalently, if there exists a c.e. set of axioms for it), then there is the conclusion: A theory is axiomatizable iff it is c.e.

For example, the usual ZFC axioms for set theory form a computable set of axioms, so the set of theorems of ZFC is a c.e. set. At the other extreme, taking the set of axioms to be empty, one can conclude that the set of valid sentences is c.e. The set of valid sentences is, however, undecidable:

Church's theorem

Assume the language has at least one two-place predicate symbol. Then the set of valid sentences is not computable.

4. GÖdel Incompleteness Theorem

This section examines Gödel's first incompleteness theorem, from the point of view of computability theory. As the context, first-order theories of arithmetic, that is, theories dealing with the natural numbers with the operations of addition and multiplication, will be considered. Certainly, the study of the natural numbers with addition and multiplication is a basic part of mathematics, in the real sense that it is the topic in mathematics that school children study first.

The structure that is focused on here
𝔑 = (ℕ; 0, S, +, ×)
consists of the set ℕ of natural numbers with the distinguished element 0 and the operations of successor (S ), addition (+), and multiplication (×). The first-order language corresponding to this structure has quantifiers ∀ and ∃ ranging over ℕ, a constant symbol 0 for the number 0, and function symbols S , +, and × for successor, addition, and multiplication.

The set of all sentences of this language that are true in standard structure 𝔑 will be called the theory of true arithmetic. Although this theory deals with basic topics, it is by no means trivial. For example, it is not hard to see that the set of prime numbers is definable in 𝔑, that is, one can write down a formula π(x ) of the language that is satisfied in 𝔑 when the number n is assigned to x iff n is prime:

where one substitutes for x the numeral n for n, that is the numeral SS ···S0 . Using this formula π one can then write down a sentence in the language that expresses the twin prime conjecture, or a sentence that expresses Goldbach's conjecture. But the truth or falsity of these conjectures remains unknown.

What can one say quantitatively about the complexity of the theory of true arithmetic? It will be seen in this section that the theory is not c.e. and hence is not an axiomatizable theory. One connection between 𝔑 and computability is expressed by the result:

Theorem

Every computable relation over ℕ is definable in the structure 𝔑. That is, for each computable k -ary relation R ⊆ ℕ^k there is a formula _ρ defining R in 𝔑:
〈n ₁, …, n_k 〉 ∈ R ⇔ ⊧_𝔑 ρ [n ₁, …, n_k ]
As an immediate consequence of this theorem, one can conclude that c.e. relations are also definable in 𝔑. This is because any c.e. relation Q is the domain of some computable relation R :
m ∈ Q ⇔ 〈m, n 〉 ∈ R, for some n
Thus, if ρ (x, y ) defines R in 𝔑, then ∃yρ (x, y ) defines Q. Moreover, ¬∃yρ (x, y ) defines the complement Q̄ of Q. And by repeating the previous argument, the domain of Q̄ is definable.

The conclusion is that any relation over ℕ that is obtainable from the computable relations by the operations of forming the domain (i.e., projection) and forming the complement, iterated any number of times, will be definable in the structure 𝔑. (The converse is also true; these are exactly the definable relations; see section 6.)

In particular, the set K̄ is definable in 𝔑, where K is the c.e. but noncomputable set constructed earlier. That is, there is some formula κ(x ) that defines K, so that ¬κ(x ) defines K̄ and
n ∈ K̄ ⇔ ¬κ(n ) is true in 𝔑.
It follows from this, however, that the set of sentences (of the language) true in 𝔑 cannot be semidecidable, lest equivalence yield an effective procedure for recognizing membership in this. Thus, one comes to the conclusion that truth in arithmetic is not a c.e. concept:

Theorem

The set of sentences true in 𝔑 is not c.e.

(An elaboration of this argument would give Tarski's theorem: The set of sentences true in 𝔑, when converted to a set of natural numbers, is not definable in 𝔑.)

This theorem, with the previous section, asserts that the theory of true arithmetic is not axiomatizable. So any axiomatizable subtheory fails to give all of true arithmetic.

Gödel's incompleteness theorem

For any axiomatizable subtheory T of true arithmetic, one can find a true sentence that is not derivable in T.

In fact, here is how one can find that true, underivable sentence. Let
J = {n │ T ├ ¬κ(n )},
the set of numbers that T "knows" are in K̄. Because T is axiomatizable (and hence c.e.), the set J is c.e., and so J = W_j for some number j. Moreover, J is a subset of K̄ so it cannot be all of that set; there is a number in K̄ that is not in J. In fact, j is such a number.

That is, the sentence ¬κ(j ) is a true sentence (j is really in K̄ ) that T does not prove (T does not know that j ∈ K̄ ). Thus, the sentence ¬κ(j ) is a specific witness to T 's incompleteness.

And what might this sentence ¬κ(j ) say? Interpreted in 𝔑 it speaks of numbers and their sums and products. One can give it a more interesting translation, however:

That is, the witness (the true unprovable sentence) asserts, in a sense, its own unprovability!

The conclusion is that the computability theory approach to Gödel's incompleteness theorem, based on c.e. sets, is not so different from the more traditional approach, which uses a diagonal construction to produce a sentence asserting, in a sense, its own unprovability.

5. Degrees of Unsolvability

Some unsolvable problems are more unsolvable than others. To make sense of this idea, one can employ the concept of relative computability.

Consider a fixed set B of natural numbers. Then a partial function f should be considered effectively calculable relative to B if there is a procedure that computes f and is effective except that it is allowed to appeal to an "oracle" for B. An oracle for B can be thought of as a device that, given a number x, responds by saying whether or not x is in B.

Any of the formalizations of calculability given in section 1 can be augmented to incorporate such an oracle. For example, in the case of primitive recursive functions, one can simply add the characteristic function of B as a new initial function. As before, the various formalizations give exactly the same class of partial functions. Thus, one may speak unambiguously of computability relative to a set B.

Of course, if B is a computable set, however, the computability relative to B is simply equivalent to computability. For a noncomputable set, however, some noncomputable functions will become computable relative to B (the characteristic function of B, for one).

The concept of relative computability was introduced by Turing in a 1939 paper. At first glance, it seems an odd concept, combining as it does the most constructive approach to functions (that of computability) with the least constructive approach (that of a magical oracle). It is to Turing's credit that he perceived the value of the concept.

For sets A and B of natural numbers, one can say that A is computable in B, or that A is Turing reducible to B (written A ≤_T B ) if the characteristic function of A is computable relative to B. That is, saying that A ≤_T B implies that membership in A is no harder to decide than is membership in B. The ≤_T relation is transitive and is reflexive on 𝒫ℕ (i.e., it is a preordering). Informally, transitivity of ≤_T corresponds to connecting machines in series. Consequently, the symmetric version
A ≡_T B ⇔ A ≤_T B and B ≤_T A
is an equivalence relation on 𝒫ℕ, and ≤_T gives a partial ordering of the equivalence classes. These equivalence classes are called degrees of unsolvability, or simply degrees.

There is a least degree 0 , the class of the computable sets. Each degree must be a countable collection of sets (because there are only countably many programs), and so there are 2^ℵ₀ equivalence classes altogether. Any two degrees have a least upper bound. The earlier construction of a noncomputable set K can be relativized:

(where φ^B_x is the partial function computed, relative to B, by the program e ). Then the degree of B ′ is strictly larger (under ≤_T) than the degree of B ; thus, there is no largest degree.

The set B ′ is called the jump of B. Thus, the jump operation can be applied to a set to obtain a set of higher degree, and this operation can be iterated:
B <_T B ′ <_T B ″ <_T B ‴ <_T …

The degrees are not linearly ordered. It is possible to construct simultaneously sets A and B in such a way as to sabotage each machine that might reduce one set to the other. In fact, much more is true; one can construct 2^ℵ₀ degrees that are all incomparable to each other under the ordering.

One can define a degree to be c.e. if it contains a c.e. set. These degrees are of particular interest because they are the degrees of axiomatizable theories. The least degree 0 is the degree of the decidable theories. Earlier, a noncomputable c.e. set K = {x │ φ_x (x ) ↓} was constructed. So the degree of K, denoted 0 ′, is a c.e. degree greater than 0 . The halting problem for Turing machines (regarded as a set of integers) also has degree 0 ′. It is not hard to show that 0 ′ is the largest c.e. degree: for every c.e. degree a one has a ≤_T 0 ′. (Thus, any c.e. set of degree 0 ′ is ≤_T-complete for c.e. sets, in the sense that every other c.e. set is computable in it.)

A number of undecidable axiomatizable theories turn out to have degree 0 ′: the validities of predicate calculus (with at least a binary predicate symbol), first-order Peano arithmetic, ZF set theory (if consistent), and others.

In 1944 Emil Post raised the question whether there were any c.e. degrees other than 0 and 0 ′. This question, which became known as Post's problem, was finally answered in 1956 (two years after Post's death), independently by Richard Friedberg (in his Harvard senior thesis) and by A. A. Mučnik (in the Soviet Union). They showed that intermediate c.e. degrees do indeed exist, and in great profusion. Gerald Sacks later showed that any countable partial ordering can be embedded—as a partial ordering—in the partial ordering of c.e. degrees.

Although the natural axiomatizable theories have turned out to have either degree 0 or degree 0 ′, Solomon Feferman showed that every c.e. degree contains some axiomatizable theory.

There is a simpler way in which questions about membership in one set might be effectively reducible to questions about another set. One can define A to be many-one reducible to B (written A ≤_m B ) if there is a total computable function f such that
x ∈ A ⇔ f (x ) ∈ B
for all natural numbers x. The idea is that each question "x ∈ A ?" about A is reduced by f to one question about B. Moreover, if there is such a reduction function f that is one to one, then one can say that A is one-one reducible to B (written A ≤₁ B ). Clearly,
A ≤₁ B ⇒ A ≤_m B ⇒ A ≤_T B
and in general neither arrow can be reversed. Again, both ≤₁ and ≤_m are preorders, so the corresponding symmetric relations
A ≡₁ B ⇔ A ≤₁ B & B ≤₁ A and A ≡_m B ⇔ A ≤_m B & B ≤_m A
are equivalence relations on 𝒫ℕ, and the equivalence classes (the one-one degrees and the many-one degrees) are partially ordered. John Myhill showed that if A ≡₁ B, then there is a total computable permutation of ℕ taking A onto B.

It is not hard to make a c.e. set that is ≤₁-complete for c.e. sets, that is, every c.e. set is one-one reducible to it. In fact, K is such a set.

6. Definability in Arithmetic

As in section 4, let
𝔑 = (ℕ; 0, S, +, ×)
be the standard structure for arithmetic, consisting of the set ℕ of natural numbers with the distinguished element 0 and the operations of successor, addition, and multiplication. In this structure, what sets (or relations or functions) are definable by first-order formulas? In section 4 it was noted that every computable relation is definable in arithmetic, and section 5 used the fact that some noncomputable sets (such as K̄ ) are also definable. Now, one can approach the matter more systematically.

Say that a relation (on ℕ) is arithmetical if it is definable in 𝔑. Of course, only countably many relations can be arithmetical, because there are only countably many formulas. One wants to classify these relations, according to the quantifier depth of the defining formulas.

From section 2 it is known that a relation A is c.e. iff it is the domain of some computable relation R :
m⃗ ∈ A ⇔ R(m⃗,n) for some n
If ρ (x ₁, …, x_k, y ) defines R, then the formula ∃yρ (x ₁, …, x_k, y ) defines A, so that A is "one quantifier away" from being computable. One can say that such relations A are Σ₁. (Yuri Matijacevič showed in 1970 that in fact every c.e. relation is definable by an existential formula, that is, one of the form
∃y ₁···∃y ₁ ρ (x ₁, …, x_k, y ₁, …, y₁ )
where ρ is quantifier-free, but that fact is not needed here.)

Next, call a relation Σ₂ if it is definable by a formula
∃y ₁∀y ₂ ρ (x ₁, …, x_k, y ₁, y ₂)
where ρ defines a computable relation. Call a reaction Σ₃ if it is definable by a formula
∃y ₁∀y ₂∃y ₃ ρ (x ₁, …, x_k, y ₁, y ₂, y ₃)
where ρ defines a computable relation), and so forth.

The dual concept, where one reverses existential and universal quantifiers, gives the Π_k relations. That is, call a relation Π₁ if it is definable by a formula
∀y ρ (x ₁, …, x_k, y )
call it Π₂ if it is definable by a formula
∀y ₁∃y ₂ ρ(x ₁, …, x_k, y ₁, y ₂),
call it Π₃ if it is definable by a formula
∀y ₁∃y ₂∀y ₃ ρ(x ₁, …, x_k, y ₁, y ₂, y ₃),
and so forth, where in each case ρ defines a computable relation.

Then the Π_k relations are exactly the complements of the Σ_k relations. In effect, one is measuring how far away a relation is from decidability.

By adding vacuous quantifiers, one sees that any Σ_k relation is both Σ_{k +1} and Π_{k +1}. And every definable relation appears somewhere in this hierarchy, because it will be definable by a prenex formula, the quantifier-free part of which always defines a computable (in fact primitive recursive) relation.

For example, the set {e │ φ_e is total} of programs of total functions is Π₂, because φ_e is total iff ∀m ∃n T (e, m, n ). By Kleene's theorem, a relation is computable iff it is both Σ₁ and Π₁. The set K is Σ₁ but not Π₁. And in analogy to this fact, one can, for each k, construct a set that is Σ_k but not Π_k. Thus, letting the noun Σ_k denote the collection of all Σ_k relations, one has proper inclusion in both of the chains
Σ₁ ⊂ Σ₂ ⊂ Σ₃ ⊂ ···
Π₁ ⊂ Π₂ ⊂ Π₃ ⊂ ···
and in both cases the union of the chains is exactly the class of arithmetical relations. One can say that these chains define the arithmetical hierarchy.

From the point of view of the arithmetical hierarchy, one can obtain Tarski's theorem that the theory of true arithmetic is not arithmetical:

Tarski's theorem

The set of sentences true in 𝔑, regarded as a set of numbers, is not definable in 𝔑.

Let T be the set of true sentences. It suffices to show, for each k, that T cannot be Σ_k. Let A be a set that is arithmetical but not Σ_k (as indicated, there is such a set, and one can even make it Π_k ). Then A is definable by some formula α (x ) and
n ∈ A ⇔ α (n ) ∈ T
which shows that A ≤_m T (where T has been identified with the corresponding set of numbers). That is, for some total computable function f (which substitutes numerals into α ),
n ∈ A ⇔ f (n ) ∈ T.
If, contrary to one's hopes, T were Σ_k, then the previous line would let one conclude that A is also Σ_k, which it is not.

There is also a connection between the arithmetical hierarchy and relative computability, as in section 5. The following result extends the fact that a relation is Σ₁ iff it is c.e.

Post's theorem

(a) A relation is Σ₂ iff it is c.e. in ∅′, the jump of the empty set.

(b) More generally, a relation is Σ_k ₊₁ iff it is c.e. in ∅^(k), the k th jump of the empty set.

7. Feasible Computability

Up to now, this entry has approached computability from the point of view that there should be no constraints on the time required for a particular computation, or on the amount of memory space that might be required. The result is that some total computable functions will take a long time to compute. If a function f grows rapidly, then for large x it will take a long time simply to generate the output f (x ). There are also, however, bounded functions that require a large amount of time.

To be more precise, suppose one adopts one of the formalizations from section 1 (any one will do), and one defines in a reasonable way the "number of steps" or the "time required" in a computation. (Manuel Blum converted the term reasonable into axioms for what a complexity measure should be.) Then Michael Rabin showed that for any total computable function h, no matter how fast it grows, one can find another total computable function f with range {0, 1} such that for any program e for f (i.e., f = φ_e ), the time required for e to compute f (x ) exceeds h (x ) for all but finitely many values of x. (The function f is constructed in stages, in such a way as to sabotage any fast program that might try to compute f.)

Is there a more restricted concept of "feasibly computable function" where the amount of time required does not grow beyond all reason, where the amount of time required is an amount that might actually be practical, at least when the input to the function is not absurdly large? To this vague question, an exact answer has been proposed.

Once can call a function f polynomial-time computable (or for short, P-time computable) if there is a program e for f and a polynomial p such that for every x, the program e computes f (x ) in no more than p (│x │) steps, where │x │ is the length of x.

This definition requires some explanation and support. If f is a function over Σ*, the set of words over a finite alphabet Σ, then of course │x │ is just the number of symbols in the word x. If f is a function over ℕ, then │x │ is the length of the numeral for x. (Here, one comes again to the fact that effective procedures work with numerals, not numbers.) So if one uses base-2 numerals for ℕ, then │x │ is about log₂x.

Moreover, there was vagueness about exactly how the number of steps in a computation was to be determined. Here the situation is encouraging: The class of P-time computable functions is the same, under the different reasonable ways of counting steps.

Back in sections 0 and 1 there was the encouraging fact that many different ways of formalizing the concept of effective calculability yielded exactly the same class of functions. As remarkable as that fact is, even more is true. The number of steps required by one formalization is bounded by a polynomial in the number of steps required by another. For example, there exists a polynomial p (of moderate degree) such that a computation by a Turing machine that requires n steps can be simulated by a loop-while program that requires no more than p (n ) steps. Consequently, the concept of a P-time computable function is robust: One can get the same class of functions, regardless of which formalization from section 1 is employed. To be sure, the degrees of the polynomials will vary somewhat, but the class of P-time functions is unchanged.

Encouraged by this result, and inspired in particular by 1971 work of Stephen Cook, people since the 1970s have come to regard the class of P-time functions as the correct formalization of the idea of functions for which computations are feasible, without totally impractical running times.

By analogy to Church's thesis, the statement that P-time computability corresponds to feasibly practical computability has come to be known as Cook's thesis or the Cook-Karp thesis. (The concept of P-time computability appeared as early as 1964 in work of Alan Cobham. Jack Edmunds in 1965 pointed out the good features of P-time algorithms. Richard Karp in 1972 extended Cook's work.)

So what are the P-time computable functions? They form a subclass of the primitive recursive functions. All the polynomial functions are P-time computable, as are some functions that grow faster than any polynomial. There is, however, a limit to the growth rate of P-time computable functions, imposed by the fact that printing an output symbol takes a step. That is, there is the following growth limitation property: If f is computable in time bounded by the polynomial p, then │f (x )│ ≤ │x │ + p (│x │). This prevents exponential functions from being P-time computable; there is not enough time to write down the result.

Often, P-time computability is presented in terms of acceptance of languages (i.e., sets of words). Where Σ is the finite alphabet in question, consider a language L ⊆ Σ*. One can say that L ∈ P if there is a program and a polynomial p such that whenever a word w is in L, then the program halts on input w (i.e., it "accepts" w ) in no more than p (│w │) steps, and whenever a word w is not in L, then the program never halts on input w (i.e., the program does not accept w ). This is equivalent to saying that the characteristic function of L is P-time computable, because one can add to the program an alarm clock that rings after time p (│w │). For example, it is now known that the set of prime numbers (as a set of words written in the usual base-10 notation) belongs to P.

Of course, if the characteristic function of L is P-time computable, then so is the characteristic function of its complement, L̄. That is, P = co-P, where co-P is the collection of complements of languages in P.

Informally, L is in P if L is not only a decidable set of words, but moreover there is a fast decision procedure for P—one that can actually be implemented in a practical way. For example, finite graphs can be coded by words over a suitable finite alphabet. The set of two-colorable graphs (i.e., the set of graphs that can be properly colored with two colors) is in P, because it is fast to check that the graph has no cycles of odd length. The set of graphs with an Euler cycle is in P, because it is fast to check that the graph is connected and that every vertex has even degree.

What about three-colorable graphs or graphs with Hamiltonian cycles? Here, there are no known fast decision procedures, but there are weaker facts: Given a proper coloring with three colors, it is fast to verify that it is indeed a proper coloring. Given a Hamiltonian cycle, it is fast to verify that it is indeed Hamiltonian. Both three-colorable graphs and Hamiltonian graphs are examples of languages that belong to a class known as NP.

One way to define NP is to use nondeterministic Turing machines. (The acronym NP stands for nondeterministic polynomial time.) Back in section 1 the definition of a Turing machine demanded that a machine's table of quintuples be unambiguous, that is, that no two different quintuples have the same first two components. By simply omitting that demand, one can obtain the concept of a nondeterministic Turing machine. A computation of such a machine, at each step, is allowed to execute any quintuple that begins with its present state and the symbol being scanned. Then, one can say that L ∈ NP if there is a nondeterministic Turing machine M and a polynomial p such that whenever a word w is in L, then some computation of M starting from input w halts in no more than p (│w │) steps, and whenever a word w is not in L, then no computation of M starting from input w ever halts. (An accepting computation can be thought of as having made a number of lucky guesses.)

There is an equivalent, and somewhat more workable, characterization along the lines of Σ₁ definability: L ∈ NP iff there is binary relation R ∈ P and a polynomial p such that for every word w,
w ∈ L ⇔ ∃y [│y │ ≤ p (│w │) and R (w, y )].
Another example of a language in NP is SAT, the set of satisfiable formulas of sentential logic. The truth-table method for determining whether a formula with n sentence symbols is satisfiable involves forming all 2ⁿ lines of the formula's truth table and looking to see if there is a line making the formula true. This is not, however, a feasible algorithm, because 2⁸⁰ microseconds greatly exceeds the age of the universe. If one (nondeterministically) guesses the correct line of the table, however, then one can quickly verify that the formula is true under that line.

There is a clear analogy between computable and c.e. sets on the one hand, and P and NP on the other hand. The computable sets are decidable; the sets in P are decidable by fast algorithms. And c.e. sets are one existential quantifier away from being computable; sets in NP are one existential quantifier away from being in P. Moreover, there are c.e. sets that are complete with respect to ≤_m; there are NP sets with a similar property. One can say that L ₁ is P-time reducible to L ₂ if there is a P-time computable (total) function f that many-one reduces L ₁ to L ₂. The following result was proved independently by Cook (1971) and Leonid Levin (1973):

Cook-Levin theorem

SAT is in NP, and every NP language is P-time reducible to SAT.

In other words, SAT is NP-complete. Karp showed that many other NP languages (three-colorable graphs, Hamiltonian graphs, and others) are NP-complete.

p versus np

How far does the analogy between NP and c.e. go? It is known that there are noncomputable c.e. sets, and a set is computable iff both it and its complement are c.e. While it is clear that P ⊆ NP ∩ co-NP (i.e., every language in P is in NP, as is its complement), it is not known whether P = NP, or if NP is closed under complement.

The diagonalization that produces a noncomputable c.e. set can be relativized in a straightforward way to show that for any fixed oracle B, there is a set B ′ that is c.e. in B but not computable in B. Might some diagonal argument produce a set in NP that was not in P? Would that argument then relativize? The definitions of P and NP extend easily to P^B and NP^B, where the computations can query the oracle B (in one step).

In a 1975 paper, Theodore Baker, John Gill, and Robert Solovay showed that there are oracles B and C such that on the one hand P^B = NP^B and on the other hand P^C ≠ NP^C. This result suggests that the P versus NP question is difficult, because whatever argument might settle the question cannot relativize in a straightforward way. It has also been shown that if one chooses the oracle B at random (with respect to the natural probability measure on 𝒫ℕ), then P^B ≠ NP^B with probability 1.

The P versus NP question remains the outstanding problem in theoretical computer science. In recognition of this fact, the Clay Mathematics Institute is offering a million-dollar prize for its solution.

8. Analytical Hierarchy

The ideas in section 5 can be utilized to consider partial functions that take as input not only numbers (or words over a finite alphabet), but sets of numbers or, more generally, functions from ℕ to ℕ. One can think of the computational procedure as being given a set or a function if it is given an oracle for it.

Let ℕ^ℕ be the set of all total functions from ℕ to ℕ. For a function α in ℕ^ℕ, a calculation can query an oracle for α by giving it a number n. The oracle then supplies (in one step) the number α (n ). For example, the partial function whose value at α is the least n, if any, for which α (n ) = 0 is a computable partial function on ℕ^ℕ. One can broaden the concept of computability to include partial functions that take as inputs k numbers and 1 members of ℕ^ℕ, and produce numbers as outputs.

For definiteness, suppose that f is a partial function on ℕ × ℕ^ℕ. Informally, f is effectively calculable if there exists an effective procedure that, when given a number x and an oracle for an α in ℕ^ℕ, eventually halts and returns the correct value f (x, α ) if this is defined, and never halts if f (x, α ) is undefined. As before, the various formalizations in section 1 all can be adapted to incorporate inputs from ℕ^ℕ. One can thereby obtain the concept of a computable partial function on ℕ × ℕ^ℕ.

The basic results of section 2 can be adapted to this broader situation. An essential point is that any one computation takes finitely many steps before producing its output and so can make use of only finitely many values from the given oracles. To obtain a normal form theorem, one again needs to adopt a way of encoding an entire step-by-step history of a computation when a program e is given an input number x and an oracle α. It is natural to do this in such a way that, where y is the number encoding the history, the oracle is asked for values α (t ) only for t < y. Let ᾱ (y ) be a number encoding the finite sequence α (0), α (1), …, α (y − 1) consisting of the first y values of α.

For the Kleene T -predicate, one now needs for T (e, x, s, y ) to say that e encodes a program, and y encodes the step-by-step history of the computation that program produces on input x, where the oracle supplies values according to the sequences coded by s. This is a decidable property of e, x, s, and y.

Normal form theorem

There is a 4-ary computable relation T on ℕ and a total computable function U with the following property: For each computable partial function on ℕ × ℕ^ℕ, there is a natural number e such that
f (x, α ) = U (μyT (e, x, ᾱ (y ), y ))
for every x and α.

Analogous results hold for partial computable functions with more arguments. It is interesting to note that because U and T have only natural numbers as arguments, computability on ℕ × ℕ^ℕ can be characterized in terms of computability on ℕ.

As before, one can define a subset of ℕ × ℕ^ℕ to be computable if its characteristic function is computable. Informally, this means that one has an effective decision procedure that, given 〈x, α 〉, decides whether or not it is in the set. Of course, the decision procedure will be able to utilize only finitely much information about α before rendering a verdict. Because of this fact, any computable set will be both open and closed in the natural topology on ℕ × ℕ^ℕ (where ℕ has the discrete topology and ℕ^ℕ has the product topology).

Moreover, one can define the c.e. sets to be the ones whose partial characteristic function is a computable partial function. From the normal form theorem, it follows that if Q is c.e., then there is a computable ternary relation R on ℕ such that
Q (x, α ) ⇔ ∃yR (x, ᾱ (y ), y )
for every x and α. Any such set Q will be open in the natural topology.

In section 6 the connection between computability and definability in arithmetic was examined. The definable relations formed a hierarchy, where the place of relation in the hierarchy (Σ_k or Π_k ) depended, roughly, on how many quantifiers away from being computable it was.

Now, one can extend those ideas to second-order definability in arithmetic, where besides quantifiers over ℕ, quantifiers over ℕ^ℕ can be used. One can start with
the class of arithemetical relations.
Furthermore, one can define Σ¹_k _{+ 1} to consist of relations definable from Π¹_k relations by prefixing existential quantifiers over ℕ^ℕ. Similarly, one can define Π¹_k ₊₁ to consist of relations definable from Σ¹_k relations by prefixing universal quantifiers over ℕ^ℕ. Finally, one can define Δ¹_k to be Σ¹_k ∩ Π¹_k.

As with the arithmetical hierarchy, one has proper inclusion in both of the chains
Σ¹₁ ⊂ Σ¹₂ ⊂ Σ¹₃ ⊂ ···
Π¹₁ ⊂ Π¹₂ ⊂ Π¹₃ ⊂ ···
and in both cases the union of the chains is exactly the class of relations that are second-order definable in 𝔑. One can say that these chains define the analytical hierarchy.

Bibliography

Barwise, Jon, ed. Handbook of Mathematical Logic. Amsterdam, Netherlands: North-Holland, 1977. Part C, "Recursion Theory," comprises eight papers on computability theory.

Church, Alonzo. "An Unsolvable Problem of Elementary Number Theory." American Journal of Mathematics 58 (1936): 345–363.

Davis, Martin, ed. The Undecidable: Basic Papers on Undecidable Propositions, Unsolvable Problems, and Computable Functions. Hewlett, NY: Raven Press, 1965. A collection of fundamental papers by Kurt Gödel, Alonzo Church, Alan M. Turing, J. B. Rosser, Stephen Cole Kleene, and Emil Post.

Griffor, Edward R., ed. Handbook of Computability Theory. Amsterdam, Netherlands: Elsevier, 1999. A collection of eighteen papers exploring topics in computability theory.

Herken, Rolf, ed. The Universal Turing Machine: A Half-Century Survey. New York: Oxford University Press, 1988.

Kleene, Stephen Cole. Introduction to Metamathematics. Amsterdam, Netherlands: North-Holland, 1952.

Rogers, Hartley. Theory of Recursive Functions and Effective Computability. New York: McGraw-Hill, 1967.

Soare, Robert I. Recursively Enumerable Sets and Degrees: A Study of Computable Functions and Computably Generated Sets. Berlin: Springer-Verlag, 1987.

Turning, A.M. "On Computable Numbers, with an Application to the Entscheidungsproblem." Proceedings of the London Mathematical Society 42 (1936–1937): 230–265. A correction, 43 (1937): 544–546.

Herbert B. Enderton (2005)

Encyclopedia of Philosophy