## Artificial and Natural Languages

The only natural languages we know of are human. In addition to such widely spoken languages as English, Spanish, Russian, and Chinese, there are many less well-known languages, many of them spoken by only a few hundred people. The more marginalized languages are dying out at an alarming rate. Owing to lack of evidence, our information about the origin of human language is limited, but it seems likely that it evolved out of communication systems similar to those used by animals. Living human languages are learned as first languages by infants and are used for face-to-face communication and many other purposes.

Natural languages are influenced by a mixture of unconscious evolutionary factors and conscious innovation and policy making. In most cases, the historical record does not allow us to tell what role these factors played in the development of a given feature, but the difficulty of consciously controlling the language used by a large population suggests that unconscious causes predominate.

The term "artificial language" is often used for humanlike languages that are created either for amusement (like J. R. R. Tolkien's Elvish) or for some practical purpose (Esperanto). Information on such projects can be found in Alan Libert's work (2000).

Artificial languages of a quite different sort are created for scientific and technological reasons, and the design of such languages is closely connected with logical theory. Logic originated with Aristotle's *Prior Analytics*. Although Aristotle's syllogistic theory used schematic letters for the terms that, together with logical words such as "some," "all," and "not," make up propositions, such symbols and the expressions made up of them were not generally regarded as part of a linguistic system until much later.

Modern logical theory and its connection with artificial languages owes much to the search for a universal language in the seventeenth century (Maat 1999). In Britain, George Dalgarno (1968 [1661]) and John Wilkins (2002 [1668]) promoted the idea of a philosophical language based on rational principles. In retrospect, their projects were aligned more closely with the goal of designing an improved human language than with the mainstream development of logic: they aimed to facilitate the clear expression of ideas rather than to provide a framework for developing a theory of reasoning. Their projects stressed the need to base a vocabulary on a rational ontology and are more closely connected with later attempts to develop taxonomies and thesauri than with logic per se.

At about the same time, however, G. W. Leibniz attempted to develop a "universal characteristic" based on several ideas central to the later development of logic and artificial-language design. In his "Dissertatio de Arte Combinatoria" (excerpts in Loemker 1956, pp. 117–133), written in 1666 when he was nineteen years old, Leibniz presents a logical program that, in its main proposals, informed his philosophy for the remainder of his life.

Like Dalgarno and Wilkins, Leibniz adopted the goal of a rationally ideal philosophical language, but he differs from them in the stress he lays on reasoning and in the degree to which his account of reasoning is inspired by mathematics. The leading ideas of his program—that truth can be discovered by analysis, or division of concepts into basic constituents; that such analytic reasoning is analogous to combinatory reasoning in mathematics; and that it is facilitated by a language with a clear syntactic structure reflecting the meanings of expressions—have furnished important insights for subsequent work in logic. The stress that Leibniz placed on calculation as part of the reasoning process gives him a well-deserved central place in the history of logic and computation.

The two weak points in Leibniz's program are (1) the assumption that once analysis was achieved in an ideally rational language, testing a proposition for truth should be a relatively trivial matter, and (2) the idea that analysis is appropriate and possible across the entire range of rational inquiry. The first of these weaknesses was corrected late in the nineteenth century, when Gottlob Frege developed a symbolic language for the representation of "pure" or mathematical thought. Frege's *Begriffsschrift*, or "concept-script," achieves the goal prefigured by Leibniz of a language designed to facilitate reasoning by allowing the relations between concepts to be clearly and unambiguously displayed. And it conforms to the methodological ideal of complete explicitness more than any previous attempt to present an artificial language. Frege's presentation of the *Begriffsschrift* makes it possible to test each constellation of symbols to tell whether it is a well-formed formula (an expression that conforms to the syntactic rules of the system). Although part of proving such a formula in Frege's calculus is a matter of analysis, or the application of explicit definitions, the result of such analysis is a formula that must be proved using logical laws. These laws are explicitly formulated, so that it is also possible to tell whether or not a purported proof conforms to the rules of the system. But *whether there is a proof* of an analyzed proposition need not be a question that can be solved algorithmically. In fact, as the theory of reasoning systems has shown, we cannot in general expect to have an algorithmic criterion for whether a formula is provable.

The second weakness in Leibniz's program is more difficult to deal with decisively. But many years of experience indicate that we have no reliable methodology for isolating universal atoms of human thought. In many extended attempts to make the rules of reasoning in some domain explicit, it seems more useful to deal with many primitives that are conceptually related by axioms rather than by definitions.

Alonzo Church summarized the results of more than seventy-five years of philosophical and mathematical development of Frege's achievement in section 7, "The Logistic Method," of his *Introduction to Mathematical Logic* (1956). In that and the subsequent two sections, Church sets out the methods logicians had established in the first half of the twentieth century for constructing artificial logical languages (or, to use the usual current term, *formal languages*) and theorizing about them. These methods have changed only slightly in the subsequent forty-eight years, the most significant changes having to do with interest in applications other than the explication of deductive reasoning and with the widespread use of formal languages in digital computing. At the beginning of the twenty-first century, it is not essential for formal languages to have a deductive component, and in some cases it may be important to associate implemented computational procedures with a formal language.

What are the essential features of a formal language? First, a formal language must have a *syntax*: a precise definition not only of the vocabulary of the language but also of the strings of vocabulary items that count as well-formed formulas. If complex expressions of types other than formulas are important, for each such *syntactic type* there must be a precise definition of the set of strings belonging to that type. These definitions must be not only precise but *effective*; that is, questions concerning membership in syntactic types must be algorithmically decidable. These syntactic definitions are usually presented as inductive definitions; for instance, the simplest formulas are defined directly, and rules are presented for building up complex formulas from simpler ones. The set of well-formed formulas is not only decidable but usually belongs to a known restricted class of efficiently computable sets of strings. The *context-free* sets of strings are heavily used in computational applications, and are also capable of standing in for large parts of human languages.
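The kind of inductive definition just described can be made concrete with a small sketch (not drawn from any particular formal system): a toy propositional language whose atoms are p, q, and r, with negation written `~A` and conjunction written `(A&B)`. Membership in the set of well-formed formulas is decidable, since the checker always terminates with a yes-or-no answer.

```python
# Inductive syntax definition for a toy propositional language.
# Base clause: the atoms p, q, r are formulas.
# Inductive clauses: if A and B are formulas, so are ~A and (A&B).

ATOMS = {"p", "q", "r"}

def is_wff(s: str) -> bool:
    """Decide whether s is a well-formed formula of the toy language."""
    if s in ATOMS:                      # base clause
        return True
    if s.startswith("~"):               # negation clause: ~A
        return is_wff(s[1:])
    if s.startswith("(") and s.endswith(")"):   # conjunction clause: (A&B)
        depth = 0
        for i, ch in enumerate(s):
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch == "&" and depth == 1:      # main connective
                return is_wff(s[1:i]) and is_wff(s[i + 1:-1])
    return False

print(is_wff("(p&~q)"))    # True
print(is_wff("(p&&q)"))    # False
```

The two clauses of the definition appear directly as the branches of the function; a real formal language would have more connectives, but the pattern is the same.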

Second, if proofs are associated with the language, these too must be precisely defined. Whether or not a list of formulas is a proof must be algorithmically decidable.

Third, the formal language must have a semantic interpretation, which associates *semantic values* or *denotations* with the well-formed expressions of the language. The importance of a semantic component was recognized by Alfred Tarski, who also provided a methodology for placing semantics on a sound mathematical basis and applying it to the analysis of mathematical theories.

A version of Tarskian semantics due to Alonzo Church (1940) starts with a domain of individuals (the objects that the language deals with) and a domain of truth-values (the two values True and False) and constructs possible denotations by taking functions from domains to domains. Sentences, for instance, denote truth-values, and one-place predicates (verblike expressions taking just one argument) denote functions from individuals to truth-values.
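This construction can be illustrated with a minimal sketch in the spirit of Church's approach; the particular domain and the predicate "is even" are illustrative choices, not anything drawn from Church's paper.

```python
# Church-style denotations: start with a domain of individuals and the
# two truth-values, and build further denotation domains as functions
# from domains to domains.

individuals = {1, 2, 3, 4}          # domain of individuals (illustrative)
truth_values = {True, False}        # domain of truth-values

# A one-place predicate denotes a function from individuals to
# truth-values, represented here as a Python dict.
is_even = {x: (x % 2 == 0) for x in individuals}

# Applying the predicate to an individual yields a sentence denotation,
# i.e., a truth-value.
print(is_even[2])   # True
print(is_even[3])   # False
```

Higher syntactic types follow the same pattern: a two-place predicate would denote a function from individuals to one-place-predicate denotations, and so on up the hierarchy.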

In a semantics for deductive reasoning, truth-values are essential. Once the legitimate interpretations (or *models* ) of a language are given, the validity of an inference (say of formula *B* from formula *A* ) can be defined as follows: The inference is valid if every model that assigns *A* the value True also assigns *B* the value True.
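For a propositional language the definition of validity just given can be checked by brute force, since a model is simply an assignment of truth-values to the atoms. The following sketch (the formulas and atom names are invented for illustration) enumerates all models and tests whether truth is preserved.

```python
# Validity of an inference from A to B: every model that assigns A the
# value True also assigns B the value True. For propositional logic a
# "model" is an assignment of truth-values to the atoms.

from itertools import product

def valid_inference(A, B, atoms=("p", "q")):
    """Return True iff every model making A true also makes B true."""
    for values in product([True, False], repeat=len(atoms)):
        model = dict(zip(atoms, values))
        if A(model) and not B(model):   # counterexample found
            return False
    return True

# "p and q" entails "p", but "p" does not entail "p and q".
conj = lambda m: m["p"] and m["q"]
left = lambda m: m["p"]
print(valid_inference(conj, left))   # True
print(valid_inference(left, conj))   # False
```

For richer languages with infinite domains this brute-force enumeration is unavailable, which is one reason that proof systems, rather than direct semantic checks, carry the weight in practice.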

The theory of any language (natural or artificial) has to be stated in some language. When one language serves as a vehicle for formulating and theorizing about another language, the first is called the *metalanguage* for the second, and the second is called an *object language* of the first. Nothing prevents a metalanguage itself from being formalized. When logicians wish to investigate theories of language, they may wish to formalize an object language and its metalanguage. The language in which the theory of both languages is stated would be a *meta-metalanguage*. Since formalization is a human endeavor, the whole enterprise is usually conducted in some human language (typically in some fairly regimented part of a human language, supplemented with mathematical notation), and this language serves as the metalanguage for all the languages developed in the course of the formalization project. In theory, a language can be its own metalanguage, but in such cases we have a situation that can easily lead to paradox.

The use of digital computers has led to the wholesale creation of special-purpose formal languages. Since computer scientists have borrowed the methods for presenting these languages from logic, computational formal languages usually conform to Church's recipe. Sometimes, however, a semantics is not provided. (For instance, mathematical tools for providing semantic interpretations for programming languages only became available years after such languages had been developed and used.) Often it is important to specify the crucial computational procedures associated with such a language. For example, a *query language*, intended to enable a user to present questions to a database, has to provide a procedure for computing an answer to each query that it allows. Sometimes a computational formal language is pointless unless procedures have been implemented to enable computers to process inputs formulated in the language. A programming language is useless without an implemented program that interprets it; a markup language like HTML (Hypertext Markup Language) is useless without browsers that implement procedures for displaying documents written in the language.

These are very natural additions to Church's logistic method. Even in 1956 a semantic interpretation was thought to be desirable but not essential. The methods developed by logicians in the first half of the twentieth century for formalizing languages have not changed greatly since then and are likely to be with us for a long time.

The distinction between natural and formal languages is not the same as the distinction between naturally occurring and artificial languages. Rather, it is the distinction between naturally occurring languages and languages that are formalized, or precisely characterized along the lines suggested by Church. As far as this distinction goes, the only thing that prevents a natural language from counting as formal is the difficulty (or perhaps impossibility) of actually formalizing a language like English or Swahili. Can natural languages be formalized? Can the grammar of naturally occurring languages be articulated as precisely as the syntax of an artificially constructed language? In assigning denotations to the expressions of a natural language, do we encounter problems that do not arise with artificial languages designed to capture mathematical reasoning?

In fact, there are difficulties. But logical work on formal languages has served as one of the most important sources of inspiration for theories of natural-language syntax, and is by far the most important source of inspiration for semantic theories of natural language. Both types of theories are now primarily pursued by linguists.

The ideal of syntax stated by Church derives from earlier work by David Hilbert, Rudolf Carnap, and other logicians. The essential ideas are an utterly precise description of the syntactic patterns of a language and algorithmic rules specifying how complex expressions are built up out of simpler ones. In essentials, this ideal is also the one that Noam Chomsky proposed in 1957 for the syntax of natural language. It has persisted through the evolution of the theories that Chomsky and his students have created and is also accepted by most of the leading rival approaches. Although there are methodological difficulties associated with the paradigm, they are no worse than the difficulties encountered by other sciences. The idea that natural-language syntax resembles that of formal languages has proved to be a fruitful paradigm for almost fifty years of syntactic research.

Semantics presents a more difficult challenge. Tarski's program addressed the semantics of specialized mathematical languages, and its success seems to depend essentially on certain features of these languages that are not shared by natural human languages: (1) Mathematical notation is designed to be neither ambiguous nor vague, whereas natural languages are both vague and ambiguous. (2) Natural languages have many sorts of *indexical* or context-sensitive expressions, like "I" and "today," whereas mathematical notations tend to use only one kind of indexical expression, variables. (3) *Intensional* constructions like "believe" are not found in mathematics, and they create further difficulties. The verb "believe" does not act semantically on the truth-value of the sentence it embeds. If you know that "Sacramento is the capital of California" is true, this does not tell you whether "Jack believes that Sacramento is the capital of California" is true. There are practical difficulties as well as difficulties in principle. Natural languages are so complex that the task of formalizing them is open-ended and much too large for a single linguist or even for a single generation of linguists.

Richard Montague, a logician who taught at the University of California at Los Angeles until 1971, is primarily responsible for showing how to overcome obstacles that seemed to prevent a semantics for natural languages along the lines advocated by Tarski. His work began a program of research along these lines that is still being pursued. Montague's solution to the problem of ambiguity was to assign denotations to *disambiguated syntactic structures*. With a syntactic structure and a single reading for each word in a sentence, the sentence can have only one meaning. His solution to indexicality was to relativize interpretations to contexts. And his solution to the problem of intensionality, which followed earlier work by Rudolf Carnap, was systematically to assign linguistic phrases two denotations: an *intension* and an *extension*. Montague treated possible worlds as semantic primitives. Intensions, for him, were functions from possible worlds to appropriate extensions. The intension of a sentence, for instance, is a function from possible worlds to truth-values. Montague presented several formal "fragments" of English, the idea being to achieve rigor by focusing on a limited family of natural-language constructions. He also showed how to use higher-order logic to obtain a remarkably elegant and unified semantic interpretation.
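The intension/extension machinery can be sketched in a few lines; the possible worlds and the sample sentence here are invented for illustration, and worlds are represented, as in Montague's treatment, as unanalyzed primitives.

```python
# Montague's distinction: the intension of a sentence is a function
# from possible worlds to truth-values; its extension at a world is
# the value of that function at that world.

worlds = ["w1", "w2"]               # possible worlds as primitive labels

# Intension of the (illustrative) sentence "It is raining":
# true in w1, false in w2.
raining = {"w1": True, "w2": False}

def extension(intension, world):
    """The extension at a world: apply the intension to that world."""
    return intension[world]

print(extension(raining, "w1"))   # True
print(extension(raining, "w2"))   # False
```

A verb like "believe" can then operate on the whole intension rather than on the extension at the actual world, which is why two sentences with the same truth-value need not be interchangeable inside a belief report.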

This work on natural-language semantics leaves open a number of challenging questions concerning whether natural languages contain elements that somehow resist formalization. For one, Montague did not deal with vagueness, and there are difficulties with his accounts of intensionality and indexicality. These issues have been a major preoccupation of analytic philosophy since the 1970s. Although no philosopher has persuasively argued that the problems are unsolvable, they are certainly more difficult than many people imagined them to be in 1971. While the final question of whether natural languages can be completely formalized remains open, the assumption that this is possible has certainly inspired a fruitful paradigm of research.

**See also** Semantics; Syntactical and Semantic Categories.

## Bibliography

Carnap, Rudolf. *Logische Syntax der Sprache*. Schriften zur wissenschaftlichen Weltauffassung. Vienna: Verlag von Julius Springer, 1934.

Carnap, Rudolf. *Meaning and Necessity*. 2nd ed. Chicago: University of Chicago Press, 1956. First edition published in 1947.

Chomsky, Noam. *Syntactic Structures*. The Hague: Mouton, 1957.

Church, Alonzo. "A Formulation of the Simple Theory of Types." *Journal of Symbolic Logic* 5 (1) (1940): 56–68.

Church, Alonzo. *Introduction to Mathematical Logic*. Vol. 1. Princeton, NJ: Princeton University Press, 1956.

Dalgarno, George. *Ars Signorum, Vulgo Character Universalis et Lingua Philosophica* (1661). Menston, Yorkshire, U.K.: Scolar Press, 1968.

Frege, Gottlob. *Begriffsschrift: Eine der arithmetischen nachgebildete Formalsprache des reinen Denkens*. Halle, Germany: L. Nebert, 1879. Translated in *Frege and Gödel: Two Fundamental Texts in Mathematical Logic*, compiled by Jean van Heijenoort. Cambridge, MA: Harvard University Press, 1970.

Libert, Alan. *A Priori Artificial Languages*. Munich: Lincom Europa, 2000.

Loemker, Leroy E., ed. *Gottfried Wilhelm Leibniz: Philosophical Papers and Letters*. Vol. 2. Chicago: University of Chicago Press, 1956.

Maat, Jaap. "Philosophical Languages in the Seventeenth Century: Dalgarno, Wilkins, Leibniz." PhD diss., Institute for Logic, Language, and Computation, University of Amsterdam, Amsterdam, 1999.

Montague, Richard. *Formal Philosophy: Selected Papers of Richard Montague*, edited by Richmond H. Thomason. New Haven, CT: Yale University Press, 1974.

Tarski, Alfred. "The Concept of Truth in Formalized Languages" (1936). In his *Logic, Semantics, Metamathematics*. Oxford, U.K.: Clarendon Press, 1956.

Wilkins, John. *An Essay towards a Real Character and a Philosophical Language* (1668). Bristol, U.K.: Thoemmes, 2002.

*Richmond H. Thomason (2005)*