Generations, Languages

Programming languages are the primary tools for creating software. As of 2002, hundreds exist, some more widely used than others, and each with advocates who claim it is the best. In the days when computers were first being developed, by contrast, there was just one kind of language: machine language.

The concept of language generations, sometimes called levels, is closely connected to the advances in technology that brought about computer generations. The four generations of languages are machine language, assembly language, high-level language, and very high-level language.

First Generation: Machine Language

Programming of the first stored-program computer systems was performed in machine language. This is the lowest level of programming language. All the commands and data values are given in ones and zeros, corresponding to the "on" and "off" electrical states in a computer.

In the 1950s each computer had its own native language, and programmers had primitive systems for combining numbers to represent instructions such as add and compare. Similarities exist between different brands of machine language. For example, they all have instructions for the four basic arithmetic operations, for comparing pairs of numbers, and for repeating instructions. Different brands of machine language are different languages, however, and a computer cannot understand programs written in another machine language.

In machine language, all instructions, memory locations, numbers, and characters are represented in strings of zeros and ones. Although machine-language programs are typically displayed with the binary numbers translated into octal (base-8) or hexadecimal (base-16), these programs are not easy for humans to read, write, or debug.
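To make these notations concrete, the short Python sketch below prints a single 8-bit pattern in binary, octal, and hexadecimal. The pattern is arbitrary sample data and the function name `notations` is made up for this illustration; neither corresponds to any particular machine's instruction set.

```python
# One 8-bit pattern shown in the three notations mentioned above.
pattern = 0b11010110  # arbitrary sample pattern, not a real instruction for any specific machine

def notations(value):
    """Return the binary, octal, and hexadecimal spellings of an 8-bit value."""
    return format(value, "08b"), format(value, "o"), format(value, "X")

binary, octal, hexadecimal = notations(pattern)
print(binary, octal, hexadecimal)
```

The octal and hexadecimal forms are shorter than the binary one, but none of the three says anything about what the pattern means, which is why such listings remain hard to read.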

The programming process became easier with the development of assembly language, a language that is logically equivalent to machine language but is easier for people to read, write, and understand.

Second Generation: Assembly Language

Assembly languages are symbolic programming languages that use symbolic notation to represent machine-language instructions. Symbolic programming languages are strongly connected to machine language and the internal architecture of the computer system on which they are used. They are called low-level languages because they are so closely related to the machines. Nearly all computer systems have an assembly language available for use.

Assembly language was developed in the mid-1950s and was considered a great leap forward because it uses mnemonic codes, or easy-to-remember abbreviations, rather than numbers. Examples of these codes include A for add, CMP for compare, MP for multiply, and STO for storing information into memory. Like programs written in other programming languages, assembly language programs consist of a series of individual statements or instructions that tell the computer what to do.

Normally an assembly language statement consists of a label, an operation code, and one or more operands. Labels are used to identify and reference instructions in the program. The operation code is a symbolic notation that specifies the particular operation to be performed, such as move, add, subtract, or compare. An operand represents the register or the location in main memory where the data to be processed is located. However, the format of the statement and the exact instructions available will vary from machine to machine because the language is directly related to the internal architecture of the computer and is not designed to be machine-independent. Machine dependence is a significant disadvantage of assembly language. A program coded in assembly language for one machine will not run on machines from a different manufacturer, and sometimes not even on other machines from the same manufacturer.

The principal advantage of assembly language is that programs can be very efficient in terms of execution time and main memory usage. Nearly every instruction is written on a one-for-one basis with machine language. Since all the instructions of a computer are available to the assembly language programmer, the programmer can readily manipulate individual records, fields within records, characters within fields, and even bits within bytes.
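Manipulating bits within bytes can be illustrated with masks, as in the following sketch. Python stands in for assembly language here only for readability; on many machines each of these operations would correspond to a single AND or OR instruction. The byte layout (a one-bit flag above a seven-bit code) and the function names are hypothetical.

```python
# Set, clear, and read small fields inside a single byte using bit masks.
FLAG_MASK = 0b10000000   # hypothetical layout: top bit is a flag
CODE_MASK = 0b01111111   # hypothetical layout: low seven bits hold a code

def set_flag(byte):
    """Turn the flag bit on, leaving the code bits unchanged."""
    return byte | FLAG_MASK

def clear_flag(byte):
    """Turn the flag bit off; the final mask keeps the result within 8 bits."""
    return byte & ~FLAG_MASK & 0xFF

def read_code(byte):
    """Extract the seven-bit code, ignoring the flag."""
    return byte & CODE_MASK

value = set_flag(0b00000101)
print(format(value, "08b"), read_code(value))
```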

Programs written in assembly language require a translator to convert them into machine language. An assembly language instruction for multiply, MP, has no meaning to the computer because it only understands commands in the form of 11010110. Therefore, a program called an assembler is needed to translate each assembly language instruction into a machine-language instruction.
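The translation step can be sketched as a toy assembler in Python. The opcode table below is invented for illustration and matches no real machine; only the mnemonics and the bit pattern for MP are borrowed from this article. A real assembler would also handle labels, several operand formats, and error reporting.

```python
# Toy assembler: translates mnemonic statements into an 8-bit opcode plus an 8-bit operand.
# The opcode values are hypothetical and chosen only for this example.
OPCODES = {
    "A":   0b01011010,  # add
    "CMP": 0b01011001,  # compare
    "MP":  0b11010110,  # multiply
    "STO": 0b01010000,  # store
}

def assemble(source):
    """Translate lines like 'MP 7' into a list of (opcode, operand) bit strings."""
    machine_code = []
    for line in source.strip().splitlines():
        mnemonic, operand = line.split()
        opcode = OPCODES[mnemonic]  # raises KeyError for an unknown mnemonic
        machine_code.append((format(opcode, "08b"), format(int(operand), "08b")))
    return machine_code

program = """
A 12
MP 7
STO 200
"""
for opcode_bits, operand_bits in assemble(program):
    print(opcode_bits, operand_bits)
```

Each source line produces exactly one machine instruction, which mirrors the one-for-one relationship between assembly language and machine language.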

Although assembly languages are an improvement over machine language, they still require that the programmer think on the machine's level. Because the level of detail required to write assembly programs is very high, it is easy to make mistakes. Although some programmers still use assembly language to write parts of applications where speed of execution is critical, such as video games, most programmers today think and write in very high-level or fourth-generation languages.

Third Generation: High-Level Language

Third-generation languages spurred the great increase in data processing that occurred in the 1960s and 1970s. During that time, the number of mainframes in use increased from hundreds to tens of thousands. The impact of third-generation languages on society has been huge.

A programming language in which the program statements are not closely related to the internal characteristics of the computer is called a high-level language. As a general rule, one statement in a high-level programming language will expand into several machine language instructions. This is in contrast to assembly languages, where one statement normally generates one machine language instruction. High-level programming languages were developed to make programming easier and less error-prone.
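The one-to-many expansion can be observed with Python's standard `dis` module, as a rough analogy. Note the assumption: the listing shows bytecode for the Python virtual machine rather than the machine language of a physical computer, and the exact instructions differ between Python versions. The variable names in the statement are made up for the example.

```python
import dis

statement = "total = price * quantity + tax"

# Compile the single statement and list the lower-level instructions it becomes.
code = compile(statement, "<example>", "exec")
instructions = [ins.opname for ins in dis.get_instructions(code)]

print(len(instructions), "instructions for one statement")
for name in instructions:
    print(name)
```

The statement is only compiled, never executed, so the undefined names cause no error.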

High-level languages fall somewhere between natural languages and machine languages, and were developed to make the programming process more efficient. Languages like FORTRAN (FORmula TRANslator) and COBOL (COmmon Business Oriented Language) made it possible for scientists and business people to write programs using familiar terms instead of obscure machine instructions. Programmers can now pick from hundreds of high-level languages.

The first widespread use of high-level languages in the early 1960s changed programming into something quite different from what it had been. Programs were written in an English-like manner, making them more convenient to use and giving the programmer more time to address a client's problems.

Although high-level languages relieve the programmer of demanding details, they do not provide the flexibility available in low-level languages. A few high-level languages like C and FORTH combine some of the flexibility of assembly language with the power of high-level languages, but these languages are not well suited to the beginning programmer.

Some third-generation languages were created to serve a specific purpose, such as controlling industrial robots or creating graphics. Others are extraordinarily flexible and are considered to be general-purpose. In the past, the majority of programming applications were written in BASIC (Beginners' All-purpose Symbolic Instruction Code), FORTRAN, or COBOL—all considered to be general-purpose languages. Some other popular high-level languages today are Pascal, C, and their derivatives.

Again, a translator is needed to translate the symbolic statements of a high-level language into computer-executable machine language. The programs that perform this translation are called interpreters and compilers: a compiler translates an entire program before it is run, whereas an interpreter translates and executes the program one statement at a time. Regardless of which translator is used, one high-level program statement typically becomes several machine-language instructions. Each language has many compilers, one for each type of computer, because the machine language generated by one computer's COBOL compiler, for example, is not the same as the machine language of some other computer. Therefore, it is necessary to have a COBOL compiler for each type of computer on which COBOL programs are to be run.

Using a high-level language makes it easier to write and debug a program and gives the programmer more time to think about its overall logic. In addition, high-level programs have the advantage of being portable between machines. For example, a program written in standard C can be compiled and run on any computer with a standard C compiler. Since C compilers are available for all types of computers, this program can run as written just about anywhere. However, porting a program to a new machine is not always easy, and many high-level programs need to be partially rewritten to adjust to differences between user interfaces, hardware, compilers, and operating systems.

Fourth Generation: Very High-Level Languages

With each generation, programming languages have become easier to use and more like natural languages. However, fourth-generation languages (4GLs) seem to sever connections with the prior generation because they are basically nonprocedural. Procedural languages tell the computer how a task is done: add this, compare that, do this if something is true, and so on, in a very specific step-by-step manner. In a nonprocedural language, users define only what they want the computer to do, without supplying all the details of how something is to be done.

Although there is no agreement on what really constitutes a fourth-generation language, several characteristics are usually mentioned:

the instructions are written in English-like sentences;
they are nonprocedural, so users can concentrate on the "what" instead of the "how";
they increase productivity because programmers type fewer lines of code to get something done.

An example of a 4GL is the query language that allows a user to request information from a database with precisely worded English-like sentences. A query language is used as a database user interface and hides the specific details of the database from the user. For example, Structured Query Language (SQL) requires that the user learn a few rules of syntax and logic, but it is easier to learn than COBOL or C. It is often claimed that one can be ten times more productive in a fourth-generation language than in a third-generation language.
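A small SQL query can show the nonprocedural style. The sketch below uses Python's standard `sqlite3` module only as a convenient way to run the query; the table name, column names, and rows are hypothetical and invented for this example.

```python
import sqlite3

# Hypothetical table and rows, invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrollment (teacher TEXT, class TEXT, students INTEGER)")
conn.executemany(
    "INSERT INTO enrollment VALUES (?, ?, ?)",
    [("Lee", "Algebra", 28), ("Lee", "Geometry", 24), ("Ortiz", "Biology", 31)],
)

# The query states what is wanted; the database engine decides how to compute it.
rows = conn.execute(
    "SELECT teacher, SUM(students) FROM enrollment GROUP BY teacher ORDER BY teacher"
).fetchall()
print(rows)
conn.close()
```

The query contains no loops, counters, or comparisons; grouping and summing are left to the database.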

Consider a request to produce a report showing the total number of students enrolled in each class, by teacher, in each semester and year, and with a subtotal for each teacher. In addition, each new teacher must start on a new page. Using a 4GL, the request would look similar to this:

TABLE FILE ENROLLMENT

SUM STUDENTS BY SEMESTER BY TEACHER BY CLASS

ON TEACHER SUBTOTAL PAGE BREAK

END

Although some training is required to do even this much, one can see that it is fairly simple. Conversely, a third-generation language like COBOL would typically require a few hundred lines of code to fulfill the same request.
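To suggest where the extra lines come from, the following Python sketch spells out part of the same report procedurally: the grouping, the subtotals, and the page breaks must all be tracked by hand. It is a simplified illustration under stated assumptions, not a COBOL program: the rows are hypothetical, they are assumed to be already sorted by teacher with one row per class, and `build_report` is a made-up name.

```python
# Hypothetical rows: (semester, teacher, class, students), already sorted by teacher.
ENROLLMENT = [
    ("Fall", "Lee", "Algebra", 28),
    ("Fall", "Lee", "Geometry", 24),
    ("Fall", "Ortiz", "Biology", 31),
]

def build_report(rows):
    """Return report lines with a subtotal per teacher and a page break between teachers."""
    lines = []
    current_teacher = None
    subtotal = 0
    for semester, teacher, class_name, students in rows:
        if teacher != current_teacher:
            if current_teacher is not None:
                # Close out the previous teacher before starting the next one.
                lines.append(f"SUBTOTAL {current_teacher}: {subtotal}")
                lines.append("--- page break ---")
            current_teacher = teacher
            subtotal = 0
        lines.append(f"{semester} {teacher} {class_name}: {students}")
        subtotal += students
    if current_teacher is not None:
        # The last teacher's subtotal is not triggered by a change of teacher.
        lines.append(f"SUBTOTAL {current_teacher}: {subtotal}")
    return lines

for line in build_report(ENROLLMENT):
    print(line)
```

Even in a compact modern language, the programmer has to state how each step is done, including the easily forgotten subtotal for the final teacher.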

4GLs are still evolving, which makes it difficult to define or standardize them. A common perception of 4GLs is that they do not make efficient use of machine resources. The benefits of getting a program finished more quickly, however, can far outweigh the extra costs of running it.

Object-Oriented Languages

Smalltalk, developed in the 1970s by Alan Kay and colleagues at Xerox's Palo Alto Research Center, was one of the first object-oriented programming languages. In object-oriented programming, a program is no longer a series of instructions, but a collection of objects. These objects contain both data and instructions, are assigned to classes, and can perform specific tasks. With this approach, programmers can build programs from pre-existing objects and can use features from one program in another. This results in faster development time, reduced maintenance costs, and improved flexibility for future revisions. Some examples of languages that support object-oriented programming are C++, Java, and Ada (a language developed for the U.S. Department of Defense).
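The idea of objects that bundle data with instructions, and of building on pre-existing classes, can be sketched in a few lines of Python. The class names and figures below are hypothetical and serve only as an illustration.

```python
class Account:
    """An object bundles data (owner, balance) with the instructions that act on it."""

    def __init__(self, owner, balance=0):
        self.owner = owner
        self.balance = balance

    def deposit(self, amount):
        self.balance += amount
        return self.balance


class SavingsAccount(Account):
    """Reuses everything in Account and adds one new behavior."""

    def add_interest(self, rate):
        # Builds on the inherited deposit method instead of rewriting it.
        return self.deposit(self.balance * rate)


savings = SavingsAccount("Ada", 100)
savings.deposit(50)
print(savings.add_interest(0.5))
```

`SavingsAccount` gains `deposit` without restating it, which is the kind of reuse that is credited with shortening development time.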

What will be the next step in the development of programming languages? Future languages will probably have little in common with earlier ones. They will likely be much closer to natural languages.

see also Algol-60 Report; Algorithms.

Ida M. Flynn