Stephen Cook Advances Knowledge of NP-Complete Problems, Assisting Computer Scientists

views updated

Stephen Cook Advances Knowledge of NP-Complete Problems, Assisting Computer Scientists

Overview

In 1971 mathematician Stephen Cook (1939- ) was able to show that a solution to a certain family of computational problems (computer science problems) could not be computed in a reasonable amount of time on even the fastest computers that can exist. He was also able to show that this family of problems was related, so that if a "fast" solution could be found for a single one of them, they could all be solved in the same manner. These problems, called "NP-Complete" problems, turn out to be very important because they deal with optimizing many sorts of activities, including travel itineraries, computer architecture, scheduling, electrical circuits, and more. As a result of the work done by Cook and others, we now have a better idea of problems that are hard to solve, those that can be solved easily (and quickly), and how to differentiate between them. This, in turn, is valuable to computer scientists and others who work to compute solutions to complex problems.

Background

An algorithm is a methodology that can be used to solve a problem. It is, in essence, a list of instructions that can be followed to arrive at a solution in a predictable number of steps. For example, someone consistently late for school could generate the following algorithm to solve this problem:

7:00: Alarm sounds; 7:01: Turn off alarm, get out of bed; 7:10: Choose clothes, get dressed, brush teeth; 7:15: Eat breakfast, gather books, leave house; 7:30: Arrive at school

If followed precisely, this algorithm will guarantee a predictable outcome, as mathematical algorithms will help reach a solution to mathematical problems.

In general, complex problems require complex algorithms if they are to be solved in a methodical manner. However, it is not a simple matter to determine how long an algorithm must take to reach a solution, based only on the type of problem. Certain types of problems, called "P" problems, can be reliably solved in what is called "polynomial time." In other words, a relatively simple mathematical relationship can be developed, using polynomials, to find out how long it will take to solve a "P" problem. P problems are usually cast as decision problems with an answer that must be either yes or no. One example of a P problem would be, given a group of numbers, is there any number that is evenly divisible by three? This is a P problem because, no matter how many variables you have, the number of steps to find an answer is easily calculated. For example, with any number of variables, the largest number of steps to completely solve the problem is equal to the number of variables because the algorithm would simply be to divide every number by three to see if the answer is yes or no. The amount of time to calculate the answer to such problems is not difficult to determine. In addition, since one can determine a series of steps to find an answer and the length of time that the process will take, such problems are called "deterministic."

Another set of problems may not be solvable deterministically and are called, appropriately enough, "NP," for non-deterministic polynomial problems. Proposed solutions to these problems can be verified or rejected in polynomial time, but a precise solution, to the best of our knowledge, cannot be. In other words, if you make a guess at a solution, it is easy to determine if the guess is correct. On the other hand, it is difficult (perhaps impossible) to arrive at an answer using a methodical algorithm. A famous example of such problems is the "Traveling Salesman" problem.

The Traveling Salesman problem is an exercise in optimization. A traveling salesman has a number of stops to make that are scattered all over. He is supposed to find a route that will visit each stop only one time and that will cover the shortest distance possible. If there are only a few stops, the problem is not difficult to solve and can usually be solved by simply taking the distance between each set of stops and adding them up in all the various ways possible to find the shortest route. However, as the route grows larger, the problem becomes very difficult very quickly. The reason for this is that the number of options grows much faster than the number of stops. In fact, the number of possible itineraries grows as the factorial of the number of stops. A route with three stops will have six possible itineraries because three factorial (3×2×1, written as 3!) is equal to six. Four stops brings us to 24 itineraries, five will give 120, and six, 720 itineraries. Writing an algorithm that will give an answer to this problem is easy, but the time to solve the algorithm cannot be represented by a polynomial, and success in solving such problems is not guaranteed.

Now, consider trying to solve the traveling salesman problem with a computer. If your computer can test one complete itinerary in a microsecond, the 10 stops will take over 3.5 seconds to solve (10! equals 3,628,800). Fifteen stops will take slightly over fifteen days to solve, and 20 stops will take over 77,000 years to reach a solution by testing every single possible itinerary. More complex itineraries can't be solved in the life of the universe using any known algorithm.

What Cook was able to do was to show that some problems, called "NP-complete" problems, are similar to all NP problems and to each other. Therefore, if a single NP-complete problem can be solved in polynomial time, then all NP problems can be solved in polynomial time. This would mean that computers could be programmed to solve even very large problems (problems with a large number of variables) in a relatively short time.

This sounds, in many ways, like an abstract mathematical problem with very little bearing on the "real world." In some ways, this may be true, but there are a great many practical applications for Cook's work and for NP-complete problems in general.

Impact

One thing that was found as work on NP problems progressed was that it might not be necessary to test every single itinerary. For example, you wouldn't want to calculate the distance from a city to itself—even though this is one of the possibilities—because the answer is going to be zero and because it doesn't help solve the problem. So you can automatically throw out all route segments that start and end in the same place. In addition, for specific routes, you might choose to say that the salesman has to start and end in specific cities. Removing even just one or two terms from a factorial can drastically cut down on computation time, possibly making the problem solvable. This, in turn, helps to simplify the problem.

It was also found that it is not always even necessary to find a shortest route. For example, you might decide that the salesman simply has to be back in the office within a week. In such a case you need not find the single shortest route, but just one that will take a week or less. If there are many routes that can be traveled in a week, the problem will be solved more rapidly. From a practical standpoint, it might be preferable to send a salesman out, knowing that he might take five days to complete his route instead of four, but also knowing that it might take months or years to calculate the perfect route. Knowing in advance that a problem is NP, then, can help to set the strategy you will use to attack it—that is, whether to try to find the single best solution or simply one that is good enough, given everything else you know about the situation.

There are many problems that fall into the NP-complete category besides the classical traveling salesman problem. For example, finding the optimum route for internet packets to take when traveling from computer to computer might fall into this category and, as the global computer network becomes increasingly complex, so does the problem of routing e-mail, making internet phone calls, downloading information, and so forth. Another problem of this sort would be scheduling work for the different processors in parallel or multiprocessor computers. Developing production schedules for industry, plowing roads after a heavy snowfall, optimizing production of different products with different profit margins and popularity, and so forth are other problems that are NP or NP-complete.

In some of these cases, it is tempting to simply guess, knowing that a methodical approach could take many years while a lucky guess could solve the problem in the first try. There are two reasons why guessing is not a good strategy, though. First, if you are trying to find the single shortest path, even if you guess it on the first try, you still have to try every path to prove that it is the shortest. So, in this case, guessing saves no time at all.

In spite of this, guessing still looks attractive if you are simply trying find an acceptable route. To use a previous example, if you are merely trying to find a route that will get the salesman back in a week, you can stop looking as soon as a single such route is found, which could happen by sheer chance with the first guess, the tenth, or the hundredth. However, it might also take until the 77 trillionth guess, making you wait nearly 2.5 years. And that is the problem with guessing—you have no idea how long it will take to reach a solution. This lack of certainty makes guesswork a poor method for solving a problem of any sort, especially a real-world problem in which large sums of money may be at stake.

Put simply, Cook's research helped to clarify the boundaries between the problems we know can be solved in a reasonable amount of time and those that we know probably can't. By making this clear and by showing that solving a single NP-complete problem will erase those boundaries, Cook was able to also help us to determine which problems are worth trying to solve exactly and which aren't.

P. ANDREW KARAM