Computer vision is a technical capability that enables a computer to recognize or understand the content of a picture. Robotics is a field that is particularly affected by the development of computer vision: a robot with vision capability will be much more effective in its workspace than one that is unsighted, because it can be programmed to respond to unstructured or random changes. For example, imagine that a robot is needed to pick up and use bolts that are lying loose in a box. The bolts can be at any orientation with respect to the gripper, and only a sighted robot can adapt to this random configuration. A robot with no vision system would have to be pre-programmed to know exactly where every bolt was lying in the box, which is not a very cost-effective solution.
Human vision, which is what one would normally expect a computer to mimic, is actually a more complex process than most people imagine. The human eye is built around a flexible lens. Muscles around the periphery of the lens can stretch it so that it becomes thinner and its focal length increases, or compress it so that it becomes thicker and its focal length decreases. These adjustments all take place "automatically" as far as we are concerned; we make no conscious effort to adjust the focal length of the lenses of our eyes. Instead, whether we look at distant objects or those close to us, the eye adjusts its focus accordingly. A vision system connected to a computer cannot function this way; the machine must be designed and programmed to distinguish between far and near objects and then have the capacity to adjust its focus to suit.
As well as managing lens adjustment, the human eye has an intricately structured light-sensitive "film" at the back of the eye called the retina. The retina is photosensitive, so it collects the focused light and converts it into signals that are dispatched to the brain. The light-sensitive elements are classified as either cones or rods. There are approximately six to seven million cones in the retina, mostly grouped around its center. The cones are responsive to variations in the wavelength of the light that reaches them; this means they are the receptors of color. The cones can also sense variations in light intensity and as such are the principal enablers of daylight vision.
The function of rods is slightly different. There are about 130 million of them and they are predominantly found around the periphery of the retina. Rods are really only sensitive to the amount of light and tend to be more useful in assisting with night vision. The rods and cones are connected by minute nerve endings that all tend to gather at one point on the retina, where the optic nerve connects the eye and the brain. Arranging for a computer to incorporate a device that could act in a way that is anything like a retina is challenging.
The fact that humans have two eyes that are forward-facing makes it possible, with the help of the brain, to perceive depth. The two lines of sight between each eye and one object in focus, together with the line between the two eyes, form an isosceles triangle. The height of the triangle provides a perception of the distance between the plane that contains the eyes, and the object itself. A computer vision system attempting to simulate human vision would also have to try to deal with distance perception. It is possible, though, that distance measurement might be attained in other ways in a computer system; for example, a computer's sensory system could be augmented with extra facilities that use ultrasonic or radar techniques.
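The geometry of the isosceles triangle described above can be reduced to a short calculation. The sketch below is illustrative only; the function name, the baseline value, and the viewing angle are invented for the example, and a real stereo camera rig would recover the angles from matched image features rather than take them as inputs.

```python
import math

def depth_from_vergence(baseline, eye_angle_deg):
    """Estimate the distance to a fixated object from the isosceles
    triangle formed by the two lines of sight and the baseline
    between the two viewpoints (eyes or cameras).

    baseline      -- separation between the two viewpoints
    eye_angle_deg -- angle at each viewpoint between the baseline
                     and the line of sight to the object
    """
    # The object sits at the apex of the triangle; its height is
    # d = (baseline / 2) * tan(angle)
    return (baseline / 2.0) * math.tan(math.radians(eye_angle_deg))

# Two cameras 6.5 cm apart, each turned 86 degrees from the baseline:
print(round(depth_from_vergence(0.065, 86.0), 3))  # distance in metres
```

As the object moves farther away, the angle at each eye approaches 90 degrees and the triangle becomes very tall, which is why depth estimates from a short baseline degrade with distance.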
Components of Computer Vision
A computer vision system requires one or more television cameras that are capable of digitizing an image so that it can be stored in memory. A processor will then be commanded to analyze the information by identifying and defining any objects that may be significant. It then deals with primary objects differently than it does with objects from the scene or background. There are several commonly used camera types, but in all cases they operate by decomposing an image into the horizontal scan lines that make up a picture. In most cases the cameras and the central processor will be assisted by two other elements: a "frame-grabber" and an "image pre-processor."
The frame-grabber is a special device that can take an image from the camera, digitize it if it has not already been digitized, and make it available as one object to the remainder of the system. The digitizing is necessary since the computer only handles digital data and many cameras will produce only analog images. The digitized images tend to be rather large blocks of data. Each individual pixel in the image will have parameters for brightness, and perhaps color, associated with it. Thousands of these are found in just one static picture, and as soon as either the scene or the camera position changes, some or all of the data associated with the pixels changes and must therefore be updated.
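The size of these blocks of data is easy to estimate. The sketch below assumes a hypothetical 640 x 480 frame (the resolution and function name are chosen for illustration, not taken from any particular camera) and shows why adding color multiplies the storage and update burden.

```python
def frame_bytes(width, height, bits_per_pixel):
    """Storage needed for one digitized frame, in bytes."""
    return width * height * bits_per_pixel // 8

# A 640 x 480 frame with 8 bits of brightness per pixel:
mono = frame_bytes(640, 480, 8)      # 307,200 bytes
# The same frame with 24-bit colour triples the load:
colour = frame_bytes(640, 480, 24)   # 921,600 bytes
print(mono, colour)
```

Every change of scene or camera position can force some or all of these bytes to be rewritten, which is the workload the pre-processor described next tries to reduce.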
In order to help with the management of these large amounts of data, the pre-processor assists by discarding parts of the image that are not considered useful, thereby reducing the image size and the computational effort involved in dealing with the image. It does this by various means, but one common way is to group together collections of pixels that have almost the same color or brightness values and change them so that they are all the same. This obviously distorts the original image, but it can be a successful optimization technique if none of the important image information is lost. Unfortunately, it is difficult to guarantee that all the important information will always be safeguarded.
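Grouping nearly equal brightness values can be sketched as a simple quantization step. This toy version works on a single row of brightness values; the function name and the step size are assumptions made for the example, and a real pre-processor would operate on whole regions of the image.

```python
def quantize(pixels, step):
    """Group pixels whose brightness values are nearly equal by
    snapping every value to the nearest multiple of `step`.
    This shrinks the number of distinct values the rest of the
    system must handle, at the cost of some image detail."""
    return [step * round(p / step) for p in pixels]

row = [12, 14, 13, 200, 198, 201, 90]
print(quantize(row, 10))  # -> [10, 10, 10, 200, 200, 200, 90]
```

Note how the three dark pixels and the three bright pixels each collapse to a single value: that is exactly the distortion the text warns about, acceptable only when the lost detail was unimportant.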
Steps of Computer Vision
Once an image has been captured by the camera and set up for further operations by the pre-processor, then the real business of computer vision can begin. One of the important results of the pre-processing operations is known as "edge-detection." Here, the pre-processor identifies areas of the image that possess regions of clearly differing light intensity. This is the first step in providing the vision system with a way of recognizing objects.
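Edge detection can be illustrated with a crude one-dimensional version: scan along a row of pixels and flag every position where the intensity jumps sharply. The function, the scanline values, and the threshold are all invented for the example; practical systems use two-dimensional operators over the whole image.

```python
def horizontal_edges(row, threshold):
    """Return the indices where adjacent pixels differ in
    intensity by more than `threshold` -- candidate edge points."""
    return [i for i in range(1, len(row))
            if abs(row[i] - row[i - 1]) > threshold]

scanline = [10, 12, 11, 200, 199, 198, 15, 14]
print(horizontal_edges(scanline, 50))  # -> [3, 6]
```

The two flagged positions mark where the scanline crosses from a dark region into a bright one and back, which is the "clearly differing light intensity" the pre-processor is looking for.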
The next step is called "image segmentation." Segmentation is an operation whereby the vision system tries to identify particular shapes or regions by differentiating them from their backgrounds. It does this by choosing an intensity threshold level and organizing the image so that all pixels darker than this level are sectioned off from all those that are lighter. This breaks the image up into segments that will correspond directly to the main features and objects in the field of view. The difficulty is in selecting the most suitable threshold level; normally there will be a way to adjust the threshold so that the most satisfactory segmentation results can be produced.
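The thresholding operation described above amounts to a single comparison per pixel. The sketch below uses an assumed mid-range level of 128 on a tiny made-up image; as the text notes, the level would normally be adjustable.

```python
def threshold(image, level):
    """Split an image into dark (0) and light (1) pixels.
    Pixels at or below `level` are sectioned off from the rest;
    the level itself usually needs tuning per scene."""
    return [[0 if p <= level else 1 for p in row] for row in image]

image = [[10, 10, 220],
         [12, 225, 230],
         [11, 10, 12]]
print(threshold(image, 128))
```

The result is a binary map in which connected runs of 1s (or 0s) correspond to the segments that later stages will try to recognize; choosing the level badly merges or splits those segments.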
Image segmentation can be artificially enhanced, if the environment is conducive to it, by intentionally selecting bland and uniform backgrounds that contrast in color and brightness with objects in the foreground. Image segmentation can also be carried out using attributes of the image other than light intensity such as color or texture.
After segmentation has been completed, the next step is image extraction and recognition. Segmentation permits regions of the image to be isolated, but getting the computer system to recognize what it is looking at requires further effort. This operation requires "training" the computer system to be aware of certain specific objects and shapes that it is expected to encounter. This is easier to do if one limits the nature of the working environment of the machine. The computer can be given data sets of standard objects called "templates" and then be asked to ascertain similarities between these templates and the segments of the images that it captures. It uses statistical techniques to determine what types of objects it is looking at, based on the templates that it has been given.
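One simple statistical similarity measure is the sum of squared differences between a segment and each stored template. The sketch below is a minimal illustration under that assumption; the template names, pixel values, and function names are all hypothetical, and real systems must also cope with templates and segments that differ in size, position, and orientation.

```python
def match_score(segment, template):
    """Sum of squared pixel differences between a segment and a
    template of the same size; smaller means more alike."""
    return sum((s - t) ** 2 for s, t in zip(segment, template))

def classify(segment, templates):
    """Pick the name of the template the segment most resembles."""
    return min(templates, key=lambda name: match_score(segment, templates[name]))

templates = {"bolt": [0, 9, 9, 0], "washer": [9, 0, 0, 9]}
print(classify([1, 8, 9, 1], templates))  # -> bolt
```

The captured segment need not match any template exactly; the system simply reports the closest one, which is why limiting the working environment to a known set of objects makes recognition so much more reliable.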
Computer vision is important in manufacturing and other industry areas that use robotic technology, but it is also being developed to assist with pattern recognition and provide support to visually impaired people.
see also Artificial Intelligence; Optical Technology; Pattern Recognition; Robotics.