Search Engines

views updated May 29 2018


What is almost certain is that this entry on search engines will soon be obsolete, so rapid and dynamic are the changes that affect this central technology and service on the Internet. Thus in the entry published in the last edition of this volume the name Google did not even appear, but just a few years later Google has become the leading search engine provider the world over. So what is a search engine?


Search engines are software systems that associate search words entered by a user looking for information with Web sites on the World Wide Web that contain the words of the query. To accomplish this linking, search engines must be backed by databases that hold the words Web sites use, stored as linked lists. Search words may produce just a handful or a very large number of Web sites. The search word "supercalifragilisticexpialidocious" produced around 294,000 hits in 2006 on Google; the somewhat obscure and specialized word "nunciature" (the office or period of office of a nuncio) produced 82,400 hits; the word "nuncio" itself (an ambassador for the papacy) yielded 1,050,000 hits. The name Chu Yuan-chang, the 14th century founder of the Ming Dynasty in China, produced 725 hits. It is difficult to find stand-alone search words with a low number of hits; even misspellings bring rich results, because words are often misspelled on Web pages too and dutifully indexed by the search engines. This very wealth of hits makes it necessary for search engines to store additional information about every Web site so that the engine can present results in a rationally ranked order. Complex algorithms are used to rank hits. The principal method is to present first those sites that have been clicked on most frequently in the past; sites with more links to other sites also get preference, all else being equal.
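The linking of query words to Web sites described above can be illustrated with a small word-to-site lookup table. This is only a sketch under stated assumptions: the three sites and their texts are hypothetical, the function names are invented for illustration, and real engines store far more information per site than this.

```python
from collections import defaultdict

# Hypothetical mini-collection: site address -> page text.
PAGES = {
    "vatican.example": "the nuncio is an ambassador of the papacy",
    "history.example": "the nunciature is the office of a nuncio",
    "ming.example": "chu yuan-chang founded the ming dynasty",
}


def build_index(pages):
    """Map every word to the set of sites whose text contains it."""
    index = defaultdict(set)
    for site, text in pages.items():
        for word in text.lower().split():
            index[word].add(site)
    return index


def search(index, query):
    """Return the sites that contain every word of the query, sorted by name."""
    words = query.lower().split()
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


INDEX = build_index(PAGES)
print(search(INDEX, "nuncio"))        # sites mentioning "nuncio"
print(search(INDEX, "ming dynasty"))  # sites mentioning both words
```

Because the table is built once and consulted on every query, the cost of answering a search depends on the number of matching sites rather than on the size of the whole collection.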

A search engine, thus, requires its own internal logic and functionality, the software, and a database. But this database must first be built, maintained, updated, and grown as new sites are added to the Internet. Search engines, therefore, have a massive data acquisition function. In the early days the databases were built by people who scanned the web, followed links on Web sites, and indexed new pages they found. This technique is still in use with specialized Web sites and, until October 2002, was used by the world's second-ranking search engine, Yahoo. In the mid-2000s the databases of almost all search engines are built and maintained by search robots that seek out sites and capture their contents for indexing, unless the site itself prohibits this activity. The robots are themselves software programs. They are known as "crawlers" because they "crawl the Web" acquiring information. Alternatively, Web site owners can also register their sites with search engines, a technique used by commercial sites eager to be found.
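A site usually states such a prohibition in a plain-text file named robots.txt, which a well-behaved crawler consults before fetching pages. The entry does not name this mechanism, so treat it as an assumption here. The sketch below uses Python's standard urllib.robotparser module; the rules, the crawler name "ExampleCrawler," and the shop.example addresses are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: every crawler is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt, user_agent, url):
    """Return True if the given rules permit this crawler to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed(ROBOTS_TXT, "ExampleCrawler", "https://shop.example/catalog.html"))
print(allowed(ROBOTS_TXT, "ExampleCrawler", "https://shop.example/private/report.html"))
```

The file is advisory: it tells cooperative robots what the owner wants, but it does not by itself block access.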

Search engines are 1) technologies of searching, 2) databases in support of searching, and 3) services provided to users. Search engine owners can cover their costs by all three means. The technology they own can be licensed or deployed for others at a fee; the databases can be made available for money; and the services provided can be paid for using advertising. The most effective linking of the search function itself with advertising was pioneered by Google under the name "Adwords." Specific words are sold to advertisers. When searches using the words appear, the advertisers' small ads are displayed with search results. Advertisers pay a fee when the engine users "click through" to the advertiser's own site. Other techniques make use of search words or phrases and display closely matching spot ads on the Web page.


The Internet owes its dramatic growth to the development of search engines. The first such engine was Lycos, launched in mid-1994 with 54,000 documents. Using its crawler technology, it had expanded its database to 1.5 million documents by early 1995 and had 60 million by the end of 1996. Another claimant to the founding role was AltaVista, introduced in 1995 and still active on the Web. Until Lycos and AltaVista appeared, access to the Internet required advanced knowledge of Web addresses, and roaming the Internet involved following links from site to site as these referred to each other.

The services provided by search engines become obvious with a few statistics. According to the Internet Systems Consortium (ISC), which conducts four surveys every year, in January 2006 around 395 million Internet hosts were in operation, each one hosting multiple sites, each site consisting of several Web pages on average. Extremely simple searches on leading engines provided up to 17 billion hits on Google in 2006 (for the word "the," for instance); AltaVista produced 7.4 billion, 2.1 billion, and MSN 2.4 billion hits on the word. AltaVista uses Yahoo technology; Yahoo itself, asked to search for "the," simply shrugged off the labor and provided a single hit on a corporation with the THE acronym. Some estimates put the number of pages on the Internet at hundreds of billions, but as the ISC points out from a depth of survey experience, it is not possible to determine the actual size of the Internet. In any case, several hundred million hosts, never mind 17 billion pages, are already astronomically big numbers. The ability of search engines to provide access to such magnitudes in a matter of a second or so makes the Internet the useful phenomenon that it is. The rankings of hits, which actually reflect frequency of use by others, make using very massive search results practical. Who, after all, can afford to review 60,000 hits, or even 700?


Search Engine Watch, a Web journal concentrating on search engines and related matters, began operations in 1997, three years after the first search engine appeared. The company offers prizes, has public information as well as a membership service, and is an excellent source of developments in this field. Search Engine Watch (hereafter referred to as SEW) produces rankings and technical information about this industry. What follows has been gleaned largely from SEW.

Search Engines

SEW identifies Google, Yahoo, and Ask as the top search engines on the Internet. Ask may be more familiar to users as Ask Jeeves; the company simplified its name in 2006. All three of these leaders began with proprietary methods and technologies. Google's search engine is the most widely used by others under license. Yahoo, which began by using human indexers, began to shift its data acquisition processes to crawlers in October 2002 after a period of using Google technology. Ask's basic search engine was developed by Teoma, a company that it owns, but Ask also developed an expert-based indexing technique that, in the past, enabled it to serve more "human language" queries.

In a second tier SEW lists (powered by Yahoo), AOL Search (powered by Google), and HotBot (using Google, Yahoo, and Teoma, currently merged with Ask).

Under a category SEW calls "Other Choices," it lists AltaVista (using Yahoo), Gigablast (a tiny engine with proprietary technology), LookSmart (compiled by people), Lycos (using HotBot and others), MSN (Microsoft's search engine, developing proprietary methods), Netscape (using Google), and Open Directory (using Google).

As is evident from this listing, the number of proprietary technologies in wide use is much smaller than the number of search engines on offer, many of which run on Google or Yahoo technology. But each of the search engines has its special features and add-ons.


Chris Sherman, writing for Search Engine Watch, defined this category as follows: "Unlike search engines, metacrawlers don't crawl the Web themselves to build listings. Instead, they allow searches to be sent to several search engines all at once. The results are then blended together onto one page." Thus metacrawlers, also called metasearch engines, have carried the basic strategies of search engine companies a step further: they simply use other search engines, acting as intermediaries between the user and those engines. Sherman listed 21 such metacrawlers operating in 2005. Those that had won SEW awards included Dogpile, Vivisimo, Kartoo, Mamma, and Surfwax.
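The blending step Sherman describes can be sketched as follows. The scoring rule is an assumption made for illustration (each engine awards a result 1/rank points, and the points are summed); the entry does not say how any particular metacrawler blends its results. The two engines and their result lists are hypothetical.

```python
from collections import defaultdict


def blend(result_lists):
    """Merge several ranked result lists into one list, best combined score first."""
    scores = defaultdict(float)
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            # Assumed rule: first place earns 1 point, second 1/2, third 1/3, ...
            scores[url] += 1.0 / rank
    # Sort by descending score; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical results returned by two underlying engines for the same query.
engine_a = ["a.example", "b.example", "c.example"]
engine_b = ["b.example", "d.example", "a.example"]

print(blend([engine_a, engine_b]))
```

A result that several engines rank highly rises to the top of the blended page, while a result that only one engine returns, and returns late, sinks toward the bottom.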


From the viewpoint of the small business hoping that its Web site is found as often as possible by searchers on the Web (traffic equals sales, after all), the chief issue regarding search engines is how to be found by them and, more importantly, how to be ranked high enough actually to be seen at all. Being 82nd in a list of 200 hits is almost equivalent to invisibility. On a typical Google search result, the entry will be on the 9th page, and rare is the user who will examine nine pages of a search.

Creating, promoting, and structuring a company's site for maximum visibility is a very complex subject and will require substantial homework or expert advice. A good beginning point is SEW's Web page entitled "Search Engine Submission Tips." It provides a systematic tutorial on the major aspects, including registering the site with search engines, which may be free or may have to be paid for, using advertising services such as Google's Adwords program, and internally structuring the Web site to present the most favorable features to Web crawlers. Rankings go up when a site offers multiple links to other sites, and also when many other sites point to one's own. Self-contained sites (one might say solitary or self-centered sites) tend to be ranked low. Search engines inherently favor a communal spirit of interconnectedness, the very essence of the Internet. The small business intent on maximizing its exposure should engage an experienced Web page design firm. Such organizations typically have the know-how to structure the Web page appropriately and also to guide the owner on additional steps to take.


A discussion of search engines would be incomplete without pointing to the frustrations and pleasures of using such services. Thus, for instance, it may be possible to find 700-some-odd pages on an ancient Chinese emperor, but it is sometimes frustrating to seek a specific phrase, usually entered into the search engine between quotes, and get the standard "Your search 'X' did not match any documents." At the same time, it is often quite easy, remembering just a little snatch of a song's lyrics, to enter that truncated phrase and to get pages and pages of hits with the lyrics, and more: the music itself, played on the sound system to bring back the tune. This experience, whether in a serious business context or just for fun, is exhilarating. And things are moving so rapidly that by the time this text is out in print or visible on the Internet it may well be possible that search engines will provide genuinely helpful suggestions when the "did not match" message appears. Currently the advice is next to useless. But just wait a while.

see also Internet Domain Name; Web Site Design


"ISC Internet Domain Survey." Internet Systems Consortium. Retrieved on 27 May 2006.

"Lycos: A brief history of the Lycos search engine." The Web Marketing Workshop. Retrieved on 27 May 2006.

"Search Engine Submission Tips." Search Engine Watch. Retrieved on 25 May 2006.

SearchEngineWatch. Web Site. Retrieved on 26 May 2006.

Sherman, Chris. "Metacrawlers and Metasearch Engines." SearchEngineWatch. 23 March 2005. Retrieved on 27 May 2006.

Sullivan, Danny. "Major Search Engines and Directories." SearchEngineWatch. 28 April 2004. Retrieved on 26 May 2006.

                                      Darnay, ECDI

Internet Resources

views updated May 23 2018


The large number of Internet sites related to aging has both simplified and complicated electronic information retrieval. Many government, professional, trade, and consumer organizations maintain informational pages on aging on the World Wide Web. In addition, nearly every institution of higher learning has its own site, with links to countless others. Commercial websites are available, ranging from those promoting specific services, products, and materials to those offering a virtual shopping mall of items.

Medical and pharmaceutical sites vary in quality, with some offering sound scientific advice and others promoting questionable "cures" and nostrums that raise concerns about Internet quackery. Caution is needed when researching information from unknown commercial sites.

Rather than inundate the reader with an extensive index of sites, the following selections offer a basic list of reliable sources of information from well-established agencies and organizations in aging. Each of these sites has links to many other locations. In addition, many of the entries in this encyclopedia refer the reader to Internet sites that are specific to that topic.

AARP (formerly the American Association of Retired Persons). This thirty-million-member organization has tremendous influence on aging policy and legislation in the United States and around the world. The AARP promotes the interests of people age fifty and older, particularly on issues of health and economic well-being. The "Research and Reference" section in the topic guide index on AARP's home page opens into a wealth of articles, data, and information on aging.

Administration on Aging (AoA; U.S. Department of Health and Human Services). The AoA was established as the agency with primary responsibility for programs initiated under the Older Americans Act (1965). The AoA's website offers information to various audiences, including older people, caregivers, practitioners, and researchers. The home page "Quick Index" provides direct access to various programs that provide caregiver assistance, including the "Eldercare Locator," a directory service that assists people in finding local support resources for older persons. The site also provides information about the characteristics and needs of older Americans and about government programs that provide for their welfare.

Alzheimer's Association. The Alzheimer's Association is a voluntary organization that funds research on causes and treatments of Alzheimer's disease. Through a national network of local chapters, the organization serves as an educational and support resource for persons with Alzheimer's, and for those who care for them. The site has information tailored to persons with Alzheimer's, their families, health care workers, and the media.

American Geriatrics Society (AGS). The AGS is a professional society for physicians and other health providers. The AGS website describes the organization, its publications, and its activities, and also includes consumer education and health information, as well as numerous links to other professional and trade organizations and to consumer resources on aging and health.

American Society on Aging (ASA). A source of information and training for persons working in the field of aging, the San Francisco-based national organization brings together professionals engaged in services, research, education, and policy. The site links to numerous constituent groups, such as the Business Forum on Aging; Forum on Religion, Spirituality and Aging; Healthcare and Aging Network; Lesbian and Gay Aging Issues Network; Lifetime Education and Renewal Network; Multicultural Aging Network; Mental Health and Aging Network; and the Network on Environments, Services and Technologies.

Canadian Association on Gerontology (CAG). This national multidisciplinary organization promotes research, education, and policy on issues of aging. The site offers information concerning CAG conferences and publications, as well as links to Canadian educational programs in gerontology and geriatrics.

Elderhostel. A nonprofit organization founded in 1975, Elderhostel offers short-term educational travel experiences for persons age fifty-five and older. The website explains participation and registration, and also lists catalogs of learning and service programs conducted in most U.S. states and Canadian provinces and in many countries around the world.

Gerontological Society of America (GSA). The GSA is the foremost U.S.-based organization promoting research and scholarship in aging. Members come from the biological sciences, clinical medicine, behavioral and social sciences, policy and practice fields, and arts and humanities. The GSA publishes prestigious scholarly journals, and its website features links to recent research findings and to many other organizations in aging, including sources for research funding. Allied units of the GSA include a policy institute, the National Academy on an Aging Society, and the Association for Gerontology in Higher Education.

GeroWeb. This "virtual library," sponsored by the Institute of Gerontology at Wayne State University, will provide the user with a list of Internet sites in response to search terms, or in relation to predefined categories such as biology/genetics, local resources, mental health, employment, grants/funding, sociology, and retirement. All the sites included in the virtual library are oriented towards those "interested in gerontology, geriatrics, the process of aging, services for the elderly, or the concerns of senior citizens in general."

Healthfinder. Sponsored by the U.S. Department of Health and Human Services, this site leads users to sound health information, including "selected online publications, clearinghouses, databases, web sites, and support and self-help groups, as well as government agencies and not-for-profit organizations that produce reliable information for the public." The homepage is organized with links in categories such as "Hot topics," "Health news" and "Just for you," under which there is a special category for seniors.

InfoAging. Sponsored by the American Federation for Aging Research (AFAR), an organization that promotes biomedical research, this educational site is organized like an e-zine, featuring interesting, up-to-date articles on biology, advances in medicine, and health concerns of older persons.

International Longevity Center-USA (ILC-USA). This multinational, nonprofit institute is dedicated to research, education, and policy about longevity and population aging. ILC-USA emphasizes positive ways that greater life expectancy can impact nations around the world. The site lists the educational symposiums and workshops sponsored by ILC-USA, and also posts reports, working papers, and other articles.

Medicare. This official site offers both basic and detailed information on the Medicare health insurance program. The site describes health-plan choices, the location of facilities, and participating physicians. There is information about Medigap policies and prescription drug assistance programs. The site has Medicare-related news and updates, as well as links to various types of health care resources.

National Archive of Computerized Data on Aging (NACDA). This research-oriented site is located in the Inter-University Consortium for Political and Social Research (ICPSR) at the Institute for Social Research at the University of Michigan. NACDA acquires, preserves, and distributes scientific data sets relevant to studies in aging. NACDA provides free access to over one hundred such data sets, as well as links to other ICPSR data sets. NACDA provides user and technical support and also conducts educational programs to promote secondary data analysis in research on aging.

National Center for Health Statistics: Aging Activities. This site provides access to "Trends in Health and Aging," an electronic data warehouse that describes the health status, behaviors, utilization, and cost of health care for older Americans; to "Longitudinal Studies of Aging," a set of surveys of health status and behaviors across two cohorts of older persons; and to the Federal Interagency Forum on Aging Related Statistics; as well as to such forum products as Older Americans 2000: Key Indicators of Well Being, Wallchart on Aging, 65+ in the United States, and Trends in the Health of Older Americans.

The National Council on the Aging (NCOA). The NCOA is an association of organizations and professionals engaged in advocacy and service provision for older people. Its primary activities include assisting community organizations; program development and implementation; and promoting aging-friendly public policies, legislation, and practices. The NCOA website describes research and demonstration projects, policy initiatives, and activities of the ten constituent units of the organization. The site also provides resources specifically oriented toward the aging worker.

National Institute on Aging (NIA). The NIA conducts and supports biomedical and behavioral research on aging processes. An agency within the National Institutes of Health, it was founded in 1974 to conduct research within its own laboratories and clinics, as well as fund research on aging at universities, medical centers, and scientific institutes. The site provides access to information regarding research funding and training opportunities, intramural and extramural research programs, research conferences, workshops, and meetings. The site also contains an authoritative section on "health information," which provides links to the NIA's Alzheimer's Disease Education and Referral Program. Copies of public service ads are also included, as well as access to a large number of NIA fact sheets on medical and lifestyle topics.

Social Security Administration. This is the agency that administers the Social Security program of retirement, survivors, and disability benefits. The website has information about taxes, eligibility and benefits, and instructions about how to apply for benefits. There is also a section with information for employers. The site describes the operations and history of the program, its financing, and the future of Social Security. Links to Medicare are included.

SeniorNet. A nonprofit organization that provides access to, and education about, computers and the Internet to persons age fifty and older. SeniorNet offers computer classes and workshops at hundreds of local "Learning Centers," which are reviewed on its site. The site also sponsors "Roundtables," senior discussion groups on a variety of topics, and "Enrichment Centers," with content in areas of special interest.

Seniors Canada Online. A Canadian government site that provides easy online access to a wide range of information and to the services offered by multiple governmental offices. The site includes articles related to health, family, housing, and legal issues. In the category of "Employment," one can find information regarding various federal assistance programs, resources, and regulations for employers, as well as employment services. The site also contains key information about old age security and the Canada Pension Plan.

United Nations Programme on Ageing. The UN Programme on Ageing, which focuses on the aging of the world population, gathers information on national, international, and nongovernmental programs and policies. The site includes articles about the aging of the world population, describes the World Assembly on Ageing, and lists other international activities. There is a global database of policies and programs that can be searched by year, country, issue, or certain population characteristics.

The U.S. Census Bureau. This site provides an abundance of data on any number of population topics. By selecting "Age Data" or "Elderly/Older Population Data" at the "Subjects A to Z" index, users can view data tables and other governmental publications regarding older populations. The links are organized not only by geographic level of data (county, state, national, international), but also by specific older populations, such as baby boomers, persons 55+, and persons 65+.

Charles Huyett

Search Engines



A search engine is an information retrieval system that allows someone to search the vast collection of resources on the Internet and the World Wide Web. All major search engines are similar in that keywords, phrases, or in some instances, questions, are entered in a search form. After clicking on the search command button, the database returns a collection of hyperlinks to resources that contain the search terms. These hyperlinks are listed in some sort of order, usually from most relevant to least relevant, or by how important the web pages are, depending on the search engine used. Search engines are composed of computer programs that create databases automatically. They should not be confused with human-built directories, such as Yahoo!, which depend on people for development and maintenance.

Search Engine Basics

Search engines have three components. The first part is a computer program called a spider or robot, which gathers information on the Internet. The spider retrieves hyperlinks attached to documents. It starts with an existing database and follows the existing hyperlinks to gather new and updated resources to add to the list. If no other web page links to a given page, the spider cannot find it. Other types of resources that most spiders are unable to locate include files that are not written in Hypertext Markup Language (HTML) and content from specialized databases that require the user to fill out a search form. Spiders automatically do this gathering of documents at intervals that differ from service to service.
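The spider's link-following behavior can be sketched as a breadth-first walk over a link graph. The graph below is a hypothetical stand-in for the web (a real spider would download each page and extract its hyperlinks), and the page names are invented for illustration.

```python
from collections import deque

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "home": ["news", "about"],
    "news": ["story", "home"],
    "about": ["home"],
    "story": [],
    "orphan": ["home"],  # links out, but no other page links to it
}


def crawl(links, seeds):
    """Visit every page reachable from the seed pages by following hyperlinks."""
    seen = set(seeds)
    queue = deque(seeds)
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            if target not in seen:  # never queue the same page twice
                seen.add(target)
                queue.append(target)
    return order


print(crawl(LINKS, ["home"]))
```

Starting from "home," the walk reaches "news," "about," and "story" but never "orphan," because no page it visits points there. Such a page enters the database only if someone submits it directly.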

Second, resources collected by the spider are loaded into a database that indexes them using a formula that is unique to each search engine. The index contains a copy of every web page the spider finds. People can also submit web pages to this database in case the spider fails to reach them quickly enough or no other pages link to them. While most search engines claim to index the entire World Wide Web, none actually does. Although spiders have many different ways of collecting information from web pages, the major search engines all claim to index the entire text of each web document in their databases. This is called full-text indexing. Some search engines may not index common words such as "and," "a," "I," and "to." These are called stop words.
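Full-text indexing with stop words can be sketched in a few lines. The stop-word list is the one given above; recording the position of each word is an assumption added for illustration, and the function name is hypothetical.

```python
# Stop words taken from the examples in the text.
STOP_WORDS = {"and", "a", "i", "to"}


def index_terms(text):
    """Return {word: [positions]} for every word of the text that is not a stop word."""
    positions = {}
    for pos, word in enumerate(text.lower().split()):
        if word in STOP_WORDS:
            continue  # stop words are skipped and never enter the index
        positions.setdefault(word, []).append(pos)
    return positions


print(index_terms("I want to search and to index a page"))
```

Every remaining word of the document is indexed, which is what distinguishes full-text indexing from indexing only titles or summaries. Dropping stop words keeps the index smaller at the cost of making those words unsearchable.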

The third part of the search engine is software that allows users to enter keywords in search forms using some type of search expression, with syntax that is supported by the search engine. The search results are then listed in order according to a ranking algorithm. Some search engines list results by relevancy, while others list them by how many web pages link to them, thereby showing the most important, or popular, web pages first, and others group results together by subject. Many search engines employ a combination of these.
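Two of these orderings can be compared on the same set of hits. In this sketch relevancy is approximated by counting how often the keyword occurs in the page text, and popularity by the number of pages linking to the hit. Both are simplifying assumptions, and the pages and link counts are hypothetical.

```python
# Hypothetical hits: address -> page text and number of pages linking to it.
HITS = {
    "guide.example": {"text": "search tips search tricks search help", "inbound_links": 3},
    "portal.example": {"text": "search the web", "inbound_links": 40},
    "blog.example": {"text": "my search for a better search", "inbound_links": 9},
}


def by_relevancy(hits, keyword):
    """Order hits by how often the keyword occurs in the page text, most first."""
    def occurrences(url):
        return hits[url]["text"].lower().split().count(keyword.lower())
    return sorted(hits, key=lambda url: (-occurrences(url), url))


def by_popularity(hits):
    """Order hits by how many other pages link to them, most first."""
    return sorted(hits, key=lambda url: (-hits[url]["inbound_links"], url))


print(by_relevancy(HITS, "search"))
print(by_popularity(HITS))
```

The two methods reverse the order of the same three hits here, which is one reason an engine might combine several signals rather than rely on a single one.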

Search Features

It is important to understand the different search features available before beginning to use a search engine as each engine has its own way of interpreting and manipulating search expressions. Because a search can retrieve many documents, it is common to have a number of hits, but only a few that are relevant to the query submitted. This is called low precision/high recall. On the other hand, a searcher may be satisfied with having very precise search results, even if a very small set of hits is returned. This is defined as high precision/low recall. Ideally, the search engine would retrieve all of the relevant documents that are needed. This would be described as high precision/high recall. Search engines support many search features, though not all engines support each one. If they do support certain features, they may use different syntax in expressing them. Before using a search feature, the user should always check the search engine's help pages to understand how the feature is expressed, if it is supported at all. Some examples of search syntax and features used by search engines are: Boolean operators (and, or, not), implied Boolean operators (+ and -), phrase searching, natural language searching, proximity searching, truncation, and field searching.
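Precision and recall can be put in numbers using their standard definitions: precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. The document identifiers below are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved and a set of relevant documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    found = retrieved & relevant  # relevant documents that were actually returned
    precision = len(found) / len(retrieved) if retrieved else 0.0
    recall = len(found) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection in which four documents are relevant to the query.
relevant = {"d1", "d2", "d3", "d4"}

# A broad search returns all four relevant documents plus sixteen irrelevant ones.
broad = relevant | {f"x{i}" for i in range(16)}
print(precision_recall(broad, relevant))   # low precision, high recall

# A narrow search returns a single document, and it is relevant.
print(precision_recall({"d1"}, relevant))  # high precision, low recall
```

The broad search finds everything but buries it among irrelevant hits, while the narrow search returns nothing irrelevant but misses three of the four documents the searcher wanted.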

Types of Search Engines

Search engines can be divided into three basic types: general or major search engines, meta-search engines, and specialty search engines. Each of the major search engines attempts to do the same thing, namely index as much of the web as possible, so they handle a huge amount of data. Due to this tremendous amount of information, it is common for documents of little useful content to be picked up, making the quality of the ranking scheme used very important. In most first-generation search engines, such as AltaVista and HotBot, results are ranked by relevancy. Relevancy is determined by algorithms that usually count how many times the keywords typed in the search form appear in the documents that exist in the database. Second-generation tools, such as Vivisimo, Google, and Direct Hit, rely on ranking algorithms that draw on techniques such as grouping and sorting results, importance or popularity of web sites, and human judgment from prior searches. Meta-search engines are tools that search more than one search engine or directory at once, compiling the results and consolidating them into an overall list.

Examples of meta-search engines include Metacrawler and Vivisimo. One drawback of meta-search engines is that they do not include all of the search engines possible, and they are unpredictable in how they handle complex searches. They can be useful for obscure searches.

Specialty search engines, or specialized databases, are search tools that focus on particular subjects or types of file format (e.g., images or music files). These databases can be time savers because they are much smaller and focused on a particular subject area or type of resource. For example, if a certain legal opinion is needed, a searcher would achieve greater success with FindLaw rather than spending the time in a major search engine such as AltaVista looking through perhaps hundreds of results.

Difficulties and Benefits of Major Search Engines

Search engines send their spiders to crawl the web periodically, so there may be infrequent updates and new sites may not be immediately added. Specialty search engines may be better for very current, dynamically changing information, such as fast-breaking news stories. There is evidence that the major search engines realize this problem and are starting to team with specialty services that provide recent news. For example, AltaVista uses the Moreover news service to provide users with news stories. Another difficulty is that according to a 1999 study by Steve Lawrence and C. Lee Giles, only 16 percent of the web is indexed. Besides content that cannot be gathered by search engine spiders, such as dynamically generated web pages, pages that no other page links to, and certain file types, there is also evidence that commercial sites are more often indexed than non-commercial sites. This part of the web that is hidden from the major search engines is often referred to as the invisible web.

Another difficulty is that information found in major search engines has not been evaluated. The responsibility is placed upon the individual to evaluate what is found. These drawbacks should not detract from the benefits of these major search tools, however. Many general or major search engines, realizing the added benefit of human-managed information, include directories such as the Open Directory Project, in conjunction with the computerized indexes. And some directories, such as Yahoo!, employ search engines to search the web when their directories fail to provide the resources needed by the searcher. The usefulness of being able to search for obscure topics, multi-faceted subjects, specific web pages and sites, in addition to information from specific dates, languages, news stories, images, and more, makes search engines necessary tools for the searcher to learn and use.

Popular Search Engines

Some of the most popular search engines include:

see also Information Access; Information Overload; Information Retrieval; World Wide Web.

Karen Hartman


Ackermann, Ernest, and Karen Hartman. Internet and Web Essentials: What You Need to Know. Wilsonville, OR: Franklin, Beedle, and Associates, 2001.

Cohen, Laura. "Searching the Web: The Human Element Emerges." Choice Supplement 37 (2000): 17-30.

King, David. "Specialized Search Engines: Alternatives to the Big Guys." Online 24, no. 3 (2000): 67-74.

Lawrence, Steve, and C. Lee Giles. "Accessibility and Distribution of Information on the Web." Nature 400, no. 6740 (1999): 107-109.

Snow, Bonnie. "The Internet's Hidden Content and How to Find It." Online 24, no. 3 (2000): 61-66.

Internet Resources

Lawrence, Steve, and C. Lee Giles. "Accessibility and Distribution of Information on the Web."

Sullivan, Danny. "How Search Engines Work."

Sullivan, Danny. "Search Engine Features for Searchers."