Internet Search Engine
An Internet search engine is a service that searches the Internet for specific items, following search terms specified by a user. The items retrieved may be texts, images, audio files, or video files. The leading Internet search engines, in decreasing order of United States usage share as of May 2006, were: Google (47%), Yahoo! (16%), and MSN Search (11%).
There are three basic stages to the functioning of an Internet search engine. First, the provider of the service must systematically visit as many of the Web pages on the Internet as possible, preferably all of them. Since millions of new Web pages appear daily, this effort to traverse the whole Web, called “crawling the Web,” is an ongoing process. Crawling is not done by human beings but by a program known as a spider or Web crawler, which follows every link that appears on every Web page it visits. Even so, no search engine actually visits more than about a fifth of all pages on the publicly available Web. (Many pages are available only when a query has been submitted to a database: no direct link to such a page appears on any Web page, making it difficult for a crawler to discover.)
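The link-following behavior described above can be illustrated with a minimal sketch. It assumes a tiny in-memory "Web" (the `TOY_WEB` dictionary and its page names are invented for illustration) instead of real network requests, and it uses a breadth-first traversal, which is one plausible strategy rather than the method of any particular search engine.

```python
from collections import deque

# Hypothetical miniature "Web": each page maps to the links that appear on it.
TOY_WEB = {
    "a.example/home": ["a.example/about", "b.example/news"],
    "a.example/about": ["a.example/home"],
    "b.example/news": ["b.example/archive"],
    "b.example/archive": [],
    # Produced only in response to a database query; no page links to it.
    "c.example/db-result": [],
}

def crawl(seed, web):
    """Visit every page reachable from `seed` by following links, breadth first."""
    visited = []          # pages in the order they were crawled
    seen = {seed}         # pages already queued, so none is visited twice
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        visited.append(page)
        for link in web.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited

pages = crawl("a.example/home", TOY_WEB)
print(pages)
```

In this sketch the page `c.example/db-result` is never reached, because no other page links to it. This mirrors the difficulty of discovering pages that exist only as answers to database queries.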
Second, the search engine provider must index the pages visited. All text appearing on the Web page being crawled is downloaded to the computers of the service provider. On the order of hundreds of terabytes (trillions of bytes) is required to store the text available on the Internet, but storage is so cheap today that this is not an overwhelming cost factor, even with backup. Text is analyzed to determine whether it belongs to a file name, Web link, readable text, or other category.
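One common way to organize downloaded text for later searching is an inverted index, which maps each word to the pages that contain it. The sketch below assumes that structure; the article does not specify how any provider builds its index, and the page names and the simple `tokenize` rule are invented for illustration.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into lowercase words made of letters and digits (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of page addresses whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

# Hypothetical crawled pages: address -> downloaded text.
PAGES = {
    "shoes.example": "Running sneakers and hiking boots",
    "news.example": "Sneakers sales rise as running grows popular",
    "cook.example": "Recipes for quick dinners",
}

index = build_index(PAGES)
print(sorted(index["sneakers"]))
```

With such an index, finding the pages that mention a word is a single lookup, so the stored text does not have to be rescanned for every query.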
Third, the index must be searched when a user submits search terms. This stage is where the methods used by search engine providers become especially complex. There may be millions or billions of hits for some words, but most users do not have the time to scan more than a few dozen hits. Therefore, rules must be devised to display the links that are most likely to be of interest to a user at the top of the results list. New rules are continually being devised to rank results according to relevance.
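As a rough illustration of ranking by relevance, the sketch below scores each page by how often the search terms occur in its text and lists the highest-scoring pages first. This term-count rule is a deliberately simple assumption chosen for illustration; real search engines combine many more signals, and the page names are invented.

```python
import re
from collections import Counter

def tokenize(text):
    """Split text into lowercase words made of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

def rank(query, pages):
    """Return addresses of matching pages, most occurrences of the query terms first."""
    terms = tokenize(query)
    scored = []
    for url, text in pages.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in terms)
        if score > 0:  # pages containing none of the terms are not hits
            scored.append((score, url))
    # Highest score first; ties are broken alphabetically by address.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in scored]

# Hypothetical page texts.
PAGES = {
    "shoes.example": "sneakers sneakers sneakers for running",
    "news.example": "running news and one mention of sneakers",
    "cook.example": "recipes for quick dinners",
}

results = rank("running sneakers", PAGES)
print(results)
```

Here `shoes.example` is listed ahead of `news.example` because the search terms occur more often in its text, and `cook.example` is omitted because it contains neither term.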
When results have been displayed in a Web browser window on the user’s computer screen, the user chooses a result link to click on, and their Web browser takes them to the page—if it is still available. Web links are continually becoming unavailable or being relocated, a phenomenon called “link rot.” Adapting to link rot is another task continually pursued by the automated systems of the search engine provider.
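One way an automated system might adapt to link rot is to recheck each stored link, keep those that still respond, follow those that have moved, and drop those that are gone. The sketch below assumes this approach and replaces real network requests with a hypothetical `lookup` function backed by canned responses. The status codes are the standard HTTP ones (200 for success, 301 and 308 for a permanent move, 404 for not found).

```python
def refresh_links(indexed_urls, lookup):
    """Return an updated link list: live links kept, moved links replaced, dead links dropped."""
    refreshed = []
    for url in indexed_urls:
        status, location = lookup(url)
        if status == 200:
            refreshed.append(url)          # page is still available
        elif status in (301, 308) and location:
            refreshed.append(location)     # page was permanently relocated
        # Any other status (for example 404) is treated as a dead link.
    return refreshed

# Hypothetical canned responses: address -> (status code, new location or None).
FAKE_RESPONSES = {
    "a.example/home": (200, None),
    "a.example/old": (301, "a.example/new"),
    "b.example/gone": (404, None),
}

def fake_lookup(url):
    """Stand-in for an HTTP request; unknown addresses are reported as not found."""
    return FAKE_RESPONSES.get(url, (404, None))

links = refresh_links(["a.example/home", "a.example/old", "b.example/gone"], fake_lookup)
print(links)
```

A production system would presumably retry before discarding a link, since a page may be only temporarily unavailable; that refinement is left out of this sketch.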
The first Internet search engine, Wandex, appeared soon after the appearance of the World Wide Web itself in 1993. Over two dozen search engines have appeared since, but not all have remained available. Google was launched in 1998; Yahoo! Search and MSN Search followed in 2004. Without Internet search engines, the World Wide Web would be comparatively useless.
Users do not pay fees to access Internet search engines. The costs of the companies supplying the service are covered by advertising. This includes sidebar placement or top-of-list placement of links to companies whose sites are relevant to a given Internet search. For example, a sports equipment manufacturer might pay to have its website listed first when a user searches for the term “sneakers.”
Because search engines are so central to accessing the content of the Web, some governments censor search-engine results. For example, German and French law forbids anti-Semitic, Nazi, and white supremacist speech, and in compliance, Google does not show users in these countries results for a number of such Web sites. In the United States, the First Amendment to the Constitution protects all political speech, including offensive speech such as is found on these Web sites, so Google results are not censored for U.S. users. In contrast, the Chinese government is one of the most severe censors of the Internet in the world, and Google’s China branch (operative since January 2006) censors thousands of search terms in compliance with the Chinese government’s censorship policies. For example, a Google search for the phrase “Tiananmen Square” performed by a U.S. user brings up thousands of references to the Chinese government’s massacre of pro-democracy demonstrators in Tiananmen Square in Beijing, China, in 1989, while a user in China will not be shown these links. Instead, the Chinese user is shown a notice that content has been blocked.