Search Engine Showdown


Fast Stays on Top
Search Engine Statistics: Database Relative Size
by Greg R. Notess

Data from search engine analysis run on Feb. 21, 2000.

Fast continues in first
Graph Excite makes significant gains
Why does size matter?
Methodology details available.

For 25 specific single-word queries, Fast (at either AlltheWeb.com or the Lycos advanced search) found the most hits, followed by Northern Light and AltaVista. Excite and Google also show gains. Fast ranked first on 18 of the individual searches, but on some searches other search engines found more (with one tie for first):

Fast: found most on 18 out of 25 searches
Northern Light: found most on 2 out of 25 searches
AltaVista: found most on 5 out of 25 searches
Excite: found most on 1 out of 25 searches
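
The first-place counts above add up to 26 rather than 25 because the tied search is credited to both engines involved. A minimal Python sketch of that kind of tally follows. The function name and the sample numbers are hypothetical, invented for illustration, and are not data from this analysis.

```python
from collections import Counter


def count_first_places(hits_by_query):
    """Count how many queries each engine led; tied engines each get credit."""
    wins = Counter()
    for hits in hits_by_query.values():
        best = max(hits.values())
        for engine, count in hits.items():
            if count == best:
                wins[engine] += 1
    return wins


# Made-up hit counts for three hypothetical queries (not figures from the study).
sample = {
    "query_a": {"Fast": 120, "Northern Light": 95, "AltaVista": 80},
    "query_b": {"Fast": 40, "Northern Light": 40, "AltaVista": 35},  # tie for first
    "query_c": {"Fast": 10, "Northern Light": 12, "AltaVista": 30},
}

wins = count_first_places(sample)
for engine, total in wins.most_common():
    print(f"{engine}: found most on {total} out of {len(sample)} searches")
```

With one tie in the sample, the win counts sum to one more than the number of queries, which mirrors the totals listed above.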


This chart compares the size of the databases of the Web search engines. For this comparison, I used 25 single-keyword searches that are processed almost identically by each search engine.

This comparison is based on the reported number of hits from each database, verified by visiting the last page of results when possible. This is not a measure based on precision, recall, or relevance but only on the raw database size. As such, it provides an important measure of database coverage. For earlier comparisons see below:

Specific Database Notes

AltaVista clusters results, but this analysis used the Advanced Search which does not cluster by site. AltaVista is notorious for inconsistencies in reporting the number of hits it finds. Each search result set was checked and only the number of hits available for display was counted. Since the advanced search can only display the first 1,000 results, none of the search terms used reported more than that number. Because AltaVista can time out on a search and not give a full results set, their total database size may be under-represented here. However, it does reflect what searchers can find when using AltaVista.

Google includes some results (URLs) that it has not actually indexed. These can be readily identified by the lack of an extract or a "cached" copy. These are URLs that are linked from other pages but not necessarily yet verified by the Google spider. For this reason, the Google size shown here will be larger than its database of fully indexed pages actually is.

HotBot clusters results by site, and there is no way to uncluster them. This makes it difficult to accurately measure the size of their database. For this comparison, the advanced search was used. All the top level domains in the results were noted and then the search was re-run using the domain limitation with all found top level domains ORed together. Though tedious, this effectively turned off the site clustering to find HotBot's total number of hits.
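
The HotBot workaround can be pictured as two steps: collect the top level domains seen in the clustered results, then re-run the search limited to those domains joined with OR. The Python sketch below illustrates the idea only. The function names, the sample URLs, and especially the `domain:(...)` query syntax are assumptions made for illustration; the actual searches went through HotBot's advanced search form, whose exact syntax is not given here.

```python
from urllib.parse import urlparse


def top_level_domains(urls):
    """Collect the distinct top level domains (e.g. 'com', 'edu') from result URLs."""
    tlds = set()
    for url in urls:
        host = urlparse(url).hostname
        if not host or "." not in host:
            continue
        tld = host.rsplit(".", 1)[1]
        if tld.isalpha():  # skip numeric IP addresses
            tlds.add(tld)
    return sorted(tlds)


def domain_limited_query(term, tlds):
    """Build a query string limited to the given domains (hypothetical syntax)."""
    return f"{term} AND domain:({' OR '.join(tlds)})"


# Hypothetical result URLs, not taken from the study.
sample_urls = [
    "http://www.example.com/a",
    "http://lib.example.edu/b",
    "http://example.com/c",
]

found = top_level_domains(sample_urls)
query = domain_limited_query("term", found)
print(query)
```

Because every result matches one of the ORed domains, the re-run search should return the same documents, but as a flat total rather than one entry per site.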

Northern Light automatically recognizes and searches the English forms of word variants and plurals. For that reason, only nonplural terms are used. Only the Web portion of Northern Light was searched, not their Special Collection. Northern Light also clusters hits by site with no ability to disable the site clustering. The number of reported hits was used, rather than trying to verify the number under each site. This could cause a misrepresentation of their size.

Excite provides no capability for searching all languages simultaneously (it defaults to English only). Therefore, each search was run once in each language, and the resulting numbers were combined to come up with the Excite total.
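
The Excite total for a single query is simply the sum of its per-language counts. A minimal Python sketch, with a hypothetical function name and made-up numbers, might look like the following. It assumes that each page is counted under exactly one language; if a page were reported under two languages, a plain sum would count it twice.

```python
def combined_total(hits_by_language):
    """Add up the hit counts reported for the same query in each language."""
    return sum(hits_by_language.values())


# Made-up counts for one hypothetical query (not figures from the study).
sample_counts = {"English": 120, "French": 15, "German": 22}

total = combined_total(sample_counts)
print(f"Combined total: {total}")
```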

MSN Search will only display up to 200 hits, so their reported numbers above that amount could not be verified.

AOL Search includes the Open Directory, an AOL database, and an Inktomi database. Like the other search engines using an Inktomi database, only the Inktomi results were used.

Lycos provides access to the Fast database in its advanced search. That version is represented by Fast. On this chart, the column labeled Lycos is the regular Lycos search engine.

More details on the study's methodology provide an example of the comparison process used here.

Older Reports with Largest Three at that Time
Jan. 2000 (supplement): Fast, Northern Light, AltaVista
Nov. 1999: Northern Light, Fast, AltaVista
Sept. 1999: Fast, Northern Light, AltaVista
Aug. 1999: Fast, Northern Light, AltaVista
May 1999: Northern Light, AltaVista, Anzwers
March 1999: Northern Light, AltaVista, HotBot
January 1999: Northern Light, AltaVista, HotBot
August 1998: AltaVista, Northern Light, HotBot
May 1998: AltaVista, HotBot, Northern Light
February 1998: HotBot, AltaVista, Northern Light
October 1997: AltaVista, HotBot, Northern Light
September 1997: Northern Light, Excite, HotBot
June 1997: HotBot, AltaVista, Infoseek
October 1996: HotBot, Excite, AltaVista

While decisions about which Web search engine to use should not be based on size alone, this information is especially important when looking for very specific keywords, phrases, and areas of specialized interest. See also the following statistical analyses: