Search Engine Showdown


Fast Back in 1st!
Search Engine Statistics: Database Relative Size
by Greg R. Notess

Data from search engine analysis run on Nov. 29, 1999 with supplement on Jan. 12, 2000.

Fast leaps into first with claims of 300 million pages
[Graph] See also Total Size Estimates.
Methodology details available.

Fast has launched a new database of 300 million URLs. Read the special supplemental report for more details, including methodology, analysis, and detailed results. The supplement compares only the three largest search engines and uses a different set of search terms.

Among these top three on 25 queries, Fast most often found the most hits.

Fast: 20 times
Northern Light: 2 times
AltaVista: 3 times


Since the information above comes from the special supplement, which covered only the three largest search engines, the following details from the Nov. 29, 1999 analysis, which covers more search engines, are still included.

Despite AltaVista's claim of 250 million URLs, it found fewer hits than Northern Light with its 200 million. The three largest remain Northern Light, AltaVista, and Fast, but Google is growing. The graph shows that the top three are all relatively close in size. Over the 25 searches run, they traded places, and tied a few times, for which search engine found the most hits:

Northern Light: 11 times
Fast: 8 times
AltaVista: 9 times
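The three counts above add up to 28 rather than 25, which is consistent with tied searches being credited to every engine that shared the top result. A minimal sketch of that kind of tally follows, assuming each tied engine gets credit. The function name `tally_most_hits` and the sample hit counts are hypothetical and are not the measured data.

```python
from collections import Counter

def tally_most_hits(results):
    """Count, per engine, the queries on which it reported the most hits.

    `results` maps a query string to a dict of {engine name: reported hits}.
    Assumption: engines that tie for the maximum each receive credit.
    """
    wins = Counter()
    for hits_by_engine in results.values():
        top = max(hits_by_engine.values())
        for engine, hits in hits_by_engine.items():
            if hits == top:
                wins[engine] += 1
    return wins

# Made-up hit counts for illustration only; not the measured data.
sample = {
    "query one": {"Northern Light": 120, "Fast": 120, "AltaVista": 90},
    "query two": {"Northern Light": 40, "Fast": 75, "AltaVista": 60},
    "query three": {"Northern Light": 15, "Fast": 10, "AltaVista": 15},
}

wins = tally_most_hits(sample)
for engine in ("Northern Light", "Fast", "AltaVista"):
    print(f"{engine}: {wins[engine]} times")
```

With ties counted this way, the per-engine totals can exceed the number of searches, as they do in the sample (five credits across three queries).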

This chart compares the size of the databases of the Web search engines. For this comparison, I used 25 single keyword and phrase searches that are processed almost identically by each search engine. Since Infoseek and Northern Light automatically recognize word variants and plurals, those terms that could be pluralized were OR'ed together in search engines that do not support automatic plural searching. Since all languages cannot be searched on Excite simultaneously, the searches were done in each language and the total of the results was used. These and other inconsistencies between the search engines may skew the results slightly.
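The two adjustments described above can be sketched roughly as follows. This is an illustration only: the helper names `or_query` and `total_across_languages` are hypothetical, the quoting of phrases is an assumption about query syntax, and the terms and counts are placeholders rather than the ones used in the analysis.

```python
def or_query(variants):
    """Join word variants with OR for engines lacking automatic plurals.

    Assumption: multi-word phrases are wrapped in double quotes.
    """
    quoted = [f'"{v}"' if " " in v else v for v in variants]
    return " OR ".join(quoted)

def total_across_languages(hits_by_language):
    """Sum the hit counts from running one search once per language."""
    return sum(hits_by_language.values())

# Placeholder term and counts, for illustration only.
print(or_query(["widget", "widgets"]))
print(total_across_languages({"English": 10, "French": 3, "German": 2}))
```

The variants are passed in explicitly rather than generated, since not every term can be pluralized by simply adding an "s".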

This comparison is based on the reported number of hits from each database, verified by visiting the last page of results when possible. This is not a measure based on precision, recall, or relevance but only on the raw database size. As such, it provides an important measure of database coverage. For earlier comparisons see below:

HotBot has proven difficult to analyze in the past because it clusters results by site. For this version, the advanced search was used. All the top level domains in the results were noted, and then the search was re-run using the domain limitation with all found top level domains OR'ed together. This effectively turned off the site clustering to find HotBot's total number of hits. AltaVista's results were counted unclustered by using the Advanced Search. Infoseek's clustering was turned off by ungrouping search results.
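The HotBot workaround amounts to collecting the distinct top level domains from a first pass of results and combining them into a single domain limit. A minimal sketch under stated assumptions: the function names are hypothetical, the URLs are placeholders, and the OR-joined string only illustrates the idea and is not HotBot's actual domain limitation syntax.

```python
from urllib.parse import urlparse

def top_level_domains(urls):
    """Return the sorted, distinct top level domains found in result URLs."""
    tlds = set()
    for url in urls:
        host = urlparse(url).hostname
        if not host or "." not in host:
            continue  # skip URLs with no usable host name
        tld = host.rsplit(".", 1)[-1].lower()
        if tld.isalpha():  # ignore numeric IP addresses
            tlds.add(tld)
    return sorted(tlds)

def domain_limit(tlds):
    """Join top level domains with OR (illustrative syntax only)."""
    return " OR ".join(tlds)

# Placeholder URLs standing in for a first pass of clustered results.
first_pass = [
    "http://www.example.com/a",
    "http://example.org/",
    "http://www.example.com/b",
    "http://192.0.2.1/x",
]
print(domain_limit(top_level_domains(first_pass)))
```

Re-running the search with every observed top level domain in the limit should, in principle, return the same documents without grouping them by site.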

Older Charts with Largest Three at that Time
Nov. 1999: Northern Light, Fast, AltaVista
Sept. 1999: Fast, Northern Light, AltaVista
Aug. 1999: Fast, Northern Light, AltaVista
May 1999: Northern Light, AltaVista, Anzwers
March 1999: Northern Light, AltaVista, HotBot
January 1999: Northern Light, AltaVista, HotBot
August 1998: AltaVista, Northern Light, HotBot
May 1998: AltaVista, HotBot, Northern Light
February 1998: HotBot, AltaVista, Northern Light
October 1997: AltaVista, HotBot, Northern Light
September 1997: Northern Light, Excite, HotBot
June 1997: HotBot, AltaVista, Infoseek
October 1996: HotBot, Excite, AltaVista

While decisions about which Web search engine to use should not be based on size alone, this information is especially important when looking for very specific keywords, phrases, and areas of specialized interest. See also the following statistical analyses: