Search Engine Showdown


iWon and Google hit top spot
Search Engine Statistics: Relative Size Showdown
by Greg R. Notess

Data from search engine analysis run on July 6-7, 2000.

[Bar graph: relative size showdown results]
+ iWon Advanced finds most
+ Google a close second
+ AltaVista passes Fast
+ Northern Light drops to fifth

For 33 specific, single-word queries, iWon's Advanced Search (which uses an Inktomi database that includes the GEN3 database) found more hits than any other search engine. Google, with its recently announced 560 million page index, came in a close second. AltaVista and Fast (at either AlltheWeb.com or Lycos) followed, with Northern Light dropping to fifth place.

Google placed first when looking at the 33 searches and counting the number of times each engine found the most hits for an individual search. Also note that if the one search with the greatest difference between iWon and Google were dropped, Google would have placed first in total hits as well, with iWon in second place.

Note that each of the top five search engines found more hits than any of the others on a few searches, as shown below. In three cases, there were ties for first place.

Google: found most hits on 14 of 33 searches
iWon Advanced: found most hits on 8 of 33 searches
AltaVista: found most hits on 7 of 33 searches
Fast: found most hits on 5 of 33 searches
Northern Light: found most hits on 2 of 33 searches
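To illustrate how those first-place counts work, here is a minimal Python sketch of the tally; the search names and hit counts below are made up for illustration, not the actual data, and a tie credits every engine that tied, which is why the wins above sum to more than 33.

    # Minimal sketch of the tally: for each search, every engine that reported
    # the highest hit count gets credit, so a tie counts for more than one engine.
    # All searches and hit counts here are hypothetical.
    hit_counts = {
        "search 1": {"iWon Advanced": 900, "Google": 850, "AltaVista": 400},
        "search 2": {"iWon Advanced": 300, "Google": 700, "AltaVista": 650},
        "search 3": {"iWon Advanced": 500, "Google": 500, "AltaVista": 480},  # a tie
    }

    wins = {}
    for search, counts in hit_counts.items():
        best = max(counts.values())
        for engine, hits in counts.items():
            if hits == best:
                wins[engine] = wins.get(engine, 0) + 1

    for engine, n in sorted(wins.items(), key=lambda kv: -kv[1]):
        print(f"{engine}: found most hits on {n} of {len(hit_counts)} searches")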

For more details and additional search engines compared, see the chart and the notes below. Note that for iWon these results reflect only the Advanced Search; the basic iWon search displays just one page per Web site and gives no access to the additional pages that might have been found on that site.


[Bar chart: total hits from all 14 Web search engines]
This chart shows the results from all 14 Web search engines and gives a comparative, effective measure of database size. The results are based on the total hits from 33 single-keyword searches that are processed almost identically by each search engine.

This comparison is based on the reported number of hits from each database, verified by visiting the last page of results whenever possible. The number of records that many search engines can actually display is often different from the number that the search engine first reports. While this comparison is not a measure of precision, recall, or relevance, it is an important indicator of the number of records that a searcher can find: it measures the effective database size. For earlier size showdown winners, see the links to older reports and the top three from each at the bottom of this page.
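As a rough illustration of that verification step, the sketch below counts what a searcher can actually page through rather than what the engine first reports; fetch_results_page is a hypothetical stand-in for stepping through an engine's result pages, not any real interface.

    # Minimal sketch: page through a result listing until it runs out and count
    # what can actually be displayed. fetch_results_page(query, page) is a
    # hypothetical helper standing in for a search engine's results pages.
    def effective_hit_count(query, fetch_results_page):
        displayed = 0
        page = 1
        while True:
            hits = fetch_results_page(query, page)
            if not hits:
                break
            displayed += len(hits)
            page += 1
        return displayed

    # The effective size compares this displayable count with the count the
    # engine first reports, which is frequently higher.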

Specific Database Notes

iWon Advanced Search uses an Inktomi database which also pulls records from the Inktomi GEN3 database. On the basic iWon search, only one page per Web site is shown, and no access to the additional pages that might have been found on that site is available. The Advanced Search shows all pages, unclustered by site. In this comparison, iWon surpassed Google primarily because of one of the search terms: dulcitol. iWon's Advanced Search, at the time of the comparison, was able to display 1,020 hits as opposed to only 274 for Google, even after all of Google's clustered results were taken into account. Google placed sixth on that search. Most of iWon's additional records came from a single genome research site, which was evidently crawled much more extensively by the Inktomi crawler than by Google. Yet the other two Inktomi partners that pull from the Inktomi GEN3 database (HotBot and Snap) did not display nearly as many hits for dulcitol as iWon did.

Google includes some results (URLs) that it has not actually indexed. When it counts all the indexed and unindexed URLs, it claims over one billion. But as these examples show, the effective size is considerably less, since most searchers will see very few of the unindexed hits. These URLs that have not been crawled can be readily identified by the lack of an extract or a "cached" copy. Google also clusters results by site and will only display two pages per site. The numbers used here were painstakingly derived by checking the hits for each site, not just the ones that Google displayed initially.
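A minimal sketch of that per-site counting, assuming a hypothetical hits_for_site helper that re-runs the query restricted to a single site (not a real Google interface):

    # Sketch of deriving an unclustered total: re-run the query limited to each
    # site that appeared in the clustered results and sum the per-site counts,
    # instead of counting only the two pages displayed per site.
    # hits_for_site(query, site) is a hypothetical helper, not a Google API.
    def unclustered_total(query, sites_in_results, hits_for_site):
        return sum(hits_for_site(query, site) for site in sites_in_results)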

AltaVista clusters results, but this analysis used the Advanced Search, which does not cluster by site. AltaVista is notorious for inconsistencies in reporting the number of hits it finds. (Although, to be fair, most of the other search engines compared this time had similar inconsistencies between the number of hits reported and the number actually displayed.) Each search result set was checked, and only the number of hits available for display was counted. Since the Advanced Search can display only the first 1,000 results, no search term was counted above that number. Because AltaVista can time out on a search and not give a full results set, its total database size may be under-represented here. However, the count does reflect what searchers can find when using AltaVista.

Fast is available at several sites, most notably AlltheWeb.com and Lycos. Since both basically search the same database, Lycos was not tested separately. AlltheWeb.com was used for this comparison, but on a few spot checks Lycos found the same hits, so either search engine can be used.

HotBot clusters results by site, and there is no way to uncluster them, despite a recently introduced advanced search feature that was supposed to make this possible. Therefore, for this comparison, the advanced search was used, all the top-level domains in the results were noted, and the search was re-run with the domain limitation set to all found top-level domains ORed together. Though tedious, this effectively turned off the site clustering and gave HotBot's total number of available hits. The results, while not as large as iWon's Advanced Search, still demonstrate that HotBot can pull from Inktomi's larger GEN3 database.
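As a sketch of that re-run, the query below ORs a domain limit for each top-level domain seen in the results; the domain: syntax and the example domains are assumptions made for illustration, not a verified HotBot query.

    # Sketch of the un-clustering re-run: take every top-level domain noted in
    # the clustered results and OR the corresponding domain limits together.
    # The domain: syntax shown is an assumption about HotBot's query language.
    def unclustering_query(term, domains_seen):
        domain_limits = " OR ".join(f"domain:{d}" for d in sorted(domains_seen))
        return f"{term} AND ({domain_limits})"

    print(unclustering_query("dulcitol", {"com", "edu", "org", "gov"}))
    # dulcitol AND (domain:com OR domain:edu OR domain:gov OR domain:org)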

Snap is another Inktomi partner. Its results, while not as large as iWon's Advanced Search, were very similar to HotBot's and demonstrate that Snap can also pull from Inktomi's larger GEN3 database. In the past, both HotBot and Snap found fewer results than other Inktomi partners, especially MSN Search and Anzwers.

Northern Light automatically recognizes and searches English-language word variants and plurals. For that reason, only nonplural terms were used. Only the Web portion of Northern Light was searched, not its Special Collection. Northern Light also clusters hits by site, with no way to disable the site clustering. The number of reported hits was used, rather than trying to verify the number under each site. Northern Light is typically fairly accurate in its counts and presents both the total number of hits and the number of sites.

Excite provides no capability for searching all languages simultaneously (it defaults to English only). With its new Excite Precision Search, it is even more difficult to search all languages, because Excite separates the records for each language into its own database. Since combining all the records is impossible, these numbers reflect only the size of Excite's largest database segment: English-language pages. Past size showdowns used the total number of pages from all languages. While a few more pages can be found in other languages, especially for some of the search terms used, most searchers will not have the patience to try the search in every language.

MSN Search, an Inktomi partner, will display only up to 200 hits, so its reported numbers above that amount could not be verified.

Anzwers, an Australian Inktomi partner, used to find more hits than the other Inktomi search engines, but at this point it does not appear to be using the larger Inktomi GEN3 database.

Direct Hit appears in these comparisons for the first time, now that it will display more than the top ten hits for each search.

WebTop also appears for the first time, especially in light of its recent claim to have indexed over 500 million Web pages. However, WebTop only partially indexes each page, using some keywords and adding its own meta tags rather than indexing every word that appears on the page. Therefore, in this comparison, which looks at the results from single-word searches, WebTop finds far fewer results than every engine except WebCrawler.

More details on the study's methodology provide an example of the comparison process used here.

Older Reports with Largest Three at that Time
April 2000: Fast, AltaVista, Northern Light
Feb. 2000: Fast, Northern Light, AltaVista
Jan. 2000 (supplement): Fast, Northern Light, AltaVista
Nov. 1999: Northern Light, Fast, AltaVista
Sept. 1999: Fast, Northern Light, AltaVista
Aug. 1999: Fast, Northern Light, AltaVista
May 1999: Northern Light, AltaVista, Anzwers
March 1999: Northern Light, AltaVista, HotBot
January 1999: Northern Light, AltaVista, HotBot
August 1998: AltaVista, Northern Light, HotBot
May 1998: AltaVista, HotBot, Northern Light
February 1998: HotBot, AltaVista, Northern Light
October 1997: AltaVista, HotBot, Northern Light
September 1997: Northern Light, Excite, HotBot
June 1997: HotBot, AltaVista, Infoseek
October 1996: HotBot, Excite, AltaVista

While decisions about which Web search engine to use should not be based on size alone, this information is especially important when looking for very specific keywords, phrases, and areas of specialized interest. See also the following statistical analyses: