Search Engine Showdown
Showdown News Vol. 1 No. 7

= = = = = = = = = = = = = = = = = = = = = = = = = =
SHOWDOWN NEWS
The Search Engine Showdown Online Newsletter
Dec. 8, 1999 Vol. 1 No. 7
By Greg R. Notess, Search Engine Showdown
http://www.notess.com/search/
= = = = = = = = = = = = = = = = = = = = = = = = = =

In this issue: Search Engine Sizes, Fast Advanced Search, free USGOVSEARCH, search engines news, and more.

SEARCH ENGINE SHOWDOWN SIZE ANALYSIS:

With AltaVista claiming 250 million pages in its database and Fast and Northern Light both announcing over 200 million, which finds more results on actual searches? In the latest Search Engine Showdown analysis from November 29 the answer was Northern Light. It came in first with Fast and AltaVista in second and third place in a close race. Google moved up to fourth.

As in previous Search Engine Showdown analyses, 25 separate queries were used, drawn from a wide variety of disciplines. The relative sizes and the derived total size estimates are based on the total number of results delivered from these searches. Each search result was verified by going to the last page of results, and only the total number of records displayed was used, rather than the number reported by the search engine.

Note that the relative size graph and total number estimates do not tell the whole story. Northern Light did not consistently find more than Fast or AltaVista. As a matter of fact, each of the top three found more hits than their competitors on several of the 25 searches.
   Northern Light found the most 11 times
   Fast found the most 8 times
   AltaVista found the most 9 times
(There were a couple of ties, which is why these do not add up to 25.)
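
The tally above can be sketched in a few lines. This is only an illustration of the counting rule, assuming that an engine tied for first on a query is credited with a win; the function name and the sample counts are made up and are not the actual data from the analysis.

```python
from collections import Counter

def count_wins(results_per_query):
    """Count, per engine, the queries on which it found the most results.

    Engines tied for the top count on a query are each credited with a win,
    so the win totals can add up to more than the number of queries.
    """
    wins = Counter()
    for counts in results_per_query:
        top = max(counts.values())
        for engine, hits in counts.items():
            if hits == top:
                wins[engine] += 1
    return wins

# Hypothetical verified result counts for three queries (not the real data).
sample_queries = [
    {"Northern Light": 120, "Fast": 95, "AltaVista": 110},
    {"Northern Light": 40, "Fast": 40, "AltaVista": 31},  # tie for first
    {"Northern Light": 17, "Fast": 22, "AltaVista": 25},
]
wins = count_wins(sample_queries)
print(dict(wins))
```

With one tie among the three sample queries, the wins total four rather than three, which is the same effect that pushes the real totals past 25.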

Based on the results from the 25 searches and using Northern Light's reported size, I also updated the Total Size Estimates page. In addition, I ran a dead link comparison to determine what percentage of the search engine results might be inaccessible, and the Total Size Estimates page now includes estimates that factor in the dead link analysis. After factoring in the dead link percentages, none of the search engines came up with more than 200 million pages. They range from Northern Light at 196,345,924 down to HotBot's 38,810,341. For all the details, see the relative size, total size estimates, dead link report, and change over time pages.
<http://www.notess.com/search/stats>
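
One plausible way to turn relative results into total size estimates is sketched here. It assumes each engine's estimate is the reference engine's reported size scaled by the ratio of verified hits, and that the dead link adjustment simply discounts the estimate by the inaccessible fraction. The function names, engine names, and all numbers are hypothetical placeholders, and the exact calculation used on the Total Size Estimates page may differ.

```python
def estimate_total_size(verified_hits, reference_engine, reference_size):
    """Scale each engine's summed verified hits by the reference engine's size."""
    ref_hits = verified_hits[reference_engine]
    return {
        engine: reference_size * hits / ref_hits
        for engine, hits in verified_hits.items()
    }

def adjust_for_dead_links(estimates, dead_link_rate):
    """Discount each estimate by that engine's fraction of inaccessible results."""
    return {
        engine: size * (1.0 - dead_link_rate.get(engine, 0.0))
        for engine, size in estimates.items()
    }

# All figures below are hypothetical placeholders, not the newsletter's data.
hits = {"EngineA": 5000, "EngineB": 4000}
estimates = estimate_total_size(hits, "EngineA", 200_000_000)
adjusted = adjust_for_dead_links(estimates, {"EngineA": 0.04, "EngineB": 0.10})

for engine in hits:
    print(engine, round(estimates[engine]), round(adjusted[engine]))
```

Under these assumptions an engine with a high dead link rate can fall behind a smaller but cleaner one once the adjustment is applied.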

Commentary:

So what do the numbers mean for the searcher? I continue to run these analyses because I find them very informative for my own searching patterns. When searching the Internet for very specific information, or even for a more general question for which a very specific query statement can be used, starting with the largest search engine is a sound strategy. And comparing the search engines over time helps show whether an engine's database is larger or smaller than it was a few months ago. So let's see what the numbers say about specific search engines.

AltaVista

Although my analysis did not verify AltaVista's claim of 250 million pages, it did show a significant increase since the analysis done in early September. Unadjusted for dead links, AltaVista grew from the September total size estimate of 137,486,307 to 191,213,426 at the end of November -- growth of roughly 39%, to about 139% of its September size. In addition, AltaVista found the highest number of records 9 times out of the 25 searches used.
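
The arithmetic behind the growth figure can be checked directly from the two estimates quoted above. The variable names are mine; the two numbers are the September and November estimates from this issue.

```python
september = 137_486_307  # September total size estimate, unadjusted
november = 191_213_426   # end-of-November total size estimate, unadjusted

ratio = november / september      # November size relative to September
growth_pct = (ratio - 1) * 100    # percentage increase over September

print(f"November is {ratio:.0%} of September; growth is {growth_pct:.1f}%")
```

The distinction matters: the November estimate is about 139% of the September one, which is an increase of about 39%.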

What could have caused the large discrepancy between AltaVista's claim of 250 million and the Search Engine Showdown estimate of 191 million? First of all, remember that AltaVista will time out on a search and prefers to deliver partial results quickly rather than finding every record in its database. According to some folks at AltaVista, Monday, November 29 was an especially busy day, and that could partially explain why searches then found fewer results. In addition, it is possible that at the time of the trials part of the database was inaccessible, down for backup, or otherwise unavailable, as has happened at other search engines. Of course, that would mean it was unavailable to all other searchers at that time as well.

Northern Light

Northern Light's steady growth helped push it out in front on this analysis. While dead links in Northern Light were a major problem last spring, it seems to be maintaining a cleaner database now. On the dead link analysis, which evaluated 100 records from three different, common searches, only 2% of Northern Light's results had errors, and a total of 4% were inaccessible when including both errors and failures to connect. Having found the greatest number of hits on 11 of the 25 searches, Northern Light should certainly be included in any comprehensive search, especially since it is so rarely included by multi-search engines.
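
The two dead link percentages are related in a simple way, sketched below. This assumes each checked result is classified as accessible, an error (page not found or forbidden), or a failure to connect; the labels and the function name are hypothetical, and the sample is made up so that it reproduces the 2% and 4% figures rather than being the actual test data.

```python
def dead_link_rates(outcomes):
    """Return (error_rate, inaccessible_rate) as percentages.

    outcomes: one string per checked result, each "ok", "error"
    (page not found or forbidden), or "no_connect".
    """
    total = len(outcomes)
    errors = outcomes.count("error")
    no_connect = outcomes.count("no_connect")
    error_rate = 100 * errors / total
    inaccessible_rate = 100 * (errors + no_connect) / total
    return error_rate, inaccessible_rate

# Made-up sample of 100 checked results.
sample_outcomes = ["ok"] * 96 + ["error"] * 2 + ["no_connect"] * 2
error_rate, inaccessible_rate = dead_link_rates(sample_outcomes)
print(f"errors: {error_rate:.1f}%, inaccessible: {inaccessible_rate:.1f}%")
```

The second figure is always at least as large as the first, since it counts failures to connect on top of the outright errors.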

Fast

While Fast's search engine at http://www.alltheweb.com remains large, it found fewer results than in the past two analyses. Its dead link rate was 18% when counting pages not found or forbidden, and rose to 22.3% when the no connect pages were added. Factoring in the dead links caused Fast to drop below AltaVista on the total size estimates. However, shortly after this comparison Fast introduced new features. See below for more information on these new capabilities.

HotBot

How the mighty have fallen! For a long time, HotBot was always one of the three largest search engines. Then, like the other Inktomi-based search engines, HotBot started clustering results and delivering an ever smaller database. At one time it seemed to search a database with over 100 million records. Now, this latest estimate places it at less than 40 million. On the positive side, its dead link rates improved dramatically, with one of the lowest rates for any Inktomi search engine and a lower rate than Northern Light.

Inktomi Databases and Others

The best scoring Inktomi database, Anzwers, only rated an estimate of about 78 million records, putting it in fifth place. The newer iWon also scored well, finding nearly as many as Anzwers. These were followed by Yahoo!'s Inktomi database, AOL, and Snap, with HotBot last of all. Note that only these six Inktomi databases were analyzed.

Over the course of 1999, the largest Inktomi databases have been finding fewer hits, even for the same searches. Some, like Yahoo! (its search engine side, not the directory), started small at the beginning of the year and then showed significant growth. But almost all were down in this comparison.

Google, Excite, and Lycos all showed growth when compared with the September analysis. Google and Yahoo!'s Inktomi database also scored extremely well on the dead link analysis. The unique hits and overlap analyses will be complete sometime soon, so check back on Search Engine Showdown for updated information.

SEARCH ENGINE NEWS:

Fast introduced some additional search features in its new advanced search. The new features include language limits, domain limits, and title and URL field searching. It also permits displaying up to 100 hits at a time. Other search form lines have been added offering the ability to do approximate Boolean searching, using the drop down options of Should Include, Must Include, or Must Not Include. While it is great to see Fast adding new search features, it still does not support Boolean operators or nesting, it has no truncation, and it offers no link field searching.

Also, this initial version of the advanced search defaults in the main search box to "Any of the words." Note what happens when you put in a search term and then add a language limit without changing the default. For example, search on the nonexistent word "klazq" and change the language to English. Fast will respond with over 137 million hits if the default "any of the words" is left as is. It has evidently done an OR between the nonexistent word and all English language pages in its database. To get it to work correctly, be sure to change the main drop down box to "all of the words" rather than leaving the Advanced Search default of "any of the words."
<http://www.alltheweb.com/advsearch> Advanced Search
<http://notess.com/search/features/fast/review.html> Showdown Review

A public access version of USGOVSEARCH is now available directly with registration. This version only searches the government Web sites in the full subscription USGOVSEARCH and not the NTIS or Special Collection databases. Even so, it is the largest U.S. federal government search engine available. And if you are looking for NTIS or Special Collection databases, they are available in the full Northern Light. Advantages of the public access USGOVSEARCH are its specific focus on government resources, including many not in the .gov or .mil domains, and its specialized agency and government subject search options.
<http://usgovsearch.northernlight.com/publibaccess>

AltaVista's Advanced Search no longer clusters results by default, unless the searcher specifically checks the "Show one result per Web site" box. The size of the Boolean box has also been increased back closer to its former size.
<http://www.altavista.com/cgi-bin/query?pg=aq&what=web>

Britannica.com went live with its new version which includes the full text of the Encyclopædia Britannica along with the Britannica Web directory (now called the Web's Best Sites). It also includes full-text articles from about 70 periodicals and links to books available for purchase from Barnes & Noble.
<http://www.britannica.com>

IntelliSeek officially launched its InvisibleWeb.com, a directory of more than 10,000 Web-accessible searchable databases. Many of these specialized databases can be used to find information not readily available from other Web sources.
<http://www.invisibleweb.com>

Northern Light announced its receipt of U.S. Patent #5,924,090 for its Custom Search Folders search result sorting technology. Northern Light also announced improved relevance ranking algorithms which rely more heavily on link analysis.
<http://www.northernlight.com/docs/press_company_pr.html>

Oingo has come out of beta and is offering its "meaning-based search" technology royalty-free to portals and other content providers. It remains to be seen who may take up Oingo on the offer. Meanwhile, their site continues to use the Open Directory and AltaVista database to showcase their product.
<http://www.oingo.com>

WholeWeb.net is an up and coming technology to watch. It promotes the use of its very large database technology as a way to build much larger Web search engines, update them more frequently, and offer better relevance. I have seen a demonstration of the technology which looks promising, but it is not yet available in any version that searchers can use.
<http://www.wholeweb.net>

SEARCH ENGINE SHOWDOWN UPDATES:

A central Inconsistencies page has been added. It points to the various inconsistency reports available, which at this point include AltaVista, Google, HotBot, and Northern Light. Others will be added as time permits and inconsistencies surface. Each inconsistencies page also includes a section for reporting additional inconsistencies.
<http://www.notess.com/search/inconsistent.shtml>

A new Current Awareness services section has been added under the heading of Alerts. It includes information on alerting services such as Mind-It, Karnak, TracerLock, and others.
<http://www.notess.com/search/alerts>

Some field searches which used to be available are no longer supported. Google no longer supports the flink: field search, and Infoseek stopped supporting the alt: field search. While the flink: search for following a link pattern forward had little utility for most searchers, the alt: search on Infoseek provided the ability to limit searches to just the image alternate text tag.
<http://www.notess.com/search/features/infoseek/review.html>
<http://www.notess.com/search/features/google/review.html>

The Deja.com review has been updated to reflect that Deja.com changed the initial search screen so that it no longer allows the searcher to select Usenet (discussions) or its product reviews. It now automatically searches both and displays results from both. Use the advanced search to limit to one database or the other.
<http://www.notess.com/search/usenet/deja>

SEARCH ENGINE READINGS:

Greg R. Notess. "Raising Dead Links." EContent. 22(6): Dec. 1999.
Discusses issues related to dead links on the Web, including strategies for finding all or partial content from the dead pages.

Danny Sullivan moderated a Search Engine Strategies Seminar '99 in San Francisco on November 18, 1999. Sponsored by Internet.com, this seminar was geared more toward marketers, the Web positioning industry, and the search engine companies than toward searchers. But watching the interplay between those communities can be very informative for the searcher as well. Check out the links below for the agenda and a couple of excellent reports on the seminar from attendee Chris Sherman.
<http://seminars.internet.com/sew/sf99>
<http://www.infotoday.com/newsbreaks/nb1206-2.htm>
<http://websearch.about.com/internet/websearch/library/weekly/aa112999.htm>

Over the past few months I gave two somewhat similar presentations which covered search engine inconsistencies and strategies to cope with those inconsistencies:

Greg R. Notess. "Searching the Web: Search Engine Inconsistencies and Successful Strategies." A presentation at Sökmaskiner och cyberbyar in Stockholm, Sweden on Nov. 9, 1999. HTML version of the PowerPoint presentation available at
<http://www.notess.com/speak/stockholm99/>

Greg R. Notess. "Search Engines: Failures, Glitches, and Solutions." A presentation at Online World in Chicago on Oct. 27, 1999. HTML version of the PowerPoint presentation available at
<http://www.notess.com/speak/onlineworld99/>

= = = = = = = = = = = = = = = = = = = = = = = = = =
Showdown News: copyright © 1999, Greg R. Notess
This may be forwarded to others if it is forwarded
in its entirety including this copyright notice.

= = = = = = = = = = = = = = = = = = = = = = = = = =
For more information about Showdown News,
including subscription information, see
http://www.notess.com/search/lists/news.shtml
Questions or problems? mailto:greg@notess.com
= = = = = = = = = = = = = = = = = = = = = = = = = =