
Greg R. Notess
Reference Librarian
Montana State University

ON THE NETS

Comparing Net Directories

DATABASE, February 1997
Copyright © Online Inc.


----------------------------------------------------------------

Featured Sites
Excite Reviews
http://www.excite.com/Reviews/

Magellan
http://www.mckinley.com/

Lycos Top 5% Sites
http://point.lycos.com/categories/

Lycos Sites By Subject (a2z)
http://a2z.lycos.com/

Yahoo!
http://www.yahoo.com/

Finding appropriate and relevant information resources on the World Wide Web is often a hit-or-miss endeavor. The large search engines routinely return an amazing number of irrelevant sites sprinkled with a few gems. Bookmark lists of favorite sites become dated within a few months, with dead ends resulting from site reorganizations, resources moving to different hosts, or the demise of entire sites. Frequently updated, well-organized, subject-oriented directories of Internet resources may come as close as anything to providing the most useful starting points for actual information retrieval on the Net.

With the surprising commercial success of Yahoo! and other directory companies, many directory efforts have been spawned. Excite offers Excite Reviews, Lycos bought out Point and a2z, McKinley launched Magellan, and other competitors are waiting in the wings. Let us see how some of these compare to each other and how well they describe, rate, and cover sites in specific subject areas.

EXCITE REVIEWS

Excite has been expanding its search offerings to include a variety of special search features beyond its Web search engine. One of these, Excite Reviews, covers about 60,000 sites, each rated on a scale of one to four and given a brief description a few sentences long. The reviews are classified by subject and are accessible both by subject terms and by keyword search. They are written by a team of editors, and each of the 16 subject areas begins with an introductory paragraph by the section editor that highlights a few of the sites.

Excite's editors are journalists. Their writing style is typically at a popular level and often judges a site by how intriguing its content is rather than by an objective evaluation of its information content. For example, under Patents and Intellectual Property Reviews, the entry for Cornell's U.S. Patent Law site reads "This site reads like an encyclopedia entry (yawn!) but does provide lots of info on the law. Mostly in legalese, Cornell Law School describes the law and its constitutional origin and implications." Although it earns a yawn and a score of only two, this site provides substantial content on patent law, including links to relevant sections from the U.S. Code, the Patent Cooperation Treaty, the Paris Convention, and recent patent decisions from the Supreme Court.

It is usually easy to criticize a directory for entries that are not included. Since the directories are selective, there are bound to be sites that were not selected. Even so, it is disappointing that the Patents category in Excite Reviews lists neither the U.S. Patent and Trademark Office's patent databases (http://patents.cnidr.org:4242/) nor Questel*Orbit's QPAT*US service with its patent database and full-text patents (http://www.qpat.com/).

Excite uses the same search engine for Excite Reviews as for its full Web search. This engine uses a process called Intelligent Concept Extraction (ICE). In a technical paper on ICE (http://www.excite.com/ice/tech.html), Excite presents an intriguing overview of information search technology, beginning by describing Boolean searching as "the earliest, most primitive technology" and finishing with a claim that ICE increases both precision and recall. Try out Intelligent Concept Extraction and judge for yourself whether it is more precise than basic Boolean searching.

MAGELLAN

With Excite's recent purchase of Magellan, the two services will be merging their databases, but for now Magellan offers both a general Web search engine and a subject directory of sites that have been reviewed and rated. The directory is arranged in 26 subject categories and can be searched by keyword. Each entry includes a one-paragraph description, keywords, audience, producer, and information on cost. As with Excite Reviews, a team of editors and writers reviews and rates the sites. The ratings, which range from one to four, are based on a numeric scoring system that evaluates sites on criteria such as ease of use, "Net appeal," currency, and comprehensiveness.

The descriptions in Magellan are a bit less breezy than the Excite Reviews. Since they are typically about twice as long, there is more room to include mention of the actual content of a site. For the Patent Law site at Cornell, Magellan does list some of the documents available on the site, although it still ends the description with an inane, "You'll have to stay up late if you open this box, chums." Again, Cornell gets only a score of two. Given the criteria used for ranking, anyone looking primarily for information content can safely ignore the ranking and try to evaluate potential usefulness of a site based on the description.

Magellan also lacks an entry for either the CNIDR or the QPAT*US patent database, although it does include a Questel*Orbit Patent page that links directly to QPAT*US. One useful feature is that Magellan suggests related topics at the top of the search results. These are subject categories for finding related sites, and the links can be quite useful if the first search does not turn up relevant sites.

LYCOS TOP 5% AND A2Z

Another well-known source of Web site rankings is Lycos' Top 5% Sites, formerly known as Point. Following the trend among Web search engines to add subject directories to their search offerings, Lycos acquired both Point and a2z. Before the acquisition, Point made a name for itself on the Internet by designating sites as being in the top 5% of Web sites and making a graphic available to the sites it listed. The honored sites then added the graphic and a link back to Point, providing very effective advertising for Point. Lycos continues the practice with its renamed Top 5% Sites section. Unfortunately, the criteria for inclusion in the Top 5% are not clearly identified. Ratings within the Top 5% Sites are on a scale of zero to 50 in three categories: content, presentation, and experience. However, as with the other rating directories, the ratings are not very useful for determining the quality or even the quantity of actual information content on specific sites.

The descriptions of sites in the Top 5% are typically a paragraph long, about the same as in Magellan, and the tone is similar as well. Rather than concisely describing the significant information content on a site, the Top 5% also lapses into breezy commentary. For example, the European Patent Office is considered "useful, but not really too, ah, inventive." After reading a few of these reviews, it is easy to wonder what criteria place a site among the roughly 5% of Web sites considered "Top." The Top 5% Sites ratings and reviews can be browsed through 16 main subject categories and their subcategories, and they can also be searched by keyword.

The former a2z directory, another Lycos acquisition, is now listed on Lycos' site as Sites by Subject. It is both searchable and browsable by subject category. The criterion for inclusion in a2z is that the site be among the 10% of Web sites in Lycos that are most linked-to by users. Although some entries link to a Top 5% review, the rest include no rating. The brief descriptions, one or two sentences long, are much more concise and informative than those in any of the other directories mentioned so far.
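Lycos describes a2z only as drawing on the 10% of Web sites most linked-to in its database; it does not spell out the computation. As a minimal sketch, assuming a hypothetical table of inbound-link counts per site (the names `top_linked` and `link_counts` are illustrative, not Lycos code), a link-popularity cutoff of this kind could look like:

```python
def top_linked(link_counts: dict[str, int], percent: int = 10) -> list[str]:
    """Return the sites whose inbound-link counts place them in the top `percent`.

    Hypothetical illustration: `link_counts` maps a site to the number of pages
    found linking to it. Ties keep their original order (sorted() is stable).
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    # Most-linked sites first.
    ranked = sorted(link_counts, key=link_counts.get, reverse=True)
    # Integer ceiling division: a small non-empty collection still yields at
    # least one site, and floating-point rounding cannot inflate the cutoff.
    keep = (len(ranked) * percent + 99) // 100
    return ranked[:keep]


if __name__ == "__main__":
    # Made-up counts for twenty hypothetical sites: site0 has 0 links, site19 has 19.
    counts = {f"site{i}": i for i in range(20)}
    print(top_linked(counts))  # ['site19', 'site18']
```

A cutoff like this says nothing about content quality, which is consistent with the column's point that popularity-based inclusion can miss specialized resources such as the patent databases.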

Both Lycos databases appear to have missed the patent databases. This is less surprising here, since the closest available category is the general Law category. Their absence may also reflect a lack of links to these sites, since the Lycos directories base inclusion either on the number of links to a site or on a site's apparent standing in the top 5%.

Another feature of both Lycos databases, rare among Internet search engines, is the ability to sort. Within the subject categories (which are quite similar between the two databases), sort options appear near the top of the page. The default sort is alphabetical, but the a2z sections can also be sorted with the most popular sites first or in random order. The Top 5% categories can be sorted by any of the three rating criteria.
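To make the Top 5% sort options concrete, here is a small Python sketch. It assumes a hypothetical record type holding the three 0-to-50 scores; the field names and the `sort_reviews` function are illustrative, not Lycos' actual data model. The default is alphabetical, and any rating criterion sorts highest score first.

```python
from dataclasses import dataclass

# The three Top 5% rating criteria, each scored from 0 to 50.
CRITERIA = ("content", "presentation", "experience")


@dataclass
class Review:
    """Hypothetical record for one Top 5% entry (field names are illustrative)."""
    title: str
    content: int
    presentation: int
    experience: int


def sort_reviews(reviews: list[Review], by: str = "title") -> list[Review]:
    """Alphabetical by default; otherwise highest score first on one criterion."""
    if by == "title":
        return sorted(reviews, key=lambda r: r.title.lower())
    if by not in CRITERIA:
        raise ValueError(f"unknown sort option: {by!r}")
    return sorted(reviews, key=lambda r: getattr(r, by), reverse=True)


if __name__ == "__main__":
    sample = [
        Review("Site B", content=30, presentation=45, experience=35),
        Review("Site A", content=40, presentation=20, experience=25),
    ]
    print([r.title for r in sort_reviews(sample)])                  # ['Site A', 'Site B']
    print([r.title for r in sort_reviews(sample, "presentation")])  # ['Site B', 'Site A']
```

Because the same entries reorder differently under each criterion, a site that ranks high on presentation can still sit low on content, which is one reason the ratings say little about actual information content.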

YAHOO!

The best-known and most popular of the subject directories is Yahoo! While the others provide reviews, ratings, and descriptions, Yahoo! concentrates on indexing and arranging sites in hierarchical subject categories. Users can browse down from the 14 top-level categories through their subcategories or run a keyword search. Each category is followed by a number in parentheses giving how many entries it contains, which is helpful when navigating Yahoo! and can be used to gauge how large a category is.

Yahoo! consists of sites submitted by users, so it is not a very selective directory. Many sites are submitted by the companies themselves, but anyone can submit a site. Due to Yahoo!'s prominence on the Web, its coverage of commercial Web sites is especially good. Formerly, a keyword search simply displayed all matching individual sites alphabetically by category, so the extensive Business listings appeared first even if the user was looking for a scientific site. Search results now display matching categories first, before the individual listings. Note that keyword searching uses automatic truncation by default and that multiple-word searches default to an AND operator. Choose the Options link to change those defaults or to change the number of results displayed per page.
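Yahoo! does not publish its matching rules. The Python sketch below only illustrates what the two stated defaults, automatic truncation and an implied AND between words, mean in practice, along with the categories-first display order. The tokenizer and the function names `matches` and `search` are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lower-case alphanumeric words; a simplifying assumption, not Yahoo!'s tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches(query: str, text: str, truncate: bool = True, operator: str = "AND") -> bool:
    """True if `text` satisfies `query` under the given truncation and Boolean settings."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    words = tokenize(text)
    terms = tokenize(query)
    if not terms:
        return False

    def hit(term: str) -> bool:
        # Automatic truncation: "patent" also matches "patents" and "patented".
        if truncate:
            return any(word.startswith(term) for word in words)
        return term in words

    hits = [hit(term) for term in terms]
    # Multiple words are ANDed by default; "OR" accepts any single matching term.
    return all(hits) if operator == "AND" else any(hits)


def search(query: str, categories: list[str], sites: list[str], **options) -> list[str]:
    """Matching category names first, then matching individual site listings."""
    found_categories = [c for c in categories if matches(query, c, **options)]
    found_sites = [s for s in sites if matches(query, s, **options)]
    return found_categories + found_sites


if __name__ == "__main__":
    categories = ["Business and Economy:Companies:Law:Intellectual Property:Patents:Services"]
    sites = ["Cornell U.S. Patent Law", "QPAT*US"]
    print(search("patent law", categories, sites))       # category first, then the Cornell listing
    print(matches("patent", "Patents", truncate=False))  # False: no truncation
```

Turning truncation off or switching to OR through the Options link changes recall in opposite directions: exact-word matching narrows results, while OR broadens them.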

Many entries contain no description, while others may have a descriptive phrase or sentence after the listing. Recently, Yahoo! has begun to add reviews for a few sites. In general, there is little rating of sites, except for an occasional "Cool" graphic (looks like a pair of sunglasses), which denotes those sites that the Yahoo! team considers to have good presentation or content for their respective topic area.

Yahoo! covers patents and the patent databases better than its competitors, but it still has room for improvement. The Cornell Patent site is listed, although with no descriptive statement. QPAT*US is listed under Business and Economy:Companies:Law:Intellectual Property:Patents:Services. Many other patent sites are listed as well, in multiple categories. The other patent database, housed at CNIDR, is not listed directly, although it can be reached from the Patent and Trademark Office site, which is listed.

A COMPARATIVE CASE STUDY

Comparing these directories across the board is more complex than comparing the larger Web search engines; it is not enough to gauge the number of entries in each database. The ideal directory would list just those sites that provide quality information content for every topic a user needs. No directory is ever likely to live up to that ideal of high precision and recall for all searches. As one quick measure of the usefulness and accuracy of the various directories, reviews, and rating services, I compared their treatment of a known Internet resource: the Code of Federal Regulations (CFR), available on the U.S. House of Representatives' Law Library site (http://law.house.gov/cfr/) and searchable with software from Personal Library Software, Inc. (PLS).

Several features of this version of the CFR make it a useful case study. First, the CFR is a major resource for federal regulations, so it seems a logical item to include in any directory that has a law section. Second, when PLS first made the CFR available, it was set up as a demonstration project using an older version of the CFR, and the page itself notes that some of the sections are quite dated. Any responsible directory should include a statement that this source contains out-of-date regulations. Third, the URL for this resource changed in mid-1996. While the old URL (http://www.pls.com:8001/his/cfr.html) still exists, it now states that the site has moved and points to the current URL (http://law.house.gov/cfr.htm). Noting which URL a directory lists therefore gives a sense of how frequently its entries are verified.

The results of this experiment were disappointing. None of the directories that include a description of the CFR site mentions that this version of the CFR is dated and does not contain up-to-date regulations. While Excite Reviews and Yahoo! point to the current URL, both Lycos directories point to the old one. Magellan has no entry for the CFR itself, although it does have an entry for the whole Internet Law Library site. That entry has the current URL but, again, no comment on the dates of the resources available in the Internet Law Library.

Yahoo! includes entries both for the dated version of the CFR and for a CFR version from Counterpoint. The latter shows up under two categories: Business and Economy:Companies:Publishing:Counterpoint Publishing and Government:Documents. The dated version appears under Government:Law:Federal. Entries for the full House Internet Law Library appear under two other categories: Government:Law:General Information and Government:Legislative Branch:House of Representatives. Yahoo! thus has more entries for CFR sources than the other directories, but these listings also reveal inconsistency in its use of categories. Multiple subject headings make sense, but both versions of the CFR should appear under the same subject headings, and they do not.

These subject directories are very useful resources. While a critical look turns up many defects, deficiencies, and inaccuracies, they remain one of the most effective ways to begin a search for specific information on the Internet. Due to its size, search features, and organization, Yahoo! remains one of the best starting points for a search. There are plenty of other directories, such as EInet's Galaxy and various subject-specific directories. See Yahoo! under Computers and Internet:Internet:World Wide Web:Searching the Web:Directories for many more options.

----------------------------------------------------------------

Communications to the author should be addressed to Greg R. Notess, Montana State University Libraries, Bozeman, MT 59717-0332; 406/994-6563; greg@notess.com; http://www.notess.com.

Copyright © 1997, Online Inc. All rights reserved.