In the last issue of _ONLINE_, On the Nets compared search
techniques and capabilities of different indexes for the World-Wide
Web [1]. As the high-quality information resources available on the
Internet continue to multiply, the need for indexes to these resources
grows apace. None of the indexes reviewed offer the kind of
sophisticated Boolean and field searching that is standard on CD-ROM
and online databases, but even the best retrieval system is useless if
the information in the database is inaccurate or incomplete. The ease
with which Web pages can be brought online, changed or deleted means
that no index can be either comprehensive or entirely accurate. Yet
despite their weaknesses, the available indexes to the World-Wide Web
and other Internet resources fulfill an important role in information
retrieval on the Internet.
The best of the indexes, including archie, veronica, Lycos, and
WebCrawler, have been developed by researchers or academicians and
freely disseminated on the Net. These databases of Net resources have
been heavily used but are beset with system overloads and database
maintenance problems. Can the commercial sector find a more
efficient and effective way to provide such databases? With the
advent of the commercial InfoSeek service, users can decide for
themselves.
Businesses are scrambling to find the magic combination for
successful Internet marketing. InfoSeek demonstrates great Net savvy
in offering useful free services and an attractive pricing structure.
InfoSeek is a budding online databank, aiming to tap into the Internet
market. It gives subscribers access to full-text and bibliographic
databases. Anyone can search the InfoSeek WWW Pages database, but
only subscribers can see more than the first ten hits. By combining
limited free access and reputable databases, InfoSeek makes an
aggressive play for the Internet online market.
INFOSEEK
To what does this Santa Clara, California company aspire?
"InfoSeek is a new full-text search service that makes finding
information easy. You can search WWW pages, Usenet News, over 50
computer magazines, newspaper newswires and press releases,
company profiles, movie reviews, technical support databases, and
much more" [2]. In answer to the question of whether InfoSeek is cheaper
than DIALOG and CompuServe, the online documentation states that "in
most cases, InfoSeek is the lowest cost information search and
retrieval service available." InfoSeek appears to be aiming for the
Internet- and computer-user market, combining computer, news and
business databases with an easy interface and competitive pricing.
The databases available for searching from this new databank are a
modest selection of standard commercial offerings combined with
some unique Internet databases: WWW Pages, Usenet News, Wire
Services, Cineman Reviews, Computer Select, MDX Health Digest, full
text of ComputerWorld and InfoWorld, and two of the Hoover Business
databases. Figure 1 shows the InfoSeek search screen with the list of
available databases. The "Wire Services" heading includes AP Online,
BusinessWire, PR Newswire, Newsbytes News Network, and Reuters
Business Report. InfoSeek has also stated that it plans to add Medline
and unnamed databases in business, finance, health, sports and
national news within the next six months.
PRICING
InfoSeek uses transaction-based pricing. Each search request
counts as one transaction and each document retrieval request counts
as one transaction. Transaction charges range from $0.10 to $0.20,
depending on which of the three subscription plans is chosen. The
standard plan costs $9.95/month and includes 100 transactions, with
each additional transaction costing $0.10. The light use plan costs
$1.95/month and includes ten transactions, with each additional
transaction costing $0.15. The occasional plan has no monthly charge
but transactions cost $0.20 each. No other per-minute or per-record
charges apply, except for the premium collections that have additional
access charges. Site license discounts are also available.
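The plan arithmetic above is easy to check with a short Python sketch. It encodes only the three plans as described here, works in cents to avoid floating-point rounding, and ignores premium-collection charges and site license discounts; the function names are hypothetical. Remember that one search followed by five document retrievals counts as six transactions.

```python
# Plan terms as reported in this article, in cents to avoid float rounding.
PLANS = {
    "standard":   {"monthly": 995, "included": 100, "per_extra": 10},
    "light":      {"monthly": 195, "included": 10,  "per_extra": 15},
    "occasional": {"monthly": 0,   "included": 0,   "per_extra": 20},
}

def monthly_cost_cents(plan: str, transactions: int) -> int:
    """Monthly fee plus charges for transactions beyond the included allowance."""
    terms = PLANS[plan]
    extra = max(0, transactions - terms["included"])
    return terms["monthly"] + extra * terms["per_extra"]

def cheapest_plan(transactions: int) -> str:
    """Return the plan with the lowest monthly cost for a given usage level."""
    return min(PLANS, key=lambda p: monthly_cost_cents(p, transactions))

for n in (6, 9, 10, 63, 64, 150):
    print(n, cheapest_plan(n), monthly_cost_cents(cheapest_plan(n), n))
```

Under these terms the occasional plan is cheapest below ten transactions a month, the light plan from ten through 63, and the standard plan from 64 up.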
In what should prove to be a very effective marketing move,
InfoSeek supplements its commercial offerings with several free
services. Its WWW Pages database is available free of charge via
Netscape's Internet Search page at
http://home.netscape.com/home/internet-search.html. Searching is
free, but the display is limited to the first ten references.
(Registered users can display up to 200 per transaction.) InfoSeek also
maintains an extensive, well-organized file of Frequently Asked
Questions (FAQ) about the service, which anyone can search for free at
http://www.infoseek.com/FAQQuery. In addition to free searching of
the WWW Pages database and the FAQ, InfoSeek gives new users a free
trial run. The demonstration account covers one month or 100
transactions and includes a $5 credit for either additional
transactions or access to the Premium collections.
WWW PAGES DATABASE
The most heavily used InfoSeek database is WWW Pages, which is not
surprising since InfoSeek offers limited free access to it. InfoSeek claims
to have the largest index of WWW pages, but that depends on how you
count pages. A single Web document could be composed of multiple
HTML files and include links to even more URLs. According to an
InfoSeek comparison between InfoSeek and Lycos in February of 1995,
the InfoSeek database included more than 214,000 URLs while Lycos
included over 318,000. Because the Lycos figure covers http, gopher,
and FTP URLs while InfoSeek includes only http URLs, InfoSeek still
claims the larger database. It based its claim on the size of the file
containing the raw data: InfoSeek's was 813MB compared to Lycos'
634MB. But part of the reason that the raw
data measurement is larger for InfoSeek is that InfoSeek indexes the
full text of the documents while Lycos does not index entire pages,
only the title, headings, subheadings, hypertext links and the "100
highest weighted words" in the page. The method that is most
effective may well depend on the specific search.
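The trade-off between full-text and selective indexing can be illustrated with a small inverted-index sketch. It is not either service's actual code: the page dictionary layout is hypothetical, and raw word frequency stands in for Lycos' real weighting method, which this article does not describe. The tiny top_n value is used only so the effect shows up on a one-page example.

```python
import re
from collections import Counter

def tokens(text: str) -> list[str]:
    """Lowercase a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def full_text_terms(page: dict) -> set[str]:
    """Index every word on the page (the approach attributed to InfoSeek)."""
    parts = [page["title"], *page["headings"], *page["links"], page["body"]]
    return set(tokens(" ".join(parts)))

def selective_terms(page: dict, top_n: int = 100) -> set[str]:
    """Index title, headings, and link text plus the top_n most frequent body words.

    Raw frequency is an assumed stand-in for Lycos' actual weighting.
    """
    terms = set(tokens(page["title"]))
    for text in page["headings"] + page["links"]:
        terms.update(tokens(text))
    counts = Counter(tokens(page["body"]))
    terms.update(word for word, _ in counts.most_common(top_n))
    return terms

def build_index(pages: dict, term_fn) -> dict[str, set[str]]:
    """Build an inverted index mapping each term to the URLs that contain it."""
    index: dict[str, set[str]] = {}
    for url, page in pages.items():
        for term in term_fn(page):
            index.setdefault(term, set()).add(url)
    return index

pages = {
    "http://example.jp/travel.html": {   # hypothetical page
        "title": "Miyazaki travel",
        "headings": ["Rivers"],
        "links": ["Map"],
        "body": "river river river orange orange valley valley oyodo",
    }
}
full = build_index(pages, full_text_terms)
short = build_index(pages, lambda p: selective_terms(p, top_n=3))
print("oyodo" in full, "oyodo" in short)  # True False
```

The full-text index is larger and catches the word mentioned only once, while the selective index is smaller but misses it, which is why the better method can depend on the search.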
Results from an InfoSeek WWW Pages search can be seen in Figure
2. The top line of each record is the title of the document and is
highlighted as the hypertext link to the resource. The title is followed
by a brief description taken from the beginning of the body of the
document. The URL of the resource on the next line is followed by a
page-size designation in kilobytes, which can be especially useful to
those on a slow connection.
InfoSeek updates its WWW database weekly, paying special
attention to submitted URLs and ones mentioned in the press. In
addition, an update is run on the entire database once a month. This
ensures that any content changes in the thousands of pages in the
database are correctly indexed. Maintaining currency in an Internet
index is a delicate balancing act. On the one hand, network documents
change so quickly and often that almost daily verification is necessary
to maintain currency. On the other hand, frequent verification of
thousands of resources consumes a huge amount of bandwidth and places an
undue burden on the servers hosting the individual pages. InfoSeek's
once-a-month approach strikes a happy medium.
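A two-tier revisit schedule like the one described above might be sketched as follows. The article does not say on which day either pass runs, so the choice of Mondays for the weekly priority pass and the first of the month for the full pass is purely an assumption, as is the function name.

```python
from datetime import date

def urls_due(today: date, all_urls: set[str], priority_urls: set[str]) -> set[str]:
    """Return the URLs a crawler should revisit on a given day.

    Hypothetical schedule: priority URLs (submitted or mentioned in the press)
    every Monday, and every known URL on the first day of each month.
    """
    due: set[str] = set()
    if today.weekday() == 0:      # Monday: weekly priority pass
        due |= priority_urls
    if today.day == 1:            # first of the month: full pass
        due |= all_urls | priority_urls
    return due

all_urls = {"http://a.example/", "http://b.example/", "http://c.example/"}
priority = {"http://new.example/"}
print(len(urls_due(date(1995, 5, 8), all_urls, priority)))  # 1 (a Monday)
print(len(urls_due(date(1995, 5, 9), all_urls, priority)))  # 0
print(len(urls_due(date(1995, 6, 1), all_urls, priority)))  # 4 (full pass)
```

Most days cost nothing, the weekly pass touches only a small priority set, and the expensive full pass happens once a month.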
LYCOS AND WEBCRAWLER
InfoSeek's WWW Pages database is an impressive index, but how
does it compare to the other two major indexes, Lycos and
WebCrawler? The numbers given in the InfoSeek comparison mentioned
earlier do not quite tell the whole story. While the comparison cites
Lycos' more than 300,000 explored URLs, it neglects to point out that
Lycos also included over a million unexplored URLs with descriptions. By
April of 1995, Lycos boasted over 3.3 million "unique URLs," including
the explored and unexplored. Some of the additional URLs can be
attributed to the Lycos inclusion of FTP and gopher resources. The
numbers for WebCrawler are also confusing. Its database includes over
100,000 "explored" documents and another 900,000 "unexplored"
documents.
So which of the three is the most comprehensive? None of them
alone. Any attempt at the impossible "comprehensive" Internet search
must include at least all three. To find very distinctive keywords to
try on all three, I explored some Japanese Web sites that included
references to the Oyodo River and the Hyuga orange. Yet none of the
three indexes found these pages or any reference to them based on a
simple single-word search of "oyodo" or "hyuga". With other single- and
multiple-word searches, each of the three databases turned up unique
items not seen on the other two. In general, Lycos had the highest
number of hits but less precision than InfoSeek. Some of the Lycos
records are duplicates or too dated to be of use. WebCrawler usually
returned fewer hits than either of the other two, but occasionally it
retrieved relevant documents not found by either InfoSeek or Lycos.
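Comparing the three indexes this way amounts to simple set arithmetic on each result list. The sketch below is illustrative only; the example.com-style URLs are invented placeholders, not results from the searches described above.

```python
def coverage_report(results: dict[str, set[str]]) -> tuple[set[str], dict[str, list[str]]]:
    """Return the union of all hits and, for each index, the hits no other index found."""
    union = set().union(*results.values())
    unique = {}
    for name, hits in results.items():
        others = set().union(*(h for n, h in results.items() if n != name))
        unique[name] = sorted(hits - others)
    return union, unique

# Placeholder result sets for one hypothetical query.
results = {
    "InfoSeek":   {"http://a.example/", "http://b.example/"},
    "Lycos":      {"http://b.example/", "http://c.example/", "http://d.example/"},
    "WebCrawler": {"http://d.example/", "http://e.example/"},
}
union, unique = coverage_report(results)
print(len(union))             # 5
print(unique["WebCrawler"])   # ['http://e.example/']
```

Every index contributes at least one item the others miss, so dropping any one of them shrinks the union.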
AVAILABILITY
One major problem with existing Internet indexes is that they have
become overwhelmed with use and can be difficult to reach. With a
free service, popularity rarely attracts the necessary capital for
upgrading equipment to handle the increased load. The "Big Lycos"
database is often so busy that search requests are refused.
WebCrawler has the same problem. The ready availability of InfoSeek
at all hours is a significant advantage over the free indexes--one for
which many may be willing to pay. However, even the commercial
InfoSeek is not without its availability problems. InfoSeek states up
front that the free ten-hit WWW Pages search is not its priority and will
not always be available. In addition, even the commercial version was
not available at all times. While InfoSeek's availability is much, much
better than that of Lycos and WebCrawler, it is not yet perfect.
WEB POSITIONING
Companies, libraries and any other organization that would like to
establish an Internet presence should be aware of the major Internet
databases. Does your library or company have a home page? If so, a
good test of any Web database is to try to find that local home page. In
the event that it is not available, all three of the major WWW indexes
give you the opportunity to register the URL of your home page. In
submitting URLs, be sure to avoid any typos. Depending on the index, it
may take a few days or several weeks for the submitted URLs to show
up in the database.
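Because a mistyped URL can sit in an index for weeks, a quick sanity check before submitting is worthwhile. This sketch uses Python's standard urllib.parse; the function name and the specific checks (no whitespace, an http or https scheme, a dotted host name) are assumptions about what counts as an obvious typo, not any index's actual submission rules.

```python
from urllib.parse import urlsplit

def check_submission(url: str) -> list[str]:
    """Return a list of likely typos in a URL (empty list if none are found)."""
    problems = []
    if any(ch.isspace() for ch in url):
        problems.append("contains whitespace")
    parts = urlsplit(url.strip())
    if parts.scheme not in ("http", "https"):
        problems.append(f"unexpected scheme {parts.scheme!r}")
    host = parts.hostname or ""
    if "." not in host:
        problems.append("host name missing or has no dot")
    return problems

print(check_submission("http://www.example.edu/library/"))  # []
print(check_submission("http:// www.example.edu/"))          # ['contains whitespace']
print(check_submission("htp://www.example.edu/"))            # ["unexpected scheme 'htp'"]
```

A check like this catches stray spaces and misspelled schemes, but not a wrong directory name, so it is still worth loading the URL in a browser before submitting it.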
The Usenet News database available from InfoSeek presents another
important opportunity. Since it can be difficult to guess which of the
thousands of newsgroups may mention a specific organization or
person, a single InfoSeek search covers all of them at once.
Was a competitor recently mentioned in rec.humor.funny or a complaint
posted in misc.consumers? The Usenet database also can be used as a
way to determine which newsgroups most frequently discuss certain
topics.
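Once a Usenet search returns its hits, finding the groups that discuss a topic most is a matter of tallying. The sketch assumes each hit exposes its comma-separated Newsgroups header line as a string (a hypothetical record layout); cross-posted articles count once for each group they name.

```python
from collections import Counter

def newsgroup_profile(hits: list[dict], top: int = 5) -> list[tuple[str, int]]:
    """Count how many matching articles appeared in each newsgroup."""
    counts = Counter()
    for hit in hits:
        # A cross-posted article counts once for each distinct group it names.
        groups = {g.strip() for g in hit["newsgroups"].split(",") if g.strip()}
        counts.update(groups)
    return counts.most_common(top)

# Hypothetical hit records for a search on a company name.
hits = [
    {"newsgroups": "misc.consumers"},
    {"newsgroups": "misc.consumers,alt.business"},
    {"newsgroups": "rec.humor.funny"},
]
print(newsgroup_profile(hits, top=1))  # [('misc.consumers', 2)]
```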
FUTURE WISHES
As Powell points out, one of the great advantages of the Web and
its HyperText Markup Language is that documents are structured [3].
HTML documents can have titles, headings for major sections, and
named hypertext links. While the tools that build these databases may
look at these specific fields when collecting data, and Lycos returns search
results in records with definite field labels, none of the indexes
provide a simple field-searching option. Although Pinkerton, the
WebCrawler developer, notes that "titles are an optional part of an
HTML document, and 20% of the documents that the WebCrawler visits
do not have them," [4] the ability to restrict certain words to the title
or named hypertext links of a document could help improve precision.
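The kind of field searching wished for above could work roughly like this sketch, which uses Python's standard html.parser to sort each page's text into title, link-text, and body fields and then optionally restricts a one-word query to one field. The field names, functions, and sample pages are all hypothetical. A page with no title simply has an empty title field, so title-restricted searches would miss the untitled pages Pinkerton mentions.

```python
import re
from html.parser import HTMLParser
from typing import Optional

class FieldExtractor(HTMLParser):
    """Sort a page's text into title, link-text, and body fields."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: dict[str, list[str]] = {"title": [], "link": [], "body": []}
        self._open: list[str] = []   # stack of currently open <title>/<a> tags

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "a"):
            self._open.append(tag)

    def handle_endtag(self, tag):
        if self._open and self._open[-1] == tag:
            self._open.pop()

    def handle_data(self, data):
        current = self._open[-1] if self._open else None
        field = {"title": "title", "a": "link"}.get(current, "body")
        self.fields[field].append(data)

def extract_fields(html: str) -> dict[str, set[str]]:
    """Return the set of lowercase words found in each field of a page."""
    parser = FieldExtractor()
    parser.feed(html)
    parser.close()
    return {name: set(re.findall(r"[a-z0-9]+", " ".join(chunks).lower()))
            for name, chunks in parser.fields.items()}

def field_search(docs: dict[str, str], word: str, field: Optional[str] = None) -> list[str]:
    """Return URLs whose text contains word, optionally restricted to one field."""
    word = word.lower()
    hits = []
    for url, html in docs.items():
        fields = extract_fields(html)
        searched = [fields[field]] if field else fields.values()
        if any(word in terms for terms in searched):
            hits.append(url)
    return sorted(hits)

docs = {
    "http://lib.example/news.html": "<html><head><title>Library News</title></head><body>Hours and services.</body></html>",
    "http://lib.example/links.html": "<html><body>See the <a href='cat.html'>library catalog</a> today.</body></html>",
    "http://lib.example/notice.html": "<html><body>The library is closed Monday.</body></html>",
}
print(field_search(docs, "library"))            # all three pages
print(field_search(docs, "library", "title"))   # ['http://lib.example/news.html']
```

Restricting the query to the title field cuts three hits to one, which is the precision gain the paragraph above describes.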
Only one InfoSeek database can be searched at a time (with a
separate transaction cost for each). Multiple database searching could
be a major time saver for the busy searcher; so could the addition of a
current awareness service. Continued active
expansion of the WWW Pages database will be essential to maintaining
the database as an effective indexing tool. If gopher, telnet, and FTP
resources are not added to the WWW Pages database, perhaps InfoSeek
will develop a new database to cover those resources. Until that is
accomplished, InfoSeek can be considered only a partial search for
Internet resources.
InfoSeek is not without its problems. It is only a small databank
with a sophisticated but limited search language. The WWW Pages and
Usenet News archives are important databases that have been
combined with significant computer science and general databases. A
bit more growth in the number of available databases is still needed.
Yet, the savvy shown in its current marketing approach and pricing
structure may uniquely position InfoSeek to become a major player in
the online information marketplace. Even if it never lives up to that
potential, the WWW Pages database offers a significant, although far
from comprehensive, step in the right direction for creating access to
the wealth of Internet information resources.
REFERENCES
[1] Notess, Greg R. "Searching the World-Wide Web: Lycos, WebCrawler,
and More." _ONLINE_ 19, No. 4 (July/August 1995): pp. 48-53.
[2] "InfoSeek Home Page." http://www.infoseek.com/
[3] Powell, James. "Adventures with the World Wide Web: Creating a
HyperText Library Information System." _DATABASE_ 17, No. 1 (Feb.
1994): pp. 59-66.
[4] Pinkerton, Brian. "Finding What People Want: Experiences with the
WebCrawler." Electronic Proceedings of the Second World Wide Web
Conference '94: Mosaic and the Web.
http://www.ncsa.uiuc.edu/SDG/IT94/Proceedings/Searching/pinkerton/WebCrawler.html (1994).
Communications to the author should be addressed to Greg R. Notess, Montana State University Libraries, Bozeman, MT 59717-0332, 406/994-6563; Internet--greg@notess.com ; http://www.notess.com.
Copyright © 1995, Online Inc. All rights reserved.