[On the Nets]
Greg R. Notess
Reference Librarian
Montana State University

----------------------------------------------------------------

The Internet as an Online Service:
Bibliographic Databases on the Net


DATABASE, August 1996
Copyright © Online Inc.

Despite many voices of reason, a common misconception about the Internet is that all the information in the world is available on it. People who are new to the world of online information find interesting Web sites and a great quantity of information and then assume that everything else can be found there as well. Information professionals know the limits of online sources, and the Internet in particular. Librarians find students spending hours on the Net when a quick search in Readers' Guide or an almanac would find the answer sooner.

How does the Internet compare with the online services? Can the information content of Chemical Abstracts, ERIC, INSPEC, and ABI/Inform be found on the Internet? Since most online services can be reached via the Internet, all of the databases within the services are technically Internet-accessible, but when most people refer to the Internet they are talking about freely-accessible information sources. Given that definition, the answer remains no. Internet-accessible bibliographic databases represent only the rudimentary beginnings of a competitor to commercial online services. The databases are relatively few and far between, have little depth in years of coverage, and feature only simple search syntax. The U.S. federal government produces or sponsors a number of major bibliographic databases. As they were in the early days of online services, these databases have been in the forefront of Internet-accessible databases.

ERIC

Since ERIC is DIALOG database number 1, it is only fair to begin exploring the Internet as an online service by looking at the availability of ERIC on the Net. ERIC has been available for a number of years, at different sites and in different formats. Back in 1993, this column discussed the availability of ERIC via several OPACs [1]. Since then, ERIC became accessible via gopher and then on the Web. The ERIC Clearinghouse on Information and Technology sponsors the AskERIC Web site, which includes the ERIC database at http://ericir.syr.edu/Eric/. The site uses a forms interface for searching, and then gives results in HTML format. The form for search input includes Boolean capabilities and field searching. The search engine, from Personal Library Software, ranks the search results according to relevancy.

The availability of the ERIC database in an easy-to-use format on the Web does a great service to end-users interested in educational topics. However, the professional searcher will quickly become disappointed by this resource. The database coverage only goes back to 1991. Even more frustrating is the unreliability of the site, suffering as it does from unpredictable down times. The search interface does not permit nested search statements, no online thesaurus is available, and search results are not saved as sets, so there can be no post-search combining of previous searches. The documentation states that "the most relevant document will appear first," which is what all relevancy ranking engines claim. How well that succeeds is something users should judge on their own. Other sort options are not even available: the user cannot request a reverse chronological list or an arrangement by author.
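The missing sort options are simple to describe in code. The following is only a sketch in Python of what a reverse chronological list or an author arrangement amounts to once records have been retrieved. The record fields (author, year, title) and the sample records are made up for illustration and do not reflect the actual AskERIC output format.

```python
from operator import itemgetter

# Hypothetical records, invented for illustration only.
SAMPLE_RECORDS = [
    {"author": "Smith, J.", "year": 1993, "title": "Record A"},
    {"author": "Adams, R.", "year": 1995, "title": "Record B"},
    {"author": "Jones, L.", "year": 1991, "title": "Record C"},
]


def reverse_chronological(records):
    """Return a new list with the most recent records first."""
    return sorted(records, key=itemgetter("year"), reverse=True)


def by_author(records):
    """Return a new list arranged alphabetically by author, ignoring case."""
    return sorted(records, key=lambda record: record["author"].lower())


if __name__ == "__main__":
    for record in reverse_chronological(SAMPLE_RECORDS):
        print(record["year"], record["author"], record["title"])
```

Either function could be applied to a relevancy-ranked result list after retrieval, which is the kind of output control that online and CD-ROM versions of ERIC already provide.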

Compared to the search and display capabilities of a good CD-ROM or online version of ERIC, this Web version is quite primitive. However, for quick searches of recent years, this can still be a useful resource. When it is functioning properly, it can be more convenient for end-users than coming into a library to search a CD-ROM. In addition, for those seeking more advanced capabilities who are willing to deal with more problematic connections, the ERIC database is still available through a few other options, including gopher, telnet, and tn3270. The ERIC Clearinghouse on Assessment and Evaluation lists these at http://www.cua.edu/www/eric_ae/search.html.

GOVERNMENT PUBLICATIONS

Another bibliographic database from the government is the Government Printing Office (GPO) Monthly Catalog. This database of citations to government publications can be searched on the GPO Web site at http://www.access.gpo.gov/su_docs/dpos/adpos400.html. This is another database that is frequently available electronically in libraries (at least depository libraries) and on online services. The daily updates on the Web site offer access to more recent records than most other electronic versions that update only monthly or quarterly. Another advantage to the Web version of this database is the locator service. Not only can GPO records be searched, but once records are displayed, a button labeled Locate Libraries produces another form that gives state and ZIP code access to depository libraries that should have the specific document.

Unfortunately, this version of the Monthly Catalog database suffers from numerous problems similar to those of the ERIC database. First, it only covers items back to January 1994. Most online and CD-ROM versions include the entire run available electronically, back to July 1976. Search options on this WWW version include field searches by title, SuDoc number, item number, stock number, and publication year. Boolean operators, phrase searches, and truncation are supported in the WAIS interface. As with ERIC, nesting is absent and searches are not saved as sets. The absence of index browse capabilities, direct author searching, and post-search combination of previous searches greatly limits the functionality of this database.

TECHNICAL REPORTS

Yet another government database freely available on the Internet is the DOE Reports Bibliographic Database at http://www.doe.gov/html/dra/dra.html. Like the Monthly Catalog, this database only covers items back to January 1994. On the plus side, it also links into a locator service for finding depository libraries in a specified state that should own the exact report. Note that these bibliographic records are not available in the GPO Monthly Catalog as separate items, although they do have SuDoc numbers, so this is not just a subset of the Monthly Catalog database. Besides the SuDoc numbers, the DOE Reports records include the NTIS order number and the primary report number. Given the information provided, it should be relatively easy to find individual documents in a depository library or from NTIS.

The DOE Reports Database covers a significant portion of the report literature in energy research. The dates of coverage limit the utility of the database now, but as time goes on and more records are added, it should become increasingly useful. Boolean operators, phrase searching, field searching, and truncation provide some advanced search options, but the database is still hamstrung by the same limitations as ERIC and the Monthly Catalog. Another problem is that Netscape (version 2.0 and above) cuts off the last part of the display in many records, usually the library locator portion.

Another technical reports bibliographic database is offered by NASA: the Center for AeroSpace Information (CASI) Technical Report Server or RECONselect. Available at http://www.sti.nasa.gov/RECONselect.html, this service features three databases that provide much broader coverage than just 1994 to the present. The NASA Technical Reports database covers records that appear in Scientific and Technical Aerospace Reports (STAR), and this database goes back to 1962. The NACA Technical Reports database covers 1915 to 1960 and includes reports from NASA's predecessor "as well as various other aviation reports." The third database, dubbed Open Literature, covers 1962 through the present and includes selected aeronautical and space science literature published in other sources. With options to search the databases separately or all together, RECONselect provides access to a substantial number of citations and abstracts in aeronautics and the space sciences.

A new, more sophisticated search interface is under development, and can be explored through the CASI TRS Version 3 link. The "enhancements include a revised search results headline display, improved search capabilities, and a new document display format." The former display format was in a fixed width font while the newer version is in HTML with hypertext links when appropriate. Comparing the search capabilities of the two versions demonstrates some of these differences, but even more significantly, it turns out that no records appear with a publication date of 1995 or 1996 in the default version. The Version 3 database must be searched to retrieve records published after 1994. Presumably, the new version will shortly replace the current default database.

THE WAIS COMPONENT

One of the most common database formats underlying bibliographic databases on the Internet is a Wide Area Information Server (WAIS) interface. While it is often the availability of WAIS that makes Internet access to these databases possible, available WAIS interfaces also limit the search features available. Not all implementations of WAIS support Boolean queries. Those that do typically require that the operators be input in uppercase: AND, OR, NOT, ADJ. Output options and sorts other than relevancy rankings are not generally available, although WAIS itself does not necessarily rule them out. The NASA Technical Reports (Version 3) server gives six sort order options: relevancy ranking, reverse relevancy ranking, largest first, largest last, alphabetical, and reverse alphabetical. Yet commonly requested output orders, such as reverse chronological and author last names, are not available.
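The uppercase requirement is easy to overlook when typing a query. As a rough illustration only, the Python sketch below uppercases the four operators named above before a query is submitted. The function name is hypothetical, and the approach is an assumption rather than anything a WAIS server or client provides.

```python
# The four operators named in the text, stored in lowercase for matching.
WAIS_OPERATORS = {"and", "or", "not", "adj"}


def uppercase_operators(query: str) -> str:
    """Uppercase Boolean operator words in a query; leave other words as typed.

    Limitation: a word such as "or" that was meant as an ordinary search
    term is uppercased too, since this sketch cannot tell the difference.
    """
    words = query.split()
    return " ".join(
        word.upper() if word.lower() in WAIS_OPERATORS else word
        for word in words
    )


if __name__ == "__main__":
    print(uppercase_operators("laser adj cavities not fiber"))
```

A searcher who types "laser and cavities" would thus send "laser AND cavities" to a server that only recognizes the uppercase form.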

WAIS does allow for natural language queries, but in the GPO, DOE, and NASA databases, a search is far more effective with keywords, such as "laser cavities," than trying something like "I am looking for studies on laser cavities." Single keyword searches may work as well in a WAIS database as in most other database structures, but the success of more complex searches depends on the quality of the WAIS interface and even the version of WAIS being used.

UnCoverWeb

Government bibliographic databases are not the only ones available on the Web. For the past few years, UnCover has been one of the few general periodical indexes freely accessible on the Internet. World Wide Web access to UnCover can now be found at http://www.carl.org/uncover/. UnCover continues to be a significant table of contents resource, covering over 17,000 periodicals. The Web interface is very similar in search syntax to the telnet interface, with keyword searching and periodical title browsing as the main access points. No new search features have yet been included in this Web front end. So far, it also seems to be faster than the telnet connection, which could get so slow that the connection would time out before a search result displayed. The Web interface should make the database more accessible to end-users than the telnet interface.

The search capabilities of UnCoverWeb remain fairly primitive. Even with a Web connection, it seems to react more slowly than other sites. The absence of subject indexing and a controlled vocabulary in the UnCover database remains a major limitation to its utility for comprehensive subject searching. Typographical errors further complicate retrieval. Yet it is the only general periodical index with coverage in all fields that is Internet-accessible without a fee and without any required registration.

FUTURE DIRECTIONS

It seems certain, at least for the short-term future, that the Web will continue to be the primary area of development for new Internet bibliographic databases. The limitations of WAIS should cause those seeking to make such databases accessible to look at other software options or more advanced uses of WAIS. Not surprisingly, one of the most promising developments in the realm of Web front ends to online databases comes from a commercial online service. Questel*Orbit has just brought up QPAT*US (http://www.qpat.com/), a WWW service providing access to U.S. patents issued since 1974. The service includes access to the full text of patents for registered, paying customers. However, they also offer a free database of citations and abstracts, simply requiring no-cost registration.

Not only does QPAT accept natural language searches, field searches, Boolean operators, and nesting, but it keeps track of previous queries in search sets. At last a WWW form-based search permits combining of search sets, not just search terms. So many other Web databases give the searcher only one search statement. The query can be redone, but such search forms give no option to combine different sets. Considering that it is only recently that some forms of Boolean capabilities have become common on the Web, it should come as no surprise that advanced search features have taken so long. Whatever the reason for the delay, this search feature is long overdue. Finally, online searchers can try a World Wide Web interface that can run advanced searches using the basic search capabilities that they have come to expect of online services and well-designed CD-ROMs. QPAT even does automatic truncation, searching for word variants rather than just any word starting with the specified stem, and provides an option to turn off the automatic truncation.
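For readers less familiar with set-based searching, the idea of combining search sets rather than search terms can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of how QPAT*US is implemented: the class name, the in-memory records, and the exact-word matching are all hypothetical.

```python
class SearchSession:
    """Toy search session that saves each result as a numbered set."""

    def __init__(self, records):
        # records: dict mapping a record ID to its text (hypothetical data)
        self.records = records
        self.sets = []  # saved result sets; set number 1 is self.sets[0]

    def search(self, term):
        """Find records containing the word, save the set, return its number."""
        term = term.lower()
        hits = {
            record_id
            for record_id, text in self.records.items()
            if term in text.lower().split()
        }
        self.sets.append(hits)
        return len(self.sets)

    def combine(self, first, operator, second):
        """Combine two saved sets with AND, OR, or NOT; save and number the result."""
        left, right = self.sets[first - 1], self.sets[second - 1]
        if operator == "AND":
            result = left & right
        elif operator == "OR":
            result = left | right
        elif operator == "NOT":
            result = left - right
        else:
            raise ValueError("operator must be AND, OR, or NOT")
        self.sets.append(result)
        return len(self.sets)


if __name__ == "__main__":
    session = SearchSession({
        1: "laser cavity design",
        2: "laser printer patent",
        3: "optical cavity resonator",
    })
    s1 = session.search("laser")
    s2 = session.search("cavity")
    s3 = session.combine(s1, "AND", s2)
    print(s3, sorted(session.sets[s3 - 1]))
```

The point of the design is that each query leaves behind a reusable set. A single-statement search form discards that work, so every refinement means retyping and rerunning the whole query.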

Some field searches are restricted to paying subscribers, but, except for that limitation, QPAT*US offers a remarkably well designed and very powerful interface to the U.S. patents bibliographic database. Compare the features of QPAT to those of the CNIDR U.S. Patents Project at http://patents.cnidr.org:4242. CNIDR has done a remarkable job of creating a functional interface to the peculiarities of a WAIS database. Three different search options--simple, Boolean, and advanced--offer field searching, Boolean operators, nesting, truncation, and finally a system that can sort results chronologically. Like QPAT, the database covers back to 1974. CNIDR is to be commended for an excellent set of search options that are well ahead of most other databases mentioned in this column. Yet after seeing the speed and the ability to combine previous search sets on the QPAT server, it is tough to go back to CNIDR. Trying the search on CNIDR, in a single search statement, resulted in the message, "Sorry, your search took longer than 6 minutes and has timed out." [Editor's Note: For an in-depth look at QPAT*US, see the review by patent expert Nancy Lambert in this issue. --PH]

For online searchers used to the comfort and reliability of commercial online services, QPAT represents the possibilities of the future. Questel*Orbit presents a compelling vision of how to transfer the search capabilities of a command line system into a WWW interface. In addition, they follow the practice of UnCover in that they freely give away access to the bibliographic database in the hopes of reaping profits through selling access to the full-text portion. While it is too much to expect that all bibliographic databases will be free one day, this kind of WWW interface can at least help in moving completely away from connect-time charges. Let the transition begin!


Featured Databases

ERIC Database http://ericir.syr.edu/Eric/
GPO Monthly Catalog http://www.access.gpo.gov/su_docs/dpos/adpos400.html
DOE Technical Reports http://www.doe.gov/html/dra/dra.html
NASA Technical Reports http://www.sti.nasa.gov/RECONselect.html
UnCover Web http://www.carl.org/uncover/
QPAT*US http://www.qpat.com/
CNIDR Patents Project http://patents.cnidr.org:4242/

REFERENCE

[1] Notess, Greg R. "Offspring of OPACs: Local Databases On The Net." DATABASE 16, No. 3 (June 1993): pp. 108-110.

----------------------------------------------------------------

Communications to the author should be addressed to Greg R. Notess, Montana State University Libraries, Bozeman, MT 59717-0332; 406/994-6563; greg@notess.com ; http://www.notess.com.


Copyright © 1996, Online Inc. All rights reserved.