Tod Shacklett
LIBR 289
16 April 2001

Topic #6:

Integration of various formats into collections in Libraries and Information Centers has always provided challenges to the staff. Over the years special handling, indexing, or cataloging required for nonbook formats such as microforms, videos, CD-ROMs, serials, and other items have required specialized procedures. Now that the resources on the Internet can be part of the resources of any library or information center, what are the issues surrounding the integration of this new format into the library or information center? What is unique about the Internet resources, what is the same? What methodologies will be used?

Style Manual:

American Psychological Association. (1994). Publication manual of the American Psychological Association. (4th ed.). Washington, DC: Author.

American Psychological Association. (2000). Electronic reference formats recommended by the American Psychological Association. [Web site]. Washington, DC: Author. Retrieved April 6, 2001, from the World Wide Web: http://www.apa.org/journals/webref.html

Internet Resources and the Library Catalog

The explosive growth of the Internet in recent years has made it a massive medium for information exchange. Because libraries are in the information business, it is incumbent upon them to include at least some Internet resources in their collections. The pertinent questions are which resources to include and how to incorporate them into what are primarily print collections. Internet resources are worthy of inclusion in library collections, provided those resources meet established selection criteria. Cataloging rules should also be modified to facilitate the inclusion of these resources.

Selection

Internet resources should be selected based on the same criteria used for other, more traditional materials, namely the value of those resources to the users of the collection. (Lam, 2000, pp. 52-53) Web resources are plentiful, and almost anyone who has Internet access probably also has a plethora of favorite Web sites to recommend. Without a clear selection policy, librarians may be flooded by requests to catalog sites that add little or no value to the collection. Cataloging policies and practices should be shaped by a formal recognition of the value of resources, and "the logical starting point is an existing collection development policy for any given subject area" (Porter & Bayard, 1999, p. 393). Alternatively, the library may prefer to develop a separate document dealing specifically with Internet resources (Porter & Bayard, 1999, p. 393); however, the selection criteria should still mirror those used in the selection of print resources.

At the Nashville State Tech Library, the selection policy for Internet resources evolved as a matter of necessity. Librarians began by developing lists of Internet resources and placing those lists on the library’s home page. Links were added somewhat randomly, and some staff members contributed personal link lists, or bookmark files, to the home page as well. Despite the fact that the collection of links was poorly organized, it was discovered that many students began using the Internet resources exclusively and ignoring the library’s physical collection. To counteract this problem, the decision was made to add the Web resources to the library’s catalog. Efforts were turned towards "doing those fundamental things for digitized resources that we have always done best for other media—selecting and cataloging the best and most appropriate materials for our clientele" (Veatch, 1999, p. 64).

After the decision was made to add Web resources to the catalog, librarians at the Nashville State Tech Library used a number of different criteria in their selection of those resources. The content had to be valuable to the users, have a good graphical presentation, "and at least appear to have some degree of stability—traits usually found in .org and .gov sites" (Veatch, 1999, pp. 65-66). Academic uniform resource locators (URLs) may have been included but were regarded with some suspicion. Those resources could be student-constructed and therefore unstable, disappearing when the author graduates. Personal home pages were also avoided because of the difficulty in verifying the information that those pages contain. Also avoided were pages that charged a fee for access to major portions of the site, although some may have been included if significant free information could be found there. Sites were also not included if "there is no identifiable corporate or personal author or publisher—an infrequent occurrence, although we’ve been surprised to find as many as we have" (Veatch, 1999, pp. 65-66).

The Nashville State Tech Library staff did encounter some problems. Few selection aids were found for Internet resources, compared to the scores available for print resources. Also, few Online Computer Library Center (OCLC) records were found for the Internet resources, forcing the cataloging staff to do more time-consuming original cataloging. (Veatch, 1999, p. 65) This study was done in 1997, however, and these problems may be less acute today. Sources for reviews of print materials now often cover Internet resources as well. Examples include Choice, whose reviews "carefully describe the sites in terms of content, design, further links, for which type of library it is most suitable, and the authority of the producer," and College & Research Libraries News, "which describes Web sites on a specific topic each month" (Porter & Bayard, 1999, p. 393). Despite these problems, the library has increased the depth of its collection by using Internet resources and has reported some satisfaction with the stability of the sites selected. (Veatch, 1999, p. 66)

A study by the University of Notre Dame Libraries on the feasibility of adding Internet resources to the library catalog unfolded along similar lines. Placing links to Internet resources on the home pages required users to search both those pages and the online public access catalog (OPAC). (Porter & Bayard, 1999, p. 390) When the decision was made to include Internet resources in the OPAC, "they emphasized cataloging sites by ‘reliable’ producers of information and committing to using link-checking software as a database maintenance procedure. Few of their sites changed URLs or went under during the project" (Weiss & Carstens, 2000, p. 52).

One of the questions raised in their study was how much a library should spend in human resources to provide users with access to free Internet resources. (Porter & Bayard, 1999, p. 390) Should paid catalogers spend time cataloging free resources? This concern seems overstated, as organizing those resources and making them available certainly adds value. In fact, the authors conclude that

Quality control, or the library’s stamp of approval, is important to users, as is enhanced access through subject headings and notes that allow users to retrieve Web sites along with information in more traditional formats on the same subject, the cherished ‘one-stop-shopping’ idea. A Web-based catalog that allows for almost instantaneous retrieval of resources is powerful in terms of user satisfaction. (Porter & Bayard, 1999, p. 394)

Perhaps part of the selection criteria should be a desire to evaluate and provide access to resources that the library patron is likely to use, regardless of whether or not the library actually owns those resources. "The library’s catalog has exceeded its traditional role. It is no longer [an] inventory of what the library owns but rather [to] what the library has access. . ." (University of Rochester River Campus Libraries, 1997, as cited in Weber, 1999, p. 300).

Stability

In addition to being selected on the basis of established criteria, Internet resources must also be relatively stable. (Lam, 2000, pp. 52-53)

Complaints about Web resources invariably center on the difficulties in organizing and archiving them, on the inconsistent quality among Web sites, and the disappearance of their uniform resource locators (URLs), resulting in the dreaded ‘404’ message. (Porter & Bayard, 1999, p. 390)

In a study primarily concerned with the availability of electronic resources cited in academic journal articles, Carol Anne Germain (2000) reported that the average lifetime for a URL is forty-four days. Further,

A longitudinal study undertaken by Wallace Koehler reviewed the persistence of 361 randomly chosen Web sites and Web pages over one year. Results of this study found that 110 (31%) of the Web sites and Web pages failed to respond at the final test. This electronic environment, though very exciting and stimulating, also is quite volatile. (p. 361)

In her own study, Germain found that "almost 50 percent of the URL citations could not be accessed and two-thirds of the journal articles contained corroded citations" (p. 363). The fact that Germain studied article citations and not cataloged Internet resources probably makes a significant difference in the findings. One would probably not choose articles to cite in a paper with as much concern for retrievability as one would have when choosing Web resources for a library catalog.

In the University of Notre Dame study discussed above, librarians were well aware of the problem of disappearing URLs. This was taken into account in the selection process. Sites were more likely to be chosen for cataloging if there was reason to believe that those sites would remain stable.

At two-month intervals after the project ended, the stability of the selected sites was verified. One site moved and another was redesigned, but the content remained the same. In the final analysis, the only site to disappear happened to be a personal home page. The stability is in large part due to careful selection of resources, mostly from academic institutions, government sources, or large digitization projects not likely to be abandoned. (Porter & Bayard, 1999, p. 392)

Selecting sites that appear to be stable seems to be a bit like selecting lottery numbers that appear to be lucky, but it is true that certain resources are likely to be more long-lived than others. The Library of Congress (www.loc.gov), for example, will likely remain at the same address for some time, although individual pages within the site may move or change.
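The periodic verification described above could be automated along the following lines. This is a minimal Python sketch under stated assumptions, not the link-checking software used in either study: the function names, the three status labels ("ok", "missing", "review"), and the idea of passing in the fetching function are all hypothetical choices made for illustration.

```python
from typing import Callable, Dict, Iterable
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a URL, or 0 if no response was obtained."""
    try:
        # A HEAD request asks for the headers only, which is enough to test a link.
        request = Request(url, method="HEAD")
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        # The server answered, but with an error code such as 404.
        return error.code
    except (URLError, OSError, ValueError):
        # Unreachable host, timeout, or malformed URL.
        return 0


def check_links(urls: Iterable[str],
                fetch: Callable[[str], int] = fetch_status) -> Dict[str, str]:
    """Sort cataloged URLs into 'ok', 'missing' (404), or 'review' (anything else)."""
    report = {}
    for url in urls:
        status = fetch(url)
        if 200 <= status < 400:
            report[url] = "ok"
        elif status == 404:
            report[url] = "missing"
        else:
            report[url] = "review"
    return report
```

A check of this kind only shows that an address still answers. It cannot detect the case reported at Notre Dame in which a site was redesigned, so a person would still need to confirm that the content matches the catalog record.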

One way to provide stability to Internet URLs is "to relocate the maintenance issues to a second source, so that the URL in the record can remain the same even if the file moves" (Schneider, 1997, p. 77). OCLC has developed a utility for this purpose called the Persistent Uniform Resource Locator (PURL). (PURL home page, n.d.) The PURL is an Internet address, much like a URL, except that it does not represent an actual location, but is instead an identification code for the resource to which a URL may be linked. "If the URL changes, the record-maintainer can enter the corresponding PURL database to change the address represented by the PURL; the PURL itself remains constant" (Schneider, 1997, p. 77). PURLs are relatively new, and their use is not yet widespread. Extensive use of well-maintained PURLs would ease the problem of unstable URLs considerably, but maintenance of the PURL database would still require some human intervention.
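The indirection that a PURL provides can be pictured with a small lookup table. The Python sketch below is illustrative only and is not OCLC's implementation: the class name, method names, and example.org addresses are hypothetical, and an in-memory dictionary stands in for the PURL database.

```python
class PurlResolver:
    """Toy stand-in for a PURL database: maps a persistent ID to a current URL."""

    def __init__(self):
        self._targets = {}

    def register(self, purl, url):
        """Record the current location of the resource behind a PURL."""
        self._targets[purl] = url

    def update(self, purl, new_url):
        """Point an existing PURL at a new location; the PURL itself never changes."""
        if purl not in self._targets:
            raise KeyError("unknown PURL: " + purl)
        self._targets[purl] = new_url

    def resolve(self, purl):
        """Return the URL the PURL currently stands for."""
        return self._targets[purl]


resolver = PurlResolver()
resolver.register("http://purl.example.org/docs/guide",
                  "http://www.example.org/old/guide.html")

# The catalog record stores only the PURL, never the real address.
catalog_record = {"title": "A Guide", "location": "http://purl.example.org/docs/guide"}

# When the file moves, only the resolver table is edited by the maintainer.
resolver.update("http://purl.example.org/docs/guide",
                "http://www.example.org/new/guide.html")
current_url = resolver.resolve(catalog_record["location"])
```

The catalog record is untouched by the move, which is the benefit Schneider describes. The call to `update` is the human intervention that the scheme still depends on.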

Cataloging

The cataloging of Internet resources differs from the cataloging of print resources primarily in two areas: the description of the changing characteristics of Internet resources and the provision of access to those resources. (Lam, 2000, p. 53) The basic rules used for cataloging in the United States and most other English-speaking countries are the Anglo-American Cataloging Rules, Second Edition (AACR2). As a supplement, the manual entitled Cataloging Internet Resources is generally accepted as the equivalent AACR2 chapter for describing Internet resources. (Porter & Bayard, 1999, p. 391; Anglo-American cataloging rules, 1998; Olson, 1997) A persistent problem with cataloging Internet resources is that the creators of those resources often do not provide the standard bibliographic information that AACR2 requires. "The terms ‘author’, ‘title’, and ‘publication information’ frequently do not have the same meaning when applied to Internet resources. Information deemed essential by libraries may not have the same meaning or context for the producers of Internet resources" (Weber, 1999, p. 301).

In a view somewhat to the contrary, Bella Hass Weinberg (as quoted in Weiss & Carstens, 2000) asserts that indexing the Web is not much different than dealing with any other materials, noting that "there were at the time of her writing approximately two million Web sites, compared to thirty-nine million records for unique materials in OCLC, and thirty million in RLIN," (Weiss & Carstens, 2000, p. 51) showing that the number of Web sites in existence may not be as high as generally believed. Weinberg also likened the structure of Web sites to periodicals, and pointed out "that even a single edition of a book can contain variants and that merely being mutable (like a loose-leaf publication) does not make Web materials unusually novel" (Weiss & Carstens, 2000, p. 51). Her preference was that librarians just catalog Web resources as they would any other materials using existing cataloging rules, rather than create elaborate bookmark files on a library Web page.

Rule Changes

Despite Weinberg’s assertion, many sources agree that some changes to the cataloging rules are called for if Internet resources are to be handled effectively and, in fact, many such rule changes are in the works. One problem that needs to be addressed concerns the multiple versions that must be created if AACR2, specifically Rule 0.24, is followed literally. The implication of Rule 0.24 is that a new record must be created if there is a variation in the physical carrier between two documents, even if the documents have the same intellectual or artistic content. (Anglo-American cataloging rules, 1998, p. 8) This means, for example, that both the print and online versions of a serial must be cataloged as separate records, even if the content is identical. Cataloging identical items separately based only on differences in physical carrier could result in hours of wasted time for the cataloger and frustration for the catalog user who must sort through many retrieved records that contain the same basic content. "In their local catalogs, many libraries are already adding holdings for various versions of serials to only one catalog record. Even though this local practice has been sanctioned by the Cooperative Online Serials Program (CONSER), it is still contrary to AACR2 rules" (Weiss & Carstens, 2000, pp. 48-49).

The cataloging code also "assumes that the content of a document is permanently fixed within a physical object" (Weiss & Carstens, 2000, p. 49).

Although the current rules allow the cataloger to handle this material by taking a ‘snapshot’ (Delsey 1998, 35) of the item while at the same time leaving certain variables vague, this technique is not adequate because there is no way to know when content might have changed or whether the ‘snapshots’ compiled by various catalogers will be sufficiently similar to identify the item. (Weiss & Carstens, 2000, p. 49)

Thus there is no adequate way to reflect the variable nature of the content of Internet resources using the current rules.

The concept of seriality in AACR2 creates another problem for Internet resources. The lack of a predetermined conclusion is an attribute that many Web sites share with serials, but Web sites are generally "not issued in successive parts and do not bear numeric or chronological designations, which are the other AACR2 attributes of seriality" (Weiss & Carstens, 2000, p. 49).

Since these electronic resources lack the other ‘serial’ attributes, they cannot be treated as serials by AACR2; yet because of their continuing nature they cannot be handled effectively as monographs. This means that there is now a large and growing category of bibliographic resources that cannot be handled effectively by the code. (Weiss & Carstens, 2000, p. 49)

There is clearly a need for AACR2 to be changed to conform to the actual characteristics of Internet resources.

Various organizations are working on ways to modify AACR2 to better accommodate Internet resources. In a report prepared for the Joint Steering Committee (JSC) of AACR2 (as cited in Weiss & Carstens, 2000), Jean Hirons and others attempted to redefine the notion of seriality. First, a distinction is drawn between finite and continuing publications. Finite works are complete, whereas continuing works "are intended to be continued for an indeterminate period and include traditional serials" (Weiss & Carstens, 2000, p. 49). The key change recommended would be to define another type of publication called an integrating resource. An integrating resource is one that is added to or changed through the use of updates that may substantially change the content. These integrating resources may be loose-leaf publications, databases, or Web sites. (Weiss & Carstens, 2000, p. 49) "These new rules, if adopted, should allow AACR2 to handle more effectively this rapidly growing category of resources" (Weiss & Carstens, 2000, p. 49). Hirons recommends changes to AACR2 that will allow catalogers to acknowledge that integrating resources change over time rather than require them to only catalog a ‘snapshot’ of that resource taken at a given time. This would allow the catalogers to consider the publication as a whole rather than be restricted to a single aspect or manifestation of the publication. (Weiss & Carstens, 2000, p. 49) The concept of the chief source of information would be eliminated. Title and author information would be taken from any part of "the latest issue or iteration of a resource," and a uniform title would be added to successively cataloged titles to provide a stable title.

These two recommendations would provide the user with a more holistic approach to the whole work—i.e., both earliest and latest titles—while still providing stability. JSC has now endorsed many of these changes. The use of title information from the latest piece in hand was endorsed for integrating resources but not for traditional serials (JSC 1999). (Weiss & Carstens, 2000, p. 49)

Adoption of these recommendations would allow catalogers to provide a clearer description of the evolving content of an integrating resource.

Other changes to AACR2 are also driven by the proliferation of Internet resources. The Association for Library Collections & Technical Services Committee on Cataloging: Description and Access (CC:DA) recommended the following changes. "The replacement of the term ‘computer file’ with the term ‘electronic resource’ throughout chapter 9 is the first change recommended by the CC:DA Task Force on Harmonization" (Weiss & Carstens, 2000, p. 50). Also, the CC:DA report

urged that a resource be considered a new edition only if there are significant changes in the intellectual or artistic content. Statements such as new release, version, level, or update as well as changes in physical character, display format, or printer-related file formats would require a new record only if there were a change in content. (Weiss & Carstens, 2000, p. 50)

As in Hirons’s report to the JSC, the task force suggested that the chief source of information be the entire resource and not just the title screen. Other recommendations included widening the scope of the rules to include new types of electronic resources; the creation of more specific terms for use in the file characteristics area; and including more current examples in the note fields. (Weiss & Carstens, 2000, p. 50)

One of the most important developments for cataloging Internet resources actually occurred many years ago. An OCLC Research Project in 1991-92 demonstrated "that, with a few modifications, USMARC computer files format and AACR2R Chapter 9 could be used to catalog Internet resources" (Lam, 2000, p. 52). One of the modifications recommended resulted in the creation of Field 856, Electronic Location and Access, in the MARC (MAchine-Readable Cataloging) format in 1994. (Lam, 2000, p. 52) This allowed the URL to be placed directly in the MARC record. When Web-based OPACs came along, those URLs could be displayed as hyperlinks, making access extraordinarily easy for OPAC users. (Medeiros, 1999, p. 58)
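How a Web-based OPAC might turn Field 856 into a hyperlink can be sketched briefly. The Python example below is an illustration under stated assumptions: the record is a simplified dictionary rather than a real MARC file, the function names are hypothetical, and the only MARC details relied upon are tag 245 for the title, tag 856 for electronic location, and subfield u of 856 for the address.

```python
from html import escape

# Simplified, hypothetical stand-in for a MARC record: a list of fields,
# each with a tag and a list of (subfield code, value) pairs.
record = {
    "fields": [
        {"tag": "245", "subfields": [("a", "Library of Congress")]},
        {"tag": "856", "subfields": [("u", "http://www.loc.gov/")]},
    ]
}


def urls_from_record(marc_record):
    """Collect every address stored in subfield u of an 856 field."""
    urls = []
    for field in marc_record["fields"]:
        if field["tag"] == "856":
            urls.extend(value for code, value in field["subfields"] if code == "u")
    return urls


def render_link(url, label):
    """Build the HTML anchor an OPAC display page could show to the user."""
    return '<a href="{}">{}</a>'.format(escape(url, quote=True), escape(label))


links = [render_link(url, "Library of Congress") for url in urls_from_record(record)]
```

Because the address lives in the record itself, any change to the URL means editing the record, which is the maintenance burden that PURLs are meant to reduce.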

MARC or Metadata?

Generally speaking, there are two basic ways to locate Internet resources. One is to create MARC records to place in an OPAC, and the other is to use metadata that will allow retrieval by search engines. As alluded to above, MARC stands for MAchine-Readable Cataloging, which is the standard communications format used in library OPACs. (MARC standards, 2001) The basic definition of metadata is that it is "data about data." "Metadata sounds sexy, but it really stands for cataloging—the professional control of materials by the use of predictable terms and fields" (David Seaman, as cited in Chepesiuk, 1999, p. 63). One of the leading versions of metadata, not surprisingly sponsored by the ubiquitous OCLC, is the Dublin Core (DC). (Dublin Core Metadata Initiative home page, 1995-2001) DC consists of 15 core elements, "title, author or creator, subject and keywords, description, publisher, other contributor, date, resource type, format, resource identifier, source, language, relation, coverage, and rights management" (Chepesiuk, 1999, pp. 61-62).
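To make the 15 elements concrete, the following Python sketch attaches a Dublin Core description to a Web page as HTML meta tags. It is a hedged illustration, not a prescribed method: the short element names are the standard Dublin Core labels for the elements listed above (for example "creator" for "author or creator"), the "DC." prefix is one common embedding convention, and the function name and sample values are hypothetical.

```python
from html import escape

# Short names of the 15 Dublin Core elements, in the order listed above.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def dc_meta_tags(description):
    """Turn {element: [values]} into HTML meta tags, rejecting unknown elements."""
    unknown = set(description) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError("not Dublin Core elements: " + ", ".join(sorted(unknown)))
    tags = []
    for element in DC_ELEMENTS:
        # Every element is optional and repeatable, so each maps to a list.
        for value in description.get(element, []):
            tags.append('<meta name="DC.{}" content="{}">'.format(
                element, escape(value, quote=True)))
    return tags


tags = dc_meta_tags({
    "title": ["Library of Congress"],
    "identifier": ["http://www.loc.gov/"],
    "subject": ["Libraries", "Archives"],
})
```

Tags of this kind sit in the page itself, where a search engine can read them, rather than in the library catalog.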

DC is primarily intended to describe Internet resources, but some supporters of DC advocate its use as a replacement for MARC.

However, to librarians, the thought of abandoning this proven standard, which has millions of records invested, is heresy. Nonetheless, the need to establish an enhanced means of access to online resources, combined with the prohibitive cost of cataloging the Internet in traditional MARC, has turned attention to the Dublin Core. (Medeiros, 1999, p. 58)

The MARC format, developed in 1967, was designed to facilitate the exchange of information on digital tape. However, years of use and countless adaptations have rendered MARC a near universal format for the exchange of bibliographic information. (Weiss & Carstens, 2000, pp. 50-51) Adaptations made to accommodate the need to catalog Internet resources include System Requirements and Mode of Access notes (MARC Field 538), Computer File Characteristics (MARC Field 256), and Type of Computer File or Data Note (MARC Field 516), in addition to the highly useful MARC Field 856, Electronic Location and Access, discussed above. (Medeiros, 1999, p. 58)

By comparison, "metadata, with its multiple standards and formats, may not prove as resilient as MARC" (Weiss & Carstens, 2000, p. 52). "Metadata cannot currently be integrated into a standard library catalog. Instead, metadata must be attached to Web pages or it can be confined to a separate database of metadata records" (Weiss & Carstens, 2000, pp. 50-51). Replacing MARC with DC would not be advisable, but perhaps it is feasible to use the two in combination.

The most adventurous application for Dublin Core metadata within the library community entails the creation of records to be contributed to a shared catalog. This Dublin Core-specific database would index all 15 elements, and make searching and/or limiting on these elements possible. This approach utilizes cooperative efforts, and results in a search engine that consists entirely of human-authorized metadata, whether manually input as such, converted from another standard, or harvested. Since the metadata creation is moved from the content provider to the librarian in this scenario, controlled vocabulary can be utilized, and database maintenance routinely performed. These metadata surrogates would form the basis for a de facto scholarly search engine. (Medeiros, 1999, p. 59)

Norm Medeiros (1999) outlines the future use of this combination of MARC and DC as follows. Internet resources purchased by the library would continue to be incorporated into the catalog using MARC format.

Not only are subject, title, and keyword access desirable, but necessary acquisitions information must be attached to the parent bibliographic record. Thus, the purchasing of an Internet resource will continue to drive its bibliographic presence in the OPAC. (p. 59)

However, free Internet resources that are currently being cataloged in MARC will not require "full bibliographic representation. These items are prime targets for a metadata record to be included in the Dublin Core database" (pp. 59-60).

Consequently, the need to maintain lists of ‘great sites’ on a Web page is unnecessary in this new MARC/Dublin Core environment since the Dublin Core database will be more current and comprehensive. Also, controlled vocabulary will be utilized in library and the Dublin Core catalogs, simplifying the potential for future cross-searching capabilities or ‘federating’ of databases. (pp. 59-60)

This two-database arrangement seems more complicated than having all resources cataloged in the OPAC, rather like the old days of having print resources in the OPAC and Web resources on the home page. If the cross-searching can be made invisible to the user, however, it may indeed make the best use of both MARC and DC.

CORC

The Cooperative Online Resource Catalog (CORC), yet another project undertaken by OCLC, is a major step toward this combined MARC/DC worldview. CORC includes "automated Web site selection, record creation, descriptive cataloging, subject headings assignment, classification, and dynamic page building that libraries can integrate with their gateways" (Porter & Bayard, 1999, p. 392). CORC bills itself as "a metadata creation system for bibliographic records and pathfinders describing electronic resources," the goal of which is to use "both MARC and Dublin Core records to create a searchable database of quality Internet resources" (CORC home page, 2001).

CORC’s interoperability is one of its major features. Contributors can enter records in either MARC or Dublin Core format. These records are then stored in Extensible Markup Language (XML), and delivered or exported to the end-user in either MARC or Dublin Core format. (Medeiros, 1999, p. 60)
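The interoperability described above rests on a crosswalk between the two formats. The Python sketch below shows the general idea only and does not reproduce CORC's actual mapping or its XML schema: the six tag-to-element pairs are a small, commonly cited subset chosen for illustration, the record structure is a simplified dictionary, and the function names and the "record" element name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical partial crosswalk: (MARC tag, subfield code) -> Dublin Core element.
CROSSWALK = {
    ("100", "a"): "creator",
    ("245", "a"): "title",
    ("260", "b"): "publisher",
    ("520", "a"): "description",
    ("650", "a"): "subject",
    ("856", "u"): "identifier",
}


def marc_to_dc(marc_record):
    """Convert a simplified MARC record into {Dublin Core element: [values]}."""
    dc = {}
    for field in marc_record["fields"]:
        for code, value in field["subfields"]:
            element = CROSSWALK.get((field["tag"], code))
            if element is not None:
                # Subfields without a mapping are dropped, so the conversion is lossy.
                dc.setdefault(element, []).append(value)
    return dc


def dc_to_xml(dc):
    """Serialize the Dublin Core description as a small XML document."""
    root = ET.Element("record")
    for element in sorted(dc):
        for value in dc[element]:
            ET.SubElement(root, element).text = value
    return ET.tostring(root, encoding="unicode")


marc_record = {
    "fields": [
        {"tag": "245", "subfields": [("a", "Library of Congress"), ("c", "LC")]},
        {"tag": "650", "subfields": [("a", "Libraries")]},
        {"tag": "856", "subfields": [("u", "http://www.loc.gov/")]},
    ]
}
dc_record = marc_to_dc(marc_record)
xml_text = dc_to_xml(dc_record)
```

The dropped subfield in the example hints at why MARC remains the richer format: a MARC record can always be reduced to Dublin Core, but the reverse conversion cannot restore detail that was never captured.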

OCLC expects CORC to "evolve into a general-purpose, web-based cataloging service" (Oder, 2000, p. 50) similar to their highly successful WorldCat, the shared bibliographic catalog. In fact, CORC is synchronized with WorldCat in real time, and most libraries that use CORC export records to their OPACs in the same manner as they would WorldCat records (Oder, 2000, p. 50). CORC users can choose to display records in either OCLC-MARC or DC. CORC is also trying to develop a "harvesting program" that will create basic metadata automatically. (Oder, 2000, p. 51)

Conclusion

Properly selected Internet resources, those that have value to the users of the collection, should be added to library catalogs. The experience of libraries that included Web site links on their home pages, only to find that patrons used those links instead of the physical collection, confirms this. In addition to being selected on the basis of value to users, Internet resources must also be relatively stable. PURLs, or similar technology, may help to improve stability, but only the passage of time will tell which sites are truly stable, and PURLs will not help if the site disappears completely. Changes to the cataloging rules to facilitate the handling of Internet resources should also be implemented, and ways of combining MARC and metadata to improve access to materials should continue to be explored. Projects like CORC will provide copy-catalogers with access to prepackaged records for Internet resources just as WorldCat currently does for other types of resources. All of this should be undertaken in the name of serving the users by providing access, but there is a potential dark side. Although most of the Web resources added to library catalogs are free, they may not remain so for long. Owners of the content may be encouraged by the organization and access provided by the library’s labor and decide that the free content they are providing is worth charging for after all.

References

          Anglo-American cataloging rules. (2nd ed.). (1998). Chicago: American Library Association.

          Chepesiuk, R. (1999). Organizing the Internet: The "core" of the challenge. American Libraries 30 (1), 60-63.

          CORC home page. (2001). Dublin, OH: OCLC. Retrieved March 12, 2001, from the World Wide Web: http://www.oclc.org/corc/

          Dublin Core Metadata Initiative home page. (1995-2001). Dublin, OH: OCLC. Retrieved April 4, 2001, from the World Wide Web: http://dublincore.org/

          Germain, C. A. (2000). URLs: Uniform resource locators or unreliable resource locators. College & Research Libraries 61 (4), 359-65.

          Lam, V. (2000). Cataloging Internet resources: Why, what, how. Cataloging and Classification Quarterly 29 (3), 49-61.

          MARC standards. (2001, March 13). Washington, DC: Library of Congress. Retrieved March 8, 2001, from the World Wide Web: http://lcweb.loc.gov/marc/

          Medeiros, N. (1999). Making room for MARC in a Dublin Core world. Online 23 (November/December), 57-60.

          Oder, N. (2000). Cataloging the net: Two years later. Library Journal 125 (16), 50-51.

          Olson, N. B. (1997). Cataloging Internet resources: A manual and practical guide. (2nd ed.). Retrieved March 12, 2001 from the World Wide Web: http://www.purl.org/oclc/cataloging-internet/

          Porter, G. M. & Bayard, L. (1999). Including Web sites in the online catalog: Implications for cataloging, collection development and access. The Journal of Academic Librarianship 25 (5), 390-94.

          PURL home page. (n.d.). Dublin, OH: OCLC. Retrieved April 4, 2001, from the World Wide Web: http://purl.oclc.org/

          Schneider, K. G. (1997). Cataloging Internet resources: Concerns and caveats. American Libraries 28 (3), 77.

          Veatch, J. R. (1999). Insourcing the Web. American Libraries 30 (1), 64-67.

          Weber, M. B. (1999). Factors to be considered in the selection and cataloging of Internet resources. Library Hi Tech 17 (3), 298-303.

          Weiss, A. K. & Carstens, T. V. (2000). The year’s work in cataloging, 1999. Library Resources & Technical Services 45 (1), 47-58.
