
Unstructured Searching in a Structured Query World:

Final Project Paper

Observations and Literature Review by Barbara J. Hampton, 2003

in partial fulfillment of the requirements of ILS 537-70

under the supervision of Dr. Mary Brown

at Southern Connecticut State University


Abstract - Introduction - Method - Results - Discussion

1. The Effect of Current Preferences for Internet Information

2. Aspects of Serendipity

3. Productive Serendipity

4. Intangible Benefits of Serendipity

5. Information Services that Support Serendipity

References - Appendix


Abstract


This paper examines the conflict between structured query access to electronic information and the serendipitous discoveries that may occur more readily in the traditional library. Valuable insights sometimes arise from a "happy accident" in which a creative mind connects diverse information in new ways. Using data from observations of information seeking behavior, interviews of library users, and a survey of 416 adults, we measured the role of browsing (a common route to the serendipitous discovery of information) in the search strategies of various users. These reported behaviors were then compared with methods that foster creative research and discovery.


Introduction


Until recently, the role of serendipity as a productive agent in society appears to have been held in disdain. Inventors, scientists, business leaders, and researchers took full credit for their successes and implied that these were the result of hard work rather than good luck. Edison (1932) said, "Genius is one per cent inspiration, ninety-nine per cent perspiration." Reversing this trend, Jones (1991), Flatow (1993), and Roberts (1989), among others, have published popular works celebrating accidental inventions that resulted from serendipity. A recent query to the search engine Google.com using "joy of finding out" produced 230 hits.
At the same time, the advent and exponential growth of electronic keyword searching and the declining use of print information sources may be impeding the serendipity fostered by browsing adjacent library shelves, journal displays, or even paper catalogue cards. This paper addresses five questions concerning serendipity in information seeking behavior:


1. To what extent are current information seeking preferences impeding the accidental discovery of useful information?
2. What paradigms best describe the role of serendipity in information seeking?
3. How has productive serendipity in library research been observed in specific domains?
4. What intangible benefits do opportunities for serendipitous discovery offer the human psyche?
5. How can librarians and information specialists support serendipity in information seeking?



Method


To assess current information seeking preferences of a variety of users, a graduate school library science class designed and administered a survey questionnaire to a diverse group of adults. Survey research has been used in the information field as a useful technique for providing generalized descriptions of certain aspects of information-seeking behavior. Analysis of responses and patterns is also useful in designing future research studies by identifying issues and terminology meaningful to a particular group of individuals.
Seventeen researchers drew a convenience sample by distributing paper or electronic questionnaires to participants in the northeastern, northwestern, southeastern, southwestern, south central, and midwestern regions of the United States.
Four hundred sixteen adults participated in the study. Participants were not paid for their participation.


Of the participating adults, 65% were female. Respondents' highest reported educational levels were high school, 31%; bachelor's degree, 38%; master's degree, 20%; and doctoral degree, 9%. Older middle-aged adults (46-55 years) were disproportionately represented at 28%; the other age groups were 18-25 years, 19%; 26-35 years, 19%; 36-45 years, 18%; 56-65 years, 9%; and over 65 years, 6%. Most respondents (61%) resided in the northeastern United States. All other regions except the north central were also represented: northwest, 13%; southwest, 6%; midwest, 1%; south central, 6%; southeast, 12%. The majority of respondents described themselves as professionals (51%). Students made up the next largest portion of the sample (19%). Clerical, sales, and trade workers were 12% of the group, retired seniors 8%, and 8% were not employed.


One instrument, delivered in either printed or electronic format, was used to collect data. No personally identifying data were collected, although most respondents were acquainted with the researchers requesting their participation. The instrument consisted of a cover letter with a welcome and directions, followed by 38 items organized into 8 single-part questions, 7 multi-part questions, and 7 demographic questions. Each of the 8 single-part and 7 multi-part questions was designed to elicit information about various aspects of the individual's information behavior. Responses were coded and tallied within demographic sub-groups and across the entire sample.
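The coding-and-tallying step can be illustrated with a short Python sketch. It assumes that each coded response has been reduced to a (sub-group, answer) pair; the function name `tally` and the sample records are hypothetical and are not drawn from the survey data.

```python
from collections import Counter, defaultdict


def tally(responses):
    """Tally coded survey answers overall and within each demographic sub-group.

    responses: iterable of (subgroup, answer) pairs, e.g. ("Student", "Internet").
    Returns (overall, by_group); each maps answer -> whole-number percentage.
    """
    overall = Counter()
    by_group = defaultdict(Counter)
    for subgroup, answer in responses:
        overall[answer] += 1
        by_group[subgroup][answer] += 1

    def as_percent(counts):
        # Convert raw counts to percentages of that group's total, rounded.
        total = sum(counts.values())
        return {answer: round(100 * n / total) for answer, n in counts.items()}

    return as_percent(overall), {g: as_percent(c) for g, c in by_group.items()}


# Hypothetical toy responses (NOT the study's data), only to show the shape.
sample = [
    ("Student", "Internet"),
    ("Student", "Internet"),
    ("Student", "People"),
    ("Professional", "Internet"),
    ("Professional", "Books"),
]
overall, by_group = tally(sample)
print(overall)   # percentages across the whole toy sample
print(by_group)  # percentages within each toy sub-group
```

Because each percentage is rounded to a whole number, the figures for a group need not sum to exactly 100, which is also true of the demographic percentages reported above.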



Results


The Internet was confirmed as the preferred source of information for personal research by 52% of respondents overall. See Table 2. Among Traditional Students and Students (unspecified), this preference rose to two-thirds of respondents or more. The next most popular choice of information sources was People (24% of respondents overall). While 13% overall chose "Books" as their preferred source, only 5% identified "Browsing in Libraries" as the preferred source. Browsing on the Internet was not one of the options. While some viewing of the Internet might be described as browsing (for example, scanning within a known website), broader searching via a search engine with keywords would not.
When asked to identify the primary purpose of library visits, most respondents (71%) reported searching for something particular, either by title or by topic. See Table 1. Those who reported going to the library most often to browse constituted 13%. Only 6% reported that they do not visit libraries (although it is unclear whether this also excludes remote access to libraries via the Internet). Programs (3%) and Bringing Someone Else to the Library (4%) rarely prompted library visits. Those who identified themselves as "Professionals-Law" cited the search for a particular book or item as the reason for a library visit more often (67%) than other respondents. Students - Traditional (65%), Students - Returning (32%), Professionals (39%), and Professionals - Education (39%) were most likely to be searching for a particular topic. Thus, conscious efforts at browsing play a minor role in library use at present.



Discussion


1. The Effect of Current Preferences for Internet Information


The Internet is rapidly becoming the primary, if not the sole, source of information for many Americans, as confirmed by the survey reported by Hampton (2003) and Bishop (2003). Yet electronic sources (the Internet, computer databases, etc.) offer poor support for serendipity. Ford (1999) examined the disconnect between information retrieval systems and creative thinking: the results that a search algorithm ranks as most relevant tend to represent convergent thinking, confirming the searcher's existing conceptualizations rather than generating new insights.


Computer searches consist almost exclusively of structured queries in which the user selects specific terms and sources for a search. Other sources, intellectually adjacent to the original query, may not be retrieved because they use different terms. In the traditional, physical library, both materials and finding tools display holdings in relationship to their subjects: books on rocks are located near books on minerals; clippings on the history of the Pomperaug Plantation are in the vertical file near the history of Russian Village, in the Southbury Public Library's files on the history of the town; a search in the periodical index for "presidential elections, judicial contests of" will steer a searcher to the Van Buren election as well as the Bush-Gore election. When results are retrieved as electronic documents and delivered directly to the computer screen where the researcher is working, the user misses these "intellectually adjacent" items.
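The contrast between a structured query and shelf browsing can be sketched in Python. The collection below is a hypothetical toy catalogue with simplified, Dewey-style call numbers invented for illustration (not taken from any real library), and `keyword_search` and `shelf_neighbours` are illustrative names rather than functions of any catalogue system.

```python
# Hypothetical toy collection: (call number, title). The call numbers are
# simplified, Dewey-style decimals invented for illustration.
CATALOGUE = [
    ("552.1", "Igneous Rocks"),
    ("549.0", "Minerals of the World"),
    ("553.8", "Precious Stones"),
    ("552.0", "Rocks of New England"),
    ("549.5", "Crystals and Gemstones"),
]


def keyword_search(term, catalogue):
    """Structured query: return only records whose title contains the term."""
    term = term.lower()
    return [rec for rec in catalogue if term in rec[1].lower()]


def shelf_neighbours(call_number, catalogue, width=1):
    """Physical browsing: return an item plus `width` items on each side of it."""
    # Shelf order is call-number order (floats suffice for these toy values).
    shelf = sorted(catalogue, key=lambda rec: float(rec[0]))
    position = [rec[0] for rec in shelf].index(call_number)
    return shelf[max(0, position - width): position + width + 1]


hits = keyword_search("rocks", CATALOGUE)
print([title for _, title in hits])
nearby = shelf_neighbours("552.0", CATALOGUE, width=2)
print([title for _, title in nearby])
```

The query for "rocks" returns only the two titles containing that word, while browsing two places either side of Rocks of New England also brings Minerals of the World and Crystals and Gemstones into view. Treating call numbers as floats is a simplification that works only for short decimal values like these.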


The physical search process also brings into view unrelated ideas that were not sought and might even have been avoided. In perusing a display on singer Marian Anderson, a student might see Ryan's (2002) depiction of the segregated public accommodations that she encountered. Gup (1997) decries our declining exposure to new problems, topics, and ways of thinking as a result of electronic searching. The searcher retrieving electronic documents from a keyword search may never see anything that the search engine doesn't drop into his or her results list.



2. Aspects of Serendipity


Horace Walpole, an eighteenth-century British writer, coined the term "serendipity" from Serendip, the kingdom in the fairy tale The Three Princes of Serendip, whose principal characters were always "making discoveries by accident and sagacity. . . . for you must observe that no discovery of a thing you are looking for comes under this description." (Word History, "serendipity," American Heritage Dictionary (1992), p. 1647, quoting Walpole.) Contemplating instances of serendipity, Liestman (1992) describes six paradigms of serendipity and their application to information seeking:


Coincidence - discovery by purely random luck;
Prevenient grace - discovery benefiting from work (e.g., cataloguing and classification) performed on the searcher's behalf by others unseen and unknown;
Synchronicity - simultaneous occurrence of two meaningfully but not causally connected events;
Perseverance - the drudge method of exhaustive and open-ended searches until a discovery is made;
Altamirage - discovery resulting from the searcher's habits, character, and knowledge, which uniquely connect the searcher with information;
Sagacity - discoveries through intuition and skill which consciously or unconsciously steer the searcher to a fruitful source of information.


Coincidence and synchronicity are inconsistent with Walpole's concept, which requires the interaction of accident and sagacity (discernment, foresight, wisdom). Examples in the library occur when a user spots a resource that happens to be alphabetically adjacent to, or on the way to, a chosen resource. Because such examples depend purely on chance, they do not lend themselves to information system planning and manipulation and are not considered further in this paper.


The "prevenient grace" and "sagacity" paradigms provide a role for the skill and efforts of the librarian in identifying and organizing promising resources, whereas user factors are critical in the "perseverance" and "altamirage" paradigms. In Liestman's (1992) examples of discoveries driven by altamirage and sagacity, the user was able to illuminate the significance of a piece of information in the light of an existing knowledge base, over which the librarian has little control. To benefit from serendipitous encounters, this "prepared mind" must also be open and questioning (Foster & Ford, 2003; Liestman, 1992).


3. Productive Serendipity


In analyzing the results of knowledge management studies, Koenig (2003) pointed to the productivity gains observed in pharmaceutical research groups whose research libraries actively supported current-awareness reading. Prominent displays of "today's journals" and "yesterday's journals," along with literature-alerting services, were among the effective practices observed. Attfield and Dowell (2002) found that newspaper journalists used serendipity to discover new angles and sources for stories. Kuhlthau and Tama (2001) reported that lawyers developing the theory of a case in preparation for trial preferred print materials because of the benefits of serendipitous discovery:


The print resources allowed them to look for 'one thing and find another'. The computerized system, on the other hand, was designed to be too specific to allow for the flexibility needed to facilitate construction [of a theory of the case]. . . . They were willing to use databases when the systems met their needs, particularly when they were looking for a specific case that they already knew about. However, when their state of knowledge was more ambiguous and ill-defined they found the database less useful. Databases worked well for routine tasks and specific inquiries but not so well for complex tasks and unspecified queries. (p. 41)

While the lawyers and journalists generally made their serendipitous discoveries on their own, historians working with original documents and archival material described to Duff and Johnson (2002) their reliance on archivists. Following the sagacity paradigm, the historians turn to archivists to recommend collections useful for a particular research problem:


One historian mentioned that, although he went to an archives with a list of names he wanted to check into, he did not mention this list when he talked to the archivists hoping they would "suggest things to begin with in case I diverted them in a certain way and that closed down other options that I didn't know about." (p. 483)


Koenig (2003) points to the "20-25 percent rule": white-collar professionals, managers and knowledge workers alike, consistently spend 20-25 percent of their time seeking information. Guided by an intuitive "satisficing" mechanism, at about that point they "begin to conclude that they have to get on with the rest of their job, and that if they have not already done so, they will soon run into diminishing returns in their information seeking, and that it is time to proceed based on what they have." Within this 20-25 percent time budget, there is room for browsing and serendipity if structured searching is efficient and effective.



4. Intangible Benefits of Serendipity


In addition to meeting "functional needs" (factual knowledge), consumer information has been shown to serve hedonic (pleasure-giving), innovative, aesthetic, and sign (symbolic and social expression) needs (France, Yen, Wang & Chang, 2002). Internet keyword searching and traditional classification schemes can serve factual information needs well. France et al. (2002) propose a model for data mining and Web searching that acknowledges the other information needs, and they recommend data mining tasks (association rules, clustering, classification, and forecasting) particularly suited to those needs. Their proposal notes that "the data mining process does its searching based on the data itself" (p. 248) rather than relying on the user's query. Because structured queries serve non-factual information needs poorly, optimizing consumer information and decision-making requires support for the serendipitous connections that data mining can uncover.


As Gup (1997) notes, serendipity can also expand our thinking beyond the expected and beyond our parochial views. Fundamentally, psychologists point to the innate curiosity of human nature as the motivation for much information browsing (Case, 2002, pp. 84-86). Judging from the anecdotes of serendipitous finds related by Liestman (1992), searchers also enjoy being winners in the information lottery: capturing an elusive tidbit of knowledge or blending disparate concepts creatively into new discoveries.


Robert F. Jack, as quoted by Liestman (1992), endorses the "adversarial approach," particularly as it applies to online searching. Implicit in this pugnacious attitude, he says, is the belief that online databases and services "are not to be regarded as close personal friends but as temporary whiies [sic] which might turn on the searcher at any moment." He advises searchers not to take zero (0) postings as an answer: "The searcher must always be prepared to call up any trick in the book to outsmart the database. Fight back. Don't be gentle. Take it personally." (Liestman, 1992, quoting Robert F. Jack (Dec. 1985), "Meatball searching: The adversarial approach to online information retrieval," Database 8, 50.)


5. Information Services that Support Serendipity


Open classified stacks are basic to productive browsing (Liestman, 1992). Freed from the space and maintenance constraints of paper catalogues, online public access catalogues (OPACs) can increase the number and variety of access points: more subject entries, more cross-referencing, context searching, searching of notes and contents entries, and so on. Liestman (1992) notes criticisms of existing classification schemes and the limitations inherent in the use of controlled vocabulary. In predicting the demise of specialized thesauri for constructing bibliographic data, Svenonius (2000) states:


The situation where users searching for information must translate their search terms into the vocabularies of a number of different retrieval languages is no longer tolerable. Until better alternatives are developed, users will cross-database search using natural language.
. . . The requirements for a universal bibliographic language and for sophisticated search engines dictate that the single most important direction for vocabulary development is toward the large-scale mapping of natural-language vocabulary onto controlled vocabularies. (pp. 194-195)


An important goal in the creation of digital libraries is the mapping of "natural" vocabularies to metadata vocabularies, which Borgman (2000) demonstrates is complicated by semantic and syntactic issues. As digital information grows in depth and scope, Borgman notes that content designed for one community of users may be pertinent to others who approach the topic with different terminology and different levels of expertise, making structured query searching problematic. The information professional must bridge vocabulary and interface differences among different audiences.
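A minimal Python sketch of the kind of natural-language-to-controlled-vocabulary mapping that Svenonius and Borgman describe follows. The entry vocabulary and headings are hypothetical examples; only the last heading reuses the periodical-index heading cited in Section 1. Real mappings must handle the semantic and syntactic problems Borgman identifies, which the naive substring matching here ignores.

```python
# Hypothetical entry vocabulary: phrases a user might type, mapped to
# controlled subject headings. Invented for illustration, except the
# election heading, which echoes the periodical-index heading in Section 1.
ENTRY_VOCABULARY = {
    "rocks": "Petrology",
    "stones": "Petrology",
    "minerals": "Mineralogy",
    "crystals": "Mineralogy",
    "contested election": "Presidential elections, judicial contests of",
    "disputed election": "Presidential elections, judicial contests of",
}


def to_controlled_headings(query):
    """Return the sorted controlled headings for known phrases in the query.

    Naive substring matching: it ignores word boundaries, unlisted synonyms,
    and differences in users' expertise and terminology.
    """
    text = query.lower()
    return sorted({heading for phrase, heading in ENTRY_VOCABULARY.items()
                   if phrase in text})


print(to_controlled_headings("Rocks and crystals"))
print(to_controlled_headings("the disputed election of 2000"))
```

A user who types "disputed election" is thus routed to the same controlled heading as one who types "contested election", which is the point of mapping many natural-language entry terms onto one controlled term.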


Bibliographic training generally focuses on the query methods for particular sources, but Liestman (1992) reminds us of the importance of mental preparation for thinking about a topic: brainstorming, reading about or discussing the topic, removing distractions, and remaining flexible in conceptualizing it. These skills can benefit structured searching as well as moments of serendipity.


The corporate library of a research organization can support an environment of rich communication and free-flowing information (Koenig, 2003). By inculcating in researchers the value of current-awareness reading and browsing, the librarian can also plant the seeds of future serendipitous discoveries. Because the librarian can serve as the crucible in which domain knowledge and information skills are joined, both of these mechanisms underscore the importance of communication between the researcher and the information professional, so that the librarian is attuned to potential connections between information sources and user needs. Koenig (2003) urges librarians in research libraries to coach users on current awareness services and to interview key employees periodically to develop and update profiles of their interests and responsibilities.


Foster and Ford (2003) suggest that hypertext navigation can be used to exploit serendipitous connections. Browsing and navigation interfaces tap into users' "recognition knowledge" by offering categories and choices in multiple windows and menus that do not require the user to recall specific terms. Borgman (2000) describes these features as particularly appropriate for smaller, focused collections in which choices are limited, and for searching ill-defined problems.


After hypothesizing that results falling in the middle range of relevance rankings under current structured queries may best support creative thinking, Ford (1999) sets a goal for information retrieval systems: to draw disparate concepts or entities together in new ways that suggest new themes or patterns, perhaps using artificial intelligence and fuzzy logic.
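Ford's middle-ground hypothesis can be expressed as a simple selection rule over a ranked result list. The Python sketch below assumes that each result carries a numeric relevance score and, by default, drops the top and bottom quarters of the ranking; those cut-offs and the name `middle_band` are hypothetical, since Ford (1999) proposes the idea rather than specific values.

```python
def middle_band(scored_results, lower=0.25, upper=0.75):
    """Keep results from the middle of a relevance ranking.

    scored_results: list of (document, relevance_score) pairs.
    lower, upper: fractions of the ranked list to keep between. The defaults
    (drop the top and bottom quarters) are hypothetical, not Ford's values.
    """
    if not 0.0 <= lower < upper <= 1.0:
        raise ValueError("require 0 <= lower < upper <= 1")
    # Rank from most to least relevant, as a search engine would present them.
    ranked = sorted(scored_results, key=lambda pair: pair[1], reverse=True)
    n = len(ranked)
    return [doc for doc, _ in ranked[int(n * lower):int(n * upper)]]


# Toy ranking of eight hypothetical documents, scored from 1.0 down to 0.3.
results = [(f"doc{i}", 1.0 - i / 10) for i in range(8)]
print(middle_band(results))
```

Whether such a middle band actually supports creative thinking is Ford's hypothesis, not something the code demonstrates; the sketch only shows how a filter of this kind could sit on top of an existing relevance ranking.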


While some of these tools remain concepts under development, others are simple extensions of existing information science. The most important steps are the recognition of serendipity as a productive and creative research tool and the collaboration between librarian and searcher to set the stage for serendipitous discoveries.



References

American Heritage Dictionary of the English Language, 3rd Ed. (1992). Boston: Houghton Mifflin.

Attfield, S., & Dowell, J. (2002). Information seeking and use by newspaper journalists. Journal of Documentation, 59, 187-204. Retrieved September 26, 2003, from Emerald FullText database.

Bishop, P. (2003). Why do people use the library and the internet? Specifics are the key. Unpublished manuscript. Southern Connecticut State University. Retrieved November 17, 2003, from http://www.geocities.com/pamela_joykd/Study_paper.html

Borgman, C.L. (2000). From Gutenberg to the global information infrastructure: Access to information in the networked world. Cambridge, MA: MIT Press.

Case, D.O. (2002). Looking for information: A survey of research on information seeking, needs, and behavior. New York: Academic Press.

Duff, W.M. & Johnson, C.A. (2002). Accidentally found on purpose: Information-seeking behavior of historians in archives. Library Quarterly, 72, 472-496. Retrieved September 4, 2003, from Academic Search Premier.

Edison, T.A. (1901). Cited in the Columbia World of Quotations (1996). Retrieved November 20, 2003, from http://www.bartleby.com/66/39/18439.html

Flatow, I. (1993). They all laughed: From light bulbs to lasers: The fascinating stories behind the great inventions that have changed our lives. New York: Harper Collins.

Ford, N. (1999). Information retrieval and creativity: Towards support for the original thinker. Journal of Documentation, 55, 528-542. Retrieved November 11, 2003, from Emerald FullText database.

Foster, A. & Ford, N. (2003). Serendipity and information seeking: An empirical study. Journal of Documentation, 59, 321-340. Retrieved October 29, 2003, from Emerald FullText database.

France, T., Yen, D., Wang, J., & Chang, C. (2002). Integrating search engines with data mining for customer-oriented information search. Information Management & Computer Security, 10, 242-254. Retrieved November 3, 2003, from Emerald FullText database.

Gup, T. (1997). The end of serendipity. The Chronicle of Higher Education, 44 (13), A52.

Hampton, B.J. (2003). Choosing an information guide: Are information preferences affected by job status? A collaborative study. Unpublished paper. Southern Connecticut State University. Retrieved November 11, 2003, from http://members.aol.com/HFlex/FullData.htm

Jones, C.F. (1991). Mistakes that worked. New York: Doubleday.

Koenig, M.E.D. (2003). Knowledge management, user education, and librarianship. Library Review, 52, 10-17. Retrieved November 11, 2003, from Emerald FullText database.

Kuhlthau, C.C., & Tama, S.L. (2001). Information search process of lawyers: a call for "just for me" information services. Journal of Documentation, 57, 25-43. Retrieved September 11, 2003, from Emerald FullText database.

Liestman, D. (1992). Chance in the midst of design: approaches to library research serendipity. RQ, 31, 524-532. Retrieved November 11, 2003, from Emerald FullText database.

Roberts, R.M. (1989). Serendipity: Accidental discoveries in science. New York: John Wiley & Sons.

Ryan, P.M. (2002). When Marian sang: The true recital of Marian Anderson. New York: Scholastic Press.

Svenonius, E. (2000). The intellectual foundation of information organization. Cambridge, MA: MIT Press.

 

Appendix

Collaborative Survey Questionnaire

Cooperative ISB Survey Subset Data Analysis

Cooperative ISB Survey Analysis

Table 1 -Purpose of Library Visits by Job Status

Table 2 - Preferred Source of Information for Personal Use

Observations

Consent for Observations


Questions? Problems? Suggestions? Please contact the page owner, Barbara J. Hampton.

Last revised 21 November 2003.