Web scale discovery services
 9780838958292, 083895829X

Library Technology Reports

Expert Guides to Library Systems and Services

Web Scale Discovery Services Jason Vaughan

alatechsource.org American Library Association

ALA TechSource purchases fund advocacy, awareness, and accreditation programs for library professionals worldwide.

Volume 47, Number 1 Web Scale Discovery Services ISBN: 978-0-8389-5829-2

American Library Association 50 East Huron St. Chicago, IL 60611-2795 USA alatechsource.org 800-545-2433, ext. 4299 312-944-6780 312-280-5275 (fax)

Advertising Representative Brian Searles, Ad Sales Manager ALA Publishing Dept. [email protected] 312-280-5282 1-800-545-2433, ext. 5282

Editor Dan Freeman [email protected] 312-280-5413

About the Author Jason Vaughan is the director of library technologies at the University of Nevada, Las Vegas (UNLV). In this capacity, he serves as a senior administrator and provides overall leadership for three units within the UNLV University Libraries: Digital Collections, Library Systems, and Web and Application Development. Vaughan earned his MLS and BA from the University of North Carolina at Chapel Hill. Previous positions at UNLV include Head of Library Systems and Information Systems Librarian. He has published extensively on library and library technology topics, including library automation, digitization, planning, and policy. To date Vaughan has served as coauthor and investigator on three funded LSTA grants and has most recently focused on delivering web scale discovery talks at the local, state, and national levels.

Copy Editor Judith Lauber

Editorial Assistant Megan O’Neill [email protected] 800-545-2433, ext. 3244 312-280-5275 (fax)

Production and Design Tim Clifford, Production Editor Karen Sheets de Gracia, Manager of Design and Composition

Library Technology Reports (ISSN 0024-2586) is published eight times a year (January, March, April, June, July, September, October, and December) by American Library Association, 50 E. Huron St., Chicago, IL 60611. It is managed by ALA TechSource, a unit of the publishing department of ALA. Periodical postage paid at Chicago, Illinois, and at additional mailing offices. POSTMASTER: Send address changes to Library Technology Reports, 50 E. Huron St., Chicago, IL 60611. Trademarked names appear in the text of this journal. Rather than identify or insert a trademark symbol at the appearance of each name, the authors and the American Library Association state that the names are used for editorial purposes exclusively, to the ultimate benefit of the owners of the trademarks. There is absolutely no intention of infringement on the rights of the trademark owners.

alatechsource.org Copyright © 2011 American Library Association All Rights Reserved.

Abstract Web scale discovery services for libraries are services capable of searching quickly and seamlessly across a vast range of local and remote preharvested and indexed content, providing relevancy-ranked results in an intuitive interface expected by today’s information seekers. The first of these services debuted in late 2007, with the majority of services reaching initial public release in 2010. This report reviews web scale discovery services tailored to the library environment and explores why libraries should take notice of these tools. This report describes in detail the content, interface, and functionality of web scale discovery services developed by four major library vendors: OCLC, Serials Solutions, Ebsco, and Ex Libris. Each of these services is evolving rapidly, indicative of their open framework design and an ongoing expansion of indexed content as additional publisher and aggregator agreements are brokered. Although many similarities among the services are apparent, this report also outlines some observed differences, though these differences are becoming hazy as each vendor adds new functions, features, and content. To help individual libraries evaluate which service will best meet the needs of the library and its community, this report provides detailed evaluation questions and concludes with a section providing additional background information on each service.

Subscriptions

alatechsource.org/subscribe

Contents

Chapter 1—Web Scale Discovery
  What Is Web Scale Discovery?
  Why Web Scale Discovery?
  Audience and Scope
  A Few Key Concepts
  A Note of Caution
  Notes

Chapter 2—OCLC WorldCat Local
  Overview
  Content and Scope
  Interface Features: Overview, Results, and Navigation
  Additional Features
  Upcoming Directions
  Notes

Chapter 3—Serials Solutions Summon
  Overview
  Content and Scope
  Interface Features: Overview, Results, and Navigation
  Additional Features
  Upcoming Directions
  Note

Chapter 4—Ebsco Discovery Services
  Overview
  Content and Scope
  Interface Features: Overview, Results, and Navigation
  Additional Features
  Upcoming Directions

Chapter 5—Ex Libris Primo Central
  Overview
  Content and Scope
  Interface Features: Overview, Results, and Navigation
  Additional Features
  Upcoming Directions

Chapter 6—Differentiators and A Final Note
  Content
  Metadata and Relevancy
  Price
  Integration with Other Systems
  The Interface
  A Final Note
  Note

Chapter 7—Questions to Consider
  Section 1: General and Background Questions
  Section 2: Local Library Resources
  Section 3: Publisher and Aggregator Agreements and Indexed Content
  Section 4: Open-Access Content
  Section 5: Relevancy Ranking
  Section 6: Authentication and Rights Management
  Section 7: User Interface

For More Information

Chapter 1

Web Scale Discovery What and Why?

Web Scale Discovery Services  Jason Vaughan

Library Technology Reports  alatechsource.org  January 2011

Abstract

Web scale discovery services for the library environment have the capacity to more easily connect researchers with the library’s vast information repository. This includes locally held and hosted content, such as physical holdings, digital collections, and local institutional repositories. Perhaps more significantly, web scale discovery also accesses a huge array of remotely hosted content, often purchased or licensed by the library, such as publisher and aggregator content for tens of thousands of full-text journals, additional content from abstracting and indexing resources, and content from open access repositories. This chapter defines web scale discovery and highlights a few key concepts essential for understanding these services. For anyone who has worked a reference interview and heard a student utter, “I couldn’t find an article in the library catalog,” web scale discovery services hold tremendous potential. Extensive research on user expectations in the discovery arena, and the tools used by those seeking information—tools often disassociated from the library and often overlooking much of what the library holds and licenses—provide ample rationale for why web scale discovery is important for the library environment.

What Is Web Scale Discovery?

Connecting users with the information they seek is one of the central pillars of our profession. Succinctly put, Web scale discovery can be considered as deep discovery within a vast ocean of content. The mechanics behind Web scale discovery are not necessarily new, though a commercial application of this approach within the library environment—efficiently and, it’s hoped, effectively—is very new. While there are various approaches to Web scale discovery, this issue of Library Technology Reports will focus on what today appears to be the most common approach, which, at its heart, involves huge, centralized, preaggregated indexes searched by the end user.

Expanded further, Web scale discovery is—or certainly holds the potential to be—the evolution that libraries have long sought for information discovery. As information professionals, we all have at least a general awareness of the evolution of discovery tools within the library context. Such tools initially were print-based, such as bound handwritten catalogs, the card catalog, and works such as Poole’s Index to Periodical Literature and the Readers’ Guide to Periodical Literature. For the past several generations, such tools gradually transferred into the automated, electronic realm, an obvious example being the development and evolution of the online integrated library system (ILS) with a front-end catalog accessible to librarians and end users. These catalogs were initially available within the local library’s physical building, often through a menu-driven, text-based interface. With the development in the 1990s of the Web’s physical infrastructure and the empowerment of many to access this new environment, library vendors created HTTP Web-based online catalogs.

Other evolutions included early pioneers in broad online information systems, including Dialog and LexisNexis. The 1990s ushered in growth in publisher-based electronic journal content, e-text and e-book content, abstracting and indexing databases, and full-text content aggregators looking to pull related information together within an easily accessible and searchable electronic medium. Many of these services or products were initially provided on CD-ROM or through text-based, menu-driven networked systems, which all eventually evolved into
information search and delivery through a Web-based environment. During this time—we’ll say the mid-1990s—many of us witnessed (as we still witness) the confusion suffered by many end users faced with the choice of myriad information systems with their myriad interfaces and specialized bodies of content. Many of us may remember, from around 1998, a new search engine developed by two Stanford graduate students. Who knows whether there was any suspicion that within a decade, Google would often be the first stop—and at times the sole gateway to information discovery— for a new generation of students? Back in the library and academic environment, other information discovery and delivery systems (often separate from the ILS) were developed, such as institutional repository, course management, electronic reserves, and digital collection management systems. These systems provided additional avenues for hosting, discovering, and delivering to end users local library or institutional content. Library vendors developed and marketed federated search solutions, which attempted, with mixed (limited) success, to simultaneously search, retrieve, and adequately display content from various remote information hosts—such as abstracting and indexing and full-text databases—and were often difficult or time-consuming to configure and maintain. In a sense, the original federated search technologies, based on protocols such as Z39.50, represent an early approach to Web scale discovery of content at the article-citation and full-text levels. In the mid- to late 2000s, library Web catalogs evolved into “next-generation” library catalogs, offering increased intuitive functionality and a user interface more in line with popular sites on the Web (e.g., Amazon.com). 
These next-generation catalogs also provided the capacity to harvest records from various locally hosted library silos of information—such as catalog records from one host and digital collection records from another host—with search, retrieval, and presentation of the collective results from within the single, next-generation catalog interface. In many cases, these next-generation catalogs are still state of the art for many libraries. In short, these systems offered new discovery layer options, uncoupled from any specific underlying ILS. Things began to open up.

This abbreviated and selective history brings us to the present. Library vendors have developed next-generation catalogs with many features and cues understood and expected by today’s researchers, such as faceted navigation, tag clouds, and Web 2.0 social features. These systems can resolve a user’s research need, in many instances, to the final item level, such as a book. Or they can resolve a user’s research need to a collection level, such as the existence of a Journal of Psychology. The user is given a call number for a physical copy to browse (if the library still subscribes
to print) and a MARC record 856 link, which takes the user perhaps first to a link resolver to select a content provider and then to the publisher’s or aggregator’s website, where the user must then search again, this time for article-level results, within a different discovery interface. This level of resolution for the library catalog is increasingly not acceptable, as more and more articles from scholarly journals, magazines, and newspapers are available in electronic form. It is troubling that students in 2010 utter the same thing as did students in 2000—“I need an article on psychology, and I can’t find it in the catalog.” Many libraries have additional avenues beyond ILS-based Web catalogs for further discovery of information in electronic form, such as A–Z lists of the hundreds of databases they subscribe to, A–Z lists of the thousands or tens of thousands of full-text journals large libraries have access to, and librarian-created guides focused on subject areas. As link resolvers and the OpenURL became mainstream in the 2000s, their magic allowed users to resolve their search to full-text articles, though often the user still needed to search a variety of front-end interfaces prior to connecting to the full-text article as brokered through the link resolver.

Web scale discovery services for the library environment are an evolution holding great potential to easily connect researchers with the library’s vast information repository. By preharvesting and centrally indexing content sourced across multiple silos, Web scale discovery services hold the promise to fundamentally improve and streamline end user discovery and delivery of content. 
Such content includes physical holdings, such as books and DVDs; local electronic content, such as digital image collections and institutional repository materials; and remotely hosted content purchased or licensed by the library, such as e-books and publisher or aggregator content for thousands of full-text and abstracting and indexing resources.

For purposes of this issue of Library Technology Reports, Web scale discovery can be considered a service capable of searching across a vast range of preharvested and indexed content quickly and seamlessly. Web scale discovery services provide discovery and delivery services that often have the following traits:

• Content. These services harvest content from local and remotely hosted repositories and create a vastly comprehensive centralized index—to the article level—based on a normalized schema across content types, well suited for rapid search and retrieval of results ranked by relevancy. Content is enabled through the harvesting of local library resources, combined with brokered agreements with publishers and aggregators allowing access to their metadata and/or full-text content for indexing purposes.
• Discovery. These services have a single search box providing a Google-like search experience (as well as advanced searching capabilities).
• Delivery. These services provide quick results ranked by relevancy in a modern interface offering functionality and design cues intuitive to and expected by today’s users (such as faceted navigation to drill down to more specific results).
• Flexibility. These services are agnostic to underlying systems, whether hosted by the library or hosted remotely by content providers. These services are open compared to traditional library systems and allow a library greater latitude to customize the services and make the service its own.
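The traits listed above can be illustrated with a toy example. What follows is a minimal sketch under stated assumptions, not a description of how any vendor's service is built: the two source record layouts, the shared schema, the field names, and the naive term-count relevancy score are all hypothetical. It shows records from two silos being normalized into one schema, indexed centrally ahead of time, and then searched from a single query with ranked results and content-type facets.

```python
from collections import Counter, defaultdict

# Hypothetical source records. Field names ("245a", "atitle", ...) are
# illustrative assumptions, not any vendor's actual feed format.
CATALOG_RECORDS = [
    {"id": "b1", "245a": "Introduction to Psychology", "format": "Book"},
    {"id": "b2", "245a": "Library Automation Handbook", "format": "Book"},
]
ARTICLE_RECORDS = [
    {"doi": "10.0000/a1",
     "atitle": "Sleep and memory in psychology students",
     "abstract": "A psychology study of sleep.",
     "peer_reviewed": True},
]


def normalize_catalog(rec):
    """Map a catalog-style record onto the shared (assumed) schema."""
    return {"id": rec["id"], "title": rec["245a"], "text": "",
            "content_type": rec["format"], "scholarly": False}


def normalize_article(rec):
    """Map a publisher-style article record onto the same schema."""
    return {"id": rec["doi"], "title": rec["atitle"], "text": rec["abstract"],
            "content_type": "Journal Article",
            "scholarly": rec["peer_reviewed"]}


def tokenize(text):
    """Lowercase the text and strip basic punctuation from each word."""
    words = (w.strip(".,;:!?()").lower() for w in text.split())
    return [w for w in words if w]


def build_index(docs):
    """Build an inverted index: term -> {document id -> term count}."""
    index = defaultdict(Counter)
    for doc in docs:
        for term in tokenize(doc["title"] + " " + doc["text"]):
            index[term][doc["id"]] += 1
    return index


def search(index, docs_by_id, query, content_type=None):
    """Return (hits ranked by summed term counts, facet counts).

    Facet counts describe the full result set; the optional content_type
    argument then narrows the hits, as a facet click would.
    """
    scores = Counter()
    for term in tokenize(query):
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    hits = [docs_by_id[doc_id] for doc_id, _ in scores.most_common()]
    facets = Counter(hit["content_type"] for hit in hits)
    if content_type is not None:
        hits = [hit for hit in hits if hit["content_type"] == content_type]
    return hits, facets


# Harvest and index ahead of time, then search the central index.
DOCS = ([normalize_catalog(r) for r in CATALOG_RECORDS]
        + [normalize_article(r) for r in ARTICLE_RECORDS])
DOCS_BY_ID = {doc["id"]: doc for doc in DOCS}
INDEX = build_index(DOCS)

hits, facets = search(INDEX, DOCS_BY_ID, "psychology")
print([hit["id"] for hit in hits])  # ['10.0000/a1', 'b1']
```

The point of the sketch is the ordering of the work: normalization and indexing happen before any user searches, so a query touches only the central index rather than each remote system in turn. Real services use far richer schemas and relevancy algorithms than the term counts assumed here.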

Why Web Scale Discovery?

As research from the 1990s (if not earlier) through 2010 illustrates, library discovery systems within the networked online environment have evolved, yet continue to struggle to serve users. As a result, the library (or systems supported and maintained by the library) is often not the first stop for research—or worse, not a stop at all. Users have defected, and research continues to illustrate this fact. Rather than weave these research findings into a paragraph or page, some illustrative quotes convey this struggle. Those wishing to read the full context of the research can do so at their leisure. The quotations below were chosen because they succinctly capture findings from research involving dozens, hundreds, and in some cases thousands of participants or respondents:

People do not just use information that is easy to find; they even use information that they know to be of poor quality and less reliable—so long as it requires little effort to find—rather than using information they know to be of high quality and reliable, though harder to find.1

* * *

Today, there are numerous alternative avenues for discovery, and libraries are challenged to determine what role they should appropriately play. Basic scholarly information use practices have shifted rapidly in recent years, and as a result the academic library is increasingly being disintermediated from the discovery process, risking irrelevance in one of its core functional areas [that of the library serving as a starting point or gateway for locating research information]. . . . We have seen faculty members steadily shifting towards reliance on network-level electronic resources, and a corresponding decline in interest in using locally provided tools for discovery.2

* * *

A seamless, easy flow from discovery through delivery is critical to end users. This point may seem obvious, but it is important to remember that for many end users, without the delivery of something he or she wants or needs, discovery alone is a waste of time.3

End users’ expectations of data quality arise largely from their experiences of how information is organized on popular Web sites. . . . [User] expectations are increasingly driven by their experiences with search engines like Google and online bookstores like Amazon. When end users conduct a search in a library catalog, they expect their searches to find materials on exactly what they are looking for; they want relevant results.4

* * *

Users don’t understand the difference in scope between the catalog and A&I services (or the catalog, databases, digitized collections, and free scholarly content).5

* * *

It is our responsibility to assist our users in finding what they need without demanding that they acquire specialized knowledge or select among an array of “silo” systems whose distinctions seem arbitrary. . . . The continuing proliferation of formats, tools, services, and technologies has upended how we arrange, retrieve, and present our holdings. Our users expect simplicity and immediate reward and Amazon, Google, and iTunes are the standards against which we are judged. Our current systems pale beside them.6

* * *

Q: If you could provide one piece of advice to your library, what would it be? A: Just remember that students are less informed about the resources of the library than ever before because they are competing heavily with the Internet.7

Other factors, apart from user behavior and preferences, also give reasons for libraries to use Web scale discovery services. First and most obvious is that if something is not discovered, it has no chance of being used. Whether a librarian conducts a reference interview, a user browses the shelves, a friend provides word of mouth, a user searches in Google or a library database, or a user scans issues and article titles in an electronic journal, discovery must happen, either by focused intent or serendipitously. Libraries
often spend tremendous amounts of money every year to purchase or pay for access to an ever-growing body of electronic content, and the cost for access to this content often increases on an annualized basis. At least for academic libraries, most collections monies often go for electronic, not print, resources. Clearly purchases are made with the hope that the content will be used. For the content to be used, it must be discoverable—and for many of today’s users, easily discoverable. Publishers and aggregators also have a vested interest in discoverability and use of their content. Libraries regularly review content-usage statistics and frequently analyze overlaps within their collections. Extensive research into whether these Web scale discovery services increase discovery and usage of publisher and aggregator content does not yet exist, though it seems reasonable to assume that if materials are more easily discoverable, they will be used more heavily and access statistics will increase. Statistics are not the sole factor libraries analyze in deciding whether to continue a subscription, but they often play an awfully big role. While extensive research doesn’t exist, some studies have been published. For example, Doug Way compared database and full-text usage statistics before and after the implementation of the Summon service at Grand Valley State University Libraries.8 His research, involving analysis of GVSU’s link resolver statistics before and after Summon’s implementation, suggests that the new discovery service was broadly adopted by his institution’s population and has resulted in an increase in the use of the library’s electronic resources. 
Bill Kelm at Willamette University has presented results indicating that Willamette’s implementation of WorldCat Local has similarly led to an increase in the use of the library’s electronic resources and the number of ILL requests.9 Ex Libris Primo Central, Ebsco Discovery Service, and Innovative Interfaces Encore Synergy were all released in 2010, and no published research has yet been identified, at least in reference to surfacing Web scale content as offered by these services.

Another tie-in focuses on information literacy. To the degree that these new tools may be used as a gateway to information by users, such use could serve as another tool to build information literacy. Web scale discovery services are by default searching local library collections and remote licensed content, purchased (it’s hoped) with a purpose. Presumably, this information, at least collectively, is more accurate, relevant, and appropriate for research purposes than, for example, a webpage with dubious, less informed, or opinionated content that may appear on the first page of a search engine’s results. In other words, the content is vetted to some degree as more likely to be worthy and true. At least one discovery service has a focus on scholarly articles, and several services allow users to refine a search to only materials designated (by Ulrich’s or through a separate analysis) as scholarly or peer-reviewed. One service has an optional scholarly article recommender component, which helps bring together related scholarly materials in an increasingly interdisciplinary research realm. In short, if part of building information literacy involves steering users to discovery systems exposing library-vetted content, and if the features and content scope of such tools are attractive enough for users to bite, then these new discovery tools hold great potential.

Audience and Scope

The primary goal of this work is to provide a valuable foundation to libraries that wish to know more about library-focused Web scale discovery services and to aid libraries contemplating a marketplace review for their local environment. It does not presume that the reader is familiar with any of these services, nor will the report delve into extreme technical detail. Such detail is best addressed by direct library-to-vendor dialog because that approach will supply the most up-to-date information available and because libraries may often have specific questions pertaining to their unique position or environment, local library skill sets, and staff workflows. If this issue of Library Technology Reports introduces a new concept to some readers, serves as a foundation for other libraries to utilize in their own marketplace evaluations, or provides a few additional questions or twists that those already well along in their investigations hadn’t considered, then it has met its goal.

The author identified five Web scale discovery services from major vendors: OCLC WorldCat Local, Serials Solutions Summon, Ebsco Discovery Services, Innovative Interfaces Encore Synergy, and Ex Libris Primo Central. This report profiles four of the five. Encore Synergy by Innovative Interfaces is not profiled, which in no way implies that this product is not of interest or should not be considered by potential library customers. Indeed, Encore Synergy is already in use at a host of academic and public libraries. However, given space restrictions, time restrictions, and the fact that Synergy was one of the most recently released services, boundaries had to be drawn somewhere. In addition, the approach of Encore Synergy to Web scale discovery is a bit different from that of each of the other four services in this report; Synergy does not create a preharvested, preaggregated central index for local and remote publisher or aggregator content. Like other vendors, Innovative Interfaces has made agreements with content providers, though its approach focuses on using modern Web services to access the publisher or aggregator content, remotely hosted, in real time. This author makes no judgment calls on whether this approach is better or worse than others, and potential library customers owe it to themselves to investigate each of these five services.

Another boundary is that this report focuses on released, publicly available products sold by library vendors to the library community. It does not focus on the development of open source initiatives related to Web scale discovery of library resources, such as the quite interesting eXtensible Catalog (XC) project spearheaded by the University of Rochester.

A Few Key Concepts The Discovery Layer and Harvesting

Exposure of Licensed Content A key concept is the exposure—or surfacing—of licensed content within the discovery service indexes. As a test-drive of some of the already live and searchable library-specific discovery services—as well as Google Scholar—will easily illustrate, a library need not be a subscriber to much of the content contained in the discovery service index for the content to be discoverable. In other words, in many instances, the user is actually searching a broader collection than the local library may physically possess or own or license electronically. This is a potential boon to the researcher. Simply knowing of an item’s existence is better than not knowing, assuming that appropriate delivery options (e.g., interlibrary loan) are available if the library hasn’t licensed access to the electronic full text. In general, access to the full text is where a user must be authenticated. That said, uniquely indexed content, in particular content from some abstract and indexing databases, is not open for surfacing unless the library also subscribes to that resource. However, citation-level content covered in subscription abstracting and indexing databases is often available directly from publishers, aggregators, or other databases. Put another way, even if one didn’t consider any of the Web Scale Discovery Services  Jason Vaughan

Library Technology Reports  alatechsource.org  January 2011

Continuing the trend that next-generation library catalogs introduced to a wide audience a few years back, Web scale discovery services unlatch the discovery layer—the interface within which the user works for information discovery and delivery—from the traditional underlying host and delivery systems maintained at the local library level. Discovery services harvest or otherwise obtain ILS catalog content and content from other local repositories, such as digital collections and institutional repositories, often through the OAIPMH protocol, FTP, or a related mechanism. For ILS records, the level of content is adequate so that a user looking at the record in the discovery system generally need not click through to an underlying full catalog record as housed in the ILS, especially considering that enriched content and real-time status calls are integrated into the Web scale discovery experience. Still, each service provides a link to the full catalog record within the native ILS interface. For other materials, the discovery service may surface the item and provide a link to a more capable, function-replete system housing the native content. For example, digital collection management systems often include additional functionality that at present isn’t offered through discovery service interfaces, such as zooming, panning, and rotation of images. It’s important to note that the descriptive staff work related to the library’s print and e-content collections—the original cataloging and other metadata work—is still performed within the underlying library system, whether it be the ILS, digital collection management system, or institutional repository or related system. Staff are not natively cataloging records within the discovery system database/interface. In addition, local library systems may provide additional indexes and associated fielded searches not offered through the Web scale discovery service directly. 
In summary, at this stage of development, there is a role, from the perspectives of the user and library staff, for both the encompassing Web scale discovery service and the user interfaces and underlying systems associated with the library’s local collections. Similarly, publisher and aggregator content may often reside in a database tuned to that particular

content type or for a particular audience or discipline that typically uses that resource (e.g., Medline). Therefore, there continues to be a need for the end user to be able to utilize the native interfaces of these resources. Recognizing this need, several discovery services have recently introduced or begun investigations into database and collection highlighting within Web scale discovery results. For example, for a broad search, “engineering,” the database IEEE explorer may be returned at the top of the list; the user can click this link to connect to and search the IEEE database directly, leaving the web scale discovery service. As mentioned, each discovery vendor profiled in this report has entered into agreements with content providers that allow it to harvest, normalize, preindex, and expose through its Web scale discovery service the publisher or aggregator content. To facilitate regular, automatic content updates as new information is published, automated transfer routines and load tables are configured between the discovery services and the content vendors. Similarly, routines and procedures are set up between the local library and the discovery service vendor to accommodate updates and changes for the locally harvested records (additions, deletions, changes). Harvested content is normalized into an underlying schema, developed by the discovery service vendor, that facilitates indexing, relevancy ranking, and an even level of presentation for different content types with potentially varying levels of metadata.

Library Technology Reports  alatechsource.org  January 2011

e-journal, publisher package, or database subscriptions and licenses the library pays for at its local institution, there is a very large base of content indexed by the discovery services that is available for search and discovery. If a library had no paid e-content licenses or subscriptions, there would still be a large index of content to search against. Publisher agreements can permit these discovery services to index their content and provide access to citation-level metadata within the central index to all customers regardless of whether the local library itself has a licensing agreement with that publisher and has purchased access to this content. When a user clicks through to access the full text (usually using a link resolver and associated rights management information), that’s usually the point where authentication will be requested (assuming a user isn’t already authenticated; if a user is physically at a library site or on campus, authentication is usually behind the scenes and automatic, based on authorized IP address ranges tied to the library subscription). In addition, each of these discovery services includes a huge amount of open-access content. If a picture is worth a thousand words, then a test-drive of the very open Summon index will quickly illustrate this concept. You, as an unauthenticated user with no known association to a library or a library’s e-content subscriptions, can, from off site, go to any Summon library website, leave the search box blank, and execute a search. Then click on the refinement option Add Results Beyond Your Library’s Collection. Enjoy the half billion plus items that have just been surfaced. Of course, it would be frustrating if there weren’t a method available to scope to the local library collection—the physical collections and licensed e-content accessible to that library’s population. 
Fortunately, each discovery service is attuned to this issue and has one or more ways of returning only the results to which the user has rights, and/or providing a visual cue, whether it be a search that is by default scoped to the subscribing library’s collection (which users can expand if they wish), a Full Text Available icon next to the brief item result, or some other approach. In some cases, full resolution to an item for a library may be an interlibrary loan request, and discovery services provide for this capability as well, often in conjunction with the library’s link resolver knowledge base.
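
The behind-the-scenes authentication by authorized IP address range mentioned earlier in this section can be sketched with Python's standard `ipaddress` module. This is a minimal sketch under stated assumptions, not any vendor's implementation: the ranges below are reserved documentation networks standing in for a library's registered ranges, and the function names are hypothetical.

```python
import ipaddress

# Hypothetical authorized ranges tied to a library subscription.
# 192.0.2.0/24 and 198.51.100.0/24 are reserved documentation networks.
AUTHORIZED_RANGES = [
    ipaddress.ip_network(cidr) for cidr in ("192.0.2.0/24", "198.51.100.0/25")
]


def is_on_site(client_ip: str) -> bool:
    """True if the address falls inside any authorized range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in AUTHORIZED_RANGES)


def needs_login(client_ip: str, already_authenticated: bool) -> bool:
    """A login prompt is needed only for unrecognized, unauthenticated users."""
    return not (already_authenticated or is_on_site(client_ip))


print(is_on_site("192.0.2.17"))
print(needs_login("203.0.113.5", already_authenticated=False))
```

A user on campus is recognized silently, while an offsite user is asked to authenticate only at the point where licensed content is requested.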

A Note of Caution

Web scale discovery services, as defined in this report and focused on the library environment, are extremely new. After development periods for each service, OCLC WorldCat Local became publicly available at the end of 2007, Serials Solutions Summon in January 2009, Ebsco Discovery Service in July 2010, and Innovative Interfaces Encore Synergy and Ex Libris Primo Central
both in mid-2010. In short, the majority of services, at press time, are a year old at most. Each service is evolving extremely rapidly, with enhancement cycles measured in months, if not weeks. During the writing of this report, things changed rapidly, and significant revisions occurred between drafts. As the report goes to press, vendors will continue to ink new contracts with additional content providers, adding more content to an already huge pool. There will be interface updates, changes, new capabilities, and cosmetic changes. That said, the purpose of this report is not to highlight, let alone mention, every last detail, capability, and set of information associated with each service—that is beyond the scope of a single edition. This report profiles four discovery services; in the past, a whole issue of Library Technology Reports has focused solely on a single service.10 The purpose of this report is to provide a broad, yet substantive overview of several of the major services. Through presentations and conversations, the author has found that awareness of these tools in the library environment isn’t necessarily widespread yet. In short, this issue of Library Technology Reports is meant to introduce or further introduce the concept of Web scale discovery, to profile and provide an overview of several of the key services available today, and to provide some questions for thought and additional resources that potential library customers may wish to consider in the context of their own evaluations. Given the quite extensive array of features and functionality associated with these services, the extreme pace of development, and the flexible nature of these services, which allows a high level of local library customization and integration within the library discovery environment, there may be an unintended inaccuracy or two, or otherwise outdated information somewhere within this work. 
The author has made his best efforts to report accurately and apologizes in advance if errors are discovered. Research efforts related to this work included identification and review of existing publicly available and published information on these services—vendor website information and press releases, webcasts sponsored by vendors or third parties, and other published literature (which remains somewhat sparse at this early stage). It also included test-driving multiple beta or live sites for each of the profiled services, with an eye toward what the end users see. It also included detailed questions sent to each vendor, and often additional follow-up. If a particular feature is mentioned or highlighted for one service and not others, readers shouldn’t automatically assume that the other services do not have the same or a similar feature or capability. It may mean that this feature jumped out at the author while he was preparing one of the particular profiles. Or it may reflect the fact that vendors often answered the same question with different information, at different levels
of completeness, or from a different angle and focus. This should be no surprise to those who have conducted comparison research in the past (this author included), looking at different tools performing similar functions. For reasons that should be obvious, it would be foolish to take each new angle or feature mentioned by one vendor for its service and ask all the other vendors about it. If this approach were pursued, this report would turn into a truly Sisyphean task and would never reach completion—by the time one draft was done, each service would have changed, offering new features, and the cycle would start anew. It’s fair to say that during the course of this research, vendors at times challenged publicly made statements from other vendors11 that, for example, their service is the only one that has this feature or is the only one that covers this content. The author has striven to write this report in a neutral tone and hopes no bias toward one product or another is present. This report specifically does not include a comparison matrix of features, functions, and content for the services. Given that things are changing so rapidly with these tools, and given the reality that quite a lot of local configuration options are available to each library customer, a comparison matrix did not seem appropriate. Within each of the four chapters profiling individual discovery services, a short Vendor Perspective section is included. The author invited each discovery service vendor to provide its own short sound bite highlighting its particular offering. Each vendor chose to participate. The author suggested to each vendor that it briefly touch on the following broad points related to its Web scale discovery service:

• Content scope—the capability to present local library content, such as ILS content and local digital collections, as well as, and with a special focus on, the centralized preaggregated index of primary and secondary publisher and aggregator content
• End-user searching and functionality—highlights about the user interface
• Local library customization capabilities—highlights related to how the library can make the discovery service its own through customization, choosing what features are offered, branding, and so on

Ground rules were simple and direct. Vendors were asked to focus on their service and why they think it is a worthy contender and not focus on competitive marketplace offerings. They were asked to keep their responses neutral in tone and were not allowed to mention competing vendors by name. In short, they were given an opportunity to succinctly summarize their service and what they think makes it stand out. The target word count was 500, and each vendor came in at around that
length. The author has chosen not to edit the Vendor Perspective content, and thus the “Vendor Perspective Section” in each chapter is truly the vendors speaking in their own words. From one perspective, perhaps the most delicate topic within this report is that of publisher and aggregator indexed content. All vendors have made and continue to make agreements with publishers and aggregators for the rights to index content for purposes of surfacing such content within the discovery service. The exact scope of the content associated with each service is challenging to nail down, and vendors have provided varying levels of information on this topic; for some content-related questions, they chose not to provide particularly illuminating responses—somewhat understandable, given the competitive marketplace and, perhaps in some cases, the confidential nature of signed content agreements. For any specific publisher or aggregator, a different level of content—basic metadata, more detailed metadata, author-provided metadata, abstracts, full text—may be provided to the different discovery vendors. Some vendors may have additional in-house staff who enrich content that’s been received. Some discovery vendors may have exclusive agreements with one or more content providers. Content for the same item—a single unique journal article, for example—can be sourced from many different outlets. The amount or richness obtained from one outlet may not be equal to that from another outlet. In short, the devil is truly in the details. Each library considering a serious marketplace review should thoroughly do its homework, which in this case means thorough questioning of the vendors. It also means conducting plenty of sample searches in existing live implementations, perhaps focused on the subjects or material types most pertinent to your clientele and analyzing the results. It also means conducting your own assessment of existing usability studies and published literature. 
To borrow a direct quote, as the author doesn’t believe he can more succinctly express the same sentiment, with which he wholeheartedly agrees, “As history has shown, multiple solutions arise to address real needs, and each solution has its own characteristics. In terms of discovery solutions, I’m confident that each library, after conducting a thorough evaluation of facts and features, will be able to determine which of the available products best fits the library’s mission, needs, policies, and environment.”12 The author hopes this issue of Library Technology Reports serves as a bright torch to help light your way as you consider, begin, or continue investigations into Web scale discovery services within your own library environment.

Web Scale Discovery Services  Jason Vaughan

Chapter 2

OCLC WorldCat Local

Abstract

Debuting at the end of 2007, WorldCat Local represents the first-to-market Web scale discovery service as defined in this report and presently enjoys the largest install base of any Web scale discovery service profiled in this report. This chapter provides a brief history, overview, and a few insights into the future development path of WorldCat Local, describes the local and remote content associated with the WorldCat Local index, and highlights some of the features, functionality, and flexibility associated with the WorldCat Local interface.

Overview

OCLC released the initial version of WorldCat Local in November 2007, following an earlier development period with trials dating to spring 2007. The experience of a pilot development partner, the University of Washington, was profiled in the August 2008 issue of Library Technology Reports.1 The UW pilot went live in spring 2007, and thus, for the library environment, represents the first single search discovery service combining millions of physical and electronic items within a single search result set. Approximately thirty million article-level items were intermingled with the WorldCat database in the UW pilot. In 2009, OCLC ramped up WorldCat Local and entered into additional partnerships to include substantially greater amounts of article-level content, all within an interface utilizing a single search box, relevancy-ranked results, and a back-end centralized index. Two versions of the discovery platform exist, the full-fledged WorldCat Local and the streamlined WorldCat Local “quick start.” A few of the differences are noted later in this chapter;
for a more detailed comparison, see OCLC’s informative FAQ list on its website. Many of the features in WorldCat Local are available in WorldCat Local “quick start” (and, as noted below, much of the look, feel, and functionality of both versions are carried over from the WorldCat.org catalog interface). In brief, a few key options available in WorldCat Local and absent from “quick start” include integration flexibility with multiple ILSs (for example, both a local ILS and a consortial ILS, instead of a single ILS), the option to enable users to refine search results by branch location, relevancy ranking that takes into account collections from other libraries in a consortium, and resource-sharing options other than through WorldCat Resource Sharing or ILLiad. At the time of this writing, over 1,000 sites in North America and Europe have implemented either WorldCat Local or WorldCat Local “quick start.” The majority of implementations are academic institutions, though public libraries and special libraries are represented as well.

WorldCat Local “quick start”–related FAQs www.oclc.org/us/en/support/questions/worldcatlocal/quickstart.htm

Regardless of version, the interface and discovery service for WorldCat Local are hosted by OCLC. Product support is offered through various modes (phone, e-mail, website) and available 24/7. Assuming a library has holdings within the WorldCat catalog and a FirstSearch WorldCat subscription, WorldCat Local “quick start” is included in an institution’s base subscription at no additional cost. The full version of WorldCat Local has a one-time implementation fee and is
available as a yearly subscription, with pricing based on the library’s user population. Regardless of version, OCLC updates and enhancements are provided; interface and functionality updates are currently provided and installed on a quarterly basis.

Content and Scope

Publisher Content

At the time of this writing, the central index associated with WorldCat Local includes nearly a half billion items (over half of these being articles), with content sourced from journal publishers, article citation aggregators, and the WorldCat database. Updates to content, ranging from daily to annually, are provided by publishers; once provided, such content is loaded and indexed within a few days at most. Article citation content is sourced from four major pools. First, journal publisher agreements with Springer, Taylor and Francis, Wiley, IGI Global, Nature, Sage, Emerald, and others contribute over seven million records directly into the WorldCat Local central index. Second, article citation aggregators such as ArticleFirst, Medline, ERIC, British Library Inside Serials, JSTOR, OAIster, and Elsevier provide over 100 million citations open to search to all WorldCat Local customers. Third, the WorldCat database contains nearly four million additional article citation records. As readers are likely aware, and as noted in a 2010 press release, OCLC is transitioning away from serving as a host and reseller of commercially published content and is actively engaged in publisher partnerships to increase content with the WorldCat.org and WorldCat Local platforms.2 Reflecting this transition, a fourth major pool of content is sourced from direct agreements with database providers, both OCLC-hosted and licensed databases (other than ArticleFirst, referenced above), and other database providers, such as MLA, Ebsco, Gale, and H. W. Wilson. Through agreements with OCLC, these providers enable some of their subscription database content to be indexed and included in WorldCat Local’s central index, and this content is available for those WorldCat Local customers who also subscribe to those databases (through whichever platform the library chooses to subscribe).

Content from the first three pools is discoverable for all WorldCat Local customers; content from the final pool is scaled and included in the discovery experience for those libraries that maintain matching subscriptions to these third-party databases. At the time of this writing, content from over 180 directly licensed databases and collections (from over 61,000 journals) is incorporated into the central preaggregated index, some or all of which is available for search and discovery, dependent as mentioned on local library subscriptions. A regularly updated list of these databases is available at the OCLC WorldCat Local website (which also includes a list of databases searchable via federated search, described toward the end of this chapter). The local library has control over which particular (and eligible) databases can be included as part of a WorldCat Local default search.

WorldCat Local single-search access: Database list www.oclc.org/worldcatlocal/overview/metasearch/dblist

As with other discovery services, OCLC continues to work with new providers to expand the coverage of its central index. OCLC indicated that it has recently signed agreements that will add content from another eighty-plus databases and collections, adding significant citation-level content. The article citation records in WorldCat Local provide to WorldCat Local subscribers discoverability of 75 percent or more of journal content from over 1,000 licensed databases and collections of importance to libraries, enabled through the greater integration of the WorldCat knowledge base with WorldCat Local. In addition to commercial article-level content, WorldCat includes content, searchable by all customers, from various well-known open-access repositories, such as materials from OAIster, mentioned above, and HathiTrust. Over 4.5 million e-book metadata records from mass digitization providers such as Google (described in more detail later), major aggregators (such as NetLibrary, ebrary, and Ingram), and additional commercial e-book publishers are also included. In addition, at the time of this writing, OCLC is working with over a dozen partners on expanding the functionality of the WorldCat knowledge base to facilitate sharing article content. In part, these efforts will build upon the discovery of additional large volumes of electronic content through the WorldCat Local discovery service and are part of OCLC’s development of Web scale subscription and license management services and workflow.

WorldCat Local works with an institution’s proxy server (including the EZproxy proxy server, marketed by OCLC) to enable offsite authenticated access to licensed resources. WorldCat Local works with the library’s link resolver and the WorldCat knowledge base as a broker to licensed content; customers must have library holdings record information for their serials titles in WorldCat. OCLC has an eSerials Holdings service (free with OCLC cataloging membership) to facilitate adding and updating library holdings information (from an A–Z journals list or a link resolver database) to the WorldCat catalog.

Local Resources

Libraries can incorporate local resources into WorldCat Local by having their local content (records) within the WorldCat catalog. Indeed, a vast number of libraries already have their catalog holdings information within the WorldCat database. However, depending on local practices and priorities, records and holdings information in the library’s local catalog may be more complete, accurate, or recent. If the local library’s records are not already in WorldCat, they are not part of the WorldCat Local discovery experience; such records (and the library’s OCLC symbol) must be present within the records in the WorldCat master catalog. Libraries can request a one-time (free) batchload of current ILS records into the WorldCat database, which will apply necessary updating and reindexing of MARC records. Local ILS records must have an indexed OCLC number. On an ongoing basis, if a record is cataloged in OCLC by the local institution, it becomes visible in WorldCat Local in real time. Nightly updates of library holdings records are performed. Apart from traditional catalog records, content from other OAI-PMH–compliant local library repositories can be harvested, with the metadata subsequently crosswalked to MARC, the underlying schema for the WorldCat database. OCLC provides a WorldCat Digital Collection Gateway tool for CONTENTdm and other OAI-PMH–compliant digital repositories, facilitating the import and publishing of CONTENTdm records into WorldCat, at which point they become searchable and exposed to all customers through WorldCat and WorldCat Local. Taken in full, the WorldCat Local preaggregated central index is one unified index encompassing both the preindexed remote publisher and aggregator content and local content sourced from the local library and other OCLC member libraries.
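
To make the harvest-and-crosswalk step above more concrete, here is a minimal Python sketch. The `verb` and `metadataPrefix` request parameters and the XML namespaces come from the OAI-PMH and Dublin Core specifications. The repository URL, the embedded sample response, and the small field mapping are assumptions for illustration only; the mapping is loosely modeled on common Dublin Core to MARC crosswalks and is not a description of how OCLC's Digital Collection Gateway actually maps fields.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Simplified, assumed mapping of Dublin Core elements to MARC tag/subfield.
DC_TO_MARC = {"title": "245a", "creator": "720a", "subject": "653a", "date": "260c"}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def crosswalk(response_xml):
    """Turn each harvested Dublin Core record into a dict of MARC-like fields."""
    root = ET.fromstring(response_xml)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        fields = {}
        for dc_name, marc_tag in DC_TO_MARC.items():
            values = [e.text.strip() for e in rec.iterfind(f".//dc:{dc_name}", NS) if e.text]
            if values:
                fields[marc_tag] = values
        records.append(fields)
    return records


# Invented sample response so the sketch runs without network access.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.edu:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Campus aerial photograph</dc:title>
          <dc:creator>Unknown</dc:creator>
          <dc:subject>Universities</dc:subject>
          <dc:subject>Aerial photography</dc:subject>
          <dc:date>1965</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("https://repository.example.edu/oai")  # hypothetical endpoint
harvested = crosswalk(SAMPLE)
print(url)
print(harvested)
```

A production harvester would also follow resumption tokens and handle deleted records, both of which are omitted here for brevity.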

Relevancy

By default, a WorldCat Local search is a keyword search against a majority of indexed MARC record fields. At present, WorldCat Local does not search full text, though OCLC has indicated that it intends to pursue rights to index full text and enable full-text searching. OCLC’s relevancy ranking includes search term proximity measures on terms appearing in the title, subject, and author fields; other MARC fields are also included in the search but given less weight. Currency also factors into relevancy, as well as the number of institutions holding an item (for physical holdings). By default, results are sorted and returned by library and relevancy. However, a library can choose to remove the weighting of its local holdings from the relevancy calculations and have items returned based strictly on relevance (and regardless of whether the library locally owns that item or not). At present, some level of deduplication is applied to records. For records within the first content pool as described above (available to all WorldCat Local libraries), records are deduped upon loading. At the time of this writing, OCLC is evaluating potential methods for applying deduping processes to the other content pools (those requiring local library subscriptions).
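
The ranking signals listed above (heavier weight for title, subject, and author matches, lighter weight for other fields, currency, number of holding institutions, and an optional preference for locally held items) can be combined in a toy scoring function. This is only a sketch of the general idea: all weights and formulas below are invented, term proximity is not modeled, and OCLC's actual algorithm is not published in this report.

```python
import math

# Invented weights: title, subject, and author count more than other fields.
FIELD_WEIGHTS = {"title": 3.0, "subject": 2.0, "author": 2.0, "other": 0.5}


def score(record, query_terms, current_year=2011, local_boost=True):
    """Combine field matches, currency, holdings, and an optional local boost."""
    total = 0.0
    for field_name, weight in FIELD_WEIGHTS.items():
        words = record.get(field_name, "").lower().split()
        total += weight * sum(1 for t in query_terms if t.lower() in words)
    age = max(0, current_year - record.get("year", current_year))
    total += 1.0 / (1 + age)                            # newer items score higher
    total += 0.1 * math.log1p(record.get("holdings", 0))  # widely held items
    if local_boost and record.get("held_locally"):
        total += 10.0                                   # "library and relevancy" sort
    return total


def rank(records, query, local_boost=True):
    terms = query.split()
    return sorted(
        records,
        key=lambda r: score(r, terms, local_boost=local_boost),
        reverse=True,
    )


# Invented sample records.
records = [
    {"id": "a", "title": "Web scale discovery", "subject": "Libraries",
     "year": 2010, "holdings": 50, "held_locally": False},
    {"id": "b", "title": "Library catalogs", "other": "discovery tools",
     "year": 2005, "holdings": 500, "held_locally": True},
]

by_library_and_relevancy = [r["id"] for r in rank(records, "discovery")]
by_relevancy_only = [r["id"] for r in rank(records, "discovery", local_boost=False)]
print(by_library_and_relevancy, by_relevancy_only)
```

Switching `local_boost` off mirrors the option described above in which a library removes the weighting of its local holdings and has items returned strictly by relevance.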

Interface Features: Overview, Results, and Navigation

General

WorldCat Local can be envisioned as a localized, customized version of the WorldCat catalog. The system interface is currently available in six languages (Chinese, Dutch, English, French, German, and Spanish). Many elements from the parent interface, such as the search features, social tools, and WorldCat user accounts, are present in WorldCat Local. Using the parent WorldCat template as a start, WorldCat Local customers have several “look and feel” interface elements they can customize to help set this service apart from the parent catalog. Customers can define the background color scheme and customize the WorldCat Local menu bar and search box with their library name and logo and branding elements. The local library can add its own custom hyperlinks within the header present on the default search page and returned results page. As with each discovery tool profile in this issue of Library Technology Reports, the description that follows represents a typical instance of WorldCat Local, but many elements can be customized by the local library. By default, WorldCat Local provides a single search box (figure 1); a link to an advanced search option is provided and available from both the initial search screen and the returned results pages. For the full version of WorldCat Local, libraries can have up to four selectable collection “tiers” for a user to search; the first three are defined by the library, and the last is the always-present global tier of WorldCat libraries (called Libraries Worldwide in the search box pull-down menu). In a typical instance, many customers may choose to use two tiers—the tier coinciding with their own local institutional holdings (including the article-level content as described above) and the Libraries Worldwide tier, which expands the search to include the holdings of other libraries worldwide as represented in the parent WorldCat catalog.

Figure 1 WorldCat Local single search box

Intermediate tiers, if the library chooses to utilize additional tiers, may, for example, include holdings of a consortium of which the library is a member (or a group the library would like to set as an intermediate tier for other purposes, for example, interlibrary loan). WorldCat Local “quick start” customers can utilize up to three tiers—the local institution collection, a group view of other libraries that may be on the same ILS (if such is the case for that particular library system), and the global view of all WorldCat libraries. Regardless of which collection tier the user chooses to search,
local institutional holdings will appear toward the top of the returned results (assuming that the default sort option, described below, is based on library and relevance). The advanced search option (figure 2) allows a user to enter search terms for up to three user-selected record fields; input a range of years to search; select which formats to return (such as book, article, sound recording, journal/magazine/newspaper, etc.); select an audience (juvenile or non-juvenile); and choose to return results in only a particular language. In addition, users can select content type to return (fiction, nonfiction, biography, thesis/dissertation). Most of these advanced search options are also offered via faceted searching, described shortly. WorldCat Local does not require offsite users to authenticate to search the service. The local library configures authentication requirements and at what point users may be asked to authenticate, typically either before they conduct an initial search or when they try to retrieve a full-text item. For content drawn from some of the pools of specific local library database subscriptions (as described above), an unauthenticated user might, prior to the initial search being executed, get the message saying, “Some of the information that you’ve requested can only be displayed to authorized users” and providing an option to authenticate. As mentioned above, WorldCat Local works with typical library proxy servers to provision offsite access.
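
The offsite access path described in this chapter, in which a link resolver brokers the connection to licensed content and a proxy server handles authentication, can be sketched as follows. The OpenURL keys (`url_ver`, `rft_val_fmt`, and the `rft.` fields) follow the ANSI/NISO Z39.88-2004 key/encoded-value format. The resolver address, the proxy address, the prefix-style proxy login pattern with its `qurl` parameter, and the sample citation are all assumptions for illustration; a real deployment should follow the documentation for its own resolver and proxy.

```python
from urllib.parse import quote, urlencode

RESOLVER_BASE = "https://resolver.example.edu/openurl"   # hypothetical resolver
PROXY_PREFIX = "https://proxy.example.edu/login?qurl="   # hypothetical, assumed pattern


def openurl(citation):
    """Build an OpenURL 1.0 (KEV) link for a journal article citation."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
    }
    for key in ("atitle", "jtitle", "issn", "date", "volume", "spage"):
        if citation.get(key):
            params["rft." + key] = citation[key]
    return RESOLVER_BASE + "?" + urlencode(params)


def full_text_link(citation, on_site):
    """Send offsite users through the proxy; on-site users go straight to the resolver."""
    target = openurl(citation)
    return target if on_site else PROXY_PREFIX + quote(target, safe="")


# Invented sample citation.
citation = {"atitle": "Searching at scale", "jtitle": "Example Journal", "date": "2010"}
print(full_text_link(citation, on_site=True))
print(full_text_link(citation, on_site=False))
```

Deferring the proxy step until a full-text link is followed matches the configuration in which users search without authenticating and are prompted only when they try to retrieve an item.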

Figure 2 WorldCat Local advanced search

Returned Results—Brief View

Once a user has conducted the search, returned results occupy most of the screen. A Refine Your Search pane, providing faceted navigation, described below, occupies the left side of the screen. Users can choose to sort results by Library and Relevancy or by Relevancy Only, as well as by author, title, and date. Icons exist for various item types, such as books, audiobooks, articles, and so on. For books (figure 3), initial returned information includes title, authors, language, publisher (name and city), and publication date. If a library has a subscription for Syndetic Solutions enrichment information (see below), book cover images, if a match exists, display within the brief results view. A real-time status call is not made within the brief results view; such a call is made in the detail view, described below. An indication of whether the local library owns the item is displayed within the brief list of results. A View All Editions and Formats link retrieves a listing of all formats for that item, such as different language editions, audiobook editions, and so on. Typical information provided for article content (figure 4) includes article title, author, journal title, publication information (publisher, city, date), and the source database for the content.

Figure 3 WorldCat Local brief results: book

Figure 4 WorldCat Local brief results: article

Faceted Navigation and Search Refinement

Faceted navigation is provided in WorldCat Local through a Refine Your Search pane (figure 5). Facet categories include author, format (book, article, sound recording, etc.), year, content (biography, thesis/dissertation, etc.), audience (juvenile, non-juvenile), language, and topic. The top five choices for each category are displayed, followed by a Show More option to expand the list of choices for that facet category (if more choices exist). Users can make one facet choice per category (e.g., a single year—and not a range—in the Year facet category, though a range of years can be specified within the advanced search mode). As additional choices from different categories are made, a Search Results breadcrumb trail displays at the top, allowing a user to backtrack (and thus expand the results list) to earlier steps of the refinement. For all facet choices, the number of matching items for that choice is displayed in parentheses.

Returned Results—Detail View

Clicking on a title or cover image invokes the detail view of a record in the same browser window (figure 6). Given that the user is now at the single-record level, the refinement pane disappears. For books, assuming a local library subscription to Syndetic Solutions’ enrichment content exists (and an item match occurs), information such as book cover images, reviews, first chapters, and tables of contents may be displayed within the detailed record view. If a library does not subscribe to an enrichment service, evaluative content is still available within WorldCat Local and included with all subscriptions to the service. This includes over thirty-four million pieces of evaluative content, such as millions of book cover images, tables of contents, summaries, and reviews and ratings, as well as other evaluative data. A variety of additional information is provided via collapsible headings appearing below the basic citation-level information. First is a Find a Copy in the Library heading, which makes a real-time status call and provides location, status, and call number information for a local library copy (if owned) and provides a link to request or hold the item (whose functionality in part relies on local policies and systems in place). Also available is a listing of other libraries that own the item, retrieving matches to nearby libraries possessing the item (and providing a link to request the item via interlibrary loan). A Details heading provides more base-level information about the book, such as material type (e.g., biography, primary school), ISBN, physical size, and so on. Some libraries may opt to include a Buy It heading, which displays the prices and contextual links to online booksellers offering the item. A Reviews heading offers user-contributed reviews and can include reviews from venues such as Amazon, weRead, and Goodreads. A Tags heading provides user-contributed tags, if any have been contributed. A Similar Items heading provides contextually linked related subject headings, as well as publicly viewable user lists (if any) that include the book (described in more detail below).
The detailed information view for an article (figure 7) provides many of the same collapsible information headings as a book, though some of these are not as relevant to items at the article level (such as user reviews and tags). If the library receives the associated journal in print, location information and holdings information are provided. How a link appears for access to the full text varies depending on the source of the article as described in the content section above. In many cases, a heading Find a Copy Online includes a Get It, Check for Electronic Resources, or similarly named link that connects with the library’s link resolver to broker a connection to the full text. Alternatively, in some instances a direct link to the full text may exist in the record. Recently, OCLC introduced a knowledge base, as referenced above, which provides link-resolution functionality built into WorldCat Local. With this functionality, a library can synchronize its holdings information to provide more seamless one-click access to an item.

Figure 5 WorldCat Local facet pane

Figure 6 WorldCat Local details: book

Figure 7 WorldCat Local details: article

Vendor Website

OCLC WorldCat Local www.oclc.org/us/en/worldcatlocal/default.htm

Example Implementations

Lincoln Trails Library System www.lincolntrail.info/linc.html

University of Delaware www.lib.udel.edu

University of Washington www.lib.washington.edu

Willamette University http://library.willamette.edu

Exporting Options, Shopping Carts, RSS Feeds

Figure 8 WorldCat Local export options

Figure 9 WorldCat Local shopping cart

Figure 10 WorldCat Local Cite Export dialog box

WorldCat Local offers a variety of export options (figure 8). A check box is present near each item in the brief results view; users can select particular items or choose Select All. Selected items can be saved to a list (figure 9): a newly defined and named list, an already existing list, a Things I Recommend list, a Things I Own list, or a Things to Check Out list. A user must have a WorldCat user account and be logged in to save items to a list. Users can choose to make lists viewable by anyone or designate a list as private. The list offers a shopping cart view, complete with book cover images (if available), citation information, and various export options. Users can choose to export citations as HTML, rich text, CSV, or the citation tag format RIS, and can choose to export to citation management programs, such as RefWorks or EndNote. Users can choose from five citation styles (APA, Chicago, etc.). Users can choose to print the list or share the list with up to five e-mail addresses. Users can also monitor changes to any public list by subscribing to an RSS feed. Users can add tags and notes to their lists. In addition to marked items, users can choose to save searches. Apart from the shopping cart approach, users have various export options from the detail view of a record. A Cite/Export link allows them to generate and copy a citation in various formats and export citations to various citation management programs (figure 10). Users can also print and e-mail a record from the detail view. Some of this functionality is also available from the brief results view.
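To make the RIS export option concrete, the following is a minimal sketch of how a single citation might be written out in the RIS tagged format, which citation managers such as RefWorks and EndNote can import. The function name `to_ris` and the dictionary keys are assumptions made for illustration; this is not WorldCat Local's own export code, and only a handful of common RIS tags are shown.

```python
def to_ris(citation):
    """Serialize one citation dict into RIS tagged lines.

    The dict keys (type, authors, title, year, isbn) are illustrative
    assumptions, not a WorldCat Local data structure.
    """
    lines = ["TY  - " + citation.get("type", "BOOK")]  # record type comes first
    for author in citation.get("authors", []):
        lines.append("AU  - " + author)                # one AU line per author
    if "title" in citation:
        lines.append("TI  - " + citation["title"])
    if "year" in citation:
        lines.append("PY  - " + str(citation["year"]))
    if "isbn" in citation:
        lines.append("SN  - " + citation["isbn"])
    lines.append("ER  - ")                             # end-of-record marker
    return "\n".join(lines)


sample = {
    "type": "BOOK",
    "authors": ["Vaughan, Jason"],
    "title": "Web Scale Discovery Services",
    "year": 2011,
    "isbn": "9780838958292",
}
ris_text = to_ris(sample)
print(ris_text)
```

Each RIS line is a two-character tag, two spaces, a hyphen, a space, and the value, which is why the format is easy for different citation managers to parse.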

Additional Features

Did You Mean? Spelling Suggestions

WorldCat Local provides Did You Mean? functionality to help address misspelled words, and the contextualized hyperlinked suggestion can be clicked to automatically execute a search for the suggested word.

Social Features

WorldCat Local supports various social features, including the ability for users to add reviews and tags to records. Users can choose to share a hyperlinked search query, detailed item of interest, or created list via various social and bookmarking sites, such as Facebook, Delicious, and over a hundred others. Users can create a public profile within WorldCat, providing a photo and personal details such as favorite websites and interests.

Embedding in Other Resources

WorldCat Local uses permanent URLs accessed via a Permalink icon, so libraries wishing to provide a link for a “canned search” can do so and embed these links where they desire. The WorldCat Local search box can be embedded in external webpages, such as a course management system, a LibGuide, and so on. If desired, libraries can manipulate the HTML and precode search parameters into the search string behind the scenes, allowing them to customize the scope of content searched for an embedded search box.

Searching of Additional Remote Resources

While it is not the focus of this issue of Library Technology Reports, WorldCat Local does offer federated search capabilities. WorldCat Local “quick start” customers have access to add only OCLC databases; WorldCat Local full customers can also add non-OCLC databases. As referenced earlier, a regularly updated list of these databases is available at the OCLC WorldCat Local website, with indications of which resources are available in the central index and which can optionally be configured and included via federated search. Should a library include such additional databases, results from the central index and remotely searched databases are blended together and presented to the user. New database connectors for federated search purposes are released on an ongoing basis by OCLC and are free of charge to customers.
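The idea of a “canned search” with precoded parameters, described under Embedding in Other Resources, can be pictured as ordinary query-string construction. The sketch below is illustrative only: the host `library.example.edu` and the parameter names `q` and `scope` are hypothetical placeholders, not documented WorldCat Local parameters, and a library would substitute whatever its own Permalink URLs actually contain.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://library.example.edu/search"


def canned_search_url(query: str, scope: Optional[str] = None) -> str:
    """Build a link that runs a predefined search when clicked.

    `q` and `scope` are assumed parameter names, not a vendor API.
    """
    params = {"q": query}
    if scope:
        params["scope"] = scope  # precoded limit, hidden from the user
    return BASE_URL + "?" + urlencode(params)


print(canned_search_url("web scale discovery"))
print(canned_search_url("nursing ethics", scope="course-reserves"))
```

A link built this way can be pasted into a course page or guide, so that users land on results already limited to the content the library chose.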


Figure 11 WorldCat Local Google integration

Google Books Integration

OCLC and Google in 2008 entered into an agreement that, through mutual data exchanges, helps expose WorldCat records within a Google Book search, and expose Google digitized books within WorldCat. In a Google Books search, when viewing a record (figure 11), the user is presented with various buying options (Amazon, etc.) as well as a Find in a Library link. Clicking on this link connects to the WorldCat.org catalog and executes the appropriate search, presenting the record and additional content, including libraries nearby that hold the item. In addition, a Google Book search API is able to facilitate a connection to books scanned through the Google Books digitization project via the Get It link within the WorldCat detail record view. At the time of this writing, over three million Google books have been loaded into the WorldCat catalog.

Statistics

WorldCat Local uses the Adobe SiteCatalyst (powered by Omniture) analytics tool to provide various statistics. Among other things, statistics available include number of searches conducted, search terms used, unique user counts, number of search results, zero-hit searches, and viewed items. Also provided is information on which facet choices were used for refinement purposes, as well as information on searches using the advanced search mode. Information on usage of library fulfillment options, such as holds placed and ILL requests, is also available. Statistics are available by customizable date range and include graphing and export options.

Mobile Interface

At the time of this writing, a mobile website version of WorldCat Local is under development (figure 12) and currently exists in beta status, with a production version likely to be released in 2011. It is expected that the production version will support searching as found on full WorldCat Local sites, as well as configurable links, library-specific details, and library-specific landing pages optimized for the mobile environment. The beta optimized mobile version of the WorldCat.org catalog can be accessed on the WorldCat website. The mobile version allows users to search for materials and displays nearby libraries holding the items, utilizing the location services provided by many modern smartphones and their associated operating systems. The mobile version provides book cover images (if available), citation information, and the ability to e-mail citations. In addition, several dedicated third-party iOS-based applications exist. Such applications display information provided in part by WorldCat, and currently include Book Bazaar, CampusBooks, pic2shop, and RedLaser. Increasingly, dedicated apps are also available for the Android operating system.

Figure 12 WorldCat Local mobile version

WorldCat Mobile Web Beta www.worldcat.org/m

Upcoming Directions

OCLC has provided some directions the company is currently investigating for possible future WorldCat Local enhancements. OCLC continues to work with publishers and other data providers to surface electronic resources in WorldCat Local in order to represent the subscriptions and collections of OCLC member libraries. OCLC intends to continue evaluating usability test results to help inform improvements to the user interface, which will likely include additional deduplication of records and the ability for the user to limit items to full-text resources only and to peer-reviewed items only. The company seeks to evaluate and potentially streamline the number of user clicks to full resolution of an item (such as to the full text) through leveraging the WorldCat knowledge base and incorporating real-time status checks for ILS catalog materials into the brief results view. Local customers may be empowered to add additional local data to WorldCat records (such as local subject headings, notes, etc.), with such data available for search and display. Local holdings information for all formats (including such things as volume, issue, call number, and location information) would be open to search and display. OCLC intends to maintain a focus on internationalization, localization, and translation efforts aimed toward optimizing the experience for non-English-speaking users. Additional local library flexibility with the user interface (such as providing library-specified modules or widgets in the display) is also being investigated. The development of a production mobile version of OCLC WorldCat Local, as referenced above, is well underway.

Vendor Perspective: WorldCat

WorldCat Local is the discovery and delivery service that offers access to more than 475 million items from a user’s own library and the collections of OCLC member libraries around the world through a single search box. One search provides instant access to all library materials—digital objects, electronic materials, databases, eJournals, music, videos, audio, eBooks, maps, journals, theses and books—in addition to materials in group and consortial catalogs and thousands of WorldCat libraries worldwide. Because WorldCat Local provides access to libraries’ collections through a single search box, users no longer have to consult a variety of separate resources and interfaces. OCLC partners with organizations like Google Books, the HathiTrust, JSTOR and OAIster to provide every WorldCat Local search with deep and useful results from an extraordinary range of collections. OCLC also works with major publishers and content partners from around the world to allow WorldCat Local libraries to provide access to: major aggregators of eBooks, including NetLibrary, Ebrary, Overdrive and MyiLibrary; large mass digitization collections, including Google Books and HathiTrust; content from publishers such as Springer, Wiley, Elsevier, Taylor & Francis, Oxford University Press and more. Results sets from a WorldCat Local search are presented with a user’s library materials first, then group and consortial results, and finally items throughout the world represented in WorldCat. Nowhere else will users find so much authoritative content in one place. With WorldCat Local, users are presented with only the most appropriate fulfillment options, quickly connecting them with the items they need. The service integrates with existing circulation, resource sharing, and resolution options for an intuitive user experience. And because WorldCat Local integrates with live circulation data, users know immediately whether (and where) an item is available. One click lets an authorized user view an electronic copy, place a hold or make a resource sharing request. Library staff will appreciate benefits related to centralized access, too. When all the library’s collections are represented in the WorldCat database, less time is required to maintain data in multiple locations and systems. No separate data loads are required for libraries that contribute and maintain their holdings in WorldCat. WorldCat Local builds on the processes already in place at the library. OCLC’s unique position as a worldwide library cooperative allows every member to contribute to and benefit from the combined purchasing and licensing power of the membership as a whole. Working together, OCLC members are able to provide better service for library users everywhere. WorldCat Local’s social networking and workflow tools also allow people to explore information together, sharing opinions and expertise with peers while creating a greater connection to the library. Faculty can organize materials for class requirements with WorldCat Lists, while students can use them to keep track of what they need to borrow for their research. User reviews, recommendations, tags and personal profiles let people customize the discovery experience and interact even further. WorldCat Local brings people and content together in more ways than ever, helping users and groups incorporate library resources into their everyday learning activities.

Notes

1. Jennifer Ward, Pam Mofjeld, and Steve Shadle, “WorldCat Local at the University of Washington Libraries,” Library Technology Reports 44, no. 6 (Aug./Sept. 2008).
2. OCLC, “OCLC and H.W. Wilson to Transition Database Subscriptions from FirstSearch to WilsonWeb,” news release, March 17, 2010, www.oclc.org/news/releases/2010/201016.htm.

Web Scale Discovery, continued from page 11

Notes

1. Marcia J. Bates, Improving User Access to Library Catalog and Portal Information, final report, version 3 (Washington, DC: Library of Congress, 2003), 4, www.loc.gov/catdir/bibcontrol/2.3BatesReport6-03.doc.pdf.
2. Roger C. Schonfeld and Ross Housewright, Faculty Survey 2009: Key Strategic Insights for Libraries, Publishers, and Societies (New York: Ithaka S+R, 2010), 4, www.ithaka.org/ithaka-s-r/research/faculty-surveys-2000–2009/Faculty%20Study%202009.pdf.
3. OCLC, Online Catalogs: What Users and Librarians Want (Dublin, OH: OCLC, 2009), 20, www.oclc.org/reports/onlinecatalogs/fullreport.pdf.
4. Ibid., vi, 14.
5. Karen Calhoun, The Changing Nature of the Catalog and Its Integration with Other Discovery Tools: Final Report (Washington, DC: Library of Congress, 2006), 35, www.loc.gov/catdir/calhoun-report-final.pdf.
6. Bibliographic Services Task Force, Rethinking How We Provide Bibliographic Services for the University of California: Final Report (University of California Libraries, 2005), 2, http://libraries.universityofcalifornia.edu/sopag/BSTF/Final.pdf.
7. OCLC, College Students’ Perceptions of Libraries and Information Resources (Dublin, OH: OCLC, 2006), pt. 1, p. 4, www.oclc.org/reports/pdfs/studentperceptions.pdf.
8. Doug Way, “The Impact of Web-Scale Discovery on the Use of a Library Collection,” Serials Review 36, no. 4 (December 2010): 214–220.
9. Bill Kelm, “WorldCat Local Effects at Willamette University,” July 21, 2010, presentation on Prezi, http://prezi.com/u84pzunpb0fa/worldcat-local-effects-at-wu.
10. Jennifer Ward, Pam Mofjeld, and Steve Shadle, “WorldCat Local at the University of Washington Libraries,” Library Technology Reports 44, no. 6 (Aug./Sept. 2008).
11. Stan Sorensen, “Serials Solutions Responds to Letter by EBSCO’s Tim Collins,” letter to the editor, Charleston Advisor (July 2010), www.charlestonco.com/index.php?do=Letters+to+the+Editor&pg=let_details&let_id=174; Nancy Dushkin, “Ex Libris Primo Responds to Interview by Jane Burke,” letter to the editor, Charleston Advisor (July 2010), www.charlestonco.com/index.php?do=Letters+to+the+Editor&pg=let_details&let_id=173; Tim Collins, “EBSCO Responds to Jane Burke Interview,” letter to the editor, Charleston Advisor (July 2010), www.charlestonco.com/index.php?do=Letters+to+the+Editor&pg=let_details&let_id=172.
12. Nancy Dushkin, “Ex Libris Responds to Interview by Jane Burke,” letter to the editor, Charleston Advisor (July 2010), http://charlestonco.com/index.php?do=Letters+to+the+Editor&pg=let_details&let_id=173.


Chapter 3

Serials Solutions Summon

Abstract With a general release in mid-2009, Serials Solutions Summon is another early contender in the Web scale discovery space for libraries. Serials Solutions Summon was built from the ground up as a library Web scale discovery service. This chapter provides a brief history and overview of the Summon service, describes the local and remote content associated with the Summon index, and highlights some of the features, functionality, and flexibility associated with the Summon interface.


Overview


Serials Solutions began dedicated development of its Web scale discovery solution, Summon, in 2008, building the product from scratch as a new platform. Public announcement occurred in January 2009, and after work with development partners, Summon entered general release in July 2009, making it one of the early entrants into the library Web scale discovery environment. At the time of this writing, Summon has over 120 committed customers in eighteen countries; 80 of these sites are currently live. Summon’s development focus was academic customers, and such customers make up the lion’s share of current sites. That said, the Summon discovery service is also in use at three public libraries, as well as at a statewide library system, of which hundreds of public libraries are members. Summon is offered as a hosted software-as-a-service solution providing the Summon service and index. Annual subscription pricing relies primarily on the institution’s FTE count, but also considers other factors, such as the degree-granting status for university customers. The pricing for Summon is not impacted by the number of items included from a library’s local collections. Discounts are available for multiyear and consortial subscriptions. The annual subscription fee is inclusive and covers items such as ongoing support, inclusion of local content, access to developed APIs, and application enhancements. Serials Solutions provides updates and enhancements approximately every three to four weeks, and, because the service uses a hosted model, these updates are provided quickly to its customers. Serials Solutions support is available 24/7, and a variety of communication options are provided. Serials Solutions indicates that new customers can typically have their Summon instance live within six weeks from the start of implementation.

Content and Scope

Publisher Content

Summon currently has a very large centralized index, providing access to content sourced from a multitude of commercial databases and publishers. This material includes content from 94,000+ journals and 6,800 publishers. As of August 2010, the Summon index numbers over half a billion items. By item count, the two largest content types are newspaper articles and journal articles, though various other content types, such as books, theses and dissertations, conference proceedings, music scores, and audiovisual materials are also present. A regularly updated list of participating publishers and journal titles indexed can be accessed at the Serials Solutions website. Agreements have been made with many major content providers and aggregators; chief providers participating in Summon include ProQuest, LexisNexis Academic, and Gale (which include around 4,000 publishers).


Nearly 100 academic publishers are involved, including Springer, IEEE, Emerald, ingentaconnect, Sage, and Taylor and Francis. Additional key players include Thomson Reuters Web of Science and ABC-CLIO. In addition to licensed commercial content, the Summon service also indexes several open-access repositories, such as the DOAJ (Directory of Open Access Journals), Hindawi Publishing, arXiv.org e-Prints, and the HathiTrust materials. Serials Solutions notes that over 10 percent of members of the Association of Research Libraries use the Summon discovery service and that the Summon index covers between 85 and 95 percent of the breadth of their collections. At the time of this writing, Serials Solutions is working with Elsevier on a trial related to incorporating Elsevier’s direct content into the index. As noted in chapter 1, there are often multiple avenues to particular content for discovery services; for example, regardless of the above-mentioned trial, Serials Solutions notes that a large amount of Elsevier content is already present within the Summon index, such as 100 percent of the ScienceDirect Freedom Collection and approximately 90 percent of Scopus.

Summon Content & Coverage page www.serialssolutions.com/summon-content-and-coverage

Figure 13 Summon single search box

Serials Solutions seeks rights to index the full text from the content providers it works with and indicates that it indexes the full text of the vast majority of content providers. In addition, the Summon service indexes and utilizes fielded metadata provided by publishers and aggregators. Serials Solutions utilizes automated processes that allow new content to be added and indexed quickly. Different content providers provide new content on a variable basis, and content is indexed and included in the Summon service on a schedule appropriate to the content, which, for example, may be daily for newspaper content and monthly for a monthly journal. As with other vendors, the central index continues to grow as additional content agreements are pursued and finalized; for example, between July and December 2009, the number of indexed journals doubled. The Summon index is open to search and does not require initial user authentication. In a usual configuration, Summon works with the library’s link resolver to broker access to full-text content owned or licensed by the library, and works with the library’s proxy server or alternate authentication method to enable access to licensed content by offsite users. For library customers that subscribe to other Serials Solutions products, such as 360 MARC Updates and 360 Core services, these products contain holdings information to help inform rights management.

Local Resources

The Summon service utilizes a single unified index composed of both publisher and aggregator content and content sourced from the local library’s collections. Summon is able to harvest records from all major ILS platforms, digital collection management systems, and institutional repositories, based on typical schema such as MARC, Dublin Core, XML, and EAD. Summon can accommodate “home-grown” or nonstandard local databases as well, provided the library can export the records. Summon supports harvesting and delivery methods such as OAI-PMH and FTP. All Summon customers search across this unified index. Content is scoped and informed by local holdings information. By default, Summon displays only search results for content accessible by that library, whether it is content sourced from publishers and aggregators or content harvested from the local library. Should users click on the option Add Results Beyond Your Library’s Collection (described below), they are able to expand their search to the full Summon index (with the exception of other libraries’ catalog records harvested into the index). Libraries can selectively configure whether or not a search by their users will include other libraries’ digital collections and institutional repository materials. If not, users choosing Add Results Beyond Your Library’s Collection can search such content. Note that, should a library wish, it does have the option of keeping its own digital collections and institutional repository content private so that the content is not searchable by other Summon customers. At the time of this writing, content from over fifty digital collections and institutional repositories is included and discoverable in all Summon sites through the expanded Add Results Beyond Your Library’s Collection choice. For local harvested ILS records, data is updated nightly. Information from other local repositories, such as digital collections data, is updated on a schedule determined by the library working with Serials Solutions; such updates can be handled through an automatic update schedule and can happen as frequently as daily.

Relevancy

By default, a Summon search is a keyword search conducted across both metadata and full text, with items returned ranked by relevancy. Summon uses a proprietary relevancy algorithm, with different weights assigned to various metadata fields. Relevancy determination for indexed full text includes parameters such as term proximity and frequency. Different parameters may apply to different content types. For example, relevancy calculations for a journal article include whether or not it appears in a peer-reviewed journal and the number of times the article has been cited. Currency is a factor in relevancy determination for almost all content types included in the Summon index. As with other vendors, the relevancy algorithm is continuously tuned. As content for a single unique item can be sourced from multiple content providers (e.g., metadata from database records, journal article content from publishers, etc.), Summon creates a merged record accommodating what it feels is the strongest metadata from each provider. Regardless of whether content for a particular item comes from one provider or multiple providers, Serials Solutions looks to correct errors and normalize the data. In addition, further enrichment information is provided, informed by Ulrich’s peer-review information and Serials Solutions’ journal authority information. Merged records help with the deduplication of unique items, and Serials Solutions indicates that it works to continually enhance and refine its deduplication routines.
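As a rough illustration of the merged-record idea described above, the sketch below groups records that describe the same item and, field by field, keeps the value from whichever provider is ranked as most trusted for that field. This is not Serials Solutions’ actual routine, which is proprietary. The shared `id` key, the provider labels, and the `FIELD_PRIORITY` table are all assumptions invented for the example.

```python
from collections import defaultdict

# Hypothetical trust ranking per field: a lower number means "preferred".
FIELD_PRIORITY = {
    "title": {"publisher": 0, "aggregator": 1},
    "abstract": {"aggregator": 0, "publisher": 1},
}


def merge_records(records):
    """Collapse records sharing an id into one merged record per item."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["id"]].append(rec)  # assumes a reliable shared identifier

    merged = []
    for item_id, group in groups.items():
        out = {"id": item_id}
        fields = {f for rec in group for f in rec if f not in ("id", "provider")}
        for field in sorted(fields):
            ranking = FIELD_PRIORITY.get(field, {})
            candidates = [r for r in group if r.get(field)]  # skip empty values
            if candidates:
                best = min(candidates, key=lambda r: ranking.get(r["provider"], 99))
                out[field] = best[field]
        merged.append(out)
    return merged


records = [
    {"id": "item-1", "provider": "aggregator", "title": "Study, A", "abstract": "Short summary."},
    {"id": "item-1", "provider": "publisher", "title": "A Study", "abstract": ""},
]
merged = merge_records(records)
print(merged)
```

Because both source records collapse into a single output record, the same step that picks the strongest metadata also removes the duplicate from the result list.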

Interface Features: Overview, Results, and Navigation


General


There is no universal description for the Summon interface, as it is quite flexible, giving customers much latitude with customization and design choices. Summon is built upon a Web-based open API, allowing broad flexibility. At one extreme, customers may use the basic out-of-the-box template; at the other, they may tap the open API, allowing library staff to design an interface from scratch while using the Summon service and central index behind the scenes. In this latter scenario, data can be pulled from the Summon service and presented in a custom-designed locally hosted interface or within a competing interface produced by another web scale discovery service or vendor. For the out-ofthe-box interface, which will form the basis for the rest of this chapter, library customers have some latitude in customization. For example, libraries can add a hyperlinked library logo. A new enhancement, the Summon Customizer, provides additional flexibility, allowing libraries to integrate custom HTML (including library style sheets) into the header and footer sections of the interface. Local language support is provided for over a dozen languages, including English and a wealth of additional Western European languages, Japanese, and simplified and traditional Chinese. By default, Summon offers a single search box (figure 13). Once a search is conducted, the full interface Web Scale Discovery Services  Jason Vaughan

Figure 14 Summon advanced search

is invoked, which includes an advanced search option (figure 14). Several of the advanced search options, many keyed to fielded data, are also offered via faceted navigation, described shortly. Users can choose to show only items with full text online and to exclude newspaper articles through both the advanced search interface and facets in the standard interface. Invoking the advanced search option pushes the rest of the interface toward the bottom of the screen, so initial results are still viewable with the advanced search box displayed at the top of the screen. The advanced search box includes free-text boxes for terms, author (Written/Created By), words in title, and publication title, and boxes for volume and issue. Users can input a From/To publication date range. Should a library wish, the advanced search link could be offered in concert with the single search box at the outset, prior to the user conducting the first search. Returned Results—Brief View Once a search is started, users will spend most of their time in the returned results interface for refinements and subsequent searches. Most of the screen real estate is dedicated to a user’s returned results; a left-side refinement pane is described below. By default, results are sorted by relevancy; additional sort options can be chosen via pull-down menu and include relevance, date newest, and date oldest. Each item type, such as journal articles, newspapers, and so on, has a unique icon, along with a Full Text sunburst graphic for items whose full text is available online. For books (figure 15), typical information provided

Chapter x

Figure 16 Summon brief view: article with callout

Figure 15 Summon brief result: book with callout

Faceted Navigation and Search Refinement A refinement pane occupying the left side of the interface allows a user to drill down to more specific

Web Scale Discovery Services  Jason Vaughan

Library Technology Reports  alatechsource.org  January 2011

includes author, ISBN, publication date, length in pages, and subject terms. For books physically held at the library, call number information is provided, as is real-time status availability and library locations. Depending on image availability, book cover images are provided. Summon works with various enrichment services providing book covers, and as of mid-2010, the enriched content is based on whatever subscription the individual library customer has with a provider. If no subscription is present or if an image is otherwise not available, a book icon is presented for that material type. For journal, trade publication, and newspaper articles (figure 16), typical brief record information includes author, publication title, ISSN, volume and issue information, and publication date. Records may also include subject terms, article start page, and a few lines of the full text. Results from various item types appear basically the same, as the underlying content from these sources is mapped to a universal Summon schema. With all returned results regardless of format, users can place the cursor over the item title, invoking a Preview callout with more item information. Depending on resource type, more information could include an abstract, language, start page, and genre (e.g., interview, news).

results (figure 17). At the top of the refinement pane, options are presented for limiting to full text online only and to articles from scholarly publications (informed by information from Ulrich’s). There’s also an option to exclude newspaper articles and the option Add Results Beyond Your Library’s Collection. As described above, clicking this check box will enable the user to search the entire Summon index and not just materials owned and licensed by the local library. This search doesn’t include other libraries’ bibliographic ILS records but can include their digital collections. As noted in chapter 1, the Summon index used by all Summon customers is completely open for search; that said, some resources require a local library subscription even to see the basic metadata— abstract and index information (such as MLA-sourced content). Faceted navigation is provided through the refinement pane. Facet choices are organized under various categories; which categories may appear is dynamic to the search results. The Content Type facet category includes choices such as newspaper articles, journal articles, theses/dissertations, books, book reviews, trade publication articles, conference proceedings, and e-books. Other possible content type choices, depending on the search, include special collections, video recordings, and maps. Other usual facet categories include Subject Terms, Library Location, Author, Language, Genre, Region, and Time Period. The Publication Date facet includes a graphical slider with a bar chart, allowing the user to choose publication date boundaries; alternatively, users may type in start and end dates for publication ranges. Next to each facet choice, the number of items matching that facet choice for the particular search terms is shown in parentheses. The most popular facet choices are shown for each category; in each category, a More Options link invokes a menu callout with additional


Figure 18 Summon Include/Exclude facet choices

Library Technology Reports  alatechsource.org  January 2011

facet choices, allowing a user to include or exclude particular choices within that category (figure 18). Facet choices are presented with check boxes, so users can include or exclude multiple refinement choices within the same category. Serials Solutions is investigating customer customization of facets as a potential future enhancement. Present under the search box are radio buttons for Keep Search Refinements and New Search, allowing the user to maintain the refinements already selected for subsequent searches or clear them.
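The facet counts and include/exclude choices described above can be illustrated with a minimal sketch. It is not Summon's implementation; the record fields, sample data, and function names are hypothetical and chosen only to show how per-choice counts and multi-choice refinement within one category might work.

```python
from collections import Counter

# Hypothetical result records; field names are illustrative, not Summon's schema.
RESULTS = [
    {"title": "Wind Tunnels", "content_type": "Book", "language": "English"},
    {"title": "Lift and Drag", "content_type": "Journal Article", "language": "English"},
    {"title": "Aerodynamik", "content_type": "Journal Article", "language": "German"},
    {"title": "Flight Review", "content_type": "Newspaper Article", "language": "English"},
]


def facet_counts(records, category):
    """Count how many records carry each value of one facet category."""
    return Counter(r[category] for r in records if category in r)


def refine(records, category, include=(), exclude=()):
    """Keep records whose value is among the included choices (when any
    are given) and is not among the excluded choices."""
    kept = []
    for record in records:
        value = record.get(category)
        if include and value not in include:
            continue
        if value in exclude:
            continue
        kept.append(record)
    return kept


# Counts shown next to each Content Type choice for this result set.
counts = facet_counts(RESULTS, "content_type")

# Equivalent of ticking "exclude" for newspaper articles.
no_news = refine(RESULTS, "content_type", exclude={"Newspaper Article"})
```

Because `include` and `exclude` accept sets, several choices within the same category can be combined in one call, which mirrors the check-box behavior described in the text.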


Returned Results—Detail View

More details for a record in Summon can be retrieved, in part, by mousing over the item title, invoking a callout that provides more record information, as described above. Clicking (as opposed to mousing over) any returned item title opens a new tab or window, depending on how the user’s browser is configured. Clicking on a physical item title, such as a hard-copy book, will open up a window or tab for the library’s underlying traditional OPAC (or newer discovery layer) and provide full additional information on the item through the catalog’s native interface. Clicking on an online electronic resource will invoke the institution’s link resolver and, depending on the link resolver configuration, either carry the user straight to the full text or present a list of content providers from which to choose.

Exporting Options, Shopping Carts, RSS Feeds

Figure 17 Summon refinement pane

Summon offers a variety of export options. Users can mark items of interest and see all consolidated items in a shopping cart list (figure 19). Export options include sending to a printer, sending as e-mail, or exporting into a citation management program. Supported citation manager programs include RefWorks, EndNote, and BibTeX. RSS feeds can be set up for the results of a

Web Scale Discovery Services  Jason Vaughan


search. Individual user accounts are not associated with Summon, so users cannot log in, save a results list, and access it later; rather, export methods are session-based.
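As an illustration of the citation export described above, the following sketch formats marked records as BibTeX entries. It is an assumption-laden example, not Summon's export code: the record fields, the citation keys, and the `to_bibtex` helper are all hypothetical, and only the `@article` entry type is handled.

```python
def to_bibtex(record, key):
    """Format one article record as a BibTeX @article entry.

    Only fields present in the record are written; the field list is an
    illustrative subset of what BibTeX accepts.
    """
    fields = ["author", "title", "journal", "year", "volume", "pages"]
    lines = [f"@article{{{key},"]
    for name in fields:
        if record.get(name):
            lines.append(f"  {name} = {{{record[name]}}},")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical session-based list of marked items (the "shopping cart").
marked = [
    {"author": "Doe, Jane", "title": "Discovery in Libraries",
     "journal": "Example Journal", "year": "2010"},
    {"author": "Roe, Sam", "title": "Unified Indexes", "year": "2009"},
]

# One export file containing every marked item.
export = "\n\n".join(
    to_bibtex(record, f"item{i}") for i, record in enumerate(marked, start=1)
)
```

Since no user account exists, a list like `marked` would live only for the session; exporting it is the way to keep the results.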

Additional Features

Did You Mean? Spelling Suggestions

Summon offers Did You Mean? functionality to address misspelled words; clicking on a suggestion automatically reruns the search. For searches where no results occur, a separate list of suggestions will appear in addition to the Did You Mean? suggestion. In addition to generic suggestions such as Try Different Keywords and Try Fewer Keywords, one suggestion includes a link that, when selected, will activate the option to include results beyond the library’s collection and automatically rerun the search.

Embedding in Other Online Venues

Summon uses persistent URLs, so libraries wishing to provide a link for a “canned search” can do so and embed these links where they desire. The Summon

search box can be embedded in external webpages, such as a course management system, a LibGuide, and so on; should the library choose, such search boxes can be prescoped to particular facets or limits appropriate to, for example, a particular LibGuide subject guide. A search-box-creation widget works in conjunction with the Summon administration module to help libraries create different Summon search boxes.

Searching of Additional Remote Resources

Serials Solutions offers a federated search product, known as the 360 Search service, which combines features of the earlier 360 Search and the WebFeat federated search products. While it would be technically possible to create a widget and use the associated 360 Search API to pass on a query entered into Summon to a 360 Search, no current Summon customers have looked to integrate traditional federated search components with the Summon service. Summon is strongly focused on providing all content through the preaggregated, unified single index. Traditional federated search products are not the focus of this issue of Library Technology Reports.
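The canned-search links and prescoped search boxes described under Embedding in Other Online Venues amount to a persistent URL that carries the query and any preset facets. The sketch below shows the general idea only. The base URL and the parameter names `q` and `facet` are hypothetical placeholders, not Summon's actual URL syntax, which should be taken from the vendor's documentation.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical discovery service address; not a real Summon URL.
BASE_URL = "https://discovery.example.edu/search"


def canned_search_url(query, facets=None):
    """Build a persistent search link with optional preset facet limits.

    `facets` maps a facet category to the single choice to prescope on.
    """
    params = [("q", query)]
    for category, value in (facets or {}).items():
        params.append(("facet", f"{category}:{value}"))
    return f"{BASE_URL}?{urlencode(params)}"


# A link suitable for embedding in a subject guide, prescoped to articles.
url = canned_search_url("climate change", {"ContentType": "Journal Article"})

# Decoding the link recovers the query and the preset facet.
decoded = parse_qs(urlparse(url).query)
```

A search box on a subject guide would submit to the same kind of URL, with the facet values fixed in hidden form fields and only the query supplied by the user.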

Vendor Perspective: Summon

Serials Solutions conducted a variety of in-depth studies into the research habits of today’s researchers and students. Among the revelations was that the library is widely acknowledged to have superior content, but getting to what the researcher wants is daunting. In fact, we discovered the most confusing part of the research process was the lack of a clear starting point for searching library collections. Because of this, students and researchers were abandoning the library and its valuable resources. We envisioned leveraging technology to bring students and other researchers back to the library by creating a superior search and discovery experience . . . one that mimicked the simplicity and familiarity of open web searching. Meeting this audacious goal required building from the ground up rather than repurposing existing search structures, and Serials Solutions committed to this. During initial planning for Summon, we quickly realized that the only way to provide this superior search and discovery experience was to create a single unified index of the library content, and this is at the core of the Summon service. The Summon index unifies the full breadth of the library content including library catalog records, institutional repository items, subscribed electronic content such as journal articles, ebooks, newspapers, dissertations, and more. To date, the index has over 6,800 publishers participating that provide more than 94,000 journals. We built all of this into the single unified index because it is the only way to provide the search and discovery experience that researchers and students require—subsecond response time, relevancy ranked results, unbiased content (i.e., it does not matter who the content provider is—all content is treated the same). The result is that the Summon service does not rely upon federated search to augment the results.

We are continually developing and advancing the Summon service by adding content, by adding features to make it an even better search and discovery platform for library researchers, and by adding elements to make it even easier to use for librarians. For example, we recently added deep integration with Web of Science to include citation counts both as a part of relevancy calculations (articles with lots of citations get “bumped”) and for display and navigation. We also added a metadata mapping tool so librarians can map their MARC and Dublin Core metadata to the Summon schema in a way that maximizes the way it is searched and displayed in the Summon service. And the results show us that the Summon service is filling a need: we recently passed our 100th customer mark in just over one year. This is truly a testament to how the Summon service is reaching the audacious goal that we set for ourselves and is helping libraries around the world stay relevant by providing a place for researchers to search and discover valuable library content.


Figure 19 Summon shopping cart


Database Recommender


A Database Recommender feature (figure 20) was unveiled in early 2010. Tuned to and depending on a user’s particular search, this feature recommends databases subscribed to by the library. This feature works in conjunction with trigger words entered in a Summon search and also applies intelligent reading of the content provided in the results set. Serials Solutions notes that this feature “showcases sources that don’t lend themselves to be indexed by any service—such as dynamic or statistical databases—but make the library so well fitted to its academic community.”1 Database recommendations are provided along with the other retrieved results; clicking on the database title will take users to the database, where they can conduct a search on their topic. Database recommendations are not returned for all searches; for example, a search on the broad topic engineering may return the IEEE Electronic Library Online database as a recommendation. A search on aerodynamics may not provide a database recommendation. This service holds potential to connect users to underutilized or underexposed resources. Particular local collections, such as a library’s digital collection, can also be spotlighted to appear in an initial set of results for particular searches.

Statistics

Various statistics are available with Summon through its statistical reporter, known as Summon Analytics (an interface familiar to existing users of Google Analytics). Report examples include, but are not limited to, the number of session visits and searches conducted; average searches per session visit; queries (all queries, both popular ones and those returning fewer results); IP address and geolocation reports, providing insights into where usage originates from; and browsers and platforms used to access the service. In addition, facet reports are available, showing how often

Figure 20 Summon Database Recommender

refinement limiters and facets (and choices within facet categories, e.g., content types) are being used. Reports are customizable by date range, and graphing options visually chart much of the data. Reports can be downloaded into various external programs (such as Excel) for additional analysis. Serials Solutions offers an optional, additional product, 360 Counter, which provides additional information of a library’s e-resource usage.

Figure 21 Summon mobile view

Mobile Interface

A mobile, Web-based interface (figure 21) exists for Summon and is compatible with browsers present on various smartphone platforms. Results are formatted to fit the device interface, and a link is present should the user wish to access the full regular Web interface. Users can refine results, mark items of interest, and access this list in a shopping cart fashion. Users can e-mail a list of records to their e-mail account. Initial real-time status checks do not occur in the mobile interface, though users can click on an Availability link to gather this information, at which point they leave the Summon interface and are taken to the underlying catalog.

Upcoming Directions

As noted in the overview to this chapter, Serials Solutions provides updates and enhancements approximately every three to four weeks. When asked about upcoming directions, the vendor highlighted some recently released features. At the time of this writing,


Vendor Website

Serials Solutions Summon

www.serialssolutions.com/summon

Example Implementations

Dartmouth College

www.dartmouth.edu/~library/home/find/summon

Drexel University

www.library.drexel.edu

University of Calgary

http://library.ucalgary.ca

Western Michigan University

http://wmich.summon.serialssolutions.com

major enhancements since ALA Annual Meeting 2010 include new facets (contextual catalog-focused facets and discipline-specific facets) and a Summon Customizer (a Web-based administrative console). Through the Summon Customizer, libraries can integrate custom HTML (including library style sheets) into the header

and footer sections of the interface and incorporate widgets. Any third-party widget can be integrated into the interface, and given the open API-based nature of the Summon platform, broad latitude exists with the integration of widgets into custom-designed interfaces. Serials Solutions recently introduced Web of Science citation counts integrated on the Summon screen (for libraries subscribing to Web of Science); citation counts also factor into relevancy, with a high citation count boosting relevance. A metadata mapping tool allowing direct MARC and Dublin Core record mapping was also released, helping libraries with custom mapping of local collections, allowing staff to make and preview changes while refining the mapping. Serials Solutions has investigated including LibraryThing content (such as tags and reviews) into appropriate content (primarily books) from the Summon index; such tags and reviews may be included in a future Summon release.
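The metadata mapping tool mentioned above maps local MARC and Dublin Core fields to the Summon schema and lets staff preview the result. A minimal sketch of that kind of crosswalk follows. The target field names, the crosswalk table, and the `map_record` helper are hypothetical; they do not reproduce the actual Summon schema or the tool's behavior.

```python
# Hypothetical crosswalk from Dublin Core elements to a unified schema.
# The right-hand names are illustrative, not real Summon field names.
DC_TO_UNIFIED = {
    "dc:title": "Title",
    "dc:creator": "Author",
    "dc:subject": "SubjectTerms",
    "dc:date": "PublicationDate",
    "dc:type": "ContentType",
}


def map_record(source, crosswalk):
    """Apply a crosswalk to one record.

    Returns the mapped record plus the list of source fields that had no
    mapping, so staff can preview what would be dropped.
    """
    mapped, unmapped = {}, []
    for field, values in source.items():
        target = crosswalk.get(field)
        if target is None:
            unmapped.append(field)
            continue
        mapped.setdefault(target, []).extend(values)
    return mapped, unmapped


# A local digital collection item with one field the crosswalk does not cover.
photo = {
    "dc:title": ["Main Street, 1950"],
    "dc:creator": ["Unknown photographer"],
    "dc:subject": ["Streets", "Nevada"],
    "dc:rights": ["Public domain"],
}

preview, dropped = map_record(photo, DC_TO_UNIFIED)
```

Returning the unmapped fields alongside the mapped record is one way to support the make-and-preview cycle the text describes, since staff can adjust the crosswalk and rerun it until nothing important is dropped.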

Note

1. Serials Solutions, “Summon Service Debuts Database Recommender,” news release, March 17, 2010, www.serialssolutions.com/news-detail/summon-service-debuts-database-recommender.


Chapter 4

Ebsco Discovery Services

Abstract


Ebsco Discovery Services (EDS) represents the first of the trio of web scale discovery services debuting in 2010. Built off the established EBSCOhost platform, EDS extends the platform into the web scale discovery space through a preharvested, centralized index encompassing content sourced from Ebsco databases and beyond. This chapter provides a brief history, overview, and a few insights into the future development path of EDS, describes the local and remote content associated with the EDS index, and highlights some of the features, functionality, and flexibility associated with the EDS interface.


Overview

Ebsco began development of Ebsco Discovery Service (EDS) in 2008. Public announcement occurred in spring 2009, and after a beta period concluding later that year, public release occurred in early 2010. At the time of this writing, late summer 2010, approximately twenty-five customers have gone live with an EDS implementation. Ebsco indicates that new customers can generally be set up and ready to go live within eight to ten weeks. EDS is based in large part on the infrastructure and interface associated with the popular EBSCOhost platform, which debuted around 1994. Early EDS development partners were generally academic customers, though other library types, including at least one public library, have more recently begun trials of the service. EDS is offered as a hosted platform; no local installation options are available. When initially released, EDS required user authentication prior to conducting a search. In mid-2010, Ebsco released a Guest Mode option providing

unauthenticated users with some limited search capabilities. The annual subscription pricing model relies primarily on the institution’s full-time equivalent (FTE) count and level of service desired. Level of service can include factors such as the number and types of local library resources harvested and indexed (such as local digital collections and institutional repositories). Multiyear and consortial discounts are available. Ebsco provides underlying application or interface updates to portions of EDS approximately every three months. Ebsco telephone customer support is available 24 hours a day Monday through Friday and for reduced hours on the weekend; in addition, customers can report issues through Ebsco’s website or via e-mail.

Content and Scope

Publisher Content

At the time of writing, the base index underlying the EDS service includes content from nearly 20,000 providers, in addition to metadata drawn from tens of thousands of book publishers. This base index presently includes metadata for more than 45,000 journals, more than 800,000 CDs/DVDs, nearly six million books, and more than one hundred million newspaper articles; this base index is searchable by all EDS customers. EDS includes items from several open-access repositories, such as materials from the DOAJ (Directory of Open Access Journals), OAIster, and arXiv.org e-Prints. Ebsco creates a unique index for each EDS customer, which includes local harvested content (ILS catalog records, digital collections, etc.), the base index content, and additional content pulled from Ebsco-sourced databases for which the library has a


Local Resources

EDS can harvest local collections, such as ILS catalog records, digital collections, and institutional repositories based on various underlying schema, such as MARC, Dublin Core, XML, and EAD. EDS utilizes various harvesting and delivery mechanisms, such as OAI-PMH and FTP. For local collections, there is no mandatory minimum set of field information required; relevancy ranking (discussed below) is, to a degree, dynamic, depending on the resource type and the level

of metadata present. Metadata associated with locally harvested collections is mapped and transformed to an underlying EBSCOhost schema. Depending on library need, updates can be harvested on a daily basis if necessary. As noted above, the price of the EDS service is partially dependent on the count of harvested local materials.

Relevancy

By default, an EDS search is a keyword search against full text and fielded metadata, with results returned ranked by relevancy. Ebsco indicates EDS can load records having different levels of metadata and tunes relevancy appropriately based on the data source. For example, relevancy for records with more replete metadata would take into account items like subject, author-supplied keywords, and title, while a record with less fielded metadata may have full text weighted more heavily; relevancy is then normalized across these different data types. Factors such as term or document frequency calculations, which field a word appears in, and the uniqueness of the word in the overall index play a role in relevancy determination, as do other factors, such as currency, number of times cited, type of document, and so on. Subject headings from controlled vocabularies, titles, author keywords, abstract keywords, and full-text keywords also play a role. A detailed overview of EDS’s relevancy ranking is freely accessible on the open Web at the Ebsco support website. The library has neither influence over the relevancy algorithms nor the ability to promote certain items in a given results list. Records are deduped in the index based on analysis of the various citation fields and matching algorithms developed by Ebsco.
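To make the relevancy factors named above more concrete, the following sketch combines three of them: term frequency, the field in which a term appears, and the uniqueness of the term across the index, with a small boost for citation counts. This is a generic field-weighted TF-IDF illustration, not Ebsco's algorithm. The field weights, the boost formula, and the sample records are all assumptions made for the example.

```python
import math

# Assumed field weights: terms in subject or title count more than full text.
FIELD_WEIGHTS = {
    "subject": 3.0,
    "title": 2.5,
    "keywords": 2.0,
    "abstract": 1.5,
    "full_text": 1.0,
}


def tokens(text):
    """Split text into lowercase whitespace-delimited words."""
    return text.lower().split()


def inverse_document_frequency(term, docs):
    """Rarer terms across the index score higher (smoothed IDF)."""
    containing = sum(
        1 for doc in docs
        if any(term in tokens(text) for text in doc["fields"].values())
    )
    return math.log((1 + len(docs)) / (1 + containing)) + 1.0


def score(term, doc, docs):
    """Field-weighted term frequency, scaled by IDF and a citation boost."""
    weighted_tf = sum(
        FIELD_WEIGHTS.get(field, 1.0) * tokens(text).count(term)
        for field, text in doc["fields"].items()
    )
    citation_boost = 1.0 + math.log1p(doc.get("times_cited", 0)) / 10.0
    return weighted_tf * inverse_document_frequency(term, docs) * citation_boost


# Hypothetical index of three records with differing metadata depth.
DOCS = [
    {"id": "a", "fields": {"title": "solar energy policy", "full_text": "policy text"}},
    {"id": "b", "fields": {"title": "grid storage", "full_text": "solar panels and storage"}},
    {"id": "c", "fields": {"title": "wind power"}, "times_cited": 5},
]

ranked = sorted(DOCS, key=lambda d: score("solar", d, DOCS), reverse=True)
```

In this sketch a title match outranks a full-text match for the same term, and a heavily cited record scores higher than an otherwise identical uncited one. The normalization across data sources that Ebsco describes is not modeled here.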

How does the EBSCOhost search engine determine relevancy ranking?

http://support.ebscohost.com/knowledge_base/detail.php?id=3971

In instances where content related to a single unique item is sourced from multiple providers, Ebsco creates a composite record, combining what’s considered the strongest metadata and indexing from each source in the creation of this composite or superrecord. Such records can include the subject headings from each content provider. For example, a particular article could be cross-disciplinary, and as such, may have different subject headings from different providers (each more closely aligned with users in that particular discipline), which are combined in this merged record. In 2011, a level of FRBRization for book records is expected, in the sense that available editions or manifestations of a unique item will be presented


subscription. For example, an EDS customer whose library maintains a strong EBSCOhost-based collection may find the number of included journals in the central index could rise to over 80,000. Ebsco indicates the central index encompasses metadata from hundreds of thousands of sources from more than 20,000 providers. Ebsco has agreements with a large array of providers. Examples include Alexander Street Press, Association for Computing Machinery, Cambridge University Press, Emerald, IEEE, Ingenta, LexisNexis, NewsBank, Oxford University Press, Readex, Sage, Springer Science+Business Media, Taylor and Francis, Wiley, and Elsevier. In mid-2010, Ebsco announced a partnership with Thomson Reuters, providing access via EDS to Web of Science data. Given that Ebsco has been an established aggregator of content for years (nearly 300 full-text and secondary research databases) and that it leverages this established content and existing detailed indexing within EDS, the company describes its metadata as deep and detailed. This rich metadata often includes elements such as author-supplied abstracts and keywords for many major publishers, such as those referenced above. Ebsco notes that controlled vocabulary subject headings can be searched for content found in many EBSCOhost databases and that in general, an EDS search includes a search of the indexing, abstracts, and full text of EBSCOhost full-text databases for which the library maintains a subscription (including resources such as the popular Academic Search database). Ebsco indicates an overall broad level of full-text searching within content, dependent in part on publisher agreements in place. With Ebsco’s recent acquisition of NetLibrary, full-text searching of these e-books is provided from the EDS interface (presuming the library has this e-book subscription). In a typical configuration, EDS works with an institution’s proxy server to provision off-site access. 
For resolution to the full text, the institution’s link resolver is used, whether Ebsco’s LinkSource link resolver or another link resolver typically used by libraries. Given that Ebsco is also a content provider, resolution to the full text often does not require a link resolver step, depending on the amount of Ebsco-sourced content the library subscribes to.


in a single record view under the heading Available Formats/Editions.
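The composite record idea described in this chapter, in which records for the same item from several providers are matched and their subject headings combined, can be sketched as follows. Ebsco's matching algorithms are not public in this report, so the matching key used here (normalized title plus year) and all field names are assumptions for illustration only.

```python
def match_key(record):
    """Crude matching key: title reduced to lowercase letters and digits,
    paired with the publication year."""
    title = "".join(ch for ch in record["title"].lower() if ch.isalnum())
    return (title, record.get("year"))


def merge_records(records):
    """Group records sharing a matching key into composite records,
    keeping every distinct subject heading in first-seen order."""
    composites = {}
    for record in records:
        composite = composites.setdefault(
            match_key(record),
            {"title": record["title"], "year": record.get("year"),
             "subjects": [], "sources": []},
        )
        composite["sources"].append(record["source"])
        for heading in record.get("subjects", []):
            if heading not in composite["subjects"]:
                composite["subjects"].append(heading)
    return list(composites.values())


# The same cross-disciplinary article as indexed by two hypothetical providers.
incoming = [
    {"title": "Water Rights in the West", "year": 2008,
     "source": "Provider A", "subjects": ["Water law", "Western states"]},
    {"title": "Water rights in the West.", "year": 2008,
     "source": "Provider B", "subjects": ["Hydrology", "Water law"]},
    {"title": "Desert Hydrology", "year": 2005,
     "source": "Provider B", "subjects": ["Hydrology"]},
]

merged = merge_records(incoming)
```

A production matcher would compare more citation fields (ISSN, volume, pages, DOI) than this sketch does; the point is only that a merged record can carry headings from each discipline's indexing.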

Interface Features: Overview, Results, and Navigation


General


EDS offers a template that libraries can customize to their local environment. Within the EBSCOhost administrative interface, the library can customize various branding elements, such as colors and logos, and specify some layout details, such as the positions of logos. Libraries can choose to have a custom toolbar, which appears at the top of the interface, and can include and name library-specified hyperlinks appearing in the toolbar. Libraries can choose which elements appear in the toolbar, such as Sign In, Folder, Language, and can customize the names of the text labels. In addition, libraries can provide custom text to appear at the bottom of the interface, and choose which screens will include such custom bottom branding. Ebsco provides an EBSCOhost integrated toolkit, an API utilizing SOAP/REST Web services. At present, customers do not have direct access to the style sheets that control the layout, the effect of mouseovers, and so on. At the time of this writing, the interface is available in dozens of European languages, and by the end of 2010, it was expected to be available in dozens of Asian and other non-European languages. By default, EDS offers a single search box (figure 22), with a Search Options hyperlink below the box providing more choices (figure 23). These choices include a Search Modes category with radio buttons including Boolean/Phrase, Find All My Search Terms, Find Any of My Search Terms, and so on. A SmartText Searching option allows users to enter long strings of text, which are summarized by EDS into search terms then applied against the search; Ebsco indicates that this search feature works with some database content incorporated into the EDS experience. In addition, users can indicate if they wish to display only results with linked full text, or scholarly or peer-reviewed journals. Users can search by title and author and indicate a publication date range. 
Several of these options are also available from within the search refinement pane, described shortly, once search results are returned. Users also have the option to enter advanced search or visual search modes. The advanced search mode provides the ability to conduct fielded searches and the use of Boolean operators via pull-down menus. The visual search mode offers another method for users to refine searches; as a user clicks on subheadings, a visual display maps the relationship between the original results set through to the final refined items. Visual search returns up to the top 250 query results.
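The difference between the search modes listed above can be shown with a small matching function. This is a simplified sketch, not the EBSCOhost search engine: it treats the phrase mode as an exact word sequence, ignores Boolean operators, stemming, and punctuation, and the mode names `phrase`, `all`, and `any` are labels invented for the example.

```python
def matches(text, query, mode):
    """Return True if `text` satisfies `query` under the given search mode.

    mode "phrase": the query words appear together, in order.
    mode "all":    every query word appears somewhere in the text.
    mode "any":    at least one query word appears in the text.
    """
    words = text.lower().split()
    terms = query.lower().split()
    if mode == "all":
        return all(term in words for term in terms)
    if mode == "any":
        return any(term in words for term in terms)
    if mode == "phrase":
        span = len(terms)
        return any(
            words[i:i + span] == terms
            for i in range(len(words) - span + 1)
        )
    raise ValueError(f"unknown search mode: {mode}")


abstract = "renewable energy policy and the future of solar power"

phrase_hit = matches(abstract, "solar energy", "phrase")
all_hit = matches(abstract, "solar energy", "all")
any_hit = matches(abstract, "solar nuclear", "any")
```

The same two-word query fails as a phrase but succeeds when all terms may appear anywhere, which is why the choice of mode can change the size of a results set considerably.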


Figure 22 EDS single search box

Figure 23 EDS advanced search

Returned Results—Brief View

Once a search is conducted, the full interface is displayed. In a typical installation, the full interface is divided into a large central section bounded by one or two vertical panes; functionality offered within these panes is described shortly. For all content, by default, items are returned ranked by relevancy; other sort options are chosen by a pull-down menu and include date descending, author, and title. The user can define various layout options and other parameters via a Page Options link in the header area (figure 24), and the local library can define which layout option is the default. The majority of the EDS interface is dedicated to presenting results from a search. Each content type, such as journal articles, newspapers, and so on, has a unique icon. Book cover images, sourced from Baker and Taylor, are provided for all EDS customers. If a library subscribes to additional enrichment information (from Baker and Taylor or another provider), such content can be included in the EDS experience. For books (figure 25), typical information provided includes title, author, publication information, page count, size, and


Figure 25 EDS brief results: book

Figure 26 EDS brief results: article

language. Subject headings are provided, and a real-time status call to the library catalog is made at time of search, which provides call number, status, and location information, as well as a link to the full catalog record. This link, when clicked, pulls up the record in the native ILS interface in another browser window or tab. For journal articles and similar content (figure 26), typical information provided depends on the level of metadata information and may include common elements such as title, author, publication, volume and issue, and page numbers. For many articles, subject headings, the first few lines of abstracts, and thumbnail images of figures or tables within the article are also provided, depending on the source of content. Full-text availability for articles and similar content types is indicated through an icon and a link indicating the full text

in PDF or HTML; clicking on this link takes the user immediately to the full text, as described below. For other items with full-text availability, an icon and the link Check the Library Linkresolver for Full Text may be displayed, or, for some content providers, another message (e.g., for NewsBank-sourced content, Browse This Newspaper Title at NewsBank.). Which options exist depends on the source of content and the rights Ebsco has to display the full-text content natively within the EDS interface. For all item types, a magnifying glass icon appears to the right of the title. Mousing over this icon invokes a callout display (figure 27) providing much of the same information; in addition, it lists the database housing the record (such as the library catalog or a particular subject database) and, for particular item types, such as journal articles, a more complete abstract.


Figure 24 EDS page layout options


Figure 27 EDS magnifying glass callout


Faceted Navigation and Search Refinement


EDS offers various search refinement methods, including faceted navigation. By default, the left-screen pane is used to refine results (figure 28); the right-side pane is used to incorporate the optional EBSCOhost Integrated Search federated search component and widgets of additional functionality, described shortly. At the top of the refinement pane, users can choose to limit to full text, scholarly peer-reviewed journals, or the library catalog only. At the time of this writing, Ebsco is engaged in the classification of peer-review status for journals and expects that by the end of 2010, over 100,000 journals will have been classified. Users can also define a publication date range through a slider or by typing in boundary years. The majority of the refinement pane houses facet categories. Naturally, facets are dynamic according to the searches; not all searches will necessarily yield the same facet categories. For each facet category, several facet choices are displayed, and a Show More option invokes a callout that presents additional choices for that category. Facet choices have check boxes; thus, a user can limit to multiple facet refinements within each category. A breadcrumb refinement trail exists in that facet selections and limits applied by the user are indicated near the top of the refinement pane and can be x-ed out by users if they wish to easily remove the limits and expand their results. A Show More link underneath this top refinement section invokes the search options advanced search mode, described above. By default, the top facet group is the Source Type category, which includes choices such as All Results, Academic Journals, Periodicals, Dissertations/Theses, Books/Monographs, and so on. The Subject category appears next, allowing for limiting results with one

or more specific subject terms. Other examples of facet categories that may be presented for a particular search include Publication Title (such as newspaper or journal title), Author, Location (such as a library branch), and Content Provider. This last category provides a listing of databases that have content matching the query; this list also includes the library’s catalog as a content provider source. The only facet category that indicates the number of matches next to each choice is Content Provider; the number of matches in each database is indicated in parentheses. Ebsco plans to introduce hit counts for other facet categories in 2011.

Figure 28 EDS refinement pane

Returned Results—Detail View

In the brief results, clicking on an item title retrieves a detailed citation record view within the same browser window, though part of the interface changes (as there is no longer a need for facets, etc., at the single-item level). For physical books held within the library (figure 29), the detail view provides typical citation information (title, author, source, etc.) and, if available, an enlarged book cover image. Status and holdings information similar to that provided in the brief results view


Figure 29 EDS detail view: book

Figure 31 EDS integrated PDF viewer

Note that the above description assumes a user is authenticated, usually behind the scenes via a campus IP address, or possibly through explicitly logging in through some other mechanism, such as a proxy server. As mentioned earlier, Ebsco released a guest access mode in mid-2010. This mode allows an unauthenticated user (such as someone from off campus who has not authenticated via a proxy server) to conduct basic searching of the library’s EDS instance. In this unauthenticated state, citation information will be provided for a wealth of resources, but some citation information is not available or allowed for unauthenticated views, and the user will get a message: “This result from [content source] cannot be displayed to guests. Login for full access.”

Exporting Options, Shopping Carts, RSS Feeds

EDS offers a variety of export options (figure 32),


(call number, location, status) is provided in a pane to the left of the main screen content, as is a link to the native catalog record through the ILS interface (which, when clicked, opens up the native catalog in a new window or tab). For articles and similar content, the detailed record view (figure 30) provides similar information to that displayed through the magnifying glass callout in the brief record view (i.e., title, author, source, subject terms, geographic terms, abstract, etc.). As described above, for full-text article content, an icon and link indicate full-text availability. As Ebsco is a provider of full-text content through many of its EBSCOhost databases, much of this EBSCOhost full-text content is displayed natively within the EDS interface, depending on local library Ebsco subscriptions. This display occurs when a user clicks on the PDF or HTML full-text indicator. In both cases, the user stays within the EDS interface wrapper. For PDF records, EDS uses the integrated Ebsco PDF viewer (figure 31). Within this view, the full-text content displays in the majority of the screen, while the left-hand pane allows users to see other articles within the same issue as the full-text item they are viewing and to navigate to other issues of the same journal. A large portion of the PDFs are in native PDF format; users can search the full-text item for a term. For items for which a PDF or HTML link does not exist, a link to browse the full-text content in the native interface may be provided; in such cases, a single click will return the full text. In other cases, a link to check the library’s link resolver is provided, and, depending on how the local library has its resolver configured, the user may be immediately taken to the content or to a menu choice of content providers.
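Link resolvers of the kind mentioned above are commonly addressed with OpenURL links that carry the citation as key-value pairs. The sketch below builds such a link for a journal article using key names from the ANSI/NISO Z39.88-2004 key/encoded-value format. The resolver base URL and the sample citation are hypothetical, and the example does not reflect how EDS itself constructs its links.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical link resolver address for an institution.
RESOLVER_BASE = "https://resolver.example.edu/openurl"


def article_openurl(citation):
    """Build an OpenURL 1.0 (Z39.88-2004) link for a journal article.

    Only citation elements that are present are added to the link.
    """
    params = {
        "url_ver": "Z39.88-2004",
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
    }
    for key in ("atitle", "jtitle", "issn", "volume", "issue", "spage", "date"):
        if citation.get(key):
            params[f"rft.{key}"] = citation[key]
    return f"{RESOLVER_BASE}?{urlencode(params)}"


# Sample citation with made-up values.
citation = {
    "atitle": "Discovery at Scale",
    "jtitle": "Example Library Journal",
    "issn": "1234-5678",
    "volume": "47",
    "issue": "1",
    "spage": "5",
    "date": "2011",
}

link = article_openurl(citation)
fields = parse_qs(urlparse(link).query)
```

Whether such a link lands on the full text or on a menu of providers is decided by the resolver's own configuration, as the text notes, not by the link itself.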

Figure 30 EDS detail view: article

35

Library Technology Reports  alatechsource.org  January 2011

Chapter x

36

with printing, e-mailing, saving, and exporting to a citation management program all supported. A half dozen citation formats are provided, and the user can export to popular management programs, such as EndNote, ProCite, RefWorks, and BibTeX. From either the main interface (the screen with the initial list of returned results) or the detailed citation view of an individual item, users can place items into a folder (i.e., the item is marked for some future action by the researcher). Figure 32 On the main brief results EDS export options page as well as at the detailed view, a folder icon appears at the top; a user can click on this icon to see a shopping cart–style view of all marked items (figure 33) and select which items to print, e-mail, save, or export into a citation management program (figure 34). In addition, in the detailed view, presented within a right-side pane is an inventory of current items the user has placed in the folder (marked). Carrying over functionality from the EBSCOhost platform, EDS allows users to create an Ebsco username and password account (or use an existing EBSCOhost account if they already have one), which allows them to store marked items, saved searches, and alerts. Users can set up alerts to automatically inform them via e-mail of new information; such alerts can be set up to periodically run a saved search or to alert users when a new issue for a selected journal title becomes available. Users can define some parameters, such as how frequently a particular search is run (e.g., daily, weekly, or monthly). Alternatively, users can set up alerts for a predefined search or new journal publication as an RSS feed. The personal account also allows the user to set preferences that provide partial control for some features, including how results are displayed (such as number of results per page) and export preferences. 
These preferences will be applied for subsequent EDS sessions, assuming the user logs in with his or her username and password at the next session.

Figure 33 EDS shopping cart

Figure 34 EDS citation export

Additional Features

Did You Mean? Spelling Suggestions

EDS offers Did You Mean? functionality to address misspelled words. For a search term not matching any entry in the index, the user is returned to the initial search box with the indication No Results Were Found and a Results May Be Available For . . . statement with one or more hyperlinked term suggestions; clicking on the linked term executes a search against that term.

Embedding in Other Online Venues

EDS uses persistent URLs, so libraries wishing to provide a link for a “canned search” can do so and embed such links on different webpages. The persistent URL for a single item is displayed as part of the information in each detailed citation-view record. Persistent URLs to help define a canned search can be obtained by running the search and clicking on the Alert/Save/Share link near the top of the brief results screen. Bookmarks to EDS can be pushed to social networking/bookmarking sites, such as Digg, Facebook, and so on. In addition, a Search Box Builder tool helps libraries embed

Vendor Website

Ebsco Discovery Services

www.ebscohost.com/discovery

Example Implementations

James Madison University www.lib.jmu.edu

Mississippi State University http://library.msstate.edu

Northeastern University www.lib.neu.edu

University of Oklahoma http://libraries.ou.edu

the EDS simple search box on other webpages, such as the libraries’ home page or institutional course management system. Should the library wish, it can use this tool to preconfigure limits for searches in this embedded search box; for example, a search box embedded within an engineering subject library guide could automatically be set to limit searches to materials available within an engineering collection, published after a certain date, in a certain format, and so on.

Searching of Additional Remote Resources

As mentioned above, a right-screen pane is provided in the EDS interface, focusing on optional functionality the library may wish to incorporate into the EDS experience. Ebsco’s optional federated search product, EBSCOhost Integrated Search, can be incorporated within this pane and allows for searching of additional sources whose content is not available or currently licensed to be preharvested and preindexed within EDS’s central index. Traditional federated search products are not the focus of this issue of Library Technology Reports.

Widgets

Widgets (figure 35), optional pieces of additional functionality, also appear in the right-side pane. Widgets are another avenue of pulling external content into the EDS experience. Examples of widgets could include integration of other library tools, such as LibGuides subject guides and blogs. Other examples include Flickr image, Google Books, and Wikipedia search functionality widgets and incorporation of an online chat client, such as Meebo. Some external sources, such as LibGuides, provide information helpful to widget creation; Ebsco has examples of widgets and their associated code at its support website. Widgets can be added to the interface using custom HTML code or as an iframe URL.

Figure 35 EDS widgets

Statistics

EBSCOadmin, the administrative module, provides, among other things, statistics related to EDS (and is the same module used for the EBSCOhost platform). For EBSCOhost databases within EDS, statistics available include number of sessions and searches (by month and database), number of abstract views, and number of full-text article requests (by month and journal). Statistics are customizable by a monthly date range. Statistical reports and graphs can be generated and displayed by date (month), by database, and by total searches. Statistical reports can be sorted and exported in a variety of formats.

Mobile Interface

EDS can utilize the out-of-the-box EBSCOhost Web-based mobile interface optimized for various smartphone platforms (figure 36). For EDS customers, this interface searches the entire centralized index. Users can conduct a search, apply limiters, see results, select an item, and ultimately choose to access the full text or e-mail items. Some EDS features available in the full Web version are not currently supported in the mobile interface, such as automatic real-time status calls for ILS catalog holdings. At the time of this writing, Ebsco was working on an EDS application.

Figure 36 EDS mobile view

Upcoming Directions

While only limited details can be shared with the general public, Ebsco provided some directions that the company is currently exploring. Ebsco plans to continue expanding the multilanguage support of the EBSCOhost/EDS platform, including support for displaying Chinese, Japanese, and Korean characters. Ebsco is exploring a database advisor capability, suggesting databases the user may wish to search based on the query. Enhancements to the mobile interface will continue, and development of additional widgets that can be embedded in the EDS interface is expected. Ebsco is looking to expand information provided for monographs through incorporation of additional enrichment information (such as reviews) and through provision of a level of FRBRization for book records in the sense that available editions, manifestations, and formats of a unique item will be presented in a single record view. In addition, Ebsco is exploring audiobook download capabilities. Ebsco expects to expand support for consortia-based catalogs or catalogs for libraries having close working relationships.

Vendor Perspective: EBSCO Discovery Service

The goal of discovery is to allow libraries to maximize the visibility, value and usage of their overall collections. Through EBSCO Discovery Service™ (EDS) library collections are more accessible, allowing users to gain the most benefit out of the information the library makes available. EBSCO Discovery Service creates a unified, customized index of an institution’s information resources, and an easy, yet powerful means of accessing all of that content from a single search box. These custom solutions are created by harvesting metadata from both internal (library) and external (vendors) sources, and creating a pre-indexed service of unprecedented size and speed.

In addition to the most comprehensive and robust collection of metadata from the best content sources, EDS also provides full indexing for EBSCOhost® databases and other partner databases including: Web of Science, Oxford University Press, Baker & Taylor, NewsBank, Readex, LexisNexis, Alexander Street Press and more. Not only does EDS search the most inclusive set of metadata, but superior relationships and licenses with academic publishers make EDS the most comprehensive service for searching the complete full text of journal articles and other sources—offering a truly integrated one-stop search experience for all of a library’s journals, magazines, books, special collections, OPAC and more.

By leveraging the fast and familiar EBSCOhost platform, EDS offers a single interface for discovery and powerful features to heighten the research experience—everything the researcher needs in one place. When a library uses EDS on EBSCOhost, consistent searching does not end with the result list. EBSCO Discovery Service offers a full-featured experience for users to remain within the structure of EBSCOhost—limiters and expanders, alerts, email, print, export, citations, bookmarkable URLs, persistent links, RSS, etc. In fact, EBSCOhost features are available for any applicable database, from any vendor including such features as: EBSCOhost basic and advanced search, screen functionality, subject clustering (facets), publication clustering (facets), sorting results by relevancy or date, date slider limiter, adding to folders and custom links. With EDS, EBSCO builds on the strength of the interface to serve the end user throughout their search, not just to find results based on a simple thin metadata search and then send searchers off on their own.

Another element that makes EDS different is the level of customizability that is available. The goal is to put the power and control in the hands of the customer, and have the “EBSCO” name take a backseat. EBSCO Discovery Service offers an unprecedented level of customization to the interface for prominent logo placement, interface colors, naming of the service itself, tool bar customization, etc.—all in such a way that can easily dovetail with university marketing/branding efforts. EBSCO Discovery Service allows sites to set up widgets (e.g., Library Guides, etc.) directly on the result page, as well as export bits of functionality from the EBSCO experience to be used in other sections of a university’s website. With these options, the power of EBSCOhost remains but the look and feel of discovery becomes closely associated with the institution’s website and identity.

Chapter 5

Ex Libris Primo Central

Abstract

Publicly released in mid-2010, Primo Central extends the Primo next generation discovery layer, released by Ex Libris several years earlier. This chapter provides a brief history, overview, and a few insights into the future development path of Primo Central, describes the local and remote content associated with Primo Central, and highlights some of the features, functionality, and flexibility associated with the Primo Central interface.

Overview

Ex Libris began development of its next-generation discovery layer, Primo, in 2005, with official public release occurring in 2007; Primo version 3 was released in spring 2010. Hundreds of libraries worldwide have implemented Primo. The Primo discovery platform harvests and indexes local library collections, such as bibliographic records, digital collection materials, and items within institutional repositories, and provides a common interface for discovery of these materials. In addition, Primo can be configured to search remote repository indexes and blend the library’s local collections with the remote index results. Primo Central, Ex Libris’s Web scale discovery component, was officially released in mid-2010. Primo Central extends the base Primo discovery experience by also searching a large preharvested central index of article-level content from a variety of publishers and aggregators. For the remainder of this chapter, Primo Central and Primo will be used interchangeably; many of the points discussed apply equally to the Primo next-generation discovery layer with or without the Primo Central service. Given that Primo Central is an extension of Primo, the interface and many of the features are the same.

At the time of this writing, late summer 2010, approximately fifty customers have signed on as subscribers to the Primo Central service, with several customers already live on Primo Central. While academic libraries make up a large proportion of customers, public library customers are also present and will likely become more numerous in the future. For example, Ex Libris recently announced that the National Library of Finland, representing various library types, research institutes, archives, and museums, had chosen the Primo/Primo Central platform. The Primo discovery layer can be hosted by Ex Libris or the local library; in either case, the central, preaggregated index associated with Primo Central is offered as a managed service and hosted by Ex Libris in a cloud environment. Primo Central is offered as a subscription service. Pricing considerations for Primo include whether an Ex Libris hosted or locally installed instance is chosen, the institution’s full-time equivalent (FTE) student count, and the number of local records (such as local library catalog and digital collections records) harvested into the system. Consortial discounts are possible. Ex Libris provides minor application and interface enhancements approximately every three months; major release updates occur approximately every fourteen to sixteen months. Ex Libris customer support is available 24/7, and a variety of communication options are supported.

Content and Scope

Publisher Content

At the time of this writing, the hosted and centrally managed Primo Central index numbers approximately 300 million items obtained from primary and secondary publishers and aggregators as well as open-access information repositories. Some notable examples include content sourced from Accessible Archives, the Association for Computing Machinery, BioOne, ebrary, Gale, IGI Global, LexisNexis, Oxford University Press, Springer, Web of Science (Thomson Reuters), and Wiley-Blackwell. A pilot project with Elsevier was slated to begin in fall 2010. A content focus for the Primo Central index has been scholarly journals, though e-books, newspaper articles, and reviews are also incorporated into the index. Open-access materials from sources such as the arXiv.org e-Prints, Hindawi Publishing, DOAJ (Directory of Open Access Journals), and the extensive HathiTrust materials are either already incorporated into the Primo Central service or on the road map for inclusion. Primo Central does not yet index the full text of major e-book content providers, though it may in the future. Ex Libris seeks to negotiate and obtain full text from content providers when possible, and in such cases, the full text is indexed and available for search in the Primo Central index. Ex Libris indicates that in most cases, it has at a minimum the abstracts, if not the full text, for content. Ex Libris has additional information about its Primo Central Publisher Program on its website. As with other vendors, the central index continues to grow as additional content agreements are pursued and finalized. Ex Libris utilizes automated processes allowing new content to be added and indexed quickly. While different content providers provide new content on a variable basis, Ex Libris indicates that on average, updates occur weekly.

Primo Central Publisher Program

www.exlibrisgroup.com/category/PublisherProgram

Primo does not require offsite users to authenticate to be able to search the service, though the local library is in control of authentication requirements and at what point a user is asked to authenticate—typically either before the user conducts an initial search or when the user tries to retrieve a full-text item. In a typical configuration, Primo works with an institution’s proxy server or other authentication method to provide offsite access and can interface with a single sign-on solution; Primo interfaces with all common link resolvers (including Ex Libris’s own link resolver, SFX) to broker access to library-licensed full-text content.

Local Resources

As mentioned in the overview to this chapter, harvesting of local collections is a feature of the Primo discovery layer, and by extension, a Primo Central customer simultaneously searches both the local collection index and the Primo Central index, with the results blended together and presented to the end user. The local Primo index can incorporate local library collections, such as records from the library’s ILS catalog, digital collections, and institutional repositories. Pipes and connectors exist that enable harvesting of local resources, including pipes to major ILS platforms and other information repositories typically used at libraries or their parent institutions, such as ArchivalWare, bepress, CONTENTdm, Digital Commons, DigiTool, DSpace, Fedora, and Luna, as well as other institution-specific repositories (such as the Television News Archive at Vanderbilt University). Primo can ingest content utilizing various schemas, including MARC/MARC XML, Dublin Core, and EAD. With the release of Primo version 3 in April 2010, Ex Libris indicates that basically any kind of structured XML can be accommodated with the Primo experience. Customers can define custom harvesting rules and determine how often harvest updates occur (daily if necessary), and Primo supports various harvesting and delivery methods, including OAI-PMH and FTP. Harvested content is normalized into an underlying Primo record format.

Relevancy

By default, a Primo search is a keyword search conducted across both metadata and full text, with items returned ranked by relevancy. Primo’s proprietary relevancy-ranking algorithm includes but is not limited to factors such as term frequency, field weighting, number of times a record has been accessed, and currency. Peer-review status is taken into account as part of the relevancy of an item when the Peer-Reviewed Journals facet is employed. Libraries can choose to define boosting metrics for the relevancy-ranking algorithm based on metadata normalization rules; choosing to have an item boosted places that item higher in the relevancy ranking. Boosting rules include but are not limited to such mechanisms as setting the importance of particular record fields. In addition, Primo allows for boosting by synonym. Typically, a record that contains a synonym to the search term is ranked below a similar record that contains the search term; the degree to which the record that contains the synonym should be pushed down is configurable by the library. If libraries subscribe to content-enrichment services, this data, such as table of contents, is incorporated as part of the relevancy ranking. Ex Libris regularly tunes the relevancy ranking for Primo. As content for a single unique item can be sourced from multiple content providers, Ex Libris utilizes a routine that “merges” duplicate records for the purpose of search and then groups the records when presented for display to the end user; the publisher record (if available) is used as the default record for display.
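The interplay of field weighting and synonym demotion described above can be illustrated with a small sketch. Primo's actual ranking algorithm is proprietary and is not documented here, so everything below is an assumption made for illustration: the field weights, the synonym table, the penalty value, and the names `Record`, `score`, and `rank` are all hypothetical, and factors such as access counts and currency are left out.

```python
from dataclasses import dataclass

# Hypothetical field weights (a library-defined "boost" per record field).
FIELD_WEIGHTS = {"title": 3.0, "subject": 2.0, "description": 1.0}

# Hypothetical synonym table and demotion factor. A value below 1.0 pushes
# synonym-only matches below exact matches; the library would tune it.
SYNONYMS = {"car": {"automobile"}}
SYNONYM_PENALTY = 0.5


@dataclass
class Record:
    record_id: str
    fields: dict  # field name -> text


def score(record, term, synonym_penalty=SYNONYM_PENALTY):
    """Sum weighted term frequency over all fields, demoting synonym hits."""
    term = term.lower()
    synonyms = SYNONYMS.get(term, set())
    total = 0.0
    for name, text in record.fields.items():
        weight = FIELD_WEIGHTS.get(name, 1.0)
        words = text.lower().split()
        exact_hits = words.count(term)
        synonym_hits = sum(words.count(s) for s in synonyms)
        total += weight * (exact_hits + synonym_penalty * synonym_hits)
    return total


def rank(records, term):
    """Return records ordered from most to least relevant for the term."""
    return sorted(records, key=lambda r: score(r, term), reverse=True)
```

With these assumed settings, a record whose title contains the search term outranks an otherwise similar record whose title contains only a synonym, and raising or lowering `SYNONYM_PENALTY` changes how far the synonym match is pushed down.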

Vendor Website

Ex Libris Primo Central

www.exlibrisgroup.com/category/PrimoCentral

Example Implementations (Note: Example implementations are listed in alphabetical order. Some implementations are more open to search by an external audience, based on configuration decisions at the local library level.)

Brigham Young University ScholarSearch www.lib.byu.edu (Note: Choose All-in-One Search)

Northwestern University

http://search.library.northwestern.edu

Vanderbilt University DiscoverLibrary http://discoverlibrary.vanderbilt.edu (Note: Choose Books, Media, and More)

Yonsei University (Korea) WiSearch: Articles + Library Holdings

http://library.yonsei.ac.kr/main/main.do (Note: Choose the Articles + Library Holdings link. The interface is available in both Korean and English; to change to English, select English at the top right of the screen after you have conducted a search and are within the Primo Central interface. The Yonsei University site is also an example of a site that has implemented the optional bX Recommender service)

Interface Features: Overview, Results, and Navigation

General

Primo allows for extensive library customization of the interface. Customers can choose to use a basic template out of the box, or libraries with appropriate staffing and skill sets may tap into Primo Central’s available APIs and Web services layer and choose to essentially completely redesign or integrate a locally developed interface. Thus, there is no hard-and-fast description of the Primo interface, as it is quite flexible. For example, a library customer could choose to configure multiple views with different look-and-feel elements and facet categories and configure display defaults. Primo can be configured to define a search by location (such as a campus or branch), topical area of interest (such as a particular set of science databases), or particular material types (such as books or articles). Different elements can often be configured to appear as tabs or radio buttons or presented in pull-down menus. Many customers choose to build off a basic template, and the description that follows applies to the basic features and look and feel of the Primo platform based on this basic template. As mentioned, Ex Libris offers a local library hosted option and several Ex Libris hosted options, and in part, the level of customization is dependent on which hosting package is chosen. Among other things, libraries can choose a color scheme, add their library logo and branding elements, and provide library-specified hyperlinks to other pages (such as the library homepage). Libraries can choose to develop different views within a single instance—for example, to give a different look and feel to a different member within a consortium or a different branch library at the same institution. Ex Libris provides various read/write APIs that are available within Ex Libris’s user support community, the EL Commons portal. The Primo interface currently offers local language support for approximately two dozen languages, including a broad mix of European and Asian languages.

By default, Primo Central offers a single search box (figure 37). Once an initial search is executed, a returned results screen provides brief item results, detailed below. Users may also choose an advanced search option (figure 38). In both the basic and advanced search modes, users can choose particular indexes, usually presented through a pull-down menu. In a typical installation, the library may offer pull-down menus that allow the user to specify whether the record is an exact match for the terms, starts with the term, or contains the term somewhere within a specified field (such as title, author, subject, or user tag field). Advanced search also allows users to predefine parameters, such as publication date, material type, and language, or to search a particular collection or library. Such choices are defined by the local library.

Returned Results—Brief View

Once a user has conducted a search, returned results occupy the majority of screen real estate; a refinement pane providing faceted navigation (described shortly) occupies the left side of the screen. By default, results are ranked and presented by relevancy; a library can choose which other sort options, and how many, are presented through a drop-down box (such as date, author, and title, as well as other fields within the Primo-based underlying record). In the returned results, icons exist for each item type (book, article, video, etc.). For books (figure 39), information provided includes title, author, and publication information (city, publisher, date). If the library subscribes to an enrichment service and a match exists, book cover images are provided in place of the book icon. Primo supports a variety of enrichment sources and providers, such as Syndetic Solutions, Content Café, and Amazon. For books, the library can configure which

Figure 37 Primo Central single search box

Figure 38 Primo Central advanced search
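The three advanced-search match options mentioned in this chapter (exact match, starts with, and contains within a chosen field) can be sketched as follows. This is only an illustration of the idea, not Primo's implementation: the function names, the mode labels, and the dictionary-based record layout are all assumptions.

```python
def field_matches(value, term, mode):
    """Compare one field value to a search term using a named match mode."""
    value, term = value.casefold(), term.casefold()
    if mode == "exact":
        return value == term
    if mode == "starts_with":
        return value.startswith(term)
    if mode == "contains":
        return term in value
    raise ValueError(f"unknown match mode: {mode}")


def advanced_search(records, field_name, term, mode="contains"):
    """Return the records whose chosen field satisfies the match mode.

    Records are plain dicts here (a hypothetical layout); a record that
    lacks the field is treated as having an empty value and never matches
    a non-empty term.
    """
    return [
        record
        for record in records
        if field_matches(record.get(field_name, ""), term, mode)
    ]
```

A pull-down menu in the interface would then simply select the `field_name` and `mode` arguments passed to such a function.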

Figure 39 Primo Central brief result: book

42

Figure 40 Primo Central brief result: article

elements are included in a real-time status call to the underlying ILS. For example, a library can configure this real-time status call to indicate a status (such as Available in the Library, Not Available, Online Access, or Restricted Online Access), call number, and location information (such as collection title and library, branch, or floor). For multiple formats of the same item, a link in the brief record will appear: There Are X Versions of This Item. Clicking on this link will display the various versions, such as a hard-copy book and an e-book. For articles (figure 40), returned information includes article title, author(s), journal title, publication date, volume and issue, page numbers, and an indication of whether the publication is a peer-reviewed journal. In addition, the user is alerted to whether the full text is available. Regardless of content type, Primo offers several options within the brief record for obtaining more detailed information. Users can click on the brief record title or on one of several links appearing at the bottom of each brief record. Libraries can configure which links are available, set the order of the links, and name the link labels; since libraries can modify Primo’s cascading style sheets, other options can be provided as well. Typical links used in current Primo installations include Details, Reviews and Tags, Additional Services, and for items in the library’s ILS, Locations and Holdings List. For online resources, such as e-books and articles, Web Scale Discovery Services  Jason Vaughan

Figure 41 Primo Central facets

Chapter x

an Online Resource link is typically provided. Each of these links is also available in the detailed record view, should a user click on an item title directly, and are described in more detail below. By clicking on the item title in the brief record view, the user is typically taken to the full item information displayed on the full screen (similar to clicking the Details link and expanding the frame, described below). Alternatively, depending on the configuration chosen by the library, clicking on the title can take the user directly to the full-text resource. In addition to the links described below, other link options include Request, which allows a user to place a hold or document delivery request (directly from within the Primo interface or through linking to another system at the library, depending on the library’s local configuration and systems in place). If the library subscribes to Ex Libris’s bX Recommender service (described below), users can view such recommendations by clicking a Recommendations link. Faceted Navigation and Search Refinement

Figure 42 Primo Central detail view: book

Figure 43 Primo Central detail view: article

Journals). Additional facet categories include Creation Date (each choice providing a date range), Resource Type (books, maps, scores, etc.), LCC Classification, Genre (e.g., biography, electronic books, interviews, case studies), and Journal Title. In a typical configuration, the top five choices for each facet category will be displayed. If more choices exist for that category for a user’s given search, a Show X More link provides a list of more choices. For all facet categories, the number of corresponding items matching each facet choice refinement is shown in parentheses next to the choice. Returned Results—Detail View Clicking on an item title or one of the links within a brief record retrieves more information for the item. With the exception of clicking on the Holdings List link, the user stays within the full brief record results set. Upon clicking a link, a frame is opened, expanding the record and providing more information. The main user interface (search box, refinement pane, and other brief records) remain in view. A small icon in the frame allows the user to choose to display the frame in a full new browser tab. Clicking on the Locations link in the brief record expands the record to show library name, Web Scale Discovery Services  Jason Vaughan

Library Technology Reports  alatechsource.org  January 2011

Primo Central offers faceted navigation, and in a typical configuration, facet categories display within a Refine My Results pane along the left side of the screen (figure 41). Once one or more facets are selected, a Refined By header appears at the top of the results list, along with the facets, providing a breadcrumb trail, which allows the user the option of individually undoing each facet refinement to backtrack and expand the results. By default, Primo Central includes many predefined facet categories. In addition to the out-ofthe-box categories, libraries can define new facet categories. Libraries can choose which facet categories to display, customize the facet category labels, and control the order of the facet categories within the pane. A typical configuration includes a Show Only category, with choices such as Peer-Reviewed Journals, Online Resources, and Available. The Online Resources facet choice might typically include e-books, digital collections, streaming video, e-journals, and so on; the Available facet choice might include items such as e-books and physical books held at the local library. Other usual facet categories include Topic, Creator, and Collection (with choices such as a physical library collection—for example, Main Library—and Publisher Collections—for example, University of Chicago Press

43

Chapter x

Figure 44 Primo Central export options: send to menu

Library Technology Reports  alatechsource.org  January 2011

Figure 46 Primo Central e-shelf export options

44

Figure 45 Primo Central e-shelf shopping cart

location, call number information, and real-time status information (which the library can also configure to show automatically in the brief results view). Clicking on the Details link for a physical library holding, such as a book (figure 42), provides item information, such as subjects, format, related titles, and ISBN. It also provides a description one or more sentences long, which may include summaries and table of contents. It also allows for providing contextualized links to external resources, such as Amazon and WorldCat, allowing the user to connect to these external sites for more item information; clicking on one of these external links will open up a new browser window or tab with the item-level information from that external resource. For articles and similar content (figure 43), typical information retrieved from the Details link includes title, author, subjects, journal information (title, date of publication, volume and number, page numbers, peer-review status), and language. It may also include a description one or more sentences long. Primo supports user-generated tags, ratings, and reviews. The Reviews & Tags link provides usercontributed reviews and a tag list or tag cloud (assuming reviews and tags exist for that item). In a typical configuration, a library may have the user log in to the user’s Primo account (described below) to contribute tags, ratings, or reviews. The library can review submissions first, if it prefers. The Additional Services and Online Resources links invoke the library’s link resolver to broker the connection to the full text within the resulting frame or to otherwise provide a list of aggregators or content providers Web Scale Discovery Services  Jason Vaughan

providing the full text from which the user can then choose (depending on the local library’s link resolver configuration). For physical items harvested from the library catalog, selecting the Holdings List link opens a new browser window or tab. Because libraries can configure which links are displayed, a library may opt not to include a Holdings List link in either the brief or the detailed record view and instead display the real-time status information (status, call number, location) without the need for an extra click. Should a library choose to include the Holdings List link, it can be configured to open a framed view of the full item record (displayed within the native ILS) while still maintaining the user’s presence within the Primo experience (with the Primo search box and the various links pulling up additional information about the item still present).

Exporting Options, Shopping Carts, RSS Feeds

Primo offers a variety of export options that can be reached in several ways. One is a Send To pull-down menu (figure 44) accessed from the detailed record view, the brief record view (once a user has clicked on a link for more information), or the e-shelf shopping cart. Options include e-mailing, printing, and exporting items to citation management programs (such as EndNote or RefWorks) or social bookmarking sites (such as Connotea or Delicious). Also included is an Add to E-shelf option, which acts as a consolidated shopping cart basket (figure 45). Items can be marked (added to the e-shelf) by clicking on the star icon appearing next to each item title


or by choosing the Add to E-shelf option from the Send To pull-down menu. Unauthenticated guest users can retrieve their session-based e-shelf by clicking on the My Items link at the top of the interface. The e-shelf information lists each item and includes item type, author, title, and the date on which the user added the item to the e-shelf. Highlighting an item pulls up more detailed item information. From the e-shelf, users can mark which items they wish to export and how (figure 46). Users can also review a list of session queries from the e-shelf. Primo allows users to create a username and password account. The e-shelf provides additional functionality for signed-in users. They can save their e-shelf item list to access later, set default session preferences, tag and review items (if allowed by the library), and so on. Depending on local library configuration and other underlying systems in place, accounts may also be linked to or used to perform traditional ILS functions, such as requesting or recalling items, renewing books, reviewing fines, and making some updates to the user account (e-mail address, phone, etc.). Users can create folders to help manage items (for example, by topic). Users can automate future searches for information by configuring alerts and RSS feeds from their e-shelf. An alert automatically runs a query at selected future intervals and sends results via e-mail. As an alternative to configuring feeds from within the e-shelf, an RSS feed icon is present at the bottom of the left-side refinements pane.

Additional Features

Did You Mean? Spelling Suggestions

Embedding in Other Resources

Primo Central’s search box can be embedded in other online venues, such as library subject guides, course management systems, social networking sites such as Facebook, and so on. Primo utilizes persistent URLs, so “canned searches” with preconfigured

Figure 48 Primo Central mobile view

and defined search parameters can be constructed and embedded in other webpages. As mentioned above, users can push items to social bookmarking sites, such as Delicious.

Searching of Additional Remote Resources

While not the focus of this issue of Library Technology Reports, additional components are offered by Ex Libris to search additional publisher content not presently available within Primo Central (or otherwise allowed by publishers). These tools predate development of Primo Central, and include Primo Deep Search, which uses API frameworks to access and search remote resources, and MetaLib, Ex Libris’s federated search product, which uses metasearch standards such as Z39.50 to search remote information. These components can be incorporated into the Primo/Primo Central experience.

Recommender Service

In mid-2009, Ex Libris introduced an optional scholarly

Primo provides various measures to help prevent dead-end searches. Alternate suggestions are offered with Did You Mean? functionality, which helps address spelling variations as well as misspelled words. A library can influence when Did You Mean? appears; for example, the library can configure the message to appear only when fifty or fewer results are returned for a search. At the bottom of the refinement pane in the brief results view, a facet category exists for Suggested New Searches. This choice provides various hyperlinked additional suggested searches, such as author and subject suggestions, generated dynamically from the current search. Clicking on any suggestion automatically executes that search.
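The threshold rule described above can be illustrated with a short sketch. This is not Primo's implementation, which the vendor does not describe here; the function name, the vocabulary list, and the use of Python's standard `difflib` matcher are all assumptions made for illustration. Only the idea of showing a suggestion when fifty or fewer results are returned comes from the text.

```python
from difflib import get_close_matches


def did_you_mean(query, result_count, vocabulary, threshold=50):
    """Return a spelling suggestion, or None if no prompt should appear.

    Hypothetical helper: the prompt is shown only when the search
    returned `threshold` or fewer results, mirroring the library-set
    limit described in the text.
    """
    if result_count > threshold:
        # Plenty of results, so the prompt stays hidden.
        return None
    term = query.lower()
    # Pick the single closest known term from an assumed vocabulary list.
    matches = get_close_matches(term, vocabulary, n=1, cutoff=0.6)
    if matches and matches[0] != term:
        return matches[0]
    return None


vocabulary = ["shakespeare", "chaucer", "milton"]
suggestion = did_you_mean("shakespear", result_count=3, vocabulary=vocabulary)
print(suggestion)
```

A library that wanted the prompt to appear less often would simply lower `threshold` in this sketch; a production service would draw its candidate terms from the index itself rather than from a fixed list.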

Figure 47 Primo Central bX Recommender


recommender service, the bX Recommender service, which can be integrated into the Primo Central service. Users can view article-level recommendations by clicking on a Recommendations link within the brief or detailed item view. This link retrieves a list of recommended related items (“Users interested in this article also expressed an interest in the following”) generated from the analysis of extensive SFX link resolver usage logs. An analogy to this service is Amazon’s feature: “Customers who bought this item also bought . . . .” The bX Recommender service is offered as an on-demand, hosted service by Ex Libris.

Statistics

A variety of statistics are available through Primo’s administrative interface, using the open-source Eclipse BIRT (Business Intelligence and Reporting Tools) project report engine. Such statistics populate various

usage reports, including total number of searches per day, number of results per search, and top searches with no results. Statistics exist for facets, so designers can gather insights into which parts of the interface are being used most. System response statistics are available, and Primo provides access to several raw log files for additional processing and analysis. Many statistics are definable by date range, and various sort, graphing, and export options are available.

Mobile Interface

With Primo version 3, Ex Libris offers an out-of-the-box, Web-based mobile interface (figure 48) accessible on various smartphone platforms, formatting results and allowing users to search and retrieve materials, including content within the Primo Central index. Real-time status is available for catalog ILS materials. Users can also send materials from a PC-based

Vendor Perspective: Ex Libris Primo Discovery and Delivery Solution

Already the choice of more than 700 institutions worldwide, the Ex Libris Primo discovery and delivery solution is a library platform that provides a single, intuitive interface through which end users search the library’s entire collection (print, electronic, and digital resources) and obtain one result list, sorted by relevance. This Google-like search interface, a sophisticated search engine, and a variety of options for narrowing down result lists combine to help users focus on relevant materials in a way that is familiar to them. Tight, standards-based integration with OpenURL link resolvers and integrated library systems enables users to obtain online materials and perform OPAC operations through the Primo interface in a comprehensive, one-stop-shop environment. The modern user interface design, the comprehensive content, and the seamless integration with services such as citation-management tools and library recommender services are fully controlled by the library. An open, flexible, and scalable system, Primo helps libraries disambiguate the complex information landscape by creating a unified view of the information that they manage, primarily their library catalog, digital repositories, and course materials. With the Primo Web-based administration tools, librarians can easily select the content that they wish to offer, customize all aspects of the user interface (including views for mobile devices), and integrate Primo with user environments. Furthermore, libraries can control the search options, the display of results, and the relevance ranking algorithm to best tailor the system to their users’ needs. Primo leverages the library’s local collection by offering it as part of a global or regional information landscape that is indexed by the Primo Central mega-aggregate index of scholarly materials. Primo Central,

covering hundreds of millions of scholarly materials obtained from primary and secondary information providers, is hosted by Ex Libris in a cloud computing environment. Available with every Primo installation, Primo Central expands end users’ search scope to encompass the entire library collection, including licensed and open-access materials. Libraries can define Primo Central’s scope to match their subscription offering and disciplinary focus. Users searching via Primo obtain one result list, with items from the local library collection seamlessly blended with global and regional items and displayed instantaneously, sorted by relevance. A true Web 2.0 system, Primo is integrated with a variety of services to enhance the user experience. For example, bibliographic information is augmented with additional data such as book covers and tables of contents obtained from third-party suppliers; the Ex Libris bX article recommender and the Karlsruhe Institute of Technology BibTip book recommender enable users to find items of interest that the system suggests on the basis of selections made by other users; and OpenURL link resolvers, such as the Ex Libris SFX link resolver, offer not only the delivery of library-licensed full text but also a variety of other context-sensitive services. A rich set of open interfaces enables libraries to extend Primo with code that they develop or download from the Ex Libris customer-collaboration Web site (EL Commons). Furthermore, through the use of these open interfaces, libraries can embed Primo services where the users are, such as in institutional or library Web sites and Facebook. Like all other Ex Libris products, Primo operates in a multilingual and multicultural environment and supports a range of consortial models.


session to a smartphone via SMS, allowing them to retrieve the materials later, away from a PC. In addition, customized mobile views of Primo have been developed by other Primo customers, such as BYU and Northwestern, and often such sites may share their work on the EL Commons community website.

Upcoming Directions

While few explicit details can be shared with the general public, Ex Libris described some directions that the company is currently exploring. These include enhanced support for consortia, enhanced display of text snippets in the brief results view, continued refinement of the service’s ranking algorithms, full-text usage statistics from within Primo (usually, full-text statistics are available from a library’s link resolver), and continued development of the Primo mobile interface.

Chapter 6

Differentiators and A Final Note


Abstract


The previous chapters introduced web scale discovery and profiled a majority of the key players engaged in this space as relates to the library environment. While similarities abound, differentiators are present as well. This chapter highlights some of the differences in the areas of content coverage, metadata and relevancy, pricing, integration with other systems, and the interface. As evidenced throughout this report, each service continues to evolve at an extremely rapid pace in terms of content covered, and the features, functionality, and flexibility of the interface. While these services each hold great potential, a final note observes that web scale discovery services, at least at their present stage of development, are not the “final word” for the library discovery environment.

Web scale discovery platforms customized to the library environment, handling local library and remotely hosted aggregated publisher content, are in their extreme infancy. As observed in the first chapter, features, functionality, and content scope are changing, and expanding, rapidly for all players. Press releases occur often, and annual library conferences provide a showcase forum for vendors to introduce their products to potential new customers and highlight enhancements to existing customers. Vendors host presentations and panel sessions discussing the merits of their discovery service and oftentimes provide information on why they feel their offering is the best on the market. From the preceding chapters, readers will note many similarities among the discovery services, and this observation is indeed valid. Given the extremely rapid cycle of development combined with the growing openness of such platforms, this issue of Library Technology Reports

wasn’t constructed as a compare-and-contrast product survey. Things change through enhancement cycles as vendors progress beyond version 1.0 and customers request new features. Customers also create their own innovations facilitated by the openness of these platforms; as platforms become more open, libraries with technical staffing can truly customize these tools to their local environments and include additional functionality. Consider just a few examples. Claremont Colleges Library, with its Sherlock Search, has brought two discovery services together—a front-end Primo interface with harvested local resources, blended with commercial content populated through the Summon index service in the background. North Carolina State University has developed its own front-end interface, QuickSearch, which pulls content from a multitude of services. For a single search, this custom interface returns organized results including (but not limited to) commercial content—such as articles—from the Summon index, book and media materials from NCSU’s Endeca-based catalog, results from a library website search, and Did You Mean? suggestions utilizing Yahoo! Web Services. So, acknowledging the power and creativity opening doors as never before, what are some things to keep in mind for new customers that have yet to embrace a Web scale discovery service? Here are some broad factors to consider.

Content

The ultimate goal of any discovery service, bar none, is to place content in the hands of the user or, more specifically, to discover, present, and deliver relevant content in a convenient, intuitive manner to today’s

Claremont Colleges Library Sherlock Search

http://chipri04lsna.hosted.exlibrisgroup.com:1701/primo_library/libweb/action/search.do?vid=CLA&fromLogin=true

NCSU Libraries QuickSearch

www.lib.ncsu.edu (Note: Choose Search All.)

More information on QuickSearch

www.lib.ncsu.edu/search/about.html

Metadata and Relevancy

Metadata (amount and quality), sound indexing, and relevancy-ranking algorithms are all crucial in best matching items to a user’s search. Different vendors have varied viewpoints on what constitutes sound metadata and the source of that metadata, and each explains why it feels its approach is the ideal solution. Metadata conversations encompass “thin metadata” (a few record fields, perhaps a table of contents) and “thick metadata” (covering more fields, including additional abstracting and indexing by dedicated staff, or including author-supplied subject headings and abstracts). Some vendors have access to complete and comprehensive metadata from well-established content databases. Several vendors utilize a super or merged record, where different fields or levels of metadata for the same item, received from multiple content providers, are joined through common matchpoints and, through normalization and deduplication processes, result in a rich, accurate, highly discoverable and relevant record. Reading between the lines, 100 percent coverage of a particular resource from one vendor may not be precisely the same as 100 percent coverage of that same resource from another vendor. More specific statements are difficult, given that thousands and thousands of indexed titles exist; detailed studies would be needed to judge the accuracy of one vendor’s point of view (and facts) versus another’s, and such studies are outside the scope of this report.
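To make the merged-record idea concrete, the following is a minimal sketch of joining records for the same item on a common matchpoint. It does not reflect any vendor's actual process; the choice of a DOI as the matchpoint, the field names, and the rule that the first non-empty value wins are all assumptions made for illustration.

```python
def merge_records(records, matchpoint="doi"):
    """Combine records that share a matchpoint into one richer record.

    Hypothetical rules: the matchpoint is normalized (trimmed and
    lowercased), the first non-empty value seen for a field is kept,
    and subject headings from all providers are pooled without
    duplicates. Records lacking the matchpoint are skipped.
    """
    merged = {}
    for record in records:
        key = record.get(matchpoint)
        if not key:
            continue
        target = merged.setdefault(key.strip().lower(), {})
        for field, value in record.items():
            if value in (None, "", []):
                continue  # thin records contribute nothing for empty fields
            if field == "subjects":
                pooled = target.setdefault("subjects", [])
                for subject in value:
                    if subject not in pooled:
                        pooled.append(subject)
            elif field not in target:
                target[field] = value
    return list(merged.values())


thin = {"doi": "10.1000/XYZ", "title": "Discovery at Scale", "abstract": ""}
thick = {
    "doi": "10.1000/xyz ",
    "title": "Discovery at Scale",
    "abstract": "A study of index coverage.",
    "subjects": ["Libraries", "Search engines"],
}
for record in merge_records([thin, thick]):
    print(record)
```

In this sketch the thin record supplies the title and the thick record fills in the abstract and subject headings, so the single merged record is more discoverable than either source record alone.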

researcher. As far as content scope goes, the overall volume of content natively indexed by each service remains a differentiator, but the difference is rapidly shrinking. Each vendor is busy inking agreements for access to publisher metadata and, preferably, full text for purposes of discovery; each completed agreement can open up thousands if not millions of new items that can be included in the discovery service’s index. Publishers are aware that libraries look at click statistics and usage of their content. Unused resources are ripe for the chopping block in tough—or any—economic times. A growing number of publishers are willing to participate with Web scale discovery services. Some vendors developing Web scale discovery platforms may be entering into exclusive agreements for content from particular publishers; others may refuse to enter into exclusive agreements, believing content should be open to any discovery service. Some vendors indicate they are “content-neutral” and, since they are not themselves native providers of content, suggest that returned query results utilizing their services are free of any potential bias related to provider or source of content. They posit that content neutrality holds a potential for rich future publisher agreements. Given that there is no conflict of interest—the discovery service isn’t owned by a parent company that’s also a content provider (competitor) itself—they suggest more publishers may be willing to enter into agreements for purposes of having their content centrally indexed. To be fair, other vendors deny any hints or suggestions from competitors that query results from their products are biased or that their publisher agreements are lacking compared to other services. The author is not presenting a view or opinion one way or another on this question but raises the concept of content neutrality for the reader to consider. It’s a touchy philosophical subject. 
No matter what content is covered in the central index, it’s important for individual libraries—potential customers—to work with vendors to conduct content overlap analyses to see what amount of that library’s licensed or purchased content is included in each vendor’s centralized index. Ideally, a lot of what

the library subscribes to and researchers are interested in—online, full-text, and 24/7-available electronic content—will be included and discoverable in the central index and, through a link resolver or similar mechanism, accessible from start to finish for the researcher. Libraries can choose how exhaustive they wish an overlap analysis to be. At one extreme, the library could choose to provide to discovery vendors a full set of electronic journal titles and publisher packages with holdings information and ask to what degree the central index encompasses such content. Or the analysis may be more streamlined—such as a library determining the top 100 or 500 journal or newspaper titles and asking vendors to provide an overlap analysis with this information. Always remember that all vendors are working aggressively with publishers to ink additional agreements and expand the content coverage of their services. Some vendors have focused on scholarly article–type content, others have greater e-book content, and some have greater coverage of newspaper content. Vendors are alike in that their initial focus has been on academic customers, who often have richness and depth of article-level subscriptions with publishers and aggregators.
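A streamlined overlap analysis of the kind described above amounts to a set comparison. The sketch below assumes the library and the vendor can each supply a list of ISSNs; the function names and the sample identifiers are hypothetical, and a real analysis would also need to compare holdings date ranges, which this sketch ignores.

```python
def normalize_issn(issn):
    """Strip hyphens and whitespace so differently formatted ISSNs compare equal."""
    return issn.replace("-", "").strip().upper()


def overlap_report(library_issns, vendor_issns):
    """Report how many of the library's titles appear in a vendor's index."""
    library = {normalize_issn(i) for i in library_issns}
    vendor = {normalize_issn(i) for i in vendor_issns}
    covered = library & vendor
    percent = 100.0 * len(covered) / len(library) if library else 0.0
    return {
        "total": len(library),
        "covered": len(covered),
        "percent": round(percent, 1),
        "missing": sorted(library - vendor),  # titles to raise with the vendor
    }


# Made-up identifiers standing in for a library's top titles.
top_titles = ["1234-5678", "2345-6789", "3456-7890", "4567-890X"]
vendor_index = ["12345678", "2345-6789", "4567-890x"]
print(overlap_report(top_titles, vendor_index))
```

Running the same report against each vendor's title list gives the library a comparable coverage figure per service, along with a list of uncovered titles to ask about.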


Each vendor has developed its own proprietary relevancy algorithms. Some indicate that they take into account publishers’ own relevancy ranking for materials provided by that publisher. Each offers a strategy for how to prevent items with thin metadata from being lost among items with thick metadata; however, no system will ever be perfect for all searches by all users. Some services allow the local library to influence the algorithm or otherwise promote or boost items within search results, and, depending on the service, this boost may be at the item level, collection level, or database level. Some vendors may place greater emphasis on currency, some on full text, some on subject headings. Some fields may factor heavily into one service’s algorithm and carry less weight in another service’s. Such factors can vary by item type, regardless of service. It’s up to the local library to question vendors, conduct sample searches, and gauge what level of satisfaction they have with the vendor’s approach.
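Because the vendors' algorithms are proprietary, the following can only be a toy illustration of what library-configured boosting at the item, collection, or database level might look like. The `Hit` structure, the multiplicative boost factors, and the sample scores are all assumptions; no service is known to work exactly this way.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    title: str
    base_score: float  # relevance score assumed to come from the search engine
    collection: str
    database: str


def boosted_score(hit, item_boosts, collection_boosts, database_boosts):
    """Multiply the base score by any boost the library has configured."""
    factor = (
        item_boosts.get(hit.title, 1.0)
        * collection_boosts.get(hit.collection, 1.0)
        * database_boosts.get(hit.database, 1.0)
    )
    return hit.base_score * factor


def rank(hits, item_boosts=None, collection_boosts=None, database_boosts=None):
    """Return hits ordered by boosted score, highest first."""
    item_boosts = item_boosts or {}
    collection_boosts = collection_boosts or {}
    database_boosts = database_boosts or {}
    return sorted(
        hits,
        key=lambda h: boosted_score(h, item_boosts, collection_boosts, database_boosts),
        reverse=True,
    )


hits = [
    Hit("Journal article", 2.0, "Licensed content", "Aggregator A"),
    Hit("Local oral history", 1.5, "Special Collections", "Digital repository"),
]
for hit in rank(hits, collection_boosts={"Special Collections": 2.0}):
    print(hit.title)
```

Here a collection-level boost lifts the local item above an article with a higher base score, which is the kind of effect a library would want to verify with sample searches before going live.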


Price


Each vendor has its own pricing model, and while some similarities exist, differences are also present. Some pricing models include, among other factors, references to the number of local records harvested. Some focus on institutional FTE or level of degree granted by academic customers. Most, if not all, vendors are willing to discuss consortial and multiyear discounts or to give price breaks if other products they market are also purchased or subscribed to. Staffing is also a pricing consideration. All vendors offer completely hosted versions of their discovery service—providing the hardware, maintaining backups, and hosting the interface and centralized index. Such a scenario relieves local staff from maintaining hardware and performing backups. Some services allow the library to host the hardware and the interface (whether the vendor’s or one developed locally). In some cases, hosting the hardware locally may provide even greater flexibility in customizing the service. In all cases, the preaggregated central index is hosted and accessed remotely. That said, response times for all the services are outstanding and similar to a Google search; the only (short) lag noticed may be with the real-time status check for items in the library’s ILS.

Integration with Other Systems

A fundamental shift occurred several years back with the advent of next-generation discovery layers (e.g., Ex Libris Primo, Innovative Interfaces Encore, Serials Solutions AquaBrowser); such discovery layers added new features and functionality on top of the traditional

ILS online public-access catalog and were agnostic to the underlying ILS. Several of the new Web scale discovery services are built on top of these next-generation discovery layers from a few years back. One Web scale discovery service, Summon, was built from the ground up. How well and to what degree a particular web scale discovery service may integrate with a given ILS from a particular vendor may vary for purposes of placing holds, seeing which items are checked out, and so on. A library with a Web scale discovery service and underlying ILS from the same vendor may find tighter integration, such as easily enabling the same student account to be used for both systems, enhanced information display capabilities, and so on. A critical step for any library considering a Web scale discovery service is to ask the vendor detailed questions about integration with the underlying ILS (and other information repositories). It’s important to understand what discovery services may require a jump to the underlying ILS for traditional OPAC functions (holds, requests, ILL) and which ones can accommodate such functions from directly within the discovery service interface. Just as important, libraries should ask if any existing customers using the prospective library’s ILS have gone live (such examples likely exist). Potential customers can take a look at the live site and contact the live library for its experiences and observations. All of this aside, keep in mind that the pool of traditional library holdings—physical items cataloged into the ILS—is not the shining star and chief selling point for Web scale discovery, and so level of integration with the underlying ILS shouldn’t necessarily be a strong area of scrutiny. 
Rather than focusing on local content, numbering in the thousands to several million items, depending on library, Web scale is focused on the hundreds of millions of items not present in the ILS: the massive, current, growing body of journal articles, newspaper articles, conference proceedings, and so on. The beauty of these Web scale discovery services is their ability to host, search, combine, and deliver content from both content pools, local and remote.

Some systems may presently handle consortial-level implementations better than others. This is an interesting topic for some but not all libraries and was left out of this issue of Library Technology Reports. It’s fair to say that some systems (e.g., WorldCat Local) are built upon systems with extensive knowledge of other libraries’ materials and have integrated mechanisms and available workflows in place to facilitate things like ILL requests. Another system (Primo) can search the local indexes (ILS records and other information repositories) of other Primo sites; a Summon search can include the digital collections and institutional repository materials from other Summon sites. WorldCat Local tiers can be constructed, scoping the search from the local library to an extensive consortium. Not to be left out, Ebsco has features facilitating

bundled product purchases or associated annual licensing and maintenance. Reliance on a single vendor has some potential downsides—competing products may offer what the library deems are must-have features; a vendor could choose to inordinately raise maintenance or support pricing; and so on. Fortunately, exit strategies exist through the open nature of these products. Link resolvers, proxy servers, ILS systems, and discovery services from different vendors can often be mixed and matched into a precise solution fitting the libraries’ needs and workflow.

The Interface

Some services profiled in this report are more open than others. While all offer some level of customization allowing libraries to make the discovery service their own, the level of openness and flexibility varies. At a minimum, all offer a basic template for libraries wishing to make some choices but perhaps not deeply tinker. At the other end, some offer extreme flexibility, enabled and augmented by capable toolkits, flexible established APIs, use of modern open Web technologies, and a user group consisting of established customers who have already shared or are willing to share developed code and ideas. For those libraries with sufficient staffing and skill sets, such flexibility can be attractive. As stated earlier, the purpose of Web scale discovery services is to connect users, as seamlessly and easily as possible, to content. Assuming the content is there to be discovered (and this is becoming less of a differentiator as all vendors ink more agreements with publishers and aggregators), and assuming the vendors have quality metadata and finely tuned relevancy algorithms (that’s for the library to investigate), then a final question revolves around the interface. All vendors indicate they have conducted (and continue to conduct) extensive usability studies in designing their interface; some discovery services are becoming established to the degree that early-adopter libraries have conducted their own independent usability studies. Some vendors provide usability information directly on their websites, and others may be willing to share reports if asked. Assuming one is using the default template, the interfaces for the discovery services look quite similar: a search box at top, results presented in the middle of the screen, and facets and other search refinements in a pane along the left. That said, there are some differences, and how significant those differences are can be determined only by the prospective library customer for its environment.
At the time of this writing, some differences exist. Some, but not all, alert the user that an item is a full-text item. Some allow you to limit to full text only, and some designate peer-review status. Some present additional information

consortial installations as well, and every system offers additional consortial options not mentioned here. If libraries are interested in a consortial purchase of a discovery service, they will be well served to ask each vendor about how its service can fit into their consortial environment. Questions about staff workflows, integration with (or accommodation of) consortial catalogs, branch or site branding, and the ability of the discovery service to be scoped to the particular hard-copy and electronic holdings of disparate consortial members are all relevant. Different vendors offer optional add-on products, usually for an additional cost. All Web scale discovery service vendors also offer federated search products in addition to their preaggregated central index. Vendors generally agree that it is likely that some content of interest to libraries will always be missing from the central index. This statement is definitely true for the present and at least the short-term future. Several vendors indicate that federated search can help plug the gaps and will be part of the discovery landscape if one wants to conduct a search that’s as inclusive as possible; others indicate that combining federated search with Web scale discovery via a central index can be confusing and difficult and that the traditional problems of federated search remain—problems such as slow delivery times, poor relevancy-ranking capabilities, limited query returns, and results lost within the larger aggregate of centralized content. Different vendors have taken different approaches, and each has arguments for why it feels that its approach is best. Potential library customers should learn how federated search fits—or doesn’t fit—into the overall discovery solution by conducting their own research, talking to existing customers, and having detailed questions prepared for vendor visits and conversations. 
One service, Primo, offers the optional bX Recommender service (described in chapter 5), which merits investigation. Parallel with the integration discussion arises an efficiency discussion (and, on the flip side, the discussion of reliance on a single vendor). The highlight of Web scale discovery lies with exposing the huge amount of published content subscribed to or purchased by libraries. As all libraries are aware, collection development, rights management, and maintenance of electronic content are significant tasks. Different vendors offer different products in their portfolio, which, when taken in sum, could often be seen as a complete library solution (similar to turnkey ILS systems awhile back). Such products can include an ILS (or components of an ILS), an electronic resource management system, MARC record services, enrichment content services, link resolver or similar knowledge base for rights management, A–Z title lists, proxy server, and so on. Products from a single vendor are often designed to integrate well and can foster staff efficiencies. Fiscal efficiencies may also result through


when the user clicks on a tab or hyperlink; others rely more on mouseovers. Some accommodate the addition of widgets to provide additional services. Some offer boosting or highlighting of items, collections, or databases. Some have an index more open to search by unauthenticated users. One has established relationships with Google to help drive users to library-available content. All products offer rich export options for items of interest; some offer more than others. Some offer different facet refinement categories that may be of interest; one allows the library to define its own facet categories. All have advanced search modes with often similar capabilities, yet subtle differences exist. All discovery service search boxes can be embedded in different webpages or portals. Regarding resolution to the full text, some offer more streamlined access, at least for some resources from some content providers. Some offer a user account where researchers can save and later retrieve items of interest. Some offer rich social community tools, such as tagging and reviews; others don’t (and suggest that tagging, ratings, and reviews benefit primarily the smaller sea of ILS and digital collection materials more than the ocean of articles and newspaper content).

Library Technology Reports  alatechsource.org  January 2011

A Final Note


The majority of vendors profiled in this report provided some details, included in the respective chapters, about potential near-future enhancements for their services, all geared toward refining these services to best meet the needs of today’s generation. While library Web scale discovery has tremendous potential, there are several things to keep in mind.

First, such services do not cover everything of interest as pertains to a library’s collections. This fact is due to a variety of factors, such as some publishers having yet to come on board and open up their content for indexing by these third-party discovery services. In some cases, contributing factors may also include current technical or compatibility issues.

Second, specialized databases may have search or presentation capabilities not easily integrated into the discovery service interface, at least at this early stage of development; as a result, database recommendations are starting to be integrated within discovery service search results. To a degree, silos of information (various repositories of information and their associated interfaces) will remain for the foreseeable future. For the present at least, library staff will continue performing cataloging and metadata work within their local ILSs, digital content management systems, and institutional repositories.

Third, current discovery services can’t read the researcher’s mind and know precisely what he or she is searching for. However, apart from continued refinement of relevancy algorithms, various recommender feature components are advancing the goal of returning relevant information to a given search.

Fourth, existing resources to which students have flocked for research needs are not going away. Google and Wikipedia are two of the most popular websites in the world, with good reason. Purchase and implementation of a library-focused Web scale discovery service is a first step; libraries will still need to studiously work to steer users to these services.

Libraries purchasing a Web scale discovery service obviously have implementation and marketing decisions to consider. Many libraries that have implemented a Web scale discovery service place the search box on the library’s homepage, recognizing the importance of such services. Often, libraries provide a tabbed search box approach, allowing the user to choose which resource to search, be it the discovery service index, the local catalog (whether a traditional ILS or a “next generation” ILS discovery layer), a list of databases, an A–Z list of journals, and so on. Whether libraries choose to make the discovery service index the default search or not is a local library decision. Indeed, adoption of a Web scale discovery service can impact design decisions throughout a library’s website. As mentioned, a search box for the service can be placed in multiple areas of not only the library website (and external websites for which the library has an account, such as Facebook), but also other (often university-controlled) sites such as course management systems. The level of marketing and bibliographic instruction can range from minimal to extensive. Web scale discovery vendors often suggest no instruction is needed, given the ease of use of such tools. Finally, “pleasing” all user groups is always challenging for libraries.
Established faculty instructors (and librarians) may be used to the existing ILS, have their favorite topical databases, and enjoy browsing the tables of contents of favorite journals. Their habits contrast with those of freshman undergraduates, who, as research shows, want quick, relevant information from the first tool they search, and who likely have no previous exposure to the university’s ILS and no favorite scholarly databases or journals. For the library, perhaps it’s all about striking a happy medium. At the present stage of development, no one new resource, no matter how promising, can (or should) immediately supplant a host of other, established resources. It is possible for several discovery systems to continue to coexist, and it can be a fascinating (or frustrating) exercise for libraries to choose how best to design their webpages, market the strengths of the different systems, and provide appropriate instruction where needed.

Acknowledging some of the above challenges (or considerations), it will be fascinating to watch as these infant services mature. Eric Lease Morgan suggests lots of interesting possibilities, noting that opportunities for future library catalogs (and, the author suggests by extension, Web scale discovery services) can be found in services: services that help researchers use the information they’ve found and that better sense who the researcher is (such as a student or an instructor).1 He offers examples of potential services, such as compare-and-contrast functionality, the ability to create different versions of a document, services to plot on a map, and services to translate. His extensive possibilities list includes many items that are beginning to appear in next-generation library catalogs and Web scale discovery services alike.

This issue of Library Technology Reports concludes where it began, with an acknowledgement that library-focused Web scale discovery services hold great potential and are evolving rapidly. This report has provided a snapshot of several Web scale discovery services developed and marketed by major established vendors. The marketplace and development environment are still young for next-generation library catalogs, and younger still when Web scale discovery is added to the mix. Features, functionality, level of integration with other systems, scope of content, and soundness of metadata are all evolving, and, it’s hoped, will continue to evolve, better meeting the needs and expectations of today’s researchers. Things not offered by a service today may be offered tomorrow; things not quite envisioned are ripe to be imagined.

Note

1. Eric Lease Morgan, “Next Generation Data Format,” May 2008, Infomotions website, Infomotions’ Musings on Information and Librarianship section, http://infomotions.com/musings/ngc4mla.


Web Scale Discovery Services  Jason Vaughan


Chapter 7

Questions to Consider

Abstract


For numerous reasons, libraries contemplating the purchase of a Web scale discovery service should do their homework carefully. The following set of questions can serve as a primer for those engaging in their own evaluations of the increasingly competitive library Web scale discovery space. Questions are divided into several topical areas: general and background questions; local library resources; publisher and aggregator indexed content; open-access content; relevancy ranking; authentication and rights management; and the user interface.


The following questions can serve as a springboard for prospective library customers seeking more information from vendors as part of their own evaluation of library Web scale discovery services.

Section 1: General and Background Questions

1. Customer Install Base

• How many current customers do you have that have implemented the product at their institutions (i.e., the tool is currently available to users or researchers at the institution)?

• How many additional customers have committed to the product?

• How many of these customers fall within our library type (e.g., higher ed academic, public, K–12)?

2. References

• Can you provide website addresses for live implementations that you feel serve as a representative model matching our library type?

• Can you provide references (name and contact information) for the lead individuals you worked with at several representative customer sites that match our library type?

3. Pricing Model, Optional Products

• Describe your pricing model for a library type such as ours, including initial up-front costs and ongoing costs related to the subscription and technical support.

• What optional add-on services or modules (federated search, recommender services, enrichment services) do you market that we should be aware of and that are related to and able to be integrated with your discovery solution?

4. Technical Support and Troubleshooting

• Briefly describe options (including hours of availability) customers have for reporting mission-critical problems and for reporting observed non–mission-critical apparent glitches.

• Briefly describe any consulting services you may provide above and beyond routine support services (e.g., consulting services related to harvesting of a unique library resource for which an ingest/transform/normalize routine does not already exist).


• Is there a process for suggesting enhancements for potential future incorporation into the product?

5. Size of the Centralized Index. How many periodical titles does your preharvested centralized index encompass? What is the current count of unique indexed items?

6. Statistics

• Please describe what you feel are some of the more significant use, management, or content-related statistics available out of the box with your system.

• Are the statistics COUNTER compliant?

7. Ongoing Maintenance Activities, Local Library Staff

• For instances where the interface and discovery service are hosted on your end, please describe any ongoing local library activities associated with maintaining the service for the local library’s clientele (e.g., ongoing maintenance associated with periodic local resource harvest updates, etc.).

• For instances where the hardware and interface are hosted at the local library, please describe any ongoing local library activities associated with maintaining the service for the local library’s clientele (e.g., backup responsibilities, hardware troubleshooting, etc.). Note: Most systems are hosted by the vendor; this question applies to only a few select vendors.

Section 2: Local Library Resources

8. Metadata Requirements

• What mandatory record fields for local resources must exist for the content to be indexed and discoverable within your platform (e.g., title, date)?

• To what degree can collections from different sources have their own unique field information that is displayed or figures into the relevancy-ranking algorithm for retrieval purposes? For example, one particular digital collection at the library has unique fields pertinent to that collection. How would such unique fields be handled by the discovery service?

9. Crosswalking and Existing Ingestors

• Please verify that your platform has existing connectors and standard ingest/transform/normalize tools or application profiles for the following schema or standards used by local systems at our library (e.g., MARC 21 bibliographic records, Unqualified/Qualified Dublin Core, EAD).

• Are both local and remote content normalized to a single schema? If so, please offer comments on how local and remote (publisher or aggregator) content is normalized to this single underlying schema.

• If existing ingestors do not exist, please describe any tools your discovery platform may offer to assist local staff in crosswalking between the local library database schema and the underlying schema within your platform.

10. Schedule

• For records hosted in systems at the local library, how often do you harvest information to reflect record updates, modifications, and deletions?

• Can the local library invoke a manual harvest of locally hosted resource records on a per-resource basis (e.g., if the library launches a new digital collection and wants the records to be available within the discovery service shortly after they are available in our locally hosted digital repository, is there a mechanism to force a harvest prior to the next regularly scheduled harvest routine)?

• After routine harvesting, how long does it typically take for such updates, additions, and deletions to be reflected in the discovery service?

11. Policies and Procedures. Please describe any general policies and procedures not already addressed of which the local library should be aware as they relate to the harvesting of local resources.

12. Examples of Local Collections Incorporated into the Discovery Solution

• Our library uses the ABC digital collection management software. Do you have any existing customers who also utilize this platform, whose digital collections have been harvested and are now exposed in their instance of the discovery service?

• Our library uses the ABC institutional repository software. Do you have any existing customers who also utilize this platform, whose IR materials have been harvested and are now exposed in their instance of the discovery service?

13. Consortial Union Catalogs. Can your discovery service harvest and provide access to items within a consortial or otherwise shared catalog (e.g., an Innovative Interfaces INN-Reach catalog)? Please describe.
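To make the crosswalking question in Section 2 concrete, the following minimal Python sketch shows what normalizing records from two local systems to one underlying schema might look like. It is an illustration only, not any vendor's implementation: the common field names, the simplified dictionary record layouts, and the function names are all assumptions, and a real ingest routine would work from complete MARC 21 or Dublin Core records.

```python
# Hypothetical common schema: every normalized record carries these keys.
REQUIRED_FIELDS = ("title", "date")  # assumed "mandatory" fields for indexing


def from_dublin_core(record):
    """Map a simplified unqualified Dublin Core dict to the common schema."""
    return {
        "title": record.get("dc:title", ""),
        "creator": record.get("dc:creator", ""),
        "date": record.get("dc:date", ""),
        "source": "digital_collection",
    }


def from_marc_like(record):
    """Map a simplified MARC-style dict (keys are tag + subfield code)."""
    return {
        # 245 $a is the title proper; trailing ISBD punctuation is trimmed.
        "title": record.get("245a", "").rstrip(" /:"),
        # 100 $a is the main entry personal name.
        "creator": record.get("100a", "").rstrip(" ,"),
        # 260 $c is the date of publication, e.g. "c1999." or "[2011]".
        "date": record.get("260c", "").strip(" .c[]"),
        "source": "ils",
    }


# Registry of crosswalks, one per local schema the platform can ingest.
CROSSWALKS = {"dc": from_dublin_core, "marc": from_marc_like}


def normalize(record, schema):
    """Crosswalk one local record and check the assumed mandatory fields."""
    if schema not in CROSSWALKS:
        raise ValueError(f"no crosswalk registered for schema {schema!r}")
    common = CROSSWALKS[schema](record)
    missing = [name for name in REQUIRED_FIELDS if not common[name]]
    if missing:
        raise ValueError(f"record lacks mandatory fields: {missing}")
    return common
```

A sketch like this can help frame follow-up questions for a vendor, for example what happens to a collection-specific field that has no counterpart in the common schema.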


Section 3: Publisher and Aggregator Agreements and Indexed Content

14. Publisher and Aggregator Agreements: General

• With approximately how many publishers and aggregators have you forged content agreements?

• Are there major publisher or aggregator agreements that you feel are especially significant for your service? If so, which ones and why (e.g., other discovery platform vendors may not have such agreements with those particular providers, the amount of content was so great that it greatly augmented the size and scope of your service, etc.)?

• Are these agreements indefinite, or do they have expiration dates?

• Have you entered into any exclusive agreements with any publishers or aggregators (i.e., the publisher or aggregator is disallowed from forging agreements with competing discovery platform vendors or disallowed from providing the same deep level of metadata or full text for indexing purposes)?


15. Metadata or Full Text Provided by Publishers and Aggregators. Could you please provide some comments and detail on the level of metadata provided to you, for indexing purposes, by the majority of major publishers and aggregators with whom you’ve forged agreements? Please describe to what degree the following data elements are provided by these agreements and play a role in your discovery service:


A. Basic bibliographic information (article title, journal title, author, publication information)

B. Subject descriptors

C. Keywords (author-supplied?)

D. Abstracts (author-supplied?)

E. Full text

16. Topical Content Strength

A. Do you feel there is a particular content area your discovery service covers especially well or leans heavily toward (e.g., humanities, social sciences, sciences)?

B. In what subject or content areas, if any, do you feel the discovery service may be somewhat weak? Are there current efforts to mitigate these weaknesses (e.g., publisher agreements on the horizon)?

C. Do you feel there is a particular format type that your discovery service covers very well or leans heavily toward (scholarly journal content, magazine content, newspapers, conference proceedings, etc.)?

17. Content Overlap Analysis. Please describe what sort of content overlap analysis you can provide comparing our local library holdings with titles covered in your discovery service.

• Content Considered Key by Local Library (by publisher). Following is a list of some major publishers whose content, licensed by the library, is considered key. Has your company forged agreements with these publishers to harvest their materials? If so, please describe the scope of the agreement and whether the materials are preharvested into your central index or incorporated through an optional federated search mechanism. How many titles are covered for each publisher? What level of metadata are they providing to you for indexing purposes (e.g., basic citation-level metadata [title, author, publication date]; abstracts; full text)?

A. ex. Elsevier

B. ex. Sage

C. ex. Taylor and Francis

D. ex. Wiley-Blackwell

• Content Considered Key by Local Library (by title). Following is a list of the top 100 major journal or newspaper titles whose content, licensed by the library, is considered key. Could you please indicate whether such titles are incorporated into your preaggregated central index or as a target in an optional federated search mechanism? If so, please describe the level of indexing (e.g., basic citation-level metadata [title, author, publication date]; abstracts; full text).

E. ex. Nature

F. ex. American Historical Review

G. ex. JAMA

H. ex. Wall Street Journal

18. Google Books/Google Scholar/Hathi Trust. Do any agreements exist at this time to harvest the data associated with Google Books, Google Scholar, or Hathi Trust into your discovery service? If so, are such items incorporated into your preharvested central index or as a target in an optional federated search product offered by your company? Please describe the level of indexing (e.g., basic citation-level metadata [title, author, publication date], abstracts, full text).


19. WorldCat Catalog. Does your service include the OCLC WorldCat catalog records? If so, are such items incorporated into your preharvested central index or as a target in an optional federated search product offered by your company? What level of information is included? The complete record? Holdings information?

20. E-Book Vendors. Does your service include items from major e-book vendors?

21. Record Information. Given the fact that the same content (e.g., metadata for a unique article) can be provided by multiple sources (e.g., the original publisher of the journal itself, an open-access repository, a database or aggregator, another database or aggregator, etc.), please provide some general comments on how records are built within your discovery service. For example:

A. You have an agreement with a particular publisher or aggregator that agrees to provide you with rich metadata for its content, perhaps even provide you with indexing it has already done for its content, or provide you with the full text for you to be able to “deep index” its content.

B. You have an agreement with a particular publisher that happens to be the only publisher or provider of that content. It may provide you rich information, or it may provide you rather weak information. In any case, you choose to incorporate this into your service, as it is the only provider or publisher of the content. Or alternatively, it may not be the only publisher or provider of the information, but it is the only publisher or provider with which you’ve currently entered into an agreement for that content.

C. For some items appearing within your service, content is provided from multiple sources with which you’ve made agreements. There may be some or many cases of overlap for unique items, such as a particular article title. In such cases, do you create a merged, composite, or super-record where your service utilizes metadata from each of the multiple sources, creating a strong single record built from these multiple providers of content?

22. Deduplication. Related to the question immediately above, please describe your discovery service’s approach to deduplicating items (or not) in the central preaggregated index. If your discovery service incorporates content for the same unique item from more than one content provider, does your index retrieve and display multiple instances of the same title? Or do you create a merged, composite, or super-record that is the only record displayed? Please describe.

Section 4: Open-Access Content

23. Does your discovery service automatically include (out of the box, at no additional charge) materials from open-access repositories? If so, could you please list some of the major repositories included (e.g., arXiv.org e-Prints; Hindawi Publishing; Directory of Open Access Journals; etc.)?

24. Open-Access Content Sources: Future Plans. In addition to the current open-access repositories that may be included in your discovery service, are there other repositories whose content you are planning to incorporate in the future?

25. Exposure to Other Libraries’ Bibliographic, Digital Collection, or IR Content. Would ILS bibliographic records from other customers using your discovery service be exposed for discoverability in our instance of the discovery service? Would digital collection records? Institutional repository records?

Section 5: Relevancy Ranking

26. Relevancy Determination. Please describe some of the factors that figure into your discovery service’s relevancy algorithm (fields and elements that play a role and their relative weighting in the relevancy determination).

27. Currency. Please comment on how heavily currency of an item is weighted in relevancy determination. Does currency factor more heavily for certain content types (e.g., newspapers)?

28. Local Library Influence. Does the local library have any influence over the relevancy determination? Can it choose to “bump up” particular items or local collections for a search? Please describe.

29. Local Collection Visibility. Please offer some detail on how local content (e.g., ILS bibliographic records, digital collections) remains visible and discoverable within the larger pool of content indexed by your discovery service. For example, local content may measure a million items, while your centralized index may cover half a billion items.

30. Exposure of Items with Minimal Metadata. Some items likely have weaker or less rich metadata than other items. Could you please offer some comments on how your discovery service ensures discoverability for items with lesser or minimal metadata?

31. Full-Text Searching. Does your discovery service offer the capability to search the full text of materials (i.e., is a user searching a full-text keyword index)? If so, approximately what percentage of items within your service are indexed at the full-text level?

32. Please describe what is presented by your discovery service when no results are retrieved for a search. Does your system enable “best-match” retrieval, in which something will always be returned or recommended? What elements play into this determination, and how is the user prevented from having a completely dead-end search? What “Did You Mean?” functionality is included with your discovery service?
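Questions 26 through 28 ask how field weighting, currency, and local boosting figure into relevancy. The Python sketch below is one hedged way to picture those three factors working together. It is not any vendor's algorithm: the field weights, the ten-year currency window, the boost multiplier, and the Item fields are all invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical weights: a match in the title counts more than one in the abstract.
FIELD_WEIGHTS = {"title": 3.0, "subjects": 2.0, "abstract": 1.0}


@dataclass
class Item:
    title: str
    subjects: str
    abstract: str
    year: int
    collection: str  # e.g., "aggregator" or "local_digital"


def score(item, query, current_year=2011, boosted_collections=frozenset(),
          currency_weight=0.1, boost=2.0):
    """Return a relevancy score; 0.0 means no query term matched."""
    terms = query.lower().split()
    total = 0.0
    # Field weighting: count whole-word term matches per field.
    for field_name, weight in FIELD_WEIGHTS.items():
        words = getattr(item, field_name).lower().split()
        total += weight * sum(words.count(term) for term in terms)
    if total == 0.0:
        return 0.0
    # Currency: items from the last ten years get a small bonus.
    age = max(current_year - item.year, 0)
    total += currency_weight * max(10 - age, 0)
    # Local library influence: "bump up" items from chosen collections.
    if item.collection in boosted_collections:
        total *= boost
    return total


def rank(items, query, **options):
    """Return matching items ordered from highest to lowest score."""
    scored = [(score(item, query, **options), item) for item in items]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for value, item in scored if value > 0]
```

In this toy model, a recent aggregator article with matches in several fields outranks an older local photograph by default, but boosting the local collection reverses the order. That is the kind of behavior worth asking each vendor to demonstrate with real searches.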

Section 6: Authentication and Rights Management

33. Open or Closed Nature of Your Discovery Service. Does your discovery service offer an unauthenticated view or access? Please describe and offer some comments on what materials will not be discoverable or visible for an unauthenticated user versus an authenticated user. It’s assumed that licensed full-text materials would be available only to authenticated users (such as through a login or IP authentication). Please offer comments on the discoverability of the following:

A. Records solely sourced from abstract and indexing databases (i.e., that database is the only provider of that specific content within your discovery service)

B. Citation information for content sourced from multiple content providers (Please describe the general level of citation information available to an unauthenticated user versus the level available to an authenticated user.)

C. Enrichment information (e.g., book cover images, tables of contents, abstracts)

34. Exposure of Nonlicensed Resource Metadata.

• Directly related to the question above, if one weren’t to take into account any e-journal, publisher package, or database subscriptions and licenses the local library pays for, is there a base level of citation information that’s exposed and available to all customers of your discovery service? This may include open-access materials or bibliographic information for some publisher or aggregator content (which often requires a local library license to access the full text). Please describe.

• Would a user need to be authenticated to search (and retrieve results from) this base index?

• How large is this base index that all customers may search, regardless of local library publisher and aggregator subscriptions?

35. Rights Management.

• Please describe how rights management is initialized and maintained in your discovery service for purposes of determining whether a local library user should have access to the full text (or, if not full text, full resolution for non-full-text resources, e.g., full resolution to the abstract or more complete citation as may be found within an A&I resource).

• Our library uses the ABC link resolver. Our library uses the ABC A–Z journal listing service. Our library uses the ABC electronic resource management system. Is your discovery service compatible with one or all of these systems for rights management purposes? Is one approach preferable to the other, or does your approach explicitly depend on one of these particular services?

Section 7: User Interface

36. Openness to Local Library Customization. Please describe how open your discovery service is to local library customization. For example, please comment on the local library’s ability to

A. Rename the service

B. Customize the header and footer hyperlinks and color scheme

C. Choose which facet clusters appear

D. Define new facet clusters

E. Embed the search box in other venues

F. Create “canned” precustomized searches for an instance of the search box

G. Define and promote a collection, database, or item such that it appears at the top or on the first page of a relevant, topically related search

H. Develop custom widgets offering extra functionality, or download widgets from an existing user community

I. Incorporate links or direct inline viewing for external enriched content (e.g., Google Books previews; Amazon.com item information)

J. Other things you feel are worth noting

37. Social Features. Please describe some current social Web features present in your discovery service (e.g., user tagging, ratings, reviews, etc.). What, if any, plans do you have to offer or expand such functionality in future releases?

38. User Accounts.

• Does your discovery service offer user accounts?

• What services does the user account provide?

A. Save a list of results to return to at a later time?

B. Save queries for later searching?

C. See a list of recently viewed items?

D. Perform typical ILS functions such as viewing checked-out items, renewals, and holds?

E. Create customized RSS feeds for a search?

39. Mobile Interface. Please describe the mobile interface(s) available for your discovery service. Is it a browser-based interface optimized for small-screen devices? Is it a dedicated iPhone-, Android-, or BlackBerry-based application?

40. Usability Testing.

• Please describe how your discovery service incorporates established best practices in terms of a customer-focused, usable design including modern Web interface elements expected by today’s users.

• What usability testing have you performed or do you conduct on an ongoing basis?

• Are you aware of any existing customers that have conducted usability testing of their own related to your discovery service?


For More Information

Representative Overviews: User Preferences and Expectations

Bates, Marcia J. Improving User Access to Library Catalog and Portal Information, final report, version 3. Washington, DC: Library of Congress, 2003, www.loc.gov/catdir/bibcontrol/2.3BatesReport6-03.doc.pdf.

Bibliographic Services Task Force. Rethinking How We Provide Bibliographic Services for the University of California: Final Report. University of California Libraries, 2005, http://libraries.universityofcalifornia.edu/sopag/BSTF/Final.pdf.

Calhoun, Karen. The Changing Nature of the Catalog and Its Integration with Other Discovery Tools: Final Report. Washington, DC: Library of Congress, 2006, www.loc.gov/catdir/calhoun-report-final.pdf.

Horizon Report. The Horizon Report is a collaboration between the Educause Learning Initiative and the New Media Consortium. It is published annually and highlights trends to watch, including topics related to discovery within the library environment. The most recent edition is Larry Johnson, Alan Levine, Rachel S. Smith, and Sonja Stone, The Horizon Report, 2010 Edition (Austin, TX: New Media Consortium, 2010), www.educause.edu/ir/library/pdf/CSD5810.pdf. Past issues are also available online.

OCLC Online Computer Library Center. College Students’ Perceptions of Libraries and Information Resources. Dublin, OH: OCLC, 2006, www.oclc.org/reports/pdfs/studentperceptions.pdf.

OCLC Online Computer Library Center. Online Catalogs: What Users and Librarians Want. Dublin, OH: OCLC, 2009, www.oclc.org/reports/onlinecatalogs/fullreport.pdf.

Schonfeld, Roger C., and Ross Housewright. Faculty Survey 2009: Key Strategic Insights for Libraries, Publishers, and Societies. New York: Ithaka S+R, 2010, www.ithaka.org/ithaka-s-r/research/faculty-surveys-2000–2009/Faculty%20Study%202009.pdf.

Representative Overviews: NextGeneration Library Catalogs, Library Web Scale Discovery, and Metadata Breeding, Marshall. Library Technology Guides: Key Resources in the Field of Library Automation. Rich website with information and news tracking related to several topics, including integrated library systems, next-generation library catalogs, and library Web scale discovery services. Online at www.library technology.org. Breeding, Marshall. “Next-Generation Library Catalogs.” Library Technology Reports 43, no. 4 (July/ Aug. 2007). Charleston Advisor. A quarterly periodical that frequently has information on databases and discovery services for libraries, including Web scale discovery. Available online: www.charlestonco.com. de Groat, Greta. Future Directions in Metadata Remediation for Metadata Aggregators. Washington, DC: Digital Library Federation, Feb. 2009. Available online at www.diglib.org/aquifer/dlf110.pdf. NewsBreaks & The Weekly News Digest, edited by Paula Hane. Published several times weekly by Information Today, this resource frequently has information related to databases and discovery services for libraries, including Web scale discovery. Online at http://newsbreaks.infotoday.com.


Schaffner, Jennifer. The Metadata Is the Interface: Better Description for Better Discovery of Archives and Special Collections, Synthesized from User Studies. Dublin, OH: OCLC Research, May 2009. Online at www.oclc.org/research/publications/library/2009/2009-06.pdf.

Smart Libraries Newsletter. A monthly periodical that frequently has information and news on integrated library systems, next-generation library catalogs, and library Web scale discovery services. Online at www.alatechsource.org/sln/index.

Library Journal Archived Webcasts

Free registration is required to participate in scheduled webcasts or view archived webcasts. Many recent webcasts, listed below, have focused on Web scale discovery for libraries. These webcasts are sponsored by Library Journal and a discovery service vendor (Serials Solutions or Ex Libris, as indicated below). Despite corporate sponsorship, many of the webcasts highlight information relevant to all Web scale discovery services. Online at www.libraryjournal.com/csp/cms/sites/LJ/Tools/Webcast/index.csp.

“Defining Web-Scale Discovery: The Promise of a Unified Search Index for Libraries.” Aug. 18, 2009. Marshall Breeding, Eric Lease Morgan, and Andrew Nagy. Sponsored by Serials Solutions.

“Making the Difference in Discovery: Why ‘Web-Scale’ Defines True Discovery.” Sept. 16, 2010. Jonathan Miller, Andrew Nagy, Andrea Michalek, Mike Buschman, and John Law. Sponsored by Serials Solutions.

“Primo Central: The Ultimate in Next-Gen Discovery: Raising Research to a New Level.” June 16, 2010. Dale Poulter, Curtis Thacker, Tamar Sadeh, and Michael Kaplan. Sponsored by Ex Libris.

“Returning the Researcher to the Library: The Summon Service in Real Life.” Sept. 22, 2009. Scott Garrison, Paul Pival, Ron Berry, and Mike Buschman. Sponsored by Serials Solutions.

“The Success of Web-Scale Discovery in Returning Net-Gen Users to the Library: The Summon Service in Academic Libraries.” April 8, 2010. Doug Way, Jennifer Duvernay, and John Law. Sponsored by Serials Solutions.

“Understanding the New Discovery Landscape: Federated Search, Web-Scale Discovery, Next-Generation Catalog and the Rest.” May 6, 2010. Marshall Breeding, Helen Livingston, and Jane Burke. Sponsored by Serials Solutions.

“Understanding the Next-Gen User.” June 4, 2009. Joan Lippincott and Alison Head. Sponsored by Serials Solutions.

Note: Additional videos demonstrating discovery services are available on vendor websites. Some are tutorial overviews; others are presentations given at library conferences. You can browse the vendor websites for more details, and sales representatives may be able to provide access to more archived presentations related to their particular Web scale discovery service.

Additional Selected Readings

Antelman, Kristin, Emily Lynema, and Andrew K. Pace. “Toward a Twenty-First Century Library Catalog.” Information Technology and Libraries 25, no. 3 (Sept. 2006): 128–138.

Bowen, Jennifer. “Metadata to Support Next-Generation Library Resource Discovery: Lessons from the eXtensible Catalog, Phase 1.” Information Technology and Libraries 27, no. 2 (June 2008): 6–19.

Morgan, Eric Lease. “Musings.” Several essays, including pieces on the topic of next-generation catalogs and Web scale discovery. Online at www.library.nd.edu/daiad/morgan/musings/index.shtml. The following are of particular interest:
• “‘Next Generation’ Library Catalogs in 15 Minutes.” Nov. 13, 2007. www.library.nd.edu/daiad/morgan/musings/ngc-in-15-minutes/index.shtml.
• “‘Web-Scale’ Indexes and ‘Next Generation’ Library Catalogs.” Aug. 13, 2009. www.library.nd.edu/daiad/morgan/musings/web-scale/index.shtml.
• “XC and the Future of Library Search.” Oct. 29, 2007. www.library.nd.edu/daiad/morgan/musings/future-of-search/index.shtml.

Tam, Winnie, Andrew Cox, and Andy Bussey. “Student User Preferences for Features of Next-Generation OPACs.” Program: Electronic Library and Information Systems 43, no. 4 (2009): 349–374.

Ward, Jennifer, Pam Mofjeld, and Steve Shadle. “WorldCat Local at the University of Washington Libraries.” Library Technology Reports 44, no. 6 (Aug./Sept. 2008).

Library Technology Reports  alatechsource.org  January 2011



Library Technology Reports Respond to Your Library’s Digital Dilemmas

Eight times per year, Library Technology Reports (LTR) provides library professionals with insightful elucidation, covering the technology and technological issues the library world grapples with on a daily basis in the information age.

Library Technology Reports 2011, Vol. 47

January 47:1: “Web Scale Discovery Services” by Jason Vaughan
February/March 47:2: “Libraries and Mobile Services” by Cody W. Hanson
April 47:3: “Building Community with Social Media” by Lichen J. Rancourt
May/June 47:4: “Using WordPress as a Library Content Management System” by Kyle M. L. Jones and Polly Alida-Farrington
July 47:5: “Using Web Analytics in the Library” by Kate Marek
August/September 47:6: “Re-thinking the Single Search Box” by Andrew Nagy
October 47:7: “The Transforming Public Library Technology Infrastructure” by ALA Office for Research and Statistics
November/December 47:8: “RFID In Libraries” by Lori Bowen-Ayre

alatechsource.org ALA TechSource, a unit of the publishing department of the American Library Association