English Pages 240 Year 2010
Business planning for digital libraries
BPDG_opmaak_12072010.indd 1
13/07/10 11:51
Business Planning for
Digital Libraries
International Approaches
Mel Collier (ed.)
© 2010 by Leuven University Press / Presses Universitaires de Louvain / Universitaire Pers Leuven
Minderbroedersstraat 4, B-3000 Leuven (Belgium)
All rights reserved. Except in those cases expressly determined by law, no part of this publication may be multiplied, saved in an automated datafile or made public in any way whatsoever without the express prior written consent of the publishers.
ISBN 978 90 5867 837 9
D / 2010 / 1869 / 53
NUR: 800
Design cover: Friedemann BVBA (Hasselt)
Contents
About the authors
Framework chapters
1. Business planning for digital libraries
Mel Collier, Leuven University, Belgium
2. Business model innovation in digital libraries: the cultural heritage sector
Harry Verwayen, Europeana, The Hague, Netherlands
3. Digital libraries in higher education
Derek Law, University of Strathclyde, Glasgow, Scotland
4. Digital libraries for the arts and social sciences
Ian Anderson, Humanities Advanced Technology and Information Institute (HATII), Glasgow University, Scotland
5. The impact of the digital library on the planning of scientific, technical and medical libraries
Wouter Schallier, LIBER, The Hague, Netherlands
Practice chapters
6. E-journals in business planning for digital libraries
Hilde Van Kiel and Mel Collier, Leuven University, Belgium
7. E-books: business planning for the digital library
Hazel Woodward, Cranfield University, England
8. Business planning for e-archives
Dirk Kinnaes, Marc Nelissen, Luc Schokkaert and Mel Collier, Leuven University, Belgium
9. Issues in business planning for archival collections of web materials
Paul Koerbin, National Library of Australia
10. Organizing digital preservation
Barbara Sierman, Royal Library of the Netherlands
11. Business planning for digital repositories
Alma Swan, Key Perspectives Ltd, UK
12. Problems of multi-linguality
Genevieve Clavel-Merrin, Swiss National Library, Bern, Switzerland
13. Business models for Open Access publishing and their effect on the digital library
David C. Prosser, SPARC Europe, Oxford
14. Digital library metadata
Stefan Gradmann, Humboldt University, Berlin, Germany
Case studies
15. FinELib: an important infrastructure for research
Kristiina Hormia-Poutanen and Paula Mikkonen, Helsinki University, National Library of Finland
16. The digital library of Catalonia
Lluís Anglada, Catalan Academic Library Consortium, Ángel Borrego, University of Barcelona and Núria Comellas, Catalan Academic Library Consortium, Spain
17. Digital library development in the public library sector in Denmark
Rolf Hapel, Aarhus Public Libraries, Denmark
18. Digital libraries for cultural heritage: a perspective from New Zealand
Chern Li Liew, Victoria University of Wellington, New Zealand
19. APEnet: a model for Internet based archival discovery environments
Angelika Menne-Haritz, Stiftung Archiv der Parteien und Massenorganisationen der DDR im Bundesarchiv, Berlin, Germany
20. The California Digital Library
Gary S. Lawrence, formerly of the University of California, USA
21. The Oxford Digital Library
Michael Popham, Oxford University, England
About the authors
Ian Anderson is Senior Lecturer in New Technologies for the Humanities at the University of Glasgow, Scotland, and member of the Humanities Advanced Technology and Information Institute (HATII). He conducts research on web services for digital libraries, the archive and feminist theory, the interrelationships between identity, memory, writing and history, retention and deletion decision making in lay record managers and self-preserving digital objects.
Lluís Anglada is director of the Catalan Academic Library Consortium and was formerly the library director of the Catalonia Technical University and professor for three years in the School of Librarianship of Barcelona University. He is active in various professional associations at Catalan, Spanish and international levels, including the scientific committees of professional conferences in Catalonia and Spain, European meetings of ICOLC and Library Advisory Boards of Academic Press, Blackwell Publishing, Nature Publishing Group and Springer.
Ángel Borrego is a lecturer in the Department of Library and Information Science at the University of Barcelona, teaching an undergraduate course in information behaviour, and co-ordinating the School’s doctoral programme. His current research interests focus on scholarly communication and information behaviour in the electronic environment.
Genevieve Clavel-Merrin is responsible for national and international cooperation at the Swiss National Library in Bern. She is currently working on projects covering digitization and its coordination in Swiss libraries, plus public-private partnerships. She is active in the field of multilingual access (MACS - Multilingual Access to Subjects) and IFLA. She has wide experience in European-funded projects and is the Swiss representative for The European Library.
Mel Collier is Chief Librarian at the University of Leuven and a consultant in the fields of change management and innovation in libraries and learning in institutions of higher education. For the last ten years he has worked on business planning for digital libraries, including The European Library and Europeana.
Núria Comellas is Project Manager at the Consortium of Academic Libraries of Catalonia, responsible for negotiating the licensing of electronic resources on behalf of the consortium’s member libraries and for the division of costs between them. She was formerly a librarian at Universitat Oberta de Catalunya and at Universitat Politècnica de Catalunya.
Stefan Gradmann is Professor of Library and Information Science with a focus on knowledge management and semantics-based operations at the Humboldt University of Berlin. He studied Greek, philosophy and German literature in Paris and Freiburg (Brsg.) and received his Ph.D. in Literary Studies in Freiburg in 1986. He has worked as a scientific librarian at the State and University Library in Hamburg, was director of the GBV Library Network, and worked for Pica B.V. in Leiden as product manager and senior consultant. Later he was Deputy Director of the University of Hamburg Regional Computing Center, Project Director of the GAP (German Academic Publishing) Project of the German Research Association, and technical co-ordinator of the EC-funded project FIGARO. He is currently heavily involved in building Europeana, the European Digital Library, specifically as co-leader of WP3 on technical and semantic interoperability and leader of EuropeanaConnect WP1 (Semantic Data Layer).
Rolf Hapel is director of Citizens’ Services and Libraries in Aarhus, Denmark, and currently a member of the coordination board for net libraries. He was awarded the Order of the Dannebrog and is an honorary professor at the Danish Library School. Aarhus Public Libraries received the Gates Foundation’s “Access to Learning Award” in 2004, was among the top-ranked public institutions in the !nnovation Cup 2007 and is frequently top-benchmarked in “Best on the Internet”.
Kristiina Hormia-Poutanen is the Deputy National Librarian of the National Library of Finland and director of the Library Network Services department. She is a member of the LIBER board and chair of the LIBER Digitization and Resource Discovery section. She is the chair of the availability section of the Finnish Digital Library project. She is active in various national and international initiatives, including IFLA, ICOLC, LIBER and eIFL, especially in relation to digital library infrastructures, consortia development issues, licensing and licensing models, and open access.
Dirk Kinnaes is Project Manager with LIBIS, the ICT department of Leuven University Library, which also provides ICT services to a large number of external library and archive clients through LIBISnet. He is a key participant in the development and implementation of LIAS, the Leuven Integrated Archive System.
Paul Koerbin is the Manager of Web Archiving at the National Library of Australia with responsibilities for the PANDORA Archive and the Library’s web domain harvesting programme.
Derek Law was Head of the Information Resources Directorate and University Librarian of Strathclyde University, Glasgow, from 1999 until October 2008. He holds a chair in the Department of Computing and Information Science and is a member of the Centre for Digital Library Research. He has written extensively on the development of digital libraries, on the role of information in e-learning and on digital information systems, and has a subsidiary interest in naval history.
Gary S. Lawrence retired from the position of Director of Systemwide Library Planning for the University of California system in June 2008.
Chern Li Liew is a Senior Lecturer in the School of Information Management at Victoria University of Wellington. She holds a PhD in Information Studies from Nanyang Technological University (Singapore), and an MSc from Loughborough University. She has published in the areas of digital libraries, digital cultural heritage, user studies and knowledge organization.
Angelika Menne-Haritz is Director of the ‘Archives of Parties and Mass Organisations of the former GDR’ foundation in the Federal Archives in Berlin. She has extensive experience as an archival educator and teacher in administrative sciences. She received a PhD in literature and history, then undertook her archival training at the Archivschule Marburg and worked in the state archives of Berlin and of Schleswig-Holstein, before becoming director of the Archivschule Marburg. She gained her qualification for professorship (habilitation) at the German University of Administrative Sciences Speyer with a thesis on ‘Business Processes’.
Paula Mikkonen is a licensing coordinator at the FinELib Consortium service unit of the Library Network Services department of the National Library of Finland. The service unit negotiates licence agreements of e-resources for the use of the FinELib Consortium member organizations.
Marc Nelissen is an Archivist in the University Archive and Art Patrimonium department of Leuven University Library. He is a key participant in the development and implementation of LIAS, the Leuven Integrated Archive System.
Michael Popham is Head of Oxford Digital Library, a core service of Oxford University. He is also the director of Electronic Ephemera, a project funded by JISC to digitize selections from the John Johnson collection.
David C. Prosser was appointed the founding Director of SPARC Europe in 2002, following a career as a scholarly publisher for Elsevier Science and Oxford University Press. In December 2009 it was announced that he had been appointed Executive Director of Research Libraries UK, a consortium of 29 of the largest research organizations in the UK, including the three national libraries.
Wouter Schallier has been the Executive Director of LIBER (Association of European Research Libraries), based in The Hague, Netherlands, since 2008. He was the Librarian of the Biomedical Library of Leuven University from 2005 until 2008.
Luc Schokkaert is Head of Public Services and IT at KADOC, the documentation and research centre for religion, culture and society, based at Leuven University. He is a key participant in the development and implementation of LIAS, the Leuven Integrated Archive System.
Barbara Sierman is Digital Preservation Manager at the Koninklijke Bibliotheek, the national library of the Netherlands in The Hague. She started her career at Pica (now OCLC) as a library consultant and, after various jobs in IT, she joined the KB in 2005.
Alma Swan is the joint proprietor of Key Perspectives Ltd., founded in 1996. After research at Southampton University and a lectureship at Leicester University she became managing editor of an Elsevier Science biomedical research indexing service. She is a Visiting Researcher in the School of Electronics & Computer Science at the University of Southampton, Associate Fellow in the Marketing and Strategic Management Group at Warwick Business School and editor of The Euroscientist.
Hilde Van Kiel is Head of the central University Library Services of the University Library of Leuven and coordinator of e-resources. She obtained a master’s degree in history from K.U.Leuven and previously worked as a librarian and network administrator for KPMG Tiberghien & Co.
Harry Verwayen is Business Development Director at Europeana in The Hague and formerly worked at the think-tank Knowledgeland in Amsterdam. He specializes in business model innovation in the public sector, the use of visual thinking in problem solving and innovation in the cultural heritage sector.
Hazel Woodward has been University Librarian and Director of the University Press at Cranfield University (UK) for over nine years. Her research interests include electronic publishing and scholarly communication (the subject of her PhD thesis) and she has published many papers in the professional literature on digital library issues. She is very active professionally, being currently the Chair of the JISC E-Books Working Group and a member of the JISC Journal Working Group.
Business Planning for Digital Libraries
Framework chapters
1 BUSINESS PLANNING FOR DIGITAL LIBRARIES
Mel Collier
Introduction: the aim of this book
It has become almost conventional to trace the history of the digital library back to Vannevar Bush, or even to notables in the origins of computing such as Charles Babbage or Ada Lovelace. That is not our intention, nor indeed is it our intention to record the history of digital libraries in any detail, but it is appropriate to explain why we present this book on business planning for digital libraries, and why at this time. Librarians of a certain age, like the present editor, now nearing the end of their careers (and some already retired), have spent their entire professional lives working on the application of computers to libraries: from the early days of indexing and the creation of inventories, through the development of library automation applications, then integrated library management systems, networking, mini-computer, micro-computer and PC applications, and in due time the digital library. From those origins until now is a period of fifty years at most, which in the history of libraries is perhaps a mere minute. It is a minute, however, in which libraries have changed more than during any other period in library history, and within that minute digital libraries as strictly defined have occupied maybe the last fifteen seconds. Those fifteen seconds, however, represent fundamental change: so much so that even the continuance of the library as we know it is called into question. Libraries have changed forever during that time, or are in the process of doing so, yet only now are we getting to grips with planning that change. Those fifteen seconds have been a time of frantic development and great professional excitement, but mostly occupied with applying the white heat of technology. The literature of digital libraries is already vast, but the overwhelming majority of it concerns technical development, followed perhaps by issues concerned with property rights or metadata.
Business planning for digital libraries is not prominent in the literature, and certainly not treated in an integrated way in which the various elements are brought together so that sustainability for the future is assured or, put another way, so as to move from experiment or fixed-term project to dependable service. This book tries to fill that gap by integrating the many issues that are required for successful business planning for digital libraries, but which are usually treated separately in the literature, and we do so by drawing on the management experience of people active in the field. On the other hand, we are not starting from scratch, because some important integrating works have gone before. We refer, for example, to Lesk (2005), whose comprehensive work approaches digital libraries primarily from a technical point of view but also includes substantial chapters on economics and property rights. Andrews and Law (2004) provided a very useful early focus on policy and practice from a mostly, but not entirely, Anglo-American
perspective. Most recently Baker and Evans (2009) made an important contribution by expanding and integrating the previously scattered literature on digital library economics, primarily in the field of higher education and national libraries. This book looks at the whole process of planning the digital library from the point of view of business planning: that is, planning it as if it were an enterprise that is meant to fulfil specific goals and is designed from the outset to be sustainable and to provide value to those who invest in it and those who use it. Furthermore, this book addresses current trends which see digital libraries as belonging not only to the library world but also to the broader cultural sector, and does so from as international a perspective as possible, as exemplified by the European flagship project Europeana.1
Business planning for digital libraries: definitions
When the present writer started working on the business planning workpackage for The European Library2 in 2001, there were only a few published works relating to business planning for digital libraries. Indeed, the term business planning itself was only just coming into use in this context. Perhaps the first published usage occurred in Barton and Walker (2003), followed by Bishoff and Allen (2004). The first literature review from this project appeared in Collier (2004), followed by a more comprehensive one in Collier (2005). By then it was possible to produce a definition of the term, which we continue to use as the basic framework for this book:
Business planning for digital libraries is here defined as the process by which the business aims, products and services of the eventual system are identified, together with how the digital library service will contribute to the overall business and mission of the host organizations.
These provide the context and rationale, which is then combined with normal business plan elements such as technical solution, investment, income, expenditure, projected benefits or returns, marketing, risk analysis, management and governance (Collier 2005).
It may also be considered necessary to define what we mean by digital libraries, as there are already very many definitions in the literature. This is not a matter of purely academic interest, as the definition of the digital library has important implications for business planning. For instance, a fundamental policy decision of Europeana is that all participating content providers must provide access to the digital object itself, albeit in many cases a digital surrogate (a digital representation of an original analogue or physical object). The reasoning behind the policy is that Europeana does not want users to be frustrated by finding metadata which do not lead directly to the object. In business planning terms, providing access to the object itself is a very different thing from just providing a finding tool. A digital library provides access to the object itself; it cannot be just a catalogue.
1 The aim of Europeana is to provide access to Europe’s cultural and scientific heritage through a cross-domain portal, see http://www.europeana.eu/portal/ (viewed 6 December 2009)
2 The European Library is a precursor to and in many ways the inspiration for Europeana. It provides access to the heritage of Europe’s national libraries, see http://search.theeuropeanlibrary.org/portal/en/index.html (viewed 6 December 2009)
Our definition of the digital library implies that there is a rationale behind the assembly of the objects it contains: what librarians call collection development and museum people call curation. It makes a clear distinction, for instance, between searching the Internet and searching a collection that has a design rationale behind it. It also implies that those responsible for building the digital library have put in place certain quality controls regarding the reliability and authority of the information provided. These requirements apply equally to digital libraries in the library, museum, archive or audio-visual domain, boundaries which are in any case breaking down in the digital world. The author first formulated these elements into a definition in Collier (1997), which was adopted by, among others, the National Science Digital Library,3 as mentioned by Mischo (2004). We now propose this definition, in slightly modified form, as the other pillar of the framework for this book:
A managed environment of multimedia materials in digital form, designed for the benefit of its user population, structured to facilitate access to its contents, and equipped with aids to navigate the global network. Its users and holdings may be totally distributed, but it is managed as a coherent whole.
Nowadays many digital library applications also include social networking and other Web 2.0 services, which this definition did not specifically provide for but which can be encompassed by the phrases “benefit of its user population” or “aids to navigate”.
3 See a contemporary presentation at http://fox.cs.vt.edu/UKJWDL/present/zia.ppt (viewed 6 December 2009)
The elements of business planning for digital libraries
Business planning for digital libraries is about sustainability. Gone are the days when a digital library could be started up as a project without a realistic plan for its continuance into the future, or could be allowed to disappear. Even fixed-term projects that are essentially researching a new technical solution need to show how the results will be exploited or made more widely available. Fixed-term projects aimed at collecting digital material or digitizing analogue material need to demonstrate how the new collection will be entrusted to a digital library organization with an assured future. The basic questions are therefore:
– What is the value proposition?
– Does the proposed digital library have a unique selling point?
– Who are the target customers and what is the (scale of the) target market?
– What are the key enabling technologies?
– What are the risks?
– Who pays?
Sustainability is in great measure about financing and resources, but not solely, as we shall see. Digital library content can be divided into two fundamental categories: born digital content (material created digitally which never existed in analogue form) and digitized content (a digital version or surrogate of original analogue material). The sustainability issues surrounding these two categories are rather different. Coping with born digital content is effectively an endless and immeasurable task, raising major issues of what to select, how to select it and how to allocate responsibility for preservation and future access. Because of the sheer scale of output and its often ephemeral nature, it is inevitable that more information will be lost in the digital era than in the print era, with consequences for future socio-historical perspectives. Digitization of analogue material, on the other hand, is a more scalable challenge: the limits of a digitization project can be defined within available resources. This is not to say that digitizing existing analogue material is not a huge task, which it evidently is, but the task can be segmented, prioritized and approached gradually, usually without great risk of loss of the original (with the exception of material on vulnerable media).
Selection of material (whether born digital or digitized) can be roughly divided into the processes involved in preserving and giving access to the intellectual record on the one hand, and to cultural heritage on the other. Here we use “cultural” in its broadest possible sense. Whereas the intellectual record has well-established processes for quality control such as peer review, selection of cultural heritage is a looser, more subjective process. It can, however, be validated by collaborative decision making at the appropriate level, for example institutional, regional, national or along thematic lines.
In this book the reader will encounter the terms business model and business plan. In the real world they are often used rather interchangeably, and the reader may find that reflected in the various contributions.
Broadly speaking, however, the business model addresses the first half of the questions above, which relate to the sustainability concepts behind the enterprise, namely:
– Vision/mission: a clear statement of the aims and nature of the business, and why
– Definition of the service/products offered
– Target market: user/customer/client profile
– Nature of the business (profit, not-for-profit, public service, self-sustaining etc., or a combination of these)
– Nature and sources of income
The business plan turns these concepts into concrete, convincing and defensible arguments and actions that will attract resourcing and realize the concept. It will contain the following elements:
– Expected income streams
– Investment required
– Recurrent costs (personnel, materials, services)
– Marketing plan
– Branding
– Risk analysis
– Milestones
   ․ Pilot phase?
   ․ Go/no-go decision?
   ․ Live launch
   ․ Break-even point?
– Evaluation
Financing
Not surprisingly, the nature and sources of income figure prominently in the business model. As digital libraries move out of the experimental phase into the operational phase, financing becomes a major challenge, for two main reasons. Firstly, the digital library rarely replaces the physical library completely (and certainly not the museum or archive), so it often represents, at least in part, an additional cost, which the executive boards of institutions are reluctant to hear about. Secondly, the digital library makes new demands on infrastructure and personnel, and raises ever higher user expectations regarding new functionality, design, accessibility and scope. Unless the institution or its governing authority can find extra resources (not obvious in this day and age), this leads to difficult strategic decisions. Institutions, and indeed governments, often react by requiring proposers to find external funding, effectively shifting the responsibility. This has led to a current preoccupation with public-private partnerships. Within the European Community this idea has been promoted by the High Level Expert Group in its report on public-private partnerships (PPPs) (i2010 2008). In general the European Commission operates a subsidiarity principle which makes member states responsible for digitization. The report encourages PPPs, gives advice on good practice and governance, and makes welcome recommendations on copyright and the protection of public domain content. The only problem is that the advice does indeed seem to be directed at very high-level institutions (which may have the profile to attract major sponsors), as borne out by the case studies; it is not at all clear how the private sector can come to the aid of the myriad institutions wanting to realize a digital library project.
To be fair to the European Commission, the subsidiarity principle has been slightly softened in the current funding programme, meaning that digitization can in certain circumstances be funded centrally. A similar theme is pursued by a report carried out for the Strategic Content Alliance and Ithaka (Guthrie 2008), which provides advice on how the results of academic work in the form of websites can be sustained. It points out that academics are generally not natural entrepreneurs, but emphasizes the need for an entrepreneurial approach and gives practical tips. This is a principle we wholeheartedly support, but as far as private investment is concerned we have the same reservation: how much potential is out there? As stated by Greenstein and Thorin in their review of major American academic digital libraries, we believe that for the great majority of institutions there is no alternative to making structural provision in the institutional budget. We can summarize possible income streams as follows:
– Direct financing by the owner institution or by institutional partners (which could also be called a subscription or co-financing model)
– Subsidy by interested parties: government, associations, trusts, charitable bodies
– Pay to publish (one of the Open Access models)
– Sponsorship
– Advertising
– Subscription by the client institution (the institutional licence model)
– Subscription by end-users
– Pay-as-you-go by the article, chapter or object
It is not the intention in the following chapters to provide actual costings of the respective digital library projects. Given the wide variety of projects covered here, different costing
aspects in various countries and the rapidly changing costs themselves, this is hardly feasible or sensible. Rather, the authors were asked to indicate, where possible, what the financing issues were and how they were addressed. It is evident, moreover, that the economics of digital libraries is still very much a work in progress, and we refer the reader to Baker and Evans (2009), mentioned above.
The management elements of the business plan
As mentioned above, sustainability is not only about financing but also about effective management. The business plan therefore needs to show how competent technical, management and content personnel will be attracted and retained. Such people are very much in demand, and several of the following chapters go into this issue. Regarding content, the plan will need to justify, in relation to the vision and mission, the focus of the content and how it will be discovered, selected and acquired, whether by digitization or as born digital material. Regarding users, the plan will need to be clear about which users are targeted for the product. A digital library developed for academic research is likely to be rather different from one developed for secondary schoolchildren, or again for the general public, especially as digital libraries are increasingly expected to provide social networking, Web 2.0 or 3.0 functionality, or facilities for user-generated content. These decisions will in turn inform the marketing and publicity plan, which is crucial if the business plan includes an income-generation element and, even if it does not, will be required to maximize usage and justify the original investment. Closely related to these points is the need to state beforehand what the approach to evaluation will be and what the criteria for success are. All stakeholders, whether the host institution, public sector subsidizing agencies, trusts or foundations, or commercial sponsors, will require this.
The stakeholders will also require a serious risk analysis, which, depending on the nature of the funding and the enterprise, can cover technical, management, commercial or political risks, or any combination of these. Regarding technical issues, the plan will cover the necessary hardware and software infrastructure and, very importantly, the format and metadata standards that will be used for import, storage and exchange of content. Given the substantial investment often required to build a digital library, the plan will need to describe policies and approaches to long-term preservation and digital archiving in order to ensure that the investment is not lost. Finally, the business plan will provide a description, sufficient for the stakeholders, of the development schedule from original conception through design, development, test and implementation to launch, in which all the technical and management elements come together. After launch the schedule will typically provide a forecast of usage, market uptake and income streams, together with break-even points (if applicable) and the timing of evaluations.
Examples of business planning
As stated at the beginning, published examples of business planning for digital libraries are not easy to find. This is probably because institutions treat it as a confidential matter, either for internal reasons or for commercial or competitive reasons. Even projects funded under European programme rules may elect to keep their plans confidential,
especially if there are private sector partners. Greenstein (2002), whilst not publishing the business plans, provides much information on the planning processes and rationales behind arguably the most important university digital libraries in the USA: the California Digital Library (see also the chapter by Lawrence in this book), Harvard, Indiana, New York University, Michigan and Virginia. An excellent example of a business plan available on the web (perhaps under public information rules) is that of the Ontario Digital Library,4 a collaboration between public, university, school and college libraries. All the elements are there:
– Governance and management
– Staffing
– Technology and infrastructure
– Marketing, communication, branding
– Bi-linguality
– User and staff training
– PEST (Political, economic, social, technological) and SWOT analysis
– Review of other jurisdictions
– Funding requirements
The author's own work on The European Library (Collier 2004, Collier 2005) provides a full description of the process of developing the business plan and the issues that were examined: the partners' own aims, mission, added value/USP, types of media content, target groups by type and by geographic area, categories of subject content, advantages to the partners (national libraries), branding, funding models, scale of finance needed, and risk analysis. Perhaps the best set of published case studies on sustainability (as of late 2009) can be found in a further Ithaka study (Maron 2009). Although these studies do not constitute business plans as such, and relate to initiatives of widely varying nature and scale, they do provide much useful detail and experience.
The approach of this book
This book is an edited compilation of contributions organized into three sections. In the first section, the framework chapters, business planning for digital libraries is approached from the perspective of the environment in which planning occurs.
Harry Verwayen (Knowledgeland, Netherlands) follows this introductory chapter by focusing on business model innovation in the digital cultural heritage sector, expanding in particular on the concept of the long tail as envisaged in the business model for Europeana. Derek Law (Strathclyde University, Scotland) examines digital library business planning in the context of higher education, the sector with the longest history in the field and, to date, the most innovative and enterprising in digital library development. In the arts and social sciences, where the printed work still plays a crucial role, the last few years have nevertheless seen enormous developments, not only in the provision of digital information (where digitization is crucial) but also in scholarly methods, as analysed authoritatively by Ian Anderson (HATII, Humanities Advanced Technology and Information Institute, University of Glasgow, Scotland).
4 http://www.accessola2.com/odl/pdf/ODL_BusinessPlan_Full.pdf This was first viewed by the author about five years ago and has so far remained accessible! Viewed 3 January 2010.
Wouter Schallier (LIBER, Netherlands), on the other hand, tackles the very real issue that in scientific, technical and medical (STM) libraries research information is now almost entirely electronic, with the result that researchers no longer need the physical library. Paradoxically, however, student use of library space is by no means declining, as learning and study methods change and the library evolves into a learning centre. In the second section we present the practice chapters, which examine the various practical issues and challenges of planning the digital library, whether technical, organizational or publishing-related. The most remarkable strategic change of the last decade or so has been the rise to dominance of e-journals over print journals. The important issues and their implications for business planning are discussed by the present writer and Hilde Van Kiel (Leuven University, Belgium). Slower to gain acceptance but now set to take off are e-books (Amazon reported that on Christmas Day 2009 it sold more e-books than conventional books, owing to the success of the Kindle5). Hazel Woodward (Cranfield University, UK) provides a comprehensive examination of how e-books fit into business planning for digital libraries, with reference (among other things) to platforms, consortia, usage monitoring, acquisition and delivery. There follows a group of chapters concerned with the wide range of issues involving archiving and preservation. Kinnaes, Nelissen, Schokkaert and Collier of Leuven University discuss digital archiving from the point of view of the professional archivist, whose concerns are in some ways distinct from those of the librarian or museum curator. Paul Koerbin (National Library of Australia) discusses the problem of how to record the often transient information that appears on the Web, a challenge for the preservation of born-digital information on a grand scale.
A slightly different but no less immense challenge is addressed by Barbara Sierman (National Library of the Netherlands), namely how to preserve published and nationally important digitized information, one of the core business activities of national libraries. Both authors approach these topics from their experience in their own national libraries. Relevant to the theme of preservation, but in an academic context and with a prime focus on research dissemination and exploitation, Alma Swan (Key Perspectives Ltd, UK) examines business planning aspects of repositories. To complete the practice section we have three chapters that deal with important overarching issues. Genevieve Clavel (National Library of Switzerland) discusses issues of multilinguality, which are of course crucially important in the international context and the subject of many ongoing efforts in European framework projects. Like all digital library topics, this is very much work in progress. David Prosser (SPARC Europe, UK) takes us through the key political and economic issue of how Open Access can further the cause of scholarly publishing and possibly alleviate some of the financial pressures that threaten access to scientific research results. Finally, Stefan Gradmann (University of Hamburg) provides a handy guide to digital library metadata: what every general manager or professional needs to know but perhaps does not dare to ask.
The third section provides a group of case studies that focus on how these business planning issues were or are being addressed in particular projects or services. Kristiina Hormia-Poutanen and Paula Mikkonen (Helsinki University Library/National Library of Finland) describe the development of FinELib, which has long been recognized as one of the most successful consortia and effectively operates as a national digital library for
5 Widely reported on the Web, including among others: http://news.softpedia.com/news/Amazon-Sells-MoreE-Books-than-Physical-Ones-on-Christmas-Day-130699.shtml (viewed 4 January 2010)
scientific information. Continuing the consortium theme, Lluís Anglada, Núria Comellas (Consortium of Academic Libraries of Catalonia) and Ángel Borrego (University of Barcelona) describe the formation of the Digital Library of Catalonia, which, like FinELib, has brought an immense increase in access to, and range of, scientific information for its members. In the public library sector it might be thought that the digital library has not had as fundamental an impact as in the academic sector, owing to the continuing and natural preference for the book for leisure reading; but, as Rolf Hapel (Aarhus, Denmark) shows, public libraries are indeed fertile ground for innovation, particularly when supported by dynamic national agencies. The approach here is from a national strategic perspective, as it is in the next chapter, by Chern Li Liew (Victoria University of Wellington, New Zealand), who fulfils one of the aims of this book by addressing the broader issues of business planning of digital libraries for cultural heritage, and who may introduce the reader to the particular issues of preserving, exploiting and providing access to the digital cultural heritage of indigenous peoples, a rather different matter from the European cultural tradition. Angelika Menne-Haritz (Stiftung Archiv der Parteien und Massenorganisationen der DDR im Bundesarchiv, Berlin) provides insight into how business planning principles are being applied to create a model for Internet access to archives, with reference to the APEnet project, itself related to Europeana. The book is completed by accounts of two of the most important university digital libraries.
The California Digital Library, one of the earliest to be formally set up with significant funding and a governance structure, and with the status of a fully fledged library within the University of California system, is described by Gary Lawrence; while Michael Popham provides important and frank insights into the business issues of the Oxford University Digital Library, including of course those arising from its being one of the libraries in the Google digitization programme. Each chapter follows a broadly similar pattern. After the introductory material the authors take us through the business planning elements laid out above, as and when they apply in their particular field or case. Each chapter has a summary in the form of bullet points and a bibliography for further reference. We hope that the reader will find this structure helpful and instructive.
References
Andrews, J. and Law, D. (eds) (2004) Digital libraries: policy, planning and practice. Ashgate. ISBN 0754634485
Baker, D. and Evans, W. (eds) (2009) Digital library economics: an academic perspective. Chandos. ISBN 9781843344032
Barton, M.R. and Walker, J. (2003) Building a business plan for DSpace, MIT Libraries' digital institutional repository. Journal of Digital Information, Vol. 4 No. 2.
Bishoff, L. and Allen, N. (2004) Business Planning for Cultural Heritage Institutions. Council on Library and Information Resources, Washington DC.
Collier, M. (1997) Towards a general theory of the digital library. Proceedings of the International Symposium on Research, Development and Practice in Digital Libraries, Tsukuba, Japan, 1997, pp. 80-84. http://www.dl.slis.tsukuba.ac.jp/ISDL97/proceedings/collier.html (viewed 6 December 2009)
Collier, M. (2004) Development of a business plan for an international co-operative digital library: The European Library (TEL). Program, Vol. 38 No. 4.
Collier, M. (2005) The business aims of eight national libraries in digital library co-operation: a study carried out for the business plan of The European Library (TEL) project. Journal of Documentation, Vol. 61 No. 5.
Greenstein, D. and Thorin, S. (2002) The Digital Library: a biography. 2nd ed. Digital Library Federation, Council on Library and Information Resources. ISBN 1887334955. http://www.diglib.org/pubs/dlf097/dlf097.pdf (viewed 3 January 2010)
Guthrie, K., Griffiths, R. and Maron, N. (2008) Sustainability and revenue models for online academic resources: an Ithaka report. Strategic Content Alliance/Ithaka. http://www.jisc.ac.uk/media/documents/themes/eresources/sca_ithaka_sustainability_report-final.pdf (viewed 3 January 2010)
i2010 European Digital Libraries Initiative, High Level Expert Group on Digital Libraries, Sub-group on Public Private Partnerships (2008) Final report on public private partnerships for the digitisation and online accessibility of Europe's cultural heritage. http://ec.europa.eu/information_society/activities/digital_libraries/doc/hleg/reports/ppp/ppp_final.pdf (viewed 3 January 2010)
Lesk, M. (2005) Understanding digital libraries. 2nd ed. Elsevier. ISBN 9781558609242
Maron, N.L., Smith, K.K. and Loy, M. (2009) Sustaining digital resources: an on-the-ground view of projects today. Strategic Content Alliance/Ithaka. http://www.ithaka.org/ithaka-s-r/strategy/ithaka-case-studies-in-sustainability/report/SCA_Ithaka_SustainingDigitalResources_with_CaseStudies_lower%20res.pdf (viewed 4 January 2010)
Mischo, W.H. (2004) United States federal support for digital library research and its implications for digital library development. In Andrews and Law (2004).
2 BUSINESS MODEL INNOVATION IN DIGITAL LIBRARIES – THE CULTURAL HERITAGE SECTOR Harry Verwayen
Introduction
This chapter focuses on the challenges and opportunities faced when designing new business models for a digital library. Our approach will be, first, to introduce the reader to the particular characteristics of the cultural heritage sector and to the theoretical framework of business model innovation, which has been applied to Europeana, the European portal to digital cultural heritage objects, as an illustration of the issues. We will then outline the background to the Europeana project. This will clarify some of the challenges faced when designing a fitting business model for a digital library of this scale.
Cultural heritage
Although the definition is debated, for the purposes of this chapter we will use the UNESCO definition of cultural heritage as: ‘The entire corpus of material signs - either artistic or symbolic - handed on by the past to each culture and, therefore, to the whole of humankind’ (UNESCO, 2005). This definition is still quite broad, of course, and covers a whole spectrum ranging from monumental buildings to recorded oral history. In the context of business modelling and planning for digital libraries we are naturally focused on cultural heritage artefacts traditionally held within the walls of the cultural institution: images, texts, sounds and videos that provide insight into our past as well as context to the present, ranging from the Magna Carta to footage of François Mitterrand's famous answer to rumours that he had a mistress and a child, at the time of President Clinton's indictment: ‘Et alors?’. These are artefacts that provide context to the world we live in and as such are of great value to us. The selection, preservation and dissemination of this material have traditionally been entrusted to museums, research libraries and archives, collectively referred to as the cultural heritage sector.
Although museums, libraries and archives have different organizational models and are often oriented towards different audiences, all of these institutions share at their core a common public task: to ensure that our collective memory is preserved and made accessible for future generations. The sector as a whole can be characterized as loosely organized and mostly publicly financed, with organizational backgrounds that often go back centuries.
Business model innovation
The term ‘business model’ is used for a broad range of informal and formal descriptions representing core aspects of an organization, including purpose, offerings, strategies, infrastructure, organizational structures, trading practices, and operational processes and policies. Although there have been models to support business for as long as business has existed, consciously designing business models gained importance in the second half of the twentieth century and became a crucial aspect of developing new business with the advent of disruptive technologies such as the Internet. In an increasingly complex technological environment the need to understand and innovate in business models will only increase. Recently the concept of business model innovation has spread beyond the realm of for-profit businesses and is now being applied, if hesitantly, to not-for-profit operations. In the context of this chapter a business model is understood to be ‘the rationale of how an organization creates, delivers and captures value’ (Osterwalder and Pigneur 2009).
Figure 1: Business Model (Osterwalder, Pigneur 2009)
The theoretical framework of the business model consists of interrelated building blocks which depict the logic of how the organization intends to deliver value:
1. Customer segments: an organization serves one or several customer segments.
2. Value proposition: an organization seeks to solve customers' problems and satisfy customers' needs with value propositions.
3. Channels: value propositions are delivered to customers through communication, distribution and sales channels.
4. Customer relationships: each value proposition offered to a client group establishes a relationship.
5. Key activities: the activities that are required to offer and deliver the value proposition.
6. Key resources: the resources that the organization needs to perform such activities.
7. Key partnerships: the partnership network the organization needs to establish to perform certain activities that it cannot efficiently perform by itself.
The building blocks are organized into a front end, the ‘what’ and the ‘who’, which defines the revenue-building capacity of the organization, and a back end, the ‘how’, which establishes its cost structure.
Business model innovation and cultural heritage
A crucial aspect of this model is the understanding that a business model is not solely about revenue streams. The business model describes the logic by which the organization as a whole creates value; all building blocks are therefore interdependent. Business model innovation occurs when one or more of the building blocks change. This can happen in the back end of the organization (resource driven) or in the front end, when customer needs change (customer driven) or the proposition changes (offer driven). In our case the value proposition of cultural heritage changes when artefacts are migrated from analogue carriers of information to digital assets. Over the past ten years this has happened on a very large scale: millions of books, manuscripts, film footage, even clay tablets have been digitized. With this migration, the variable costs of distribution have decreased tremendously, which in turn opens up ways to reach many new client segments that were previously difficult or very expensive to reach. Whereas the predominant client segment in this sector used to be professional researchers who would come to a physical location to conduct research, a larger public can now be engaged in using the parts of the archive that are of specific interest to them.
Naturally, these new groups have their own specific requirements and therefore will not necessarily be reached through the same channels and services as the professionals. With the change of channel, the organization's relationship with its user groups will shift, as it will often be manifested within an online environment. Instead of a relatively intense, enduring relationship with customers who frequent the building, we are now engaging with a group which may hardly be aware of the institution hosting the material. Similarly, a completely different type of organization may be required to support this new value proposition: interacting in social networks with a group of interested amateurs is fundamentally different from catering to a group of academics who know reasonably well what they are searching for. Instead of a physical building at the centre of the cost structure, we will quite likely see a shift towards a more platform-driven cost structure, and different skill sets will be required at the organizational level. Such an organizational change calls for reconsidering the balance between outsourcing and in-house development: the general public will very likely best be reached on platforms that they already frequent, not on the institution's own site.
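The point that a change in one building block ripples through the others can be illustrated with a small data structure. The sketch below is a hypothetical illustration, not part of the Osterwalder and Pigneur method or of Europeana's own planning: it represents the seven building blocks listed above as fields of a Python dataclass, fills them with descriptions paraphrased from this section for an analogue and a digital institution, and reports which blocks, and which side of the model (front end or back end), have changed. The class, field and function names are illustrative assumptions.

```python
from dataclasses import dataclass, fields, replace

# Front end ("what" and "who") vs back end ("how"), as described in this chapter.
FRONT_END = {"customer_segments", "value_proposition", "channels", "customer_relationships"}
BACK_END = {"key_activities", "key_resources", "key_partnerships"}


@dataclass(frozen=True)
class BusinessModel:
    """Hypothetical record of the seven building blocks listed in this chapter."""
    customer_segments: str
    value_proposition: str
    channels: str
    customer_relationships: str
    key_activities: str
    key_resources: str
    key_partnerships: str


def changed_blocks(before: BusinessModel, after: BusinessModel) -> list[str]:
    """Names of the building blocks whose descriptions differ, in declaration order."""
    return [f.name for f in fields(BusinessModel)
            if getattr(before, f.name) != getattr(after, f.name)]


def sides_affected(before: BusinessModel, after: BusinessModel) -> set[str]:
    """Report whether a change touches the front end, the back end, or both."""
    changed = set(changed_blocks(before, after))
    sides = set()
    if changed & FRONT_END:
        sides.add("front end")
    if changed & BACK_END:
        sides.add("back end")
    return sides


# Illustrative descriptions paraphrased from the text, not data from any institution.
analogue = BusinessModel(
    customer_segments="professional researchers",
    value_proposition="access to physical artefacts",
    channels="reading room visits",
    customer_relationships="intense, enduring, on site",
    key_activities="curation and on-site service",
    key_resources="physical building",
    key_partnerships="other heritage institutions",
)
digital = replace(
    analogue,
    customer_segments="researchers and the general public",
    value_proposition="online access to digitized artefacts",
    channels="web portals and social networks",
    customer_relationships="light, online, often anonymous",
    key_activities="digitization and platform operation",
    key_resources="technical platform and new skill sets",
    key_partnerships="search engines and platforms the public already uses",
)

if __name__ == "__main__":
    print(changed_blocks(analogue, digital))
    print(sides_affected(analogue, digital))
```

In this toy comparison all seven blocks differ, mirroring the observation that digitization affects every aspect of the model; changing only the value proposition would be flagged as a front-end (offer-driven) change.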
When the changes occurring in this industry are plotted on the business model canvas, it becomes apparent that this product-driven innovation has fundamental repercussions on all aspects of the business model. The potential for additional value creation is considerable now that a much broader client group can be served at relatively low additional cost (SEO, 2006). But in order to reach this new client group and engage with it, many other parts of the business model will have to change as well. Most notably, the need to forge new partnerships increases, especially in areas of activity not traditionally covered by this sector.
Europeana
The idea for a European Digital Library (EDL) was first formally mooted in a letter (Digital Library Initiative, 2005) to the Presidency of the Council and to the Commission on 28 April 2005. In that letter, six Heads of State or Government, led by the French president Jacques Chirac, asked for the creation of a virtual European library, aiming to make Europe's cultural and scientific resources accessible to all. This initiative should be seen in the light of numerous projects to make heritage resources available, most notably the Google Books initiative. In fact the European Digital Library, which was later named Europeana, was at least in part a reaction to it. Notwithstanding Google's great achievements as a search engine (Kangas, 2007), the European heritage was deemed too important to be entrusted to a single private American company. The letter to Mr. Barroso was therefore received as a plea to make these digitized artefacts available in a sustainable way, with a strong focus on reinforcing the public domain. This notion immediately formed a crucial aspect of the business model, which was yet to be designed.
On 30 September 2005 the European Commission published ‘i2010: communication on digital libraries’, in which it announced its strategy to promote and support the creation of a European digital library. The speedy integration of the initiative into the European Information Society i2010 programme marked the European Digital Library as an important pillar of the Union's ambition to become the most competitive and dynamic knowledge economy by 2010. With the digital library initiative embedded in the i2010 policy, other aspects of the business model started to emerge: the digital library was to contribute to the i2010 goals, which aim to foster (economic) growth and jobs in the information society and media industries. Therefore, even before the concept of Europeana had crystallized, two important aspects of the business model had already taken shape: Europeana was to support both the socio-cultural aim of the Union to make its heritage widely available to all and an economic aim of contributing to growth. The real work started in July 2007, when a thematic network, EDLnet, was funded by the European Commission under the eContentplus programme as part of the i2010 policy. The network consisted of over 100 representatives of heritage and knowledge organizations from all parts of Europe, who assisted the core team (based at the National Library of the Netherlands) in solving the technical, usability and business issues of the project. The project
gained a head start by building on much of the work done by the Conference of European National Librarians on The European Library, also based at the National Library of the Netherlands. During this phase the goals and structure of the initiative took shape. Europeana was to become an integrated portal, an ‘aggregator of aggregators’, facilitating access to European cultural digital objects. Specifically, the statutes of Europeana listed the following objectives:
– Providing access to Europe's cultural and scientific heritage through a cross-domain portal
– Co-operating in the delivery and sustainability of the joint portal
– Stimulating initiatives to bring together existing digital content
– Supporting the digitization of Europe's cultural and scientific heritage
In practice this would entail combining widely differing metadata structures and a large variety of data sources ranging from manuscripts to video, and overcoming numerous other technical, legal and cultural hurdles. A not insignificant part of the challenge was to design a sustainable business model that would ensure the longevity of the initiative beyond the initial project funding. On 20 November 2008 the prototype Europeana.eu was launched by Viviane Reding, European Commissioner for Information Society and Media. The site, which by then provided access to two million digital objects from the collections of EU cultural institutions, quickly collapsed under the massive attention it received: an estimated 10 million hits an hour forced the organization to take the site down in order to double the capacity of the infrastructure.
Designing the model
In view of the history of the project described in the previous section, it was clear from the start that the mission of Europeana would be twofold: to serve the socio-cultural interest of the Union in making its shared heritage available to all, and to fulfil the Union's ambition to become a leading knowledge economy.
In plain terms: to make the material available free of charge to the end user while at the same time facilitating the generation of new (market) revenues. The operation itself would cost about 3 million euros a year to sustain, but the ambitions reached further than that: Europeana was to stimulate socio-economic growth for society as a whole. These goals are of course not necessarily mutually exclusive, but reconciling them was felt to be a serious conceptual challenge. A workgroup within the thematic network was asked to take this challenge back to the drawing board and to design a suitable model, a ‘rationale to create, capture and deliver value’ with Europeana.
Proposition: The Long Tail and the value of aggregation
As argued above, the principle that Europeana would, from a cultural perspective, have to be as open as possible to end users was quickly accepted as an axiom. One of the most critical elements to be resolved, therefore, was the economic value of such an open platform and its implications for the business model. On a per
artefact level, studies have shown that the market value of cultural heritage is limited (Kaufman 2009). Only a small part of the content will represent a sufficiently high value to a sufficiently large audience for its revenues to exceed the costs of making it available in digital format.
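The per-artefact argument, and the long tail argument that follows, can be made concrete with a toy model. The sketch below is purely illustrative and rests on assumptions that appear nowhere in the studies cited: yearly revenue per artefact falls off as 1/rank (a Zipf-like demand curve), the collection holds 10,000 items, and the figures for top-item revenue and per-item cost are invented. It counts how many artefacts break even and compares the total revenue of the collection with its total cost.

```python
def artefact_revenue(rank: int, top_revenue: float) -> float:
    """Hypothetical yearly revenue of the artefact at a given popularity rank (1/rank decay)."""
    return top_revenue / rank


def break_even_count(n_items: int, top_revenue: float, cost_per_item: float) -> int:
    """Number of artefacts whose revenue at least covers the cost of offering them."""
    return sum(1 for rank in range(1, n_items + 1)
               if artefact_revenue(rank, top_revenue) >= cost_per_item)


def collection_balance(n_items: int, top_revenue: float,
                       cost_per_item: float) -> tuple[float, float]:
    """Total revenue and total cost for the whole collection."""
    total_revenue = sum(artefact_revenue(rank, top_revenue)
                        for rank in range(1, n_items + 1))
    total_cost = n_items * cost_per_item
    return total_revenue, total_cost


if __name__ == "__main__":
    N_ITEMS, TOP_REVENUE = 10_000, 100.0  # invented figures, no currency implied
    for cost in (12.0, 1.5):  # e.g. per-item cost before and after cheap digital distribution
        revenue, total_cost = collection_balance(N_ITEMS, TOP_REVENUE, cost)
        print(f"cost per item {cost}: {break_even_count(N_ITEMS, TOP_REVENUE, cost)} "
              f"items break even; revenue {revenue:.0f} vs cost {total_cost:.0f}")
```

With these made-up numbers, cutting the per-item cost from 12 to 1.5 raises the number of artefacts that break even from 8 to 66, yet total revenue across the collection stays far below total cost: the break-even point moves along the curve without making the venture as a whole self-financing.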
Figure 2: Cost/value ratio per artifact
Figure 3: The long tail effect
Long Tail
However, in a digital environment, where the costs of distribution come close to nothing, a long tail effect can be expected. Economic value will be derived not only from a small number of ‘hits’ (the high-volume head of the traditional demand curve) but also from the endlessly long tail of the revenue distribution. Or, as Anderson has famously put it, one ends up ‘selling less of more’ (Anderson 2006). Nevertheless, unlike in the (mostly cultural and creative) environments Anderson refers to, the total economic market value of Europeana's content is still estimated to be significantly lower than the total costs needed to ‘share Europe's culture and heritage in an online world’. The break-even point on a per-item level is expected to move to a more favourable point on the curve, but it will be a long way from sustaining the economics of
the venture. Besides, Europeana is unlikely to have any commercial rights to exploit the content itself.1
Aggregation
By opening up a critical mass of cultural content in aggregated form, however, another dynamic comes into play. The value of the proposition is no longer dictated by the value of individual artefacts but by the whole: the value lies not so much in the artefact itself as in its relationships to other artefacts. These relationships could be a marketable product in themselves and generate income to sustain the project. Such an approach can lead to a self-propelling network effect between cultural investment and economic demand: a critical mass of supply opens itself to demand, increasing demand drives additional supply, and so on. This dynamic can attract advertisers and affiliated services, but also search engines and semantic operators looking for new ways to extract value from aggregated content, metadata and user profiles. At national and European level, the economic spin-off effects of attracting such a class of partners and clients can be far greater than the direct revenue secured for the Europeana service itself: many will see in Europeana opportunities to broaden their working domains and horizons, set up new businesses and create new services. Europeana had articulated the ambition to bring together millions of artefacts or surrogates that it does not own and for which it may have little leeway for commercial exploitation. The cultural impact of this endeavour was not much debated. The economic aspect, however, was considered an issue: how much revenue could be expected from the market, and how much funding would be necessary from other sources?
Client segments
The business model for Europeana comprises the building blocks mentioned above. As the most important questions centred on the proposition for the various client groups, these building blocks became central to the design.
Based on the value definition as explained above, the following three client segments were identified:
1 Europeana will not host content, but will merely link to content on external sites. It was clear from the start that the business model would have to operate within the framework of limitations such as copyright issues. These issues have obviously been taken into account but were not a principal driver of the business model. Work in this field will be undertaken in the affiliated project Europeana Connect.
Figure 4: client groups
Group 1 - End Users
In order to stay close to its mission, Europeana needed to keep the service as open as possible to end users. A subscription model for a closed service would therefore not be a suitable revenue model. The offering to this group is primarily cultural: many more individuals will enjoy access to these rich resources. The economic returns from offering the content free of charge to end users will therefore be mostly indirect. Provided that Europeana is able to attract a high level of (returning) users, this can attract revenue streams from market partners such as advertising agencies, search engines, companies dealing with the semantic enrichment of data, and other content providers. A high level of use will also be a metric for another group with a stake in this operation: administrations at the European level may see it as a sign of increased social and cultural cohesion in the Union, while individual countries and institutions will find Europeana a useful new window in which to display their artefacts.
Group 2 - Sponsors
Europeana's funding partners are essential stakeholders willing to invest in the initiative, foreseeing direct cultural and indirect economic returns (spin-off). Public funding could therefore be expected from the European Commission, the EU member states and the member heritage institutions. Participating institutions, however, were quickly dismissed as viable sponsoring partners: the incentive for them is too low, the cost of securing the funding too high, and they were already contributing in kind (effort, links to their content, metadata). For the Commission, already a strong supporter of the Europeana project, the cultural diversity aspect of Europeana is key. As we have tried to demonstrate, this cultural diversity will be the main value proposition for the coming years; continued significant support from the Commission therefore seems appropriate, even after the end of the project period
in 2011. Additionally, supporting open content distribution in uniform formats may spur economic activity in related sectors, such as semantic operators. For the member states, participation in such an endeavour will secure the international availability and circulation of national content, thereby creating a new window for national cultural and linguistic heritage. Member heritage institutions can in turn showcase the treasures in their holdings, and do so in a broader context.
Group 3 - Market
Overall, and based also on our experience with similar projects, the expectation after this exercise was that public funding would comprise around 95% of Europeana’s income in 2011, declining to about 85% in 2015 (SEO, 2006). Market revenue would rise correspondingly as the services developed and as additional content and use made the overall service more attractive. For search engines and semantic operators, Europeana’s high-quality, aggregated and contextualized content, enriched with metadata, represents additional income opportunities through new or different revenue windows. For corporate sponsors, Europeana is a means to pursue corporate goals more effectively, and it offers them increased visibility. A small amount of Europeana’s funding could be expected to come from this group, but the business development resources needed to secure it were expected quickly to outweigh the revenue.
At the time of writing (November 2009), the ambitions of Europeana have far from dwindled. The European Commission has re-endorsed the need for this endeavour, and many hundreds of experts around Europe are working hard to make this venture a reality. Many hurdles have yet to be overcome. Rights-holder issues in particular continue to be a challenge, but technological and cultural issues also require constant attention.
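The funding projection above gives only two end points: roughly 95% public funding in 2011 and about 85% in 2015, with the market supplying the rest. The sketch below fills in the intermediate years under an assumption the chapter does not make: that the shift is linear. The function name and defaults are hypothetical and purely illustrative.

```python
# Hypothetical sketch: linear interpolation of Europeana's public/market
# funding split between the two end points quoted in the text.
# The straight-line path between 2011 and 2015 is an assumption.


def funding_split(year: int,
                  start_year: int = 2011, end_year: int = 2015,
                  start_public: float = 0.95,
                  end_public: float = 0.85) -> tuple[float, float]:
    """Return (public_share, market_share) for a year in the projection."""
    if not start_year <= year <= end_year:
        raise ValueError("year lies outside the projected period")
    # Fraction of the way from the first to the last projected year.
    fraction = (year - start_year) / (end_year - start_year)
    public = start_public + fraction * (end_public - start_public)
    return public, 1.0 - public


for y in range(2011, 2016):
    public_share, market_share = funding_split(y)
    print(f"{y}: public {public_share:.1%}, market {market_share:.1%}")
```

A real plan would replace the straight line with year-by-year estimates of market income, since growth in use and content is unlikely to be perfectly even.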
In fact, as with any emerging business, the conceptual shape of Europeana itself continues to be debated while the project is taking shape, and the business model evolves along with it.
Summary
- In the context of this chapter a business model is understood to be ‘the rationale of how an organization creates, delivers and captures value’.
- The cultural heritage sector is challenged by digitization to innovate with regard to its business models. The value proposition of cultural heritage changes when artefacts are migrated from analogue carriers of information to digital assets. Over the past 10 years this has happened on a very large scale: millions of books and manuscripts, film footage and even clay tablets have been digitized. By identifying and securing new client groups a great deal of new value can be created, but this will have implications for all aspects of the business model.
- Europeana is a cultural heritage initiative with specific challenges and opportunities. Even before the concept of Europeana had crystallized, two important aspects of the business model had already taken shape: Europeana was to support both a
social and cultural aim of the Union to make its heritage widely available to all and an economic aim to contribute to growth.
- Europeana has articulated the ambition to bring together millions of artefacts which it does not own and from which it can expect only indirect commercial benefit. The key issue is how much revenue could be expected from the market, and how much funding would be necessary from other sources.
- The model assumes the “long tail” effect and has been built around three client segments: end users, sponsors (including public subsidy) and the market.
- The model targets a sponsorship-to-market ratio of 95:5 in the first instance, progressing to 85:15 over a five-year period.
References
Anderson, C. (2006), The Long Tail: How Endless Choice Is Creating Unlimited Demand. New York, Hyperion.
Bishoff, L. and Allen, N. (2004), Business Planning for Cultural Heritage Institutions: A framework and resource guide to assist cultural heritage institutions with business planning for sustainability of digital asset management programs. Washington D.C., Council on Library and Information Resources. www.clir.org/pubs/reports/pub124/pub124.pdf (viewed 24 May 2010)
Boyle, J. (2006), The Public Domain: Enclosing the Commons of the Mind. New Haven, Yale University Press.
Digital Library Initiative (2005), Letter from six heads of state to the President of the European Commission, José Manuel Barroso. http://ec.europa.eu/information_society/activities/digital_libraries/doc/letter_1/index_en.htm (viewed 24 May 2010)
Jokilehto, J. (ed.) (2005), “Definitions of Cultural Heritage”, ICCROM Working Group ‘Heritage and Society’.
Kangas, P., Toivonen, S. and Back, A. (eds) (2007), Ads by Google and other social media business models, VTT Tiedotteita - Research Notes 2384.
Kaufman, P. (2009), On Building a New Market for Culture. http://www.jisc.ac.uk/media/documents/publications/scaintelligenttvsponsorshipreport.pdf (viewed 24 May 2010)
Osterwalder, A. (2007), How to describe and improve your business model to compete better. http://www.slideshare.net/Alex.Osterwalder/describe-and-improve-your-business-model (viewed 24 May 2010)
Osterwalder, A. and Pigneur, Y. (2009), Business Model Generation, preview available at http://www.businessmodelgeneration.com/downloads/businessmodelgeneration_preview.pdf (viewed 24 May 2010)
SEO (2006), Images for the Future: Costs and Benefits, available at http://www.imagesforthefuture.org/en/319/Costs_and_Benefits (viewed 24 May 2010)
Surowiecki, J. (2004), The Wisdom of Crowds: Why the Many Are Smarter Than the Few and How Collective Wisdom Shapes Business, Economies, Societies and Nations. London, Little Brown.
Wall Communications (2002), “A study of business models sustaining the development of digital cultural content”, available at www.wallcom.ca/Documents/DigitalCultContent.doc (viewed 24 May 2010)
Le Monde (2009), Google voudrait créer la plus grande librairie privée de l’histoire [Google would like to create the largest private library in history], available at http://www.lemonde.fr/culture/article/2009/08/18/la-bnf-negocie-avec-google-la-numerisation-de-son-patrimoine_1229430_3246.html (viewed 31 August 2009)
3 DIGITAL LIBRARIES IN HIGHER EDUCATION
Derek Law
Vision and mission of the digital library
Higher Education libraries tend, rather thoughtlessly, to be considered a necessary, if expensive, part of a university which requires little justification. By extension the same approach often characterizes digital libraries. And yet without a clear understanding of the purpose of such libraries, expensive white elephants can all too easily be built. Visions and missions for digital libraries are quite rare, and this matters: in the absence of any common view of the nature of the digital library in Higher Education, it is all the more important to clarify why a digital library is being created. A surprisingly large number of digital libraries appear to have neither vision nor mission, at least according to their websites. This may be because they are embedded in their larger institutional libraries. Imperial College, Cambridge University and Columbia University, for example, are all silent on their large digital library programmes. Others are more forthcoming and usefully illustrate the differences in what is being attempted. Perhaps the most ambitious, and certainly the clearest, mission statement comes from the British Library (2009): “The Digital Library Programme’s mission is to enable the United Kingdom to preserve and use its digital output forever.” The vision is equally crisp:
“Our vision is to create a management system for digital objects that will
– ingest, store and preserve any type of digital material in perpetuity
– provide access to this material to users with appropriate permissions
– ensure that the material is easy to find
– ensure the authenticity of the material
– ensure that users can view the material with contemporary applications
– ensure that users can, where possible, experience material with the original look-and-feel”
Other definitions are either fairly generic or concerned with process.
The Oxford Digital Library (ODL) has Principles and Guidelines which commit it to “build up a significant set of digital resources for local and remote online access. Like traditional collection development, long-term sustainability and permanent availability are major goals for the ODL. Therefore the use of standards in the digital conversion process and for the description of digital resources will be essential for projects funded through and supported by ODL” (Oxford, 2009). Harvard notes that its “Initiative was consciously constructed on a different model: the integration of digital resources into the existing
library structure. Integrated access to the collections, regardless of format, was a key aim of the Library Digital Initiative” (Harvard, 2009). At Hong Kong University Library “more digital projects are being developed to provide continuous access to digital content and services” (Hong Kong, 2009). And finally the Glasgow Digital Library provides “a distributed digital library based in Glasgow which aims to produce a coherent digital learning and information environment for Glasgow’s citizens, through the development and implementation of a common collection development policy and an agreed technical and inter-working infrastructure” (Glasgow, 2009). One of the longest-standing digital libraries is the California Digital Library, which is explored elsewhere in this volume, where its origins and mission are fully described. As far as its current public presence on its website goes, however, it “supports the assembly and creative use of the world’s scholarship and knowledge for the University of California libraries and the communities they serve” (California, 2009). In addition, the CDL “provides tools that support the construction of online information services for research, teaching, and learning, including services that enable the UC libraries to effectively share their materials and provide greater access to digital content.” This lengthy set of examples is intended to demonstrate that there remains a degree of vagueness in public statements as to why digital libraries are being created. This is understandable in cases where these libraries were early explorers and adopters of the digital world. However, after the project and exploration stage there is a real need for clarity of purpose if the case for creating a digital library is to be made and accepted. Even rarer than the sort of vision and mission described above is a link to the institutional strategic plan.
And yet if the digital library is to have a context and institutional support, it must plainly be part of delivering the overall aims and objectives of the institution.
The business case
Interestingly, none of the visions described above considers the business case for investing in digital libraries. University libraries do not stand in isolation, although all too often they appear isolated – or at least insulated – from the rest of the institution. If the business of the university is loosely defined as teaching and research, the business case for creating digital libraries should relate to how that business is supported, promoted and ideally enhanced. Tenopir (2009) has begun some work in this area, and while the evidence thus far is qualitative rather than quantitative, it does address how digital libraries promote research. Allowing researchers to avoid lines of research already pursued by others, exploring what will assist grant applications, identifying other research teams for joint research proposals, exploring interdisciplinary boundaries in novel ways, gaining access to raw research data and so on form the basis for a powerful business case built around the general well-being of the institution rather than the exclusive well-being of the library. Even the well-documented increase in the number of articles read by researchers may form part of a justification (Tenopir et al, 2009). A recent CIBER (2009) study also shows massive increases in the use of electronic resources. Journals are the lifeblood of research, and there is a strong correlation between e-journal use and research outputs. The ROI (Return On Investment) can reasonably be demonstrated to lie in how researchers perform rather than simply in how the library is used.
Higher education libraries have not hitherto had to undertake a great deal in the way of financial planning. In the United Kingdom, for example, the budget is typically last year’s figure plus a few per cent. Until very recently the Library was seen as a necessary, if expensive, part of the fabric of any university. Of course the budgets were and are very well managed, but little was needed in the way of business planning, and such revenue generation as was undertaken tended either to be for endowments or to pay for new services whose costs were readily identifiable, whether photocopying, interlending or online searching. In essence the Library was simply a cost top-sliced from the University budget or, as is often the case in continental European universities, delegated to faculties. There are more complicated methods of calculating value, using economic models. Attempts have been made to put a monetary value on library activity (Baldwin, 2004) as a way of demonstrating value for money, while the British Library (2004) used contingent valuation to demonstrate that it creates value at 4.4 times the level of its grant. Kelly (2008) has estimated the value of knowledge transfer from university libraries in Scotland to the external community at £7.1 million. This calculation is part of a wider study on the economic impact of universities, which identifies the market value of activities, in this case using nationally collected statistics on the number of visits made to university libraries in Scotland by external users. However, there is a wider issue than identifying costs and attributing them or charging for services. It may well be unhelpful to try to isolate the value and costs of the digital library. University libraries are part of the wider academic exercise, and not simply a cost.
Thus the existence of a great library may be a factor in student recruitment, for example, but this would not necessarily show up in a simple calculation. Roosendaal (2003) makes a key point:
“The economic impact of ICT on the academic institution and its library cannot be discussed in an isolated way…the costs of establishing a digital academic library should be considered as part of the integral costs of establishing an overall digital environment for the entire academic institution.”
Making the business case for a digital library is therefore much more than simply identifying the costs of a new service and attributing income.
Target user group(s)
It is usually straightforward to define the user groups in higher education as either members of the university or some subset of them, such as students. In many cases a digital library begins by supporting learning tasks, such as digitising examination papers and providing a repository for theses. These are clearly aimed at the student community, as is more general support for a digital learning environment. The purchase of access to e-journals will be targeted more clearly at researchers. Most digital libraries focus on content, but the fruitful area of services also needs to be considered. Training can be aimed at all groups, while identifying new research tools such
as Openwetware (2009) or Blue Obelisk (2009) is much more clearly targeted at the research community. Twenty-four-hour reference services can also be facilitated by the use of networks (Davis, 2004) and, even if little used, can be symbolically important.
User design principles
The key but often neglected principle of digital libraries is that they must aim to be integrated into the workflow of users. Availability, for example, is important: we know from web logs that 25% of use is outside the traditional working day and 15% is at weekends (Nicholas, 2009). There is also a need for simplicity. Advanced search tools are used by a vanishingly small number of users and are simply not worth the effort of inclusion (Nicholas, 2009). It would also be valuable to consider how far the library might help the university to meet new requirements placed on it in a cost-effective way. Research funders, for example, increasingly require the research team to maintain project websites for some years after funding ends. They may also require a data curation plan, and they mandate the deposit of research outputs in repositories, as a growing number of institutions now do too. It should be possible to demonstrate that all of these activities fall under the broad heading of the organisation of knowledge, which is the cornerstone of what information science and libraries are about. We also know that users wish to be able to speed through the site, and (perhaps counter-intuitively) that the best researchers speed through fastest (Nicholas, 2009). Speed rather than comprehensiveness should be a key design aim. Coupled with this is familiarity. There is resistance to learning another system when the system you have invested learning time in works well enough. Where possible the digital library should mimic existing popular services.
And, if possible, another aim should be the ability to customise services for individual users, rather than building the one-size-fits-all platforms we have tended to build hitherto. Collaboration and shared infrastructure can help reduce costs and add value, and should be explored. NARCIS (2009) provides access to Dutch science as a whole: almost 250,000 scientific papers as well as datasets, the vast bulk of them available on open access. The Dutch university system, with only 13 universities, is perhaps more easily organised than many, but regional or disciplinary consortia can also follow this approach.
Technical approach: architecture, infrastructure, metadata
Although every case is different, at the extremes there are two kinds of digital library. The first, and perhaps the commoner, is a reflection of the past and perhaps arises from a different era, in which the Library was a taken-for-granted common good. For a decade most higher education librarians have preferred to ignore the mushrooming growth of born-digital material and to focus on the past. They have created consortia to negotiate deals with publishers for managing the commercial output of the journals industry, deals which are usually declared unsatisfactory. They have learnt the uncomfortable truth that increasing access to digital material comes at the price of leasing it rather than owning it in many
cases. At the same time we have generally preferred to digitise the paper collections we already own, rather than deal with the born-digital material our staff and students are creating. At the other extreme lies what may broadly be called the repository movement, which does aim to address institutional outputs. The University of Adelaide, for example, with its Adelaide Research & Scholarship project, offers a sort of hybrid which “provides a platform for the collection, organisation, access, annotation and preservation of information in digital formats, as well as digital management of information in physical formats. Its primary focus is on the scholarly output of members of this University, and items of interest to those members (for example the rich resources of Special Collections)” (Adelaide, 2009). It is not possible for every library to support and manage all of its ‘business processes’, especially as the demands on the library grow and service expectations and technologies change so quickly. Libraries have historically depended on shared platforms for services, and we may be about to see another step change in adoption. The motivation is to remove redundancy and to build capacity through collaboratively sourcing solutions, so as better to focus library effort where it can make a distinctive local impact on the quality of the research and learning environment (Dempsey, 2006). The ideal, then, is perhaps to aim for shared infrastructure, while taking sole responsibility for the bibliographic integrity and metadata of what is produced locally. This may imply a distributed architecture, since researchers, for example, may be expected to wish to hold their own datasets rather than pass them on to the library. This need not matter if the library is seen as the arbiter of policy and standards.
Costing
It is a curiosity of the traditional library that we know very little about its real costs.
We know a great deal about budgets, and thanks to long-standing arrangements for data collection we have excellent time series showing how patterns have changed. But we know almost nothing about the life-cycle costs of libraries and their indirect costs. And yet it is easy to show that the costs of buildings and utilities alone are probably greater than the typical annual library budget. So there are significant dangers in attempting to compare the costs of physical libraries with those of digital libraries. It is still rare for a digital library not to be part of a physical library. How does one assign costs and value? Perhaps the biggest mistake is to do so in the context of the library alone. It is possible to do this mechanistically, as the major LIFE (Life Cycle Information for E-Literature) Project demonstrated. The project is based at University College London (McLeod, 2005) and aims at a complete analysis of all the activities which relate to the management of digital content, from its selection, through licensing and acquisition, to ingest, metadata creation, adding links, access, user support, storage costs and preservation. The study used evidence from large-scale examples to calculate average costs, and so provides at least the basis for a very robust methodology to define digital library costs. The project has made some quite specific estimates of costs, based on British experience:
It established that in the first year of a digital asset’s existence:
– The lifecycle cost for a hand-held e-monograph is £19
– The lifecycle cost for a hand-held e-serial is £19
– The lifecycle cost for a non hand-held e-monograph is £15
– The lifecycle cost for a non hand-held e-serial is £22
– The lifecycle cost for a new website is £21
– The lifecycle cost for an e-journal is £206
LIFE also predicts that in the tenth year of the same digital assets’ existence:
– The total lifecycle cost for a hand-held e-monograph is £48
– The total lifecycle cost for a hand-held e-serial is £14 per issue
– The total lifecycle cost for a non hand-held e-monograph is £30
– The total lifecycle cost for a non hand-held e-serial is £8 per issue
– The total lifecycle cost for a new website is £6,800
– The total lifecycle cost for an e-journal is £3,000
It is interesting to note that the precision of these numbers is almost impossible to match for our print-on-paper libraries. We should also understand that most of these costs will apply to non-commercial material. In all the debate about economics, the huge volumes of grey literature, donations and archives which research libraries acquire each year tend to be neglected, and these costs are just as real for their electronic equivalents in digital libraries. The common view appears to be that digital libraries increase costs (Baker & Evans, 2009). The authors list the cost of digitising material, the cost of copyright clearance, constant technological change and innovation, training, preservation, administration and so on. But it is possible to consider not only the additional costs which a digital library will incur but also the savings which may be made elsewhere. Ayris (2005) has suggested that savings in inter-lending costs could be used to fund a repository. Law (2009) has noted that at Strathclyde the cost of utilities to support a million-volume library was £500,000 in 2007/8.
By investing in electronic materials and disposing of paper stock, Strathclyde plans to halve the physical space the library will occupy. Crudely, this will save half the utility bill, but a more substantial saving will come if the university is able to shed a teaching building in the city centre and use the freed library space for teaching rooms. This reduction in the university estate will yield even greater savings. We know a huge amount about the direct costs of traditional libraries, but almost nothing about their indirect costs and overheads. These could well look very different for digital libraries, not least because some of the costs, such as equipment, move downstream to the user and away from the university. Another interesting, if largely unexplored, area is that of shared services and open source systems. Library automation systems are expensive to purchase and have high recurrent costs. As we move on, there is renewed interest in creating shared services, which are now seen as utility-like functions whose costs can be reduced by sharing. There is also interest in open source software, whether for repositories – such as DSpace, Greenstone and EPrints – or for library management systems such as Koha, OpenBiblio and PhpMyLibrary. One of the curiosities of open source is that university procurement offices can have difficulty accommodating it, since there is no one to tender in a conventional way.
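One rough way to read the LIFE figures quoted above is to compare, for each asset type, the total cost by the tenth year with the first-year cost. The sketch below does this under two assumptions of our own: that the tenth-year figures are cumulative totals (as the wording "total lifecycle cost" suggests), and that the serial figures can be left out because their tenth-year values are quoted per issue. The dictionary and function names are hypothetical.

```python
# Figures (GBP) as quoted from the LIFE project in the text:
# (first-year lifecycle cost, total lifecycle cost by the tenth year).
# Serials are omitted because their tenth-year figures are per issue.
LIFE_COSTS = {
    "hand-held e-monograph": (19, 48),
    "non hand-held e-monograph": (15, 30),
    "new website": (21, 6_800),
    "e-journal": (206, 3_000),
}


def growth_multiple(first_year: float, tenth_year: float) -> float:
    """How many times larger the ten-year total is than the first-year cost."""
    if first_year <= 0:
        raise ValueError("first-year cost must be positive")
    return tenth_year / first_year


for asset, (first, tenth) in LIFE_COSTS.items():
    print(f"{asset}: ten-year total is {growth_multiple(first, tenth):.1f}x the first year")
```

On these figures the monographs roughly double or treble over ten years, while websites and e-journals grow far more, which is one way of seeing why the text stresses life-cycle rather than acquisition costs.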
Anticipated income streams
Lesk (2004) has perhaps done most work on trying to identify how digital libraries might be funded. A traditional British university library consumes about 3% of institutional income and produces very little income beyond fines and photocopying. Of that sum, rather more than half goes on staffing, one third on acquiring content and the remainder on computing services, furniture and supplies. Higher education libraries tend to indulge in the myth that they are free. This has never been true. Although they are generally funded through some form of top-slice of university budgets, a range of services has always been charged for, and/or other services are rationed against a notional charge. Thus services as varied as binding and inter-lending may be charged for. Photocopying is usually charged; on-line searching often was. Some charges, such as fines, are in practice used as a de facto lending fee by at least some students. And, of course, external users (such as health service employees) are often charged, either individually or in bulk. Nevertheless, Collier’s (2004) work on TEL (The European Library) produced a business case resting on providing value for money rather than on generating income, based on several criteria:
– That digitised content would be important
– That the library should concentrate on material not readily available elsewhere
– That material should be as far as possible free at the point of use
– The strong public service ethic of the participating national libraries
Interestingly, this work focused on content and not on services, which are a much neglected area of digital libraries, but most higher education libraries would feel comfortable with this approach. Subsidy from the parent institution will almost certainly prove to be the default mode of funding.
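The proportions attributed to Lesk (2004) above can be turned into a simple budget breakdown. This is only an illustrative sketch: "rather more than half" for staffing is modelled as an assumed 55%, and the £150 million institutional income in the example is an invented figure, not one from the text.

```python
# Hypothetical sketch: break down a library budget using the shares
# attributed to Lesk (2004). The 55% staffing share is an assumption
# standing in for "rather more than half".


def library_budget(institutional_income: float,
                   library_share: float = 0.03,
                   staffing_share: float = 0.55,
                   content_share: float = 1 / 3) -> dict[str, float]:
    """Return the library budget and its split by main spending heading."""
    library = institutional_income * library_share
    staffing = library * staffing_share
    content = library * content_share
    # What remains covers computing services, furniture and supplies.
    other = library - staffing - content
    if other < 0:
        raise ValueError("staffing and content shares exceed the whole budget")
    return {"library": library, "staffing": staffing,
            "content": content, "other": other}


for heading, amount in library_budget(150_000_000).items():
    print(f"{heading:>8}: £{amount:,.0f}")
```

Changing `staffing_share` shows how sensitive the residual for computing and supplies is to the exact meaning of "rather more than half".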
A now somewhat dated but still valid study of Digital Library Federation members in 2001 (Greenstein and Thorin, 2002) showed that “ordinary” library funds were being diverted into support for digital library activities. A range of other models has been proposed:
– Subscription: password-based access, with a fee for the password
– Pay per use: micropayment systems would pay for this, and we know from experience with photocopying and printing that there is money available in the system.
– Licensing: it is less clear whether this would be charged directly or as a kind of lab fee.
– Author/owner pays: experiments with this are so far inconclusive.
– Sponsorship: there is little evidence that this would prove viable, not least at a time of great economic stringency.
– Advertising: this is an area still to be explored. However, early experiments have proved discouraging, both in the level of income generated and in the desire of various bodies to have powers of veto over what may be advertised.
Marketing
There is a certain irony in the fact that the more transparent and easy to use a digital library becomes, and the more integrated it becomes with network tools such as Google or larger resources such as Amazon, the less visible and obvious it becomes to users. This is particularly important with senior academic staff, as increasingly those who make decisions about the size and shape of library budgets are the least likely to visit and use the library as a place. Marketing the library’s services as well as its content thus becomes a critical activity. Publishers have brands which they work hard to maintain. Very few libraries, other than great historic ones such as the Bodleian Library, have brands, which makes marketing all the more important. The one thing we have learned above all from digital services, products and environments is that the market will decide, not quality or price or even availability. It is vital to be responsive to user needs and to recognise that the user has alternatives.
Risk analysis
Some of the risks to the digital library are obvious, and to an extent generic. Power failure, flood, fire and theft all have huge potential to cause damage. But perhaps the area which requires most management is access to content. The loss of access to leased data through financial problems, the loss of access to data through government prohibition or the failure of companies, the balance between locally hosted and externally accessed data, and LOCKSS (2009) type arrangements to ensure data security all require careful consideration. Other risks may be associated with staffing problems, the failure of partnership agreements between organizations, key stakeholders not buying into or supporting the project, and technical hardware and software issues. As always, the best strategy is to spread the risk.
No one member of staff should be critical, there should be multiple partners, there should be alternative or overlapping stakeholders, and the use of generic hardware and software will allow easy replacement. Most basic of all, creating, maintaining and monitoring a risk register is now considered good practice.
Implementation plan
Implementation plans are de rigueur for all major projects. Again, the key is not so much the creation of the plan as its monitoring and maintenance. There is also a danger that such plans can be monolithic rather than responsive. Ideally the plan should recognise the importance of a feedback loop, so that system design responds to users. One substantial tool available for this is log analysis. Surprisingly little attention is paid to this rich source of information.
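As a small example of the log analysis recommended above, the sketch below computes the kind of usage figures cited from Nicholas (2009): the share of requests made at weekends and outside the working day. It assumes web server logs in the Apache Common Log Format and defines the "traditional working day" as Monday to Friday, 09:00 to 17:00 server time; both are our assumptions, as are the sample log lines and paths. Because the chapter does not say whether its 25% figure includes weekends, the sketch reports weekend requests separately and also counts them as outside the working day.

```python
import re
from collections.abc import Iterable, Iterator
from datetime import datetime

# Bracketed timestamp of an Apache Common Log Format line,
# e.g. [10/Oct/2000:13:55:36 -0700].
TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")

# Assumed "traditional working day": Monday-Friday, 09:00-17:00 server time.
WORKDAY_START, WORKDAY_END = 9, 17


def parse_timestamps(lines: Iterable[str]) -> Iterator[datetime]:
    """Yield a datetime for every log line that carries a CLF timestamp."""
    for line in lines:
        match = TIMESTAMP.search(line)
        if match:
            yield datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")


def usage_profile(timestamps: Iterable[datetime]) -> dict[str, float]:
    """Share of requests at weekends, and outside the working day (weekends included)."""
    times = list(timestamps)
    if not times:
        return {"weekend": 0.0, "outside_working_day": 0.0}
    weekend = sum(1 for t in times if t.weekday() >= 5)
    outside = sum(1 for t in times
                  if t.weekday() >= 5 or not WORKDAY_START <= t.hour < WORKDAY_END)
    return {"weekend": weekend / len(times), "outside_working_day": outside / len(times)}


# Hypothetical sample lines: Tuesday daytime, Tuesday evening, Saturday, Monday daytime.
sample = [
    '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /search HTTP/1.0" 200 2326',
    '127.0.0.1 - - [10/Oct/2000:20:10:02 -0700] "GET /ejournal HTTP/1.0" 200 512',
    '127.0.0.1 - - [14/Oct/2000:11:00:00 -0700] "GET /ebook HTTP/1.0" 200 1024',
    '127.0.0.1 - - [16/Oct/2000:09:30:00 -0700] "GET /search HTTP/1.0" 200 2048',
]
print(usage_profile(parse_timestamps(sample)))  # {'weekend': 0.25, 'outside_working_day': 0.5}
```

In practice one would count sessions or distinct users rather than raw requests, and filter out robots, before drawing design conclusions from such shares.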
Financial planning
A financial plan is much more important for a digital library than for a conventional one. If nothing else, the experience of working with library automation systems shows the need for a proper understanding of equipment replacement cycles. Brewer (2002) describes a service at the University of Derby which, although very popular, did not prove sustainable. This emphasises the need for the business plan to reflect the needs of the institution, and in turn for the financial plan to be realistic about where investment should be concentrated to provide a service tailored to those needs. Concepts such as break-even point and ROI may not be entirely quantifiable, as discussed above, but there should certainly be a clear understanding of when and how the digital library will reach some kind of equilibrium. There will undoubtedly be new and additional costs, and these should be clearly understood and budgeted for. If other costs (such as interlending) are to be netted off against these new costs, there needs to be a shared understanding of what this means.
Conclusions
The economics of digital libraries in higher education remain at a very primitive stage, not least because our understanding of the overall costs of traditional libraries is so incomplete. Many digital libraries appear to emerge as services from projects without a clear understanding of their role, function and cost. Yet these are the most important factors. Clear articulation of a vision and mission is an essential prerequisite for the creation of the business case, technical design and service definition which must lie at the heart of the development.
Summary
– Visions, missions and business plans for DLs are quite rare in higher education, and the understanding of their economics is at a primitive stage.
– Business planning and business cases may be essential in a way in which they were not for traditional libraries.
– The business case should ideally be based on the institutional strategy.
– User design and responsiveness are even more important for DLs than for traditional libraries, as students and researchers change their habits.
– Planning should encompass two distinct aspects: access to the published sources, and access to and preservation of institutional research or digitised content.
– Costing should cover total costs of ownership. It is not safe to assume that the total cost of DLs is higher than that of traditional libraries over time.
– A focus on services may open up new income streams.
– As the institutional DL becomes increasingly hidden behind global web services, more local marketing and PR may be needed to demonstrate value and gain credit for the initiative.
References
Adelaide (2009) http://digital.library.adelaide.edu.au/ (viewed 5 June 2010)
Ayris, Paul (2005) Note of Research Communications Forum One-day Conference – 7 March 2005. www.berr.gov.uk/files/file10867.doc (viewed 5 June 2010)
Baker, David & Evans, Wendy (2009) Digital Library Economics: the key themes. In Baker, D. & Evans, W. Digital Library Economics: an academic perspective. Cambridge, Chandos.
Baldwin, Jerry (2004) Mn/DOT Library Accomplishments. TRUpdate 9, Spring.
Blue Obelisk (2009) http://blueobelisk.sourceforge.net/wiki/Main_Page (viewed 5 June 2010)
Brewer, Gordon (2002) The University of Derby Electronic Library: a case study of some economic and academic aspects of a local digitised collection. Program: electronic library and information systems 36, pp. 30-37.
British Library (2004) Measuring Our Value. http://www.bl.uk/pdf/measuring.pdf (viewed 5 June 2010)
British Library (2009) http://www.bl.uk/aboutus/stratpolprog/digi/dom/mission/index.html (viewed 5 June 2010)
California Digital Library (2009) http://www.cdlib.org/ (viewed 5 June 2010)
CIBER (2009) E-journals: Their Use, Value and Impact. London, RIN.
Collier, M. (2004) Development of a business plan for an international co-operative digital library - The European Library (TEL). Program 38, no. 4, pp. 225-231.
Davis, K. & Scholfield, S. (2004) “Beyond the virtual shore”: an Australian digital reference initiative with a global dimension. Library Review 53, pp. 61-65.
Dempsey, L. (2006) The (Digital) Library Environment: Ten Years After. Ariadne, no. 46. http://www.ariadne.ac.uk/issue46/dempsey/ (viewed 5 June 2010)
Glasgow (2009) http://gdl.cdlr.strath.ac.uk/documents/gdloverview.htm (viewed 5 June 2010)
Greenstein, D. & Thorin, S. (2002) The Digital Library: a Biography. Digital Library Federation, Washington, DC.
Harvard (2009) http://hul.harvard.edu/ois/ldi/ (viewed 5 June 2010)
Hong Kong (2009) http://lib.hku.hk/database/ (viewed 5 June 2010)
Kelly, Ursula, McNicoll, Iain & Brooks, Richard (2008) Towards the estimation of the economic value of the outputs of Scottish HEIs: Next Steps Project. Final Report. Glasgow, University of Strathclyde. http://eprints.cdlr.strath.ac.uk/3106/ (viewed 5 June 2010)
Law, D. (2009) An awfully big adventure: Strathclyde’s digital library plan. http://www.ariadne.ac.uk/issue58/law/ (viewed 5 June 2010)
Law, D. (2009a) Digital Library Economics: Aspects and Prospects. In Baker, D. & Evans, W. Digital Library Economics. Oxford, Chandos.
Lesk, Michael (2004) In Andrews, J. & Law, D. Digital Libraries. Aldershot, Ashgate.
LOCKSS (2009) http://www.lockss.org/lockss/Home (viewed 5 June 2010)
McLeod, R., Wheatley, P. & Ayris, P. (2006) Lifecycle information for e-literature: full report from the LIFE project. Research report. LIFE Project, London, UK.
NARCIS (2009) http://www.narcis.info/index/Language/EN/ (viewed 5 June 2010)
Nicholas, D. (2009) What is beyond books and journals? Pointers from CIBER’s Virtual Scholar programme. Third Bloomsbury Conference on e-publishing and e-publications, 25 & 26 June 2009. Beyond Books and Journals. Programme available at http://www.ucl.ac.uk/infostudies/epublishing/e-publishing2009/ (viewed 5 June 2010)
Openwetware (2009) http://openwetware.org/wiki/Main_Page (viewed 5 June 2010)
Oxford (2009) http://www.odl.ox.ac.uk/principles.htm (viewed 5 June 2010)
Roosendaal, H., Huibers, T., Geurts, P. & van der Vet, P. (2003) Changes in the value chain of scientific information: economic consequences for academic institutions. Online Information Review 27, pp. 120-128.
Tenopir (2009) The study is being conducted for Elsevier and will be published once results are finalised. The grant to fund this is noted at http://web.utk.edu/~tenopir/research/grants.html (viewed 5 June 2010)
Tenopir, C., King, D., Edwards, S. & Wu, L. (2009) “Electronic Journals and Changes in Scholarly Article Seeking and Reading Patterns.” Aslib Proceedings: New Information Perspectives 61, pp. 5-32.
4 DIGITAL LIBRARIES FOR THE ARTS AND SOCIAL SCIENCES
Ian Anderson
Introduction
Over the last fifteen years digital libraries have rapidly evolved from small-scale, independent, experimental projects into large-scale programmes which are increasingly integrated into the activities of the ‘bricks and mortar’ library (Greenstein and Thorin, 2002). In so doing they have moved from seeking to develop their own ‘killer applications’ to modular systems which increasingly use international standards and are offered as customisable open source infrastructures. Moreover, recent years have seen a concerted research effort to develop reference models for digital libraries and to produce models, risk assessments and audit tools for digital repositories and digital preservation. At the same time e-journals, electronic theses, institutional repositories and mass-digitisation programmes, particularly in the arts and humanities, have made an unprecedented range of content available to populate the new digital library infrastructure. Developments in more specialist fields such as metadata standards and interoperability, data interchange, and storage and network capacity have all contributed to making digital libraries viable, large-scale, global prospects. As other chapters in this book indicate, however, research has not kept pace with these developments where the business planning of digital libraries is concerned. Critical issues such as market analysis and segmentation, user evaluation, impact measurement, costs, income, financial planning, ROI, marketing and risk analysis are more opaque than they ought to be. Furthermore, the very profusion of research outputs, the rapid pace of technological development and the seemingly never-ending debate over what is and isn’t a digital library can make the whole enterprise seem so bewilderingly complex as to prevent its execution with any degree of certainty.
Justifying an arts and social science approach
Nor is the process helped by the bulk of digital library research being framed in terms of domain analysis, such as architecture, access, content, interoperability, preservation and evaluation, rather than subject knowledge. This has been a characteristic of information science research for the best part of thirty years and is not peculiar to digital libraries (Hjørland, 2005). This is not the place to debate the pros and cons of subject knowledge in bottom-up or top-down models of information science theory but, nevertheless, a justification for taking an arts and social science subject- or discipline-based approach to planning digital libraries needs to be made.
So what is it about the arts and social sciences that justifies bringing discipline considerations into the business planning of a digital library? Defining what we mean by the arts and social sciences can be problematic in itself, but a relatively uncontroversial list of arts disciplines would include the performing and visual arts, history (in many of its guises), archaeology, classics, philosophy, literature, languages and theology. Political economy, politics, social policy, sociology, anthropology, and economic, social and medical history would typically be found in the social sciences. One may also find disciplines such as education, geography, law, international studies, business and management classified as social sciences. There are undoubtedly other disciplines that could be added, just as some of those listed might be dropped. It has never been possible to define these areas precisely, and they are subject to continual change. For centuries the ‘humanities’ comprised everything other than divinity, and more recently disciplines such as natural philosophy and mathematics sat comfortably within the arts. That the boundaries of the arts and social sciences are difficult to define, that they vary considerably depending on location, tradition and institutional context, and that they are in continual flux are precisely the reasons why a discipline-based approach can be justified. Given that any digital library with arts and social science content and users cannot ignore this variability, the only sensible approach is to try to incorporate disciplinary factors within the business planning process. This chapter does not suggest a magic, one-size-fits-all solution, but rather outlines some of the important characteristics of the arts and social sciences as they relate to the digital library, and suggests a variety of ways in which they can be accounted for.
E-journal use
One of the most obvious distinguishing characteristics is the very different balance between analogue and digital content within the arts and social sciences compared with many other disciplines. At first sight the differences in e-journals, the cornerstone of the digital library, may not appear significant. It is estimated that 96.1% of journals in science, technology and medicine are available online compared with 86.5% in the arts, humanities and social sciences (Cox, L. and Cox, J. for ALPSP, Scholarly Publishing Practice Survey 2008, quoted in RIN, 2009). A difference of less than ten percentage points would hardly appear significant, except that the analogue rump of arts and social science journals has a qualitative significance. Many of the longest established, most rigorous and most prestigious journals are in print-only format, often published by small learned societies and professional associations for which the costs and risks associated with online publication are currently too great. When one considers the journal titles held by UK universities the differences are more marked. The number of electronic subscriptions doubled between 2001/2 and 2006/7 to almost one million, while the number of print-only subscriptions halved over the same period to about 250,000. This still means that a fifth of all journal subscriptions remain in print-only format (SCONUL, 2008 quoted in RIN, 2009). Electronic journal subscription also effectively rules out libraries exchanging their print editions, a further drag on the move to electronic-only access. Secondly, journal publications in the arts and social sciences tend to have greater longevity than their counterparts in science and medicine, meaning that even if a journal has gone online, the paper volumes remain valuable resources unless the back issues have also been digitised. Within the University of Glasgow Library 98% of science journals were available and accessed online, and the 2% of print-only editions were used so infrequently that it was possible to withdraw the entire run of print science journals from open shelving, a decision which it would be impossible to justify in the near future for arts and social science journals. The implication for those planning digital libraries is that they cannot count on withdrawing or reducing physical editions for the arts and social sciences, at least quickly, in the way they might in other disciplines, even when digital equivalents are available.
Information retrieval patterns
None of this is meant to suggest that scholars in the arts and social sciences are digital Luddites; far from it, they are enthusiastic users of digital resources, but they are a heterogeneous and promiscuous bunch in their information seeking behaviour and research methods. In the early 1990s Tibbo comprehensively analysed the problem of abstracting and information retrieval in the humanities in the context of the digital library (Tibbo, 1993). The causes of these problems are multiple. Scholars in the arts and social sciences tend to use a very wide and varied range of literature and sources; their research questions are often multi-faceted or framed in abstract or conceptual terms; and abstracts and keyword terms are less commonly found in arts and social science journals and less frequently used as access points, at least in the arts. Abstracting, indexing and cataloguing arts and social science publications are generally more difficult because summarising heterogeneous literature in terms which accurately reflect and distinguish the nuances of the discipline, yet enable sufficient aggregation for efficient and consistent retrieval, is both intellectually demanding and time-consuming.
Just as Tibbo described historians’ preference for browsing shelves and consulting lengthy book reviews in 1993, more recent studies have confirmed their many and varied methods of information seeking in the digital age. A decade after Tibbo’s study, following leads in print, informal leads, printed finding aids and bibliographies were historians’ most popular methods for locating primary sources (Anderson, 2004). But these were just four out of 19 retrieval methods, which encompassed everything from OPACs through bibliographic utilities to using research assistants. Some of these trends can appear contradictory. Although humanists in general still rely heavily on works in print for information retrieval, they can also demand access to equivalent electronic services. At the University of Glasgow Library 80% of the texts in the Early English Books Online (EEBO) service are also held by the library in print, yet there was strong demand for, and use of, EEBO because of the cross-searching functionality that the service provided. The recent Research Information Network report on the use of e-journals within UK higher education also emphasises the different usage patterns between and within disciplines, as revealed by deep log analysis of publication databases. Economics readers are the least likely to access e-journals through gateway services such as Google, with 19% of page views arriving this way, compared with 49.2% in chemistry and 65.9% in the life sciences. Somewhat contradictorily, the CIBER data on the usage and impact of e-journals, on which the RIN report drew, indicate that economics journals were accessed using Google in 37% of sessions, compared with 45% for history (CIBER, 2009, pp. 92-93). Conversely, economists made the greatest use of abstract views per session, at 30.4%, compared with 23.3% for chemistry and 19.5% for the life sciences (RIN, 2009, p. 24). As the RIN report goes on to note, “E-journal databases such as Oxford Journals, do not appear to force users into a common style of behaviour. Subjects do! Historians search for and use e-journals in ways very different from their scientific and social science colleagues. Compared, for instance, with life scientists, historians are more likely to access e-journals via Google, and to use search tools, especially menus, once they are inside the publisher’s platform.” (RIN, 2009, p. 25) The RIN report also supports the earlier observation of the greater longevity of arts and social science journal articles. In economics the average age of the youngest article viewed was two years, compared with one year for the life sciences and physics, while the average age of the oldest article viewed was 4.7 years for economics compared with 2.7 years for the life sciences (RIN, 2009, p. 28). There is also further evidence of the relative unimportance of abstracts to historians, who viewed only a third as many abstracts as economists (CIBER, 2009, p. 93). In addition to these variations in e-journal use between disciplines, there are variations between institutions, even in the same subject area. It is perhaps not surprising that the most research-intensive universities make the most use of e-journals, because they have larger numbers of researchers. However, the bigger users also have shorter and more focused sessions, use a narrower range of online facilities and functions, and make less use of added-value services, such as alerts, than less research-intensive institutions in the same discipline (RIN, 2009, pp. 26-30).
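The gateway figures quoted above are, in essence, shares of page views grouped by discipline and by where the user arrived from. As a minimal sketch of that kind of calculation (not the method used by RIN or CIBER, whose deep log analysis is far more elaborate), the Python below assumes a hypothetical, already-parsed log of (discipline, referrer host) pairs and treats any referrer containing "google", including Google Scholar, as a gateway visit. The sample records are invented for illustration.

```python
from collections import Counter


def gateway_share(page_views, gateway="google"):
    """Percentage of page views per discipline whose referrer contains `gateway`.

    `page_views` is an iterable of (discipline, referrer_host) pairs, a
    hypothetical simplification of a parsed access log; referrer_host may be
    None for direct visits or navigation within the publisher platform.
    """
    totals = Counter()
    via_gateway = Counter()
    for discipline, referrer in page_views:
        totals[discipline] += 1
        if referrer and gateway in referrer.lower():
            via_gateway[discipline] += 1
    # One decimal place, in the style of the percentages quoted in the text.
    return {d: round(100 * via_gateway[d] / totals[d], 1) for d in totals}


# Invented sample records, for illustration only.
views = [
    ("economics", "www.google.com"),
    ("economics", None),
    ("economics", "publisher-platform.example"),
    ("economics", None),
    ("chemistry", "scholar.google.co.uk"),
    ("chemistry", None),
]

print(gateway_share(views))  # {'economics': 25.0, 'chemistry': 50.0}
```

Running the same calculation over a library's own logs is one inexpensive way to check whether local behaviour matches the national patterns, given the institutional variation the reports describe.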
The attention that e-journals receive can also obscure the fact that the print monograph remains one of the main means of scholarly communication within the arts and social sciences. For all the advances in e-book readers and the speed with which Google is digitising vast swathes of academic books, it seems unlikely that in the foreseeable future those planning digital libraries will be able to ignore either the print monograph or this subject and institutional variation in e-journal use.
Sources and methods
There are also wide variations in research methodology and sources within and between arts and social science disciplines. In social science disciplines such as economics one may find precisely defined and delimited research questions which use an explicitly experimental methodology, with the results analysed in terms of covering laws, in a manner virtually indistinguishable from that of the natural sciences. But one may also find these methods employed in history, archaeology or geography. The creative arts necessarily involve a different working practice from the analytical arts disciplines, but all may express or reflect their approach from any number of theoretical perspectives – modernism, post-modernism, structuralism, Marxism or any number of other ‘isms’, theories or frameworks, or none at all. Approaches may be descriptive, narrative, empirical, analytical, quantitative, qualitative or a combination of these; they may be concerned with the general or the particular. These methodological variations perhaps explain, in part, the variations in information retrieval described above. They also encompass an equally wide range of source material and research data in both analogue and digital formats. Although this is obviously something of a generalisation, arts disciplines primarily deal with sources which are already in existence, whereas the sciences, engineering and medicine primarily generate their own data through experimentation. The social sciences sit somewhere in between, both using existing sources and generating their own research data. Within arts disciplines in particular, the holdings of special collections, audio and video material, maps, plans, artefacts and newspaper collections are valuable sources which exist primarily in analogue format. While digital surrogates of this material and their born-digital counterparts can be found in digital libraries, the majority of material remains in analogue format. Nor is this material typically suited to the mass-digitisation approach which has seen large quantities of journals, books and newspapers digitised. Material in special collections is invariably unique or rare, valuable, fragile and/or of a form and size which makes digitisation a particularly time-consuming and expensive activity, one which more often than not takes place within sporadic, one-off projects rather than a rolling programme of mass digitisation. The same can be said for audio-visual material and outsize material such as maps and plans, to which the additional problem of intellectual property is often attached. Problematic though this type of material may be to digitise, describe, manage and preserve, it is a relatively known quantity within a digital library context.
But there is a whole host of other digital sources and outputs which fall outwith the boundaries of the digital library, or of the archive, gallery or museum for that matter. Innumerable digital images, electronic texts, databases, spreadsheets, web sites, 3D point clouds, CAD/CAM and GIS data pour forth from the arts and social sciences each year. If we are lucky some of this material may find its way into an institutional repository or a subject or national data archive, but the vast majority sits in digital limbo, scattered across web servers, networked drives and desktop computers. Once again, not all of this material is unique to the arts and social sciences, but its variety necessarily raises the question of the scope of the digital library: what to include and how to define its content boundaries.
Approaches
The exact nature of the relationship between sources, methodology and information seeking behaviour is unclear, and certainly warrants further research, but the variety of methodologies and material, and the combination of analogue and digital, in themselves make business planning for arts and social science digital libraries more complicated. The features described above are inevitably a snapshot of the arts and social sciences. It is well beyond the scope of this chapter to provide a comprehensive picture of the nuances of the arts and social sciences discipline by discipline, even if this were possible, but the above sections go some way to demonstrating that ignoring a discipline-orientated component in business planning for digital libraries risks making decisions on structure, content, cataloguing and functionality which do not meet the requirements of arts and social science users or capture their sources and research outputs. On the other hand, if a major part of business planning is identifying risks, assessing their likelihood and the severity of their impact, and taking steps to mitigate them, then taking an arts and social sciences based approach would appear to open up a can of worms for which there are no easy solutions. However, there are approaches, frameworks and tools which together can provide an effective means to plan the business aspects of arts and social science digital libraries.
Strategic planning
Those attempting to find a strategic direction in planning digital libraries find themselves in an invidious position. Access to e-journals is controlled by publishers; projects such as JSTOR (JSTOR Digital Archive) are providing access to digital journal back issues and more besides; projects such as Perseus (Perseus Digital Library) provide examples of the ever-expanding virtual digital library; and Google is digitising large quantities of published literature and also providing the primary means of access to large amounts of digital content through its search services and Google Scholar, even at the expense of publishers’ online services. In the current digital landscape it may appear that digital libraries as locally built collections are being squeezed out by competing digital content providers. Strategic planning should form part of any business planning process, but the question is how high the sights should be set. Where is there a gap in the market? What is libraries’ competitive advantage in the digital age, and how can it best be exploited? Strategic planning often emphasises the importance of aligning aims with broader host institutional objectives and mission statements, particularly in the field of digitisation (Ross, 1999). While such an approach is worthwhile, it should not be limiting.
A library’s mission often extends beyond that of its host institution; it acts not just to serve immediate goals, but to build and develop collections of lasting value, act as a trusted repository, add value to the material it holds and serve a broader community than its most immediate users. Although the digital landscape is rapidly evolving and highly competitive, digital libraries must retain some of their early spirit of experimentation and innovation. Google is not a trusted repository, nor is the Internet, and even as an access tool Google is a fairly blunt instrument. Given the variety in subject and institutional use, third-party digital library tools do not appear to be significantly shaping information seeking behaviour, nor adequately reflecting it. In this context the library’s role as a trusted digital repository which can add value to access is more important and relevant than ever, not less so. This is where libraries’ competitive advantage lies, and where in fact it always has: in identifying, collecting, analysing, categorising, describing and maintaining long-term access to intellectually, historically, socially and culturally important resources in ways which are relevant to their users. This is particularly the case for arts and social sciences disciplines, whose sources, outputs, information seeking behaviour and methodologies are typically more complex than elsewhere. In thinking strategically about digital libraries in the arts and social sciences, due account must also be taken of the library’s size, organisation, resources (both physical and human) and technical capabilities. How these features relate to other digital library activity at the local, national, international or discipline level must also be considered. Digital library development is time-consuming and expensive, so re-inventing the wheel and duplicating activity or infrastructure need to be avoided. Indeed, thinking strategically about digital library development for the arts and social sciences opens up the possibility of collaborations beyond the library sector itself. Whilst libraries bring expertise in providing value-added access to digital collections, the need to build, manage and re-use this material also calls on skills which can be contributed by archives, galleries and museums, as is discussed in other chapters of this book. Collaboration doesn’t happen by itself, of course; it needs to be planned for from the outset, and the potential collaborators and the skills that they can bring will vary from institution to institution. The requirements and capabilities of a small liberal arts college are vastly different from those of a large, multidisciplinary university, just as those of a public central library differ from those of a rural branch library. The mantra of ‘think global, act local’ has never been more appropriate, but there are challenges and compromises in reconciling an ambitious, forward-thinking strategy with prudent development, spreading risk and costs through collaboration with satisfying local user needs, and effective risk management with experimentation.
User awareness
One way of addressing some of these challenges is through user evaluation. It is axiomatic that any business plan must take account of its customers, but until recently user evaluation was relatively low on the list of priorities for those developing electronic information systems.
In the digital library domain the evaluation of system performance, collection properties, test-beds and architecture was more in evidence than user-orientated evaluation, although the literature on the latter has expanded enormously (Zhang, 2004). More recently, deep log analysis has provided insights into users’ information seeking behaviour in digital libraries on a hitherto unimagined scale (Nicholas and Huntington, 2009). The fascinating patterns which deep log analysis and data mining techniques reveal do not, however, obviate the need for an institution to carry out its own user evaluation. Indeed, as the RIN and CIBER reports mentioned above have indicated, these large-scale analyses have revealed widespread institutional variations. While we cannot assume that the retrieval patterns for e-journals would be replicated for all genres of material, the variation at institutional level within subjects suggests that the need for local user evaluation is, if anything, greater as a result. If we build local user evaluation into the business planning process, however, what is to be done with the results? The trinity of electronic journal publishers, Google and virtual digital library collections may appear to leave little room for the locally built arts and social sciences digital library. The responses to these challenges are multiple: active collection of digital content, the development of trusted digital repositories, and the development of added-value personalisation services. In his recent PhD thesis Konstantelos has demonstrated the value of thorough user evaluation and the feasibility of developing tailored metadata and services which can be integrated into a digital library infrastructure. By examining the field of digital art, this research also establishes the benefit of digital libraries collecting non-traditional material, as current digital art portals and services frequently fail to address the needs of both artists and users (Konstantelos, 2009). Fully fledged personalisation services do not necessarily need to be created on a bespoke basis. Instead, the digital library interface can be integrated with bibliographic tools such as Zotero and EndNote, personal ‘dashboards’ such as iGoogle (already implemented by the University of North Carolina at Chapel Hill Library), generic social bookmarking sites such as delicious and StumbleUpon, and publication sharing systems such as BibSonomy, Bebop and CiteULike.
Active collecting
This is not the place to revisit the debate on what does and does not constitute a digital library, but while there is an almost unlimited range of digital objects, not all digital objects can be considered digital library objects. To use Collier’s definition, a digital library is “a managed environment of multimedia materials in digital form, designed for the benefit of its user population, structured to facilitate access to its contents, and equipped with aids to navigate the global network ... with users and holdings totally distributed, but managed as a coherent whole” (Collier, 2006). On this basis it is fair to assume that digital surrogates created from the sort of analogue material typically held by libraries (manuscripts, rare books, maps, illustrations etc.) should fit within a managed and structured environment relatively easily. However, this leaves a vast range of material outwith this environment, yet still of immense use to the arts and social science disciplines. Libraries have made a vast contribution to the digital content of external projects without necessarily realising a return. TheGlasgowStory project is a prime example of this (TheGlasgowStory).
Funded by the UK Lottery’s New Opportunities Fund (NOF), TheGlasgowStory digitised content from the city’s libraries, archives and museums to produce a pictorial history accompanied by a range of themed essays. Aimed at promoting lifelong learning and social inclusion, the project attached considerable importance to associating comprehensive descriptive metadata with each image and to providing a sophisticated infrastructure for their online delivery. Although it was a flagship digitisation programme, no further NOF funding was available at the project’s end, and additional funding was sufficient only to develop two small-scale spin-off projects. The web site at least remains online, something that not all NOF projects have managed, but it is not actively managed. As a result, the domain name registration almost expired without notice, and one server upgrade inadvertently altered the case of file names, which took the site offline for a period. Nor does any of the digital content on the site appear to have been reused by the contributing institutions, although each received high-quality master digital images of its material. Examples of such orphaned digital resources litter cyberspace and, combined with other forms of research output and in-house digitisation, provide a significant opportunity for the entrepreneurial arts and social science digital library to build content rapidly and cost-effectively.
Digital repositories
The capability to ingest, manage and re-use such content assumes that digital libraries have a suitable digital repository. The provision of institutional digital repositories has grown rapidly in recent years. There are currently 120 repositories in 89 of the UK’s 116 universities, up from 70 repositories in mid-2007 (OpenDOAR). However, it is far from clear that these repositories are capable of storing, sustaining and preserving the wide range of digital sources and data available. Only 49% have defined content policies; 47% have undefined or unknown content submission policies; 58% have undefined or unknown metadata re-use policies; in only 16% of cases is the full non-profit re-use of data items permitted; 54% have undefined or unknown data re-use policies; and only 22% have defined preservation policies (OpenDOAR). Moreover, the content types currently housed in digital repositories suggest that they are primarily designed as e-print services: 69% hold journal articles, 45% conference and workshop papers, 40% unpublished reports and working papers, 31% books, chapters and sections, and 30% theses, compared with 23% holding multimedia and audiovisual materials, 11% learning objects, and only 6% datasets (OpenDOAR). Planning a digital repository capable of ingesting the wide variety of media used and produced in the arts and social sciences could easily fill a book in itself, but a number of recently released tools, discussed in the following section, greatly assist the process.
Experimentation and risk
The creation of a digital library inevitably involves a degree of experimentation and risk, because the individual circumstances of institutions require evaluation, testing, and trial and error, and what works for one library will not necessarily work for another.
In planning the digital repository component to ensure that it meets the needs of the arts and social sciences, two tools, the Data Audit Framework (DAF) and the Digital Repository Audit Method Based on Risk Assessment (DRAMBORA), are of immense value in establishing data requirements and in identifying and managing risk. DAF recommends that the audit of research data assets proceed in a four-stage process. In the first, planning, stage the purpose and scope of the audit are defined and the schedule optimised. The purpose of the second stage, identifying research data, is to establish what data assets exist and to classify them according to their anticipated value to the organisation; further audit activities then concentrate only on the most significant assets. Stage three, assessing the management of data, helps auditors identify weaknesses in data policy and in current data creation and curation procedures. This provides the basis for recommendations in the final stage of the audit (DAF). Although initially designed for auditing research data, the DAF methodology is flexible, non-prescriptive and self-administered. As such it can be adapted to particular organisational and content requirements. Applying DAF to arts and social science sources and outputs will enable those planning digital libraries to quantify their most important assets and identify weaknesses in policy, procedure and infrastructure which need to be addressed in advance of, or in conjunction with, repository design.
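The second DAF stage, classifying assets by anticipated value and carrying only the most significant forward, amounts to a simple filter-and-sort. The sketch below is a hypothetical illustration, not part of the DAF toolkit itself: the three value classes, the `DataAsset` record and the example assets are all assumptions made for this example.

```python
from dataclasses import dataclass

# Assumed value scale for illustration only; an audit team would substitute
# whatever classification its own planning stage agrees on.
VALUE_RANK = {"vital": 3, "important": 2, "minor": 1}


@dataclass
class DataAsset:
    name: str
    owner: str         # department or project responsible for the asset
    value_class: str   # one of the keys of VALUE_RANK


def select_for_detailed_audit(assets, min_class="important"):
    """Keep assets classed at or above min_class, most valuable first.

    Only these assets would go forward to stage three (assessing
    data management) and stage four (recommendations).
    """
    threshold = VALUE_RANK[min_class]
    selected = [a for a in assets if VALUE_RANK[a.value_class] >= threshold]
    # sorted() is stable, so assets of equal value keep their input order.
    return sorted(selected, key=lambda a: VALUE_RANK[a.value_class], reverse=True)


# Hypothetical inventory produced in stage two.
inventory = [
    DataAsset("survey microdata", "Sociology", "important"),
    DataAsset("draft lecture slides", "Art History", "minor"),
    DataAsset("oral history recordings", "History", "vital"),
]

for asset in select_for_detailed_audit(inventory):
    print(asset.name, "-", asset.value_class)
```

Raising `min_class` to `"vital"` narrows the audit further; the point is only that the later, more labour-intensive stages work from a deliberately reduced list.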
In this regard, DRAMBORA provides another useful planning tool. It presents a methodology for self-assessment which encourages organisations to establish a comprehensive self-awareness of their objectives, activities and assets before identifying, assessing and managing the risks associated with digital repository design. DRAMBORA rationalises the uncertainties and threats which inhibit efforts to maintain digital object authenticity and understandability, transforming them into manageable risks. Six stages are implicit within the process. Initial stages require auditors to develop an organisational profile, describing and documenting the repository’s mandate, objectives, activities and assets. Risks are then derived from each of these, and assessed in terms of their likelihood and potential impact. Finally, auditors are encouraged to conceive of appropriate risk management responses. The process supports effective resource allocation by enabling repository administrators to identify and categorise the areas where shortcomings are most evident or have the greatest potential for disruption. The process is iterative, so subsequent iterations will evaluate the effectiveness of earlier risk management measures (DRAMBORA). Given the relatively immature state of institutional digital repositories and the breadth of sources, methodologies and outputs that characterises the arts and social sciences, employing the DAF and DRAMBORA toolkits as part of the planning process will go a long way towards adequately accounting for these characteristics.
Summary
What then might characterise an effective business plan for an arts and social science digital library? Firstly, the plan needs to acknowledge four key factors:
– The close and persistent relationship between the analogue and the digital, which is crucial for many arts and social science subjects
– The breadth of sources, methods and outputs across and within subjects
– The diversity of users’ information-seeking behaviour
– The need to be risk aware but not risk averse
The plan itself should be based on:
– Seeking appropriate collaboration at the local, national or international level
– Clear strategic goals and objectives to achieve them
– Thorough evaluation of local user needs
– Audit of existing and potential content using tools such as DAF
– Audit and risk management of digital repository design using tools such as DRAMBORA
The plan should enable:
– A sustained programme of digitisation, rather than one-off projects
– Active collection and ingestion of research outputs and digital content which complement in-house collections
– Long-term management, access and re-use of trusted digital content
– The development of services that add value for arts and social science users
– Ongoing experimentation
Acknowledgements
The author would like to acknowledge the valuable information provided by Richard Bapty and Graham Whitaker of University of Glasgow Library.
References
Anderson, I, ‘Are you being Served? Historians and the Search for Primary Sources’, Archivaria, No 58, (Fall 2004).
CIBER, Evaluating the Usage and Impact of E-Journals in the UK, CIBER Working Paper 4, February 2009, available at http://www.rin.ac.uk/files/Information_usage_behaviour_CIBER_ejournals_working_paper.pdf (Accessed 16/08/09)
Collier, M, ‘Strategic change in higher education libraries with the advent of the digital library during the fourth decade of Program’, Program, 40(4), 2006.
DAF at http://www.data-audit.eu/methodology.html (Accessed 16/08/09)
DRAMBORA at http://www.repositoryaudit.eu/about/ (Accessed 16/08/09)
Greenstein, D. and Thorin, S.E., The Digital Library: A Biography, Digital Library Federation, 2002.
Hjørland, Birger, ‘Domain Analysis in Information Science’, in Encyclopedia of Library and Information Science, 2nd ed, First Update Supplement, edited by Drake, Miriam A., Taylor and Francis, 2005.
Konstantelos, L, Digital Art in Digital Libraries. A study of User-oriented Information Retrieval, PhD, University of Glasgow, 2009.
Nicholas, D. and Huntington, P., ‘Employing deep log analysis to evaluate the information seeking behaviour of users of digital libraries’, in Tsakonas, G. and Papatheodorou, C. (eds), Evaluation of digital libraries, Oxford: Chandos, 2009.
OpenDOAR at http://www.opendoar.org (Accessed 16/08/09)
Perseus Digital Library at http://www.perseus.tufts.edu/hopper/ (Accessed 16/08/09)
Research Information Network, E-Journals: their use, value and impact, 2009, available at http://www.rin.ac.uk/files/E-journals_use_value_impact_Report_April2009.pdf (Accessed 16/08/09)
Ross, S., ‘Strategies for Selecting Resources for Digitization: Source-Orientated, User-Driven, Asset-Aware Model (SOUDAAM)’, in Making Information Available in Digital Format: Perspectives from Practitioners, edited by Terry Coppock, Edinburgh, The Stationery Office, 1999.
TheGlasgowStory at http://www.theglasgowstory.com (Accessed 16/08/09)
Tibbo, H, Abstracting, Information Retrieval and the Humanities, American Library Association, 2003.
Zhang, Y, Human Information Behavior in Digital Libraries, http://comminfo.rutgers.edu/~miceval/research/DL_HIB.html (Accessed 16/08/09)
5 THE IMPACT OF THE DIGITAL LIBRARY ON THE PLANNING OF SCIENTIFIC, TECHNICAL AND MEDICAL LIBRARIES
Wouter Schallier
A framework for innovation
Scientific, technical and medical libraries (STM libraries) are challenged by rapid changes in their technological, scholarly and educational context. STM libraries constantly need to improve and re-invent their services and products in order to address user needs. As regards access to research literature, the core business of STM libraries today is almost entirely the provision of access to electronic journals and databanks. Unless they have a specific archival or heritage function, specialist STM libraries may not have a general cultural role, which may call into question the need for a physical library at all. Research staff clearly no longer need to come to the library, as they have access to most, if not all, of what they need from the desktop. On the other hand, if the STM library is based in higher education, the needs of students are also of paramount importance. Students still need a place to study, and all the indications are that their needs for study space of various sorts are actually increasing. This may call for radical changes in the business planning of the STM library. At the biomedical library of the Catholic University of Leuven (K.U.Leuven) just such a radical review is under way. It has been decided, on the one hand, to opt for e-only as far as possible and, on the other, to change rapidly from a library to a learning centre model.
The library as space
Many STM libraries are facing the question of what to do with their physical space. There are several issues here. First, most collections of STM libraries are digitally available (e-journals and e-books). This means that most of the paper can be removed, or at least stored in a less prominent and less costly space than a reading room. Second, STM libraries are no longer the exclusive places for access to scholarly information.
Students and researchers have access to the Internet and to scholarly information almost everywhere: in the office, in the laboratory, on the train, at home. So the question is: does the STM library as space still have a place in education and research? Most STM libraries understand that the age of paper collections is over. Paper in a digital world has as much reason for existence as a horse in a world of cars: you buy a horse because you like it, not because it is more practical for transport. Many STM libraries have made the move towards e-only collections. On the other hand, some paper collections still need to be preserved (usually because of their historical value), and libraries
are trying to organise this as efficiently as possible: they store them in places where the loss of space is minimal and the climatic conditions are optimal, and they often share the storage facilities, thereby distributing the preservation effort. Not every library keeps every copy of journals and books; instead, interlibrary agreements determine the responsibilities of partner libraries for the preservation of, and access to, valuable paper collections. In Flanders, the K.U.Leuven is a partner, with the other universities possessing biomedical faculties, in a scheme to share storage of one paper copy of each biomedical journal somewhere in the region. The scheme is governed by a service level agreement whereby each library agrees to store its allocated titles for the future as long as the paper subscription continues. This means that the most prominent library space, the reading room, can be almost completely freed from paper. Many STM libraries have been converting this space into a working and social environment that meets the needs of modern education and research. This means much more than simply replacing the bookshelves with computers, because this alone will not attract more students or researchers. These libraries have divided the public library space into flexible modules with adequate infrastructure for different forms of work such as quick reference, individual study, group work, brainstorming sessions, presentations, classes, social contact, etc. Many libraries are transforming into these kinds of learning and research environments, variously referred to as learning centres, learning zones, learning grids, etc., which gives them tremendous potential to remain (or become again) a natural environment for students and researchers.
Into the user’s environment
As with all libraries, the goal of an STM library is to deliver products and services that meet the users’ needs.
There is a large variety of users: researchers, students, medical staff, the general public, representatives of private companies, or even more specific categories such as bachelor students in dentistry, final-year students in engineering, doctoral students in cardiology, nurses and so on. The explicit request for a new service or product sometimes comes directly from the user. A few years ago, a researcher sent an e-mail to the K.U.Leuven Biomedical Library asking why the library did not have e-books. The researcher had discovered e-books on a publisher’s website and was not aware that the biomedical library already offered access to a small collection of e-books. When this request was followed by others, the library recognised a growing need and started to invest systematically in a considerable e-book collection. Much more often, however, it will be up to the library to understand the users’ needs and proactively translate them into products and services. This is how we developed a library toolbar, built with the Conduit software (www.conduit.com), after having seen a demonstration at the Medical Library of Groningen University. Once this library toolbar is installed on the user’s computer, it is displayed at the top of his browser and allows him to search the information resources of the library.
Why did we think this was an important solution for our customers? From the usage statistics we learned that most users visited the library website simply to access information resources (databases, journals, e-books etc.), and only rarely to search the library catalogue or to read the other information on the website. We also learned that several users bypassed the library website by bookmarking the URLs of their favourite journals and databases, which was rather problematic because URLs tend to change. So we were looking for a way to give our students and researchers direct access to the information resources without their having to pass through the library website, and we found the answer in the toolbar. The library toolbar allows direct and customised access to information resources and, at the same time, central management of the URLs. It was an immense success: after three months 494 people had downloaded the toolbar to their computers; after six months there were 752 installations; after a year 1,745; and after two years as many as 3,818, with an average of between 750 and 780 users per weekday. Most reactions from our users were very positive. On the other hand, several library colleagues were sceptical because they feared a possible side effect of the toolbar: that users would no longer pass through the website, and that this would seriously reduce the visibility of the library. They were wrong, of course: the toolbar made the products and services of the library more visible than ever, since they were all permanently available at the top of our users’ browsers. Besides, the library gained a lot of credit amongst users, since it had proven its ability to find a creative and innovative solution to one of their most essential needs: quick and easy access to scholarly information. The success of the library toolbar illustrates how users increasingly expect the library to come to their working environment, and not the other way around.
This is a radical change. The library’s environment (reading room, catalogue, website) used to be the only access point to trustworthy scholarly information, but this is no longer the case. Nowadays STM libraries have to push their products and services to the user and make them visible in his (virtual) work environment. In order to push products and services successfully, the library needs to know the needs of the user, to deliver products and services which encourage customisation and participation, and to build a sustainable relationship with the user. The so-called social web (web 2.0) offers plenty of tools which allow interaction with the user. Many STM libraries have a profile or a community in social networks such as Facebook, LinkedIn or Netvibes, or use a library toolbar. Trends towards library 2.0 and library 3.0 have been criticised on the grounds that there may be too many of these applications for the library to be present in all of them; that they may be only a passing fad; that the added value may be unclear; that researchers may not (yet) use them; and so on. This may be so in some cases, but it would still be a mistake for STM libraries not to take advantage of these applications. The relationship of the library with its users has changed fundamentally: library services are no longer part of the scholarly workflow by default, and libraries need to act proactively if they want to remain relevant. Since students and researchers are increasingly active in digital learning and research environments, including social networks, and increasingly access the Internet via handheld devices, STM libraries simply need to be there with an adequate offer of interactive products and services.
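Part of the toolbar's value, as described above, lay in central management of URLs: users click a stable entry, and the library updates the target address in one place when a publisher moves a resource. The sketch below models that idea as a plain lookup table. It is a conceptual illustration only and does not describe how the Conduit software works; the `LinkRegistry` class, the resource keys and the example.org addresses are hypothetical.

```python
class LinkRegistry:
    """Hypothetical library-maintained table mapping stable keys to current URLs."""

    def __init__(self):
        self._links = {}

    def set_url(self, key, url):
        """Add a resource, or repoint it when the publisher's URL changes."""
        self._links[key] = url

    def resolve(self, key):
        """Return the current URL for a key (raises KeyError if unknown)."""
        return self._links[key]

    def menu(self):
        """(key, url) pairs sorted by key, as every installed toolbar would list them."""
        return sorted(self._links.items())


registry = LinkRegistry()
registry.set_url("journal-db", "https://old.example.org/journals")
registry.set_url("ebook-platform", "https://ebooks.example.org/")

# The publisher moves the database: one central update, and every
# toolbar entry that uses the "journal-db" key follows it automatically.
registry.set_url("journal-db", "https://new.example.org/journals")
print(registry.resolve("journal-db"))
```

A personal bookmark, by contrast, stores the raw URL and breaks silently when it changes, which is the problem the usage statistics revealed.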
The establishment of a sustainable relationship with the customer and good communication are old marketing principles, and yet we still too often develop and describe our products and services in terms of functionalities (a system-oriented approach), not in terms of what the customer gets. Instead of saying “The library selects and buys access to clinical medical journals”, STM libraries could write (and think) in a more customer-oriented way: “The library provides access to clinical medical information which enables clinicians to answer diagnostic and treatment questions”. The message that the library sends here is: we develop products and services which are relevant to the customer because they allow him to study and to do research more efficiently, and by doing so we directly contribute to the success of the institution. This is not just a communicative detail; it makes a big difference to the customer’s perception.
Embedding the library in research and education
Paradoxically, just when the Internet provides seemingly unlimited access to information of all kinds, the digital library in research and education requires users to have information skills of a higher order. Students and researchers need skills which enable them to locate, access and, crucially, evaluate the information which they will then organise and re-use efficiently and correctly. The wide availability of information on the Internet makes it very important that students understand correct attribution and the avoidance of plagiarism. Many universities have introduced training in information skills into the curriculum, since they want their students and researchers to play a prominent role in society and to be fully equipped for lifelong learning. Ideally, information skills are integrated into the curriculum both horizontally and vertically.
By vertical integration we mean that information skills are worked on systematically and progressively, from the first to the final year of the curriculum. Horizontal integration, on the other hand, means that information skills are worked on in, and made relevant to, as many courses as possible, not just one isolated course. Teaching information skills is a shared responsibility of academic and library staff. Academic staff are able to guarantee full integration in the curriculum. Librarians are familiar with the technical side of information (and information science) and can address the need for systematic and progressive training. This requires librarians to develop didactic skills and a greater understanding of the educational process. Close collaboration between faculty and library staff is also beneficial in research. Scholarly communication is undergoing tremendous changes, enabling researchers to share, replicate, validate and re-use raw research data, and to publish quickly and in open access. Information professionals will increasingly need to take up responsibility for data and publications management at their institutions. Data management will also be key in preparing grant applications. Clinical librarians have a long tradition of assisting health professionals in applying the most up-to-date biomedical literature to patient care.
Flexible internal organisation with new profiles
Specialist STM libraries are often small organisations, requiring simple and flexible organisational structures which allow for good internal communication, teamwork and rapid adaptation to changing circumstances. As the nature of the work and the information market change rapidly, individual and collective responsibilities need to be redefined regularly. In terms of profiles, STM libraries increasingly need people able to communicate and collaborate with faculty, IT specialists and other services inside or outside the organisation. As information professionals, they will assist students and researchers at every stage of the data management cycle, from data creation to archiving and re-use. This presupposes that librarians have a strong grounding in IT, education and research. Scholarly, communicative, advisory and didactic skills have become as important as a qualification in library and information science, if not more so. Back-office and low-level jobs relating to the acquisition and management of paper-based information products are disappearing, and are being replaced by a sharper focus on what is necessary for the front office (advice, facilitation, coaching) and for the back office (licence negotiation, budgeting and management).
The changing financial model of the library
In STM libraries the traditional business model, whereby libraries bought outright ownership of print resources, has already been largely replaced by the licensing model, which provides access to, but not ownership of, information resources. In decentralised library organisations like the K.U.Leuven this has necessitated a change from decentralised purchasing to centralised negotiation and management, since the subscriptions apply to the institution as a whole and the process of negotiation and management needs special expertise.
This is particularly so in the case of the big deals, whereby publishers offer an increasing amount of information in return for the maintenance of their income streams, often accompanied by restrictive practices (see the chapter on electronic journals). The changed landscape has two important implications. As jobs relating to paper-based operations disappear, the personnel cost profile changes, and as operations are centralised, the financing model shifts from a local to a central approach. Decisions about subscribing to electronic resources become more complicated because there is a wider user base at institutional level who may be interested. Moreover, as subscription costs rise well above normal inflation, evaluation of electronic resources receives a sharper focus based, on the one hand, on usage statistics and, on the other, on impact factors and even article-level quality metrics. STM libraries need to inform their customers about the criteria they apply in decision making, such as the results of usage analysis and cost per download. Also, STM libraries need to gather information systematically about users’ interests, information needs and usage patterns, and translate it into their acquisition policy.
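Cost per download, one of the criteria mentioned above, is simply the annual subscription cost divided by the number of full-text downloads in the same period. The sketch below computes it and ranks titles so that those most expensive per use come first for review. The `Subscription` record, the titles and all figures are invented for illustration; as the text notes, a real evaluation would weigh this alongside impact factors and other quality metrics rather than use it alone.

```python
from dataclasses import dataclass


@dataclass
class Subscription:
    title: str
    annual_cost: float  # subscription price for the year, in the library's currency
    downloads: int      # full-text downloads recorded in the same year


def cost_per_download(sub):
    """Annual cost divided by downloads; infinite if the title was never used."""
    if sub.downloads == 0:
        return float("inf")
    return sub.annual_cost / sub.downloads


def rank_by_cost_per_download(subs):
    """Most expensive per use first: the natural candidates for review."""
    return sorted(subs, key=cost_per_download, reverse=True)


# Hypothetical figures for three titles.
subs = [
    Subscription("Journal A", 5000.0, 2500),
    Subscription("Journal B", 1200.0, 40),
    Subscription("Journal C", 800.0, 0),
]

for sub in rank_by_cost_per_download(subs):
    print(f"{sub.title}: {cost_per_download(sub):.2f} per download")
```

Here the cheapest subscription in absolute terms (Journal C) tops the review list because it was never used, which is exactly the kind of result worth explaining to customers when decisions are made.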
Since electronic journals, the big deals and concerns about the associated price rises have become dominant over the last decade, efforts to combat the negative effects have been most noticeable in the STM sector, and there are now signs that these will contribute to a changing business model for the scholarly communication market. In the Open Access (OA) model, access to information is free and the cost of publication is met by the author, which generates a shift from paying for access to information to paying for publishing (see the chapter on Open Access). PLoS, BioMed Central and other OA publishers in science, technology and medicine offer such a model. Institutions can lower the burden of publishing costs for individual researchers by becoming members of OA publishers. Several studies seem to show that, overall, OA publishing has less financial impact on libraries and their host institutions than traditional publishing. STM libraries are increasingly challenged to find complementary funding. This can come from revenues generated by products and services for which the customer pays, such as document delivery. These paying services are usually not intended to make a net profit but to compensate, at least partially, for expenses (personnel costs, for example). Complementary funding can also come from sponsorship. This has a lot of potential, especially when library and sponsor succeed in working out a programme of partnership which benefits both. It can give the library what it needs (advantageous co-operation and/or extra financial means) and the sponsor what he is looking for (advantageous co-operation and/or visibility). In Europe, most libraries at least have experience with occasional sponsorship in the context of specific events, but less experience with structural sponsorship. Developing attractive programmes for structural sponsorship requires familiarity with the needs of private companies and a proactive approach.
Library and sponsor need to define the rules of the game clearly, in order to avoid ethical or other kinds of conflict.
Summary
– The digital library is forcing radical change in business planning for STM libraries and provides a framework for innovation
– STM libraries need to adapt their space to the needs of modern learning and research environments
– STM libraries need to come into the user’s environment and workflows, not the other way round
– There are plenty of opportunities for STM libraries to embed the library in research and education
– STM libraries need new staff profiles and a flexible organisation
Business Planning for Digital Libraries Practice chapters
6 E-JOURNALS IN BUSINESS PLANNING FOR DIGITAL LIBRARIES
Mel Collier and Hilde Van Kiel
State of the art
In no other aspect of the digital library has development been so rapid, or dominance so definitively established, as in e-journals. In 1995 e-journals still existed only in embryonic and experimental form, whereas now, only fifteen years later, they are the dominant method of publishing in the natural and exact sciences and the preferred way of publishing research results. It is estimated that there are now (2008-9) some 59,549 e-journals in existence, and any large, broadly based university library is likely to subscribe to a substantial proportion of them, normally through the so-called “big deals”. At the University of Leuven, for instance, some 23,276 e-journals are available to users. While growth in the humanities has not been so dramatic, the trend towards electronic publishing can also be discerned there, encouraged by the wealth of material now available to humanities scholars through retrospective digitization programmes. The dominance of e-journals is graphically illustrated by the growing debate over whether, for certain disciplines, the traditional library as a place is required at all. This debate is likely to be liveliest where the library is strongly research-oriented and serves a set of disciplines narrow enough for e-journals to satisfy the great majority of needs, and where there is no overriding mission to retain paper stocks and subscriptions for long-term preservation. At the biomedical campus of the University of Leuven this debate has led to a decision to go e-only in the campus library. Similarly, the chemistry departmental library at the University of Oxford had effectively been abolished in favour of online access as early as 2003. The dominance of e-journals is further illustrated by the proportion of the journals budget devoted to them.
At Leuven in 2009 around 75% of current journal titles are available electronically (owing to pricing policies and differential value added tax rates, a substantial number are for the time being also available in print). This may not be a particularly high proportion, because of the strength of the humanities and the need for Dutch-language publications. This change, radical enough in itself, has also brought another fundamental change in the market and in the relationship between publisher and user or subscriber. With paper journals the subscriber becomes the owner of the purchased volumes, whereas in the electronic domain the subscriber normally obtains only access to, not ownership of, the current output. This leads to concerns over continuity of access in the future and over provision for preservation of the research record. It is true that several publishers who only license access to current output nevertheless sell ownership of the archive, or permanent access, after a certain period. This may provide reassurance about long-term access, provided the technical aspects of preservation and the legal guarantees can be resolved.
Brief historical review
Although ideas for e-journals had been circulating for some time, the first experiments appeared as early as the 1980s. In Britain, for instance, the BLEND project (Shakel 1983) was supported by the British Library Research and Development Department. Other experimental projects included HyperBIT (McKnight 1993). By 1992 the chairman of the Advisory Committee of the British Library’s Research and Development Department, Sir Peter Swinnerton-Dyer, had written a paper suggesting it was time for electronic journals to be established in the UK on a regular rather than an experimental basis (Meadows 1994). In the mid-1990s the first commercial products began to appear, and by 1995 there were about 100 peer-reviewed e-journals available (Hitchcock 1998). Growth from there to the present figure amounts to a phenomenal change, over an extremely short period, in the structure of an industry, in the process of scholarly communication and in library economics. This growth generally brought great improvements in access to scholarly publication, both in immediacy of access (provided you have access to the Internet) and in the range of accessible journals (provided you or your institution can afford them). The Big Deal appeared on the market, and the change from ownership to licensing of access began to change the nature of the library and the behaviour of its users. It also created uncertainty about future access to publications for which a hefty licence fee had already been paid, alleviated somewhat by the offer of archives for purchase.
During this period there was also a continuous trend towards consolidation in the commercial journal publishing industry, raising further concerns about monopolistic practices. At the same time not-for-profit organizations such as JSTOR were busy digitizing back sets of journals and making them available at reasonable prices, which has started to have a major impact on access to scholarly information in the humanities. In response to what some saw as the unbridled exploitation by commercial publishers of research results and scholarly output that was usually funded in the first place by the public purse, the Open Access movement was founded and gathered pace. It remains to be seen whether this will provoke structural change in the market; see the further discussion in the chapter by Prosser. Alongside the provision of e-journals, associated services for managing them have developed. Under pressure from users and libraries, e-journal publishers are gradually providing improved statistics on the use of their products, ideally according to the COUNTER standards. Library management system suppliers and serials agents also provide solutions for managing e-journal subscriptions and for federated searching across e-journals and other sorts of resources. This depends, of course, on the e-journal suppliers making their metadata available or harvestable.
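Usage statistics of the kind COUNTER standardizes are what make simple cost-per-use comparisons between publishers possible. The Python sketch below is only an illustration under stated assumptions: the `PackageUsage` record, its field names and all the figures are hypothetical, and it does not reproduce the layout of any actual COUNTER report. It divides each package's annual licence cost by its full-text downloads and ranks packages from most to least expensive per use.

```python
from dataclasses import dataclass


@dataclass
class PackageUsage:
    """Hypothetical annual figures for one licensed e-journal package."""

    name: str
    licence_cost: float      # annual licence fee (currency unit assumed)
    fulltext_requests: int   # full-text downloads counted over the same year


def cost_per_use(pkg: PackageUsage) -> float:
    """Licence cost divided by downloads; infinity if the package was never used."""
    if pkg.fulltext_requests == 0:
        return float("inf")
    return pkg.licence_cost / pkg.fulltext_requests


def rank_by_cost_per_use(packages: list[PackageUsage]) -> list[tuple[str, float]]:
    """Return (name, cost per use) pairs, most expensive per use first."""
    ranked = [(p.name, cost_per_use(p)) for p in packages]
    # sorted() is stable, so packages with equal cost per use keep their input order
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Invented figures, for illustration only.
    packages = [
        PackageUsage("Publisher A package", 250_000, 125_000),
        PackageUsage("Publisher B package", 40_000, 8_000),
        PackageUsage("Society C titles", 12_000, 6_000),
    ]
    for name, cpu in rank_by_cost_per_use(packages):
        print(f"{name}: {cpu:.2f} per download")
```

A ranking like this is only a first indicator of poorly used titles; the life-cycle cost studies cited later in this chapter also take staff and space costs into account.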
The e-journal market: consolidation and consortia
Consolidation
Consolidation in the scholarly journal market over recent years has led to the market being dominated by relatively few players such as Elsevier, Springer and Wiley-Blackwell. This has not reduced the number of titles on offer (the big players themselves continually launch new ones), but it has meant smaller publishers being bought out by larger ones and, in some cases, learned society publishers selling out to commercial ones. It does, however, have an important effect on library budgets, as expenditure flows to fewer and fewer suppliers and the budget allocated to the remaining smaller publishers comes under pressure. This dominance gives the big players two great advantages: the ability to control prices (until now always upwards, at more than the rate of general inflation) and the ability to protect their position as owners of the journals with the highest impact factors.
Consortia
To cope with this radical effect on library expenditure and with the complexity of licensing and access issues, libraries formed consortia to negotiate with publishers. They were in fact largely encouraged in this by publishers, who could see reductions in their sales and marketing costs from not having to deal directly with so many customers. Libraries are no strangers to consortia, as librarians have long seen the advantages of scale in cooperation in cataloguing, inter-library loan and acquisition (Perry 2009). But ever since OHIOLINK made its first deal for e-resources in the 1990s there has been enormous growth in the number of consortia worldwide, with an emphasis on negotiating e-resources. Overviews of European consortia have been given by Giordano (2005) and by Hormia-Poutanen (2006). Because of the so-called serials crisis of the 1990s, users had seen a sharp decrease in the number of articles and journals to which they had access.
Therefore the original focus of OHIOLINK’s deal was not so much on saving money as on gaining more access for its users (Kohl 2001). The idea and format of what Frazier was to call the ‘Big Deal’ was born: the consortium would get access to a large number of journals by agreeing to pay at least the sum of its members’ current spend, plus an annual increase limited by an agreed price cap, and by accepting a no-cancellation clause for the members’ print subscriptions. Frazier (2001) put his finger on the downside of the Big Deal: users did indeed get more access, but a few big publishers would drain most of the libraries’ budgets, squeezing out the smaller ones. Those big publishers would also gain a solid position, because the Big Deal diminished librarians’ power of selection and its non-disclosure clauses made it difficult for libraries to exchange information on their deals. Consortia soon realized that exchanging information among themselves on deals and other consortium-related problems might become very important. In 1997 COC (Consortium of Consortia) came together as an informal group to exchange information between (North American) consortia. At its meetings consortium coordinators were able to talk about the outlines of their deals and proposals without breaking the non-disclosure conditions. Publishers were asked to present their products and to agree to a so-called grill session. This was beneficial for both parties: consortium coordinators and librarians could explain the problems they had with the publishers’ products, licences or technical conditions, while publishers saw this as a very effective way of saving marketing resources, as they would get a reaction to their proposals from a large group at once (Friend 2003). Through this dialogue publishers and librarians sometimes came to a better understanding. The organization captured the interest of consortia worldwide and became ICOLC (the International Coalition of Library Consortia). By 1999 a European section of ICOLC (e-ICOLC) had been set up, and the coalition now holds a spring meeting in the US and an autumn meeting in Europe.1 An analysis of the agendas of the American and European ICOLC meetings shows an interest in all subjects related to electronic resources from a consortium perspective: the structure and governance of consortia, how to negotiate and license, the pros and cons of big deals and possible exit strategies or retreat models, future business models, long-term preservation, perpetual access, and usage statistics. Open Access, institutional repositories, the Transfer Code of Practice, e-books and Google Books, as well as more technical issues such as ERM, discovery tools, authorization and authentication, also appear on the agenda, although arguably less on the European agenda than on the American. One reason may be that American consortia were formed more on an existing operational base, whereas European consortia were set up specifically for e-resource negotiations. Perry (2009) observes a 56% growth in the number of consortia worldwide between 2000 and 2009, so this type of organization must clearly have benefits for the library community; but what are they, and what could be the downside?
Financial Benefits of Consortia
Despite efforts to get away from it, the Big Deal still seems to be an attractive way of giving library users access to a very large number of electronic resources (Gatten 2004). Although it puts huge stress on library budgets and has its negative points, there is a strong supporting financial case on the basis of cost per article. Consortia are able to provide more access to electronic content than any single institution could afford on its own budget, so users get a better service from their library because of the consortium arrangements. A study of OHIOLINK’s deep logs showed that all of the journals within its deals were actually used (Nicholas 2006). At K.U.Leuven, one of our consortium deals restored access to a number of journals which had been cancelled in previous years because of budget constraints; seven of them were among our thirty most-used journals from that publisher in 2008.
Licences are essential to gain access to e-resources. In the early days each publisher had its own licence, sometimes even a different licence for each product. Consortia soon realized that a model licence accepted by a large library community and by a large number of publishers would benefit both parties; hence the model licences proposed by LibLicense and NESLi2. One of the first achievements of ICOLC was a statement on licensing issues which is used as a guideline for negotiations and has been endorsed by many consortia.2
1 http://www.library.yale.edu/consortia/.
2 Statement of current perspective and preferred practices for the selection and purchase of electronic information, 1998. http://www.library.yale.edu/consortia/statementsanddocuments.html.
Pressure for Better Service
Consortia representing a number of libraries have more power to improve licence terms during negotiations than an individual library would have. Consortia worldwide have obtained licence terms covering interlibrary loan, institutional repositories and learning environments (Grogg 2009). Consortia have also tried to find models for getting away from the Big Deal, for example OHIOLINK’s orderly retreat (Gatten 2004).
Usage Monitoring
If a large part of your collection is in electronic format, a big advantage is that you can look at usage. Since only a few consortia are able to load all data locally, as OHIOLINK does, most still rely on usage data provided by the publishers. To compare usage across publishers there has to be a generally accepted format. Building on work by ARL and NISO and on ICOLC’s Guidelines for Statistical Measures of Usage of Web-based Information Resources, the COUNTER Code of Practice has now been accepted by most publishers, providing consortia and librarians with usage statistics in a uniform format (Shepherd 2002). There is still room for improvement, but the results are already extremely useful.
Policy Benefits
Other statements have also been issued; ICOLC’s latest statement, on the economic crisis, was supported by a hundred consortia worldwide and reinforced on some points by an ARL statement (Hahn 2009). This gives the statements very strong support, and we see that publishers do take them into account. A study of the Canadian situation showed that librarians felt consortium activities brought them closer together and gave them more knowledge of, and respect for, each other’s situation (Maskell 2008). Nor does this apply only to librarians: university management also seems to gain a new perspective on the library community.
Politicians and university deans take more notice of libraries because of consortium activity, and regard libraries involved in consortia as more forward-looking (Friend 2003). At Leuven University this has clearly been the case: through talks about the Big Deals arranged by the university library, contacts between university management and library management became much closer than before, and library issues gained more attention at university management level. This higher profile also provided opportunities to discuss Open Access and encouraged researchers and academic staff to rethink their involvement in library policy making.
Other Benefits
Besides making e-journals and other electronic resources available, some consortia provide extra services to their members. Some offer platforms giving members access to the resources, whether by loading all content on the platform as OHIOLINK does, by developing a portal (for example Nelli by FinELib) or by providing access through a commercial package (HealLink) (Hormia-Poutanen 2006). Because most consortia work with one or more dedicated negotiators, these people have the opportunity to become experts in the field, which benefits the members, not all of whom will have specialized staff to carry out negotiations. The pressure a consortium can put on publishers also enables it to work towards new licensing and publication models (see below on the future of consortia). The benefits on the publishers’ side are quite clear: they deal with some hundreds of consortia representing thousands of libraries, which means substantial staff savings, and through the big deals they are assured of constant revenue for a number of years. Because of the larger consortium deals, publishers have also seen a quicker move to e-only deals than would have been the case had they had to deal with all the institutions separately, and they had an immediate starting point for marketing extra products such as archive collections and e-books.
Disadvantages
Forming a consortium does not always run smoothly. A few years ago, certainly, libraries spent a good deal of time devising internal cost-sharing models. Librarians spent a large amount of their time in discussions to find a good working model for their consortium, and in some cases this caused serious internal problems. The Big Deal, which has many advantages as mentioned above, can also be seen to have serious downsides. Publishers are in a quasi-monopolistic position and can use this strength to place restrictive conditions on deals, such as limited scope for cancelling titles. Many deals are constructed so that a client wanting only a selection of the publisher’s titles would pay far more than for the total package. For publishers this is an excellent lever for maintaining income streams, but it places serious limitations on libraries’ room for manoeuvre. In good times this is less of a problem, but when all libraries are suffering budget restrictions it is clearly a disadvantage.
To make matters worse, publishers have been accustomed to using their position to demand exorbitant price rises year on year. It is clear that the Big Deal, or at least the supply-side philosophy and practice behind it, is in need of reform, especially in the current financial climate. The progressive consolidation of the academic publishing industry mentioned above tends to concentrate more and more of a library’s budget on fewer and fewer publishers and can put small publishers in a very difficult position. Consortia can be of different kinds: regional, subject-oriented or national. As a result, a library that wishes to buy an electronic resource may be able to choose between several consortia that all offer a deal for that product, which creates the possibility of undesirable competition between consortia (Grogg 2009). Consortia provide more access, but also more streamlined access. Collection development therefore shifts from the institutional level to the consortium level, and this puts pressure on librarians to explain to their users why there is less flexibility in decision making than there used to be. In a small region like Flanders, deals for the universities are not on an opt-in basis, which puts pressure on the institutions to find the necessary money if a majority of the universities want a particular deal.
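The effect of year-on-year price rises above inflation can be made concrete with a little compound-growth arithmetic. The Python sketch below assumes, purely for illustration, a library whose total acquisitions budget grows at one annual rate (for example, general inflation) while its Big Deal payment grows at a higher agreed rate; the function names, rates and amounts are all hypothetical and are not taken from any actual deal.

```python
from typing import Optional


def big_deal_share(
    budget: float,
    deal_cost: float,
    budget_growth: float,
    price_increase: float,
    years: int,
) -> list[float]:
    """Share of the total budget taken by the deal in years 0..years.

    Rates are annual fractions (0.02 means 2%) and are compounded once a year.
    """
    shares = []
    for _ in range(years + 1):
        shares.append(deal_cost / budget)
        budget *= 1 + budget_growth      # e.g. budget rising with general inflation
        deal_cost *= 1 + price_increase  # e.g. deal price rising by the agreed increase
    return shares


def years_until_share(
    budget: float,
    deal_cost: float,
    budget_growth: float,
    price_increase: float,
    threshold: float,
    max_years: int = 100,
) -> Optional[int]:
    """First year in which the deal's share reaches threshold, or None within max_years."""
    shares = big_deal_share(budget, deal_cost, budget_growth, price_increase, max_years)
    for year, share in enumerate(shares):
        if share >= threshold:
            return year
    return None


if __name__ == "__main__":
    # Hypothetical library: the deal starts at half of a 1,000,000 budget,
    # the budget grows 2% a year and the deal price 7% a year.
    for target in (0.75, 1.0):
        year = years_until_share(1_000_000, 500_000, 0.02, 0.07, target)
        print(f"Deal reaches {target:.0%} of the budget in year {year}")
```

With these assumed rates the deal passes three quarters of the budget in year 9 and reaches the whole budget in year 15. Because the share grows each year by the factor 1.07/1.02, it is the gap between the two rates, rather than their absolute size, that drives the result.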
Future of Consortia
Consortia are active in exploring new publication models. They can use their negotiating power to combine access to a publisher’s subscription collection with an agreement that allows their researchers to publish a number of articles in Open Access for the same amount of money they spent before, as in the Springer deals with Max Planck and with UKB (the Dutch consortium).3 They can also provide information on new publication models: the French consortium Couperin opened a website on Open Access to make the research community aware of all the possibilities.4 Consortia are also exploring new licence models. In Europe consortia soon come up against national borders, so DEFF (Denmark), JISC (United Kingdom), DFG (Germany) and SURF (Netherlands) joined forces to start the Knowledge Exchange. This initiative explores supra-national contracts and new licence models, experiments with tender procedures instead of negotiated licences, and aims to make a layer of information freely available on the Internet.5 Consortia are also looking to extend their activities to e-books, as in the JISC programme (see the chapter by Woodward). In the USA a number of consortia are merging: SOLINET and PALINET now form Lyrasis, a huge consortium of more than 4,000 libraries.6 Such very large consortia may come to act more like intermediaries or agents between publishers and libraries. Agents like Swets and Ebsco may be entitled to feel more than a little rueful about this.
The form of the e-journal
In the early days it was expected that technology would bring about radically new forms of scholarly publication and certainly a departure from the traditional print journal model. Interestingly, this has yet to happen on any large scale: e-journals generally still mimic the paper form.
This is probably because the economic model of the paper form (the subscription) transferred easily into the e-journal era, and because the academic world relies heavily on the traditional form for measuring research excellence.
3 Springer, Max Planck Reach Deal, Library Journal (February 2008), http://www.libraryjournal.com/article/CA6533053.html (viewed 7 December 2009).
4 http://www.couperin.org/archivesouvertes/ (viewed 7 December 2009).
5 http://www.knowledge-exchange.info/ (viewed 7 December 2009).
6 http://www.lyrasis.org/About-Us/Overview.aspx (viewed 26 November 2009).
Electronic resources management (ERM)
As electronic collections grew larger, libraries soon realized that handling these resources was quite complex, and libraries and consortia sought systems to oversee the whole flow of information they had to manage. Publishers provided information by a variety of means (emails, paperwork and of course the licences themselves), which libraries had to disseminate to different user groups. Traditional library management systems (LMS) gave librarians no means of managing the e-resources workflow. In 2001 Tim Jewell started working on best practices for the selection and presentation of commercially available electronic resources (Jewell 2001); this work was taken further by the Digital Library Federation and resulted in 2004 in a report from the DLF ERM Initiative with standards for ERM data (Jewell 2004). The major issues that ERMs should solve are:
– Being a central point of administrative data on e-resources (licence details, acquisition, statistics)
– Being able to follow the workflow of the licensing process
– Providing different licence details to multiple people
– Being able to work with the structure of the packages and their details
– Providing facilities for trials
– Interchange of information with the LMS
– Overseeing the information in a consortium environment
With a standard now available, commercial vendors used it as a guideline to develop their products, but in the meantime larger universities and consortia had already developed their own electronic resource management systems, such as MIT’s VERA, Johns Hopkins’ HERMES and Gold Rush from the Colorado Alliance of Research Libraries (Nelson 2009). More recently, subscription agents have also begun to offer resource management systems, namely ERM Essentials from Ebsco and eSource Manager from Swets. Although the standard was developed only five years ago, products have appeared and disappeared again, partly because of mergers and partnerships among the major LMS vendors (for example, Ex Libris acquired Endeavor from Elsevier, and SirsiDynix partnered with Serials Solutions) but also because some home-grown products such as HERMES were no longer maintained. A good overview of products and their specifications can be found in Collins (2008). So far it appears that the systems on the market have difficulty being flexible and responsive enough to cope with an ever-changing environment. ERMs are still ‘Ghosts in the machine’ (Emery 2007), as they do not yet offer the right balance between the effort required to input information and the outputs they deliver.
Striking this balance may be very difficult, as the e-resources environment has still not stabilized and may not do so in the near future. Keeping track is a big problem, as journal titles move from one publisher to another from time to time. Packages may be cancelled while perpetual access rights still guarantee access, and up to now there does not appear to have been a watertight system for providing a historical overview of collections over the past five to ten years. One possibility being explored at Leuven University is to combine the Swets ERM product, which would provide a system pre-populated with subscription and financial data, with the next-generation system of an LMS provider, for example URM from Ex Libris;7 such a solution might remove most, or ideally all, of the current obstacles.
7 Unified Resource Management: The Ex Libris Framework for Next-Generation Library Services, Ex Libris Document version 1.1, 2009, http://www.exlibrisgroup.com/files/Solutions/TheExLibrisFrameworkforNextGenerationLibraryServices.pdf (viewed 16 January 2010).
Cost-benefit of e-journals compared with paper journals
A number of benefits of e-journals, shared with the digital library in general, can readily be identified. E-journals can be consulted by authorized users independently of time and place; powerful search tools are available to identify the desired subjects and articles; and, once identified, the required article is instantly available at the desktop. The system of big deals has progressively provided more titles and content to users year on year, and publishers argue that their prices per available article have fallen over time. Also in support of e-journals, library managers can argue that the licence cost of the electronic product is offset by lower administration costs, since paper volumes no longer have to be processed, and by not having to store them, which reduces capital expenditure. Various studies support this argument. A study at Drexel University (Montgomery 2002) suggests that the overall costs per use of providing an electronic journal collection are lower than those of providing a print collection. Key costs to factor in here are the amortized costs of space for storing print items. Although selection and negotiation costs are higher, cataloguing costs can be reduced by the use of various web lists and tools. It should be noted that Drexel takes little responsibility for long-term archiving. Another study suggested that labour, space and materials requirements are all generally lower in a digital library than in a paper one; it is inconclusive about digital equipment requirements (Connaway 2003). A further study (Schonfeld 2004) develops the comparison methodology much further on a life-cycle basis. It restricts itself to non-subscription costs and concludes that the transition to electronic journals is likely to offer substantial long-term savings. On the negative side, the caveat to all these studies is that the electronic form currently offers no watertight assurance of long-term access or digital preservation. This is clearly so, but the question for each library is whether long-term preservation and access are its responsibility. It is arguable that the responsibility for long-term access and preservation lies with the national institutions of the country in which the material is published.
Insofar as digital access can be assured through the techniques available at a given time, it makes little difference whether access is provided by a local, national, continental or global partner, provided sufficient assurance and redundancy are built in. A further debatable point is the long-term sustainability of the big deal culture. On the one hand the big deals tend to offer a lower cost per title and possibly per use; on the other, the monopolistic practices of the publishers tend to make the overall cost of the big deals unsustainable, as they result in annual price rises above general inflation and tend to prevent poorly used titles from being cancelled from the package. Nevertheless, leaving aside the aesthetic issues of reading on screen as against reading on paper, the long-term strategic balance of costs and benefits appears to be strongly in favour of e-journals, particularly in the natural and exact sciences and increasingly in the humanities also, assuming that the library in question does not take upon itself the responsibility for long-term access and preservation.
Business planning elements
Budget
As can be gathered from the previous parts of this chapter, the major issue facing research libraries with large e-journal budgets is the challenge of paying for licences in a culture of big deals, where prices rise year on year at rates above inflation. Because many librarians feel tied in to the deals, either by contractual terms or by user demand, the e-journal budget will progressively absorb more and more of the available budget, and e-journal spending will flow to fewer suppliers. Restrictive practices which disallow cancellation of titles within the big deals can have the effect of subordinating collection development to the terms of the deals. On the other hand, the costs of acquisition and storage are dramatically reduced or even eliminated.
Administration
Negotiating e-journal contracts is undoubtedly more complex and time-consuming than was the case with paper journals. Whereas in the past much of the routine administrative work was done by serials agents at their own cost (from within their discount margins), the agents have now largely been squeezed out and the cost falls on the library. Costs can be shared by belonging to a consortium, but the cost of consortium membership must then be included in the budget. The combination of big deals and fewer suppliers may mean that public sector procurement rules require periodic tender procedures. On the other hand, labour-intensive tasks such as check-in, chasing missing parts, binding and physical handling are eliminated.
Consequences for library organization
Although long heralded (or threatened, depending on your point of view), the demise of the library as we knew it is now at least partly coming to pass. Commercial firms which once had substantial scientific research libraries see no further need for them. Academic libraries specializing in the sciences, where e-journals are dominant, are increasingly being called into question and, where they survive, are being reborn as learning centres. This inevitably brings uncomfortable times for the staff involved, to be resolved, it is hoped, by appropriate reorientation of jobs or redeployment.
Long-term access

The longevity of e-resources is an unresolved issue, so the move to e-only journals, coupled with non-retention of paper back issues, poses a risk for long-term access. This can be addressed by co-operative storage of the paper version, if one exists. If the journal is published only in electronic form, one can only rely on best practice to provide for future accessibility. Responsibility for long-term access implies using the best digital archiving procedures available at the time and continuing to do so into the future. This responsibility cannot be left to the publisher alone, as the future of a publishing concern, whether commercial or not, can never be guaranteed. Steps must therefore be taken, as Elsevier has done with the Dutch Royal Library, to entrust archiving to reliably permanent institutions, backed up in more than one place.

Prospects

E-journals, already dominant in the natural sciences, will gradually become dominant across all disciplines, including the humanities. It is not unreasonable to suppose that the paper versions of e-journals will eventually be eliminated for economic reasons. When that happens, there should be no impediment to developing innovative forms of e-journal, whether in the business model or in the use of multimedia, because the paper paradigm will have disappeared. The present stranglehold on the market by the commercial scientific publishing sector seems unsustainable. Even modest annual price increases above general inflation will, within a relatively short period, leave institutions unable or unwilling to pay. The business models must therefore change in the interests of both suppliers and consumers. The Open Access movement can be a lever in this process, but it cannot by itself be the whole solution, which will need resolute action by the research community. It is inevitable that libraries which specialize in the natural and exact sciences and which have no archival mission will go e-only. For purely research libraries in those disciplines, the library as a place comes seriously into question. Those with an education mission will transform into learning centres focused on student-centred learning. For multi-disciplinary libraries there are no credible signs that the library as a place will disappear, for books and other non-digital resources remain important for the humanities disciplines and for the preservation of heritage. This does not mean, however, that even those libraries can avoid change, for user behaviour is clearly changing radically in the Internet age, and the changes we are seeing in libraries as a result of e-journals are only the start.

Summary

– E-journals have become the dominant form of formal scholarly communication within a short period.
– Market consolidation and big deals have led to much wider access, but also to quasi-monopolistic practices, which are not sustainable.
– Customers responded by forming consortia, which can bring significant benefits through collective negotiation. Consortia and consolidation have largely squeezed out intermediaries.
– The total cost of access to e-journals is lower than the total cost of ownership of print journals.
– For business planning, big deals pose significant budget challenges. Administration is more complex, but logistics are much simpler or even eliminated.
– In the natural sciences, e-journals can call the library as a place into question.
– Long-term access and preservation of e-journals are not a solved problem, but trusted repositories may provide the solution.

References

Collins, M. (2008) Electronic resource management systems (ERM’s) review. Serials Review, Vol. 34 No. 4, pp. 267-299.
Connaway, L.S. and Lawrence, S.R. (2003) Comparing library resource allocations for the paper and the digital library. D-Lib Magazine, Vol. 9 No. 12, http://www.dlib.org/ (viewed 5 June 2010)
Emery, J. (2007) Ghosts in the machine: the promise of electronic resource management tools. The Serials Librarian, Vol. 51 No. 3, pp. 201-208.
Frazier, K. (2001) The librarians’ dilemma: contemplating the costs of the Big Deal. D-Lib Magazine, Vol. 7 No. 3 (March 2001), http://www.dlib.org/ (viewed 5 June 2010)
Friend, F.J. (2003) Consortia, library buying. Encyclopedia of Library and Information Science, 2nd ed., pp. 97-101.
Gatten, J.N. and Sanville, T. (2004) An orderly retreat from the big deal: is it possible for consortia? D-Lib Magazine, Vol. 10 No. 10, http://www.dlib.org/ (viewed 5 June 2010)
Giordano, T. (2005) Overview of European consortia: library consortia in western Europe. Encyclopedia of Library and Information Science, 2nd ed., pp. 1613-1620.
Grogg, J.E. and Ashmore, B. (2009) The art of the deal. Searcher, Vol. 17 (1 March 2009), pp. 40-48.
Hahn, K. (2009) ARL statement to scholarly publishers on the global economic crisis. Research Library Issues, No. 262, pp. 6-11.
Hitchcock, S., Carr, L. and Hall, W. (1998) A survey of STM online journals 1990-95: the calm before the storm. Department of Electronics and Computer Science, University of Southampton, http://journals.ecs.soton.ac.uk/survey/survey.html#broad-picture (viewed 5 June 2010)
Hormia-Poutanen, K. et al. (2006) Consortia in Europe: describing the various solutions through four country examples. Library Trends, Vol. 54 No. 3, pp. 359-381.
Jewell, T.D. (2001) Selection and presentation of commercially available electronic resources. Digital Library Federation and Council on Library and Information Resources, http://www.clir.org/pubs/abstract/pub99abst.html (accessed 17 January 2010)
Jewell, T.D. et al. (2004) Electronic resource management: report of the DLF ERM initiative. Digital Library Federation, http://www.diglib.org/pubs/dlf102/ (viewed 5 June 2010)
Kohl, D. (2001) To select or not select: taking off the blinders in collection development. In Bazirjian, R. and Speck, V. (eds), 2000 Charleston Conference Proceedings: Is bigger better? Charleston, SC: Against the Grain Press, 2000, pp. 25-33. Also reprinted in Collection Management, Vol. 26 No. 2, 2001.
McKnight, C. (1993) The electronic journal: a user’s perspective. Serials, Vol. 6 No. 1, pp. 13-19, http://uksg.metapress.com/app/home/contribution.asp?referrer=parent&backto=issue,4,14;journal,48,63;linkingpublicationresults,1:107730,1 (viewed 5 June 2010)
Maskell, C.A. (2008) Consortia: anti-competitive or in the public good? Library Hi Tech, Vol. 26 No. 2, pp. 164-183.
Meadows, J. (1994) Innovation in information: twenty years of the British Library Research and Development Department. Bowker Saur, p. 108.
Montgomery, C.H. and King, D. (2002) Comparing library and user related costs of print and electronic journal collections: a first step towards a comprehensive analysis. D-Lib Magazine, Vol. 8 No. 10, http://www.dlib.org/ (viewed 5 June 2010)
Nelson, R. (2008) Gold rush: an in-depth look at one of the first ERM’s. The Serials Librarian, Vol. 55 No. 3, pp. 419-427.
Nicholas, D., Huntington, P., Jamali, H.R. and Tenopir, C. (2006) What deep log analysis tells us about the impact of big deals: case study OHIOLINK. Journal of Documentation, Vol. 62 No. 4, pp. 482-506.
Perry, K.A. (2009) Where are library consortia going? Results of a 2009 survey. Serials, Vol. 22 No. 2, pp. 122-130.
Schonfield, R.C. et al. (2004) Library periodicals expenses: comparison of non-subscription costs of print and electronic formats on a life-cycle basis. D-Lib Magazine, Vol. 10 No. 1, http://www.dlib.org/ (viewed 5 June 2010)
Shakel, B., Pullinger, D., Maude, T.I. and Dodd, W.P. (1983) The BLEND-LINC project on ‘electronic journals’ after two years. Computer Journal, Vol. 26 No. 3, pp. 247-254.
Shepherd, P. and Davis, D. (2002) Electronic metrics, performance measures, and statistics for publishers and libraries: building common ground and standards. portal: Libraries and the Academy, Vol. 2 No. 4, pp. 659-663.
7 E-BOOKS: BUSINESS PLANNING FOR THE DIGITAL LIBRARY
Hazel Woodward
History and background

E-books have been a significant component of the digital library for about a decade. Apart from Vannevar Bush’s visionary description in 1945 of the ‘Memex’, a conceptual device used to store, retrieve and display books (Bush, 1945), the first real attempt to make books available online came in 1971, when Michael Hart keyed in the words of the Declaration of Independence, thus starting Project Gutenberg (Project Gutenberg). Over 20,000 public domain books are now accessible from this freely available service. By the late 1990s a number of publishers and vendors were beginning to experiment with making electronic versions of books available for purchase on the Internet. This was a time-consuming and expensive process, as it involved not only keying or scanning printed texts into a different format but also obtaining the rights to sell the e-book format. One of the first aggregator companies to offer an e-book service to libraries was NetLibrary (NetLibrary, 2009), in 1999, with an investment of $120 million. The start-up service provided access to some 2,000 titles from a range of different publishers. NetLibrary was quickly followed by Questia (Questia, 2009) in 2000 and ebrary in 2001. Both companies experimented with offering services directly to end users as well as marketing to libraries. In 2001, when the Internet bubble burst, NetLibrary began having financial difficulties, and in 2002 it was bought by OCLC, its current owner. Confidence in e-books waned, but many publishers continued to build their e-book portfolios: Taylor & Francis, Elsevier, Wiley and Oxford University Press, to name just a few. As well as building their own e-book platforms, most publishers also sold their titles on to aggregators. In 2004 MyiLibrary (MyiLibrary, 2009) and Ebook Library (Ebook Library, 2009) launched their services to libraries, attempting to offer innovative and flexible purchasing models.
Most publishers and librarians would agree that the uptake of the scholarly monograph e-book (the major component of the publisher and aggregator services referred to above) has been slow. In 2005 the European Commission commented that “the low level of sales [of e-books] has meant that no tracking has been established”, and this still seems to hold true today (Vassiliou, 2008). The Book Industry Study Group (BISG) states: “Defining e-books for the purposes of sales reporting involves confronting several questions. Most people in the book business readily agree that sales of e-books for use on personal computers and dedicated reading devices should be included. But what about part of books, such as chapters, sold separately in electronic format? What about textual databases? Electronic course materials? Downloadable audio books? Customised electronic products?” (Bennett, 2009).

It is extremely difficult to obtain accurate and up-to-date statistics on e-books. Although e-books represent only a small proportion of the total book market, the number of e-books available has grown hugely. Just (2007) claims an average annual growth rate of about 20%, but this is put into perspective by the fact that in 2006 only 135,492 e-books were available in the US, compared with 1,218,397 printed books. Gray also quotes comparable figures from the International Digital Publishing Forum (IDPF), showing that 70,000 e-books were published in the US in 2004 with sales revenue of $45 million, and 100,000 in 2005 with revenue of $57 million (Gray, 2007). The most recent IDPF figures show that e-book sales were up by 64% for the first eleven months of 2008 and by 108% for the month of November 2008; however, these data represent returns from only 13 US publishers and are based on wholesale figures only (Coker, 2009). The Canadian Association of Research Libraries (2008) documents that expenditure on e-monographs across its member libraries grew from $1,127,372 in 1999/2000 to $6,048,491 in 2006/2007, a 436% increase. However, this still represents a small percentage of their total spend on books. According to Blummer, in the US in 2005/2006 e-books represented only 5% of academic library book collections and 2% of public library collections.

The reasons for the slow uptake have been well documented. A survey undertaken for the Joint Information Systems Committee (JISC) by the Higher Education Consultancy Group (HECG) (2006) covered all UK higher education libraries and obtained an excellent response rate of 68%.
Of the 92 respondents, 89 said they were either ‘eager’ or ‘very eager’ to develop e-book collections, and 37% thought that in five years’ time their book collections would be half print and half electronic. However, the research also identified reasons for the current low uptake of e-books: too few e-books were available; available e-books were often out of date or unsuitable for the UK market; pricing was high and pricing models complex; and e-textbooks and high-demand titles were lacking. These findings are backed up by ebrary’s global e-book survey (2007). This international study demonstrates that the price and content of e-books are the primary concerns of librarians, while for users ‘lack of awareness’ is the primary inhibitor of use, closely followed by ‘difficult to read’, ‘difficult-to-use platforms’ and ‘lack of training’.

Types of e-book

The difficulty with business planning for e-books is that, across the information industry, there is confusion about what is meant by the term “e-book”. In a recent article, Vassiliou (2008) examined this issue and arrived at a two-part definition: “An e-book is a digital object with textual and/or other content, which arises as a result of integrating the familiar concept of a book with features that can be provided in an electronic environment. E-books typically have in-use features such as search and cross reference functions, hypertext links, bookmarks, annotations, highlights, multimedia objects and interactive tools.” There are many different types of e-book, including reference books such as encyclopaedias and handbooks, scholarly monographs, e-textbooks and digitised manuscripts, all nestling under the name “e-book”. Some of these electronic publications are already successfully incorporated into digital libraries, whereas publishers and librarians are still struggling to find successful business models for other types of publication. For the purposes of this chapter it will therefore be necessary to identify a range of options for business planning for e-books within the digital library.

As discussed by Sandler et al. (2007), some sectors of the e-book market have performed well beyond expectation. A good example is the large collections of older books, digitised either by commercial publishers or by public sector programmes. These include collections such as Early English Books Online (EEBO), containing some 125,000 volumes, and Eighteenth Century Collections Online (ECCO), containing 150,000 volumes, which have been widely purchased by libraries and library consortia and enthusiastically received by scholars. Sandler attributes this success to a range of factors, including: low per-volume cost and a one-off payment; clear value to scholars; availability of bibliographic records; and little duplication of print holdings. Another group of publications, electronic reference works, has also flourished in the e-book marketplace. Titles such as Oxford Reference Online, Encyclopaedia Britannica, KnowUK (over 100 widely used reference publications) and Gale Virtual Reference (21 electronic reference titles, including several multi-volume sets) have also shown considerable market strength.
Issues relating to scholarly monographs were touched upon in the opening section of this chapter, but there is a final category of e-book of significant relevance to the digital library: e-textbooks and course reading list materials. These are the high-demand items which all librarians and users want to acquire electronically, and more often than not they are simply not available. The reasons are clear. These are the titles in a publisher’s portfolio which are the big sellers and have an obvious market outside libraries, namely students. Publishers are very concerned that if these titles were available electronically via libraries, they would lose a significant proportion of student sales. The importance of e-textbooks was a key finding of the Higher Education Consultancy Group (2006) research, and it has been pursued by the JISC E-Books Working Group in the form of the National E-Book Observatory Project. In this project, 36 course reading e-books in business studies, engineering, medicine and media studies are being made freely available at the point of use to some 120 UK higher education libraries for a period of two years. JISC funding of £600,000 was made available for the purchase of the titles, which were selected through extensive community consultation. How, when and where the titles are being used is currently being studied by the CIBER research group at University College London using deep log analysis. In addition, each title will be analysed against print sales figures provided by the publishers/aggregators and print circulation data provided by libraries, over the lifetime of the study and for the previous three years. This will provide unique market research data for both libraries and publishers (Joint Information Systems Committee, 2009).
Business planning

Business planning is an exercise which should be undertaken in all libraries on a regular and rolling basis. The starting point is the mission or vision statement, from which all planning and delivery flows. In a presentation on developing vision and mission statements in European libraries, Hans Geleijnse states that in constructing the statement for Tilburg University he had uppermost in his mind: an emphasis on better use of the digital library; a shift of focus from the library as provider of information to added value from the customer perspective; mobile computing and access to the network; and a focus on the specific, tailored needs of individual users (Geleijnse, 2004). Good examples from other libraries abound, including “The British Library is at the forefront of managing the UK’s digital future” and, from Delft University of Technology, “Our goal is to evolve into a national (digital) knowledge centre for the technical-scientific world”. Once the statement is crafted, the library management team can focus on drawing up a delivery plan to implement the vision. Given the digital future of libraries reflected in our mission statements, all librarians must give thought to developing their e-book collections in the context of supporting research, learning and teaching within their institutions. There is a good business case for doing this, from the perspective of both the user and the library. For the user, the key benefits include: 24x7 access from anywhere in the world; browsing and searching capabilities across individual texts and often across a corpus of titles; reference linking; highlighting and annotating; and cutting, pasting, saving and printing (where allowed by the publisher). Ultimately this gives users a quicker and more efficient way of finding information and saves them time.
For the librarian, benefits include: space saving (and therefore substantial cost saving); elimination of manual processing; a faster ordering process; no risk of books being lost or stolen; usage data; and, in some cases, MARC records supplied by the vendor. There are, however, downsides to e-books which need to be acknowledged. These include software and hardware issues; multiple and differing interfaces; publishers’ digital rights management (DRM) restrictions; and lack of availability of key titles.

Acquiring e-books

The most significant cost in the provision of e-books is their purchase, and one of the major decisions to be made is whether to purchase outright or to subscribe under licence. Neither model predominates, as shown by the ebrary survey (2007), which found that 59% of librarians said they preferred outright purchase of e-books and 55% said they preferred the subscription model. Business models for the purchase of e-books vary widely. At one end of the spectrum is Springer, which effectively offers its e-books as a ‘big deal’ or as subject packages, as publishers do with e-journals. Springer has made all its books from 2005 onwards available electronically (even textbooks) and is adding new titles at the rate of some 4,000 a year. Books are purchased outright and have no DRM, so there is no restriction on use, downloading or printing. Titles can also be purchased individually from third-party aggregators including ebrary, NetLibrary and MyiLibrary (MyiLibrary, 2009). Pricing is based on the size of an individual library or consortium, and the annual packages offer significantly lower title costs than list price as well as perpetual access to acquired content.

Many publishers have recognised the different business requirements of libraries and provide a range of purchasing options. Taylor & Francis is a good example. It has over 20,000 e-book titles, which can be purchased outright, purchased through third-party vendors/aggregators, or acquired on annual subscription. Individual titles can be acquired through its ‘pick and mix’ option, or libraries can opt for bundles such as subject collections and bestseller packages. Taylor & Francis claims that online sales of e-books currently represent 10% of its book sales (Chesher, 2008). Another major publisher, Wiley-Blackwell (2009), also offers a range of purchasing options to libraries. The Wiley InterScience OnlineBooks collection contains more than 1,700 titles in STM (scientific, technical, medical) subjects and in business and finance. Titles are available from the InterScience platform, and customers may purchase title by title, outright or on subscription. The company recently announced an agreement with YBP Library Services which, it claims, makes it the first large publisher fully to integrate its entire list of online research books with a leading library book distributor (Wiley-Blackwell, 2009). Both Taylor & Francis and Wiley-Blackwell offer other benefits to libraries, including free MARC records and COUNTER-compliant usage statistics. A further example of a publisher-based offer is that of Cambridge University Press. As well as offering individual titles for purchase from its eBookstore, it offers a 221-title ‘Complete Companions Collection’, consisting of currently published textbook series introducing major artists, writers and philosophers. It also contains some 2,000 specially commissioned essays on related topics.
E-textbooks

The issue of e-textbooks has already been touched upon earlier in this chapter. E-textbooks are almost certainly the most difficult type of e-book for libraries to acquire, as the ‘student pays’ model still predominates. A number of publishers and other online retailers provide services for students to purchase e-textbooks. John Smith Campus Bookshops (part of the Ingram Digital Group, which owns MyiLibrary) have, over the past few months, negotiated e-licences for some 90 textbooks, which they are making available via their bookshops and online as part of a package with the printed copy of the book (John Smith, 2009). Sellstudentstuff.com and ebooks.com offer download options to students on a rental basis, and Cengage’s iChapters gives students options for purchase and download at both title and chapter level. Another option, specifically for UK libraries, is to use the CLA’s “Comprehensive HE Licence”, which allows higher education institutions to scan extracts from printed books (and journals) they have purchased and make them available via their online short loan collection or the institution’s virtual learning environment (VLE). The (UK) Publishers Association estimates that between 70% and 90% of textbook income comes from students, and expenditure by undergraduate students in the UK in 2006/2007 was £219.5 million. According to the (US) Student Public Interest Research Group (SPIRG), university students spend on average $900 a year on textbooks, and the price of textbooks has increased at four times the rate of inflation since 1994 (Content Complete, 2009). Thus a move to e-textbook provision by libraries would mean a major change in the economic model for publishers, and a substantial increase in costs for libraries. A recent report to JISC by Content Complete (Content Complete, 2009), currently being considered by the JISC E-Books Working Group, recommends setting up a series of trials of business models for library provision of e-books. The suggested options include: offering concurrent-user and session-based access to e-textbooks via aggregated platforms; extending access to textbooks via publisher platforms; and libraries participating in offering students a range of access options. If the trials go ahead, they will attempt to answer questions such as “What do students want if they have a choice?”, “Does making online access available in libraries increase sell-through of existing adoptions?” and “Does offering online access through libraries increase the potential for new adoptions?”.
Open access

As far as publishers’ business models for e-books are concerned, one of the most interesting recent announcements has come from Bloomsbury, which has created a new imprint, Bloomsbury Academic (Bloomsbury Academic, 2009). It will use what is essentially an open access model: some 50 titles in the humanities and social sciences will be made available free of charge online under a Creative Commons licence (which allows only non-commercial use) (Serials e-News, 2009). Both librarians and publishers will be watching the progress of this innovative initiative closely. Another interesting open access development is that of academics making course materials freely available online. The Text Book Revolution website is run by student volunteers and lists hundreds of open access textbooks on a wide range of subjects. Flat World Knowledge is another new company promising free access to textbooks. One of the first titles to be made available is “Introduction to economic analysis”, written by an economist from the California Institute of Technology.
Aggregator services

Another popular way for libraries to acquire e-books is through the aggregator services already mentioned, such as NetLibrary, MyiLibrary, ebrary, EBL (Ebook Library) and Dawsonera (Dawsonera, 2009). Such services have the advantage of providing access to a wide range of e-books from different publishers. Once again, books can usually either be purchased outright or subscribed to, and many aggregators offer a choice between multiple concurrent use and non-simultaneous use. The growing use of electronic data interchange (EDI) for ordering e-books also gives libraries a more streamlined ordering process, thus saving staff time. A variant on the large multi-disciplinary aggregator services is subject-based packages such as Safari Tech Books Online for information technology e-books (Safari, 2009) and Knovel for engineering e-books (Knovel, 2009). Once again these books come from a range of publishers; Safari, for example, contains titles from O’Reilly Media and from Pearson, whose imprints include Addison-Wesley, IBM Press, Microsoft and Prentice Hall.
Library consortia

Library consortium activity in the field of e-books pales into insignificance compared with consortium work on licensing e-journals and e-databases, but there is nevertheless substantial interest in e-books among members of the International Coalition of Library Consortia (ICOLC). JISC Collections (2009), the main UK library consortium, has been active in this area for a number of years and offers its members e-book deals from major publishers such as Taylor & Francis, Oxford University Press, Wiley, Gale, Britannica and ProQuest. The JISC E-Books Working Group has also commissioned a range of research projects on e-books. The National E-Book Observatory Project is described above, and project work is currently under way on the management and economic impact of e-textbook business models. Other research reports are freely available on the JISC Collections web pages (JISC Collections, 2009). OhioLINK, a leading US library consortium, has recently signed an agreement with CourseSmart to address concerns about the high price of textbooks. Although this is still rooted in the ‘student pays’ model of textbook purchase, OhioLINK has in effect become an affiliate of CourseSmart by waiving its affiliate fee and making textbooks available for purchase via the OhioLINK portal at higher levels of discount. In Korea, KERIS has formed an e-book consortium for 90 universities and has created a shared portal for English-language e-books from NetLibrary. Research by KERIS estimates that the duplication of monographs among its member libraries is 60–70% (Park, 2007). Consortium purchase at discounted prices therefore provides significant value for money to member libraries. Many other consortia have also negotiated and licensed content from major publishers and aggregators, and there are clearly considerable cost benefits to libraries in this approach.
In terms of business planning, the cost of the purchase or subscription is easy to substantiate, although it has to be pointed out that, in the majority of cases, the outright purchase price of individual titles by individual libraries is higher than that of the print equivalent, as the price paid for the e-book is that of the hardback edition. In the case of purchase from aggregators, the cost of access to the e-book platform must also be taken into consideration. Where titles are purchased in packages, the cost per title will almost certainly be lower than list price. Associated benefits, such as integrated EDI systems, are more difficult to cost, although savings in staff time are bound to accrue. Where libraries are part of a consortium or opt in to a consortium deal, there are additional benefits. Not only will the price be discounted through bulk purchase, but consortium negotiators are skilled in the art of negotiation and frequently use model licences which give libraries more favourable conditions of use than they could negotiate individually.
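The comparison in this paragraph (individual titles at hardback-equivalent prices plus any aggregator platform fee, against a package or a consortium deal) comes down to total cost per title. The Python sketch below illustrates that calculation under stated assumptions: the `Offer` class, its fields and every price in it are hypothetical, and a low cost per title is only meaningful if the titles in the package are actually wanted.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """A hypothetical e-book acquisition option; all figures are illustrative."""
    name: str
    titles: int                # number of titles acquired
    content_cost: float        # total price of the titles before any discount
    platform_fee: float = 0.0  # e.g. access fee for an aggregator platform
    discount: float = 0.0      # e.g. consortium discount, 0.10 for 10%

    def total_cost(self) -> float:
        # The discount applies to the content only; the platform fee is paid in full.
        return self.content_cost * (1 - self.discount) + self.platform_fee

    def cost_per_title(self) -> float:
        return self.total_cost() / self.titles


offers = [
    # 50 titles bought one by one at an assumed hardback price of 90, plus a platform fee
    Offer("individual via aggregator", titles=50, content_cost=50 * 90, platform_fee=500),
    # an assumed 400-title subject package bought directly from the publisher
    Offer("publisher package", titles=400, content_cost=20_000),
    # the same package bought through a consortium at an assumed 10% discount
    Offer("package via consortium", titles=400, content_cost=20_000, discount=0.10),
]

for offer in sorted(offers, key=Offer.cost_per_title):
    print(f"{offer.name}: {offer.cost_per_title():.2f} per title")
```

With these invented figures the three options work out at 45.00, 50.00 and 100.00 per title. Keeping the platform fee separate from the discounted content price reflects the point above that, for aggregators, platform access is an additional cost to be budgeted for.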
Usage data

For business planning, once e-books have been acquired by the library, it is extremely important to monitor their usage. The COUNTER Code of Practice for E-Books (COUNTER, 2009) was published in 2007, but compliance by publishers has been slow: to date, 12 e-book publishers have signed up as compliant. Where COUNTER data are available, libraries are able to compare usage of titles across different publishers and monitor the overall usage of aggregator services. Titles which are little used, or not used at all, can be removed from the collection and substitutions made. Usage data from non-COUNTER-compliant publishers are much more difficult to analyse and interpret, and can be misleading. There is a move within ICOLC for all consortia to include a clause in their model licences requiring e-book publishers to become COUNTER-compliant.
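Where comparable usage figures are available, one simple way to act on them is to compute cost per use and flag expensive, little-used titles for review. The Python sketch below assumes that usage counts have already been extracted per title from COUNTER-compliant reports; the title names, costs and the threshold of 10 per use are hypothetical, and the threshold is a local policy choice rather than anything defined by the COUNTER Code of Practice.

```python
import math


def cost_per_use(cost: float, uses: int) -> float:
    """Annual cost divided by recorded uses; unused titles get an infinite cost per use."""
    return cost / uses if uses > 0 else math.inf


def review_candidates(usage: dict[str, tuple[float, int]], max_cost_per_use: float) -> list[str]:
    """Titles whose cost per use exceeds the threshold, most expensive per use first."""
    flagged = [
        title for title, (cost, uses) in usage.items()
        if cost_per_use(cost, uses) > max_cost_per_use
    ]
    return sorted(flagged, key=lambda title: cost_per_use(*usage[title]), reverse=True)


# Hypothetical titles: (annual cost, uses recorded over the year)
usage = {
    "Title A": (120.0, 60),  # 2.00 per use
    "Title B": (95.0, 3),    # about 31.67 per use
    "Title C": (80.0, 0),    # never used
}

print(review_candidates(usage, max_cost_per_use=10.0))
```

Titles with no recorded use sort to the top because their cost per use is treated as infinite. Since figures from non-compliant sources may not be comparable, a list like this is best treated as a starting point for review and substitution rather than an automatic cancellation rule.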
Hazel Woodward
Delivering e-books
Business planning for e-books must also take into consideration the technical infrastructure, the cost of providing that infrastructure, the standards associated with e-books and the cost of delivery. Most libraries deliver e-books via the library OPAC or library web pages, alongside links to other e-resources such as journals and databases. Although in the early days there was a myriad of e-book formats, the most common delivery format is currently PDF. There is, however, a growing consensus among publishers that XML is the up-and-coming format. Palmen from Innodata shares this view: “We are convinced that the ultimate technology will be XML. It is the central building block for e-content. It will improve access, allowing re-purposing of content”. Moreover, in his view it is misguided to think of e-books as simply the e-versions of printed books. They will be successful only when they have the “portability of cell phones”, when readers can “exchange views and opinions about them” and users can “create their own personal digital library” (Wilkie and Harris, 2008). While delivery via library workstations and personal laptops is the norm at the moment, libraries will certainly have to consider offering delivery via handheld devices. Currently there is fierce three-way competition in e-book readers between Sony, Amazon and Google. Sony’s reader is already available in the UK. Google has scanned 1.5 million books and made them available through the iPhone, and Amazon is about to launch (mid 2009) the second generation of its e-book reader, the Kindle (although the date for the UK launch has yet to be announced). Its most significant new feature is a text-to-speech function which can read the book aloud to its owner. Jeff Bezos, CEO of Amazon, said at the US launch that Amazon’s vision was to have “every book ever printed in any language available in less than 60 seconds” (Feldman, 2009).
Most of the cost of delivering e-books to library customers falls on the publisher’s side. It is publishers who have to create the various versions of e-books and conform to the standards that are emerging. They also have to develop DRM systems to prevent e-book piracy, which is growing substantially, just as it did in the music industry. The Association of American Publishers (AAP) has found websites, for example Textbook Torrents, offering pirated versions of as many as 5,000 textbooks each. There are also e-book sharing sites such as www.scribd.com which allow illegal sharing of copyright material. The problem has become so widespread that the International Publishers’ Association held a seminar about it in Frankfurt in 2008 (Marcus, 2009). In the majority of libraries, e-books are currently delivered via technologies, software and hardware that are already in place. However, that may all be about to change if an e-book reader hits the marketplace with the same impact as the iPod or iPhone. Only a handful of libraries have so far experimented with e-book readers, and there is no case-study literature on costs, benefits and pitfalls. The most significant benefit to libraries of delivering books in e-format rather than print is the saving on space. No research specifically addresses the impact of e-books on library space (whereas such research is available for e-journals), but there is no doubt that space is an expensive commodity. Figures from Phase I of the UK Research Reserve Pilot Project claim that overall the eight partner libraries repurposed over 11,000 metres of shelf space, with a saving in recurrent estate costs of £308,000 per year and a capital value approaching £3.8 million (UK Research Reserve, 2009).
E-BOOKS: BUSINESS PLANNING FOR THE DIGITAL LIBRARY
Finding e-books
Both librarians and end-users need to find and locate e-books. From the librarian’s perspective there is a need first to identify whether a title exists in e-format and then to establish where it can be purchased. Unfortunately, there is no single place in which to find this information, unlike printed books, which are recorded in Books in Print and its national equivalents. Librarians wishing to purchase e-books must use a myriad of resources and services, including bookseller websites such as those of Coutts, Dawson and Amazon, and individual publishers’ websites. End-users can be equally challenged. Libraries manage access to e-books in different ways. Some libraries attempt to catalogue all their e-books so that records appear in the OPAC and users can click on a link and go straight to the title. Other libraries add e-book titles and/or e-book collections to their A-Z list of e-resources, but for collections this clearly does not provide a satisfactory discovery mechanism, as individual titles are not listed. Research by the CIBER group at University College London demonstrates that e-books which appear in the library OPAC attract twice the level of use of those which do not (CIBER, 2007). Research by Springer supports the CIBER findings: it shows that at Turku University the average number of e-book downloads more than doubled once e-book MARC records were loaded into the OPAC (Springer, 2007). Catalogue records rarely have to be created from scratch by librarians. Most publishers and many aggregators now provide free MARC records with their e-books. Examples include ebrary, Springer, Wiley InterScience and Taylor & Francis.
In terms of business planning, libraries need to factor in the cost of providing access to e-books. This will be a mixture of the cost of original cataloguing by cataloguing staff and/or of purchasing catalogue records from suppliers such as OCLC, the cost of systems staff uploading the free MARC records supplied with e-books, and the cost of acquisitions staff adding titles and collections to the A-Z e-resources list. There is some concern among libraries and consortia about the quality of MARC records supplied by aggregators and publishers. Research by JISC (Joint Information Systems Committee, 2009) and work by Rightscom (2006) show that the quality of the records is highly variable. The Rightscom TIME project developed solutions for converting between metadata formats. The issues addressed by the research included: maintaining the integrity of the library catalogue; bringing e-book records up to the same standard as other catalogue records; managing catalogue entries for e-books purchased as a collection; and converting from UK-MARC to MARC 21. JISC Collections has issued a Fact File for publishers entitled Metadata for e-books (Joint Information Systems Committee, 2007) which emphasises the importance of good metadata: “Metadata is a great enabler: it can help readers to find the e-books they need and to ensure they have the right to use it. Very importantly, it also enables systems to exchange information about e-books between themselves without human interaction”.
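Because the quality of supplier records is so variable, a library may want a basic completeness check before loading a batch into the OPAC. The Python below is a minimal sketch under stated assumptions: records are taken to be already parsed into dicts keyed by MARC 21 tag, and the hypothetical minimum is 020 (ISBN), 245 (title statement) and 856 (electronic location and access). A real local policy would cover many more fields and would normally rely on a dedicated MARC library rather than plain dicts.

```python
# Hypothetical minimal check; records are assumed to be already parsed into
# dicts mapping MARC 21 tags to field values (no MARC library is used here).
REQUIRED_TAGS = {
    "020": "ISBN",
    "245": "title statement",
    "856": "electronic location and access",
}


def missing_fields(record: dict[str, str]) -> list[str]:
    """Describe every required field that is absent or empty in a record."""
    return [label for tag, label in REQUIRED_TAGS.items() if not record.get(tag)]


def triage(records: list[dict[str, str]]):
    """Split a supplier batch into records ready to load and records needing review."""
    ready, review = [], []
    for record in records:
        gaps = missing_fields(record)
        if gaps:
            review.append((record, gaps))
        else:
            ready.append(record)
    return ready, review


batch = [
    {"020": "9780000000000", "245": "An example title", "856": "https://example.org/ebook/1"},
    {"245": "Another example title", "856": ""},
]
ready, review = triage(batch)
print(len(ready), [gaps for _, gaps in review])
# prints: 1 [['ISBN', 'electronic location and access']]
```

Records routed to review would go to cataloguing staff, which is where the staff costs discussed above come in.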
This is an important issue for librarians. High-quality metadata not only allow librarians and end-users to find out about e-books but also facilitate their identification, ordering and processing, thereby keeping library processes streamlined and cost-effective.
Conclusion
Business planning for e-books is at an early stage. Only relatively recently have some libraries begun to acquire a critical mass of e-books within their digital collections, and others are still at an experimental stage. Lack of key titles, unsatisfactory purchasing models, the proliferation of platforms and interfaces, and the slow uptake of the COUNTER Code of Practice for E-Books have held the market back. But many publishers and librarians now agree that the age of the e-book is about to dawn. Significant research by consortia, publishers and academia (for example by JISC, CIBER, Springer and ebrary, mentioned above) demonstrates that the demand for e-books is growing. At present this demand comes mainly from students and librarians, and faculty need to be engaged to a greater extent. Nevertheless, research demonstrates that when the right titles are made available they will be heavily used. The publishing industry has its work cut out, not only in the area of e-book technical standards and metadata but also in publishing statistics and business models. Librarians too need to undertake more research into the cost benefits and value-for-money aspects of e-books within their collections. It is to be hoped that current research will inform these areas, to the benefit of all the players in the information industry. As always, predicting the future is a risky business. One significant development not yet mentioned in this chapter, but which may have a huge impact on e-book provision in libraries, is the Google book digitisation project.
Google has digitised thousands of books held in a number of important research libraries and made them searchable through Google Book Search. The legal action brought by the Authors’ Guild and the Association of American Publishers in 2005 has recently been settled, paving the way for institutional subscriptions to the online content (currently limited to US libraries). There is little doubt that the market will hot up. Book publishers will find that change is inevitable, and e-books will also evolve in response to users’ requirements. It is certain that the e-book industry and e-book provision by libraries will look very different in five years’ time.
Summary
– The market in e-books is growing rapidly and, research suggests, is about to take off
– Development has been slow until now owing to the unavailability of relevant titles, high prices and complex pricing models, the lack of e-textbooks and high-demand titles, the lack of user awareness and difficulties with platforms
– Types of e-book include reference works, handbooks, scholarly monographs, e-textbooks and digitised works
– Development of the market for e-textbooks is key for the education market, but problematic owing to publishers’ concerns about income streams
– The business case should be strong: 24x7 access, browsing across corpora, reference linking, mash-ups, space and time saving, speedy ordering and delivery, no losses or theft, usage data
– Open access is emerging
– Aggregators can play a significant role
– Consortia can negotiate advantageous terms
– Delivery is currently through the net, but hand-held devices may take off
– Metadata are key: e-books recorded in the OPAC receive more use
References
Bell, A. and Chesher, C. (2008), “Taylor & Francis eBooks”. Presentation at 10th European ICOLC Fall Meeting, Munich, 21 October 2008
Bennett, L. (2009), Personal communication
Bloomsbury Academic, “Homepage”. Available at http://www.bloomsburyacademic.com (accessed 3 March 2009)
Blummer, B. (2006), “E-Books revisited: the adoption of electronic books by special, academic and public libraries”, Internet Reference Services Quarterly, Vol. 11 No. 2 p. 1
Book Industry Study Group (2009), “Welcome to the Book Industry Study Group, Inc.”. Available at http://www.bisg.org (accessed 1 March 2009)
Bush, V. (1945), “As we may think”, The Atlantic Monthly, July 1945. Available at http://www.theatlantic.com/doc/194507/bush (accessed 1 March 2009)
Canadian Association of Research Libraries (2008), Copyright Committee, Task Group on E-Books, “E-Books in Research Libraries: issues of access and use”. Available at http://www.carl-abrc.ca/projects/copyright/pdf/CARL%20E-Book%20Report-e.doc (accessed 1 March 2009)
CIBER (2007), “SuperBook Project”. Available at http://www.ucl.ac.uk/infostudies/research/ciber/superbook/ (accessed 1 March 2009)
Coker, M. (2009), “The rise of e-books: IDPF reports November e-book sales up 108 percent – and here’s some analysis”, TeleRead: Bring the E-Books Home, January 2009. Available at http://www.teleread.org/2009/01/23/the-rise-of-e-books-idpf-reports-e-book-sales-up-108-percent-and-heres-some-analysis (accessed 1 March 2009)
Content Complete Ltd and OnlyConnect Consultancy (2009), “Study on the Management and Economic Impact of e-Textbook Business Models on Publishers, e-Book Aggregators and Higher Education Institutions: Phase One Report”. Unpublished draft, February 2009
COUNTER Code of Practice for E-Books. Available at http://www.projectcounter.org/cop/books/cop_books_intro.pdf (accessed 3 March 2009)
Dawsonera, “Homepage”. Available at http://www.dawsonera.com/ (accessed 3 March 2009)
Ebook Library, “Homepage”. Available at http://www.eblib.com (accessed 3 March 2009)
ebrary (2007), “ebrary’s global e-books survey, 2007”. Available at http://www.ebrary.com/corp/collateral/en/Survey/ebrary_eBook_survey_2007.pdf (accessed 1 March 2009)
Feldman, G. (2009), “Bezos unbound”, The Bookseller, 13 February 2009, p. 14
Geleijnse, H. (2004), “Developing vision and mission statement in European libraries”. PowerPoint presentation delivered in Milan, 14 October 2004. Available at http://www.tilburguniversity.nl/services/lis/staff/geleijnse/milanovision.ppt (accessed 1 March 2009)
Gray, J. (2007), “It’s what users want: eContent aggregation to suit library needs and end-user demands”. Presentation to Tools of Change for Publishing: O’Reilly TOC Conference, June 2007. Personal communication from James Gray to Hazel Woodward
Higher Education Consultancy Group (2006), “A Feasibility Study of the Acquisition of e-Books by HE Libraries and the Role of JISC”. Available at http://www.jisc.ac.uk/media/documents/jisc_collections/ebooks%20final%20report%205%20Oct.doc (accessed 1 March 2009)
John Smith Campus Bookshop (2009), “Featured E-Textbooks”. Available at http://www.jscampus.co.uk/shop/ebooks.asp?mscssid=UNJ7CJ0CUSVL8P7L87ES5M6DBW0FAM5B& (accessed 1 March 2009)
Joint Information Systems Committee (2007), “Metadata for e-Books, Fact File”. Available at http://www.jisccollections.ac.uk/about_collections/publisher_information/coll_jiscfactfile/coll_factcards_ebooks.aspx (accessed 2 March 2009)
Joint Information Systems Committee (2009), “E-Books Working Group”. Available at http://www.jisc-collections.ac.uk/workinggroups/ebooks.aspx (accessed 1 March 2009)
Joint Information Systems Committee (2009), “JISC National E-Book Observatory Project”. Available at http://www.jiscebooksproject.org (accessed 1 March 2009)
Just, P. (2007), “Electronic books in the USA – their numbers and development and a comparison to Germany”, Library Hi Tech, Vol. 25 No. 1 p. 157
Knovel, “Homepage”. Available at http://www.knovel.com/portal/home (accessed 3 March 2009)
Marcus, J. (2009), “Digital pirates cause havoc in academic publishing”, Times Higher Education, 8 January 2009, p. 16
MyiLibrary, “Homepage”. Available at http://www.myilibrary.com/company/home.htm (accessed 3 March 2009)
NetLibrary, “Homepage”. Available at http://www.netlibrary.com (accessed 3 March 2009)
Park, Y. (2007), “A Study of Consortium Models for e-Books in University Libraries in Korea”, Collection Building, Vol. 26 No. 3 p. 77
Project Gutenberg (2009), “Main Page”. Available at http://www.gutenberg.org/wiki/Main_Page (accessed 1 March 2009)
Questia, “Homepage”. Available at http://www.questia.com/Index.jsp (accessed 3 March 2009)
Rightscom Ltd (2006), “Testbed for the Interoperability of eBook Metadata (TIME), Final Report”. Available at http://www.jisc.ac.uk/media/documents/programmes/pals2/time_final_report.pdf (accessed 2 March 2009)
Safari Books Online, “Homepage”. Available at http://my.safaribooksonline.com (accessed 3 March 2009)
Sandler, M., Armstrong, K. and Nardini, B., “Market formation for e-books: diffusion, confusion or delusion”, Journal of Electronic Publishing, Vol. 10 No. 3. Available at http://quod.lib.umich.edu/cgi/t/text/text-idx?c=jep;view=text;rgn=main;idno=3336451.0010.310 (accessed 1 March 2009)
Serials-eNews (2008), “Bloomsbury Academic, a new scholarly imprint”, Serials-eNews, No. 181, 17 October 2008. Available at http://www.ringgold.com/UKSG/si_pd.cfm?AC=0861&Pid=10&Zid=4122&issueno=181 (accessed 1 March 2009)
Springer (2008), “eBooks – The End User Perspective”, White paper. Available at http://www.springer.com/cda/content/document/cda_downloaddocument/eBooks+-+the+End+User+Experience?SGWID=0-0-45-608298-0 (accessed 1 March 2009)
UK Research Reserve (2009), “The Pilot Project – UKRR Phase 1”. Available at http://www.ukrr.ac.uk/about/project.aspx (accessed 1 March 2009)
Vassiliou, M. and Rowley, J. (2008), “Progressing the definition of e-book”, Library Hi Tech, Vol. 26 No. 3 p. 355. Available at http://www.emeraldinsight.com/Insight/viewContentItem.do;jsessionid=D72EDD74111FA37F0C49D11CFE1C42B4?contentType=Article&contentId=1745069 (accessed 1 March 2009)
Wiley-Blackwell (2009), “Wiley Announces Online Books Agreement with YBP Library Services”. Wiley-Blackwell Press Release, 21 January 2009. Available at http://www.seyboldreport.com/wiley-announces-online-books-agreement-with-ybp-library (accessed 1 March 2009)
Wilkie, T. and Harris, S. (2008), “E-books are here to stay”, Research Information, April/May 2008. Available at http://www.researchinformation.info/features/feature.php?feature_id=167 (accessed 1 March 2009)
8 BUSINESS PLANNING FOR E-ARCHIVES
Dirk Kinnaes, Marc Nelissen, Luc Schokkaert and Mel Collier
Introduction
Although it is not uncommon for archives departments to be housed in libraries, archivists are keen to point out that the professional processes associated with archive management are different from those of library management. Archivists have a strict appraisal process for deciding what needs to be preserved and what does not, and when an item is selected for retention a decision is also made about how long it will be retained, whether for a defined period or in perpetuity. The criteria for appraisal are not absolute: in the administrative context documents may be retained because they are statutorily required or because they contain vital information about the organization. When documents are being selected for retention in perpetuity, other factors come into play, such as their relationship to the mission of the organization or their historical importance. In contemporary society, with its excessive production of documents, the method of appraisal and selection is crucial to the efficiency and feasibility of the archive. The archivist could retain or destroy on the basis of document type (for instance, bank statements could be destroyed when they are no longer needed for audit purposes) or on the basis of function (for instance, policy documents could be retained which are essential for the ongoing operation of the organization or for historical purposes). Appraisal and selection should always be carried out with a full understanding of the document production of the creator of the archive as a whole. The archivist must become familiar with the processes of the entity which formed the archive and the context in which it was formed. When items are selected for retention, the archivist organizes them within the retained archive in a way which reflects their position within the whole, as close as possible to the original order.
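Appraisal by document type lends itself to a simple retention schedule. The Python below is a minimal sketch assuming a hypothetical schedule in which policy documents and board minutes are kept in perpetuity and bank statements and financial transaction records for seven years after closure; real periods depend on statute and on the organization’s own policy. Appraisal by function, and the archivist’s understanding of the context in which the archive was formed, cannot be reduced to a lookup table like this.

```python
from datetime import date
from typing import Optional

# Hypothetical retention schedule by document type; None means "retain in perpetuity".
# The periods are illustrative, not statutory requirements.
RETENTION_YEARS: dict[str, Optional[int]] = {
    "policy document": None,
    "board minutes": None,
    "bank statement": 7,
    "financial transaction record": 7,
}


def disposal_date(doc_type: str, closed: date) -> Optional[date]:
    """Earliest date a document may be destroyed, or None if it is kept permanently.

    An unknown type raises KeyError, so unscheduled material is referred to the
    archivist for appraisal instead of being destroyed by default.
    """
    years = RETENTION_YEARS[doc_type]
    if years is None:
        return None
    try:
        return closed.replace(year=closed.year + years)
    except ValueError:  # 29 February falling in a non-leap target year
        return closed.replace(year=closed.year + years, day=28)


print(disposal_date("bank statement", date(2009, 12, 31)))  # 2016-12-31
print(disposal_date("board minutes", date(2009, 12, 31)))   # None
```

Raising an error for unknown types, rather than applying a default period, reflects the principle that nothing should be disposed of without a deliberate appraisal decision.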
This principle is carried through in the approach to metadata creation: the context in which an archive was formed and the relationship of the objects within it to each other must be reflected in the archival descriptions. This is of course rather different from normal library cataloguing. Fundamentally there is no difference of principle in this regard between digital and non-digital archiving; the criteria and methods can be applied to both. In practice, however, the scale of digital object production requires serious additional business planning measures. Paper archives are visible and (perhaps) rather less likely to be disposed of without thought, and they can be stored until the archivist has time to deal with them. If necessary an emergency operation can be launched to rescue them. With digital archives this is not so straightforward. Furthermore, the quantity of digital production makes emergency intervention by the archivist far more problematic in terms of cost and organization. Digital archiving makes business planning even more imperative than analogue archiving does, and requires the active participation of the archive creators. Proactive planning can ensure that a certain level of pre-appraisal and selection takes place, with essential metadata created in a standardized way at the point of creation of the document, which will aid automatic transfer from the administrator or archive creator to the archivist when the time comes.
Business planning for e-archives
Although the internal organization and professional procedures of archives may differ from those of other cultural sectors, the principles of business planning for e-archives have much in common with those of other sectors, because the principles and processes involved in repository management (see the chapter by Swan) are themselves common. A digital repository for an organization can of course serve its various sectors: administration, archive, library, heritage and museum functions, and audio-visual archiving. Much of the literature and advice about e-archives is indeed based on the seminal work of the Trustworthy Repositories Audit and Certification taskforce, which was initiated in 2003 by the Research Libraries Group and the National Archives and Records Administration in the USA and then further developed by contributions from Europe (TRAC 2007). In the Netherlands a testing framework for digital archiving (ED3 2008), developed in reaction to a report concerned with the government “getting dementia” (Een dementerende overheid 2005), is based to a large extent on the TRAC work. There are several sources of advice and support in the whole area of archiving and repositories. For instance, Digital Preservation Europe (http://www.digitalpreservationeurope.eu/) provides a wide range of services and support, including the planning tool PLATTER (2008), which structures its planning advice into:
– Business plan
– Acquisition
– Staffing
– Access
– Technical
– Data
– Succession
– Disaster
– Preservation
In Flanders the excellent Expertisecentrum DAVID (eDAVID, http://www.expertisecentrumdavid.be/) provides a range of advice and services (in Dutch, but readily usable with the very effective Google translation tool), including a document on the organizational aspects of building and managing a digital depot across all the cultural sectors (Schaule 2009). Given the common ground between the sectors, in the following sections we restrict ourselves to those aspects or nuances of business planning which seem peculiar to the archives sector.
Vision and mission
The starting point for the vision and mission of an e-archive is likely to be the needs of the organization to which it belongs. Most institutions, certainly universities, will also wish to provide services orientated towards external users, but the e-archive is likely to be set up primarily to provide its parent organization with a persistent memory. Persistent may mean preservation of, and access to, digital objects which will be kept either in perpetuity or for a defined period. Examples of objects which would probably be kept in perpetuity are constitutional and policy documents, minutes of key decision-making committees and boards, correspondence of key board members and executives, annual and financial reports and so on. Examples of items which might be kept for a given period could be records of financial transactions (kept for a statutory period for audit purposes) and personnel records (kept for a certain period after the person has left or died). This is all for the organization to decide, but in the digital age the decisions are more crucial in view of the volume of material and the potential cost of archiving. The digital environment also raises new questions: what about archiving the rector’s or chief executive’s e-mails, for instance? With paper correspondence he or she may easily keep personal correspondence separate from official correspondence; in the digital environment this would need to be planned in advance with the creator of the files, as sorting it out retrospectively would require human intervention on a mammoth scale. Attitudes to digital archiving may differ between types of parent organization: a company might be inclined to take a strictly practical view of its needs, whereas a university, for instance, might well have a strong sense of its history in the making.
Process and procedures
Among the many documents available on the eDAVID site is a checklist on digital archiving for archivists (Boudrez n.d.).
The checklist includes items under the following headings:
– Development of archiving policy
– Raising awareness of archive creators and IT staff
– Supervision of digital archiving
– Implementation of procedures
– Acquisition into the digital store
– Long-term physical retention of the bits and bytes
– Display and interpretation of the digital documents in the long term
Once material has been selected, the process of inclusion in the e-archive can begin. It goes without saying that the e-archive should follow the principles of the reference model for an open archival information system (OAIS 2002), which is currently under review (http://cwe.ccsds.org/moims/default.aspx). Compliance with OAIS can be checked against the TRAC criteria. The Leuven Integrated Archive System (LIAS) (Kinnaes 2009) defines the following functions, based on OAIS:
– Pre-ingest: rudimentary selection and transformation from digital objects to submission information packages (SIPs), including the creation of digital objects from analogue originals (if necessary), virus and bit control, and the addition of metadata: technical (which may be automatically generated), structural, administrative, rights and descriptive.
– Ingest: capture of the SIPs, quality control and generation of archival information packages (AIPs). This includes decisions about the nature of the objects and their relationship to each other (e.g. a file of correspondence); what metadata are available and whether they are compliant; which collections the objects belong to; which derived versions (manifestations), e.g. preservation and consultation versions, should be created; in what way the objects should be displayed to the user; and whether the objects should be available to external search engines.
– Preservation: choice of strategies and standards for migration and/or emulation. In principle a strategy is chosen for each document type, but there must at least be an archive copy in the original format, together with a consultation copy.
– Storage: induction of AIPs into permanent mass storage, backup and recovery, preparation of AIPs for consultation.
– Data (including metadata) management: adding, changing and deleting information. In LIAS the components are: the library cataloguing system for books, journals and separately catalogued audio-visual documents; the archive management system for archive documents in context; the repository system itself for technical, structural, administrative and rights metadata; and an authority files system.
– Administration: negotiation of the submission agreement, checking compliance of the SIPs, system configuration.
– Access: making the archived objects available to users, including definition and design of the display environment, presentation via specific systems such as the Aleph or ScopeArchiv OPACs (see below), search facilities, registration of users and reading room administration of original documents.
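The “bit control” in pre-ingest is commonly implemented as fixity checking: a checksum is recorded when the package is assembled and recomputed at ingest, so that any change to the bits is detected. The Python below is a minimal sketch of that idea only, assuming SHA-256 as the checksum and a hypothetical, much-simplified package structure carrying three of the five metadata categories listed above. It is not how LIAS or Digitool implement SIPs; virus checking, structural and administrative metadata, and AIP generation are left out.

```python
import hashlib
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SubmissionPackage:
    """Hypothetical, much-simplified SIP: per-file checksums plus grouped metadata."""
    checksums: dict[str, str] = field(default_factory=dict)  # file name -> SHA-256 hex digest
    metadata: dict[str, dict] = field(default_factory=dict)  # category -> values


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large objects need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_sip(files: list[Path], descriptive: dict, rights: dict) -> SubmissionPackage:
    """Pre-ingest: record fixity values and attach a subset of metadata categories."""
    sip = SubmissionPackage()
    for p in files:
        sip.checksums[p.name] = sha256_of(p)
    sip.metadata["technical"] = {p.name: {"size_bytes": p.stat().st_size} for p in files}
    sip.metadata["descriptive"] = descriptive
    sip.metadata["rights"] = rights
    return sip


def failed_fixity(sip: SubmissionPackage, files: list[Path]) -> list[str]:
    """Ingest-time bit control: names of files that no longer match their checksum."""
    return [p.name for p in files if sha256_of(p) != sip.checksums.get(p.name)]


with tempfile.TemporaryDirectory() as tmp:
    doc = Path(tmp) / "minutes.txt"
    doc.write_bytes(b"Minutes of the board meeting")
    sip = build_sip([doc], {"title": "Board minutes"}, {"access": "internal"})
    print(failed_fixity(sip, [doc]))  # []
    doc.write_bytes(b"Minutes, altered after pre-ingest")
    print(failed_fixity(sip, [doc]))  # ['minutes.txt']
```

The same stored checksums can be re-verified periodically during storage, which is one practical way of supporting the “long-term physical retention of the bits and bytes” item in the eDAVID checklist.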
Infrastructure
While documents are still being used for administrative purposes and are in the dynamic phases of use, modification and re-use, it is conventional to speak of holding them in Electronic Document Management Systems (EDMS). When documents pass out of that stage, becoming static and of archival value, they are transferred to Electronic Records Management Systems (ERMS) or archive management systems, where they should no longer be changed or deleted except in strictly documented and specified circumstances. As stated elsewhere in this book, a digital archive can be organized on a large or a small scale and can cost a little or a lot. It can use Open Source software or commercial products. At Leuven, where there had been no previous archive management system in place, the requirement was for an institutional archive management system linked to a repository which would also be available for other applications such as digitized print, manuscripts, art objects and the Academic Bibliography (register and repository of academic output). Policy favoured a commercially supported product over in-house development on the basis of Open Source. Eventually the solution consisted of:
– Aleph (the existing Ex Libris library management system) for separately catalogued items
– ScopeArchiv for archive management
– Digitool (Ex Libris repository product)
– ODIS (Flemish research resource and database, http://www.odis.be/eng/opc/databank.htm) for authority management
– Mass storage with security, backup and recovery (including distributed tandem operation for redundancy) provided by the University ICT services department
The first three are managed at application level by the University Library’s ICT department, LIBIS, but all network and hardware management is outsourced to the University’s central ICT services. Legacy systems on DSpace are also still running. Implementation of the Ex Libris integrated search platform, Primo, is under consideration.
Value proposition, finance and sustainability
In many cases the value proposition of archives is likely to lie in the corporate memory and historical value of the archive to its owners and the public, and in the possibilities of exploiting them, as well as in the more prosaic imperative of preserving mission-critical documents for policy or audit reasons. The introductory chapter of this book and the chapter by Swan, as well as the report by Schaule (Schaule 2009), all provide similar summaries of possible sustainability models. As mentioned above, however, organizational archives mostly exist to serve the organization, and thus it is hard to see how they could be financed other than by the organization itself, or at least by sources very close to it. Other public and government archives are seen as public property, created at public cost through the taxation system (see the chapter by Menne-Haritz), and should therefore continue to be financed through the public purse. It is always possible that a few highly notable archives will attract foundation funding or other sponsorship, but in the main there can be no substitute for the establishment of a secure budget line within the host organization, as recommended for digital libraries in general by Greenstein and Thorin (Greenstein 2002).
This may be no easy matter, however, as the digital archive may well be a partly or wholly new budget line additional to existing analogue archive costs, and may not at first sight be fundable from savings elsewhere, even though it can be argued that the digital archive brings savings in storage and staff time. The costing of e-archives involves similar considerations to that of other digital libraries, and indeed of information systems in general. Special consideration must be given to long-term preservation, however, because, although digital archiving can involve fixed-term retention, it mainly implies preservation in perpetuity. As stated in the chapter by Sierman, the costs of long-term retention are difficult to predict because, although the cost of storage media is likely to continue to fall, the archive may face endless growth, and the nature, scale and difficulty of future preservation activities are by definition unknown. Other costs, however, are reasonably amenable to estimation. At Leuven the following are taken into account:
– Staff costs (including overheads) directly associated with the development, operation, maintenance and management of the digital archive
– The purchase, licensing and maintenance of the archive management system
– The purchase, licensing and maintenance of the repository system
– The purchase and maintenance of hardware directly associated with the archive
– The outsourcing of hardware operations, including an all-in charge for secure mass storage, to the University’s ICT Services department
Costs which are not taken into account include networking (covered centrally), the costs of archive or library personnel associated with development and operations, the costs of the original administrators or creators of the archive who need to prepare it for transfer to the archivist, and the costs of management and senior academic personnel involved in the steering committee. Leuven University has a distributed budget model in which faculties, faculty groups and service departments have considerable budgetary autonomy and are charged for services. The costs mentioned above are brought together, with an element for future system replacement, and then charged to the client groups using the repository on the basis of a one-off entry fee plus an annual banded fee per package of mass storage used.
Staffing
Digital archiving is a relatively new field, even within the young field of digital libraries, and recruiting and retaining experienced, competent staff is a major challenge, as indeed it is for the digital library field as a whole (see the chapter by Popham). This is exacerbated by the fact that archive management systems themselves are not yet as well established as library management systems. The pool of experienced personnel is still small, and because much digital archive work is financed from fixed-term project funds, staff may be hard to retain. On the other hand, there is a vigorous, innovative digital archiving community, and a number of organizations, resources and websites can offer support, training and advice for the development of existing staff.
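The Leuven charging model described above, a one-off entry fee plus an annual banded fee per package of mass storage, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the package size, fee levels and band boundaries are hypothetical, not Leuven’s actual tariff, and ‘banded’ is read here as a per-package rate that falls as a client group uses more packages.

```python
import math

# Hypothetical tariff: every figure here is an illustrative assumption, not Leuven's actual rate.
ENTRY_FEE = 5000.0        # one-off fee when a client group joins the repository
PACKAGE_SIZE_TB = 1.0     # size of one mass-storage "package"
# Bands as (upper limit in packages, annual fee per package); larger users pay a lower rate.
BANDS = [(5, 800.0), (20, 650.0), (math.inf, 500.0)]


def packages_used(storage_tb: float) -> int:
    """Storage is billed in whole packages, so partial packages are rounded up."""
    return math.ceil(storage_tb / PACKAGE_SIZE_TB)


def annual_fee(storage_tb: float) -> float:
    """Banded fee for one year: the band is chosen by the number of packages used."""
    n = packages_used(storage_tb)
    for upper_limit, rate in BANDS:
        if n <= upper_limit:
            return n * rate
    raise ValueError("no band matched")  # unreachable while the last band is unbounded


def total_charge(storage_by_year: list[float], new_client: bool = True) -> float:
    """Entry fee (for a new client group) plus the banded fee for each year of use."""
    entry = ENTRY_FEE if new_client else 0.0
    return entry + sum(annual_fee(tb) for tb in storage_by_year)


# A client group joining with 2.3 TB in year one, growing to 10 TB in year two.
print(total_charge([2.3, 10]))  # 13900.0
```

In practice the fee levels would be calibrated so that total charges across all client groups recover the pooled costs, including the element for future system replacement; that calibration step is not shown.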
Risks
For institutional archives the risks associated with e-archives are likely to be similar to those of information systems in general: system failures or catastrophic events, which can be planned for with normal backup and security measures and disaster recovery plans. As just mentioned, the recruitment and retention of competent staff is always a risk to be reckoned with. For e-archives which have not established a secure institutional budget line and rely on ongoing income generation, continuous fund-raising can be precarious and exhausting. For private or independent archives which have no obvious stakeholders and rely on fund-raising, these risks are accentuated. High-profile projects that depend on substantial public subsidy face considerable political risks, as they are at the mercy of public opinion and political priorities.
Conclusion
Digital archives, perhaps even more than other sorts of digital library, are very much a work in progress, mainly because of the challenges and unknown future characteristics of long-term preservation and the relative newness of the technology and practice. In these circumstances digital archivists need to focus on applying best practice as generally agreed in the professional field. The field is, however, very active and collaborative, and there is no shortage of available support and advice.
Summary
– The principles and processes involved in the acquisition, selection and management of archives are essentially the same for digital and non-digital archives.
– Digital archives, however, bring additional challenges because of the scale of digital production and uncertainty about the technology of long-term preservation.
– Defining the mission of the archive is essential for business planning. Considerable advice and support for business planning is available in the international digital archiving sector.
– The value proposition of institutional or organizational archives may consist of the preservation of the official record, documentary justification of past decisions and activities, and the corporate memory.
– Archiving processes are defined as: pre-ingest, ingest, preservation, storage, data management, administration and access.
– Planning for infrastructure costs and business risk involves similar considerations as for other information systems, with special attention to the costs of preservation and permanent large-scale storage.
References
Boudrez, F. (n.d.) Checklist voor de digitale archivaris. eDAVID. http://www.edavid.be/davidproject/teksten/Richtlijn8.pdf (accessed 31 January 2010)
(Een) Dementerende overheid? Risico’s van digitaal beheer van verantwoordingsinformatie bij de centrale overheid (2005) Rijksarchiefinspectie. http://www.rijksarchiefinspectie.nl/uploads/publications/Een%20dementerende%20overheid%20versie%20DEF%20arial.pdf (accessed 30 January 2010)
ED3: eisen duurzaam digitaal depot (2008) Version 1. Landelijk Overleg Provinciale Archief Inspecteurs. http://www.provincialearchiefinspectie.nl/pdf/ED3_v1.pdf (accessed 30 January 2010)
Greenstein, D. and Thorin, S. (2002) The Digital Library: a biography. 2nd ed. Digital Library Federation, Council on Library and Information Resources. ISBN 1887334955. http://www.diglib.org/pubs/dlf097/dlf097.pdf (accessed 5 June 2010)
Kinnaes, D. and Schokkaert, L. (2009) Concepten LIAS. Internal document.
OAIS (2002) Reference Model for an Open Archival Information System. Consultative Committee for Space Data Systems. http://public.ccsds.org/publications/archive/650x0b1.pdf (accessed 30 January 2010)
PLATTER (2008) DPE repository checklist and guidance. Digital Preservation Europe. http://www.digitalpreservationeurope.eu/platter.pdf (accessed 30 January 2010)
Schaule, S. (2009) Organisatorische aspecten bij het bouwen en het beheren van een digitaal depot. Expertisecentrum DAVID. http://www.edavid.be/digitaaldepotproject/publicaties/SSchaule_organisatAspecten.pdf (accessed 30 January 2010)
Trusted repositories audit and certification: criteria and checklist (TRAC) (2007) Version 1. CRL and OCLC. http://www.crl.edu/sites/default/files/attachments/pages/trac_0.pdf (accessed 30 January 2010)
9 ISSUES IN BUSINESS PLANNING FOR ARCHIVAL COLLECTIONS OF WEB MATERIALS
Paul Koerbin
The state of the art
Introducing the subject
The World Wide Web (the ‘web’) is such a pervasive part of our working and personal lives that it presents libraries with unprecedented challenges in collecting and managing it for preservation. We are more involved with the web every day, in a way in which we have never been with other information materials; it is at once seemingly omnipresent and elusive. Confronted with the idea of web archiving, an obvious first question is ‘how do we do it?’. However, while the ‘how’ is certainly a challenge, it cannot usefully be addressed without first confronting the rather more difficult questions of ‘what to archive’ and ‘why?’.
We all have some concept of the web, but what do we really mean by it? The web is subject to all the conveniences (and excesses) of labels and jargon. Even ostensibly straightforward labels can conceal a deeper complexity when we need to work practically with such entities. What, for example, is an ‘online newspaper’ exactly? Is it the news under the masthead, the blogs, the videos, the portal, or the commercial services and advertisements which may be associated with the site? Perhaps all or some of these things, depending on one’s purpose or perspective. One problem we face, then, is what level of specificity we need in order to talk meaningfully about the web. Such questions are not merely rhetorical if we are to address the primary concern of what we take the web to be for the purpose of web archiving. Not only do we need to understand the nature of the content and the format (or formats), but we also need to deal with the dilemmas posed by the characteristics which make the web what it is: its dynamic quality, which makes the dimension of time a profound challenge for collecting institutions; and the un-mediated and non-discrete way in which it manifests itself through re-use, linking, embedding, feeds, API technologies and so on.
What is web archiving?
‘Web archiving’ is the generally accepted term for the activity of collecting web-delivered materials for a digital repository with the intended purpose of long-term preservation. Other terms such as ‘Internet archiving’ or ‘Internet preservation’ may be understood as synonymous, though all these terms should be understood as narrower than ‘digital preservation’, which may encompass more than born-digital web material. In addition to the imprecise nature of the term ‘web’, ‘archiving’ is also problematic in some respects, particularly for libraries, since the term ‘archive’, which is implied by ‘web archiving’, can suggest a legal purpose associated with national or state archives which a library-based web archive does not necessarily fulfil. Nevertheless the term has the advantage of implying purposeful preservation, which a term such as ‘collection’ may not.
Web archiving is further defined by how the process is undertaken. The content of the archive is obtained through the use of harvest (or crawl) robots. The robot acts much like a web browser, so what is collected is a static rendering of content – a browser view – as delivered by web servers. While content may also be ingested by other means such as deposit, the web archive, certainly for the purpose of this discussion, is to be distinguished from institutional digital repositories, to which curators or creators submit packages of content. Web archiving should also be understood as a workflow process involving selection (or scoping), collection (ingest), metadata creation and administration, data storage, preservation and access (delivery). The Reference Model for an Open Archival Information System (OAIS), published by the Consultative Committee for Space Data Systems in 2002, has been widely adopted as the basis for developing these workflows.
Who is involved?
As Brown (2006: 8) has noted, the history of web archiving is almost as long as that of the web itself. The early pioneers included the national libraries of Sweden and Australia, both of which began web archiving programmes in 1996, and the Internet Archive, also founded in 1996. National and state libraries with deposit mandates continue to be the leading library institutions involved in web archiving.
In recent years there has been growing interest from universities seeking to develop collections of web resources to support research analysis, from research groups such as LiWA (Living Web Archives), and even from some commercial ventures such as HanzoWeb. While interest in research aspects and commercial services has been growing, major libraries remain the principal institutions willing to take on the considerable commitment involved in web archiving. Future developments in software and standards may open up possibilities for smaller libraries to become involved, but much depends on their answer to the ‘why (do it)’ question. For major institutions with a history of deposit and jurisdictional collecting the motivation is perhaps more evident. However, libraries with specialised clientele may also readily identify the need for archival collections of specific web resources.
Approaches to web archiving
The ways in which libraries have undertaken web archiving can be defined most readily by scope. From this perspective three approaches are apparent: selective, domain-based and thematically curated web archiving. In setting up programmes, libraries have tended to opt for one approach, usually as a pragmatic decision in order to get a web archiving programme under way, though more recently a mixture of approaches has emerged as more desirable (and attainable) and has, for example, been implemented as the strategy for the Danish national web archiving project Netarkivet.
Selective web archiving involves targeting high-value resources (as defined by the objectives of the organisation running the programme), which are collected as discrete entities (and consequently extracted from the ontological context of the web). One motivation for libraries taking this approach is that it operates on a scale which makes it possible to obtain permissions before archiving (in jurisdictions where legal deposit provisions do not extend to online digital materials). Another advantage of, or motivation for, the selective approach is the possibility of providing access to the archived content almost as soon as it is archived. Well-established examples of the selective approach include the PANDORA Archive in Australia and the UK Web Archiving Consortium.
Thematically curated web archiving is really another form of selective archiving. This approach tends to focus on events or themed collections which are predefined and finite in scope. Typically curators identify target websites and build up seed lists of URLs to be harvested. As a selective approach, the thematically curated collection also allows permissions to be dealt with up front and ultimately opens up possibilities for open access. A prominent example of this approach is the Library of Congress Web Archives (Minerva), with its collections of, among others, United States election and Congress materials and the ‘September 11, 2001, Web Archive’ collection.
Domain archiving uses the structure and connectivity of the web to allow crawl robots to follow links, much like a search index robot, collecting content within a stated web domain. Typically this would be the top-level country domain, but it may also be a subdomain of the country domain, such as the .gov domain. This approach avoids time-consuming and contentious selection processes, though it still involves a scoping process to define the parameters of the harvest.
It has the capacity to collect high volumes of content, though the exact nature of the content may not be known beyond the URLs reported as harvested.
Collaboration
While there have been a variety of approaches to establishing web archiving programmes, many libraries have worked collaboratively to realise them. The National Library of Australia’s PANDORA Archive engages the curatorial participation of nine Australian state libraries and cultural institutions while maintaining a centralised infrastructure. The UK Web Archiving Consortium has pursued a similar approach. The Library of Congress collaborates with the Internet Archive to harvest the collections it curates, and there were early collaborative efforts among Scandinavian libraries.
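The difference between the approaches described above can be made concrete with a small crawl sketch. Thematically curated archiving starts from a seed list of URLs, while domain archiving lets the robot follow links for as long as they stay within the stated domain. The Python below is a minimal, hypothetical illustration: the `crawl`, `in_scope` and `LinkExtractor` names are invented for this example, pages are supplied by a `fetch` function rather than downloaded, and a production harvester such as Heritrix applies far richer scope, politeness and deduplication rules.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkExtractor(HTMLParser):
    """Collect the href targets of <a> tags, resolved against the page's own URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def in_scope(url, domain_suffix=".au"):
    """Domain scope rule: an http(s) URL whose host falls within the stated domain."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and (parts.hostname or "").endswith(domain_suffix)


def crawl(seeds, fetch, domain_suffix=".au", max_pages=1000):
    """Breadth-first harvest starting from a seed list, following only in-scope links.

    `fetch(url)` returns the page's HTML, or None if it cannot be retrieved.
    """
    queue, seen, harvested = deque(), set(), []
    for url in seeds:
        if in_scope(url, domain_suffix) and url not in seen:
            seen.add(url)
            queue.append(url)
    while queue and len(harvested) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        harvested.append(url)
        extractor = LinkExtractor(url)
        extractor.feed(html)
        for link in extractor.links:
            link = urldefrag(link).url  # '#fragment' variants are the same resource
            if link not in seen and in_scope(link, domain_suffix):
                seen.add(link)
                queue.append(link)
    return harvested


# A tiny hypothetical site standing in for live web servers.
pages = {
    "https://www.example.gov.au/": '<a href="/about">About</a> <a href="https://example.com/">Out of scope</a>',
    "https://www.example.gov.au/about": '<a href="/">Home</a> <a href="https://news.example.net.au/#top">News</a>',
    "https://news.example.net.au/": "<p>No links here</p>",
}
print(crawl(["https://www.example.gov.au/"], pages.get))
# ['https://www.example.gov.au/', 'https://www.example.gov.au/about', 'https://news.example.net.au/']
```

Passing a narrower `domain_suffix` such as `.gov.au` restricts the same crawl to a government subdomain, which is the kind of scoping decision the text describes for domain harvests.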
Example: the National Library of Australia’s PANDORA Archive
The National Library of Australia (NLA) established the PANDORA Archive as a practical ‘proof-of-concept’ selective web archiving project in 1996. By 1998 it had become part of the operational business of the Library. A selective web archiving workflow system called PANDAS was developed in-house and implemented in June 2001, and a third, much enhanced version was deployed in July 2007. From 1998 curatorial participants joined the NLA in undertaking web archiving work, although all the infrastructure and archival data continue to be managed at the NLA. By 2009 there were ten PANDORA participants, including state libraries and major national heritage collecting institutions. As at July 2009 the PANDORA Archive held around 50,000 archived instances (ranging in scope from simple PDF documents to large complete websites), amounting to 70 million files and more than three terabytes of data. It is searchable and openly accessible.
While the PANDORA Archive is a relatively small-scale, quality-assessed, open access archive, the NLA, as Australia’s principal documentary heritage collecting institution, has recognised the need to supplement this highly curated archiving with larger-scale robot harvesting of web content. Consequently, since 2005 annual large-scale harvests of the Australian web domain (.au) have been undertaken in collaboration with the San Francisco-based Internet Archive. The four domain crawls conducted between 2005 and 2009 constitute a collection of more than two billion files and around 80 terabytes of data. The domain harvest collections are held at the NLA but are not publicly accessible at present.
At the NLA the cost of the selective web archiving programme and infrastructure forms part of the operational costs of its collections management and information technology branches and is not budgeted separately. A study undertaken by the NLA in 2005 revealed that the cost of acquiring web materials was very high compared with print publications: four times the cost of acquiring a legal deposit monograph and more than 15 times the cost of acquiring a legal deposit serial issue (Phillips, 2005). This did not include the costs associated with preservation. The cost of acquiring each domain harvest collection (ranging in size from 500 million to one billion files), including collection, indexing, hardware and supply costs, was of the order of US$250,000 to US$350,000.
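As a rough reading of the domain harvest figures above, dividing cost by collection size gives an indicative per-file acquisition cost. The text does not say which cost goes with which collection size, so pairing the extremes yields only lower and upper bounds, and these cover only the cost elements listed (collection, indexing, hardware and supply):

```latex
\frac{\text{US}\$250{,}000}{1\,000\,000\,000\ \text{files}} = \text{US}\$0.00025\ \text{per file}
\qquad\text{to}\qquad
\frac{\text{US}\$350{,}000}{500\,000\,000\ \text{files}} = \text{US}\$0.0007\ \text{per file}
```

That is, roughly US$0.25 to US$0.70 per thousand files harvested.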
Special issues
When planning for the business of archiving web materials it is important to understand the particular issues and considerations raised by working with these materials. These issues can be broadly categorised as:
– Legal issues;
– Technical issues; and
– Ethical issues.
Legal issues
The extent to which copyright and other intellectual property rights will affect a web archiving programme depends on its purpose and outcomes, particularly if the objective includes open access to the archived materials. Web archiving in the library context has typically been carried out by organisations with national or state collecting responsibilities. While such organisations have worked with legislation mandating the deposit of print materials, such legislation may not extend to digital materials. In jurisdictions where legal deposit does extend to digital (and online) materials, the conditions on access are often limiting, such as restricting it to single-user access (by researchers) within the library building. So the copyright law of a given jurisdiction may limit the ability to collect content on a large scale and/or may allow only very limited access to the archived materials.
Copyright law may well be an important consideration in determining the approach taken to web archiving. For example, when the National Library of Australia established its web archiving programme (the PANDORA Archive) in 1996, it had a strong objective to provide open networked access to the archived content. For this reason (though it was certainly not the only reason) a selective approach was adopted as the initial strategy, since the work could be undertaken on a scale which allowed the Library to secure copyright licences from web publishers granting permission to archive and provide access to the content.
While copyright law presents a significant legal issue in planning for web archiving, even where legislation is favourable it does not necessarily provide for open and assured long-term access to the archive. The un-mediated nature of web ‘publishing’ means that an archive is likely to collect content which may be considered defamatory or a breach of privacy. Such issues have the potential to cause considerable problems for the management of the archived content. As it is very difficult to identify all such problems before archiving, planning for the archive must include strategies and efficient technical mechanisms (such as the ability to restrict access to specific content) for dealing with challenges to the retention of, and access to, disputed content.
Technical issues
Web materials, like all digital materials, pose particular problems for collecting institutions because they depend entirely on technology for their collection, management, storage, preservation and access. Moreover, web materials present even greater challenges than other digital materials.
Unlike digitised materials, which the managing institution may create itself in well-understood formats and with a degree of homogeneity, or digital repositories, which can mandate deposit in specific formats, the collection of web materials means dealing with complex objects in multifarious formats, often used inconsistently with respect to standards. Put simply, the collecting agency has no control over the nature of the target objects and may not have detailed knowledge of the content and formats collected.
Complex digital objects are a problem for acquisition because the methods of collecting them, normally a crawl robot, have limitations and constantly confront increasingly complex delivery by web servers and databases. While the harvesting process renders dynamic formats as a more static representation (essentially what is delivered through a web browser view, without most interactive elements), this ‘snapshot’ view will lose some of the functionality (and perhaps also content) of the original website. Consequently, in planning the collection of web materials, consideration needs to be given to the extent to which an accurate and complete representation of the original is the objective of the archive. Achieving complete and faithful renderings requires not only quality assurance processes, potentially an expensive part of the process, but also preservation strategies for the ongoing retention and rendering of the artefact (not just the intellectual content).
Technical issues in dealing with the complexity of web materials in an archival context are difficult enough, but they also need to be considered in the context of an ostensibly limitless publication and communication medium in which ever more content is available. It is also a dizzyingly dynamic medium. Consider how, in the first decade of the 21st century, the web developed from a medium functioning largely for the dissemination of content, albeit a democratic one, into a medium characterised by social interaction, broad and narrow communication, and the multiple use and re-use of content. The relatively simple notion of the web as a cultural artefact, of which Lyman and Kahle (1998) spoke in the pioneering days of web archiving, still has relevance, but it is subsumed within a vastly more sophisticated, dynamic and less easily characterised ontology. The implication for business planning is that solutions, strategies and implementations can never be considered complete. Web archiving programmes intended to be ongoing must themselves be dynamic, strategic and responsive to the dynamic nature of the web, both in the collection (selection and acquisition) of materials and in the management processes for preservation and future access.
Ethical issues
When we understand the web as more than simply a publication medium, it is evident that there are also ethical issues to address in considering what to archive and why. Practitioners are beginning to suggest the need to re-examine some of the assumptions made in undertaking web archiving programmes.
Rauber, Kaiser and Wachter (2008) identify three core assumptions which may need to be contested:
– That the web is essentially a publication medium;
– That the ephemerality of the web is a deficiency and the result of a lack of management; and
– That the web is merely a collection of freely available materials.
The point being made is that much material is put on the web without any consideration, intention or understanding that it is being formally ‘published’. Indeed a lot of material may be there with the very intention of being ephemeral. This does not preclude the archival objective of preserving such material if it is considered to have longer-term interest than the ephemeral purpose intended by its creator; it may, however, have implications for whether and how such material is collected and maintained. An archive by its nature intends to preserve material into the future. This need not be in perpetuity, nor immediately accessible, but it does imply some existence in captured form beyond the expected or intended life of the original. In this context the objectives of the archive and the intentions of the creators of the archived materials may be at odds. Nor is this merely an intellectual dilemma: there are practical issues to consider, including both the investment of resources to archive and preserve tenuous or potentially contentious content and the responsibility (or liability) for retaining and providing access to such material.
Infrastructure and management tools
The preservation of web materials requires a technical infrastructure to manage the collection and storage of digital files, workflows and access mechanisms. Such infrastructure, even for archives of modest ambitions, is a considerable investment. Moreover, it requires management over time to deal with the preservation issues associated with digital content, and so must also be understood as a commitment to ongoing investment in technical infrastructure development.
Because web archiving is a relatively new endeavour, the solutions of the pioneering institutions have understandably been geared primarily towards their own business needs. As a result, organisations undertaking web archiving have had to do much development work themselves, and it has taken some time for more accessible open-source software for managing web archiving to become available. The International Internet Preservation Consortium (IIPC) was established in 2003 as a response from the pioneering web archiving organisations, so that they could work more closely together to develop common standards, tools and techniques which would encourage commitment to web archiving programmes. In practical terms this has promoted the development of a number of software tools, including the Internet Archive’s Heritrix as the de facto standard harvesting robot; WARC files as a standard archive file format; and the Web Curator Tool, a workflow management tool whose development was led by the National Library of New Zealand and the British Library.
Given the significant infrastructure required to maintain a web archive, the option of sourcing all or some of it outside the responsible organisation may be worth considering. Storage is perhaps the most obvious component to outsource, though this may complicate access-related processes.
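To give a concrete sense of what the WARC format mentioned above stores, the sketch below serialises a single captured HTTP response as an uncompressed WARC/1.0 ‘response’ record: a version line, named header fields, a blank line, the captured bytes and two closing CRLFs. It is a minimal illustration, not a conformant writer: the function name is hypothetical, it emits only the WARC-Type, WARC-Record-ID, WARC-Date, WARC-Target-URI, Content-Type and Content-Length fields, and it omits the payload digests and per-record gzip compression that harvesters such as Heritrix typically add.

```python
import uuid
from datetime import datetime, timezone


def warc_response_record(target_uri: str, http_response: bytes, captured_at: datetime) -> bytes:
    """Serialise one captured HTTP response as an uncompressed WARC/1.0 'response' record.

    `captured_at` should be timezone-aware; it is converted to UTC for WARC-Date.
    """
    fields = [
        ("WARC-Type", "response"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),  # globally unique record identifier
        ("WARC-Date", captured_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", "application/http;msgtype=response"),  # block is a raw HTTP response
        ("Content-Length", str(len(http_response))),  # length of the block in bytes
    ]
    header = "WARC/1.0\r\n" + "".join(f"{name}: {value}\r\n" for name, value in fields) + "\r\n"
    # The content block is followed by two CRLFs that separate this record from the next.
    return header.encode("utf-8") + http_response + b"\r\n\r\n"


# Hypothetical capture, for illustration only.
payload = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<p>Hello</p>"
record = warc_response_record(
    "http://www.example.gov.au/", payload, datetime(2010, 7, 13, 11, 51, tzinfo=timezone.utc)
)
print(record.split(b"\r\n")[0].decode())  # WARC/1.0
```

Records like this are simply concatenated into a WARC file, which is why the format suits the append-only way a crawl robot accumulates content.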
The Internet Archive has established a web archiving subscription service called Archive-It, in which the infrastructure, harvesting expertise and technologies, and storage are provided by the Internet Archive. This is a simple model for organisations beginning to curate the archiving of web content, but it leaves unresolved the issues of long-term custodianship and preservation of the collected content (though the archived content will end up in the general Internet Archive Wayback collection).
Cost benefits
Web archives deal for the most part with born-digital materials, so it is not a matter of choosing to collect web materials in place of other formats. Some material, particularly government publications, may indeed also appear in print, but the inexorable shift to quick, cheap dissemination of documents on the web suggests that the decision to be made is not between print and digital (online) but whether or not to collect online materials. The decision is thus really centred on the library’s business objectives and the cost to those objectives of not collecting. Considered in these terms, it is the degree of commitment which needs to be determined in order to realise a cost benefit to the institution.
While the cost of establishing and maintaining infrastructure may be significant, certain processes can be made efficient through automation. For example, the actual acquisition and storage of web materials can be handled efficiently: large-scale acquisition can be achieved through web crawling, which can also be scaled readily to suit available resources. In addition there are significant opportunities for automating networked access to the content and delivering unique and valuable archival collections to target users.
There is no doubt that web archiving requires a significant commitment of infrastructure, information technology and curatorial expertise and labour. Moreover, such commitments are ongoing, since the rapid and constant change characteristic of the web demands a strategic approach to ensure the continuity of the programme and the reality of preservation and long-term access, the fundamental values of an archive. However, the benefits may also be significant: in the expertise gained; in the reputation of the library and its perceived ability to meet the challenge of collecting documentary materials in the 21st century; and in the contribution to the research and intellectual pursuits of the society it serves and supports.
Business planning elements
Significant business planning elements to be considered in establishing and maintaining a library web archiving programme include:
– The library’s strategic objectives;
– Target users and access intentions;
– Technologies and infrastructure;
– Staffing resources and expertise;
– Integration with other services;
– Risk factors; and
– Sustainability and planning.
Strategic objectives
As mentioned at the beginning of this discussion, the ‘why’ and the ‘what’ of web archiving are the significant questions which need to be addressed at the outset, and it is the library’s strategic objectives which ought to form the context in which to address them.
Aligning the web archiving programme’s operation to the library’s strategic objectives will help to navigate through the complex issues of the scoping, scale and approach of the programme; and provide a rationale for the cost of establishing and maintaining it. It also provides the best opportunity to achieve the final critical element of sustainability. Target users and access intentions Consideration of the users for whom the web archive is intended will of course be an important element in the scoping of the programme. In the case of national or state libraries this may however be a very broad and diverse user profile (perhaps defined only by nationality or jurisdiction). A focus on the target user group – or groups – is important however in order to clarify the library’s intentions and objectives with respect 108
to access to the web archive in terms of timeliness, extent and authorisation. As previously noted, access can be one of the most complex, intractable and frustrating aspects of web archiving, so clarity of objectives in this regard is important.
Technologies and infrastructure
A realistic understanding of the level of commitment needed to build and maintain the infrastructure and implement the necessary technologies is critical, given the resource implications of the activity. Decisions regarding the extent of infrastructure maintained (or developed) in-house and what may be sourced externally will follow from the objectives of the archiving programme (for example, whether the collection is intended as an ‘in perpetuity’ archive or for a limited period of use) and from the resources that can be committed initially and into the future.
Staffing resources and expertise
The nature of the implementation will influence both the level of staffing and the skills required. For example, the level and mix of information technology expertise, and of curatorial and programme management expertise, will depend upon the approach (or approaches) taken and the infrastructure adopted.
Integration with other services
The planning process should also consider opportunities for integration with other services, such as other digital repositories required or implemented by the library, or indeed for collaboration with other organisations. Within the library, a single business approach to the management of digital materials may not only provide the most cost-effective management but also give the programme the flexibility it needs for ongoing operation and viability, and make it easier to implement future enhancements and expansion.
Risk factors
The analysis of risk needs to be broad. It will cover not only the risk associated with the resource investment, but also the risk to the library of not implementing a web archiving programme or, indeed, of implementing a programme which does not serve the objectives of the organisation and the needs of the target users. Moreover, the (relative) intangibility of the digital format introduces the significant risks of obsolescence and loss of future accessibility and usability, whether the implementation is in-house or an outsourced service, in addition to standard risk factors relating to damage to the assets.
Sustainability and planning
One of the major elements to consider in the business planning process is the sustainability of web archiving and of the web archiving programme. For a number of the reasons already raised, much in the nature and character of web material, and in collecting and archiving it, is dynamic, vast in scope and a challenge to long-term
preservation and access. The viability of embarking upon web archiving depends upon planning for the sustainability of, and commitment to, the activity.
Prospects for the future
In the first decade and a half of web archiving activity, it has been the major libraries with national or state collecting responsibilities which have had the evident business imperative to establish and run web archiving programmes. However, the expansion of the web as a medium for open publication, communication, and the dissemination and use of information moves this imperative beyond the resources and business objectives of even those major libraries with broad collecting responsibilities. Increasingly, web archiving requires the engagement of more organisations with both broad and specialised objectives. The development and sharing of relevant technologies is already under way and makes broader-based involvement in web archiving programmes possible. Increasingly, the need to access historic web materials will be demonstrated by extant archives and by the needs of researchers and users. It is perhaps this which will provide the impetus to develop more specialised, targeted and effective archival collections of web materials.
Summary
– Web archiving is distinct from other preservation activities, as content is harvested by robots rather than submitted by curators. Defining which content to collect from sites can be complex
– Hitherto it has mostly been carried out by major libraries with legal deposit responsibilities
– There are three types of scoping: selective, domain-based and thematic
– Copyright, intellectual rights and the need to avoid defamatory material or breaches of privacy have an impact on selection
– Technical issues include dependence on automated processes, lack of control over content and the complexity of digital objects, all in the context of limitless scale
– Ethical issues include the preservation of material which was intended to be ephemeral and not preserved, and the preservation of tenuous or contentious material
– The development of infrastructure and management tools is aided by collaborative efforts
– The cost-benefit balance depends on the mission of the collecting organization and the effect on that mission of not archiving. Much of the process can be done efficiently through automation
– Business planning elements are: conformity with organizational strategic objectives, users’ needs, understanding of the necessary infrastructure, staff expertise, integration with other services and sustainability
– The immensity of the Web means that archiving can only be done collaboratively and will need specialised selection of materials which are relevant for future historiographical or research purposes
References
Access in the future tense. (2004). Washington D.C.: Council on Library and Information Resources.
Borghoff, U.M., Rödig, P., Scheffczyk, J. & Schmitz, L. (2003). Long-term preservation of digital documents: principles and practices. Heidelberg: Springer.
Brown, A. (2006). Archiving websites: a practical guide for information management professionals. London: Facet Publishing.
Clausen, L.R. (2006). Overview of the Netarkivet web archiving system. Available at: http://netarchive.dk/publikationer/iwaw06-clausen.pdf (viewed 5 June 2010)
Consultative Committee for Space Data Systems. (2002). Reference model for an open archival information system (OAIS). Available at: http://public.ccsds.org/publications/archive/650x0b1.pdf (viewed 5 June 2010)
International Internet Preservation Consortium. (2009). Available at: http://www.netpreserve.org/ (viewed 5 June 2010)
International study on the impact of copyright law on digital preservation. (2008). Available at: http://eprints.qut.edu.au/14035/1/14035.pdf (viewed 5 June 2010)
Koerbin, P. (2004). The PANDORA Digital Archiving System (PANDAS): managing web archiving in Australia: a case study. In 4th International Web Archiving Workshop. Bath, United Kingdom, 16 September 2004. Available at: http://iwaw.europarchive.org/04/Koerbin.pdf (viewed 5 June 2010)
Koerbin, P. (2008). The Australian web domain harvests: a preliminary quantitative analysis of the archive data. Available at: http://pandora.nla.gov.au/documents/auscrawls.pdf (viewed 5 June 2010)
Lasfargues, F., Oury, C. & Wendland, B. (2008). Legal deposit of the French Web: harvesting strategies for a national domain. In 8th International Web Archiving Workshop. Aarhus, Denmark, 18-19 September 2008. Available at: http://iwaw.net/08/IWAW2008-Lasfargues.pdf (viewed 5 June 2010)
Lyman, P. & Kahle, B. (1998). Archiving digital cultural artifacts. D-Lib Magazine, 4(7). Available at: http://www.dlib.org/dlib/july98/07lyman.html (viewed 5 June 2010)
Masanès, J. (2006). Web archiving. Berlin; Heidelberg: Springer.
PADI – Preserving Access to Digital Information. Web archiving. (2009). Available at: http://www.nla.gov.au/padi/topics/92.html (viewed 5 June 2010)
PANDORA Australia’s Web Archive. (2009). Available at: http://pandora.nla.gov.au/ (viewed 5 June 2010)
Phillips, M.E. (2005). Selective archiving of web resources: a study of acquisition costs at the National Library of Australia. RLG DigiNews, 9(3). Available at: http://worldcat.org/arcviewer/1/OCC/2007/07/10/0000068921/viewer/file1.html (viewed 5 June 2010)
Preservation of web resources handbook. (2008). Bristol, U.K.: JISC. Available at: http://jiscpowr.jiscinvolve.org/handbook/ (viewed 5 June 2010)
Rauber, A., Kaiser, M. & Wachter, B. (2008). Ethical issues in web archive creation and usage – towards a research agenda. In 8th International Web Archiving Workshop. Aarhus, Denmark, 18-19 September 2008. Available at: http://iwaw.net/08/IWAW2008-Rauber.pdf (viewed 5 June 2010)
10
ORGANIZING DIGITAL PRESERVATION
Barbara Sierman
How to preserve?
The National Library of the Netherlands (KB) holds a small manuscript with the intriguing title ‘How to preserve books for eternity’, written centuries ago, in 1527 to be precise (Porck 2007). The booklet is bound as an introduction to a larger manuscript. Books were of course valuable treasures in those days, and monasteries appointed a person with the special task of taking care of them. The requirements for their care are described in eight rules. As the manuscript has survived through the centuries, it seems that these eight rules worked well, at least in this case!
How we would like to have such a well-defined and clear overview of a limited set of rules for preserving digital material for the long term! But, alas, the digital world is complex and constantly changing. Research into long-term digital preservation and its consequences started only 20 years ago. Although there are now many initiatives and promising developments worldwide, much needs to be done before we will be able to meet all the challenges of long-term digital preservation. The Digital Library concept requires us to protect the digital material that enables us to realize our core business: giving access to our collection.
What is digital preservation?
There are many definitions of digital preservation (or digital curation). One that is often cited comes from the ‘Handbook of digital preservation’ (Handbook 2008): ‘a series of managed activities necessary to ensure continued access to digital materials for as long as necessary’. Digital preservation is not only about storing digital objects for the long term, but also about keeping this information accessible for future users. The phrase ‘a series of managed activities’ implies that there is an organization in place which is capable of doing this.
Digital preservation often starts with a pilot project, carried out by a few specialists in the library, to explore the playing field and gain experience. But after the project(s), digital preservation requires a permanent organization and processes, with a clear mission, policies, staff, a business plan and an IT infrastructure. Libraries are very experienced in these areas in the analogue world, and this experience will help staff to understand the changes needed to adapt the existing business to the digital world. Collection management, preservation, access methods, metadata and so on are all as valid in the digital world as in the analogue, but practices, terminology and certainly scope of operation may differ greatly. Since 2003 the Open Archival Information System (OAIS) standard (CCSDS 2002) has been the leading model for digital preservation. To have a basic understanding of
this model is a prerequisite for librarians involved in digital preservation. This conceptual model describes the functional entities related to long-term preservation and explains the tasks and responsibilities of a long-term archive. The word ‘archive’ is used here in a broad sense, as an archive “consisting of an organization of people and systems, that has accepted the responsibility to preserve information and make it available for a designated community”. The standard is applicable in a variety of organizations, from archives to institutional repositories and from space agencies to libraries. Many initiatives have been undertaken to translate this model into practical implementations of digital archives, as each organization needs to adapt this generic guidance to its own environment.
Basic processes
The long-term preservation of digital material involves a series of steps:
– Acquisition of digital material, by creation or by purchase
– Pre-ingest: checking the received material (for viruses, completeness, etc.) and preparing it for ingest by converting it into SIPs (Submission Information Packages)
– Ingest: importing the SIPs, controlling quality and generating AIPs (Archival Information Packages)
– Data management: applying the appropriate preservation strategy and metadata
– Archival storage: importing the AIPs into permanent storage
– Access: giving users access to the digital material
– Monitoring the digital material and its environment and planning necessary preservation actions, such as refreshment and migration
Organizational aspects
Long-term preservation of digital material involves different time frames; some speak of five to ten years, others of fifty years and more. Libraries have traditionally thought in terms of centuries and may see no reason to depart from that ideal.
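The pre-ingest and ingest steps listed above can be sketched in code to show how a SIP might be checked and turned into an AIP. This is a minimal illustration under stated assumptions, not an OAIS implementation: the `SIP` and `AIP` classes, the depositor's manifest of SHA-256 checksums and the `preservation_level` field are all hypothetical, and virus scanning, metadata enrichment and archival storage are left out.

```python
import hashlib
from dataclasses import dataclass, field


def sha256(data: bytes) -> str:
    """Return the SHA-256 checksum of some content as a hex string."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class SIP:
    """Hypothetical Submission Information Package: files plus the depositor's manifest."""
    identifier: str
    files: dict[str, bytes]   # file name -> content
    manifest: dict[str, str]  # file name -> expected SHA-256 checksum


@dataclass
class AIP:
    """Hypothetical Archival Information Package: content plus preservation metadata."""
    identifier: str
    files: dict[str, bytes]
    metadata: dict[str, object] = field(default_factory=dict)


def pre_ingest_check(sip: SIP) -> list[str]:
    """Check completeness and integrity; an empty list means the SIP may be ingested."""
    problems = []
    for name in sorted(sip.manifest.keys() - sip.files.keys()):
        problems.append(f"missing file: {name}")
    for name in sorted(sip.files.keys() - sip.manifest.keys()):
        problems.append(f"file not in manifest: {name}")
    for name in sorted(sip.manifest.keys() & sip.files.keys()):
        if sha256(sip.files[name]) != sip.manifest[name]:
            problems.append(f"checksum mismatch: {name}")
    return problems


def ingest(sip: SIP, preservation_level: str) -> AIP:
    """Reject a faulty SIP; otherwise wrap it in an AIP with fixity and level metadata."""
    problems = pre_ingest_check(sip)
    if problems:
        raise ValueError("; ".join(problems))
    return AIP(
        identifier=sip.identifier,
        files=dict(sip.files),
        metadata={
            "fixity": {name: sha256(data) for name, data in sip.files.items()},
            "preservation_level": preservation_level,
        },
    )


# Usage: a one-file deposit whose manifest matches its content.
page = b"<html><body>Election 2010</body></html>"
aip = ingest(SIP("web-0001", {"index.html": page}, {"index.html": sha256(page)}), "bit")
print(aip.metadata["preservation_level"])  # bit
```

Rejecting a faulty package before it reaches storage, rather than repairing it afterwards, is the design choice here; the checksums recorded at ingest can later be recomputed to confirm that the stored files have not changed.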
Throughout the chosen time frame, continuous curation of the digital material is necessary, as the technology on which access depends will keep changing. Unlike in the analogue world, ‘benign neglect’ is not possible because of this constant change. Preservation time frames require a vision accompanied by robust planning of resources to realize this vision in a sustainable way. The process of long-term preservation starts at the front door of the library, as it were, when the material is created (by digitization) or accepted from an external source, whether born digital or digitized, and ends at the back door if the material is finally disposed of. In the meantime the digital material will be handled by various departments in the library, which enhance the digital objects so that they are properly equipped for their time travel. Identifying the departments which will contribute to long-term preservation helps in defining and streamlining preservation-related activities in the library and in creating new tasks and job descriptions (Diessen 2008). It offers an opportunity to share resources between departments and to define responsibilities clearly.
A department or group responsible for defining the collection policy of the library will identify which paper material to digitize and keep as a preservation copy, which digital collections to acquire, and which digital material will be a candidate for preservation and for how long. The finance department may advise on and manage the funding model, which should be in harmony with the level of preservation the library aspires to maintain. People involved in acquisition and in making agreements with third parties about the digital material the library receives should be aware of the required quality factors, so that they can judge whether the material is ready to be preserved or needs extra (possibly costly) actions. Public services departments, defining the access policies for the digital material, will focus on the (future) users of this material and how they will use it. They may also control and ensure compliance with authors’ rights and copyright requirements. Thus many, if not all, departments play their part: finance (the budget), acquisitions (which material), public services (how to present it), IT (supporting the realization) and cataloguing (metadata) all contribute to the digital archiving process. Digital preservation is not an isolated activity or, as Steve Knight from the National Library of New Zealand put it, “Digital Preservation requires interaction with all the organizations processes and procedures” (Knight 2008).
Preservation policies
It is important for libraries to create a set of policies reflecting their vision of why they preserve what they do and how this activity contributes to their core business. These policies can serve as guiding principles for all the departments involved in digital preservation, and support them in taking decisions with regard to the digital material.
Of course these preservation policies need to be in line with the other policies of the library, and indeed with the strategies and policies of the parent organization. The main topics in the preservation policy will be the mission and goals of preserving digital material, the relationship with other activities in the library, a description of the digital collections the library will maintain, and how responsibilities are allocated (Beagrie 2008). The library can then start thinking about how to implement the policies, and more practical questions need an answer. Which file formats will we support? What about the original look and feel? How do we treat file formats outside the range of accepted formats? Which preservation strategies do we prefer: migration, emulation, normalization? Which IT environment will we use to store the objects? How are back-ups organized? A survey in the Planets project showed, however, that in practice few organizations have formulated, or at least published, their long-term preservation policies. This suggests that it may not be an easy task, even though it is an important one. The survey also revealed another interesting finding: organizations which have a preservation policy have often succeeded in securing a budget (Planets 2009).
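Some of the practical questions above, such as which file formats are supported and how formats outside the accepted range are treated, could be recorded as a machine-readable policy that every department applies in the same way. The sketch below assumes a hypothetical policy table keyed by MIME type and borrows the ‘bit’ and ‘full’ preservation levels described later in this chapter; the formats and the levels assigned to them are illustrative only, not a recommendation.

```python
from enum import Enum


class Level(Enum):
    FULL = "full preservation"        # content, context and original look and feel
    BIT = "bit preservation"          # integrity of the files only
    REVIEW = "needs policy decision"  # format outside the accepted range


# Hypothetical policy table: MIME type -> preservation level the library commits to.
FORMAT_POLICY: dict[str, Level] = {
    "application/pdf": Level.FULL,
    "image/tiff": Level.FULL,
    "text/xml": Level.FULL,
    "application/zip": Level.BIT,
}


def level_for(mime_type: str) -> Level:
    """Return the committed level; formats not named in the policy are sent for review."""
    return FORMAT_POLICY.get(mime_type.strip().lower(), Level.REVIEW)


for mime in ["image/tiff", "application/zip", "application/x-shockwave-flash"]:
    print(mime, "->", level_for(mime).value)
```

Defaulting unknown formats to review, rather than silently accepting them, is one possible answer to the question of formats outside the accepted range; which answer is right remains a policy decision for each library.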
Staff requirements
Once the policies are formulated, they need to be adopted in the library and translated into working processes and procedures. A digital library needs staff who are competent to take care of these tasks, bearing in mind that technical skills are becoming more and more important and need constant upgrading. Digital preservation is a fairly young, international research area. Although an out-of-the-box solution is not yet available, progress is being made quickly and there is an overwhelming amount of literature about new insights and ideas, products and services. It is important that a library keeps itself informed of these developments, not only in relation to its day-to-day activities, but also as a means of updating its own vision and policies and of being prepared when new digital materials are offered to the library. This monitoring is part of the so-called ‘preservation watch’ activity, which deals with monitoring and identifying risks that may affect the environment of the digital objects in the collection. These risks or changes may arise in various areas: not just obsolete file formats, hardware and software that have reached the end of their lives, and other technical developments wholly outside the control of the library, but also changes in policies, organizational structure, staffing and budgets which could affect the digital collection. The preservation watch activity will identify these risks and alert management so that appropriate decisions can be taken. This will lead to new or updated preservation plans and preservation actions to deal with the risks. These processes are the main subject of the European Planets (Preservation and Long-term Access through Networked Services) project. One of the helpful services resulting from this project is the decision support tool PLATO, which will help library staff in this process.
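The technical side of preservation watch can be illustrated by comparing the formats held in the collection with a register of known risks and reporting the result to management. This is a minimal sketch under stated assumptions: the collection inventory and the risk register are hypothetical in-memory dictionaries, whereas in practice such information might come from the repository's own metadata and from format registries; the organizational risks mentioned above (policies, staffing, budgets) are not covered.

```python
from collections import Counter


def preservation_watch(inventory: dict[str, str], risk_register: dict[str, str]) -> list[str]:
    """Return one alert per at-risk format held in the collection.

    inventory:     object identifier -> format name (hypothetical)
    risk_register: format name -> description of the known risk (hypothetical)
    """
    holdings = Counter(inventory.values())  # number of objects per format
    alerts = []
    for fmt, count in sorted(holdings.items()):
        if fmt in risk_register:
            alerts.append(f"{count} object(s) in {fmt}: {risk_register[fmt]}")
    return alerts


inventory = {
    "obj-1": "WordPerfect 5.1",
    "obj-2": "PDF/A-1",
    "obj-3": "WordPerfect 5.1",
}
risks = {"WordPerfect 5.1": "no rendering software supported in-house"}
for alert in preservation_watch(inventory, risks):
    print(alert)  # 2 object(s) in WordPerfect 5.1: no rendering software supported in-house
```

Each alert would then be the starting point for preservation planning, where a decision support tool can help weigh options such as migration.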
PLATO is complemented by an integrated testbed, in which small-scale tests can be performed and results can be compared with the experiments of other organizations. A fully integrated workflow description is to be published by the end of the project in 2010 (more information can be found on the website of the Planets project, http://www.planets-project.eu/, and on PLATO at http://www.ifs.tuwien.ac.at/dp/plato/intro.html).
What kind of personnel, then, does a library need to hire for its preservation activities? Organizations are still struggling to create the right job descriptions and find appropriate personnel. A survey at the School of Information and Library Science of the University of North Carolina (Tibbo 2008), held to develop a graduate-level curriculum for digital preservation, showed that technical competences, such as knowledge of ICT and practical skills, were the most valued attributes of candidates for a digital preservation position. One of the lessons learnt at the Library of Congress was that “digital curators are an excellent investment”, with knowledge of the library organization and environment, knowledge of the data in the collections and a basic technical background as the key factors for success (Madden 2007). Digital preservation staff often receive training on the job, as universities and information science and library schools do not yet offer appropriate curricula. But this may change as national organizations such as the German nestor develop curricula in cooperation with universities (Neuroth 2009), and this initiative is expected to be followed by others. Apart from that, the training events of European projects or
events organized by national preservation organizations should help libraries to educate their employees.
Selection of digital material for the long term
In the coming years the digital library will take further shape and digital collections will grow very rapidly, whether for preservation reasons or from the ambition to make material accessible to a larger public. Although tons of paper are still delivered to libraries, more and more is published in digital form only or as a supplement to analogue material. Many libraries are ‘harvesting the web’ (taking digital material from the Internet), whether as a snapshot at a certain moment in time, for special occasions such as elections, or on a regular, systematic basis. However, not all material will be a candidate for long-term preservation, which means that each organization requires selection policies defining the responsibility it is prepared to accept. Criteria for selection for long-term preservation include (but are not limited to):
– The preservation of the intellectual record: for instance, the e-journals of major international publishers are being preserved by the KB, the National Library of the Netherlands, and by Portico in the US.
– The preservation of the historical record and cultural heritage: for example, material which might otherwise not be retained, such as sound recordings of folk tales, music or dialects, and videos or films of everyday life.
– The preservation of fragile artefacts: books, manuscripts, artworks and textiles, for example, can be better preserved when access is provided through digital surrogates in the digital library.
– Widening access for a wider public in time and space through the preservation in digital form of unique or scarce objects.
– The preservation of ephemera or material under threat: for example, websites depicting events or social issues which will not remain on the Internet forever.
– Legal or regulatory requirements: national deposit laws, although quite often not updated for the digital world, may require the preservation of certain material in deposit libraries. Users of other libraries will expect some digital material to be available forever in a specific library, such as university archives.
There may, however, be good reasons for a library not to preserve certain material, as it may present too high a business risk, and a full long-term preservation commitment may be ruled out on grounds of cost or technical difficulty. Audio and video materials, for example, are notorious for their preservation challenges. Digital material evolves, and more complicated material which is dynamic and unstable will be developed. Web harvesting is done by many libraries, but web archiving (see the chapter by Koerbin) still needs further research, as promoted by the IIPC, the International Internet Preservation Consortium. These sorts of problems can be greatly eased through co-operation; indeed, preservation is best seen as a collective responsibility.
Copyright laws may also influence the preservation of digital material, as these laws are often not suited to digital preservation and may even forbid certain preservation actions, like creating a new copy of the digital object. When acquiring material, it is important that the library verifies that it has the right to perform digital preservation actions (International 2008). Selection policies may also determine which preservation levels are applied to the material, making clear what can be expected from the library. These preservation levels are added as metadata to the digital object and indicate how much effort the library is able to put into the long-term preservation of that object. At the lowest level stands ‘bit preservation’, where the library takes care to maintain the basic integrity of the files. The library can decide to have other preservation levels, like ‘full preservation’, where context information and the original look and feel of the object will be preserved through, for example, migration or emulation.
Managing the costs of digital preservation
Although they are very important, not much is known about the costs of long-term preservation. As libraries with a digital collection to be preserved for the long term accept a commitment for years, it is important to have insight into the investments and budgets needed for these activities. In practice, the initial costs of digital preservation are quite often covered, but the recurring costs are not, as calculating them seems too difficult or even too daunting. These difficulties should not be avoided, however, and the development of policies and the implementation of preservation actions should be accompanied by a financial plan. Most cost items, such as hardware, storage and personnel, can be reasonably projected.
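The gap between initial and recurring costs can be made visible with a simple multi-year projection. The sketch below is a hypothetical illustration: the `CostItem` structure, the cost categories and all figures are invented placeholders, not data from any library or from published cost models, and it ignores discounting and any inflation beyond a single growth rate per item.

```python
from dataclasses import dataclass


@dataclass
class CostItem:
    name: str
    initial: float              # one-off cost at the start (e.g. purchase, set-up)
    annual: float               # recurring cost in the first year
    annual_growth: float = 0.0  # yearly change in the recurring cost, e.g. 0.05 for +5%


def project(items: list[CostItem], years: int) -> dict[str, float]:
    """Total cost of each item over `years` years: initial outlay plus recurring costs."""
    totals = {}
    for item in items:
        recurring = sum(item.annual * (1 + item.annual_growth) ** y for y in range(years))
        totals[item.name] = item.initial + recurring
    return totals


# Hypothetical figures for a ten-year commitment.
items = [
    CostItem("storage hardware", initial=50_000, annual=10_000),
    CostItem("preservation staff", initial=0, annual=60_000),
]
totals = project(items, years=10)
initial = sum(item.initial for item in items)
print(totals)  # {'storage hardware': 150000.0, 'preservation staff': 600000.0}
print(f"recurring share: {1 - initial / sum(totals.values()):.0%}")  # recurring share: 93%
```

Even with these invented numbers, most of the total is recurring, which is precisely the part the text warns is often left unfunded.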
Comparing the cost models of different organizations can also help. One interesting conclusion is that the costs of the ‘ingest’ process, including activities like file format identification, virus checking and the addition of metadata, are high, but if ingest is done properly these costs pay off during the rest of the object’s life cycle; if it is not, the cost of repair will be extremely high (Eakin 2008).
Costing elements
Two initiatives may also throw some light on costing: the LIFE project and the Blue Ribbon Task Force. The LIFE project (LIFEcycle Information for E-literature, a collaboration between University College London (UCL) Library Services and the British Library, funded by the Joint Information Systems Committee (JISC); see http://www.life.ac.uk/about/) distinguishes different stages in the life cycle of a digital object and relates the various actions needed to preserve the object to these stages. Each of these activities may incur costs, which may vary between different kinds of digital material. The model is accompanied by a set of use cases based on real-life experience at the British Library. The Blue Ribbon Task Force on Sustainable Digital Preservation and Access was created in 2007 (see http://brtf.sdsc.edu/about.html) and aims to identify and develop useful
economic models for digital preservation and access. Last year it published an interim report and an overview of existing models and practices. In 2009 it plans to publish an economic model which will support “decision makers seeking economic models for access and preservation that promote reliability, cost-effectiveness, trustworthiness, and compliance to relevant policy and regulation”.
The IT infrastructure for digital preservation
Storing digital information for the long term requires an IT infrastructure: not just a storage facility where both the objects and their related metadata can be stored and kept safe, but also an environment for related activities, such as accepting and checking material at ingest and doing metadata work, and presentation facilities that support access to the material by offering finding and presentation aids. Regular preservation actions like migration or refreshment will need an IT infrastructure as well. Hitherto there has not been a wide range of software solutions on the market to support digital preservation activities. Some organizations are developing their own systems together with commercial partners; others have bought a commercial solution. The requirements for such a system will not be the same for every library: they are closely related to the ambitions of the library and the digital content that will be stored. Again, the OAIS model offers a set of requirements which most commercial vendors, such as IBM (DIAS), Tessella (SDB) and Ex Libris (Rosetta), claim to meet. Organizations with the necessary technical competence often choose open source products like Fedora and DSpace, which local implementers can adapt to their requirements. When choosing and implementing a preservation system, it is important to give particular attention to interoperability, so that the system can communicate with other systems, whether within the library or elsewhere.
This requires the supplier to comply with international standards and to support various standard metadata schemas. This aspect needs particular attention when negotiating with commercial suppliers, to ensure as far as possible that they comply and that options for the future are not locked out.
Quality and standards
Organizations which take care of precious digital material need to gain the trust of depositors and clients in the way they handle this material. In 2007 the Center for Research Libraries published Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC 2007), covering the organizational requirements, the way the digital objects are preserved and the IT environment (currently nominated as an ISO standard). This document can help organizations to identify what is required of a trustworthy digital archive, and the checklist can be used as a self-auditing tool. The DRAMBORA initiative (see http://www.repositoryaudit.eu/) and the Kriterienkatalog vertrauenswürdige digitale Langzeitarchive of the German
nestor group are designed for the same purpose. All three guidelines follow the OAIS concept and translate its requirements into a general business environment, not related to a specific community. For small libraries with few staff these self-auditing methods may be daunting. Such libraries would be helped if more expert centres like the German nestor, the Nationale Coalitie Digitale Duurzaamheid (NCDD, the Netherlands) and the Digital Preservation Coalition (DPC, UK) were founded, on a national and international scale, to offer them consultancy and support.
Collaboration, national and international
It is often said that no organization can do digital preservation on its own. The task is too big, too expensive and too diverse. Collaboration is required to share experiences, knowledge and even solutions. There has always been a high degree of commitment to knowledge sharing between libraries, and it is important that experience is shared via conferences, articles and the web. The web is already a treasure-house of information from libraries and other organizations about their voyages of discovery in digital preservation. But we need not only stories about experiences; we also need practical solutions. Some organizations have developed tools for their own use and are willing to share these with colleagues, often for free. These laudable initiatives have led to valuable tools like the file format registry PRONOM, the format identification tool JHOVE and the metadata extraction tool of the National Library of New Zealand, to name just a few examples. European projects like Planets will produce valuable tools for preservation planning, such as the PLATO tool already mentioned. To be able to rely on these products for the long term and to safeguard their further development, it is important to house these services in a sustainable (international) organization.
The eight rules of digital preservation?
Digital preservation is not an easy task and requires ongoing research, as the world in which digital objects are created is constantly evolving. Several initiatives currently under way aim to support organizations, and it can be foreseen that in a few years’ time a network of tools, services and advice will support libraries in this task. Maybe, as was done for books some 500 years ago, we can soon draw up the eight rules of digital preservation!
Summary
– Digital preservation is a “series of managed activities necessary to ensure continued access to digital materials for as long as necessary”
– It should involve many parts of the organization, including policy making, collection management, finance, IT and public services
– Policy making determines selection criteria, standards and processes, life cycles, access provision and budgetary requirements
– Digital preservation is a series of processes: acquiring the selected digital material, preparing the objects, placing them in permanent storage, giving
access, monitoring the collection and its environment, and reviewing preservation actions when necessary
– Staff involved in digital preservation need specialist knowledge and skills in metadata, ingest, file handling and standards, which need constant updating
– Selection criteria include the preservation of the intellectual, historical and cultural record, the preservation of analogue objects by restricting access to digital surrogates, widening access, and compliance with legal or regulatory requirements
– Costing should cover staff time for the technical processes alongside the infrastructure costs of hardware, software and long-term storage
– Collaboration is essential for sharing the load and responsibility as well as knowledge, experience and solutions
References
Beagrie, N. et al. (2008): Digital preservation policies study. Available at: http://www.jisc.ac.uk/Home/publications/documents/jiscpolicyfinalreport.aspx (accessed 14 August 2009)
CCSDS (2002): Reference model for an Open Archival Information System (OAIS). Available at: http://public.ccsds.org/publications/archive/650x0b1.pdf (accessed 14 August 2009)
Diessen, R. van, Sierman, B. and Lee, C. (2008): Component business model for digital repositories: a framework for analysis. Available at: http://www.bl.uk/ipres2008/programme.html (accessed 14 August 2009)
Eakin, L. et al. (2008): A selective literature review on digital preservation sustainability. Available at: http://brtf.sdsc.edu/publications.html (accessed 14 August 2009)
Digital Preservation Coalition (2008): Handbook. Available at: http://www.dpconline.org/graphics/intro/definitions.html (accessed 14 August 2009)
International study on the impact of copyright law on digital preservation (2008): A Joint Report of The Library of Congress National Digital Information Infrastructure and Preservation Program, the Joint Information Systems Committee, the Open Access to Knowledge (OAK) Law Project and the SURF Foundation. Available at: http://www.surffoundation.nl/nl/publicaties/Pages/International-Study-on-the-Impact-of-Copyright-Law-on-Digital-Preservation.aspx (accessed 14 August 2009)
Knight, S. (2008): From theory to practice: digital preservation at the National Library of New Zealand. Available at: http://archive.ifla.org/IV/ifla74/Programme2008.htm (accessed 14 August 2009)
Kriterienkatalog vertrauenswürdige digitale Langzeitarchive Version 1 (2006). Available at: http://edoc.hu-berlin.de/series/nestor-materialien/2006-8/PDF/8.pdf (accessed 5 June 2010)
Neuroth, H., Osswald, A. and Strathmann, S. (2009): Qualification & education in digital curation: the nestor experience in Germany. DigCCurr 2009 proceedings, pp. 12-19. Available at: http://www.ils.unc.edu/digccurr2009/schedule (accessed 14 August 2009)
Madden, L. (2007): Digital curation at the Library of Congress: lessons learned from American Memory and the Archive Ingest and Handling Test. Available at: http://www.digitalpreservation.gov/library/resources/pubs/docs/digital_curation2007.html (accessed 14 August 2009)
Porck, M. and Porck, H. (2007): “Hoemen alle boucken beuuaren sal om eeuuelic te duerene”. Acht regels uit 1527 over het conserveren van boeken. Koninklijke Bibliotheek. [KB internal publication]
Tibbo, H., Hank, C. and Lee, C.A. (2008): Challenges, curricula, and competencies: researcher and practitioner perspectives for informing the development of a digital curation curriculum. In: Archiving 2008. Final Program and Proceedings, Springfield, pp. 234-238
Trustworthy repositories audit & certification (TRAC) (2007): Criteria and Checklist. Available at: http://www.crl.edu/content.asp?l1=13&l2=58&l3=162&l4=91 (accessed 14 August 2009)
Planets survey analysis report (2009). Available at: http://www.planets-project.eu/docs/reports/planets-surveyanalysis-report-dt11-d1.pdf (accessed 5 June 2010)
11 BUSINESS PLANNING FOR DIGITAL REPOSITORIES Alma Swan
State of the art
Digital repositories are coming of age. Globally, a repository has been built on average every day for the past three years, and there are currently (early 2009) around 1300 worldwide. Within a few years it will be a rare research-based institution that does not have its own repository. Why the rush? Because the advantages to an institution of having a repository are so great and the payoff so important. Repositories are a strategic weapon in an institution’s armoury, providing the means for developing new institutional processes, enhancing existing ones and promoting the institution to the world. There is a higher-level view of the strategic importance of repositories, too. National or regional strategies for e-research (or e-science, as some countries call it) set out schemes in which digital open access repositories form the fundamental data-provision layer. See, for example, the National Science Foundation’s plans in the US (NSF/JISC Repositories Workshop 2007), JISC’s Repositories Roadmap for the UK (Heery and Powell 2006), SURF’s Strategic Plan for the Netherlands (SURF 2008), Australia’s e-Research programme (Australian Government 2009) and Europe’s Roadmap for Research Infrastructures (European Strategy Forum on Research Infrastructures 2008). To enable these strategies, mechanisms for coordinating repository development are growing. On a Europe-wide basis, the DRIVER Project provides standards and guidelines for the establishment of digital repositories by research organisations across the continent. National repository networks are also multiplying as national ICT or research funding organisations coordinate the development of networks of interoperable repositories across the research base. For example, in the Netherlands the DAREnet network has encouraged and enabled a repository in every Dutch university. Despite rapid developments, the overall situation has not yet settled.
There are no hard-and-fast rules which can be applied, though generalities are certainly forming, and the main one is that institutions seem to think that having a repository is A Good Thing. But an institution can spend a lot of money establishing a repository, or very little. Its repository may fill quickly and smoothly, or it may remain virtually empty for years. It may become firmly embedded in the working life of the institution’s researchers, or it may be something of which they are barely aware. Critically, the repository can be the tool which boosts the institution’s presence on the world’s web stage, or it can help to consign the institution to web obscurity (Swan & Carr 2008).
A well-planned repository enables a higher education institution to:
– open up and offer the outputs of the institution or community to the world
– impact on and influence developments by maximising the visibility of outputs and providing the greatest possible chance of enhanced impact as a result
– showcase and sell the institution to interested constituencies: prospective staff, prospective students and other stakeholders
– collect and curate digital outputs (or inputs, in the case of special collections)
– manage and measure research and teaching activities
– provide and promote a workspace for work-in-progress, and for collaborative or large-scale projects
– facilitate and further the development and sharing of digital teaching materials and aids
– support and sustain student endeavours, including providing access to theses and dissertations and providing a location for the development of e-portfolios
This list gives rise to the first, and most important, question which any institution must answer before it even begins to build its repository: what are we doing this for? There are some sparkling examples of repositories that are real strategic assets in their institutions. They may be windows on the institution’s research, they may facilitate teaching activity, they may be caring for the institution’s precious special collections, but wherever a repository is succeeding, it has been conceived with clarity of purpose and implemented with strong focus.
Business planning
‘What are we doing this for?’ raises another question: ‘What kind of business model should our repository have?’ Deciding upon business models for enterprises that do not earn revenue is a tricky exercise, but there is some guidance from what has gone before. In 2004, Clarke discussed business models for open source software enterprises in terms of a series of questions (Clarke 2004):
– who pays?
– pays what?
– for what?
– to whom?
– why?
These are questions that must form part of any new repository planning exercise, and they should be joined by a sixth: ‘How?’ Answering it is where the real planning activity begins. Articulating a value proposition for the repository is a useful first exercise, analogous to the value proposition a business makes to its customers on the basis of an analysis of the elements of value it can offer. The value proposition of an open access repository is made to the wider scholarly community from a position of commitment to the scholarly knowledge commons, and is essentially this:
On behalf of the research community, a digital repository proposes to:
– maximise the accessibility
– maximise the availability
– enable the discoverability
– enable the increased functionality
– enable the long-term storage and curation and
– enable other potential benefits …
… of scholarly research outputs at no cost to the user.
This needs slight modification if opening access to research is not the primary purpose of the repository. For example, if the primary purpose is to facilitate teaching and learning, then this should be reflected in the value proposition. A different value proposition must be put, within the institution, to the senior managers who will sign off the cost of establishing the repository. Here, the value of the repository in helping the institution fulfil its mission to encourage and engender the creation of knowledge and to disseminate it must be stated. This is part of making the business case for the repository within the institution and is a critical step in ensuring the repository’s success. The topic is discussed again later in this chapter.
Possible business models for repositories and related services
A number of typologies have been suggested for business models for web-based businesses (see, for example, Timmers 1998; Rappa 2000).
We have previously distilled these to a list of five operational models which seemed applicable to repository-related developments (Swan & Awre 2006):
– Institutional model: institutions own and run the business to further their own goals and strategies
– Public sponsors model: public bodies sponsor the business for the public good
– Community model: the business runs on a community basis, sustained by the communities it serves
– Subscription model: the business runs on a subscription basis, selling products or services to customers paying cash
– Commercial model: the business runs on a commercial basis other than subscription; a number of sub-types are covered by this term, for example an advertising model
These models are set in a matrix with Clarke’s five questions in Table 1. There are examples of each kind of business model in operation amongst existing institutional repositories and the services built around them. The institutional model is, of course, the most common. The French national repository, HAL (Hyper-Article en Ligne), is an example of the public sponsors model, sponsored by the national science funder CNRS. The White Rose Repository, a community model, is a collaborative venture of the universities of York, Sheffield and Leeds in the UK. And paid-for services exist which exemplify the subscription and commercial models: for example, Southampton and Tilburg universities offer repository-hosting services for other institutions, and the University of Utrecht offers print-on-demand sales of dissertations and theses. Open access repositories are not, then, incompatible with income generation, and those planning a repository may wish to investigate the opportunities for creating some revenue streams further down the line.
Institutional model
  Who pays? Institution
  Pays what? Cash
  For what? Staff, hardware, software, services
  To whom? Itself, via internal accounting; suppliers, if outsourcing any supply elements
  Why? To further institutional aims
Public sponsors model
  Who pays? Public body, e.g. research funder
  Pays what? Cash
  For what? Staff, hardware, software, services
  To whom? Service/product provider
  Why? To further the public good
Community model
  Who pays? Community members
  Pays what? Cash and/or in-kind
  For what? Staff, hardware, software, services
  To whom? Service/product provider
  Why? To further community aims
Subscription model
  Who pays? Users
  Pays what? Cash, at intervals
  For what? Service or product
  To whom? Service/product provider
  Why? To acquire the service or product
Commercial model
  Who pays? Users or advertisers
  Pays what? Cash at point of use
  For what? Service or product
  To whom? Service/product provider
  Why? To acquire the service or product
Table 1: Typology for business models for digital repositories
The important thing when planning a repository is to ensure that the activity is manageable, can be put into effect with the resources available, is viable and sustainable, and is rooted in the values of the institution so that the business case holds water. The next section provides a framework for thinking about these things.
Business analysis
Clarke’s five questions (who pays, for what, how much, to whom and why?) cover the overall business scheme of a repository. Within that scheme, and adding the ‘How?’ question, three main business issues need to be addressed to secure the ongoing health of the repository. Issue one, viability, concerns the factors that make the repository happen. Issue two, sustainability, is concerned with resourcing the repository. The third issue, adaptability, is about securing the longer-term future of the repository. Table 2 is a business analysis matrix which summarises these planning issues, and the topics that need to be addressed under them, as a series of questions (Swan 2007). It may serve as a useful guide for discussions during the business planning process.
Viability
Business case
Does our business fit stakeholder needs and preferences? • Will the service fit user needs? • Can we make the case to the institution/organisation? • Is a pilot project necessary or advisable? • Will it tell us much?
Business scope and development
Can we develop and launch this? Can we manage this business successfully? • What is the business going to offer? • How might this change over the short-to-medium term? • Can we do it all ourselves? • Can we make the case to the institution and to the users?
Do our resources at least match our likely costs? • What cost schedules are we likely to face? • Can we afford this business? • Where might costs change? • How do these fit with our medium-term budgets? • How does the resource implication of the business fit with our medium-to-long term plan? • What other resources might be needed and can we supply them? • Can the costs be predicted in the medium term (and met)?
Sustainability
What are the likely costs?
Adaptability
Is our model adaptable? • Can we build in flexibility? • At what cost? • Can we measure payoff? • What new demands or goals may arise?
Business management
Can we build in resilience? • What can we foresee? • How will we cope with that? • How will we monitor for future movements that might be significant?
• What key performance indicators should we use? • What goals might be imposed upon us by others? • Do we need to outsource anything? • How are we going to market our business? • What new tasks might be involved? • What policies and procedures need to be in place? Is our model adaptable and flexible? • Does our long-term plan allow this expenditure? • What margin for error should we factor in? • Can the goalposts be moved (and by whom and for what reason)? • What potential exists for a change of business model? Will all stakeholders remain committed? • What new stakeholders might be brought in? • What is the potential for new developments of any kind? • What new national or international developments may have an impact?
Table 2: Business analysis matrix
Making the business case for the repository
The business case must address the needs of stakeholders both within and outside the institution: institutional managers, research managers, repository managers, researchers-as-authors, researchers-as-readers and research funders. Institutional and research managers will view the repository as a potential management and marketing tool; repository managers want to create a system which runs smoothly and can be managed within normal constraints; authors want the repository to provide a simple-to-use interface, a secure storage place for their work and data on how that work is being used; readers want to be able to find and access content easily; funders want a simple way to monitor and track the research in which they have invested. Taking these needs into account when planning the repository is crucial, but the justification for a repository must be made to the institution or community which
will own and sustain it. The emphasis must therefore be on the business reasons that carry most weight in a particular institution: for example, for institutions with a strong focus on research the business case should highlight the role of the repository in increasing the visibility of the institution’s research and the resulting profile and impact. The main business reasons for a repository, with their institutional payoffs, are listed below.
Business reasons
– To increase the visibility and dissemination of research outputs
– To provide free access to research outputs
– To preserve and curate research outputs
– To collect together the research outputs of the institution in one place
– To provide a place for collaborative research programmes to share material
– To enable the assessment and monitoring of the institution’s research programme
– To provide a place for teaching and learning materials
– To enable the development of special (or legacy) digital collections
Payoffs will be measured in terms of:
– Improved visibility of the institution
– Improved impact of its outputs
– More effective ‘marketing’ of the institution
– Better management of the institution’s intellectual assets
– Easier assessment of what the institution is producing and creating
– Facilitation of workflow for researchers and teachers
– Facilitation of collaborative research
The research outputs repository also complements other institutional digital collections, contributing to a properly ‘joined-up’ whole when interoperably linked to databases such as human resource information (researcher names and IDs), the research information database (details of grants, projects, equipment, etc.) and other institutional digital archives. In making a case to senior institutional management, the espida project, carried out by the University of Glasgow, may prove useful. This project developed a framework to help articulate the value of digital materials and the need to manage them properly (espida 2007). It is wise to plan for a pilot repository, one which can be run and tested for six months or so before the official launch. A pilot phase enables the evaluation of processes, staffing requirements and how content can be recruited. It also allows repository managers to assess how they will cope with peaks and troughs and with new demands placed upon the repository when it is fully functioning. Altogether, a pilot phase is a sensible and important business planning tool. It may be informed by the findings of the TARDis project, which examined the critical factors for success for an institutional, multidisciplinary repository (Simpson 2005). The project described distinct differences between subject communities in how the repository relates to their work, and these differences must be planned for.
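The ‘joined-up’ linking described above can be illustrated with a small sketch of how repository records might be joined to human resources data to produce a simple management report. It assumes, hypothetically, that each repository record carries the staff IDs of its authors and that the HR database supplies a department for each ID; the record layout, field names and the function `outputs_by_department` are illustrative inventions, not part of any repository platform’s interface.

```python
from collections import Counter

# Hypothetical repository records: each lists the HR staff IDs of its authors.
repository_items = [
    {"item_id": "r1", "title": "Paper A", "author_ids": ["s01", "s02"]},
    {"item_id": "r2", "title": "Paper B", "author_ids": ["s02", "s99"]},
]

# Hypothetical extract from the human resources database, keyed by staff ID.
hr_records = {
    "s01": {"name": "A. Author", "department": "Chemistry"},
    "s02": {"name": "B. Author", "department": "Physics"},
}

def outputs_by_department(items, staff):
    """Count items per department (each item counted once per department)
    and collect author IDs that have no matching HR record."""
    counts = Counter()
    unmatched = set()
    for item in items:
        departments = set()
        for staff_id in item["author_ids"]:
            record = staff.get(staff_id)
            if record is None:
                unmatched.add(staff_id)  # e.g. an external co-author or a data-entry error
            else:
                departments.add(record["department"])
        counts.update(departments)  # one count per department for this item
    return counts, unmatched

counts, unmatched = outputs_by_department(repository_items, hr_records)
print(sorted(counts.items()))  # [('Chemistry', 1), ('Physics', 2)]
print(unmatched)               # {'s99'}
```

Reporting unmatched IDs alongside the counts is a deliberate choice in this sketch: in practice the quality of such a report depends on how consistently author identifiers are recorded at deposit.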
In addressing sustainability in the business plan, the costs of implementing and running the repository come to the fore. These are dealt with in detail later in this chapter. Future adaptability requires building as much flexibility into the plan as possible. New norms and practices may arise. One example might be a future requirement (from the institution, funder or authors themselves) for the repository to house and look after digital datasets. As well as demanding considerable storage capacity, such a requirement may involve file format manipulation and more elaborate metadata schemas. These things need not be in place from the outset, but the business plan should consider how they may best be provided, and at what cost, when the time comes.
Business scope and development
Repositories may hold many kinds of digital items. It is important at the planning stage to decide upon the scope of the repository for the immediate and longer terms. It is also important to recognise that accepting items into the repository and guaranteeing to preserve them in the long term are two different things. Preservation of some complex digital objects needs considerable expertise, and it is wise at least to acknowledge this at the repository planning stage. In some countries, help may be at hand from national-level libraries or data centres. Where this is not so, repository managers must be aware that there may be costs and challenges ahead if the repository’s policy is to accept any digital items which users wish to deposit and to undertake to preserve them in an accessible state over time. Formats and standards change, and format migration is not necessarily a simple task, especially where complex objects are concerned. This issue has been addressed in a number of projects and studies which readers may find helpful (PREMIS Project, PRESERV Project). The other main issue here is that of repository services.
Services are clearly key to repository acceptance and use. Repository software mostly comes with deposit aids and some sort of search interface, but other things can also be implemented to attract users and increase the deposit rate. Perhaps the most popular is a usage reporting package which provides authors with data on how many times their articles have been viewed and downloaded. The more sophisticated of these packages also provide more granular data, such as the countries or domains from which the downloads come. Services which provide information on publishers’ rights and permissions for self-archiving (e.g. SHERPA RoMEO), authoring tools (e.g. for file format conversion) and copyright advisory services and tools (e.g. the SURF/JISC Copyright Toolbox and the SPARC/Science Commons Scholar’s Copyright Addendum Engine) are also extremely useful in enabling authors to manage the dissemination of their work optimally. The business plan should make provision for including such services in the repository offering. Some consideration should also be given, even at the initial planning stage, to future enhancements. Examples of other repository services which might be incorporated are RSS/Atom feeds, metadata enhancement, subject-specific views on the repository, simple mechanisms to export content to authors’ home pages and to their CVs or grant proposals, interoperability with the institution’s CRIS (Current Research Information System), the establishment of collaborative working infrastructures based on the repository (a so-called ‘collaboratory’), and so forth.
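How a usage reporting package might turn download logs into the per-article figures authors receive can be sketched as follows. This is a minimal illustration assuming a hypothetical, already pre-processed log in which each download has been resolved to an item ID, a country code and the domain of the requesting host; the names `DownloadEvent` and `author_report` are invented for the example. Real packages also need steps this sketch omits, such as filtering out robot traffic and discounting repeated requests.

```python
from collections import Counter
from typing import Iterable, NamedTuple

class DownloadEvent(NamedTuple):
    """One download of a deposited item (hypothetical, pre-processed log entry)."""
    item_id: str
    country: str  # country code resolved from the client address
    domain: str   # domain suffix of the requesting host, e.g. "ac.uk"

def author_report(events: Iterable[DownloadEvent], item_id: str) -> dict:
    """Total downloads for one item plus breakdowns by country and by domain."""
    matching = [e for e in events if e.item_id == item_id]
    return {
        "total": len(matching),
        "by_country": Counter(e.country for e in matching),
        "by_domain": Counter(e.domain for e in matching),
    }

log = [
    DownloadEvent("eprint:1", "NL", "nl"),
    DownloadEvent("eprint:1", "GB", "ac.uk"),
    DownloadEvent("eprint:1", "NL", "nl"),
    DownloadEvent("eprint:2", "US", "edu"),
]
report = author_report(log, "eprint:1")
print(report["total"])                      # 3
print(report["by_country"].most_common(1))  # [('NL', 2)]
```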
Business management
In terms of the overall initial viability of the repository, one of the key questions under this heading is whether the available resources match the tasks that need to be carried out. There is the possibility of outsourcing, for example. Might it be better to pay a third party to build the repository, or to host it, or to create user services, than to do these things in-house? Can they be done in-house? Are the resources available? What advantages are there to each of the options? Other key issues to consider are:
– evaluation: what performance measures should be used to assess progress? Examples are the level of content recruitment (such as the proportion of annual output from the institution that is collected in the repository), the level of awareness and understanding of users and depositors, financial monitoring and so on
– policies: repositories need a range of policies. These include policies on the types of content that will be accepted and on who may deposit items, a take-down policy to cover cases where material must be removed and whether authors may withdraw their items at will, and an institutional policy on whether researchers are required (or simply requested or encouraged) to place their outputs in the repository. In the case of open access research material, for example, the evidence shows clearly that only mandatory policies produce the required level of deposit (Sale 2006)
– marketing the repository: the repository must be promoted to the research community, to the institution’s administration and to the wider world. Researchers must be made aware of the advantages to them of depositing their work in the repository; senior managers must be informed that a useful management information tool is now available; and the wider higher education community should know that the institution’s outputs are now available for use. Business planning should make provision for these marketing activities.
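The first performance measure in the list above, the proportion of an institution’s annual output that is collected in the repository, is simple arithmetic once both counts are available. The sketch below assumes, hypothetically, that the total output for each year can be taken from a research information system; the class and function names, the 0.5 target and the sample figures are illustrative only and do not come from this chapter.

```python
from dataclasses import dataclass

@dataclass
class YearlyOutput:
    """Hypothetical annual figures for one institution."""
    year: int
    total_outputs: int      # outputs recorded for the year, e.g. in a research information system
    deposited_outputs: int  # how many of those are held in the repository

def deposit_rate(record: YearlyOutput) -> float:
    """Proportion of the year's outputs collected in the repository (0.0 to 1.0)."""
    if record.total_outputs <= 0:
        raise ValueError("total_outputs must be positive")
    if not 0 <= record.deposited_outputs <= record.total_outputs:
        raise ValueError("deposited_outputs must lie between 0 and total_outputs")
    return record.deposited_outputs / record.total_outputs

def years_below_target(records, target=0.5):
    """Years whose deposit rate falls short of an illustrative target."""
    return [r.year for r in records if deposit_rate(r) < target]

history = [YearlyOutput(2007, 1200, 180), YearlyOutput(2008, 1250, 900)]
print([round(deposit_rate(r), 2) for r in history])  # [0.15, 0.72]
print(years_below_target(history))                   # [2007]
```

Tracked year by year, a figure like this shows whether advocacy or a new deposit policy is having an effect; what counts as an acceptable target is for each institution to decide.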
Sustainability means planning for change, and change is mostly unpredictable in its detail but predictable in its likelihood! Fleetness of foot is an asset for repository managers. Repositories which start out with deposit mediated by the repository staff may need to switch to author deposit if the volume of material for deposit starts to rise (perhaps as a result of new policies or requirements from institutional management). Indeed, mediated deposit is an issue which repository planners must assess carefully. There is an argument that a truly sustainable repository is one where mediation is minimal (Carr and Brody 2007). Metadata represent another area where change may be expected: new requirements and standards will emerge over time. Finally, the need for a change of business model (the repository may develop some paid-for services, for example) may also arise. Long-term adaptability requires some agility and flexibility from repository management. Resilience can be achieved if the plans make provision for adjustments in response to altered conditions. One new condition might be the implementation of an institutional policy which requires authors to place all their outputs in the repository. Considerable advocacy and advisory effort will then be needed from the repository staff to support such a requirement. The recruitment of content to repositories, including the effectiveness of advocacy programmes, has been studied by Proudman (2008), who describes best practice from a series of case studies. And whatever changes occur in the environment in which the repository operates, performance indicators will have to be adjusted accordingly. The
repository will never be a static entity existing in an unchanging habitat: it will need to be responsive and resilient to continue to play its important role in institutional life.
Cost/benefit
An institutional repository represents quite a new argument for institutional support. Over the past few decades, institutions have allocated growing sums to their IT budgets and libraries to enable them to acquire, disseminate and look after institutional inputs (books, journals, etc.). A repository, however, collects together, disseminates and looks after institutional outputs, a new concept for most research-based institutions, at least in terms of digital products. In most cases the institution’s library is the instigator of the repository (in a minority of cases it is the IT department), and the repository is often established using existing library budgets. There is an argument to be made, founded on the benefits the repository will bring to the institution, for centralised support for the repository project, with a realistic budget line being established for the purpose.
Massachusetts Institute of Technology, Cambridge, USA
  Set-up costs: Grant for €1.4 million for software development; staffing: 3 FTE; system hardware: €310,000. Total: circa €1.9 million
  Running costs: Staffing: €175,000; operating costs: €19,500; hardware: €27,500. Annual running cost: €225,000
Queen’s University, Kingston, Canada
  Set-up costs: Software: free (DSpace); hardware: mostly provided at institution from existing stock, server cost €1280; staffing: programmer for 12 months, €31,000; additional staff costs for advocacy work (not estimated). Total: circa €32,250
  Running costs: Library staff time: €15,500; IT staff time: €15,500. Annual running cost: €31,000
National University of Ireland, Maynooth, Ireland
  Set-up costs: Software: free (EPrints); one-off payment to computer science student for set-up and customisation (6 months: €15,000); server: €5000. Total: €20,000
  Running costs: Staffing: 1 FTE for upkeep and maintenance (€30,000). Total: €30,000
SHERPA Project, building a repository for the University of Nottingham, UK
  Set-up costs: Software: free (EPrints); server: €2000; installation: 2.5 FTE days (€900); customisation: 15 FTE days (€2700). Total: €5600
  Running costs: Maintenance: absorbed into current processes but estimated at 5 FTE days per annum (€1800); coordination and collection of material: €45,000; three-year update of hardware and software: 2.5 FTE days plus €5600. Total: €52,400
Table 3: Set-up and running costs for a sample of institutional repositories
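One way to read Table 3 is to combine its one-off and recurring figures over a planning horizon. The minimal sketch below adds each set-up total to a number of years of annual running costs, using the totals exactly as listed in the table; it ignores inflation, hardware refresh cycles and changes in staffing, and it leaves out the Nottingham row because its running costs combine annual and three-yearly items. The names `RepositoryCosts` and `cost_over_years` are illustrative.

```python
from typing import NamedTuple

class RepositoryCosts(NamedTuple):
    institution: str
    setup_eur: int           # one-off set-up total, as listed in Table 3
    annual_running_eur: int  # annual running cost, as listed in Table 3

# MIT and Queen's set-up totals are "circa" figures in the table.
# Nottingham is omitted: its running costs mix annual and three-yearly items.
TABLE_3 = [
    RepositoryCosts("MIT", 1_900_000, 225_000),
    RepositoryCosts("Queen's University", 32_250, 31_000),
    RepositoryCosts("NUI Maynooth", 20_000, 30_000),
]

def cost_over_years(row: RepositoryCosts, years: int) -> int:
    """Set-up total plus `years` years of running costs (no inflation, no refresh cycles)."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return row.setup_eur + years * row.annual_running_eur

for row in TABLE_3:
    print(f"{row.institution}: €{cost_over_years(row, 3):,} over three years")
```

Over three years this gives €2,575,000 for MIT, €125,250 for Queen’s University and €110,000 for NUI Maynooth, which illustrates how widely repository budgets can vary.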
The cost of setting up and running a repository can be millions of euros or just a few thousand. It is perfectly possible to implement a fully functional repository on a small budget. The actual costs of setting up and running repositories presented in Table 3 have been collected from a sample of existing repositories (Swan et al. 2004; Houghton et al. 2006). The figures have been converted to euros in all cases. It should be noted that the most popular repository software packages (EPrints, DSpace, Fedora) are open source. Average figures, collected from a sample of eleven European repositories, are as follows:
Set-up costs:
(i) In-house building and hosting
– hardware and software: €9250
– staffing: 1.5 FTEs
(ii) Outsourced
– repository built out-of-house and hosted in-house: €7000
– repository built and hosted out-of-house: €38,000
Running costs:
Average staff allocation to running a repository is 2.5 FTEs, but this average conceals substantial variation, with allocations as low as 0.2 FTE.
The benefit side of the equation cannot easily be measured in cash terms. Instead, it is articulated in other terms, notably increased visibility and impact for the institution and the community benefit from opened-up scholarship, the latter grounding the argument in a return to the core academic values of creating and sharing knowledge. High visibility attracts prospective students, researchers and teachers to the institution, along with funding flows. The visibility and impact advantage that a repository can bring in terms of comparative ranking for an institution is huge (Swan & Carr 2008). In addition to these marketing advantages, the repository brings other benefits to the institution. It is a management information tool, organising research outputs for internal (and external, if applicable) research assessment procedures and for research monitoring and management processes.
It is also the institutional archive, looking after digital items of value to the institution, which may include special collections.
Prospects
Institutional repositories are becoming an integral part of the life of research-based institutions. Their main role to date has been in disseminating research, a role that still promises much that has yet to be realised. We are also seeing cases of repositories aligning with university presses to provide the basis for future dissemination practices that reflect the values and traditions of a time when a central part of universities' mission was to spread their knowledge far and wide.
BUSINESS PLANNING FOR DIGITAL REPOSITORIES
The role of the repository as a management information tool is also gaining traction. For example, some Australian universities have integrated national reporting practices with their repositories, and EPrints has developed a plug-in that supports the reporting requirements of the UK's national Research Assessment Exercise (Carr and McColl 2005). Institutions (and external funders) have not yet fully grasped the utility of the repository in this regard, but as reporting demands increase the repository will come into its own as the locus for the collection and generation of data. Other roles for repositories may also gain in importance: as collaborative workspaces, as a means of fulfilling funder requirements to make electronic copies of outputs freely available, in adding context to published outputs, as an integral part of the publication process itself, and more.
Future-proofing repositories will always be a challenge, although some aspects, such as software upgrades, are fairly predictable. New user services that help researchers and embed the repository more firmly in institutional life will be important. The issue of mediation will come to the fore: for all its advantages in quality control and content recruitment, mediation has a cost, and the benefits must be carefully weighed against it. One final point relates to repository management skills. A new career path is opening up, and professionalisation is expected to follow as repositories grow in stature. Repository management will be an interesting and challenging career option in future.
Summary
– Institutional repositories are growing in number, at an average rate of one per day over the last three years
– Repositories maximise the visibility and impact of the institution, provide a shop window for the institution's research and offer a safe place to store outputs
– The business planning process involves making the business case, defining the scope of the business and deciding how it will be managed; it addresses viability, sustainability and adaptability
– The business case addresses stakeholders' needs and preferences, costs and the adaptability of the business model
– Business scope and development covers the development and launch of the repository, its current and future costs and its resilience
– Business management covers how the business will be managed, how flexible the model is and any future developments, inside and outside the institution, that might affect the repository
Bibliography and references
Australian Government Department of Education, Employment and Workplace Relations (2009), e-Research. http://www.dest.gov.au/sectors/research_sector/policies_issues_reviews/key_issues/e_research_consult/ (accessed 12 February 2009)
Carr, L. and Brody, T. (2007), Size isn't everything: sustainable repositories evidenced by sustainable deposit profiles. D-Lib Magazine 13 (7/8). http://eprints.ecs.soton.ac.uk/13872/ (accessed 14 February 2009)
Carr, L. and McColl, J. (2005), Institutional Repositories and Research Assessment: RAE software for institutional repositories (IRRA Project white paper). http://irra.eprints.org/white/ (accessed 14 February 2009)
Clarke, R. (2004), Open source software and open content as models for eBusiness. Presented at 17th International eCommerce Conference, Slovenia, June 2004. http://www.anu.edu/people/Roger.Clarke/EC/Bled04.html (accessed 14 February 2009)
espida (2007), The espida project. http://www.gla.ac.uk/espida/ (accessed 10 February 2009)
European Strategy Forum on Research Infrastructures (2008), European Roadmap for Research Infrastructures. http://cordis.europa.eu/esfri/roadmap.htm (accessed 12 February 2009)
Heery, R. and Powell, A. (2006), Digital repositories Roadmap: Looking forward. http://www.jisc.ac.uk/media/documents/programmes/reppres/reproadmap.pdf (accessed 11 February 2009)
Houghton, J., Steele, C. and Sheehan, P. (2006), Research communication costs in Australia: Emerging opportunities and benefits. A report to the Department of Education, Science and Training. http://www.dest.gov.au/NR/rdonlyres/0ACB271F-EA7D-4FAF-B3F7-0381F441B175/13935/DEST_Research_Communications_Cost_Report_Sept2006.pdf (accessed 14 February 2009)
NSF/JISC Repositories Workshop (2007), April 17-19, Phoenix, USA. http://www.sis.pitt.edu/~repwkshop/ (accessed 11 February 2009)
PREMIS (Preservation Metadata Maintenance Activity) project. http://www.loc.gov/standards/premis/index.html (accessed 14 February 2009)
PRESERV (Repository Preservation and Interoperability) project. http://preserv.eprints.org/ (accessed 14 February 2009)
Proudman, V. (2008), The population of repositories. In: A DRIVER's Guide to European repositories. Weenink, K., Waaijers, L. and van Godtsenhoven, K. (Eds). Amsterdam University Press.
Rappa, M. (2000), Business models on the Web: managing the digital enterprise. North Carolina State University, USA. http://digitalenterprise.org/models/models.html (accessed 14 February 2009)
Sale, A. (2006), Comparison of IR content policies in Australia. First Monday, 11 (4). http://eprints.utas.edu.au/264/ (accessed 13 February 2009)
Simpson, P. (2005), TARDis Project Final Report. Southampton, UK, University of Southampton, University Library, 14 pp. http://eprints.soton.ac.uk/16122/ (accessed 8 February 2009)
SURF (2008), Strategic Plan: 'Thinking Ahead'. http://www.surffoundation.nl/smartsite.dws?ch=ENG&id=12025 (accessed 12 February 2009)
Swan, A. (2007), The business of digital repositories. In: A DRIVER's Guide to European repositories. Weenink, K., Waaijers, L. and van Godtsenhoven, K. (Eds). Amsterdam University Press. http://eprints.ecs.soton.ac.uk/14455/ (accessed 14 February 2009)
Swan, A. and Awre, C. (2006), Linking UK repositories: Technical and organisational models to support user-oriented services across institutional and other digital repositories: Scoping study report. http://www.jisc.ac.uk/uploaded_documents/Linking_UK_repositories_report.pdf; Appendix: http://www.jisc.ac.uk/uploaded_documents/Linking_UK_repositories_appendix.pdf (accessed 14 February 2009)
Swan, A. and Carr, L. (2008), Institutions, their repositories and the Web. Serials Review, 34 (1). http://eprints.ecs.soton.ac.uk/14965/ (accessed 14 February 2009)
Swan, A., Needham, P., Probets, P., Muir, A., O'Brien, A., Oppenheim, C., Hardy, R. and Rowland, F. (2004), Delivery, management and access model for E-prints and open access journals within further and higher education (Report of a JISC study), pp. 1-121. http://eprints.ecs.soton.ac.uk/11001/ (accessed 10 February 2009)
Timmers, P. (1998), Business models for electronic markets. In: Gadient, Y., Schmid, B. F. and Selz, D. (Eds), EM-Electronic Commerce in Europe. EM-Electronic Markets, 8 (2), 07/98. http://www.electronicmarkets.org/issues/volume-8/volume-8-issue-2/businessmodels0.pdf (accessed 14 February 2009)
12 PROBLEMS OF MULTI-LINGUALITY
Genevieve Clavel-Merrin
Introduction
Libraries, especially those at the academic or national level, have traditionally held collections representing many languages and scripts. In most cases, however, access to this (mainly) printed material has been at the metadata level (bibliographic records) through single-language indexes (subject or author, controlled vocabulary), which have enabled material published in different languages to be brought together. These vocabularies naturally differ from country to country, but even within one language zone they vary according to library type or specialty: for example, while LCSH (Library of Congress Subject Headings) and MeSH (Medical Subject Headings) have some terminology in common, they are to all intents and purposes two different languages; indeed, those working in the field speak of subject heading languages. Interoperability, or access to different collections in different languages, is therefore already problematic at the metadata level, and the growth in digitized full-text material and in networking or aggregating data presents new challenges. In addition, when planning a multi-lingual digital library, different levels and interpretations of multi-linguality need to be taken into account: data management and display; interface; controlled vocabularies; full text. The scale of these challenges will vary according to context: at one extreme, a large-scale pan-European initiative such as Europeana (http://www.europeana.eu) involves managing multiple languages in both the interface and searching, whereas in a multi-lingual country such as Switzerland the goal may be to manage the local languages (often plus English). Costs and business plans therefore vary widely, and scaling is difficult. Moreover, cross-language access remains a field of experimentation in which no standard solution is available on the market.
In terms of planning, it must be recognized that the wide variety of possible cases and levels of complexity make it difficult to estimate the costs and time required. At the same time, multi-lingual access brings benefits to users, enabling them to search in their native language and so to reach a wider range of resources, while it allows libraries to give wider access to their collections, promote their use and gain a larger potential audience across the globe. The different levels of multi-linguality are discussed below, with a presentation of the state of the art and consideration of future prospects.
Data management and display
At the most basic level, access to digital resources in multiple languages may be technically difficult because of the use of different character sets and scripts. The increasing use of Unicode (http://Unicode.org) in standard software, browsers and hardware such as keyboards facilitates data management and display, and most standard database management systems and access systems will be Unicode-compliant; even so, difficulties may occur with scripts or diacritics when searching across multiple collections or when aggregating bibliographic or full-text data from different sources (Clavel, 2006). In addition, keyboards are generally configured for local (national) languages, which hinders the input of special characters. When planning access to a multi-lingual digital collection, it is essential to allocate time to testing data input, access and display. The time required will depend on the complexity of the languages and scripts present: a collection with documents in multiple scripts will require more testing and more staff expertise. Within The European Library service (http://www.theeuropeanlibrary.org), for example, a Character Set Group has been set up to check interoperability questions of this type and share the testing load. However, many questions remain, particularly in alphabetical sorting for display, as the following examples show: z should be sorted before t in Estonian; č and ř in Czech follow c and r and are not inter-filed with them; õ, ä, ö and ü in Estonian are filed near the end of the alphabet; ch in Slovak is treated as a single separate character; and ä, ö, ü are treated as ae, oe, ue in German (whereas in French ä is treated the same as a). When results are sorted by relevance the problem may be seen as less acute, but it may cause surprise if an alphabetical sort is chosen.
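Sort orders like those listed above cannot be obtained by sorting on Unicode code points. The following is a minimal Python sketch, assuming heavily simplified, hypothetical alphabet tables: `ALPHABETS` and `collation_key` are illustrative names, not part of any library, and accents and case are ignored rather than treated as secondary differences. Production systems would normally rely instead on the Unicode Collation Algorithm with language-specific tailorings (for example through ICU).

```python
import unicodedata

# Hypothetical, heavily simplified alphabets (lowercase letters only). Real
# collation also treats accents and case as secondary differences.
ALPHABETS = {
    # Estonian: z and ž fall between s/š and t; õ, ä, ö, ü follow w.
    "et": "a b c d e f g h i j k l m n o p q r s š z ž t u v w õ ä ö ü x y".split(),
    # Czech: č follows c, ř follows r, and the digraph "ch" is one letter after h.
    "cs": "a b c č d e f g h ch i j k l m n o p q r ř s š t u v w x y z ž".split(),
}


def collation_key(word, lang):
    """Sort key ranking each letter (or digraph) by its position in the alphabet for `lang`."""
    alphabet = ALPHABETS[lang]
    rank = {letter: position for position, letter in enumerate(alphabet)}
    # Match longer units first so that "ch" is recognised before "c".
    units = sorted(rank, key=len, reverse=True)
    # NFC normalisation makes "c" + combining caron equal to the precomposed "č".
    text = unicodedata.normalize("NFC", word.lower())
    key, i = [], 0
    while i < len(text):
        for unit in units:
            if text.startswith(unit, i):
                key.append(rank[unit])
                i += len(unit)
                break
        else:
            # Characters missing from the table sort after every listed letter.
            key.append(len(alphabet) + ord(text[i]))
            i += 1
    return tuple(key)


words = ["tee", "zoo", "sau"]
print(sorted(words))                                        # ['sau', 'tee', 'zoo']
print(sorted(words, key=lambda w: collation_key(w, "et")))  # ['sau', 'zoo', 'tee']
print(sorted(["ihned", "chata", "hrad", "čaj", "cihla"],
             key=lambda w: collation_key(w, "cs")))         # ['cihla', 'čaj', 'hrad', 'chata', 'ihned']
```

The normalisation step matters for aggregated data: two sources may encode the same accented letter differently, and without it they would sort (and match) as different strings.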
An awareness of these questions is necessary when searching and testing systems in a multi-lingual environment, bearing in mind that no system currently available handles all of them successfully. A software (virtual) keyboard will facilitate the input of special characters: open-source versions are available, and examples may be seen on The European Library site (http://www.theeuropeanlibrary.org).
Interface
The interface to a collection may contain general information pages, help screens, commands, Web forms and comment pages. In the case of The European Library, and increasingly where collections are aggregated, there are also collection descriptions to be considered. These significantly increase the translation load: for example, the costs of translating over 300 collection descriptions in The European Library, plus the interfaces, into 27 languages amount to €72,000, plus staff time for checking. Few digital collections will require work in so many languages, but the example illustrates the complexities and requirements. Good translators are costly: while staff may assist in checking and quality control, it is recommended that the initial work be carried out by professionals working from a single source language (English in the case of The European Library).
Maintenance of multiple interfaces can be costly and time-consuming, especially when major upgrades are carried out. In terms of software design, it is important to check that language and interface administration is not hard-coded into a system but held in tables separate from other code elements, offering differing levels of management (view, comment, update). Scalability of languages should be considered: how many languages can be supported, and in which scripts? Language management tools should be present that allow translation coordination to be optimized: for example, is it possible to update one interface language without needing to update all of them simultaneously? (If not, the release of time-sensitive information may be held up.) If parts of the interface are not translated, is there a default display, e.g. in English, to avoid dead ends in one language? (Be aware, however, that this may lead to confusing displays if translations are not managed in a timely fashion.) A management tool should ideally offer a flagging or alert mechanism so that, when one part is modified, those managing the other languages are notified that an update is required. If comment forms are available, ensure that someone on staff or a contact is able to read the feedback and reply in the appropriate language.
Controlled vocabularies
In parallel with work on cross-language information retrieval in full-text collections, studies continue on the use of controlled vocabularies in the multi-lingual environment for both browsing and searching bibliographic metadata. Digital collections of images will also need some form of tagging or subject description as a substitute for full-text searching, as image recognition tools are not yet in common use in digital library systems. Multi-lingual access initiatives cover subject headings, names and places, and classifications.
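To make the idea of a multi-lingual controlled vocabulary concrete, here is a minimal Python sketch of how such a list might be used at search time. It assumes a tiny hypothetical list in which each concept carries one preferred heading per language (comparable to language-tagged `prefLabel` values in SKOS); the headings, records and function names are illustrative only and are not taken from LCSH, Rameau, SWD or any real system.

```python
# Hypothetical multilingual list: each concept carries one preferred heading per
# language (comparable to language-tagged skos:prefLabel values). The headings are
# illustrative and are not quoted from LCSH, Rameau or SWD.
CONCEPTS = {
    "c1": {"en": "Mountaineering", "fr": "Alpinisme", "de": "Bergsteigen"},
    "c2": {"en": "Glaciers", "fr": "Glaciers", "de": "Gletscher"},
}

# Hypothetical records, each indexed in the language of the library that holds it.
RECORDS = [
    {"title": "Klettern in den Alpen", "subjects": {"Bergsteigen"}},
    {"title": "Histoire de l'alpinisme", "subjects": {"Alpinisme"}},
    {"title": "Polar ice", "subjects": {"Glaciers"}},
]


def headings_for(term, lang):
    """Return all headings, in every language, for the concept labelled `term` in `lang`."""
    wanted = term.casefold()
    for labels in CONCEPTS.values():
        if labels.get(lang, "").casefold() == wanted:
            return set(labels.values())
    return {term}  # not in the list: search with the user's own term only


def search(term, lang):
    """Titles of records indexed under any language equivalent of `term`."""
    headings = headings_for(term, lang)
    return [record["title"] for record in RECORDS if record["subjects"] & headings]


print(search("mountaineering", "en"))  # ['Klettern in den Alpen', "Histoire de l'alpinisme"]
```

Matching only against the label in the user's chosen language is a deliberate choice: headings such as "Glaciers", which are spelled identically in English and French, would otherwise be ambiguous between concepts in a larger list.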
There are two main challenges to the use of multi-lingual controlled vocabularies: the creation or adaptation of a multi-lingual list, and the implementation of such a list in a search interface. Work on interoperability across controlled headings has been carried out since the 1980s, initially in the field of thesauri, using the ISO 5964 standard "Guidelines for the establishment and development of multi-lingual thesauri", originally published in 1985 (see also Landry, Žumer and Clavel-Merrin, 2006). Different mapping approaches have been taken, with efforts to introduce some automation: for example, in the UMLS Metathesaurus (http://www.nlm.nih.gov/research/umls/umlsdoc.html) the mapping of terms from the different thesauri is carried out using a software programme called Semantic Network. The HILT project (High Level Thesaurus, http://hilt.cdlr.strath.ac.uk/) is one of the most promising developments in this area, although it is restricted to English-language thesauri. The HILT pilot "terminologies route map" was built using the commercially available Wordmap software. (The company specializes in the development of systems based on knowledge maps, in which vocabularies are linked together.) Interoperability between subject headings and classifications (for example DDC) has been tested in WebDewey (http://www.oclc.org/dewey/versions/webdewey/) and in the CrissCross project at the Deutsche Nationalbibliothek, running from 2006 to 2010 (www.fbi.fh-koeln.de/institut/projekte/CrissCross/index_en.html). The project aims to create a multi-lingual, user-friendly, thesaurus-based research vocabulary. Subject headings of the German-language Subject Heading Authority File (SWD) are first linked with the indices of the Dewey Decimal Classification (DDC). These will then be linked to equivalents in two foreign-language authority files, the Library of Congress Subject Headings (LCSH, in English) and Rameau (in French). This second step is organised on the basis of the results of the MACS project (https://macs.cenl.org), which links LCSH, Rameau and SWD. For a description of the linking project see Landry (2009).
Two commercially available subject heading lists serve collections that require only bilingual access: the Canadian Répertoire de vedettes-matière for English-French (http://www.collectionscanada.gc.ca/rvm/index-e.html) and Bilindex (http://www.bilindex.com/) for English-Spanish. Examples of bilingual searches using these may be seen in the catalogue of Library and Archives Canada, Amicus (http://amicus.collectionscanada.gc.ca/aaweb/aalogine.htm), and in that of the San Francisco Public Library (http://sflib1.sfpl.org/).
The above projects have highlighted a number of problems in mapping generalized vocabularies: differences in word use (in different natural languages), in coverage (scope), in semantics (meaning) and in semantic relations (preferred terms/levels of terms) (Doerr, 2001). Manual mapping is costly and time-consuming. For the CrissCross project, an estimated 4 person years are required to link German SWD headings to 50,000 LCSH headings, with an additional 3 person years to link 70,500 DDC notations. Planning and coordination of 1 person year must also be taken into account (Clavel-Merrin, Žumer and Landry, 2006).
The same study estimated the following times based on data from some of the TEL-ME-MOR partner libraries (taking an average of 8 links per hour and a working year of 1,560 hours):
– Hungary, 35,650 headings: ca. 3 person years
– Estonia, 26,000 headings: ca. 2 person years
– Lithuania, 54,153 headings: ca. 4.5 person years
– Cyprus, 1,042 headings: ca. 1 month
– Czech Republic, 38,257 headings: ca. 3 person years
Moreover, these figures cover only one type of subject heading per institution. Schmidt-Supprian (2007) extended the TEL-ME-MOR survey to 43 national libraries in The European Library and identified 145 controlled vocabularies in use across 207 collections in those institutions. There is therefore great interest in semi-automatic mapping of headings in order to reduce costs and speed up the creation of multi-lingual lists. Within the TelPlus project (http://www.theeuropeanlibrary.org/telplus/), mapping and alignment of vocabularies in SKOS (Simple Knowledge Organization System, http://www.w3.org/2004/02/skos/) is being tested using MACS data. In the field of multi-lingual access to names, the VIAF project (Virtual International Authority File, http://www.oclc.org/research/projects/viaf/) demonstrates an approach to automated mapping. The project currently links the national name authority files of the Library of Congress, the Deutsche Nationalbibliothek and the Bibliothèque nationale de France in a prototype available at http://viaf.org, and in 2008 invited other national libraries to join as partners to extend the consortium and the mapping work. The goal is to allow local variations of standard forms of names to be linked and ultimately to retrieve records from systems using different standards (or languages, in the case of corporate or place names).
As can be seen, there are numerous mapping initiatives, but as yet no practical applications enabling true multi-lingual (rather than bilingual) searching. The VIAF prototype links to authority records but not to bibliographic data, although this is planned for a future stage. CrissCross is still in development: it is planned to integrate it into the catalogue of the Deutsche Nationalbibliothek and into those of libraries using the same software. HILT is still in a testing and prototype phase, while the MACS project, after a prototype giving access to extracts of the collections of its partner libraries (British Library, Bibliothèque nationale de France, Deutsche Nationalbibliothek and Swiss National Library), concentrated on mapping until 2007, when work began anew on a search interface. One of the difficulties in implementation lies in the need for a critical mass of data to use in searching. Since manual mapping takes time, implementation is slowed, and since results are not rapidly visible, it is harder to achieve proof of concept, and therefore funding. The commitment of national libraries and The European Library to multi-lingual access has, however, ensured that work continues. Initial tests have highlighted interoperability issues in searching (character sets), as well as questions of relevance and noise that need to be investigated before an operational system is developed. User testing and focus groups will need to be put in place to examine user preferences with regard to the transparency of searching, i.e. how far to involve the user in the choice of terms and whether to display the equivalents in other languages.
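The person-year figures quoted in this section follow from simple arithmetic on the stated parameters (8 links per hour, a working year of 1,560 hours). The minimal Python sketch below reproduces them under one labelled assumption of our own: that each heading needs exactly one link. The function and variable names are illustrative.

```python
LINKS_PER_HOUR = 8       # average mapping rate stated for the TEL-ME-MOR estimates
HOURS_PER_YEAR = 1560    # working year stated for the same estimates


def person_years(headings, links_per_hour=LINKS_PER_HOUR, hours_per_year=HOURS_PER_YEAR):
    """Manual mapping effort, ASSUMING one link per heading."""
    return headings / links_per_hour / hours_per_year


def to_nearest_half(years):
    """Round an effort estimate to the nearest half person year."""
    return round(years * 2) / 2


HEADINGS = {
    "Hungary": 35650,
    "Estonia": 26000,
    "Lithuania": 54153,
    "Cyprus": 1042,
    "Czech Republic": 38257,
}

for country, count in HEADINGS.items():
    years = person_years(count)
    print(f"{country}: {years:.2f} person years (~{to_nearest_half(years)}), "
          f"{years * 12:.1f} months")
```

Rounded to the nearest half year, the results match the approximate figures in the list (Lithuania, for instance, comes to about 4.34 person years, which rounds to 4.5), and Cyprus comes to roughly 130 working hours, or about one month. Because effort scales linearly with the number of headings, the same function gives a first-order estimate for any additional vocabulary.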
Free text
The fully digital library offers the user a wealth of data and search possibilities across complete texts, including opportunities for relevance ranking that are much wider than is possible with bibliographic records or metadata, which are by comparison succinct, not to say minimal. However, in a multi-lingual environment, controlled vocabulary offers the advantage of allowing users to search in their own language for material held in other languages, which they may be able to understand but not to use for search terms. Until recently, machine translation and cross-language searching were extensively tested in fora such as TREC (Text Retrieval Conference, http://trec.nist.gov), SIGIR (Association for Computing Machinery's Special Interest Group on Information Retrieval, http://www.sigir.org) or CLEF (Cross-Language Evaluation Forum, http://www.clef-campaign.org). These experiments have proved rich in results, but invariably work in a controlled experimental environment covering clearly delimited fields or themes, with limited semantic ambiguity, parallel texts and static collections. The digital library (or web) world is different on all levels: it is cross-domain, covering in theory all human knowledge; it changes frequently; and texts vary widely in their complexity. As collections expand and are linked, multi-lingual retrieval on full text has moved from the laboratory to become a focus for applied research, with the goal of implementing operational services. This is reflected in the European Commission's announcement that €14 million would be available in 2009 within Framework Programme 7 for funding projects and networks in the fields of machine translation for the multilingual web, content development, best practices and standards (http://ec.europa.eu/information_society/activities/econtentplus/index_en.htm). The lessons and techniques learned in controlled testing are being extended to the digital library in projects such as TelPlus (http://www.theeuropeanlibrary.org/telplus/index.php), which studies multi-lingual access to full text to provide building blocks for both Europeana and The European Library.
The semantic management of full text poses the same difficulties of ambiguity as that of controlled vocabularies (phonetic and lexical, syntactic and semantic), but faces additional challenges in sentence structure, identification of names or places, synonyms and abbreviation expansion. Evolution in language use, spelling and terminology over the centuries reduces precision in searching: for example, contemporary newspapers during the 1914-18 War will not speak of the First World War or the Great War! These questions are as valid for mono-lingual searching as for multi-lingual searching. Tools such as dictionaries and stemmers are widely used in search engines, but detailed information about them is difficult to obtain for commercial reasons. Stemming OCR text may introduce inaccuracies, depending on the level of recognition achieved: this area is still largely unexplored. Within the TelPlus project a study has been carried out on the state of the art of semantic and multilingual engines or tools for digital libraries, including open-source and free resources (Freire, Mane, Petz, 2008: restricted access).
It found that commercial companies were unwilling to provide details about their techniques, or would simply assure one that all areas were covered, and it found no free information retrieval engine that fully covered all requirements in semantic management, text mining and natural language processing. However, the study did indicate that many services support automatic language recognition and that character set and language support is wide-ranging. Not all available products feature advanced linguistic tools, and open-source software in general has fewer language features, generally restricted to Western European languages. Automatic language recognition in mixed-language documents remains problematic. As in mapping, scalability remains a challenge, not only in searching and ranking potentially several million documents but also in the number of languages to be managed. Resources for language pairs are being developed (see e.g. the WordNet structure, http://www.globalwordnet.org/, and http://www.globalwordnet.org/gwa/wordnet_table.htm for a list of available wordnets), but, as Siebinga (2008) has pointed out, whereas 3 languages produce 6 language pairs, 30 languages produce 870 (counting each direction of translation separately), going far beyond current commercially available resources (though not Google: see below). These difficulties mean that anyone implementing multi-lingual services must accept that 100% accuracy is not achievable and that goals of 80% or much less, depending on the languages, the availability of dictionaries and text quality, may be much more realistic. However, outside the experimental environment and in a large-scale document repository, evaluating accuracy is problematic. Ultimately, user satisfaction is the gauge.
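The pair counts quoted above count each direction of translation separately: with n languages there are n(n - 1) ordered (source, target) pairs, so the number grows roughly with the square of the number of languages. A short Python check, with illustrative function names, confirms the arithmetic; the same formula also yields the 1,640 pairs for the 41 languages of Google Translate mentioned in the discussion of translation of results.

```python
from itertools import permutations


def directed_pairs(n_languages):
    """Number of ordered (source, target) language pairs among n languages: n(n - 1)."""
    return n_languages * (n_languages - 1)


def list_pairs(languages):
    """Enumerate every ordered translation direction, e.g. ('de', 'en') and ('en', 'de')."""
    return list(permutations(languages, 2))


print(list_pairs(["de", "en", "fr"]))
print(directed_pairs(3), directed_pairs(30), directed_pairs(41))  # 6 870 1640
```

If a resource can be used in both directions, only half as many pairs are needed (435 for 30 languages), which is still far more than the handful of pairs typically covered.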
Another area to analyse when considering the introduction of a multi-lingual service, and probably to test with user groups, is that of transparency: machine translation techniques may be used if the translation process is to be transparent (that is, invisible) to the user, though a message should indicate why results are in other languages. An alternative is a dictionary-based approach, which can be used for interactive systems in which the user is shown proposed translations that may be selected or changed. The degree of user interaction will also determine whether to opt for automatic language identification or to ask the user to state the source language (and perhaps also the target language(s), for practical reasons of scale and understanding). The way in which the language of results is identified should also be checked: is it done automatically or through metadata embedded in the documents? If the latter, the question of mixed languages in documents arises. An example of user interaction may be seen in the MultiMatch project (http://www.multimatch.org/), in which searches are carried out using a combination of machine translation and domain-specific dictionaries. Users may select the source and target languages as well as the most appropriate translation from among those displayed by the system (Amato et al., 2008).
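The dictionary-based, interactive alternative described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of MultiMatch or any other project: the bilingual dictionary, the function names and the Boolean query syntax are all hypothetical. Each query term is mapped to its candidate translations, the user may narrow the choice, and the selections are combined into a target-language query.

```python
# Hypothetical bilingual dictionary keyed by (source, target) language codes.
# "bank" is deliberately ambiguous: a financial "Bank" or a river "Ufer".
DICTIONARY = {
    ("en", "de"): {"bank": ["Bank", "Ufer"], "fish": ["Fisch"]},
}


def propose_translations(query, source, target):
    """Map each query term to its candidate translations; unknown terms are kept as-is."""
    table = DICTIONARY.get((source, target), {})
    return {term: table.get(term, [term]) for term in query.lower().split()}


def build_target_query(proposals, choices=None):
    """Combine the user's selections (default: all candidates) into a Boolean query."""
    choices = choices or {}
    clauses = []
    for term, candidates in proposals.items():
        selected = choices.get(term, candidates)
        clauses.append("(" + " OR ".join(selected) + ")")
    return " AND ".join(clauses)


proposals = propose_translations("bank fish", "en", "de")
print(build_target_query(proposals))                      # (Bank OR Ufer) AND (Fisch)
print(build_target_query(proposals, {"bank": ["Ufer"]}))  # (Ufer) AND (Fisch)
```

Without user input, every candidate is kept, which favours recall at the cost of noise; a single selection by the user removes the unwanted sense. Terms missing from the dictionary are passed through unchanged rather than dropped.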
A good example of a pragmatic approach is provided by the Library of the Free University of Bozen-Bolzano (Kugler & Dini, 2004), which defines its goal as "We want to enable the user to type free text in his/her own language and retrieve all and only relevant hits in all languages s/he is interested in", but also intends to moderate user expectations on the basis that users are:
– able to understand the content of the material retrieved in the selected languages (limiting the number of languages; no translation of results),
– used to standard web search engines (the interface will be as simple as possible),
– expecting the query to target full text, not metadata,
– therefore not expecting a controlled vocabulary (concentrate on free text),
– experienced with the concept of query refinement (expecting the need to reduce hits),
– prepared to browse several results lists (not expecting full precision).
Free-text searching is not synonymous with full-text searching: it may also be carried out on bibliographic metadata, though the comparatively small records make disambiguation more complex than in full text. The CACAO project (http://www.cacaoproject.eu/) uses a hybrid approach, with an architecture based on query translation in which a linguistic analysis of ambiguity is carried out and supplemented by subject headings, the identification of names or entities that are not to be translated, some manual translation and the enrichment of metadata using dictionaries and other resources. In this project, as in TelPlus, the linked French/English/German subject headings from the MACS project are being used to support searching and also as a 'gold standard' against which automatic mappings are compared. The need for complementary approaches using controlled vocabulary and free text is now being recognized.
In addition, a prototype of the CACAO information retrieval system has been entered in the CLEF 2008 track, bringing together operational data and the laboratory environment and, it is hoped, creating an enriching partnership that will lead to further improvement (Levergood, Farrenkopf and Frasnelli, 2008).
Translation of results
Once results in different languages have been retrieved, the question remains how much the user will understand. When target languages are chosen by the user, one may assume that he/she has sufficient passive knowledge to understand the text retrieved. In other cases, abstracts in other languages may allow enough analysis to decide whether a document should be translated. A pragmatic approach for scientific and factual material is to offer a tool such as Google Translate (http://translate.google.com/), which offers 41 languages and 1,640 language pairs. Results are uneven but permit a general understanding of a text's content, though nuances are not guaranteed. A feedback mechanism to improve results is an idea that library developers could usefully adopt. It might also be worth investigating the use of such a tool for translating queries at input.
Conclusion
Although many exploratory projects exist and elements of solutions are available in the research and commercial sectors, to our knowledge no fully operational system covering all aspects of multi-linguality is available for the digital library. Costs are difficult to estimate, as are the benefits until operational systems are available. However, demand is high, especially outside the English-speaking world, as underlined by the emphasis placed on multi-linguality by the European Commission. While some might see this as arising from political and cultural motives, it is significant that a large consortium such as OCLC (http://www.oclc.org) is also investing in the area. Anyone planning to introduce elements of multi-linguality into a system is advised to adopt a pragmatic approach: tracking prototypes in the projects mentioned, discussing needs with user groups, being clear on goals (retrieval, translation of results, metadata or full-text multi-lingual searching) and remaining realistic rather than expecting perfection.
Summary
– Multi-linguality covers different aspects:
  • Data input and display
  • Interface
  • Controlled vocabulary access to metadata
  • Access to free text
  • Translation of results
– Ambiguities abound: phonetic and lexical, syntactic and semantic
– Problems and approaches differ widely across digital library environments
– Progress is being made in many projects and in research, but there are very few, if any, operational, scalable examples
– The European Library and Europeana are conducting research in this area
– Scalability is a problem both for technical feasibility and for estimating costs
– Automatic translation and creation of thesauri remain problematic outside controlled environments or datasets
– Build incrementally: work will come out of Europeana and EuropeanaConnect (which includes multilingual browsing via controlled vocabularies and ontologies)
PROBLEMS OF MULTI-LINGUALITY
References
ANSI/NISO Z39.19-2005, Guidelines for the construction, format, and management of monolingual controlled vocabularies. Bethesda, Maryland: NISO Press, 2005. 172 pp.
BS 8723-4:2007, Structured vocabularies for information retrieval – Guide, Part 4: Interoperability between vocabularies. BSI: 2008. 55 pp.
Amato, G. et al. (2008), The MultiMatch prototype: multilingual/multimedia search for cultural heritage objects. ECDL 2008, LNCS 5173, pp. 385–387. © Springer-Verlag Berlin Heidelberg 2008.
Chan, L. and Zeng, M. (2002), Ensuring interoperability among subject vocabularies and knowledge organisation schemes: a methodological analysis. 68th IFLA Council and General Conference, 18-24 August, Glasgow, Scotland, UK. Available from: http://www.ifla.org/IV/ifla68/papers/008-122e.pdf (retrieved 5 June 2010)
Clavel, P. (2006), How localization challenges international portals: character sets and international access. World Library and Information Congress: 72nd IFLA General Conference and Council, 20-24 August, Seoul, Korea. Available from: http://www.ifla.org/IV/ifla72/papers/077-Clavel_trans-en.pdf (retrieved 26 February 2009)
Clavel-Merrin, G., Žumer, M. and Landry, P. (2006), Scenarios for multilingual access (Deliverable 3.6). TEL-ME-MOR: The European Library. Available from: http://www.theeuropeanlibrary.org/portal/organisation/cooperation/archive/telmemor/docs/D3.6_Multilingual_scenarios.pdf (retrieved 26 February 2009)
Crane, G. (2006), What do you do with a million books? D-Lib Magazine Vol. 12, no. 3. Available from: http://www.dlib.org/dlib/march06/crane/03crane.html (retrieved 5 June 2010)
Doerr, M. (2001), Semantic problems of thesaurus mapping. Journal of Digital Information 1 (8). Available from: http://journals.tdl.org/jodi/article/view/31 (retrieved 5 June 2010)
Freire, N., Mane, L. and Petz, G. (2008), State of the art of semantic and multilingual engines or tools for digital libraries. Annexe 3: Open Source and free resources. Deliverable 3.1 of the TELplus project. (Still restricted on 5 June 2010)
Kugler, U. and Dini, L. (2004), Cross-lingual search in the Library of Free University of Bozen-Bolzano
Landry, P. (2009), Multilingualism and subject heading languages: how the MACS project will be providing multilingual subject access in Europe. Paper given at the CILIP Cataloguing and Indexing Group Conference, University of Strathclyde, 3-5 September 2008. To be published in Catalogue & Index in 2009.
Landry, P., Žumer, M. and Clavel-Merrin, G. (2006), Report on cross-language subject access options. Deliverable 3.4 TEL-ME-MOR. Available from: http://www.theeuropeanlibrary.org/portal/organisation/cooperation/archive/telmemor/docs/D3.4-Cross-language-access.pdf (retrieved 27 February 2009)
Levergood, B., Farrenkopf, S. and Frasnelli, E. (2008), The specification of the language of the field and interoperability: cross-language access to catalogues and online libraries (CACAO). International Conference on Dublin Core and Metadata Applications, North America, 010 09 2008. Available from: http://dcpapers.dublincore.org/ojs/pubs/article/view/933/929 (retrieved 27 February 2008)
Schmidt-Supprian, C. (2007), Controlled vocabularies in a multilingual federated search environment: the example of the European Library. Submitted in partial fulfilment of the requirements for the degree of Master of Library and Information Studies. Dublin: National University of Ireland, University College Dublin, School of Information and Library Studies. 130 pp.
Siebinga, S. (2008), Implementing multilingual information access in the European Library. EDLproject Conference, Frankfurt, 31-01-2008. Available from: http://www.theeuropeanlibrary.org/portal/organisation/cooperation/archive/edlproject/conference/downloads/EDLconf_Sjoerd.pdf (retrieved 27 February 2009)
Zeng, M. L. and Chan, L. M. (2004), Trends and Issues in Establishing Interoperability Among Knowledge Organization Systems. Journal of the American Society for Information Science and Technology, Vol. 55, no. 5, pp. 377-395.
13 Business models for Open Access publishing and their effect on the digital library
David C. Prosser
The Internet has radically changed the way in which researchers and students gain access to scholarly research articles. The vast majority of core research journals are available online (over 90% in science, technical, and medical subjects). Readers have become used to 24/7 desktop access to articles of interest. While access has changed beyond recognition, other aspects of the scholarly communication process have changed far less. For example, with the exception of a few experiments, peer review has remained fundamentally unchanged for over 50 years. Between these two extremes lie the business models for electronic publishing. A transition is under way, and we are in the middle of it. However, even in the midst of the change it is clear that the new models will have significant implications for libraries.
Open access
Over the past decade open access (OA) has become an increasingly important part of the scholarly communications landscape. In February 2002 the Budapest Open Access Initiative defined open access as the: “…free availability [of research articles] on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself.”
Two types of OA were also described:
1. Self-archiving (often referred to as ‘Green’ OA), whereby authors (or their proxies) deposit copies of their articles in open, interoperable electronic archives. These copies may be the exact versions published (including the publisher's formatting), the authors’ final peer-reviewed manuscripts, or even pre-refereed preprints.
2. Open access journals (often referred to as ‘Gold’ OA), where journals place no financial barriers (such as subscriptions) between research and readers but look to other revenue sources to meet electronic publication costs.
Since 2002, OA has assumed increasing importance in the provision of scholarly information. By any metric – number and size of repositories, number and impact of
journals, number and impact of funder and institutional OA mandates – we see significant year-on-year growth. However, while the library community has been intimately involved in developing the OA infrastructure, the effects on the digital library to date have been less clear. What is clear is that OA, through both repositories and journals, signals a long-term shift. That shift affects not just the delivery mechanism for scholarly materials, but also the business models which underlie that provision. Moreover, the successful business models will almost certainly differ between green and gold OA. In the traditional, paper-based library a significant proportion of the budget was dedicated to three tasks: purchasing materials, cataloguing them, and ensuring that the material was available for physical inspection by readers. With the move over the last 15 years towards online provision there has been a shift to licensing access to material rather than purchasing it. The material is now hosted by the publisher rather than the library. This shift started with journals, but has increasingly extended to monographs and textbooks. Today, the physical holdings of many research libraries are significantly smaller than their ‘e-holdings’ (or, at least, their e-access). Despite this shift, a large and increasing proportion of the library budget is still dedicated to providing access to information. But in an OA environment access is free. There are no financial barriers between research and reader. There is therefore no need for acquisition budgets, purchasing decisions, or systems to track holdings. What, then, is the role of the digital library in this open access world? To begin to answer this question we need to look at the two types of OA separately.
The economics of Green OA
Green OA is solely concerned with access.
Articles deposited in repositories have been peer-reviewed by journals (either OA or subscription-based), so none of the economics of peer review apply to green OA. The main economic issue is the provision of the archives into which articles are deposited. Within the OA community there is an ongoing debate as to the ideal type and location of OA repositories. Should they be centralised (for example, national or subject-specific) or decentralised, based in academic institutions? Examples of centralised archives include the physics preprint server arXiv and PubMed Central (PMC) in biomedicine. Decentralised, institution-based repositories abound. Over 1,500 are listed by OpenDOAR and beautifully visualised by Stuart Lewis’s Repository 66 mash-up. This is not the place to rehearse the arguments for and against centralised or decentralised repositories, but it is worth noting that their business models differ. In general, centralised repositories tend to hold more content than decentralised ones and so have larger costs, which are often met by the host institution. PMC is funded by the US government through the National Library of Medicine. The UK equivalent, UKPMC, is funded by a group of UK biomedical research funders. arXiv is an interesting example of an evolving business model. Previously its full costs were borne by its host institution, Cornell University Library. But as arXiv continues to grow, these costs have increased beyond
what an individual institution can carry, and Cornell has launched an appeal for funds from institutions internationally that make use of arXiv. This example typifies the problem faced by free, immensely useful services with international usage which are funded locally. As a community we have yet to develop models which allow the costs of such services to be shared fairly among all users. Over the past few years, decentralised, local repositories have tended to be located within institutional libraries. The direct costs of setting up an institutional repository can be low: some server space and time to install open source software. However, there are human resources issues which any institution will need to address. How much customisation of the software will be required? Will researchers be required to deposit their articles directly, or will a mediated service be offered (where staff deposit on behalf of researchers)? Will deposited material be checked for copyright infringement? What about the accuracy of metadata? Will the institution undertake an advocacy programme to encourage use of the repository? Clearly, the more activities and services envisaged for the repository, the greater the potential cost. In The business of digital repositories, Swan (2007) (see also her chapter in this book) lists the main benefits of an institutional repository to an institution as:
– Increasing the visibility and dissemination of research outputs
– Providing free access to research outputs
– The preservation and curation of research outputs
– The collection of research outputs
– Research assessment and monitoring
– A place for teaching and learning materials
– The development of special (or legacy) digital collections
To this list can be added enabling researchers at the institution to fulfil their obligations to funders where deposit mandates are in place (see below).
Different audiences within the institution will attach varying importance to the different benefits. A researcher may be interested in the long-term preservation of his/her own work. A research manager may focus on the use of the repository for assessment and management. A senior administrator may care most about increasing the visibility of research, whether to improve the institution’s standing in international league tables or to demonstrate the value of the institution. David Shulenberger made this point forcefully in a speech at the 2008 SPARC Digital Repositories meeting in Baltimore. He argued that the repository was a vital tool for engaging with the public and policy makers. This is of particular importance in times of economic difficulty, when universities need to make the case for continued levels of funding. This variety of messages to a variety of stakeholders within the institution should not be seen as a lack of focus. Rather, it shows the flexibility and power of repositories to fulfil the disparate needs of a wide range of users. The business model for an institutional repository should take advantage of this flexibility.
For institutions which do not have the resources (in terms of time, money, or people) to run a repository in-house there are two main options. First, there are a number of repository hosting services. For example, BioMed Central offers Open Repositories, based on the DSpace repository software. The EPrints team at the University of Southampton in the UK also offers repository hosting (among a variety of other repository services) through EPrints Services. These services handle the technical side of the repository: ensuring interoperability, applying software updates, providing economies of scale, etc. This allows the institution to concentrate on the policy issues surrounding repositories and the practical issues of content acquisition. A second option is for institutions to work collaboratively to set up shared repositories, pooling technical and advocacy resources. This model has been used by a group of three universities in the north of England, the White Rose Group. White Rose Research Online brings together research from the Universities of Leeds, Sheffield and York in a single repository, but allows users to search the material of each institution separately. With the development of cloud computing, the collaborative approach offers some interesting future possibilities. OpenDOAR lists 755 repositories in Europe, including 52 in France, 139 in Germany, and 169 in the UK. Each of these repositories will have required resources from the institution that set it up. Each will also require server space and staff time to keep it running and up to date. It is easy to imagine a centralised (say, national) resource that offers a repository service to all institutions.
The virtual space could be customised to carry each institution’s branding, so providing the institutional benefits described above. There may also be system savings to be made, plus synergies in bringing all of a country’s research together in one place. The disadvantage is that centralised services require centralised funding. To date, institutions have been able to set up repositories with relatively small funds. The cumulative total may be large, and a national repository service would make this sum explicit. In a time of international financial insecurity (especially in the public sector) this may not be the moment for such large infrastructure projects!
The economics of Gold OA
In many ways the economics of gold OA are more complex than those of green OA and present some interesting issues for the digital library. In simplified form, the traditional, pre-Internet business model for journal publishing was that individual libraries purchased individual journal titles, often through a subscription agent. This is a simplification because journals have always had mixed revenue streams. Institutional subscriptions certainly made up the bulk of revenues. There were also non-negligible revenues (especially in some subject areas) from reprints, page and figure charges, advertising, personal and members’ subscriptions, etc. In the electronic era the bulk of revenue for large publishers comes from ‘big deals’, but again there is a variety of revenue streams. I labour this point to highlight that the shift to OA is in many ways not so radical: mixed revenue streams will still exist, just without electronic subscriptions. In A Guide to Business Planning for Launching a New Open Access Journal (page 14), Crow and
Goldstein (2003) provide a taxonomy of potential revenue sources for OA journals, including:
– Author submission/publication charges or article processing fees
– Off-print sales
– Advertising
– Sponsorships
– Co-hosting of conferences and exhibits
– Journal publication in off-line media (e.g., print)
– Value-added fee-based services
– Foundation grants
– Institutional grants and subsidies
– Government grants
– Gifts and fundraising
– Voluntary contributions
– In-kind contributions
Obviously, not all of the above revenue streams will work for all titles in all subject areas. Even so, over 4,500 OA journals are currently listed in the Directory of Open Access Journals, each taking advantage of one or more of these streams. The key is to find the right combination for the size of the journal and its subject area. A pattern is beginning to emerge. Large biomedical and physical science journals rely primarily on publication fees, while smaller arts and humanities titles rely on subvention (either direct or indirect) and grants. (This split may develop further into one between journals publishing articles by funded researchers and journals publishing articles by researchers without funding, a topic I discuss further below.) One revenue stream for OA journals has caused a certain degree of controversy over the past few years: OA ‘memberships’. A few variations exist, but basically the library pays a membership fee. In return, publication charges are either waived or reduced for authors at member institutions. Some larger OA publishers, such as BioMed Central and PLoS, have such models. This model was particularly valuable in the earlier days of OA journals, when authors were not necessarily used to paying publication charges. The controversy has arisen for two reasons. Some libraries feel that they are paying more than the value of the discount, and some publishers have changed the model in what has been viewed as an abrupt way.
A large part of this problem has arisen because both publishers and libraries are feeling their way forward in a new, untested environment. This type of model may survive only as a transitional model. It may be phased out once more robust means are developed of ensuring that publication charges can be paid. A number of universities and funders are already taking steps towards such means. Some institutions have set up publication charge funds, and in some cases these funds are managed by the library. The question for the library community is how far it wishes to be involved in these funds in the future. At present, the funds tend to be rather small and the number of calls upon them is not great. Although the number of articles in OA journals is growing, the vast majority of authors still publish in subscription-based journals and as yet have
no need to call on OA publication charge funds. But as the proportion of OA papers grows there will be greater demands for funding. This will inevitably lead to potentially difficult decisions about the allocation of finite budgets. Guidance on who will be eligible for funds, and at what level, will need to be developed. At some point the administrators of the funds will have to say ‘no’ to authors requesting money. So the location of publication charge funds is an open question, and the answer may depend as much on local funding regimes as on the interests of librarians. Irrespective of where the money is held, there is the matter of where it comes from. It could be argued that, in the move from subscriptions to OA, library budgets should be restructured to shift from acquisitions to the payment of publication fees. This is a superficially attractive idea. However, it will almost certainly not work in the short term, or for individual institutions acting in isolation. Subscriptions pay to bring research articles from outside the institution to readers within it. Publication fees pay to disseminate the institution’s research to the world beyond. These are two very different functions. It is not clear that funds for one could easily be transferred wholesale to the other without a massive, coordinated international effort. One of the great problems over the past 50 years has been that library budgets have not kept pace with increases in research funding. The number of articles published grows by an average of 3% each year, so it becomes harder and harder for any library to maintain a comprehensive collection. That is before taking into account the large, above-inflation price rises imposed by some publishers. Big deals have to some extent masked the problem over the past decade, as libraries have gained access to a large number of titles for not significantly greater sums of money.
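The following minimal Python sketch puts the 3% growth figure in perspective. It is an illustration, not from the source, and it rests on two simplifying assumptions. First, the number of articles published grows at a constant 3% a year. Second, a flat library budget buys a fixed number of articles each year with no price rises. The sketch computes how long output takes to double, and what share of annual output such a flat budget could still cover after a given number of years.

```python
import math


def doubling_time(rate: float) -> float:
    """Years for output to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + rate)


def coverage_share(rate: float, years: int) -> float:
    """Share of annual output a flat budget can still cover after `years`.

    Assumes (hypothetically) that the budget covered all output in year 0
    and that the price per article stays constant.
    """
    return 1 / (1 + rate) ** years


if __name__ == "__main__":
    rate = 0.03  # 3% annual growth in articles published, as cited in the text
    print(f"Doubling time: {doubling_time(rate):.1f} years")
    for years in (5, 10, 20):
        share = coverage_share(rate, years)
        print(f"After {years} years a flat budget covers {share:.0%} of output")
```

Under these assumptions output doubles in roughly 23 years. After ten years a flat budget covers only about 74% of annual output, and that is before any price increases are taken into account.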
But there is a growing realisation that even ‘moderate’ price increases for big deals are outstripping increases in library budgets (if budgets are increasing at all). Over the next few years we will see a return to the wearyingly familiar pattern of decreasing numbers of titles being available to readers. So any publication charge fund must be in proportion to the size of the research endeavour within the institution and must scale with any increase in research funding. Increasingly, funders are acknowledging that dissemination costs (and, by implication, publication costs) are legitimate research costs. They are therefore allowing research grant monies to be used to pay article processing charges. In the UK the Wellcome Trust has gone a step further and provided institutions with specific funds upon which authors may draw to pay OA publication fees. Also in the UK, in 2009 the Research Information Network (RIN) and Universities UK (UUK) produced a guidance note on the payment of OA publication charges. It made a number of recommendations to institutions, publishers, research funders and authors. Although the note is focused on the UK, its general conclusions have wider relevance. It identifies two funding streams for publication charges. The first is direct grant funding, whereby researchers use part of the grant they receive to pay for publication. This has the advantage that it scales directly with research funding and ties publication in as a research cost. There are, however, two practical difficulties. Firstly, some funders do not allow funds to be retained beyond the end of the grant period, and publication
can come many months after the official end of the research project. Secondly, some researchers (especially, but not exclusively, in the arts and humanities) receive little or no direct research funding. For this reason, RIN and UUK recommend a second funding stream: publication charge accounts funded from the overheads claimed from funding bodies for the indirect costs of research. These accounts could then be used for articles by unfunded authors and by authors whose projects have been completed. The details of who holds publication charge funds, and of the financial flows, still need to be resolved. The principle, however, is relatively simple: there should be an association between the funding of research and the funding of its dissemination. For OA journals which rely on sources of funding other than publication charges, the principles become rather more complex. Journals that do not charge publication fees (and over half of those listed in the DOAJ make no such charge) tend to be smaller on average than those that do. They often have less functionality and are supported by the host institution, by societies, or by direct government grants. However, the authors and readers of a journal are rarely from one institution, society or country. Ironically, one of the perceived advantages of OA is precisely this wide global readership. It attracts a global distribution of authors, making local journals increasingly international. A sponsor who supports a journal is therefore supporting not just its own local or national community, but an entire research community worldwide. Of course, this is not necessarily a problem. Some sponsors will see wide exposure as a benefit and will be happy to contribute to support the journal. Others may regard international coverage as a happy side-effect of the benefits OA brings locally, and so worth the expense.
Others, however, may feel that an international benefit is not worth the local cost. For these journals to continue and thrive, the community will need to find mechanisms through which operating costs are shared.
Digital library as digital publisher
Some libraries have taken a further step in the provision of electronic journals and set themselves up as publishers. One example is the library of Utrecht University in the Netherlands. The library formed a publishing department, Igitur, with the following mission: “assists scientists, research groups and scientific communities in determining an optimal publication strategy and in the realization of total e-publishing solutions in close collaboration with the client. Increasing access to scientific information is central in Igitur’s strategy”. For Utrecht, the aims and mission of Igitur correspond with those of the library and the institution as a whole. Igitur brings the institutional repository together with peer-reviewed journals and other scholarly research outputs. It integrates them to provide access to material that Utrecht’s researchers and students consider important. In some
ways, this could be considered a model for the University Press of the 21st century: a press whose priority is showcasing the intellectual wealth of an institution. It would do so either directly, through the research outputs produced by that institution (via the repository), or indirectly, through the running of peer-reviewed journals. Again, the journals published by such a press will need viable business models to survive. Even so, some institutions may be willing to subsidise at least part of the press’s costs in return for the kudos and visibility that this type of activity brings.
A greater disruption?
The integration of repository services and journal publishing within a single library department, as described above, is an administrative union. However, it is possible to think of a more concrete interaction between repositories and journals in the form of ‘overlay journals’. The concept of overlay journals has been around for a long time, but has not yet taken off. The idea is that articles are deposited in a repository and peer-review services then give accreditation to them. The articles could be dispersed across a variety of repositories or held within a single, large disciplinary repository. The overlay journal would not need a site of its own and so would not have to worry about hosting charges; it would rely on the repositories to provide access and archiving. This could potentially lead to an overall reduction in the costs of scholarly communication. In 2008 the RIOJA project produced a demonstrator of how such an overlay journal could work, based on papers deposited in the arXiv repository. The technology is relatively simple, so the fact that the idea has not taken off suggests that the obstacle is sociological. Publication is about much more than communication (not least with regard to researchers’ professional standing). This ensures that there is an inherent conservatism in the system.
However, as authors and readers become more comfortable with online-only journals and repositories, overlay journals may come to be seen as an attractive low-cost option. Many repositories are located in the university library, so libraries may become more involved in facilitating the peer review of scholarly articles rather than just providing access.
The future of the digital library
In many ways, the issues discussed in this chapter relate to relatively simple financial matters. The assumption is that a certain amount of money is currently spent on scholarly communication, mainly on purchasing and licensing access to electronic information. In an OA environment this money shifts to purchasing services (such as peer review), and access is free. The issue becomes how to engineer that shift in funding and how to develop sustainable business models. However, when thinking about business models for digital libraries it is possible to envisage a wider future, for OA is a means to an end, not an end in itself. When all (or at least the vast majority) of material is OA, the important question becomes: what are we
going to do with this material? How do we integrate OA materials and, by extension, the library into researchers’ workflows? Are we looking at a shift from a relatively passive function (the library makes material available for the researcher to use)? The alternative is a much more active and cooperative function, where the library becomes a partner in the research process. To achieve this we must look at models in which OA content forms the foundation to which functionality is added. The aim is to create resources that serve the community by providing not just content but a complete research environment. Institutions, funders, and governments are becoming more interested in collaborative, interdisciplinary research and the possibilities of e-science. As they do, more emphasis will be placed on interactive research tools which enable such collaborative work practices. Academic libraries have been very active in creatively redefining their physical space (in particular to create innovative learning environments). The next challenge is to redefine the virtual space they occupy so that the library can engage directly with researchers. There are already a number of intriguing examples of collaborative research environments. One of these is nanoHUB, a service which provides resources for nanotechnologists and allows them to interact with each other. It offers a wide variety of resources, from lectures and presentations to areas in which to run simulations, and it allows users to add functionality of their own. This makes it very different from the portals that were popular 10 years ago. Those were mainly passive, providing information to users. The new services, by contrast, are expected to embrace Web 2.0 functionality that allows users to become creators. The role of the library in this model is not guaranteed: in the case of nanoHUB, it is not clear that an academic library is involved.
It is possible to envisage that publishers, abstracting and indexing services, scholarly societies and new, as yet undreamt-of start-ups might wish to move into this environment. It is therefore up to the library community to decide whether this is an area it wishes to enter and, if so, to develop sustainable business models which allow such services to thrive.
Summary
– Green Open Access refers to self-archiving in open repositories. Gold OA refers to free access at the point of use, with publication costs met from other revenue sources
– Open Access is assuming increasing importance and presents a number of challenges to the business models of the digital library
– Care needs to be taken to differentiate between Green and Gold OA and the different business models needed to support them
– For Green OA, the business case for supporting institutional repositories needs to be made locally, and the best arguments will depend on the local strategy for promoting research
– For Green OA, opportunities exist for collaborative solutions
– For Gold OA, greater clarity is needed in the financial flows for paying publication charges: one model is the subscription, membership or pre-payment model
David C. Prosser
– The research community must find new ways to fund OA services at local or regional/national level which strike the right balance between local strategic aims and international accessibility
– There is an opportunity for libraries to manage OA publishing
– The overlay journal is an interesting collaborative possibility
– Finding business models to support sustainable OA projects is just the start: the real challenge is to find models for research environments which make full use of OA material
References
Budapest Open Access Initiative (2002), available at http://www.soros.org/openaccess/read.shtml (accessed 7 January 2010)
Crow, R. and Goldstein, H. (2003), A Guide to Business Planning for Launching a New Open Access Journal (page 14), Open Society Institute, New York: available at http://www.soros.org/openaccess/oajguides/html/business_planning.htm (accessed 7 January 2010)
Research Information Network (RIN) and Universities UK (UUK) (2009), Paying for open access publication charges, London: available at http://www.rin.ac.uk/our-work/research-funding-policy-and-guidance/paying-open-access-publication-charges (accessed 7 January 2010)
Shulenberger, D. (2008), SPARC Digital Repositories Meeting: available at http://www.arl.org/sparc/meetings/ir08/closing_keynote (accessed 7 January 2010)
Swan, A. (2007), in A DRIVER's Guide to European Repositories, Amsterdam University Press: available at http://www.driver-repository.eu/PublicDocs/D7.2_1.1.pdf (accessed 7 January 2010)
Web resources
arXiv - http://arxiv.org/ (accessed 7 January 2010)
PubMed Central - http://www.ncbi.nlm.nih.gov/pmc/ (accessed 7 January 2010)
OpenDOAR - http://www.opendoar.org/ (accessed 7 January 2010)
Repository 66 - http://maps.repository66.org/ (accessed 7 January 2010)
Open Repositories - http://www.openrepository.com/ (accessed 7 January 2010)
EPrint Services - http://www.eprints.org/services/ (accessed 7 January 2010)
White Rose Research Online - http://eprints.whiterose.ac.uk/ (accessed 7 January 2010)
Directory of Open Access Journals - http://www.doaj.org/ (accessed 7 January 2010)
Igitur - http://www.uu.nl/EN/library/igitur/Pages/default.aspx (accessed 7 January 2010)
RIOJA - http://www.ucl.ac.uk/ls/rioja/ (accessed 7 January 2010)
nanoHUB - http://nanohub.org/ (accessed 7 January 2010)
14 DIGITAL LIBRARY METADATA Stefan Gradmann
Introduction
The simplest and broadest definition of metadata states that they are “data about data”. In this sense, metadata can refer to almost anything in the world, provided the referenced item can be conceived as ‘data’. A more specific and helpful definition, focusing on Digital Libraries, is that metadata are structured sets of statements on digital information objects which enable users to identify, retrieve, manage and use such information objects. The statements made in metadata can pertain to different characteristics of the information objects held in Digital Libraries (they may, for instance, refer to semantic, technical or administrative aspects) and may vary in granularity (between extremes such as the 15 attributes of the Dublin Core Metadata Element Set (DC) and the hundreds of fields and subfields of MARC 21). Finally, metadata can either be part of the information objects or be stored separately, in which case they include a link to the objects they refer to.
State of the art
Types of metadata
Following a recent JISC Technology and Standards Watch report (Gartner 2008), the following categories of metadata can be distinguished with respect to the object characteristics they refer to:
– descriptive metadata: these are similar to the traditional catalogue record and contain statements on semantic aspects of the objects, enabling retrieval and intellectual assessment (for the differences between descriptive metadata and catalogue records, see Gradmann (1998))
– administrative metadata: the information necessary to curate the object, including the following sub-categories:
• technical metadata: all technical information (e.g. file format or file size) necessary to store and process the object
• rights management: declarations of rights held in the object and the information necessary to restrict its delivery to those entitled to access it
• digital provenance: information on the creation and subsequent treatment of the digital object, including details of relevant actors for each event in its lifespan
– structural metadata: information representing the internal structure of an item so that it can be rendered to the user in a sensible form and effectively quoted, enabling external references to its microstructure elements.
Additionally, object identifiers should be included here, since they can be considered a specific meta-statement pertaining to a given object: instead of relating to any particular property, they pertain to the object as a whole and imply a statement on its identity.
Examples of descriptive metadata frameworks include Dublin Core (http://dublincore.org/) and MODS (Metadata Object Description Schema, http://www.loc.gov/standards/mods/). The former is an extremely simplified set of just 15 attributes, such as creator or rights, whereas the latter is derived from the rich MARC format and provides a more granular set of approximately 80 attributes. Examples of technical metadata include complex standards such as TEI (Text Encoding Initiative, http://www.tei-c.org/) or DocBook (http://www.docbook.org/) for textual objects, Metadata for Images in XML (MIX, http://www.loc.gov/standards/mix/) for still images, MPEG-7 (http://www.chiariglione.org/mpeg/) for audio objects or MP4 for video. Statements on rights can be expressed using standards such as XrML (eXtensible Rights Markup Language, http://www.xrml.org/), the Open Digital Rights Language (ODRL, http://odrl.net/) or ONIX (Online Information eXchange, http://www.editeur.org/8/ONIX/). Digital provenance statements can be based on rich standards such as PREMIS (PREservation Metadata Implementation Strategies, http://www.oclc.org/research/projects/pmwg/). Structural metadata include, once again, TEI-based statements, but also emerging standards such as Resource Maps as defined by OAI-ORE (Object Reuse and Exchange, http://www.openarchives.org/ore/).
Finally, object identifier frameworks include examples such as the Digital Object Identifier (DOI, http://www.doi.org/) or Uniform Resource Names (URN, http://tools.ietf.org/html/rfc2141).
Metadata interoperability
The following is a MARC metadata fragment provided by the Library of Congress (and encoded as MARCXML):
<?xml version="1.0" encoding="UTF-8"?>
[…]
Sandburg, Carl, 1878-1967.
Arithmetic /
A transformation of this fragment to DC yields the following result:
Arithmetic /
Sandburg, Carl, 1878-1967.
The same fragment rendered in MODS reads as follows:
Arithmetic /
Sandburg, Carl 1878-1967
creator
The example illustrates that these and other metadata formats were originally developed independently of each other and differ considerably in syntax and semantics. As a result, one of the major problems in the domain always has been, and to some extent still is, the lack of interoperability of metadata. A promising approach to syntactic interoperability seems to be the use of XML (eXtensible Markup Language) as a standard syntax for expressing meta-statements. This has led to the establishment of METS (Metadata Encoding and Transmission Standard, http://www.loc.gov/standards/mets), which is expressed in XML and relies on XML's built-in mechanism for addressing external extensions, the so-called ‘namespaces’. Each of the standards listed above is identified as a separate namespace, which makes them subsidiary schemas (often called extension schemas) of METS. This approach creates some syntactic interoperability across metadata formats but still does not address the issue of semantic diversity: it cannot, for instance, establish that the creator element in DC and the name element combined with its role subelement in MODS express the same thing.
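The gap between syntactic and semantic interoperability can be illustrated with a small sketch. The Python below uses only the standard library's xml.etree.ElementTree to wrap a DC rendering and a MODS rendering of the Sandburg record in METS dmdSec/mdWrap sections, each in its own namespace. Two things are assumptions made for illustration: the split of the MARC name heading into a name part and a date part, and the simplified mapping functions. These are not the Library of Congress crosswalk stylesheets. A schema-valid METS file would also need a structMap, which is omitted here.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for METS, simple Dublin Core and MODS version 3.
METS = "http://www.loc.gov/METS/"
DC = "http://purl.org/dc/elements/1.1/"
MODS = "http://www.loc.gov/mods/v3"
NS = {"mets": METS, "dc": DC, "mods": MODS}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

# Hypothetical stand-in for the MARC fields in the fragment above:
# field 100 (personal name, split into name and dates) and field 245 (title).
MARC_RECORD = {"100a": "Sandburg, Carl,", "100d": "1878-1967.", "245a": "Arithmetic /"}


def to_dc(marc):
    """Simplified MARC-to-DC mapping: a title and one flat creator string."""
    title = ET.Element(f"{{{DC}}}title")
    title.text = marc["245a"]
    creator = ET.Element(f"{{{DC}}}creator")
    creator.text = f"{marc['100a']} {marc['100d']}"
    return [title, creator]


def to_mods(marc):
    """Simplified MARC-to-MODS mapping: the creator becomes a structured name with a role."""
    mods = ET.Element(f"{{{MODS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS}}}title").text = marc["245a"]
    name = ET.SubElement(mods, f"{{{MODS}}}name", type="personal")
    ET.SubElement(name, f"{{{MODS}}}namePart").text = marc["100a"].rstrip(",")
    ET.SubElement(name, f"{{{MODS}}}namePart", type="date").text = marc["100d"].rstrip(".")
    role = ET.SubElement(name, f"{{{MODS}}}role")
    ET.SubElement(role, f"{{{MODS}}}roleTerm", type="text").text = "creator"
    return [mods]


def wrap_in_mets(marc):
    """Put each description in its own dmdSec; namespaces keep them syntactically apart."""
    mets = ET.Element(f"{{{METS}}}mets")
    for dmd_id, md_type, payload in (("dmd-dc", "DC", to_dc(marc)),
                                     ("dmd-mods", "MODS", to_mods(marc))):
        dmd = ET.SubElement(mets, f"{{{METS}}}dmdSec", ID=dmd_id)
        wrap = ET.SubElement(dmd, f"{{{METS}}}mdWrap", MDTYPE=md_type)
        ET.SubElement(wrap, f"{{{METS}}}xmlData").extend(payload)
    return mets


def find_creators(mets):
    """Two format-specific queries are needed: METS says nothing about their equivalence."""
    dc_creators = [e.text for e in mets.findall(".//dc:creator", NS)]
    mods_creators = [
        " ".join(part.text for part in name.findall("mods:namePart", NS))
        for name in mets.findall(".//mods:name", NS)
        if name.findtext("mods:role/mods:roleTerm", namespaces=NS) == "creator"
    ]
    return dc_creators, mods_creators


if __name__ == "__main__":
    document = wrap_in_mets(MARC_RECORD)
    print(ET.tostring(document, encoding="unicode"))
    print(find_creators(document))
```

Both descriptions sit side by side in one well-formed METS file. Even so, retrieving "the creator" still takes two hand-written, format-specific paths. The knowledge that dc:creator corresponds to a MODS name with the role creator lives in the application code, not in the METS wrapper.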
Another overarching approach, the Digital Item Declaration Language (DIDL) that forms part of MPEG-21 (http://www.chiariglione.org/mpeg/standards/mpeg-21/mpeg-21.htm), attempts to address this to some extent, but other approaches are far better suited to the task (see below on RDF/RDFS). Yet another approach to creating interoperability across metadata resources is the OAI-ORE framework mentioned above: ORE enables the modelling of complex objects as aggregations of WWW resources, which can be represented as a concept map. It remains unclear, for the time being, which of these approaches will finally dominate, even though METS, given its origins in the MARC community, is currently the most popular option among librarians (see below).
A further common approach is currently emerging and may have substantial advantages over plain XML metadata representations, namely expressing metadata statements in RDF (Resource Description Framework) and using RDFS (RDF Schema) to model the relevant vocabulary. This is not only a WWW-transparent way of expressing metadata but also enables simple semantic web inferencing operations. It thus provides a kind of simple ‘reasoning’ support which may, among other things, be very useful for automatically generating additional statements. Moreover, using RDFS for vocabulary definition may be the real solution to the still widely lacking semantic interoperability of metadata standards (see below). Finally, the recently published draft of Resource Description and Access (RDA, http://www.rda-jsc.org/rda.html) may have a significant unifying potential. RDA is to replace AACR2, has a much broader scope than traditional librarian cataloguing rules, and is implemented using RDF in many current testing scenarios. Although the current situation may thus seem puzzling and irritating to a non-specialist, we hope to make more sense of it below!
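The kind of RDFS-based ‘reasoning’ mentioned above can be sketched without any RDF library. The Python below stores statements as (subject, predicate, object) triples. It then applies two RDFS entailment rules, transitivity of rdfs:subPropertyOf (rdfs5) and propagation of statements to super-properties (rdfs7), until nothing new is produced. The example.org properties standing in for MARC- and MODS-derived vocabularies are hypothetical; only the RDFS and Dublin Core namespace URIs are real. A production system would use an RDF toolkit and a complete RDFS reasoner rather than this hand-rolled loop.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

RDFS_SUBPROPERTY = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"

# Hypothetical vocabularies (example.org URIs are placeholders, not published schemas).
MARC_100A = "http://example.org/marc21/personalNameMainEntry"
MAIN_ENTRY = "http://example.org/vocab/mainEntry"
MODS_CREATOR = "http://example.org/mods/nameWithCreatorRole"

SCHEMA: Set[Triple] = {
    (MARC_100A, RDFS_SUBPROPERTY, MAIN_ENTRY),     # MARC-derived property -> intermediate
    (MAIN_ENTRY, RDFS_SUBPROPERTY, DC_CREATOR),    # intermediate -> dc:creator
    (MODS_CREATOR, RDFS_SUBPROPERTY, DC_CREATOR),  # MODS-derived property -> dc:creator
}

DATA: Set[Triple] = {
    ("http://example.org/record/1", MARC_100A, "Sandburg, Carl, 1878-1967."),
    ("http://example.org/record/2", MODS_CREATOR, "Sandburg, Carl"),
}


def rdfs_closure(triples: Set[Triple]) -> Set[Triple]:
    """Apply rules rdfs5 and rdfs7 repeatedly until a fixed point is reached."""
    closed = set(triples)
    while True:
        sub_props = {(s, o) for s, p, o in closed if p == RDFS_SUBPROPERTY}
        inferred = set()
        # rdfs5: p subPropertyOf q and q subPropertyOf r => p subPropertyOf r
        for p, q in sub_props:
            for q2, r in sub_props:
                if q == q2:
                    inferred.add((p, RDFS_SUBPROPERTY, r))
        # rdfs7: s p o and p subPropertyOf q => s q o
        for s, p, o in closed:
            for sub, sup in sub_props:
                if p == sub:
                    inferred.add((s, sup, o))
        if inferred <= closed:
            return closed
        closed |= inferred


def values_of(triples: Set[Triple], predicate: str) -> Set[Tuple[str, str]]:
    """All (subject, object) pairs for one predicate."""
    return {(s, o) for s, p, o in triples if p == predicate}


if __name__ == "__main__":
    before = values_of(SCHEMA | DATA, DC_CREATOR)
    after = values_of(rdfs_closure(SCHEMA | DATA), DC_CREATOR)
    print("dc:creator before inference:", sorted(before))
    print("dc:creator after inference: ", sorted(after))
```

Neither source record mentions dc:creator, yet after inference both answer a dc:creator query. The equivalence now lives in three declarative schema statements rather than in application code. This is the sense in which RDFS may help with semantic interoperability.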
Business planning for digital library metadata
It should be clear from the above that digital libraries essentially depend on metadata to be operational at all. In the case of textual information objects, users may rely on search engine or text mining technology as a low-quality substitute, but for image, audio or multimedia objects there is currently no working alternative to metadata-based search, retrieval and manipulation. Metadata are thus a critical factor in digital library operations, but at the same time they are often considered an expensive one. However, this may be a highly speculative statement. We know at least some of the major cost factors influencing traditional cataloguing of electronic resources (cf. the implementation of a cost calculation algorithm at http://www.serialssolutions.com/360-marc-updatescost-calculator/), but very little is known about the actual cost of generating and integrating metadata for digital library content. The cost/benefit ratio in this case is largely influenced by two things: expectations as to the functional quality of operations building on metadata, and the type of objects present in a digital library. If expectations are rather low, it may be possible to extract most or all metadata features (from textual objects at least) by automated means. Given a high percentage of non-textual objects and/or very high quality standards, however, it may be
necessary to generate most or almost all metadata features manually, which requires a human intellectual effort comparable to that of traditional cataloguing. The problem is that digital resources, particularly born-digital ones, are, and increasingly will be, present on a vastly greater scale than traditional resources, and the human effort required for metadata creation may simply be unavailable. However, this effort evidently varies considerably depending on the granularity choices being made. Creating a DC record with a maximum of 15 attributes, for instance, may be significantly cheaper than creating a MODS representation of the same digital object. Finally, relevant metadata may already be available for many objects held in digital libraries, whether as part of other digital libraries or as ‘Linked Data’ on the WWW, freely accessible or at a price. Book digitization is a specific case in this respect. For digitized books held in digital libraries, external metadata from which metadata for the digitized objects can technically be derived are almost certainly available in most cases. The costs of identifying these records and of converting and adapting them may be considerable, however. The cost/benefit ratio depends greatly on how strictly digital libraries want to distinguish metadata for the digitized objects from cataloguing data for the originals.
Prospects for the foreseeable future
As stated above, the current situation may at first sight look puzzling, and it remains unclear, for the time being, whether our example fragment will look as follows in the future:
[…]
a Arithmetic /
245
Sandburg,
Carl a
100
We do not know for sure today whether plain XML or RDF representations will be the method of choice in the future. Neither can we be sure about the actual uptake of RDA and the way RDA-based metadata will coexist with legacy data. However, this uncertainty should be less worrying than one might at first assume, because of a clear tendency in the development of digital library metadata standards. These standards are constantly moving away from librarian standardization niches such as MARC and other purely librarian creations and into increasingly generic standardization environments, such as those controlled by the World Wide Web Consortium (W3C), the standardization body of the WWW. This tendency implies that future choices which rely on these generic standardization approaches are less likely to end in disruptive scenarios with problematic migration paths. Instead, such choices will always share migration paths with much other data and many other applications on the WWW. At the same time, the problems and costs of retrospective conversion or migration should never be underestimated. One essential characteristic of economically sustainable digital libraries, therefore, is standardized, interoperable metadata which are well integrated into the overall information architecture of the WWW.
A second aspect affecting future developments is that digital library resources and their metadata will increasingly coexist with traditional resources and their metadata in what are often called ‘hybrid library’ settings (cf. Rusbridge 1998). Among other things, this means that, when referring to authority data for named entities (such as persons or corporate bodies) or to concept resources, digital library metadata should point to the same resources as traditional cataloguing data do.
References to personal names could thus point to the Virtual International Authority File (VIAF, http://viaf.org/), and concept links could point to the Library of Congress Subject Headings (LCSH, http://id.loc.gov/authorities). Today this is not the case in most digital library metadata, which could lead to inconsistent usage scenarios in the future. Again, this represents a substantial challenge for the management of retrospective metadata creation.
A third important aspect is the increasing presence of digital library metadata in heterogeneous settings such as Europeana (http://www.europeana.eu/portal/), where objects and metadata from conceptually very different cultural heritage domains, such as museums, archives and audio-visual collections, coexist. This also implies that additional metadata frameworks, based on sometimes very different conceptions of objects' characteristics
and context and defined in non-librarian metadata frameworks such as Encoded Archival Description (EAD, http://www.loc.gov/ead/) or CIDOC CRM (http://cidoc.ics.forth.gr/), will be present there together with digital library metadata. Creating a coherent user experience under such conditions will be another serious challenge for metadata model designers and for those implementing the models.
Conclusion
The fact that the “silos” of isolated digital libraries are being left behind should be seen as wholly positive. It is essential to think beyond the ‘collection’ and beyond the narrower communities of librarians, archivists or curators in order to show the context of, and relationships between, different types of objects. All this has substantial implications for metadata frameworks, but the final result will be much more helpful in the long run.
Summary
– Metadata are critical for Digital Library operations, and essential for any operations on non-textual items
– Aligning with generic standards for metadata creation and use is essential for DL business planning
– Such standards are currently built on XML and/or RDF
– The dynamics and instability of the surrounding technology make smooth and inexpensive migration paths a critical factor for DL business planning
– The extent to which metadata creation, conversion or migration requires human intervention or can be done automatically is a key business planning factor
– Metadata models cannot be carved in stone at a time when digital libraries are leaving their institutional and community silos and becoming part of more generic and open web-based information architectures
References
Bekaert, J. et al. (2003): Using MPEG-21 DIDL to Represent Complex Digital Objects in the Los Alamos National Laboratory Digital Library. D-Lib Magazine, Volume 9, Number 11, November 2003. http://www.dlib.org/dlib/november03/bekaert/11bekaert.html (viewed 5 June 2010)
Deegan, M. and Tanner, S. (2002): Digital Futures: Strategies for the Information Age. London: Library Association Publishing.
Gartner, R. (2008): Metadata for digital libraries: state of the art and future directions. Bristol: JISC. http://www.jisc.ac.uk/media/documents/techwatch/tsw_0801pdf.pdf (viewed 5 June 2010)
Gradmann, S. (1998): Cataloguing vs. Metadata. Old wine in new bottles? In: Proceedings of 64th IFLA General Conference, 16-21 August 1998 in Amsterdam (Netherlands), IFLA. http://www.ifla.org/IV/ifla64/007-126e.htm (viewed 5 June 2010)
Johnston, P. (2004): Metadata Sharing and XML. In: Good Practice Guide for Developers of Cultural Heritage Web Services. Bath: UKOLN. http://www.ukoln.ac.uk/interop-focus/gpg/Metadata/#section1 (viewed 5 June 2010)
Kruk, S. et al. (2005): MarcOnt - Integration Ontology for Bibliographic Description Formats. DC2005, Madrid. http://dcpapers.dublincore.org/ojs/pubs/article/view/829/825 (viewed 5 June 2010)
Rusbridge, C. (1998): Towards the Hybrid Library. D-Lib Magazine, July/August 1998. http://www.dlib.org/dlib/july98/rusbridge/07rusbridge.html (viewed 5 June 2010)
Styles, R. et al. (2008): Semantic MARC, MARC 21 and the Semantic Web. In: Linked Data on the Web (LDOW). http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-369/paper02.pdf (viewed 5 June 2010)
Business Planning for Digital Libraries Case Studies
15 FinELib: AN IMPORTANT INFRASTRUCTURE FOR RESEARCH Kristiina Hormia-Poutanen and Paula Mikkonen
Activation and establishment of the FinELib Consortium's operations
In 1997, the Ministry of Education launched FinELib, the National Electronic Library, in accordance with the Government's Information Society Programme. During its first years of operation, the purpose of its activities was to support higher education, research and learning in Finland. The basic goals of FinELib were to increase the amount of electronic information available to users, to improve information retrieval from the Internet and to develop a graphical user interface providing access to heterogeneous information resources from various sources. The main goals are stated in the Consortium's vision: the FinELib Consortium is a partner in securing a leading international position for Finnish science, research and teaching. The Consortium and its service unit also collaborate to anticipate customers' needs. The service unit develops flexible service solutions and is an active national and international strategic partner. The work of the FinELib service unit is based on national and international co-operation and continuous development.
The FinELib Consortium, consisting of all the universities in Finland, was formed to help attain the goals of the National Electronic Library Programme. For the period 1997-1999, its activities were of a project nature. From 2000 onwards, its operations have been a standard part of the services provided by the National Library of Finland. During the first years of operation, the principles guiding these activities were formed (Hormia-Poutanen, 2002). They covered such topics as licensing policy, the share of central funding, the selection of resources to be licensed, development activities and cooperation with the library network. In 2002, FinELib was evaluated at the behest of the Finnish Higher Education Evaluation Council.
The international assessment group's report “Knowledge Society in Progress; Evaluation of the Finnish Electronic Library – FinELib” was published in the spring of 2003 (Varis & Saari, 2003). The assessment demonstrated that FinELib had satisfactorily attained its stated goals and that it enjoyed a solid reputation among its libraries. The assessment group directed its recommendations for further action at the Ministry of Education, FinELib and libraries. One of the most important recommendations was to prepare a medium-term strategy focusing on the links with national information society policies. Attention was also paid to the tailoring of services, as well as
the development of cooperative mechanisms and communications. The next assessment of the National Library of Finland will take place in 2010.
Steering mechanism and strategy of the FinELib Consortium
Today (2009), FinELib is a consortium of 108 members. All Finnish universities, polytechnics and public libraries belong to it, as do 39 research institutes and special libraries. The FinELib Consortium acquires international and domestic electronic journal and book packages, reference works and databases to support research, teaching and learning. It promotes their availability and use through a national search interface (the Nelli portal). The FinELib Service Unit negotiates licence agreements centrally on behalf of the member organizations and is part of the range of services provided to libraries by the National Library of Finland. The main principles guiding the management of the consortium are defined in the Memorandum of Understanding. The responsibilities of the National Library as the service provider, and of the consortium's member organizations as the customers, are defined in service agreements. These agreements cover the licensing of e-resources and the maintenance of the national information retrieval portal, Nelli. After the initial goals of FinELib were achieved, a strategy extending to 2015 was formulated (FinELib Strategy 2007–2015). FinELib has five strategic aims: meeting the service needs of customers, ensuring the availability of information, applying innovative technologies, fostering recognition and utilizing high-level expertise. The strategies specified for the second strategic term, which is already under way, have significantly improved the effectiveness of the Consortium's operational planning, direction and reporting.
The concrete yearly actions based on the long-term strategic aims are defined in the annual Action Plans (FinELib Action Plan, 2009), which are approved by the Steering Committee and the Consortium. Past Consortium actions are reported in the annual report (FinELib Annual Report, 2008). Libraries in Finland are used to working within their own library sector, and funding is also allocated to each sector separately. FinELib is one of the first programmes in which different types of organizations work closely together to exploit the resulting synergies. The FinELib Consortium consists of different library sectors with different service needs. From the very outset, it was therefore necessary to create a workable administrative structure and a transparent decision-making process that would give each library sector an equal opportunity to influence the development of services. The National Library of Finland and the library network have jointly developed a steering system for FinELib through which the financiers of the activities and the customer organizations can influence the Consortium's operations (Table 1).
FinELib Consortium, Operational Control
– Strategy of the FinELib Consortium
– Operational principles of the Consortium
– Service agreements between customer organizations and the National Library of Finland
– Groups directing operations
• Steering Group
• Consortium Group
• Expert Groups
– Consortium's operational plan, including finance
– Consortium's Annual Report
Table 1: FinELib's Steering System
FinELib's Steering Group, Consortium Group and Expert Groups are the cooperative and decision-making bodies managing the joint acquisition of e-resources for the National Library of Finland and the Consortium's member organizations. The Steering Group is made up of decision-makers representing the FinELib Consortium's member organizations, the Ministry of Education, the end-users of online resources and the Consortium's significant partners. Each of the four library sectors in the Consortium (universities, polytechnics, research institutes and public libraries) is represented in the Steering Group. The Consortium Group is the cooperation body of the FinELib Consortium's member organizations and draws up proposals for the Steering Group and the member sectors' consortia. The group is made up of the directors of the libraries belonging to the Consortium, together with representatives of the National Library of Finland. Through the expert groups, the expertise of the member organizations' representatives also benefits the entire Consortium and its development. There are seven subject-specific expert groups, focusing on the humanities, culture, natural resources, economics, technology, medical sciences and social sciences. Public libraries also have their own expert group. These expert groups discuss available e-resources and make prioritized proposals to the National Library for new acquisitions of electronic resources. Besides coordinating the operations of these groups, the National Library of Finland organizes several training and discussion events for libraries each year and participates in events in Finland and abroad.
Acquisition process and principles
The FinELib Service Unit provides the Consortium's member organizations with services relating to the licensing and implementation of e-resources.
The Service Unit concludes new licence agreements for e-resources and renews currently valid contracts on behalf of the Consortium's member organizations. Each member organization decides for itself which electronic resources it wishes to subscribe to. Organizations register for subscriptions in a two-phase process. The subscription process for each e-resource starts as soon as pricing and other key elements of the agreement have been negotiated. During the first phase, organizations have three weeks to inform FinELib of their initial interest
in the e-resource. This phase is needed to determine which libraries are interested in which resources and to help set pricing. During the following three-week period, organizations enter their binding subscriptions. The licence agreements take effect on 1 January.
For the acquisition of e-resources, FinELib adheres to the licensing policy and principles approved by the FinELib Consortium. The licensing policy is based on internationally approved licensing principles, with additions and adjustments made by the FinELib Consortium. In negotiations, FinELib pays special attention to contractual and access-related issues such as permitted uses, liability, the availability of usage statistics, compatibility with the MetaLib/SFX portal, perpetual access and open access. Member organizations have access to an administrative database for online resources. Through this database they can register as subscribers to resources and monitor the prices, usage statistics, terms of use and other contract conditions of the resources to which they subscribe. The database is also used to maintain, for example, the member organizations' contact information, the number of potential users (FTE), IP addresses and the information required for the national information retrieval portal (Nelli). The Service Unit arranges training for users of the acquired resources jointly with publishers or vendors. It also organizes national seminars and training sessions for experts on the portal and on the acquisition of resources. The National Library of Finland designs and carries out user and customer surveys for the FinELib Consortium. The Service Unit also aims to influence the development of the sector by participating, as the Consortium's representative, in national and international cooperation.
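The two-phase registration described above amounts to a simple timetable, which the following Python sketch computes for one subscription round. It rests on three assumptions: phase 1 starts on the day negotiations on pricing and key terms are completed, "three weeks" means 21 calendar days, and the licence starts on the first 1 January after the binding phase closes. The resource name and dates in the usage example are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

PHASE_LENGTH = timedelta(weeks=3)  # each phase lasts three weeks (21 calendar days assumed)


@dataclass(frozen=True)
class SubscriptionRound:
    resource: str
    opens: date  # assumed: the day pricing and key licence terms are agreed

    @property
    def interest_deadline(self) -> date:
        """Last day of phase 1, when libraries signal non-binding interest."""
        return self.opens + PHASE_LENGTH - timedelta(days=1)

    @property
    def binding_deadline(self) -> date:
        """Last day of phase 2, when libraries enter binding subscriptions."""
        return self.opens + 2 * PHASE_LENGTH - timedelta(days=1)

    @property
    def licence_start(self) -> date:
        """Assumed: the first 1 January strictly after the binding phase closes."""
        return date(self.binding_deadline.year + 1, 1, 1)

    def phase_on(self, day: date) -> str:
        """Which part of the round a given calendar day falls into."""
        if day < self.opens:
            return "not yet open"
        if day <= self.interest_deadline:
            return "phase 1: expression of interest"
        if day <= self.binding_deadline:
            return "phase 2: binding subscription"
        return "closed"


if __name__ == "__main__":
    round_ = SubscriptionRound("Example e-journal package", date(2009, 9, 1))
    print(round_.interest_deadline, round_.binding_deadline, round_.licence_start)
    print(round_.phase_on(date(2009, 9, 30)))
```

For a round opening on 1 September 2009, this gives an interest deadline of 21 September, a binding deadline of 12 October and a licence start of 1 January 2010.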
FinELib is part of the national research infrastructure
In 2008, FinELib was recognized as one of the 24 significant national-level research infrastructures in Finland. The infrastructure survey launched by the Ministry of Education charted national research infrastructures as well as participation in international research infrastructures, and laid out a road map for future needs (National Level Infrastructures: Present State and Roadmap). The criteria for national-level infrastructures include demonstrable administrative structures and responsible personnel for the upkeep and services of the infrastructure. They also include an annual report or similar account of the infrastructure's activities showing its degree of use and effectiveness. Furthermore, the research infrastructure must be of scientific significance, its work must provide added value at the national or international level, and it must be continuously used by a significant number of Finnish or foreign researchers. An infrastructure must fulfil most of the criteria to qualify. The scientific e-resources licensed by FinELib constitute a significant part of the resources used at Finnish universities and polytechnics. According to statistics for 2008 (Research Library Statistics Database), FinELib acquisitions accounted for 77% of university libraries' total e-resource acquisitions; the comparable figure for polytechnics was 62%. The licence agreements concluded by FinELib cover 18,000 scientific online journals, 300,000 e-books and 130 reference databases, as well as hundreds of reference works. Because each FinELib Consortium member organization makes its subscription decisions independently, the overall quantity of the resources acquired by the organizations
BPDG_opmaak_12072010.indd 170
13/07/10 11:51
FinELib: AN IMPORTANT INFRASTRUCTURE FOR RESEARCH
varies. The utilization of online resources in Finland is very intensive and it is increasing yearly (Figure 1). In 2008, approximately 11 million article downloads and 43 million searches were made for services acquired by the FinELib Consortium. The volume of article downloads grew in all library sectors from the previous year. The university sector accounted for nearly 90% of all article downloads. A total of 11 million searches was made through the national information retrieval portal, Nelli (FinELib Annual Report, 2008). Naturally the amount of e-resources has also grown since 2001. 12 000 000 10 000 000 8 000 000 6 000 000 4 000 000 2 000 000 0
Year | Universities | Polytechnics | Research institutes | Public libraries | Total
2001 | 940 829 | 149 693 | 36 214 | 42 062 | 1 168 798
2002 | 1 560 359 | 162 116 | 140 462 | 28 567 | 1 891 504
2003 | 2 542 522 | 192 263 | 202 940 | 16 540 | 2 954 265
2004 | 3 342 898 | 204 301 | 296 416 | 13 750 | 3 857 365
2005 | 4 608 491 | 263 677 | 322 504 | 14 023 | 5 208 695
2006 | 5 597 136 | 406 352 | 357 105 | 22 994 | 6 383 587
2007 | 7 327 054 | 508 867 | 372 867 | 45 270 | 8 254 058
2008 | 9 585 419 | 745 103 | 450 417 | 69 995 | 10 850 934
Figure 1. Article downloads per sector between 2001 and 2008
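As a quick check on the figures quoted in the text, the Python sketch below recomputes the yearly totals and the university share from the 2001 and 2008 rows of Figure 1. Only those two rows are copied in, and the function names are illustrative rather than part of any FinELib tool.

```python
# Article downloads per sector for 2001 and 2008, copied from Figure 1.
DOWNLOADS = {
    2001: {"Universities": 940_829, "Polytechnics": 149_693,
           "Research institutes": 36_214, "Public libraries": 42_062},
    2008: {"Universities": 9_585_419, "Polytechnics": 745_103,
           "Research institutes": 450_417, "Public libraries": 69_995},
}
# "Total" row of Figure 1 for the same years.
REPORTED_TOTALS = {2001: 1_168_798, 2008: 10_850_934}


def total(year: int) -> int:
    """Sum of all sector downloads in a given year."""
    return sum(DOWNLOADS[year].values())


def sector_share(year: int, sector: str) -> float:
    """Fraction of all downloads in `year` made by `sector`."""
    return DOWNLOADS[year][sector] / total(year)


def growth_factor(sector: str, start: int = 2001, end: int = 2008) -> float:
    """How many times larger a sector's downloads were in `end` than in `start`."""
    return DOWNLOADS[end][sector] / DOWNLOADS[start][sector]


if __name__ == "__main__":
    for year, reported in REPORTED_TOTALS.items():
        # The sector rows should add up to the reported "Total" row.
        assert total(year) == reported, year
    print(f"University share in 2008: {sector_share(2008, 'Universities'):.1%}")
    for sector in DOWNLOADS[2001]:
        print(f"{sector}: x{growth_factor(sector):.1f} between 2001 and 2008")
```

The sector rows add up exactly to the reported totals, and the university share for 2008 comes out at about 88%, which matches the statement that universities accounted for nearly 90% of all article downloads.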
Owing to the increase in usage, the price per article download has dropped for many resources. There are, however, substantial differences in price trends between sectors and between resources. The Consortium’s funding model combines central funding from the Ministry of Education with self-financing by the Consortium members. During the first years of operation, government funding was available only to the universities. Today polytechnics and public libraries also receive government funding, although public libraries receive it only for the FinELib services, not for acquisitions. In 2008, the total budget for the acquisition of online resources, including central funding and the organizations’ own share, was approximately EUR 15.1 million. The FinELib Service Unit has grown from one person in 1997 to an expert unit with a staff of 13 at the end of 2008, comprising specialists in librarianship, law, procurement and information technology.
Scientific e-resources have made life easier for researchers
Customer surveys directed to the Consortium’s member libraries, as well as questionnaires aimed at the users of e-resources, are an effective way to evaluate and demonstrate the value and influence of library operations. Besides identifying areas for development, the results of the user surveys also give libraries an overall view of customers’ satisfaction with the libraries’ services. The FinELib Service Unit has conducted annual user surveys since 2000. The ninth user survey on the e-resources provided by FinELib was conducted in April 2007. The surveys have been developed in collaboration with the University of Tampere’s Department of Information Studies. Questions posed to customers at universities, polytechnics, research institutes and public libraries measured the quantity of e-resource use as well as customers’ satisfaction with these services. Data from the surveys have been used in many scientific studies and graduate theses over the years (see e.g. Tenopir et al., 2008; Törmä & Vakkari, 2004; Vakkari, 2008; Vakkari, 2006; Vakkari & Talja, 2006; Vakkari & Talja, 2005). The questionnaires have been administered only online, so the sample over-represents respondents who are already familiar with online services and the use of e-resources. The surveys show that the use of e-resources has increased over the years, especially in universities (Figure 2). In 2007, about 50% of the university respondents used mainly online resources in their work, compared with 20% in 2000. At polytechnics, the use of e-resources has not gained ground significantly over the last few years. One reason is a lack of e-resources appropriate for polytechnics; in particular, more electronic course material is needed.
Figure 2: Usage of e- and print resources by sector (universities, polytechnics, research institutes, public libraries), as a share of respondents (0–100%) using mainly e-resources, equally e-resources and printed resources, or mainly printed resources. (FinELib user questionnaire 2007)
Dictionaries, reference books, reference databases and journals are the most used online resources. The majority of respondents were satisfied with the e-resources available. The results show clear differences between scientific disciplines in how comprehensive respondents considered the collections to be. For example, almost half of university respondents in the medical field felt that the resources covered over 80% of their needs, whereas only 5% of researchers in the humanities felt the same way. Although plenty of resources are available, the collections still do not meet users’ information needs in every respect, and respondents considered the acquisition of additional e-resources desirable. They asked for foreign scientific journals, dictionaries, glossaries and encyclopaedias, as well as electronic university publications and Finnish e-journals. Students would prefer to have their course books in electronic form. At the top of the list for public library customers are online versions of newspapers and periodicals.
The survey also indicated how the use of e-resources affects research, as well as the work and study of respondents in the university sector. Respondents were asked to assess the importance of various effects, or whether e-resources had had any effect at all, using the classification employed by Professor Carol Tenopir of the University of Tennessee. E-resources were deemed to have a significant impact, particularly at universities and research institutes (Figure 3). Most respondents at universities and research institutes stated that e-resources had made it considerably easier to find and obtain the material they need in their work, and to keep up with developments in their own field. For many respondents, e-resources have expanded the resources available and saved considerable amounts of working time. Many researchers stated that e-resources have also improved the quality of their work and inspired new ideas. Many respondents at polytechnics reported similar effects, but considered them less important than respondents at universities and research institutes did (see also Vakkari, 2008).
Figure 3: Perceived impacts of e-resources at universities, research institutes and polytechnics (scale 0–90): made it easier to find material; made it easier to obtain the material; expanded resources available; made it easier to keep up with developments; improved quality of work; reduced working time. (FinELib user questionnaire 2007)
Changes in the operational environment and customers’ needs drive licensing
The last few years have seen dramatic changes in the operational environment of the FinELib Consortium’s member organizations. Universities are being merged into larger units to improve the international competitiveness of their research. Organizational consolidations have taken place, and continue to take place, in the Consortium’s other customer sectors as well. Many other changes, such as new online learning environments and the internationalization of research, training and student groups, are also creating pressure on the licensing of online resources. This leads to an increasing need for tailored licensing: some libraries want to acquire e-resources for smaller user groups (for example faculties, educational sectors or research groups) or to extend licences to new user groups.
Sector-specific boundaries tend to inhibit the widespread availability of online resources. On average, universities have significantly more opportunities to acquire resources than sector research institutes. Working jointly with the library network, the National Library of Finland has submitted a proposal to the Ministry of Education on the information resources required for competitive research. If the proposal is implemented, a core selection of scientific online resources could be made widely available to various research organizations. The proposal also covers essential online resources required by the general public, which could be offered through public libraries.
Centralized acquisitions of online resources are cost-effective
In the spring of 2009 the FinELib Service Unit conducted a study jointly with the Helsinki School of Economics on the cost-effectiveness of centralized acquisition and licensing of online resources. The assignment was part of the Ministry of Education’s project on the structural development of Finland’s university libraries. The study compared the cost-effectiveness of FinELib’s current consortium model, that is, centralized resource acquisition and licensing, with a dispersed model in which the customer libraries would manage their own acquisitions and licensing. Cost-effectiveness was assessed by comparing licensing costs, as well as the qualitative factors related to the consortium model. Case studies of certain libraries indicated that the consortium model was the most economical alternative with respect to the costs of e-resources. On the other hand, the consortium model generated higher labour costs once the working time required for licensing and consortium management, both in the FinELib Service Unit and in the customer libraries, was taken into account. Owing to the limited sample, however, the results of the comparison should be interpreted cautiously.
Qualitative factors, the so-called immaterial effects of the consortium model, were also evaluated. These are advantages and disadvantages of the consortium model whose monetary value is difficult to determine. Besides lowering the prices of e-resources and reducing the time required for licensing in customer libraries, the consortium model also improved the contract and user terms for e-resources. All in all, the positive effects significantly outweighed any negative aspects.
Conclusion
The international economic recession and the structural changes taking place in the Finnish university system are creating challenges for the licensing of online resources in the near future. However, the FinELib Consortium has proved to be a successful model of co-operation. FinELib’s position as a recognized and respected research infrastructure in Finland could not have been attained without the strong commitment of the library network.
Summary
– FinELib is a national consortium covering the higher education, research and public library sectors in Finland
– FinELib has five strategic aims: meeting the service needs of customers, ensuring the availability of information, applying innovative technologies, fostering recognition and utilizing high-level expertise
– It is steered by representatives of the member organizations, the Ministry of Education, the end-users of online resources and the Consortium’s significant partners. The Consortium comprises four library sectors: universities, polytechnics, research institutes and public libraries
– Agreements concluded by FinELib cover 18,000 scientific online journals, 300,000 e-books and 130 reference databases, as well as hundreds of reference works
– In 2008, approximately 11 million article downloads and 43 million searches were made
– Resources acquired for the FinELib Consortium have substantially enhanced the availability of information in Finnish universities and research institutes
– FinELib is considered an integral part of the national research infrastructure
References
FinELib Action Plan 2009. Available at http://www.nationallibrary.fi/libraries/finelib/finelibconsortium/strategy20072015.html (accessed 28 August 2009)
FinELib Annual Report 2008. Available at http://www.nationallibrary.fi/libraries/finelib/imapctandevaluation/annualreport2007.html (accessed 28 August 2009)
FinELib Strategy 2007–2015. Available at http://www.nationallibrary.fi/libraries/finelib/finelibconsortium/strategy20072015.html (accessed 28 August 2009)
FinELib user questionnaire 2007. FinELib electronic resources – Broad-based user research: Scientific e-resources have made life easier for researcher workers. Available at http://www.kansalliskirjasto.fi/attachments/5l4xoyz0b/5AKGVngDe/Files/CurrentFile/User_questionaire_2007_final.pdf (accessed 28 August 2009)
Hormia-Poutanen, K. (2002), “The National Electronic Library in Finland, FinELib – Licensing content for research and learning environments on the basis of user needs”, in Hannesdottir, S. K. (ed.), Global Issues in 21st Century Research Librarianship: Nordinfos 25th Anniversary Publication, Helsinki: NORDINFO, 234–257
National Level Infrastructures: Present State and Roadmap (2009), Ministry of Education, Helsinki. Available at http://www.tsv.fi/tik/laaja_englanti_PDF.pdf (accessed 28 August 2009)
Research Library Statistics Database. Available at https://yhteistilasto.lib.helsinki.fi/language.do?action=change&choose_language=3 (accessed 28 August 2009)
Tenopir, C., Wilson, C., Vakkari, P., Talja, S. and King, D. (2008), “Scholarly e-reading patterns in Australia, Finland and the United States: A cross country comparison”, World Library and Information Congress: 74th IFLA General Conference and Council, 10–14 August, Québec, Canada
Törmä, S. and Vakkari, P. (2004), “Discipline, availability of electronic resources and the use of Finnish National Electronic Library – FinELib”, Information Research: an international electronic journal, Vol. 10 No. 1
Vakkari, P. (2008), “Perceived influence of the use of electronic information resources on scholarly work and publication productivity”, Journal of the American Society for Information Science and Technology, Vol. 59 No. 4
Vakkari, P. (2006), “Trends in the use of digital libraries by scientists in 2000–2005: A case study of FinELib”, Proceedings of the Annual Meeting of the American Society for Information Science and Technology (ASIST) 43, Austin (US), Medford, NJ: Information Today
Vakkari, P. and Talja, S. (2006), “Searching for electronic journal articles to support academic tasks. A case study of the use of the Finnish National Electronic Library (FinELib)”, Information Research, Vol. 12 No. 1
Vakkari, P. and Talja, S. (2005), “The influence of the scatter of literature on the use of electronic resources across disciplines: A case study of FinELib”, in Proceedings of the 9th European Conference on Digital Libraries, Berlin and Heidelberg: Springer
Varis, T. and Saari, S. (eds) (2003), “Knowledge Society in Progress – Evaluation of the Finnish Electronic Library – FinELib”, Helsinki: Edita
16 THE DIGITAL LIBRARY OF CATALONIA
Lluís Anglada, Ángel Borrego and Núria Comellas
The creation of the Digital Library of Catalonia
The Consortium of Academic Libraries of Catalonia (CBUC) was formally set up in 1996 with the aim of creating and maintaining the union catalogue of the universities of Catalonia (CCUC) (Anglada 1999), and it soon extended its activities to related fields, such as interlibrary loan in 1997 and a database of journals’ tables of contents in 1998. Its first experience of joint purchasing was not of electronic information but of barcodes for automating library loans. After that, the Consortium drew up a catalogue of the databases subscribed to by its member libraries in order to determine the degree of duplication in their purchases, with a view to negotiating discounts. In late 1997 the CBUC presented the project of consortial purchasing of databases to the vice-presidents responsible for research, who approved it and expressed interest in including electronic journals in the joint purchases. As an umbrella for its activities, the CBUC decided to create the Digital Library of Catalonia (BDC), which was to “provide a common core of electronic information for all users of the libraries of the CBUC”. The BDC project was presented to the government of Catalonia in 1998 and funding was obtained for the period 1999 to 2001. The first licences were purchased in late 1998 and the first information resources subscribed to by the Consortium were made available in early 1999: local and international databases and the e-journals of Academic Press. The first products were purchased with government funding, but it soon became clear that for further purchases the members of the CBUC would have to provide their own funding. The initial criteria for deciding which products could be licensed consortially were the interests of members, the conditions of access and the cost, but over the years the publishers’ pricing models turned out to be the decisive factor.
Some products which clearly met the first two conditions failed to meet the third and therefore had to be rejected.
Growth and evolution of the BDC
Given the state of technology and of the information market in 1998, the majority opinion was that the products included in the BDC would be bibliographic databases subscribed to only by members of the CBUC and installed locally on the consortium’s servers. The situation evolved rapidly, however, and the period of creation and consolidation of the BDC between 1998 and 2001 showed us that the priority for joint licensing would be e-journals, that institutions other than the universities of the Consortium were also interested in joint licensing, and that the time of local installations was over: at the start of the 21st century, information would be accessed remotely over the Internet.
Nowadays a digital library means online access to full text and other kinds of digital objects, but in Catalonia in the late 1990s it meant databases on CD-ROM. Projects involving access to journals had been very limited in scope (for instance, no European library was able to participate in Elsevier’s TULIP project). Nevertheless, the number of peer-reviewed e-journals grew from about 100 in 1995 to a significant number by 2000. The great step forward of the time was the local installation of bibliographic databases which could be accessed over a LAN or WAN. Soon, however, the priority shifted towards full-text resources: mainly journals, but also books. During the first few years of the BDC the licences covered only the member institutions of the CBUC, but they soon started to include universities which were not members. These were of two types: private universities in Catalonia and public universities geographically close to those of the CBUC. The inclusion of these institutions in the joint licences led to the creation of a new type of member, the associated member, which initially participated only in licensing but soon started to take part in other cooperative activities. This stage of licensing all resources for all members came to an end in 2005, when some resources began to be licensed only for the members which wished to subscribe to them. To complete the overview of the evolution of the BDC, two further aspects must be mentioned, one financial and one concerning content. We were initially convinced that joint action by the Catalan universities would attract “central money” from the government for licensing, but this was not the case. The BDC was created thanks to grants from the Generalitat (the government of Catalonia), which were used as seed capital. The amounts were never substantial, but they allowed deals to be reached. This funding now covers approximately 15% of the cost of the licences, far from the 50% that we expected to obtain in 1998.
Finally, the view of what a “digital library” is has also changed. Whereas all efforts were initially concentrated on licensing, it soon became clear that this activity had to be complemented by the electronic information resources produced in Catalonia and by the creation of our own e-repositories (Anglada 2005).
Pricing models and cost sharing
The history of consortial purchasing has been dominated by the pricing models for electronic resources. In the first few years of its history the BDC had to undertake a series of negotiations with suppliers in order to lay the basis for an understanding and establish a pricing model which the CBUC considered acceptable. Until then, journals and databases had been priced per unit, and purchases had been made through agents. The possibility of sharing electronic information and purchasing jointly, directly from suppliers, changed the rules of the game.
Consortia and publishers gradually introduced a win-win system in which the exceptional increase in the information offered allowed the consortia to obtain additional funding to pay for it. This was not easy. Some of the models that reached us had been established in a North American context and were unsuitable for the situation in Catalonia. For example, there were “cost-per-campus” models which were inapplicable in central and southern Europe, where there are practically no campuses of the type found in English-speaking countries (Anglada 2002). It was then necessary to find a formula for cost sharing within the Consortium. As the members of the Consortium were all in the public sector, an egalitarian formula was sought. Some members had traditionally spent more than others on bibliographic collections, so the aim was to reduce their expenditure and increase that of the members which had previously had fewer subscriptions. At the CBUC, the Big Deal meant that the members which gained least in titles gained most in cost savings, and vice versa. The cost-sharing formula has three elements: 20% of the cost is shared equally, 30% according to the size of the institution, and the remaining 50% according to past expenditure. Additionally, the funding obtained from the government is used not to reduce the cost of existing licences but to facilitate new deals: it subsidizes the universities which, as a result of applying the formula, would otherwise have to pay extra for a licensing agreement. Those who spend most thus obtain a saving, those who spend least incur no extra cost, and all members can obtain the desired content. The first Big Deal agreements were followed by pessimistic comments on their sustainability. It was argued that once the consortial deals had been established, the publishers would raise prices and the libraries would be too weak to do anything about it.
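The three-part cost-sharing formula described above can be sketched in Python. This is an illustration, not the CBUC’s actual procedure: the member names, the use of a generic “size” figure (for example FTE users), the licence price, and the rule that government money covers exactly the amount by which a member’s formula share exceeds its previous spending are all assumptions made here.

```python
from dataclasses import dataclass

# Weights of the cost-sharing formula as described in the text.
EQUAL_WEIGHT = 0.20   # shared equally among members
SIZE_WEIGHT = 0.30    # shared according to institution size
SPEND_WEIGHT = 0.50   # shared according to past expenditure


@dataclass
class Member:
    name: str
    size: float        # hypothetical size measure (e.g. FTE users)
    past_spend: float  # hypothetical previous spending on the licensed titles


def formula_shares(members: list[Member], licence_cost: float) -> dict[str, float]:
    """Split a licence cost among members using the 20/30/50 formula."""
    n = len(members)
    total_size = sum(m.size for m in members)
    total_spend = sum(m.past_spend for m in members)
    return {
        m.name: licence_cost * (
            EQUAL_WEIGHT / n
            + SIZE_WEIGHT * m.size / total_size
            + SPEND_WEIGHT * m.past_spend / total_spend
        )
        for m in members
    }


def subsidy_needed(members: list[Member], shares: dict[str, float]) -> dict[str, float]:
    """Assumed rule: government money covers any share above a member's past spend."""
    return {m.name: max(0.0, shares[m.name] - m.past_spend) for m in members}


if __name__ == "__main__":
    members = [
        Member("A", size=30_000, past_spend=500_000),
        Member("B", size=15_000, past_spend=150_000),
        Member("C", size=5_000, past_spend=10_000),
    ]
    shares = formula_shares(members, licence_cost=660_000)
    subsidy = subsidy_needed(members, shares)
    for m in members:
        print(f"{m.name}: share {shares[m.name]:,.0f}, subsidy {subsidy[m.name]:,.0f}")
```

With these made-up figures, and a licence price equal to the members’ combined past spending (660,000), the largest past spender A pays 412,800 instead of 500,000, while B and C would pay 28,400 and 58,800 more than before without a subsidy. This mirrors the point that big spenders save and small spenders, once subsidized, incur no extra cost.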
These fears proved unfounded. Although the first agreements were difficult to establish, renewal has been fairly straightforward and annual price increases have been limited. Furthermore, the contracts have become very similar, and licensing, which was a slow process for the first consortial licences, has been simplified (Anglada 2003). Around 2005, when the BDC had reached maturity, the Consortium considered changing the internal cost-sharing model, because the formula took into account previous expenditure on journals, which over the years had become a less relevant factor in joint purchasing. However, the attempts to change the model failed because any change meant that some members would pay more and others less. After examining different possible formulas, and rejecting usage as a cost-sharing parameter, it was decided to keep the imperfect existing formula, which offered clear benefits for all, rather than adopt a new one which would leave some libraries unable to participate in consortial deals. Finally, the first subscriptions were for the more interdisciplinary products most in demand by users, and this to some extent exhausted the capacity for purchasing on the “all for all” model. The initial subscriptions of the CBUC had been aimed at providing the Catalan university community with a common set of electronic information, but it was difficult to reach consensus on including highly specialized subscriptions in the BDC. To meet this need, a type of subscription catering for individual members had to be created. In these cases, however, the subscriptions are not subsidized by government funding, and the cost-sharing formula is not the egalitarian one used for the joint agreements.
Consortial gains, usage and satisfaction
Over the last few years several studies have been carried out to evaluate the performance of the Digital Library of Catalonia. In one of the first, Urbano et al. (2004) analysed the use of four electronic journal packages (Academic Press, Kluwer, MCB Emerald and Wiley) between 2000 and 2003 in order to determine the evolution of usage, the consortial gain (understood as the percentage of previously unsubscribed titles made available by a consortial licence) and the dispersal of consumption. The results showed a great increase both in the number of journals subscribed to (from 195 to 1,495) and in consumption (from 5,409 to 93,367 article downloads per semester). The analysis of the consortial gain showed that 61.49% of the articles downloaded were from journals to which the downloading institution had not previously subscribed. Finally, 80% of the articles downloaded came from 35% of the available journals, a greater dispersal than that traditionally observed for paper journals. A survey of lecturers at the member universities of the CBUC was also carried out (Borrego et al. 2007). Its aim was to determine their awareness of the collection of journals available online, whether they preferred the electronic or print format, the demographic characteristics of users and non-users of electronic journals, and users’ satisfaction with the collection of titles available. The questionnaire was distributed to all the academic staff of the member universities of the CBUC, and 2,682 responses, representing 18% of the population, were obtained.
The respondents showed a high degree of awareness of the electronic journal collection: more than 95% of the academics stated that they were aware of it. Of the respondents, 52% stated that they used electronic journals exclusively or mainly, and 28% that they used both formats to a similar extent. However, the preference for the electronic format was linked to the discipline and age of the academics: it was higher among teaching staff in the sciences (biomedicine, engineering, and exact and natural sciences) and among younger academics. When asked whether they would be prepared to stop using the print version of a journal if the electronic version were available, 76% answered yes. As for the future, 91% of the respondents considered that the use of electronic journals would increase in the next few years. Another study (Borrego and Urbano 2007) analysed the use of the journals of the American Chemical Society at the University of Barcelona. The results showed that most of the consumption was concentrated in a few IP addresses. As in earlier studies, the dispersal of consumption was higher than for paper journals: 35% of the titles accounted for 80% of the article downloads.
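The two measures used in these studies, the consortial gain and the share of titles accounting for 80% of downloads, can both be computed from per-title download counts. The Python sketch below is a minimal illustration: the journal names and counts are invented, and measuring the gain over downloads (as in the 61.49% figure) rather than over titles is a choice made here.

```python
def share_of_titles_for(downloads: dict[str, int], fraction: float = 0.8) -> float:
    """Smallest share of titles that together account for `fraction` of all downloads."""
    counts = sorted(downloads.values(), reverse=True)  # most-used titles first
    total = sum(counts)
    if total == 0:
        return 0.0
    cumulative = 0
    for i, count in enumerate(counts, start=1):
        cumulative += count
        if cumulative / total >= fraction:
            return i / len(counts)
    return 1.0


def consortial_gain(downloads: dict[str, int], previously_subscribed: set[str]) -> float:
    """Share of downloads from titles the institution did not subscribe to before the deal."""
    total = sum(downloads.values())
    if total == 0:
        return 0.0
    new = sum(n for title, n in downloads.items() if title not in previously_subscribed)
    return new / total


if __name__ == "__main__":
    # Hypothetical download counts for one institution.
    downloads = {"J1": 50, "J2": 30, "J3": 10, "J4": 5, "J5": 5}
    print(share_of_titles_for(downloads))      # 0.4: two of five titles give 80%
    print(consortial_gain(downloads, {"J1"}))  # 0.5
```

A lower value from `share_of_titles_for` means use is more concentrated; the 35% reported for these packages indicates a wider spread of use than the classic pattern for print journals.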
The same study also found that the number of abstracts viewed was a good predictor of the number of regular readers of a journal. To mark the tenth anniversary of the Digital Library of Catalonia in 2008, a report was drawn up on the use of the electronic information to which it subscribed, including journals, databases and books. Although it is difficult to compare usage figures across different resources, the report found that the use of most products increased during the first few years of the subscription and then stabilized. A slight reduction in the use of databases was observed in the last few years, especially affecting subject-specialized databases. The use of electronic books was still very low, showing that they had not yet achieved wide acceptance in our society. A qualitative study, carried out through an open questionnaire distributed by e-mail and through personal interviews with a sample of lecturers, analysed the impact of the availability and accessibility of electronic resources on the behaviour of academics (Ollé and Borrego 2009). The results showed that the increased availability of electronic journals had led to an increase in the number of articles read and in the diversity of journals read by teaching and research staff. However, reading had become more superficial. The availability of electronic resources was leading to fewer physical visits to libraries and consequently to a saving in time. Searching was a very popular way of keeping up to date with the new literature, and Internet search engines, particularly Google and Google Scholar, were becoming the most widely used information sources. The academics stated that they encountered many problems in managing their personal scientific information. During this period, the activity of the Digital Library of Catalonia gave rise to two PhD theses. Sales (2002) studied the distribution of its subscription costs.
More recently, Térmens (2007, 2008) carried out a study to determine whether the different member institutions of a consortium showed significant differences in the use of the licensed resources beyond those which could be attributed to the size of each university. These differences were justified by the history of the resources available and by the level of research carried out in each institution. He analysed the use of seven journal packages which offered COUNTER statistics: those of the American Chemical Society, the American Institute of Physics, Blackwell, Elsevier, Emerald, Springer and Wiley. The journals were classified into 33 thematic areas and their use was attributed to the teaching and research staff of the universities, divided into 199 areas of knowledge. The results showed that some institutions made greater use of the subscribed to journals than would be expected from their size. Differences were observed in thematic areas taking into account the number of teaching and research staff assigned to them. A relationship was also observed between the current level of use, the previous subscriptions and the level of research of each university. Cooperation with other consortia Spain is divided into autonomous communities (geographic areas with competences to regulate areas such as health and education). This model is also applied to the creation of library consortia, and there are currently five which have been formally set up (the BUCLE in Castilla-León, BUGalicia in Galicia, the CBUA in Andalusia, the CBUC 181
in Catalonia, and Madroño in Madrid), a network of research centres which can be considered a consortium (the CSIC), and several ad hoc buying clubs set up to obtain better subscription conditions for electronic products (in the Canary Islands and the Valencian Community, for example). Though each consortium operates separately, meetings have been held to exchange information when considered necessary. Furthermore, REBIUN (the Spanish association of university libraries) and the consortia have recently increased their contacts with a view to collaborating more closely in the future. So far, attempts to achieve a Spanish national licence for large packages of electronic information have been unsuccessful, with one exception: since 2004 all Spanish universities have had a subscription to the ISI Web of Knowledge through a licence negotiated on their behalf by the FECYT and paid for by the Spanish Ministry of Science and Innovation. Further contacts between the consortia and the FECYT are scheduled in order to widen the range of electronic resources for which national licences can be negotiated with publishers. Other actors with whom the CBUC collaborates are aggregators and subscription agents. When, for reasons of strategy or cost, a consortial subscription to an information resource is not suitable for all members of the CBUC, aggregators can help to organize a subscription for some members of the consortium within a wider agreement that includes other clients. This has been the case with Science, MUSE and Annual Reviews (all through EBSCO). The CBUC has very good relations with its neighbours in the other countries of southern Europe through the informal group SELL (Southern European Libraries Link) (Giordano 2002), which has met annually since 2001.
One of the results of this synergy was the creation in 2007 of the first “transnational” agreement for the joint subscription of an electronic resource, the ALJC/ALPSP package of journals, through the platform of the subscription agent Swets.
Conclusion: strengths, weaknesses and future challenges
After 10 years of experience in consortial licensing, we can point out some strong and weak points of the BDC. The main strong points are increasing cooperation, an increasing amount of information available and limited expenditure. The main weak points are the inability to extend the BDC beyond the universities and a certain degree of saturation in its capacity for joint subscription in the last few years. The CBUC was created before the emergence of consortial licensing, and it was in the right place at the right time to extend its activities to joint purchasing. Doing so has undoubtedly strengthened it. The clear benefits of the Big Deal have consolidated collaborative relations and allowed the CBUC to extend its activities to other areas, particularly electronic repositories. The increase in the information available has also been a clear result of the BDC. Though this is always an advantage of joint licensing, it is probably a greater one for consortia like the CBUC, which includes universities that have traditionally had less purchasing power.
The advantage of offering new subscriptions at an affordable extra cost has attracted money from both the government and the institutions themselves to increase library purchasing budgets. The university community of Catalonia is highly satisfied with the radical change in access to information and the increase in available resources brought about by the move from individual subscriptions to consortial purchasing. A final positive element is the limitation of expenditure. The price of journal subscriptions has risen faster than budgets, whose increases are based on the RPI, but consortial action has led to two very important results: the ability to plan expenditure thanks to long-term deals, and smaller annual increases than under the previous individual subscriptions. The BDC has managed to extend its licensing capacity to universities which are not members of the Consortium, but not to other types of libraries in Catalonia. Following the example set by other European consortia (such as the Finnish and Portuguese ones), there have been attempts since 2005 to include libraries of research centres, government agencies and hospitals in the BDC, but they have been unsuccessful, as have the attempts to include public libraries. The reasons are the obvious difference between the size and needs of large, interdisciplinary university libraries and those of the libraries of other institutions, the lack of government funding for extending the coverage of the BDC, and the fact that these libraries are under the control of different departments of the Catalan government. A feature of the BDC has been the fall in the number of new joint licensing projects in the last three years. This is largely because the most relevant digital products in common use by the research community have already been included in the BDC.
However, there are specialized resources which are of interest to only some libraries and have therefore not been included in the joint purchases. Furthermore, the attempts to reach joint licensing agreements for electronic books have been less successful than was expected. With regard to consortial purchasing of electronic information, the CBUC faces three major challenges. Firstly, it faces economic restrictions arising from the current crisis and from the fact that an increasing number of actors are competing for the same economic resources. Secondly, it must find mechanisms for incorporating special libraries in consortial purchases. This would make it possible to add to subscription agreements specialized products which cannot be subscribed to consortially unless a critical mass of institutions is attained. The egalitarian effects of cost sharing would thus be extended from universities to other institutions. Finally, the CBUC can and must build on the success of co-operation in the last few years to improve the access of the international community to Catalan scientific production through the creation and maintenance of digital repositories.
Summary
– The Digital Library of Catalonia is a joint licensing project for electronic resources developed in the CBUC consortium. The CBUC began by building a union catalogue (CCUC) but soon evolved to encompass consortial purchasing activities
– The cost-sharing model provides a win-win result for all members: other models have been considered but proved less advantageous for one member or another
– Seed-corn funding from government is used for new projects; recurrent costs are met by members
– Big Deals have proved to be generally advantageous
– Huge increases in usage and in the distribution of use have been recorded
– There is a high level of awareness and acceptance among researchers
– Attempts to widen the consortium to other sectors have been problematic; however, expansion is necessary to provide access to more specialized resources
– The CBUC is also very active in the development of joint repositories
References
Anglada, L. (1999), Working together, learning together: the Consortium of Academic Libraries of Catalonia, Information Technology and Libraries, Vol. 18, No. 3
Anglada, L. and Comellas, N. (2002), What’s fair? Pricing models in the electronic era, Library Management, Vol. 23, No. 4/5
Anglada, L., Comellas, N., Roig, J., Ros, R. and Tort, M. (2003), Licensing, organizing and accessing e-journals in the Catalan university libraries, Serials: The Journal of the United Kingdom Serials Group, Vol. 16, No. 3
Anglada, L. and Reoyo, S. (2005), Actividades open access de los consorcios del SELL y del CBUC, El profesional de la información, Vol. 14, No. 4
Borrego, A., Anglada, L., Barrios, M. and Comellas, N. (2007), Use and users of electronic journals at Catalan universities: the results of a survey, Journal of Academic Librarianship, Vol. 33, No. 1
Borrego, A. and Urbano, C. (2007), Analysis of the behaviour of the users of a package of electronic journals in the field of chemistry, Journal of Documentation, Vol. 63, No. 2
Giordano, T. (2002), Library consortium models in Europe: a comparative analysis, Alexandria, Vol. 14, No. 1
Ollé, C. and Borrego, A. (2009), A qualitative study of the impact of electronic journals on scholarly information behavior. Unpublished manuscript
Sales i Zaguirre, J. (2002), Models cooperatius d’assignació de costos en un consorci de biblioteques. Doctoral thesis, available at http://www.tesisenxarxa.net/TDX-0318103-161333/ (accessed 12 February 2009)
Térmens, M. (2007), La cooperació bibliotecària en l’era digital. Consorcis i adquisició de revistes a les biblioteques universitàries catalanes. Doctoral thesis, available at http://www.tesisenxarxa.net/TESIS_UB/AVAILABLE/TDX-1017107-113943/ (accessed 12 February 2009)
Térmens, M. (2008), Looking below the surface: the use of electronic journals by the members of a library consortium, Library Collections, Acquisitions, & Technical Services, Vol. 32, No. 2
Urbano, C., Anglada, L., Borrego, A., Cantos, C., Cosculluela, A. and Comellas, N. (2004), The use of consortially purchased electronic journals by the CBUC (2000-2003), D-Lib Magazine, Vol. 10, No. 6. Available at http://www.dlib.org/dlib/june04/anglada/06anglada.html (accessed 12 February 2009)
17 DIGITAL LIBRARY DEVELOPMENT IN THE PUBLIC LIBRARY SECTOR IN DENMARK Rolf Hapel
The digital library in a public library context
Definitions
The lack of a precise definition of digital libraries has been a constant factor for practitioners and researchers in the information field in Denmark. A relatively comprehensive effort to establish a framework and a definition has been made by the European DELOS Network of Excellence, whose reference model identifies the basic concepts and relationships characterizing the field. Basically, the model operates with six main parameters: Content, User, Architecture, Functionality, Quality and Policy.1 In this chapter, these basic elements will be interpreted from the perspective of the broader political and cultural rationale for the existence of public libraries. An embryonic typology of Internet-based services in public libraries was presented in a study from the Bertelsmann Foundation (Hapel et al. 2001). This typology provides a basic understanding, which is used later in this chapter to group the examples of digital library practice in Danish public libraries. For public libraries, as for other libraries, the challenges are financial as well as technological. The rapid shift of information production and retrieval from analogue to digital has put enormous pressure on the libraries’ ability to change according to society’s needs and political will. Political acknowledgement of the idea and purpose of the traditional public library is increasingly challenged by the fact that ever more digital content is available to ever more users at ever lower cost. In Denmark, therefore, the public library sector has for more than a decade been working to realize a strategy for the hybrid library deriving from an EU Commission study (Thorhauge et al. 1997).
The hybrid library is understood as being composed mainly of four elements: 1) the physical place, 2) access to physical digital media (CDs, DVDs) and analogue printed media, 3) access to licensed Internet-based media in the library as well as off the library’s premises, e.g. in the users’ homes or workplaces, and 4) help, guidance and value-added services for users.
Rationale for Public Libraries
In Denmark and elsewhere in Western Europe and the USA, the existence of public libraries rests on an ideological foundation and rationale which derive from the Enlightenment and have been refined and developed throughout the industrial age. In Denmark, the first legislation on public libraries was adopted in 1920, and the public libraries of the industrial society have been a tremendous success.
1 DELOS Network of Excellence on Digital Libraries, http://www.delos.info/.
They supported democracy and development by supplying information and knowledge in book format to a broad section of citizens; about two thirds of the Danish population were library users at the end of the 20th century. The circulation of books strongly supported reading skills, the formal education system, and an understanding of the importance of cultural heritage. Today, as most Western societies have become information- and knowledge-based and going digital has been on the agenda for almost two decades, the role and practice of public libraries have changed. One of the challenges for the emerging knowledge-based society is the social divide which, to a certain extent, has turned into a digital divide. Denmark still has a relatively high proportion of socially excluded people. A large segment of the population, estimated at up to 15-20 per cent, is still not able to read. Recent studies in Denmark have shown that although Internet penetration is very high,2 a great many people are not actually able to make use of what this access provides. In Europe in general, it is estimated that more than one third of the population lacks basic digital competences.
Organizational framework for developing digital library services
The Danish Agency for Libraries and Media, a state body for libraries originally established as a result of the first Danish Library Act of 1920 and now under the jurisdiction of the Ministry of Culture, has played an important role in the emergence of collaborative initiatives to develop digital library activities in the public library sector. Its most important tool has been an annual fund for supporting development projects (Thorhauge 2004). There is an annual call for proposals from the public libraries under predefined action lines.
Examples of such action lines are inter-municipal cooperation, cooperation between school libraries and public libraries, and skills and competence development regarding web guides and the creation of web-based information systems. In the mid-1990s part of the fund was targeted towards developing IT infrastructure for libraries in small municipalities, and since the beginning of the new century the strategic focus has increasingly been directed towards a coherent framework for the development of digital services in public libraries. Another important factor has been the tradition of cooperation in the public library sector. The first cooperative net libraries were initiated by public library leaders who used their personal networks to find partners. Eventually, these networks, which were based partly on professional relations and partly on personal sympathies, changed into more formal professional partnerships with negotiated agreements and a legal framework based on official associations approved by city councils. In 2002, an umbrella association, Public Libraries’ Net Libraries, came into being with the purpose of ‘talking with one voice’ for public libraries in negotiations with the Danish Agency for Libraries and Media and the Ministry of Culture. In January 2007, the Agency for Libraries and Media formed a coordination board for net libraries. This board, appointed partly by the Agency and partly by representative bodies in the sector, is responsible for formulating strategies and criteria for state support for the development of net libraries and for securing a balance between representation and professionalism in the allocation process.
2 About 85 per cent of the population has access to the Internet. As of March 2008, Denmark had the world’s highest broadband connectivity, reaching 36.5 per cent of the population, http://www.investindk.com/visNyhed.asp?artikelID=19321.
Elements of the digital public library in Denmark
The digital public library as discussed in this chapter is not a precisely defined entity. It consists of a variety of locally organized web services and content provision, typically organized in websites, and a number of jointly operated net libraries or subject portals, woven together by a national infrastructural backbone. The general system architecture is moving towards a three-layered IT model that distinguishes between the user interface, the service layer and the data layer.
General Library Web Sites
Danish library websites have evolved a great deal since the first site was launched at the beginning of the 1990s. At that time they were static, providing only information on opening hours, the location of physical libraries, and rules and regulations. Such homepages were very often created by library staff with only modest training in the basic tools of building homepages and with no skills in marketing and design. However, it was an important period for many library staff members, because in these years a foundation of competences for managing the new tools was slowly built up within the sector. As software and templates developed, managing websites became easier and the ingenuity of staff flourished. Eventually, the static homepages built with software such as FrontPage were replaced by content management systems (CMS), initially often provided by private sector operators. From the early 2000s, library websites were increasingly built in open source CMSs such as Drupal, becoming increasingly interactive and powerful marketing tools for events, programmes and media in the libraries.
The dynamic and interactive public library website of today gives access to a variety of services such as a personalized profile, e-mail notifications on holds and overdue books, web payment of fees and fines, notification of imminent expiry of loans, subscriptions to personalized subject lists of newly purchased media, business information, surveys and polls, virtual tours and maps, RSS feeds and SMS services. Of course, this also causes an increased workload and demands a higher level of competence within the library organization. As websites have become more complex, increased editing and journalistic skills are needed, which has led to competence development activities and new staff recruitment profiles.
Enhancements of the OPAC
The OPAC in a public library is typically part of an ILS (Integrated Library System), a vendor-specific solution offering a range of library functions. A relatively small number of vendors dominate the market. After a long history of general cutbacks in public libraries, the Danish market has apparently not been attractive enough to the big international library automation vendors to create more competition. The core of an ILS is a relational database, often Oracle, which presents search results as records in MARC format to the user. In Denmark, the database typically contains the library’s holdings, and
the records are imported from a central cataloguing and classification unit, ‘DBC - Dansk BiblioteksCenter’, jointly owned by the Danish state and the municipalities. For the right to use the centrally produced records in its local ILS, every municipality pays an annual fee based on its number of inhabitants. The OPAC has developed significantly since the turn of the century. Web-service technology has made it possible to embed in the catalogue elements from other web-based data sources, e.g. images of book covers. DBC delivers copyright-cleared digital cover images to library OPACs regardless of the library automation system used. A more interesting system, based on collaborative filtering and statistical data on loan patterns, has been developed in cooperation between a few leading public libraries. The idea stems from a personalization project in Danish libraries inspired by a Bertelsmann Foundation study (Garcia and Chia 2002) and was to exploit the “wisdom of crowds” to create a recommendation facility similar to the well-known recommendations on commercial websites such as Amazon.com. The service is based on frequently updated loan statistics from six major public library systems. The data are placed in a common pool, and every link between the borrowers and the material they have borrowed is automatically removed. Analysis of the loans results in a recommendation on each item record, which the user sees when searching the OPAC: “Others who have borrowed this book have also borrowed these”, followed by five other suggestions in ranked order. The technologies used to embed the results of these statistics in the local OPAC are the same as those used to embed various pieces of content from the net libraries in the OPAC. This amounts to a technical deconstruction of the net libraries or subject portals into small pieces of relevant information, which are then automatically reconstructed in the OPAC.
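The recommendation facility described above can be illustrated with a minimal co-occurrence sketch. It assumes that the pooled loan data are kept as anonymous per-borrower ‘baskets’ of item identifiers with no personal data attached; how the Danish service actually groups, weights and ranks loans is not described here, so simple pair counting stands in for its collaborative filtering. The function names and sample titles are hypothetical.

```python
from collections import Counter
from itertools import permutations


def build_cooccurrence(baskets: list[list[str]]) -> dict[str, Counter]:
    """Count how often each ordered pair of items appears in the same basket.

    Each basket is assumed to hold the items borrowed by one anonymous
    borrower; no borrower identity is stored anywhere.
    """
    co: dict[str, Counter] = {}
    for basket in baskets:
        # set() ignores repeat loans of the same item by the same borrower
        for a, b in permutations(set(basket), 2):
            co.setdefault(a, Counter())[b] += 1
    return co


def also_borrowed(co: dict[str, Counter], item: str, k: int = 5) -> list[str]:
    """Return up to k items most often borrowed together with `item`.

    Ties are broken alphabetically so that the ranking is deterministic.
    """
    counts = co.get(item, Counter())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:k]]


# Hypothetical pooled loan data: one list per anonymous borrower
baskets = [
    ["Book A", "Book B", "Book C"],
    ["Book A", "Book B"],
    ["Book A", "Book D"],
    ["Book B", "Book C"],
]
co = build_cooccurrence(baskets)
print(also_borrowed(co, "Book A"))  # ['Book B', 'Book C', 'Book D']
```

Here `k=5` mirrors the five suggestions shown in the OPAC. A real service would probably also need to correct for very popular items, which co-occur with almost everything; this sketch makes no such adjustment.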
The primary sources for this embedded content are the Literature Site, the Libraries’ Net Music and DBC’s Authors’ Portraits, which deliver recommendations and portraits of authors and composers. Features for customizing and personalizing the OPAC are in development, allowing users to keep track of their own loans, holds and any fines for overdue material. Despite the development of dynamic standards such as XML and of e-resources (digital documents, electronic serials, e-books, networked audio-visual documents), the OPAC as a record of stock will not be rendered obsolete by general search engines such as Google. Some libraries, however, are working on the possibility of a totally different data structure combined with a general search engine, in order to get rid of the vendor-specific ILS. This is because an ILS is a vertical solution with many integrated functions and facilities, which makes it difficult and costly to combine with other solutions. A strategic initiative has now been launched by a consortium of DBC, Aarhus Public Libraries and Copenhagen Libraries to implement an Open Library Strategy. The idea is to open up the system architecture so that the user (i.e. a library) can put together a total solution based on products from several different suppliers. Similar activities to deconstruct the vendor-specific ILS have been launched by the State Library in the Summa project.3
3 Summa is a general search engine and indexing facility which can be applied to the data held by a public library, thus supporting integrated search and delivering services that allow the library to present state-of-the-art services.
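The three-layered IT model (user interface, service layer, data layer) and the Open Library idea of combining products from different suppliers can be sketched as follows. This is a minimal illustration under stated assumptions: the interface and class names are hypothetical and do not describe the actual DBC, Aarhus, Copenhagen or Summa systems. The point is only that the service layer depends on an abstract data-layer interface, so one supplier's record store can be swapped for another's without touching the presentation code.

```python
from typing import Protocol


class RecordStore(Protocol):
    """Data layer: any supplier's store of catalogue records (hypothetical interface)."""

    def find(self, term: str) -> list[dict]: ...


class InMemoryStore:
    """Stand-in data layer; a real library might plug in an ILS or a search index here."""

    def __init__(self, records: list[dict]) -> None:
        self.records = records

    def find(self, term: str) -> list[dict]:
        needle = term.lower()
        return [r for r in self.records if needle in r["title"].lower()]


class SearchService:
    """Service layer: search logic that knows nothing about storage or display."""

    def __init__(self, store: RecordStore) -> None:
        self.store = store

    def search(self, term: str) -> list[dict]:
        return sorted(self.store.find(term), key=lambda r: r["title"])


def render(results: list[dict]) -> str:
    """User-interface layer: presentation only."""
    return "\n".join(f"{r['title']} ({r['type']})" for r in results)


# Hypothetical records mixing media types
store = InMemoryStore([
    {"title": "Poetry Reading", "type": "audio book"},
    {"title": "Danish Poetry", "type": "book"},
    {"title": "Jazz Classics", "type": "music"},
])
print(render(SearchService(store).search("poetry")))
```

Replacing `InMemoryStore` with any other class that has a matching `find` method leaves `SearchService` and `render` unchanged, which is the kind of supplier independence the layered model aims at.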
Mobile Portals
A relatively new service is the mobile portal, through which the library gives access to a range of services for Internet-connected mobile phones. The services range from information on opening hours and the status of borrowed materials, fees and fines to news on programmes. There are also SMS services providing reading suggestions and recommendations, and facilities to place orders and holds on desired books. The growth of broadband-connected mobile devices in the market has created the necessary infrastructure, and the potential of mobile access to library services can hardly be overestimated. There is little doubt that new services pushing content, such as samples, citations and quotations from new literature, young authors’ poetry, and question-and-answer services, will shortly emerge.
Digital Content
Today, digital content includes e-texts, e-books, e-zines, music, photos, images, film and audio books, which are directly available through download or down-loan services. Download allows the user permanent use of the e-resource, while down-loan provides the user with time-limited access to digital media, managed by digital rights management (DRM) systems. Since the mid-1990s, every Danish public library has had access to a smaller or larger portfolio of web-based databases and electronic magazines, which are available to users either in the library only or both in the library and elsewhere at the users’ convenience: at home, at work or at their place of study. The number of e-resources available to the user is limited by the local municipal funding of the library, i.e. how much money the library can spend on licences. Danish public libraries, supported by the Danish Agency for Libraries and Media, have formed consortia to negotiate licences at national level. This has eased the financing of licences and enabled broader access for users living in municipalities with less funding for public libraries.
Every municipal library is still able to decide whether a certain service should be purchased for its users. Examples of licences are ‘Encyclopaedia Britannica’, ‘e-brary on-line’, ‘Ebsco magazines and databases’, ‘Grove Music’, ‘Danish National Encyclopedia’, ‘Library PressDisplay’, ‘Oxford Reference on-line’ and many others. An increasing number of audio books in MP3 format are available through the library OPAC. When performing a search, the user gets a result showing books in the library, e-books for immediate download, and audio books in MP3 format for down-loan.
bibliotek.dk
A collaborative, free, web-based search and ordering facility, ‘bibliotek.dk’ (http://bibliotek.dk), makes it possible for a user anywhere in Denmark to have physical media (books, magazines, CDs, DVDs) delivered from any library in the country to a library in his or her neighbourhood. The system is based on a joint catalogue of all holdings in Danish public and research libraries, combined with an ‘intelligent’ net and a daily transportation system operated by contractors. Tenders are invited for running and developing the web system, the database of holdings and the transportation; the services are paid for by the Danish Agency for Libraries and Media and financed by state grants laid down in the Danish Act Regarding Library Services. The media are transported between ten distribution centres spanning the entire country, combined with
regional transport systems. Tenders for the regional transport are invited by the county libraries, but it is financed by the state. The substantial staffing costs of administering the processes connected with locating, packing, returning and claiming overdue physical media are met by the local municipalities. The web-based user interface has many features, including Flash video user instruction, connection to local libraries’ library automation systems and OPACs, e-mail notification when material is ready to be picked up, and a substantial number of e-books and e-texts for direct download. As of January 2009, the system held more than 11 million records, and 1.4 million requests were registered in the system in 2008.
Net Libraries
A wide and colourful range of ‘virtual libraries’ or subject portals, targeted towards different user groups and using different communication profiles, emerged in Denmark in the mid and late 1990s. Front runners among the municipal public libraries initiated these cooperative services, which created totally new ways of working for librarians and library staff in many libraries. The idea was to make use of the technological lead which the public libraries had over other public sector institutions. This advantage derived from the library automation period from the late 1980s to the early 1990s, when library systems were implemented in every public library. These systems were generally run on local UNIX servers and required local technical skills as well as managerial focus and local understanding of the possibilities of information technology. Unlike some other sectors, the libraries, by handling this technological challenge locally, managed to create a knowledge base which proved valuable in the emerging Internet era. The production method was at first informally organized. It was based on relationships between colleagues and had a common ideological base in the values of the public library.
Colleagues typically held a start-up meeting, where the idea was presented and participants divided the tasks. Then a website was produced, sometimes by library staff, sometimes purchased from private sector companies. Later, the ongoing production of content, metadata etc. was undertaken by librarians from several different public libraries; sometimes more than a hundred librarians all over Denmark were involved. The rationale for the local municipal library was obvious: by letting one member of the library staff work for perhaps ten hours a week on a net library, the users of the library services got the value of 100 librarians’ joint efforts in the portal. The business model was simple: everything is free for users, every municipality decides how much effort the library should contribute, and the legal entity behind the net libraries is always an association formed by the participating municipal libraries. When the number of net libraries peaked around 2003/2004, more than 30 were in operation. By that time, however, it had become increasingly obvious that there was a pressing need to consolidate and merge some of these net libraries. A number of the services were scarcely used by the public, standardization of platforms was non-existent, and marketing efforts varied widely, to say the least. This led to a strategic analysis which identified the need for a more concerted approach to the field and which eventually resulted in the formation of a coordination board for the net libraries.
In almost every case, the state, through the Danish Agency for Libraries and Media, has supported the formation of net libraries with development funds from a special pool for library development. The requirements for state support for net libraries are that they are based on open source principles and technology, that they are nationally accessible and that they meet the requirements of ‘Top of the Web’.4 The most interesting of these net libraries have developed considerably over the years and have gained an ever-growing audience and user base. Among these is ‘Biblioteksvagten.dk’, http://www.biblioteksvagten.dk, an online question-and-answer service operated by librarians from 71 public and research libraries. ‘Biblioteksvagten.dk’ has long daily opening hours during which users can chat with librarians and have questions answered on almost any topic, based on the collections and e-resources of the libraries. Staffing of the opening hours is divided between the participating libraries; for example, Aarhus Public Libraries is responsible for Sunday afternoon and Tuesday evening. The service runs through various channels of communication such as phone, e-mail, and structured semi-automated systems embedded in the web portal, with facilities such as taking over the user’s screen to point out relevant Internet-based e-resources. There are several other net libraries available to the Danish public, among them ‘Finfo’, http://www.finfo.dk, a net library which provides multilingual information on Danish society for new citizens and immigrant communities, and ‘Juraportal.dk’, http://juraportal.dk, a subject gateway to law, criminology and related subject areas. ‘Libraries Net Music’, https://www.netmusik.dk, is not strictly a net library, but a library service which gives registered library users access to down-loan of more than a million music tracks in MP3 format.
The loan period is either one day or seven days, after which the track erases itself unless the user chooses to buy it. A children’s portal is currently being completely rebuilt and will be relaunched in October 2009. A question-and-answer service for children, ‘Ask Olivia’, http://www.spoergolivia.dk, offers various possibilities for interaction, including games and reading recommendations. The most widely used net library is the literature portal ‘Litteratursiden.dk’, http://www.litteratursiden.dk. This net library was started in 2000 as an amalgamation of three different initiatives, with the purpose of promoting contemporary Danish and foreign fiction. The main editor is based at Aarhus Public Libraries, but the consortium consists of public libraries representing 79 municipalities. All participating libraries do some production work for the site. The portal’s facilities include an e-zine with e-mail notification, which has more than 3,000 subscribers; book clubs; advice; recommendations; articles; a database on contemporary Danish authors including video and audio clips, biographies and bibliographies; the opportunity to place holds and requisitions through bibliotek.dk; and facilities that let participating libraries embed content such as recommendations automatically in their own OPACs through web service technologies. The site has formed a partnership with a Danish national public service television broadcaster and has 3.6 million individual visits annually. At present, 19 net libraries are in operation in Denmark.
⁴ “Bedst paa Nettet” (‘Top of the Web’) is an EU evaluation and benchmark of the quality of public websites. In Denmark it is carried out by the Danish National IT and Telecom Agency.
Rolf Hapel
Perspectives
There can be little doubt that the impact of the digital library on public libraries and their activities is far from reaching its peak; it has only just begun. To monitor a development in which more visitors to the library already use its computers than their library cards, a key performance indicator system for comparing Danish library websites was introduced in June 2008. It measures individual visits and downloads from library websites and net libraries, and enables comparison not only of inter-library loans but also of other web activities in an accepted, standardized way. The 2010 strategy for net libraries (Danish Agency for Libraries and Media 2008), formulated by the coordination board, strongly emphasized a move from transactional to relational services. Core development areas are cooperation with players outside the sector; integration between physical libraries and net libraries; user involvement in innovation processes; relational services supporting user interaction through social technologies; transactional services that are increasingly self-service and user-friendly; ‘deconstructed libraries’ (where parts of the library can be transposed to a different setting, or parts of the digital library embedded in other e-services) where and when the need emerges; and integration with bibliotek.dk. This strategy is the basis for further development of the digital public library, together with initiatives to build web 2.0 services into the net libraries and a campaign to convert the data structures into data wells, giving users the advantages of integrated search. All these developments point to a mindset in which the concept of the hybrid library is being abandoned and a new concept, the mash-up library, can be dimly seen on the horizon.
The mash-up, a lightweight application that combines data or content from several sources into new, integrated content, is an ideal model for the development not only of digital library activities but also of physical libraries, where an increasing number of services and content will be supplied from sources outside the library organization.
Summary
– Public libraries made a great contribution to literacy and education in the past and are now changing to embrace the knowledge-based society, but the social divide is turning into a digital divide, which is a challenge for public libraries
– The digital public library is seen as consisting of a variety of locally organized web services and content provision, together with jointly operated net libraries and subject portals, woven together in a national infrastructural backbone
– In Denmark, digital libraries for public libraries have been supported by structural funding at national level. Static websites have developed into highly interactive sites supporting social networking
– OPACs are developing many sorts of integrated web service, including collaborative ordering services
– Public libraries are developing mobile services such as borrowing status and news updates
– Public libraries in Denmark collaborate on national licensing of e-content
– Various sorts of networked services (‘net libraries’), such as question-and-answer services, literature promotion and e-zines, are being introduced
– Developments are far from reaching their peak: anticipated developments include a shift from transactional to relational services, web 2.0, data mining and the ‘mash-up library’
References
Garcia, J. and Chia, C. (2002): Personalization of Electronic Networked-Based Library Services, Bertelsmann Foundation, Gütersloh
Hapel, R., Pirsich, V. and Giapicciconi, T. (2001): Future-oriented Internet-based Services in Public Libraries, Bertelsmann Foundation, Gütersloh
Styrelsen for Bibliotek og Medier [Danish Agency for Libraries and Media], Koordinationsgruppen for Netbiblioteker (2008): Strategi 2010 – fra netbiblioteker til vidensnetvaerk [Strategy 2010: from net libraries to knowledge networks] (in Danish)
Thorhauge, J., Larsen, G., Thun, H.P. and Albrechtsen, H. (1997): Public Libraries and the Information Society, European Commission, DGXIII/E4
Thorhauge, J. (2004): Danish Library Agency. Entry for Encyclopedia of Library and Information
18 DIGITAL LIBRARIES FOR CULTURAL HERITAGE: A PERSPECTIVE FROM NEW ZEALAND
Chern Li Liew
Introduction
Modern information and communication technologies (ICT) have brought about far-reaching changes. One of these is that time and space no longer hinder the distribution of and access to information. For the cultural heritage sector, these developments have opened up new opportunities: institutions are now better able to make cultural heritage accessible and transparent, and to attract audiences from around the world to collections of unique and exciting heritage content that are important assets of their societies. An increasing amount of such content is being made available in digital form and via the Internet. Using ICT, cultural heritage artefacts can also be presented through more varied, lower-cost communication mechanisms. There are also new opportunities for consortium activities and for partnerships among different institutions and their audiences (both existing and new) to conserve and preserve heritage content and to promote cultural diversity. The momentum to harness the power of technology to create digital cultural heritage resources (CHR) is great, and enthusiasm is high among museums, libraries, archives and historical societies, often referred to collectively as cultural heritage institutions. Indeed, digital heritage has recently been accorded status as an entity in its own right, with UNESCO declaring that “resources of information and creative expression are increasingly produced, distributed, accessed and maintained in digital form, creating a new legacy – the digital heritage” (UNESCO 2003). However, while the new technologies capable of supporting novel forms of presentation, navigation and access to information have received the most attention, it is the underlying social, economic and policy changes that are most fundamental and that will have the most lasting effects on the future of the digital CHR landscape.
Cultural heritage, being inherently social, will always be influenced by social and political factors. Before joining the rush to develop a multitude of CHR digital libraries, therefore, it is critical to pause and consider what we are and should be building, for whom, for what purposes, and how we can ensure the manageability and sustainability of these resources. With the power of technology to widen access, the access missions of cultural heritage institutions have suddenly become much easier to achieve, but the policies, business models, and ethical and other professional assumptions that have regulated the analogue realm are in some respects becoming inadequate for the digital landscape (Bishoff and Allen 2004). The shift to computer-mediated forms of production, distribution and communication of cultural information “affects all stages of communication, acquisition, manipulation, storage, and distribution”, as Lev Manovich (2001) observes; hence the importance of careful business planning among those involved. This chapter attempts to characterise current developments in CHR digital libraries and, where appropriate, to set them in their social and policy-related contexts, using New Zealand as a case to illustrate a number of key concerns. It does not aim to cover the research issues comprehensively; rather, it highlights a number of current imperatives in handling CHR in a digital environment, discusses the related key business planning elements, and closes with a brief discussion of the issues likely to remain important in the foreseeable future.
Cultural heritage resources in the digital landscape
The value of CHR extends beyond the commercial. It often lies in the social, cultural, historical and spiritual legacy and honour of communities, and in its academic value. Creating and developing a CHR digital library is not just a technical task. In one of their noblest roles, cultural heritage institutions are vehicles for the enduring concerns of public spectacle, information and knowledge preservation, and shifting paradigms of knowledge (Gere 2002). Digital libraries also need to distinguish themselves from mere digital collections and databases.
They are expected to add value to their resources, for example by establishing context around them and enriching them with new information and relationships that express the usage patterns and knowledge of the community concerned, so that the digital library becomes a context for collaboration and information accumulation rather than simply a place to find and access information (Lagoze, Krafft, Payette and Jesuroga 2005; Lynch 2002). Cultural heritage content is important for national and community identity. It is often the unique and irreplaceable legacy of tangible artefacts (e.g. archaeological, architectural, cultural and industrial) or intangible attributes (e.g. oral traditions and expressions, social knowledge, rituals and practices concerning nature and the universe, traditional craftsmanship and performing arts) inherited from past generations, maintained in the present and bestowed on future generations. Decisions on what content to include may reflect difficult cost/benefit trade-offs and may be controversial. Nevertheless, digital libraries and the services they offer provoke concrete discussions about what does and does not appeal to actual users. It is also important to recognise that cultural heritage content does not become obsolete over time. Instead, as the past recedes, it can become more valuable as new metadata or tags build up its contextual information. Building a successful and sustainable CHR digital library therefore requires simultaneous engagement with the array of social, cultural and political contexts concerned and all their multifarious issues.
Special issues
In New Zealand, significant cultural heritage content is held in local, regional and national institutions, ranging from manuscripts and printed materials to multimedia artefacts. The country’s efforts to date to unlock this content through digitisation and make it available online have been largely sporadic and lacking in national coordination. In recognition of this, New Zealand’s Digital Content Strategy was proposed in 2005 as a key action of the government’s digital strategy, and was subsequently launched in September 2007 (www.digitalcontent.govt.nz). It is recognised that selecting, creating and maintaining digital resources is an expensive endeavour and that it is not possible to digitise everything, owing to limited financial resources and technical capabilities and, in some cases, to copyright restrictions. Intellectual property rights and cultural preferences also mean that not everything should be digitised and made available to the public. It is therefore acknowledged that a process of thoughtful selection and prioritisation, taking these factors into consideration along with the value of the materials and interest in their content, will be required. The Strategy recognises that this process is already taking place to varying degrees in the institutions and communities undertaking digitisation work, but holds that coordination at national level is essential to take advantage of opportunities for collaboration and to avoid the risks of poor understanding of gaps and user needs, duplicated work, inconsistent use of standards and poor user awareness of existing resources. Special attention is given in New Zealand to the protection of Māori cultural property, while recognising the potential value of digitising and delivering unique and scarce traditional knowledge and heritage online.
At present, Māori language and culture are being promoted and protected through digitisation by a growing number of rūnanga (caucuses), iwi (tribal) health providers, Māori-based businesses and educators. The AIO Foundation (www.aiofoundation.com), for instance, is a Māori-based initiative which gives whānau (families) the dual opportunity of preserving their culture and heritage digitally while maintaining and promoting traditional values. Māori cultural property and mātauranga Māori (which includes language, knowledge, wisdom, understanding and skills) are valued as a central part of New Zealand’s identity and should be visible to its people and to the world. Nevertheless, the Digital Content Strategy acknowledges the importance of preventing misuse, abuse, illicit access and the dilution of respect for cultural tradition, and of ensuring that control over access to and use of traditional cultural property in the digital realm remains with its creators and bearers. It is proposed that the protection of cultural property could be strengthened in the digital realm by applying the concepts of authenticity and kaitiakitanga (guardianship) as mechanisms to promote and protect indigenous content in digital form. New Zealand is one of the first countries to have adopted electronic legal deposit as a means of helping to preserve national cultural heritage. However, the Strategy recognises that additional mechanisms will be required to cover the full scope of content, particularly local and specialist histories and perspectives, along with the country’s public records.
A few years ago, Robert Sullivan (2002) proposed that “A cornerstone of an Indigenous Digital Library is that the indigenous communities themselves control the rights management of their cultural intellectual property. Local cultural protocols need to be documented and followed prior to the creation of digital content, and communities must be consulted with regard to the digitisation of content already gathered by institutions of social memory.” At the Museums and the Web 2003 conference, Hunter, Koopman and Sledge, writing about software tools for managing indigenous knowledge, stated: “It is essential that traditional owners be able to define and control the rights and access to their resources, in order to uphold traditional laws; prevent the misuse of indigenous heritage in culturally inappropriate or insensitive ways; and receive proper compensation for their cultural and intellectual property. Finally, it is essential that indigenous communities be able to describe and contextualize their culturally and historically significant collections in their own words and from their own perspectives.” These passages highlight a number of key issues in the creation and management of digital CHR which need to be taken carefully into account in the business planning process, namely:
– coordination of effort among cultural heritage institutions, governmental authorities, and other agencies
– cultural intellectual property and rights management
– consultation with the cultural heritage property owners and stakeholders
In the next section, key elements of business planning for building a CHR digital library, particularly those associated with the above issues and concerns, are discussed.
This is followed, as part of a discussion of the prospects for CHR digital libraries in the foreseeable future, by an outline of the planning issues concerning:
– integration and sustainable infrastructure
– representation and contextualisation of indigenous knowledge in its cultural and historical perspectives
– integrity and trustworthiness
Key elements of business planning for cultural heritage digital libraries
Collier (2004) defined business planning for digital libraries as “the process by which the business aims, products and services of the eventual systems are specified, together with how the digital library service will contribute to the overall business and mission of the host organisations. These provide the context and rationale, which is then combined with business plan elements such as technical solution, investment, income, expenditure, projected benefits or returns, marketing, risk analysis, management and governance”. Where more than one institution or party is involved, there is the added dimension of planning a successful collaboration. Indeed, business models for digital libraries in general have evolved over the years to reflect the changing role of many such resources, from mere content suppliers to sophisticated service providers. In the last few years we have also seen a paradigm shift in the digital library domain: digital library systems have evolved from distinct, independent systems and closed library ‘silos’ towards a networked, service-based architecture, built in some cases as a set of interoperable local systems (e.g. digital heritage content from museums, libraries and archives, as in Europeana). Establishing effective and successful collaboration among the dispersed communities involved (e.g. cultural heritage institutions, research institutes, universities, stakeholders and user communities) is becoming a core concern, shifting the focus from technological issues to the social, cultural and policy contexts in which these digital libraries are expected to function. The key business planning elements associated with the issues identified in the previous section are discussed in the following subsections.
Coordination of effort among cultural heritage institutions, governmental authorities, and other agencies
Placing cultural heritage content in the digital landscape requires a rethinking of traditional approaches and structures. Coordination of effort is essential if a content-rich and usable CHR digital library is not just to be created but also sustained. An appropriate coordinating body may be needed to administer and monitor the efforts of those involved: cultural actors, governmental authorities, legal bodies, research centres, the ICT industry and other agencies. The organisational structure of the UK Digital Preservation Coalition (DPC) (http://www.dpconline.org) is one example: the DPC is a coalition of several sectors, such as cultural heritage, education, science and research, and commercial actors (e.g. publishers, broadcasters, software industries), which recognises the need to work jointly. Another example is the cross-sector EUBAM (http://eubam.de/), which works across the German federal states to coordinate national efforts over a wide spectrum of activities, including selection and digitisation of content, dealing with rights issues, metadata creation, and the maintenance and preservation of and access to digital CHR in the country. These coalitions highlight the benefits associated with risk analysis and costing, funding, networked expertise and resources, coordinated research and experiments, implementations, marketing and collective responsibility for preserving information and national memory. In New Zealand, this kind of coordination is taking place, with the cultural sector working with government agencies (including the Ministry of Pacific Island Affairs and Te Puni Kōkiri, the government’s principal adviser on Māori issues), the research and education sector and various interest groups, as reflected in the early stages of work under the national Digital Content Strategy. One of the key provisions in the Strategy is a programme aimed at achieving cross-sector collaboration on open standards and interoperability, capability, understanding users’ needs and enhancing seamless access to the country’s CHR.
Once such a national competence centre or agency is firmly established, it could also explore cooperation through international cultural heritage consortia or coalitions to set up common digital assets such as infrastructures and services (e.g. a common metadata service or aggregator service) or to work towards larger-scale CHR digital libraries involving international bodies.
Cultural intellectual property and rights management
A number of legal issues require adaptation to the digital context, namely legal deposit, trusted digital repositories, copyright and rights management, e-government legislation and records management, privacy and transparency, and freedom of information. The New Zealand Copyright Act 1994 is drawn from a United Kingdom model which provides specific exceptions for particular groups such as educational institutions, librarians (for supply of documents), archivists (for preservation) and the Royal Foundation for the Blind (for provision of Braille copies of literary or dramatic works). Some of these rights are subject to contractual restrictions; museums, for instance, can contractually restrict the taking of photographs on their premises. On occasion, specific provisions can be opaque and confusing for users and owners alike, and the definitions of rights and protection for various forms of objects on the Internet remain convoluted, to say the least. Take, for instance, the recent case of Lenz v. Universal Music Corporation, concerning a posting on YouTube and featured in the February 2009 issue of the American Bar Association Journal (http://www.abajournal.com/magazine/copyright_in_the_age_of_youtube). This is just one example illustrating that, on the one hand, stakeholders need to be convinced that security and rights protection are available for content published on the Internet, but on the other hand, over-restrictive legislation may become an obstacle to the accessibility of content.
The New Zealand Ministry of Economic Development has sought, in its review of the Copyright Act and the fair dealing exceptions, to bring them up to date with the digital environment (http://www.med.govt.nz/templates/ContentTopicSummary____1103.aspx), but the attempt has proven extremely difficult given the volatile, evolving nature of new technologies. The right balance between rights protection and freedom of information must be struck, but knowing, and reaching consensus on, what that balance is often proves tricky. This grey area of the legislation is sometimes known as the “soft side” of copyright (Seadle 2002) and includes issues relating to the cultural expectations of creators and users alike. Added to these are the delicate issues concerning the ownership and custodianship of traditional indigenous knowledge and cultural heritage (Liew 2005). In an earlier study of users’ needs with regard to digitised New Zealand CHR (Dorner, Liew and Yeo 2007), cultural sensitivity emerged as one key concern. Participants raised particular concerns about the infringement of rights, for instance when re-using digital images in presentation slides and as book illustrations, and about current legislation translating poorly into the digital realm. Indeed, a culturally responsive law may provide some protection, but it may not be sufficient to address the ownership and custodianship concerns of digital CHR, as the legislation was originally framed for physical artefacts. Resolution of these issues currently lies outside the legal domain. In New Zealand, in the early 1990s, as part of the effort to resolve such issues, the government and its agencies initiated bicultural policies to engender respect for traditional Māori values. Following public consultation on the draft Strategy Discussion Document at the end of 2006, it was acknowledged that current intellectual and cultural property rights, as set out for instance in the Mataatua Declaration on Cultural and Intellectual Property Rights of Indigenous Peoples (1993), need to be defined for the digital domain and to reflect the perspective of tangata whenua (indigenous people of the land). It was proposed that concepts such as authenticity, collective ownership and processes of kaitiakitanga could be used as mechanisms to promote and protect digital indigenous content, and that the relevant stakeholders must be consulted about these issues. At the moment, it seems that unless the current intellectual property law is radically revised, resolution of these problems may depend on communal ethics, and the wider community will need to take responsibility for the sensible and sensitive handling of indigenous content. Issues related to consultation are discussed in the next subsection. The Strategy now includes among its key action items a public awareness programme aimed at raising New Zealanders’ awareness of their intellectual property rights and obligations, and of mechanisms for the protection of those rights (including those in relation to digital content). An initiative has also been put forward to help identify ways in which Māori cultural and intellectual property rights can be positively recognised and protected in the digital realm.
Consultation with the cultural heritage property owners and stakeholders
In New Zealand, iwi (tribes) and whānau (families, extended families) are typically responsible for the preservation and protection of traditions and heritage, with the concept of guardianship central to the issue of cultural and intellectual property (Talakai 2007; Kamira 2003). If approached to take part in digital library projects, a group may be sceptical about providing open access to knowledge and property considered spiritual and sacred. Some Māori groups, for example, propose that oral histories need to be made widely known to save a weakening tradition, while others maintain that such heritage should be protected from the uninitiated (Kamira 2003). In an audience investigation and study of users’ needs with regard to digitised New Zealand CHR (Dorner, Liew and Yeo 2007), one of the main concerns of the historical researchers who took part was indeed that certain historical documents might be considered cultural taonga (treasures) which some Māori groups strongly believe to be sacred and which therefore should not be disseminated publicly; yet these same documents might be public records that could legally be disseminated. Such examples illustrate potential conflicts of principle concerning the use and guardianship of indigenous resources. The importance of open communication and consultation between different groups of stakeholders, and of proper facilitation of the consultation and consensus-building process, therefore cannot be overstated. When such ideological conflicts do occur, all stakeholders and interest groups must be given the opportunity to explore together appropriate means of allowing or limiting access to resources considered culturally sensitive. Those involved in digital library development must consider the values of both preservation and discourse (Ngulube 2002). Some scholars have discussed the value of reciprocity in collaborative collection building (Cedar Face and Hollen 2004; Peterson Holland 2002; Sullivan 2002). Cedar Face and Hollen (2004) cite the example of the First Nations Collection project, in which the digital library team “gives back” to the people by providing them with access to their CHR in the form of bound copies of documents. Roy and Alonzo (2003) gave another example, in which indigenous groups were involved in the analysis and indexing of the items gifted to a digital library. Such involvement of tribal members and groups, together with reciprocity, may cultivate a sense of collective ownership.
Prospects
Integration and sustainable infrastructure
Establishing effective collaboration among the dispersed communities involved in a CHR digital library project will remain a core concern, and integration may become the keyword for any such project. Integration of collections, methods, technologies, standards and services may become an increasingly attractive option: it can leverage resources and capabilities, bring about economies of scale and produce synergy among projects, as well as providing a viable, sustainable resolution to many of the issues project teams face in competing for content, resources and audiences.
Questions about the long-term future of the digital library in representing cultural identity and diversity and in representing the dialogue of the nation and its people and other communities in a globalised world will continue to re-surface as social, cultural and political forces develop and evolve along with changes in technologies and the changes these bring about to the dynamics. Investigations into how, for example, social networking tools can be utilised and transformed to facilitate and encourage community participation in the development of CHR digital libraries and to contribute to cultural content (Coppola 2008) will be likely to continue and will be important to the development of, particularly, community-based digital libraries for collecting, sharing and preserving personal cultural heritage information. Representation and contextualisation of indigenous knowledge in its cultural and historical perspectives Often, cultural heritage artefacts and information do not exist in isolation but are richlyconnected entities. In fact, capability of gaining a sense of digital materials in their entirety and of understanding the relationships of digital content within the appropriate historical context for example was an important concern (Dorner, Liew and Yeo 2007). The desired feature of the ideal digital CHR mentioned by a number of participants in this study was summarised by drawing on the metaphor of a “textured sculpture” – that digital 202
cultural heritage needs to be seen as multidimensional rather than as a flat entity. As such, these resources should be equipped with heterogeneous metadata, along with information from a broad spectrum of sources, some with authoritative commentaries and some perhaps with personal opinions. Such contextual knowledge, accumulated over the years, could form the core of a continually evolving knowledge base that helps future generations to interpret and understand the significance of the digital cultural heritage, and gives future societies the opportunity to make decisions based on the best knowledge available to them. Some say this is perhaps the most fundamental reason why we preserve CHR in the first place. Engagement and interpretation on different levels and from different perspectives, rather than mere linear browsing of content, should be supported. Hence, technologies that support aggregation and consolidation, enabling a CHR digital library system to act as a unified portal offering integrated access to a rich combination of resources in a distributed environment, will become increasingly critical as digital cultural heritage content grows, as more collaborative efforts take place, and as users increasingly seek meaningful, context-rich content. The digital library project team should therefore determine (both at the business planning stage and at each review and evaluation phase) what supporting information would help users to understand the content they seek in its entirety, to stimulate thinking and to foster new understanding. In the words of Clifford Lynch (2002), "the aggregation of materials in a digital library can be greater than the sum of its parts".
Integrity and trustworthiness
Again, to quote Lynch (2002), "Recognize that part of what is at issue here if digital libraries are going to be more than the sum of their individual constituent content objects is the need to become more flexible in thinking about the integrity of works and authorship, of figuring out how to balance our need to honor this integrity while also being able to integrate huge numbers of such individual works." Integrity of digital materials is, of course, partly associated with the proper management of rights and other legal issues. It is also tied to the stewardship and trustworthiness of the digital library system. From a technological as well as a user's perspective, availability, integrity, authenticity and interpretability of content are among the building blocks of any trustworthy digital repository (Dobratz and Schoger 2007). A trustworthy CHR digital library must therefore be sustainable, operate according to its objectives and aim to meet the needs of both current and potential users. Hence, audience research such as that of Dorner, Liew and Yeo (2007), needs assessments and market analyses are naturally a critical part of any business plan for a long-term effort. Resource managers and funding agencies must plan for long-term funding and granting decisions. The integrity and trustworthiness of a CHR digital library must also be assessed and reassessed, including through investigations into issues of social inclusion (Warschauer 2003; DiMaggio and Hargittai 2001), and the library must continue to demonstrate its value to the communities it serves.
Chern Li Liew
Summary
– DLs can provide much improved access, but policies, ethics and business planning from the analogue era can be inadequate
– DLs are expected to add value (context, relationships, enrichment)
– Integration and aggregation can enhance representation and contextualisation of cultural heritage resources
– Cultural and heritage content is important for community identity
– Thoughtful selection is important
– New Zealand's Digital Content Strategy pays special attention to indigenous culture through concepts of authenticity and guardianship
– DLs for cultural heritage can return resources to their original guardians in an enriched form (reciprocity)
– Sustainability, integrity, authenticity and interpretability are important for the trustworthiness of DLs for cultural heritage
References
Bishoff, L. and Allen, N. (2004), "Business planning for cultural heritage institutions", Council on Library and Information Resources: available at www.clir.org/pubs/reports/pub124/contents.html (accessed 2 February 2009)
Cedar Face, M.J. and Hollens, D. (2004), "A digital library to serve a region", Reference and User Services Quarterly, Vol. 44 No. 2, 116-121
Collier, M. (2004), "The business aims of eight national libraries in digital library cooperation", Journal of Documentation, Vol. 61 No. 5, 602-622
Coppola, P., Lomuscio, R., Mizzaro, S., Nazzi, E. and Vassena, L. (2008), "Mobile social software for cultural heritage: a reference model", Business Information Systems Workshops 2008: available at ftp.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-333/saw6.pdf (accessed 3 February 2009)
DiMaggio, P. and Hargittai, E. (2001), "From the 'digital divide' to 'digital inequality': studying Internet use as penetration increases", Princeton, The Center for Arts and Cultural Policy Studies, Princeton University
Dobratz, S. and Schoger, A. (2007), "Trustworthy digital long-term repositories: the nestor approach in the context of international developments", ECDL 2007 Lecture Notes in Computer Science, Vol. 4675, 210-222
Dorner, D., Liew, C.L. and Yeo, Y.P. (2007), "A textured sculpture: an examination of the information needs of users of digitised New Zealand cultural heritage resources", Online Information Review – The International Journal of Digital Information Research and Use, Vol. 31 No. 2, 166-184
Gere, C. (2002), Digital culture, London, Reaktion Books
Hunter, J., Koopman, B. and Sledge, J. (2003), "Software tools for indigenous knowledge management", paper presented at Museums and the Web 2003: available at www.archimuse.com/mw2003/papers/hunter/hunter.html (accessed 2 February 2009)
Kamira, R. (2003), "Te Mata o te Tai – The edge of the tide: rising capacity in information technology of Maori in Aotearoa-New Zealand", The Electronic Library, Vol. 21 No. 5, 465-475
Lagoze, C., Krafft, D.B., Payette, S. and Jesuroga, S. (2005), "What is a digital library anymore, anyway?", D-Lib Magazine, Vol. 11 No. 1
Liew, C.L. (2005), "Cross-cultural design & usability of a digital library supporting access to Māori cultural heritage resources", in Design and Usability of Digital Libraries: Case Studies in the Asia Pacific, edited by Theng, Y.L. and Foo, S., Information Science Publishing, 285-297
Lynch, C. (2002), "Digital collections, digital libraries and the digitisation of cultural heritage information", First Monday, Vol. 7 No. 5: available at firstmonday.org/issues/issue7_5/lynch/index.html (accessed 2 February 2009)
Manovich, L. (2001), The language of new media, Cambridge, Mass., MIT Press
Ngulube, P. (2002), "Managing and preserving indigenous knowledge in the knowledge management era: challenges and opportunities for information professionals", Information Development, Vol. 18, 95-102
Peterson Holland, M. (2002), "We come from around the world and share similar visions", D-Lib Magazine, Vol. 8 No. 3
Roy, L. and Alonzo, D.L. (2003), "Perspectives on tribal archives", The Electronic Library, Vol. 21 No. 5, 422-427
Seadle, M. (2002), "Whose rules? Intellectual property, culture and indigenous communities", D-Lib Magazine, Vol. 8 No. 3
Sullivan, R. (2002), "Indigenous cultural and intellectual property rights: a digital library context", D-Lib Magazine, Vol. 8 No. 5
Talakai, M. (2007), Intellectual property and safeguarding cultural heritage: a survey of practices and protocols in the South Pacific: available at http://www.wipo.int/export/sites/www/tk/en/culturalheritage/casestudies/talakai_report.pdf (accessed 5 June 2010)
UNESCO (2003), UNESCO Charter on the Preservation of Digital Heritage: available at http://portal.unesco.org/ci/en/ev.php-URL_ID=13367&URL_DO=DO_TOPIC&URL_SECTION=201.html (accessed 30 January 2009)
Warschauer, M. (2003), "Technology and social inclusion: rethinking the digital divide", The Electronic Library, Vol. 20 No. 1, 7-13
19 APEnet: A MODEL FOR INTERNET BASED ARCHIVAL DISCOVERY ENVIRONMENTS
Angelika Menne-Haritz
Mission
The APEnet project, funded by the European Commission, started in January 2009. Its aim is to create a place on the Internet that provides joint access to archival information from European countries, supporting research across the holdings of all archival institutions in Europe that wish to be accessible there. Borders between European countries have often changed over the course of history. States have merged and separated, laying the ground for differences in their development and complicating the relationships between them, whether peaceful or hostile. European countries share a common history, and their differences only make their histories more interesting. A gateway to archives in Europe will therefore provide the opportunity to compare national and regional developments, and to understand better both their singularity and the relation of each to the European identity. The gateway will help visitors to discover new sources and to inspect them directly from home or from their workplace. It will reveal the jigsaw puzzle of archival holdings across Europe in all its diversity, with its pieces fitting together just as they originally emerged from individual communication processes, even when they treated common subjects such as the creation of the common market.
Approach
To achieve this goal the project, which was based on recommendations of the European Board of National Archivists (EBNA), chose a collaborative approach. There will be no central board deciding on or editing content. Instead, the contributing archives gain a common publication platform for archival descriptions, while full responsibility for the content remains with them. That includes the holdings descriptions, digital reproductions, and information on each institution and on the accessibility of the material.
This collaborative platform for archival institutions is conceived as an open research platform for users on the Internet, and the software developed will be provided as open source. The gateway should allow users to refine their questions, to use annotations, to learn more about backgrounds, and to investigate and discover new facts, new relationships and new details that nobody knew before. That is what archives are for. Archives present old records for new insights, and deliberately browsing through descriptive information is an important step in the discovery process.
The union finding aid at the centre
The common publication sphere for the contributing archives is the union finding aid. This name places it in the professional tradition of describing holdings with finding aids, originally in a structure resembling book form. This form has certain undeniable advantages: books preserve stable sequences of their entries and protect single items of information from getting lost. The union finding aid does the same, and in addition it offers the advantages of a shared data environment, like the union catalogues of libraries. Besides fixing a stable sequence of entries, the book form of the original finding aid places them inside a structure of chapters and sub-chapters, like the text of any book. The table of contents at the beginning gives an overview and a first orientation. The finding aid in book form provides, as a whole, information on one complete record group or an essential part of one, whose records are tightly related through common provenance and structured by the original communication processes that produced them. As records in archives emerge from communications between units of an organisation working together to solve common problems, they reflect the processes of their origin as well as the single events which caused their creation. The structures of the finding aids represent the relations between these processes and the various steps involving utterances, demands, responses, proposals and, finally, decisions communicated to the outside world. The structures are inherent in the material; the describing archivist analyses them and drafts them in the form of a table of contents.
Showing the structure is essential, because it gives users the background needed to understand each record: the unarticulated, implicit context of the communications, which was well known to and taken into account by all participants but is unknown to third parties outside these processes. For long periods of time the stability of the paper book has preserved the representation of sequence and relationships, while single units of records could be described according to different aspects of their creation (for example the time, the reason for their creation and the affairs treated) and identified by a call or reference number. The headers of the structural groups, arranged in hierarchical order, aggregate information valid for all groups or single titles underneath. Thus the hierarchy not only structures the whole but also reduces redundancy from top to bottom. With the help of XML-based technologies, these features of analogue finding aids on paper can be preserved in digital finding aids, since XML data take the form of text files, in contrast to the atomised data fields of a relational database. Unlike free text, however, they have a clearly defined structure. XML data can also be combined, compared, re-used in different applications or displays, and integrated into joint access points. While paper finding aids could mainly be used for structured browsing, perhaps with some index terms in an annex, structure and full text search can now be combined so that users can switch between the two whenever they want. Paper finding aids were material things and closed objects with fixed external limits, but once converted into digital format they can be integrated with other finding aids without losing their identity.
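How a header written once can apply to every unit beneath it can be sketched in a few lines of Python using only the standard library. This is a minimal illustration under stated assumptions: the markup is an invented, simplified stand-in (`<findingaid>`, `<group>` and `<unit>` are hypothetical element names, not EAD), and the titles and reference numbers are made up. Each structural header is stored only once, yet every descriptive unit recovers its full context by walking down the hierarchy.

```python
import xml.etree.ElementTree as ET

# Invented sample data in a hypothetical, simplified markup (not real EAD):
# <group> elements are structural headers, <unit> elements are descriptive units.
FINDING_AID = """
<findingaid title="Ministry of Trade">
  <group title="Common market negotiations">
    <group title="Customs union">
      <unit id="B102/1" title="Minutes of working party"/>
      <unit id="B102/2" title="Correspondence with Brussels"/>
    </group>
  </group>
  <group title="Staff matters">
    <unit id="B102/3" title="Personnel files"/>
  </group>
</findingaid>
"""


def units_with_context(xml_text):
    """Yield (unit_id, context) pairs, where context lists every header above the
    unit followed by the unit's own title. Each header is stored only once in the
    markup, yet it applies to all units beneath it."""
    root = ET.fromstring(xml_text.strip())

    def walk(element, path):
        for child in element:
            if child.tag == "group":
                # Descend one level: the group's header joins the inherited path.
                yield from walk(child, path + [child.get("title")])
            elif child.tag == "unit":
                yield child.get("id"), path + [child.get("title")]

    yield from walk(root, [root.get("title")])


for unit_id, context in units_with_context(FINDING_AID):
    print(unit_id, " > ".join(context))
```

In a real EAD finding aid the same idea applies to nested `<c>` components, with the titles held in their `<did>` elements.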
The APEnet gateway project tries to combine the functional advantages to which users of legacy finding aids were accustomed with the new flexibility and availability of Internet techniques. The following text provides insight into some of the issues discussed during the development of the logical model. They relate to a test bed implementation at the time of writing (autumn 2009), which may well differ from the later product version, because discussion will continue and the full technical realisation will take place during 2010. However, the functions described here should remain, because they represent the main principles of the project.
The value proposition
For the contributing archives
The project's aims are described in the Description of Work, the document on which the European Commission based its funding decision. The aims include the construction of a gateway with a union finding aid at the centre. This should be realised in such a way that contributing archives gain an additional means of enhancing their visibility without investing additional effort. The threshold of requirements for participation should be kept as low as possible by offering tools and help for integrating their own data into the gateway. At the same time, the contributing archives should be assured that they keep complete control over their own data, including the freedom to change them or delete them from the central presentation whenever they want. The more the gateway can assure partners that they can gain added value (which may consist of higher standing in their social and political context) with rather little effort on their part, the more willing they will be to contribute their content, and the better the gateway can serve the public by offering search facilities across all available descriptive information.
Providing value to potential users by achieving a critical mass of integrated descriptions and contributing archival institutions is the highest priority. Achieving critical mass, however, depends first and foremost on demonstrating value to those institutions. The union finding aid will be successful if archival institutions experience it as a service offering the archival community a common publication platform that makes their work more visible. A great diversity can be observed in the way partner institutions present their holdings to the public on the Internet. This diversity reflects the variations in the history of archives in their countries and of the administrations which produced the surviving paperwork according to their special needs and the characteristics of the problems they had to solve. This diversity is in itself an important aspect for users of archives and should not be obscured by any technical approach. The Internet presentations of the participating archives nevertheless have much in common. They are all structured in such a way that users may navigate from the more general information on the holdings to the more detailed descriptions. The descriptions of units also consist of a set of similar elements. For each unit a title gives an idea of what can be found there. A reference or call number identifies the unit and allows it to be ordered for consultation from the stacks, as a microfilm reproduction, or from a digital
repository. Dates indicate the time during which the records were created. These are the core elements used in all archival institutions. These common aspects of the archival approaches to Internet presentations of finding aids throughout Europe have their roots in professional archival competences developed over a long period of time and in long traditions of archival training in the different countries. They show that the creation and keeping of records in administrations, and their archiving, follow certain common methods independent of the content they reflect. The gateway can therefore be built on these common structures and, in this way, represent the richness and diversity of the holdings of the contributing archives.
For end users
Users of archives may be researchers, students, teachers, journalists, or ordinary citizens who want, for instance, to trace the history of their family or of the place where they live. Their needs and research methods differ. Together they constitute a general public interested in using the archives without necessarily belonging to clearly definable target groups. Several national archival laws provide for a general right of everybody to access archival holdings. Accordingly, archival material whose creation was paid for by citizens' taxes is considered to be in the public domain, free of charge and free of copyright restrictions. This may not, however, include the right to receive reproductions free of charge, and access may be restricted for specific forms of material such as photographs and films. The gateway will fulfil the expectations placed in it if it supports research into and investigation of archives better than legacy paper finding aids in isolated archival institutions do. A link to the providing archives and their presentations will be offered as an option at every step of the research process.
The added value for archival users offered by the gateway consists of:
– The provision of an overview of archival material throughout Europe.
– The discovery of relationships between archival fonds¹ from different countries.
– The identification of material for research questions in all relevant archival institutions.
– The investigation of archival descriptions independently of time and place.
– The ability to do research in digitised archival material in context, if only to get an impression of how much effort has to be invested in exploring the original material.
These benefits need strong support for users in the form of a well-designed, usable gateway. The main design requirements are:
– Let users always know where they are.
– Give as much orientation as possible in an unobtrusive way, making optimal use of layout, symbols and icons rather than presenting the content in text form.
– Allow users to choose between different strategies, such as full text search, navigation through structures, browsing and index search, at any moment without losing the search path.
– Make all available options clear at a given moment, including their effects, and allow the user to undo them.
– Provide options for printouts, shopping lists and emailing results.
¹ "Fonds" is the European term and "record group" is the American term. Archival fonds or record groups are the naturally grown entities or bodies. "Archival collections" is a special term for material artificially brought together by the archive as a collector.
The union finding aid is therefore different from Internet search engines. It does not lead users to unknown places somewhere else, but invites them to stay, feeling comfortable, until they find what they are looking for or know for certain that it cannot be found there. It thus presents itself more as an investigation platform, which allows users to reformulate their questions ever more precisely and to learn more about the material available.
The presentation
The structure of the union finding aid
The union finding aid will give access to all descriptions submitted within an overall structure. Each part of the information has its well-defined place, which shows its context. All data from archival institutions are presented for integrated searching in deliberately selected combinations and for structured navigation, both in the same presentation, without the need to change search engine. As the data integrated into the union finding aid are delivered in formats that comply with the international professional standards, the presentation can be based technically on XML, which allows flexible construction of layout and linkages. The standards used are Encoded Archival Description (EAD), Encoded Archival Context (EAC), Encoded Archival Guide (EAG) and Metadata Encoding and Transmission Standard (METS); they are formulated as XML schemas for files containing special archival information packages. Subsets of these standards have been chosen as profiles for this application. Data structured according to these subsets take the form of one file per finding aid, per holdings guide and per set of information about an institution, each validated against these schemas. The files can be worked on locally without direct access to the central installation and uploaded when finished.
This includes the conversion of exports from local archival systems and the production of corresponding packages of interrelated HTML files with the help of an editor. The necessary linking between the different files for the presentation is done automatically after they are uploaded and indexed on the host. The presentation of the union finding aid consists of three distinct but related layers forming an open architecture. The top layer shows the archival landscape, with all contributing institutions ordered either by region and country or by type of archival institution. This layer is linked to a separate page containing general information on each institution, such as opening hours and addresses. Clicking on the name of an institution opens its holdings guide, situated on the middle level. The guide contains collection-level descriptions of all record groups or fonds in the institution and may be expanded by a separate page giving information on individual records creators, called up from the corresponding description. The name of each record group is linked to the corresponding online finding aid on the bottom level, if there is one. The finding aid and the holdings guide are presented with a clickable table of contents in a left-hand frame and a page header indicating the classification header of the group just opened. The main part is the structured list of descriptive units, which may be linked to digital reproductions or other images on the archive's own web server. These three layers offer several functionalities that are useful for research across the whole.
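The automatic linking of the three layers after upload might look roughly like the following Python sketch. It assumes, hypothetically, that the host keeps three simple indexes built from the uploaded files (institutions by repository ID, the fonds listed in each holdings guide, and finding-aid files by fonds ID); all IDs and file names are invented. Because the links are derived only from what has been uploaded, a fonds without an online finding aid simply has no link to the bottom layer.

```python
def build_links(institutions, holdings_guides, finding_aids):
    """Derive the links between the three layers from the uploaded files alone.

    institutions:    repository ID -> institution name       (top layer)
    holdings_guides: repository ID -> fonds IDs in its guide (middle layer)
    finding_aids:    fonds ID -> finding-aid file name       (bottom layer)
    """
    links = {}
    for repo_id, name in institutions.items():
        # Top -> middle: each institution opens its own holdings guide.
        fonds_in_guide = holdings_guides.get(repo_id, [])
        # Middle -> bottom: each fonds points to its finding aid, or to None
        # when no online finding aid has been uploaded for it.
        links[name] = {fonds_id: finding_aids.get(fonds_id) for fonds_id in fonds_in_guide}
    return links


# Hypothetical IDs and file names, for illustration only.
institutions = {"XX-001": "Example State Archive"}
holdings_guides = {"XX-001": ["F1", "F2"]}
finding_aids = {"F1": "XX-001_F1.xml"}
print(build_links(institutions, holdings_guides, finding_aids))
# {'Example State Archive': {'F1': 'XX-001_F1.xml', 'F2': None}}
```

Since the links are recomputed from the current contents of the host rather than edited by hand, an archive that uploads or deletes a file changes its part of the presentation without any central editorial step.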
[Figure: Union Finding Aid – the architecture: relations inside the Union Finding Aid]
Top layer: structured Archival Landscape (EAD) with a list of all contributing archival institutions and with links to information on the archival service (EAG: address, opening hours etc.) and to its Holdings Guide
Middle layer: structured Holdings Guides (EAD) for each archival service with information on each fonds and its provenance and with links to more information on records creators (EAC: list of names and competencies) and to Finding Aids
Bottom layer: structured Finding Aids (EAD) for each fonds with information on the descriptive units and with links to Digital Archival Objects (METS) of images and to transcriptions for full text search
Besides offering an associative approach to research, this architecture allows users to select parts of the whole presentation for a subsequent, more specific search of relevant areas, so that they can gauge the relevance of later search results before starting the search. The architecture is also very flexible, both for later changes in the display and for decentralised maintenance and updating of the content. If uploaded data conform to the profiles of the XML standards used for the HTML transformation, they find their place on upload and indexing and are integrated into the union finding aid at once.
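A conformance check of this kind, run before a file is indexed, could be sketched as below. Real validation would be done against the XML schemas of the chosen profiles, for instance with a schema-aware library such as lxml, since the Python standard library has no XML Schema validator. Here, as a simplifying assumption, the "profile" is just a hypothetical list of required element paths borrowing EAD 2002 element names, and namespaces are ignored. A file with nothing missing would be accepted and integrated at once. Otherwise the archive gets back a list of what is missing.

```python
import xml.etree.ElementTree as ET

# Hypothetical profile: element paths a file must contain before it is indexed.
# Element names are borrowed from EAD 2002; namespaces are ignored for brevity.
REQUIRED_PATHS = ["./eadheader/eadid", "./archdesc/did/unittitle"]


def missing_for_profile(xml_text, required=REQUIRED_PATHS):
    """Return a list of problems; an empty list means the file can be integrated."""
    try:
        root = ET.fromstring(xml_text.strip())
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]
    return [path for path in required if root.find(path) is None]


sample = "<ead><eadheader><eadid>XX-001-F1</eadid></eadheader><archdesc><did/></archdesc></ead>"
print(missing_for_profile(sample))
# ['./archdesc/did/unittitle']
```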
The content
The gateway provides an infrastructure for the publication of descriptive data from all participating archival institutions. The added value for them lies in the cross-border research possibilities created with content from other archives. The main principle is that only the data needed for the overall search and for the display of results and hits should be brought to the central host. All data intended for display but not needed for search on the host can be linked from the home servers, including images from digitised records. No data will be used purely for searching that are not also used for the presentation of the results. The definitions of the target profiles of the data formats describe what is actually used for search and presentation on the host. From any point of the presentation, the user may be redirected to the original presentation of the contributing archive, where more and enriched information may be shown.
Redirecting users to the providing institutions
Users may be redirected to the home presentation at any point, for example from the title page of an individual finding aid or from a descriptive unit. Any finding aid encoded in the EAD standard contains an element for its original Internet address. The standardised ID of the archival institution, combined with an ID of the finding aid and a further ID of the individual descriptive unit, can ensure these links. These IDs can be entered automatically into the local files during conversion with the tools the project will deliver to the institutions. The information for the links is completely controllable by the archives themselves and can be changed by them in batch mode, for example if the addresses of the home presentation change.
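Combining the three IDs into a link, with the base address under the archive's own control, can be sketched as follows. The URL pattern (`<base>/<finding-aid ID>#<unit ID>`), the repository ID `XX-001` and the example.org addresses are all hypothetical. The point is only that the IDs stay stable while the base address is kept in one place, so changing it once re-targets every link of that archive. That is one possible way to realise the batch change described above.

```python
from urllib.parse import quote


def home_link(base_urls, repo_id, finding_aid_id, unit_id=None):
    """Combine the repository ID, the finding-aid ID and (optionally) the unit ID
    into a link back to the archive's own presentation.
    The URL pattern <base>/<finding-aid ID>#<unit ID> is hypothetical."""
    base = base_urls[repo_id].rstrip("/")
    url = f"{base}/{quote(finding_aid_id, safe='')}"
    if unit_id is not None:
        url += "#" + quote(unit_id, safe="")
    return url


# Base addresses are maintained by each archive; the IDs themselves stay stable.
base_urls = {"XX-001": "https://archive.example.org/findingaids"}
print(home_link(base_urls, "XX-001", "F1", "F1-0003"))
# https://archive.example.org/findingaids/F1#F1-0003

# Batch change: the archive moves its presentation and every link follows.
base_urls["XX-001"] = "https://new.example.org/fa/"
print(home_link(base_urls, "XX-001", "F1", "F1-0003"))
# https://new.example.org/fa/F1#F1-0003
```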
The search possibilities
Different search methods can be offered with the data provided by the archives. The following description concentrates on what is possible with the existing data. Other possibilities may be conceivable in the light of new Internet technologies. The gateway will, however, be based on the data that are available; it is not the intention to oblige institutions to capture and provide special data for the gateway. With XML as the technical basis, it is possible to combine a full text search targeted at certain elements with a structured presentation of the same material, and to offer different search strategies at the same moment inside the same presentation. The units of retrieval are the descriptive units, and these are shown as results. The search methods which can be combined include:
The hierarchical approach
With multi-level description, the path to access the archives is laid down by the holdings guides, which present the contents of all the record groups held by an institution as a higher-level description, and by the finding aids, which describe the units of each record group. These can be as precise as needed and can relate to the digitised documents. The overall architecture, from the archival landscape at the top to the descriptive units at the bottom, allows users to move through the whole body of information using the relations as paths, always keeping their bearings so that they can easily find the way back or go to other areas.
Skimming or thumbing through the finding aids
Browsing within the holdings structure, like turning the pages of a printed book, can give users a better overview of what is available. In this way they can discover things related to the descriptions they have just seen, perhaps found with a full text search, without having to name exactly what they are interested in. This is a way to proceed with archival investigations associatively, by trial and error.
Searching with index terms
If archival institutions provide index terms with their descriptions, entered during the processing of the holdings, these may be of high value to searchers, because they are more relevant than free search terms and have been chosen by the archivist for specific descriptive units. They can be linked directly to the places they refer to. In the presentation, this means that clicking on the reference attached to an index term opens the finding aid at the corresponding descriptive unit.
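Linking archivist-assigned index terms directly to descriptive units amounts to an inverted index from each term to its (finding aid, unit) targets. The Python sketch below is a minimal illustration. The record layout, the case-folding of terms and the sample IDs are assumptions made for the example.

```python
from collections import defaultdict


def build_term_index(units):
    """units: iterable of (finding_aid_id, unit_id, index_terms) as supplied by
    the archivist. Returns term -> list of (finding_aid_id, unit_id) targets."""
    index = defaultdict(list)
    for finding_aid_id, unit_id, terms in units:
        for term in terms:
            # Case-fold so that "Brussels" and "brussels" share one entry.
            index[term.casefold()].append((finding_aid_id, unit_id))
    return dict(index)


def lookup(index, term):
    """Return the (finding aid, unit) targets that a click on this term should open."""
    return index.get(term.casefold(), [])


# Invented sample data for illustration.
units = [
    ("F1", "F1-0001", ["Customs union", "Brussels"]),
    ("F1", "F1-0002", ["Brussels"]),
    ("F2", "F2-0010", ["Customs union"]),
]
index = build_term_index(units)
print(lookup(index, "Brussels"))
# [('F1', 'F1-0001'), ('F1', 'F1-0002')]
```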
The full text search
The full text search covers all included material, using search terms chosen by the user. It can be filtered or use Boolean operators, and it is directed at the descriptive units. The search can be done in such a way that it includes the upper hierarchy of the finding aid and integrates this information into the results: the descriptive units inherit the information in the headers of the structure, so if a term is found in a header, all descriptive units underneath it will be among the results. The entrance page of the union finding aid includes a Google-like search box where search terms can be entered, alongside an expandable structure of the archival landscape. A search can be started at once. Filters are offered on the same page, since user studies by libraries have found that expert or advanced search features offered on separate pages are rarely used.
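The inheritance rule for full text search, where a term in a header makes every unit underneath it a hit, can be sketched together with simple Boolean AND/OR filtering. Assumptions: each unit is given as a hypothetical tuple of its ID, the list of headers above it and its own text. Matching is on single, case-folded words, and the sample data are invented. A production search engine would use a proper full text index rather than this linear scan.

```python
import re


def tokens(text):
    """Split text into a set of case-folded words."""
    return set(re.findall(r"\w+", text.casefold()))


def search(units, all_of=(), any_of=()):
    """units: list of (unit_id, headers, text). A unit matches on its own text
    OR on the headers it inherits. all_of terms are ANDed; any_of terms are ORed."""
    required = {term.casefold() for term in all_of}
    optional = {term.casefold() for term in any_of}
    hits = []
    for unit_id, headers, text in units:
        words = tokens(" ".join(headers) + " " + text)
        if required <= words and (not optional or optional & words):
            hits.append(unit_id)
    return hits


# Invented sample data for illustration.
units = [
    ("F1-0001", ["Common market negotiations", "Customs union"], "Minutes of working party 1957"),
    ("F1-0002", ["Common market negotiations", "Customs union"], "Correspondence with Brussels 1958"),
    ("F1-0003", ["Staff matters"], "Personnel files 1960"),
]
print(search(units, all_of=["customs"]))              # term occurs only in a header
print(search(units, all_of=["customs", "brussels"]))  # Boolean AND
```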
Display of the results and relevance ranking
The results of a full-text search are displayed in their context, while the display concentrates on the descriptive unit that was found. Two principles are respected:
– The structure can be used to give context, because archival information is interrelated and the surrounding information explains the actual hits.
APEnet: INTERNET BASED ARCHIVAL DISCOVERY ENVIRONMENTS
– Search engines often rank results automatically. Researchers working with archival descriptions and archival material, however, are well able to judge relevance themselves by using the structures and relations within the material. In doing so they can make their findings transparent to the readers of their scholarly publications.
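To illustrate the first principle, here is a hedged sketch that shows each hit together with the path of headings above it. It assumes a simple in-memory tree. The names `path_to` and `show_in_context` are hypothetical. In line with the second principle, it performs no relevance ranking and keeps the hits in the order supplied.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    uid: str
    title: str
    children: list[Node] = field(default_factory=list)


def path_to(node: Node, uid: str) -> list[str] | None:
    """Return the titles from the top level down to unit `uid`,
    or None if the unit is not in this tree."""
    if node.uid == uid:
        return [node.title]
    for child in node.children:
        sub = path_to(child, uid)
        if sub is not None:
            return [node.title] + sub
    return None


def show_in_context(root: Node, hit_ids: list[str]) -> list[str]:
    """Render each hit with its position in the hierarchy. Hits keep the
    order given; judging relevance is left to the researcher."""
    lines = []
    for uid in hit_ids:
        path = path_to(root, uid)
        if path is not None:
            lines.append(" > ".join(path))
    return lines


# Hypothetical finding aid.
tree = Node("F1", "Ministry of Transport", [
    Node("S1", "Railways", [Node("u1", "Timetables 1901")]),
])
print(show_in_context(tree, ["u1"]))
# ['Ministry of Transport > Railways > Timetables 1901']
```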
Viewing the results
Because the search runs over the indexed full texts of the finding aids, the complete finding aids can be opened with each result shown in its place in the text. The results list leads directly to the places where the terms occur in the finding aids. These open with the search term highlighted and can be navigated and read like e-books.
The conversion engines
Conversion engines are an essential part of the gateway's infrastructure. They make its main principle work: responsibility for the content stays with the providers, and the host is restricted to collecting, indexing and presenting data. The conversion engine is the key tool that gives contributing archives full control over their own data and its presentation in the gateway. Contributing archives supply their descriptions either in EAD or as database exports conforming to ISAD(G). Conversion from ISAD(G)-based databases transforms the local data into the format of the target profile. It also adds supplementary information such as the repository ID and a language tag. If descriptive data exist at collection level, they can be converted into the target EAD profile for holdings guides and linked to the finding aids. If they do not, the EAD profile for the holdings guide can be used to edit an abbreviated structure of the holdings. Information on the archival services covers opening hours, directions for reaching them, and further details as necessary. It can be captured and edited with the conversion engine, which encodes the data automatically in EAG on the basis of the EAG profile. If information on record creators exists in a database conforming to EAC or ISAAR (CPF), it can be translated into the target EAC profile and integrated into the presentation. There it is linked automatically to the corresponding entry in the holdings guide.
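As an illustration of what a conversion engine does with a database export, the sketch below maps one ISAD(G)-style row to an EAD 2002 `<c>` component and adds a repository code and a language code. It rests on several assumptions. The row keys, the sample values and the code `XX-12345` are hypothetical. Carrying the supplementary information in `<unitid repositorycode>` and `<langmaterial><language langcode>` is one plausible mapping. The actual APEnet target profile may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical export row from a local ISAD(G)-based database;
# comments give the ISAD(G) element each field corresponds to.
row = {
    "reference_code": "R 15/1234",              # 3.1.1 Reference code
    "title": "Correspondence on railway works",  # 3.1.2 Title
    "dates": "1920-1925",                        # 3.1.3 Dates
    "level": "file",                             # 3.1.4 Level of description
    "scope_content": "Letters and memoranda.",   # 3.3.1 Scope and content
}


def isad_row_to_ead_component(row, repository_code, lang_code):
    """Map one ISAD(G)-style row to an EAD 2002 <c> element, adding the
    repository code and an ISO 639-2 language code supplied by the archive."""
    c = ET.Element("c", level=row["level"])
    did = ET.SubElement(c, "did")
    unitid = ET.SubElement(did, "unitid", repositorycode=repository_code)
    unitid.text = row["reference_code"]
    ET.SubElement(did, "unittitle").text = row["title"]
    ET.SubElement(did, "unitdate").text = row["dates"]
    langmaterial = ET.SubElement(did, "langmaterial")
    ET.SubElement(langmaterial, "language", langcode=lang_code)
    if row.get("scope_content"):
        scope = ET.SubElement(c, "scopecontent")
        ET.SubElement(scope, "p").text = row["scope_content"]
    return c


component = isad_row_to_ead_component(row, "XX-12345", "ger")
print(ET.tostring(component, encoding="unicode"))
```

A batch mode would apply the same function to every row of an export and wrap the components in the finding aid's `<dsc>`.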
Single digital images may be stored on the same server or on a separate one, such as an image server belonging to the contributing archives. The same applies to sets of image files combined for each descriptive unit into a digital archival object described by a METS file. For archives without their own facilities, the host may offer to store the files in low resolution.
Angelika Menne-Haritz
In summary, the conversion engines can be used for:
– transforming local data formats of descriptive information into the target profile, including preparing files for upload or harvesting;
– integrating the parameters needed for the central presentation and the linking elements for related files;
– working in batch mode on whole groups of finding aids;
– transforming or capturing data for the holdings guide, the information on the archival services and the information on record creators;
– transforming, or editing and preparing, data about the digital archival objects linked to the descriptive information in the presentation.
These conversion tools are an important pillar of the union finding aid concept. They give full responsibility for the content to the decentralised archival institutions, and so reduce the resources needed to maintain the host.
Data management
Many Internet pages that offer access to data from different sources, including relational databases, use XML for the presentation. The main intention is to separate business structures from presentation structures. The data can then be re-used for different operations without interfering with their main storage. In the same way, XML is used as a format for exchanging data between relational databases. Over the years, developers have devised many strategies and frameworks to help separate business logic from presentation logic. Once separated from the business processes, the form and layout of the presentation can more easily be adapted to new needs. The union finding aid includes no business processes and can therefore work directly with the delivered XML data without transforming them. The XML files can be indexed directly, like other text files, while units of retrieval can still be defined within their structure.
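The point that delivered XML can be indexed directly, with units of retrieval defined inside its structure, can be sketched with Python's standard `xml.etree.ElementTree`. Each `<c>` element is treated as a unit of retrieval, and its own `<did>` text goes into a small inverted index. The sample file and the function `index_units` are hypothetical. The sample omits the EAD namespace (`urn:isbn:1-931666-22-9` in EAD 2002), which real files would normally declare. A production system would use a dedicated indexer such as Lucene rather than a dictionary.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, namespace-free EAD fragment as delivered by an archive.
EAD_SAMPLE = """<ead>
  <archdesc level="fonds">
    <did><unittitle>Ministry of Transport</unittitle></did>
    <dsc>
      <c id="c1" level="series">
        <did><unittitle>Railways</unittitle></did>
        <c id="c2" level="file"><did><unittitle>Timetables 1901</unittitle></did></c>
      </c>
    </dsc>
  </archdesc>
</ead>"""


def index_units(xml_text):
    """Index each <c> element as a unit of retrieval, straight from the
    delivered XML: nothing is migrated into database tables."""
    root = ET.fromstring(xml_text)
    index = defaultdict(set)
    for c in root.iter("c"):
        # Only the unit's own <did>, so nested components stay separate.
        did = c.find("did")
        text = " ".join(did.itertext()) if did is not None else ""
        for word in re.findall(r"\w+", text.casefold()):
            index[word].add(c.get("id"))
    return index


idx = index_units(EAD_SAMPLE)
print(sorted(idx))
```

Because the file is read as delivered, an updated finding aid from the archive only needs to be re-indexed, not re-converted.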
An efficient alternative to storing the data in a relational database or a file system may be a dedicated repository system. Such a system ingests, stores and disseminates data in EAD or other standard formats, as well as images or digital archival objects with their own metadata, within an architecture following the OAIS reference model. From the beginning it was planned that the union finding aid should be based on EAD, which was already the main format for international cooperation and joint access points. The family of standards that has since grown up around EAD is technically bound to XML, and EAD is already widespread in Europe. First steps towards interoperability with Europeana have tended to confirm the original decision to use EAD and its related standards.
METS is an XML schema that can be linked to EAD. In the union finding aid concept, METS files link to single image files and to the corresponding text transcriptions. All the files representing one descriptive unit are thereby composed into one digital archival object. The integrated files may be images, audio, video or others. Because of its openness, METS is also used to organise files as packages for transfer between the different areas of the OAIS reference model in the repository. METS files can be maintained in the same way as EAD files, in a file system or a repository. The METS files control the display of digitised images or corresponding texts, which are stored separately. In the union finding aid, these METS files may reside on decentralised servers of the content provider, where the central presentation reaches them through pointers.
Sustainability planning
Since archivists are trained to think in long periods of time, the gateway for archives was conceived from the beginning so that it could be maintained with no more resources than necessary. It should be affordable for the host, and should therefore be designed to keep the supplementary costs of the central installation as low as possible. If the contributing archives need to invest further resources, these will be financed separately. Resources are then spent under their direct control and for the aims they themselves want to achieve. The concept therefore sees the contributing archives as the active players in maintaining the portal, using the union finding aid for their own purposes. The host acts as their service provider and maintains the technical framework. During the project phase, the costs of the central host are covered by the project budget. Costs will nevertheless be monitored. If they exceed the amount provided and cannot be covered by the host country alone, the excess will be borne by the other project members through an appropriate contribution model. For the operational phase, the gateway infrastructure is to be integrated into the infrastructure of the national archives providing the hosting facility, on the rationale that the technical model requires very few maintenance resources. Costs are kept low by a three-pillar model combining centralised and decentralised contributions.
The pillars are:
– the host with the presentation system, which will remain fairly stable after its installation phase;
– the decentralised mapping and conversion tools, which are entirely in the hands of the contributing archives as adaptable open-source tools that they can also use for their own offline presentations;
– the data integration and interoperability functionality, which links decentralised data preparation to central presentation so that the whole works as a collaborative publication platform.
The resources required for the presentation itself are reduced by keeping the installation simple. To this end it will itself be a so-called tripod system. Its three legs are the files delivered by the archival content providers, an open-source indexer such as Lucene, and XSLT stylesheets that transform the XML files directly into HTML. The XML files are kept as they are delivered rather than migrated from one format to another, which is an important factor in reducing complexity and keeping costs down.
Summary
– The value proposition of the business model is to enhance the value of each archive's holdings through the additional services provided by the union finding aid.
– The content owners remain responsible for their content. They can keep their data in its authentic form and update it whenever necessary.
– The technical model is based on three pillars: the presentation, the conversion engines and the upload functionality.
– Content providers gain visibility with a low entry threshold. This gives them an incentive to provide their data, so that a critical mass of content can be achieved.
– For users, the presentation supports the investigation and discovery of unknown facts and the verification of interpretations. It also offers different ways of proceeding without losing orientation.
20 THE CALIFORNIA DIGITAL LIBRARY
Gary S. Lawrence
The California Digital Library (CDL) was conceived as a strategic component of the ten-campus University of California system and its libraries. The CDL is therefore inextricably linked with the University and its ten campus libraries, and with their governance, budgeting and operations. Rather than operating as a self-contained digital library, the CDL serves the UC system as an essential link between the print and digital worlds. It develops and provides access to digital collections and services for the UC system, facilitates systemwide access to the rich print holdings of the campus libraries, and helps to explore the potential synergies between the print and digital environments. To understand the business planning factors that influenced the development and operation of the CDL, then, it is first necessary to say a few words about the University of California and its libraries.
The Libraries of the University of California
The University of California library system comprises more than 100 libraries on the ten UC campuses, supporting the University's teaching, research and service missions. Collectively, the UC libraries make up one of the largest research/academic libraries in the world, with nearly 36 million volumes in their holdings and significant digital collections. The University is a single corporate entity, but in practice the ten campuses operate as a federated system. As a result, the campus libraries are accountable to their campus Chancellors and can operate largely independently of each other. However, for more than 25 years the UC libraries have chosen to work together to realize a vision in which the collections available at any one library are available to the patrons of all.
A significant landmark in the University's collaborative strategy for libraries was a formal comprehensive library plan published in 1977. In it the University made strategic use of emerging technologies, including an online union catalogue known as Melvyl®. The plan also provided support for the automation of circulation and cataloguing operations, and shared physical infrastructure in the form of two regional library facilities for storing little-used materials (University of California, 1977). These innovations leveraged the libraries' resources to improve efficiency and service to users while containing costs. The strategic emphasis on multi-campus collaboration, the application of new technology, and expanded University-wide sharing of the information resources in UC library collections has proved successful. It has used the leverage available to a multi-campus system of strong and distinguished institutions to maintain high-quality research collections and services in the face of rising costs and other challenges to traditional library models.
Despite these accomplishments, by 1996 the quality of the collections had been significantly eroded by the combined and cumulative effects of unfunded inflation in the costs of library materials and of growth in enrolments and academic programmes. These problems were exacerbated by significant cuts to the University budget beginning in 1990-1991. To respond to these pressures, the Library Planning and Action Initiative (LPAI) was launched in September 1996. The report of its advisory task force, released in March 1998, ushered in a further period of library collaboration, focused on the shared development of digital collections and the further application of technology to enhance library services (University of California, 1998). Specifically, the LPAI made seven recommendations to achieve comprehensive access to scholarly and scientific communication for all members of the University community:
“1. UC should seek innovative and cost-effective means to strengthen resource sharing.
2. UC should establish the California Digital Library.
3. UC should sustain and develop mechanisms to support campus print collections.
4. UC should seek mutually beneficial collaboration with libraries, museums, other universities and industry.
5. UC should develop an information infrastructure that supports the needs of faculty and students to disseminate and access scholarly and scientific information in a networked environment.
6. UC should lead the national effort to transform the process of scholarly and scientific communication.
7. UC should organize an environment of continuous planning and innovation.”
The California Digital Library
As one of the seven strategic directions recommended by the LPAI, the CDL joined an existing portfolio of system-wide library services as a key element of UC's long-term strategic approach. That portfolio included the Melvyl Catalog, the regional storage facilities and expedited intercampus print resource sharing services.
From its very beginnings, then, the CDL has operated not solely as an independent library, but as a partner with the ten campus libraries and in mutually reinforcing interaction with other system-wide services. The role of the CDL within its broader institutional context of rich interaction and interdependence with other components of the library system is reflected in its founding vision and mission, as set out in the final report of the LPAI Advisory Task Force:
“To provide leadership in support of a vision that integrates digital technologies into the creation of collections and improved access to information and to guide the transition to increasingly digital collections, the University should establish the California Digital Library (CDL). As the key strategic initiative for meeting the challenges facing our libraries, the CDL will have responsibility for providing new services and extending existing ones to successfully transform our libraries over the next decade.
An integral strategic component of the library system and a collaborative effort of all nine campuses, the digital library should comprise a number of key elements that support and sustain the University’s teaching and research mission:
– High-quality electronic knowledge resources
– Personal communication tools to create, share, manipulate, store, and use information
– An effortless network interface for dissemination of and access to the world’s knowledge
– Distributed resources and services integrated at the point of use
To accomplish its goals, the CDL … should:
– license, acquire, develop, and manage electronic (digital) content in support of campus academic programs,
– facilitate access to the collection,
– support digitization of paper-based material,
– establish policies and procedures for archiving digital content,
– encourage and support new forms of scholarly and scientific communication,
– and assist campuses in providing user support and training.
The initial focus of the CDL should be on the information needs of UC students and faculty. To meet their needs, it will provide access to digital information, relieve pressures on print collections, and develop systems that encourage and enable the campuses to coordinate and share their print and digital resources. Ultimately the CDL should build the partnerships that will allow the University to deliver information to all Californians.”
Consistently with the recommendations of the LPAI, the CDL was established in 1997 and began operation in 1999 with a mission which included:
“ … a wholly digital charter and two complementary but distinct roles. As an arm of systemwide library planning, CDL supports the University of California libraries in their mission of providing access to the world’s knowledge for the UC campuses and the communities they serve. In so doing, it directly supports UC’s mission of teaching, research, and public service.
The CDL also maintains its own distinctive programs emphasizing the development and management of digital collections, innovation in scholarly publishing, and the long-term preservation of digital information. The CDL serves these audiences, in the following order of priority, to fulfill its dual missions:
– the UC libraries;
– the broader UC community; and
– external constituencies and the general public” (California Digital Library, 2008).
Governance
Consistent with its strategic role as a service to the UC community, the CDL has from its inception been overseen and guided by:
– A senior council composed of key stakeholders throughout the UC system. Its members include executive and research vice chancellors, academic deans, budget administrators, leaders of the Academic Senate, disciplinary faculty, campus and system-wide information technology administrators, key legal and policy advisors, library directors and library staff. The Systemwide Library and Scholarly Information Advisory Committee (SLASIAC; http://www.slp.ucop.edu/consultation/slasiac/) reports to the University's Provost and Senior Vice President for Academic Affairs. It is chaired by a campus Executive Vice Chancellor and operates under a broad charge to:
“…[advise] the University on system-wide policies, plans, programs and strategic priorities related to the acquisition, dissemination, and long-term management of the scholarly information, in all formats, created by or needed to support UC’s world-class teaching and research programs. This charge includes, but is not limited to, advising on systemwide long term planning for the UC libraries including the 10 campus libraries and the California Digital Library (CDL), strategies that will enhance and facilitate the transmission of scholarly and scientific communication in a digital environment, and legal, legislative, regulatory and policy issues that influence the effective provision of scholarly information services” (University of California, 2006).
– An informal council composed of the University Librarians of the ten UC campuses and the CDL. It collaboratively develops and reviews operating policies and practices for the UC Libraries, identifies and plans new library initiatives, and advises the CDL on its plans, priorities and operations. It also oversees the coordination of system-wide and multi-campus library programmes and services.
The University Librarians, in turn, are supported by an extensive network of system-wide library staff groups organized under the Systemwide Operations and Planning Advisory Group (SOPAG; http://libraries.universityofcalifornia.edu/sopag/). SOPAG is composed of senior professional library staff with varying portfolios, one from each campus library and one from the CDL, plus a representative of the Librarians' Association of UC.
Programmes and services
The CDL is organized into five core programmatic areas.
– Bibliographic Services: the Melvyl Catalog; the Request service, which allows UC faculty, students and staff to create interlibrary loan or campus document delivery requests from the Melvyl Catalog or article databases; and UC-eLinks, a comprehensive link resolver service exposed in the Melvyl Catalog, Google Scholar and hundreds of article databases.
– Collection Development and Management: Licensed Content services, which provide the UC community with access to more than 24,000 electronic journals, hundreds of thousands of electronic books, and more than 250 article and reference databases containing thousands of records; Shared Print Collections, jointly purchased or electively contributed by the libraries in order to broaden and deepen UC Library collections and achieve economies not available through traditional models of collection development; and Mass Digitization programmes, which seek to digitize the many volumes in the ten UC libraries and make them freely available over the Internet to enhance access to and management of our vast library holdings.
– Digital Preservation: the UC Libraries Digital Preservation Repository, a set of services that support the long-term retention of digital objects for the benefit of the UC libraries and their users; and Web at Risk, one of eight nationally funded National Digital Information Infrastructure and Preservation Program (NDIIPP) projects, which is developing a web archiving service that will enable libraries to capture, curate and preserve web-based information.
– Digital Special Collections: Calisphere, the University of California's free public gateway to more than 200,000 digitized primary sources selected from the libraries and museums of the UC campuses and cultural heritage organizations across California; the Online Archive of California, a single searchable database of finding aids to thousands of primary source materials and their digital facsimiles held in a variety of California institutions; and the UC Image Service, a collection of digital images with search and presentation tools that supports teaching, learning and research for UC faculty and students, providing access to licensed collections and images from UC archives, museums and libraries.
– eScholarship Publishing: the eScholarship Repository, an open-access publishing platform that offers UC academic departments direct control over the creation and dissemination of the full range of their scholarship, including pre-publication materials, journals and peer-reviewed series, postprints and seminar papers; eScholarship Editions, a collection of digital scholarly monographs, including nearly 2,000 books from academic presses on topics such as art, science, history, music, religion and fiction; and the Mark Twain Project Online, a groundbreaking digital critical edition of Mark Twain's works that applies innovative search, display and citation technology to more than four decades of archival research by expert editors at the Mark Twain Project.
In addition, the CDL has five internal service groups. They directly support its five core programmes, as well as the overall management of the broader CDL organization, by providing:
(1) assessment, design and production services;
(2) business services;
(3) information services, which respond to assistance requests from users of CDL's systems and support outreach and instruction programmes;
(4) infrastructure and application support services, which manage CDL's digital infrastructure of computers, network connections, operating systems and security software; and
(5) project planning and resource allocation services.
Further, the CDL shares with the University's senior management and the University Librarians the responsibility for strategic and operational planning for all aspects of the UC Libraries as a federated system. It provides collaborative leadership and support for this function, implementing the LPAI recommendation that “UC should organize an environment of continuous planning and innovation”.
Business Case
As a strategic component of the UC system, the CDL is largely supported by budgeted allocations of permanent funds within the context of the University's overall budgeting strategies, priorities and methods. The core budget of the CDL is composed chiefly of the following elements:
– An initial permanent allocation of $1 million from University discretionary funds, arranged in conjunction with the launch of the CDL in 1997.
– Permanent allocations totalling $7 million from the State of California over a three-year period beginning in the 1998-1999 fiscal year. These complemented the University's initial investment and provided basic operational support for the CDL. They also supported initiatives in resource sharing, scholarly communication and other priorities set out by the Library Planning and Action Initiative.
– Prior permanent allocations from the State to support the Melvyl Catalog, totalling about $5 million.
– An additional allocation of University discretionary funds in 2000-2001, totalling about $1 million, to support the establishment and development of CDL's eScholarship Publishing services.
– Voluntary co-investment of funds from the ten campus libraries, currently totalling about $18.5 million annually, chiefly to support the cost of the shared Licensed Content collections; the CDL contributes about $5.5 million annually from its permanent budget to licensing costs.
– Varying amounts of extramural (chiefly grant) funding to support CDL's portfolio of research and development activities. These activities explore and develop new technology-enabled services and experiment with new technologies that could be applied to existing services. They are frequently carried out in partnership with the UC campuses and a variety of external partners.
Like a traditional library, the CDL attempts to maximize the value of its portfolio of services within a relatively fixed budget.
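The budget elements listed above can be tallied as a rough check. The sketch below adds the figures as stated in the text, in millions of US dollars, under several assumptions. It treats the "about" amounts as exact and leaves out the varying extramural funds. It also counts all campus co-investment as licensing money, although the text says it is only "chiefly" for that purpose. The totals are therefore approximate and are not a reconstruction of the CDL's actual budget.

```python
# Figures as stated in the text, in millions of US dollars (approximate).
permanent_allocations = {
    "University discretionary funds at launch (1997)": 1.0,
    "State of California, three years from 1998-1999": 7.0,
    "Prior State allocations for the Melvyl Catalog": 5.0,
    "University funds for eScholarship (2000-2001)": 1.0,
}
total_permanent = sum(permanent_allocations.values())

# Annual licensing money: campus co-investment plus the CDL's own share.
campus_co_investment = 18.5
cdl_licensing_share = 5.5
licensing_pool = campus_co_investment + cdl_licensing_share

print(f"Permanent allocations: about ${total_permanent:.1f}M")
print(f"Annual licensing pool: about ${licensing_pool:.1f}M, "
      f"{cdl_licensing_share / licensing_pool:.0%} of it from the CDL budget")
```

On these assumptions, about $14 million in permanent allocations underpins the CDL. The campuses fund most of a licensing pool of roughly $24 million a year. This helps explain why returning value to the campus libraries features so prominently among the CDL's aims.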
Consistently with its strategic and collaborative role in the ten-campus UC system, the CDL consults regularly and extensively with the campus libraries when developing its priorities, designing its services and allocating its budgeted resources. It also seeks guidance from the University-wide community of stakeholders, usually through the Systemwide Library and Scholarly Information Advisory Committee discussed above. In consultation with its partners and users, the CDL pursues activities that:
– help to justify, defend and augment its allocated budget by demonstrating the value returned to the University and the State through their investment in the CDL;
– return value to the ten campus libraries that exceeds the amounts of the libraries' co-investment in CDL-sponsored products and services;
– continue to attract research and development funding from extramural sponsors.
Over the last decade, the CDL has successfully met these challenges. It has delivered both direct financial benefits and less tangible advantages that flow from its position as a leading partner in, and source of central support for, the operations of an interdependent and collaborative library system. Among the direct benefits are:
– Expanded access to a wider variety of scholarly materials through shared licensed collections. Before the CDL, no single UC campus could afford to subscribe to the 24,000 journal titles and 250 reference databases in the CDL's Licensed Content portfolio, either in print or online. If the campus libraries licensed these information resources independently, they would spend an additional $42 million per year.
– The formidable buying power of the UC system provides the leverage needed to obtain information resources at favourable prices and on good licensing terms. The CDL has compared historic trends in the pricing of digital publications with the actual results of its publisher negotiations. On that basis it estimates that the University saves about $2 million annually by negotiating collectively.
– Collective negotiation for digital content converges with the shared print programme to provide additional benefits. The CDL frequently negotiates with publishers to include one print copy of each title in a digital subscription package at no additional cost, as an archival copy added to the shared print collection. The estimated value of these print archives, at published prices, is about $6 million per year.
In addition, the CDL provides essential support for a wide range of University-wide services that involve and depend on campus library operations outside its direct control. In some cases, the breadth of involvement makes it difficult to determine what proportion of the tangible benefits should be attributed to the CDL. In others, the benefits are simply less tangible and therefore harder to measure.
Examples include:
The borrowing and lending of print materials among the UC Libraries is facilitated by a number of systemwide programmes. This covers delivery of photocopies or scanned images in lieu of loan. The programmes include the bibliographic services provided by the CDL (the Melvyl Catalog, the Request user-initiated ILL request service and the UC-eLinks link resolution service) and a centrally funded overnight courier service. Interlibrary borrowing among UC's libraries, which accounts for about 73% of all items borrowed from other libraries, has increased by 142% since 1988-1989. Suppose the campus libraries had been compelled to purchase and add to their own collections the items they were able to borrow from each other via interlibrary loan in 2006-2007. The total purchase cost would have been almost $37 million.
The systemwide technical infrastructure created by the CDL provides a platform on which to build additional value-added services. For example, the bibliographic systems that underpin the Melvyl Catalog also enable the interlibrary lending services discussed above. The system-wide infrastructure also makes it possible to aggregate thousands of finding aids and digital images from UC's special collections, archives and museums in the Online Archive of California, and to make these resources available to the public through the Calisphere service.
The Melvyl Catalog is also a key contributor to the success of UC's regional library storage programme, ensuring that library materials moved to off-site storage are easy for library users to discover, request and receive. The two Regional Library Facilities (RLFs) at Richmond (for northern campuses) and Los Angeles (for southern campuses) began operation in the early 1980s. They currently provide low-cost, high-quality off-campus space housing 11.5 million volumes of infrequently used materials of enduring research value deposited by campus libraries. This allows the University to maintain a rich and distinguished research collection at a fraction of the cost of building equivalent on-campus library facilities. By depositing materials in the regional library facilities, the campuses avoid capital costs of about $16 million per year, on an annualized basis, which would otherwise have been incurred to build on-campus facilities to house these collections.
A newly established shared print collection programme is modelled on the success of the shared digital collection. It allows campuses to purchase single copies of printed material for system-wide use or to assemble high-quality collections from existing campus holdings, avoiding unnecessary and unplanned duplication of collections and expenditures. The CDL's systems and services enable campus libraries to identify candidate materials for the shared print collections readily and ensure continued and reliable systemwide bibliographic access to those materials. Major collections developed through this programme include archival print copies of many of the digital journals acquired through the Licensed Collections programme, and a complete archival print collection of journals digitized by JSTOR. Through this programme, the libraries may avoid subscription costs for print journals of up to $3.5 million per year, and realize additional savings in on-campus shelf space to house those journals.
As a relatively small and focused technology-based organization, the CDL is flexible, agile and efficient in adapting to new technological opportunities on behalf of the entire system.
For example, it routinely outsources many non-core services, such as help-desk software, routine statistics collection and project planning, to best-of-breed external suppliers, and it has leveraged the capabilities of the UC Libraries by outsourcing the library business processes of cataloguing and acquisitions, as well as some data centre services, to UC partner campuses. The CDL also adapts quickly to changes in technology to improve services and contain costs. The initial version of the Melvyl Catalog was locally programmed; the second generation was built upon a vendor solution provided by Ex Libris; and a next-generation discovery service is being explored in partnership with OCLC, Inc. using its WorldCat Local service. Similarly, eScholarship Publishing services are built on a platform developed in collaboration with Berkeley Electronic Press (bepress).

The specialized expertise assembled by the CDL in support of its own mission is also available to, and used by, the UC campuses in the development of their individual and collaborative projects. This pool of expertise is also a wellspring for the development of standards and best practices that are needed for the effective operation of the UC Libraries and are of value to the worldwide library community. Among the important standards efforts to which the CDL has made significant contributions are Encoded Archival Description (EAD), Archival Resource Key (ARK), and eXtensible Text Framework (XTF).

Both the service capabilities of the CDL and the system-wide collaborative environment that the CDL helps to foster have enabled dramatic new initiatives involving all the UC Libraries in close collaboration with external partners. UC’s participation in the Google Books programme and in the mass digitization efforts of the Open Content Alliance has been facilitated by the readily accessible assemblage of candidate materials at our regional library facilities; by the CDL’s systems, which provide effective bibliographic control of and information about the digitized materials; by the capacity to preserve the digitally formatted results effectively through the Digital Preservation Repository; and by the CDL’s proven and credible capacity to coordinate the complex procedures and workflows involved in these projects on behalf of the UC Libraries. Similarly, as mentioned above, the Next Generation Melvyl project teams the CDL and the campus libraries with OCLC, Inc. to explore third-generation library bibliographic services through the pilot development of an enhanced Melvyl Catalog based on OCLC’s WorldCat Local service.

The Bottom Line: the University’s return on investment in the CDL
The success of the CDL in addressing its “business requirements” is evident from the foregoing account. With a core budget of about $14 million, the CDL attracts an additional $18.5 million annually in voluntary co-investments from campus libraries, and uses the resulting $32.5 million pool of funds to deliver about $52 million in direct benefits to campuses. It also supports an additional $46 million in measurable indirect benefits, and provides a technical platform and a leadership capability that foster a host of service innovations which could not readily be supported by our ten campus libraries operating independently. In this way, the CDL demonstrates a continuing return on University and State investments while expanding and enhancing services to the University of California’s academic community and the people of the State of California.
Summary
– The California Digital Library was established as a strategic initiative to support and sustain the University’s teaching and research mission
– It functions as one of the libraries of the UC library system, offering university-wide services, and is governed by councils of the whole university
– It comprises bibliographic services, content licensing services, digital preservation, special digital collections and e-publishing
– It is financed by central university funds, the State of California, university discretionary funds, contributions from the campus libraries and grant funding
– Its business case is based on expanded access to content, co-ordinated buying power, advantageous deals on print content for the archive, facilitation of document supply, and development of specialized expertise
– It is calculated to deliver $52m annually in direct benefits against $32.5m of outlay

References
California Constitution (1879), Article IX, Section 9, available at http://www.leginfo.ca.gov/.const/.article_9 (accessed 9 February 2009)
California Digital Library (2008), “About the CDL: Overview and Mission”, available at http://www.cdlib.org/glance/overview.html (accessed 14 November 2008)
California State Department of Education (1960), A Master Plan for Higher Education in California 1960-1975, available at http://www.ucop.edu/acadinit/mastplan/MasterPlan1960.pdf (accessed 12 February 2009)
Standing Orders of the Regents of the University of California (2008), updated 23 September 2008, available at http://www.universityofcalifornia.edu/regents/bylaws/standing.html (accessed 23 October 2008)
University of California (1977), Office of the Executive Director of Universitywide Library Planning, The University of California Libraries: A Plan for Development, 1978–1988, available at http://www.slp.ucop.edu/initiatives/UC_library_plan_1977.pdf (accessed 9 February 2009)
University of California (1998), Library Planning and Action Initiative Advisory Task Force, Final Report, available at http://www.slp.ucop.edu/lpai_new/finalrpt/index.html (accessed 9 February 2009)
University of California (2006), Charge to the University of California Systemwide Library and Scholarly Information Advisory Committee, revised 25 May 2006, available at http://www.slp.ucop.edu/consultation/slasiac/charge.html (accessed 14 November 2008)
21 THE OXFORD DIGITAL LIBRARY
Michael Popham
A digital library at Oxford
In 1999, thanks to generous funding from the Andrew W Mellon Foundation, a nine-month scoping study recommended the formal establishment of a digital library service at the University of Oxford. Subsequent discussions focused on how resources might be allocated between the functions required to create new digital collections and those required to develop and maintain new services to readers. The former were seen as largely technical issues (and thus the preserve of IT specialists); the latter as comparable with more traditional library activities which would naturally fall within the domain of reader services. In practice, the boundaries between these two supposedly distinct areas have proven to be extremely blurred.

Thanks once again to the foresight and largesse of the Andrew W Mellon Foundation, July 2001 saw the official launch of the Oxford Digital Library (ODL), both as a set of services and as a distinct locus for more than twenty newly created digital collections. Yet in the two years between the publication of the scoping study and the start of ODL operations, much had changed. The scoping study had reviewed existing analogue collections, consulted with librarians across the collegiate University, and provided a decision-making matrix for digitization based upon priorities such as access, preservation, feasibility, and the potential for revenue generation or the creation of new infrastructure. By the time the ODL was established, attention was largely concentrated on the holdings of the libraries which comprised the newly formed Oxford University Library Services (OULS), and the emphasis was firmly placed upon digitizing collections both to facilitate greater access and to promote the richness of the libraries’ holdings to the widest possible audience.

Even in 2001, digitization was not a new activity at Oxford.
Several major digital image creation projects had taken place over the previous decade or more, and work to create digital scholarly editions and new electronic finding aids had been underway since the mid-1970s (e.g. with the foundation of the Oxford Text Archive in 1976). Not all of these earlier activities had directly involved the libraries, but by 2001 there were substantial numbers of legacy digital collections whose natural home lay within the ODL. This combination of new and old digital collections, shifting organizational structures and priorities, and the need to combine traditional library services with cutting-edge technical expertise made for a particularly challenging business planning environment. Indeed, those challenges remain to the present day, as library services and the notion of what constitutes a digital library service have continued to evolve to meet the changing demands of our users.
Business Planning

Vision and mission
When Sir Thomas Bodley founded the Bodleian Library in 1602, his aim was to create a ‘Republic of Letters’ – a community of learning, open to all. While the collections held in Oxford’s libraries have grown considerably in the intervening four hundred years, many of the traditional spatiotemporal barriers to access have remained unchanged. Indeed, with the need to preserve a growing and ageing collection of nationally important materials, and the challenges of managing immense numbers of items received under Legal Deposit privilege on a reference-only basis, barriers to access arguably increased over time. In its vision for the ODL, Oxford University Library Services shares the view voiced by Michael Lesk (amongst many others) that “In the future we expect that artifacts will be relatively less important. More and more, it will be the digital version that is used … the old will survive, but the new will be dominant.” (Lesk, 2005) While the establishment of the ODL was certainly driven by a combination of factors, greater access to Oxford’s collections by a much broader readership was the primary goal. In the transition to becoming a hybrid library service, OULS recognizes the growing demand to connect users with content, irrespective of the constraints of time, place or original medium.

The business case and users
More than 60% of the readers registered to use the Bodleian Library are not members of Oxford University. Leading scholars will travel halfway round the globe to consult some of the unique items held in our collections, but the vast majority of potential users (even those located in the UK) are unlikely to visit in person. Like most other leading educational institutions, the University of Oxford is continually striving to expand the services it offers, to raise its profile, and to attract the attention of the brightest and best scholars, wherever they may be.
Using the Digital Library to increase our online presence and to provide access to selections from the libraries’ world-class collections fits well with the University’s global aspirations. Although the selection of material for digitization has tended to be driven by a combination of local curatorial decisions and the interests of various funding bodies, we have rarely tried to target specific groups of users. Instead, we have adopted a “build it and they will come” philosophy, and its success is borne out by the fact that, for the majority of our digital collections, 96% or more of usage comes from readers outside the ox.ac.uk domain. Yet one major flaw in this approach is that it is difficult to gauge whether a digital collection is being found and used by a sufficient proportion of its potential user community, so we must find new ways to measure the impact of our digitized collections.
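A figure such as “96% or more of usage from outside ox.ac.uk” comes from classifying logged requests by origin. The sketch below assumes that each request has already been resolved to a client hostname (for example by reverse DNS; unresolved IP addresses would need separate treatment). `is_internal` and `external_share` are hypothetical helpers, and the sample hostnames are invented. Note that it measures only where usage comes from, not whether a sufficient share of the potential community is being reached, which is the gap the paragraph above identifies.

```python
from collections.abc import Iterable


def is_internal(hostname: str, domain: str = "ox.ac.uk") -> bool:
    """True if hostname is the domain itself or one of its subdomains."""
    host = hostname.lower().rstrip(".")
    return host == domain or host.endswith("." + domain)


def external_share(hostnames: Iterable[str], domain: str = "ox.ac.uk") -> float:
    """Fraction of requests whose resolved hostname lies outside `domain`."""
    hosts = list(hostnames)
    if not hosts:
        return 0.0
    external = sum(1 for h in hosts if not is_internal(h, domain))
    return external / len(hosts)


# Hypothetical sample of resolved client hostnames for one collection.
sample = [
    "reader1.bodleian.ox.ac.uk",
    "pc42.cam.ac.uk",
    "host-7.example.net",
    "lib.harvard.edu",
    "notox.ac.uk.example.com",  # contains "ox.ac.uk" but is not an Oxford host
]
print(f"{external_share(sample):.0%} of requests from outside ox.ac.uk")
```

Matching on the domain suffix rather than a substring avoids counting hosts such as `notox.ac.uk.example.com` as internal.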
Most of Oxford’s previous digitization work attempted to improve access to selections of materials (typically drawn from our Special Collections holdings) by providing digital surrogates. Typically, these take the form of high-quality digital images and associated metadata (derived from pre-existing catalogues or finding aids) delivered via a relatively basic web interface. In recent years there has been a growing recognition of the merits of full-text capture (such as our work on the Early English Books Online Text Creation Partnership), particularly for those items which are less amenable to OCR technologies and where we envisage that potential users will be interested in various forms of close textual analysis (e.g. where the intellectual content of an item is likely to be of more interest to a reader than its physical appearance). Nowadays, we are also looking at mechanisms to improve access to the raw digital data, so that users (or content aggregators, or service developers) can decide for themselves how they wish to access, analyse, and present materials, and bypass our attempts to second-guess their needs. We have also been fortunate enough to participate in some of the latest mass-digitization efforts (see below). Almost of necessity, this forces a reassessment of a digital library’s potential user base, of the services it can and might offer, and of the infrastructure and other resources required to support such services; it also has a major impact on business planning.

Technology and infrastructure
Since its inception, the ODL has been committed to the use of open standards and systems. The twenty digital collections created in 2001-2005 adhere to common technical standards and specifications, with the express goal that these resources will be easier and cheaper to maintain and exploit in the long term, even though their initial creation was probably more onerous than it would have been had we adopted a proprietary solution.
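The common standards the ODL adopted include MODS for descriptive metadata and METS for packaging metadata together with image files, as the workflow in the following paragraph describes. The sketch below shows, under stated assumptions, what a minimal METS package of that kind might contain: a MODS record embedded in a `dmdSec`, an inventory of TIFF master files in a `fileSec`, and a `structMap` that orders the pages. It is not the ODL’s in-house tooling; `build_mets` is a hypothetical helper, and the item title and file names (following an assumed naming convention) are invented.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of the METS, MODS and XLink schemas.
METS = "http://www.loc.gov/METS/"
MODS = "http://www.loc.gov/mods/v3"
XLINK = "http://www.w3.org/1999/xlink"
for prefix, uri in (("mets", METS), ("mods", MODS), ("xlink", XLINK)):
    ET.register_namespace(prefix, uri)


def q(ns: str, tag: str) -> str:
    """Return a qualified name in ElementTree's {namespace}tag form."""
    return f"{{{ns}}}{tag}"


def build_mets(title: str, image_files: list[str]) -> ET.Element:
    """Wrap a minimal MODS record and a list of TIFF masters in one METS document."""
    mets = ET.Element(q(METS, "mets"))

    # Descriptive metadata: the MODS record is embedded via mdWrap/xmlData.
    dmd = ET.SubElement(mets, q(METS, "dmdSec"), ID="DMD1")
    wrap = ET.SubElement(dmd, q(METS, "mdWrap"), MDTYPE="MODS")
    mods = ET.SubElement(ET.SubElement(wrap, q(METS, "xmlData")), q(MODS, "mods"))
    title_info = ET.SubElement(mods, q(MODS, "titleInfo"))
    ET.SubElement(title_info, q(MODS, "title")).text = title

    # File inventory: one <file> per master image, located by relative URL.
    grp = ET.SubElement(ET.SubElement(mets, q(METS, "fileSec")), q(METS, "fileGrp"), USE="master")
    for n, name in enumerate(image_files, start=1):
        f = ET.SubElement(grp, q(METS, "file"), ID=f"FILE{n}", MIMETYPE="image/tiff")
        ET.SubElement(f, q(METS, "FLocat"), {"LOCTYPE": "URL", q(XLINK, "href"): name})

    # Structural map (the only mandatory METS section): pages in capture order.
    smap = ET.SubElement(mets, q(METS, "structMap"), TYPE="physical")
    item = ET.SubElement(smap, q(METS, "div"), TYPE="item", DMDID="DMD1")
    for n in range(1, len(image_files) + 1):
        page = ET.SubElement(item, q(METS, "div"), TYPE="page", ORDER=str(n))
        ET.SubElement(page, q(METS, "fptr"), FILEID=f"FILE{n}")
    return mets


# Hypothetical item and file names, following an assumed naming convention.
doc = build_mets("Drawings from an excavation (hypothetical)",
                 ["item001_0001.tif", "item001_0002.tif"])
ET.indent(doc)  # pretty-printing; requires Python 3.9+
print(ET.tostring(doc, encoding="unicode"))
```

Because the structural map points at file IDs and the descriptive section is referenced by `DMDID`, a viewer or test site can rebuild the page sequence from the package alone. A production package would carry much fuller MODS records and would normally be validated against the METS schema.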
At the time, these projects represented the largest co-ordinated digitization effort we had undertaken. Typically, subject-specialist librarians working closely with academic colleagues (who represented the potential end-user community) would select items from a larger body of material. The librarians would then create and quality-assure Metadata Object Description Schema (MODS) metadata records using a suite of dedicated tools developed in-house. All of these projects involved a substantial amount of new digital imaging, which was undertaken by expert staff in the OULS Imaging Services department to an agreed specification (e.g. minimum 300ppi TIFF master files, with strict file-naming conventions). The digital imaging workflow was driven by the metadata records created by the librarians; this was particularly helpful in the most highly selective digitization projects, as it minimized any confusion about which items should be captured from, say, a loose portfolio containing drawings from an archaeological excavation. Lastly, the MODS metadata and associated image files were married together as Metadata Encoding and Transmission Standard (METS) packages, and an open source digital library application (Greenstone) was used to provide a simple test site on which both the librarians and the academics could review and quality-assure the digitized collections and their associated metadata.

Unfortunately, it was at this point that the ODL’s initial business model proved to be almost fatally flawed. Having created an initial core set of digital collections, built a framework of technical standards, and established some viable and costed digitization workflows, we found that further progress depended upon additional funding being secured under the auspices of a wider campaign by Oxford University Library Services. When those funds failed to materialize, the result was a temporary hiatus and a need to revise the ODL’s plans; this coincided with an approach from Google to be one of the first ‘big five’ partners in the Google Books Library Program (discussed below).

Costing
Our attempts to develop a comprehensive cost model for the Oxford Digital Library have met with rather mixed success. Put crudely, it has been relatively straightforward to identify and cost all those aspects of our work that can be easily measured, and as those factors (apparently) account for the vast bulk of our expenditure, this has assuaged the concerns of senior management. It will not surprise readers to learn that our largest single cost is staff salaries, and as these are set within a nationally agreed framework and managed across the University as a whole, it is relatively straightforward to predict future staff costs for an agreed period. The downside to operating within such institutional structures (and strictures) is that we do not always have the flexibility to reduce staff costs at short notice, nor is it a straightforward matter to reallocate staff to other areas of work.

After staff, our next largest cost is typically that of digital capture. In the case of imaging, all our work is undertaken by the OULS Imaging Services department, which operates on a cost-recovery basis. This approach undoubtedly has several advantages for the ODL. The staff and facilities of Imaging Services are located very close to the collections themselves, so there are no transportation or insurance costs to cover. Moreover, the staff have considerable experience in handling and working with the kinds of materials (e.g. rare and unique items drawn from Special Collections) that tend to be selected for digitization, and they are known to, and trusted by, our subject-specialist librarians and curators in a way that staff from an external vendor would not be. Similarly, if any issues or questions arise at the point of capture, staff in the Imaging Services department are able to liaise directly with their library colleagues, and can be extremely sensitive to their needs and concerns; they share a common vocabulary and have an equal interest in obtaining the best results. While this arrangement provides many certainties and clear benefits, it has the indirect consequence that we are less able to negotiate on costs and schedules, and Imaging Services are somewhat constrained in their capacity (e.g. it would be difficult for them to find additional resources to meet the demands of a large-scale but short-lived digitization project).

The costs of full-text capture can vary considerably. Unlike some digital library organizations, we do not have facilities to OCR textual data on any scale, and should the need arise we would typically out-source such work to a trusted vendor. The same is true of any re-keying, as, unlike some other higher education institutions (notably in the US), we do not have access to a pool of low-cost graduate student labour. However, we are well placed to undertake work which involves the rigorous quality assurance of captured text (e.g. proof-reading of OCR’d or re-keyed texts), or the enrichment of such material (e.g. the addition of sophisticated text encoding to create TEI-conformant XML files).
The third distinct area of costs relates to the technical infrastructure required to manage, preserve, and deliver our digital collections. Once again we are fortunate to be based within a wider institutional framework, so we can often share infrastructure costs with others in OULS or the wider University. As with Imaging Services, a simple internal market operates for the basic levels of infrastructure support, and any fees charged are almost certainly well below comparable commercial levels (e.g. for data hosting on a robust, commercially supported web server). Many of the costs of hosting and delivering digital collections are absorbed within the wider budget of OULS, rather than being set directly against individual digitization projects or charged as a group to the ODL.

Perhaps the one notable exception in this area concerns the costs of digital preservation. At the time of writing there is a wide-ranging debate underway within the University about the best options to coordinate, manage, and recover the costs of such provision. To date, OULS has accorded permanent retention status to its digital collections (which, for all practical purposes, translates into an uncosted commitment to ensure that our digital collections remain accessible in perpetuity). When this is set alongside our intention to accession born-digital materials into our archival holdings within Special Collections, OULS is clearly a major stakeholder in any deliberations about digital preservation within the University.

Where our cost modelling is less successful is in our attempts to capture (if not necessarily recoup) the various contributions from other staff, OULS, and the institution as a whole which are less easy to quantify.
Most of the librarians who have played an active role in digitizing a particular collection either were not factored into the costs of that endeavour at all, or undoubtedly contributed more of their time and expertise than was included in that project’s budget. Similarly, the time of senior staff who sit on oversight and steering committees is not recorded or captured in our costings. Nor do we have a good way of capturing the costs associated with digitization project proposals which are dropped before we submit them to a funding body, or which are submitted but are not successful. The University currently has a fairly blunt instrument for calculating the institutional overheads and full economic costs of various activities, but it remains unclear whether or not this could usefully be applied to OULS’ digital library activities.

Anticipated income streams
The ODL is a core service of OULS, and any income streams are regarded as an added bonus rather than a fundamental raison d’être. Although digital library projects often provide the initial funds for image digitization work undertaken by the Imaging Services department, the latter is solely responsible for managing any fees derived from the subsequent reuse and licensing of such images by third parties (e.g. a commercial publisher who wishes to use a particular image on a book cover). Similarly, in the case of any work undertaken in collaboration with a commercial organization (a scholarly e-publisher, for example), any royalty payments resulting from that activity are negotiated and managed on behalf of OULS by our Communications and Publications department. In other words, the income streams derived from Oxford’s digital library activities are fed into the central OULS budget, and may be used to fund any activity within the OULS libraries.
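The mixed costing picture described under Costing above, in which some costs are charged to projects, some are absorbed in the general OULS budget, and some contributions go unrecorded, can be illustrated with a small tally. Every figure below is invented purely for illustration, and the three “treatment” labels are a hypothetical classification, not an OULS scheme. The point is only that a project budget which sees the charged costs alone will understate the full cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostLine:
    item: str
    amount: float   # hypothetical annual amount, in pounds
    treatment: str  # hypothetical labels: "charged", "absorbed" or "unrecorded"


def summarize(lines: list[CostLine]) -> dict[str, float]:
    """Total the cost lines by treatment and add a 'full' total across all three."""
    totals = {"charged": 0.0, "absorbed": 0.0, "unrecorded": 0.0}
    for line in lines:
        totals[line.treatment] += line.amount  # unknown labels raise KeyError
    totals["full"] = sum(totals.values())
    return totals


# Invented example lines, mirroring the categories discussed in the text.
lines = [
    CostLine("project staff salaries", 60_000, "charged"),
    CostLine("Imaging Services (cost recovery)", 25_000, "charged"),
    CostLine("hosting and delivery", 5_000, "absorbed"),
    CostLine("librarians' selection time beyond budget", 8_000, "unrecorded"),
    CostLine("steering committee time", 2_000, "unrecorded"),
    CostLine("preparing unsuccessful bids", 3_000, "unrecorded"),
]
totals = summarize(lines)
print(totals)
print(f"Share of full cost visible in the project budget: {totals['charged'] / totals['full']:.0%}")
```

With these made-up numbers only about 83% of the full cost would appear in the project budget; real proportions would depend on how the unrecorded contributions were estimated.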
In the case of a formal agreement with, say, a commercial e-publisher to digitize substantial amounts of material, the costs of the work of the Imaging Services department are typically recouped in full (although some colleagues feel that this approach does not adequately recover the indirect costs incurred by other departments, e.g. the time of subject specialists who may contribute to the selection of material, or the costs of picking, transporting, and re-shelving items). Moreover, we are generally averse to negotiating exclusive licensing arrangements with commercial partners, and those which are agreed are of strictly enforced duration (e.g. five years from the date of first publication or release). In such situations we always retain the right to reuse the resulting digital images, texts etc. for scholarly purposes.

Marketing
This is certainly an area which requires more attention. As mentioned earlier, we have tended to rely on a “build it and they will come” philosophy, and with the growth in search engine use over the past decade we have seen a steady year-on-year increase in the use of all our digital collections. Yet such gradual growth implies that substantial numbers of readers may take some time to discover the relevance of our digital collections and, increasingly, UK funding bodies are interested in early indications of the (likely) impact of a given digital collection once it is made available to the user community. Such data can also prove invaluable as supporting evidence for continuation and related funding bids. Delivering our collections via simple web interfaces has certainly helped them to be found and crawled by search engines, but we are now endeavouring to implement a more proactive strategy (e.g. pushing out information via OAI-PMH and RSS feeds, and annotating Wikipedia entries).
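Exposing metadata via OAI-PMH lets aggregators harvest it without any further effort from the library. The sketch below shows the harvester’s side of the protocol: building `ListRecords` requests (where `resumptionToken` is an exclusive argument that replaces `metadataPrefix` on follow-up requests) and reading record identifiers plus the next token from a response. The base URL, identifiers and token are hypothetical, and the sample response is trimmed to record headers only. The sketch makes no network calls and does not describe the ODL’s actual endpoint.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request; a resumptionToken is an exclusive argument."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text: str) -> tuple[list[str], Optional[str]]:
    """Return record identifiers and the resumptionToken (None when the list is complete)."""
    root = ET.fromstring(xml_text)
    ids = [h.findtext(f"{OAI}identifier") for h in root.iter(f"{OAI}header")]
    token_el = root.find(f".//{OAI}resumptionToken")
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


# Hypothetical, trimmed response from a hypothetical repository.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2009-02-28T12:00:00Z</responseDate>
  <request verb="ListRecords" metadataPrefix="oai_dc">https://example.org/oai</request>
  <ListRecords>
    <record><header><identifier>oai:example.org:item-1</identifier><datestamp>2009-01-01</datestamp></header></record>
    <record><header><identifier>oai:example.org:item-2</identifier><datestamp>2009-01-02</datestamp></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://example.org/oai"))
ids, token = parse_list_records(SAMPLE)
print(ids, token)
print(list_records_url("https://example.org/oai", resumption_token=token))
```

A harvester would repeat the request with each new token until the response carries no token (or an empty one), at which point the list is complete.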
Similarly, making available APIs and web services which enable others to access and reuse our digital collections more readily means that some of the effort of marketing our collections to readers is indirectly borne by others. We are taking an active interest in the Toolkit for the Impact of Digitised Scholarly Resources (TIDSR), which has been developed with funds from the UK’s Joint Information Systems Committee (JISC). If the approach proves valid, we anticipate using this method to measure the impact of some of our existing digital collections, and using the results to inform how we (re-)launch and promote other resources.

Risk analysis
Although there is no overall risk analysis of the ODL, each individual digitization project is assessed. The two most frequently identified risks at the project level are unanticipated digitization costs (e.g. due to issues with the selection of material, new material being added, difficulties of metadata creation etc.) and the retention of key personnel. For the ODL as a whole, the chief risk would be loss of business continuity occasioned by a catastrophic technical failure. All the digital resources created under the auspices of the ODL are subject to careful preservation management, and in the event of a major disaster we would anticipate being able to restore the majority of our collections and services to readers within 48 hours.
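The chapter does not describe the ODL’s preservation procedures in detail. One widely used safeguard, assumed here for illustration only, is a fixity check: record a checksum for every file, then recompute and compare after a restore to confirm that nothing is missing or altered. The sketch below uses SHA-256 from the standard library; `build_manifest` and `verify` are hypothetical helpers, and the demonstration files are created in a temporary directory.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large TIFF masters need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    return {p.relative_to(root).as_posix(): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def verify(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files now under root with a previously recorded manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "altered": sorted(k for k in manifest if k in current and current[k] != manifest[k]),
        "unexpected": sorted(set(current) - set(manifest)),
    }


# Demonstration: record a manifest, then simulate a corrupted restore of one file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.tif").write_bytes(b"master image bytes")
    (root / "a.xml").write_bytes(b"<mods/>")
    manifest = build_manifest(root)
    (root / "a.xml").write_bytes(b"<mods>changed</mods>")
    report = verify(root, manifest)
print(report)
```

Separating “missing”, “altered” and “unexpected” files makes it easier to decide which items must be re-fetched from another copy before a service is declared restored.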
Governance, implementation and financial plans
The ODL is located within the Special Collections department of OULS, but works extremely closely with colleagues in the Systems and e-Research Services section. General oversight of ODL activities is provided by the Digital Library Strategy Group, which has representation from across the subject and service areas of OULS and reports directly to the executive decision-making body of OULS. As with risk analysis, our focus tends to be at the level of the individual project or digital collection. In the majority of cases we favour a phased, incremental release of data as a digital collection is created, rather than a single, end-of-project release date. This has the benefit that materials are made available to readers at the earliest opportunity, but brings additional overheads in terms of meeting end users’ expectations and their support needs. The overall implementation plan for the ODL is closely tied to the rolling five-year strategic and operational plans for OULS as a whole since, in an emerging hybrid library service, the distinctions between the provision of analogue and digital content are becoming less significant.

Compared with a traditional library, a digital library is not cheap to run, given the additional costs of retaining staff with specialist expertise and of the necessary technical infrastructure. As a core service of OULS, the ODL’s main concern is to ensure that most of the additional costs arising from digitization work are covered by the relevant source of funds. To this extent, the ODL is heavily reliant upon its ability to continue to secure external support (e.g. via grants from funding agencies) to cover the costs of creating new digital collections. The costs of maintaining existing digital collections are absorbed within the general operational budget of OULS.
Mass digitization changes everything
Until 2005, the ODL had concentrated its resources at the level of the individual project or digital collection. The subsequent arrival of two mass digitization endeavours will undoubtedly force a reassessment of some of our business planning decisions. In December 2004, Oxford announced that it was one of the first libraries to participate in the Google Book Search Library Project. The initial scope was to consider approximately one million out-of-copyright items from our nineteenth-century holdings for digitization in collaboration with Google. The only other selection criteria were based on the condition of the material and its physical characteristics (e.g. dimensions, binding etc.). This undertaking proved a valuable learning experience for both parties, especially given the need constantly to devise, review, and amend various workflows and procedures as the work progressed. At the time of writing this project is still underway, but it is already apparent that the immense scale of this venture will inevitably lead to a re-evaluation of the way we manage, preserve, and provide access to large volumes of digital data. Moreover, this is also our first digitization project that will have a significant and measurable impact on the way we choose to manage our physical holdings.
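The selection rules described above (nineteenth-century, out-of-copyright holdings, screened by condition and physical characteristics) amount to a filter over catalogue and handling data. The sketch below is a hypothetical illustration only. It assumes that publication between 1800 and 1899 stands in for “out of copyright”, and the condition grades, height limit and binding categories are invented, because the chapter does not publish the actual criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    shelfmark: str
    year: int
    condition: str    # hypothetical grades: "good", "fragile", "damaged"
    height_cm: float
    binding: str      # hypothetical categories, e.g. "cloth", "tight"


# Hypothetical thresholds; not the criteria actually used in the project.
MAX_HEIGHT_CM = 40.0
UNSUITABLE_BINDINGS = {"tight"}


def is_candidate(v: Volume) -> bool:
    """Apply scope, condition and physical-characteristic checks to one volume."""
    in_scope = 1800 <= v.year <= 1899  # assumed proxy for out-of-copyright status
    safe_to_handle = v.condition == "good"
    fits_workflow = v.height_cm <= MAX_HEIGHT_CM and v.binding not in UNSUITABLE_BINDINGS
    return in_scope and safe_to_handle and fits_workflow


# Invented sample records.
volumes = [
    Volume("HYP-001", 1850, "good", 22.0, "cloth"),
    Volume("HYP-002", 1875, "fragile", 20.0, "cloth"),  # fails the condition check
    Volume("HYP-003", 1890, "good", 55.0, "cloth"),     # too large
    Volume("HYP-004", 1910, "good", 21.0, "cloth"),     # outside the date range
    Volume("HYP-005", 1820, "good", 18.0, "tight"),     # unsuitable binding
]
selected = [v.shelfmark for v in volumes if is_candidate(v)]
print(selected)
```

Keeping each criterion as a named boolean makes it easy to report why a volume was rejected, which matters when workflows are being revised as the work progresses.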
By comparison, the JISC-funded Electronic Ephemera project seems vanishingly small. Although also described as an exercise in mass digitization, this is a two-year, £1 million project to digitize a selection of approximately 65,000 items from the John Johnson Collection of Printed Ephemera. The work is undertaken in partnership with ProQuest, a well-known publisher of scholarly e-resources, which has contracted to host and deliver the results of the project. In practice the bulk of the funding has been committed to the expert creation of rich metadata records, and only 25% of the budget has been allocated to digitization costs, for it is these rich metadata which will underpin the functionality of the delivery platform and enable a wide range of users to explore and exploit the collection in ways which would not have been possible before.

Business challenges
There are four major challenges which will need to be addressed by the ODL’s (and OULS’) future business plans. The first concerns the implementation of appropriate digital preservation policies, procedures, and infrastructure. Throughout the digital library sector there remains a lack of sufficient data and suitable models upon which to base our thinking (although numerous international efforts are underway). Our policy of the ‘permanent retention’ of digital collections, coupled with a lack of control over key variables (e.g. rising staff costs, falling hardware costs), makes long-term planning particularly challenging. However, work on a Digital Asset Management System for OULS is progressing well (see Jefferies 2008). The second major challenge concerns how best to address the difficulties (and associated costs) of maintaining and upgrading our digital collections and services, especially when compared with the relatively static and well-documented costs of managing traditional collections.
Users’ expectations of web-delivered content and services continue to evolve, and we struggle to identify and meet them in a timely manner. The third challenge relates to the recruitment, retention, and retraining of digital library staff. Our reliance on external calls for project-based funding makes it particularly difficult to find and retain skilled staff, let alone support their career development. A great deal of learning takes place on the job, and this knowledge is often lost to the institution when grant funding ends. The final challenge involves the difficulty of demonstrating the impact and benefits of digitizing a given collection in a way that is acceptable to funding bodies, OULS, the wider University, and the users of our services. Much of the current work on web analytics presumes a commercial function and a desire to demonstrate a (financial) return on investment, though the situation is improving. Similarly, current methods for measuring impact (e.g. end-user surveys) can be time-consuming and expensive, and their findings do not yet appear to be equally acceptable to all stakeholders. However, as best practices in this field emerge, we can only hope that the findings of such studies will be accorded greater recognition and can be used to inform future business decisions.
THE OXFORD DIGITAL LIBRARY
Summary
– Oxford’s “build it and they will come” approach to digitization is no longer tenable and must be replaced
– It is proposed to move from image surrogates of text to full-text capture, allowing users and developers to process the text as they wish
– Business planning had to be revised and remains a major challenge, notably when trying to capture the hidden costs of digitizing materials and delivering digital collections
– Costing follows a mixed model: some activities can be costed easily, others less so, and some are absorbed into general running costs
– Formal commercial agreements are fully costed
– Mass digitization is forcing a reassessment of what we can and should offer to readers, and is raising end-users’ expectations of what they can find online
– The ODL is moving from simply making content available on the Web to a more proactive ‘push’ policy
– The major risk is seen as a catastrophic systems failure, which could entail a break in service
– The major challenges are: preservation difficulties and costs, maintaining and updating services, recruiting and retaining staff, and demonstrating impact to potential funders
Bibliography and References
Early English Books Online Text Creation Partnership (EEBO-TCP): available at www.lib.umich.edu/tcp/eebo/ (accessed 28 February 2009)
Electronic Ephemera – digitized selections from the John Johnson Collection: available at www.bodley.ox.ac.uk/eejjc/ (accessed 28 February 2009)
Google Books Library Project: available at books.google.com/googlebooks/library.html (accessed 28 February 2009)
Greenstone Digital Library Software: available at www.greenstone.org/ (accessed 28 February 2009)
Jefferies, N. (2008), Oxford Digital Asset Management System (DAMS) Update: available at www.titan.be/common/docs/AXIS_PASIG2008_KC14a_Oxford.pdf (accessed 28 February 2009)
John Johnson Collection of Printed Ephemera: available at www.bodley.ox.ac.uk/johnson/johnson.htm (accessed 28 February 2009)
Joint Information Systems Committee (JISC): available at www.jisc.ac.uk (accessed 28 February 2009)
Lee, S. D. (1999), Scoping the Future of the University of Oxford’s Digital Library Collections: available at www.bodley.ox.ac.uk/scoping/report.html (accessed 5 June 2010)
Lesk, M. (2005), Understanding Digital Libraries, Morgan Kaufmann, San Francisco
Metadata Encoding and Transmission Standard (METS): available at www.loc.gov/standards/mets/ (accessed 28 February 2009)
Metadata Object Description Schema (MODS): available at www.loc.gov/standards/mods/ (accessed 28 February 2009)
Oxford Digital Library (ODL): available at www.odl.ox.ac.uk (accessed 28 February 2009)
Oxford University Library Services: available at www.ouls.ox.ac.uk (accessed 28 February 2009)
OULS Imaging Services: available at www.ouls.ox.ac.uk/services/copy/imaging_services (accessed 28 February 2009)
Oxford Text Archive: available at ota.ahds.ac.uk/ (accessed 28 February 2009)
ProQuest: available at www.proquest.co.uk (accessed 28 February 2009)
Toolkit for the Impact of Digitised Scholarly Resources (TIDSR): available at www.oii.ox.ac.uk/research/project.cfm?id=51 (accessed 28 February 2009)