Papers from the First European Workshop on Technological & Security Issues in Digital Rights Management (EuDiRights’06) 9781846633850, 9781846633843

A selection of papers examining issues arising from the need to balance fair use and individual rights to information on the Web against the interests of content providers.



ISSN 1468-4527

Volume 31 Number 1 2007

Online Information Review The international journal of digital information research and use Papers from the First European Workshop on Technological & Security Issues in Digital Rights Management (EuDiRights ’06) Guest Editor: Mariemma Yagüe

www.emeraldinsight.com

Online Information Review

ISSN 1468-4527 Volume 31 Number 1 2007

Papers from the First European Workshop on Technological & Security Issues in Digital Rights Management (EuDiRights ’06) Guest Editor: Mariemma Yagüe

CONTENTS

Access this journal online ______________________________ 3

Editorial advisory board ________________________________ 4

GUEST EDITORIAL
The digital information war
Mariemma I. Yagüe ____________________________________________ 5

Complementing DRM with digital watermarking: mark, search, retrieve
Patrick Wolf, Martin Steinebach and Konstantin Diener _______________ 10

A class of non-linear asymptotic fingerprinting codes with ε-error
Marcel Fernandez, Josep Cotrina-Navau and Miguel Soriano ___________ 22

Verification algorithms for governed use of multimedia content
Eva Rodríguez and Jaime Delgado _________________________________ 38

Toward semantics-aware management of intellectual property rights
Ernesto Damiani and Cristiano Fugazza ____________________________ 59

DRM, law and technology: an American perspective
Bill Rosenblatt _________________________________________________ 73

SAVVY SEARCHING
Clustering search results. Part I: web-wide search engines
Péter Jacsó ____________________________________________________ 85

Book reviews _____________________________________________ 92

Guide to the professional literature ____________________ 98

Note from the publisher _________________________________ 106

Access this journal electronically
The current and past volumes of this journal are available at: www.emeraldinsight.com/1468-4527.htm
You can also search more than 150 additional Emerald journals in Emerald Management Xtra (www.emeraldinsight.com).
See page following contents for full details of what your access includes.

www.emeraldinsight.com/oir.htm
As a subscriber to this journal, you can benefit from instant, electronic access to this title via Emerald Management Xtra. Your access includes a variety of features that increase the value of your journal subscription.

How to access this journal electronically
To benefit from electronic access to this journal, please contact [email protected] A set of login details will then be provided to you. Should you wish to access via IP, please provide these details in your e-mail. Once registration is completed, your institution will have instant access to all articles through the journal’s Table of Contents page at www.emeraldinsight.com/1468-4527.htm More information about the journal is also available at www.emeraldinsight.com/oir.htm
Our liberal institution-wide licence allows everyone within your institution to access your journal electronically, making your subscription more cost-effective. Our web site has been designed to provide you with a comprehensive, simple system that needs only minimum administration. Access is available via IP authentication or username and password.

Key features of Emerald electronic journals
Automatic permission to make up to 25 copies of individual articles. This facility can be used for training purposes, course notes, seminars etc. This only applies to articles of which Emerald owns copyright. For further details visit www.emeraldinsight.com/copyright
Online publishing and archiving. As well as current volumes of the journal, you can also gain access to past volumes on the internet via Emerald Management Xtra. You can browse or search these databases for relevant articles.
Key readings. This feature provides abstracts of related articles chosen by the journal editor, selected to provide readers with current awareness of interesting articles from other publications in the field.
Non-article content. Material in our journals such as product information, industry trends, company news, conferences, etc. is available online and can be accessed by users.
Reference linking. Direct links from the journal article references to abstracts of the most influential articles cited. Where possible, this link is to the full text of the article.
E-mail an article. Allows users to e-mail links to relevant and interesting articles to another computer for later use, reference or printing purposes.

Additional complimentary services available
Your access includes a variety of features that add to the functionality and value of your journal subscription:
Structured abstracts. Emerald structured abstracts provide consistent, clear and informative summaries of the content of the articles, allowing faster evaluation of papers.
E-mail alert services. These services allow you to be kept up to date with the latest additions to the journal via e-mail, as soon as new material enters the database. Further information about the services available can be found at www.emeraldinsight.com/alerts
Emerald online training services. Visit www.emeraldinsight.com/training and take an Emerald online tour to help you get the most from your subscription.

Xtra resources and collections When you register your journal subscription online, you will gain access to Xtra resources for Librarians, Faculty, Authors, Researchers, Deans and Managers. In addition you can access Emerald Collections, which include case studies, book reviews, guru interviews and literature reviews.

Emerald Research Connections An online meeting place for the research community where researchers present their own work and interests and seek other researchers for future projects. Register yourself or search our database of researchers at www.emeraldinsight.com/ connections

Choice of access Electronic access to this journal is available via a number of channels. Our web site www.emeraldinsight.com is the recommended means of electronic access, as it provides fully searchable and value added access to the complete content of the journal. However, you can also access and search the article content of this journal through the following journal delivery services: EBSCOHost Electronic Journals Service ejournals.ebsco.com Informatics J-Gate www.j-gate.informindia.co.in Ingenta www.ingenta.com Minerva Electronic Online Services www.minerva.at OCLC FirstSearch www.oclc.org/firstsearch SilverLinker www.ovid.com SwetsWise www.swetswise.com

Emerald Customer Support For customer support and technical help contact: E-mail [email protected] Web www.emeraldinsight.com/customercharter Tel +44 (0) 1274 785278 Fax +44 (0) 1274 785201


EDITORIAL ADVISORY BOARD

Mr Chris Armstrong Information Automation Ltd, Aberystwyth, UK E-mail: [email protected]


Dr Judit Bar-Ilan, Department of Information Science, Bar-Ilan University, Israel. E-mail: [email protected]
Professor Theo Bothma, Department of Information Science, University of Pretoria, South Africa. E-mail: [email protected]
Mr D. Scott Brandt, Purdue University Libraries, Purdue University, USA. E-mail: [email protected]
Dr Gobinda G. Chowdhury, Dept of Computer and Information Sciences, University of Strathclyde, UK. E-mail: [email protected]
Adjunct Professor Peter Clayton, School of Professional Communication, University of Canberra, Australia. E-mail: [email protected]
Dr Ana Maria Ramalha Correia, Instituto Superior de Estatistica e Gestao de Informacao, Universidade Nova de Lisboa, Portugal. E-mail: [email protected]
Ms Louise Edwards, Cranfield Information and Library Services, Cranfield School of Management, UK. E-mail: [email protected]
Professor Ina Fourie, Department of Information Science, University of Pretoria, South Africa. E-mail: [email protected]
Dr Péter Jacsó, Dept of Information and Computer Sciences, University of Hawaii at Manoa, USA. E-mail: [email protected]
Associate Professor Thomas R. Kochtanek, School of Information Science and Learning Technologies, University of Missouri-Columbia, USA. E-mail: [email protected]
Associate Professor Wing Lam, MISM Programme Director, Universitas 21 Global, Singapore. E-mail: [email protected]

Online Information Review Vol. 31 No. 1, 2007, p. 4. © Emerald Group Publishing Limited 1468-4527

Dr ChernLi Liew, School of Information Management, Victoria University of Wellington, New Zealand. E-mail: [email protected]
Associate Professor M. Shaheen Majid, School of Communication & Information, Nanyang Technological University, Singapore. E-mail: [email protected]
Dr Stephen M. Mutula, Department of Library and Information Studies, University of Botswana, Botswana. E-mail: [email protected]
Nandish V. Patel, Director, Brunel Organization and Systems Design Centre, Brunel Business School, Brunel University, UK. E-mail: [email protected]
Professor Jagtar Singh, Department of Library and Information Science, Punjabi University, India. E-mail: [email protected]
Professor Alan D. Smith, Robert Morris University, Pittsburgh, PA, USA. E-mail: [email protected]
Dr Lucy A. Tedd, Department of Information Studies, University of Wales, UK. E-mail: [email protected]
Professor Mike Thelwall, School of Computing and IT, Wolverhampton University, UK. E-mail: [email protected]
Mr Paul S. Ulrich, Berlin Central and Regional Library, America Memorial Library, Germany. E-mail: [email protected]
Dr Kristina Voigt, GSF-Research Centre for Environment and Health, Institute of Biomathematics and Biometry, Germany. E-mail: [email protected]
Dr Xiaolin Zhang, Chinese Academy of Sciences, Beijing, China. E-mail: [email protected]


GUEST EDITORIAL

The digital information war


Mariemma I. Yagüe, Universidad de Málaga, GISUM Group, Málaga, Spain


Abstract
Purpose – The purpose of this Guest Editorial is to introduce the papers in this special issue.
Design/methodology/approach – A brief summary of the main contributions of the papers included in this issue is provided.
Findings – In order to combat the digital information war it was found that important work must be done to establish both users’ and content providers’ trust through fair e-commerce/digital rights management (DRM).
Originality/value – The paper provides an overview of the basic requirements of DRM systems.
Keywords Worldwide Web, Information services, Information control
Paper type Viewpoint

Since the advent of the Web, this new environment has emerged as a vast repository where anyone may place his or her content. As no rules were imposed initially, all kinds of information could be freely disseminated worldwide. Content industries also joined the Web and used this platform to sell their copyrighted digital products on a global scale. Problems have arisen as copyrighted digital content has been illegally downloaded without penalty, and consequently content authors have not received appropriate remuneration for their creations. In addition, the associated content value chain is long (rights holders’ organisations, content providers, publishers, broadcasters and DRM solution providers, among others). This problem has particularly affected digital content industries such as the music industry, where the amount of unauthorised downloading of copyrighted music in MP3 format has grown dramatically in recent years.

We must address two different aspects when considering access to information on the Web. On the one hand are the users, whose objective is to freely access and share the information available. On the other are the content providers, who want technologists to develop Digital Rights Management (DRM) systems and tools to prevent the unauthorised distribution of their copyrighted work. The ultimate aim must be to reach a balance between fair use and individual rights on the one hand and the interests of content providers on the other. Perhaps the music industry should have taken users’ interests into account years ago. Now, however, users’ attitudes and habits need to be changed in order to find a workable solution which is fair and acceptable to both sides.

The Guest Editor wishes to acknowledge the support of Professor Gary Gorman, Editor of Online Information Review, for his dedication throughout the process of producing a high-quality journal. She also especially wishes to thank Dr Leonardo Chiariglione for his collaboration in this event, and the invited speakers, Suzanne Guth and Bill Rosenblatt, for their engaging talks and interest during EuDiRights’06.

Online Information Review Vol. 31 No. 1, 2007, pp. 5-9. © Emerald Group Publishing Limited 1468-4527. DOI 10.1108/14684520710730994


Following the digitisation revolution, the shift from paper documents to their respective electronic formats has also produced important benefits in the functioning of business and public administration. Nevertheless, this shift is often limited to the internal operation of each entity because of the lack of security in the electronic communication mechanisms (Yagüe et al., 2004).

In the digital world developments proceed apace. Technologies have been evolving rapidly, and the management and protection of digital information, and of the rights associated with it, against unauthorised access, use and distribution is a matter of concern for many rights holders. Moreover, the protection of privacy is a matter of great concern for many citizens. Privacy can be defined as the ability of an individual to control the flow of that individual’s personal information. In Europe privacy is defined as a human right under Article 8 of the 1950 European Convention of Human Rights and Fundamental Freedoms (Council of Europe, 1950), and it is addressed by Directive 2002/58/EC of the European Parliament and the Council (Council of Europe, 2002). DRM techniques have been widely deployed in the digital world to enable only legitimate access to the intellectual property of rights holders.

At the same time that technology is moving closer to people, governments want to be closer to citizens. A great effort has to be made, however, not only technologically but also attitudinally. The amount of information produced and maintained by governmental organisations is enormous, and conventional procedures are inadequate for its management. Additionally, in the e-government scenario a new requirement could be the total availability (24 hours a day, seven days a week) of the virtual office, via the Internet or mobile channels. This would require the implementation of new models and technologies in digital government. The problem is that clients and customers require privacy, and this creates a conflict with the majority of currently deployed DRM systems, which track consumer habits and personal information. Present DRM systems impinge on customer privacy, for example by requiring users to identify themselves and authenticate their access rights to the protected media. However, Korba and Kenny (2002) state that DRM and privacy share as common functions the management of assets by third parties, requirements for protection, and restricted usage governed by issued rights, and can therefore be combined into one system that uses DRM techniques to protect both intellectual property and personal information. The result of such a powerful synthesis is a Privacy Rights Management system. We think that DRM technology provides the functionality required for e-government and that Privacy Rights Management (PRM) may be used to give the desired confidence to citizens using this environment (Maña et al., 2006).

Today, although many people are interested in the rapidly developing field of DRM, the problem is that the majority of solutions are proprietary and lack interoperability. Many studies have been carried out with the objective of characterising and defining the elements that constitute a general DRM framework, and solutions have been offered based on these definitions. In spite of this, we must point out that rights management applies to a wide variety of systems and objects, and this makes it almost impossible to consider all the potential application scenarios.
In my opinion the approach must, therefore, be developed in an open and extensible way. DRM frameworks are usually defined on the basis of multiple components with complex relationships, and it is usual to find descriptions based on a large set of predefined entities. These approaches produce inflexible and difficult-to-understand schemes. The same applies to the definition of rights expression languages, such as XACML (OASIS, 2006). The best solution is to define and implement a general framework capable of supporting very heterogeneous DRM applications and scenarios, and for this to focus on flexibility and extensibility (Maña et al., 2003).

In summary, although much work has been done in the field of DRM, we have yet to solve many issues. Any DRM system should, as basic requirements, be:
• non-proprietary;
• interoperable;
• highly scalable;
• privacy aware;
• flexible enough to seamlessly adapt to new and unforeseen applications and scenarios;
• fair;
• dynamic enough to allow seamless and transparent adaptation to changing environments;
• easy to set up and use; and
• robust, so that the consequences of a small security breach are minimal.

Important work must be done to meet these requirements (among others) and to establish both users’ and content providers’ trust through fair e-commerce/DRM. This will be the only way to end this digital information war.

To conclude, a brief sketch is given of the main contributions of the papers included in this issue, which were selected for the First European Workshop on Technological and Security Issues in Digital Rights Management (EuDiRights’06) and presented during the Workshop in Hamburg.

A very interesting application of watermarking technology that allows active content protection beyond the limitations of current DRM systems is presented in “Complementing DRM with Digital Watermarking: Mark, Search, Retrieve”, by Patrick Wolf, Martin Steinebach and Konstantin Diener. In this work the authors show how digital watermarking can be used to assist and improve cryptography-based DRM systems by allowing content to be protected beyond the domain protected by the DRM system. They explore the potential of this passive technology, which, although not suitable for actively preventing copyright violations, can irreversibly link information with multimedia data, ensuring that an embedded watermark can be retrieved even from analogue copies. This makes watermarking useful where DRM fails. When content needs to be moved out of the protected DRM domain, e.g. when playing back content via analogue output channels, the content can be marked with information that would help to identify its origin if it is used to violate copyright. The remaining challenge now is to find the marked content within the channels regularly used for copyright violations. The authors therefore introduce a concept for scanning file-sharing networks for marked content, and present a framework for finding potentially watermarked material on the Internet and for retrieving the watermarks.

In “A Class of Non-Linear Asymptotic Fingerprinting Codes with Epsilon-Error”, the authors (Marcel Fernandez, Josep Cotrina and Miguel Soriano) present a systematic strategy for collusions attacking a fingerprinting scheme.


A fingerprinting code is a set of code words that are embedded in each copy of a digital object, with the purpose of making each copy unique. If the fingerprinting code is c-secure, then the decoding of a pirate word created by a coalition will expose at least one of the guilty parties. As a particular case, this strategy shows that linear codes are not good fingerprinting codes. Based on binary linear equidistant codes, the authors construct a family of fingerprinting codes in which guilty users can be efficiently identified using minimum distance decoding.

The creation, distribution and consumption of multimedia content through the complete digital value chain are examined in the work of Eva Rodríguez and Jaime Delgado, entitled “Verification Algorithms for Governed Use of Multimedia Content”. This work presents mechanisms to enforce rights when distributing governed multimedia content. Additionally, it has been demonstrated that DRM systems controlling the use of multimedia content through the complete distribution chain can use the verification algorithms proposed in this paper to enable secure distribution of multimedia content. By using these algorithms, they can determine whether the rights have been passed correctly from parent to child licences. Moreover, these systems can also enforce the rights when distributing multimedia content.

In “Toward Semantics-aware Management of Intellectual Property Rights”, Ernesto Damiani and Cristiano Fugazza review XML-based DRML and semantics-aware IPR management and present two use cases: the first translates the ODRL permission model into an OWL ontology, and the second applies the same translation to the XrML rights model. The resulting ontologies of the two use cases are subsequently integrated. As a consequence, equivalent rights in both categorisations can be equalised and policies can be transparently applied to principals whose permissions are defined in either format. As a result of this work, the authors define a fine-grained, semantics-based translation mechanism between distinct formalisms.

Bill Rosenblatt, in “DRM, Law, and Technology: An American Perspective”, presents a complete overview of the US legal panorama, and provides a review of developments in the United States related to digital rights management through legal, technological and market developments in recent years. The objective of this paper is to compare the situation of DRM technology and its legal context in the United States with that in the European Union, and to consider how this may affect the European framework.

References
Council of Europe (1950), European Convention of Human Rights and Fundamental Freedoms, Council of Europe, available at: http://conventions.coe.int/treaty/en/Treaties/Html/005.htm
Council of Europe (2002), Directive 2002/58/EC: Directive on Privacy and Electronic Communications of the European Parliament and of the Council, available at: http://europa.eu.int/eur-lex/pri/en/oj/dat/2002/l_201/l_20120020731en00370047.pdf
Korba, L. and Kenny, S. (2002), “Towards meeting the privacy challenge: adapting DRM”, NRC paper NRC 44956, available at: http://iit-iti.nrc-cnrc.gc.ca/iit-publications-iti/docs/NRC-44956.pdf (accessed November 2002).
Maña, A., Yagüe, M.I. and Benjumea, V. (2003), “EC-gate: an infrastructure for digital rights management”, Proceedings of the IASTED International Conference on Communication, Network, and Information Security, Special Session on Architectures and Languages for Digital Rights Management and Access Control, New York City, December 10-12, 2003, Acta Press, Calgary.
Maña, A., Yagüe, M.I., Karnouskos, S. and Abie, H. (2006), “Information use-control in e-government applications”, Encyclopedia of Digital Government, Idea Group Reference, Hershey, PA, pp. 1076-1082.
OASIS (2006), eXtensible Access Control Markup Language (XACML) TC, available at: www.oasis-open.org/committees/tc_home.php?wg_abbrev=xacml
Yagüe, M.I., Maña, A. and Sánchez, F. (2004), “Semantic interoperability of authorizations”, Proceedings of the Second Workshop on Security in Information Systems, 13 April 2004, Porto, Portugal, INSTICC Press.

Further reading
Yagüe, M.I. and Troya, J.M. (2002), “A semantic approach for access control in Web services”, Proceedings of EuroWeb 2002 International Conference, British Computer Society Electronic Workshops in Computing (eWiC), Oxford.

Corresponding author
Mariemma I. Yagüe can be contacted at: [email protected]





Complementing DRM with digital watermarking: mark, search, retrieve


Patrick Wolf, Martin Steinebach and Konstantin Diener

Article received 25 September 2006 Reviewed by EuDiRights Workshop Committee Approved for publication 10 October 2006

Fraunhofer IPSI, Darmstadt, Germany

Abstract
Purpose – The purpose of this paper is to show how digital watermarking can be applied to assist and improve cryptography-based digital rights management (DRM) systems by allowing the protection of content beyond the domain protected by the DRM system.
Design/methodology/approach – Digital watermarking is a passive technology that does not allow the active prevention of copyright violations. But it allows the irreversible linking of information with multimedia data, ensuring that an embedded watermark can be retrieved even from analogue copies. Therefore watermarking can be used where DRM fails: whenever content needs to be moved out of the protected DRM domain, e.g. when playing back content via analogue output channels, it can mark the content with information that would help to identify its origin if it is used for copyright violations. The remaining challenge then is to find the marked content within the channels regularly used for copyright violations. The paper therefore introduces a concept for scanning file-sharing networks for marked content.
Findings – The vast number of files present in file-sharing networks prohibits every approach based on completely scanning and analysing each file. Therefore concepts for filtered search queries, where only potentially watermarked files are downloaded, are discussed.
Originality/value – The paper shows how watermarking can be applied as a technology to allow active content protection beyond the limitations of current DRM systems.
Keywords Data security, Digital storage, Mark scanning equipment
Paper type Conceptual paper

Online Information Review Vol. 31 No. 1, 2007, pp. 10-21. © Emerald Group Publishing Limited 1468-4527. DOI 10.1108/14684520710731001

Security challenges for digital media
Recent advances in multimedia document production, delivery and processing, including the wide availability of increasingly powerful devices for the production, communication, copying and processing of electronic documents, have made a large number of new opportunities available for the dissemination and consumption of multimedia content. At the same time these rapid developments have raised several important problems regarding intellectual property, digital rights management, authenticity, privacy, conditional access and security that risk impeding the diffusion of new services. Issues such as Internet piracy or electronic document forgery may have a substantial negative impact on the commercial exploitation of the possibilities already offered by multimedia technology. Resolving these issues would provide new impetus to a more rapid introduction of products and services, benefiting both industry and consumers. Digital rights management (DRM) aims to be such a solution.

First attempts at DRM relied on cryptography, particularly since cryptography proved so useful for the secure transmission of digital data through public channels. Unfortunately, when cryptography alone is applied to protect multimedia applications, it often fails. The biggest challenge is the so-called analogue gap or analogue hole: content needs to be decrypted and is presented to the user in an analogue environment, as humans cannot perceive digital information. This enables an attacker to re-record the signal in its unprotected state. Schneier (2001) offers additional reasons for the failure of cryptography-based copy protection, including the easy distribution of tools to break the protection.

Driven by the necessities outlined above, the last few years have seen the development of new tools to tackle the problems encountered in media security applications, leading to a broader view of multimedia security, according to which security primitives and multimedia signal processing work together to ensure secure access to the data. As is now evident, multimedia security encompasses a wide range of diverse technological areas working together to cope with the complex problems characterising this rapidly evolving field (Lu, 2004). Enabling technologies include watermarking, data hiding, steganography, cryptography, perceptual hashing, biometrics, active fingerprinting, network security and digital forensics. Some of these technologies are described below in more detail. As an example, to meet the challenge of the analogue gap mentioned above in the case of music, both digital audio watermarking and perceptual audio hashing can be applied (Cano et al., 2002a). The first approach adds information that survives analogue conversion to the musical piece, which can be retrieved, for example, to identify the content. The second approach stores a description of the musical piece that is robust to analogue conversion in a database, which helps to identify the content at a later time.

Searching for content
The interest of the media industry in fighting illegal file sharing has inspired various approaches and business models for searching for copyright violations within file-sharing networks. This area includes specialised agencies scanning the networks with a low degree of automation (for example, Partners for Management, www.p4m.de/index.htm). While this approach can successfully identify copies within networks, it is time-consuming and therefore affordable only for media content where illegal copies cause huge financial losses, as in the film industry. As an alternative, Digimarc Image Search (www.digimarc.com/mypicturemarc/imagesearch) is an automated system currently addressing only Web pages and limited to single images to be protected. Other companies, like GridPatrol (www.gridpatrol.de), offer a more complex service, including not only searching for content in various places like Web pages, FTP directories or file-sharing networks, but also initiating legal measures when illegal content is found. Identification of illegal content here is done by logical analysis and statistical calculations.

All of these approaches share one important weakness: they cannot identify the sources of illegal copies, but only the current point of distribution; or, in the worst case, only the existence of illegal content within their search space.

This paper takes up the challenges just presented by describing how watermarking can be actively used to complement current DRM systems. First, we will provide the technical background of watermarking. It will become apparent that watermark embedding and retrieval alone will not suffice in advancing DRM systems. What is missing is an active component searching for potentially watermarked files on the Internet, which we will develop as a meta-search framework in the next section. Finally, we will take a closer look at what can be achieved with the results of such a meta-search.


Technologies for multimedia protection
Because of the massive illegal copying of media files, content owners utilise various protection technologies. In online media stores, digital rights management (DRM) and/or digital watermarking is normally applied. DRM aims to prevent illegal copying by restricting the usage of the content. As long as a DRM system has control over the media file, it cannot be moved to a file-sharing network. But users often circumvent these mechanisms and remove the restrictions. The result is an unprotected and freely distributable file. Digital watermarking embeds individual information within the content sold by the online store. This allows illegal copies to be traced back to the original buyer of the content. Customers are granted freedom to use the content in any way they want, including burning a CD or copying the audio file to an MP3 Walkman. Misuse such as illegal P2P distribution is discouraged by the embedded information about the customer, which stays in the content after format changes or even in analogue copies. One important challenge here, among others, is to find marked content after illegal usage.

Perceptual hashing
Perceptual hashing algorithms have also been called robust hash algorithms, passive fingerprinting and audio IDs (when designed for audio data) in the literature. The concept is to derive a robust, content-dependent description from media data so that the media data can be identified by comparing the stored description with a newly calculated one. This description aims to be insensitive to modifications of the media data (e.g. lossy compression). Known concepts include the inherent periodicity of signals, the time-frequency landscape given by the frame-by-frame mel-frequency cepstral coefficients, principal component analysis, adaptive quantisation and channel decoding (Özer et al., 2005). Various other approaches for deriving a robust content description have been introduced for acoustic (Cano et al., 2002a; Cano et al., 2002b; Allamanche et al., 2001; Mihcak, 2001) or visual data (Burges et al., 2002; Skrepth and Uhl, 2003; De Roover et al., 2005; Mihcak, 2005; Oostveen et al., 2001). Snocap (www.snocap.com) offers a content-recognition system based on passive fingerprinting, enabling reliable filtering of illegal content in file-sharing clients or playback devices. Passive fingerprinting, one popular synonym for perceptual hashing, is called “passive” to distinguish it from the usage of specially designed binary sequences for customer identification, also called fingerprints. The act of embedding such fingerprints into a media file via digital watermarking is called active fingerprinting (Lu, 2004).
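As a rough illustration of the comparison step described above, the following toy sketch (our own illustration, not one of the algorithms cited in this paper) derives a very coarse binary descriptor from a decoded audio signal by quantising per-block energies and compares two descriptors by Hamming distance; real perceptual hashes use far more elaborate features such as MFCCs.

import java.util.Arrays;

/** Toy illustration of perceptual-hash style matching (not the cited algorithms). */
public class ToyAudioHash {

    /** Derive a coarse binary descriptor: one bit per block, set if the block's
     *  energy is above the mean block energy of the whole signal. */
    public static boolean[] describe(double[] samples, int blockSize) {
        int blocks = samples.length / blockSize;
        double[] energy = new double[blocks];
        double mean = 0.0;
        for (int b = 0; b < blocks; b++) {
            for (int i = 0; i < blockSize; i++) {
                double s = samples[b * blockSize + i];
                energy[b] += s * s;
            }
            mean += energy[b];
        }
        mean /= blocks;
        boolean[] bits = new boolean[blocks];
        for (int b = 0; b < blocks; b++) {
            bits[b] = energy[b] > mean;   // coarse features survive small distortions
        }
        return bits;
    }

    /** Hamming distance between two descriptors of equal length. */
    public static int distance(boolean[] a, boolean[] b) {
        int d = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) d++;
        }
        return d;
    }

    public static void main(String[] args) {
        double[] original = new double[4096];
        for (int i = 0; i < original.length; i++) {
            original[i] = Math.sin(i / 10.0);
        }
        // Simulate a lossy copy: the same signal with a small distortion added.
        double[] copy = Arrays.copyOf(original, original.length);
        for (int i = 0; i < copy.length; i++) {
            copy[i] += 0.01 * Math.sin(i / 3.0);
        }
        boolean[] h1 = describe(original, 256);
        boolean[] h2 = describe(copy, 256);
        // A small distance suggests the copy is a (distorted) version of the stored content.
        System.out.println("Hamming distance: " + distance(h1, h2));
    }
}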

Digital watermarking
Digital watermarks are one of the various strategies intended to help make the distribution of digital material more secure. A distinction can be made between active and passive strategies:
• active methods, such as cryptography, directly prevent unauthorised distribution of data; and
• passive methods, such as digital watermarks, serve more as a means of providing proof of ownership rights.

Watermarks have been, and continue to be, used in other physical media such as banknotes or paper. In order to use watermarks for digital media, they are adapted for the relevant types of multimedia data. Watermark solutions are thus specially developed for areas such as audio and video. Watermark information is integrated with the data so that, unlike information merely attached to the data, it cannot simply be removed. In most cases it has the appearance of a noise pattern spread transparently across the medium. The noise pattern only needs to be adapted to the multimedia data format used. In audio data, for instance, individual frequencies are altered, and for image data the colours of individual pixels are altered. Special conditions arise in the area of video, since the temporal sequence of pictures has to be taken into consideration as well. Watermarks have a variety of properties which receive various weightings depending on the way in which the watermark is used, including robustness, transparency, payload and security.

Usage of digital watermarking
In order to use digital watermarking to assist DRM systems, two general strategies can be followed:
(1) Public watermarking. The watermark is embedded by a DRM system when the protected content leaves the DRM system, e.g. when sending it to an analogue output channel. It includes information about the usage rules for the content within the DRM system. If the file re-enters the DRM domain, the watermark is retrieved and the DRM rules are re-assigned to the file.
(2) Private watermarking. Here the watermark is embedded and retrieved by the owner of the protected content or a service provider. In most cases this is done at the sales stage in an online shop to include an individual, customer-dependent identifier in the sold file, so that the customer can be traced in case of copyright violations. There is no interaction between the customer and the watermark; the embedding process is transparent to him.

Both strategies have drawbacks and advantages. Private watermarking requires the complete computational costs of watermark embedding and retrieval to be borne by the content owner or the service provider, while for public watermarking the DRM system on the customer’s computer is responsible for watermarking. For public watermarking, the embedded information becomes an active part of the DRM system. All content entering the DRM environment is first scanned for embedded watermarks, disabling attacks that try to remove DRM policies via the analogue gap. The private watermarking strategy requires additional efforts to enforce the content protection it is aiming at. After embedding the customer identifier, the watermark rests passively within the content bought by the customer. It does not hinder the customer from further distributing the content, for example by placing the files into file-sharing networks. Only if the content owner takes active measures to search for illegal copies of the content, retrieve the watermark from them and address the identified customer does the watermark offer active protection.

The most important difference between the two strategies is the inability of users to check whether attacks against the embedded watermarks are successful.


This is a major challenge for watermarks used in public scenarios: if the watermark is detected by the DRM system and DRM regulations are issued for the file based on the retrieved information, attackers can use the system to verify the success of attacks against the watermark. They can try an arbitrary number of operations on the protected file, move it into the DRM system and see if DRM rules are re-assigned to it. As soon as this is no longer the case, they have identified a successful attack on the watermark and can try to duplicate it for other protected files. This is especially critical, as one common watermark with respect to secret key and algorithm would need to be used for all users of a DRM system; otherwise files copied from one DRM system would not be recognised by another system, and DRM rules could not be issued for them.

The private approach does not share this weakness. Customers may try to remove the watermark, but only the reaction of the content owner would indicate the success or failure of the attack. In this case the number of attacks that could be tried by attackers is minimal, as legal action would occur soon after the first copyright violations were identified due to failed attacks. The weakness of the public approach therefore lies within the retrieval of the watermark and the feedback to the customer. A scenario where the watermark is only embedded by a DRM system and then detected by a private party would not allow attack optimisation.

In either case, both strategies share the challenge of finding potentially watermarked content. In the following, we propose a strategy for automated scanning of file-sharing networks for marked content to meet this challenge.
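To make the private-watermarking strategy more concrete, here is a minimal sketch of a sale-time embedding and later retrieval workflow. The interfaces (WatermarkEmbedder, WatermarkDetector) and the transaction-ID scheme are our own illustrative assumptions, not an API from this paper or any specific product.

import java.nio.file.Path;
import java.util.Optional;

/** Hypothetical interfaces for a transactional (private) watermarking workflow. */
interface WatermarkEmbedder {
    /** Embed the payload (e.g. a transaction ID) into a copy of the master file. */
    Path embed(Path masterFile, long payload, byte[] secretKey);
}

interface WatermarkDetector {
    /** Try to retrieve an embedded payload from a suspect file. */
    Optional<Long> retrieve(Path suspectFile, byte[] secretKey);
}

/** Sale-time embedding: each sold copy carries an identifier linked to the buyer. */
class ShopExample {
    private final WatermarkEmbedder embedder;
    private final WatermarkDetector detector;
    private final byte[] secretKey;

    ShopExample(WatermarkEmbedder embedder, WatermarkDetector detector, byte[] secretKey) {
        this.embedder = embedder;
        this.detector = detector;
        this.secretKey = secretKey;
    }

    /** Called when a customer buys a track: record the transaction and mark the copy. */
    Path sell(Path masterFile, long transactionId) {
        // The transaction ID is stored in the shop's database together with the customer record.
        return embedder.embed(masterFile, transactionId, secretKey);
    }

    /** Called for a file found in a file-sharing network: map it back to a transaction. */
    Optional<Long> traceSuspect(Path suspectFile) {
        // Retrieval works on the content itself, so it still succeeds after format changes.
        return detector.retrieve(suspectFile, secretKey);
    }
}

A shop would wire in concrete embedder and detector implementations (for example, an audio watermarking library), call sell() at checkout and call traceSuspect() on files located by the search framework described in the next section.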

Internet media search
Potentially marked media on the Internet can be accessed in several ways. Most of the files are published in file-sharing systems, but the Web, FTP servers or news forums are used as publication platforms as well. Searching for potentially marked media on the Internet means scanning all these different sub-spaces for the media files; it therefore requires an Internet-wide meta-search engine (Figure 1). This meta-search engine must provide a generic search interface to the client issuing a search request as well as simple access to all possible sub-spaces. This is best realised by an extendable framework that allows definition and registration of specialised components for individual sub-spaces. Thus the media meta-search framework covers two aspects:
(1) Search interface. This part contains the definition of complex search queries and the representation of generic search results.

Figure 1. Internet meta-search engine

(2) Search execution architecture. This part describes the extendable architecture for search execution on the concrete Internet sub-spaces and the platform-dependent transformation of generic search queries.

Search interface
Internet sub-spaces that allow searching explicitly for digital media files, like file-sharing clients or the Google image search, use a metadata-based search mechanism. Unlike the classical text document search engines such as Google, Lycos or Yahoo, no content-based search mechanism is provided. All these media search interfaces hardly differ. Figure 2 shows the common functionality. The common media search interface accepts a set of (textual) metadata as input (keywords, file type, media type, etc.). This search query is processed on the underlying network (file sharing) or database (Google Image Search). For every file that matches the search query a search result entry is created. This result contains metadata about the file and provides access to the medium’s essence: a file-sharing client offers the file for download and Google Image Search provides a link to the file. As all the Internet sub-spaces’ search interfaces are based on metadata search, the meta-search interface is metadata-based as well. But the individual search interfaces support different sets of metadata for search parameter definition and search results. Because of this, the framework defines generic query and result metadata.

Figure 2. Common media search interface
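A minimal sketch of such generic, metadata-based query and result types might look as follows; the type and field names are our own assumptions for illustration, not the classes defined by the authors.

import java.util.List;

/** Illustrative generic metadata for meta-search queries and results (names are assumptions). */
enum MediumType { IMAGE, AUDIO, VIDEO }

/** A generic query: keywords, the requested medium type and a file-size constraint. */
record GenericQuery(List<String> keywords, MediumType type, long minBytes, long maxBytes) { }

/** A generic search result: descriptive metadata plus locations where the file can be fetched. */
record GenericResult(String fileName, MediumType type, long sizeBytes, List<String> locations) { }

class QueryExample {
    public static void main(String[] args) {
        // The same abstract query could later be mapped to a file-sharing client or an image search.
        GenericQuery query = new GenericQuery(List.of("example", "song"), MediumType.AUDIO,
                1_000_000L, 20_000_000L);
        System.out.println(query);
    }
}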


Queries
The set of metadata defined for meta-search query definition is reduced to a minimum of items. This ensures that all sub-spaces will be able to support the metadata-based media search. Table I contains the metadata items supported for queries. As a single criterion will usually not fit the programmer’s needs, there must be an option to combine multiple criteria into a complex one. Therefore the framework contains a set of logical operator criteria: NOT, AND, OR and XOR. All operators are criteria objects themselves and can be nested elements of a logical operator as well. The criteria objects grouped by logical operators create a tree structure that can be modelled using the composite pattern.

Table I. Metadata items for meta-search query definition
• Keywords – The simplest way is to map the keywords to the media file’s name. This option is supported by all Internet sub-spaces. Other possible mappings are to an MP3 file’s ID3 tag or to metadata embedded in an image.
• Medium type – The current design provides three types of media: image, audio and video. The easiest way to determine the medium type is to check the media file extension. Some of the sub-spaces allow the media type to be obtained more reliably. In addition, they often allow the requested media type to be specified together with the keyword-based query definition.
• Medium file size – The framework provides the following search criteria for the media file size: greater than, equal to and smaller than.
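As an illustration of the composite criterion tree described above, the following sketch combines keyword, type and size criteria with nested logical operators; the class and method names are our own assumptions, not the framework’s actual API.

import java.util.List;

/** Illustrative composite-pattern search criteria (names are assumptions). */
interface SearchCriterion {
    /** Evaluate the criterion against a candidate result described by simple metadata. */
    boolean matches(String fileName, String mediumType, long sizeBytes);
}

record KeywordCriterion(String keyword) implements SearchCriterion {
    public boolean matches(String fileName, String mediumType, long sizeBytes) {
        return fileName.toLowerCase().contains(keyword.toLowerCase());
    }
}

record TypeCriterion(String type) implements SearchCriterion {
    public boolean matches(String fileName, String mediumType, long sizeBytes) {
        return mediumType.equalsIgnoreCase(type);
    }
}

record MaxSizeCriterion(long maxBytes) implements SearchCriterion {
    public boolean matches(String fileName, String mediumType, long sizeBytes) {
        return sizeBytes <= maxBytes;
    }
}

/** Logical operators are criteria themselves, so they can be nested arbitrarily. */
record AndCriterion(List<SearchCriterion> children) implements SearchCriterion {
    public boolean matches(String fileName, String mediumType, long sizeBytes) {
        return children.stream().allMatch(c -> c.matches(fileName, mediumType, sizeBytes));
    }
}

record NotCriterion(SearchCriterion child) implements SearchCriterion {
    public boolean matches(String fileName, String mediumType, long sizeBytes) {
        return !child.matches(fileName, mediumType, sizeBytes);
    }
}

class CriteriaExample {
    public static void main(String[] args) {
        // keyword contains "hips" AND type is audio AND size <= 20 MB AND NOT a "remix"
        SearchCriterion query = new AndCriterion(List.of(
                new KeywordCriterion("hips"),
                new TypeCriterion("audio"),
                new MaxSizeCriterion(20_000_000L),
                new NotCriterion(new KeywordCriterion("remix"))));
        System.out.println(query.matches("Shakira - Hips Don't Lie.mp3", "audio", 5_000_000L)); // true
    }
}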


Results
The search process asynchronously creates a set of search results. In addition to the media file or media essence, every search result contains a set of metadata that describe the found medium. This set of metadata differs for every Internet sub-space. Since the generic meta-search interface must provide a uniform description for all search results, the framework defines an interface for all search results. Each result contains a relationship to the original criterion it matches. Thus a search result that possibly fits more than one search request can easily be associated with the correct search criterion.

Each search result also contains a set of media file locations for the requested media files. This is due to the peer-to-peer basis of some media publication platforms on the Internet. One result of the peer-to-peer architecture is the non-guaranteed availability of the network nodes or the media files stored on them. In order to make the published media files as available as possible, the files are not stored on only one single node. Nearly all published media files are stored on multiple nodes. Therefore a search query to a file-sharing platform will create a set of search results. The media file’s metadata and the access to its content are not modelled directly in the search result’s interface. They are coupled to a medium descriptor, which provides access to the media file and its metadata.

The framework’s search process works asynchronously, as most of the Internet sub-spaces for media search process search queries asynchronously. This is motivated mainly by the peer-to-peer concept on which many media publication platforms are based. The platform client sends a message containing its query to the network and triggers a search result if another node answers this query. The framework provides an observer pattern-based publish/subscribe mechanism for search result delivery. Every client interested in search results has to implement a subscriber interface and to register at the framework’s meta-search interface that is described in the following section. After registration the client receives all triggered search results from the meta-search interface.

Search execution
As mentioned earlier, our framework divides the Internet into sub-spaces. File sharing, the World Wide Web and FTP servers are three examples of sub-spaces offering different opportunities for media search and download. But all these spaces can be divided into different spaces themselves. File sharing is a highly volatile field with a very heterogeneous set of publication platforms and access tools, and it is constantly changing. New tools like torrent networks appear in the space, and older tools like Napster or KaZaA are displaced. This brief look at only one of the potential spaces for media publication on the Internet shows the requirements which arise for the search execution architecture: an extendable and easily modifiable environment.

A single search component is not able to meet the described requirements of accessing the Internet as a whole. Therefore the search execution logic must be delegated to specialised search components. The search component interface is shown in Figure 3 (NetworkConnector). The diagram also contains the meta-search interface, a manager class for the search components (NetworkConnectorManager) that delegates all search requests to the registered search components.
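The result-delivery mechanism described above might be sketched as follows; the interface and method names (SearchResultSubscriber, MetaSearchEngine, etc.) are our own illustrative assumptions rather than the framework’s published API.

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Illustrative publish/subscribe delivery of asynchronous search results (names are assumptions). */
interface SearchResultSubscriber {
    void onResult(String fileName, List<String> locations);
}

class MetaSearchEngine {
    private final List<SearchResultSubscriber> subscribers = new CopyOnWriteArrayList<>();

    /** Clients interested in results register themselves here. */
    void subscribe(SearchResultSubscriber subscriber) {
        subscribers.add(subscriber);
    }

    /** Called (possibly from several connector threads) whenever a sub-space answers a query. */
    void publish(String fileName, List<String> locations) {
        for (SearchResultSubscriber s : subscribers) {
            s.onResult(fileName, locations);
        }
    }
}

class SubscriberExample {
    public static void main(String[] args) {
        MetaSearchEngine engine = new MetaSearchEngine();
        engine.subscribe((fileName, locations) ->
                System.out.println("Result: " + fileName + " available at " + locations));
        // A connector would call publish() asynchronously when a node answers the query.
        engine.publish("Shakira - Hips Don't Lie.mp3", List.of("node-a", "node-b"));
    }
}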


Figure 3. Search component interface
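A minimal sketch of the connector architecture shown in Figure 3 might look like this, reusing the SearchCriterion and SearchResultSubscriber types from the sketches above; the exact method signatures are our own assumptions made for illustration.

import java.util.ArrayList;
import java.util.List;

/** Illustrative search-component interface: one connector per Internet sub-space
 *  (file-sharing network, Web search, FTP, ...). Names are assumptions. */
interface NetworkConnector {
    /** Translate the generic criterion into a sub-space specific query and start searching.
     *  Results are delivered asynchronously to the given subscriber. */
    void search(SearchCriterion criterion, SearchResultSubscriber subscriber);
}

/** Manager that hides the individual sub-spaces from the caller. */
class NetworkConnectorManager {
    private final List<NetworkConnector> connectors = new ArrayList<>();

    void register(NetworkConnector connector) {
        connectors.add(connector);
    }

    /** Delegate the request to every registered connector. */
    void search(SearchCriterion criterion, SearchResultSubscriber subscriber) {
        for (NetworkConnector connector : connectors) {
            connector.search(criterion, subscriber);
        }
    }
}

A concrete connector for a particular network would implement the translation and result-conversion steps described in the four-step process below.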

Every component encapsulates one pre-defined sub-space and hides its specific attributes and constraints from the caller. Two of these specific attributes are the concrete sets of metadata supported for query definition and result description. Besides interacting with the target space’s search interface, the component’s task is to transform the generic search criterion into a target space-dependent query statement and to convert the target space’s search results into generic search results, and vice versa. The component’s whole search process is shown in Figure 4:
(1) The search component transforms the generic meta-search query into a search query supported by the target space. All generic metadata are mapped to metadata accepted by the target space’s search interface. This mapping ignores all metadata that are not accepted by the search interface. These metadata become relevant again in Step 4.
(2) The search component sends the search request to the target space’s search interface. This could mean calling functions of a network client API.
(3) The search component receives a search result synchronously or asynchronously from the target space’s search interface.

Figure 4. The component’s internal search process

(4) The target space’s result metadata are mapped to the generic medium descriptors. As the set of metadata provided in the search results is usually larger than the set accepted for query definition, the search component checks the result’s conformity with the generic query again. The generic search result is triggered only if this evaluation succeeds; otherwise the target space’s search result is discarded.

The generic search criteria are grouped by logical operators. The result of this grouping is a tree structure modelled through the composite pattern discussed earlier. The search component has to traverse this tree structure and to transform every tree node. In order to provide various ways to traverse the tree structure, the iterator pattern is used (Gamma et al., 1996). The transformation algorithm has to map the generic search criteria to the metadata supported for search query definition by the target space. That means that the node transformation component depends on the constraints of the target space. On the other hand, all transformation objects of search components must have the same interface. Otherwise it would not be possible for the iterator object to transport them from one node to the other. The visitor pattern provides a way to extend tree structures with new functionality without changing the node objects (Gamma et al., 1996). These visitor objects have a generic interface, and in our case they implement the transformation or mapping logic. They are also used to implement the evaluation algorithm required in Step 4 of the process described above.

Searching in multiple sub-spaces for potentially watermarked media might return the same files (bitwise). Since checking for watermarks is a costly operation (in terms of computational cost), it is desirable to eliminate as many duplicates as possible. This can be achieved by grouping network connectors themselves in a hierarchical structure, so that a “redundancy removal connector” receives a search query and delegates it to its child P2P network connectors, which eventually return some results. These results are then checked for redundancies, and only the genuinely different files are returned by the redundancy removal connector. But such a strategy is helpful only if duplicate versions of the same work really exist on the Internet. In order to verify this, we downloaded MP3 copies of the song “Hips Don’t Lie” by Shakira, a Top 10 hit at the time, via the four file-sharing networks Gnutella, BitTorrent, eDonkey and DC++. Our goal was to see:

(1) if we could find files which were available in more than one network and were identical hash-wise; and
(2) if we could find files that differed hash-wise when looking at the MP3 but resulted in identical .wav files after MP3-to-.wav conversion.

Point (1) motivates the use of hash comparison before watermark retrieval when downloading and checking copies from different networks. While all four networks use hash methods to identify files, a cross-identification of files between networks is impossible, as each network uses an individual hash algorithm. Point (2) motivates the use of hash algorithms on the pure media data. As most media file formats allow metadata to be added to the multimedia content, changing this metadata also changes the file hash. With respect to watermark retrieval, the added metadata is irrelevant, as the information is embedded only in the multimedia data. Therefore using a hash algorithm that takes into account only the pure multimedia data may identify redundant files before watermark retrieval. As hash calculation is much faster than watermark retrieval, filtering already checked copies before the detection process improves the performance of the overall system.

For our experiment, we downloaded 102 copies of the song. In six cases we found hash-wise identical files which were present in more than one network. After converting all MP3 files to .wav, 47 files were hash-wise identical and therefore redundant with respect to watermark retrieval. In this case, hashing the pure multimedia data would reduce by 46 per cent the number of files on which watermark retrieval would have to be run.

Results of search
What does a search for watermark information in media on the Internet truly reveal? First of all, it gives information about when and where the file was shared. Depending on the sub-space, the IP addresses of the source may be revealed as well (not true for some P2P networks). This is general information that every investigation can reveal. More importantly, the search for watermark information reveals the information that was originally embedded into the medium. In our case this is a direct link to the original buyer of the medium (either through a unique transaction number or a customer ID). Direct means that any party able to interpret the watermark can use this information to match the medium to a real-life person. No other parties such as Internet Service Providers or even authorities need to be involved. This is true even for “stealth networks”, where uploads can be performed completely anonymously. The link to the original buyer resides in the medium itself. As long as the medium can be accessed (i.e. downloaded or even simply listened to), the potential source of the illegal distribution can be identified.

A valid question about this approach is: what does a copyright owner gain by being able to identify the original buyer? First of all, online stores selling digital goods do so under general terms and conditions of the contract with the original buyer, which usually forbid not only publishing the work but also sharing it with close friends, which in some countries might otherwise be legal (CNET News, 2004). Finding such a link in a medium does not necessarily incriminate the original buyer, but with the link, investigations at least have a starting point; this is not the case for cryptography-based DRM systems. Recent cases in the film industry show that watermarking can be successfully used as a first hint at copyright violation (The Guardian, 2004).


Still, the media could have been stolen, and in court neither the watermark nor the IP address alone will suffice to convict the original buyer. Additional proof will be needed. Thus the challenges described at the beginning of this paper have turned into legal problems, since the supporting technology is already there.

Conclusions
It seems that DRM systems that are based on cryptographic mechanisms alone will not be able to fully close all the holes where digital works can leak out of the systems. Digital watermarking is the most promising technology to complement cryptographic mechanisms for closing the analogue hole. By embedding a link to the original buyer into a bought work (such as a customer identifier), digital watermarking is able to identify the potential source of illegal distribution. Since digital watermarking is a passive technology, it needs to be supplemented by an active component. The key element is to find suspect works that can be checked for watermarks. We have presented a framework that serves as a meta-search engine to find potentially watermarked works on the Internet and retrieve the watermarks. The meta-search engine is fed with abstract metadata-based search criteria and is able to search arbitrary search spaces (P2P networks, “Google space”, eBay, etc.) by using space-specific network connectors. Each network connector behaves as any other client of the search space and is thus indistinguishable from regular clients, so that search spaces cannot take active measures against being searched. The search reveals – even for stealth networks – a direct link to the original buyer of the work; thus misuse is strongly discouraged. In order to close the analogue hole in cryptographic DRM systems the workflow of digital watermarking needs an active component that finds potentially marked works. This summarises the digital watermarking workflow as mark, SEARCH and retrieve.

De Roover, C., De Vleeschouwer, C., Lefebvre, F. and Macq, B. (2005), "Key-frame radial projection for robust video hashing", paper presented at the Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS), Montreux.
Gamma, E., Helm, R., Johnson, R. and Vlissides, J. (1996), Design Patterns: Elements of Reusable Object-Oriented Software, Addison-Wesley Longman, Boston, MA.
(The) Guardian (2004), "Oscars piracy man fined £165,000", The Guardian Digital Edition, available at: http://film.guardian.co.uk/piracy/story/0,1397436,00.html
Lu, C-S. (2004), Multimedia Security, Idea Group Publishing, Hershey, PA.
Mihcak, K. (2005), Robust Image Hashing, Microsoft Research, Polytechnic University, Brooklyn, NY, available at: http://cis.poly.edu/seminars/abstract6fall05.shtml
Mihcak, V. (2001), "A perceptual audio hashing algorithm: a tool for robust audio identification and information hiding", Proceedings of the 4th International Information Hiding Workshop, Pittsburgh, PA, April 2001.
Oostveen, J., Kalker, T. and Haitsma, J. (2001), "Visual hashing of digital video: applications and techniques", Proceedings of the SPIE 24th Conference Applications of Digital Image Processing, San Diego, CA.
Özer, H., Sankur, B. and Memon, N. (2005), "Robust audio hashing for audio identification", European Signal Processing Conference (EUSIPCO).
Schneier, B. (2001), "The futility of digital copy prevention", CRYPTO-GRAM, 15 May.
Skrepth, C. and Uhl, A. (2003), "Robust hash functions for visual data: an experimental comparison", Proceedings of IbPRIA 2003.

Corresponding author
Martin Steinebach can be contacted at: [email protected]






A class of non-linear asymptotic fingerprinting codes with ε-error


Department of Telematics Engineering, Universitat Politècnica de Catalunya, Barcelona, Spain

Marcel Fernandez, Josep Cotrina-Navau and Miguel Soriano

Article received 25 September 2006 Reviewed by EuDiRights Workshop Committee Approved for publication 10 October 2006

Abstract
Purpose – The purpose of this paper is to show that a fingerprinting code is a set of code words that are embedded in each copy of a digital object, with the purpose of making each copy unique. If the fingerprinting code is c-secure, then the decoding of a pirate word created by a coalition of at most c dishonest users will expose at least one of the guilty parties.
Design/methodology/approach – The paper presents a systematic strategy for coalitions attacking a fingerprinting scheme. As a particular case, this strategy shows that linear codes are not good fingerprinting codes. Based on binary linear equidistant codes, the paper constructs a family of fingerprinting codes in which the identification of guilty users can be done efficiently using minimum distance decoding. Moreover, in order to obtain codes with a better rate, a 2-secure fingerprinting code is also constructed by concatenating a code from the previous family with an outer IPP code.
Findings – The particular choice of the codes is such that it allows the use of efficient decoding algorithms that correct errors beyond the error correction bound of the code, namely a simplified version of the Chase algorithms for the inner code and the Koetter-Vardy soft-decision list decoding algorithm for the outer code.
Originality/value – The paper presents a fingerprinting code together with an efficient tracing algorithm.
Keywords Codes, Algorithmic languages
Paper type Technical paper

Online Information Review Vol. 31 No. 1, 2007 pp. 22-37 © Emerald Group Publishing Limited 1468-4527 DOI 10.1108/14684520710731010

This work has been supported in part by the Spanish Research Council (CICYT) Projects TSI2005-07293-C02-01 (SECONNET) and TIC2003-01748 (RUBI).

Introduction
The fingerprinting technique consists in making the copies of a digital object unique by embedding a different set of marks in each copy. Having unique copies of an object clearly rules out plain redistribution, but a coalition of dishonest users can still collude: by comparing their copies and changing the marks where the copies differ, they are able to create a pirate copy that tries to disguise their identities. Thus, the fingerprinting problem consists in finding, for each copy of the object, the right set of marks that help to prevent collusion attacks.
The construction of collusion-secure codes was first addressed in (Boneh and Shaw, 1998). In that paper, Boneh and Shaw obtain (c > 1)-secure codes with a probability $\varepsilon$ of failing to identify a guilty user. Barg et al. (2003) present a construction based on the composition of two codes, an inner binary (c, c)-separable code and an outer non-binary code.
The security of fingerprinting codes is normally based on the fact that the coalition cannot use any pre-established strategy, and therefore the only possibility is to

construct the new false fingerprint in a random way. In this case, the probability of framing an innocent user is lowered. The only strategy that is assumed in all fingerprinting schemes is the majority strategy, so all existing fingerprinting codes are robust against this strategy. Here we show a new possible strategy. Moreover, we show how to construct families of 2-secure codes from binary linear equidistant codes. The proposed construction has the particularity that an innocent user is never framed and the probability of identifying at least one coalition member is $1 - \varepsilon$ for any $\varepsilon > 0$. Unfortunately, the families we construct have a poor code rate. In order to obtain codes with a better rate, we also present a binary 2-secure code with error $\varepsilon$ that is a concatenated code, where the inner code is a code from the previous family and the outer code is a Reed-Solomon code having the identifiable parent property (IPP).
The use of error-correcting codes allows us to view the pirate fingerprint as a codeword of the fingerprinting code containing many errors. As is shown below, the number of errors that a dishonest coalition is able to introduce into the fingerprint will force us to decode beyond the error correction bound of the code. Since we will be using concatenated codes, the basic idea is to pass the information obtained from the inner decodings, provided by a simplified version of the Chase algorithms, to a Koetter-Vardy algebraic soft-decision decoding algorithm that is used to decode the outer code.

Previous results and definitions
Let $Q$ be a finite alphabet of size $|Q|$. We define $Q^n := \{x = (x_1, \ldots, x_n) : x_i \in Q\}$. The elements of $Q^n$ are called words. For any set $X$, $|X|$ denotes the cardinality of $X$.

Codes
Let $Q$ be the finite field with $q$ elements, that is, $Q = \mathbb{F}_q$. A subset $C$ of a vector space $\mathbb{F}_q^n$ is called a code. The words in $C$ are called code words. The number of nonzero coordinates in a word $x$ is called the weight of $x$ and is commonly denoted by $w(x)$. The Hamming distance $d(a, b)$ between two words $a, b \in \mathbb{F}_q^n$ is the number of positions where $a$ and $b$ differ. The distance between a word $a$ and a subset of words $U \subseteq \mathbb{F}_q^n$ is defined as $d(a, U) := \min_{u \in U} d(a, u)$. The minimum distance of $C$, denoted by $d$, is defined as the smallest distance between two different code words. A code $C$ is a linear code if it forms a subspace of $\mathbb{F}_q^n$. A code with length $n$, dimension $k$ and minimum distance $d$ is denoted as an $[n, k, d]$-code. If the code is not linear then it is usually denoted as $(n, M, d)$ where, as before, $n$ denotes the length of the code and $d$ the minimum distance. In this case $M$ denotes the number of code words in the code. Sometimes the distance parameter is omitted from the expression and the code is simply denoted as an $(n, M)$ code.

Descendants. Envelope set
Following (Barg et al., 2003; Boneh and Shaw, 1998) (we refer readers to these papers for a more detailed exposition), given a subset $X \subseteq Q^n$, we define the envelope $E(X) \subseteq Q^n$ of $X$ as a set of words that can be derived from $X$ using the rules detailed below. If $y \in E(X)$ then $y$ is a descendant of $X$ and any $x \in X$ is a parent of $y$. A position $i$ is undetectable for $X$ if $x_i^r = x_i^s$ for all $x^r, x^s \in X$. The undetectable positions form a set denoted $Z(X)$.
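As a small illustration of the definitions above (not part of the original paper), the following snippet computes the Hamming weight, the Hamming distance and the undetectable-position set Z(X) for a toy 2-coalition of binary words.

```python
def weight(x):
    """Number of non-zero coordinates of a word."""
    return sum(1 for xi in x if xi != 0)

def hamming(a, b):
    """Hamming distance between two words of equal length."""
    return sum(1 for ai, bi in zip(a, b) if ai != bi)

def undetectable_positions(X):
    """Z(X): positions where all words of the coalition X agree."""
    return {i for i in range(len(X[0])) if len({x[i] for x in X}) == 1}

# Example with a 2-coalition of binary words.
u = (1, 0, 1, 1, 0)
v = (1, 1, 1, 0, 0)
print(hamming(u, v))                           # 2
print(sorted(undetectable_positions([u, v])))  # [0, 2, 4]
```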




Given $X$, to determine $E(X)$: by the marking assumption the positions in $Z(X)$ cannot be modified, thus if $y \in E(X)$ then $y_i = x_i^r$, $\forall i \in Z(X)$, $\forall x^r \in X$. For the rest of the positions, the detectable positions, there are several options to choose from. As we consider the binary case, the envelope is:

$$E(X) = \{\, y \in Q^n : y_i \in \{x_i : x \in X\} \,\}. \qquad (1)$$

A code is (2,2)-separating if, for any two pairs of code words $\{a, b\}$ and $\{c, d\}$ with $\{a, b\} \cap \{c, d\} = \emptyset$, their envelopes (sets of descendants) are also disjoint, $E(a, b) \cap E(c, d) = \emptyset$. The next corollary from (Cohen et al., 2001) gives a sufficient condition for a linear code to be (2,2)-separating.
Corollary 1. All linear, equidistant codes are (2,2)-separating.

Algorithms that decode beyond the error correction bound
In a codeword transmission through a communications channel, the received word is usually a corrupted version of the sent codeword due to the inherent presence of noise in the channel. If we denote by $c$ the transmitted codeword and by $r$ the received word, then the error pattern of $r$ is a word $e$ that satisfies $r = c + e$. The task of the decoder is to estimate the sent codeword from the received word. If the number of errors $w(e)$ is greater than $\lfloor (d-1)/2 \rfloor$, then there can be more than one codeword within distance $w(e)$ from the received word and the decoder may either decode incorrectly or fail to decode. This leads to the concept of list decoding (Guruswami and Sudan, 1999; Koetter and Vardy, 2000), where the decoder outputs a list of all code words within distance $w(e)$ of the received word, thus offering a potential way to recover from errors beyond the error correction bound of the code. In soft-decision decoding, the decoding process takes advantage of "side information" or channel measurement information generated by the receiver to estimate the sent codeword. In the process of decoding fingerprinting codes, we are faced with the task of correcting a number of errors beyond the error correction bound of the code.

Chase algorithms
In (Chase, 1972), three decoding algorithms that can double the error-correcting capabilities of a given binary block code are presented. Using a binary decoder that corrects $\lfloor (d-1)/2 \rfloor$ errors together with channel measurement information, the Chase algorithms can correct up to $d - 1$ errors. The idea is to perturb the received word with different test patterns, and then pass the perturbed word to a binary decoder to obtain a small set of possible error patterns, rather than just a single error pattern. The channel measurement information is used to choose the appropriate error pattern from the set. The difference between the three algorithms lies in the size of the set of error patterns considered.




The Koetter-Vardy soft-decision decoding algorithm
In this section we give a brief overview of the Koetter-Vardy (KV) soft-decision decoding algorithm presented in (Koetter and Vardy, 2000), which list decodes a Reed-Solomon code beyond the error correction bound. The KV algorithm, instead of using the received word symbols, uses probabilistic reliability information about these received symbols. In many applications the input to the decoder is a reliability matrix. Usually this matrix $R = (r_{ij})$ is a $q \times n$ matrix, where each row represents a symbol from the alphabet of the code and each column represents a position of the codeword. The entry $r_{ij}$ is related to the probability that the $i$th alphabet symbol has been sent in the $j$th codeword position.
We are interested in knowing which code words the KV algorithm will return. With this intention, given two $q \times n$ matrices $A$ and $B$ over the same field, the following product is defined:

$$\langle A, B \rangle := \operatorname{trace}(AB^T) = \sum_{i=1}^{q} \sum_{j=1}^{n} a_{i,j} b_{i,j}. \qquad (4)$$

Also, a word $v = (v_1, v_2, \ldots, v_n)$ over $\mathbb{F}_q$ can be represented by the $q \times n$ matrix $[v]$, with entries $[v]_{i,j}$ defined as follows:

$$[v]_{i,j} := \begin{cases} 1 & \text{if } v_j = a_i \\ 0 & \text{if } v_j \neq a_i \end{cases} \qquad (5)$$

In (Koetter and Vardy, 2000), Koetter and Vardy state the following theorem.
Theorem 2. If codeword $u$ is transmitted, word $v$ is received and the reliability matrix $R$ is constructed, then the KV soft-decision decoding algorithm outputs a list that contains the sent codeword $u$ if:

$$\frac{\langle R, [u] \rangle}{\sqrt{\langle R, R \rangle}} \ge \sqrt{n - d} + o(1). \qquad (6)$$
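A toy numerical illustration of Theorem 2 (the reliability values and code parameters below are invented for the example and are not from the paper): the score ⟨R, [u]⟩/√⟨R, R⟩ is computed for a candidate codeword and compared against √(n − d).

```python
import numpy as np

def codeword_matrix(u, alphabet):
    """[u]: q x n 0/1 matrix with a 1 in row i, column j iff u_j equals the i-th symbol."""
    M = np.zeros((len(alphabet), len(u)))
    for j, symbol in enumerate(u):
        M[alphabet.index(symbol), j] = 1.0
    return M

def kv_score(R, u, alphabet):
    """<R,[u]> / sqrt(<R,R>), the quantity bounded in Theorem 2."""
    U = codeword_matrix(u, alphabet)
    return np.sum(R * U) / np.sqrt(np.sum(R * R))

alphabet = [0, 1, 2, 3]          # a toy code over an alphabet of size 4, n = 3, d = 2
R = np.array([[0.7, 0.1, 0.5],   # column j holds the reliabilities of the
              [0.1, 0.6, 0.5],   # alphabet symbols in position j
              [0.1, 0.2, 0.0],
              [0.1, 0.1, 0.0]])
u = [0, 1, 0]
print(kv_score(R, u, alphabet), np.sqrt(3 - 2))   # 1.5 >= 1.0, so u is in the output list
```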

Arithmetic attacks on fingerprinting codes
In this section we present our attack, which we call the arithmetic attack. This attack determines some necessary conditions that fingerprinting codes must satisfy. As a consequence of these conditions we prove that linear codes are not useful for fingerprinting schemes. Remember that we suppose that the code $C$ used in the fingerprinting scheme is unknown to the colluders (a hypothesis that benefits the distributor), and that the distributor $D$ uses the Hamming distance strategy as the identification assumption. Then, if $C$ is a linear code, we show that the probability $p_e$ of framing a user is $p_e \ge 1/3$. This does not mean that linear codes cannot be used to construct fingerprinting codes, but we need to be careful that the resulting code is not linear.
Theorem 3. Let $C$ be any fingerprinting code defined over an alphabet $Q$. We assume that coalitions know $(Q, +)$. If for some pair of different code words (fingerprints) $u, v \in C - \{0\}$ we have that $u - v$ and $v - u$ are also in $C$, then $p_e \ge 1/3$.

Proof. Given two code words (fingerprints) $U = \{u, v\} \subset C - \{0\}$ we consider the coalitions $U_1 = \{u, v\}$, $U_2 = \{u, u - v\}$ and $U_3 = \{v, v - u\}$. Each coalition $U_i = \{x^{i1}, x^{i2}\}$ produces a false fingerprint $y^i$ as $y^i = x^{i1} \odot x^{i2}$. In what follows we show that the false fingerprint produced by one of these coalitions can frame a user not in the same coalition, and thus in this case $p_e \ge 1/3$. A necessary condition to avoid framing the user with fingerprint $x^{i1} - x^{i2}$, not a member of $U_i$, is:

$$d(x^{i1} - x^{i2}, y^i) > d(x^{ij}, y^i), \qquad (7)$$

for $i = 1, 2, 3$ and $j = 1$ or $j = 2$. But, in what follows, we will show that this is contradictory. First note that:

$$d(x^{i1} - x^{i2}, y^i) = d(x^{i1} - x^{i2}, x^{i1} \odot x^{i2}) = \left|\{ z : x^{i1}_z = x^{i2}_z \neq 0 \}\right|. \qquad (8)$$

Moreover, if $x^{jk} = x^{i1} - x^{i2} \in U_j$, $i \neq j$, and we define:

$$\{x^j\} = U_j - \{x^{jk}\}, \qquad (9)$$

then $\{x^j, -x^j\} \cap U_i \neq \emptyset$. Now consider $x^{i1} - x^{i2} = x^{jk} \in U_j$. We show that:

$$d(x^{jk}, y^j) \ge d(x^{i1} - x^{i2}, y^i). \qquad (10)$$

By equation (8) we only need to show that the coordinates $z$ such that $x^{i1}_z = x^{i2}_z \neq 0$ satisfy $x^{jk}_z \neq y^j_z$; but $x^{jk}_z = 0$ and, following the notation used in (9), we see that $y^j_z = \pm x^j_z$; since $\{x^j, -x^j\} \cap U_i \neq \emptyset$, we have $x^j_z \neq 0$. Finally, using equation (10), it is an easy combinatorial problem to show that the inequalities in equation (7) are impossible. Thus we can conclude that $p_e \ge 1/3$.
Corollary 2. Let $C$ be any linear binary code with $|C| > 2$; then $p_e \ge 1/3$.
Proof. In this case $Q = \mathbb{F}_2$, thus any coalition can deduce $(Q, +)$.
A more restrictive result for the case of linear codes can be stated.
Proposition 1. Let $C$ be a linear code over a finite field, with $|C| > 2$. Let $u, v \in C$ represent any arbitrary 2-coalition that produces the false fingerprint $y = u \odot v$. Then the probability $p$ of framing an innocent user, not in the coalition, satisfies $p \ge 1/9$.
Proof. From Theorem 3, for each pair $\{u, v\}$ that does not frame any user (good pair), we know how to construct a pair that frames other users (bad pair). Moreover, given two good pairs $\{u, v\}$ and $\{u', v'\}$, without loss of generality, relabelling the fingerprints (code words) if necessary, we can construct two bad pairs $U_1 = \{u, u - v\}$ and $U_2 = \{u', u' - v'\}$. It can happen that $U_1 = U_2$, but in this case $u' = u - v$ and $v' = -v$. Moreover, if the bad pair associated with another good pair $\{u'', v''\}$ is $U_3 = \{u'', u'' - v''\}$, and $U_1 = U_2 = U_3$, then $v'' = -v$ and $v'' = -v'$, so $v = v' = v'' = 0$ and $u = u' = u''$. Therefore, in the worst case, two good pairs generate the same bad pair, and thus the probability that an arbitrary coalition is a bad pair is $p \ge 1/9$.
Remember that for the binary case, $Q = \mathbb{F}_2$, the operation $y = x^1 \odot x^2$ consists in setting $y_i = x^1_i$ for $i \in Z(X)$, $x^1 \in X$, and $y_i = 1$ otherwise. Thus the strategy of the coalition in this case is to set the detectable positions of the fingerprint to the value 1.
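The binary version of the attack is easy to reproduce. In the sketch below (an illustration with an arbitrary toy generator matrix, not an example from the paper), the coalition {u, v} keeps the undetectable positions and sets every detectable position to 1; the resulting word is then exactly as close to the innocent codeword u + v as to either coalition member, so Hamming-distance identification may frame an innocent user.

```python
from itertools import product

def codewords(G):
    """All codewords of the binary linear code generated by the rows of G."""
    k, n = len(G), len(G[0])
    return sorted({tuple(sum(m[i] * G[i][j] for i in range(k)) % 2 for j in range(n))
                   for m in product([0, 1], repeat=k)})

def hamming(a, b):
    return sum(ai != bi for ai, bi in zip(a, b))

def forge(u, v):
    """Binary arithmetic attack: keep undetectable positions, set detectable ones to 1."""
    return tuple(ui if ui == vi else 1 for ui, vi in zip(u, v))

G = [[1, 0, 1, 1, 0, 1],          # arbitrary toy generator matrix
     [0, 1, 0, 1, 1, 1]]
C = codewords(G)
u, v = C[1], C[2]                 # a 2-coalition
y = forge(u, v)
w = tuple((ui + vi) % 2 for ui, vi in zip(u, v))   # the innocent codeword u + v (in C by linearity)
print(hamming(u, y), hamming(v, y), hamming(w, y))  # all equal: u + v may be framed
```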




Equidistant binary linear codes as fingerprinting codes
In this section we discuss some of the properties of equidistant binary linear codes related to the fingerprinting problem. Equidistant binary codes are (2,2)-separable, equidistant codes in which all code words, except the all-zero code word, have the same Hamming weight. We first show that, considering an equidistant code $C$, a 2-coalition $c$ cannot generate any false fingerprint that is closer (in the Hamming sense) to a codeword $w \in C - c$ than to the coalition's own code words, that is, $\min\{d(x, y) \mid x \in c\} \le d(w, y)$ $\forall w \in C - c$.
Proposition 2. Let $C = C[n, k, d]$ be an equidistant binary code. Let $c = \{u, v\} \subset C$ be a coalition, and let $y$ be a false fingerprint generated by $c$. Then we always have that:

$$d(w, y) \ge \frac{d}{2}, \quad \text{and} \quad \min\{d(x, y) \mid x \in c\} \le \frac{d}{2},$$

where $w$ is any codeword, $w \in C - c$.
From Proposition 2 it follows that the worst situation for the distributor is when:

$$d(w, y) = d(u, y) = d(v, y) = \frac{d}{2}, \qquad (11)$$

for some $w \in C - c$. Note that the (2,2)-separability of equidistant codes determines that there can only exist a single codeword $w$ with this property. Moreover, for this to happen, the false fingerprint $y_d$ must have exactly $d/2$ symbols from $u_d$ and $d/2$ symbols from $v_d$. The next proposition gives the necessary conditions for (11) to be satisfied.
Proposition 3. Let $C = C[n, k, d]$ be an equidistant binary code. Let $c = \{u, v\} \subset C$ be a coalition and let $y$ be a false fingerprint generated by the coalition $c$. Then:

$$d(w, y) = d(u, y) = d(v, y) = \frac{d}{2} \qquad (12)$$

only if:

$$d(u_d, y_d) = d(v_d, y_d) = \frac{d}{2},$$

and therefore the Hamming weight of the false fingerprint $y$, denoted by $w(y)$, satisfies $w(y) \bmod 2 = 0$. Moreover, if $w(y) \bmod 2 \neq 0$, then we can obtain the coalition $c$ using minimum distance decoding.
Proof. The results are immediate by considering $w(y)$ even and $w(y)$ odd respectively.
Assuming that (11) is satisfied, there is still a chance to unveil the coalition, as is shown in the following proposition.
Proposition 4. Let $C = C[n, k, d]$ be an equidistant binary code. Let $c = \{u, v\} \subset C$ be a coalition and let $y$ be a false fingerprint generated by the coalition $c$, such that:

$$d(w, y) = d(u, y) = d(v, y) = \frac{d}{2}. \qquad (13)$$

Then the probability $p_C$ of recovering the coalition satisfies:

$$p_C = \frac{2^k - 2}{2^d}.$$

Proof. If (11) is satisfied, then $d(w_d, y_d) = 0$, since by Proposition 2 we know that $d(w_{n-d}, y_{n-d}) = d/2$. Considering all the code words $x^i = (x^i_d, x^i_{n-d}) \in C$, we have $x^i_d \neq x^j_d$ for $i \neq j$, since otherwise $d(x^i, x^j) \le n - d < d$. Therefore, the space $C_d$ generated by $\{x^i_d \mid x^i \in C\}$ has dimension $k$ (it can be seen as a code subspace, where the code is a space of dimension $k$). Since in this case the false fingerprint $y$ satisfies $d(w^i_d, y_d) = 0$ for some codeword, this means that there are $k$ free positions in the false fingerprint, while the remaining $d - k$ positions are fixed. Now, assuming that all false fingerprints have the same probability, the probability $p_c$ that $y_d \in C_d$ is:

$$p_c = \left(\frac{1}{2}\right)^{d-k},$$

and therefore the probability $p_C$ that $d(w^i_d, y_d) = 0$ for some codeword $w^i \in C - c$ is:

$$p_C = \frac{2^k - 2}{2^k} \left(\frac{1}{2}\right)^{d-k} = \frac{2^k - 2}{2^d}. \qquad (14)$$

Using the results in (Bonisoli, 1984) we have that for equidistant codes $d \ge 2^{k-1}$. Note that in this case we exceed the correcting capacity of the code and we will have to use the Simplified Chase algorithm discussed in Section 7.

Construction of fingerprinting codes
Equidistant linear codes cannot be used directly as fingerprinting codes, as shown above, since the 2-coalition has enough information about the codeword symbols to construct, deterministically, a false fingerprint that satisfies $d(w^i_d, y_d) = 0$ for some codeword $w^i \in C - c$, taking $d/2$ symbols from each codeword. However, equidistant linear codes can be used in a concatenated scheme, so that the probability that the coalition is able to determine correctly $d/2$ symbols from each codeword can be made as small as we seek.
In order to create a family $\Gamma$ of 2-secure codes we consider an equidistant linear binary code $C$ and mappings $\psi_i : \mathbb{F}_2 \to \mathbb{F}_2^m$, with $\psi_i(0) = 1 + \psi_i(1)$, chosen uniformly at random among all possible ones, for $i = 1, \ldots, n$, where $m > 0$ is an integer. To generate the code $C_k$ of the family, we choose uniformly at random a permutation $\pi_k : \mathbb{F}_2^{mn} \to \mathbb{F}_2^{mn}$ and we define the code words through the mapping $\pi_k \circ (\psi_1, \ldots, \psi_n) : C \to C_k$. Note that with this construction, when a coalition generates a false fingerprint it does not have any information that helps it reconstruct the symbols (it has to do it at random), and therefore it can generate symbols that do not correspond to any of the symbols used in the encoding process; that is, the distributor can detect positions




that have been modified, and of course take advantage of this, as is shown in the following theorem.
Theorem 4. Let $C = C[n, k, d]$ be an equidistant binary linear code. Let $\Gamma$ be the family of codes created according to the above paragraph. Let $c = \{u, v\}$ be a coalition and let $y = (y_1, \ldots, y_{mn})$ be the false fingerprint generated by $c$. If any of the symbols of $y$ is not valid, that is, $y_i \notin \{\psi_i(0), \psi_i(1)\}$ for some $1 \le i \le n$, then the coalition can be identified.
Proof. Taking the invalid symbols $y_i$ and replacing them by valid symbols $y_i'$, we can always construct a new false fingerprint $y'$ in such a way that $w(y')$ is odd. Obviously $c$ can generate $y'$ and, since $w(y')$ is odd, by Proposition 2 it can be seen that:

$$\min\{d(x, y') \mid x \in \{u, v\}\} \le \frac{d}{2} < d(w, y') \quad \forall w \in C - c,$$

that is, the false fingerprint $y'$ is closer to some coalition codeword than to any other codeword. Moreover, the false fingerprint is also within the error correcting capability of the code.
Taking into account the previous result, it is clear that the only way of not being able to decode correctly is that the coalition $c$ constructs a false fingerprint with $d/2$ (correct) symbols from each codeword, and also that $d(w_d, y_d) = 0$ for some codeword $w \in C - c$ (Propositions 3 and 4). The next proposition evaluates the probability that, given that the coalition has taken $d/2$ symbols from each codeword, all the symbols of the false fingerprint are correctly formed.
Proposition 5. Let $C[n, k, d]$ be an equidistant binary linear code. Let $\Gamma$ be the family of codes created according to the previous paragraph. Let $c = \{u, v\}$ be a coalition and let $y$ be the false fingerprint generated by $c$, in such a way that $y$ contains $md/2$ bits from $u$ and $md/2$ bits from $v$. Then the probability $p_v$ that $y$ is a correct false fingerprint (all the symbols of the false fingerprint are correct) is:

$$p_v = \binom{d}{d/2} \Bigg/ \binom{md}{md/2}.$$

Proof. The coalition knows $md$ bits of each codeword. Using this knowledge, they want to construct a false fingerprint that contains $d/2$ correct symbols of each codeword. Since the coalition knows neither the symbol encoding nor the bit position of each symbol among the $md$ known bits, we can assume that the choice of the $md/2$ bits has to be done in a totally random way. In this case the number of possible choices is $\binom{md}{md/2}$. On the other hand, the correct choices (those in which the symbols are correctly reconstructed) are precisely the ones in which the chosen bits correspond exactly to $d/2$ symbols among the available $d$ symbols; in other words, there are $\binom{d}{d/2}$ possible ways of choosing them correctly.
Using the previous proposition and the fact that the minimum distance of the code increases with the same order of magnitude as the code length (Bonisoli, 1984), the relationship between the probability $p_v$ of not being able to identify any member of the coalition using minimum distance decoding and the code length satisfies:

$$p_v = \binom{d}{d/2} \Bigg/ \binom{md}{md/2} < \frac{2^d}{2^{3md/4}} = 2^{-d\left(\frac{3m}{4} - 1\right)}.$$
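A quick numerical check of this bound (the parameter values are arbitrary illustrations, not taken from the paper):

```python
from math import comb

def p_v(d, m):
    """Probability that a 2-coalition reconstructs d/2 valid symbols of each parent."""
    return comb(d, d // 2) / comb(m * d, m * d // 2)

def bound(d, m):
    """The upper bound 2^(-d(3m/4 - 1)) used in the text."""
    return 2 ** (-d * (3 * m / 4 - 1))

for d, m in [(8, 2), (8, 4), (16, 4)]:
    print(d, m, p_v(d, m), bound(d, m), p_v(d, m) < bound(d, m))
```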

Decoding
In the decoding process, there is only one situation in which we will not be able to find any member of the coalition, and this is when the false fingerprint satisfies $d(w, y) = d(u, y) = d(v, y) = d/2$ for some codeword $w \in C - c$. This case has probability $p_C p_v$ and we will be able to detect it, since decoding by minimum distance does not return any codeword (this fact will be used below in the proof of Theorem 6). A different situation is encountered when the false fingerprint $y$ satisfies $d(u, y) = d(v, y) = d/2$ and $d(w, y) > d/2$ for all $w \in C - c$. This situation is also detected, since again the decoding algorithm does not return any codeword. The probability of this case is $(1 - p_C)\, p_v$. Note, however, that applying the algorithm in Section 7 we are able to find the coalition. Finally, if the false fingerprint is not correct, and this happens with probability $1 - p_v$, we are always able to identify the coalition. Note that this means that we will never frame an innocent user, and also that, with probability as close to 1 as desired, we will be able to trace a coalition member.

Construction of a concatenated fingerprinting code
The idea of using code concatenation in fingerprinting schemes to construct asymptotically better codes was presented earlier (Boneh and Shaw, 1998). A concatenated code is the combination of an inner $(n_I, M_I, d_I)$ $q_I$-ary code ($q_I = 2$) with an outer $(n_O, M_O, d_O)$ code over $\mathbb{F}_{M_I}$. The combination consists in mapping the code words of the inner code to the elements of $\mathbb{F}_{M_I}$, which results in a $q_I$-ary code of length $n_I n_O$ with $M_O$ code words. Note that the size of the concatenated code is the same as the size of the outer code. Obviously, to construct a binary concatenated fingerprinting code the inner code must be a binary code. As will be seen below, the family of codes discussed in the above section suits our needs. The major drawback of these codes is that the number of code words grows linearly with the length of the code. To overcome this situation we use code concatenation and combine them with a 2-IPP Reed-Solomon code. So, to obtain an $(n_c, M_c)$ binary fingerprinting code $C$, where $n_c = n_I n_O$ and $M_c = M_O$, we use:
(1) as inner code, an $(n_I, M_I, d_I)$ family of codes $C_k$, generated using:
• an $[n_e, k_e, d_e]$ equidistant binary linear code $C_e$;
• mappings $\psi_i : \mathbb{F}_2 \to \mathbb{F}_2^m$, with $\psi_i(0) = 1 + \psi_i(1)$, chosen uniformly at random



among all possible ones, for $i = 1, \ldots, n_e$;
• a random permutation $\pi_k : \mathbb{F}_2^{mn_e} \to \mathbb{F}_2^{mn_e}$; and
• a mapping $\pi_k \circ (\psi_1, \ldots, \psi_{n_e}) : C_e \to C_k$.
(2) as outer code, an $[n, \lceil n/4 \rceil, n - \lceil n/4 \rceil + 1]$ 2-IPP Reed-Solomon code over $\mathbb{F}_{M_I}$; and
(3) together with a mapping $f : \mathbb{F}_{M_I} \to C_k$.

The code words of $C$ are obtained as follows: take a codeword $x = (x_1, \ldots, x_n)$ from the Reed-Solomon code and compute $y^i = f(x_i)$, $1 \le i \le n$. The concatenation of the $y^i$'s forms a codeword $y \in C$, where $y = (y^1, \ldots, y^n)$ with $y^i = f(x_i)$.
Remark. Note that the number of code words of the concatenated code $C$ is the same as the number of code words of the outer code, that is, $M_c = M_O$. In a practical application of a fingerprinting code this simply means that we can identify users either by their assigned fingerprint (codeword of the concatenated code) or by the associated codeword from the outer code.

Simplified Chase algorithm
In this section we present a new decoding algorithm that decodes a descendant of an equidistant linear binary code. Our algorithm is based on the Chase decoding algorithms presented above. As we said before, to decode a descendant means to find all possible pairs of parents of a given descendant. To accomplish this task for an equidistant linear binary code, we will need to "correct" up to $d/2$ errors, which is beyond the error correction bound of the code. To see this, suppose we have a code $C_e$ with parameters $[n, k, d]$; if $y$ is a descendant then, according to Proposition 2, there are three possibilities for the sets of pairs of parents:
(1) A star configuration. All pairs of parents have a common element, say $\{h_1\}$, where $\mathrm{dist}(h_1, y) \le d/2 - 1$.
(2) A "degenerated" star configuration. There is a single parent pair $\{h_1, h_2\}$. Now $\mathrm{dist}(h_1, y) = \mathrm{dist}(h_2, y) = d/2$.
(3) A triangle configuration. There are three possible pairs of parents: $\{h_1, h_2\}$, $\{h_1, h_3\}$, $\{h_2, h_3\}$. Note that $\mathrm{dist}(h_1, y) = \mathrm{dist}(h_2, y) = \mathrm{dist}(h_3, y) = d/2$.
Therefore, we need an algorithm that outputs all code words of $C_e$ within distance $d/2$ of $y$. Moreover, we suppose that we have a binary decoder that corrects up to $\lfloor (d-1)/2 \rfloor = d/2 - 1$ errors, so if $y$ lies in a sphere of radius $d/2 - 1$ surrounding any codeword, then it will be corrected by the binary decoder. On the other hand, if there are exactly $d/2$ errors in $y$, then the binary decoder fails to decode. But in this case, note that the word $y'$, obtained by applying a test pattern $p$ of weight 1 to $y$, can fall in a sphere of radius $d/2 - 1$ around a codeword. So, using the appropriate test pattern, we are allowed to correct $d/2$ errors. We define the set of matching positions between words $a$ and $b$, $M(a, b)$, as $M(a, b) := \{i : a_i = b_i\}$.

Suppose that the number of errors in a descendant word $y$ is $d/2$; then the algorithm will return two or three code words, depending on how $y$ was constructed. The idea of the algorithm is to efficiently find the right test pattern, using the already found candidate parents. To see how this is done, note that once a candidate parent is found, the support of the test pattern that helps to find another parent lies in the matching positions between the candidate and the received word.

Simplified Chase algorithm. The algorithm uses:
• A function called binary_decoder(v) that takes as its argument the descendant v and, if it exists, outputs the unique codeword within distance $d/2 - 1$ of $v$.
• A function called right_shift(p) that takes as its argument a test pattern $p$ of weight 1 and outputs a test pattern with its support shifted one position to the right with respect to $p$, i.e. right_shift((0,1,0,0)) = (0,0,1,0).
• A list that maintains all the already used test patterns that, when applied to the received word, failed to decode into a codeword.
• Take $u = (u_1, \ldots, u_n)$ and $v = (v_1, \ldots, v_n)$; then $u \oplus v$ denotes the bitwise exclusive or, $u \oplus v = (u_1 \oplus v_1, \ldots, u_n \oplus v_n)$.
Input: Descendant $v$ of 2 code words of an $[n, k, d]$ equidistant binary linear code.
Output: All code words within distance $d/2$ of $v$.
(1) Set $u_1 :=$ binary_decoder($v$). If $u_1 \neq \emptyset$ then output $u_1$ and quit.
(2) Initialization: $p := (1, 0, 0, \ldots, 0)$, list $:= \emptyset$.
(3) Set $v' := v \oplus p$ and run the binary decoder. Set $u_1 :=$ binary_decoder($v'$).
(4) If $u_1 \neq \emptyset$ then go to step 5. Else add $p$ to list, set $p :=$ right_shift($p$) and go to step 3.
(5) Construct a new test pattern $p$ of weight 1 that:
• is different from all the patterns in list; and
• has its support in one of the matching positions between $v$ and $u_1$.
(6) Set $v' := v \oplus p$ and run the binary decoder, $u_2 :=$ binary_decoder($v'$).
(7) If $u_2 \neq \emptyset$ then go to step 8. Else add $p$ to list and go to step 5.
(8) Construct a new test pattern $p$ of weight 1 that:
• is different from all the patterns in list; and
• has its support in one of the matching positions between $v$, $u_1$ and $u_2$.
If there are no more test patterns available, output code words $u_1$, $u_2$ and quit.
(9) Set $v' := v \oplus p$ and run the binary decoder, $u_3 :=$ binary_decoder($v'$).
(10) If $u_3 \neq \emptyset$ then go to step 11. Else add $p$ to list and go to step 8.
(11) Output code words $u_1$, $u_2$ and $u_3$ and quit.
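The listing above translates almost directly into code. The following sketch is only an illustration, not the authors' implementation: binary_decoder stands for any bounded-distance decoder of the inner code that corrects up to d/2 − 1 errors and returns None on failure, and weight-1 test patterns are generated directly from the allowed positions instead of by repeated right-shifting.

```python
def simplified_chase(y, binary_decoder):
    """Return the (up to three) codewords within distance d/2 of the descendant y.
    `binary_decoder` is assumed to correct up to d/2 - 1 errors and return None on failure."""
    n = len(y)
    c = binary_decoder(y)
    if c is not None:                 # star configuration: one parent common to all pairs
        return [c]
    found, tried = [], set()
    while len(found) < 3:
        # a useful test pattern has weight 1 and its support on a position where
        # y agrees with every parent found so far
        positions = [i for i in range(n) if i not in tried and
                     all(p[i] == y[i] for p in found)]
        if not positions:
            break
        i = positions[0]
        tried.add(i)
        pattern = tuple(1 if j == i else 0 for j in range(n))
        c = binary_decoder(tuple(a ^ b for a, b in zip(y, pattern)))
        if c is not None and c not in found:
            found.append(c)
    return found
```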




Decoding of the fingerprinting concatenated code
In this section we present a tracing algorithm that efficiently decodes the concatenated fingerprinting codes constructed above. The decoding method we propose uses the previously discussed Simplified Chase Algorithm and the Koetter-Vardy Algorithm. For a Reed-Solomon code with 2-IPP, we call any codeword that is involved in the construction of a given descendant in an unambiguous way a positive parent. The goal of a tracing algorithm is to output a list that contains all positive parents of a given descendant. However, note that finding both parents is a strong condition since if, for example, one of the parents contributes with only $k - 1$ positions, where $k$ is the dimension of the code, then there are multiple pairs of parents. The next theorem provides a useful condition for a codeword to be a positive parent.
Theorem 5. Let $C$ be an $[n, k, d]$ Reed-Solomon code with 2-IPP. If a codeword agrees in more than $2(n - d)$ positions with a given descendant, then this codeword is a positive parent of the descendant.
Proof. If the code has minimum distance $d$, then two code words can agree in at most $n - d$ positions; therefore a parent pair is only able to produce a descendant that agrees in at most $2(n - d)$ positions with any other codeword. Then any codeword that agrees with a descendant in at least $2(n - d) + 1$ positions is a parent.

Overview of the algorithm
The decoding is done in two stages. First, we decode the inner code to obtain an $n$-tuple of sets of code words. Then, with this $n$-tuple of sets, we construct a reliability matrix that is used to decode the outer code. Suppose we want to decode the following fingerprint: $z = (z^1, \ldots, z^n)$. The inner decoding consists in:
• undoing the mappings $\psi_i : \mathbb{F}_2 \to \mathbb{F}_2^m$, with $\psi_i(0) = 1 + \psi_i(1)$, for each $1 \le i \le n$, to obtain a tuple $y = (y^1, \ldots, y^n)$; and
• decoding each subword $y^i$ using the Simplified Chase Algorithm. The output, as seen above, will be a single codeword $\{h_1\}$, a pair of code words $\{h_1, h_2\}$ or three code words $\{h_1, h_2, h_3\}$.
Then, for $i = 1, \ldots, n$ we use the mapping $f(s_t) = h_t$ to obtain the set $S_i^{(j)} = \{s_{i1}, \ldots, s_{ij}\}$, where the superscript $j \in \{1, 2, 3\}$ indicates the cardinality of the set. Note that the elements of the $S_i^{(j)}$'s are symbols from $\mathbb{F}_{M_I}$. We denote by $S^{(1)}$ the set of the $S_i^{(1)}$'s, by $S^{(2)}$ the set of the $S_i^{(2)}$'s and by $S^{(3)}$ the set of the $S_i^{(3)}$'s. We also define the $n$-tuple of sets $S = \left(S_1^{(j)}, \ldots, S_n^{(j)}\right)$, which is used to construct a reliability matrix. With this matrix we run the Koetter-Vardy algorithm, obtaining a list $U$ of potential parents.

Decoding algorithm for the fingerprinting concatenated code
The algorithm takes as its input a descendant $z = (z^1, \ldots, z^n)$ of the concatenated code, and outputs the positive parents of this descendant in the form of code words of the outer code. Note that, due to the remark above, this does not imply any ambiguity.
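Before the numbered steps, it may help to see how the sets produced by the inner decodings are turned into the reliability matrix of step (3) below. The sketch is illustrative only: alphabet symbols are represented as plain integers and the outputs of the Simplified Chase decodings are given directly as sets.

```python
import numpy as np

def reliability_matrix(S, q):
    """Build the q x n matrix R of step (3): column p gives weight 1/j to each of the
    j candidate symbols returned by the inner decoder for position p, and 0 elsewhere."""
    n = len(S)
    R = np.zeros((q, n))
    for p, symbols in enumerate(S):
        for s in symbols:
            R[s, p] = 1.0 / len(symbols)
    return R

# Example: n = 4 outer positions over an alphabet of size q = 8 (symbols 0..7);
# the inner decodings returned one, two or three candidate symbols per position.
S = [{3}, {1, 5}, {0, 2, 7}, {5}]
print(reliability_matrix(S, 8))
```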

(1) For $i = 1, \ldots, n$:
• Undo the mapping $\psi_i : \mathbb{F}_2 \to \mathbb{F}_2^m$ for the $mn_e$-tuple $z^i$, and obtain the $n_e$-tuple $y^i$.
• Decode the inner word $y^i$ using the Simplified Chase Algorithm to obtain a list of at most 3 code words $\{h_1, \ldots, h_j\}$, $j \in \{1, 2, 3\}$.
• Define $S_i^{(j)} = \{s_{i1}, \ldots, s_{ij}\}$, where $f(s_t) = h_t$, $1 \le t \le j$ and $j \in \{1, 2, 3\}$ depending on the output of the previous step.
(2) Initialize $S = \{S_1^{(j)}, \ldots, S_n^{(j)}\}$.
• Define the subsets $S^{(1)} = \{S_p^{(j)} \in S : j = 1\}$, $S^{(2)} = \{S_p^{(j)} \in S : j = 2\}$ and $S^{(3)} = \{S_p^{(j)} \in S : j = 3\}$.
(3) Construct the reliability matrix $R$ as follows:

$$r_{i,p} = \begin{cases} \dfrac{1}{j} & \text{if } s_{pt} = a_i,\ s_{pt} \in S_p^{(j)},\ 1 \le t \le j \\[4pt] 0 & \text{otherwise.} \end{cases}$$

(4) Run the Koetter-Vardy algorithm using $R$, obtaining a list of code words $U$.
(5) Use the following decision rule:
(6) If $|S^{(1)}| + |S^{(2)}| > 2(n - d)$ then compute:

$$U_{1,2} = \left\{ u \in U : \left|\{ p : u_p \in S_p^{(1)} \lor u_p \in S_p^{(2)} \}\right| > (n - d) \right\}.$$

Note that $|U_{1,2}| \le 2$. If $|U_{1,2}| = 0$ then go to Step 7, else output $U_{1,2}$ and quit.
(7) Find a list $U_3$ of all code words $u^l \in U$ such that $u_p^l \in S_p^{(3)}$, $\forall S_p^{(3)} \in S$.
• For each $S_p^{(2)} \in S$ do: if there is an $S_p^{(2)} = \{s_{p1}, s_{p2}\}$ for which there are exactly 2 code words $u^1, u^2 \in U_3$ such that $u_p^1 = s_{p1}$ and $u_p^2 = s_{p2}$, then output $u^1$ and $u^2$ and quit.
• For each $S_p^{(1)} \in S$ do: if there is an $S_p^{(1)} = \{s_{p1}\}$ for which there is exactly 1 codeword $u^1 \in U_3$ such that $u_p^1 = s_{p1}$, then output $u^1$ and quit.
(8) Decoding fails.

Analysis of the algorithm
Note that if the algorithm succeeds it outputs at least one of the parents. Also note that the algorithm only outputs positive parents, so if it is used in fingerprinting schemes it never accuses innocent users. Next, we show that the probability that the algorithm above fails can be made arbitrarily small. To see this, we show that the probability that the tracing algorithm fails decreases exponentially with the length of the code.
Theorem 6. Given the above discussed concatenated fingerprinting code and a descendant word $y = (y^1, \ldots, y^n)$ created by a coalition of size at most two, the probability that the tracing algorithm fails satisfies:

$$p_e < 2^{-\left[\left(\frac{3}{4} m\, 2^{k_e - 1} - k_e\right)\left(n - 2(n - d)\right) - k_e\right]},$$




where $d = n - \lceil n/4 \rceil + 1$ is the minimum distance of the outer code and $k$ its dimension, and $k_e$ is the dimension of the inner code.
Proof. Note that from the previous discussion the fingerprinting code is the combination of an inner $[n_e, k_e, d_e]$ equidistant binary linear code and an outer 2-IPP $[n, \lceil n/4 \rceil, n - \lceil n/4 \rceil + 1]$ Reed-Solomon code. From the decoding algorithm it can be seen that decoding can only fail if $|S^{(3)}| \ge n - 2(n - d)$. But we have seen that the probability that this happens is:

$$p_e \le \sum_{i = n - 2(n-d)}^{n} (p_v\, p_C)^i,$$

where $p_v < 2^{-d_e\left(\frac{3}{4}m - 1\right)}$ and $p_C < \dfrac{2^{k_e} - 2}{2^{d_e}}$, and $n$ and $d$ are, respectively, the length and minimum distance of the outer Reed-Solomon code. Clearly the above sum can be upper bounded as:

$$p_e \le 2(n - d)\, (p_v\, p_C)^{\,n - 2(n-d)}.$$

Now note that:

$$p_v\, p_C < \frac{2^{k_e} - 2}{2^{d_e}}\; 2^{-d_e\left(\frac{3}{4}m - 1\right)} = \left(2^{k_e} - 2\right) 2^{-\frac{3}{4} m\, d_e} \le \left(2^{k_e} - 2\right) 2^{-\frac{3}{4} m\, 2^{k_e - 1}},$$

where we have used the fact from (Bonisoli, 1984) that for equidistant codes $d_e \ge 2^{k_e - 1}$. On the other hand, since the outer code is an $[n, \lceil n/4 \rceil, n - \lceil n/4 \rceil + 1]$ 2-IPP Reed-Solomon code, we have:

$$2(n - d) = \lceil n/2 \rceil - 2 = 2^{k_e - 1} - 2,$$

because for Reed-Solomon codes the length of the code is equal to the size of the alphabet of the code (MacWilliams and Sloane, 1997), and in the concatenated construction the size of the alphabet of the outer code is precisely the number of code words in the inner code, which we have denoted by $2^{k_e}$. Then:

$$p_e < \left(2^{k_e - 1} - 2\right) \left(\left(2^{k_e} - 2\right) 2^{-\frac{3}{4} m\, 2^{k_e - 1}}\right)^{n - 2(n-d)},$$

from which it follows that:

$$p_e < 2^{k_e} \left(2^{k_e}\, 2^{-\frac{3}{4} m\, 2^{k_e - 1}}\right)^{n - 2(n-d)},$$

so:

$$p_e < 2^{-\left[\left(\frac{3}{4} m\, 2^{k_e - 1} - k_e\right)\left(n - 2(n-d)\right) - k_e\right]},$$

and the theorem follows.

Conclusions
We have presented an attack on fingerprinting codes that shows that linear codes are not good fingerprinting codes. Taking this attack into account, we have discussed the construction of a new family of fingerprinting codes based on equidistant binary linear codes. Moreover, in order to obtain codes with a better rate, we have also presented a fingerprinting code together with an efficient tracing algorithm. The tracing algorithm never accuses an innocent user and the probability that tracing fails can be made arbitrarily small. The identification process consists in the decoding of a concatenated code, where both the inner and the outer code are decoded beyond the error correction bound. For the decoding of the inner code, we present a modification of the Chase algorithms that, taking advantage of the structure of the descendant, allows us to efficiently search for all code words within distance d/2 of the descendant. The outer code is decoded with the Koetter-Vardy soft-decision decoding algorithm. Using soft-decision decoding allows us to improve the tracing capabilities of the algorithms in (Barg et al., 2003), which use hard-decision decoding, where only one of the colluders can be traced with probability 1. Our approach allows us, in many cases, to find both colluders with probability 1.

References
Barg, A., Blakley, G.R. and Kabatiansky, G. (2003), "Digital fingerprinting codes: problem statements, constructions, identification of traitors", IEEE Transactions on Information Theory, Vol. 49 No. 4, pp. 852-65.
Boneh, D. and Shaw, J. (1998), "Collusion-secure fingerprinting for digital data", IEEE Transactions on Information Theory, Vol. 44 No. 5, pp. 1897-905.
Bonisoli, A. (1984), "Every equidistant linear code is a sequence of dual Hamming codes", Ars Combinatoria, Vol. 18, pp. 181-96.
Chase, D. (1972), "A class of algorithms for decoding block codes with channel measurement information", IEEE Transactions on Information Theory, Vol. 18, pp. 170-82.
Cohen, G., Encheva, S. and Schaathun, H.G. (2001), On Separating Codes, technical report, ENST, Paris.
Guruswami, V. and Sudan, M. (1999), "Improved decoding of Reed-Solomon and algebraic-geometry codes", IEEE Transactions on Information Theory, Vol. 45 No. 6, pp. 1757-67.
Hollmann, H.D.L., van Lint, J.H., Linnartz, J.-P. and Tolhuizen, L.M.G.M. (1998), "On codes with the identifiable parent property", Journal of Combinatorial Theory, Vol. 82 No. 2, pp. 121-33.
Koetter, R. and Vardy, A. (2000), "Algebraic soft-decision decoding of Reed-Solomon codes", IEEE Transactions on Information Theory, Vol. 49 No. 11, pp. 2809-25.
MacWilliams, F.J. and Sloane, N.J.A. (1997), The Theory of Error-Correcting Codes, North Holland, New York, NY.

Corresponding author
Marcel Fernandez can be contacted at: [email protected]




Verification algorithms for governed use of multimedia content


Eva Rodríguez and Jaime Delgado

Article received 25 September 2006 Reviewed by EuDiRights Workshop Committee Approved for publication 10 October 2006

Distributed Multimedia Applications Group, Departament d’Arquitectura de Computadors, Universitat Polite`cnica de Catalunya, Barcelona, Spain Abstract Purpose – The purpose of this paper is to present different verification algorithms that will be used by digital rights management (DRM) systems to enable the governed distribution, super-distribution and offers of multimedia content. An issue of increased interest in DRM systems is the control of the creation, distribution and consumption of multimedia content through the complete digital value chain. Design/methodology/approach – The design and implementation of verification algorithms based on licences is described. Tools implementing these algorithms are used by DRM systems in B2B and B2C models where the distribution, offer and consumption of digital assets are controlled. Some use cases regarding the distribution, super-distribution and offer models are presented. Findings – It has been demonstrated that DRM systems governing the use of multimedia content through the complete distribution chain can use the verification algorithms proposed in this paper to enable governed distribution of multimedia content. By using these algorithms, they can determine whether the rights have been passed in a proper way from parent to child licences. Moreover, these systems can also enforce the rights when distributing multimedia content. Originality/value – The algorithms proposed can be used by DRM systems that control the use of multimedia content through the complete digital value chain. These algorithms have been designed to ensure that the permissions and constraints passed from parent to child licences have been done according to the terms determined by content creators or distributors. Keywords Intellectual property, Data security, Algorithmic languages, Multimedia, Standards Paper type Technical paper

Introduction A key issue in the distribution and use of multimedia content is to ensure that the terms and conditions stated by content owners are respected by the other actors of the value chain, such as content aggregators or distributors when using or distributing digital instances of these assets. It is also important to specify verification mechanisms that will be used by DRM systems in order to ensure that governed content is used and distributed through the complete digital value chain according to the rights stated by its creator. These verification algorithms will determine whether distribution or Online Information Review Vol. 31 No. 1, 2007 pp. 38-58 q Emerald Group Publishing Limited 1468-4527 DOI 10.1108/14684520710731029

This work has been partly supported by the Spanish administration (DRM-MM project, TSI2005-05277) and AXMEDIS (Automated Production of Cross Media Content for Multi-Channel Distribution, http://www.axmedis.org), a European Integrated Project, funded under the European Commission IST FP6 Program.

consumption licences are valid according to the terms and conditions specified by the producer or distributor respectively. The MPEG-21 standard (Bormans and Hill, 2002) specifies mechanisms to enable controlled distribution of multimedia content through the complete digital value chain. Specifically, Parts 5 (ISO/IEC, 2004) and 6 of this standard, MPEG-21 REL and RDD respectively, specify mechanisms to create rights expressions that govern the distribution and consumption of multimedia content. This paper presents three verification algorithms that specify how to determine whether licences that govern digital objects have been created according to the rights expressions stated by their creators when they are distributed, super-distributed or offered. Then we present three different scenarios (one for each verification algorithm specified) in which a DRM system makes use of the appropriate verification algorithm specified in this paper to verify that the digital content is distributed according to the terms and conditions specified by its producer. In this area we can find other contributions (Halpern and Weissman, 2004) related to the formalisation of licence based authorisation algorithms. Nevertheless, this contribution examines only a fragment of XrML (Wang et al., 2002), and the authors do not address the enforcement of rights in distribution scenarios. Digital rights management The role of Digital Rights Management (DRM) systems in the use of digital content is to enable business models where the creation, distribution and consumption content are controlled. As such, DRM permits the governance of multimedia content throughout the complete digital value chain. For example, when a distributor buys content, he agrees to certain permissions and constraints, for example, the free distribution of a low quality version of a track, or the distribution of the complete music album to the members of a music club for a special fee. Rights Expression Languages allow this choice to be translated into permissions and constraints, which are then enforced when the content is distributed or rendered. Therefore, it is important to enforce the rights through the complete digital value chain. The different elements that form a DRM system are rights expression creators, protection mechanisms and tools; and services and tools enabling the packaging, distribution and consumption of governed and protected digital content. Rights expression languages The different parties involved in the distribution and consumption of digital content need to exchange information about the rights expressions that govern each digital asset through the multimedia distribution chain, from creation to consumption. Rights expression languages (RELs) are languages devised to express terms of use of digital content. They have been proposed to describe licences governing the terms and conditions of content access. The most relevant rights expression languages are MPEG-21 REL based on the eXtensible rights Markup Language (XrML) (Wang et al., 2002) proposed by ContentGuard, Inc. and the Open Digital Rights Language (ODRL) (Iannella and Guth, 2006) proposed by Renato Ianella from IPR Systems. XrML and ODRL are based syntactically on XML, while structurally they both conform to the axiomatic principles of rights modelling first laid down by, among others, Dr Mark Stefik of Xerox PARC, the designer of the Digital Property Rights Language (DPRL).




In an end-to-end system other considerations such as authenticity and integrity of rights expression become important. For example a content provider or a distributor who issues rights to use or distribute assets must be identified and authorised. In addition rights expressions may be accessed by different participants, which requires mechanisms and semantics for validating their authenticity and integrity.


MPEG-21 MPEG’s approach is to define a Multimedia Framework to ensure that the systems that deliver multimedia content are inter-operable and that the transactions among them are automated. The MPEG-21 multimedia framework (Bormans and Hill, 2002) has two essential concepts: the Digital Item (DI), a fundamental unit of distribution and transaction; and the users that interact with them. Different parts of the MPEG-21 standard normatively specify different pieces and formats needed by a complete DRM system. These parts are: MPEG-21 Digital Item Declaration (DID, Part 2) that specifies the model for a DI. MPEG-21 Rights Expression Language (REL, Part 5) (ISO/IEC, 2004) which defines it as a machine-readable language to declare rights and permissions using the terms as defined in the Rights Data Dictionary. MPEG-21 Rights Data Dictionary (RDD, Part 6) comprises a set of clear, consistent, structured, integrated and uniquely identified terms. The structure of the RDD is designed to provide a set of well-defined terms for use in rights expressions. MPEG-21 Intellectual Property Management and Protection Components (IPMP, Part 4) deals with the standardisation of a general solution for the management and protection of Intellectual Property. MPEG-21 Event Reporting (ER, Part 15) provides a standardised means for sharing information about events among peers and users. MPEG-21 REL Part 5 of the MPEG-21 standard specifies the syntax and semantics of a Rights Expression Language for issuing rights for users to act on Digital Items and their components. MPEG chose XrML as the basis for the development of the MPEG-21 Rights Expression Language (REL) (ISO/IEC, 2004). The most important concept in REL is the licence that is conceptually a container of grants, each one of which conveys to a principal the sanction to exercise a right against a resource. Figure 1 shows the structure of an MPEG-21 REL Licence that is formed by the elements title, inventory, grant or grantGroup, issuer and otherInfo. The title element provides a descriptive phrase about the licence. The grant or grantGroup elements are the most important factors in an MPEG-21 REL license. They convey to a particular principal the sanction to exercise some identified right against some identified resource, possibly subject to the need for some condition to be fulfilled first. The inventory element is used for listing XML expressions that will be used in the rest of the licence. The issuer element may contain two pieces of information regarding the identification of the issuer, usually coupled with a digital signature for the licence and a set of issuer-specific details about the circumstances under which the licence was issued. Finally, the otherInfo element provides an extensibility hook within which licence issuers may place relevant additional information. A grant is formed by four elements: a principal that represents the unique identification of the entity involved in the granting of rights; a right that specifies the action that the principal may perform over a resource; a resource that represents the



Figure 1. MPEG-21 REL license structure

object against which the principal of a grant has the right to perform; and the condition element represents the restrictions and obligations that a principal must satisfy before it may take advantage of an authorisation conveyed to it in this grant. MPEG-21 REL distribution mechanisms MPEG-21 REL provides mechanisms to enable distribution of multimedia content in a governed way. This section presents the distribution, super-distribution and offer models specified in MPEG-21 REL. Distribution. MPEG-21 provides mechanisms to enable distribution of multimedia content in a governed way. MPEG-21 REL has defined mechanisms to specify distribution licences by defining the issue right. Content owners and distributors can grant other parties of the digital value chain permission to distribute instances of their works or manifestations. Entities that use this right, for example, distributors, can issue the associated grant or grantGroup. These grants or grantGroups determine the digital resources and the rights that can be distributed. These grants may also determine the principals to which digital resources can be distributed and the conditions that must be fulfilled in order to exercise granted rights. If a licence grants the right to issue a resource, then the resource within this licence shall be a grant or grantGroup. Then the licence conveys the authorisation for the principal to issue this grant or grantGroup. At the instant that a licence is issued, the issue right must be held by the issuer of the licence with respect to all the grants and grantGroups directly authorised therein. The use of the issue right is one of the basic mechanisms by which governed multimedia content can be distributed throughout the multimedia delivery chain in a controlled way avoiding unfair use of copyrighted content. Infinite redistribution. MPEG-21 REL has defined the delegation control feature in order to enable infinite redistribution of governed multimedia content. This



mechanism is used in distribution scenarios to control the delegation of permissions and constraints over governed multimedia content. With this mechanism, the rights holder can specify that the licensee can delegate the permissions that a grant or grantGroup conveys. A licence that enables infinite redistribution shall have a grant that contains a delegationControl element, which enables the principal to whom that grant is issued to delegate it. Moreover, the licence issuer can impose constraints on delegation, for example controls on adding or changing conditions during delegation, the allowable depth of the delegation chain and/or the principal to whom the grant or grantGroup may be delegated. These constraints depend on the child element of the delegation control:
• The conditionIncremental element allows conditions to be added during delegation.
• The conditionUnchanged element specifically prevents the adding of conditions during delegation. If a grant contains a conditionUnchanged in a delegationControl when it is delegated, it cannot contain more conditions than those stated initially.
• The depthConstraint element allows for specifying the number of times that a grant can be delegated.
• The toConstraint element allows for specifying the principals to whom the grant may be delegated.
When a principal delegates a grant, the delegated licence must contain a delegationControl compatible with the delegationControl element in the original licence, and it must be at least as restrictive as the original one.
Offers. MPEG-21 REL specifies offers, which can be conceptualised as advertisements. This kind of licence allows users to obtain other licences from distributors according to the terms and conditions specified in the offer licence. Offer licences can be used by negotiation mechanisms that will resolve the permissions and constraints of the consumer licence. A simple example could be an offer licence granting any consumer the chance to obtain a licence that allows them to play a music album during the whole year. Then, when a consumer exercises that right, the distributor issues him a licence containing the offered grant that will permit the consumer to play the album during the year.

MPEG-21 REL authorisation algorithm
Another important concept of MPEG-21 REL is the authorisation model. It is used by any software implementation that makes an authorisation decision using MPEG-21 REL licences. The central question that lies in this decision-making process is: "Is a principal authorised to exercise a right against a resource?" The MPEG-21 REL Authorisation Model, sketched in Figure 2, makes use of an authorisation request, an authorisation context, an authorisation story and an authoriser. The authorisation request can be conceptualised as representing the question of whether it is permitted for a given principal to perform a given right upon a given resource during a given time interval, based on a given authorisation context, a given set of licences and a given trust root. On the other hand, the authorisation story contains the following elements: a primitive grant which is used to demonstrate to

Figure 2. MPEG-21 REL authorisation model

which authorisation requests the authorisation story applies; a grant or a grantGroup representing the actual grant or grant group that is authorised by the authoriser of the authorisation story; and an authoriser.
Implementation of the MPEG-21 REL authorisation algorithm
The MPEG-21 standard specifies the data format and data types for the authorisation model elements (presented in the section above). We have participated in the definition of these elements, as presented in Wang et al. (2003) and Rodríguez et al. (2004). Moreover, our research team has developed tools for implementing effective digital rights management (DRM) systems based on the MPEG-21 standard. Most of these tools currently form part of the MPEG-21 Reference Software (ISO/IEC, 2006). In Rodríguez et al. (2004) and Delgado et al. (2004) we proposed an implementation of the MPEG-21 REL Authorisation Model, called the DMAG REL Interpretation Conformance reference software. The software developed makes an authorisation decision based on the MPEG-21 REL Authorisation Algorithm. The inputs are an authorisation request and an authorisation story file, while the output is an XML file specifying whether the authorisation story is an authorisation proof for the authorisation request. This software module was contributed to the MPEG-21 standard and currently forms part of the MPEG-21 Reference Software (ISO/IEC, 2006). Related work also includes the work done in Torres et al. (2004), where we presented a general architecture of a system capable of distributing and processing multimedia information structured as defined in the MPEG-21 standard. The architecture consists of several modules that make use of different protection, governance and processing tools that were initially implemented according to MPEG-21 requirements. In order to avoid a lack of interoperability, we extended this work in Torres et al. (2005) to be flexible enough to support different standards such as MPEG-21 and OMA DRM.
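To make the shape of this decision procedure concrete, the following Python sketch mirrors the interface just described: an authorisation request is checked against an authorisation story, and the result states whether the story is a proof for the request. It is a deliberately simplified illustration rather than the DMAG reference software; the class and function names are ours, and the authorisation context and time interval are omitted.

    from dataclasses import dataclass

    @dataclass
    class AuthorisationRequest:
        principal: str        # who wants to act
        right: str            # e.g. "play"
        resource: str         # e.g. a music album

    @dataclass
    class AuthorisationStory:
        primitive_grant: tuple    # (principal, right, resource) the story claims to cover
        authorised_grant: tuple   # the grant actually authorised by the authoriser
        authoriser: str           # issuer backing the story

    def is_authorisation_proof(request, story, trusted_issuers):
        """True if the authorisation story is a proof for the authorisation request."""
        wanted = (request.principal, request.right, request.resource)
        return (story.primitive_grant == wanted
                and story.authorised_grant == wanted
                and story.authoriser in trusted_issuers)

    request = AuthorisationRequest("some_user", "play", "some_album")
    story = AuthorisationStory(("some_user", "play", "some_album"),
                               ("some_user", "play", "some_album"),
                               "some_distributor")
    print(is_authorisation_proof(request, story, {"some_distributor"}))   # True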

Governed distribution of multimedia content
This section presents the three verification algorithms that we have specified. These verification algorithms can be used by DRM systems to control the governed distribution of multimedia content. We have specified a verification algorithm for each of the distribution models specified in MPEG-21 REL: the distribution verification algorithm, the super-distribution verification algorithm and the offers verification algorithm. For each of them, we also present a use case to illustrate how they are used by DRM systems in different scenarios.
Formalisation of the distribution verification algorithm
This section specifies a verification algorithm that ensures that governed digital content is distributed according to the terms and conditions stated by the content creator. We propose a verification algorithm for the distribution of governed multimedia content that determines whether an actor of the multimedia delivery chain can distribute multimedia content and issue the appropriate licence(s) to other actors of the value chain (see Figure 3).

Figure 3. Licence distribution verification algorithm

The distribution verification algorithm that we have specified resolves whether a licence conforms to its parent licence. The licence conforms if all the following are fulfilled:
(1) The Issuer element (s′) of the Child Licence is equal to the Principal element (p) of the Parent Licence.
(2) The Right element (I) of the Parent Licence is the issue right (specified in the MPEG-21 REL core).
(3) If the Resource element (g′) of the Parent Licence is a GrantGroup, then each of the grant elements (ai) that form the GrantGroup, together with the corresponding grant of the Child Licence, shall fulfil the following:
. The Principal element (pi′) within the Grant (ai) of the Parent Licence surpasses the Principal element (Pi) of the Child Licence; that is, for each of the elements of Pi there exists an element in pi′ that is equal to it. Or the Principal element (pi′) within the Grant (ai) of the Parent Licence is absent.
. The Right element (ri′) within the Grant (ai) of the Parent Licence is equal to the Right element (hi) of the Child Licence.
. The Resource element (ti′) within the Grant (ai) of the Parent Licence is equal to the Resource element (Ti) of the Child Licence.
. The Conditions element (xi′) within the Grant (ai) of the Parent Licence is equal to the Conditions element (Xi) of the Child Licence.
(4) The Conditions (x) of the Parent Licence are satisfied.
(5) The Time of Issuing (T′) of the Child Licence is later than the Time of Issuing (T) of the Parent Licence, and the Time of Issuing of the Child Licence is within the interval of the verification process.
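The five conditions above translate almost directly into code. The following Python fragment is a minimal, illustrative rendering of the distribution check over simplified licence structures (plain dictionaries rather than MPEG-21 REL XML); the field names and the assumption that parent and child grants are listed in matching order are ours, not part of the standard.

    def surpasses(parent_principals, child_principals):
        # A parent principal set surpasses the child's if it is absent (no restriction)
        # or if every child principal also appears in the parent set.
        return parent_principals is None or set(child_principals) <= set(parent_principals)

    def verify_distribution(parent, child, conditions_satisfied, now):
        """Check conditions (1)-(5): does the child licence conform to its parent?"""
        if child["issuer"] != parent["principal"]:                      # (1)
            return False
        if parent["right"] != "issue":                                  # (2)
            return False
        # (3): pair the parent's grants with the child's grants (same order assumed).
        for p_grant, c_grant in zip(parent["resource"], child["grants"]):
            if not surpasses(p_grant["principals"], c_grant["principals"]):
                return False
            if p_grant["right"] != c_grant["right"]:
                return False
            if p_grant["resource"] != c_grant["resource"]:
                return False
            if p_grant["conditions"] != c_grant["conditions"]:
                return False
        if not conditions_satisfied(parent["conditions"]):              # (4)
            return False
        return parent["time_of_issue"] < child["time_of_issue"] <= now  # (5)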

Application of the licence distribution verification algorithm to a multimedia content distribution use case This section shows how the distribution verification algorithm is applied in a music distribution scenario, sketched in Figure 4. The entities involved in this music

Figure 4. Music distribution scenario

distribution scenario are gMusic RC Labs (a music producer), AFC DST Music (a music distributor) and the AFC Music Club members. The music producer, gMusic RC Labs, grants to AFC DST Music permission to distribute to AFC Music Club members the last album that the Labs have produced. In this use case gMusic RC Labs issued a licence to AFC DST Music granting the right to distribute the last album they have produced, Waiting For the Sirens, to AFC Music Club members until July 2006. The licences that AFC DST Music can distribute will allow AFC Music Club members to play the album 10 times during this year. The distribution licence is then as follows: AFC DST Music is the entity to which the right to issue licences is granted according to the terms and conditions specified in this licence, so AFC DST Music is the principal (p) of the distribution licence. The right (I) of the licence is the issue right. The resource of the licence is a grant (a) with the permissions and constraints that can be distributed. This grant is formed by the principal (p′) to which licences would be issued, AFC Music Club members; the right (r′) to play that they can exercise over the resource (t′), Waiting For the Sirens, a music album; and the conditions (x′) that they must fulfil in order to play the album. In this case the condition the AFC Music Club members must fulfil is that they may only play the album 10 times during this year. On the other hand, the conditions that the music distributor shall fulfil when distributing licences are specified in the conditions element (x) of the distribution licence. In this case AFC DST Music can only issue licences to the AFC Music Club members until July of this year. Finally, the issuer of the distribution licence is specified. The issuer element of the licence contains two pieces of information: a set of issuer-specific details with the specific date and time (T), 29 January 2006, at which this issuer claims to have effected the issuing of the licence; and an identification of the issuer (s), gMusic RC Labs, coupled with a digital signature for the licence. Figure 5 shows the distribution licence issued by gMusic RC Labs to AFC DST Music. Now the music distributor can distribute the music album and issue licences to the AFC Music Club members to play the Waiting for the Sirens album. Susanne, a member of this music club (Figure 6 shows the licence that proves that Susanne is a member of the club), downloads the music album from the distributor's web site. If Susanne wants to reproduce the music album, as it is protected and governed, she needs to purchase a licence that gives her the appropriate permission to perform the required operation. Susanne therefore purchases, on the music distributor's web site, a licence that gives her permission to play the music album 10 times during the year (see Figure 5). Note that she could obtain the licence because she is a member of the AFC Music Club. In order to verify whether the consumer licence has been created according to the permissions and constraints specified by the rights holder, gMusic RC Labs, the licence verification algorithm shall be applied. It specifies that a licence conforms to its parent licence and is valid if all the following are fulfilled:
(1) The Issuer element (s′) of the Consumer Licence, AFC DST Music, is equal to the Principal element (p) of the Distribution Licence, AFC DST Music.
(2) The Right element (I) of the Distribution Licence is the issue right.
(3) The Resource element (g′) of the Distribution Licence is a Grant element; the resulting Grant element (a) of the Consumer Licence shall therefore fulfil the following.

Figure 5. Music distribution licences verification

Figure 6. Susanne music club member licence

(4) The Principal element (p′) within the Grant (g′) of the Distribution Licence, AFC Music Club Members, surpasses the Principal element (P) of the Consumer Licence, Susanne. That is, for each of the elements of P there exists an element in p′ that is equal to it. In this case, as Susanne is a member of the AFC Music Club (see licence in Figure 6), for P (Susanne) there exists an element in p′ (AFC Music Club Members) representing her.
(5) The Right element (r′) within the Grant (g′) of the Distribution Licence, play, is equal to the Right element (h) of the Consumer Licence, play.
(6) The Resource element (t′) within the Grant (g′) of the Distribution Licence, Waiting for the Sirens, is equal to the Resource element (T) of the Consumer Licence, Waiting for the Sirens.
(7) The Conditions element (x′) within the Grant (g′) of the Distribution Licence, exerciseLimit set to 10 and validityInterval set to this year, is equal to the Conditions element (X) of the Consumer Licence, exerciseLimit set to 10 and validityInterval set to this year.
(8) The Conditions (x) of the Distribution Licence are satisfied: the consumer licence was issued on 15 February 2006, that is, before July of this year.
(9) The Time of Issuing (T′) of the Consumer Licence, 2006-02-15T21:00:00, is later than the Time of Issuing (T) of the Distribution Licence, 2006-01-29T17:00:00, and the Time of Issuing of the Consumer Licence is within the interval of the verification process.
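Using the sketch given after the formalisation of the algorithm, the checks (1)-(9) for this use case reduce to a single call over the two licences. The values below are a simplified transcription of those quoted in the text; membership resolution and condition evaluation are stubbed out.

    from datetime import datetime

    distribution_licence = {
        "principal": "AFC DST Music",
        "right": "issue",
        "resource": [{
            "principals": ["AFC Music Club Members"],
            "right": "play",
            "resource": "Waiting For the Sirens",
            "conditions": {"exerciseLimit": 10, "validityInterval": "this year"},
        }],
        "conditions": {"issueUntil": "July 2006"},      # the distributor's own condition (x)
        "time_of_issue": datetime(2006, 1, 29, 17, 0),
    }

    consumer_licence = {
        "issuer": "AFC DST Music",
        "grants": [{
            # Membership resolution is stubbed: Susanne is represented directly
            # by the group name her club-member licence entitles her to.
            "principals": ["AFC Music Club Members"],
            "right": "play",
            "resource": "Waiting For the Sirens",
            "conditions": {"exerciseLimit": 10, "validityInterval": "this year"},
        }],
        "time_of_issue": datetime(2006, 2, 15, 21, 0),
    }

    valid = verify_distribution(distribution_licence, consumer_licence,
                                conditions_satisfied=lambda x: True,   # assume (x) holds
                                now=datetime(2006, 2, 16))
    print(valid)   # True: the consumer licence conforms to the distribution licence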

Formalisation of the super-distribution verification algorithm
This section presents the verification algorithm that we have specified for super-distribution of governed multimedia content. This verification algorithm will be used by any system that provides super-distribution of governed content: whenever a new licence is generated, the verification algorithm should be applied to guarantee that the terms and conditions that govern the multimedia content are satisfied. The super-distribution verification algorithm (see Figure 7) determines that a licence is valid according to the terms and conditions specified by its parent licence if all the following are fulfilled:
(1) The Principal element (P) of the Child Licence is allowed by the delegationControl element of the Parent Licence if the toConstraint element is present, or may be anyone if it is absent.
(2) The Right element (r) of the Parent Licence is equal to the Right element (h) of the Child Licence.
(3) The Grant element (g) of the Parent Licence has the delegationControl element.
(4) The Resource element (T) of the Child Licence is equal to the Resource element (t) of the Parent Licence.
(5) The Conditions (X) of the Child Licence are allowed by the delegationControl of the Parent Licence:
. If toConstraint is present in the delegationControl of the Parent Licence, then the Conditions (X) of the Child Licence are at least the same as the Conditions (x) of the Parent Licence, and the Principal (P) of the Child Licence has to be allowed by the delegationControl of the Parent Licence as follows:

Figure 7. Super-distribution verification algorithm



– If the Parent Licence has a forAll element with a pattern within the toConstraint element, then the Principal (P) of the Child Licence satisfies the pattern.
– If the Parent Licence has a Principal (p) within the toConstraint element, then it is equal to the Principal (P) of the Child Licence.
. If conditionUnchanged is present in the delegationControl element of the Parent Licence, then the Conditions (X) of the Child Licence are equal to the Conditions (x) of the Parent Licence.
. If conditionIncremental is present in the delegationControl element of the Parent Licence, then the Conditions (X) of the Child Licence are at least the same as the Conditions (x) of the Parent Licence.
. If depthConstraint is present in the delegationControl element of the Parent Licence, then:
– the value of the depthConstraint element of the Child Licence is equal to that of the Parent Licence less one.
. If dcConstraint is present in the delegationControl of the Parent Licence, then:
– the Principal element (P) of the Child Licence is allowed by dcConstraint.
(6) The Conditions (X) of the Child Licence are allowed by dcConstraint. (A sketch of this check in code form is given below.)
Application of the delegation control verification algorithm to a super-distribution scenario
This section proposes a multi-tier distribution scenario in order to illustrate the application of the delegation control verification algorithm when delegating rights.
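A minimal sketch of the delegation check just formalised is given below, again over simplified dictionary-based licences; the dcConstraint branch is omitted and conditions are modelled as plain sets of strings, so this is an illustration of the idea rather than a normative implementation.

    def verify_delegation(parent, child):
        """Check the main conditions of the super-distribution (delegation) algorithm."""
        dc = parent.get("delegation_control")
        if dc is None:                                          # (3)
            return False
        if child["right"] != parent["right"]:                   # (2)
            return False
        if child["resource"] != parent["resource"]:             # (4)
            return False
        to = dc.get("to_constraint")                            # (1)
        if to is not None and child["principal"] not in to:
            return False
        if "condition_unchanged" in dc:                         # (5)
            if child["conditions"] != parent["conditions"]:
                return False
        elif "condition_incremental" in dc:
            # conditions may be added, but all of the parent's must be kept
            if not parent["conditions"] <= child["conditions"]:
                return False
        if "depth_constraint" in dc:
            child_dc = child.get("delegation_control", {})
            if child_dc.get("depth_constraint") != dc["depth_constraint"] - 1:
                return False
        return True

    distributor = {"principal": "eBooks DST", "right": "play",
                   "resource": "The Buried Life",
                   "conditions": {"during this year"},
                   "delegation_control": {"depth_constraint": 1}}
    end_user = {"principal": "Peter", "right": "play",
                "resource": "The Buried Life",
                "conditions": {"during this year"},
                "delegation_control": {"depth_constraint": 0}}
    print(verify_delegation(distributor, end_user))   # True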

Figure 8. e-books super-distribution scenario

Figure 9. e-book distributor licence

In this use case a publisher (MTs Publisher) issues to a distributor (eBooks DST) a licence that grants him the right to play an e-book (The Buried Life) and to delegate this right to end-users according to the conditions specified by the publisher. Figure 8 shows graphically the super-distribution scenario described. In this scenario the publisher issues to the distributor a delegation control licence (see Figure 9) that contains the following elements, granting him the right to view the

e-book during this year and to distribute this e-book to end-users. The distributor licence is thus made up of the following elements: a grant element specifying that the principal of the licence, eBooks DST, can view the e-book, The Buried Life, during this year and can issue licences to end-users (indicated by the delegationControl element, which contains a depthConstraint element set to 1), the end-users in turn being able to view the e-book only during this year; and the issuer element, which contains the identification of the issuer, MTs Publisher, a digital signature for the licence and the date and time at which the licence was issued. Once the distributor has obtained the licence presented in Figure 9, he can distribute the e-book and issue licences to end-users granting them permission to view the e-book during this year. The licences that the distributor can issue are sketched in Figure 10. These licences contain a grant element with a delegation control depthConstraint element set to 0, the principal that identifies an end-user, the play right, The Buried Life resource and a conditions element restricting the use of the book to this year. Finally, the issuer element of the licence identifies the distributor, eBooks DST, and details the date and time at which the licence was issued. In the scenario presented in this section the delegation control verification algorithm will be applied in two cases: first, when an end-user tries to purchase a licence that grants him the right to view the e-book; and second, when the end-user tries to play this book on a non-trusted system. If we consider the first case, when the distributor requests from a licence generator service the issuing of a licence to an end-user, Peter, granting him the right to play The Buried Life e-book during this year, this service has to verify that the distributor can distribute the e-book according to the terms and permissions specified. The verification algorithm is then applied:

Figure 10. End-user licence

(1) The Grant element (g) of the Distributor Licence has the delegationControl element.
(2) The Principal element (P) of the End-User Licence, Peter, is allowed by the delegationControl element of the Distributor Licence (the toConstraint element is absent, so any principal is allowed).
(3) The Right element (h) of the End-User Licence, play, is equal to the Right element (r) of the Distributor Licence, play.
(4) The Resource element (T) of the End-User Licence, The Buried Life, is equal to the Resource element (t) of the Distributor Licence, The Buried Life.
(5) The Conditions (X) of the End-User Licence, during this year, are allowed by the delegationControl of the Distributor Licence, during this year.
(6) The depthConstraint element is present in the delegationControl element of the Distributor Licence, therefore:
(7) the depthConstraint element of the End-User Licence, 0, is equal to the depthConstraint of the Distributor Licence, 1, less one.
In this use case (see Figure 11) Peter's licence is valid according to the terms and conditions specified by the eBooks DST licence, as all the statements specified in the delegation control verification algorithm are fulfilled.
Formalisation of the offers verification algorithm
This section presents the verification algorithm that we have specified for offers of governed multimedia content (see Figure 12). This verification algorithm will be used by any DRM system that provides this feature. Whenever a new licence is generated,

Figure 11. Delegation control verification algorithm for the use case

Figure 12. Offers verification algorithm

the verification algorithm should be applied to guarantee that the terms and conditions that govern the offered multimedia content are satisfied. The offers verification algorithm that we have specified determines that a licence is valid according to the terms and conditions specified by its parent licence if all the following are fulfilled:
(1) The Principal element (p′) within the Grant (a) of the Parent Licence surpasses the Principal element (P) of the Child Licence; that is, for each of the elements of P there exists an element in p′ that is equal to it. Or the Principal element (p′) within the Grant (a) of the Parent Licence is absent.
(2) The Right element (h) of the Child Licence is equal to the Right element (r′) within the Grant (a) of the Parent Licence.
(3) The Resource element (T) of the Child Licence is equal to the Resource element (t′) within the Grant (a) of the Parent Licence.
(4) The Conditions element (X) of the Child Licence is equal to the Conditions element (x′) within the Grant (a) of the Parent Licence.

Figure 13. e-book distribution scenario – publisher special offers

(5) The Right element of the Parent Licence is the obtain right.
(6) The Conditions (x) of the Parent Licence are satisfied.
Application of the offers verification algorithm to a distribution scenario
In this section we present a simple distribution scenario in which a publisher advertises to consumers that they can view his latest e-book for a special price. In this use case the publisher distributes digital objects that contain a reference to the e-book, some metadata regarding the e-book (title, author, genre, etc.) and the protection and governance information (licences, information related to the protection tools, etc.). Within the governance information the publisher includes the offer licence in order to allow users to obtain licences that allow them to consume the governed content according to the terms specified in the offer licence. Figure 13 shows graphically the proposed use case. The offer licence is made up of two main elements. The first specifies that any consumer can obtain a licence granting him the right to view the e-book, The Buried Life, during this year if he pays €5.00. The second, the issuer element, identifies the issuer of this offer licence, MTs Publisher, and the time at which this licence was issued. Finally, the licence is signed to guarantee its integrity and authenticity. Once the publisher has stated the terms of the offer, the offer licence sketched in Figure 14 is associated with the digital object that contains the e-book, protected in some way, along with other metadata such as information related to the protection tools. Any consumer can then obtain a licence granting him the right to view the e-book. If a consumer is interested in the offer, he exercises the right to obtain a consumption licence according to the terms stated by the publisher in the offer licence. Once the user has paid the €5.00, a licence service sends the user the corresponding consumption licence. This licence grants him (Peter) the right to view the e-book during this year. Peter's consumption licence is shown in Figure 15.
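The offers check can be sketched in the same style. The example values mirror the e-book scenario described in this section; all structures, field names and the literal "obtain" string are illustrative simplifications, and payment verification is stubbed out.

    def verify_offer(offer, child, conditions_satisfied):
        """Check the conditions of the offers verification algorithm."""
        grant = offer["grant"]                                   # the offered grant (a)
        if grant["principals"] is not None and \
           not set(child["principals"]) <= set(grant["principals"]):    # (1)
            return False
        if child["right"] != grant["right"]:                     # (2)
            return False
        if child["resource"] != grant["resource"]:               # (3)
            return False
        if child["conditions"] != grant["conditions"]:           # (4)
            return False
        if offer["right"] != "obtain":                           # (5)
            return False
        return conditions_satisfied(offer["conditions"])         # (6), e.g. the fee was paid

    offer = {"right": "obtain",
             "conditions": {"fee": 5.00},
             "grant": {"principals": None,                       # anyone may benefit
                       "right": "play",
                       "resource": "The Buried Life",
                       "conditions": {"validity": "this year"}}}
    consumption = {"principals": ["Peter"],
                   "right": "play",
                   "resource": "The Buried Life",
                   "conditions": {"validity": "this year"}}
    print(verify_offer(offer, consumption, lambda x: True))      # True once the fee is paid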

Figure 14. Offer licence – e-books distribution scenario

Figure 15. Consumption licence – e-books distribution scenario

Before the licence creator service sends Peter his consumption licence it must verify that the created licence is valid according to the terms stated in its parent licence and that the condition has been satisfied; in this use case the user must have paid the corresponding fee. The offers verification algorithm that we have specified in this section must then be applied.

Figure 16. Offers verification algorithm – application to the distribution use case

The offers verification algorithm (see Figure 16) that we have specified determines that a licence is valid according to the terms and conditions specified by its parent licence if all the following are fulfilled:
(1) The Right element of the Parent Licence is the obtain right.
(2) The Principal element (p′) within the Grant (a) of the Parent Licence (Anyone) surpasses the Principal element (P) of the Child Licence (Peter). That is, for each of the elements of P (Peter) there exists an element in p′ (Anyone) that is equal to it.
(3) The Right element (h) of the Child Licence (play) is equal to the Right element (r′) within the Grant (a) of the Parent Licence (play).
(4) The Resource element (T) of the Child Licence (The Buried Life) is equal to the Resource element (t′) within the Grant (a) of the Parent Licence (The Buried Life).
(5) The Conditions element (X) of the Child Licence (the granted rights may be exercised during this year) is equal to the Conditions element (x′) within the Grant (a) of the Parent Licence (the granted rights may be exercised during this year).
(6) The Conditions (x) of the Parent Licence are satisfied (the payment service verifies that the user has paid the €5.00).
(7) The Issuer element (s) of the Parent Licence (MTs Publisher) surpasses the Issuer element (s′) of the Child Licence (MTs Publisher). That is, for each of the elements of s there exists an element in s′ that is equal to it. In this use case they are equal.

Conclusions and future work
This paper has presented mechanisms to enforce rights when distributing governed multimedia content. It addresses three different ways of sharing governed multimedia content, namely distribution, infinite redistribution and offers, and three verification algorithms have been specified. They can be used by any DRM system to ensure that the rights and conditions stated by the content owner are respected when distributing digital manifestations of his work. In order to illustrate them, we have presented three different scenarios. The first is a music distribution scenario in which a music producer grants a distributor the right to distribute a music album to the members of a music club; the licence distribution verification algorithm resolves whether a music album has been distributed according to the terms and conditions stated by its producer. The second is a multi-tier distribution scenario where a publisher (MTs Publisher) issues to a distributor (eBooks DST) a licence that grants him the right to play an e-book (The Buried Life) and to delegate this right to end-users according to the conditions specified by the publisher. Finally, in the third scenario a publisher advertises to consumers that they can view his latest e-book for a special price; in this use case the publisher distributes digital objects that contain a reference to the e-book, some metadata regarding the e-book (title, author, genre, etc.) and the protection and governance information. We will further study the necessity of defining verification mechanisms to enable DRM systems to support the adaptation and modification of content through the complete digital value chain according to the terms and conditions stated by their creators.
References
Bormans, J. and Hill, K. (2002), MPEG-21 Overview v.5, ISO/IEC JTC1/SC29/WG11/N5231.
Delgado, J., Rodríguez, E. and Llorente, S. (2004), "DMAG REL Interpretation Conformance", ISO/IEC JTC1/SC29/WG11 MPEG2003/M10573.
Halpern, J. and Weissman, V. (2004), "A formal foundation for XrML", Proceedings of the 17th IEEE Computer Security Foundations Workshop, Pacific Grove, CA, June 28-30, 2004.
Iannella, R. and Guth, S. (2006), "ODRL V2.0 – Model Semantics", available at: http://odrl.net/2.0/WD-ODRL-Model.html
ISO/IEC (2004), ISO/IEC IS 21000-5 – Rights Expression Language.
ISO/IEC (2006), ISO/IEC IS 21000-8 – Reference Software.
Rodríguez, E., Llorente, S. and Delgado, J. (2004), "Use of rights expression languages for protecting multimedia information", Proceedings of the Third International Conference on Web Delivering of Music, Leeds, 15-17 September 2003, IEEE Computer Society, New York, NY, pp. 70-8.
Torres, V., Rodríguez, E., Llorente, S. and Delgado, J. (2004), "Architecture and protocols for the protection and management of multimedia information", Proceedings of the Second International Workshop on Multimedia Interactive Protocols and Systems, Grenoble, France, 16-19 November 2004, LNCS 3311, Springer Verlag, Berlin, pp. 252-63.
Torres, V., Rodríguez, E., Llorente, S. and Delgado, J. (2005), "Use of standards for implementing a multimedia information protection and management system", Proceedings of the 1st International Conference on Automated Production of Cross Media Content for Multi-Channel Distribution, Florence, Italy, 30 November-2 December 2005, IEEE Computer Society, New York, NY, pp. 145-53.

Wang, X., Lao, G., DeMartini, T., Reddy, H., Nguyen, M. and Valenzuela, E. (2002), “XrML – eXtensible rights Markup Language”, Proceedings of the 2002 ACM Workshop on XML Security, Washington, DC, 22 November, 2002, pp. 71-9. Wang, X., DeMartini, T., Delgado, J., Rodriguez, E. and Llorente, S. (2003), Data Types for MPEG-21 REL Interpretation Conformance Software, ISO/IEC JTC1/SC29/WG11 MPEG2003/M10000.

Corresponding author
Eva Rodríguez can be contacted at: [email protected]



Toward semantics-aware management of intellectual property rights
Ernesto Damiani and Cristiano Fugazza
Dipartimento di Tecnologie dell'Informazione, Università degli Studi di Milano, Crema, Italy
Abstract

Article received 25 September 2006
Reviewed by EuDiRights Workshop Committee
Approved for publication 10 October 2006

Purpose – The purpose of this paper is to introduce the advantages of semantics-aware representation formalisms in the integration of digital rights management (DRM) infrastructures grounded on heterogeneous formats.
Design/methodology/approach – After discussing the notion of semantics-aware IPR and its relationship with Semantic Web-style metadata, we exemplify the advantages of adopting it by providing two different use cases. XML-based DRMLs are mapped onto a shared ontology-based representation in such a way that dependencies between elements can be drawn.
Findings – Individual formalisms may take advantage of a semantics-aware infrastructure to check the consistency of DRM policies according to dependencies not explicit in the specification language. On the other hand, distinct formalisms can be integrated with each other according to fine-grained translation mechanisms.
Originality/value – Inference procedures can pre-process this knowledge base and derive implicit knowledge that can be used by programming logic in the actual enforcement of DRM policies.
Keywords Semantics, Intellectual property, Worldwide web, Open systems, Automation
Paper type General review

Introduction The information and entertainment industry is one of the fastest growing and most profitable sectors of today’s economy. Distributing information in digital form, however, raises numerous concerns due to the fact that it is difficult for digital content providers to control what others do with the information. This is especially true on the Internet, which is highly vulnerable to unauthorised use of information. Unauthorised distribution, forgery and defacement of digital content have become widespread, triggering a technological “arms race” between content providers and malicious users. For the last 10 years the industry has been demanding an efficient mechanism for digital content protection. Digital Rights Management (DRM) technologies supporting secure transmission of digital products from publishers to consumers have become a crucial factor in the marketing of digital content. In general DRM systems seek to manage access to digital content, restricting it to individuals or organisations that are entitled by payment or affiliation to have access. Digital content managed by DRM can take many different forms, including music, information, software applications, video and even enterprise e-mail. First generation DRM technologies focused on encryption-based solutions for locking digital content and limiting its distribution to authorised users. Early DRM techniques included limiting the number of devices on which content could be accessed

or the number of times it could be accessed; imposing forward locks to prevent onward transmission of content, or withholding access until the user had registered with the content owner or publisher. Digital watermarking is also part of this palette of techniques, as it provides the basis for legal action if a violation of rights is detected. While early DRM solutions addressed the issue of unauthorised copying, they did so at the expense of substantial limitations to the publishers' and retailers' business opportunities, since products burdened with DRM policies tend to be hardwired to a specific distribution channel or technology and rely on a simple supply chain structure. (There are even well known examples of particular solutions that have had rather unfortunate and unforeseen side effects, e.g. exposing user machines to security vulnerabilities.) Modern content delivery channels are more and more multi-tenanted, involving content owners, content aggregators, network owners, service providers, terminal manufacturers and DRM solutions providers. Also, increasing end user concerns about fair usage and privacy must be addressed. Today, innovative second generation DRM solutions are needed, capable of flexibly supporting new opportunities to do business with digital information products. This increased flexibility is largely expected to be due to XML-based Digital Rights Management Languages (DRMLs) that declaratively assign usage rights to digital content (Open Digital Rights Language, n.d.; eXtensible right Mark-up Language, n.d.). DRMLs allow the specification of rights, fees and usage conditions, together with message integrity and entity authentication. For instance, in the video-on-demand industry there is growing interest in switching from the usual download-to-display to a novel download-to-own business model. Intuition suggests that this switch for a specific category of video applications could be expressed as a DRML policy change. However, the expressive power of DRMLs is bound by the metadata specifying properties of the digital content to which they refer. In this paper we deal with associating DRMLs with advanced Semantic Web style metadata, highlighting the role of reasoning and inference in DRML enforcement. A further motivation for investigating more expressive mechanisms to bind heterogeneous formats is that the global information infrastructure has made it possible for different enterprises to integrate their business activities as a Virtual e-Enterprise (VEE). A key success factor for VEEs is achieving seamless integration of legacy applications, as well as of heterogeneous formats and resources. This is particularly important in the DRM landscape, where independent actors need to integrate data structures in a more loosely coupled fashion with regard to distributed Web services.
DRM basics
While much research is being done on extending available information meta-models to fully support advanced DRM concepts (Yagüe et al., 2003), some basic notions of DRM can be easily captured by basic entity-relationship modelling. Current DRM systems involve three main entities: Users, Content and Rights. Users create and use digital content, i.e. any type of digital product. Rights are privileges, constraints and obligations regarding content; they are granted or denied to users. DRM languages have been designed to state assertions about allowable permissions, constraints, obligations and any other rights-related relationship between users and content.
Rights expressions can become very complex, and their correct enforcement needs a complete

specification of the DRML semantics. Nonetheless, a full formalisation of DRM models is still underway: after the seminal paper by Carl Gunter at HICSS'01 (Gunter et al., 2001), important contributions have come from recent work by several researchers, including Frederic Cuppens (Sans and Cuppens, 2004) and David Bjorner (Arimoto et al., 2006). Referring to the original Gunter/Weeks/Wright model, rights expressions consist of four parts:
(1) Permissions. What a right allows one to do.
(2) Rights holders. Who is entitled to a right.
(3) Constraints. Any restriction on the right that may apply.
(4) Obligations. What must be done/provided/accepted to exercise a right.
Besides modelling issues, DRM systems pose a series of well-known management problems:
. Policy management. An entity should define and continuously enforce policies about digital products as part of its business strategy.
. Rights management and licensing. When rights are acquired from other entities, it is important to remember the source, how broad the rights are, etc. Also, some business models require rights to be transferable, i.e. an entity can license some rights to other entities.
. Revenue collection. Most traditional business models associate billing with the transfer of rights (e.g. when buying a CD), while others generate revenue only when the user actually exercises the rights. Practical DRM systems must specify how to collect, account and share revenues.
Summarising, a DRM system is composed of a well-specified rights model/language (DRML) and of a set of advanced management tools. (Of course, legal aspects are also an important part of the picture, and may have an important impact on modelling issues. For the sake of conciseness, however, they are not discussed further in this paper.) Rights models and languages must be highly expressive in order to satisfy the flexibility requirements posed by modern business models; they must also be as standardised as possible. In the next sections we introduce two XML-based DRMLs, XrML (eXtensible rights Mark-up Language) and ODRL (Open Digital Rights Language), and highlight their expressive power limitations.
Limitations of XML-based DRM
As we have seen, evolving business models for digital content have introduced novel requirements for DRM systems. In this section we describe the motivations stimulating research on DRM policy languages (DRMLs) and their role toward interoperable enforcement of Intellectual Property Rights (IPR) on the Web. With respect to access control (AC) policy languages, DRMLs have many distinguishing features. A first difference is multi-tenancy, i.e. the greater number of actors involved in each transaction being regulated. Traditional AC typically involves two actors: a user requesting a resource, and a service provider requiring authentication in order to grant it. Third parties (e.g., certification authorities) may be involved but usually represent super-parties, entities that do not directly participate in the business process relating user and service provider (e.g., the contract with an Internet provider or mobile


Figure 1. A sample scenario for policy re-use

communication company). By contrast a DRM transaction involves at least four actors: the content provider, the distributor delivering the content, the clearinghouse managing licences, and the end user buying licences. These actors cooperate along the digital media value chain according to the specific rights held by each party. Orchestrating this process is therefore more complicated than accessing a resource on a secure server. Another crucial factor in DRM is the sensitivity of the information required by service providers that may affect users’ privacy. DRM systems may have to establish a one-to-one relationship with customers (e.g., to ensure that distinct users cannot share the same licence), which makes it difficult to avoid privacy concerns such as those associated with secondary use of collected information for marketing purposes. All the above considerations motivate the need for a highly expressive DRML supporting a broad choice of radically different business models. The specific business model being implemented (pay-per-play, usage metering, download-to-own) dictates the strategies for regulating access to resources; furthermore, hardware and software solutions allow for tracking the actual usage of resources and enforce the rights associated with them. For these reasons current XML-based DRMLs support fine-grained descriptions of resources, business processes and actors. Mainstream examples are ODRL (Open Digital Rights Language) and XrML (eXtensible right Mark-up Language), the latter forming the basis for MPEG-21 REL. More proposals are available, e.g. XMCL (Ayars, 2002), and a general convergence on a definitive solution is difficult to achieve. As an example, the mapping between ODRL and XrML is only partial; the former is more suitable for actual transactions, and the latter allows for a more cross-vertical applicability. Moreover, the specific environment in which DRM is applied (e.g., e-books, streaming media, mobile services) and the different hardware/software support by media players introduce specific requirements and functionalities. In order to show the importance of interoperability between distinct DRMLs, Figure 1 displays a possible scenario for policy re-use. In Workflow (a) the content provider delivers media objects to customers via the Internet according to the MPEG-21 multimedia framework (Bormans and Hill, 2002), therefore relying on the XrML format for policy specification. As soon as the same content has to be delivered to mobile appliances, typically using the OMA open standard for DRM (Open Mobile

Alliance, 2003), it is necessary to translate the original policy into ODRL for enforcement. Rewriting policies is a cumbersome task because media objects are supposed to be added continuously; also, this process can be automatic, such as in content aggregation, and consequently human supervision cannot always be considered in the translation process. Instead, in Workflow (b) a translation tool takes care of converting XrML into ODRL, adapting policy specifications to the new environment. The work of Polo et al. (2004) proposes a purely syntactical translation mechanism between ODRL and XrML based on eXtensible Stylesheet Language Transformations (XSLT) (W3C, 1999). An example translation pattern for this approach is shown in Figure 2, whose parts (b) and (c) display, respectively, example input and output XML fragments. One of the limitations of this approach is the excessive dependence on the physical structure of the XML input. As a consequence, syntactically different constructs sharing the same meaning cannot be easily mapped onto one another. Moreover, albeit effective in the small scale, this strategy (generally known as metadata crosswalking) becomes difficult to manage as soon as a large number of distinct, partially overlapping formats are taken into consideration. More promising approaches based on Semantic Web languages can fill this gap and provide a more interoperable representation of DRM constructs. The integration model proposed by this work aims at mapping distinct formalisms by means of ontology-based data structures that essentially act as a wrapper for the


Figure 2. XSLT template (a) translating ODRL count constraints; (b) into the corresponding XrML construct; (c) (from Polo et al., 2004)


Figure 3. Our semantics-aware model for policy re-use

heterogeneous set of data formats that is considered, as shown in Figure 3. The primary difference with regard to the scenario portrayed in Figure 1 is that, in our model, the expressivity of policy descriptions may exceed that of the distinct DRMLs that are taken as data sources (e.g., separate ODRL and XrML policies for the same media object can be merged into one comprehensive data structure).
Advantages of a semantics-aware DRM infrastructure
The primary aim of Semantic Web languages is to provide a foundation in formal logic that can be used to relate vocabularies defined by heterogeneous sources and to derive implicit knowledge not declaratively specified. Inference capabilities with formal logic, primarily the fragments of First Order Logic (FOL) delimited by Description Logics (DL) (Baader et al., 2003) and Horn Rules (Boley et al., 2004), greatly augment the expressiveness of policy languages based on them (Damiani et al., 2006a, 2006b). In this section we review some examples of semantics-aware DRM approaches. The most straightforward use of semantics-aware metadata is providing a fine-grained description of the resources being managed. Digital Asset Management (DAM) tools are typically used by content providers (e.g., authors) to describe and associate IP rights with their products. Even if DAM is traditionally kept separate from DRM, data produced by DAM tools comply with a fundamental DRM requirement, namely the univocal identification of resources, e.g. DOI (The International DOI Foundation, n.d.) and the Handle System (Corporation for National Research Initiatives, n.d.). Advanced descriptions of digital products and of the associated rights can drive the brokerage of media products whenever distributors make them publicly available, overlaying their own rights on the existing IP rights and enforcing them via the DRM infrastructure. Consequently, it makes sense to consider rights (and the formats associated with them) as dynamic properties spanning the whole digital product lifecycle. This is the approach followed by the Adobe eXtensible Metadata Platform (XMP) (Adobe, n.d.), which exemplifies the applicability of Semantic Web formats at production level. This architecture is based on RDF (W3C, n.d.) assertions, which are used for binding the wide range of Adobe applications (Illustrator, Premiere, Acrobat,

etc.) to a common workflow integrating asset and content management, search facilities, and control mechanisms on possible secondary uses (e.g. DVD duplication). Metadata is stored as XMP packets that label resources; within them, RDF assertions describe resources according to terms defined in the XMP schema. This structure can be derived from an existing XML schema definition because the RDF data model subsumes the tree structure of XML documents. Recently, the Semantic Web proposal of using domain ontologies written in a standard language, e.g. OWL (W3C, 2004), as controlled vocabularies for description metadata has triggered new DRM-related research. Delgado et al. (2003) developed Regulatory Ontologies for expressing IPR, formalised ODRL semantics using OWL ontologies, and also applied Semantic Web techniques to the MPEG-21 Rights Data Dictionary (RDD) (Delgado et al., 2004). Also, Delgado pinpointed the need for interoperable workflow descriptions and proposed the OWL-based Web service ontology (OWL-S) as a viable format for achieving this (Llorente et al., 2004). Finally, the Semantic XPath Processor (Tous et al., 2005) enables interoperability between distinct DRM formats by providing an RDF mapping of XML schema constructs, with particular regard to complex structures, such as substitution groups, that cannot be easily translated with a purely syntactical approach such as that of Polo et al. (2004). Another noteworthy example of a logic-based DRM architecture is the LicenseScript language (Chong et al., 2003b), whose data structures are constituted by multisets that are rewritten by Logic Programming (LP) procedures. On the one hand, the FOL fragment on which LP formalisms are grounded proves to be extremely powerful with regard to the application logic it can express; for this reason, DL knowledge bases (KBs) are currently being integrated with Horn FOL (Boley et al., 2004) production rules to cover a wider range of FOL semantics. On the other hand, the non-monotonic behaviour of LP languages (i.e. that previously inferred conclusions can be contradicted by adding new knowledge to the KB) is not generally regarded as a desirable feature in open environments such as Web-based interactions, as will be detailed in the next section. Nevertheless, the comparison of logic-based and XML-based DRMLs provided by Chong et al. (2003a) is a valuable starting point for semantics-aware DRM.
Introduction to knowledge bases
As candidate formalisms for describing entities in the DRM scenario, Semantic Web languages provide increased flexibility with respect to XML schemata, essentially because the abstraction provided by RDF subject-predicate-object triples allows the expression of any graph structure. Also, the formal models underlying these languages allow the integration of heterogeneous vocabularies more easily than with point-to-point XSLT translation facilities, such as those described in Polo et al. (2004). However, the most interesting feature of expressive Semantic Web languages, such as the OWL DL formalism we are considering, is the use of logic-based semantics. As an example, OWL DL is grounded on the SHOIN(D) Description Logic (Baader et al., 2003), enabling the decidable, monotonic inference facilities implemented by reasoners (Parsia et al., 2003; Haarslev and Möller, 2001). Sound reasoning over distributed data sources on the Internet is a fundamental requirement of the Semantic Web: otherwise, deductions made over data structures under the system's control can prove incorrect as soon as incomplete data sources retrieved from the open Internet are integrated with them.


As a simple example, consider decision procedures that are targeted at media objects asserted as "not on discount sale": in the traditional, controlled environment of a proprietary system, determining these items may amount to moving the media objects indicated as discounted from the full list of media objects into a separate data structure (e.g., a database table) for media objects on discount. Instead, if the environment is characterised by information provided by independent sources, it is not so straightforward to assert objects as "not on discount sale", because exhaustive knowledge of the data structures managed by remote parties cannot be taken for granted. The OWL specification provides three sublanguages with different expressive power (W3C, 2004). OWL Lite has a limited expressive power; it was designed to keep the complexity of reasoning under control. In principle it is simpler to provide tool support for OWL Lite than for its more expressive relatives. Also, OWL Lite provides a quick migration path for existing thesauri and other taxonomies. OWL DL was designed to take advantage of DL decision procedures and reasoning systems. OWL DL computational operations are complete (given an OWL DL system, all its logic entailments are guaranteed to be computed) and decidable (all computations will finish in finite time). Finally, OWL Full is meant for users who need all the expressiveness and syntactic freedom of RDF(S) with no computational guarantees. At the time of writing, no reasoning software is able to support every feature of OWL Full. Here we deal with OWL DL because it guarantees a high expressive power and computational completeness. A DL KB is traditionally divided into two main parts. On the one hand, the terminology provides an intensional description of the application domain's vocabulary and is generally indicated as the TBox. In our example we concentrate on this part of a KB and apply OWL DL semantics to integrate the ODRL permission model with the taxonomy of rights provided by XrML. Specifically, XML elements associated with these entities constitute concepts (or classes) in the TBox. On the other hand, assertions constitute the extensional part of a KB, called the ABox. Assertions introduce and relate to each other named individuals expressed in terms of the vocabulary. In our scenario individuals represent instances of permissions associated with principals. TBox and ABox entities in a DL KB represent two separate meta-levels in the application domain. In other words, entities in the TBox are the definitions of entities instantiated in the ABox. KBs implemented in OWL DL can take advantage of reasoning algorithms, enabling a suite of inference services. In particular, we have:
. Consistency checking, which ensures that an ontology does not contain any contradictory facts.
. Concept satisfiability, which checks whether it is possible for a class to have any instances.
. Classification, which computes the subclass relations between every named class to create the complete class hierarchy. The class hierarchy can be used to answer queries such as getting all or only the direct subclasses of a class.
. Realisation, which finds the most specific classes that an individual belongs to or, in other words, computes the direct types for each of the individuals.
These algorithms are implemented in a number of software tools collectively known as reasoners.
With respect to generic FOL, for which few sound and mature reasoners are available, well-engineered reasoners and other tools do exist for DL, supporting

advanced features like standard query interfaces and a clean separation between TBox and ABox. Available reasoners include RacerPro (Haarslev and Möller, 2001) and Pellet (Parsia et al., 2003). Also, the DL Implementation Group (DL Implementation Group, n.d.) has developed an HTTP-based interface that is now an emerging standard for accessing DL reasoners. This kind of inference is, at the time of writing, the only technique capable of deriving information on incomplete data sources such as the ones interacting with each other on the open Internet. The primary drawbacks are the computational complexity associated with this kind of inference and the pre-defined set of inference procedures. In order to integrate DL reasoning with more flexible criteria for deriving information, the FOL fragment delimited by positive definite Horn clauses can be considered, but a complete integration of both paradigms (i.e. an integration scheme capable of deriving all conclusions implied by premises) is still under development.
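The "not on discount sale" example above can be made concrete in a few lines of Python. Under a closed-world reading the answer is a simple set difference; under the open-world reading required on the Web, only items carrying an explicit negative assertion may safely be reported as not discounted. The data is invented purely for illustration.

    catalogue  = {"album1", "album2", "ebook1"}
    discounted = {"album2"}                      # asserted by one (possibly partial) source
    asserted_not_discounted = {"ebook1"}         # explicit negative assertions

    # Closed-world assumption: anything not known to be discounted is not discounted.
    cwa_not_discounted = catalogue - discounted            # {"album1", "ebook1"}

    # Open-world assumption: absence of an assertion is not evidence; a remote source
    # might still assert album1 as discounted, so only ebook1 can safely be reported.
    owa_not_discounted = asserted_not_discounted           # {"ebook1"}

    print(sorted(cwa_not_discounted), sorted(owa_not_discounted))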


Case studies
We are now ready to apply our approach to two use cases. In the first case we translate the ODRL permission model (see Figure 4) into an OWL ontology, expressing dependencies not explicit in the picture and enabling the wide set of inference procedures associated with this formalism. In the second case the same translation is applied to the XrML rights model, and the resulting ontology is integrated with the one defined for ODRL. Corresponding rights in the two categorisations can then be declared equivalent, and policies can be transparently applied to principals whose permissions are defined in either format. Figure 4 shows the flat categorisation of permissions provided by ODRL.

Figure 4. The categorisation of ODRL permissions


Considering just the asset management subtree, it is clear that a backup operation implies the duplication of the asset; on the other hand, duplication corresponds to a "save" operation where no changes have been made to the asset. (Note that, in ODRL, a save operation corresponds to the creation of a new media object.) Principals being given the right to save a resource and being denied the right to duplicate it would constitute an inconsistent state in the DRM infrastructure, because the negative right is totally ineffective (i.e. the user can obtain the same effect as a duplication operation). Moreover, a "move" operation implies the duplication and subsequent deletion of an asset. Both these rights should then be granted to principals in order for them to perform this operation. Considering now the re-use subtree, it is obvious that the aggregate usage implies duplication, while a "modify" operation requires a "save" permission. Transfer permissions can correspond to "move" or "duplicate" operations. "Give" can be


Figure 5. The OWL ontology derived from the ODRL permission model

considered a "sell" operation with no exchange of value. The same is true for "lease" and "lend", except that they both should enforce the rollback of the operation after a fixed period. Finally, usage operations should be the consequence of one of the transfer operations. Of course, these examples are only meant to show that structures introduced by policy languages may have semantics that are not made explicit in the XML Schema definition that, typically, defines the range of permissible policies. Domain experts should decide, for a specific architecture, the relationships between permissions that should be considered when checking the consistency of policies expressed in a specific DRM language. Knowledge Representation (KR) practices are the ideal tools for representing cross-cutting relationships between permissions. In particular, inference procedures associated with expressive Semantic Web languages, such as OWL DL, allow for checking the consistency of DRM policy definitions and augmenting the expressive value of DRMLs by enabling the propagation of permissions. Figure 5 shows the complex structure interrelating ODRL permissions, displayed in the SWOOP ontology editor (Mindswap, 2004). Permission definitions (i.e. actual instances of the concepts shown in Figure 5) can populate the SWOOP Knowledge Base (KB) and be checked against one


Figure 6. The categorisation of XrML rights


Figure 7. Integration of ODRL permissions and XrML rights

another for consistency. Note that, after creating the OWL categorisation of the ODRL permission model, the structure can be directly imported into the AC model described in Damiani et al. (2006a, 2006b). This enables use of the conflict resolution criteria described there for DRM enforcement as well. Figure 6 shows the categorisation of rights provided by XrML. The structure is simpler than the permission model of ODRL and, with the exception of possible relationships between "render" and "transport" operations, can fit into a simple tree representation. As in the previous case, it is possible to integrate the XrML rights model with an ontology-based infrastructure. More interestingly, XrML and ODRL policies can be made interoperable with each other by mapping the two ontologies introduced in this paper. As an example, the homonymous "play" and "print" operations from both formats can be made equivalent. As a consequence, principals with a "display" permission in the ODRL expression language can be given the "play" and "print" rights on resources being protected by XrML metadata. Figure 7 displays the sample mapping of ODRL permissions onto XrML rights.
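The propagation and cross-format mapping of permissions described here can be emulated outside a DL reasoner with a small transitive closure, as the following illustrative Python fragment shows. The implication and equivalence tables reproduce only a few of the relationships discussed in the text and are not a complete model of either language.

    # ODRL-side implications discussed in the text ("backup implies duplicate", etc.).
    implies = {
        "backup":    {"duplicate"},
        "duplicate": {"save"},
        "move":      {"duplicate", "delete"},
        "aggregate": {"duplicate"},
        "modify":    {"save"},
    }

    # Sample ODRL -> XrML equivalences (e.g. the homonymous play and print rights).
    odrl_to_xrml = {"display": {"play", "print"}, "play": {"play"}, "print": {"print"}}

    def closure(permission, table=implies):
        """All permissions implied, transitively, by granting `permission`."""
        result, frontier = {permission}, [permission]
        while frontier:
            for implied in table.get(frontier.pop(), ()):
                if implied not in result:
                    result.add(implied)
                    frontier.append(implied)
        return result

    def xrml_rights(odrl_permissions):
        """XrML rights a principal holds, given its expanded ODRL permissions."""
        rights = set()
        for p in odrl_permissions:
            rights |= odrl_to_xrml.get(p, set())
        return rights

    print(closure("move"))                   # {'move', 'duplicate', 'delete', 'save'}
    print(xrml_rights(closure("display")))   # {'play', 'print'}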

Conclusions
Digital Rights Management languages allow asserted rights over digital content to be expressed in a machine-readable format. Today, DRM policies are increasingly used in conjunction with more general metadata – for example, harvested from cataloguing systems. It is generally recognised that rights assertions of today's XML-based policy languages do not fully benefit from the highly expressive metadata of Semantic Web style descriptions. As a contribution to the evolution process of these languages, in this paper we have outlined the concepts, background and state of the art related to existing languages (e.g. ODRL and XrML) and presented some case studies showing the benefit of integrating them with ontology-based, Semantic Web style metadata. A semantics-aware environment for DRM policy definitions based on these ideas is currently under development.
References
Adobe (n.d.), "A manager's introduction to Adobe eXtensible Metadata Platform", available at: www.adobe.com/products/xmp/pdfs/whitepaper.pdf
Arimoto, Y., Bjorner, D., Chen, X. and Xiang, J. (2006), "Alternative models of Gunter/Weeks/Wright's: models and languages for digital rights", JAIST/DEDR document.
Ayars, J. (2002), "The eXtensible Media Commerce Language (XMCL)", available at: www.xmcl.org/specification.html
Baader, F., Calvanese, D., McGuinness, D.L., Nardi, D. and Patel-Schneider, P.F. (Eds) (2003), The Description Logic Handbook: Theory, Implementation, and Applications, Cambridge University Press, Cambridge.
Boley, H., Dean, M., Grosof, B., Sintek, M., Spencer, B., Tabet, S. and Wagner, G. (2004), "FOL RuleML: the first-order logic Web language", available at: www.ruleml.org/fol/
Bormans, J. and Hill, K. (Eds) (2002), "MPEG-21 Overview v.5", available at: www.chiariglione.org/mpeg/standards/mpeg-21/mpeg-21.htm
Chong, C., Etalle, S. and Hartel, P.H. (2003a), "Comparing logic-based and XML-based rights expression languages", Lecture Notes in Computer Science, Vol. 2889, pp. 779-92.
Chong, C., Corin, R., Etalle, S., Hartel, P., Jonker, W. and Law, Y. (2003b), "LicenseScript: a novel digital rights language and its semantics", Proceedings of the 3rd International Conference on Web Delivery of Music (WEDELMUSIC), IEEE Computer Society Press, New York, NY, pp. 122-9, available at: http://citeseer.ist.psu.edu/chong03licensescript.html
Corporation for National Research Initiatives (n.d.), "The handle system", available at: www.handle.net/
Damiani, E., De Capitani di Vimercati, S., Fugazza, C. and Samarati, P. (2006a), "Extending context descriptions in semantics-aware access control", Second International Conference on Information Systems Security (ICISS 2006), December 17-21, Kolkata, India.
Damiani, E., De Capitani di Vimercati, S., Fugazza, C. and Samarati, P. (2006b), "Modality conflicts in semantics-aware access control", Sixth International Conference on Web Engineering (ICWE'06), July 11-14, Palo Alto, CA.
Delgado, J., Gallego, I. and García, R. (2004), "Use of semantic tools for a digital rights dictionary", EC-Web, pp. 338-47.
Delgado, J., Gallego, I., Llorente, S. and García, R. (2003), "Regulatory ontologies: an intellectual property rights approach", OTM Workshops, pp. 621-34.


DL Implementation Group (n.d.), available at: http://dl.kr.org/dig/
eXtensible rights Markup Language (XrML) 2.0 (n.d.), available at: www.xrml.org
Gunter, C.A., Weeks, S.T. and Wright, A.K. (2001), "Models and languages for digital rights", 34th Hawaii International Conference on System Sciences (HICSS'01), Maui, HI.
Haarslev, V. and Möller, R. (2001), "Description of the RACER system and its applications", Proceedings of the International Workshop on Description Logics (DL-2001), Stanford, CA, pp. 131-41.
(The) International DOI Foundation (n.d.), "The DOI system", available at: www.doi.org/
Llorente, S., Rodríguez, E. and Delgado, J. (2004), "Workflow description of digital rights management systems", OTM Workshops, pp. 581-92.
Mindswap (2004), "SWOOP – a hypermedia-based featherweight OWL ontology editor", available at: www.mindswap.org/2004/SWOOP/
Open Digital Rights Language (n.d.), "ODRL 1.1", available at: http://odrl.net/1.1/ODRL-11.pdf
Open Mobile Alliance (2003), "OMA digital rights management", available at: www.openmobilealliance.org/docs/DRM%20Short%20Paper%20DEC%202003%20.pdf
Parsia, B., Sirin, E., Grove, M. and Alford, R. (2003), "Pellet OWL reasoner", Maryland Information and Network Dynamics Lab, College Park, MD, available at: www.mindswap.org/2003/pellet/
Polo, J., Prados, J. and Delgado, J. (2004), "Interoperability between ODRL and MPEG-21 REL", ODRL Workshop, pp. 65-76.
Sans, T. and Cuppens, F. (2004), "Vers une formalisation des langages de DRM", Inforsid.
Tous, R., García, R., Rodríguez, E. and Delgado, J. (2005), "Architecture of a semantic XPath processor, application to digital rights management", EC-Web, pp. 1-10.
Yagüe, M.I., Maña, A., López, J., Pimentel, E. and Troya, J.M. (2003), "A secure solution for commercial digital libraries", Online Information Review, Vol. 27 No. 3, pp. 147-59.
W3C (n.d.), "Resource Description Framework (RDF)", available at: www.w3.org/RDF/
W3C (1999), "XSL Transformations (XSLT) version 1.0", available at: www.w3.org/TR/xslt
W3C (2004), "OWL web ontology language overview", available at: www.w3.org/TR/owl-features/

Corresponding author
Ernesto Damiani can be contacted at: [email protected]


DRM, law and technology: an American perspective


Bill Rosenblatt
GiantSteps Media Technology Strategies, New York, USA

Abstract
Purpose – The purpose of this paper is to provide a review of developments in the USA related to digital rights management (DRM) through legal, technological, and market developments in recent years.
Design/methodology/approach – This article summarizes recent developments in DRM in two areas. First is the legal landscape, including copyright law developments that apply to digital content and attempts to impose DRM technology through legislation and litigation. Second are recent advances in DRM-related technology and developments in digital content markets that are based on DRM. In both cases, USA developments are compared with the situation in Europe.
Findings – Developments in American copyright law, DRM technology, and digital content markets exert heavy influences on the spread of DRM in Europe, but the legal and technological frameworks are not the same, giving rise to incompatibilities.
Practical implications – DRM technologies need to evolve differently for European markets, because they need to exist in the context of fundamentally different copyright law frameworks (e.g., Private Copying) and incumbent technologies (e.g., Conditional Access television).
Originality/value – European readers of this paper should gain an understanding of American DRM technology and its legal context, and how they influence developments in Europe.

Article received 25 September 2006
Reviewed by EuDiRights Workshop Committee
Approved for publication 10 October 2006

Keywords United States of America, Copyright law, Legislation, Litigation
Paper type Viewpoint

The legal background
The legal context for DRM is copyright law. Some relevant aspects of USA copyright law have similarities with those of European Union (EU) countries by virtue of their common derivation from the WIPO Copyright Treaty of 1996 (WCT) (WIPO, 1996). However, there is an important difference between the two bodies of law, which leads to divergent ways of contextualising DRM within the legal framework. Most EU countries have private copying provisions in their copyright laws, which allow consumers to create copies of legitimately obtained content for their own use or that of family members.

Fair use and first sale
The USA has no private copying concept in its copyright law (with a narrow exception for audio works – see below). Instead, it has two relevant concepts: Fair Use (17 USC §107) and First Sale (17 USC §109) (US Congress, 2003). Fair Use is similar to Fair Dealing in UK copyright law. It is a set of principles that guide courts when deciding whether uses of copyrighted works are defensible against infringement charges. The principles include such considerations as the purpose and character of the use, including whether the use is of a commercial nature, and the effect of the use on the market for the work.


USA case law has established precedents for types of uses being considered presumptively fair, such as criticism, parody and academic research. However, because Fair Use is based on abstract principles (not facts) and decided by courts, it is impossible to conceive of a DRM scheme that "upholds Fair Use". This has been a source of contention between USA advocacy organisations – such as the Electronic Frontier Foundation and Public Knowledge – and the media industry. Contention over Fair Use also applies to another part of USA copyright law, a portion of the Digital Millennium Copyright Act (DMCA) (US Congress, 1998). Although the primary purpose of the DMCA – as with the European Union Copyright Directive of 2001 (EUCD) (European Parliament, 2001) – was to bring the USA into compliance with the WCT, it is mainly known for one of its provisions, DMCA 1201 (17 USC §1201). DMCA 1201 criminalises distribution of technology for circumvention (hacking) of DRM schemes – which are known as Technical Protection Measures (TPMs) in law. The anti-circumvention provision is also derived from the WCT, and there are now anti-circumvention laws in almost all EU countries through the EUCD as well. DMCA 1201 forbids circumvention of TPMs even if the purpose of the circumvention turns out to be one that a court finds to be Fair Use. It comes down to a question of whether content rights holders or consumers should get the benefit of the doubt surrounding content uses. The media industry feels that allowing exceptions to the anti-circumvention law (beyond the current narrow and temporary exceptions for such activities as encryption research and accessing content in obsolete data formats) undermines DRM by making those exceptions subject to court decisions and therefore, as a practical matter, gives the benefit of the doubt to consumers. First Sale, on the other hand, says that once a person has legitimately obtained a copyrighted work, the publisher of that work can have no further claim or influence on any further distribution of the work. First Sale law has thus enabled such services as public libraries, video rental stores and so on. Media industry interests argue that First Sale does not apply to digitally distributed works (as opposed to physically distributed digital works, such as CDs and DVDs) because they are made available under licence agreements (EULAs or "clickwrap" agreements) and not via copyright. Therefore, First Sale currently does not apply to content packaged with DRM.

Secondary infringement liability
The other primary principle in USA copyright law that bears on DRM is the theory of secondary infringement liability. If someone infringes copyright, and another party is somehow involved, the latter party could be legally liable; this is called secondary liability. Most countries have some form of secondary copyright infringement liability law. There is less uniformity among European secondary liability laws than there is for private copying. USA law has defined two types of secondary liability, known as contributory and vicarious infringement. Contributory infringement means knowingly aiding and abetting infringement, while vicarious infringement means being able to control infringing activities but choosing not to for one's own gain. A key legal principle that governs applicability of secondary liability to technology providers in the USA derives from the 1984 Supreme Court decision in Sony v Universal (US Supreme Court, 1984), known as the Betamax Case because it

established the legality of Sony's Betamax videocassette recorders (which, ironically, lost out to the VHS format in the market) over the film industry's objections. With Betamax the Supreme Court established the principle of "significant non-infringing uses", meaning that if a technology can be shown to have significant uses that do not infringe copyright, the maker or distributor of that technology should not be liable for infringement. Despite Betamax, a federal appeals court (one level below the Supreme Court) found that both contributory and vicarious liability applied to centralised peer-to-peer file-sharing networks (i.e. file-sharing services that maintain central directories of files available) in its 2001 decision in A&M Records v. Napster (US 9th Circuit Appeals Court, 2001). As a result, developers created file-sharing network software that did not rely on central directories, such as Grokster, Morpheus, BearShare, and LimeWire. It appeared that no theory of secondary liability applied to the developers of the client software for these networks. The foregoing should set the scene for recent developments on the legal front in the USA.

The future of fair use
Regarding Fair Use, there is a growing recognition of a fundamental incompatibility between Fair Use guidelines (and the fact that only a court can decide on them) and technological means of controlling access to copyrighted works. In his book Free Culture, Lawrence Lessig of Stanford University laments that, while Fair Use is meant to be a narrow "wedge" between infringement and non-infringement that applies to a small set of borderline content use cases and helps courts decide on them, it has been overburdened with the responsibility of determining the legality of many digital content use cases because they all happen to involve copying (of bits), thereby making them subject to copyright law (Lessig, 2004). Some of these use cases are analogous to content uses in the physical world that do not involve copying: for example, broadcasting music over a standard radio signal does not require copying, while streaming it over the Internet does. In general, because digital technology can be used to implement an almost infinite variety of content distribution models instantaneously, it becomes counterproductive to rely on Fair Use principles – let alone case precedents from the physical content world – to judge whether each and every case infringes: it would overload the court system, make it necessary to hire lawyers where ordinarily none would be necessary, and generally superimpose a physical-world timeline on a digital paradigm. Some legal scholars and advocacy groups argue that Fair Use should be kept intentionally principle-based – i.e. imprecise – because it is meant to handle exactly those cases which precise laws cannot handle. Yet in an era where technology underpins more and more content uses, this attitude seems increasingly outmoded. Someday, someone is going to have to do something about Fair Use – either scrap it in favour of more a priori decidable criteria or augment it with such. Without this, it becomes very difficult to enable technology to control access to technologically distributed content; there are too many fallbacks into the traditional legal system. (Of course, this is precisely what some of those legal scholars and advocacy groups intend.) The Private Copying concept in most EU law may be a tempting basis for modifying Fair Use so that it can work alongside DRM. (Discussions along these lines


are currently taking place in Australia, whose copyright law supports UK-derived Fair Dealing). In most cases, private copying is a right, although rights holders are still entitled to compensation for copies made. Yet private copying presents its own incompatibilities with DRM. First, copyright collecting societies in Europe already collect levies from electronic device makers for presumptive private copies (as well as presumptive unauthorised copies). Even in the USA, the Audio Home Recording Act of 1992 (USC 17 §10) provides for levies on blank digital audio recording media based on the presumption of private copies. Using DRM to track private copies and facilitate rights holder compensation for them amounts to "double dipping", a problem that the European Commission is currently attempting to address. More fundamentally, a DRM system that permits private copying must be able to determine whether people are making private copies for others who have rights to them, e.g. family members. This is not feasible without putting technology into place that establishes personal identities and enables people to claim (legitimate) family relationships among one another. This is technologically possible, if nontrivial, but could well raise privacy concerns – to say nothing of ambiguities over the definition of "family". With the latter issue in mind the State of California enacted a law in 2004 that requires anyone who digitally transmits copyrighted works (e.g., through e-mail) to more than 10 other people to include the identities of the sender and the work (California State Legislature, 2004). In other words, California has decided that private copying is probably acceptable for up to 10 "friends and family", beyond which it is probably not. Unfortunately, this law attracted very little attention. But it is the kind of law that seems inevitable in the future. Of course, today's DRM systems generally prohibit what may be legal private copying under EU law as well as Fair Use under USA law. Legal actions in Europe (particularly France) have focused on this issue, so far without much effect. Current activity in USA law is focused on coexistence of DRM with Fair Use. Legislation has been offered in Congress to amend DMCA 1201 to allow for circumvention of TPMs to facilitate Fair Uses of content: the most recent bill is the Digital Media Consumer Rights Act (US House of Representatives, 2005a), which (among other things) would create exceptions in DMCA 1201 to allow circumventions for non-infringing purposes. The bill has some chance of passage in the near future; its sponsors are bipartisan. The IT and telecommunications industries, as well as most of the aforementioned advocacy groups, back the DMCRA; the media industry opposes it. As for First Sale, there is a sense that some digitally distributed content products could someday fall under it if a judge decides that a particular licence has terms that are similar enough to copyright usage terms and that the product should be judged as if it were governed by copyright (this is known in America as the "If it looks like a duck, waddles like a duck, and quacks like a duck, then it must be a duck" principle), thereby setting a precedent. This has not happened yet, however.

Grokster and secondary liability
The media industry has repeatedly sought USA government action to bring decentralised P2P networks like Grokster and Morpheus under the regime of secondary infringement liability; European courts have largely decided against this.

The first attempt was to lobby Congress to pass a law making it illegal to "induce infringement of copyright". This was known as the INDUCE Act (US Senate, 2004). It failed: Senator Orrin Hatch, the bill's sponsor, refused to take the bill forward when the various sides in the debate could not agree on reasonable criteria for judging "inducement". Around the same time, a federal appeals court dealt the media industry a setback when it ruled that secondary liability did not apply to the decentralised P2P networks Grokster and Morpheus. The media industry responded to these setbacks by getting the Supreme Court to hear the Grokster Case. In June 2005, with its decision in MGM v Grokster (US Supreme Court, 2005), the Supreme Court unanimously did what the lower court and Congress would not do: establish an "inducement" principle in copyright law. Inducement has thus become a third theory of secondary infringement liability in the USA. "Inducement to infringe" is, in fact, a well-known principle in patent law (35 USC §271(b)). An implementer of technology that infringes a patent may not actually infringe himself; he may "induce" someone who uses the technology – such as the buyer of a technology-based product – to infringe the patent. The Supreme Court established a set of criteria that determine inducement to infringe copyright: the developer of the technology must actively market the technology for infringing purposes, and its business model must depend on infringement. Those who merely invent technology that could possibly be used for infringing purposes but do not meet those criteria are not liable. The court did not overturn Betamax, but the line between "substantial non-infringing uses" and "inducement" has yet to be explored in the courts. (A current case that may test these boundaries is a journalist's lawsuit against the video-sharing website YouTube.) The Supreme Court found that both Grokster and Streamcast (the firm that developed the Morpheus software) met the inducement criteria. It vacated the lower court's summary judgment in the case, which means not that the two firms were found guilty, but that the case is referred back to the lower court, which must now hold a trial and take the Supreme Court's decision (i.e. the inducement principle) into consideration. Soon after Grokster the music industry sent cease-and-desist letters to many P2P network software developers based in the USA, and most of them chose to shut down. One that did not, LimeWire, is implementing a hash-based filtering scheme to show that it "respects copyright", although the scheme it intends to use has been shown to be easily hackable. Grokster settled the case by selling its assets – essentially its list of subscriber information – to a service called Mashboxx (discussed below) and shutting itself down. Streamcast intends to fight the case, which could take years.

Mandating DRM
The media industry has also been lobbying Congress to pass legislation that makes DRM technology mandatory in digital media rendering hardware and software. A previous attempt, the so-called Hollings Bill of 2002, named after its sponsor, Senator Ernest Hollings (US Senate, 2002), failed over forceful opposition from the IT industry (led by Intel) and even some media companies with their own interests in IT.
The negotiations over this bill revealed a schism in the media industry between companies with hard-line attitudes towards DRM, mainly Disney and News Corp., and those with more liberal attitudes, such as Time Warner and NBC Universal.


The latest attempts to impose DRM-type technology on the IT and consumer electronics industries are the Broadcast Flag and the so-called Analog Hole Bill. A lineup of "usual suspects" has formed around these bills as well as past ones mentioned above (e.g., the INDUCE Act): the media industry is in favour, while IT, telecoms, and consumer advocacy groups are against. Broadcast Flag (FCC, 2003) would require digital television receivers to detect a simple "flag" (bit of data) that would act as a signal that the content is not to be copied. This was established in late 2003, not as a law but as a regulation through the Federal Communications Commission (FCC), the body that regulates radio, television, telecoms, etc. The FCC chose to adopt the regulation, but a federal appeals court found that it was overstepping its authority in doing so (DC Circuit Appeals Court, 2005). Now Congress is considering legislation that would explicitly empower the FCC to adopt Broadcast Flag. This legislation is considered unlikely to pass this year (2006), mainly because it is a small provision tacked onto a major telecommunications reform bill in which much larger-scale differences have yet to be reconciled between the two houses of Congress. But it could be reintroduced next year, along with related legislation that would extend the Broadcast Flag concept to digital radio.
The so-called Analog Hole Bill, formally known as the Digital Transition Content Security Act of 2005 (US House of Representatives, 2005b), is meant to address illegal analogue copying of video content, such as through analogue outputs of video players. The bill would require certain types of video playback equipment to include digital video watermarking technology – specifically that of a startup company called VEIL Interactive – that can forensically catch pirated content once it has been distributed, even if it was converted to high-quality analogue. The Analog Hole Bill is problematic because it effectively enshrines a specific technology firm's products into law, even though the technology has established competition and has not even really been used in the situations that the bill covers. Additionally, the bill purports to solve a problem that is shorter-term and narrower than that envisioned in the Hollings Bill. For these and other reasons, the Analog Hole Bill is also deemed unlikely to pass.
All of the foregoing contrasts starkly with Europe, where the only legislation containing DRM-related technology mandates has been oriented toward opening up DRM rather than requiring it: the DADVSI Bill that the French Parliament recently passed (Assemblée Nationale, 2006). Part of this controversial slate of copyright law reforms was intended by consumer advocates like UFC-Que Choisir to impose DRM interoperability requirements on online content services and devices, ostensibly to dismantle Apple's alleged monopoly of online music with iTunes and iPods. The Sénat (the upper house) passed an eviscerated version of the original Assemblée Nationale (lower house) bill, and eventually compromise legislation was enacted. The new law creates a regulatory body to adjudicate requests for DRM interoperability but ultimately, in effect, lets content owners dictate to what extent online services must provide interoperability. In other words the status quo was largely upheld. Similar legislation is now being considered in such countries as Denmark and Sweden.

New DRM technologies and content markets
DRM-related technologies are being adopted at different paces in the USA compared to Europe. Some technologies are moving faster in the USA, while others are behind. Here we compare four types of DRM-based technologies and services that are of interest in both the USA and Europe: mobile content services, copyright-respecting P2P networks on the Internet, home entertainment networking and IPTV.

Mobile content services
The USA is behind Europe in DRM for mobile content. Full-fidelity over-the-air wireless music services did not launch in the USA until late 2005, whereas they had been operating somewhere in the EU for over a year prior to that. Europe has been the primary venue for a standards initiative that has had the most rapid market success of any in DRM: that of the Open Mobile Alliance (OMA). There are two major versions of OMA DRM:
(1) Version 1.0 (www.openmobilealliance.org/tech/wg_committees/bac.html), which is intended for cheaper devices and simple content such as ring tones and screen savers; and
(2) Version 2.0, a much richer standard intended for full-fidelity audio and video and far more sophisticated business models than Version 1.0.
OMA DRM 1.0 has enjoyed rapid success since its original launch in 2003. Wireless content services that use OMA DRM 1.0 can now be found through wireless carriers in most European countries. A few small technology companies have arisen to provide technology based on OMA DRM, including Beep Science of Norway, CoreMedia of Germany and DMDSecure of the Netherlands (now part of USA-based SafeNet). OMA DRM 2.0 was released in the summer of 2005, but it has achieved far less momentum in the market. There are several reasons for this, one of which is concern over the licensing of patents essential to OMA DRM implementations: uncertainty over royalties (or even possible litigation) attends deployments of this supposedly open standard. Patent holders, acting through the licensing agency MPEG LA, are asking OMA DRM implementers for royalties that are higher than the overall software royalties of solutions from other vendors – including Swiss-based SDC as well as Microsoft – which have lower apparent risks of IP overhang. If OMA DRM overcomes the current tiff over patent licensing, it may be destined to be the DRM analogue to GSM in wireless telephony: it may become prevalent everywhere in the world except in the USA. (There are already OMA DRM implementations in Africa, Latin America and Asia as well as Europe.) Instead, the two major DRM-enabled mobile music services in the USA market, from carriers Verizon Wireless and Sprint PCS, include DRM technology from Microsoft and the small firm Groove Mobile, respectively. Microsoft is expected to introduce its own wireless media playing device to the USA market by the end of 2006, and Apple is expected to release an "iPod phone" to succeed the ill-fated ROKR handset that it previously developed with Motorola. It is likely that those two firms' DRM platforms will dominate the mobile content market in the USA.


Copyright-respecting P2P networks
Another music technology that is on the verge of takeoff in the USA market is the copyright-respecting P2P network. In the wake of the Supreme Court's Grokster decision mentioned above, P2P network providers have been looking for ways to offer file-trading services that respect copyright while preserving some of the advantages of P2P architectures. Two approaches have emerged. One enables users to trade files, but those files are protected with DRM. The most prominent example of this approach is Wurld Media's Peer Impact; another is Shared Music Licensing's Weed. Peer Impact has licences to music from all four "majors" (EMI, SonyBMG, Universal, and Warner Music), video content from several film studios and television networks, and various computer games. It uses Microsoft Windows Media DRM for music and video, and Macrovision's TryMedia DRM for games. Weed features only independent-label music. Peer Impact incorporates both notions of "peer-to-peer": that of sharing files and that of making transfer of large files more efficient by using users' machines on the network as download points, and by splitting large files into pieces that each go through a different node on the network to reach their ultimate destination. (The latter is known generically as "swarming" and is similar to BitTorrent.) Users can contribute their PCs and Internet bandwidth, and create "home pages" and email messages that highlight their favourite content items and contain links to shareable files. In return for this users receive credits that are exchangeable for more content. The other approach to copyright-respecting P2P – and the one gaining more notoriety – is one that uses a technology called acoustic fingerprinting. Acoustic fingerprinting can purportedly analyse the bits of a music track and identify it (title, artist, etc.). The idea is to let users trade whatever files they wish on a P2P network but also to analyse those files to "take their fingerprints". For each file the network software looks up the fingerprint in a database and determines what the track's copyright owner wishes to do about controlling access or charging the user. The software could substitute a DRM-protected track, offer a low-quality version of the track for free, make the track freely available (e.g., because the record label wants to get the artist maximum exposure rather than maximum revenue), or do something else. The advantage of acoustic fingerprinting over the Peer Impact-style "walled garden" technique is that it can be used in a network that allows the trading of any files at all; those that the fingerprinting technology does not detect are assumed to be freely shareable. In contrast, services like Peer Impact can only offer tracks that record companies have previously cleared. The acoustic fingerprinting approach started in the UK, on the Wippit network, which used technology from Gracenote. But Wippit was never able to obtain licences to use major-label content with acoustic fingerprinting. It has abandoned its original P2P architecture and now offers downloads of unprotected MP3s as well as major-label content in Windows Media DRM. As a result, the locus for acoustic fingerprinting in copyright-respecting P2P has moved to the USA. Two services, Mashboxx and iMesh, use acoustic fingerprinting with licensed content from major labels. (iMesh is based in Israel but intends its service for the USA market.) Both services have roots in the original P2P networks.
Mashboxx’s chairman is Wayne Rosso, who ran Grokster when it was first sued by the music industry.
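A minimal sketch in Python of the fingerprint lookup-and-decide workflow described above; every name here (the fingerprint routine, the rights database and the policy values) is hypothetical and does not correspond to the actual APIs of Snocap, Gracenote or Audible Magic. A real acoustic fingerprint is a perceptual signature robust to re-encoding; a cryptographic hash stands in for it here only to keep the example self-contained.

    import hashlib
    from enum import Enum

    class Policy(Enum):
        SUBSTITUTE_DRM_COPY = 1   # swap in a DRM-protected version of the track
        OFFER_LOW_QUALITY = 2     # give away a degraded preview for free
        ALLOW_FREE_SHARING = 3    # rights holder wants maximum exposure
        BLOCK = 4                 # no licence terms on file; refuse the transfer

    def acoustic_fingerprint(audio_bytes: bytes) -> str:
        """Hypothetical stand-in for a real fingerprinting engine: derive a
        compact identifier from the audio content itself, not from metadata."""
        return hashlib.sha1(audio_bytes).hexdigest()  # placeholder, not perceptual

    def decide(audio_bytes: bytes, rights_db: dict) -> Policy:
        """Look the fingerprint up and return the copyright owner's wishes.
        Unrecognised tracks are assumed freely shareable, as described in the text."""
        fp = acoustic_fingerprint(audio_bytes)
        return rights_db.get(fp, Policy.ALLOW_FREE_SHARING)

The design point the text makes is visible in the last line: the default for an unknown fingerprint is free sharing, which is what distinguishes this approach from a "walled garden" service that can only offer pre-cleared tracks.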

Mashboxx uses a music distribution platform called Snocap from a company of that name chaired by Shawn Fanning, developer of the original Napster. Snocap, in turn, incorporates acoustic fingerprinting technology from the Content Identification division of Philips Electronics. iMesh began as a non-copyright-respecting P2P network; it has added acoustic fingerprinting technology from Audible Magic and obtained licences to music from all four "majors". Both Mashboxx and iMesh are in beta at the time of this writing, as they have been for several months. There are still unresolved questions as to whether either technology can truly stand up to large-scale production use, be accurate enough in identifying music (particularly music that is not from major releases), and be sufficiently hack-proof.

Home entertainment networks
The third technology category of interest is home entertainment networks. The notion of the "home entertainment network" is that consumers can obtain content legitimately and then use it anywhere in their homes (or in their cars or on their personal portable devices), while at the same time rights holders can remain confident that all of the devices and the links among them will remain impervious to exploitation for piracy. Major content providers are becoming comfortable with this idea. Unfortunately, consumers are not – at least not yet. In Europe the conditional access cable television (CAS) market offers a good transition point to home entertainment networks, and most of the major CAS vendors in Europe have strategies to move their customer bases from single-set-top-box CAS to home entertainment networking with DRM. Standards initiatives and consortia are driving this development in Europe. The consortium with the most clout behind it is the Secure Video Processor (SVP) Alliance (www.svpalliance.org/). SVP was originally founded by two leading makers of set-top boxes (STBs), NDS and Thomson, along with STMicroelectronics, a maker of semiconductors for STBs. SVP now numbers about 30 members, which span cable and satellite operators, content providers and consumer electronics makers in addition to CAS equipment and semiconductor providers. SVP-compliant chips are actually being shipped from semiconductor providers, and Samsung has released an SVP-compliant STB with digital video recording (DVR) functionality.


"home theatre receiver", which is essentially an analogue component that only accepts direct outside content in the form of traditional radio broadcasts. In general Americans are more used to standard broadcast and cable models for obtaining video programming, even if their cable systems are digital. There is no critical mass with consumers around any single concept of home networking – even though there are plenty of DRM-related standards and other technology building blocks, such as HDCP (www.digital-cp.com/home), DTCP (www.dtcp.com/) and CPRM (www.4centity.com/tech/cprm/), that can be combined into solutions. Instead, at least five candidates for home entertainment networking ecosystems in the USA are emerging:
(1) Microsoft, with the help of partners such as HP and Creative Labs, has been positioning its Windows Media Center as the home media network control centre, along with technologies such as Microsoft Media Transfer Protocol and Windows Media DRM for Network Devices that facilitate secure media networking.
(2) TiVo has a technology called TiVoToGo, which enables transfer of video programming recorded on TiVo devices to a variety of portable devices, including Apple Video iPods, Sony PlayStation Portables (PSPs) and Microsoft Windows Media compatible players such as the Creative Zen.
(3) Various standards initiatives are working on DRM interoperability for home networks. The most prominent of these is the Coral Consortium (www.coral-interop.org/), whose membership includes a large number of consumer electronics, IT, and media firms. Coral is best understood as advancing the interests of top-tier consumer electronics makers such as Sony, Philips, Samsung, and Matsushita (Panasonic).
(4) Third-party vendors such as Mediabolic offer interoperability among different types of devices in the digital home, such as networked DVD players, streaming music devices and PCs. These technologies purport to include interoperability among those devices' DRM schemes. Electronics vendors such as BenQ, Creative, Denon, Fujitsu, HP and Maxtor have integrated Mediabolic's technology into products, though not all of them include DRM interoperability.
(5) Finally, Apple has its own designs on the digital home, but these remain secret, as the company does not typically publicise its product roadmaps.
None of these paradigms has achieved much momentum, but even so five is far too many; successful markets have no more than two or three different technology platforms. The home entertainment network market in the USA is currently too disjointed and confusing. A larger issue is that most consumers do not perceive much benefit to home entertainment networks in their current state, which is essentially complex and expensive technology that replaces cheaper, well-understood products like home video receivers and hardwire connections (analogue cables). It is difficult, for example, to convince a consumer to replace a US$20 cable TV splitter and some coaxial cable to a TV in another room with a PC or set-top box, an Ethernet or WiFi network, and an adapter for the TV that currently sells for US$150. It is also harder to convince consumers to buy their own video recorders, when increasing numbers of cable TV

providers are offering their own server-based PVR (personal video recorder) functionality for a low monthly subscription fee. The hump that electronics vendors must get over is best exemplified by a recent advertisement that HP ran in newspapers, which was essentially an instruction manual on how to buy and use Microsoft-based HP products for home entertainment networking scenarios. It may take a few more years for the consumer value propositions to become clear and the price points to become attractive. Many consumer electronics vendors are clearly depending on home entertainment networking to be the next source of high-margin revenue, now that growth in the USA market for large-screen televisions is slowing down. It is in their favour that media companies are becoming receptive to home entertainment networking and that the DRM technologies appear to be ready when the time comes.

IPTV
An increasingly important entry point into the home entertainment networking market is digital television broadcasting – i.e. a video signal coming into the home that can be readily repurposed onto various devices once appropriate DRM is in place. There have been various attempts over the years to build digital TV services based on proprietary technologies or standalone standards, such as DVB-T in Europe. IPTV (Internet Protocol Television) is emerging as the most promising digital television standard yet, and it is taking off worldwide. A number of broadband providers – mostly small regional telecommunications providers – are now offering IPTV video-on-demand services in conjunction with voice (VoIP) and standard ISP services, collectively called "triple play". There is a need to protect copyrighted material from piracy from the head ends of these services all the way through to the receiver (STB or suitably configured PC). The market for IPTV content protection is coalescing around four vendors: Widevine, SecureMedia, Verimatrix, and Latens. The last of these is based in the UK, while the others are American; the last two appear to be garnering the largest numbers of installations. Scandinavia is the area with the fastest growth in the IPTV market, followed by the USA and other parts of Europe (including Bulgaria, France, Poland and Switzerland). Little attention has been paid thus far to the interoperability of IPTV content once it is received in a home network. Presumably an STB or other device could use one of the aforementioned home networking DRM technologies to move content around a home network or store it in secure fashion for later use. In the meantime consumer electronics makers are focusing interoperability in the home network on content acquired through traditional analogue broadcasting or physical media. The introduction this year of two new optical disc formats, Blu-ray and HD DVD – both with DRM based on the Advanced Access Content System (AACS) standard (www.aacsla.com/home) – introduces even more complexity into home networking scenarios. The media industry's requirements for DRM in home entertainment networks will thus keep consumer electronics vendors busy for the foreseeable future, even as products and consumer value propositions change.


References
Assemblée Nationale (2006), "Loi sur le droit d'auteur et les droits voisins dans la société de l'information", available at: www.assemblee-nationale.fr/12/ta-pdf/ta0596.pdf
California State Legislature (2004), SB 1506, available at: www.leginfo.ca.gov/pub/03-04/bill/sen/sb_1501-1550/sb_1506_bill_20040921_chaptered.html
DC Circuit Appeals Court (2005), American Library Association et al. v. Federal Communications Commission and United States of America, 04-1037 (DC Cir. 2005), available at: http://pacer.cadc.uscourts.gov/docs/common/opinions/200505/04-1037b.pdf
European Parliament (2001), Directive 2001/29/EC of the European Parliament and of the Council, available at: http://europa.eu.int/eur-lex/pri/en/oj/dat/2001/l_167/l_16720010622en00100019.pdf
FCC (2003), FCC 03-273, available at: http://hraunfoss.fcc.gov/edocs_public/attachmatch/FCC-03-273A1.pdf?date=031104
Lessig, L. (2004), Free Culture, Penguin Books, New York, NY.
US Congress (1998), Digital Millennium Copyright Act, available at: www.copyright.gov/legislation/dmca.pdf
US Congress (2003), Copyright Law of the United States of America, available at: www.copyright.gov/title17/circ92.pdf
US House of Representatives (2005a), HR 1201, Digital Media Consumer Rights Act of 2005, available at: http://frwebgate.access.gpo.gov/cgi-bin/getdoc.cgi?dbname=109_cong_bills&docid=f:h1201ih.txt.pdf
US House of Representatives (2005b), HR 4569, Digital Transition Content Security Act of 2005, available at: http://thomas.loc.gov/cgi-bin/query/z?c109:H.R.4569
US 9th Circuit Appeals Court (2001), A&M Records v. Napster, Inc., 239 F. 3d 1004, 1020 (9th Cir. 2001), available at: http://cyber.law.harvard.edu/~wseltzer/napster.html
US Senate (2002), S. 2048, Consumer Broadband and Digital Television Promotion Act of 2002, available at: http://thomas.loc.gov/cgi-bin/bdquery/z?d107:s.02048
US Senate (2004), S. 2560, Inducing Infringement of Copyrights Act of 2004, available at: http://thomas.loc.gov/cgi-bin/query/z?c108:S.2560
US Supreme Court (1984), Sony Corporation of America v. Universal City Studios, Inc., 464 U.S. 417, available at: http://caselaw.lp.findlaw.com/scripts/getcase.pl?court=US&vol=464&invol=417
US Supreme Court (2005), Metro-Goldwyn-Mayer Studios Inc et al. v. Grokster Ltd et al., 545 US 125, available at: www.supremecourtus.gov/opinions/04pdf/04-480.pdf
WIPO (1996), WIPO Copyright Treaty, available at: www.wipo.int/treaties/en/ip/wct/trtdocs_wo033.html

Corresponding author
Bill Rosenblatt can be contacted at: [email protected]


SAVVY SEARCHING

Clustering search results. Part I: web-wide search engines
Péter Jacsó


University of Hawaii, Hawaii, USA

Abstract
Purpose – The purpose of this paper is to examine clustering of search results. Traditionally, professional online information services presented search results in reverse chronological order. Later, relevance ranking was introduced for ordering the display of the hits on the result list to separate the wheat from the chaff.
Design/methodology/approach – The need for better presentation of search results retrieved from millions, then billions, of highly unstructured and untagged Web pages became obvious. Clustering became a popular software tool to enhance relevance ranking by grouping items in the typically very large result list. The clusters of items with common semantic and/or other characteristics can guide users in refining their original queries, zooming in on smaller clusters and drilling down through sub-groups within a cluster.
Findings – Despite its proven efficiency, clustering is not available, except for Ask, in the primary Web-wide search engines (Windows Live, Yahoo and Google).
Originality/value – Smaller, secondary Web-wide search engines (WiseNut, Gigablast, and especially Exalead) offer good clustering options.
Keywords Search engines, Cluster analysis, Worldwide web
Paper type General review

Introduction
Although traditionally search results from professional online information services were presented in reverse chronological order, this could be changed to sort the results by author, journal name, article title and some other data elements. The scope and choice of data elements for sorting depend on the host system. Dialog, for example, has offered many sort options but no sorting by the article title. The sort options also depend on the type of database. In a business directory the results typically can be sorted by postal code, NAICS code, value of assets and liabilities, number of employees, etc. Later, relevance ranking was introduced for ordering the display sequence of the hits on the result list, which is supposed to separate the wheat from the chaff. The relevance ranking algorithm is intended to determine which document or document surrogate best matches the subject as presented by the user's query. The different search programs use very different ranking algorithms. The details of the algorithms are not revealed, but the widely different rank positions of the same items in the same set retrieved from the same datafile on the various hosts clearly show that relevance is in the eye of the search software (Jacsó, 2005). An item which is top-listed for the same query in the result list generated from the implementation of the same datafile on System A may have a much lower rank order in System B, and thus may not even be seen by the average user, who looks only at the first page of the result list, which usually shows 10 items by default.


Figure 1. Clustering of search results in the revived Northern Light service

The evolution of the World Wide Web, of course, changed the scene dramatically, and the need for better presentation of results retrieved from millions, then billions, of highly unstructured and untagged Web pages became obvious. David Seuss, the developer of the Northern Light system, recognized this a decade ago and developed an excellent, on-the-fly topical clustering system. The service was later acquired by a company flush with cash but extremely poor in competence, and the Northern Light service was driven into the ground in a short time. Justice was done, however, as the company went bankrupt and Seuss could buy back Northern Light at about 0.5-0.7 cents on the dollar at which it was sold (Hane, 2003). Interestingly, just before writing this column, the news broke that Northern Light was to resume some of its original services, including a partially free database – with the folder clustering feature intact. The test search in the revived service illustrates the type of clusters created in a few seconds simultaneously with the relevance-ranked result list of 1,838 items. It would be quite time consuming to scroll through the list to find the pertinent items. However, the folder labeled Relevancy Ranking immediately guided this searcher to the most promising cluster of the result (Figure 1). The cluster had 14 items, all of them relevant and pertinent. It would have taken quite some time to get the same result set either by refining the query or scrolling down the list and picking the pertinent items one by one. Northern Light applies its patented clustering algorithm equally well to its relatively small collection of 100 million Web pages crawled from the Web and to its journal and newspaper article collection.
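As a rough, generic illustration of what on-the-fly topical clustering does (this is not Northern Light's patented algorithm), result titles and snippets can be vectorised and grouped, with the heaviest-weighted terms of each group serving as folder labels. The sketch below uses the scikit-learn Python library; the function name and parameters are arbitrary choices made for the example.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans

    def cluster_results(snippets, k=5, label_terms=3):
        """Group search-result snippets into k topical folders and label each
        folder with its highest-weighted terms (a generic sketch only)."""
        vec = TfidfVectorizer(stop_words="english")
        X = vec.fit_transform(snippets)          # one TF-IDF vector per snippet
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        terms = vec.get_feature_names_out()
        folders = {}
        for c in range(k):
            top = km.cluster_centers_[c].argsort()[::-1][:label_terms]
            label = ", ".join(terms[i] for i in top)
            folders[label] = [s for s, lab in zip(snippets, km.labels_) if lab == c]
        return folders

Production systems label and nest clusters far more carefully than this, but the essential step is the same: the result list is partitioned by content, and the labels let the searcher zoom in on a small, coherent subset instead of scrolling through hundreds of hits.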

Primary Web-wide search engines
The clustering approach would be even more important for the largest databases, the gigantic data silos of Google, Yahoo, MSN, and Ask. Still, only the last service (which acquired and incorporated the Teoma search service) has an appropriate clustering system to guide users to the subset(s) most pertinent to them. This feature is based on the clustering module of Vivisimo, one of the most popular clustering software packages. It is licensed by many Web services, including FirstGov, and the Central Search metasearch engine. The latter was enhanced by integrating it with Vivisimo's clustering module, forming an impressive synergy of search features. Such sophisticated systems will be discussed in Part 3 of this series. Google's once-novel PageRank relevance ranking algorithm, based on the number of links received from other Web sites and on those sites' own PageRanks, has become less appropriate, primarily because of abuse by link factories and by blogs which generate zillions of fluff pages with links. The sheer size of the source base for the Web-wide crawlers is so huge that even highly relevant pages and Web sites may be crowded out from the top list, and lost to the users, unless they have hints about the composition of the result set, and appropriate tools to zero in on the subset most pertinent to them. The title ("You still Google? That is so last week") of Mary Ellen Bates' column (Bates, 2005) rings even more true now than a year ago, and so does her warning: "What I find distressing about this [the fact that at least 80 percent of her workshop audiences still start their search with Google] is that most people still consider Google to be the gold standard of search engines."
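For readers unfamiliar with the link-counting idea, PageRank can be sketched as a simple iterative calculation in which each page shares its current score among the pages it links to. The code below is a textbook simplification with an assumed damping factor of 0.85, not Google's production ranking algorithm.

    def pagerank(links, damping=0.85, iterations=50):
        """links: dict mapping each page to the list of pages it links to.
        Returns an approximate PageRank score for every page."""
        pages = list(links)
        n = len(pages)
        rank = {p: 1.0 / n for p in pages}
        for _ in range(iterations):
            new = {p: (1.0 - damping) / n for p in pages}
            for p, outs in links.items():
                if outs:
                    share = damping * rank[p] / len(outs)
                    for q in outs:
                        if q in new:
                            new[q] += share
                else:  # dangling page: spread its rank evenly
                    for q in pages:
                        new[q] += damping * rank[p] / n
            rank = new
        return rank

The weakness the column describes is visible here: any party that can manufacture large numbers of interlinked pages injects extra rank into the system, which is exactly what link factories and fluff blogs exploit.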


Figure 2. Google’s one-dimensional result list


Figure 3. Good clustering through a special toolbar for Windows Live

Figure 4. Informatively labeled clusters in Ask.com result list

A look at the result of the same search in Google (Figure 2), even for a longer-than-usual, three-word query, illustrates the limitations of the one-dimensional ordered listing. Although Google always bluffs in reporting absurdly high numbers of hits (and so do most of its peers, although to a lesser extent), even the actual number of hits for the test query (which plummeted to 759 when scrolling through the list in batches of 100 items) is too much to wade through.


Figure 5. Drill-down process and unusual cluster size reporting in GigaBlast

Yahoo is not any better, nor is Microsoft's revamped MSN Search (now Windows Live); but at least Microsoft's Asian Research and Development arm has developed a very good Search Result Cluster (SRC) toolbar which can guide users in refining their search efficiently (Zeng et al., 2004). It has the important added feature that the size of the clusters (in terms of the items in the group) is clearly indicated. The names of the cluster labels may look somewhat odd, but they are automatically generated by the software with the purpose of being distinctive and descriptive. This clustering module should have been incorporated in the new Windows Live search engine instead of the deeply disappointing Windows Live Academic service (Jacsó, 2006) (see Figure 3). While you can hardly tell apart the one-dimensional result list layouts of Google, Yahoo and MSN, the result list from Ask is distinctive (Figure 4). It really guides and motivates users to refine their searches just by clicking on the label of the most promising cluster after a broad first query, then to drill down through one or more of the sub-clusters if needed to narrow the search further.

Secondary Web-wide search engines
Interestingly, the search engines in the second tier have been much more creative than Yahoo, Google and MSN in making the search experience more efficient. GigaBlast (Figure 5) reports an excessive 3,823,216 hits for the initial test query. However,


Figure 6. Unnecessarily crowded but still useful cluster list

its clustering module gives some hints about the topical composition of the set. It allows the user to start drilling down in one of the smaller subsets, such as the Web Search Result cluster, then to choose a cluster within that partition for further refinement. For example, choosing the cluster labeled "search results with document clustering" would yield a good set of 19 items. Indicating the size of the clusters as a percentage of the whole set, instead of an absolute number, is an interesting solution, especially as the reported hit numbers are hardly believable in the initial search. This notation also makes it clearer that parts of the clusters overlap, just as items in a traditional bibliographic search retrieved by using descriptor A will overlap to some extent with the set retrieved by using descriptor B. WiseNut has the smallest database of the Web-wide search engines (and for that reason only the single-word query, "clustering", was used for illustration). Still, its users certainly benefit from the cluster list presented above the result list, with item numbers shown within the clusters. The cluster list is difficult to read, because after each cluster label there is an absolutely unnecessary extra [search this] link, which makes the list annoyingly crowded (Figure 6). A single line above the cluster list could provide sufficient instruction even for those who cannot figure out that clicking on the blue underlined cluster names would display the 3-5 items in the cluster. All the above-mentioned systems do only topical clustering. Exalead offers that and additional clustering criteria, which provide more guidance to users in selecting the clusters, such as the type, format, language and geographical origin of the documents. (The last is not shown in Figure 7.) Among the Web-wide search engines Exalead provides the best combination of clustering options. In addition, the software makes it very swift to undo the restrictions, experiment with other limiters, and try "what-if" scenarios. It could only be better if the size of the clusters were indicated, so that one would not bother excluding French-language documents when, as in this example, doing so would reduce the set by only one. There is


Figure 7. Multiple clusters by subject, media, file type and genre in Exalead

no date clustering because the dates extracted from Web pages may not have anything to do with publication dates (or, for Web-born documents, posting dates). The search engines which are used with databases that have well-defined data structures, consistently applied descriptors, subject headings, classification codes, author names, journal names and many other data elements can display more versatile clusters, as demonstrated by the innovative and superb new online public access catalogue of North Carolina State University (NCSU). Clustering of search results retrieved from highly structured and richly tagged databases, including the aforementioned OPAC of NCSU, will be the topic of Part 2 of this series.

References
Bates, M.E. (2005), "You still Google? That is so last week", EContent, Vol. 28 No. 9, p. 27.
Hane, P. (2003), "Seuss hopes Northern Light will rise and shine", Information Today, available at: www.infotoday.com/newsbreaks/nb030602-1.shtml
Jacsó, P. (2005), "Relevance in the eye of the search software", Online Information Review, Vol. 29 No. 6, pp. 676-682, available at: http://dx.doi.org/10.1108/14684520510638106 and an enhanced version at: www2.hawaii.edu/~jacso/savvy/relevance/
Jacsó, P. (2006), "Windows Live Academic", Peter's Digital Reference Shelf, available at: http://projects.ics.hawaii.edu/~jacso/gale/windows-live-acad/windows-live-acad.htm
Zeng, H.J., He, Q-C., Chen, Z., Ma, W. and Ma, J. (2004), "Learning to cluster Web search results", Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 210-217, available at: http://doi.acm.org/10.1145/1008992.1009030

Corresponding author
Péter Jacsó can be contacted at: [email protected]


Book reviews

Digital Libraries: Integrating Content and Systems
Mark Dahl, Kyle Banerjee and Michael Spalti
Chandos Publishing, Oxford, 2006
ISBN 1843341557, 203 pp., 39.95 soft cover
Keywords Digital libraries, Library systems
Review DOI 10.1108/14684520710731065

This superior book, which expertly explains the making of digital libraries, consists of ten chapters: Introduction; Enabling Technologies; The Role of Standards in Digital Library Integration; Authentication, Identity Management and Security; Interfacing with Integrated Library Systems; Electronic Resource Management; Digital Asset Management; Integration with Content Providers; Library Portals and Conclusion: Digital Libraries and the Library Organisation. There is also a helpful glossary of acronyms used in the book, a bibliography and an index. The table of contents is very detailed, including not only chapter titles but also section headings and summaries of content. The authors compare various components of digital systems and suggest the most appropriate components for successfully and efficiently managing a digital library system. The authors also indicate why some digital resources are more effective than others. Following the introduction, Chapters 2 and 3 provide, by way of background, information on Web-based services and the requirement for robust standards for digital library systems. Chapters 4-7 deal primarily with management issues related to digital systems: issues of security, the content of integrated library systems, electronic resources management and the importance of digital asset management. Chapter 8 covers issues related to adding information to digital systems, including by subscription and by purchase. In this chapter the authors also offer criteria for selecting digital library system devices. Finally, Chapters 9 and 10 look at library portals and speculate on the future of digital libraries: individual purchase of digital libraries, digital libraries on iPods, etc. Clearly in 203 pages the authors cannot cover the topic of their book in great detail, and the work should be seen as a basic introduction to specific aspects of digital libraries rather than a comprehensive monograph on the topic. Nevertheless, Digital Libraries: Integrating Content and Systems is recommended as a useful introduction to the workings of digital libraries.
Melinda F. Matthews
University of Louisiana at Monroe

Self, Peer and Group Assessment in E-Learning
Tim S. Roberts
Idea Group Publishing, Hershey, PA, 2006
ISBN 159140966, 333 pp., US$69.95, soft cover
Keywords: E-learning, Assessment
Review DOI 10.1108/14684520710731092

The greater use of online technologies provides the ideal environment for self-, peer and group assessment, as well as opportunities to increase group and collaborative work where students help each other. It does, however, raise questions such as:
• How can assessment practices be used to improve learning processes?
• What support is there for more student-centred methods of teaching and learning?
Under the very capable editorship of Tim Roberts, a group of international experts, including practitioners as well as researchers, explores the intricacies of non-instructor-based assessments in the context of e-learning, for example self-, peer and group assessment. Self, Peer and Group Assessment in E-Learning consists of a preface, 13 chapters, information about the authors and an index. Chapter 1 offers an introduction, an explanation of the meaning of key terms, as well as an overview of commonly experienced advantages and disadvantages. Chapter 2 offers a case study in peer evaluation for second language teachers in training. The author reports that greater honesty has been noted when using quizzes and surveys submitted online than with paper-based ones submitted directly to the instructor. Chapter 3 explores self-assessment in Portuguese engineering education. Chapter 4 reports a case study on a course about the self-, peer and group assessment of adult educators. Here online discussion forums play an essential role. Chapter 5 offers a case study of assessment in a core first-year educational psychology unit through flexible delivery implementation. In Chapter 6 the value of learning circles and peer review in graduate education is explored, while Chapter 7 deals with learning English for technical purposes. Further chapters deal with self- and peer assessment in a problem-based learning environment, designs for Web-assisted peer and group assessment, using online peer feedback to improve performance in a composition course, interpersonal assessment, and evaluating others in online environments. The two final chapters deal with a framework for assessing self-, peer and group performance in e-learning, and the demise of exams. The chapters are mostly well referenced and can direct readers to further sources of information. The intention with Self, Peer and Group Assessment in E-Learning is to stimulate thought on the topic of peer, self- and group assessment, as well as changes in assessment practices. Tim Roberts and the contributors have succeeded very well in achieving this aim.


The book is therefore highly recommended for all libraries serving academic institutions, as well as all LIS professionals involved in information literacy programmes. In a professional environment stressing collaborative work, we can certainly incorporate peer and group work in our assessment practices.

Ina Fourie
University of Pretoria

Handbook of Research on ePortfolios
A. Jafari and C. Kaufman
Idea Group Publishing, Hershey, PA, 2006
ISBN 1591408903, 595 pp., US$195.00, hard cover
Keyword: E-learning
Review DOI 10.1108/14684520710731083

Although printed and electronic portfolios have drawn wide interest over the last decade, the technology is gaining importance for its potential in electronic learning environments. Since this applies to higher as well as other educational sectors, and eventually also to the workplace, Handbook of Research on ePortfolios is a very timely publication. It is certainly the most comprehensive, best-structured "research treasure" on the topic of e-portfolios that I can imagine. Handbook of Research on ePortfolios addresses all the major aspects, from concepts and issues to implementation, shortcomings, strengths and experiences. Having read through the substantial text, there is really nothing I could see that was lacking. My greatest frustration was the fact that, while reading the book for this review, I did not have time to make an in-depth study of the different contributions. Personally I cannot wait to start exploiting eportfolios in my own courses. I have known about the concept for some years, and have experience in printed portfolios. Handbook of Research on ePortfolios is, however, the first comprehensive publication on electronic portfolios and the supporting research literature. Over 100 experts with extensive experience of portfolios contributed to the publication. The wide variety of topics covered includes career eportfolios; challenges of implementing successful eportfolio systems; development issues; eportfolios as spaces for collaboration; eportfolios for knowledge and learning; eportfolios for teacher education; health professional educators; the implementation of eportfolios and the elements of successful implementation; opportunities, challenges and future prospects; professional development; using eportfolios for peer assessment; and the use of Web-based portfolios in UK schools. These topics are addressed in two sections. Section 1, containing 22 chapters, addresses the concepts of eportfolios and the links to thinking and technology. It focuses on the conceptual aspects of electronic portfolio systems and how these can enhance teaching and learning. Section 2, 29 chapters, contains eportfolio case studies covering a wide variety of applications and contexts. Apart from a brief table of contents listing chapter titles and giving the reader an overview at a glance, there is a detailed table of contents with a short abstract/summary for each chapter. Another valuable feature of the book is the list of more than 370 detailed definitions spread among the different chapters. The only disappointing aspect of the book is the index: a 13-page index with no cross-references really does not do justice to such an excellent 595-page publication. Handbook of Research on ePortfolios is highly recommended. It belongs on the shelves of all academic libraries. It also belongs as a reference work in all teaching departments in education, and on the bookshelves of serious researchers in educational practices and assessment. For individual use, the publication might, however, be a bit expensive at US$195. It certainly must be recommended literature for education students, for whom Handbook of Research on ePortfolios can serve as inspiration and a mechanism to help educators establish critical benchmarks.

Ina Fourie
University of Pretoria

Ethical Decision Making for Digital Libraries
Cokie G. Anderson
Chandos Publishing, Oxford, 2006
ISBN 1843341492, 138 pp., £39.95, soft cover
Keywords: Digital libraries, Ethics
Review DOI 10.1108/14684520710731074

In a clear and easy-to-follow style, Ethical Decision Making for Digital Libraries explores the unique ethical dilemmas that face digital librarians in selecting, preparing, preserving and publishing digital materials. It starts with a brief but valuable introduction to ethical theory and applied ethics, including an explanation of virtue ethics, Kant and deontological theory, and utilitarianism. Next follow chapters on the codes of ethics in the information professions (a number of such codes are brought to the reader's attention), ethics and digitisation policies, and ethics in the selection of materials to digitise (e.g. priorities of criteria, copyright and privacy issues). Ethics and funding (including grants, corporate sponsorships, individual donors and sustaining digital collections), digital collaborations (e.g. choosing partners and ending a collaboration), digitisation standards (a number of useful sources are offered), the digitisation process, digital preservation, access (including open access and institutional repositories) and digital library management (including personnel management and vendor relationships) are discussed in subsequent chapters. The final chapter deals with ethics for 21st century librarians. Ethical Decision Making for Digital Libraries intends to stimulate discussion of ethical issues in professional organisations, graduate schools of information science and among librarians who work in this field. As such, it can be recommended as a very stimulating, although rather brief, point of departure. Although I would have expected a slightly more substantial publication for the price, I still consider it a good buy.

Ina Fourie
University of Pretoria


Web and Information Security
Elena Ferrari and Bhavani Thuraisingham
IRM Press, Hershey, PA, 2006
ISBN 1591405890, 318 pp., price not reported, soft cover
Keywords: Data security, Internet
Review DOI 10.1108/14684520710731100

The Internet, and with it the World Wide Web (WWW), has become an information highway, and this has resulted in an even greater need to manage data, information and knowledge. In their new book, Web and Information Security, Elena Ferrari and Bhavani Thuraisingham state that conventional tools, such as catalogues and databases, have become ineffective for controlling all the information that has become available. New tools and techniques are now needed to manage these data effectively. Web and Information Security is an edited collection of papers presented at a workshop held at the IEEE COMPSAC (Computer Software and Applications) Conference in August 2002 in Oxford. Several additional papers appearing in this volume are on state-of-the-art topics such as semantic Web security and sensor information security. The aim of the book is to present some of the key developments, directions and challenges in securing the semantic Web, enforcing security policies, and securing some of the emerging systems such as multimedia and collaborative systems. It is written by experts in the fields of information security, the semantic Web, multimedia systems, group collaboration systems and data mining systems. The volume is divided into three sections:
(1) Section 1. Securing the Semantic Web. This section consists of five chapters addressing various aspects of securing the semantic Web, such as defining and enforcing security policies for the semantic Web; issues in securing Web services, specifically those that need to be standardised; and defining and enforcing security policies for Web services. The fourth chapter shows how inference problems can be handled, while the final chapter in this section shows how the concepts of the secure semantic Web and the secure grid can be integrated to secure the semantic grid.
(2) Section 2. Policy Management and Web Security. Five chapters focusing on various policy issues for Web-based information systems constitute this section. The chapters cover preventing users from accessing harmful content; privacy for text documents; specifying and enforcing access control policies; managing and administering Web-based systems; and arguments as to why the Chinese Wall model cannot be used for mandatory access control only.
(3) Section 3. Security for Emerging Applications. This section focuses on the incorporation of security into some emerging systems, such as multimedia systems, sensor information systems, and flexible data sharing as well as effective data replication mechanisms. The final chapter describes how one can carry out data mining while maintaining privacy.


This volume could be used as a reference book for senior undergraduate or graduate courses in information security. It is also useful for technologists, managers and developers who want to know more about emerging security technologies. The bibliographies at the end of each chapter prompt further research, and a useful index completes the volume.

Madely du Preez
University of South Africa


Guide to the professional literature


This column is designed to alert readers to pertinent wider journal literature on digital information and research.

A User-Centred Design and Evaluation of IR Interfaces
Ahmed, S.M.Z., McKnight, C. and Oppenheim, C. in Journal of Librarianship and Information Science, Vol. 38, No. 3, 2006, pp. 157-72
This paper presents a user-centred design and evaluation methodology for ensuring the usability of IR interfaces. The methodology is based on sequentially performing: a competitive analysis, user task analysis, heuristic evaluation, formative evaluation and a summative comparative evaluation. These techniques are described, and their application to iteratively design a prototype IR interface, which was then evaluated, is described. After each round of testing, the prototype was modified as needed. The user-centred methodology had a major impact in improving the interface. Results from the summative comparative evaluation suggest that users' performance improved significantly in our prototype interface compared with a similar competitive system. They were also more satisfied with the prototype design. This methodology provides a starting point for techniques that let IR researchers and practitioners design better IR interfaces that are both easy to learn to use and remember.

Strong Copyright plus DRM plus Weak Net Neutrality = Digital Dystopia?
Bailey, C.W. in Information Technology and Libraries, Vol. 25, No. 3, 2006, pp. 116+
Three critical issues are examined in detail and their potential impact on libraries is assessed: a dramatic expansion of the scope, duration and punitive nature of copyright laws; the ability of Digital Rights Management (DRM) systems to lock down digital content in an unprecedented fashion; and the erosion of Net neutrality, which ensures that all Internet traffic is treated equally. How legislatures, the courts and the commercial marketplace treat these issues will strongly influence the future of digital information, for good or ill.


Using OAI-PMH and METS for Exporting Metadata and Digital Objects between Repositories
Bell, J. and Lewis, S. in Program: Electronic Library and Information Systems, Vol. 40, No. 3, 2006, pp. 268-76
This paper examines the relationship between deposit of electronic theses in institutional and archival repositories. Specifically the paper considers the automated export of theses for deposit in the archival repository, in continuation of the existing arrangement in Wales for paper-based theses. The paper presents a description of software that makes use of the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) as the first stage in the automatic import and ingest of items between institutional and archival repositories. The implications of this approach for the management of the institutional repository are also considered. The paper shows that OAI-PMH is a useful approach to harvesting the metadata for items to be imported into an archival repository. This reduces the difficulty of maintaining the import and export software components, albeit at the possible expense of imposing certain requirements on the management of the institutional repository. [A minimal OAI-PMH harvesting sketch appears at the end of this column.]

Options for Putting CDS/ISIS Databases on the Internet
Buxton, A. in Program: Electronic Library and Information Systems, Vol. 40, No. 3, 2006, pp. 286-95
This paper reviews the variety of software solutions available for putting CDS/ISIS databases on the internet, to help anyone considering which route to take. It briefly describes the characteristics, history, origin and availability of each package, identifies the type of skills required to implement the package and the kind of application it is suited to, and covers the CDS/ISIS Unix version, JavaISIS, IsisWWW, WWWISIS Versions 3 and 5, Genisis, IAH, WWW-ISIS and OpenIsis. There is no obvious single "best" solution. Several are free but may require more investment in acquiring the skills to install and configure them. The choice will depend on the user's experience with the CDS/ISIS formatting language, HTML, programming languages, operating systems, open source software, and so on.

Bibliographic Displays in Web Catalogs: Does Conformity to Design Guidelines Correlate with User Performance?
Cherry, J.M., Muter, P. and Szigeti, S.J. in Information Technology and Libraries, Vol. 25, No. 3, 2006, pp. 154-62
The present study investigated whether there is a correlation between user performance and compliance with screen-design guidelines found in the literature. Rather than test individual guidelines and their interactions, the authors took a more holistic approach and tested a compilation of guidelines. Nine bibliographic display formats were scored using a checklist of eighty-six guidelines. Twenty-seven participants completed 90 search tasks using the displays in a simulated Web environment. None of the correlations indicated that user performance was statistically significantly faster with greater conformity to guidelines. In some cases, user performance was actually significantly slower with greater conformity to guidelines. In a supplementary study, a different set of forty-three guidelines and the user performance data from the main study were used. Again, none of the correlations indicated that user performance was statistically significantly faster with greater conformity to guidelines.

Research Note: How Often Should Reputation Mechanisms Update a Trader's Reputation Profile?
Dellarocas, C. in Information Systems Research, Vol. 17, No. 3, 2006, pp. 271-85
Reputation mechanisms have become an important component of electronic markets, helping to build trust and elicit co-operation among loosely connected and geographically dispersed economic agents. Understanding the impact of different reputation mechanism design parameters on the resulting market efficiency has thus emerged as a question of theoretical and practical interest. Along these lines, this note studies the impact of the frequency of reputation profile updates on co-operation and efficiency. The principal finding is that, in trading settings with pure moral hazard and
noisy ratings, if the per-period profit margin of co-operating sellers is sufficiently high, a mechanism that does not publish every single rating it receives, but rather only updates a trader's public reputation profile every k transactions with a summary statistic of the trader's most recent k ratings, can induce higher average levels of co-operation and market efficiency than a mechanism that publishes all ratings as soon as they are posted. This paper derives expressions for calculating the optimal profile updating interval k, discusses the implications of this finding for existing systems such as eBay, and proposes alternative reputation mechanism architectures that attain higher maximum efficiency than the currently popular reputation mechanisms that publish summaries of a trader's recent ratings.

Repository Librarian and the Next Crusade: The Search for a Common Standard for Digital Repository Metadata
Goldsmith, B. and Knudson, F. in D-Lib Magazine, Vol. 12, No. 9, 2006, www.dlib.org/dlib/september06/goldsmith/09goldsmith.html
Charged with selecting a metadata standard to use in their multi-million record digital repository, the authors studied the abilities of MARCXML, Dublin Core, PRISM, ONIX and MODS to meet their requirements for granularity, transparency and extensibility. This paper describes their comparison of these formats, states their selection, describes their principles of use, and evaluates their experiences over the two years the repository has been in operation.

The Evolution of Corporate Web Presence: A Longitudinal Study of Large American Companies
Heinze, N. and Hu, Q. in International Journal of Information Management, Vol. 26, No. 4, 2006, pp. 313-25
This paper presents the results of a six-year longitudinal survey of the Websites of Standard & Poor's (S&P) 500 companies. Using the technology acceptance model (TAM) and impression management theory as guidance, and eight design and functional measures, the authors found that S&P 500 companies went through a remarkable transformation in their Web presence during the evaluation period of 1997-2003, signified by increasing levels of information, interactivity and service offered at their Websites. There is a continuing trend towards increasing numbers and types of features offered, suggesting that large companies are placing greater importance on customer orientation in their Websites in an effort to create positive impressions about their companies and to induce consumer acceptance of their e-commerce technology.

Digital Preservation in the Context of Institutional Repositories
Hockx-Yu, H. in Program: Electronic Library and Information Systems, Vol. 40, No. 3, 2006, pp. 232-43
This paper discusses the issues and challenges of digital preservation facing institutional repositories, and illustrates the Joint Information Systems Committee's (JISC) view on institutional repositories and its key initiatives in helping UK institutions address these issues. Digital preservation is a complex process and there are still many unsolved issues which make it a challenging task for institutional repositories. However, the wide deployment of institutional repositories also provides new opportunities for digital preservation. Much could be done to consider digital preservation from the outset, to involve the authors and to embed digital preservation into repository workflow, which will ease the later preservation tasks. A number of ongoing JISC-funded projects are briefly reported which explore different models for the provision of digital preservation services for institutional repositories. These models may be a way forward to tackle collectively the issue of long-term preservation within the setting of institutional repositories. Depending on the outcomes of the projects, further investigation and implementation could be undertaken to test the models.

From Librarian to Digital Communicator
Huwe, T.K. in Online, Vol. 30, No. 5, 2006, pp. 21-6
Huwe describes librarians who are putting themselves at the forefront of emerging online information technologies and, by doing this, making themselves indispensable to their institutions in ways that then open up opportunities for the delivery of more traditional library services. The technologies range from ListServs to Wikis, Blogs and Podcasts. Some prove successful, and some not.

A Dynamic Approach to Make CDS/ISIS Databases Interoperable over the Internet Using the OAI Protocol
Jayakanth, F., Maly, K., Zubair, M. and Aswath, L. in Program: Electronic Library and Information Systems, Vol. 40, No. 3, 2006, pp. 277-85
There are many bibliographic databases that are being maintained using legacy database systems. CDS/ISIS is one such legacy database system. It was designed and developed specifically for handling textual data. Over the years, many databases have been developed using this package. There is, however, no mechanism supported by the package for seamless interoperability of databases. The open archives initiative (OAI) addresses the issue of interoperability by using a framework to facilitate the discovery of content stored in distributed archives or bibliographic databases through the use of the OAI Protocol for Metadata Harvesting (OAI-PMH). The protocol is becoming a de facto standard for interoperability of DLs. Many of the legacy database systems that are in use today, to the best of our knowledge, for various reasons, are not OAI-compliant. This makes it difficult for the legacy databases to share their metadata automatically. There are two possible approaches to make legacy databases OAI-compliant: static and dynamic. This paper discusses the dynamic approach to make CDS/ISIS databases OAI-compliant. The dynamic approach is a simple way to make legacy databases OAI-compliant so that they become interoperable with other OAI-compliant DLs.

The Effects of Trust-Assuring Arguments on Consumer Trust in Internet Stores: Application of Toulmin's Model of Argumentation
Kim, D. and Benbasat, I. in Information Systems Research, Vol. 17, No. 3, 2006, pp. 286-300
A trust-assuring argument refers to "a claim and its supporting statements used in an Internet store to address trust-related issues". Although trust-assuring arguments often appear in Internet stores, little research has been conducted to understand their effects on consumer trust in an Internet store. The goals of this study are: to investigate
whether or not the provision of trust-assuring arguments on the website of an Internet store increases consumer trust in that Internet store; and to identify the most effective form of trust-assuring arguments in order to provide guidelines for their implementation. The results indicate that providing trust-assuring arguments that consist of claim plus data, or claim plus data and backing, increases consumers' trusting belief, whereas displaying arguments that contain a claim only does not; and that trust-assuring arguments that include claim plus data and backing lead to the highest level of trusting belief among the three forms of arguments examined in this study. Based on the results, the authors argue that Toulmin's (1958) model of argumentation is an effective basis for website designers to develop convincing trust-assuring arguments and to improve existing trust-assuring arguments in Internet stores.

Privacy Protection in Data Mining: A Perturbation Approach for Categorical Data
Li, X.B. and Sarkar, S. in Information Systems Research, Vol. 17, No. 3, 2006, pp. 254-70
To respond to growing concerns about privacy of personal information, organisations that use their customers' records in data-mining activities are forced to take actions to protect the privacy of the individuals involved. A common practice for many organisations today is to remove identity-related attributes from the customer records before releasing them to data miners or analysts. We investigate the effect of this practice and demonstrate that many records in a data set could be uniquely identified even after identity-related attributes are removed. We propose a perturbation method for categorical data that can be used by organisations to prevent or limit disclosure of confidential data for identifiable records when the data are provided to analysts for classification, a common data-mining task. The proposed method attempts to preserve the statistical properties of the data based on privacy protection parameters specified by the organisation. We show that the problem can be solved in two phases, with a linear programming formulation in Phase I (to preserve the first-order marginal distribution), followed by a simple Bayes-based swapping procedure in Phase II (to preserve the joint distribution).
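The two-phase method summarised above is specific to Li and Sarkar's paper. As a minimal, generic illustration of the weaker underlying idea only, the sketch below perturbs a sensitive categorical attribute by randomly swapping its values across records, which preserves the attribute's first-order marginal distribution exactly but, unlike the authors' Phase II, makes no attempt to preserve the joint distribution. The record fields and values are invented for the example.

```python
import random
from collections import Counter

# Toy records (hypothetical data, for illustration only).
records = [
    {"zip": "96822", "age": 34, "diagnosis": "A"},
    {"zip": "96826", "age": 51, "diagnosis": "B"},
    {"zip": "96822", "age": 29, "diagnosis": "A"},
    {"zip": "96813", "age": 62, "diagnosis": "C"},
]

def perturb(rows, attribute, seed=0):
    """Return a copy of rows with the values of `attribute` randomly permuted."""
    values = [row[attribute] for row in rows]
    random.Random(seed).shuffle(values)
    return [dict(row, **{attribute: v}) for row, v in zip(rows, values)]

masked = perturb(records, "diagnosis")

# The marginal distribution of the perturbed attribute is preserved exactly,
# while the link between any individual record and its original value is broken.
assert Counter(r["diagnosis"] for r in records) == Counter(r["diagnosis"] for r in masked)
```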

User-Centered Design of a Web Site for Library and Information Science Students: Heuristic Evaluation and Usability Testing
Manzari, L. and Trinidad-Christensen, J. in Information Technology and Libraries, Vol. 25, No. 3, 2006, pp. 163-9
This study describes the life cycle of a library Web site created with a user-centered design process to serve a graduate school of library and information science (LIS). Findings based on a heuristic evaluation and usability study were applied in an iterative redesign of the site to better serve the needs of this special academic library population. Recommendations for the design of Web-based services for library patrons from LIS programs are discussed, as well as implications for Web sites for special libraries within larger academic library settings.

Health Information: Does Quality Count for the Consumer? How Consumers Evaluate the Quality of Health Information Materials across a Variety of Media
Marshall, L.A. and Williams, D. in Journal of Librarianship and Information Science, Vol. 38, No. 3, 2006, pp. 141-56
An aspect of the information literacy of health information consumers is explored, in particular whether and how they evaluate the quality of health information on the Internet and in printed formats. A total of 32 members of patient support groups in Northeast Scotland were recruited to take part in information review groups (a variation of focus group methodology) where discussion focused on a set of health information materials. Data analysis revealed 15 ways in which the participants evaluated quality. The two most important indicators of quality were organisational authority and the use of plain language. Participants did not find many of the indicators of evidence-based information. They demonstrated a lack of confidence about their ability to select quality health information, relied on pre-selection by authoritative sources (libraries, support groups, health professionals) and distrusted the Internet.

Librarian Publishing Preferences and Open-Access Electronic Journals
Peterson, E. in E-JASL: The Electronic Journal of Academic and Special Librarianship, Vol. 7, No. 2, 2006, http://southernlibrarianship.icaap.org/content/v07n02/peterson_e01.htm
Peterson's six-question survey found that, while 80 per cent of authors had considered publishing in an open access journal and 42 per cent had actually done so, only 48 per cent said the following statement was false: "Usually I do not publish in free electronic journals because they are viewed by myself or by my institution as 'lesser' than established journal titles." Moreover, when asked to "rate each of these items when selecting a journal to publish your article", only 7 per cent said that "Free/Open-Access on the Internet" was very important and only 28 per cent said it was important. Peterson concludes that "The written comments indicate that OA titles are not yet on par with their paper/electronic subscription based counterparts. OA editors need to ensure that their journals are peer reviewed, indexed, and of general high quality. Permanence in and of itself can also lend credibility to the title. It also appears that librarians think that even if the journal is indexed and peer reviewed, the editors can do a better job of marketing the title so that more librarians are aware of this new venue for publishing".

Content Management for the Virtual Library
Salazar, E. in Information Technology and Libraries, Vol. 25, No. 3, 2006, pp. 170-5
Traditional, larger libraries can rely on their physical collection, coffee shops and study rooms as ways to entice patrons into their library. Virtual libraries, by contrast, have only their online presence with which to attract students to resources. This can only be achieved by providing a fully functional site that is well designed and organised, allowing patrons to navigate and locate information easily. One technology significantly improving the overall usefulness of Web sites is a content management system (CMS). Although the CMS is not a novel technology per se, it is a technology smaller libraries cannot afford to ignore. In the fall of 2004, the Northcentral University Electronic Learning Resources Center (ELRC), a small, virtual library, moved from a static to a
database-driven Web site. This article explains the importance of a CMS for the virtual or smaller library and describes the methodology used by ELRC to complete the project.

Repositories for Research: Southampton's Evolving Role in the Knowledge Cycle
Simpson, P. and Hey, J. in Program: Electronic Library and Information Systems, Vol. 40, No. 3, 2006, pp. 224-31
This paper provides an overview of how open access (OA) repositories have grown to take a premier place in the e-research knowledge cycle, and offers Southampton's route from project to sustainable institutional repository. The evolution of institutional repositories and OA is outlined, raising questions of the multiplicity of repository choice for the researcher. A case study of the University of Southampton research repository (e-Prints Soton) and its route to sustainability is explored, with a description of a new project that will contribute to e-research by linking text and data.

Design and Development of an Institutional Repository at the Indian Institute of Technology Kharagpur
Sutradhar, B. in Program: Electronic Library and Information Systems, Vol. 40, No. 3, 2006, pp. 244-55
This paper describes how an institutional repository (IR) was set up, using open source software, at the Indian Institute of Technology (IIT) in Kharagpur. Members of IIT can publish their research documents in the IR for online access as well as digital preservation. Material in this IR includes instructional materials, records, data sets, electronic theses, dissertations and annual reports, as well as published papers. This opens up the world of scholarly publishing in a way that causes re-examination of many of the current practices of scholarly communication and publishing. This paper provides evidence on how to set up an IR and how to create different communities and, under each community, many collections using the DSpace software.

Evaluating the Consistency of Immediate Aesthetic Perceptions of Web Pages
Tractinsky, N., Cokhavi, A. and Kirschenbaum, M. in International Journal of Human-Computer Studies, Vol. 64, No. 11, 2006, pp. 1071-83
The article discusses a series of studies on people's ability to rate the aesthetic qualities of Web pages. The research finds that people are consistent in their own judgement but that this determination tends to differ from one individual to another. The authors also look at "design characteristics" that might affect perception across a broad scale.

Personal Name Identification in the Practice of Digital Repositories
Xia, J.F. in Program: Electronic Library and Information Systems, Vol. 40, No. 3, 2006, pp. 256-67
This paper proposes improvements to the identification of authors' names in digital repositories, based on analysis of current name authorities in digital resources, particularly in digital repositories, and analysis of some features of existing repository applications. The paper finds that the variations of authors' names have negatively affected the retrieval capability of digital repositories. Two possible solutions include

using composite identifiers that combine author name, publication date and author affiliation, and asking authors to input the variants of their name, if any, at the time of depositing articles.

A Strategic Case for E-Adoption in Healthcare Supply Chains
Zheng, J.R., Bakker, E., Knight, L., Gilhespy, H., Harland, C. and Walker, H. in International Journal of Information Management, Vol. 26, No. 4, 2006, pp. 290-301
This paper examines whether a strategic case for e-commerce can be recognised, and the factors that influence e-adoption, using e-business development models, a contingency approach and a stakeholder approach. The paper explores the link of e-commerce with strategy and the potential strategic benefits, risks and problems, analysing e-adoption in four diverse healthcare supply chains in the context of the English National Health Service (NHS). The fieldwork showed that there is limited use of e-commerce in these supply chains, and that there are key problems associated with the benefits and costs perceived by different actors, both within organisations and within the chain. The paper proposes a framework to link the case for e-commerce with the achievement of strategic objectives across three inter-related domains: health, supply and business.
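Two of the items above (Bell and Lewis; Jayakanth et al.) rely on OAI-PMH harvesting as the first step in moving metadata between repositories. For readers unfamiliar with the protocol, the following is a minimal, generic harvesting sketch in Python. It is not the software described in either paper; the repository base URL is a placeholder, and a real harvester would also handle error responses, incremental (from/until) harvesting and the METS packaging discussed by Bell and Lewis.

```python
# Minimal OAI-PMH ListRecords harvester (illustrative sketch only).
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def harvest(base_url, metadata_prefix="oai_dc"):
    """Yield (identifier, title) pairs for every record the repository exposes."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as response:
            root = ET.fromstring(response.read())
        for record in root.iter(OAI + "record"):
            identifier = record.findtext(f"{OAI}header/{OAI}identifier")
            title = record.findtext(f".//{DC}title", default="")
            yield identifier, title
        token = root.findtext(f".//{OAI}resumptionToken")
        if not token:
            break
        # Follow-up requests carry only the verb and the resumption token.
        params = {"verb": "ListRecords", "resumptionToken": token}

# Example (placeholder endpoint):
# for oai_id, title in harvest("https://repository.example.ac.uk/oai/request"):
#     print(oai_id, title)
```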


Note from the publisher

Emerald at 40

This year Emerald Group Publishing Limited celebrates its 40th anniversary. As anyone with more than two score years under their belt (or approaching it) will know, 40 represents a milestone. It marks a point at which we have, or we are supposed to have, come to terms with the world and reached a good understanding of what we want from life. And for those of a more contemplative persuasion, it prompts us to reflect on our earlier years and how we got to where we are. It is perhaps not so different for a company. In many ways, to reach the age of 40 for an organisation is quite an achievement.

Emerald's own history began in 1967 with the acquisition of one journal, Management Decision. The company was begun as a part-time enterprise by a group of senior management academics from Bradford Management Centre. The decision to found the company, known as MCB University Press until 2001, was made due to a general dissatisfaction with the opportunities to publish in management and the limited international publishing distribution outlets at this time. Through the creation and development of the journals, not only was this particular goal achieved, but also the foundations of a successful business were laid. By 1970 the first full-time employee was appointed, and by 1975 there were five members of staff on the payroll. In 1981 there were 20 members of staff, and three years later the company had grown to a size that meant we had to move to larger premises: one half of the current site at 62 Toller Lane.

Through the 1990s Emerald came of age. In 1990 the first marketing database was introduced, and several years later we acquired a number of engineering journals to add to our increasing portfolio of management titles. The IT revolution also began to impact on the publishing and content delivery processes during this period. Writing in 2007, it seems hard to remember a time when information was not available at the click of a button and articles were not written and supplied in electronic format, and yet it was only 11 years ago that Emerald launched the online digital collection of articles as a database. The move was seen as pioneering and helped to shape the future of the company thereafter. The name of the database was Emerald (the Electronic Management Research Library Database), and in 2001 we adopted this name for the company.

So, how does Emerald look in 2007? Emerald has grown into an important journal publisher on the world stage. The company now publishes over 150 journals and we have more than 160 members of staff. Emerald has always stressed the importance of internationality and relevance to practice in its publishing philosophy. These two principles remain the cornerstones of our editorial objective. The link between the organisation and academe that was so crucial in the foundation of the company continues to influence corporate thinking; we uphold the principle of theory into practice. Emerald also continues to carry the tag of an innovative company. Through our professionalism and focus on building strong networks with our various communities, we have launched and developed initiatives such as the Literati Network for our authors and a dedicated web site for managers. These innovations, and many more, help to set us apart from other publishers. For this reason, we feel confident in stating that we are the world's leading publisher of management journals and databases. It is important to us that we continue to strengthen the links with our readers and authors and to encourage research that is relevant across the globe. In more recent history we have, for example, awarded research grants in Africa, China and India. We also opened offices in China and India in 2006, adding to our existing offices in Australia, Malaysia, Japan and the USA.

We would like to thank the editors, editorial advisory board members, authors, advisers, colleagues and contacts who, for the past 40 years, have contributed to the success of Emerald. We look forward to working with you for many years to come.

Rebecca Marsh
Director of Editorial and Production
