English Pages 437 [440] Year 2012
Indexing and Retrieval of Non-Text Information
Knowledge and Information
Studies in Information Science
Editor-in-chief
Wolfgang G. Stock (Düsseldorf, Germany)

Editorial Board
Ronald E. Day (Bloomington, Indiana, U.S.A.)
Sonja Gust von Loh (Düsseldorf, Germany) – Associate Editor
Richard J. Hartley (Manchester, U.K.)
Robert M. Hayes (Los Angeles, California, U.S.A.)
Peter Ingwersen (Copenhagen, Denmark)
Michel J. Menou (Les Rosiers sur Loire, France)
Stefano Mizzaro (Udine, Italy)
Christian Schlögl (Graz, Austria)
Sirje Virkus (Tallinn, Estonia)
Indexing and Retrieval of Non-Text Information
Edited by Diane Rasmussen Neal
ISBN 978-3-11-026057-1
e-ISBN 978-3-11-026058-8
ISSN 1868-842X

Library of Congress Cataloging-in-Publication Data
A CIP catalog record for this book has been applied for at the Library of Congress.

Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de.

© 2012 Walter de Gruyter GmbH, Berlin/Boston
Printing: Hubert & Co. GmbH & Co. KG, Göttingen
∞ Printed on acid-free paper
Printed in Germany
www.degruyter.com
Table of contents

Diane Rasmussen Neal
Introduction to indexing and retrieval of non-text information

Part I: Literature reviews and theoretical frameworks

Jason Neal
Chapter 1. Precedent or preference? The construction of genre and music recommender systems

Elaine Ménard
Chapter 2. Multilingual taxonomy development for ordinary images: Issues and challenges

Chris Landbeck
Chapter 3. Access to editorial cartoons: The state of the art

Part II: Information behaviour studies

Diane Rasmussen Neal, Niall Conroy
Chapter 4. Information behaviour and music information retrieval systems: Using user accounts to guide design

Margaret Lam, Matt Ratto
Chapter 5. Seeking what we have yet to know: A user-centred approach to designing music knowledge platforms

Athena Salaba, Yin Zhang
Chapter 6. Searching for music: End-user perspectives on system features

Yin Zhang, Athena Salaba
Chapter 7. A user study of moving image retrieval systems and system design implications for library catalogues

Part III: Empirical knowledge organization studies

Abebe Rorissa, Diane Rasmussen Neal, Jonathan Muckell, Alex Chaucer
Chapter 8. An exploration of tags assigned to geotagged still and moving images on Flickr

Maayan Zhitomirsky-Geffet, Judit Bar-Ilan, Yitzchak Miller, Snunith Shoham
Chapter 9. Exploring the effectiveness of ontology based tagging versus free text tagging

Kathryn La Barre, Rosa Inês de Novais Cordeiro
Chapter 10. That obscure object of desire: Facets for film access and discovery

Olha Buchel
Chapter 11. Designing and visualizing faceted geospatial ontologies from library knowledge organization systems

Part IV: Case studies

Paweł Rygiel
Chapter 12. Subject indexing of images: Architectural objects with complicated history

Renata Maria Abrantes Baracho Porto, Beatriz Valadares Cendón
Chapter 13. An image based retrieval system for engineering drawings

Kathrin Knautz
Chapter 14. Emotion felt and depicted: Consequences for multimedia retrieval

Tobias Siebenlist, Kathrin Knautz
Chapter 15. The critical role of the cold-start problem and incentive systems in emotional Web 2.0 services

Caroline Whippey
Chapter 16. Non-textual information in gaming: A case study of World of Warcraft

Index
Diane Rasmussen Neal
Introduction to indexing and retrieval of non-text information

Abstract: This introduction to the edited book Indexing and retrieval of non-text information provides an overview of non-text information as a fact of everyday life as well as an area of information science practice and research. Additionally, a summary of the chapters in the book is provided.

Keywords: Indexing, retrieval, information science, non-text information
Diane Rasmussen Neal, Assistant Professor, Faculty of Information and Media Studies, The University of Western Ontario
What is non-text information?

As an information science researcher who studies the indexing and retrieval of documents that exist in a format other than text, I struggle daily to comprehend the scope, and to conceptualize the immense breadth and depth, of this area. Academically and practically, it is typically neither productive nor accurate to define what something is by what it is not, but perhaps it is human nature to do so when we do not have a tidy category for a set of items. In this volume, we explore the challenges – and delights – surrounding human interactions with objects that do not communicate using words as their native language.

Non-text information surrounds us. Every day, we move through the world by engaging constantly, and without thinking about it, with non-text information such as inanimate objects, ephemeral events, and interpersonal cues. For example, the chair you are sitting in told you, “You can sit here” without a textual label. The body language of the person you were talking with at the coffee shop told you that she needed to leave, as she kept looking at the time. The snow on the lawn told you it is winter. In response to these communications, you sat down, wrapped up the conversation, and shovelled the driveway, respectively. All these events happened without necessarily involving a single word.

But it is important to keep in mind that, given our human communication conventions, words are also essential to interacting with non-words. You can determine the need to shovel the snow, and execute the shovelling, without words, but if your partner moved the snow shovel without your knowledge and you need to find out where it is, words become essential for information gathering. They could take many forms: an email, a text message, or a verbal exchange, for example. That said, the look on your face in a verbal interaction may indicate to your partner – without the use of words – that you are not happy that the shovel was not put back in its usual place!

Based on my theoretical musings and empirical research, I posit that text and non-text forms of information elucidate one another (Neal, 2010a, 2010b). Either one can exist exclusively, but together, they provide more information. You can deduce that you are in a coffee shop by smelling the familiar aroma, by seeing the espresso machines, by observing others drinking hot beverages, and so on. But you do not know exactly what the delicious-looking pastries are without asking an employee or reading the labels in front of them, and you cannot find out how much the shop charges for a latte without reading or asking. From this, we can see that different types of information are communicated more effectively through different modalities of delivery.

To explore this point further, let us consider one type of document that is typically both text and non-text: a map. A map contains a non-textual representation of a geographic area; depending on the map’s features, we can see the layout of the roads, the shapes of cities, and the topography of the mountains. Figure 1 shows a satellite image of the Denver, Colorado, USA area, and Figure 2 illustrates a map of the same area with highways and textual labels. Separately, they give us different pieces of information; together, they give us a bigger picture of the land, its features, and its human-constructed transportation and geopolitical divisions. Without the text, we would probably not know what the map is “of”. Without the non-text, we would only have a list of roads and cities, and Figure 2 would not exist at all.
Figure 1. Satellite image of Denver, Colorado, USA. Source: Google Maps.
Figure 2. Map of Denver, Colorado, USA. Source: Google Maps.
Non-text information in library and information science: Lost in translation

To date, the majority of practice and research in library and information science has focused on text-based resources and methods of representation. In the academy, we teach future information professionals how to catalogue monographs and how to search databases for journal articles. Faculty members research how people construct text-based queries on search engines, and how to design retrieval systems that meet the needs of users. Libraries follow suit in practice, as they provide access to e-books, subscribe to electronic resources containing articles, and promote literacy. Textual information is undeniably important in our culture, and bibliometrics would likely demonstrate that our field has researched the textual much more than the non-textual. Yet research has not caught up with today’s online reality: libraries, cultural heritage institutions, universities, and related organizations spend substantial time and money on providing, organizing, and interacting with electronic non-textual items. Just as we interact with both text and non-text in the real world, we do the same online; practice and research must reflect this fact.

Consider the following scenario. John goes online after a full day of work to enjoy some quiet time before bed. First, he watches clips from a television show on YouTube. Then, he checks his personal email while listening to music on Last.fm. In his email, he finds that a friend has sent him a YouTube link to a video of a Chihuahua standing on his hind legs and running in circles while begging his owner for carrots, so he watches that as well, and leaves a comment on the video’s page about how funny it was. After watching the video, he thinks that a Chihuahua might be a fun breed for his family to have as a pet, so he looks for Chihuahuas available through petfinder.com, a website that shows photos of and information about pets available for adoption.
He finds an adorable Chihuahua that is available and appears to be located in a city near him, so he uses Google Maps to find out how far the Chihuahua’s current city is from his house. It’s not far, so he emails the dog’s foster home to get more information and additional photos. He concludes his evening Internet time by playing the massively multiplayer online game World of Warcraft (WoW) for an hour before bed, which allows him to text and voice chat with friends as they navigate their characters through Nagrand. Nagrand is a zone of WoW’s immersive, fantasy-based world called Azeroth. (As one of my gamer friends so eloquently described it, Nagrand has “waterfalls, floating pieces of earth-like islands in the air.”)

John interacted with a variety of textual and non-textual sources of information during his time online. But do our current online systems meet our needs for searching, retrieving, and interacting with non-text documents as well as they could? I argue that they do not. In our scenario, John had to interact with textual information to find non-textual information:
– He typed in the name of the television show he wanted to watch;
– He had to provide traditional bibliographic information, such as artist and song title, to find music;
– He had to read an email to link to the Chihuahua video;
– He searched for “Chihuahua” to find photos of available Chihuahuas;
– He typed in city names to find directions to the Chihuahua’s city;
– In WoW, he typed chat messages to, and read chat messages from, his friends so they could navigate their visually existing characters through the game together.

These interactions resemble the behaviours we all engage in when we go online every day. But we do not necessarily think about how much of the online information we encounter is not textual. Library and information science as a field is familiar with the subjectivity present in indexing a document, and the associated difficulties with retrieving by subject. If there are books in the library catalogue about “dogs,” and they are only described as such, John might not find any of the books if he only searches for “Chihuahuas.” This gap between the indexing, the searching, and the retrieving of documents is even wider with non-textual items because of the translation problem.

I am not the first to identify the translation problem. O’Connor and Wyatt (2004), for example, discuss the fact that words are not pictures, and vice versa. Svenonius (1994) explored the problem of “lack of translatability between different media on subject indexing” (p. 600). According to Svenonius’ strongly supported argument, the language of words is fundamentally different from the language of visual art or music.
Hospers, as cited by Svenonius, discusses the iconic qualities of musical language: Thus, to take an obvious example, the patterns of rising and falling, crescendo and diminuendo, rising gradually to a climax and then concluding (such as are to be found in the “Liebestod” of Wagner’s Tristan und Isolde) possess a considerable structural similarity or isomorphism with, the rhythm of the sexual climax. The pattern of the slow movement of Beethoven’s Quartet No. 16, Op. 135, is similar to the voice inflection of a person asking questions and then answering them. (Hospers, 1972, p. 48, as cited in Svenonius, 1994, p. 604)
These iconic representations exist in textual materials as well; consider, for example, the way a historical novel makes us feel as though we are living in its time and place as we read it. It can be difficult for librarians to describe non-text materials without any accompanying textual metadata: a photo of children standing in a field, for example, tells us a few things on its own, but it does not tell us where the photo was taken, the names of the children, the name of the photographer, when it was taken, what it is “about”, and so on (Neal, 2006, 2008). In my view, the complications associated with subjectivity in aboutness rest on Wilson’s (1968) observation: “what seems to stand out depends on us as well as on the writing, on what we are ready to notice, what catches our interest, what absorbs our attention” (p. 82).

There can be something unspeakably personal in our iconic associations with, and preferences for, non-textual documents. When I look at a photograph of my late father, I am filled with many emotions: loving remembrance of the wonderful man he was, anger at the illness that took his life, hope that he is proud of me. When I hear my favourite musical work, Wolfgang Amadeus Mozart’s Requiem, I am overcome with joy at its beauty, yet it makes me cry for reasons I cannot explain in words. These personal iconic qualities could not be indexed by a librarian, nor even represented by any text-based tags that I could provide to describe music or images with personal or iconic meaning for me. That said, I would argue that the ability to use these intangible qualities for search (as well as for serendipitous exploration or browsing) is a desirable trait of information retrieval systems yet to be developed. Given the flexibility that Internet technologies afford us, and given that we are free of the spatial constraints that printed library collections impose, why should we be limited to searching only by bibliographic methods of description such as title, author, or even the controversial “subject”?
At the time of this writing, one of my current research projects, entitled “Using affect-based labels in whole collection retrieval” and funded by the Social Sciences and Humanities Research Council of Canada, questions these siloed and restricted models of indexing and retrieval. For example, if I mark a Flickr photograph as a “Favourite”, what YouTube videos or Twitter feeds might I also enjoy, via connections that take me beyond the standard recommendation points of subject, genre, time, and place?

The notion of a “collection” is also difficult in the context of online non-text information. Cultural heritage institutions all over the world are launching digital repositories, but these are difficult to locate because, once again, of their siloed nature. Google Images searches are much more likely to find images stored on commercial websites than on those of non-profit organizations, owing to non-profits’ limited use of search engine optimization (SEO) and to other factors that complicate today’s Google-dominated search landscape. The balance that our field is trying to negotiate between control and lack of control over systems of representation (Neal, 2006) seems almost outdated when we consider the current online environment. People are uploading, tagging, commenting on, and searching for photographs, videos, and music at exponentially growing rates. At the same time, our field continues to rely on traditional methods of description and access such as the Library of Congress Subject Headings. We have pondered the feasibility of Semantic Web technologies for many years, but they are not being adopted at the speed of the Web. We discuss ontological approaches, yet rarely implement them.

Are new approaches possible? Do we need words to describe non-textual objects? Explorations of new approaches, such as Rorvig, Turner, and Moncada’s (1999) NASA Image Collection Visual Thesaurus, which attempted to provide visual indexing for photographs by using images as surrogates for other images, have not been fully and successfully developed. Content-based image retrieval, which is explored in computer science and the more technical corners of information science, relies on physical similarities, such as shapes, colours, and patterns, that can be uncovered between images. Music information retrieval research likewise centres on similarities of musical content. These methods, however, do not address semantic, iconic, or other facets that are emotionally or intellectually meaningful to the user.

Library and information scientists might believe that user-supplied tagging on websites such as the photograph-sharing site Flickr is not the best approach to access and retrieval. We are powerless to stop it, but at the same time, we do not act to improve upon these fun-to-use structures that nimble Internet start-up companies are developing and deploying with or without us. One example at the time of this writing is Pinterest.com, a social media site that allows users to “pin up” visual collections of items they like. In early 2012, Pinterest “hit 11.7 million unique monthly U.S. visitors, crossing the 10 million mark faster than any other standalone site in history” (Constine, 2012).
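As a purely illustrative aside, the content-based image retrieval mentioned above can be sketched in a few lines of Python. This is a minimal, hypothetical example and does not describe any system discussed in this volume. It assumes each image has already been decoded into a list of (R, G, B) pixel values, and it compares images by colour-histogram intersection, one common low-level matching technique. The function names and the ten-pixel toy “images” are invented for the example, which echoes the snowy lawn from the opening of this introduction.

```python
from collections import Counter

# Hypothetical sketch: images are assumed to be pre-decoded lists of
# (R, G, B) tuples with channel values in the range 0-255.

def colour_histogram(pixels, bins_per_channel=4):
    """Quantize each pixel into a coarse colour bin and return the
    fraction of pixels that fall in each bin."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {colour_bin: n / total for colour_bin, n in counts.items()}

def histogram_intersection(h1, h2):
    """Sum of per-bin minima: 1.0 for identical colour distributions,
    0.0 when the two images share no colour bins."""
    return sum(min(h1.get(b, 0.0), h2.get(b, 0.0)) for b in set(h1) | set(h2))

def rank_by_colour(query_pixels, collection):
    """Rank named images by colour similarity to the query, best first."""
    query_hist = colour_histogram(query_pixels)
    scores = {
        name: histogram_intersection(query_hist, colour_histogram(pixels))
        for name, pixels in collection.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy ten-pixel "images" (invented data for illustration only).
snowy_lawn = [(250, 250, 250)] * 8 + [(30, 120, 40)] * 2
collection = {
    "wedding_dress": [(245, 245, 240)] * 9 + [(200, 180, 150)] * 1,
    "green_lawn": [(40, 140, 50)] * 9 + [(250, 250, 250)] * 1,
}

ranking = rank_by_colour(snowy_lawn, collection)
print(ranking)  # [('wedding_dress', 0.8), ('green_lawn', 0.1)]
```

Under these toy assumptions, the snow-covered lawn is ranked closest to the mostly white wedding-dress photo rather than to the green lawn. This is the limitation noted above: matching on colour alone says nothing about what an image is “about”.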
Our field is positioned not just to assist with, but to create, new and innovative approaches to online non-text information representation, access, and retrieval. The chapters that I have been privileged to assemble and edit in this volume are all a means toward that end. As I introduce the chapters, and as you read the chapters themselves, I pose one question for the reader: What is needed to understand, grasp, and deploy unrealized approaches to non-text indexing and retrieval that will help the world discover the plethora of online non-text documents in refreshingly connected ways?
In this volume

I am quite pleased to have edited this diverse collection of papers. Given my discussion of what “non-text” is (or is not), any such volume would be diverse in its composition. Beyond that, however, this volume presents a tremendously impressive span of theoretical and empirical backgrounds, as well as research experience levels, to help us explore the wide range of non-text information research. It is decidedly multicultural as well: its contributors work at universities in, or originate from, countries such as Brazil, Canada, China, Ethiopia, Germany, Greece, Israel, Poland, the United States, and Ukraine.

The idea for this volume was conceived in 2010, when I discussed my then-forthcoming 2011 doctoral seminar on non-text indexing and retrieval with Knowledge & Information Series editor Prof. Wolfgang Stock. He believed that non-text indexing and retrieval research warranted an up-to-date volume, and asked whether my students would want to contribute chapters. After the contract was signed, I made writing a chapter suitable for publication in this book a seminar requirement. Working with my students was a rewarding experience, and they valued the guidance through their first publication experience. It helped me remain cognizant of the challenges I faced in learning how to publish when I was a doctoral student myself; this is a lesson none of us should forget. As a result, I am proud that this volume contains many chapters from PhD students and other new researchers. While this meant more work for me as the editor than a volume of works by seasoned researchers might have, the result contains ideas that are fresh, visionary, and unafraid to question current paradigms of understanding. All chapters went through double-blind peer review by at least two non-text researchers who were knowledgeable about the literature and the methodologies employed.

Part I, “Literature reviews and theoretical frameworks,” does not present chapters with completed empirical studies, but rather new ways of thinking about non-text indexing and retrieval paradigms through alternative, well-supported lenses.
In Chapter 1, Jason Neal critiques the pervasiveness of genre as a way of categorizing music: how could the community of music information retrieval scholars dig deeper in its efforts to support people’s ability to discover music they like but might not otherwise know about? Elaine Ménard’s Chapter 2 provides a theoretical framework for developing a multilingual taxonomy for images; the chapter makes clear that linguistic translation involves more issues than one might imagine. In Chapter 3, Chris Landbeck looks at an underserved topic: access to political cartoons, an important area for preserving our historical and political heritage.

Part II, “Information behaviour studies,” explores how people search for, describe, and discuss non-text information. Three of the four chapters in this part focus on music, but they use very different methods. In Chapter 4, my student Niall Conroy and I present a content analysis of blog excerpts that discuss how bloggers use and look for music, and what these behaviours could mean for system design. Chapter 5 outlines Margaret Lam’s master’s thesis work, in which she developed in-depth “domain-specific user profiles” of non-musicians to understand holistically how people experience music and, therefore, music information systems. Athena Salaba and Yin Zhang contributed two co-authored chapters to this part: their study in Chapter 6 presents user feedback on a music retrieval system, and Chapter 7 outlines users’ thoughts on two different moving image retrieval systems.

Part III, “Empirical knowledge organization studies,” includes studies of knowledge organization systems used for indexing non-text documents. In Chapter 8, Abebe Rorissa, his students, and I explore the specificity levels of geotags (tags containing geographic place names) on a sizeable sample of Flickr photographs and videos. In Chapter 9, Maayan Zhitomirsky-Geffet and her colleagues explore whether ontologies or folksonomies might be more useful for image retrieval, and examine what influences the usefulness of each method. Kathryn La Barre and Rosa Inês de Novais Cordeiro provide the results of their cross-cultural study in Chapter 10, which asked Americans and Brazilians to summarize and provide keywords for short films. In Chapter 11, also a facet study, Olha Buchel explains the benefits of metadata facets in ontologies for geographic information in a map mashup.

Finally, Part IV, “Case studies,” provides specialized examinations of various non-text settings. Chapter 12, the only chapter contributed by a practitioner (Paweł Rygiel), explains the issues involved in describing images of architectural objects that have been under the control of different countries over time. In Chapter 13, Renata Maria Abrantes Baracho Porto and Beatriz Valadares Cendón present the technical side of designing image retrieval systems for engineering, focusing on the associated classification system and prototype testing.
Chapters 14 and 15, by Kathrin Knautz and Tobias Siebenlist, describe the development of their search engine MEMOSE, which seeks to gather crowdsourced ratings of “felt” and “depicted” emotions for images, music, and videos. Chapter 14 focuses on theories of emotion and their ties to emotional information retrieval, while Chapter 15 explains the problem of gathering user-generated data for retrieval (how do you leverage it before it’s there?) and suggests gamification as a way to motivate users to provide it. The book concludes with a chapter on the visuals and sound of WoW, in which seasoned player Caroline Whippey uses autoethnography to examine her own experience of the game’s sights and sounds.
Acknowledgements

First and foremost, I want to thank the authors and anonymous reviewers, and the Knowledge & Information Series Editor Wolfgang Stock, who made this book possible. Along the way, I have benefitted from so many researchers’ writings and mentorships, including Brian O’Connor, Shawne Miksa, Samantha Hastings, Howard Greisdorf, Gary Marchionini, Abby Goodrum, and Corinne Jörgensen. I would like to thank the late Mark Rorvig for setting me on this amazing path when I was a mere master’s student. The PhD students who took my doctoral seminar on non-text information – Niall Conroy and Caroline Whippey – know how to ask the right questions, which frequently means I cannot yet answer them! Caroline was working as one of my research assistants as I was finishing this project, and she provided invaluable, proficient editorial assistance at the end of the project, when I needed it most. Of course, I need to thank my administrators and colleagues in the Faculty of Information and Media Studies at The University of Western Ontario for providing the collegial, supportive atmosphere needed to complete a creative project such as this one. I am forever indebted to the American Society for Information Science and Technology for providing me with a professional home throughout my career to date. Finally, I would like to thank my friends and family for believing in me, and my fellow members of the Vanguard Gaming guild for providing hours of evening fun after long days of double-checking comma placements in reference lists.
References

Constine, J. (2012, February 7). Pinterest hits 10 million U.S. monthly uniques faster than any standalone site ever – comScore [Blog post]. Retrieved from http://techcrunch.com/2012/02/07/pinterest-monthly-uniques/
Neal, D. R. (2006). News photography image retrieval practices: Locus of control in two contexts (Unpublished doctoral dissertation). University of North Texas, Denton, TX.
Neal, D. (2008). News photographers, librarians, tags, and controlled vocabularies: Balancing the forces. Journal of Library Metadata, 8(3), 199–219.
Neal, D. (2010a). Emotion-based tags in photographic documents: The interplay of text, image, and social influence. Canadian Journal of Information and Library Science, 34(3), 329–353.
Neal, D. (2010b). Breaking in and out of the silos: What makes for a happy photograph cluster? Paper presented at the 2010 Document Academy Conference (DOCAM ’10), University of North Texas, Denton, TX, USA.
O’Connor, B. C., & Wyatt, R. B. (2004). Photo provocations: Thinking in, with, and about photographs. Lanham, MD: Scarecrow Press.
Rorvig, M. E., Turner, C. H., & Moncada, J. (1999). The NASA Image Collection Visual Thesaurus. Journal of the American Society for Information Science, 50(9), 794–798.
Svenonius, E. (1994). Access to nonbook materials: The limits of subject indexing for visual and aural languages. Journal of the American Society for Information Science, 45(8), 600–606.
Wilson, P. (1968). Two kinds of power: An essay on bibliographical control. Berkeley, CA: University of California Press.
Part I: Literature reviews and theoretical frameworks
Jason Neal
Chapter 1. Precedent or preference? The construction of genre and music recommender systems

Abstract: With the advent of user-generated content and the capabilities of current information and communication technologies, indexing and retrieval tools for music should facilitate discovery that transcends genre boundaries. Nonetheless, they still privilege genre as the primary mode of categorization. Even recommender systems, which utilize other measures to determine similarity, give the appearance of drawing upon genre. By examining the ambiguous boundaries and definitions of genres, the contexts in which indexing and retrieval tools for music have developed, and the roles played by music at individual and societal levels, this paper considers alternative traits that could act as indicators of “similarity.”

Keywords: Music, genre, classification, indexing and retrieval, recommender systems, MIR, tagging, cross-genre
Jason Neal, Doctoral Student and Research Assistant, Faculty of Information and Media Studies, The University of Western Ontario
Introduction: Genre trouble

Whether by design or default, many indexing and retrieval systems draw upon genre as the primary way of categorizing music. Nonetheless, with multiple record manufacturers, music retailers, record charts, music-focused websites, and music-related publishers contributing to the construction of genres and subgenres, disagreement exists about their boundaries and parameters (Aucouturier & Pachet, 2003). For example, the websites of music content providers identify varying numbers of possible genres, ranging from fewer than 10 to almost 700 (Lippens, Martens, Leman, Baets, & Meyer, 2004). Even among music retailer websites that identify hundreds of genres, the same song may appear under different genres, and the same genre may appear at different hierarchical levels in individual taxonomies (Aucouturier & Pachet, 2003; Pachet & Cazaly, 2000). In addition to multiple cultural and market-based boundaries for genres, individual listeners have their own opinions about the appropriateness of assigning music to a specific genre (Cunningham, Reeves, & Britland, 2003; Lippens et al., 2004).

Definitions of broad categories like “classical” and “pop” underscore the problematic nature of genre. In its entry on classical, The concise Oxford dictionary of music outlines the vague nature of the term. It can refer to symphonic music composed in the late eighteenth and early nineteenth centuries, or more broadly to music deemed to have “permanent” value. In addition, classical can act as a binary opposite to music with more popular appeal (Kennedy & Kennedy, 2007). Ross critiques the underlying assumptions and consequences of the term:

I hate “classical music”: not the thing but the name. It traps a tenaciously living art in a theme park of the past … The phrase is a masterpiece of negative publicity, a tour de force of anti-hype … For at least a century, the music has been captive to a cult of mediocre élitism that tries to manufacture self-esteem by clutching at empty formulas of intellectual superiority … Yes, the music can be great and serious; but greatness and seriousness are not its defining characteristics. It can also be stupid, vulgar, and insane. (2004, paras. 1–2)
In other words, classical music can share some stereotypical characteristics associated with “popular” music. At least since the mid-twentieth century, pop and popular have encompassed a range of “non-classical” music genres and subgenres, which usually consist of songs performed by groups that include a vocalist, backed by guitar and drum players. Popular can also refer to orchestral performances of relatively “accessible” classical music, going back to the London Popular Concerts of the nineteenth century (Kennedy & Kennedy, 2007). The varying numbers of possible genres, along with their vague boundaries and definitions, point to the paradoxical nature of genre’s pervasiveness in classifying music recordings. This chapter examines the role genre has played within cataloguing and classification systems, as well as its indirect influence on recommender features in online retailers and social networking sites. In considering alternatives to genre categorization, the chapter explores interactions that have already occurred between classical and popular music, two genres typically portrayed in the broader culture as polar opposites. Relevant contributions from cognitive psychology and the social sciences provide some possible explanations for the connections that listeners could make among music from different genres, while some research in the field of music information retrieval (MIR) offers tentative prospects for developing broadly based systems that enable people to engage in cross-genre music discovery.
Chapter 1. Precedent or preference?
Genre within indexing and retrieval systems
Libraries: General background
With the growth of sound recording collections in libraries throughout the twentieth century, the Music Library Association (MLA) responded by developing rules for music indexing and retrieval. These included the Code for Cataloging Phonograph Records in 1942, as well as the Code for Cataloging Music and Phonorecords in 1958 (Bryant & Marco, 1985). Even with MLA’s standards, a 1963 study by Gordon Stevenson found that 90% of surveyed libraries created their own classification systems to organize physical music recordings (Stevenson, 1973; McKnight, 2002). Within open stacks collections, users tend to prefer an alphabetic arrangement, usually by composer or performer (Bryant & Marco, 1985). Because classification numbers for physical music recordings can become unwieldy, the Dewey Decimal and Library of Congress systems are poor facilitators of open stacks browsing. The Library of Congress itself does not apply its own classification system to music recordings, preferring to shelve them by manufacturer label and item number. Libraries with closed stacks music collections follow similar practices (McKnight, 2002).
Classification
Carol Saheb-Ettaba and Roger B. McFarland proposed a four-level hierarchical classification standard in their 1969 publication Alpha-Numeric System for Classification of Sound Recordings, also known as ANSCR. The top two elements of ANSCR reflect the preferred browsing habits of people looking for music (Cunningham et al., 2003; McKnight, 2002). The primary element, General Category or Subject, has at least one letter to signify a broad subject category, such as genre or performance type. A second letter at the same level may indicate sub-categories, such as sub-genres or primary instrument. Filing Element for Author or Subcategory appears at the secondary level, consisting of four letters to identify the name of an individual or group responsible for the music (Bryant & Marco, 1985; McKnight, 2002). It focuses on the names of composers or performers, depending on whether the recording consists of “Western art music” (another name for what is colloquially called classical music) or popular music. For popular music, the performer may be an individual or a group, whichever is better known. Even though the tertiary level of ANSCR is named Filing Element for Title, the secondary element may draw upon the
album title if it is “better known” than the names of composers or performers. Such an assumption usually applies to recordings of musicals or soundtracks. Other alternatives to author or title include a “collections” entry for music compilations, as well as “ethnic and geographic” entries for folk and traditional music (McKnight, 2002). ANSCR provides a functional standard for organizing physical music recordings. Due to the constraints of such media, however, it limits the ways in which people can browse them. With genre’s prominence at the primary level of ANSCR, recordings may be filed in genres that differ from those that browsers might find more suitable (Cunningham et al., 2003). Genre categories also do not easily fit “crossover” recordings, which typically consist of different musical styles or performers associated with different genres (McKnight, 2002).
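To make the effect of this ordering concrete, the following Python sketch builds a simplified shelf key from the first three ANSCR elements described above: a category code, a four-letter filing element, and the title. It is a hypothetical illustration rather than the published ANSCR rules. The category codes (C, P), the padding character for short names, and the `Recording` fields are all assumptions made for the example. Because the category sits at the top of the key, Mussorgsky’s Pictures at an Exhibition and Emerson, Lake, and Palmer’s 1971 rendition of it end up in separate sections of the shelf.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Recording:
    category: str                 # hypothetical genre / performance-type code (first element)
    composer: Optional[str]       # names given as "Surname, Forename"
    performer: Optional[str]
    title: str
    art_music: bool               # True for "Western art music", False for popular music
    title_better_known: bool = False  # e.g. musicals and soundtracks


def filing_element(rec: Recording) -> str:
    """Secondary element: four letters from the name (or title) the recording is best known by."""
    if rec.title_better_known:
        source = rec.title
    elif rec.art_music:
        source = rec.composer or rec.performer or rec.title
    else:
        source = rec.performer or rec.composer or rec.title
    letters = "".join(ch for ch in source.upper() if ch.isalpha())
    return letters[:4].ljust(4, "-")  # pad very short names (assumed convention)


def shelf_key(rec: Recording) -> tuple:
    # Category comes first, so genre dominates the shelf order.
    return (rec.category, filing_element(rec), rec.title.upper())


shelf = [
    Recording("P", None, "Emerson, Lake & Palmer", "Pictures at an Exhibition", art_music=False),
    Recording("C", "Mussorgsky, Modest", None, "Pictures at an Exhibition", art_music=True),
    Recording("P", None, "Beatles", "Sgt. Pepper's Lonely Hearts Club Band", art_music=False),
]

for rec in sorted(shelf, key=shelf_key):
    print(shelf_key(rec))
```

Any key that places genre first will separate a “crossover” recording from the work it draws upon, however the lower elements are defined.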
Cataloguing
Around the same time that Saheb-Ettaba and McFarland developed ANSCR, a number of changes began to occur in cataloguing techniques for materials in many media. These included the publication of the first edition of Anglo-American Cataloguing Rules (AACR), as well as the Library of Congress’ development of MAchine Readable Cataloguing (MARC). During this period of standardization, libraries also began to build bibliographic networks, which enabled them to share catalogue record information electronically. The first such network was the Online Computer Library Center (OCLC), which evolved from a consortium of libraries in Ohio to an international network. With records contributed by the Library of Congress and other member libraries, technical services staff no longer needed to duplicate effort. Instead, they could focus on modifying pre-existing records for local catalogues, and on creating original records for unique materials (Rehbach, 1989; Papakhian, 2000). By the 1980s, such practices had become widespread. Online public access catalogues (OPACs) were replacing card catalogues, making searching easier, faster, and more flexible. The costs of related technologies had decreased, computers had become much smaller, and discs enabled portable storage for larger amounts of information. With the development of integrated library systems (ILS), both users and library staff could track the status of specific items (Rehbach, 1989). The MLA developed its own version of MARC in the 1970s, providing rules for entering information into catalogue records for scores and recordings (Rehbach, 1989). Some aspects of music cataloguing are
relatively straightforward, such as physical format. Other cataloguing decisions, however, present challenges similar to those found in ANSCR. As an example, the main entry of catalogue records for popular music recordings lists performers first and composers last. The reverse is true for “serious” music, yet another term used for Western art and (colloquially speaking) classical music. For recordings with works by multiple composers, the main entry is usually the title the manufacturer gave to the recording (Bryant & Marco, 1985). Also like ANSCR, music cataloguing practices can overemphasize genre’s importance. As McKnight (2002) points out, “… libraries have traditionally treated music more abstractly, choosing to focus on the form or genre of a musical work or the instrumentation of that work, rather than on any extramusical meaning it might convey” (p. 2). Along with stores that sell music (Cunningham et al., 2003), libraries have traditionally had static systems for categorizing sound recordings (Kanters, 2009). When genre serves as a primary element for classification, the physical organization of recordings, or the subject headings linked to their surrogates, make implicit recommendations based on genre. Consequently, users have difficulty finding music recordings that share other potentially meaningful similarities.
Online music vendors
With Internet usage increasing exponentially throughout the early and mid-1990s, Wall Street hedge fund analyst Jeff Bezos established the online bookstore Amazon.com. When it became more successful than anticipated, Bezos decided to expand its product offerings beyond books, beginning with music in mid-1998. Within months, Amazon.com had become dominant among e-commerce sites in that market (Rivlin, 2005; Umesh, Huynh, & Jessup, 2005). The ambition and success of Amazon.com, along with its development of a system to recommend additional products to its customers, are worth noting in relation to genre. As part of its commercial mission, Amazon.com developed an in-house algorithm to facilitate “item-to-item collaborative filtering.” It enables a registered customer to log into her personal account, and to view recommendations for products deemed “similar” to others she has already purchased or rated. The algorithm takes into account items that other customers have tended to purchase together, rather than specific groupings from individual purchases. The latter are relatively rare, and similarity metrics for them would be inefficient to compute (Linden, Smith, & York, 2003). With its efficiency, the Amazon.com recommender system may prove sufficient for users who wish to look for more items associated with specific genres or
creators. That said, it does not draw upon those aspects directly. Rather, the item-to-item collaborative filtering algorithm may indirectly bias recommendations in those directions. Instead of drawing upon the content data of items, the system relies on navigational data and ratings. Furthermore, it provides no explanation that could justify its recommendations (Symeonidis, Nanopoulos, & Manolopoulos, 2008). It also does not take into account other reasons why some users might be active across different music genres; for example, preferences for other characteristics of music could account for seemingly idiosyncratic inclinations (De Pessemier, Deryckere, & Martens, 2009).
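The item-to-item idea can be sketched in a few lines of Python. This is a minimal illustration in the spirit of the approach Linden, Smith, and York (2003) describe, not Amazon.com’s production code. It assumes binary purchase data, cosine similarity between items, and a handful of hypothetical customers. The sketch shows the point made above: recommendations come entirely from co-purchase patterns. No genre label or audio content enters the calculation, and the system cannot explain why two items are “similar.”

```python
from collections import defaultdict
from math import sqrt

# Hypothetical purchase histories: customer -> set of albums bought.
purchases = {
    "ann":  {"Abbey Road", "Pet Sounds", "Kind of Blue"},
    "ben":  {"Abbey Road", "Pet Sounds"},
    "cara": {"Kind of Blue", "A Love Supreme"},
    "dev":  {"Abbey Road", "Pet Sounds", "A Love Supreme"},
}


def item_similarities(purchases):
    """Build an item-to-item table of cosine similarities from co-purchases.

    Each item is treated as a binary vector over customers, so the cosine is
    (customers who bought both) / sqrt(buyers of a * buyers of b).
    Only pairs that some customer actually bought together are stored.
    """
    buyers = defaultdict(set)
    co_counts = defaultdict(lambda: defaultdict(int))
    for customer, items in purchases.items():
        for a in items:
            buyers[a].add(customer)
            for b in items:
                if a != b:
                    co_counts[a][b] += 1
    return {
        a: {b: n / sqrt(len(buyers[a]) * len(buyers[b])) for b, n in row.items()}
        for a, row in co_counts.items()
    }


def recommend(customer, purchases, sims, k=3):
    """Score unowned items by summed similarity to the customer's own items."""
    owned = purchases[customer]
    scores = defaultdict(float)
    for item in owned:
        for other, s in sims.get(item, {}).items():
            if other not in owned:
                scores[other] += s
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]


sims = item_similarities(purchases)
print(recommend("ben", purchases, sims))
```

Computing the similarity table once, offline, and only for co-purchased pairs is what keeps this approach efficient. The trade-off is that it can only echo patterns that customers have already produced.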
Personal digital music libraries
As Amazon.com established itself as a major presence in e-commerce, the merging of audio content and descriptive data had become more feasible. Digital recording technology, which fostered the replacement of cassettes and long-playing (LP) albums with compact discs (CDs) during the 1980s, already possessed this potential. Rather than recording on tape, studios began to capture digitized sound directly on computers for later transfer to disc. Following this trend, recordings had the potential to move from discrete physical formats to request-based “gigantic jukebox” systems (Gronow & Saunio, 1998). Hard drive storage capacities had also increased on personal computers, but they remained insufficient for large-scale storage of audio files. The MP3 format enabled users to compress audio files to maximize storage capacity, and to download or share them relatively quickly. Despite the legal and ethical debates over MP3 downloading and sharing, electronics manufacturers introduced devices that made personal music collections more portable. With both metadata and audio files stored on a personal computer’s hard drive, users could organize music their own way and transfer music collections to an MP3 player. Apple is well known for integrating such capabilities by manufacturing the iPod and enabling users to set up personal iTunes libraries. It has also become a leader among online music vendors by selling digital downloads of songs and albums from its iTunes store (Blake, 2010; Jennings, 2007; Kanters, 2009). Since they typically belong to individuals, however, personal digital music libraries remain limited to pre-selected content. This limitation poses challenges to users whose collections contain music from specific genres, but who would like to find similar music from other genres.
Social network sites
The term “social media” and its associated colloquialism “Web 2.0” started to enter common parlance in the mid-2000s. Although Amazon.com enabled users to contribute product reviews and ratings prior to that time, both terms refer specifically to users having higher levels of control over online content in a variety of formats, and being able to connect with people who share similar interests. On more general social networks like Facebook, users can pass along recommendations to friends, or to groups of people who share common interests (Jennings, 2007). Such actions may take the form of posting links found on other social network sites like YouTube. Although its name implies an emphasis on video postings, some YouTube clips focus more closely on musical content; in such cases, their visual content may be of secondary interest, or even perfunctory. Some social network sites, including last.fm and Pandora, focus primarily on music itself. Rather than use controlled vocabulary terms, users assign tags to musical works or their creators to enable future retrieval (Jennings, 2007; Kanters, 2009). Tags may signify objective aspects like date of creation (Jennings, 2007), melody, or instrument (Cui, Liu, Pu, Shen, & Tan, 2007). They can also pertain to more subjective associations, including emotional states and moods (Bischoff, Firan, Nejdl, & Paiu, 2009; Kanters, 2009; Neal et al., 2009; Hu & Downie, 2010), opinions (Lamere, 2008), and individual contextual meanings (Bischoff et al., 2009). On sites like last.fm, the same piece of music may have tags for different genres. In fact, genre is the most frequently used type of tag on that site (Lamere, 2008). Other indexing and retrieval systems, including library catalogues (Spiteri, 2006) and online vendor sites (Amazon.com, 2011), have also incorporated tagging. Tags appear to provide a more democratic alternative to controlled vocabulary.
Taggers can place music in multiple genres, demonstrating the “fuzziness” of such boundaries and compensating for top-down systems with more conservative approaches to assigning genre. Weighting of tags can reduce the influence of misleading tagging practices (Lamere, 2008). Still, tags contain less information about specific preferences than more detailed ratings and reviews (Vig, Soukup, Sen, & Riedl, 2010). They also tend to describe whole audio files rather than specific content within them (Law & von Ahn, 2009). Like Amazon.com, music social network sites draw upon collaborative filtering to make recommendations. They assess similarity by drawing upon user profiles, but recommendations generally remain within the realm of pre-existing genre categories (De Pessemier et al., 2009; Jennings, 2007). A study of last.fm by Neal et al. (2009) illustrates how sites that enable tagging can bias towards genre. Among the top ten songs that users selected to
embody five different emotional categories, all come from genres and subgenres of popular music. Others, including classical, jazz, and world music, remain absent from the top ten listings (Neal et al., 2009). Popular items typically receive more tags than less popular ones (Law & von Ahn, 2009), and systems that enable tagging reflect the biases of their user bases (Lamere, 2008). Even with its potential for changing the ways in which music can be categorized, tagging has yet to become a viable alternative to genre. Within Pandora, the “Music Genome Project” draws upon 400 attributes besides genre to make recommendations to users. Expert music analysts use the attributes to decide how to categorize songs (Pandora, 2011). Listeners refine the system’s recommendation accuracy by rating the selections played on their individual “channels.” As Pandora’s chief technical officer points out, the system can yield matches between such performers as Cyndi Lauper and Norah Jones, as well as Metallica and the Indigo Girls (Bell, Bennett, Koren, & Volinsky, 2009). Nonetheless, it is worth noting that those examples remain within a broadly “popular” category. Furthermore, music experts remain at the centre of determining similarity, with users implicitly trusting that their judgments are sufficient starting points for recommendations.
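The tag weighting mentioned above (Lamere, 2008) can be illustrated with a small Python sketch. The scheme is an assumption chosen for clarity, not the method of last.fm or any other site. Tags are case-folded, each user counts once per tag, weights are scaled so the most frequent tag scores 100, and tags below a hypothetical threshold are discarded as likely noise. The tagging events are invented. The result also shows the “fuzziness” discussed earlier: one track keeps several genre tags, each with a different weight.

```python
from collections import Counter


def weighted_tags(tag_events, min_weight=10):
    """Turn raw (user, tag) events for one piece into relative tag weights.

    Assumptions for illustration: tags are case-folded, each user counts once
    per tag, the most frequent tag is scaled to 100, and tags scoring below
    `min_weight` are dropped as likely noise or idiosyncratic use.
    """
    distinct = {(user, tag.strip().lower()) for user, tag in tag_events}
    counts = Counter(tag for _, tag in distinct)
    if not counts:
        return {}
    top = max(counts.values())
    weights = {tag: round(100 * n / top) for tag, n in counts.items()}
    return {tag: w for tag, w in weights.items() if w >= min_weight}


# Hypothetical tagging events for a single progressive rock track.
events = [(f"u{i}", "rock") for i in range(20)]
events += [(f"u{i}", "Progressive Rock") for i in range(8)]
events += [(f"u{i}", "classical") for i in range(20, 24)]
events += [("u0", "awesome"), ("u0", "awesome")]  # repeated by one user, counted once

print(weighted_tags(events))
```

A threshold like this reduces the effect of a single user’s idiosyncratic or repeated tags. It also means that tags used by only a few listeners disappear, which can reinforce the popularity bias noted above.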
Automatic genre classification
Whether because of vaguely defined “inherent” qualities that drive preferences for certain genres of music, the precedent set by other indexing systems, or both, many studies continue to focus on using genre as a primary way of categorizing music. Starting with the work of Tzanetakis, Essl, and Cook (2001), many studies have investigated how to improve automatic genre classification. Some have developed machine learning algorithms that can assess and identify music features associated with specific genres (Annesi, Basili, Gitto, Moschitti, & Petitti, 2007); others have drawn upon repeating patterns and note durations (Karydis, Nanopoulos, & Manolopoulos, 2006), essential segments of waveforms (Sanden, Befus, & Zhang, 2008), signal representation (Grimaldi, Cunningham, & Kokaram, 2003; Li, Ogihara, & Li, 2003), chord progressions (Pérez-Sancho, Rizo, Kersten, & Ramirez, 2008), names of artists (McKay & Fujinaga, 2010), and assumptions about users’ music tastes (Grimaldi & Cunningham, 2004). Since relying on “low-level” audio features alone limits the accuracy of automatic genre classification systems, some studies suggest complementing them with “high-level” symbolic features (McKay & Fujinaga, 2010). Separating musical features like harmony and instrumentation may also be useful for automatic genre classification (Pérez-García, Pérez-Sancho, & Iñesta, 2010).
Many of the aforementioned studies (Annesi et al., 2007; Grimaldi & Cunningham, 2004; Karydis et al., 2006; Li et al., 2003), as well as others focusing on automatic genre classification (McKay & Fujinaga, 2006; McKinney & Breebaart, 2003; Pampalk, Flexer, & Widmer, 2005), acknowledge the challenges of working with vague and inconsistent conceptualizations of genre boundaries. Some research suggests that culture-based conceptualizations of genre, as well as comparisons between automatic and human genre classification, may improve the accuracy of such systems (Cunningham et al., 2003; Lippens et al., 2004; McKay & Fujinaga, 2006). Weighting of genre classifications for the same piece offers another avenue for determining “best fits,” rather than an exact fit (Wang, Huang, Wang, Liang, & Xu, 2008). User-based conceptualizations of genre could compensate for its many ambiguities (Chen, Wright, & Nejdl, 2009; Hong, Deng, & Yan, 2008), and they may act as an additional point of comparison with expert-based terminology (Sordo, Celma, Blech, & Guaus, 2008).
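The difference between an exact fit and weighted “best fits” can be shown with a deliberately simple classifier. The Python sketch below is not the method of Wang et al. (2008) or of any other study cited here. It substitutes a nearest-centroid classifier with softmax weighting over negative distances. The two features stand in for low-level audio descriptors, and the training vectors, genre set, and `temperature` parameter are hypothetical. Instead of a single label, every genre receives a weight, so a piece can be mostly rock with a substantial share of jazz.

```python
from math import dist, exp

# Hypothetical training data: genre -> feature vectors. The two features stand
# in for low-level descriptors (e.g. normalized tempo and spectral brightness).
TRAINING = {
    "classical": [(0.30, 0.20), (0.35, 0.25), (0.25, 0.30)],
    "rock":      [(0.70, 0.80), (0.75, 0.70), (0.65, 0.75)],
    "jazz":      [(0.50, 0.50), (0.55, 0.45), (0.45, 0.55)],
}


def centroids(training):
    """Mean feature vector for each genre."""
    return {
        genre: tuple(sum(column) / len(column) for column in zip(*vectors))
        for genre, vectors in training.items()
    }


def genre_weights(features, cents, temperature=0.1):
    """Weight every genre by closeness to its centroid (softmax over negative distances).

    Returns a dict ordered from best to worst fit; the weights sum to 1.
    """
    scores = {g: exp(-dist(features, c) / temperature) for g, c in cents.items()}
    total = sum(scores.values())
    ranked = sorted(scores.items(), key=lambda item: -item[1])
    return {g: s / total for g, s in ranked}


cents = centroids(TRAINING)
weights = genre_weights((0.62, 0.64), cents)
for genre, w in weights.items():
    print(f"{genre}: {w:.2f}")
```

A lower `temperature` pushes the output towards a single hard label, while a higher one spreads weight more evenly. The boundaries still come from the training labels, so weighting softens, but does not remove, the dependence on predefined genres.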
A framework for alternatives to genre classification
Whether derived from expert opinion, machine-generated algorithms, user-generated content, or some combination of the techniques mentioned above, classification by genre might become more “accurate” over time. Nonetheless, such activity downplays the perennial problem of clearly defining the characteristics and parameters of specific genres. It also does not account for the many interactions that have occurred among music and musicians associated with different genres (Headlam, 2000), or for the associations individual listeners could make among music from “opposing” categories. Some of the techniques used for improving genre-based classification, such as tagging, could also enable users to find similar music from different genres. Of course, notions of similarity will vary by individual. The following sections, which refine and expand upon previous work by Neal and Neal (2009), outline multiple perspectives from which to consider a framework for de-centralizing genre’s primacy in music categorization.
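As one hypothetical illustration of how tagging could support cross-genre discovery, the Python sketch below ranks pieces by Jaccard similarity over their tags after removing genre tags altogether. The tag sets, the genre stop-list, and the choice of Jaccard similarity are assumptions for the example, not a method proposed in the works cited. The pieces echo connections discussed in this chapter (Wagner, The Doors, Meat Loaf). With genre tags excluded, a Wagner excerpt finds its nearest neighbours among rock songs, while a classical piece that shares only the “classical” tag does not appear at all.

```python
# Hypothetical listener tags; GENRE_TAGS is an illustrative stop-list, not a standard vocabulary.
GENRE_TAGS = {"rock", "classical", "opera", "metal", "pop", "jazz", "progressive rock"}

TAGS = {
    "Ride of the Valkyries": {"classical", "opera", "epic", "bombastic", "warlike", "brass"},
    "The End": {"rock", "psychedelic", "dark", "epic", "warlike"},
    "Bat out of Hell": {"rock", "bombastic", "epic", "theatrical"},
    "Clair de lune": {"classical", "piano", "calm", "dreamy"},
}


def jaccard(a, b):
    """Size of the intersection over size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cross_genre_neighbours(title, tags, genre_tags=GENRE_TAGS):
    """Rank other pieces by overlap of their non-genre tags with `title`."""
    base = tags[title] - genre_tags
    scores = {
        other: jaccard(base, other_tags - genre_tags)
        for other, other_tags in tags.items()
        if other != title
    }
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [(other, score) for other, score in ranked if score > 0]


for other, score in cross_genre_neighbours("Ride of the Valkyries", TAGS):
    print(f"{other}: {score:.2f}")
```

Filtering genre tags out entirely is a blunt choice. A gentler variant might down-weight them instead, so that genre remains one signal among many rather than the organizing principle.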
Classical versus popular music: Deconstructing the dichotomy
A commercial for the Cadillac STS begins with German cars “waltzing” in a ballroom. A Cadillac interrupts this activity by revving its engine and careening around to Led Zeppelin’s “Rock ‘n’ Roll” (clint454, 2009). Similarly, another vehicle commercial opens briefly with a genteel classical piece portraying an exclusive golf club, which is quickly cut off by golfer Michelle Wie pulling up in a Kia Soul to a remix of the Kid Sister rap song “Pro Nails” (TheSceneBehind, 2010). These commercials are just two examples of how classical and popular music are set in opposition to each other within broader cultural contexts. Even in discussing how a “Relationship Engine” could inadvertently enable exploration among genres, Jennings (2007) assumes that classical music listeners are somehow least likely to appreciate such a system:
… it is not clear that it [the Relationship Engine] could differentiate between different schools, traditions, and periods of composition, if they shared acoustic and structural features. In some contexts that could be an advantage; it could help classical music listeners break out of hide-bound prejudices and pigeonholes, opening their ears to connections and similarities they had not previously considered. (pp. 111 – 112)
Indeed, “hide-bound prejudices” and stereotyping can apply to listeners who closely identify themselves with any genre. Furthermore, they obfuscate the interactions that have occurred between “classical” and “popular” music for many years. With his occasional references to rock in television broadcasts where he explained classical music, as well as such genre-defying theatre works as West Side Story and Mass, conductor and composer Leonard Bernstein went beyond the parameters associated with “his” music by giving serious attention and respect to more “popular” genres (Burton, 1994). In a more subtle way, so did his investigation into the possibility of using Chomskyan linguistics to outline “universal” rules for musical grammar (Bernstein, 1992). Such an ambition may have gone too far in assuming connections among different kinds of music and, by extension, cultures. Nonetheless, Bernstein’s spirit of inquiry suggests the possibility of exploring music beyond genre-based categories. While it need not reference potentially scientific or metaphysical notions of universality, such explorations should take into account individual and societal meanings of multitudinous musical signs. Beginning in the late 1950s, music producer Phil Spector drew upon Richard Wagner as inspiration. Spector did not share the composer’s interest in structural
complexity per se, but he focused on conveying a “Wagnerian” effect through recording techniques that possessed both sonic density and etherealness (Long, 2008). Spector achieved density by playing the initial track from studio speakers, with performers singing the same lyrics over the playback (Howard, 2004). To create the ethereal quality, Spector used reverberation to simulate the presence of empty space amidst sonic density (Long, 2008). As Spector’s recording company Philles Records yielded more hits in the early 1960s, he gained licence to work more intensely on refining his Wagnerian approach to rock, such as bringing in multiple bass players and setting recording volume levels that bordered on distortion. Even when some of his requests were technically impossible, studio engineers incurred Spector’s wrath for not meeting his demands. Not unlike his operatic idol, Spector tended towards self-indulgence, egoism, and arrogance (Howard, 2004). Changing musical preferences brought on by the “British Invasion,” along with Spector’s stubborn insistence on using monaural recording techniques in an increasingly stereophonic recording industry, accounted for the decline in the popularity of his recordings during the mid-1960s (Howard, 2004). As the Beatles established themselves as a premier rock band during that time, a number of conductors and composers (including Leonard Bernstein) bestowed upon them a degree of cultural validation unprecedented in the history of popular music (Kozinn, 2004). In 1967, the Beatles released Sgt. Pepper’s Lonely Hearts Club Band, which seemed to confirm such assessments. The band drew upon a variety of music styles, including traditional “classical” elements (Wagner, 2008). They also derived inspiration from avant garde composers like Karlheinz Stockhausen, whose face appears among many others on the cover of Sgt. Pepper.
Stockhausen has also been cited as an influence by Frank Zappa, The Grateful Dead, Björk, and many others (Didcock, 2005; Ross, 2007). Sgt. Pepper set the stage for other rock musicians to employ similar techniques (Holm-Hudson, 2002). Progressive rock bands emerged within this zeitgeist, drawing upon elements of classical and avant garde music, as well as jazz (Weinstein, 2002). As an example, Emerson, Lake, and Palmer (ELP) directly quoted works by a number of classical composers, including Leoš Janáček, Béla Bartók, and Modeste Mussorgsky (Headlam, 2000). In describing the extramusical significance of the band’s 1971 album Pictures at an Exhibition, Macan (2006) considers commonalities between the counterculture and nineteenth-century Romanticism, as well as the expansion of the syntax of rock to accommodate such ideals:
[ELP] wished to express the visionary, utopian, and prophetic concerns of the late-1960s / early-1970s counterculture … The last time such ideals had held strong currency with artists and with society at large … they had been expressed by composers … through the parameters of classical music culture. (pp. 193 – 194)
Producer Brian Eno, who has collaborated with David Bowie and groups like U2, has also worked with contemporary composer Philip Glass. Glass drew upon music by Bowie and Eno for his Low Symphony, and upon music by Bowie for his Heroes Symphony. Conversely, both Bowie and Eno mention Glass’ impact on their work (Kozinn, 1997; Strickland, 1997). In addition to acknowledging Glass’ influence, Bowie has mentioned how the late Romantic sensibilities of composer Richard Strauss informed the creation of his albums Buddha of Suburbia (Leseman, 2007) and Heathen (Sennett, 2002). Several commercial recordings document orchestral renditions of popular music, including Arthur Fiedler and the Boston Pops Play the Beatles (Fiedler, 1969); What If Mozart Wrote “Born to Be Wild?” (Hampton String Quartet, 1988); In from the Storm: Music of Jimi Hendrix (Hendrix, 1995); and Symphonic Elvis (Presley, 1996). Other examples of “crossover” include the multi-genre repertoire of the Kronos Quartet (Headlam, 2000), tenor Luciano Pavarotti’s collaborations with popular singers (Tsioulcas, 2007), and the album For the Stars, which features rock musician Elvis Costello and soprano Anne Sofie von Otter (Sexton, 2001). Furthermore, social networking sites enable individuals to create and share orchestral renderings of popular music, and vice versa. As an example, Rodrigo Dias composed an orchestral score for Alanis Morissette’s a cappella song “Your House” from Jagged Little Pill, mixed the original recording with samplers, and posted it to YouTube (RodrigoCFD, 2010). Whether in recording practice or concert performances, heavy metal shares affinities with progressive rock and classical music. Deep Purple musician Ritchie Blackmore cites Henry Purcell, Johann Sebastian Bach, and Antonio Vivaldi as influences on chord progressions in some of the band’s music (Walser, 1993).
Further underscoring such affinities, Elektra released S&M, a collaborative recording on which Metallica played some of its songs with the San Francisco Symphony Orchestra (Metallica, 1999). Among some heavy metal aficionados and musicians, Wagner is cited as a major influence. Manowar acknowledges his influence on the band’s website (Manowar, n.d.), and the VH1 series Metal Evolution discusses affinities between metal and Wagner (Busheikin, 2011). Known for his contributions to such albums as Meat Loaf’s Bat out of Hell, music producer and songwriter Jim Steinman cites both Wagner and The Doors as broad influences (Bats and Valkyries, n.d.). In a way, such influences relate to the musical and extramusical connections that listeners could make among different genres. Doors frontman Jim Morrison had a keen interest in philosopher Friedrich Nietzsche (Davis, 2004), with whom Wagner had a troubled friendship (Magee, 2000). Furthermore, in a conversation with Francis Ford Coppola, John Milius explains to the director of Apocalypse Now how the “warlike” qualities he perceived in both The Doors and Wagner aided him in writing the film’s script (Coppola, Wagner and The Doors, n.d.). Whatever one may think about the validity of the aforementioned connections, they demonstrate that individuals can draw upon their personal experiences to perceive seemingly unusual relationships between classical and popular music. Whether in subjective or objective terms, they hint at greater degrees of similarity than genre-based categorizations can convey. In discussing perceptions of knowledge and truth, social theorist Michel Foucault suggests that people do not rely entirely on objective or subjective perceptions of reality. Rather, intersubjectivity can encompass both aspects (Foucault, 1999). Intersubjective understandings may include interactions with, and the interplay of, various cultural “texts,” as well as the “shared meanings, conventions, and social practices operating within and between discourses, and to which an individual’s sense-making processes are inextricably linked” (Olsson, 2010, p. 66). Within the framework of music psychology, Clarke, Dibben, and Pitts (2010) suggest that music could possess multiple meanings that become apparent under specific circumstances, depending on listeners’ knowledge about music and the ways in which they interact with it.
Psychological and sociological contexts
The field of music psychology examines how people engage with music, whether as listeners, performers, or both. It can encompass topics related to cognition, such as memory and emotion, as well as broader social and cultural factors that influence how individuals interact with music. Most research in the field draws upon empirical methods, and utilizes inductive reasoning to explain findings (Clarke et al., 2010). At an elementary level, infants have some of the same cognitive maps for music as adults, and their understanding of musical structure shares some parallels with the development of language skills. As they grow and become more deeply acquainted with music, their individual experiences and participation within broader cultural contexts enable them to form an “implicit knowledge” of musical structures and meanings. Several studies have found that, as early as the age of five, children generally have high levels of agreement with adults in selecting emotional states to match specific musical cues. By the time they reach adulthood, listeners apply an array of implicit musical understandings quickly
and with relatively little effort while listening to music, to the point that they are unaware of doing so (Dowling, 1999). Such understanding relates to the variety of purposes for which individuals or groups use music, which include listening closely to a piece for aesthetic reasons or utilising it as accompaniment to a specific activity. Because such use draws upon cultural constructs, it implies some degree of interest in the music’s apparent significance, whether at individual or societal levels (Clarke et al., 2010). At a subjective level, tagging typically draws upon the contexts that an individual might associate with certain pieces of music, as well as emotion or mood (Bischoff et al., 2009; Hu & Downie, 2010; Kanters, 2009; Neal et al., 2009). Contextual tags could relate specifically to individual “uses” of music. Music can accompany quotidian tasks like housework or driving, or aid in the development of nonmusical skills, as suggested by proponents of the so-called “Mozart Effect.” That said, context could also encompass affective “uses” of music, which relate primarily to emotion or mood regulation (Clarke et al., 2010). An individual emotion-based tag like “happy” or “sad” is deceptively simple. One term can signify an implied contextual foundation for a complex interaction of mechanisms, related to specific or general associations made with pieces of music (Clarke et al., 2010). To varying degrees, these mechanisms derive from the following factors: auditory information that helps determine the sound’s source, which relates to survival; internalizing the feelings one perceives from elements of a performance; the formation of “episodic memories,” where listeners make emotional associations between music and moments in their own lives; cultural meanings attached to the music itself; and arousal responses caused by delays in anticipated musical continuations (Clarke et al., 2010; Snyder, 2000).
For these reasons, people respond differently to the same piece of music. Furthermore, the same individual’s reactions may change over time, or even from one section of a piece to another (North & Hargreaves, 2008). In terms of emotion, individual interactions with music relate to specific contexts and to degrees of physiological arousal (North & Hargreaves, 2008). Broader social and cultural factors are also important to consider. As in the case of the vehicle commercials mentioned earlier, music can reinforce social norms and signify notions of identity (Clarke et al., 2010; North & Hargreaves, 2008). In some ways, this relates to the continuing debate between formalist and social constructionist analyses of music. As the structure of Western art music became increasingly complex throughout the 1600s and 1700s, many of its adherents claimed that its “loftiness” distinguished it from other kinds of music that served specific social functions. Formalist analyses rely on the notion of “absolute music,” which holds that musical meaning can be found only within the music itself. Social constructionist
Chapter 1. Precedent or preference?
perspectives, on the other hand, suggest that musical meaning can relate directly to social, cultural, and historical contexts (Clarke et al., 2010). With interests spanning musicology, philosophy, and sociology, the social theorist Theodor W. Adorno provided a foundation for critiquing the transcendental perspective of formalism. In terms of music, he studied the cultural norms that influence its creation and use, how the “culture industry” turns music into a commodity by preying on “passive” consumers who expect both novelty and sameness, and how music can be used to reinforce power structures and marginalise difference (Adorno & Horkheimer, 1998; Adorno, 2002). In the years following his death, Adorno’s approach remained marginal within musicology. In the late 1980s and early 1990s, however, critical musicology became increasingly visible. Although many critical musicologists take issue with Adorno’s broad dismissal of popular music’s value, his notion of listener “passivity,” and his concentration on the German “classical” tradition, they still cite him as a major influence on their work. They also bring additional perspectives to the analysis of music, such as poststructuralism, gender theory, critical race theory, and queer theory, and many make a point of studying music from diverse genres (Brett, Wood, & Thomas, 1994; Kramer, 1990; McClary, 2007; Subotnik, 1991). Such approaches are potentially relevant to analysing how genre categories are constructed and how they affect recommender systems, but they remain marginal in the field of MIR (Futrelle & Downie, 2003). Just as the boundary between individual cognition and cultural influence on musical taste is difficult to discern, the relative validity of formalist and social constructionist perspectives remains ambiguous.
Even assuming that musical structure possesses some degree of autonomous meaning, cultural frames of reference still guide discussions about it. For this reason, analyses of the cultural, political, and social contexts in which music exists provide a useful alternative to the decontextualized approach of formalism. On the other hand, the uninitiated may perceive such approaches as arcane and elitist, and may believe that they overemphasise the impact of external factors on engagement with music (van der Toorn, 1995). Cognitive approaches can counterbalance the shortcomings of social constructionist analyses, but controlled experiments and abstract models of mental processes can oversimplify the ways in which listeners experience music in everyday life (Clarke et al., 2010). Despite their weaknesses, the two approaches can offer complementary perspectives on human interactions with music. As Clarke et al. (2010) point out:
… psychological and sociological perspectives on listening suggest that music is far from autonomous, performing a range of functions for individuals and groups of listeners. Research on listening also reveals the differences between the physical characteristics of musical sound, and its psychological manifestations. These differences show how our perception of the musical world is shaped by the interaction of physical, biological, psychological, and cultural factors. (p. 77)
While musical and extramusical traits hint at commonalities among many genres, so too may the shared characteristics of people who listen to different kinds of music. In a study by North (2010), participants described themselves in terms of the “Big Five” personality traits and indicated the degree to which they liked over 100 different types of music. Although the study found stronger correlations between demographic factors and preferred types of music, it also identified personality traits shared by individuals who listen to music from different genres (North, 2010). For a more general audience, an earlier Daily Mail article on North’s research highlighted the similarities between heavy metal and classical music listeners. It reported that both groups were “creative, at ease with themselves and introverted,” as well as “obsessive” about their music and possessing a “sense of theatre” (Derbyshire, 2008). Although the Daily Mail piece does not mention the stronger demographic correlations with specific types of music (North, 2010), it is worth considering how the article uses stereotypes about popular and classical music to attract readers’ attention. The headline reads, “Take note: Fans of heavy metal and classical music have a lot in common, study finds” (Derbyshire, 2008). It shows that the two types of music are still perceived as polar opposites in the broader culture, even as it uses an apparently surprising claim to subvert that view.
Music information retrieval
The previous subsections demonstrate some of the ways in which different genres interact or share common traits. A complex array of individual and cultural factors can lead composers and performers from different musical traditions to influence each other, directly and indirectly. Similar principles could apply to listeners who make musical and extramusical connections among seemingly different works. Whether due to the pervasiveness of genre-based categorization practices or to other factors, relatively little MIR research addresses the possibilities of cross-genre discovery. Nonetheless, the existing studies hint at ways in which systems could help people who wish to find
recommendations for music from an unfamiliar genre that shares similarities with favourite music from a more familiar one. Downie (2003) identifies seven facets of music that MIR system development must take into account. Four facets describe aspects of the music itself: pitch, tempo, harmony, and timbre. The editorial facet refers to both text-based and non-textual “iconic” instructions for musicians. The remaining two facets, textual and bibliographic, are text-based and relate to lyrics and metadata, respectively (Downie, 2003). Genre has traditionally functioned as a form of metadata, so it ties most closely to the bibliographic facet, although some automatic classification systems could assign genre based on other facets, including timbre (Downie, 2003). Nonetheless, the social and individual perceptions that influence the construction of genres, along with the attendant ambiguities of that process (Aucouturier & Pachet, 2003; Lippens et al., 2004), make other modes of classification worth considering. Tags based on emotion, mood, and context remain the primary alternatives to genre for music classification (Bischoff et al., 2009), but research on experimental MIR systems suggests other techniques that could reveal additional similarities. Studies led by Baumann used features such as volume, tempo, and mel-frequency cepstrum coefficients, which enabled Nearest Neighbor [sic] (NN) classifiers to pick up on cross-genre similarities (Baumann, 2003; Baumann & Dulong de Rosnay, 2004; Baumann & Halloran, 2004; Baumann, Klüter, & Norlien, 2002). Roos and Manaris (2007) developed a prototype music search engine covering over 15,000 pieces. In developing their system, they drew upon previous research that connects power law metrics with physiology, cognition, and aesthetics. This suggests that an understanding of such fields can contribute to developing systems that are more sensitive to similarities among music from different genres.
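Power-law metrics of the kind Roos and Manaris (2007) draw upon are often computed as the slope of a Zipf rank-frequency plot: count how often each musical event occurs, rank the events by frequency, and fit a line to log(frequency) against log(rank). The Python sketch below illustrates one such metric under stated assumptions; the function name `zipf_slope`, the use of MIDI pitch numbers as events, and the toy melody are hypothetical and are not taken from their search engine, whose feature set is not described at this level of detail here.

```python
import math
from collections import Counter


def zipf_slope(events):
    """Slope of the log-log rank-frequency plot for a sequence of events.

    `events` can be any sequence of hashable musical events, e.g. MIDI
    pitch numbers (an illustrative assumption). A slope near -1.0 indicates
    a Zipf-like (1/f) distribution; slopes closer to 0 mean the events are
    used more uniformly.
    """
    counts = sorted(Counter(events).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct events")
    xs = [math.log10(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log10(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Ordinary least-squares slope of log(frequency) against log(rank).
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# Hypothetical melody: pitch 60 occurs 12 times, 62 six times, 64 four
# times and 65 three times, so frequency is proportional to 1/rank.
melody = [60] * 12 + [62] * 6 + [64] * 4 + [65] * 3
print(round(zipf_slope(melody), 3))  # prints -1.0
```

A system along these lines would presumably compute many such slopes (for pitch, duration, interval, and so on) and combine them into one feature vector per piece; that design is an assumption of this sketch rather than a detail reported in the text.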
In discussing a music search experiment using their system, Roos and Manaris (2007) note an “intriguing observation”:
… that the search engine discovers similarities across established genres. For instance, searching for music similar to Miles Davis’ “Blue in Green” (Jazz), identifies a very similar (MSE 0.0043), yet obscure cross-genre match: Sir Edward Elgar’s “Chanson de Matin” (Romantic). Such matches can be easily missed even by expert musicologists. (p. 30)
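The MSE value in the quotation suggests a simple way to picture such cross-genre matches: represent each piece as a fixed-length feature vector and rank candidates by mean squared error from the query. The sketch below does this as a plain nearest-neighbour search. The `Track` class, the feature values, and the placeholder titles are invented for illustration and do not reproduce Roos and Manaris’s data or code; the optional `cross_genre_only` flag, which skips candidates carrying the query’s genre label, is likewise a hypothetical addition.

```python
from dataclasses import dataclass


@dataclass
class Track:
    title: str
    genre: str
    features: list[float]  # fixed-length vector of normalised features (hypothetical)


def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def nearest(query, collection, k=3, cross_genre_only=False):
    """Return up to k (mse, title, genre) tuples, smallest MSE first.

    With cross_genre_only=True, candidates that share the query's genre
    label are skipped, so only cross-genre matches are returned.
    """
    candidates = [
        t for t in collection
        if t is not query and not (cross_genre_only and t.genre == query.genre)
    ]
    scored = sorted((mse(query.features, t.features), t.title, t.genre) for t in candidates)
    return scored[:k]


# Toy collection with invented feature values.
collection = [
    Track("jazz_ballad", "Jazz", [0.90, 1.10, 0.80]),
    Track("modal_jazz", "Jazz", [0.91, 1.10, 0.80]),
    Track("romantic_miniature", "Romantic", [0.92, 1.08, 0.79]),
    Track("punk_song", "Rock", [0.30, 0.40, 1.90]),
]
seed = collection[0]
print(nearest(seed, collection, k=1)[0][1])                         # modal_jazz
print(nearest(seed, collection, k=1, cross_genre_only=True)[0][1])  # romantic_miniature
```

In this toy example the overall nearest neighbour of `jazz_ballad` is another Jazz track, while restricting the search to other genres surfaces `romantic_miniature`. Genre labels are used only to filter the results, never to compute similarity, which keeps the similarity measure itself genre-agnostic.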
Both works would likely appear in different sections of a genre-based physical music collection, and a musicologist might consider the cross-genre similarities between them (or even between the two musicians) only if she happened to recall one while listening to the other. Hong (2011) does not outline an underlying theoretical framework, but he uses the term “cross-genre music recommendation” to describe a project whose ultimate aim is to help people find similar music from different genres. The
stated focus has a prescriptive aspect: to guide young listeners of popular music to “classical music that they have the best chance of enjoying” (Hong, 2011, para. 5). Nonetheless, Hong (2011) suggests that future work could focus on connections among other genres, including the influence of Jazz on other types of music. The project’s practical approach to cross-genre similarities, taken together with the discoveries described by Baumann (2003; Baumann & Dulong de Rosnay, 2004; Baumann & Halloran, 2004; Baumann, Klüter, & Norlien, 2002) and Roos and Manaris (2007), provides a useful foundation for developing systems that could address the issues discussed throughout this chapter.
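The text does not say how Hong’s (2011) project computes its recommendations, so the following is a hypothetical sketch rather than a reconstruction of his work. It combines the prescriptive goal described above (steering a listener toward a chosen target genre) with the mood and context tags discussed earlier as alternatives to genre, ranking pieces in the target genre by Jaccard overlap with the tags of a piece the listener already likes. The function names, titles, and tags are all invented for illustration.

```python
def jaccard(a, b):
    """Share of tags two pieces have in common, from 0.0 to 1.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend_in_genre(seed_tags, catalogue, target_genre, k=3):
    """Rank pieces from `target_genre` by tag overlap with a seed piece.

    `catalogue` holds (title, genre, tags) triples. Results are
    (score, title) pairs, highest score first; ties are broken by title
    so the output is deterministic.
    """
    scored = [
        (jaccard(seed_tags, tags), title)
        for title, genre, tags in catalogue
        if genre == target_genre
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:k]


# Hypothetical mood/context tags for a pop song the listener already likes.
seed = {"calm", "melancholy", "late night"}
catalogue = [
    ("nocturne_example", "Classical", {"calm", "melancholy", "piano"}),
    ("march_example", "Classical", {"energetic", "festive"}),
    ("ballad_example", "Pop", {"calm", "melancholy", "late night"}),
]
print(recommend_in_genre(seed, catalogue, "Classical", k=2))
# prints [(0.5, 'nocturne_example'), (0.0, 'march_example')]
```

The target-genre filter excludes the pop ballad, and the classical piece sharing two of the seed’s mood tags ranks first. Jaccard overlap is only one of many possible tag-similarity measures; weighting tags by how often listeners apply them might be a natural refinement, but that too is an assumption.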
Conclusion
Although the literature on genre categorization rarely discusses similarities among different types of music, some studies hint at the possibility of developing systems with cross-genre recommender functions. Even research into automatic genre classification acknowledges the ambiguity of genre boundaries. Besides accounting for some “errors” within such systems, this permeability hints at prospects for developing broadly based cross-genre recommendation features. The literature on alternatives to genre-based classification, such as tagging, considers other aspects of music that may resonate more deeply with users of MIR systems, such as mood, emotion, and context. Literature on music and cognition, as well as analyses of cross-genre inclinations among musicians and listeners, could provide useful guidance for creating systems that more readily match music with individual traits, and possibly foster the development of unforeseen online music communities. Experts from various domains of knowledge, such as psychology, the social sciences, history, physiology, and information science, could help refine the ability of systems to recommend “similar” music from different genres. Furthermore, laypeople open to music from diverse genres could make their own contributions, even if they have minimal knowledge of musical language. With emerging technologies and contributions from many of these fields, the time has come to engage in active research on systems that make music recommendations that transcend genre. As Ross (2007) states:
At the beginning of the twenty-first century, the impulse to pit classical music against pop culture no longer makes intellectual or emotional sense. Young composers have grown up with pop music ringing in their ears … Likewise, some of the liveliest reactions to twentieth-century and contemporary classical music have come from the pop arena, roughly defined. (pp. 541–542)
To maximize their potential, music indexing and retrieval systems need to reflect such trends and support diverse notions of “similarity” that enable users to make broader and more unexpected music discoveries than current systems allow.
References
Adorno, T. W. (2002). Essays on music (R. Leppert, Ed.; S. H. Gillespie, Trans.). Berkeley, CA: University of California Press.
Adorno, T. W., & Horkheimer, M. (1998). The culture industry: Enlightenment as mass deception [Most of one chapter from Dialectic of Enlightenment]. Transcribed by A. Blunden. Retrieved from http://www.marxists.org/reference/archive/adorno/1944/culture-industry.htm
Amazon.com. (2011). Amazon.com help: Tags. Retrieved from http://www.amazon.com/gp/help/customer/display.html?ie=UTF8&nodeId=16238571
Annesi, P., Basili, R., Gitto, R., Moschitti, A., & Petitti, R. (2007). Audio feature engineering for automatic music genre classification. In D. Evans, S. Furui, & C. Soulé-Dupuy (Eds.), Proceedings of the Eighth Conference on Large-Scale Semantic Access to Content (pp. 702–711).
Aucouturier, J.-J., & Pachet, F. (2003). Representing musical genre: A state of the art. Journal of New Music Research, 32(1), 83–93.
Bats and Valkyries. (n.d.). (Originally published as K. Krizanovich, “Rock ‘n’ Roll’s Richard Wagner,” CD Review Magazine, Nov. 1989). Retrieved from http://www.jimsteinman.com/valkyries.htm
Baumann, S. (2003). Music similarity analysis in a P2P environment. In E. Izquierdo (Ed.), Proceedings of the Fourth European Workshop on Image Analysis for Multimedia Interactive Services. Retrieved from http://www.dfki.uni-kl.de/~baumann/pdfs/WIAMIS2003.pdf
Baumann, S., & Dulong de Rosnay, M. (2004). Music life cycle support through ontologies. In V. Barr & Z. Markov (Eds.), Proceedings of the Seventeenth International Florida Artificial Intelligence Research Society Conference. Retrieved from http://www.aaai.org/Papers/FLAIRS/2004/Flairs04-009.pdf
Baumann, S., & Halloran, J. (2004). An ecological approach to multimodal subjective music similarity perception. In R. Parncutt, A. Kessler, & F. Zimmer (Eds.), Proceedings of the First Conference on Interdisciplinary Musicology. Retrieved from http://www.informatics.sussex.ac.uk/research/groups/interact/publications/CIM04.pdf
Baumann, S., Klüter, A., & Norlien, M. (2002). Using natural language input and audio analysis for a human-oriented MIR system. In C. Busch, M. Arnold, P. Nesi, & M. Schmucker (Eds.), Second International Conference on WEB Delivering of Music. Retrieved from http://www.dfki.uni-kl.de/~baumann/pdfs/WEDEL2002.pdf
Bell, R., Bennett, J., Koren, Y., & Volinsky, C. (2009). The million dollar programming prize. IEEE Spectrum, 46(5), 28–33.
Bernstein, L. (Writer), Kraut, H. J. (Executive Producer), Burton, H. (Consulting Producer), & Smith, D. (Producer). (1992). The unanswered question: Six talks at Harvard by Leonard Bernstein [Motion Picture]. West Long Branch, NJ: Kultur.
Bischoff, K., Firan, C. S., Nejdl, W., & Paiu, R. (2009). How do you feel about “Dancing Queen”? Deriving mood & theme annotations from user tags. Proceedings of the Ninth ACM/IEEE-CS Joint Conference on Digital Libraries (pp. 285–293).
Blake, A. (2010). Ethical and cultural issues in the digital era. In A. Bayley (Ed.), Recorded music: Performance, culture, and technology (pp. 52–67). New York: Cambridge University Press.
Brett, P., Wood, E., & Thomas, C. G. (Eds.). (1994). Queering the pitch: The new gay and lesbian musicology. New York: Routledge.
Bryant, E. T., & Marco, G. A. (1985). Music librarianship: A practical guide (2nd ed.). Metuchen, NJ: Scarecrow Press.
Burton, H. (1994). Leonard Bernstein. New York: Doubleday.
Busheikin, D. (2011, November 24). Metal Evolution: Acclaimed filmmakers Sam Dunn and Scott McFadyen discuss their new television series, and why it isn’t just for metalheads. CHART Attack. Retrieved from http://www.chartattack.com/features/2011/nov/24/metal-evolution-acclaimed-filmmakers-sam-dunn-and-scott-mcfadyen-discuss-their
Chen, L., Wright, P., & Nejdl, W. (2009). Improving music genre classification using collaborative tagging data. In R. Baeza-Yates, P. Boldi, B. Ribeiro-Neto, & B. B. Cambazoglu (Eds.), Proceedings of the Second ACM International Conference on Web Search and Data Mining (pp. 84–93).
Clarke, E., Dibben, N., & Pitts, S. (2010). Music and mind in everyday life. New York: Oxford University Press.
clint454. (2009, February 26). Cadillac STS “let*s dance” commercial [Video file]. Retrieved from http://www.youtube.com/watch?v=8SE4YfmlckE
Coppola, Wagner and The Doors. (n.d.). ARTISTdirect Music [Video file]. Retrieved from http://www.artistdirect.com/video/coppola-wagner-and-the-doors/82613
Cui, B., Liu, L., Pu, C., Shen, J., & Tan, K.-L. (2007). QueST: Querying music databases by acoustic and textual features. Proceedings of the Fifteenth ACM International Conference on Multimedia (pp. 1055–1064).
Cunningham, S. J., Reeves, N., & Britland, M. (2003). An ethnographic study of music information seeking: Implications for the design of a music digital library. In C. C. Marshall, G. Henry, & L. Delcambre (Eds.), Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries (pp. 5–15). New York: ACM.
Davis, S. (2004). Jim Morrison: Life, death, legend. New York: Gotham Books.
De Pessemier, T., Deryckere, T., & Martens, L. (2009). Context aware recommendations for user-generated content on a social network site. Proceedings of the Seventh European Interactive Television Conference (pp. 133–136).
Derbyshire, D. (2008, September 5). Take note: Fans of heavy metal and classical music have a lot in common, study finds. The Daily Mail. Retrieved from http://www.dailymail.co.uk/news/article-1052606/Take-note-Fans-heavy-metal-classical-music-lot-common-study-finds.html
Didcock, B. (2005, March 27). Madman or genius? Iconic composer Karlheinz Stockhausen still divides critics after 54 years. The Sunday Herald. Retrieved from http://www.highbeam.com/doc/1P2-10007915.html
Dowling, W. J. (1999). The development of music perception and cognition. In D. Deutsch (Ed.), The psychology of music (2nd ed., pp. 603–625). New York: Academic Press.
Downie, J. S. (2003). Music information retrieval. In B. Cronin (Ed.), Annual Review of Information Science and Technology 37 (pp. 295–340). Medford, NJ: Information Today.
Fiedler, A. (Cond.). (1969). Arthur Fiedler and the Boston Pops play the Beatles. Boston Pops Orchestra. (2000). New York: RCA Records.
Foucault, M. (1999). What is an author? (R. Hurley, Trans.). In J. D. Faubion (Ed.), The essential Foucault: Selections from essential works of Foucault, 1954–1984 (Volume 2) (pp. 377–391). New York: The New Press.
Futrelle, J., & Downie, J. S. (2003). Interdisciplinary communities and research issues in music information retrieval: ISMIR 2000–2002. Journal of New Music Research, 32(2), 121–131.
Grimaldi, M., & Cunningham, P. (2004). Experimenting with music taste prediction by user profiling. Proceedings of the Sixth ACM SIGMM International Workshop on Multimedia Information Retrieval (pp. 173–180).
Grimaldi, M., Cunningham, P., & Kokaram, A. (2003). A wavelet packet representation of audio signals for music genre classification using different ensemble and feature selection techniques. Proceedings of the Fifth ACM SIGMM International Workshop on Multimedia Information Retrieval (pp. 102–108).
Gronow, P., & Saunio, I. (1998). An international history of the recording industry (C. Moseley, Trans.). New York: Cassell.
Hampton String Quartet. (1988). What if Mozart wrote “Born to be wild?” [CD]. New York: RCA Red Seal.
Headlam, D. (2000). Re-drawing boundaries: The Kronos Quartet. Contemporary Music Review, 19(1), 113–140.
Hendrix, J. (1995). In from the storm: Music of Jimi Hendrix. London Metropolitan Orchestra. [New York]: RCA Victor.
Holm-Hudson, K. (2002). Introduction. In K. Holm-Hudson (Ed.), Progressive rock reconsidered (pp. 1–18). New York: Routledge.
Hong, E. (2011). Cross-genre music recommendation project – Evan Hong – Ithaca College. Retrieved from http://eportfolios.ithaca.edu/ehong1/cgmr/
Hong, J., Deng, H., & Yan, Q. (2008). Tag-based artist similarity and genre classification. In J. Wang (Ed.), Proceedings of the First IEEE International Symposium on Knowledge Acquisition and Modeling Workshop (pp. 628–631). Piscataway, NJ: IEEE.
Howard, D. (2004). Sonic alchemy: Visionary music producers and their maverick recordings. Milwaukee, WI: Hal Leonard Corporation.
Hu, X., & Downie, J. S. (2010). When lyrics outperform audio for music mood classification: A feature analysis. In J. S. Downie & R. C. Veltkamp (Eds.), Proceedings of the Eleventh International Society for Music Information Retrieval Conference (pp. 619–624). Retrieved from http://ismir2010.ismir.net/proceedings/ismir2010-106.pdf
Jennings, D. (2007). Net, blogs, and rock ‘n’ roll: How digital discovery works and what it means for consumers, creators, and culture. Boston, MA: Nicholas Brealey Publishing.
Kanters, P. W. M. (2009). Automatic mood classification for music (Unpublished master’s thesis). Tilburg Centre for Creative Computing, Tilburg University, Tilburg, The Netherlands.
Karydis, I., Nanopoulos, A., & Manolopoulos, Y. (2006). Symbolic musical genre classification based on repeating patterns. In K. Nahrstedt, M. Turk, Y. Rui, W. Klas, & K. Mayer-Patel (Eds.), Proceedings of the Fourteenth ACM International Conference on Multimedia (pp. 53–57).
Kennedy, M., & Kennedy, J. B. (Eds.). (2007). The concise Oxford dictionary of music. New York: Oxford University Press.
Kozinn, A. (1997). The touring composer as keyboardist (1980). In R. Kostelanetz (Ed.) & R. Flemming (Asst. Ed.), Writings on Glass: Essays, interviews, criticism (pp. 102–108). New York: Schirmer Books.
Kozinn, A. (2004, February 6). Critic’s notebook: They came, they sang, they conquered. New York Times. Retrieved from www.nytimes.com/2004/02/06/arts/music/06BEAT.html
Kramer, L. (1990). Music as cultural practice, 1800–1900. Berkeley, CA: University of California Press.
Lamere, P. (2008). Social tagging and music information retrieval. Journal of New Music Research, 37(2), 101–114.
Law, E., & von Ahn, L. (2009). Input-agreement: A new mechanism for collecting data using human computation games. In D. Olsen, R. Arthur, K. Hinckley, M. Morris, S. Hudson, & S. Greenberg (Eds.), Proceedings of the Twenty-Seventh Annual ACM Conference on Computer Human Interaction (pp. 1197–1206).
Leseman, L. (2007, October 17). David Bowie, Buddha of Suburbia. Houston Free Press. Retrieved from http://www.houstonpress.com/2007-10-18/music/david-bowie/
Li, T., Ogihara, M., & Li, Q. (2003). A comparative study of content-based music genre classification. In J. Callan, F. Crestani, & M. Sanderson (Eds.), Proceedings of the Twenty-Sixth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 282–289).
Linden, G., Smith, B., & York, J. (2003). Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing, 7(1), 76–79.
Lippens, S., Martens, J. P., Leman, M., Baets, B., & Meyer, H. (2004). A comparison of human and automatic musical genre classification. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (pp. 233–236). Piscataway, NJ: IEEE.
Long, M. (2008). Beautiful monsters: Imagining the classic in musical media. Berkeley, CA: University of California Press.
Macan, E. (2006). Endless enigma: A musical biography of Emerson, Lake and Palmer. Chicago: Open Court.
Magee, B. (2000). The Tristan chord: Wagner and philosophy. New York: Metropolitan Books.
Manowar – The Kingdom of Steel. (n.d.). Retrieved from http://www.manowar.com/biography.html
McClary, S. (2007). Reading music: Selected essays. Ashgate Contemporary Thinkers on Critical Musicology Series. Burlington, VT: Ashgate Pub. Company.
McKay, C., & Fujinaga, I. (2006). Musical genre classification: Is it worth pursuing and how can it be improved? In Proceedings of the Seventh International Conference on Music Information Retrieval (pp. 101–106). Retrieved from http://www.music.mcgill.ca/~cmckay/papers/musictech/ISMIR_2006_Genre.pdf
McKay, C., & Fujinaga, I. (2010). Improving automatic music classification performance by extracting features from different types of data. In Proceedings of the Eleventh ACM SIGMM International Conference on Multimedia Information Retrieval (pp. 257–265). New York: ACM.
McKinney, M. F., & Breebaart, J. (2003). Features for audio and music classification. Proceedings of the Fourth International Conference on Music Information Retrieval. Retrieved from http://ismir2003.ismir.net/papers/McKinney.PDF
McKnight, M. (2002). Music classification systems. Music Library Association Basic Manual Series, No. 1. Lanham, MD: Scarecrow Press.
Metallica. (1999). S & M. San Francisco Symphony Orchestra. M. Kamen (Cond.). New York: Elektra.
Mitri, G. (2004). Automatic music classification problems. In V. Estivill-Castro (Ed.), Twenty-Seventh Australasian Computer Science Conference (pp. 315–322). Sydney: Australian Computer Society.
Neal, D., Campbell, A., Neal, J., Little, C., Stroud-Matthews, A., Hill, S., & Bouknight-Lyons, C. (2009). Musical facets, tags, and emotion: Can we agree? Proceedings of the Fourth Annual iConference. Retrieved from http://publish.uwo.ca/~dneal2/musictagging_neal.pdf
Neal, J., & Neal, D. (2009). Answering the unanswered question? Contextualizing a holistic theoretical framework for cross-genre music information retrieval. Proceedings of the American Society for Information Science and Technology, 46(1).
North, A. C. (2010). Individual differences in musical taste. The American Journal of Psychology, 123(2), 199–208.
North, A. C., & Hargreaves, D. J. (2008). The social and applied psychology of music. New York: Oxford University Press.
Olsson, M. R. (2010). Michel Foucault: Discourse, power/knowledge, and the battle for truth. In G. J. Leckie, L. M. Given, & J. E. Buschman (Eds.), Critical theory for library and information science: Exploring the social from across the disciplines (pp. 63–74). Westport, CT: Libraries Unlimited.
Pachet, F., & Cazaly, D. (2000). A taxonomy of musical genres. Proceedings of the Content-Based Multimedia Information Access Conference. Retrieved from http://www.csl.sony.fr/downloads/papers/2000/pachet-riao2000.pdf
Pampalk, E., Flexer, A., & Widmer, G. (2005). Improvements of audio-based similarity and genre classification. In Proceedings of the Sixth International Conference on Music Information Retrieval. Retrieved from http://www.ofai.at/~elias.pampalk/publications/pam_ismir05.pdf
Pampalk, E., Rauber, A., & Merkl, D. (2002). Content-based organization and visualization of music archives. In L. Rowe, B. Mérialdo, M. Mühlhäuser, K. Ross, & N. Dimitrova (Eds.), Proceedings of the Tenth ACM International Conference on Multimedia (pp. 570–578).
Pandora. (2011). About the Music Genome Project®. Retrieved from http://www.pandora.com/corporate/mgp
Papakhian, A. R. (2000). Cataloging. In R. Griscom (Ed.) & A. Maple (Asst. Ed.), Music librarianship at the turn of the century (pp. 19–28). Music Library Association Technical Reports, No. 27. Lanham, MD: Scarecrow Press.
Pérez-García, T., Pérez-Sancho, C., & Iñesta, J. M. (2010). Harmonic and instrumental information fusion for musical genre classification. Proceedings of the Third International Workshop on Machine Learning and Music (pp. 49–52). New York: ACM.
Pérez-Sancho, C., Rizo, D., Kersten, S., & Ramirez, R. (2008). Genre classification of music by tonal harmony. Intelligent Data Analysis, 14(5), 533–545.
Presley, E. (1996). Symphonic Elvis. Memphis Symphony Orchestra. E. Stratta (Cond.). [Portland, OR]: Teldec.
Rehbach, J. (1989). Computer technology in the music library. In A. Mann (Ed.), Modern music librarianship: Essays in honor of Ruth Watanabe (pp. 123–132). Festschrift Series No. 8. Stuyvesant, NY: Pendragon Press.
Rivlin, G. (2005, July 10). A retail revolution turns 10. The New York Times. Retrieved from brianjones.org/images/uploads/amazon_NYT.pdf
RodrigoCFD [Dias, R.]. (2010, September 25). Alanis Morissette with symphonic orchestra – Your house [Video file]. Retrieved from http://www.youtube.com/watch?v=6zRKbudoe9k
Roos, P., & Manaris, B. (2007). A music information retrieval approach based on power laws. Proceedings of the Nineteenth IEEE International Conference on Tools with Artificial Intelligence (pp. 27–31).
Ross, A. (2004, February 16). Listen to this. The New Yorker. Retrieved from http://www.newyorker.com/archive/2004/02/16/040216fa_fact4#ixzz1Rvj88ZjM
Ross, A. (2007). The rest is noise: Listening to the twentieth century. New York: Farrar, Straus, and Giroux.
Sanden, C., Befus, C., & Zhang, J. (2008). Clustering-based genre prediction on music data. In Proceedings of the Canadian Conference on Computer Science and Software Engineering (pp. 117–119). New York: ACM.
Sennett, S. (2002, May 13). Classic Bowie. Sydney Morning Herald. Retrieved from http://www.smh.com.au/articles/2002/05/13/1021002431619.html
Sexton, P. (2001). Costello, Von Otter duet on DG. Billboard, 1, 83.
Snyder, B. (2000). Music and memory: An introduction. Cambridge, MA: The MIT Press.
Sordo, M., Celma, O., Blech, M., & Guaus, E. (2008). The quest for musical genres: Do the experts and the wisdom of crowds agree? Proceedings of the Ninth International Conference on Music Information Retrieval (pp. 255–260). Retrieved from http://ismir2008.ismir.net/papers/ISMIR2008_267.pdf
Spiteri, L. F. (2006). The use of folksonomies in public library catalogues. The Serials Librarian, 51(2), 75–89.
Stevenson, G. (1973). Classification chaos. In C. J. Bradley (Ed.), Reader in music librarianship (pp. 273–278). Readers in Library and Information Science Series. Washington, DC, USA: Microcard Editions Books. (Original article published in Library Journal, 15 Oct. 1963).
Strickland, E. (1997). Minimalism: T. In R. Kostelanetz (Ed.) & R. Flemming (Asst. Ed.), Writings on Glass: Essays, interviews, criticism (pp. 113–128). New York: Schirmer Books.
Subotnik, R. R. (1991). Developing variations: Style and ideology in western music. Minneapolis, MN: University of Minnesota Press.
Symeonidis, P., Nanopoulos, A., & Manolopoulos, Y. (2008). Providing justifications in recommender systems. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, 38(6), 1262–1272.
TheSceneBehind. (2010, September 14). Kia Soul + Michelle Wie “parking lot” [Video file]. Retrieved from http://www.youtube.com/watch?v=M4Dod-bMtDo
Tsioulcas, A. (2007, September 15). Luciano Pavarotti: 1935–2007. Billboard, 9.
Tzanetakis, G., Essl, G., & Cook, P. (2001). Automatic musical genre classification of audio signals. Proceedings of the Second International Symposium on Music Information Retrieval. Retrieved from http://ismir2001.ismir.net/pdf/tzanetakis.pdf
Umesh, U. N., Huynh, M. Q., & Jessup, L. (2005). Creating successful entrepreneurial ventures in IT. Communications of the ACM, 48(6), 82–87.
van der Toorn, P. (1995). Music, politics, and the academy. Berkeley, CA: University of California Press.
Vig, J., Soukup, M., Sen, S., & Riedl, J. (2010). Tag expression: Tagging with feeling. Proceedings of the Twenty-Third Annual ACM Symposium on User Interface Software and Technology (pp. 323–332).
Wagner, N. (2008). The Beatles’ psycheclassical synthesis: Psychedelic classicism and classical psychedelia in Sgt. Pepper. In O. Julien (Ed.), Sgt. Pepper and the Beatles: It was forty years ago today (pp. 75–90). Burlington, VT: Ashgate.
Walser, R. (1993). Running with the Devil: Power, gender, and madness in heavy metal music. Music/Culture (Series). G. Lipsitz, S. McClary, & R. Walser (Eds.). Hanover, NH: Wesleyan University Press.
Wang, L., Huang, S., Wang, S., Liang, J., & Xu, B. (2008). Music genre classification based on multiple classifier fusion. Fourth International Conference on Natural Computation (pp. 580–583).
Weinstein, D. (2002). Progressive rock as text: The lyrics of Roger Waters. In K. Holm-Hudson (Ed.), Progressive rock reconsidered (pp. 91–109). New York: Routledge.
Elaine Ménard
Chapter 2. Multilingual taxonomy development for ordinary images: Issues and challenges
Abstract: This chapter explores the issues and challenges arising from the development of a bilingual taxonomy for ordinary images. The first section of the chapter describes the general steps in taxonomy creation, based on a user-and-task-adapted approach. The major steps are presented in terms of a best fit for the complex metadata needed for image organization. The second section introduces the phases leading to the development of a multi-faceted, bilingual (English/French) taxonomy used as a visual representation for an image database. The methodology adopted for developing the new vocabulary structure is described in detail, as are some major resulting challenges.
Keywords: Taxonomy, image indexing, image retrieval, image description, controlled vocabularies, bilingual environment, multilingualism, methodological approach, language issues, tagging
Elaine Ménard, Assistant Professor, McGill University, School of Information Studies
1 Introduction
As many of us have discovered, the Web is a wonderful source for the image hunter. Images are found in most Web resources, such as personal pages, museum collections, digital libraries, catalogues of commercial products and services, government information and so on. However, finding what you want can be a bit of a hit-and-miss affair (Ménard, 2009). At a certain level of abstraction, textual documents and images can be thought of as very similar, since both express specific ideas and concepts. In general, we consider the image to be language independent. However, image indexing gives the image a linguistic status similar to that of any text document, which may affect retrieval. Compared with text-based retrieval, image retrieval poses specific challenges. The first difficulty comes from translating the visual representation of an object into a textual description. Given the possibility of multiple interpretations of the visual resource, there is
serious risk of ambiguity and error. In other words, image searchers will not necessarily describe and search for an image using the same concepts or the same words. Moreover, perhaps the most significant difference between textual and image retrieval is that users are more inclined to browse when searching for images. As a result, users will also check the associated textual metadata to decide whether an image is relevant. Consequently, in a multilingual environment, the opportunity to browse textual metadata in a language the user understands is important. The quality of the indexing language plays a major part in digital image retrieval. The principal problem with most languages used for image indexing remains the difficulty of translating certain specific elements of the language (e.g., expressions, compound words, proper names). Unfortunately, even though the retrieved images themselves require no translation to be understood and can be used immediately, language barriers still prevent users from accessing them. Although tools and systems for organizing and disseminating visual resources have improved significantly in recent years, especially with the growth of the Internet and the inclusion of resources written or indexed in many different languages, image retrieval still presents its share of obstacles for many Web image searchers. There is a substantial need for subject and pre-iconographic indexing in systems intended to serve a broad range of user queries. Text-based image indexing and retrieval have been studied extensively for decades.
Several researchers (Armitage & Enser, 1997; Choi & Rasmussen, 2002, 2003; Chung & Yoon, 2009; Conniss, Ashford & Graham, 2000, 2003; Enser, 2007, 2008; Goodrum & Spink, 2001; Greisdorf & O’Connor, 2008; Jörgensen, 1998, 2003; Krause, 1988; Markey, 1988; Markkula & Sormunen, 2000; Matusiak, 2006; Ménard, 2008; Panofsky, 1955; Rorissa, 2008; Shatford, 1986; Stvilia & Jörgensen, 2009; Turner, 1994, 1998) have described the considerable amount of indexing work that accompanies image organization. Some theories of indexing relate to concepts such as “aboutness” or subject in different ways, while others relate differently to theories of meaning, language and interpretation. When examining the different indexing approaches, it is evident that most images are minimally indexed and often offer only a single point of access (Jörgensen, 1998). Several studies have shown that most indexing approaches are not suitable for image searchers (Besser & Snow, 1990; Roddy, 1991), while other studies have emphasized that the main problem in image retrieval is the approach chosen for the indexing process (Krause, 1988; Ohlgren, 1980; Turner, 1993). With a text-based approach, indexing terms can either be extracted directly from natural language or be drawn from a controlled vocabulary. On the one hand, indexing with uncontrolled vocabularies seems very desirable since it
shows its close relationship with real users and the way they see and describe things (Matusiak, 2006; Chung & Yoon, 2009; Stvilia & Jörgensen, 2009). However, despite its growing popularity, and much like indexing with a controlled vocabulary, uncontrolled vocabulary indexing also has several shortcomings. For example, many ambiguities emerge because different individuals use the same keyword in different contexts. In the same vein, the lack of synonym control leads to many different keywords being used to describe the same concept. On the other hand, while controlled vocabularies created with terms extracted from natural language can help reduce problems caused by natural language, such as polysemy and synonymy (Hudon, 2003, 2006; McClung, 2009), they are often not sufficiently specific and do not provide the necessary access points for the full range of users (Greisdorf & O’Connor, 2008). Controlled vocabularies are used to ensure consistent indexing, particularly when indexing multiple documents, periodical articles, websites and so on. Their main advantage is their ability to promote consistency and to increase the probability of matching the words chosen by the indexer with those of the image searcher (Jörgensen, 2003), which can improve the image retrieval process. Among controlled vocabularies, thesauri and ontologies have long been used in digital collections and are often considered by experts to be better suited for retrieval and discovery. A thesaurus is a controlled vocabulary created with terms extracted from natural language and designed specifically for post-coordinate searching. An ontology is defined as an “explicit specification of a conceptualization” (Gruber, 1993, p. 1). It is used to formulate a thorough and rigorous representation of a domain by identifying and defining all concepts and the relations between them. Thesauri and ontologies have been gradually adopted in digital projects.
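To make synonym control concrete, here is a minimal Python sketch of how a thesaurus can map free tags onto preferred terms. The thesaurus fragment, its descriptors and the field names (`en`, `fr`, and `uf` for “used for” entry terms) are hypothetical illustrations, not drawn from any vocabulary discussed in this chapter.

```python
# Hypothetical bilingual thesaurus fragment: each descriptor has an English and a
# French preferred label plus non-preferred entry terms ("uf" = "used for").
THESAURUS = {
    "cars": {"en": "cars", "fr": "automobiles",
             "uf": ["car", "auto", "automobile", "voiture", "voitures"]},
    "cats": {"en": "cats", "fr": "chats",
             "uf": ["cat", "kitten", "chat", "chaton"]},
}

def build_entry_index(thesaurus):
    """Map every lower-cased preferred label or entry term to its descriptor."""
    index = {}
    for descriptor, entry in thesaurus.items():
        for term in [entry["en"], entry["fr"], *entry["uf"]]:
            index[term.lower()] = descriptor
    return index

def normalize_tags(tags, index):
    """Replace free tags by descriptors (synonym control).

    Returns (sorted descriptors found, tags with no thesaurus entry)."""
    matched, unmatched = set(), []
    for tag in tags:
        descriptor = index.get(tag.strip().lower())
        if descriptor is None:
            unmatched.append(tag)  # left for an indexer to review
        else:
            matched.add(descriptor)
    return sorted(matched), unmatched
```

For example, `normalize_tags(["Car", "voiture", "kitten", "sunset"], build_entry_index(THESAURUS))` returns `(["cars", "cats"], ["sunset"])`: English and French variants collapse onto the same descriptors, and the unmatched tag is set aside. A plain lookup like this resolves synonymy but not polysemy: the entry term “chat” always maps to cats, even when an English-speaking tagger meant a conversation.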
However, thesauri and ontologies often represent concepts in an artificial way (Macgregor & McCulloch, 2006), using terms that are linguistically correct but rarely used by individuals in everyday life. Controlled vocabulary standards, classification systems and taxonomies can assist in both the cataloguing and the retrieval of image collections. Taxonomies have recently emerged as powerful tools that provide helpful information to indexers or information specialists describing works of art, architecture, material culture, archival materials, visual surrogates or bibliographic materials. Taxonomy development involves the participation of many specialties, from IT staff to librarians. It also entails an emphasis on user tasks and an understanding of various audience perspectives, as well as their contexts and processes. In this chapter, the issues and challenges arising from the development of a bilingual taxonomy are explored. The first section of the chapter presents the general steps in taxonomy creation, based on a user-and-task-adapted approach.
The major taxonomy creation approach is described in terms of a best fit for the complex metadata needed for image organization. The second section explains the phases leading to the development of a multi-faceted, bilingual (English/French) taxonomy used as a visual representation for an image database. The methodology adopted for developing the new vocabulary structure is described in detail, as are some major resulting challenges.
2 General steps in taxonomy creation
In the 1970s and 1980s, many cognitive scientists studied the way in which people naturally classify and categorize things. Rosch (1978) defined two principles of categorization. The first is cognitive economy, which states that classifications exist to provide the maximum amount of information with the least cognitive effort. The second is perceived world structure: the world as we perceive it comes to us as structured information rather than in an arbitrary manner. According to Lambe (2007, p. 4), a taxonomy is a “structured set of names and descriptions used to organize information and documents in a consistent way.” The basic processes of taxonomy development described here are derived from Hedden (2010a), Lambe (2007), Pincher (2010), and Whittaker and Breininger (2008). In general, a thorough analysis of the context using several resources is a crucial step in taxonomy development. The information should be organized in a manner that offers cognitive economy in addition to a definite hierarchical structure. More specifically, the ideal taxonomy should include:
– Universal categories, which appropriately represent the visual resources in order to increase the effectiveness and efficiency of user browsing and searching
– Categories that present the appropriate level of granularity in order to improve the effectiveness and efficiency of user searches
– Clear definitions available to potential users in order to enhance users’ satisfaction with the taxonomy
The choice of top-level categories and their subcategories often involves combining two parallel approaches: a bottom-up approach and a top-down approach. Starting with a list of potential categories resulting from domain analysis, relevant values are extracted. These values are then clustered into a limited number of classes.
A more conceptual analysis of the image database also needs to be carried out, inspired by a review of best practices. The authoritative sources
analysed previously are used for the selection and definition of the top-level categories and their subcategories. This phase of taxonomy development is incremental, combining a bottom-up approach and a top-down strategy at different stages of the project. According to ISO 9241-11 (Association française de normalisation, 1998), the standards related to usability are primarily concerned with: (1) the use of the product (effectiveness, efficiency and satisfaction in a particular context of use); (2) the user interface and interaction; (3) the processes used to develop the product; and (4) the capability of an organization to apply user-centred design. Following the preliminary analysis of best practices, the organization of categories should aim to facilitate learning, help users remember how to use the taxonomy structure, reduce the likelihood of errors and increase efficiency and effectiveness in order to deliver a higher degree of user satisfaction. More specifically, the general organization of the taxonomy should:
– Take into account the user’s perspective in order to increase the level of satisfaction
– Limit the number of choices presented to the user in order to avoid frustration and increase the degree of satisfaction
– Keep the interface, searching and browsing functionalities as simple as possible in order to improve the efficiency and effectiveness of user searches
Term clustering and entity listing involve breaking up the taxonomy domain into elemental classes from which categories/facets are selected. Each class is clearly and simply defined in order to enhance the satisfaction of taxonomy users. Simple, short definitions that can later be published within the browsing interface are recommended.
The next stage involves organizing the taxonomy into a coherent structure, produced simultaneously with the choice and definition of top-level categories and their subcategories. Both the number of top-level facets and the depth of the taxonomic structure are kept small: the number of first-level facets is generally restricted to seven, which is close to the nine top-level facets regularly reported in the literature (Lambe, 2007), and the depth is usually restricted to four levels, extending to five in some instances. In the case of a bilingual taxonomy, the categories and their descriptors should, as much as possible, be developed simultaneously in both languages. The last stage of the facet and subfacet selection process entails normalizing the facets and their descriptors using authoritative sources. In addition, it is important to keep in mind that the taxonomy is often meant to be used by the average user, so the vocabulary should be kept as simple as possible. Finally, the ideal development process should include usability tests conducted with real users in order to address the challenges regularly encountered in the usability testing of taxonomies.
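The structural guidelines above (about seven first-level facets, a depth of four levels, labels developed in both languages) lend themselves to an automatic check. The following Python sketch assumes a simple node type with hypothetical `en`/`fr` label fields; the limits come from the text, while the labels in the usage example are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    en: str                                    # English label
    fr: str                                    # French label
    children: list = field(default_factory=list)

MAX_TOP_LEVEL = 7  # first-level facets, per the guideline in the text
MAX_DEPTH = 4      # usual limit; some branches may justify 5

def depth(node):
    """Number of levels in the subtree rooted at node (a leaf counts as 1)."""
    return 1 + max((depth(c) for c in node.children), default=0)

def check_taxonomy(facets, max_depth=MAX_DEPTH):
    """Return human-readable problems; an empty list means no rule was broken."""
    problems = []
    if len(facets) > MAX_TOP_LEVEL:
        problems.append(f"{len(facets)} top-level facets (limit {MAX_TOP_LEVEL})")
    # Walk every node and require a non-empty label in both languages.
    stack = [(f, [f.en or f.fr]) for f in facets]
    while stack:
        node, path = stack.pop()
        if not node.en.strip() or not node.fr.strip():
            problems.append("missing label at " + " > ".join(path))
        stack.extend((c, path + [c.en or c.fr]) for c in node.children)
    for f in facets:
        if depth(f) > max_depth:
            problems.append(f"facet '{f.en}' is {depth(f)} levels deep (limit {max_depth})")
    return problems
```

For instance, `check_taxonomy([Node("Animals", "Animaux", [Node("Pets", "Animaux de compagnie", [Node("Cats", "Chats")])])])` returns an empty list, and passing `max_depth=5` tolerates the occasional five-level branch. Such a check only catches structural violations; whether the labels are simple and well chosen still has to be judged by people.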
3 Development of an image taxonomy model
This section presents the methodological approach of a research project that aims to develop a taxonomy for digital image indexing that could be used to enhance retrieval in monolingual and multilingual contexts. The different phases leading to the development of a multi-faceted, bilingual (English/French) taxonomy used as a visual representation for an image database are described, as are some issues and challenges identified during the different stages of the development.
3.1 Image database creation
Finding an ideal image collection for the development of a taxonomy is not an easy task, since most image databases, like any other type of document, are subject to copyright. Among the solutions considered for this research project, it was decided to create a database of ordinary images from digital images freely donated by their owners. The images were collected between May 1, 2011 and July 15, 2011. A total of 14 donors agreed to give copies of their pictures to constitute the database, called IDOL (Images DOnated Liberally). All donations were entirely voluntary, and participants could withdraw their donation from the project at any point. It was established that the pictures, as well as the information identifying them, would remain confidential. Each donor received a personal ID number, and only the Principal Investigator and her research assistants had access to this information. No identifying information about a donor would be published. Moreover, no compensation was offered in exchange for copies of the pictures; however, a draw was held among all donors once the image collection was completed. For the purposes of our research, we first collected 8,932 images. In a second step, we reduced the number of images to 5,439. This reduction served three main objectives: (1) eliminating images of poor visual quality; (2) selecting images that could be easily identified by a majority of individuals; and (3) diversifying the images included in the database. Consequently, some images were excluded from the database mainly because of their poor quality (e.g., images that were fuzzy or out
of focus). Other images were removed because they were considered to be outside the scope of the taxonomy (e.g., wall graffiti). Finally, following Jörgensen (2003), we considered that the image database would be enhanced by including images representing a wide variety of contents (Figure 1). Therefore, during the creation of the image database, we pursued this goal rather than trying to include a large quantity of images representing the same content (weddings, beaches, babies, cats, etc.). For example, the object “flower” was available in many different varieties (as of July 15, 2011, the end date of the collection). However, an examination of the different varieties of flowers did not reveal enough visual diversity to affect image indexing and retrieval. It was therefore decided that, where an object was represented many times, a maximum of 50 images showing the same content would be included in the database.
Figure 1. Examples of images included in the IDOL database.
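The selection rules described above can be expressed as a simple filter. In this Python sketch, the keys `sharp` and `in_scope` are hypothetical flags standing for the researchers’ manual judgements of visual quality and scope, and `content` is a hypothetical label for what an image shows. The first 50 images encountered for each content are kept; the chapter does not say how the 50 were chosen, and the “easily identifiable” criterion is not modelled.

```python
from collections import defaultdict

MAX_PER_CONTENT = 50  # cap on images showing the same content, as in the text

def select_images(images, max_per_content=MAX_PER_CONTENT):
    """images: iterable of dicts with hypothetical keys 'id', 'content', 'sharp', 'in_scope'.

    Returns the IDs kept, in input order."""
    kept, per_content = [], defaultdict(int)
    for img in images:
        if not img["sharp"] or not img["in_scope"]:
            continue  # drop fuzzy / out-of-focus or out-of-scope images
        if per_content[img["content"]] >= max_per_content:
            continue  # avoid over-representing a single content
        per_content[img["content"]] += 1
        kept.append(img["id"])
    return kept
```

With 60 sharp flower images, one blurry cat, one graffiti image and one beach, the function keeps 51 IDs: the first 50 flowers and the beach.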
3.2 General methodological approach
For the development of a bilingual taxonomy model, a three-step methodology is proposed. First, a best practices review, consisting of an extensive analysis of existing terminology standards in image indexing, needs to be performed. The objectives of this examination are to acquire knowledge of users’ terminology standards and to assess how this terminology can be integrated into the development of the taxonomy. For this review, specialized terminologies used by professional indexers (such as the Art and Architecture Thesaurus and Iconclass) have been examined. Moreover, with social tagging becoming more popular and users willing to provide annotations and tags for images, the study of tags available in image-sharing systems (such as Flickr and Photobucket) also provides a basis for identifying patterns and trends in the types of terms employed by real users when tagging or indexing their personal images. The advantage of using these resources is that they generally group together tags (or uncontrolled indexing terms) in multiple languages. The analysis of indexing terms of ordinary
images used by both indexing specialists and non-specialists represents a crucial step in taxonomy development, since it generates the basic guidelines and standards for the categories and formats of terms, and for the construction of the relationships to be included in the new bilingual taxonomy. The second step of the methodology entails the formal structuring of the taxonomy. It involves choosing the top-level categories and their subcategories. Two parallel approaches are usually combined: a bottom-up approach and a top-down approach. Starting from a list of potential concepts resulting from the best practices review, the relevant values are extracted and clustered into a limited number of categories. These categories form the basis from which subcategories and candidate descriptors are selected. The listed concepts are then organized using various techniques, including taxonomy perspectives (top-down); shared characteristics of concepts, particularly those drawn from users or content (bottom-up); and high-level unification (top-down). These processes identify gaps that should be filled, as well as the hierarchy and the types of alternative entry points and views. As a result, the selected concepts can be organized into a coherent structure produced simultaneously with the choice and definition of top-level categories and their subcategories. The number of top-level categories, as well as the depth of the taxonomic structure, needs to be kept to a minimum in order to avoid frustration and increase the satisfaction of end users. Since the taxonomy is also meant to be a bilingual tool, the selected concepts and their descriptors are developed simultaneously, as much as possible, in English and French. At this stage, the taxonomy includes essential facets that appropriately represent ordinary images in order to increase the effectiveness and efficiency of user browsing and searching.
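The bottom-up part of this step, extracting candidate terms from user tags, can be sketched as a frequency count. The Python function below is an illustrative assumption rather than the project’s documented procedure: it lower-cases tags, counts each tag at most once per image, and keeps terms used for at least `min_images` images (a hypothetical threshold).

```python
from collections import Counter

def candidate_terms(tagged_images, min_images=2):
    """tagged_images: one list of free tags per image.

    Returns (term, number of images) pairs for terms used on at least
    `min_images` images, most frequent first, ties in alphabetical order."""
    counts = Counter()
    for tags in tagged_images:
        # A set, so a tag repeated on the same image is counted once.
        counts.update({t.strip().lower() for t in tags if t.strip()})
    kept = [(term, n) for term, n in counts.items() if n >= min_images]
    return sorted(kept, key=lambda item: (-item[1], item[0]))
```

For the four tagged images `[["Beach", "sea", "sunset"], ["beach", "Sea"], ["cat", "sofa"], ["Sunset", "beach"]]`, it returns `[("beach", 3), ("sea", 2), ("sunset", 2)]`. Clustering these candidates into categories, and pairing English and French terms, remain the top-down, intellectual part of the work.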
The categories also offer the appropriate level of granularity in order to improve user searches. Finally, clear, readily available definitions of the concepts included in the taxonomy are provided in order to enhance user satisfaction. Since the taxonomy development process consists of several iterative steps, incremental user testing needs to be carried out in order to validate and refine the taxonomy components (categories, values and relationships). The taxonomy must be validated using a number of different image samples and participating indexers and users. During taxonomy development, a two-phase user testing strategy needs to be conducted to ensure the final product is clear, comprehensive and consistent. During the taxonomy testing, Spiteri’s Simplified Model for Facet Analysis (1998) is used to respect the principles of quality in the choice and naming of the categories:
– Differentiation: Criteria of organization between fundamental facets (top-level facets) and their arrays (subfacets) are distinct and logical.
– Relevance: Facets are expected to adequately reflect the purpose, subject and scope of the classification system.
– Ascertainability: Facet names are simple and circumscribed.
– Permanence: Facets represent permanent qualities of the item being divided.
– Homogeneity: Fundamental facets are homogeneous, that is, they are situated at the same level of granularity.
– Mutual Exclusivity: Characteristics of division between facets are mutually exclusive, that is, each facet describes one single aspect of a subject.
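Most of Spiteri’s principles call for human judgement, but one symptom of violated mutual exclusivity, the same descriptor filed under two top-level facets, can be detected mechanically. The Python sketch below assumes the facets are given as a dictionary from a (hypothetical) facet name to its descriptors and compares descriptors case-insensitively.

```python
def overlapping_descriptors(facets):
    """facets: dict mapping a top-level facet name to an iterable of descriptors.

    Returns {descriptor: sorted facet names} for every descriptor filed under
    more than one facet, a simple symptom of facets that are not mutually exclusive."""
    seen = {}
    for facet, descriptors in facets.items():
        for d in descriptors:
            seen.setdefault(d.lower(), set()).add(facet)
    return {d: sorted(names) for d, names in seen.items() if len(names) > 1}
```

With `{"Objects": {"Car", "Flower"}, "Nature": {"flower", "Beach"}, "Events": {"Wedding"}}`, it reports `{"flower": ["Nature", "Objects"]}`. An empty result does not prove that the facets are mutually exclusive; it only rules out literal duplicates.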
The formal testing of the taxonomy normally involves image indexing and retrieval experiments in a real user context. Many scenarios are possible. For example, the user testing method employed for the first evaluation uses the card sorting technique, which lets participants organize specific information content themselves (Rugg & McGeorge, 2005). This data collection method is mainly used in domains such as psychology, cognitive science and Web usability. Card sorting is regularly chosen to test taxonomies, mainly because it suits the information needed to validate the first two levels of the taxonomy structure. The card sorting technique allows quick confirmation of a given information structure and of the understandability of the proposed labels. It also validates whether the first levels of the taxonomy correspond to potential users’ mental representations of an ordinary image database (Miller et al., 2007). Another scenario involves user testing with a group of specialists and non-specialists indexing a selection of images using the taxonomy. A questionnaire is developed to gather the indexers’ comments. Unstructured interviews are also conducted to gain a deeper understanding of the ins and outs of the taxonomy. This type of interview usually consists of open-ended questions and progresses along the topic. This data collection method offers the flexibility needed to explore multiple respondents’ points of view (Sproull, 2002). Upon completion of this first evaluation, the taxonomy is refined according to the comments and suggestions received from the indexers. The second phase of testing evaluates the performance of the taxonomy through a usability test under experimental conditions.
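The chapter does not specify how card sorting results are analysed. One common approach, assumed here, is a co-occurrence (similarity) matrix: for each pair of cards, the share of participants who placed both cards in the same group. A minimal Python sketch, assuming each participant’s sort is recorded as a list of groups and that every card is placed in exactly one group:

```python
from collections import Counter
from itertools import combinations

def co_occurrence(sorts):
    """sorts: one card sort per participant, each a list of groups (sets of card labels).

    Returns {(card_a, card_b): share of participants who grouped both cards together},
    with each pair's labels in alphabetical order. Pairs never grouped are absent."""
    pair_counts = Counter()
    for groups in sorts:
        for group in groups:
            for a, b in combinations(sorted(group), 2):
                pair_counts[(a, b)] += 1
    n = len(sorts)
    return {pair: count / n for pair, count in pair_counts.items()}
```

Pairs with high shares suggest groupings that match participants’ mental representations. The resulting matrix can be inspected directly against the proposed first two levels of the taxonomy or fed to a clustering step.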
Such an experimental test requires a sufficient number of respondents to form at least two comparison groups, random assignment of subjects to each group, and manipulation of certain variables while others are kept constant. The objective of this testing is to ask a representative sample of image searchers to complete typical retrieval tasks on images indexed with the new taxonomy, in order to measure effectiveness, efficiency and user satisfaction. The performance testing is also
expected to identify usability problems with the new taxonomy that may not be revealed by less formal testing (Sproull, 2002). This type of experiment aims to evaluate the quality of the vocabulary, the structure of the taxonomy and the selection of specific categories/facets. Respondents (English and French native speakers) from different age groups and with various levels of experience in image searching (elementary school students, workers, retired people, etc.) are asked to participate in the testing. The structure of the taxonomy (English and French) is reproduced using a standard Windows folder structure. This type of system, though limited by the capabilities of the folder structure, still has the advantage of allowing a large number of individuals to retrieve images (Sproull, 2002). All levels of the taxonomy are included in the structure so that participants can browse the entire taxonomy to retrieve a set of images. Using the taxonomy structure, participants are invited to retrieve various images. The general idea of the test is to ask participants to indicate which categories they would consider appropriate for retrieving the images from the actual database. During the testing, several variables are recorded, including the number of choices used to complete each retrieval task and the categories selected by the participants for each image to be retrieved. The evaluation of the taxonomy’s performance is based on the usability measures recommended by the ISO 9241-11 standard, that is, effectiveness, efficiency and user satisfaction (Association française de normalisation, 1998). Once the retrieval simulation is completed, a written questionnaire is submitted to participants in order to obtain their general opinions on the taxonomy and to report any difficulties encountered during the retrieval process.
The questionnaire evaluates the quality of the entire taxonomy as well as the overall satisfaction from an end-user perspective.
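ISO 9241-11 names the three measures but leaves their operationalization to the evaluator. The following Python sketch shows one plausible way to summarize the variables recorded in the test; the formulas (completion rate, mean number of choices over completed tasks, mean questionnaire score) and the `Trial` record are assumptions for illustration, not the project’s documented analysis.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Trial:
    participant: str
    completed: bool  # did the participant reach an appropriate category for the image?
    choices: int     # number of folder/category choices made during the task

def usability_summary(trials, satisfaction_scores):
    """Illustrative ISO 9241-11-style summary:
    effectiveness = share of retrieval tasks completed,
    efficiency    = mean number of choices over completed tasks,
    satisfaction  = mean questionnaire score (e.g., on a 1-5 scale)."""
    completed = [t for t in trials if t.completed]
    return {
        "effectiveness": len(completed) / len(trials),
        "efficiency": mean(t.choices for t in completed) if completed else None,
        "satisfaction": mean(satisfaction_scores),
    }
```

Computed separately for each comparison group, these figures allow the groups to be compared on all three dimensions; for efficiency, a lower number of choices is better.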
4 Issues and challenges related to multilingual vocabularies
4.1 Living in a “small world”
Searching for multilingual resources on the Web, in metadata catalogues or in bibliographic databases, including visual resources such as images, is not an easy task. The literature review confirmed that Internet users communicate not only in the Web’s established lingua franca, English, but also in a multitude of other languages. It also indicated that online information users
may want to access information in non-native languages for numerous purposes. For example, Oard (1997) argued that cross-language information retrieval (CLIR) technologies would considerably improve commercial online services, such as Dialog and Lexis/Nexis. Petrelli, Beaulieu and Sanderson (2002) stated that translators and information specialists, such as reference librarians acting as intermediaries for patrons, also have an interest in multilingual information sources. Governments, international institutions, businesses and industries look for information in foreign-language online newspapers and other Web resources (Chen & Bao, 2009). Countries whose populations speak different languages face significant problems of multilingualism; this is notably the case for Belgium, Switzerland and Canada. In a larger context, it is also the case for the European Union. Other countries, such as China, have partly mitigated this problem by adopting a single writing system. Nevertheless, access to multilingual information plays a major role in many circumstances: information sharing in multinational corporations, national security (terrorism, nuclear power) and so forth. Although it is difficult to estimate the linguistic diversity of the Web accurately, annual statistics consistently report a decline in the proportion of Internet users whose mother tongue is English. For example, it was reported in 2010 that the proportion of English Web users had dropped to 27.3 %, compared to 35.9 % in 2004 (Internet World Stats, 2011). This change necessarily implies a reorganization of Internet services, including the indexing languages and the functionalities of search engines. More evident still, more people speaking different languages now have access to multilingual documents, including visual resources such as images.
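Expressed as a calculation on the two figures quoted above (no new data), the change can be read in both absolute and relative terms:

$$
35.9\,\% - 27.3\,\% = 8.6 \text{ percentage points}, \qquad \frac{35.9 - 27.3}{35.9} = \frac{8.6}{35.9} \approx 0.24
$$

English speakers’ share of Web users thus fell by roughly a quarter between 2004 and 2010. A falling share says nothing by itself about absolute numbers, which can grow while the share declines.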
Documents of all types indexed in multiple languages, including visual resources, are now available on the Web, and search engines offer their users means to access this information. For example, Google launched “language tools” in May 2007, and Google has been identified as the search engine with the best multiple-language support (Zhang & Lin, 2007). Among other functions, these language tools provide a “translated search” functionality that allows searching in other languages and translates the resulting documents. However, for the time being, these language tools do not work in conjunction with image retrieval functionalities. As a consequence, image search is still limited to monolingual results. For example, the results for the query “car” will be all of the images indexed with the word “car.” If the image searcher also wants to expand the results with the equivalent French word “automobile,” he or she will need to formulate another query in French. And if not familiar with the French word “automobile,” the image searcher will need to find a way to translate the query “car”
into French, since the search engine does not provide a direct link between the image search functionalities and the language functionalities. Unfortunately, this lack of connection between search functionalities is not obvious to most users and still causes frustration and discouragement. As a result, most users continue to search for images in one language only and are denied access to treasure troves of visual resources, since the results obtained are generally in the language used in the search query. This is the main reason multilingual approaches and multilingual indexing tools are needed: they could be a gateway to coexistence and dialogue between different languages and discourses. With the current and anticipated explosion in online information, multilingual controlled vocabularies are increasingly recognized as vital to improving access to relevant information. Nevertheless, multilingual controlled vocabularies (thesauri, taxonomies, etc.) are still scarce, and producing them with quality and coherence takes time.
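The missing link described above, between image search and language functions, amounts to query expansion through a bilingual vocabulary. The Python sketch below illustrates the idea under stated assumptions: `EN_FR` is a hypothetical two-entry equivalence table and the index is a plain dictionary of monolingual postings; it does not reflect how any actual search engine works.

```python
# Hypothetical bilingual equivalence table: English term -> French equivalents.
EN_FR = {
    "car": ["automobile", "voiture"],
    "flower": ["fleur"],
}

def expand_query(term, table=EN_FR):
    """Return the query term plus its equivalents in the other language."""
    t = term.strip().lower()
    if t in table:
        return [t, *table[t]]
    # Reverse lookup: a French query is expanded with its English equivalent(s).
    english = [en for en, fr_terms in table.items() if t in fr_terms]
    return [t, *english]

def search(index, term):
    """index: dict mapping an indexing term to a set of image IDs.

    One query now retrieves images indexed in either language."""
    hits = set()
    for q in expand_query(term):
        hits |= index.get(q, set())
    return hits
```

With an index in which images 1 and 2 are indexed as “car”, image 3 as “automobile” and image 4 as “voiture”, `search(index, "car")` returns `{1, 2, 3, 4}` instead of only `{1, 2}`. The quality of such expansion depends entirely on the bilingual vocabulary behind it, which is precisely why multilingual controlled vocabularies matter.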
4.2 Multilingual indexing tools
Among multilingual resources, many thesauri are currently used by cultural institutions, mainly in European countries. For example, the UNESCO Thesaurus is a controlled and structured list of terms used in the subject analysis and retrieval of documents in the fields of education, culture, natural sciences, social and human sciences, communication and information, politics, law and economics, and countries and country groupings. This trilingual thesaurus contains more than 7,000 terms in English, 8,600 terms in French and 6,800 terms in Spanish (UNESCO, 2011). The Library of Congress Subject Headings (LCSH) is a quasi-thesaurus from which the subjects of library documents (books, articles, websites, etc.) are selected. It is an accumulation of the headings established at the US Library of Congress since 1898 and currently contains more than 220,000 English terms (Library of Congress, 2011). The Multilingual Access to Subjects (MACS) project aims to provide multilingual access to subjects in the catalogues of the participating libraries. This project was conceived by the Deutsche Nationalbibliothek (SWD: Schlagwortnormdatei), the British Library (Library of Congress Subject Headings), the Bibliothèque nationale de France (RAMEAU: Répertoire d’autorité-matière encyclopédique et alphabétique unifié) and the Swiss National Library. It is worth mentioning that the RAMEAU language has been developed since 1980 in an autonomous way at Quebec’s Université Laval. The “Répertoire de vedettes-matières” (Université Laval, 2011) is a translation/adaptation of the Library of Congress Subject Headings. Some
Ménard
English and French equivalents already exist, and this allows a search of some French library catalogues with the LCSH (Landry, 2006). In the field of iconographic description, Iconclass is a specialized international classification that museums can use for iconographic research and the documentation of images. It contains approximately 28,000 definitions of objects, people, events, situations and abstract ideas that can be the subject of an image. The Iconclass browser offers English, German, French and Italian keywords, as well as descriptions. Partial translations in Finnish and Norwegian and experimental translations in Chinese and Dutch also exist (Iconclass, 2011). The Art and Architecture Thesaurus (AAT) is a thesaurus of terms used in the cataloguing and indexing of art, architecture, artifactual and archival materials. This structured vocabulary currently contains around 131,000 terms and other information about concepts. The AAT grows through contributions. For example, the complete Spanish translation of the AAT was integrated in 2010. A Chinese translation is under way by the National Digital Archives Program, Taiwan; a German translation is being undertaken by the Institut für Museumsforschung in Berlin; and a full Dutch translation of the AAT is being prepared by the Rijksbureau voor Kunsthistorische Documentatie. The integration of 3,000 Italian object-type terms from ICCD, Rome, is also under way. Finally, a set of 3,000 French terms from the Canadian Heritage Information Network (CHIN) has been fully integrated (Getty, 2011).
4.3 Issues and challenges
As with other controlled vocabularies, few fully developed and working multilingual taxonomies exist. The literature review conducted for this project highlighted that almost everything remains to be done in this domain. Lambe (2010) speculates that a taxonomy is more or less a consequence of the “Babel instinct” of fragmentation within organizations. He also considers that a taxonomy looks like an “ideal” solution to the division caused by different languages and different vocabularies. Creating a multilingual taxonomy can be very expensive and highly complicated because of the semantic problems between different languages. It can also, undoubtedly, be time-consuming. Ideally, any multilingual vocabulary should cover all concepts of interest to users in the various languages, which means, at a minimum, all domain concepts lexicalized in any of the participating languages. It must also accommodate the hierarchical structures suggested by different languages (Hudon, 1997). However, this is not an easy process. Numerous problems relating to multilingual vocabularies exist. Among them, it is worth mentioning the high cost of the construction and management processes, specific indexing use (manual keyword assessment or errors in automatic keyword
Chapter 2. Multilingual taxonomy development for ordinary images
assessment) and new domains calling for new vocabularies. In short, the problems encountered in developing multilingual controlled vocabularies are similar to the difficulties that arise in constructing a monolingual controlled vocabulary. To these difficulties, a truly “multilingual problem” must be added: equivalence. For example, translating an English controlled vocabulary into Italian does not produce an Italian controlled vocabulary. Many aspects need to be considered: equivalence of terms exists only in some contexts, certain terms do not exist in other languages, and so on. Sometimes two terms mean almost the same thing but diverge to some extent in meaning or connotation (e.g., the pair English “alcoholism” and French “alcoolisme”; English “vegetable” includes potatoes, whereas German “Gemüse” does not). Any solution, for example introducing separate concepts under a broader term or using a scope note, needs to clearly instruct indexers in all languages how the term is to be used, so that the indexing stays as free from cultural bias as possible or reflects multiple biases (Soergel, 1997). According to Hudon (1997), several categories of problems can occur when developing a multilingual controlled vocabulary. First, semantic problems affect the equivalence relations between terms used as preferred and non-preferred terms. Second, equivalence relations exist not only within each separate language involved but sometimes also between the languages (intra-language equivalence and inter-language equivalence). Third, intra-language homonymy and inter-language homonymy are also considered crucial semantic questions. In addition, problems pertaining to semantics involve the scope, form and choice of terms. Finally, some categories of problems are closely related to the structure of the controlled vocabulary. These structural problems include hierarchical and associative relations between the terms.
In most cases, the structure will probably not be the same for all the languages involved, although in some cases it will be possible to use the same structure for all languages.
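The equivalence problem can be made concrete with a small data model. The Python below is an illustration only, assuming a concept-based design in which each language attaches its own preferred term to a shared concept and partial equivalence is flagged together with a scope note for indexers. The names `Concept`, `Equivalence` and `label_for`, and the three equivalence degrees, are hypothetical and are not taken from any of the vocabularies discussed above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Equivalence(Enum):
    """Illustrative degrees of inter-language equivalence (not a standard scheme)."""
    EXACT = "exact"
    PARTIAL = "partial"  # meanings overlap but diverge in scope or connotation
    NONE = "none"        # no lexicalized equivalent in the target language


@dataclass
class Concept:
    """One concept shared by all languages; each language supplies its own preferred term."""
    concept_id: str
    labels: dict[str, str]  # language code -> preferred term
    # Degree to which each language's term matches the concept; a missing entry means exact.
    equivalence: dict[str, Equivalence] = field(default_factory=dict)
    # Per-language scope notes telling indexers how the term is to be used.
    scope_notes: dict[str, str] = field(default_factory=dict)


def label_for(concept: Concept, lang: str) -> tuple[str | None, Equivalence, str | None]:
    """Return (preferred term, equivalence degree, scope note) for one language."""
    term = concept.labels.get(lang)
    note = concept.scope_notes.get(lang)
    if term is None:
        return None, Equivalence.NONE, note
    return term, concept.equivalence.get(lang, Equivalence.EXACT), note


# Example taken from the text: English "vegetable" includes potatoes, German "Gemüse" does not.
vegetables = Concept(
    concept_id="C001",  # hypothetical identifier
    labels={"en": "vegetables", "de": "Gemüse"},
    equivalence={"de": Equivalence.PARTIAL},
    scope_notes={"de": "Excludes potatoes; index potatoes under a separate concept."},
)

for lang in ("en", "de", "it"):
    print(lang, label_for(vegetables, lang))
```

Treating the concept, rather than the English term, as the unit of the vocabulary is one way to avoid mistaking a translated English vocabulary for an Italian one; on its own it does not address the structural problems that Hudon identifies.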
5 Conclusion
Issues related to image indexing are numerous, and resolving them remains a challenge. Different approaches are needed for different kinds of image collections, depending on their content and the needs of their users. Standardized methods, data structures and controlled vocabularies promote networked data exchange as much as possible. According to Hedden (2010b), the word “taxonomy” has now become a popular term for any hierarchical classification or
categorization system. As a type of controlled vocabulary, a taxonomy has a single hierarchical structure in which terms have parent / child (broader / narrower) relationships to other terms. The structure is sometimes referred to as a “tree.” Non-preferred terms (synonyms) may or may not be part of a taxonomy. The primary goal of any taxonomy is to offer consistent, accurate and rapid indexing and retrieval of content. Can this also be true for images? Can a taxonomy be the optimal form of organization for images? The fact that both indexers and end-users benefit from taxonomies makes them very attractive. However, cost-effective taxonomy development requires the active participation of many specialties, including IT staff, corporate librarians, departmental publishers, commercial information providers and international standards bodies. In the same vein, the participation of end-users is crucial to taxonomy development. Yet using a taxonomy or any other type of controlled vocabulary remains a difficult process for many. Furthermore, social tagging now looks very appealing to users, who can use their own keywords to tag their pictures and forget all about the often capricious and mysterious use of controlled vocabularies. The main feature of social indexing in image-sharing systems is the use of uncontrolled vocabulary, that is, the language that individuals use on a daily basis. Therefore, user tags can take all possible forms corresponding to culture and language proficiency, and can be valuable for targeted image searches. In addition, social tagging with uncontrolled vocabulary in image-sharing systems can draw on a single language or combine many languages. However, this type of “free” indexing relies on the capacity of individual indexers. Consequently, the efficiency of retrieval, as well as the satisfaction of the image searcher, also depends on this ability.
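A taxonomy of this kind can be sketched as a tree of preferred terms plus a table of non-preferred terms. The Python below is a minimal illustration under that assumption, not a description of any particular taxonomy tool; the class, its method names and the sample terms are hypothetical. It shows two operations that make a taxonomy useful for retrieval: resolving a searcher's own word to the preferred term, and expanding a preferred term to its narrower terms.

```python
from __future__ import annotations


class Taxonomy:
    """Hypothetical single-hierarchy taxonomy: every preferred term has at most one broader term."""

    def __init__(self) -> None:
        self.parent: dict[str, str | None] = {}   # preferred term -> broader term (None for a root)
        self.children: dict[str, list[str]] = {}  # preferred term -> narrower terms
        self.use_for: dict[str, str] = {}         # non-preferred term (lower-cased) -> preferred term

    def add_term(self, term: str, broader: str | None = None) -> None:
        if broader is not None and broader not in self.parent:
            raise KeyError(f"broader term {broader!r} must be added first")
        self.parent[term] = broader
        self.children.setdefault(term, [])
        if broader is not None:
            self.children[broader].append(term)

    def add_synonym(self, synonym: str, preferred: str) -> None:
        if preferred not in self.parent:
            raise KeyError(f"unknown preferred term {preferred!r}")
        self.use_for[synonym.lower()] = preferred

    def resolve(self, term: str) -> str | None:
        """Map a searcher's word to the preferred term, or None if the taxonomy lacks it."""
        if term in self.parent:
            return term
        return self.use_for.get(term.lower())

    def narrower(self, term: str) -> list[str]:
        """The term plus all of its descendants (depth-first), e.g. to expand a query."""
        result = [term]
        for child in self.children.get(term, []):
            result.extend(self.narrower(child))
        return result

    def path(self, term: str) -> list[str]:
        """Broader-term chain from the root down to the term."""
        chain: list[str] = []
        node: str | None = term
        while node is not None:
            chain.append(node)
            node = self.parent[node]
        return list(reversed(chain))


tax = Taxonomy()
tax.add_term("Food")
tax.add_term("Baked goods", broader="Food")
tax.add_term("Cakes", broader="Baked goods")
tax.add_term("Birthday cakes", broader="Cakes")
tax.add_synonym("cake", "Cakes")

preferred = tax.resolve("Cake")  # found through the non-preferred term "cake"
print(preferred, tax.narrower(preferred), tax.path("Birthday cakes"))
```

The single `parent` entry per term is what makes this a taxonomy rather than a polyhierarchical thesaurus; allowing several broader terms would require a list of parents instead.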
A good taxonomy helps decision makers see all of the perspectives, obtain and explore categories from each, and explore lateral relationships among them. A taxonomy can also support discovery by suggesting terms and associated content of potential interest that the searcher was not originally looking for. This means greater potential for image searching: related terms can be brought in by displaying additional terms tagged to the image or by dynamically suggesting related content that shares auto-categorization keywords. In the context of the image retrieval process, many questions remain unanswered. Do we really need another controlled vocabulary? Do people want to be “controlled,” instead of being empowered to make their own decisions? Do image searchers need only a form of intuitive help to ease the image retrieval process? Internet image searchers are known to still rely mainly on text to formulate their queries, even though images are visual information sources with little or no text associated with them. Although sometimes an image search
query is very simple to formulate (e.g., Lady Gaga, birthday cake), users often want more precise images that involve more complex queries. Furthermore, retrieving images indexed with, or associated with, text written in unfamiliar languages is very challenging for most users. Internet users can, of course, use the machine translation mechanisms offered by most search engines. However, in the case of image retrieval, relying on a machine translation system is akin to buying a lottery ticket and hoping the results will be satisfactory, because most queries formulated to find images contain only a few terms and do not provide enough context to be translated well by the machine. Successful image retrieval therefore still depends largely on appropriate manual indexing, since automated image content analysis remains limited. Giving image searchers the possibility of using an image-based taxonomy appears to be the ideal solution. Image searchers still need assistance when formulating their queries, especially when more than one language is involved in the indexing process. In reality, assuming that everybody can perform searches in many languages is utopian. The diversity of languages, traditions and cultural experiences is a richness that often remains deeply buried because individuals simply do not know how to access multilingual documents, including visual resources. A bilingual taxonomy could be a vital step in shattering the image searcher’s solitude.
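The alternative to machine translation argued for here can be sketched as a lookup through shared concepts: a query term in one language is matched to a taxonomy concept, and that concept's term in the other language is used for the search. The Python below is a hypothetical illustration, assuming a small bilingual English–French taxonomy; the concept IDs, terms and the function `translate_query` are invented for the example. Terms the taxonomy does not cover are returned separately rather than guessed, which is where the searcher would still need assistance.

```python
from __future__ import annotations

# Hypothetical bilingual taxonomy: concept ID -> preferred term in each language.
CONCEPTS: dict[str, dict[str, str]] = {
    "C10": {"en": "birthday cakes", "fr": "gâteaux d'anniversaire"},
    "C11": {"en": "candles", "fr": "bougies"},
    "C12": {"en": "balloons"},  # no French term recorded yet
}

# Index every term, in any language, back to its concept (case-insensitive).
TERM_INDEX: dict[str, str] = {
    term.casefold(): concept_id
    for concept_id, terms in CONCEPTS.items()
    for term in terms.values()
}


def translate_query(terms: list[str], target_lang: str) -> tuple[list[str], list[str]]:
    """Translate query terms concept by concept; return (translated, unmatched)."""
    translated: list[str] = []
    unmatched: list[str] = []
    for term in terms:
        concept_id = TERM_INDEX.get(term.casefold())
        target = CONCEPTS[concept_id].get(target_lang) if concept_id else None
        if target is None:
            unmatched.append(term)  # not in the taxonomy, or no term in the target language
        else:
            translated.append(target)
    return translated, unmatched


print(translate_query(["Bougies", "gâteaux d'anniversaire", "fête"], "en"))
print(translate_query(["balloons"], "fr"))
```

Because the lookup goes through concepts, a short query loses nothing for lack of context; the trade-off is that coverage is limited to what the taxonomy's indexers have already entered in both languages.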
References
Armitage, L. H., & Enser, P. G. B. (1997). Analysis of user need in image archives. Journal of Information Science, 23(4), 287–299.
Association française de normalisation. (1998). Exigences ergonomiques pour travail de bureau avec terminaux à écrans de visualisation (TEV) – partie 11: Lignes directrices relatives à l’utilisabilité. Genève: Organisation internationale de normalisation.
Besser, H., & Snow, M. (1990). Access to diverse collections in university settings: The Berkeley Dilemma. In T. Petersen & P. Molholt (Eds.), Beyond the book: Extending MARC for subject access (pp. 203–224). Boston: G. K. Hall.
Chen, J., & Bao, Y. (2009). Cross-language search: The case of Google Language Tools. First Monday, 14(3). Retrieved from http://firstmonday.org/htbin/cgiwrapbin/ojs/index.php/fm/article/viewArticle/2335/
Choi, Y., & Rasmussen, E. M. (2002). Users’ relevance criteria in image retrieval in American history. Information Processing & Management, 38(5), 695–726.
Choi, Y., & Rasmussen, E. M. (2003). Searching for images: The analysis of users’ queries for image retrieval in American History. Journal of the American Society for Information Science and Technology, 54(6), 498–511.
Chung, E. K., & Yoon, J. W. (2009). Categorical and specificity differences between user-supplied tags and search query terms for images. An analysis of Flickr tags and Web image search queries. Information Research, 14(3). Retrieved from http://informationr.net/ir/14-3/paper408.html
Conniss, L. R., Ashford, A. J., & Graham, M. E. (2000). Information seeking behaviour in image retrieval: VISOR I Final Report. Newcastle upon Tyne: University of Northumbria at Newcastle, Institute for Image Data Research.
Conniss, L. R., Davis, J. E., & Graham, M. E. (2003). A user-oriented evaluation framework for the development of electronic image retrieval systems in the workplace: VISOR II final report. Newcastle upon Tyne: University of Northumbria at Newcastle, Institute for Image Data Research.
Enser, P. G. B. (2008). The evolution of visual information retrieval. Journal of Information Science, 34(4), 531–546.
Enser, P. G. B., Sandom, C. J., Hare, J. S., & Lewis, P. H. (2007). Facing the reality of semantic image retrieval. Journal of Documentation, 63(4), 465–481.
Getty Foundation. (2011). Art & Architecture Thesaurus – editorial guidelines. Retrieved from http://www.getty.edu/research/tools/vocabularies/aat/index.html
Goodrum, A. A., & Spink, A. (2001). Image searching on the Excite Web search engine. Information Processing & Management, 37(2), 295–311.
Greisdorf, H. F., & O’Connor, B. C. (2008). Structures of image collections: From Chauvet-Pont d’Arc to Flickr. Westport, CT: Libraries Unlimited.
Gruber, T. R. (1993). A translation approach to portable ontology specifications. Knowledge Acquisition, 5, 199–220.
Hedden, H. (2010a). The accidental taxonomist. Medford, NJ: Information Today.
Hedden, H. (2010b). Taxonomies and controlled vocabularies best practices for metadata. Journal of Digital Asset Management, 6, 279–284.
Hudon, M. (1997). Multilingual thesaurus construction: Integrating the views of different cultures in one gateway to knowledge and concepts. Information Services & Use, 17(2/3), 111–123.
Hudon, M. (2003). True and tested products: Thesauri on the Web. The Indexer, 23(3), 115–119.
Iconclass. (2011). Iconclass home page. Retrieved from http://www.iconclass.nl/
Internet World Stats. (2011). Internet users by language. Retrieved from http://www.internetworldstats.com
Jörgensen, C. (1998). Attributes of images in describing tasks. Information Processing & Management, 34(2/3), 161–174.
Jörgensen, C. (2003). Image retrieval: Theory and research. Lanham, MD: Scarecrow Press.
Krause, M. G. (1988). Intellectual problems of indexing picture collections. Audiovisual Librarian, 14(4), 73–81.
Lambe, P. (2007). Organising knowledge: Taxonomies, knowledge and organisational effectiveness. Oxford: Chandos Publishing.
Lambe, P. (2010). How to kill a knowledge environment with a taxonomy [Blog post]. Retrieved from http://www.greenchameleon.com/gc/blog_detail/how_to_kill_a_knowledge_environment_with_a_taxonomy/
Landry, P. (2006). Multilinguisme et langages documentaires: Le projet MACS en contexte européen. Documentation et bibliothèques, 52(2), 121–129.
Library of Congress. (2011). Subject headings. Retrieved from http://www.loc.gov/aba/cataloging/subject/
Macgregor, G., & McCulloch, E. (2006). Collaborative tagging as a knowledge organisation and resource discovery tool. Library Review, 55(5), 291–300.
Markey, K. (1988). Access to iconographical research collections. Library Trends, 2, 154–174.
Markkula, M., & Sormunen, E. (2000). End-user searching challenges indexing practices in the digital newspaper photo archive. Information Retrieval, 1(4), 259–285.
Matusiak, K. K. (2006). Towards user-centered indexing in digital image collections. OCLC Systems & Services, 22(4), 283–298.
McClung, J. (2009). Herding cats: Indexing British Columbia’s political debates using controlled vocabulary. The Indexer, 27(2), 66–69.
Ménard, E. (2008). Étude sur l’influence du vocabulaire utilisé pour l’indexation des images en contexte de repérage multilingue (Unpublished doctoral dissertation). Université de Montréal, Montréal, QC. Retrieved from https://papyrus.bib.umontreal.ca/jspui/bitstream/1866/2611/1/menard-e-these-indexation-reperage-images.pdf
Ménard, E. (2009). Images: Indexing for accessibility in a multi-lingual environment – challenges and perspectives. The Indexer, 27(2), 70–76.
Ménard, E. (2011). Study on search behaviours of image users: A case study of museum objects. Partnership: The Canadian Journal of Library and Information Practice and Research, 6(1). Retrieved from http://journal.lib.uoguelph.ca/index.php/perj/article/view/1433/2079
Miller, C. S., Fuchs, S., Anantharaman, N. S., & Kulkarni, P. (2007). Evaluating category membership for information architecture (Technical Report 07-001). Chicago, IL: DePaul University, CTI.
Oard, D. W. (1997). Serving users in many languages: Cross-language information retrieval for digital libraries. D-Lib Magazine, 3. Retrieved from http://www.dlib.org/dlib/december97/oard/12oard.html
Ohlgren, T. (1980). Subject indexing of visual resources: A survey. Visual Resources, 1(1), 67–73.
Panofsky, E. (1955). Meaning in the visual arts: Papers in and on art history. Garden City, NY: Doubleday.
Petrelli, D., Beaulieu, M., & Sanderson, M. (2002). User-participation in CLIR research. In Proceedings of the Hawaii International Conference on System Science – HICSS33. Retrieved from http://ucdata.berkeley.edu:7101/sigir-2002/sigir2002CLIR-12-petrelli.pdf
Pincher, M. (2010). A guide to developing taxonomies for effective data management. Computer Weekly. Retrieved from http://www.computerweekly.com/Articles/2010/04/06/240539/A-guide-to-developing-taxonomies-for-effective-data.htm
Roddy, K. (1991). Subject access to visual resources: What the 90s might portend. Library Hi Tech, 9(1), 45–49.
Rorissa, A. (2008). User-generated descriptions of individual images versus labels of groups of images: A comparison using basic level theory. Information Processing & Management, 44(5), 1741–1753.
Rosch, E. (1978). Principles of categorization. In A. Collins & E. E. Smith (Eds.), Readings in Cognitive Science, a Perspective from Psychology and Artificial Intelligence (pp. 312–322). San Mateo, CA: Morgan Kaufmann.
Rugg, G., & McGeorge, P. (2005). The sorting techniques: A tutorial paper on card sorts, picture sorts and item sorts. Expert Systems, 22(3), 94–107.
Shatford, S. (1986). Analyzing the subject of a picture: A theoretical approach. Cataloging & Classification Quarterly, 6(3), 39–61.
Soergel, D. (1997). Multilingual thesauri and ontologies in cross-language retrieval. Proceedings of the AAAI Spring Symposium on Cross Language Text and Speech Retrieval. Retrieved from http://www.dsoergel.com/cv/B60.pdf
Spink, A., & Jansen, B. J. (2004). Web search: Public searching of the Web. Boston: Kluwer Academic.
Spiteri, L. (1998). A simplified model for facet analysis: Ranganathan 101. Canadian Journal of Information and Library Science, 23(1/2), 1–30.
Sproull, N. L. (1995). Handbook of research methods: A guide for practitioners and students in the social sciences. Metuchen, NJ: Scarecrow Press.
Stvilia, B., & Jörgensen, C. (2009). User-generated collection-level metadata in an online photo-sharing system. Library & Information Science Research, 31(1), 54–65.
Turner, J. M. (1993). Subject access to pictures: Considerations in the surrogation and indexing of visual documents for storage and retrieval. Visual Resources, 9(3), 241–271.
Turner, J. M. (1994). Determining the subject content of still and moving image documents for storage and retrieval: An experimental investigation (Unpublished doctoral dissertation). University of Toronto, Toronto, ON.
Turner, J. M. (1998). Images en mouvement: Stockage, repérage, indexation. Sainte-Foy: Presses de l’Université du Québec.
UNESCO. (2011). Diversité culturelle et linguistique dans la société de l’information. Retrieved from http://portal.unesco.org/ci/fr/file_download.php/f0138f3685432a579c5cfc5849314368culture_fr.pdf
Université Laval. (2011). Répertoire de vedettes-matière. Retrieved from https://rvmweb.bibl.ulaval.ca/
Whittaker, M., & Breininger, K. (2008). Taxonomy development for knowledge management. World Library and Information Congress: 74th IFLA General Conference and Council. Retrieved from http://archive.ifla.org/IV/ifla74/papers/138-Whittaker_Breininger-en.pdf
Zhang, J., & Lin, S. (2007). Multiple language supports in search engines. Online Information Review, 31(4), 516–532.
Chris Landbeck
Chapter 3. Access to editorial cartoons: The state of the art
Abstract: This chapter is a review of the state of the art in indexing editorial cartoons in large collections. It includes a review of the literature concerning image indexing in both broad and specific terms in order to establish what we might reasonably expect when looking at such systems. It then differentiates between sources and resources, the latter being the focus of the remainder of the chapter. The chapter focuses on the merits and problems of several resources for editorial cartoons, makes some suggestions about further efforts, and concludes by questioning why the state of the art is what it is.
Keywords: Editorial cartoons, political cartoons, image analysis, image queries, image description, Jörgensen’s 12 Classes of Image Description, metadata, image access, historical documents, image interpretation
Chris Landbeck, Doctoral Candidate, School of Library and Information Studies, Florida State University
Introduction
Editorial cartoons are a neglected part of the historical record, partly because of the difficulty of collecting and delivering such images, and partly because of the lack of consistent and coherent description of them. Until the advent of cheap and abundant electronic storage and communication of information, such images were difficult to assemble in one place because of the limitations of print media; Mankoff (2004) speculates that a book containing the first 86,000 cartoons of The New Yorker would be over 20,000 pages thick or have pages the size of barn doors, and that anything smaller would compromise the readability of the cartoons. To say that such a book might have problems circulating in a library, or even just being delivered to and housed in a research environment, would be an understatement. To compound the problem, retrieving cartoons from such a publication would be even more difficult, because the index that describes such images might well take as many pages as the cartoons themselves, not to mention the necessity of turning such pages back and forth while looking for an image that
Landbeck
fills the user’s needs. To make matters worse, there is no extant system for uniformly describing editorial cartoons. Certainly, there are very well-developed ways of describing images in general, particularly works of high art, but it is not difficult to imagine that such focused and type-specific methods might not work well with materials other than those intended. If we fast forward to the present day, when electronic storage and communication are cheap and abundant, we see that assembling large collections of political cartoons is no longer difficult; while gathering such images together may present some problems, the orderly and efficient storage of those images in an electronic format is something to which we rarely give a second thought. The same goes for the communication of those images; getting cartoons from the artist to the publisher and to the collector is now so easy and inexpensive that some children routinely use the same technology to watch TV on their computers. But the ability to describe political cartoons accurately and consistently has not yet been developed. While time, money, and attention have been given to the technological issues of information storage and communication, little attention has been given to the organization of that information, and access to political cartoons in large electronic collections may suffer from this lack. Access to editorial cartoons lags behind access to other types of images, and because of this, their inclusion in the historical record remains haphazard. The Library of Congress (2009) houses the largest collection of editorial cartoons in the world, but provides uneven access to them, describing a few images in great detail and others with very little.
The Doonesbury collection (Trudeau, 1998) provides access to the entire series of strips up to 1997, but clutters this access with several extraneous kinds of information, such as celebrity birthdays, sports events, and other ephemera that enhance neither the understanding nor the findability of the cartoons. The New Yorker states that its cartoon collection is described on an ad hoc basis, reflecting terms popular in the everyday language of the time but of limited utility to following generations (Mankoff, 2004). The CNN Archive of Political Cartoons (2009) provides access only by date. And while it is true that some small, limited-scope cartoon collections describe their contents well (Bachorz, 1998; Mandeville, 2009), their methods have not been implemented in a way that would allow their usefulness in a large collection to be gauged. Compare these to ARTstor (2011), Corbis Images (2011), and the Getty (2011) and Guggenheim (2011) imagebases, and a gap in coverage, treatment, and research becomes evident. What follows is, first, an examination of pertinent literature that deals with access to images in general as well as to editorial cartoons in particular; second, an examination of the state of the art in how political cartoon collections are organized. While there are a number of cartoon anthologies, based either on the works of certain authors or centred around a given subject, very little has been written
Chapter 3. Access to editorial cartoons
about indexing or describing political cartoons. But much research has been done on describing images in general, and guidelines and expectations can be drawn from some of it. Once this is done, the chapter turns to the state of the art in cartoon description, scrutinizing how cartoon collections are organized and extracting what lessons we can from those examples.
Literature review
Assumptions
Given the dearth of research concerning the indexing of political cartoons, those articles that deal directly with them will be assumed to be both factual and pertinent. Of all the sources cited here, the most directly relevant is Chappel-Sokol’s Indexing political cartoons (1996). She first notes:
For years researchers have conducted their tedious research by sifting through piles of yellowed, crumbling newspapers, seeking the page on which the cartoon was customarily published – never knowing if that particular cartoon was about the desired subject or by the desired illustrator. (p. 22)
She states that most large cartoon and newspaper syndicates do not routinely index cartoons at all, citing little demand for reprints, the cost of indexing, and the effect of the passage of time on cartoons; industry leaders state that the value of a cartoon diminishes quickly because of these factors. From this article, we can draw three basic assumptions: 1) editorial cartoons are time sensitive; 2) there is no tradition of describing editorial cartoons for the Electronic Age to draw on; 3) editorial cartoons do not exist in a vacuum, but in a rich and active world with which a reader must be familiar in order to perceive both the visual part of the cartoon and the message within it. On a separate issue, the idea that editorial cartoons are in fact historical documents – ones quite close to the feelings of the time on a given issue – is found in Weitenkampf (1946). He notes that even obvious partisanship in a cartoon is a commentary on the times and, as such, is a perhaps unintentional part of the historical record as well. He goes so far as to say that where the creation of standard paintings or etchings depicting a given event may be years removed from the event that inspired them, editorial cartoons are “… a contemporary reaction to events or actions or trends of thought or prejudices which called forth the caricaturist’s
comment” (p. 172). Weitenkampf’s contention that editorial cartoons are historical documents in and of themselves will be assumed to be true in this chapter. Enser (2008) comments that even with all of the technological and conceptual progress that has been made in image retrieval over the last 20 years, there is still a disconnect between those who manage image collections and those who research image retrieval, which offers a possible explanation for the gap in attention between describing political cartoons and describing other kinds of images:
… those involved in the professional practice of visual asset management and … those at the cutting edge of research in image retrieval … need a shared perception of the principles and practices that guide their respective endeavours if both opportunity and challenge are to be addressed effectively … Sadly, it remains the case that professional practitioners have only a minimal engagement with the activities of those occupied in image retrieval research, and the endeavours of the latter community have been little informed by the needs of real users or the logistics of managing large scale image collections. (p. 3)
Image description: Theory
The theoretical foundations used in this evaluation of political cartoon resources are grounded in the idea that there are, in fact, different kinds of images, and that those images can be differentiated by the level of understanding and depth of knowledge necessary to grasp what is being depicted in them. Panofsky’s theories (1939) dealt with interpreting the subject of artwork, assuming that the art in question is Renaissance art; paintings are the main subject that he treats, but his ideas can also be applied to the sculptures, artefacts, and other art of the era. His theories show that there is a continuum of meaning that can be broken down into three basic parts: pre-iconography, iconography in the narrower sense (later called simply “iconography”), and iconography in the deeper sense (later called “iconology”, to distinguish this idea from that of iconography). Through each of these, the work in question is interpreted at a different level, though these levels commonly bleed into one another. Panofsky does not address any area other than Renaissance art, but Shatford Layne builds on Panofsky to produce a broader outlook on image description. Shatford Layne (1994) finds that there are four levels of image attributes: Biographical (bibliographic and historical data), Exemplified (content of the image), Relationship (to things other than the image at hand), and Subject, which is the most difficult to deal with. Building on the ideas of Panofsky and extending their reach, she proposes that an image can be of one thing (that which is plainly seen) but about another thing (that which is not seen but is, nevertheless, shown). An image of the end of a hockey game might be about the Miracle on Ice
at the 1980 Winter Olympics; a painting might be Generically Of a woman and Specifically About Whistler’s mother. Fidel (1997) presents a different conceptual model, one in which images can be sought from what she calls the Object pole, where the image serves as a representation of what something looks like or as an example of something (such as a stock photo of a highway), or from the Data pole, where an image represents an idea, process, or something beyond that which is inherently included in the image (such as a map). Fidel concedes that images sometimes fall between the poles, making indexing and retrieval more difficult, but the distinction holds: an image is at times merely a visual record and at other times a symbol or icon that is useful to the task at hand. In all of these models, a range of knowledge and perception is required of the viewer to understand the image at various levels. Each of these theories holds that the basic perception of the constituent parts of an image (Fidel’s Object pole or Panofsky’s pre-iconographic level) is more easily available to most viewers – regardless of nation, tongue, or education – and that a deeper understanding of an image (Fidel’s Data pole or Panofsky’s iconological level) requires a deeper understanding of the culture that produced it. In evaluating collections of political cartoons, then, indexing at all levels of content – what an image is “of” and what it is “about” – should be sought.
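One way to act on this in a cartoon collection is to keep the two levels in separate fields of the index record, so that a system can report whether a cartoon matched on what it depicts or on what it comments on. The Python below is a minimal sketch under that assumption; the record structure, the field names `of_terms` and `about_terms`, and the sample records are hypothetical and are not drawn from any of the collections reviewed in this chapter.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CartoonRecord:
    """Hypothetical index record keeping what a cartoon is 'of' apart from what it is 'about'."""
    record_id: str
    of_terms: set[str] = field(default_factory=set)     # what is plainly depicted
    about_terms: set[str] = field(default_factory=set)  # events, issues, themes commented on


def search(records: list[CartoonRecord], term: str) -> list[tuple[str, str]]:
    """Return (record_id, level) pairs; level is 'of', 'about' or 'both'."""
    wanted = term.casefold()
    hits: list[tuple[str, str]] = []
    for record in records:
        in_of = wanted in {t.casefold() for t in record.of_terms}
        in_about = wanted in {t.casefold() for t in record.about_terms}
        if in_of and in_about:
            hits.append((record.record_id, "both"))
        elif in_of:
            hits.append((record.record_id, "of"))
        elif in_about:
            hits.append((record.record_id, "about"))
    return hits


records = [
    CartoonRecord("c1", {"hockey players", "scoreboard"}, {"Miracle on Ice", "Cold War"}),
    CartoonRecord("c2", {"bear", "eagle"}, {"Cold War"}),
]

print(search(records, "cold war"))
print(search(records, "scoreboard"))
```

A collection indexed only by date, or only by depicted objects, would leave the `about_terms` field empty, and a searcher looking for a theme such as the Cold War would find nothing.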
Users’ descriptions of images
We also know that the ways people search for images differ from the ways they describe them. Studies of users’ queries for images have produced some general conclusions about what users look for when seeking images, but no consistent pattern has emerged across the research. When Jörgensen (1995) developed her 12 Classes of image description, she also collected the terms that people used in tasks simulating database searches for such images, finding that the most often-used terms were Literal Objects, then Content / Story, Location, and People; the latter three differed substantially in class frequency from the describing task. Jansen (2007) parsed the queries of 587 image searches on Excite.com into the 12 Classes, and found that the four Classes used most often to retrieve images were Literal Object, Content / Story, Location, and People, closely mirroring the findings of Jörgensen. But when Chen (2000) analysed the queries of 29 art history students who were required to find at least 20 images as part of an assignment, he found that Location was the Class most often used, followed by Literal Object, Art Historical Information, and People, producing yet another ranking of the most-used Classes within a given image set.
Landbeck
Others have described the image searches of various groups in other ways. Armitage and Enser (1997) found that users seeking images in library databases asked more often for specific people and places, and far less often for generic people and events. They found that this was true across different types of libraries (to varying degrees), but that in all cases requests for images that display abstract concepts were minimal to non-existent. Hastings (1995) found that, within the art historical research field, there are four levels of queries: first, direct questions with simple answers, such as the name of the artist or when the image was created; second, either comparisons made by inquiry, or requests for more textual information than that already provided with an image; third, questions that sought to identify the objects, actors, or actions within an image; fourth, the most complex questions, such as “what does this image mean?”. Goodrum and Spink (2001) examined image seeking in a popular search engine, finding that most queries had more than three terms and that users often submitted more than three queries per image. They concluded that more research is needed into how users represent their needs when formulating image queries, and that the representation of higher-order items (those beyond content-based image retrieval) needs more scrutiny. From these studies, we see that the variability in the frequency of Jörgensen’s classes and in the general tenor of image queries is similar to that found when images are being described. We can in turn expect that systems for image search and retrieval should be able to provide information regarding both the content of a political cartoon (its components and their attributes) and the context embodied in a cartoon (its place in time and history, and the themes and topics discussed).
Taken together, the research regarding the description of images and the content of image queries can be seen to reflect the theories of Panofsky, Shatford Layne, and Fidel: it is difficult to know a priori what things about a certain image may be of use or interest to a user, or what kind of needs or activities a user might have vis-à-vis an image, so any method for describing images should include both the mundane, everyday aspects of an image and, when present, the deeper and broader implications of the same image.
Users’ attitudes about image searching
We know much more about the give and take between the querier and the information system than about what expectations and tools image searchers bring to the table (Lew, Sebe, Djeraba, & Jain, 2006). While much is already known about the nature of image queries and their success or failure, most of the other research concerning users seems to be a few degrees removed from searching for images. We know that searching and browsing seem to take a subordinate role to finding answers to questions or problems (DeRosa, Dempsey, & Wilson, 2004), but this is a general commentary on the nature of searching. We know that there has been a rise in the number of conferences and workshops for members of the IEEE and the ACM centred on image and video retrieval (Hanjalic, Lienhart, Ma, & Smith, 2008), but the results of those meetings tend to concentrate on introducing or improving systems, not on the image searcher. And the applicability of content-based image retrieval (CBIR) to the searches of ordinary, nonprofessional searchers is thought to be low (Datta, Joshi, Li, & Wang, 2008), despite the large body of research on its implementation, use, and improvement. We seem to keep concentrating on areas of research that are ancillary to users’ needs and preferences before the search, looking almost exclusively at what they do during and after the search. That this should be so may be frustrating but, on reflection, is not surprising. It is generally accepted that the pace of innovation in hardware and software has always outstripped, and will continue to outstrip, the ability of most people and institutions to use it, and that the user base for any large image retrieval system will be both varied in its purposes and diverse in its native searching skills. This leads to what Dempsey (1999) refers to as “the Challenge of Planning for the Radically Unpredictable”, where
… unpredictability emphasises the need for approaches which do not lock providers into inflexible or unresponsive offerings, and which support movement of data and services through changing environments. Without such approaches investment will be wasted and data will potentially be lost or difficult to use. (¶ 3.3)
Cartoon interpretation
There is evidence that users usually do not interpret editorial cartoons correctly. Most users, when asked to identify the subject of a cartoon, cannot do so with any degree of accuracy and, to a lesser extent, the same applies to identifying the actors within a cartoon. Thus, correctly identifying the subject (or subjects) of a cartoon may be the most important aspect of the image to describe in a surrogate, because users cannot reliably do so themselves. DeSousa and Medhurst (1982) found no evidence at all that “reliable claims can be made for the persuasive power of editorial cartoons prior to ascertainment of reader ability to decode the graphic messages in line with the cartoonist’s intent … editorial cartoons are a questionable vehicle for editorial persuasion” (p. 43). They asked 130 communications students to select keywords
and phrases from a list (some of which the researchers considered legitimate, others not) for three cartoons dealing with the three major candidates in the 1980 United States presidential election. They found that while most subjects did not use the inappropriate keywords to describe the cartoons most of the time, neither did they overwhelmingly choose the appropriate ones; most of the choices that the researchers found to be correct for describing a cartoon were chosen less than 50 % of the time. Further evidence of readers’ general inability to interpret editorial cartoons correctly was found by Bedient and Moore (1982). They found that middle and high school students not only failed to interpret the subject and point of a cartoon correctly, but often had trouble identifying the actors in such works. Four groups of public school students, in three age categories (131 students total), were given 24 editorial cartoons pertaining to four separate subjects, and their descriptions were compared to those of a panel of expert judges, then categorized as abstract (correct or incorrect), concrete (correct or incorrect), descriptive, and No Response. They found that less than one third of the responses were correct overall, although this varied with age level and with the subject matter of the cartoons. Bedient and Moore concluded that these results, while not perfectly aligned with those of previous studies, support similar conclusions: that cartoons are often misinterpreted and that the skills needed for proper interpretation cannot be taken as a given. Carl (1968) reached the same conclusion in a study of adult interpretations of cartoons, finding that the point the artist intended to make was most often completely different from what people found in the work.
In this study, cartoons were taken from 18 of the largest newspapers in the country over a nine-week period and taken door-to-door to ask for interpretations from a random sample of people in Ithaca, New York; Candor, New York; and Canton, Pennsylvania. Participants were asked for open-ended interpretations of some cartoons, and were asked to rank other cartoons that dealt with race relations (on a segregation / integration scale) or with partisan politics (on a Democratic / Republican scale). The subjects’ responses were compared to the expressed intent of the cartoons, according to the artists themselves. In Candor and Canton (described as small towns), 70 % of all open interpretations were in complete disagreement with the authors’ intents. In Ithaca (the home of Cornell University and, therefore, seen as a more sophisticated and erudite town), this number was 63 %. In all three places, the scaling part of the study had similar results. Clearly, there is evidence that interpreting an editorial cartoon is a problem for most people, and it is not difficult to see that this would be a problem both for describing such images and for searching for them. It may be that, in addition to serving as a surrogate for the image, a textual description can also serve as an authority, helping the user to describe the subject or subjects of an editorial cartoon correctly and in sufficient depth, as well as helping to direct searches by correctly linking to other cartoons that deal with the same subjects. In any case, the inability of most people to interpret correctly the subject of an editorial cartoon will probably be a hindrance to accessing such images.
Metadata
In its broadest sense, metadata is data about data. Gilliland (n.d.) finds that metadata is “the sum total of what one can say about any information object at any level of aggregation” (p. 1), providing a broad and generally acceptable definition, mirrored and expanded on by Caplan (2003) and the ALA’s Committee on Cataloging: Description and Access (1999). Greenberg (2005) reports that a metadata schema is generally described as a collection of metadata elements, a container for metadata, or a tool designed to serve a purpose, sentiments echoed by the Institute of Electrical and Electronics Engineers (2002) and the National Information Standards Organization (2004). And there are, again, several different opinions about the kinds of metadata that exist: Caplan lists three; Greenberg (2001) four; Gilliland five; Lagoze, Lynch, and Daniels (1996) seven; and the IEEE nine. In her analysis of 105 metadata schemas, controlled vocabularies, and other related works, Riley (2010) illustrated which works can be applied to various domains, communities, functions, and purposes. Four basic metadata standards are listed as having a strong association with all four of these areas. The W3C’s Ontology for Media Resource (2009) and Dublin Core and Qualified Dublin Core (2009) are not designed specifically for images, and will not be considered in this chapter. The VRA Core 4.0 (2011) would do a good job of describing editorial cartoons, but the Categories for the Description of Works of Art and CDWA Lite (2011) would do a slightly better one. The CDWA was developed by the Art Information Task Force at the Getty Trust with funding from the National Endowment for the Humanities and the College Art Association. As expected, it deals well with each of the categories of metadata in its core elements (which it considers essential for retrieval purposes) and develops them in the expanded elements, which can be used for both collection retrieval and image or art display.
This schema does not, however, provide an adequate place for recording the words found in such cartoons. Granted, the Inscription / Marks category of the full CDWA would be where all quotes, captions, labels, and other such writing would be described, but there is no way to make known which words
belong where; as it stands, the CDWA does not provide a method for differentiating between a caption and a person speaking. The CDWA also does not provide a ready place for the actions depicted in a cartoon, which, given the type of image, could carry meaning vital to description and interpretation. While the CDWA may be the best schema for describing the images in question, it is not ideal.
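The two gaps noted above, that the CDWA cannot say which words belong to a caption and which to a speaking figure, and that it has no ready place for depicted actions, can be made concrete with a small record structure. The Python sketch below is purely illustrative: the class, field, and method names (`CartoonRecord`, `Inscription`, `speech_by`, `access_points`) are hypothetical and are not drawn from the CDWA, VRA Core, or any other published schema, and the example values are invented. It also borrows Shatford Layne’s of/about distinction and records symbols (for instance, a bear standing for the Soviet Union) so that a symbol’s referent becomes a searchable term.

```python
from dataclasses import dataclass, field
from typing import Literal, Optional

# Hypothetical element names for illustration only; they are not taken from
# CDWA, CDWA Lite, VRA Core, or any other published schema.
InscriptionKind = Literal["caption", "speech", "label"]


@dataclass
class Inscription:
    kind: InscriptionKind          # separates a caption from a figure's speech or a label
    text: str
    speaker: Optional[str] = None  # who says it; only used when kind == "speech"


@dataclass
class CartoonRecord:
    creator: str
    published: str                                          # ISO 8601 date, e.g. "1980-10-15"
    inscriptions: list[Inscription] = field(default_factory=list)
    actions: list[str] = field(default_factory=list)        # what depicted figures are doing
    depicted: list[str] = field(default_factory=list)       # what the cartoon is "of"
    symbols: dict[str, str] = field(default_factory=dict)   # depicted figure -> what it stands for
    about: list[str] = field(default_factory=list)          # subjects and events it is "about"

    def speech_by(self, speaker: str) -> list[str]:
        """Words attributed to one figure, excluding captions and labels."""
        return [i.text for i in self.inscriptions
                if i.kind == "speech" and i.speaker == speaker]

    def access_points(self) -> set[str]:
        """Lower-cased 'of' and 'about' terms, with each symbol's referent added."""
        terms = set(self.depicted) | set(self.about) | set(self.symbols.values())
        return {t.lower() for t in terms}


# Invented example values, not a real cartoon.
record = CartoonRecord(
    creator="A. Cartoonist",
    published="1980-10-15",
    inscriptions=[
        Inscription("caption", "Election eve"),
        Inscription("speech", "Trust me", speaker="Candidate A"),
    ],
    actions=["shaking hands"],
    depicted=["Candidate A", "bear"],
    symbols={"bear": "Soviet Union"},
    about=["1980 presidential election"],
)

print(record.speech_by("Candidate A"))   # ['Trust me']
print(sorted(record.access_points()))    # ['1980 presidential election', 'bear', 'candidate a', 'soviet union']
```

Keeping each inscription’s kind and speaker as separate fields is what allows a search to be restricted to one figure’s words; a single undifferentiated inscription field would lose that distinction, which is the limitation described above.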
Examples of cartoon collections: Sources
In many instances, editorial cartoons are not treated as historical documents or as relevant to scholarly work. They are instead treated solely as items in a collection, and as such they are not described with the historian, the anthropologist, or the educator in mind, resulting in descriptions that do not treat the historical underpinnings of the image as an important aspect of the item. The following examples demonstrate that some collections can be used as sources for cartoons but are not meant for that purpose, while other collections are meant to be resources for finding cartoons or for cartoon research. Cartoons are occasionally found as items in a large collection or as part of an archive. The Claude Pepper Library at Florida State University (2009) lists several specific cartoons in its searchable database of catalogued items, and many more simply as “cartoons” among the uncatalogued items in the collection. Similarly, the Berryman Family Papers at the Smithsonian’s Archives of American Art (2009) include “cartoons” as a description of portions of the several microfilm reels that make up the searchable collection. In both cases, the focus is on managing the collection, not on providing extensive searchability for purposes other than those of the collection managers and their role in providing access. In a different vein, CNN’s cartoon “archive” (2009) is not a well-organized and searchable archive, but a mere repository for its editorial cartoonist’s recent work. Cartoons are listed by author, then by date, with no attention paid to the subject or to any other description of the image aside from using the captions as titles.
This shows that what is called an “archive” is sometimes simply a place to store materials rather than an organized, purposeful collection of important records, and that some archives serve purposes other than legal or managerial ones. In some instances, a cartoon collection is used to point to another resource entirely. The National Portrait Gallery borrowed several cartoons from the National Portrait Gallery’s Herbert Block Collection (2009) to briefly illustrate Herblock’s views of the presidents from FDR to Clinton. The cartoons available here do not represent the entirety of the author’s work on these men; in this small
collection, most Presidents are examined in three or four cartoons. While it is possible to get cartoons from this site, it is not possible to examine any President or presidency in depth, and while it can be used as a source for cartoons, it is not a true cartoon resource. Similarly, the Smithsonian Institution Libraries American Art Museum / National Portrait Gallery Library (2009) provides access to portions of some of the books on cartoons and caricature in its collection. This resource is less a cartoon archive than a showcase of what is available to researchers in the Smithsonian’s library. For this limited set of cartoons, access is provided through books by image and by subject, although the subject access does not seem to be cross-indexed among the images. It seems likely that the thrust of this effort is to promote the library collections of the Smithsonian, not to provide access to the cartoons themselves. The Pulitzer Prizes’ Web site (2009) makes the portfolios of winning editorial cartoonists available online. The cartoons are not described in any way other than by author and date of publication; the site is thus not an effort to provide access to editorial cartoons, but a way of providing insight into what the Committee considers in its deliberations concerning who should win the Prize. While interesting, it does not provide researchers with a way of examining the political or social issues represented in the cartoons. In all of these cases, and many more like them, it is not the intent of the respective entities to provide access to editorial cartoons, nor is it their business to determine which of the cartoons are historical documents and treat them as such. While one can find cartoons in these places, the cartoons are not the reason they exist. These organizations can be considered sources for editorial cartoons, but they cannot be considered resources for them.
Thus, they should not be considered when examining ways to index cartoons for research purposes, as they do not intend their collections to be used for such ends; they are included here to serve as examples of what a resource – in the context of this research – is not.
Examples of cartoon collections: Resources
There are several resources – collections that make an effort to meaningfully organize and fully describe editorial cartoons for future retrieval – that allow access to the images in the collection. But each has a number of shortcomings that would hinder, to some degree, any research that might be done using them. The American Association of Editorial Cartoonists (AAEC) (2009) maintains a web presence for the purposes of promoting both its members’ work and the profession in general, a by-product of which is a small sample of recent works from
AAEC cartoonists. The cartoons available are kept on the site for one week at most, and are divided into Local Issues and National / International Issues. This seems to be a resource for which the intended audience is other editorial cartoonists, allowing for reference on both artistic and professional issues. It also addresses the needs of educators seeking editorial cartoons dealing with very recent issues in governments and society.
Poorly designed and poorly executed
Mankoff (2004) provides on CD-ROM all 68,647 cartoons published in The New Yorker from February 1925 to February 2004. Access is provided to these through the magazine’s in-house descriptions, developed ad hoc over a number of years and following no particular system at all; occasionally, the words within the cartoon or its caption are included in the description, but this is the exception rather than the rule. Generally, the descriptions refer most often to the subject of the cartoon, whether a social issue or a political one. While the indexing scheme does provide access to cartoons, the search tool is difficult to use because only one term at a time can be entered, and a completed search shows all of the terms for that cartoon, including the ill-matched, the irrelevant, and the bizarre. goComics.com (2009) provides access to 62 editorial cartoonists’ most recent work and, after free registration, to archives of their work from the time they became a part of Universal Press Syndicate (which sponsors the site). This access is two-tiered: by author, then by date within each author. Also within each author, there are two user-driven avenues for potential description: the chance to tag each cartoon (which is seldom used), and a chance to contribute to a discussion board for each image (which, for cartoon research purposes, is used too much). The result is a community of commenters who seem more intent on keeping a discussion going than on describing the cartoon for future retrieval. The AAEC’s time-limited collection, the New Yorker’s ad hoc indexing practices, and goComics’ seldom-used user-based indexing are not fatal flaws for use in research, but they are flaws nonetheless, resulting in ill-conceived indexing practices that seem unlikely to be rethought in the future.
In each of these, some serious (though not fatal) flaws emerge, centring on the exclusion of image content: access is provided solely on the basis of context, both within and without the image. There is a tendency to index the cartoons at Panofsky’s iconological level, providing access via the subject or topic of the cartoon and ignoring the visual content of the image or, in Fidel’s terms, treating the cartoons as points at her Data pole rather than at her Object pole. Similarly, those of Jörgensen’s Classes that are addressed in these collections tend more toward the abstract and interpretive – such as Content / Story and Abstract Concepts – than toward the concrete. Add to this the various instances of poorly thought-out plans for actually providing access to the cartoons, and we see that while the effort was probably well-intended, it does not provide the kind and quality of access to political cartoons that we would hope for, much less expect.
Well-designed but poorly executed
There are other resources for editorial cartoons where the indexing scheme is well conceived but poorly executed, which is to say that a reasonable system for describing these images is manifest, but that the execution of that system is suspect. One such resource is the Prints and Photographs Division of the Library of Congress (2009), which lists the holdings of several cartoon collections. Most of the images are not available online at this time, but all of the cartoons are catalogued to some degree. Some LC records are little more than an artist, a publication date, and how to physically locate the image; others offer an inventory of items and words in the cartoon; still others offer the context in which the cartoon was created. This variability in the description of cartoons could be the result of any number of circumstances (among them manpower, cost, and lack of information), but the overall result is a hit-or-miss system of cartoon description, making the job of the researcher more difficult. The Bundled Doonesbury: A Pre-Millennial Anthology (with CD-ROM) (Trudeau, 1998) is an indexed compilation of all Doonesbury comic strips for the first 25 years of its publication. Indexing has been done by character, date of publication, and subject, with groups of strips dealing with the same topic providing a timeline. It also provides a list of top headlines from the week that the strips were published, and trivia lists from those weeks as well, a distraction for the researcher that does not provide access to the cartoons in any way. But its emphasis seems to be on tracking characters over time rather than on recalling commentary on political events. It is almost as though the database is centred on the phenomenon of the long-running strip rather than on the strip’s commentary on the times it covers.
Daryl Cagle’s Professional Cartoonist’s Index (2009) is a good resource for current American editorial cartoons as well as those from the recent past. It provides access to cartoons by author, by date, and by subject, the last being a broad-based description of issues pertinent to the day. Additionally, Cagle has compiled what he considers the best editorial cartoons of each year into a permanent, published index, but these indices are not based on subject. One problem
is that the cartoons do not carry a date of publication, a major obstacle to researching the event that inspired the cartoon. Though flawed, the cagle.com site is among the best resources for finding editorial cartoons. In the Prints and Photographs Division site, the Doonesbury CD, and the cagle.com site, the intent seems to be to provide access to the respective cartoons in relevant and meaningful ways, but the results of the indexing efforts can hinder research within these collections because of a lack of granularity and consistency in their description. Here we see that two of Panofsky’s three levels of image description have been accommodated, and that both of Fidel’s poles are better, but not perfectly, represented. Further, we see that while Jörgensen’s 12 Classes are not represented by name, the concepts behind most of those classes are found in the systems for describing the cartoons and in the resulting interfaces. We still do not see a perfect implementation of image indexing theory – Panofsky’s lowest level of image description is not present in these collections, and none treat political cartoons as solely instances of data as per Fidel – but these exclusions make sense, since it is unlikely that anyone will search for such images using generic terms like “man” or “house”. But the end products still suffer from such problems as uneven indexing, extraneous information in the interface, and a lack of detailed descriptions.
Well-designed and well-executed
There are a few resources for editorial cartoons that avoid these shortcomings, providing equal coverage to all cartoons, framing the subjects of the cartoons well for the researcher, and presenting clear descriptions of the cartoons in the collection; these are examples of how cartoon description should be maintained. The Mandeville Special Collections Library (2009) at the University of California, San Diego hosts the World War II editorial cartoons of Dr Theodor Geisel, better known as Dr Seuss. The collection represents the entirety of Dr Seuss’ work while he was the chief editorial cartoonist for the New York newspaper PM from 1941 to 1943. The cartoons are presented both chronologically (a simple list of cartoons by date) and by subject (a detailed list of the people, countries, battles, and political issues), with cross-references between these four superordinate headings. These two access points are a good gateway for researchers to search the collection. Charles Brooks (2011) has compiled what he considered to be the best editorial cartoons of each year since 1974 and has published them in book form, organized by general issue, topic, and person. He does not provide a date for each cartoon’s publication (aside from the year provided in the book’s title), but
does provide an introductory paragraph to each subject area, covering the issue in broad terms and sometimes offering a retrospective evaluation of the public mood about it. The topics in each year’s section are listed in the table of contents, and the index provides access to the works of each artist. While this is standard for print works, better access to editorial cartoons can be offered electronically because of the ability to provide multiple access points to each cartoon. In 1998, Paul Bachorz asked his high school students to describe several FDR-centred cartoons as part of a class project (Bachorz, 1998). It is among the better-conceived efforts to provide broad-based access to the editorial cartoons in a collection, providing the typical bibliographic information as expected, as well as a general breakdown of cartoon topics. Within these, the representation of the cartoons’ specific subjects varies; whereas some are presented longitudinally (such as FDR’s attempt to pack the Supreme Court), others are more of a snapshot, providing brief treatments of the perspective of each cartoon (as for the Farms Issue) before giving way to a simple list of cartoons. This work recognizes that while retaining information about the author and publication is important, efforts need to be made to provide access both by subject and by context for the cartoons in a collection. Although the execution of the description may be questionable in some cases, the framework for description is well conceived and foreshadows some of the image description metadata schemas that were developed later. These efforts are examples of both good planning and good execution. They allow for reasonable and prudent use of Panofsky’s levels of image description, and for both of Fidel’s poles. Both the perceptive and interpretive aspects of Jörgensen’s 12 Classes are available as well.
There is a welcome lack of extraneous information, and the efforts here seem to be focused on image description instead of other concerns. The efforts discussed above range from the haphazard to the well-considered in the way they approach describing cartoons. Some suffer from problems in the interface itself, others from the way they attempt to describe cartoons, but all of these resources shed some light on what organizers think their intended audience wants in terms of accessing editorial cartoons. Both the sources and the resources provide basic, low-level access to images based on the creator of the cartoon, and generally provide the date the cartoon first appeared in print. Most of the resources described here provide some means of linking the cartoon to a specific event; some do this through plain statement, others through a paragraph description, and those that attempt it meet with varying degrees of success. But it is clear that the people and organizations that created these collections saw a need to provide access to cartoons by subject. Additionally, the
best of these resources seem to be organized from the top down, on a collection-level basis. In any case, the examples shown here indicate that it may be possible to recommend methods for describing editorial cartoons, and that there are some common threads found throughout descriptions of such images, but that truly well-thought-out descriptions take into account both the collection and the intended audience.
Discussion
A number of issues arise from the academic literature and the observed practices concerning editorial cartoons. Perhaps foremost among these is how we deal with the subject of a cartoon. Most of the resources noted in this research made some attempt to provide access to cartoons by subject; practitioners evidently perceive it to be important to users. There is some evidence that readers generally do not interpret the subject of a cartoon correctly, but the research does not state the standard of specificity used in any of those cases; we do not know what level of description was expected and counted as having properly understood the cartoon in any of those articles. How does this affect the work involved in describing editorial cartoons? Surely, we will not list an erroneous subject in the record, vainly trying to guess what people might think the subject is, instead of listing some reasonable description of what the cartoon is dealing with. The more pressing question is this: what level of specificity is required when describing the subject of an editorial cartoon? Jörgensen’s 12 Classes provide two places for such a description – Content / Story and External Relation – that might reasonably be used to bring the context of the image into focus, but they do not provide any guidance as to how deeply to describe the subject, meaning that, for instance, the subject “War” would be treated in the same manner as “World War II”, which could lead to problems. Can we profitably incorporate Shatford’s ideas about the generic and the specific into the description of a cartoon’s subject, or will we find what Armitage and Enser found, that users will tend very much toward one or the other of these? Is it as important to establish the subject of an editorial cartoon as it is to establish the context? In a similar vein, what level of specificity is necessary in a description of the content of a cartoon?
While identifying the components of a cartoon would seem to be a given, how much detail should such a description include? It would not seem to be enough to record only the generic (“a man”) when the specific (“Barack Obama”) is known. The problem here is that editorial cartoons sometimes
Chapter 3. Access to editorial cartoons
make use of symbols, which depict one thing but represent another. While we might discard the simple description of Barack Obama as “a man”, we might need to note that “a bear” is meant to represent “the Soviet Union”. These seemingly contradictory requirements for specificity in describing cartoon content can pose problems in indexing such images. From a broader perspective, we can see that Enser was right: there seems to be a fundamental disconnect between practice and theory, between those who actually describe and organize editorial cartoons and those who research user needs and preferences vis-à-vis images. There is clearly a demand for access to these cartoons: Brooks is now in his fourth decade of compiling the best cartoons of each year, both cagle.com and goComics.com (among others) are viable businesses, and several of the other resources examined here show that considerable effort has gone into preserving these images and providing access to them. But the way they have gone about providing access – as manifested in their descriptions of the cartoons – lacks some of the insight and polish represented in the literature concerning images in general. Even the very best of the examples could have benefited from Shatford’s ideas concerning Of and About, from each of the three examinations of accuracy in cartoon interpretation, and from Jörgensen’s 12 Classes of image description. But none of them did. Whether this is due to a lack of awareness of this research on the part of practitioners, or to an assumption that, as practitioners, they knew best how to get the job done, is unknown at this time; what is evident is that there is a gap between those who describe cartoons and those who research image description. What can metadata do for editorial cartoons? A custom-made schema can help to describe the images in a collection.
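One way to keep Shatford’s Of and About apart, while still letting a specific subject such as “World War II” be found under a broader term such as “War”, is a record that stores what is depicted, what depicted symbols stand for, and what the cartoon is about as separate fields. The Python sketch below is purely illustrative: the field names, the `BROADER` table, and the `matches` helper are hypothetical and are not taken from any schema discussed in this chapter. A real collection would draw broader-term relations from a controlled vocabulary rather than a hand-written dictionary.

```python
from dataclasses import dataclass, field

# Hypothetical broader-term relations (narrower -> broader). A real collection
# would take these from a controlled vocabulary rather than a hand-written dict.
BROADER = {
    "World War II": "War",
    "Cold War": "War",
}

@dataclass
class CartoonRecord:
    title: str
    depicts: list[str] = field(default_factory=list)       # "Of": what is literally shown
    symbols: dict[str, str] = field(default_factory=dict)  # depicted element -> what it stands for
    about: list[str] = field(default_factory=list)         # "About": subjects, as specific as known

    def subject_terms(self) -> set[str]:
        """Return each recorded subject plus every broader term above it."""
        terms = set()
        for term in self.about:
            while term is not None:  # walk up the hierarchy until no broader term exists
                terms.add(term)
                term = BROADER.get(term)
        return terms

def matches(record: CartoonRecord, query: str) -> bool:
    """True if the query names something depicted, symbolized, or a (broader) subject."""
    return (
        query in record.depicts
        or query in record.symbols.values()
        or query in record.subject_terms()
    )

record = CartoonRecord(
    title="Example record (hypothetical)",
    depicts=["a bear"],
    symbols={"a bear": "the Soviet Union"},
    about=["Cold War"],
)
print(matches(record, "War"))               # True, via the broader term of "Cold War"
print(matches(record, "the Soviet Union"))  # True, via the symbol field
print(matches(record, "World War II"))      # False
```

Keeping symbols in their own field means “a bear” can be recorded as depicted content while “the Soviet Union” remains retrievable, without either description replacing the other.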
The flexible quality of metadata gives collectors the means to establish, define, and document how each element in a given schema is used within a given collection, allowing both uniform application of the standards within a collection and ways to build crosswalks and connections to other like-minded collections. Additionally, using a metadata schema such as the CDWA or the VRA Core, as opposed to a more formal method of indexing and classification like MARC21, allows less-skilled, non-professional indexers to provide metadata for editorial cartoons: the self-contained nature of a schema lets anyone interested create metadata for these images for later review by a professional or another representative of the collection, and does not require much a priori knowledge of the system for an indexer to use it effectively. Whether the labour involved in building a crosswalk between such a schema and a collection’s cataloguing system, and in converting the data, would be worth the effort is beyond the scope of this chapter, but the existence of a crosswalk between the VRA
Landbeck
Core and MARC21, together with the use of MODS in large collections of all types, suggests that it is at least possible.
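At its simplest, a crosswalk is a field-by-field mapping from one schema to another. The sketch below assumes a hypothetical local cartoon schema (`creator`, `title`, `date`, `description`, `subject`) and maps it onto a handful of MARC 21 bibliographic tags (100, 245, 260 $c, 520, 650). It is a simplified illustration, not the published VRA Core–MARC21 crosswalk, and it ignores the indicators, punctuation, and repeatability rules that a production conversion would have to handle.

```python
# Hypothetical local cartoon schema field -> (MARC 21 tag, subfield code).
# Simplified: indicators, punctuation, and repeatability rules are ignored.
CROSSWALK = {
    "creator": ("100", "a"),      # Main Entry--Personal Name
    "title": ("245", "a"),        # Title Statement
    "date": ("260", "c"),         # Publication, Distribution, etc. (Imprint): date
    "description": ("520", "a"),  # Summary
    "subject": ("650", "a"),      # Subject Added Entry--Topical Term
}

def to_marc(record: dict) -> tuple[list[tuple[str, str, str]], list[str]]:
    """Convert a local record to (tag, subfield, value) triples.

    Fields with no mapping are returned separately for a cataloguer to
    review instead of being silently dropped.
    """
    fields, unmapped = [], []
    for name, value in record.items():
        if name not in CROSSWALK:
            unmapped.append(name)
            continue
        tag, code = CROSSWALK[name]
        values = value if isinstance(value, list) else [value]  # repeatable local fields
        for v in values:
            fields.append((tag, code, v))
    fields.sort(key=lambda f: f[0])  # stable sort keeps repeated tags in input order
    return fields, unmapped

cartoon = {
    "title": "Example cartoon (hypothetical)",
    "creator": "Cartoonist, A.",
    "date": "1962",
    "subject": ["Cold War", "Nuclear weapons"],
    "symbolism": "bear = Soviet Union",
}
marc_fields, needs_review = to_marc(cartoon)
for tag, code, value in marc_fields:
    print(tag, "$" + code, value)
print("unmapped:", needs_review)  # unmapped: ['symbolism']
```

Returning unmapped fields rather than discarding them reflects part of the conversion labour the chapter mentions: someone still has to decide where local elements such as symbolism notes belong in the target schema.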
Conclusion
Clearly, images in general, and editorial cartoons in particular, can be meaningfully and usefully indexed for retrieval in today’s electronic world. While there are some good examples of how we might perform this task, there is little evidence of well-thought-out and well-executed systems of cartoon description. Chappel-Sokol tells us some of the whys and wherefores of this situation, while Weitenkampf argues for the inclusion of such images in the historical record. The literature tells us of some of the concerns of describing images of any type with words, some of the pitfalls to be expected, and some of the methods to mitigate the loss inherent in surrogation. We can see that there are now new ways of indexing images, and that such activity is no longer restricted to professional realms. The practices examined here show us that some cartoon collections are only attractions, lures, or baubles for the organizations concerned, while others are dedicated to the proposition that editorial cartoons deserve to be remembered – and accessible – so that all may benefit. We have the tools. We have the talent. And we have the interest. And yet, editorial cartoons are not generally included as parts of historical records, archives, newspaper indices, or any other body of general imagery. Why? Is this simply a case of “we do it this way because we have always done it this way”? Clearly, we can do better to preserve and expand the historical record, to provide a valuable resource to historians, anthropologists, art historians, and political scientists, and to investigate how and why images should be described. But, at this point, the indexing of editorial cartoons is limited to sporadic, localized, and haphazard efforts that serve the interests of individual collections rather than filling gaps in the collective body of knowledge.
References
American Association of Editorial Cartoonists. (2009). AAEC – today’s political cartoons. Retrieved from http://editorialcartoonists.com/index.cfm
American Library Association & Association for Library Collections and Technical Services, Committee on Cataloging: Description and Access. (1999). Summary report. Retrieved from http://www.libraries.psu.edu/tas/jca/ccda/tf-meta3.html
Armitage, L., & Enser, P. (1997). Analysis of user need in image archives. Journal of Information Science, 23(4), 287–299.
ARTstor, Inc. (2011). Welcome to ARTstor. Retrieved from http://www.artstor.org/index.shtml
Bachorz, P. (1998). FDR Cartoon Collection Database. Retrieved from http://www.nisk.k12.ny.us/fdr/index.html
Bedient, D., & Moore, D. (1982). Student interpretations of political cartoons. Journal of Visual/Verbal Languaging, 5, 29–36.
Brooks, C. (Ed.). (2011). Best editorial cartoons of the year: 2009 edition. Gretna, LA: Pelican Publishing Company, Inc.
Cagle, D. (2009). Daryl Cagle’s Professional Cartoonists Index. Retrieved from http://www.cagle.com/main.asp
Caplan, P. (2003). Metadata fundamentals for all librarians. Chicago: American Library Association.
Carl, L. (1968). Political cartoons fail to reach readers. Journalism Quarterly, 45, 533–535.
CDWA. (2011). Categories for the description of works of art. Retrieved from http://www.getty.edu/research/publications/electronic_publications/cdwa/
Chappel-Sokol, A. (1996). Indexing editorial cartoons. Special Libraries, 87(1), 21–31.
Claude Pepper Collection. (2009). Claude Pepper Collection. Retrieved from http://www.claudepeppercenter.net/collection/polaris/memorabilia.html
CNN.com – Cartoon Archive. (2009). Retrieved from http://www.cnn.com/POLITICS/analysis/toons/archive.htm
Corbis Corporation. (2011). Corbis images – premium quality stock photography and illustrations. Retrieved from http://www.corbisimages.com/?s_kwcid=corbis&gclid=CNTCkrGrp6oCFcXs7QodREWCCQ
Datta, R., Joshi, D., Li, J., & Wang, J. (2008). Image retrieval: Ideas, influences, and trends of the new age. ACM Computing Surveys, 40(2), 1–60.
Dempsey, L. (2000). Scientific, industrial, and cultural heritage: A shared approach: A research framework for digital libraries, museums and archives. Ariadne, 22. Retrieved from http://www.ariadne.ac.uk/issue22/dempsey/intro.html
De Rosa, C., Dempsey, L., & Wilson, A. (2004). Online Computer Library Center: The 2003 OCLC environmental scan: Pattern recognition – executive summary. Retrieved from http://www.oclc.org/reports/escan/downloads/escansummary_en.pdf
DeSousa, M., & Medhurst, M. (1982). The editorial cartoon as visual rhetoric: Rethinking Boss Tweed. Journal of Visual/Verbal Languaging, 2, 43–52.
Dublin Core Metadata Initiative. (2008). DCMI metadata terms. Retrieved from http://dublincore.org/documents/dcmi-terms/
Enser, P. (2008). Visual image retrieval. Annual Review of Information Science and Technology, 42, 3–42.
Fidel, R. (1997). The image retrieval task: Implications for the design and evaluation of image databases. New Review of Hypermedia and Multimedia, 3(1), 181–199.
Getty Art History Information Program, J. Paul Getty Trust, & Petersen, T. (1990). Art & Architecture Thesaurus. New York: Oxford University Press. Published on behalf of the J. Paul Getty Trust.
Gilliland, A. (n.d.). Setting the stage. Introduction to metadata: Pathways to digital information. Retrieved from http://www.getty.edu/research/conducting_research/standards/intrometadata/setting.html
goComics.com. (2009). Comics, editorial cartoons, email comics, comic strips. Retrieved from http://www.gocomics.com/explore/editorial_lists
Goodrum, A., & Spink, A. (2001). Image searching on the Excite Web search engine. Information Processing and Management, 37, 295–311.
Greenberg, J. (2001). A quantitative categorical analysis of metadata elements in image-applicable metadata schemes. Journal of the American Society for Information Science and Technology, 52(11), 917–924.
Greenberg, J. (2005). Understanding metadata and metadata schemas. Cataloging & Classification Quarterly, 40(3/4), 17–36.
Greisdorf, H., & O’Connor, B. (2002). Modelling what users see when they look at images: A cognitive approach. Journal of Documentation, 58(1), 6–29.
Guggenheim Foundation. (2011). Guggenheim Foundation. Retrieved from http://www.guggenheim.org/guggenheim-foundation
Hanjalic, A., Lienhart, R., Ma, W., & Smith, J. (2008). The Holy Grail of multimedia information retrieval: So close or yet so far away? Proceedings of the IEEE, 96(4), 542–543.
Hastings, S. (1995). Query categories in a study of intellectual access to digitized art images. Proceedings of the 58th Annual Meeting of the American Society for Information Science, 32, 3–8.
Hollink, L., Schreiber, A., Wielinga, B., & Worring, M. (2004). Classification of user image descriptions. International Journal of Human-Computer Studies, 61, 601–626.
Institute of Electrical and Electronics Engineers. (2002). IEEE Standard for Learning Object Metadata (IEEE Std 1484.12.1-2002). New York: IEEE.
Jörgensen, C. (1995). Image attributes: An investigation (Doctoral dissertation). Retrieved from ProQuest Dissertations and Theses Database.
Lagoze, C., Lynch, C., & Daniel, R. (1996). The Warwick framework: A container architecture for aggregating sets of metadata. Retrieved from http://ecommons.library.cornell.edu/handle/1813/7248
Lew, M., Sebe, N., Djeraba, C., & Jain, R. (2006). Content-based multimedia information retrieval: State of the art and challenges. ACM Transactions on Multimedia Computing, Communications and Applications, 2(1), 1–19.
Library of Congress. (2009). Prints and photographs online catalog. Retrieved from http://www.loc.gov/pictures/
Mandeville Special Collections Library at the University of California, San Diego. (2009). A catalogue of political cartoons by Dr. Seuss. Retrieved from http://orpheus.ucsd.edu/speccoll/dspolitic/Frame.htm
Mankoff, R. (Ed.). (2004). The complete cartoons of The New Yorker with CD-ROMs. New York: Black Dog & Leventhal Publishers.
National Information Standards Organization. (2004). Understanding metadata. Retrieved from http://www.niso.org/publications/press/UnderstandingMetadata.pdf
National Portrait Gallery. (2009). Herblock’s Presidents. Retrieved from http://www.npg.si.edu/exhibit/herblock/intro.html
Pulitzer Prizes. (2009). Editorial cartooning. Retrieved from http://www.pulitzer.org/bycat/Editorial-Cartooning
Riley, J. (2010). Seeing standards: A visualization of the metadata universe. Retrieved from http://www.dlib.indiana.edu/~jenlrile/metadatamap/
Shatford Layne, S. (1994). Some issues in image indexing. Journal of the American Society for Information Science, 45(8), 583–588.
Smithsonian Institution, Archives of American Art. (2009). About the Berryman Family Papers. Retrieved from http://www.aaa.si.edu/collections/findingaids/berrfami.htm
Smithsonian Institution Libraries American Art Museum/National Portrait Gallery Library. (2009). Drawing from life: Caricatures and cartoons from the American Art/Portrait Gallery Collection. Retrieved from http://www.sil.si.edu/ondisplay/caricatures/
Trudeau, G. (1998). The bundled Doonesbury with CD-ROM. Kansas City: Andrews McMeel Publishing.
Visual Resources Association. (2011). VRA Core 4.0. Retrieved from http://www.vraweb.org/projects/vracore4/index.html
Weitenkampf, F. (1946). Political cartoons as historical documents. Bulletin of the New York Public Library, 50(3), 171–176.
World Wide Web Consortium. (2010). Ontology for media resource 1.0. Retrieved from http://www.w3.org/TR/2009/WD-mediaont-10-20090618/
Part II: Information behaviour studies
Diane Rasmussen Neal, Niall Conroy
Chapter 4. Information behaviour and music information retrieval systems: Using user accounts to guide design
Abstract: Music Information Behaviour (MIB) is an area that explores the ways in which people use music in their everyday lives, and especially how users’ objectives relate to how information systems are developed. User-oriented system design is, therefore, informed by the classification of several aspects of music information behaviour, including how (and why) people describe, share, and search for music. Building on the results of prior music information behaviour work, the researchers performed a content analysis of 200 blog entries in which users describe their own use of music and music information systems. The results uncover various dimensions of user behaviour, and how needs translate into online activities. The emotional needs of users feature prominently in how music is searched for and shared. Users describe a preference for browsing and collecting behaviour rather than targeted search-type activities. Existing systems and new design proposals are discussed in light of these trends.
Keywords: Information behaviour, music information retrieval, user-oriented design, content analysis, blogs
Diane Rasmussen Neal, Assistant Professor, Faculty of Information and Media Studies, The University of Western Ontario
Niall Conroy, Doctoral Candidate and Research Assistant, Faculty of Information and Media Studies, The University of Western Ontario
Background
The development of Music Information Retrieval (MIR) systems has focused on newer, better, and more efficient means of providing access to music materials. As digital music becomes more accessible through online sources and applications such as Last.fm, iTunes, Amazon Music, Pandora and others, it is vital to consider how well information retrieval systems accommodate people’s most instinctive needs and natural searching behaviours. Accessing music differs from other types of information retrieval in that it occupies
a prominent place in casual and leisure activities. For instance, people derive great pleasure and entertainment from listening to music, and even from browsing through music collections to find a favourite song. Others enjoy building large music collections of their own to show off and share with friends and family. People’s musical experience is multifaceted, and so is the meaning they derive from music; both depend on individual characteristics and prior musical knowledge. Finally, music can be used in a variety of social contexts; experiencing music in a group serves a range of social functions. Given these multiple dimensions of music information behaviour, how they should translate into an ideal music information system remains a matter of conjecture. Information systems should provide access to music in a way that conforms to user needs, including the accurate representation of the properties of music and functions that suit people’s natural activities with music. Understanding how people naturally use music means understanding which musical characteristics they deem most significant. Descriptions of music in turn affect information categorization, organization, presentation, discovery, and other important aspects of system design. One approach to implementing musical descriptions is to use traditional metadata labels that designate objective musical facets, such as song title, composer name, or year of release. More detailed labels may be used to identify properties of the musical content itself, such as its tempo, key, and tonal attributes. Contextual or associative metadata is derived from the subjective or emotional associations people have with music, and may be determined by the context in which music is experienced.
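The three kinds of description above (objective bibliographic facets, properties of the musical content, and contextual or associative labels) can be kept as separate layers of a single record. In this Python sketch the class and field names are hypothetical, and treating contextual metadata as free-text tags counted across listeners is one assumed design among many, not a method described by the authors.

```python
from dataclasses import dataclass, field

@dataclass
class Bibliographic:
    """Objective facets such as title, composer, and year of release."""
    title: str
    composer: str
    year: int

@dataclass
class ContentFeatures:
    """Properties of the musical content itself."""
    tempo_bpm: float
    key: str

@dataclass
class MusicRecord:
    bibliographic: Bibliographic
    content: ContentFeatures
    # Contextual/associative layer: subjective labels and how many
    # listeners applied each one (an assumed representation).
    context_tags: dict[str, int] = field(default_factory=dict)

    def add_tag(self, label: str) -> None:
        """Record one listener's association, e.g. "road trip"."""
        self.context_tags[label] = self.context_tags.get(label, 0) + 1

    def top_tags(self, n: int = 3) -> list[str]:
        """Most frequently applied labels; ties are broken alphabetically."""
        return sorted(self.context_tags, key=lambda t: (-self.context_tags[t], t))[:n]

song = MusicRecord(
    Bibliographic(title="Example Song", composer="Example Composer", year=2010),
    ContentFeatures(tempo_bpm=120.0, key="C major"),
)
for label in ["road trip", "relaxing", "road trip", "study"]:
    song.add_tag(label)
print(song.top_tags(2))  # ['road trip', 'relaxing']
```

Separating the layers lets the objective facets stay fixed while the contextual layer grows as more listeners contribute, which is one way to accommodate the individual, shifting meanings discussed next.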
Descriptors may focus on cues that relate to physical or sensory features (such as articulation or roughness), or on higher-level properties whose semantics may range from structural to synesthetic/kinaesthetic to affective/emotive qualities (Leman, Styns, & Noorden, 2006). The primary difficulty, however, is that people’s experience of music is highly individual, making the meaning assigned to music dynamic and ephemeral. Still, since music’s pre-eminent functions are psychological and sociological (Laplante, 2008), the challenge lies in creating systems that are flexible, adaptive, and able to accommodate the wide range of subjective meanings that arise from people’s everyday uses of music. The evolution and evaluation of music information systems have tended to follow a system-oriented rather than a user-oriented approach. System-oriented design tends to be concerned with the development of content-based features and with automatic content analysis and classification. The user experience tends to take a back seat, and is gauged by how well people adapt to working within the confines of these features, usually through a highly controlled task or context. Downie and Cunningham (2002) note that users often tailor requests based on
what they think could be retrieved from the system. Although user groups have been involved in evaluation studies, information requests to a formal IR system are often constrained by the user’s preconceptions of what types of information or document formats are available. Additionally, performance metrics tend to be quantitative, measuring efficiency, recall, and precision rates. These measures largely ignore the social, affective, and subjective uses of music that feature so prominently in people’s everyday use of music. Finally, MIR systems have been designed and evaluated largely on the basis of anecdotal evidence, intuition, and a priori assumptions about typical usage scenarios (Cunningham, Reeves, & Britland, 2003). These may or may not correspond to multifaceted natural music information behaviour. In contrast, the user-centred approach integrates users at each stage of the process so that information needs and activities directly inform feature development. Researchers have noted that the same information need may be expressed in different ways depending on the resource being used at a given time (Lee & Downie, 2004). Moreover, music information “needs” may be fundamentally different from those involving other information formats, especially text. The cognitivist discourse of information behaviour considers information needs in terms of problem-solving, where knowledge gaps are resolved by exploring potential outcomes (Allen, 1996). Although this concept has worked reasonably well for the design of text-based retrieval systems, it is not necessarily well-suited to music retrieval, because the motivations behind searching and browsing for music are based on affective rather than cognitive needs (Laplante, 2008). Often pleasure is derived from the act of browsing through music libraries and developing large personal music collections.
This corresponds to the distinction Laplante and Downie (2011) draw between hedonistic and utilitarian uses of music. The former describes the pure enjoyment users derive from listening to, seeking, and exchanging music, while the latter describes acquiring music to achieve some specific and measurable outcome. These distinctions point to a complex set of music behaviour needs that should be explored and categorized if system design is to move towards a more user-oriented framework. To this end, a body of literature has begun to emerge that attempts to address the dearth of information about natural music information behaviour, much of it published within the HUMIRS (Human Use of Music Information Retrieval Systems) project of MIREX (Music Information Retrieval Evaluation Exchange). This work is one of several attempts to inform system design using methods and data sources that help construct a comprehensive picture of users and contexts of use. It forms the basis for creating benchmarks that specify the preferred features of music information systems and how they should be evaluated (Downie, 2008).
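The recall and precision rates mentioned earlier are defined purely by the overlap between the items a system retrieves and the items judged relevant. A minimal sketch of the standard set-based definitions makes the limitation concrete: nothing about enjoyment, mood, or social context enters the calculation. The function name and the toy item identifiers are illustrative.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant items that were actually retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

p, r = precision_recall(["t1", "t2", "t3", "t4"], ["t2", "t4", "t5"])
print(p, round(r, 3))  # 0.5 0.667
```

Two of the four retrieved items are relevant (precision 0.5), and two of the three relevant items were found (recall about 0.667); a listener’s pleasure in browsing the other two retrieved items is invisible to both numbers.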
Related work
Previous studies have addressed design through observation in natural settings, as well as through online communication related to general music information behaviours. They tend to focus on how users make choices in various contexts without restricting use to one particular system. Lesaffre, Baets, Meyer, and Martens (2007) found significant correlations between demographic and background factors (gender, age, expertise, musicianship, and breadth of taste) and the kinds of descriptors users prefer (whether interest-based, appraisal-based, or denoting physical involvement). Recognizing the dearth of information about natural music information behaviour, the HUMIRS initiative has made its main objective the investigation of the Who, What, When, Where, Why and How of music information retrieval. With this knowledge, system development may align more closely with people’s natural usage patterns. Since involving human subjects directly in the design process is often impractical, the principal methods have been ethnographic observation, interviews, surveys, and focus groups. These provide the important benefit of analysing interaction apart from any particular system. The results of these studies have begun to uncover general patterns of behaviour, including relationships between users and social contexts. Rather than fitting these observations squarely within a formalized information behaviour model, the research has aimed to explore the range of resources users employ, so that these may be linked to system features. Although many of these studies tend to reinforce the importance of bibliographic descriptions of music (Downie & Cunningham, 2002), some interesting exceptions have been uncovered.
The Lee, Downie, and Cunningham (2005) study demonstrated the importance of contextual metadata and user-generated exemplars in searching when bibliographic information is unknown (due to language differences) or when genre definitions are atypical (due to cultural differences). There is a consistent need for contextual as well as bibliographic metadata in describing music, providing evidence for access points linked to extra-musical objects or events (Lee & Downie, 2004). Semantic descriptions should be flexible, since they encapsulate music’s subjective, intrinsic qualities, which are sometimes associated with cultural, historical, or other non-musical significations. Many authors in social science, musicology, psychology, and other fields have noted that music-seeking behaviour is invariably guided by underlying motivational triggers. For example, Clarke, Dibben, and Pitts (2009) summarized practices such as social engagement and bonding, help with extra-musical tasks, identity construction, and mood management, to name a few. In particular, the emotions experienced by the listener, and the linking of those emotions to events in episodic memory, play a major role in musical taste. It is therefore not surprising that attempts to map the surface properties of music to emotional experiences vary so widely. Emotive forms of musical search feature prominently (Lesaffre et al., 2007), yet evaluation studies are seldom user-focused and instead highlight how arbitrary sets of emotional descriptors might be applied to musical objects in the systems. In natural settings, the role of emotions in musical searching makes it difficult for users to relate to rigid terms imposed by the system. Papers published under the HUMIRS project also reveal important clues regarding motivation, and in this sense address the fundamental why question of music information behaviour. Findings show that users have different reasons for music retrieval, such as verifying the works of certain artists (Lee & Downie, 2004) or building personal collections (Downie & Cunningham, 2002). Certain resources are selected for particular reasons, such as providing visual cues to musical content, or because they originate from informal, trust-based relationships between peers (Cunningham et al., 2003). Findings like these illustrate the general principle that music searching is not an isolated, individual process. It is public and shared: people display “public information-seeking” behaviours by making use of collective knowledge and/or the opinions of others about music in the form of reviews, ratings, and recommendations. Users overwhelmingly prefer friends or family members as resources for music, indicating the importance of familiarity beyond mere musical knowledge (Lee & Downie, 2004).
Signs of monitoring behaviour are also evident as users rely on information in reviews and posters to keep up to date on trends and on works released by their favourite artists (Lee et al., 2005). Music information behaviour studies also offer recommendations, idealized goals, and speculations about how systems could be improved to correspond more closely to empirical findings. For example, locating new songs in an idiosyncratic genre could be supported by search tools that allow the user to ask for “more songs like these”. Designers should develop interfaces for specifying musical query-by-example searches. This implies that systems must clearly identify the musical facets most useful for characterizing genres, such as timbre, instrumentation, and rhythm. A digital library that supports monitoring would provide “what’s new in this collection” summaries that could be subdivided by genre as well as date. As Downie and Cunningham (2002) observed, serendipitous browsing should be easier in an MIR system than in music shops. Innovations such as collage machines display series of documents, such as CD covers, accompanied by snippets of songs from each album, to aid visual browsing. Given the difficulties of developing and applying a taxonomy
of musical genres, genre searching could be better supported by automatic clustering based on similarities of sound or rhythm (e.g., using self-organizing map displays originally intended for clustering text documents). The development of truly novel and exploratory applications is likely to require developers to imagine new ways of organizing information beyond those that evolve from the current paradigm.
What these studies have not addressed is how people engage in music information behaviour in online environments without focussing on one particular system or task. The current study seeks to supplement previous findings by analysing how users engage with music online. Like previous work, the study remains exploratory so that proposed design features may be informed by users’ natural tendencies and preferences. However, unlike previous studies, it examines how online systems in particular shape behaviours and where users are challenged as they engage with these systems. This means analysing how users frame their search intentions, how they tend to describe and associate with music, what resources they tend to favour, and what drives them to look for music in the first place. It also includes addressing the issue of how users manage large personal music collections. Taken together, the findings will help provide a clearer, more comprehensive picture of the activities, objects, users, and processes that comprise music information behaviour.
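A “more songs like these” query-by-example search, as recommended in the studies reviewed above, can be approximated by representing each track as a vector of facet scores and ranking the rest of the collection by similarity to the examples. The sketch below assumes three hypothetical facets, made-up scores, and cosine similarity to the mean of the example vectors; none of these choices comes from the studies reviewed here, and real systems would extract many more features from the audio.

```python
import math

# Hypothetical facet scores, each normalized to [0, 1]:
# (brightness of timbre, percussiveness, tempo).
CATALOGUE = {
    "track_a": (0.90, 0.80, 0.70),
    "track_b": (0.80, 0.90, 0.80),
    "track_c": (0.10, 0.90, 0.10),
    "track_d": (0.85, 0.75, 0.75),
    "track_e": (0.20, 0.10, 0.90),
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def more_like_these(examples, catalogue, k=3):
    """Rank non-example tracks by cosine similarity to the examples' mean vector."""
    dims = len(next(iter(catalogue.values())))
    centroid = [sum(catalogue[t][i] for t in examples) / len(examples) for i in range(dims)]
    candidates = [t for t in catalogue if t not in examples]
    candidates.sort(key=lambda t: cosine(centroid, catalogue[t]), reverse=True)
    return candidates[:k]

print(more_like_these(["track_a", "track_b"], CATALOGUE, k=1))  # ['track_d']
```

Cosine similarity compares the shape of two facet profiles rather than their magnitude, so two tracks with the same balance of features score as similar even if one is more intense overall; Euclidean distance would be a reasonable alternative where magnitude matters.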
Methodology
This study continues in the spirit of the HUMIRS project, which has provided a useful starting point for user-oriented system design and the investigation of music information behaviour in general. The current study improves on previous methods by using content analysis of blogs combined with quantitative summaries to generate a profile of these behaviours. Unstructured, user-generated descriptions provide important information about users and needs, and illuminate the details of high-level behaviours as they apply to music. From these descriptions, it can also be determined whether problem-solving, cognitive-based behaviour models are appropriate for the music domain. The existing body of research reveals non-trivial problems in data collection methods, which inevitably affect findings. Ethnographic methods such as field observation and interviews have provided some useful clues regarding users’ underlying motivations and goals, but they are time-consuming and limited in scope, making it difficult to develop theories that generalize across a broad population (Hu, Downie, West, & Ehmann, 2005). The decision to use certain data
sources has also implicitly determined the kind of users being studied. For example, the widespread use of web-based question-answering systems and other knowledge bases as data sources means that behaviour is framed only in terms of detailed, explicitly formulated information requests. Only a limited population of internet users and music seekers use these sources regularly, since the opinions of unfamiliar experts are secondary to trust-based, informal information channels (Hu, Downie, & Ehmann, 2006). In addition, other methods have required that users be present in a public environment, such as a CD store or library, which might produce an artificial representation of behaviour distinct from that seen in more intimate environments, such as a party of friends, a quiet bedroom, or a cultural event. Diaries have been a useful data collection tool in information behaviour research, with subjects recording activities, problems, and sources related to information seeking (Ingwersen & Järvelin, 2005). Web logs, or blogs, are one type of publicly available electronic diary that may likewise help address the problems in existing methods. The anonymity and scope of blogs mean that users are likely to report, candidly, subjective accounts of preferred resources, activities, and other aspects of information behaviour that might otherwise remain hidden in data sources such as Q&A websites, reviews, or surveys. And, unlike findings from subject observation methods, blog content is not restricted in the type of user, environment, context, or culture in which music searching is conducted. For this study, we applied content analysis to public blog content. The HUMIRS dimensions suited to the context of this study guided the analysis of unstructured descriptions; in other words, the analysis sought to answer the “5 W’s” about the use of online music information systems.
The research questions we considered were as follows:
– Who is looking for and using music online? (This question seeks to identify who the users of MIR systems are and which of their characteristics are most revealing.)
– What kinds of music (or musical properties) do users seek out? (This may include specific genres or artists but, more broadly, asks which facets are most used to describe music.)
– Where do people go to locate music? (This can include websites or actual systems, as well as more overlooked resources, such as people.)
– Why do people look for music? (This important element examines the underlying motivation or goal that drives behaviour. It may also address the evaluation criteria people apply to music information.)
– How do people look for and use music? (This question addresses what people do in terms of actual online behaviours.)
90
Rasmussen Neal, Conroy
The questions were interpreted and specified flexibly so that they could be applied reasonably to varied blog content. As an analysis tool, this question set aided in producing a coding system used to describe and categorize the blog content. As the posts were gathered and analysed according to each respective question, certain behavioural categories emerged, so that instances could be tallied according to their observed frequency. As the overall purpose of this study was inductive and exploratory, these divisions were left as general and discrete as possible in order to present a basic profile of user behaviours across the various dimensions. To implement this process, a query set was established consisting of phrases likely to occur in personal or subjective accounts of music search behaviours. For each query, an RSS feed was created in order to extract the most current content published to Google Blogs. Using phrases intended to capture instances of music searching and related behaviour, the query list was constructed without restrictions as to music types, users, or domains. The sampling method chosen was based on the relevance of the search results to the query. A set of 10 queries (Appendix B) was submitted to Google Blogs on two separate occasions, and the first 10 results of each search were used. This produced a dataset of exactly 200 blog entries (10 queries × 2 occasions × 10 results) of various lengths, authors, sources, and topics. If a blog entry was deemed to be a “false hit” and irrelevant to the goals of the study, the next entry in the search results was used instead. Blog content was extracted using a tool that managed and submitted queries and stored related information (time, query syntax, URLs, and the associated blog file) in a database. The coding process was performed in a Microsoft Access database, where codes and associated explanations were applied to sanitized content.
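The sampling rule described above (the first 10 relevant results per query and occasion, with false hits replaced by the next result in the ranking) can be made concrete with a short Python sketch. It assumes the ranked results have already been fetched and that relevance judgements are supplied as a function; the function and data names are hypothetical, and the sketch does not reproduce the RSS or Google Blogs retrieval step used in the study.

```python
from typing import Callable, Dict, List


def sample_entries(
    results: Dict[str, List[List[str]]],
    is_relevant: Callable[[str], bool],
    per_search: int = 10,
) -> List[str]:
    """Return the first `per_search` relevant entries of every search.

    `results` maps each query to one ranked result list per collection
    occasion. An entry judged irrelevant (a "false hit") is skipped, so the
    next entry in the ranking takes its place.
    """
    sample: List[str] = []
    for occasions in results.values():
        for ranked in occasions:
            relevant = [entry for entry in ranked if is_relevant(entry)]
            sample.extend(relevant[:per_search])
    return sample


# Illustrative data only: 10 hypothetical queries, 2 occasions, 15 ranked hits
# each, where the top-ranked hit of every search is treated as a false hit.
demo = {
    f"query-{q}": [[f"q{q}-run{r}-hit{k}" for k in range(15)] for r in range(2)]
    for q in range(10)
}
picked = sample_entries(demo, lambda entry: not entry.endswith("-hit0"))
print(len(picked))  # 200 = 10 queries x 2 occasions x 10 entries
```

Passing relevance in as a function keeps the manual "false hit" judgement separate from the mechanical rule of taking the next result.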
Codes were constructed using a derived vocabulary of behaviours falling within each dimension of analysis. As each code was assigned, it was stored as a separate record and associated with its corresponding blog record. The results were tallied and assembled in a set of tables (Appendix A), each matching a particular behavioural dimension.
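The coding store described above can be approximated as two linked tables, one row per blog entry and one row per assigned code, with a grouped count producing each frequency table. The study used Microsoft Access; this sketch substitutes SQLite through Python's standard `sqlite3` module, and all table and column names are assumptions.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = """
CREATE TABLE blog (
    id INTEGER PRIMARY KEY,
    query TEXT NOT NULL,        -- query syntax that retrieved the entry
    url TEXT NOT NULL,
    retrieved_at TEXT NOT NULL, -- collection time
    content TEXT                -- sanitized blog text
);
CREATE TABLE code (
    id INTEGER PRIMARY KEY,
    blog_id INTEGER NOT NULL REFERENCES blog(id),
    dimension TEXT NOT NULL,    -- 'who', 'what', 'where', 'why' or 'how'
    category TEXT NOT NULL,     -- e.g. 'musician/DJ'
    note TEXT                   -- coder's explanation
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the (hypothetical) two-table coding schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def tally(conn: sqlite3.Connection, dimension: str) -> List[Tuple[str, int, float]]:
    """Count codes per category for one dimension, with percentages."""
    rows = conn.execute(
        "SELECT category, COUNT(*) FROM code WHERE dimension = ? "
        "GROUP BY category ORDER BY COUNT(*) DESC, category",
        (dimension,),
    ).fetchall()
    total = sum(n for _, n in rows)
    return [(cat, n, round(100 * n / total, 1)) for cat, n in rows]


# Made-up counts for illustration, not the study's data.
conn = open_db()
conn.execute(
    "INSERT INTO blog (query, url, retrieved_at, content) VALUES (?, ?, ?, ?)",
    ("example query", "http://example.org/post", "2010-12-01", "..."),
)
for category in ["musician/DJ", "musician/DJ", "musician/DJ", "writer"]:
    conn.execute(
        "INSERT INTO code (blog_id, dimension, category) VALUES (1, 'who', ?)",
        (category,),
    )
print(tally(conn, "who"))  # [('musician/DJ', 3, 75.0), ('writer', 1, 25.0)]
```

Storing each code as its own row, rather than as a column on the blog record, lets one entry carry several codes across dimensions.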
Findings
Who is using online music systems?
According to the blog accounts, the extent to which users reveal personal information is proportional to the extent to which it influences their online tasks (see Table 1, Appendix A). The most often-reported user characteristic was profession. Demographic information such as age, gender, location, or education level was mostly absent and did not form revealing groups. However, when users described their online activities or their preferences in organizing their own collections, it was often in reference to some job-related task. The users who most frequently identified their professions were musicians and DJs (57 %), followed by writers (18 %) and instructors (14 %). Other examples include helping professions such as therapists (6 %) and teachers (3 %). A diverse user group, and the fact that profession is reported as significant to online behaviour, have important implications for design. In some examples, performers use online sources to build assorted, eclectic collections so that, in a performance context, they can adapt the mood of the music. Musicians were also found to use music retrieval systems in ways specific to professional development, or to promote communication between musicians and fans:
Using sites such as PledgeMusic or Kickstarter, musicians are able to make their case to fans and others who are interested in helping cover the costs associated with recording an album — booking studio time, distribution, promotional efforts, and so on. Helping an artist finance their next album usually comes with perks … of course, there’s the simple personal gratification you’d feel in assisting a band get their music out to the masses. (Blogger1)
What do users search for?
This analysis looked at the various ways users tend to describe their search intentions. Other studies indicate that search needs tend to be expressed through traditional metadata categories (Lee & Downie, 2004), such as song title, artist name, year of release, or other commonly used fields. The current study also revealed this tendency: users relied on common metadata categories in 28 % of incidents where music and music searches are described (Table 2, Appendix A). However, users more often attended to the genre of music (30 %). Users refer to searching for many familiar genres, such as rock, pop, dance, jazz, African, heavy metal, country, and others. (It seems prudent, however, to keep in mind Downie and Cunningham’s (2002) observation that users search for music in ways that they know are possible in the current system; see Jason Neal’s literature review in this volume for an exploration of cross-genre MIR possibilities.) In one example, a user states the value of being able to access a variety of genres from one source while online; in this case, music material includes not only song files but also information about events and concerts.
What about a site with over 9000 bands and 45000 live shows to listen and download? It would be absolutely a dream. Imagine a place where you can find several Jazz concerts, several Funk & Soul concerts, several Reggae concerts, several concerts of any music genre your minds can possibly come up with! (Blogger2)
Not surprisingly, the emotional component of music featured prominently for users as well. The category labelled “affective descriptors” captured incidents where users framed a semantic information need in terms of emotional adjectives or terminology. These made up roughly 22 % of incidents. Although the emotional qualities of music matter to many music fans, indexing music along this dimension remains highly problematic. In particular, different users vary widely in the category names they apply. Also, a single emotional adjective such as “lonely” or “exciting” may be insufficient and inaccurate in expressing the complexity and dynamic qualities of an emotional response. One way to overcome this problem is to articulate emotional qualities through analogical reference to an outside source. In other words, making associations to external entities allows users to convey an intention without being restricted by linguistic terminology. Emotional reactions to music may arise from the coincidental memory encoding of highly emotional prior experiences (Clarke et al., 2009). Descriptions of these episodic memories and events, rather than simple emotional tags, may then be used to express affective qualities:
It’s so dreamy. When I listened to this track I was lying on the hospital bed waiting for my brain scan results to come back. At a point where I felt very low, this track made it seem like I was at home lying on my bed and nothing had happened at all. “Lady (Hear Me Tonight)” may have nothing to do with hospitals or getting the shit beaten out of you, but this track made me think of my childhood and all those innocent, harmless times. (Blogger3)
Descriptions of musical features included references to musical style, recording quality, vocal characteristics, and other sonic aspects of musical objects. Extra-musical events were also cited in descriptions, yet were not clearly associated with any particular type of activity or user. However, some anomalies are worth noting. Blogs written by musicians and DJs often describe music in terms of its applicability to a social “scene”. These bloggers appraise online sources on attributes deemed critical to the success of professional activities, such as putting on a live performance. Ideally, systems should have some way of relating the multidimensional experiences of live music, with its highly emotional and social components, to how that music is represented within the system. This may in turn translate into more precise formulations of a search intention. One user gives a vivid account of such an experience:
I love going to shows and concerts of all kinds. I love the strangers that become your friends because you are bonding over that same connection you feel with that band. I love the shows because you are dancing so hard that you don’t care what anyone thinks, because they are more than likely doing the same. I love all of it. Even fighting your way out of parking lots at the end of the night; or not being able to fall asleep because your ears are ringing too loudly, and your adrenaline is still pumping through your veins; or waking up the next morning and your body feels somewhere between the mornings after your 21st birthday and competing in a triathalon [sic]. (Blogger4)
Where do users look for music?
The analysis of user blogs also sought to identify which resources, both online and offline, users depend on to acquire music. The most often reported resources were popular websites offering music search, browsing, streaming, and download features, including Spotify, Last.fm, Grooveshark, Pandora, and Yahoo! Music. Uses of particular applications such as iTunes were classified in a separate category, since such applications support both the acquisition and the organization of local collections (Table 3, Appendix A). Particularly revealing in this category is that users also describe social search activities, in which information about music is sought in the form of reviews and recommendations. This influenced the type of source used: friends, family, and established professional contacts may all serve as sources of new music. Other types of information about music, such as musician and band biographies, music history and musicology sources, and music theory and lessons, were mentioned to a limited degree. Maintaining links between music formats and these types of extra-musical data has been an important feature of some retrieval systems, enriching the semantic quality of online music collections. Although these links were not often described in users’ search objectives, they should be considered significant aspects of an ideal music information retrieval system.
Why do users look for music?
The motivations behind information searching and the anticipated goals of users are central to understanding how users engage with information systems and how to improve that interaction. In this analysis, collecting and listening to music for personal enjoyment were the most commonly cited motivations for looking for music (41 %) (Table 4, Appendix A). The criteria users describe are sometimes vague and generic (“finding music I like”), making it difficult to distinguish between entertainment and emotional needs, since entertainment depends on the ability of music to arouse emotions. However, many users mention seeking music for the express purpose of managing their emotions, for example through music’s calming or nostalgic effects. In these cases, the intention is expressed through emotional descriptors of the sought-after music. Emotional engagement may also depend on the context in which music is experienced, as indicated above and confirmed in the music psychology literature (Clarke, Dibben & Pitts, 2009). Music information behaviour studies also hint at this through differences between behaviour observed in research involving textual information and behaviour in searches for music. In the case of music, needs are not clearly linear, nor are they driven by a cognitive appraisal of a knowledge gap. Cunningham and Masoodian (2007) found that whereas searching tends to be work, browsing tends to be fun. Owners of music collections enjoy browsing and take pride in the extent and diversity of their collections. Browsing a large collection sometimes uncovers music that we had not realized we possessed, or that we had not listened to in years. The clearest demonstration of this is that needs are often described in affective, personal, or emotional terms rather than as missing elements of factual knowledge. Moreover, the emotional rewards afforded by music may also aid the performance of physical as well as cognitive activities. In the following accounts, music is used to facilitate physical exercise and to provide intellectual inspiration for writing literary characters.
Let me describe it another way. I am a runner, and when I find music I really love, music that fills my heart up, it makes me want to throw on my running shoes and hit the pavement. This album does this for me. (Blogger5)
I’m a big fan of soundtracks too – I hate not having one for a book, and when a story is still at the stewing-in-my-head stage, I look for music that will suit, and play it over and over again to get in the mood. The Opposite of Amber had quite a retro soundtrack – it wasn’t deliberate, but Ruby’s sister Jinn turned out to have a sentimental and nostalgic streak, and was keen on really old songs from the fifties and sixties. (Blogger6)
As described previously, motivation is evidently linked with the type of user. Another blog author, a music therapist, reports how the cognitive, physical, and emotional components of music are integral to successfully aiding a client during the therapy process. The acquisition of music is mediated through a third party (in this case, within an organizational context) and must be appraised based on the ability of music to mimic everyday experiences:
After getting the basics, I start getting into the specifics of how the family, client, or other professionals use music in the home or other environments, and how the client responds to it – physically, cognitively, emotionally, and socially. I’m looking for clues as to how their body moves in general, how their thought process works when I give directions (and what kind of directions they best respond to – complex, simple, written, verbal, etc.), and how they connect emotionally and socially with me during the session. (Blogger7)
Some online activities demonstrate that the major incentive is simply to build a personal music collection. Users indicate a very passionate attachment not only to their stylistic tastes in music but also to the activity of expanding their personal music libraries. As such, many users exhibit process gratification (Allen, 1996), in which fulfilment is achieved merely by engaging in the search process itself, regardless of the outcome. One blog writer indicates that there is an insatiable desire, an “appetite”, associated with looking for music, as well as a sense of pride in the size of the physical collection:
Me, sorting through my 2010 music acquisitions and reviewing them? Holy COW, talk about getting a drink from a fire hose. I’m a huge music fan. HUGE. I’m ashamed to note how many songs I downloaded this year from various sources. (Blogger8)
How do users use systems?
Activities associated with using music systems spanned a range of online behaviours (Table 5, Appendix A). Searching and browsing activities featured most prominently (22 %). People describe streaming their favourite music through systems such as Grooveshark or publicly available internet radio sites. Others download material for local storage. These local music collections must then be managed and maintained, which introduces a separate set of functional requirements for the information system. Because the size of these personal libraries may rival that of publicly available collections, some users devise their own organization schemes, which may include adding new metadata and deciding on broad categories for organization. One writer stressed the importance of accurate metadata for items in his collection and seemed to feel obliged to categorize his collection “correctly”. Conceivably, this is because he anticipated giving others easy access to his music.
I’m not necessarily looking for a program to organize my files for me. I just need to know what things (like genre, year, album) i [sic] should add to the file information. well iTunes is good.. you can add all your songs on it and then right click the song and change the ID3 tag on it … in the iTunes library, each album will automatically be labelled into a specific (not always correct) music category. Also you can filter the iTunes library view to lots of different criteria, such as alphabetically ordered music name, genre, albums and artists. You can also automatically download album artwork for the albums where it’s missing. (Blogger9)
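The blogger's question (which fields such as genre, year, or album to add) and the iTunes features mentioned (category labels and filtered views) amount to checking metadata completeness and grouping tracks by a tag. A minimal sketch, assuming tags have already been read into plain dictionaries; it does not read real ID3 frames, and the field list, helper names, and tracks are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Fields named by the blogger; treating them as required is an assumption.
REQUIRED_FIELDS = ("genre", "year", "album")

Track = Dict[str, str]  # a track's tags as plain key/value pairs


def missing_fields(track: Track) -> List[str]:
    """List the required fields that are absent or empty for a track."""
    return [field for field in REQUIRED_FIELDS if not track.get(field)]


def group_by(tracks: List[Track], field: str) -> Dict[str, List[str]]:
    """Group track titles by one tag, collecting untagged tracks separately."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for track in tracks:
        groups[track.get(field) or "(untagged)"].append(track["title"])
    return dict(groups)


library = [
    {"title": "Track A", "genre": "Jazz", "year": "1959", "album": "Album X"},
    {"title": "Track B", "genre": "", "year": "2010"},
    {"title": "Track C", "genre": "Jazz", "year": "1961", "album": "Album Y"},
]
print(missing_fields(library[1]))  # ['genre', 'album']
print(group_by(library, "genre"))  # {'Jazz': ['Track A', 'Track C'], '(untagged)': ['Track B']}
```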
In their ethnographic study of the music information behaviour of students, Cunningham and Masoodian (2010) uncovered a passion that many users have for sharing their collections through playlists and other means. Like photos, music is a way of sharing something about ourselves. Likewise, in the blogs we queried, people relate similar stories of using music to socialize. They describe giving people CDs as gifts or copying music from their friends. They report memorable occasions when music was experienced socially: in groups, at live performances, and at important life milestones. The sharing aspect, therefore, remains a fundamental part of people’s intimate relationship to music. For the purposes of this research, we were mostly interested in how these behaviours translate to online activities. We uncovered situations where users describe their actual system use, such as using iTunes and other applications to create and exchange collections. Users also described summarizing collections in written descriptions sent to friends. The fact that users publicly describe this in a blog is itself a revealing behaviour, indicative of the social significance of sharing music. As Cunningham and Masoodian (2010) also pointed out, the music collections that people develop over time can reflect a personal story, including individual feelings, personality, or membership in a sub-culture. Highly useful user information is therefore revealed through the intrinsic properties of a collection. Users acknowledge this personal attachment through the social exchange of their musical choices, both online and in the physical world. All this indicates that, since sharing remains a central part of the musical experience, application designs must support and facilitate it.
Discussion
Design challenge: Descriptions of music are complex
Understanding users means understanding how users apply meaning to the information they seek. With the wide availability of online music, many types of online search tools, specialized databases, and applications suited to many different devices have emerged. The emphasis has always been on retrieval tools and indexing methods like those of text-based systems. Users clearly prefer to look for music, and to regard their own collections, within a metadata framework. They also employ more subjective genre tags. An ideal music information retrieval system must therefore accommodate these types of access points. With music, indeed with any type of information that involves a high degree of artistic expression, there is a problem in accessing the high-level semantic meanings of the object itself, such as the complex emotions being invoked or the rich set of personal associations people ascribe to music. Emotions are bound to whole collections, as evidenced by the way playlists are used and shared and by the entertainment users derive from the act of browsing music. The emotions of music are cited as facilitating physical as well as cognitive behaviours, and these feelings may also be linked with outside experiences that provide a context for recreating the emotion. The subjective qualities of these descriptions make it difficult to create a standard way of accessing music through its emotional aspect. With the exception of happiness, there is only nominal consensus across individuals as to how to index the emotional qualities of music (Lee & Neal, 2007; Neal et al., 2009). Byrd and Crawford (2002) raise the problem of expressing semantic meaning in system design. These meanings are often unpredictable and highly varied, making it difficult to form a manageable set of tags that may be applied to music.
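One simple way to see what "only nominal consensus" means in practice is to measure, for each track, the share of listeners who chose the most common emotion label. This sketch uses that modal-agreement proportion, which is not chance-corrected (statistics such as Fleiss' kappa are more rigorous); the labels are made up for illustration and are not data from the studies cited.

```python
from collections import Counter
from typing import Dict, List


def modal_agreement(labels: List[str]) -> float:
    """Share of annotators who chose the most frequent label (1.0 = full consensus)."""
    if not labels:
        return 0.0
    # Light normalization only: case and surrounding whitespace are ignored,
    # but synonyms such as "happy" and "joyful" still count as different labels.
    counts = Counter(label.strip().lower() for label in labels)
    return counts.most_common(1)[0][1] / len(labels)


def agreement_by_track(annotations: Dict[str, List[str]]) -> Dict[str, float]:
    """Apply modal_agreement to every track's list of emotion labels."""
    return {track: modal_agreement(labels) for track, labels in annotations.items()}


# Made-up labels from four hypothetical listeners per track.
example = {
    "track-1": ["happy", "Happy", "joyful", "happy"],
    "track-2": ["lonely", "sad", "melancholy", "nostalgic"],
}
print(agreement_by_track(example))  # {'track-1': 0.75, 'track-2': 0.25}
```

The low score for the second track shows how free-text emotion terms can fragment even when listeners may broadly agree on the feeling.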
Users may recognize “good” quality when they hear it, but they may not be able to formulate a query string to match affective needs, nor apply affective terms to a piece of music consistently with other users. The ability of information systems to capture, describe, and provide access to these features remains a predominant problem in the development of these technologies. Alternatives to standard metadata are therefore necessary to help narrow the semantic gap (Ruger, 2010) and to capture the emotional nature of music by other means. Automatic extraction of sonic features may be used to translate musical events into symbolic descriptions, for example, equating fast tempos with excitement. Audio fingerprinting and other content-based methods use the digital profile of musical facets (for example, rhythm, melody, harmony, timbre, and tempo) to infer something about the character of music. Other solutions may formally describe high-level structure using markup languages such as MusicXML (Good, 2001), which may be used to approximate the emotional character of music. User-created annotations and implicit user metadata (Corthaut, Govaerts, Verbert, & Duval, 2008) are other proposed methods whereby semantic meanings can be assigned to multimedia objects (Ruger, 2010). Alternative approaches may incorporate novel ways to assess user reactions in real time and to segment these findings by user type. One example is the TagATune system (Crawford, Law, von Ahn, & Dannenberg, 2007), in which users collaborate to assign appropriate semantic tags. Because of the limitations of language in creating appropriately descriptive semantic tags, tools can also convey the qualities of music through visual primitives such as colour and shape. Clustering can be used in conjunction with such features to form meaningful groups of music with similar characteristics. Imagery and artistic representations can also be used to index music in useful ways. Artwork and its association with music and brands of musicianship express culturally meaningful connections that can be used to build interest and musical identity. Bainbridge, Cunningham, and Downie (2004) also found that by offering intriguing visual representations in the form of CD covers and album libraries, and engaging textual information in the form of accompanying notes, users can experience and probe information in multiple formats. This type of presentation helps initiate deeper investigation, an important component of music searching behaviour (Cunningham et al., 2003). The stories behind real-world music experiences and the use of visual media such as photographs provide memory triggers that help describe music.
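Two of the techniques mentioned above, mapping a sonic feature to a symbolic descriptor and clustering tracks with similar features, can be sketched in a few lines of Python. The 120 BPM threshold, the two-dimensional feature vectors, and their values are assumptions for illustration. The clustering is plain k-means (Lloyd's algorithm), not the method of any system cited here; a production system would more likely use a library implementation such as scikit-learn's `KMeans` on normalized features.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def tempo_descriptor(bpm: float, threshold: float = 120.0) -> str:
    """Toy symbolic mapping: fast tempo -> 'exciting' (threshold is arbitrary)."""
    return "exciting" if bpm >= threshold else "calm"


def kmeans(points: Sequence[Vector], k: int, iterations: int = 20, seed: int = 0) -> List[int]:
    """Plain k-means (Lloyd's algorithm); returns a cluster label per point."""
    rng = random.Random(seed)
    # Initialize centroids with k distinct input points.
    centroids = [tuple(p) for p in rng.sample(list(points), k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(values) / len(members) for values in zip(*members))
    return labels


# Hypothetical tracks described by (tempo / 200, energy), both roughly in [0, 1].
tracks = [(0.30, 0.20), (0.32, 0.25), (0.90, 0.90), (0.88, 0.95)]
print(kmeans(tracks, k=2))
print(tempo_descriptor(140), tempo_descriptor(80))  # exciting calm
```

Scaling tempo into the same range as energy matters here, because k-means uses raw Euclidean distance and would otherwise be dominated by the feature with the largest numbers.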
It would be interesting to investigate whether the reverse is true; in other words, to what extent image searching behaviour is associated with music preference, or in what ways searching for images with musical subjects may resemble searching for music itself. Photographic representations of artists or musical imagery may elicit the same affective descriptions as the music itself and could provide the detail needed to index the affective aspect.
Design challenge: Musical search is social
Musical expression, in whatever form, may be regarded as a human adaptation based on its social utility (Huron, 2006). The act of making and listening to music in groups, and the use of music in cultural, religious, and political engagements throughout the centuries, point toward the ability of music to facilitate many forms of social activity. People tend to use informal channels as information sources and share content as a way of strengthening existing relationships.
It is not surprising, then, to see this phenomenon translated to virtual environments and manifested in online behaviour. People’s accounts of their use of music indicated a propensity to exchange music, as well as to express to others what they value in music, their musical tastes, favourite concerts they’ve seen, and what bands they have in their collections. As indicated previously, users have a strong preference for sharing music online in the form of playlists. These playlists may be diverse in content, yet they are often constructed and shared in such a way as to communicate social intention and to strengthen existing friendships. The findings from this study are similar. Systems that support playlists should not only provide easy ways to construct and index them but may also link playlists to content as a way to aid retrieval and increase the descriptive quality of the media. Lee and Downie (2004), using web-based surveys to examine music information behaviour, found that users consult a range of sources when seeking music information, including friends, family, other musicians, and collective knowledge from online communities. Users may seek out music experts or other informants as a way to acquire music based on personal recommendations. Pre-existing familiarity with reviewers, site reliability, and conformance to existing modes of taste all contribute to the degree to which information is used. The blog accounts of music searching reflected these activities to varying degrees; there is indeed an important social component to the formation of musical tastes. Sometimes this communication benefits people’s working lives directly. For example, musicians and DJs reported the value of communicating with colleagues as a way to access more specialized resources. Information is transferred between users not simply as a passive activity, but to help foster interest in musical styles and genres.
In some instances, this influenced the choice of websites used, what users searched for, and, most significantly, how their personal lives benefited from these activities. Recommendations can be added through annotations as a way to enhance the information value of specific items. Semantic Web vocabularies also provide a way of supporting the creation of rich associations between people and web resources. The FOAF (Friend of a Friend) vocabulary is a common standard for describing people, groups, or organizations through concepts that are assembled into identity profiles (Kruk & McDaniel, 2009). As more semantic metadata becomes available on the web, further relationships may be drawn based on detailed user characteristics (such as demographic data, geographic data, and online behaviour), which can in turn aid in promoting music to otherwise disparate groups. Two problems with existing design assumptions should be addressed here. First, in the case of music, fast, efficient, and accurate retrieval remains important, but it is only part of the puzzle. The user experience has become an especially important facet of system design. Of course, the objective of many other technologies (especially social networking and online gaming) has been to replicate real-world experiences as much as possible by integrating different social elements and modes of communication. This has great but as yet unrealized potential for the construction of an ideal music retrieval system. However, evaluating system performance through objective measures, as is currently done, remains an inadequate way to gauge the “effectiveness” of social technologies. A second problem is that the evolution of information system design has also been specific to content format. These systems are distinguished based on whether the information is text, image, video, or music. Many devices now support behaviours involving multiple modalities, such as playing music, taking photographs, and composing text. This makes divisions in how formats are stored seem especially artificial and outmoded. The capabilities of current systems highlight the value of investigating which information behaviours are common regardless of whether users are dealing with videos, songs, texts, or images. In one attempt, Bentley, Metcalf and Harboe (2006) investigated consumer use of photos and music in creating collaborative experiences. Sharing media is used as a way to reminisce or to communicate an experience. They identified a very significant association between the common functions of media types that share narrative attributes. The current study used qualitative analysis of users’ online behaviour to demonstrate that supporting social functionality means users may exchange and annotate various media types by their common narrative features. Integrating music with other media, such as personal photos and written communications, increases the effectiveness of information systems, since it exploits this narrative quality.
It is up to a digital library implementation to group these items and relieve users of this burden when they wish to seek different cultural expressions regardless of the particular form or mode of their representation. For example, sound recordings of various performances, multiple editions of a published score, and other information related to each item can be connected through content links. In addition, access to reviews and recommendations supports the natural social search component of music information behaviour and provides a way for users to expand their existing scope of interest.
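Grouping items through content links can be sketched by giving each catalogue item an identifier for the work it relates to (loosely in the spirit of the FRBR "work" entity) and indexing items by that identifier, so that a recording, a score edition, and a review of the same work surface together. The `Item` fields, identifiers, and labels below are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Item:
    item_id: str
    work_id: str  # hypothetical identifier of the underlying musical work
    kind: str     # e.g. "recording", "score", "review"
    label: str


def link_by_work(items: List[Item]) -> Dict[str, Dict[str, List[str]]]:
    """Group catalogue items by the work they relate to, then by item kind."""
    links: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for item in items:
        links[item.work_id][item.kind].append(item.label)
    # Convert the nested defaultdicts to plain dicts for display.
    return {work: dict(kinds) for work, kinds in links.items()}


catalogue = [
    Item("i1", "work-A", "recording", "Recording 1"),
    Item("i2", "work-A", "recording", "Recording 2"),
    Item("i3", "work-A", "score", "Score, edition 1"),
    Item("i4", "work-A", "review", "Review 1"),
    Item("i5", "work-B", "score", "Score, edition 2"),
]
links = link_by_work(catalogue)
print(links["work-A"]["recording"])  # ['Recording 1', 'Recording 2']
```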
Design challenge: Managing personal music collections
The data-centred framework evident in many current systems is not well aligned with observed seeking behaviours. Looking for music materials is often not directed by a perceived gap in knowledge, nor is it easily formalized into query terms. The challenge posed by process gratification, as evidenced in the blog findings, is that the motivations for seeking music are not cognitive. Indeed, the mere acts of looking and listening are what users find enjoyable. The data-centred model, which is better suited to targeted search behaviour, is being inappropriately applied to a situation that is not typically characterized by formalized goals. This difference between the actual and the presumed use of systems has been termed the Use Context Gap (Deserno, Antani, & Long, 2008). It describes the discrepancy between the needs of users working within a particular context and the choice of features, granularity of representation, and modes of presentation deemed appropriate to the situation. These findings appear to contradict previous studies showing that, when using multimedia systems, users valued the quality of information more highly than its quantity (Allen, 1996). Rather than focusing on targeted search capabilities, designs must support exploratory activities such as browsing while presenting music materials at various levels of abstraction. Since users often do not have a specific search goal in mind, it is the work of the system to indicate areas of potential discovery. For example, users might be interested in finding songs from other genres that match in key, rhythm, or tonality. Visual presentation of sonic features, and matching algorithms based on these features, can create associations that are not otherwise apparent in metadata tags such as artist names, publication years, genre descriptions, or song titles. In his discussion of information behaviour models for the use of multimedia, Case (2007) describes Play Theory, noting its applicability to the use of entertainment media. At the heart of Play Theory is not only the idea that humans tend to seek pleasure and avoid pain, but also that they tend to mix work with play.
He notes that there is no “need”, no anomalous state of knowledge, and no knowledge gap event (Case, 2007). Music information behaviour reflects this as well: the motivation for hunting for music online appears to be the gathering process itself. The central ideas behind Play Theory are supported by these findings and contrast with more standard information behaviour models, which often assume the presence of an identifiable information need. Investigating similar notions and models of information behaviour may offer insight into how system designs can emphasize the entertainment and experiential components of music seeking rather than simple efficiency. As the content of many blogs reveals, music is not acquired for immediate consumption. Users often report that the objective of consulting several online sources is to build vast personal collections of varied content. As the cost of digital storage drops and transfer speeds increase, collections can grow into the terabyte range. Users are no longer compelled to make tough decisions about what to keep and what to discard, and aside from the difficulty
Rasmussen Neal, Conroy
in managing collections once they are created, there is little personal cost. The ease of acquiring music means that these collections can grow large enough to rival other publicly available collections. As a result, users benefit from tools that help them organize and label collections for personal retrieval once it becomes too difficult simply to remember what the collection contains. Moreover, when collections are shared, other users also require accurate and concise metadata to retrieve known items. Personal information management (PIM) has been well studied in the information science literature, with various tools designed to help individuals maintain collections and to assist in ordering information through categorization, placement, or embellishment in a manner that makes it easier to retrieve (Jones, 2007). Despite what has been proposed in this PIM work, users still resort to their own ad hoc methods of organization. Some choose no particular method and opt to browse items by file name or by whatever metadata accompanies the file. Popular devices such as the iPod and software such as iTunes are often cited as standard PIM choices. I, too, would go through periods when i [sic] would try to figure out new ways of organizing all of those LPs. but [sic] now, everything is in my iTunes library, and all that matters for organizing my music collection is how i [sic] tag my files and create playlists … the iPod has certainly changed the world of music and the way we experience music. who would’ve thought, 30 years ago, that i [sic] could take a trip with hundreds of live Grateful Dead concerts, every single Bob Dylan album, all of Franz Schubert’s lieder, all of Haydn’s 104 symphonies—all on a single device. (Blogger10)
Certain categories of users are especially interested in the ability to construct large, diverse personal music libraries. As indicated above, music-related professions such as DJing featured prominently in many blog accounts, in which professionals describe the barriers they face, and the techniques they use, in accommodating a wide range of musical styles for performance and retrieving items to suit their audience. As technology has improved and digital music has become more widely available, finding and collecting music has become less of an issue than actually organizing these collections. We have so much music as [sic] our disposal it’s really important for us to start organizing our music in a way that allows us to access our music with ease and convenience. After all, especially if you’re performing, the whereabouts of your music is just as important as your ability to perform in front of a live crowd. (Blogger11)
One method cited for organizing music files is the use of hierarchical folder structures. For users with particularly large collections, meaningful structure cannot be
gleaned from the metadata that comes with a particular file, so hierarchies in the form of folder structures provide placeholders for retrieval. This leads to location-based searching when items are needed later: individuals guess the directory or folder where they think the file might be located, go to that location, and browse the list of files there until they find the file. In their study of personal information management, Nardi and Barreau (1995) reported that users preferred browsing a list of files to trying to remember exact file names. The authors suggested that location-based filing engages the mind more actively and bestows a greater sense of control, and noted that individuals preferred filing by location because it served a crucial reminding function. To accommodate location-based searching, it is helpful to use display formats that support browsing and scanning. Folder structures can be represented with visualization tools such as 2-D and 3-D tree maps (Shneiderman, 1990), which translate content properties into visual form. In the case of music, however, fine-grained information about items in the collection may be more important than their physical location within the system, because file information such as song title, date, or media format conveys less meaning than the actual musical properties do. To this end, musical features such as beat, meter, timbre, or loudness can be translated into visualizations that reveal relationships between items and allow richer interaction with the collection. Notable examples include visual clustering and “Islands of Music” systems (Pampalk, 2001), automatic summarization (Cooper & Foote, 2002), and 3-D visualizations based on feature extraction (Leitich & Topf, 2007). Whether these methods are favourable in practical applications is not yet clear. 
However, these systems introduce design concepts that seem strongly aligned with user needs for many tasks associated with managing large file collections. On the other hand, large-scale visualizations of these types may work well for collection-level representation but perform poorly for known-item searching. The key to a successful implementation is to allow flexibility in which features are matched, since different users have specific usability preferences, search goals, and types of music they want stored in the collection. Beyond location-based searching and collection-level browsing, users also want to assign their own descriptive labels to their music. One notable type of description is genre. For various reasons, users look for music by genre and hold individual notions of which genres to assign to their own music. I thought I made up Americana as a genre. Turns out I didn’t, it’s sort of a sub-genre in the Alt Country / Folk genres. I characterize it as sort of unsophisticated country oriented music, somewhere between bluegrass, country and Bruce Springsteen. (Blogger8)
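The flexible feature matching described above, and the earlier example of finding songs from other genres that match in rhythm or tonality, can be sketched as a nearest-neighbour search restricted to whichever features the user selects. This is a minimal illustration under stated assumptions, not the method of any system cited in this chapter: the `Track` type, the feature names (tempo, loudness, brightness), their normalisation to [0, 1], and the use of Euclidean distance are all hypothetical choices made for the example.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class Track:
    title: str
    genre: str
    # Hypothetical pre-extracted audio descriptors, each normalised to [0, 1].
    features: Dict[str, float] = field(default_factory=dict)


def feature_distance(a: Track, b: Track, selected: Sequence[str]) -> float:
    """Euclidean distance computed over only the features the user chose to match."""
    return math.sqrt(sum((a.features[f] - b.features[f]) ** 2 for f in selected))


def cross_genre_matches(seed: Track, library: List[Track],
                        selected: Sequence[str], k: int = 3) -> List[Track]:
    """Return up to k tracks closest to `seed` that belong to a different genre."""
    candidates = [t for t in library if t.genre != seed.genre]
    return sorted(candidates, key=lambda t: feature_distance(seed, t, selected))[:k]


# Illustrative data only.
library = [
    Track("Seed song", "folk", {"tempo": 0.40, "loudness": 0.30, "brightness": 0.70}),
    Track("Same-genre twin", "folk", {"tempo": 0.40, "loudness": 0.30, "brightness": 0.70}),
    Track("Club track", "electronic", {"tempo": 0.42, "loudness": 0.90, "brightness": 0.20}),
    Track("Ballad", "jazz", {"tempo": 0.80, "loudness": 0.35, "brightness": 0.70}),
]
seed = library[0]

# Matching on tempo alone surfaces the electronic track ...
print([t.title for t in cross_genre_matches(seed, library, ["tempo"], k=1)])
# ['Club track']

# ... while matching on loudness and brightness surfaces the jazz ballad.
print([t.title for t in cross_genre_matches(seed, library, ["loudness", "brightness"], k=1)])
# ['Ballad']
```

Changing `selected` changes which neighbours surface, which is one simple way to let different users match on different features; the same-genre "twin" is never returned because the search deliberately looks outside the seed's genre.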
Another user describes a personal labelling technique that is based on genre and relies on when the music was acquired rather than on a set of prescribed properties: I’m going to do it by genre, and then by rank. Most people look for music by genre first.. at least I do. The criteria again, just like the books, music that I acquired this year, not necessarily music that just came out in 2010. The other hard part.. in the iTunes era.. albums just don’t matter like they used to. (Blogger8)
Here, the category “rank” may be ambiguous to others, yet it provides an important distinguishing property for the user. Although the features that qualify music for membership in a particular genre can be highly user-specific, genre is a commonly used access point for music retrieval (Cunningham et al., 2003). There are several drawbacks to using personalized genre descriptions for labelling and categorization, however. First, genre is not easily understood or defined by either music experts or enthusiasts; no single characteristic disambiguates a genre. Second, any one genre category may include a number of subgenres, for example hardcore punk as a subgenre of rock and roll. Finally, genres have strong historical and cultural connections that any definition of genre must account for (Abrahamsen, 2003). These problems of assigning genre tags are compounded when collections are shared between users: users apply their own definitions of genre as descriptors, yet others who do not share similar experiences may misinterpret these categories. To resolve the problem of labelling, some users devise other intuitive themes and logical groupings of music. These include the use of artist or song exemplars, which provide more translatable musical descriptions and help to characterize a particular musical style or era. In line with research on cross-cultural use of MIR systems (Lee et al., 2005), exemplars were also found to be a useful tool for describing categories that defy linguistic description, as well as a way to make novel, cross-genre associations. Describing subsets of a collection in terms of audio features, emotions, metadata, or other features apart from genre can be highly contextual and dependent on the user's background. Again, these labels may be useful for individual users but problematic when others need to understand them. 
As collection sizes grow, remembering where music is located or how it was categorized becomes a fundamental problem of personal information management. How music in the collection is used, whether for its emotional quality or for its ability to convey social or cultural meanings, affects how it is organized. In recent years, researchers and designers have begun to explore the potential of
using various pieces of contextual information to aid retrieval. Standard forms of context data such as time, date, and number of accesses have proved beneficial because they organize information in a way that conforms to its conventional use. Since musical experiences are often linked to outside events, contextual metadata serves as a memory aid for retrieval. Elsweiler and Ruthven (2007) put forward the hypothesis that memory lapses make it difficult for people to find items in their personal archives. Contextual data can address this by harnessing crucial components of memory for recollecting items we wish to re-retrieve. Context-based search has been identified as a personal information management strategy that helps people locate items by creating associations or links between items and events. Fuller, Kelly, and Jones (2008) explored this idea by examining the usefulness of relating items through contextual descriptions of chronological proximity. The LifeStreams system offers an alternative to conventional hierarchical file organization by using a time-ordered stream as its storage model and stream filters to organize, locate, summarize, and monitor incoming information (Jones, 2007). The chronological facet of music can include year of production, but also more relative criteria such as the position of songs on an album or the order in which songs were downloaded. Such information provides memory cues that may facilitate retrieval. For example, a user might describe a song as “the song I heard last Friday at my friend’s school orchestral performance”. Here, the context comprises several dimensions in addition to time, including a person (my friend) and an event (a school orchestral performance). Relating music files to other data such as email exchanges, album reviews, album artwork, photos, or video can likewise serve as a contextual memory aid for managing large collections.
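The LifeStreams idea of a single time-ordered stream with filters, combined with the chronological-proximity cues examined by Fuller, Kelly, and Jones (2008), can be illustrated with a small sketch. It is not the LifeStreams implementation: the `Entry` record, its `kind` labels, the timestamps, and the one-day window are hypothetical, chosen only to mirror the “song I heard last Friday at my friend’s school orchestral performance” example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Entry:
    """One timestamped item in a personal archive (hypothetical record type)."""
    when: datetime
    kind: str                      # illustrative labels: "song", "event", ...
    title: str
    people: Tuple[str, ...] = ()   # person cues attached to the item


def as_stream(entries: List[Entry]) -> List[Entry]:
    """Store everything in a single time-ordered stream, oldest first."""
    return sorted(entries, key=lambda e: e.when)


def stream_filter(entries: List[Entry], keep: Callable[[Entry], bool]) -> List[Entry]:
    """A filtered view of the stream that stays in time order."""
    return [e for e in as_stream(entries) if keep(e)]


def near_in_time(entries: List[Entry], anchor: Entry, window: timedelta,
                 kind: str = "song") -> List[Entry]:
    """Entries of the given kind whose timestamps lie within `window` of the anchor."""
    return stream_filter(
        entries, lambda e: e.kind == kind and abs(e.when - anchor.when) <= window)


# Illustrative archive only.
archive = [
    Entry(datetime(2010, 5, 7, 19, 0), "event", "School orchestral performance",
          ("my friend",)),
    Entry(datetime(2010, 5, 7, 21, 30), "song", "Song heard at the concert"),
    Entry(datetime(2010, 3, 2, 10, 0), "song", "Unrelated download"),
]

# Recall the event through a person cue, then use the event as a time anchor.
concert = stream_filter(archive, lambda e: e.kind == "event" and "my friend" in e.people)[0]
print([e.title for e in near_in_time(archive, concert, timedelta(days=1))])
# ['Song heard at the concert']
```

The person cue finds the event, and the event's timestamp then narrows the stream to songs from the same evening; widening `window` trades precision for recall.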
Conclusion
Blog accounts of music information behaviour provide a rich and multifaceted data source and reveal important insights to help fuel user-centred system design. The current study was limited in its ability to construct comprehensive profiles of user characteristics, since the methods employed relied on users taking the initiative to describe their own characteristics. However, the limited findings reported in this chapter suggest several mismatches between the standard system-centred approach to design and the needs of average MIR users. These needs are complex and multifaceted. The complexity lies primarily in the way users describe music in general and their search intentions. This means that
systems should provide flexible methods to better integrate the affective characteristics of music. The tendency of users to share and exchange music as part of their social lives requires system designs to integrate social networking and other communication technologies, so that users' natural behaviour is better integrated with music systems. Finally, online music searching behaviour resembles open-ended, leisurely browsing rather than targeted, known-item searching. Many users, including musicians and other music professionals, seek music specifically to build large personal music libraries. To facilitate these tasks, information systems should make it easier for users to manage and describe these collections, assisting retrieval both for collection owners and for others who may benefit from access to the collections. Future work in this area may draw on accounts of users’ social activities and individual preferences regarding the use of music systems. In particular, further exploration should be made into how online activities correlate with user types. Such work could aim to uncover the dependency networks that emerge online between systems, users, and the music information they exchange.
References
Abrahamsen, K. (2003). Indexing of musical genres. An epistemological perspective. Knowledge Organization, 30(3 – 4), 144 – 169. Allen, B. (1996). Information tasks: Toward a user-centered approach to information systems. San Diego, CA: Academic Press. Bainbridge, D., Cunningham, S. J., & Downie, J. S. (2004). Visual collaging of music in a digital library. Proceedings of the 5th International Symposium on Music Information Retrieval, 397 – 402. Bentley, F., Metcalf, C., & Harboe, G. (2006). Personal vs. commercial content: The similarities between consumer use of photos and music. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 667 – 676). Byrd, D., & Crawford, T. (2002). Problems of music information retrieval in the real world. Information Processing & Management, 38, 249 – 272. Case, D. (2007). Looking for information: A survey of research on information seeking, needs, and behavior. London, UK: Elsevier. Clarke, E., Dibben, N., & Pitts, S. (2009). Music and mind in everyday life. London: Oxford University Press. Cooper, M., & Foote, J. (2002). Automatic music summarization via similarity analysis. Proceedings of the International Symposium on Music Information Retrieval (pp. 81 – 85). Paris, France.
Corthaut, N., Govaerts, S., Verbert, K., & Duval, E. (2008). Connecting the dots: Music metadata generation, schemas and applications. Proceedings of the Ninth International Conference on Music Information Retrieval (pp. 249 – 254). Crawford, M., Law, E. L. M., von Ahn, L., & Dannenberg, R. B. (2007). TagATune: A game for music and sound annotation. In S. Dixon, D. Bainbridge, & R. Typke (Eds.), Proceedings of the International Symposium of Music Information Retrieval (pp. 361 – 364). Cunningham, S. J., Reeves, N., & Britland, M. (2003). An ethnographic study of music information seeking: Implications for the design of a music digital library. Proceedings of the 3rd ACM / IEEE-CS Joint Conference on Digital Libraries (pp. 5 – 16). Cunningham, S. J., & Masoodian, M. (2007). Management and usage of large personal music and photo collections. Proceedings of the 2007 IADIS International Conference on WWW / Internet (pp. 163 – 168). Deserno, T., Antani, S., & Long, R. (2008). Ontology of gaps in content-based image retrieval. Journal of Digital Imaging, 22(2), 202 – 215. Downie, J. S. (2008). The music information retrieval evaluation exchange (2005 – 2007): A window into music information retrieval research. Acoustical Science and Technology, 29(4), 247 – 255. Downie, J. S., & Cunningham, S. J. (2002). Toward a theory of music information retrieval queries: System design implications. Proceedings of the 3rd International Conference on Music Information Retrieval (pp. 67 – 92). Elsweiler, D., & Ruthven, I. (2007). Towards task-based personal information management. Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 22 – 30). Fuller, M., Kelly, L., & Jones, G. J. F. (2008). Applying contextual memory cues for retrieval from personal information archives. PIM 2008: Proceedings of Personal Information Management, Workshop at CHI 2008. Good, M. (2001). 
MusicXML: An internet-friendly format for sheet music. XML 2001 Conference Proceedings. Hu, X., Downie, J. S., & Ehmann, A. F. (2006). Exploiting recommended usage metadata: Exploratory analyses. Proceedings of the 7th International Conference on Music Information Retrieval (pp. 67 – 72). Hu, X., Downie, J. S., West, K., & Ehmann, A. F. (2005). Mining music reviews: Promising preliminary results. Proceedings of the 6th International Conference on Music Information Retrieval (pp. 536 – 539). Huron, D. (2006). Sweet anticipation: Music and the psychology of expectation. Cambridge, MA: The MIT Press. Ingwersen, P., & Järvelin, K. (2005). The turn: Integration of information seeking and retrieval in context. Secaucus, NJ: Springer-Verlag. Jones, W. (2007). Personal information management. In B. Cronin (Ed.), Annual Review of Information Science and Technology 41 (pp. 453 – 504). Medford, NJ: Information Today. Kruk, S., & McDaniel, B. (Eds.) (2009). Semantic digital libraries. Heidelberg, Germany: Springer. Laplante, A. (2008). Everyday life music information-seeking behaviour of young adults: An exploratory study. (Unpublished doctoral dissertation). McGill University, Montreal, QC. Laplante, A., & Downie, J. S. (2011). The utilitarian and hedonic outcomes of music information-seeking in everyday life. Library & Information Science Research, 33(3), 202 – 210.
Lee, J. H., & Downie, J. S. (2004). Survey of music information needs, uses, and seeking behaviors: Preliminary findings. Proceedings of the 5th International Conference on Music Information Retrieval (pp. 441 – 446). Lee, J. H., Downie, J. S., & Cunningham, S. J. (2005). Challenges in cross cultural / multi-lingual music information seeking. Proceedings of the 6th International Conference on Music Information Retrieval (pp. 153 – 160). Lee, H. J., & Neal, D. (2007). Toward Web 2.0 music information retrieval: utilizing emotion-based, user-assigned descriptors. Proceedings of the 70th Annual Meeting of the American Society for Information Science and Technology (pp. 732 – 741). Leitich, S., & Topf, M. (2007). Globe of music: Music library visualization using GEOSOM. Proceedings of the 8th International Conference on Music Information Retrieval. Vienna, Austria: OCG. Leman, M., Styns, F., & Noorden, L. (2006). Some basic observations on how people move on music and how they relate to movement. In A. Gritten & E. King (Eds.), Proceedings of the Second International Conference on Music and Gesture. Manchester: Northern College of Music. Lesaffre, M., Baets, B., Meyer, H., & Martens, J. (2007). How potential users of music search and retrieval systems describe the semantic quality of music. Journal of the American Society for Information Science and Technology, 59(5), 695 – 707. Nardi, B., & Barreau, D. (1995). Finding and reminding: File organization from the desktop. ACM SIGCHI Bulletin, 27(3), 39 – 43. Neal, D., Campbell, A., Neal, J., Little, C., Stroud-Matthews, A., Hill, S., & Bouknight-Lyons, C. (2009). Musical facets, tags, and emotion: Can we agree? Proceedings of the 2009 iConference. Retrieved from http://publish.uwo.ca/~dneal2/musictagging_neal.pdf Pampalk, E. (2001). Islands of music: Analysis, organization, and visualization of music archives. (Unpublished master’s thesis).
Vienna University of Technology, Department of Software Technology and Interactive Systems, Vienna, Austria. Rüger, S. (2010). Multimedia information retrieval. San Rafael, CA: Morgan & Claypool Publishers. Shneiderman, B. (1990). Tree visualization with tree-maps: 2-d space-filling approach. ACM Transactions on Graphics, 11(1), 92 – 99.
Appendix A: Blog coding results
Table 1. User types.
User Type                       Percentage
Musician / DJ                   57 %
Writer                          18 %
Instructor / Musicologist       14 %
Therapist                        6 %
Teacher                          3 %
Author                           2 %
Table 2. Musical features.
Description Type                Percentage
Genre                           30 %
Descriptive Metadata            28 %
Affective Descriptors           22 %
Musical Features                14 %
Events                           6 %
Table 3. Resources.
Resource / Location             Percentage
Online Search / Streaming Sites 35 %
iTunes and Music apps           19 %
Music Review Sites              18 %
Friends                         16 %
Other Professionals / Musicians 12 %
Table 4. Motivation.
Motivation                      Percentage
Entertainment                   41 %
Emotional Management            28 %
Social / Relationship building  14 %
Professional Work                9 %
Physical Activity                8 %
Table 5. Behaviours.
Activity                        Percentage
Online browsing & querying      22 %
Downloading                     20 %
Listening / Streaming           17 %
Sharing                         13 %
Organizing / Categorizing       11 %
Buying                           7 %
Appendix B: Queries sent to Google Blogs for data collection
ID     Actual Query
001    “searching | search for music”
002    “looking | look for music”
003    “favourite band | music”
004    “experience | experiencing music”
005    “recommend | recommending music”
006    “my music”
007    “organize | organizing music”
008    “music search”
009    “collect | collecting music”
010    “share | sharing music”
Query Syntax (as printed):
%2522searching%2B|%2Bsearch%2Bfor%2Bmusic%2522
%2522looking%2B|%2Blook%2Bfor%2Bmusic%2522
%2522favourite%2Bbands%2B|%2Bmusic%2B|%2Bconcerts%2522
%2522locate%2B|%2Blocating%2Bmusic%2522
%2522recommend%2B|%2Brecommending%2Bmusic%2522
%2522download%2B|%2Bdownloading%2Bmusic%2522
%2522organize%2B|%2Borganizing%2Bmusic%2522
%2522acquire%2B|%2Bacquiring%2Bmusic%2522
%2522organize%2B|%2Borganizing%2Bmusic%2522
%2522share%2B|%2Bsharing%2Bmusic%2522
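The strings in the Query Syntax column appear to be URL-encoded twice: %25 is the encoding of the percent sign itself, so %2522 first decodes to %22 (a double quote), and %2B first decodes to +, which query strings use for a space. Assuming that reading, the sketch below recovers the readable form of a logged query; `decode_query` is a hypothetical helper built only on the standard `urllib.parse.unquote` and `unquote_plus` functions.

```python
from urllib.parse import unquote, unquote_plus


def decode_query(syntax: str) -> str:
    """Undo two layers of URL encoding in a logged query string."""
    once = unquote(syntax)       # outer layer: "%2522" -> "%22", "%2B" -> "+"
    return unquote_plus(once)    # inner layer: "+" -> " ", "%22" -> '"'


print(decode_query("%2522searching%2B|%2Bsearch%2Bfor%2Bmusic%2522"))
# "searching | search for music"
```

Decoded this way, the first syntax string matches the first actual query, which supports the double-encoding reading.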
Margaret Lam, Matt Ratto
Chapter 5. Seeking what we have yet to know: A user-centred approach to designing music knowledge platforms
Abstract: From the perspective of an individual aspiring to learn about music, self-directed learning can be a daunting and potentially difficult task. Buying the right instrument, finding instructional materials, figuring out the vocabulary, and finding social support through the learning process are just some of the challenges one has to face. Learning music independently, without access to the right resources, can be a very discouraging experience. This chapter explores the idea of developing domain-specific user profiles as an effective design approach for non-textual information systems. Such an approach is not meant to replace existing practices and methods, but rather to serve as an alternative and complementary one that encourages a more holistic view of system design. More specifically, the idea of a knowledge trajectory is introduced as a way of conceptualizing users’ preferred modes of learning and information needs, which are not always well articulated. While the concept is still at an early stage of development, its fundamental framework is discussed.
Keywords: Music information retrieval, information practices, user-centred design, personas, information systems, system design
Margaret Lam (corresponding author), Faculty of Information, The University of Toronto; Matt Ratto, Assistant Professor, Faculty of Information, The University of Toronto
Introduction
If we wish a different world, it is necessary to design humane and liberating technologies that create the world as we wish it to be. (Nardi, 1996, p. 44)
From the perspective of an individual aspiring to learn about music, self-directed learning can be a daunting and potentially difficult task. Buying the right instrument, locating instructional materials, grasping the vocabulary, and identifying social support through the learning process: these are just some of the challenges one might have to face. Learning music independently, without access to the right resources, can be a very discouraging experience. Drawing on the findings from an exploratory study¹ of the way five non-musicians² seek music knowledge on their own, this paper explores the idea of developing domain-specific user profiles as an effective design approach for non-textual information systems. Such an approach is not meant to replace existing practices and methods, but rather to serve as an alternative and complementary one that encourages a more holistic view of system design. More specifically, we focus on how our understanding and conceptualization of users’ preferred modes of learning and information seeking may affect the design of information systems and knowledge platforms that support self-directed learning within the domain of music. While the ideas are still in their early stages of development, the fundamental framework is discussed in this chapter. The five cases used in this study were developed from semi-structured interviews with self-identifying non-musicians who responded to a general call. The interviews were exploratory in nature, intended to learn more about the information practices of individuals who engage in self-directed learning of music. Each of the non-musicians exemplified the creative ways in which individuals overcome the information-seeking challenges described earlier, as well as their perceived personal limits in terms of music learning abilities. The cases demonstrate not only the diversity of ways in which the single task of learning music can be tackled, but also the shared desire of so-called non-musicians to take ownership of their own learning. 
In other words, while the processes of learning differed across the five cases, the motivations for engaging in self-directed learning were strikingly similar: a desire for a flexible and personalized way of learning driven by a sense of curiosity, and a learning process that constantly shapes, and adapts to, one’s learning needs and interests. A note about the use of the term music knowledge in this study will be useful for readers not familiar with information research in the domain of music. The term music knowledge was used instead of the more widely used term music
1 A 2011 Master’s thesis entitled Online music knowledge: The case of the non-musician, supported by the Social Sciences and Humanities Research Council. 2 Non-musicians are loosely defined as individuals who have not been exposed to any regular music instruction over a significant period of time via either formal lessons with a teacher or interactions with a family or community member on a casual basis.
Chapter 5. Seeking what we have yet to know
information in order to acknowledge the situated nature of music information. For example, whether someone wants to learn more about a band, an instrument, a musical style, a specific song, or a cultural practice, the type or genre of information they are looking for may differ widely depending on how they understand and engage with music. This reality is not reflected in the way music information is treated within music information retrieval (MIR) research, which tends to define music information as digitally encoded musical data from which properties such as key, chord, pitch, and rhythm can be isolated and extracted. Bearing this in mind, the term music knowledge is used intentionally to highlight the ephemeral and embodied nature of musical knowing, and the great variety of ways in which music is transmitted and preserved beyond written notation or audio recordings. The term also helps emphasize the socially mediated nature of knowledge in general, as well as the multiplicity of ways in which it is mediated by notation, records, instruments, and technology. In many ways, the use of the term is an active attempt to avoid defining what kind of music is worth studying in relation to dominant paradigms, such as the Western European paradigm of music. Instead, it encourages one to cast a wider net to encompass music in all its different forms and meanings to different people and societies around the world.³ However, due to conventional usage, music information and music knowledge will be used interchangeably throughout this chapter. For literature that has informed the conceptualization of music knowledge in this chapter, see Green (1988) for a critical view within the field of music education, and Born (2010) for the broader field of music research. See also Hobart (1993) for a discussion of knowledge and ignorance from an anthropological perspective.
3 While not discussed in detail within this paper, this epistemological perspective informs our thinking across a spectrum of issues related to the transmission of music knowledge, the production of music, inter-disciplinary approaches to music research, and the role of technology and emergent digital spaces, all of which mediate our experience and understanding of music in some way.
Situating the information practices of non-musicians
The study was motivated by a desire to better understand the information needs of non-musicians, in order to provide appropriate support for their pursuit of music knowledge. The five cases served as a starting point for mapping out the profiles and needs of such users, as well as the basis for initial efforts to develop a useful framework, as will be seen later in the chapter. To begin, we found two dimensions of the non-musicians’ self-directed learning that are closely related to their information practices:
1. Trajectory of learning: informs the nature of the music knowledge that one is seeking; and
2. Modes of learning: informs the way one seeks and conceptualizes music knowledge.
Applying these two concepts, the way non-musicians seek information can be described as follows:
1. They are orienting themselves by shaping and / or navigating a trajectory of learning;
2. They are exploring mediations of music (e.g. people, instruments) and modes of learning.
Non-musicians’ trajectories of learning are shaped by their general interests; in particular, these interests include musical worlds, practices, or genres. Their modes of learning are shaped by their temporal, spatial, physical, and cognitive strengths and limits. These represent, respectively, the “larger” socio-musical contexts and the “smaller” personal contexts within which learning takes place, or what we will refer to as large world and small world contexts (see Figure 1).
Figure 1. Situating non-musicians’ information practices within their small and large world contexts.
Conceptualizing non-musicians’ information needs as arising out of engagement with small and large world contexts has some affinity with what Engeström (1999) has described as expansive learning, a mode of learning which is a hybrid of the acquisition and participation forms of learning: The theory of expansive learning focuses on learning processes in which the very subject of learning is transformed … Initially individuals begin to question the existing order and logic of their activity. As more actors join in, a collaborative analysis and modelling of the zone of proximal development are initiated and carried out. Eventually the learning effort of implementing a new model of the activity encompasses all members and elements of the collective activity system. (Engeström & Sannino, 2010, pp. 5 – 6)
While the theory of expansive learning was applied post-analysis in order to facilitate discussion, it is a framework that reflects the way we have conceptualized the seeking of music knowledge, namely as “an on-going process of questioning, discovery, and changing both oneself and the world” (Mwanza-Simwami, Engeström, & Amon, 2009, p. 365). Their work will be referenced occasionally to highlight the dynamic nature of learning and how it informs our conceptualization of non-musicians’ information practices. Throughout this paper, we will also draw on Nonaka’s (2004) concept of Ba as a space that facilitates knowledge work, knowledge being something that Nonaka emphatically distinguished from information. This theory was also applied retroactively to discuss research findings, but its spatial metaphor, as well as Nonaka’s use of axial-spectrum graphs to model it, made it particularly well suited to our concept of non-musicians’ information practices as a process of navigating or orienting oneself. In Part I, we will give a brief introduction to both of these frameworks, as well as introduce the work of Howard Becker (1982, 1992, 1995, 1997) on social worlds and Antoine Hennion (2001, 2003) on the sociology of music. Part II offers a brief introduction to each of the five cases. Parts III and IV will discuss the concepts of trajectories of learning and modes of learning in greater depth, while Part V will discuss the concept of knowledge trajectories, which integrates various ideas presented in this paper.
Lam, Ratto
Part I: Frameworks for discussion
Music itself is a boundary that is too impervious. Depending on the case, it either effectively outlines the precise borders of a practice isolated from others or, on the contrary, has to be “reconfigured” within a larger set, from outings with friends, parties and shared listening to the same music, to a set of strongly integrated cultural elements. (Hennion, 2001, p. 18)
One important overarching framework that should be mentioned at the outset is that our conceptualization of music is informed by what Becker (1982, 1992, 1995, 1997) describes as the social worlds within which music is necessarily situated, and by what Hennion (2001, 2003; Loosley, 2006) refers to as mediation. According to Becker, who uses the art world as the focus of his discussion, social worlds consist of inter-connected and co-constituting actors, institutions, and relationships. What distinguishes the world of art from the world of craft is not simply a matter of taste, but the result of various agents’ interaction with and dependence on each other to legitimate both themselves and the art world. Our direct experiences of music are similarly dynamic, as “music has nothing but mediations to show: instruments, musicians, scores, stages, records” (Hennion, 2003, p. 83). A recognition of the dynamic nature of such social worlds (or what we will refer to as socio-musical worlds) and of the mediated nature of all our musical experiences serves as a point of departure for many of the ideas that will be developed in this paper.
Expansive learning requires articulation and practical engagement with inner contradictions of the learners’ activity system … Expansive learning is an inherently multi-voiced process of debate, negotiation and orchestration. (Engeström & Sannino, 2010, p. 5)
The findings in this study are also closely related to Engeström’s (1999) heuristic framework of expansive learning, which is useful in “analyses of learning in non-traditional, hybrid and multi-organizational settings” where “nobody knows exactly what needs to be learned” ahead of time (Engeström & Sannino, 2010, p. 3). Engeström and Sannino also note that “the interactive potential of the internet, or Web 2.0 … opens up a field of possibility for the formation of new types of activities and values with huge expansive potentials” (p. 4). In particular, the notion of “contradiction” as an important motivation for expansive learning, as well as the process by which the germ of an idea is transformed into a learning practice, reflects the personal challenges that non-musicians have to confront as part of their learning process. There is no “right” way for non-musicians to engage in self-directed learning, as each is driven by individual inclinations and interests. Yet, when taken together, their trajectories and modes of learning sketch out an expansive space within which self-directed learning is engaged.
If knowledge is separated from ba, it turns into information, which can then be communicated independently from ba. Information resides in media and networks. It is tangible. In contrast, knowledge resides in ba. It is intangible (Nonaka & Konno, 1998, p. 40).
Nonaka’s (1994) SECI model (Socialization, Externalization, Combination, Internalization) further offers a way to think about knowledge-seeking behaviour, which Nonaka situates within what he describes as knowledge spaces. Knowledge exchange that occurs outside of these mediating spaces, called Ba, is merely information exchange. His reference to Polanyi’s concept of tacit and explicit knowledge is a useful metaphor for describing the tacit dimension of music knowledge, while the model’s emphasis on space helps us conceptualize the type of emergent spaces that can address the knowledge-seeking needs of non-musicians. Similarly, Engeström’s (1999) expansive learning model is built upon activity theory and thus considers activity as the primary mediating agent for wide and expansive contexts, at both the socio-cultural and the personal level, which in turn inform the activity itself. In particular, the sequence of learning actions in an expansive learning cycle (questioning, analysis, modelling, examining, implementing, reflecting, and consolidating) offers a useful framework for conceptualizing the self-directed learning we have observed in the cases. These two frameworks, while rooted in different theoretical traditions, both offer a metaphor for knowledge creation (Paavola, Lipponen, & Hakkarainen, 2004) that rejects the notion of knowledge acquisition as consisting of discrete tasks or pieces of information. They understand “knowledge as a dynamic human process” (Nonaka, 1994, p. 15) and recognize learning as a process of “continuously absorb[ing] new knowledge while restructuring existing knowledge” (Mwanza-Simwami et al., 2009, p. 364), two important ideas that anchor our discussion of music knowledge and self-directed learning. As such, we draw on both frameworks in order to analyze the two learning contexts from which non-musicians’ information needs arise: the need to orient and situate oneself within a learning trajectory or a space for knowledge creation (Nonaka, 1994), and the need to explore different modes of learning (Mwanza-Simwami et al., 2009).
Part II: The cases
Before we consider in more detail the relationship between trajectories of learning and modes of learning, a brief introduction to each case will offer some insight and context for the discussions to follow.
The relentless (Allan)
Despite having shaky hands and some learning difficulties that prevent him from reading classical music notation for the purposes of playing music, Allan is imbued with great intellectual curiosity as well as a great affinity for music. He was able to sing in tune at a very young age, and has a fascination with sound and the processes of making sound. Interesting objects and noises easily distract him, and he is always curious about the people around him. The fact that traditional ways of learning music could not satisfy his curiosity did not prevent him from trying to learn to play guitar, learning about music from different disciplinary perspectives (such as physics, or electronic and computerized music), and making friends within musical circles. Allan’s interests are vast, and he makes meaningful connections between them in the fields of physics, mathematics, and computers. In many ways, music-making is difficult for him: the necessary levels of concentration and the physical barriers that need to be overcome are perhaps greater than average. Yet instead of hindering him, this has encouraged him to explore the different facets of music. For Allan, a deep conceptual understanding of music leads to a richer experience of music-making, and becomes a source of motivation to do the unglamorous and often tedious work of practicing.
The musical silo (Giles)
Music as a universal social and cultural artefact is not something that can be taken for granted. While some people seem to have an affinity or natural inclination towards music, for others it is something mundane that is simply “there” and does not elicit much attention or excitement. While Giles was growing up, music did not have meaning for him on its own. There was no discourse about music that sparked a personal interest or encouraged him to explore. Even in high school, when band was an extracurricular activity that he could sign up for, he took up computers and architecture instead, pursuing his interest in building things and in new technology. It was only during his university years, when friends who studied classical music introduced him to the mathematical beauty of sound and brought him to live concerts, that he realized what he had missed out on. Today, Giles has a great desire to learn music. However, not knowing any way beyond the paths set by the very institutions that prize the music that has inspired him, Giles is forced to confront the fact that he simply does not have the time that such paths require him to invest in learning music. He can afford to listen to music and to cultivate his taste for it, and indeed this is something he does regularly. He gets excited thinking about instruments that “teach” you music, such as guitars with positions that light up, or virtual keyboards that can be projected and are thus mobile. Even though finding a teacher is the most obvious avenue, he is wary of making such a social and financial commitment when he is still not sure how he would like to learn.
The “non-musician” (Simon)
Despite being an advanced trumpet player, Simon feels very strongly that he is far from being a musician, since he has only had the benefit of a trumpet teacher for two years. He has worked through (albeit mostly independently) the benchmarks that indicate levels of musical achievement within the conservatory system, for the purpose of learning about music, as an act of pursuing something that he loves. As he describes his process of learning, which also involved a couple of years of piano lessons, performing in community orchestras for over 15 years, and participating in exams and competitions, he is critical of the emphasis on mechanical execution instead of musical expression in the little instruction that he has received.
What is most interesting about this case is that Simon is an individual who— by virtue of his achievement as defined by institutional benchmarks and participation in music organizations—would easily be recognized as an accomplished musician by popular social norms. And yet, he feels strongly and deeply that he has only succeeded in learning how to ‘pass’ within the system of music education, while lacking the musical skills required for one to really be a ‘true’ musician.
The hobbyist (Chloe)
Chloe has always loved to sing; not to sing well, but just for fun. She sings along to pop songs and video games, but she does not sing in a choir. Her mother enrolled her in piano lessons when she was young, but she quit after two years because she didn’t like practicing. Discovering the ukulele, which has grown in popularity, was perhaps only a matter of time. Her interest in it developed through a combination of factors. Chloe’s friend had been talking about the ukulele for about a year, which led her to look it up online. She encountered websites, music videos, tutorials, and forums: a whole online world of ukulele enthusiasts, with a prominent culture that was welcoming and unpretentious. The ukulele makes music fun. You can get “serious” about it if you want, but you do not have to in order to participate and share your love of “fun music” with other ukulele lovers.
The outlier (Jason)
Jason loves music, or as he likes to describe it, he loves to just make noise. Sounds have an inherent quality that makes them interesting, and when you start putting them together you have endless possibilities of combinations and arrangements. Jason, who was diagnosed with autism only much later in life, was inclined to learn music through such a creative and experimental process. Sight-reading and memorizing music were difficult for him despite his conceptual grasp of music theory, a discerning ear, and the ability to recall music based on aural association. An inability to exercise fine motor control also made playing instruments “properly” a challenge. However, this did not stop him from playing in his high school band, playing bass in a local rock band, and learning to play the guitar.
The music Jason makes, however, is not really intended for sharing. He does not play or compose music to please an audience, but to hear something he finds pleasing. These are the motivations behind the sonic sculptures he has made in Second Life, a virtual world, where multiple tracks that he created himself are visualized, and an avatar flying through the area triggers different kinds of sounds. He is getting some help with setting up the code and exploring the different ways the position of the avatar could affect the sound, but he is happy with the experiment so far. GarageBand’s interface for musical composition opened up a whole world of possibilities for Jason. For one, the square notes denoting different pitches and timbres allowed him to drag his mouse around the screen in order to “find” the sound he is looking for.
Part III: Trajectories of learning: The emergent nature of music information need
Learner activities are largely driven by motives and relationships that exist in the context in which learning takes place. (Mwanza-Simwami et al., 2009, p. 361)
Non-musicians’ trajectories of learning are shaped by the actors and relationships within their socio-musical worlds on the one hand, and by their own personal inclinations and preferences on the other. While these factors actually have a dynamic and co-constituting relationship with each other, we will use trajectories of learning to anchor our discussion of non-musicians’ emergent information needs within a learning context. Trajectories of learning set the fundamental tone and direction of a self-directed learning process, and offer a context within which information needs first emerge. In other words, an understanding of a non-musician’s overall trajectory of learning can potentially lead to an understanding of their information needs, which are emergent in nature. In the case of the non-musicians, trajectories of learning are defined by personal inclinations and preferences for certain types of music, which are situated within socio-musical worlds. Their personal inclinations and preferences help us understand their motivation for pursuing a particular learning trajectory, while the socio-musical worlds within which they situate themselves inform their conceptualization of music.
Shaping a trajectory within a socio-musical world
A learner’s motivation for learning and their conceptualization of music play a major role in shaping their trajectory of learning, whether or not they are consciously shaping it. In the case of the non-musicians, their primary motivation for learning music was personal enjoyment. Music as a domain has meaning for them in some way, and learning music is a way to further explore and engage that interest in a personally meaningful way. Mwanza-Simwami et al. (2009), for example, identify motivation as a fundamental factor in engaging in any kind of learning, remarking on “the significance of learner motives in determining learner engagement in activity” (p. 366). Each non-musician’s particular conceptualization of music reflects the socio-musical worlds in which they are situated. For example, all of the non-musicians described interactions with particular friends and family members that had an impact on the way they engaged with music, and reflected on their experiences with institutions of music such as public schools, conservatories, record labels, and music stores. Actors, institutions, and relationships do not automatically shape one’s conceptualization of music, but they mediate it through the way we interact with them. For example, Chloe’s interest in music was often discussed in relation to her experiences as a Girl Guide sharing music-related interests with friends, or to the piano lessons she disliked, each of which made up her world at various points in her life. In Jason’s case, his experiences and insights were formed in relation to many musical and non-musical activities in an educational context, where he struggled to engage with music within institutional norms. In Simon’s case, we see that he did not feel the need to venture very far from the classical institutions of conservatories and Western music ensembles such as the orchestra to explore his musical interests.
Trajectories of learning as emergent contexts
The consideration of a user’s learning trajectory forces us to acknowledge the larger socio-musical context within which they are situated, as well as the complex nature of an individual’s musical interests, both of which are closely related to the way we conceptualize the information needs of non-musicians as emergent: that is, not as discrete and clearly articulated queries, but as an exploratory process of discovery.
While recognizing a need to seek out something that one needs during the process of self-directed learning, non-musicians often have a difficult time articulating a query, either to a search engine or a live person, to find exactly what they are looking for. While the “need to seek” is guided by their learning trajectory, the “information need” itself goes through a process of re-articulation and reconceptualization. Giles’ motivation for learning and conceptualization of music, for example, leads him to “articulate” an information need (i.e. “how can I learn music?”) vis-à-vis the familiar world of technological innovation, where he tries to find “smart” instruments that are designed to teach people how to play music. To further illustrate this emergent quality of non-musicians’ information needs and explore the concept of trajectories of learning as a context for information needs to emerge, Figure 2 is an axial spectrum that maps out a conceptual space that addresses various kinds of trajectories of learning. The four characteristics were defined by considering two factors: firstly, the approach one takes as a learning method (informal and formal) and secondly, the nature of the actors involved in the learning process (individual and institutional). In Figure 2, trajectories of learning are defined by whether the learner engages in a formal trajectory that is highly structured, or a casual trajectory that is more spontaneous. The two extremes reflect different levels of commitment such as the amount of time one has to devote to learning, as well as the personal preference for structured or spontaneous and opportunistic forms of learning. One important difference between the formal and casual trajectories of learning is the way success is determined. Clearly, the amount of time and commitment one devotes to learning is not necessarily an indicator of success. 
In fact, upon close reflection, all trajectories of learning have inherent in them ideas of “success”, and the measure of success in self-directed learning is the experience of personal enjoyment. Furthermore, while the concept of formal and casual trajectories of learning arose out of a consideration of the types of learning engaged in within institutional contexts, a personal preference for a highly structured trajectory outside of institutional or other standardized contexts is also a conceivable form of self-directed learning. As such, the dimension running from personalized to institutionalized trajectories of learning is also present, reflecting the dynamic nature of the learning communities that can make up the non-musician’s socio-musical world.
Figure 2. Mapping the trajectories of learning or emergent information needs of each non-musician.
Situated within an axial spectrum, the cases can be seen in relation to one another by the way they are spatially situated. A visual-spatial approach to situating these cases also helps convey the idea that music information needs emerge out of “somewhere”, and not simply out of “nowhere”. Giles’ position near the centre of the graph, for example, is unusual and, upon analysis, quite telling. In many ways, he is still trying to define a learning trajectory. He is able to articulate his interest in particular forms of music such as classical and jazz, but he is unsure of a process of learning that suits both his professional lifestyle and his personal inclination, both of which are situated within the world of innovative technologies. Arguably, Jason and Allan also have an interest in the way technology affects how one can engage with music, but Jason’s trajectory of learning is highly personal and exploratory, situating him in the lower left corner of the chart, while Allan explores music through established disciplines such as physics and computer science, with musical interests spanning a wide spectrum of Western music from Bach to The Grateful Dead. In situating non-musicians within their learning trajectory (by understanding their motivation for learning and their conceptualization of music), a corresponding knowledge trajectory could be articulated as a direction for exploration: in essence, an emergent space for information needs. Aside from non-musicians’ association with particular socio-musical worlds, their conceptualization of music offers an important perspective from which to consider and even anticipate their information needs, but it is more prone to shift or change as their learning progresses. In other words, non-musicians’ information needs are closely related to their conceptualization of music as a domain of study itself: “… a person constructs his or her own version of the subject matter during the learning process, that is, formulates his or her own theory about the study domain” (Mwanza-Simwami et al., 2009, p. 367). Thus, the advantage of using trajectories of learning to map out a space from which the information needs of non-musicians emerge is that such a space is articulated in association with socio-musical worlds, which are relatively stable, especially over a time frame of years or even a person’s lifetime. Consider Simon’s use of the conservatory system as a trajectory of learning over many years, or Chloe’s use of mass media to explore her interest in the ukulele, both of which are rooted in the socio-musical worlds with which they are familiar. Should a non-musician have trouble clearly conceptualizing or articulating an information need, an understanding of their learning trajectory can be used to direct them to resources situated within particular socio-musical worlds that are more likely to have the “relevant” information they are seeking, without requiring them to fully articulate a query.
Part IV: Modes of learning: Situating music information practice
As noted in Figure 1, the information practices of non-musicians are to be understood on two levels: their trajectories of learning and socio-musical worlds, which we have just discussed, and their preferred modes of learning. This section addresses the second layer, which is nested within their trajectories of learning. More specifically, “mode” refers to the way in which the experience of music within a learning context is mediated. It is a way of understanding non-musicians’ information practices in relation to their preferred types of interaction within socio-musical worlds, as well as their particular conceptualization of the topic at hand (i.e. music). In other words, a non-musician’s mode of learning informs the way they seek and conceptualize music knowledge.
Exploration of boundaries and limits
Music can be found wherever we look, and as such, the learning of music can occur just about anywhere. Due to the distributed nature of music, non-musicians’ information seeking is most often an act of exploration, where the individual sets out with a vague idea of what they would like to achieve, but without a clear idea of what exactly it is that they need. It is particularly useful here to use Hennion’s (2003) concept of mediation to conceptualize the process of learning music at the personal level as an exploration of different forms of mediation of music. Actors, institutions, and relationships, as mentioned in the last section, mediate our conceptualizations of music through the way we interact with them, each of them an agent that makes up a larger socio-musical world. Considering this, non-musicians’ modes of learning (their exploration of musical knowing or knowledge) are also situated within various socio-musical worlds, communities, and practices that serve to preserve and transmit music knowledge to various degrees. In other words, the exploration of music knowledge is closely related to the ways in which we personally prefer to interact with, and within, such contexts. The metaphor of an emergent “space” offers a way of situating their processes of exploration, or modes of learning, within some kind of space as well, an approach that is fundamental to Nonaka’s (1994) concept of Ba as an existential space for knowledge creation: “The concept of Ba unifies the physical space, the virtual space, and the mental spaces. Ba is the world where the individual realizes himself as part of the environment on which his life depends” (p. 40). The metaphor of space offers a way of addressing the notion of going beyond existing boundaries, or processes of boundary-expanding, which is inherent in the process of exploration.
In all five cases, the curiosity-driven nature of learning and the overcoming of limits, whether temporal, spatial, physical, or cognitive, are essential features of self-directed learning. Self-directed learning is, by its nature, an alternative to dominant forms of learning: self-directed learners are those who do not, or cannot, engage in learning in such dominant contexts. As such, the process of self-directed learning itself addresses the pedagogical limits of dominant forms of learning, or the barriers one experiences in engaging with them. Engeström’s theory of expansive learning, which is driven by a learner’s “articulation and practical engagement with inner contradictions of the … activity system” (Engeström & Sannino, 2010, p. 5), offers a further consideration of how perceived limits can inform non-musicians’ modes of learning. The expansive learning cycle calls for constant reflection, observation, and interaction with the world through the development of models and the implementation of those models. It is a learning process not unlike the way Jason explores his musical ideas and the deep reflections he engages in when he experiences a musical performance, the way Simon finds the underlying patterns that allow him to “pass” conservatory examinations, or Chloe’s deliberate and careful process before she finally purchased a ukulele:
The object (of an activity) is an invitation to interpretation, personal sense making and societal transformation. One needs to distinguish between the generalized object of the historically evolving activity system and the specific object as it appears to a particular subject, at a given moment, in a given action. (Engeström & Sannino, 2010, p. 6)
Consider Allan’s experience of learning music, which is marked by a multitude of musical friendships and involvements in music-making (e.g., barbershop, high school band). These relations and connections afforded peripheral engagement with, and exploration of, musical worlds such as those of orchestral musicians and art music composers. These are worlds in which he is considered a non-musician, but welcomed because of his appreciation of their music and the alternative perspectives that he brings with his technical background. Furthermore, Allan shows a personal preference for reading about different ways of conceptualizing music, as reflected by his fascination with the mathematical concepts and patterns explored in Douglas Hofstadter’s Gödel, Escher, Bach, as well as his interest in understanding music in terms of human aural perception. Allan’s exploration of music knowledge is mediated by his engagement within a small world context, as well as his engagement with knowledge spaces, or a large world context. To put it in practical terms, every individual’s information needs can be mapped according to a dynamic understanding of their trajectories of learning and modes of learning within a domain of knowledge, such as music. This metaphor of exploring music communities extends into the online world as well, as reflected by Allan’s interactions with online contexts where fans of The Grateful Dead and early adopters of the Eigenharp share their interests with others, Chloe’s engagement with ukulele communities both in real life and online, and Simon’s use of the world of classical music-making as a frame of reference for his own musical explorations. Essentially, wherever there are people and wherever they interact, there are potential perspectives and approaches to music to be discovered.
Exploratory nature of music knowledge seeking
Figure 3 illustrates the modes of learning that non-musicians engaged in by situating the kinds, or “modes”, of interaction they have with different socio-musical worlds. Drawing from the five cases again, we map out a conceptual space that helps us articulate the nature of each non-musician’s information seeking through the types of interaction they engage in with different socio-musical worlds. Using space as a metaphor, we identified four dimensions of non-musicians’ exploration processes, or modes of learning: socialization, conceptualization, experimentation, and innovation. These four characteristics were developed by considering the information needs of non-musicians as a result of boundary-expanding, where one is engaged in an effort to find knowledge spaces that potentially contain musical knowledge of interest, or attempts to overcome perceived limits that prevent access to such knowledge. Each of the five cases is placed on the map to illustrate its use; the introduction of new cases has the potential to both refine the map and expand its scope.
Figure 3. Mapping the modes of learning of each non-musician.
The socialization and conceptualization dimensions roughly correspond to how willing or eager an individual is to engage other people as part of their process of learning, while the innovation and experimentation dimensions mirror a similar dynamic within the specific context of boundary-expanding. These dimensions lie along a spectrum and are intended to show a range of tendencies rather than categories. Jason’s position in the extreme lower right of the graph shows the extent to which he relies on experimentation and conceptualization in his learning process. Giles, on the other hand, demonstrates a preference for exploring technological innovations that are emerging in various socio-musical worlds, although further exploration of his process of self-directed learning may reveal otherwise. Both Allan and Chloe rely heavily on their personal relationships and social engagements in their learning process, while Simon’s relatively conservative process of learning still retains elements of exploration. After situating each case on the chart, it became clear that this is not the full picture. Allan, for example, also actively explores various conceptualizations of music from different disciplines, and Simon’s participation in community orchestras is a highly social activity. While one’s trajectory of learning indicates a general direction, each non-musician engages in various modes of learning. As such, Figure 4 better reflects the multiplicity of non-musicians’ modes of learning and, by the same token, the multiplicity of their knowledge-seeking behaviours. Figure 4 thus offers more accurate representations of the modes of learning in which each non-musician engages, and a way of conceptualizing their range of potential music knowledge-seeking behaviours.
When trajectories of learning and modes of learning are considered together, we can see that the nature of music knowledge seeking is shaped by the non-musicians’ modes of interaction with various socio-musical worlds. Such interactions define the particular modes of learning in which they prefer to engage and, taken together, mark the socio-musical worlds, or knowledge trajectory, that have the potential to satisfy their curiosity. Furthermore, boundary-expanding and the desire to overcome perceived barriers are important motivations for engaging in the act of exploring musical knowledge itself.
Lam, Ratto
Figure 4. Range of music knowledge seeking behaviour of each non-musician.
In the final section, we discuss how knowledge trajectories can be used to conceptualize the information practices of non-musicians who are engaged in self-directed learning. In particular, we see the co-formative nature of a learner’s subjective small world context and the large world context of their socio-musical world as key to understanding non-musicians’ information practices.
Part V: Knowledge trajectories: Seeking what we have yet to know
“Unsystematic” knowledge … [that] does not fit neatly into an institutionalized paradigm is adaptive to change, and their related knowledge practices emerge based on emergent contexts. The process of responding to the emergent contexts guides the agent, not presuppositions of what is the right way to respond. (Lam, 2011b, p. 21)
To search for something before we can articulate what it is seems paradoxical, yet we do it frequently throughout our lives. Instead of a discretely articulated information need that can be satisfied by a corresponding “right” piece of information, the situation is almost the opposite: the need, despite its existence, is not well articulated, and the “thing” that will satisfy it is somewhere “out there.” The concept of knowledge trajectories offers a way of characterizing the information needs of such a user (in this case, a non-musician).
Small and large world context
The information practices of the non-musician are characterized by a constant negotiation between their small world contexts and their large world contexts. On the one hand, the nature of self-directed learning gives the learner’s agency a primacy that is not common in traditional instructional contexts; on the other hand, learning must be situated within some socio-musical world with which the learner can interact. Furthermore, while modes of learning are highly personal, they are also shaped by the way particular communities and practices, which are situated within particular socio-musical worlds, mediate one’s concept of what constitutes musical knowledge. As Engeström & Sannino (2010) observe:
… an individual subject’s ideas and aspirations are not only taken as idiosyncratic expressions of the subject’s particular life history; they always also draw upon and interact with generalized cultural models and motives, or social representations. (p. 18)
This ongoing negotiation between the self and the world is also a recurring theme in the information practices of non-musicians, whose information behaviours are marked by a process of navigating within the context of their trajectories of learning and their processes, or modes, of learning.
Knowledge trajectories
We have referred to the concepts of trajectories of learning and modes of learning as a way of situating non-musicians’ information practices. Taken together, these could also be referred to as their knowledge trajectories. The information practices exhibited by the non-musicians are all, in some way, processes of “orienting” themselves, either as part of their larger trajectories of learning (identifying socio-musical worlds of interest) or in their preferred modes of learning (interacting with different socio-musical worlds). Their information practices are therefore marked both by the exploration of potential knowledge spaces and by orienting themselves within known knowledge spaces.
Conceptualizing non-musicians’ information practices as knowledge trajectories also accounts for the common experience of plateaus in a learning process, where progress seems to stall, as well as for the variable levels of dedication and commitment self-directed learners have at any given time. The discontinuous nature of self-directed learning is matched by the learner’s demonstrated ability to continue along the same trajectory when the right conditions nurture that process again. The associated knowledge trajectories will then also re-emerge within the learning context, albeit perhaps within a new socio-musical and technical space.
Exploration and orientation
In many ways, non-musicians’ knowledge trajectories are a response to the emergent nature of self-directed learning. Trajectories of learning and modes of learning offer a conceptual compass with which learners navigate both known and unknown territories. To put it another way, knowledge trajectories are defined knowledge spaces within which non-musicians can explore unknown approaches to learning and understanding music, and orient themselves vis-à-vis their preferred ways of interacting with socio-musical worlds. The acts of exploring and orienting are co-constituted, and both occur within small and large world contexts: in order to orient yourself, you need to explore, and before you set off, you need to know your own point of departure. To highlight how knowledge trajectories are situated within large and small world contexts, and how they can help us better anticipate the information needs of non-musicians, we map out the intersection between the small and large world contexts and the knowledge trajectories, as represented by trajectories and modes of learning (Figure 5). In particular, drawing on Nonaka’s (1994) idea that spaces and platforms are fundamental to the facilitation of knowledge creation, we use this framework to consider the kinds of spaces within which non-musicians can engage in the practices of exploration and orientation. The small world context represents learners’ personal interests, personalities, and inclinations, while the large world context represents their social worlds. Trajectories of learning represent the non-musicians’ musical interests, while modes of learning represent their preferred ways of learning. Although these ideas have already been introduced, here they are used to gain a holistic perspective on the information practices of non-musicians.
Generally speaking, trajectories of learning rarely change, since they point in a broad direction, but they do help identify particular types of music knowledge that are of interest in terms of either small world or large world contexts. The small world context delineates the full spectrum of spaces in which learners orient themselves, while the large world context marks the potential contexts for exploration. For example, Giles uses the world of innovative technologies, a world intimately familiar to him, to explore his musical interests. Even in Simon’s case, his exploration of online resources after many years away from learning the trumpet led to new ways of finding resources he could use.
Figure 5. Knowledge trajectories as trajectories and modes of learning, and small and large world contexts.
Modes are often associated with personal inclinations, which, like learning trajectories, tend not to change, although they can of course expand or contract. Large world contexts such as music communities also have modes of knowledge transmission or mediation that are frequently associated with them. Learners sometimes have to engage in modes of learning that they are unfamiliar or even uncomfortable with, just as knowledge practices are often transformed by their mode of preservation and transmission. For example, Allan’s small world and large world contexts have shaped his learning process and information practices. In the small world context, he recognizes the eventual need to master the piano in order to engage with music more intuitively on the computer, even though fine motor control is not a mode of learning that suits him. In the large world context, the emergence of online communities has opened up the possibility of engaging with fans of The Grateful Dead who share instructional videos on how to play their covers. Information practices thus emerge from within as well as from without.
Towards knowledge platforms for music knowledge
In this paper, we have offered an approach to conceptualizing the information needs of non-musicians within the context of self-directed learning, based on the idea of knowledge trajectories (as informed by trajectories and modes of learning) and on a metaphor of spatial exploration to describe the way music knowledge is sought within that context. While the discussion has necessarily focused on learning as a process, the ideas presented here represent the initial development of a design approach that addresses problems of the mediation and facilitation of knowledge, as well as of indexing and knowledge organization.⁴ The way intended users are conceptualized has a great impact on the way information systems are designed. If information needs are conceptualized as emergent, and information practices are characterized as processes of exploration and orientation, a corresponding design challenge is how to map out knowledge as spaces for exploration. Users need to be able to map out their personal space, or small world context, and mark the larger knowledge and world contexts within which their interest is situated, which would allow them to see just beyond those boundaries and determine a direction for continued exploration. For the field of information research, articulating those spaces in terms of the acquisition, transmission, and preservation of music knowledge requires further exploration of the nature of interactions and relationships between two or more groups of actors within a shared socio-musical world. Going beyond a temporal approach to defining a search process, a spatial approach affords a way of thinking about an individual’s information practices over a longer time frame, and of developing a more nuanced understanding of users within various contexts and domains.
4 See Lam (2011a) for a proposed approach to music knowledge organization.
Developing knowledge platforms to support the exchange of music or other non-textually based knowledge presents a unique set of challenges. It requires collaboration between information professionals and domain knowledge experts and, most importantly, input from the users for whom such platforms are being developed. The stories of individuals who pursue music knowledge in an online context reveal complex relationships between tradition and innovation, with tensions that are not easily resolved. While self-directed learners, particularly those who have limited engagement with institutional forms of learning, can be quite innovative in the way they negotiate these tensions, there is certainly room for developing online platforms that provide additional support for the exploration and discovery of music-related knowledge. For the field of information research, articulating research and design questions in terms of knowledge spaces that facilitate the acquisition, transmission, and preservation of knowledge within specific domains offers not only a rich area for exploration, but great potential for making a difference.
References
Becker, H. (1982). Art worlds. Berkeley: University of California Press.
Becker, H. (1992). Cases, causes, conjunctures, stories, and imagery. In C. Ragin & H. Becker (Eds.), What is a case? Exploring the foundations of social inquiry (pp. 205–216). Cambridge, NY: Cambridge University Press.
Becker, H. (1995). The power of inertia. Qualitative Sociology, 18(3), 301–309.
Becker, H. (1997). Foreword. In J. Shepherd, P. Virden, G. Vulliamy, & T. Wishart (Eds.), Whose music? A sociology of musical languages (pp. xxi–xiv). London, UK: Latimer.
Born, G. (2010). For a relational musicology: Music and interdisciplinarity, beyond the practice turn. Journal of the Royal Musical Association, 135(2), 205–243.
Engeström, Y. (1999). Innovative learning in work teams: Analyzing cycles of knowledge creation in practice. In Y. Engeström, R. Miettinen, & R.-L. Punamäki (Eds.), Perspectives on activity theory (pp. 377–404). Cambridge, NY: Cambridge University Press.
Engeström, Y., & Sannino, A. (2010). Studies of expansive learning: Foundations, findings and future challenges. Educational Research Review, 5(1), 1–24.
Green, L. (1988). Music on deaf ears: Music meaning, ideology, education. Manchester, UK: Manchester University Press.
Hennion, A. (2001). Music lovers: Taste as performance. Theory, Culture & Society, 18(5), 1–22.
Hennion, A. (2003). Music and mediation. In M. Clayton, T. Herbert, & R. Middleton (Eds.), The cultural study of music: A critical introduction (pp. 80–91). New York: Routledge.
Hobart, M. (1993). Introduction: The growth of ignorance? In M. Hobart (Ed.), An anthropological critique of development: The growth of ignorance (pp. 1–30). London: Routledge.
Lam, M. (2011a). Towards a ‘musicianship model’ of music knowledge organization. OCLC Systems and Services, 27(3), 190–209.
Lam, M. (2011b). Online music knowledge: The case of the non-musician. (Unpublished master’s thesis). University of Toronto, Toronto, Canada.
Looseley, D. (2006). Intellectuals and cultural policy in France: Antoine Hennion and the sociology of music. International Journal of Cultural Policy, 12(3), 341–354.
Mwanza-Simwami, D., Engeström, Y., & Amon, T. (2009). Methods for evaluating learner activities with new technologies: Guidelines for the Lab@Future Project. International Journal on E-Learning, 8(3), 361–384.
Nardi, B. (1996). Studying context: A comparison of activity theory, situated action models, and distributed cognition. In B. Nardi (Ed.), Context and consciousness: Activity theory and human-computer interaction (pp. 35–52). Cambridge, MA: MIT Press.
Nonaka, I. (1994). A dynamic theory of organizational knowledge creation. Organization Science, 5(1), 14–37.
Nonaka, I., & Konno, N. (1998). The concept of “Ba”: Building a foundation for knowledge creation. California Management Review, 40(3), 40–54.
Paavola, S., Lipponen, L., & Hakkarainen, K. (2004). Models of innovative knowledge communities and three metaphors of learning. Review of Educational Research, 74, 557–576.
Athena Salaba, Yin Zhang
Chapter 6. Searching for music: End-user perspectives on system features
Abstract: This study examines users’ reactions to MusicAustralia, a digital library devoted to Australian music and musicians. The study’s goals were to examine whether a music retrieval system supports user tasks, and what features users find useful or desirable. Forty-seven users with varied musical backgrounds were selected from an academic institution to perform a self-selected open task and three prescribed tasks. Participants desired advanced search features for music retrieval systems as well as more clearly defined limiting and sorting options. In addition, an open task was used to compare the music retrieval system to a general library catalogue.
Keywords: Music, music retrieval, music catalogue, user study, user tasks, user system evaluation, music system features, user perspective
Athena Salaba (corresponding author), Associate Professor, School of Library and Information Science, Kent State University
Yin Zhang, Professor, School of Library and Information Science, Kent State University
1 Introduction
Information agencies, especially libraries, face a constant challenge in meeting the expectations of users from diverse environments. Over decades, libraries have incorporated varied technologies and formats into their collections and catalogues, and have developed various standards and encoding schemes to better organise collections and improve accessibility. However, the effectiveness of a given system, and its clarity to users, is not always uniform across the formats and genres it holds. Indeed, the different uses and expectations users bring to various genres and subject areas warrant closer examination. One area that needs more attention is access to music materials. Using a survey methodology to collect rich descriptions of user behaviour, followed by qualitative analysis, this study focuses on users of music resources and extends prior research by examining users’ searches and search techniques in terms of content and success, in order to further identify users’ perspectives and preferences regarding current systems and to present recommendations for more effective future music retrieval systems.
Salaba, Zhang
2 Literature review
Because literature on both online catalogues and digital libraries is relevant to the current study, both areas are covered in this review. Although many studies have addressed users’ information seeking behaviour regarding music resources, relatively few user studies have been conducted to ascertain users’ own perceptions and preferences regarding music catalogues. The extant literature can be divided into 1) studies of user needs and information seeking behaviour, and 2) studies of user perceptions and expectations.
2.1 Studies of user needs, information seeking behaviour, and music resources
A number of studies have examined the nature of user needs regarding music resources. Gould (1988) determined that access to detailed editorial, publication, and instrumentation / scoring information for music scores was of extreme importance to musicians and scholars. Similarly, Matsumori (1981) examined the information needs and general seeking habits of 31 music faculty members at the University of Southern California, concluding that the ability to distinguish between various editions was paramount for these users. Studies addressing users’ information seeking behaviour and music resources are relatively common in the literature. In the context of online public access catalogues, Itoh (2000) analysed 21,177 search logs from a Japanese academic music library and found that the majority of initial searches were combinations of access points and that users predominantly sought known items. User information-seeking studies have also been conducted in reference to digital libraries. Riley and Dalmau (2007) studied user access to digitised sheet music in the development of the IN Harmony project. Among their findings, name / title keyword searches were the most popular, again suggesting known-item searching. In subject searches, the most common terms concerned genre / form / style, topic, and instrumentation. The study also showed a significant gap between users’ natural subject terminology and controlled subject vocabulary.
Chapter 6. Searching for music
In the Music Information Retrieval (MIR) community, user studies have analysed music-related information requests in online newsgroups (Bainbridge, Cunningham, & Downie, 2003; Downie & Cunningham, 2002). These studies found that bibliographic information and lyrics were often included in requests, and that the information requested was overwhelmingly for performer, title, or date. User studies of specific MIR systems were conducted with Kazaa by Taheri-Panah and MacFarlane (2004), with the Meldex Digital Music Library (McPherson & Bainbridge, 2001), and with four different web-accessible digital libraries by Blandford and Stelmaszewska (2002). Usability studies in music catalogue / digital library development have notably been conducted for the Variations and Variations2 Digital Library Projects (Dunn, Byrd, Notess, Riley, & Scherle, 2006; Dunn, Davidson, Holloway, & Bernbom, 2004; Dunn & Notess, 2002; Hemmasi, 2002; Minibayeva & Dunn, 2002; Notess & Dunn, 2004; Notess & Swan, 2003), and these informed the design of Variations3 to include better content-based searching and browsing for music. Since its development, Variations3 has been the subject of several publications (Dunn et al., 2006; Riley & Dalmau, 2005; Riley, Hunter, Colvard, & Berry, 2007). Cunningham (2002) conducted user studies in music stores, confirming that users conduct known-item searches, and advised organising MIR systems by genre to reflect the music store environment. Similarly, Vignoli (2004) conducted four user studies using popular music in MIR systems and concluded with the following recommendations: systems should present their collections with the structure Artist-Album-Song; offer genre organization as an option; allow the selection of music according to period of time and mood; offer the opportunity to select similar items and artists; and provide access to song lyrics and artist discographies.
2.2 Studies of user perspectives and preferences regarding music resources
Although highly relevant to the current study, previous literature addressing users’ own perspectives on using music catalogues is far less common. Hume (1995) conducted individual and focus group interviews with music faculty and students at a private academic institution, finding that students reported using word searches most frequently when using an OPAC, since much of the information they desired was contained only within content notes. In contrast, faculty tended to use subject searches or to search for known items using author, title, and word searches. Students also found current music subject headings too broad for effective use, and indicated a desire to be able to limit searches by material type from the beginning of a search, directly from the main menu. The other major user perspective study was conducted by Gardinier (2004), who conducted in-depth interviews with 21 musicians (music scholars, educators, conductors, vocalists, and instrumentalists) who rated the value of various possible access points for music resources. Musicians identified the composer, title, performer, genre, opus or thematic number, and instrumentation as the most useful access points, and Gardinier categorised the desired access points by format (e.g., scores, sound recordings). However, neither of these studies examined users’ perspectives on, and use of, a particular music system, let alone those of a more general, non-expert audience.
3 Aims of the study
In light of current literature on user perspectives of music retrieval systems, this study posed the following research questions:
1. What search tasks do users perform when searching for music resources using current online catalogues and genre-specific (music) systems?
2. How do the current catalogues and genre-specific (music) systems support user tasks?
3. What functions and system features do users find helpful when searching for music resources using current online catalogues and genre-specific (music) systems?
4. What are the difficulties users encounter when searching for music resources using current online catalogues and genre-specific (music) systems?
5. How could current systems be improved to facilitate searching for music resources?
4 Methodology
A brief description follows of the systems used in the study, the participants’ profiles, and the study tasks and procedures.
4.1 Systems
In order to identify not only user needs but also user behaviours and strategies when searching for music resources in current online catalogues, this study used a comparative approach between two different online systems that provide access to similar resources. One is a genre- or music-specific retrieval system, MusicAustralia (http://www.musicaustralia.org); the other is a library catalogue providing access to all kinds of resources, including music, Libraries Australia (http://librariesaustralia.nla.gov.au). These two systems were chosen because, as a general library catalogue and a music retrieval system, they provide access to similar music resources, which is essential when comparing the same tasks across two different systems. MusicAustralia is an online service developed by the National Library of Australia, the National Film and Sound Archive (a division of the Australian Film Commission), and other cultural institutions across Australia; it was built using MARC, XML, and other tools to form a discovery database of musical resources concerning Australian music and musicians. Overviews of the goals and structure of MusicAustralia are available (Holmes, 2002; Holmes & McIntyre, 2008). The overall functionality of the system is briefly addressed by Assunção (2005), Ayres (2003, 2004b), and Le Bœuf (2003). MusicAustralia is constructed of “repurposed” records from four different Australian institutions merged through Dublin Core elements (Ayres, 2003; Le Bœuf, 2003), so relationships between notated and performed expressions were not taken into account (Ayres, 2003). Although the FRBR conceptual model is included in some descriptions of the goals of the MusicAustralia system (Ayres, 2004a), in practice the FRBR Group 1 entities are largely absent.
The general resource catalogue, Libraries Australia, a National Library of Australia resource sharing service, is based on the Australian National Bibliographic Database (ANBD), which details location information for more than forty-five million items held in the majority of Australian libraries across all spheres.
4.2 Study participant demographics
The 47 participants examining these systems were drawn from an entry-level core library science course in a graduate-level library and information science programme. Table 1 summarises the profiles of the participants.
Stage in the MLIS programme (36 required credits) | Beginning (0–6 credits) | Middle (7–30 credits) | Final (31–36 credits)
Share of participants | 95.7 % | 4.3 % | 0.0 %

Self-rated skill | Novice | Below average | Average | Above average | Expert
Computer skills | 0.0 % | 4.3 % | 36.2 % | 44.7 % | 14.9 %
Internet searching skills | 0.0 % | 2.1 % | 36.2 % | 44.7 % | 17.0 %
Library OPAC searching skills | 4.3 % | 19.1 % | 46.8 % | 27.7 % | 2.1 %

Table 1. Participants’ profiles.
The vast majority of participants (95.7 %) were at the beginning stage of their programme. Before their searches, participants were asked to rate their own computer skills, Internet searching skills, and library OPAC searching skills on a five-point scale ranging from 1 (novice) to 5 (expert). In terms of computer skills, over 80 % of participants (80.9 %) rated themselves as Average or Above average, and their Internet searching ratings were similar. However, participants were noticeably less confident in their library catalogue searching: 23.4 % rated themselves Novice or Below average, compared with 4.3 % for computer skills and 2.1 % for Internet searching.
4.3 Tasks and procedures
Participants performed a series of two open and three prescribed tasks. For each task, they were asked, through a survey, to document the steps of their searches, explain the relevance of items chosen, note thoughts and impressions of the systems, express any likes or dislikes, and offer suggestions for system improvement. The first task was performed in Libraries Australia, where participants were asked to perform an open search for any music resource of their own interest using any method they preferred. Participants then performed the three prescribed tasks in MusicAustralia. These tasks were designed to represent widely known Australian music and to reflect searches by title, author / creator, and subject respectively. Each also contained subtasks aligned with the Functional Requirements for Bibliographic Records (FRBR) user tasks find, identify, select, and obtain, applied to works, expressions, manifestations, and items (FRBR, 1998). The tasks can be summarised as:
– Find the work “Waltzing Matilda”. Identify a sound recording. Identify a score. Can you find how to obtain it?
– Find works by Kylie Minogue. Can you find sheet music containing some of her works? Can you find a sound recording of one of her songs?
– Find national songs of Australia. Identify and select one work. Find a sound recording of this work. Find similar items.
For the final, open task in MusicAustralia, participants were asked to replicate the open search of the personal interest task they had performed earlier in Libraries Australia. At the end of the search session, participants were asked to reflect on their overall experience with both systems and to compare and evaluate their respective features.
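To make the FRBR vocabulary behind these subtasks concrete, the following is a minimal Python sketch that models the four Group 1 entities as a simple containment hierarchy (a work realized through expressions, embodied in manifestations, exemplified by items) and walks the first prescribed task through it. It is illustrative only: the class and function names, the form labels “performed music” and “notated music”, and the holding libraries are hypothetical, and FRBR’s many-to-many links are simplified to a strict tree. It does not describe MusicAustralia’s actual records, which, as noted in Section 4.1, largely lack these entities.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical, simplified FRBR Group 1 model. Real FRBR allows many-to-many
# links (e.g. one manifestation embodying several expressions); this sketch
# uses a strict tree: Work -> Expression -> Manifestation -> Item.


@dataclass
class Item:
    holding_library: str  # hypothetical: institution holding this copy


@dataclass
class Manifestation:
    title: str
    carrier: str  # illustrative labels, e.g. "audio CD", "printed score"
    items: list[Item] = field(default_factory=list)


@dataclass
class Expression:
    form: str  # illustrative labels: "performed music", "notated music"
    manifestations: list[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    title: str
    expressions: list[Expression] = field(default_factory=list)


def find_work(catalogue: list[Work], title: str) -> Optional[Work]:
    """'Find': locate a work by exact title match, or return None."""
    return next((w for w in catalogue if w.title == title), None)


def identify(work: Work, form: str) -> list[Manifestation]:
    """'Identify'/'select': narrow a work to manifestations of one expression form."""
    return [m for e in work.expressions if e.form == form for m in e.manifestations]


def obtain(manifestation: Manifestation) -> list[str]:
    """'Obtain': list the libraries holding a copy (item) of a manifestation."""
    return [item.holding_library for item in manifestation.items]


# Hypothetical sample data for the first prescribed task.
catalogue = [
    Work("Waltzing Matilda", [
        Expression("performed music", [
            Manifestation("Waltzing Matilda (sound recording)", "audio CD",
                          [Item("Library A")]),
        ]),
        Expression("notated music", [
            Manifestation("Waltzing Matilda (score)", "printed score",
                          [Item("Library B"), Item("Library C")]),
        ]),
    ]),
]

work = find_work(catalogue, "Waltzing Matilda")
assert work is not None
recordings = identify(work, "performed music")
scores = identify(work, "notated music")
print([m.carrier for m in recordings])  # ['audio CD']
print(obtain(scores[0]))                # ['Library B', 'Library C']
```

In this simplified reading, “find” locates the work, “identify” and “select” narrow it to an expression form and a manifestation (a sound recording or a score), and “obtain” resolves to the items that a user could actually borrow or access.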
5 Results
5.1 What search tasks do users perform when searching for music materials using current online catalogues and genre-specific systems?
The 47 self-assigned tasks are summarised below in four categories based on format, and by the various attributes of the material(s) desired. Since one user sought both books and recordings by a particular author / performer, that user is included in both categories. Specificity varied greatly between participants. Also, because music materials often comprise multiple versions, performances, and / or arrangements of a particular title / piece, and often come in various print and audio formats, users often searched for combinations of attributes such as scoring, composer, title, performer, and format. These attributes are listed by frequency within each category below. Most users (32 of 47) sought sound recordings. Additionally, in three of the four categories, over half of the users included musical genre as part of their search terms, although performers also played a significant role in searches for sound recordings and general information.
Task category | Task details (number of participants)

Sound recording (32 of 47 participants; 68.1 %)
– Find music of a particular genre (24)
– Find music by a particular performer (12)
– Find music in a particular audio format (11)
– Find music from a particular time period (9)
– Find a particular title (8)
– Find music for a particular audience (3)
– Find music from a particular geographical area (3)
– Find music by a particular composer (2)
– Find music for a particular instrument (1)
– Find songs in a particular language (1)
– Find songs about a particular subject (1)

Find information on (11 of 47 participants; 23.4 %)
– Find information on a certain genre (7)
– Find information on a non-genre-related topic (3)
– Find biographical information on a particular performer (3)
– Find information in a certain language (2)
– Find information written by a certain author (2)

Sheet music / score (3 of 47 participants; 6.4 %)
– Find a particular genre (2)
– Find a particular vocal or instrumental setting / scoring (2)
– Find a particular title (1)
– Find an arrangement in a particular vocal range (1)
– Find a particular song and / or arrangements of a song in a “singable” setting (1)

Other (1 of 47 participants; 2.1 %)
– Lyrics to a particular song (1)

Table 2. Participants’ self-selected open tasks.
5.2 How do current catalogues and genre-specific systems support user tasks? User search success using MusicAustralia and Libraries Australia for the various tasks is summarised in Tables 3 – 7. Overall, both systems were relatively successful at supporting users’ tasks related to finding music materials. Many user errors occurred in the selection of an inappropriate item for various reasons, some of which can be attributed to the system and its interface. Some users were also
Chapter 6. Searching for music
unable to find items in the catalogue that would have fulfilled their stated needs. The researchers recreated all participants’ failed or incomplete tasks in order to pinpoint a likely cause.
Open task
Users were asked to search for music resources of their choice in Libraries Australia. In nearly all cases, these search criteria were the same as those used later when they repeated the task in MusicAustralia. Among the more common search methods, 22 participants searched for a specific musical genre, 14 for a specific performer, seven for a specific title, and five for a subject. Since Libraries Australia presented only a single search box, three users also tried to use it to search for a date range, and two for a language. Two users tried to search in natural language. Table 3 summarises the results of this task using Libraries Australia.
Open task: Conduct a search to find music resources of a personal interest
– Search success (yes): chose relevant item 38 (80.9 %); chose alternative item 6 (12.8 %)
– Search success (no): relevant item not in catalogue 0 (0.0 %); search failure 2 (4.3 %)
– Misc.: 1 (2.1 %)
Table 3. Summary of search success using Libraries Australia – Open task.
Participants reported a high search success rate for this task. Thirty-eight users found exactly what they were looking for, and six others selected alternative items. The two clear failures both reported success but had in fact chosen items that did not reflect their own stated search criteria. Although it is remotely possible that these users’ stated criteria did not match their actual criteria, the researchers considered it more likely that the users misunderstood the items. Finally, one user is listed separately as “Misc.” because they reported an unsuccessful search yet contradicted themselves by selecting an item. In MusicAustralia, for the same open task, the majority of users also selected relevant or alternative items according to their personal criteria.
As the scopes of the two catalogues differed, some of the items sought were not held in MusicAustralia, which was the case for nearly 30 % of participants.
Open task: Conduct a search to find music resources of a personal interest
– Search success (yes): chose relevant item 21 (44.7 %); chose alternative item 7 (14.9 %)
– Search success (no): relevant item not in catalogue 14 (29.8 %); unable to locate item actually held in catalogue 5 (10.6 %)
Table 4. Summary of search success using MusicAustralia – Open task.
As illustrated in Table 4, 21 participants chose an item relevant to their search and seven chose an alternative item. Overall, 28 participants (almost 60 %) reported success with their self-selected task in MusicAustralia and 19 participants (almost 40 %) reported failure. When the researchers replicated these searches, they found that for five of the 19 failed searches the catalogue actually held relevant items. These five users, who were unable to locate relevant items held within MusicAustralia, are of chief concern. Many users did not use the advanced search to narrow their results; they simply said that the item they wanted was not in the long list they had retrieved, which they did not want to browse. Some performers could be found only via keyword search, so users who searched only the advanced author / performer field were unable to identify the item. Also, in many cases users were unable to identify relevant subject headings that would return satisfactory results.
Prescribed tasks
In MusicAustralia, the first prescribed, title-based task asked users to locate various sound recordings and scores for “Waltzing Matilda”. As summarised in Table 5, all participants retrieved a list of results of some type. However, over one fourth of participants selected a sound recording of “And the Band Played Waltzing Matilda”, a completely different song. This errant title is actually the second sound recording on the results list when users run a simple search for “Waltzing Matilda”, since MusicAustralia by default sorts results first by online status and then by title. “Waltzing Matilda” itself as a single title is thus relegated to the end of the list. This default sorting, combined with users’ unfamiliarity with the song, likely caused the high error rate. In the remaining subtasks, only three users were unsuccessful. Two of these inexplicably chose a sound recording when asked to find a score, which appears to be a simple user error, although it is possible they did not understand the format icons used in MusicAustralia. The third unsuccessful user had located a sound recording in an earlier subtask, but cut and pasted the title of the individual track (there spelled “Waltzing Mathilda”) to run a new search for a score. This search returned no relevant results for any of the following subtasks, and the system offered no suggestions for alternative spellings or query revision.
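The effect of the default ordering described above can be reproduced in a small sketch. It assumes a toy record type holding only a title and an online flag; the `Record` class, the sample titles, and the `exact_title_first` alternative are hypothetical illustrations, not MusicAustralia’s actual data model or ranking. Because online status is the first sort key, an offline copy of the exact title falls behind every online item, whatever its title.

```python
from dataclasses import dataclass


@dataclass
class Record:
    title: str
    online: bool  # True if a digital copy can be accessed online


def default_sort(records):
    """Online items first, then alphabetical by title (case-insensitive)."""
    return sorted(records, key=lambda r: (not r.online, r.title.casefold()))


def exact_title_first(records, query):
    """Hypothetical alternative: exact title matches first, then the default order."""
    q = query.casefold()
    return sorted(
        records,
        key=lambda r: (r.title.casefold() != q, not r.online, r.title.casefold()),
    )


# Hypothetical result set for a simple search on "Waltzing Matilda"
results = [
    Record("Waltzing Matilda", online=False),
    Record("And the Band Played Waltzing Matilda", online=True),
    Record("Waltzing Matilda and other bush songs", online=True),
]

print([r.title for r in default_sort(results)])
print([r.title for r in exact_title_first(results, "Waltzing Matilda")])
```

Placing an exact-title test ahead of the online flag is only one possible remedy; a fuller relevance ranking would weigh more evidence than title equality.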
Title search task
– Find the work “Waltzing Matilda”: yes 47 (100.0 %)
– Identify and select a sound recording of this work: yes 34 (72.3 %); no: selected irrelevant item 13 (27.7 %), selected wrong format 0 (0.0 %), unable to locate item held in catalogue 0 (0.0 %)
– Identify and select a score of this work: yes 44 (93.6 %); no: selected wrong format 2 (4.3 %), unable to locate item held in catalogue 1 (2.1 %)
– Find out how to obtain this score: yes 44 (93.6 %); no: selected wrong format 2 (4.3 %), unable to locate item held in catalogue 1 (2.1 %)
Table 5. Summary of search success using MusicAustralia – Title search.
In the second prescribed, author / performer task, participants were asked to identify and select items written and / or performed by the Australian singer Kylie Minogue. The majority of users successfully completed all three subtasks. However, the second subtask asked participants to find a score containing at least one song written by Kylie Minogue. As shown in Table 6, eight of the 46 participants counted in this subtask selected a score by the wrong author.
This confusion may have arisen because a search for scores with “Kylie Minogue” in the “By” field returned a songbook for the motion picture The Delinquents as the top result. All eight unsuccessful users chose this score, but on examining its bibliographic record, the researchers found that none of the songs in the volume was written by Kylie Minogue; according to the record, her name appeared on the score’s cover.
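The Delinquents case comes down to the difference between a name search that matches any name in a record and one that also checks the contributor’s role. The sketch below is a hypothetical illustration: the `Score` class, its fields, the placeholder “Composer A”, the second sample record, and the role vocabulary are all assumptions, standing in for the relator terms that catalogues can attach to personal name access points.

```python
from dataclasses import dataclass, field


@dataclass
class Score:
    """Hypothetical score record with role-tagged contributors."""
    title: str
    # (name, role) pairs, with relator-style roles such as "composer" or "lyricist"
    contributors: list[tuple[str, str]] = field(default_factory=list)
    # Names transcribed from the item (for example, the cover) with no stated role
    other_names: list[str] = field(default_factory=list)


def by_name(scores, name):
    """Loose 'By' search: a name anywhere in the record counts as a match."""
    return [s for s in scores
            if name in s.other_names or any(n == name for n, _ in s.contributors)]


def by_name_and_role(scores, name, roles):
    """Role-aware search: only contributors whose role is in `roles` count."""
    return [s for s in scores
            if any(n == name and r in roles for n, r in s.contributors)]


scores = [
    Score("The Delinquents (songbook)",
          contributors=[("Composer A", "composer")],  # hypothetical placeholder name
          other_names=["Kylie Minogue"]),             # named on the cover only
    Score("Example songbook (hypothetical)",
          contributors=[("Kylie Minogue", "lyricist")]),
]

print([s.title for s in by_name(scores, "Kylie Minogue")])
print([s.title for s in by_name_and_role(scores, "Kylie Minogue", {"composer", "lyricist"})])
```

With the loose search, both records match and the songbook can rank first. The role-aware version keeps only records in which the name carries an authorship role.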
Author / performer search task
– Find works written or performed by Kylie Minogue: yes 47 (100.0 %)
– Identify and select sheet music of which she is the author*: yes 37 (80.4 %); no: selected wrong author 8 (17.4 %), selected wrong format 1 (2.2 %)
– Identify and select the song “Did It Again”: yes 46 (97.9 %); no: selected irrelevant item 1 (2.1 %)
*As one participant did not specify the item he / she selected, that participant was not counted for this subtask.
Table 6. Summary of search success using MusicAustralia – Author / Performer search.
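The percentages in Tables 3 – 7 are computed against the number of participants counted for each subtask, so the base changes from 47 to 46 wherever a participant was excluded. A minimal helper, assuming the one-decimal rounding used in the tables (`pct` is a hypothetical name), reproduces the reported figures:

```python
def pct(count: int, base: int) -> str:
    """Share of `base` participants, rounded to one decimal place as in the tables."""
    return f"{100 * count / base:.1f} %"


# Sheet-music subtask: one participant excluded, so the base is 46
print(pct(37, 46), pct(8, 46))
# Song subtask: all 47 participants counted
print(pct(46, 47), pct(1, 47))
```

The same count of eight would read as 17.0 % against a base of 47, which is why the footnote on the excluded participant matters when comparing rows.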
The third prescribed, subject-based task asked users to find items of a specified musical genre / subject: national songs of Australia. As Table 7 shows, this task as a whole was more successful than the others, although the catalogue did not always hold a corresponding sound recording for a selected score. Very few users failed to locate relevant items that the catalogue actually held. In one case, a relevant recording was listed in the contents notes of an item, accessible only through a keyword search. User errors accounted for a few additional unsuccessful results.
Subject / Genre Search Task Subtask
Search Success Yes
No Relevant item was not in catalogue
Find national songs of Australia
47 (100.0 %)
Identify a relevant item of interest
46 (97.9 %)
Identify and select a sound recording of this item*
35 (76.1 %)
9 (19.6 %)
Figure out how to obtain this recording
38 (80.9 %)
9 (19.1 %)
Identify similar items
45 (95.7 %)
1 (2.1 %)
Selected irrelevant item
Unable to locate item actually held in catalogue
1 (2.1 %) 2 (4.3 %)
1 (2.1 %)
*One user did not specify which item he / she selected and was not counted for this subtask.
Table 7. Summary of search success using MusicAustralia – Subject / genre search.
5.3 What functions and system features do users find helpful when searching music resources using current online catalogues and genre-specific systems?
The system functions and features that users found helpful when searching for music resources are summarised below for the individual systems.
MusicAustralia
Users discussed liking some of MusicAustralia’s system features, search options, and the information given in bibliographic records, based on their experience with the system. The features identified as the most helpful and most likely to be used are listed in Table 8 in descending order of frequency. Only features cited by two or more participants are included.
Feature (frequency)
– Breakdown of results by format (21)
– Advanced search (18)
– Format categories to narrow search (17)
– Full and sample audio clips available online (17)
– Themes, a browsing / exploratory feature in MusicAustralia (12)
– View scores and pictures feature (8)
– Purchasing item links (6)
– Borrow this item button and associated features (5)
– Copies direct link (4)
– View picture feature (4)
– Date limiters (3)
– Cover art (2)
– “Held in” location limiter (2)
– Icons / graphics (2)
Table 8. Helpful features in MusicAustralia.
Users in this group strongly valued a breakdown of results by format. Advanced search was also rated highly, although this may partly reflect the fact that Libraries Australia had no advanced search option. Users also liked the full and sample audio clips available via links, as well as the “Themes” feature.
Libraries Australia
Users also liked certain features of Libraries Australia. The features identified as the most helpful and most likely to be used are listed below (Table 9) in descending order of frequency. Only features cited by two or more participants are included.
Feature (frequency)
– Limiters on search page (19)
– Search history (16)
– Email option for search results (7)
– “Get this item” feature (7)
– Search summary (5)
– Sorting options (4)
– Clean front page (3)
– Detail in individual records (3)
– Links in individual records (3)
– Save search history (3)
– Search terms highlighted in records (2)
– Scope (2)
Table 9. Helpful features in Libraries Australia.
According to the list, the most important elements are the presence of limiters on the initial search page and the search history feature.
5.4 What are the difficulties users encounter when searching for music materials using current online catalogues and genre-specific systems?
Participants commented on a number of aspects of MusicAustralia. A categorised list of the features they found least helpful, with their frequencies, is given in Table 10. In addition, six participants responded that nothing in MusicAustralia’s search interface was completely useless.
Category and specific feature (frequency)
– Limiters (14), for:
  – Kits (3)
  – Objects (3)
  – Moving images (2)
  – People & organizations (2)
  – Websites (2)
  – Online items (1)
  – Scores (1)
– Advanced search (14), by:
  – “Held in” (5)
  – Date field (3)
  – Title search (2)
  – Name / title search (1)
  – Number search (1)
  – “Phrase / All of these words / Any of these words” (1)
  – About / genre (1)
– Links to possible suppliers (5)
– General search tool (4)
– Sorting options (3)
– Themes feature (3)
– Descriptions of limiter terms (2)
Table 10. MusicAustralia’s least helpful features.
Opinions varied among the participants, although most cited one or more details of various features they found useless. Since all participants completed their searches in the United States, some did not value the locations of Australian holdings or the links to possible suppliers. A few also rejected the simple search feature, the sorting options on the results page, and the “Themes” feature, since it was neither complete nor sortable. Nonetheless, a small number of users did state that nothing in MusicAustralia was completely useless. The advanced search feature drew criticism from some users. Complaints included that too many boxes complicated finding results, that there was inadequate guidance on using the function, and that its results differed either too much or too little from those of the basic search. One user singled out the advanced title search, feeling that the results did not seem to match the entered titles.
A number of users described Libraries Australia as very straightforward and easy to use, and noted that it had no tools they would be unlikely to use. However, two of these remarked that Libraries Australia had too few features in the first place for any of them to be useless. Three others expressed a general dislike of the system overall, attributing their ability to get results to luck. A condensed list of users’ negative responses to various aspects of Libraries Australia follows:
– Limiters (11)
  – Limiters are unclear / not helpful (4)
  – Braille limiting (3)
  – Oral histories limiting (1)
  – Printed music limiting (1)
  – Theses limiting (1)
  – Recent Australian Government Publications limiting (1)
– General search, lack of search options (8)
– Sorting options (5)
– General dislike of interface (3)
– Search history tool (3)
– Subject terms difficult to find (2)
– Inclusion of links to Google and Yahoo! (2)
– “Buy it” option (2)
Regarding the search page, eight users specifically expressed their dislike of the single search box, some noting their frustration with searching in natural language, the lack of overall options, and the lack of specificity. Eight other users disliked various aspects of the limiters; four of these found the limiters unclear. One user complained that music recordings were not divided into separate formats such as CD and cassette. Others criticised the search history tool, in part because the history was erased when the browser was closed. Further complaints concerned subject terms that were accessible only from individual item records, the inclusion of links to Google and Yahoo!, and purchasing links.
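The complaint that the search history vanished when the browser was closed is a question of where the history is stored. A minimal sketch, assuming a single user and a local JSON file (the `SearchHistory` class and the file name are hypothetical; a web catalogue would more likely keep history server-side or in persistent browser storage), shows a history that survives a restart:

```python
import json
import tempfile
from pathlib import Path


class SearchHistory:
    """Search history kept in a JSON file so it outlives a single session (hypothetical)."""

    def __init__(self, path):
        self.path = Path(path)
        # Reload earlier queries if a history file already exists.
        if self.path.exists():
            self.queries = json.loads(self.path.read_text(encoding="utf-8"))
        else:
            self.queries = []

    def add(self, query: str) -> None:
        """Record a query and write the whole history back to disk."""
        self.queries.append(query)
        self.path.write_text(json.dumps(self.queries), encoding="utf-8")

    def recent(self, n: int = 10) -> list:
        """Return up to n queries, most recent first."""
        return self.queries[-n:][::-1]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        history_file = Path(tmp) / "history.json"
        first_session = SearchHistory(history_file)
        first_session.add("waltzing matilda")
        first_session.add("national songs of Australia")
        # A fresh object stands in for reopening the browser.
        second_session = SearchHistory(history_file)
        print(second_session.recent())
```

Writing the full list on every query keeps the sketch simple; a larger history would call for appending or a small database instead.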
5.5 How could current systems be improved to facilitate searching for music materials?
Users indicated features they wanted that were not available in MusicAustralia. Many of these features apply to any general retrieval system for information materials, but some are specific to music. In order of frequency, these desired improvements were:
– Spell check (6)
– Extra limiters for specific audio and video formats (CD, VHS, DVD, etc.) (5)
– More sorting options (5)
– Scope of collection (3)
– Ability to sort by relevance (3)
– “See also” links (3)
– Organise result pages by letter instead of by number (3)
– Ability to browse within genres (2)
– Ability to search within the “Themes” feature (2)
– References to related and broader terms (2)
– Search history (2)
– Subject search instead of about / genre (2)
A spell check was the most frequently requested feature, followed by the ability to limit results to specific audio and video formats, such as CDs and DVDs, and more ways to sort results. A small number of users also requested the ability to sort by relevance, to browse within genres, and to search within the “Themes” feature, as well as a different organisation of pages in search results. The desired features for Libraries Australia are listed below in order of frequency:
– Advanced search (23)
– Ability to specify language (4)
– Spell check (4)
– Ability to sort by date (3)
– Presentation of broader and narrower terms (2)
– Ability to sort like MusicAustralia does (2)
– List contents of items in individual records (2)
Most prominent is the request for an advanced search feature, mentioned by 23 users, followed distantly by the ability to specify the language of materials and spell check.
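A “did you mean” suggestion of the kind users requested, and which the “Waltzing Mathilda” search lacked, can be sketched with Python’s standard `difflib` module. The `suggest` helper, the short vocabulary, and the 0.8 similarity cutoff are assumptions for illustration; a real catalogue would draw candidate terms from its own title and name indexes.

```python
import difflib


def suggest(query, vocabulary, n=3, cutoff=0.8):
    """Return close matches for a query that is not itself in the vocabulary."""
    # Compare case-insensitively but return the vocabulary's original spelling.
    lookup = {term.casefold(): term for term in vocabulary}
    q = query.casefold()
    if q in lookup:
        return []  # exact hit: nothing to suggest
    matches = difflib.get_close_matches(q, list(lookup), n=n, cutoff=cutoff)
    return [lookup[m] for m in matches]


vocabulary = ["Waltzing Matilda", "Kylie Minogue", "Advance Australia Fair"]
print(suggest("Waltzing Mathilda", vocabulary))
print(suggest("Kylie Minoge", vocabulary))
```

`get_close_matches` ranks candidates by a similarity ratio, so a single inserted or dropped letter still scores well above the cutoff, while unrelated titles are filtered out.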
6 Discussion
In consideration of the variety of comments made by users in this study, some recommendations for future systems can be made. Since the participants of this study were neither professional musicians nor music scholars, recommendations for that group of users have not been extensively examined here except where directly relevant.
6.1 Search interface
Having both simple and advanced search features is recommended, at least for academic audiences. A simple search box is familiar to users from online search engines such as Google and is liked for its speed and convenience, as Taheri-Panah and MacFarlane (2004) found in users’ interactions with Kazaa. However, given the wide user criticism of Libraries Australia for its lack of an advanced search, this feature is highly recommended for users who believe it gives them more accurate and specific results. Music is additionally characterised by the importance of distinguishing performer from composer in sound recordings, to a degree that varies from one musical genre to another. Systems should therefore provide an adequate means of distinguishing between these two roles, and cataloguing practices should be standardised to reflect this distinction throughout the catalogue. A more specific use of relator terms in personal name access points may help alleviate this issue. A spell check function is also recommended for names of which users may be unsure. Finally, as suggested by Bainbridge et al. (2003), systems should be able to cope with requests for “fuzzy” dates, or at least point users toward an acceptable means of entering such information. A search history is recommended, enabling users not only to refer back to past searches but also to direct their searches methodically toward a desired end. The current study has also shown clearly presented and accessible limiters to be highly popular, provided that adequate definitions and effective directions for their use are given. Sub-categories of some formats, such as CDs and DVDs, would also be useful.
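One way to cope with “fuzzy” dates is to map common loose phrasings onto a year range before the catalogue is queried. The sketch below is an assumption-laden illustration: `fuzzy_date_range` is a hypothetical helper, and the phrasings it accepts and the five-year margin for “circa” dates are choices made for the example, not a cataloguing standard.

```python
import re


def fuzzy_date_range(text: str, circa_margin: int = 5):
    """Map a loosely written date to an inclusive (start, end) year range, or None."""
    t = text.strip().lower()

    # Decades: "1960s", "1960's", "the 1960s"
    m = re.fullmatch(r"(?:the\s+)?(\d{3})0'?s", t)
    if m:
        start = int(m.group(1)) * 10
        return (start, start + 9)

    # Approximate years: "c. 1975", "ca. 1975", "circa 1975", "around 1975"
    m = re.fullmatch(r"(?:c\.|ca\.|circa|around|about)\s*(\d{4})", t)
    if m:
        year = int(m.group(1))
        return (year - circa_margin, year + circa_margin)

    # Explicit ranges: "1950-1959", "1950 to 1959" (hyphen or en dash)
    m = re.fullmatch(r"(\d{4})\s*(?:-|–|to)\s*(\d{4})", t)
    if m:
        a, b = sorted((int(m.group(1)), int(m.group(2))))
        return (a, b)

    # A single exact year
    if re.fullmatch(r"\d{4}", t):
        year = int(t)
        return (year, year)

    return None  # unrecognised: the interface could prompt for a range instead


for phrase in ["the 1960s", "c. 1975", "1950 to 1959", "1975", "sometime"]:
    print(phrase, fuzzy_date_range(phrase))
```

Returning `None` rather than guessing leaves room for the second option in the recommendation, pointing users toward an acceptable input format.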
6.2 Results display
The current study shows that users highly value format breakdowns in results pages. In addition, links to possibly relevant subject headings alongside the initial results display would enable users to narrow or broaden their searches without having to delve into individual records. Multiple means of sorting should also be included, among them sorting by relevance, the practice most commonly used in mainstream online search engines, and possibly by genre. As in previous MIR studies (Bainbridge et al., 2003), the ability for users to browse by genre is paramount in a music catalogue, especially for popular music. The ability to sort results within a genre would also be very useful. Similarly, as per Bainbridge, Cunningham, and Downie (2003), the lack of agreement on many sub-genres within popular music suggests that systems should provide examples of a particular genre rather than a strict category, allowing a user to search for “more things like this.” Another user aid would be to present possibly relevant subject headings based on a user’s search terms. Several users also noted the usefulness of cover art, not only for distinguishing between similar items but also for gaining an idea of the content of unfamiliar items. This accords with the findings of Cunningham (2002) and Vignoli (2004).
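Format breakdowns and sorting within a genre both amount to grouping a result set before it is displayed. A minimal sketch over hypothetical result dictionaries (the field names `format`, `subformat`, and `genre`, and the placeholder titles, are assumptions) also covers the CD / cassette sub-formats users asked for:

```python
from collections import Counter
from itertools import groupby

# Hypothetical result set; titles are placeholders
results = [
    {"title": "delta", "format": "Sound recording", "subformat": "CD", "genre": "Folk"},
    {"title": "Bravo", "format": "Sound recording", "subformat": "Cassette", "genre": "Folk"},
    {"title": "alpha", "format": "Score", "subformat": "Printed music", "genre": "Folk"},
    {"title": "Charlie", "format": "Sound recording", "subformat": "CD", "genre": "Pop"},
]


def format_breakdown(results):
    """Counts per format and per (format, sub-format) pair, e.g. for a facet sidebar."""
    by_format = Counter(r["format"] for r in results)
    by_subformat = Counter((r["format"], r["subformat"]) for r in results)
    return by_format, by_subformat


def sort_within_genre(results):
    """Group results by genre and sort titles case-insensitively inside each group."""
    ordered = sorted(results, key=lambda r: (r["genre"], r["title"].casefold()))
    return {genre: [r["title"] for r in group]
            for genre, group in groupby(ordered, key=lambda r: r["genre"])}


by_format, by_subformat = format_breakdown(results)
print(by_format)
print(by_subformat)
print(sort_within_genre(results))
```

`groupby` only merges adjacent items, which is why the results are sorted by genre before grouping.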
6.3 Item records and miscellaneous
The broad user approval of the freely available digital content items in the MusicAustralia collection should also be considered. Although not all systems may be able to include digital libraries within their OPACs, users do find such content useful and interesting. In item records, links to individual tracks on an album are useful, provided that the results lists do not become unwieldy. It also remains a matter of debate whether individual records should contain musical and textual incipits, which, in line with other studies (McPherson & Bainbridge, 2001), may be beyond the capability of most amateur users. Finally, as suggested by Vignoli (2004), a catalogue should present similar items and artists to aid users’ navigation toward like items.
There are some possible weaknesses in this study. Since the participants came from a common academic source, the results cannot be confidently generalised beyond a similar population. Similarly, users’ past experience with music was not taken into account, although it may shed additional light on users’ searching techniques, likes, dislikes, and overall success. Future studies might take this into account. However, judging from some users’ responses to the questions and tasks, musical experience varied greatly, which supports viewing these users both as part of an overall academic community and as amateur users.
7 Conclusion
This study analysed users of music resources and their interactions with a system devoted to music. Through a qualitative analysis using techniques seen in past studies, the preceding discussion has illuminated many aspects of this user group, as well as their likes and dislikes regarding a given system. Users’ search tasks, system task support, specific helpful and unhelpful features, and user difficulties in music systems have been examined. Since users of music resources range so broadly, from expert scholars to professional musicians to students to amateur collectors to casual and otherwise unmusical browsers, it is often difficult to distil recommendations that satisfy all parties and to present information in such a way that all find it not only useful but relevant. Certainly, once an organisation has defined its user base and audience goals, such questions of clarity, navigation, and accessibility are more easily addressed. Furthermore, given the slowly unfolding trends in cataloguing and library standards such as RDA, which is influenced by the FRBR conceptual model, one can hope that new cataloguing standards and practices will contribute significantly not only to users’ understanding of information resources but also to their confidence in their effectiveness.
References
Assunção, M. (2005). Catalogação de documentos musicais escritos: Uma abordagem à luz da evolução normativa (Doctoral dissertation, Universidade de Évora). Retrieved from http://dited.bn.pt/30964/index.html
Ayres, M. L. (2003, February). MusicAustralia: Experiments with DC.Relation. Paper presented at the meeting of DC-ANZ (Dublin Core Australia New Zealand), Canberra, Australia. Retrieved from http://www.nla.gov.au/openpublish/index.php/nlasp/article/view/1262/1547
Ayres, M. L. (2004a, February). Case studies in implementing FRBR: AustLit and MusicAustralia. Evolution or revolution? Paper presented at The Impact of FRBR: Melbourne, Australia. Retrieved from http://www.nla.gov.au/lis/stndrds/grps/acoc/ayres2004.doc
Ayres, M. L. (2004b, February). MusicAustralia: Building on national infrastructure. Paper presented at the VALA 2004 Conference, Melbourne, Australia. Retrieved from http://www.nla.gov.au/openpublish/index.php/nlasp/article/view/1247/1532
Ayres, M. L. (2005). Case studies in implementing Functional Requirements for Bibliographic Records [FRBR]: AustLit and MusicAustralia. ALJ: The Australian Library Journal, 54(1), 43 – 54. Retrieved from http://www.nla.gov.au/openpublish/index.php/nlasp/article/view/1225/
Bainbridge, D., Cunningham, S. J., & Downie, J. S. (2003). How people describe their music information needs: A grounded theory analysis of music queries. In H. Hoos & D. Bainbridge (Eds.), Proceedings of the Fourth International Conference on Music Information Retrieval: ISMIR 2003 (pp. 221 – 222).
Blandford, A., & Stelmaszewska, H. (2002). Usability of musical digital libraries: A multimodal analysis. In M. Fingerhut (Ed.), Proceedings of the International Symposium on Musical Digital Libraries (IRCAM) (pp. 231 – 237).
Cunningham, S. J. (2002). User studies: A first step in designing an MIR testbed. MIR / MDL Evaluation Project White Paper Collection, 2, 19 – 21.
Downie, J. S., & Cunningham, S. J. (2002). Toward a theory of music information retrieval queries: System design implications. Proceedings of the International Symposium on Music Information Retrieval.
Dunn, J. W., & Notess, M. (2002, November). Variations2: The Indiana University digital music library project. Paper presented at the Digital Library Federation Fall Forum, Seattle, WA. Retrieved from http://variations2.indiana.edu/html/dunn-notess-dlf2002/
Dunn, J. W., Davidson, M. W., Holloway, J. R., & Bernbom, G. (2004). The Variations and Variations2 digital music library projects at Indiana University. In J. Andrews & D. Law (Eds.), Digital libraries: Policy, planning, and practice (pp. 189 – 211). Aldershot, England: Ashgate Publishing.
Dunn, J. W., Byrd, D., Notess, M., Riley, J., & Scherle, R. (2006). Variations2: Retrieving and using music in an academic setting. Communications of the ACM, 49(8), 53 – 58.
Gardinier, H. A. (2004). Access points perceived as useful in searching for music scores and recordings (Unpublished doctoral dissertation). University of California at Los Angeles.
Gould, C. (1988). Information needs in the humanities: An assessment. Stanford, CA: Research Libraries Group.
Hemmasi, H. (2002, October). Why not MARC? Paper presented at the International Conference on Music Information Retrieval. Retrieved from http://variations2.indiana.edu/pdf/hemmasi-ismir2002.pdf
Holmes, R. (2002, August). MusicAustralia: A digital strategy for music. Paper presented at the International Association of Music Libraries (IAML) Annual Conference, Berkeley, CA. Retrieved from http://www.nla.gov.au/openpublish/index.php/nlasp/article/view/1274/1562
Holmes, R., & McIntyre, K. (2008). Music Australia: From development to production service. Fontes Artis Musicae, 55(1), 128 – 141.
Hume, M. (1995). Searching for media in the online catalogue: A qualitative study of media users. Journal of Academic Media Librarianship, 3(1), 1 – 28.
Inskip, C., Butterworth, R., & MacFarlane, A. (2008). A study of the information needs of the users of a folk music library and the implications for the design of a digital library system. Information Processing and Management, 44, 647 – 662.
IFLA Study Group on the Functional Requirements for Bibliographic Records (FRBR). (1998). Functional requirements for bibliographic records: Final report. München: K.G. Saur.
Itoh, M. (2000, October). Subject search for music: Quantitative analysis of access point selection. Poster presented at the International Symposium of Music Information Retrieval, Plymouth, MA. Retrieved from http://ismir2000.ismir.net/posters/itoh.pdf
Lee, J., & Downie, J. S. (2004). Survey of music information needs, uses and seeking behaviours: Preliminary findings. In Proceedings of the Fifth International Conference on Music Information Retrieval: ISMIR 2004 (pp. 441 – 446).
Le Bœuf, P. (2003, April). FRBR et bibliothèques musicales. Paper presented at le Compte rendu des Journées professionnelles [de l’Association internationale des bibliothèques, archives et centres de documentation musicaux, Groupe français], Paris, France.
Matsumori, D. M. (1981). An analysis of the information transfer process among music school faculty with implications for library systems and services design (Unpublished doctoral dissertation). University of Southern California, Los Angeles.
McPherson, J. R., & Bainbridge, D. (2001). Usage of the MELDEX digital music library. Proceedings of the Second Annual International Symposium on Music Information Retrieval (pp. 19 – 20).
Minibayeva, N., & Dunn, J. W. (2002). A digital library data model for music. In Proceedings of the Second ACM / IEEE-CS Joint Conference on Digital Libraries (pp. 154 – 155). Retrieved from http://www.dml.indiana.edu/pdf/minibayeva-dunn-jcdl2002.pdf
Notess, M., & Dunn, J. W. (2004). Variations2: Improving music findability in a digital library through work-centric metadata: Abstract. In Proceedings of the 4th ACM / IEEE-CS Joint Conference on Digital Libraries (p. 422). Retrieved from http://csdl.computer.org/comp/proceedings/jcdl/2004/2493/00/24930422.pdf
Notess, M., & Swan, M. (2003). Predicting user satisfaction from subject satisfaction. Proceedings of CHI '03: Human Factors in Computing Systems (pp. 738 – 739).
Riley, J. (2005, September). Exploiting musical connections: A proposal for support of work relationships in a digital music library. Paper presented at ISMIR 2005: 6th International Conference on Music Information Retrieval, London. Retrieved from http://www.dlib.indiana.edu/~jenlrile/presentations/ismir2005/riley.pdf
Riley, J., & Dalmau, M. (2007). The IN Harmony project: Developing a flexible metadata model for the description and discovery of sheet music. The Electronic Library, 25(2), 132 – 147.
Riley, J., Hunter, C., Colvard, C., & Berry, A. (2007). Definition of a FRBR-based metadata model for the Indiana University Variations3 project. Retrieved from Indiana University Digital Library website: http://www.dlib.indiana.edu/projects/variations3/docs/v3FRBRreport.pdf
Taheri-Panah, S., & MacFarlane, A. (2004). Music information retrieval systems: Why do individuals use them and what are their needs? Poster presented at the International Symposium on Music Information Retrieval. Retrieved from http://ismir2004.ismir.net/proceedings/p083-page-455-paper110.pdf
Vignoli, F. (2004). Digital music interaction concepts: A user study. Paper presented at the International Symposium on Music Information Retrieval.
Yin Zhang, Athena Salaba
Chapter 7. A user study of moving image retrieval systems and system design implications for library catalogues
Abstract: This study addressed the research gap in evaluating the effectiveness of newly developed systems for moving images from the user perspective. It aimed to better understand user needs when searching for moving images, to evaluate the effectiveness of current systems in supporting user tasks, to identify helpful system functions and features as well as challenges, and to suggest possible system improvements. Forty-seven participants investigated and compared two different systems for searching moving images by completing both prescribed and self-assigned tasks. The results showed that current systems could be improved by providing genre-specific functions, features, and record contents for searching, refining, and collocating moving images.
Keywords: Library catalogue, online catalogue, OPAC (Online Public Access Catalogue), moving image retrieval system, user research, user needs, user tasks, system evaluation, FRBR (Functional Requirements for Bibliographic Records), UCLA-Cinema, IMDb
Yin Zhang (corresponding author), Professor, School of Library and Information Science, Kent State University, [email protected] Athena Salaba, Associate Professor, School of Library and Information Science, Kent State University
Introduction
While library collections have included moving images (e.g., films or videos on VHS, DVD, or digital files) for decades, libraries have been slower to fully integrate these media into library catalogues and provide adequate public access than they have been for print materials. In recent years, systems have been developed to explore effective retrieval of moving images for library users and general Internet users alike. These developments have adopted some new approaches to the retrieval and display of moving images. An example of such new developments is the University of California at Los Angeles (UCLA) Film and Television Archive
Chapter 7. A user study of moving image retrieval systems
161
catalogue (UCLA-Cinema) that was launched in 2007. UCLA-Cinema embodies the Functional Requirements for Bibliographic Records (FRBR) principles with a particular focus on retrieving moving images. Another example is the popular Internet Movie Database (IMDb), which has consistently evolved to provide new functions and features that could potentially be used to enhance retrieval and display of library catalogues. Despite efforts to improve systems for moving images, there has been a lack of research on the effectiveness of these systems from the user perspective, which is essential for the development of systems that support user information seeking. The study reported in this paper was designed to address the gap with the following objectives: – to understand user needs for searching moving images; – to evaluate current systems for searching moving images from the user perspective in terms of how the systems support user tasks – the sample systems include UCLA-Cinema and IMDb, which are genre-specific systems for moving images; – to evaluate current systems for searching moving images from the user perspective in terms of which system functions and features are helpful; – to identify problems and difficulties users encounter when searching for moving images using current systems; and – to identify and suggest areas in which current systems could be improved. The results of this study will offer insight into the user tasks that moving image retrieval systems in general and library catalogues in particular should support, and which functions and features such systems should provide in supporting the tasks. The user evaluations will also identify areas in which current systems such as UCLA-Cinema and IMDb could be improved to better support user tasks related to moving images.
Literature review
Most research in moving image retrieval has been conducted in two different yet complementary areas with distinct approaches (Cawkell, 1992; Turner, 1998). According to Turner (1998), the first area is the “low-level access to images (also called content-based access) using methods from computer science and concentrating on statistical techniques for deriving characteristics of images that help promote retrieval”; the other area is the “high-level access to images using methods from information science, and concentrates on the use of text to create metadata useful for retrieval” (p. 108). For the purpose of this study, the following literature review covers library and information science literature related to moving images and retrieval systems. The related research and practice are summarised in two sections: (a) access to moving images and (b) user studies of catalogues or other systems that support moving image searches.
Access to moving images
Although library collections contain moving images, providing access to them is far more challenging than providing access to print materials. Previous studies have noted that moving images are not well integrated into library catalogues for public access (Brancolini & Provien, 1997; Scholtz, 1995; Weihs & Howarth, 1995). There has been an ongoing effort to promote awareness of these resources and to provide online access and search capabilities for moving pictures (e.g., Atkinson, 2001). In a survey of cataloguing practices and access methods for videos at ARL (Association of Research Libraries) and public libraries in the United States, Ho (2004) reported that while all respondents included video records in their online catalogues, only 52 % provided additional means of access. These additional access methods included (1) printed or Web-based lists of titles in their video collections, (2) lists of newly acquired videos and DVDs, and (3) finding aids for videos on selected topics. Seven percent of the libraries still maintained separate catalogues for videos. Access to moving images has also been explored beyond the item level. Turner, Besser, Goodrum, and Hastings (2002) discussed the need for, and the effort required to provide, access to moving images at the shot and scene level, arguing that individual shots “can be considered as an intellectual unit used to advance the plot of a movie or to provide a unit of information in a television program” (p. 29). The complexity, inconsistency, and limitations of related cataloguing rules and practices have also been reported. In reviewing records in RLIN (the Research Libraries Information Network of The Research Libraries Group) and OCLC (Online Computer Library Center), Horny (1991) observed tremendous variation in the level of descriptive detail.
Ho (2004) noted that current AACR2 rules give cataloguers more flexibility in deciding what to include in video records than in book records, leaving room for inconsistencies. She also observed that current cataloguing standards do not easily support form / genre access to foreign films, which patrons indicated was a desirable way to access movies (Ho, 2005). Similarly, in examining subject authority records, Miller (1997) noted the lack of consensus on whether the Library of Congress Subject Headings (LCSH) are appropriate for providing form and genre access to moving image materials in current cataloguing practice, a problem also recognised and discussed by Yee (2007a).
In recent years, new ways of providing access to moving images in non-traditional online catalogues have also been studied and discussed. Notably, Functional Requirements for Bibliographic Records (FRBR) presents a new conceptual model and a broader view of the bibliographic universe, giving libraries abundant opportunities to develop catalogues that function more effectively and provide better user services when users access bibliographic data (IFLA, 1998; Tillett, 2005). UCLA-Cinema (http://cinema.library.ucla.edu/) is a library catalogue that incorporates the FRBR model in its cataloguing and user interfaces. According to the project launch announcement by project leader Martha Yee (2007b), an authority record in the new online catalogue is treated as a work record, a bibliographic record as an expression record, and a holdings record as a manifestation record carrying format and distribution information.
Another system for moving image retrieval, IMDb (http://www.imdb.com/), has been very popular among Internet users and has been well received by library professionals as a valuable reference resource since its official launch under its current name in 1995 (Collins, 1996; Cowgill, 1997; Naun & Elhard, 2005). In recent years, IMDb has gained further attention as an important reference tool and object of research. Tenopir (2008) reported that pages from IMDb were added to subscription-based online research databases such as H.W. Wilson’s new Cinema Image Gallery (CIG) to provide additional information about the cinematic arts. Naun and Elhard (2005) examined the principles of information organisation and representation behind IMDb, an aspect that had been largely overlooked, and compared them with the cataloguing principles used in current library catalogues, such as AACR2 and MARC standards.
Naun and Elhard observed that despite apparent differences between IMDb and current library catalogues, the two share similar conceptual models. For example, IMDb has work-level records that also function as authority control, variations are linked through the work level by listing alternates, and different makes of a movie are given the same title. Although research on UCLA-Cinema and IMDb is still relatively scarce, this line of research helps researchers and developers explore new possibilities for accessing moving images.
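The record mapping Yee (2007b) describes for UCLA-Cinema (authority record as work, bibliographic record as expression, holdings record as manifestation) can be pictured as a nested data structure. The Python sketch below is a hypothetical illustration, not UCLA-Cinema's actual implementation; the class names, the `render` function, and the placeholder data are invented for the example. It shows why a work-first display collocates versions: every expression and manifestation of a title appears under a single work heading.

```python
from dataclasses import dataclass, field


@dataclass
class Manifestation:
    """Holdings-level record: physical format and distribution details."""
    fmt: str
    distributor: str


@dataclass
class Expression:
    """Bibliographic-level record: one realisation (version) of a work."""
    label: str
    manifestations: list[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """Authority-level record: the work that gathers all of its versions."""
    title: str
    expressions: list[Expression] = field(default_factory=list)


def render(work: Work) -> list[str]:
    """Return an indented work > expression > manifestation display for one hit."""
    lines = [work.title]
    for expr in work.expressions:
        lines.append("  " + expr.label)
        for man in expr.manifestations:
            lines.append(f"    {man.fmt} ({man.distributor})")
    return lines


if __name__ == "__main__":
    # Placeholder data for illustration only; not drawn from the UCLA-Cinema catalogue.
    jane_eyre = Work("Jane Eyre", [
        Expression("Version A", [Manifestation("DVD", "Distributor X"),
                                 Manifestation("35 mm film", "Distributor Y")]),
        Expression("Version B", [Manifestation("VHS", "Distributor Z")]),
    ])
    print("\n".join(render(jane_eyre)))
```

Because every version hangs off one work-level record, a title search only has to match that record to bring all makes, versions, and formats together.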
User studies of catalogues and other systems supporting moving image retrieval
Previous research has emphasised the importance of understanding users’ needs when designing effective online catalogues (Yee, 1991). Understanding users is critical to building an effective bibliographic system, particularly in three areas of future use of bibliographic data: (1) discovery and delivery, (2) inventory management, and (3) cross-compatibility with related data (Fallgren, 2007). A review of visual image retrieval (Enser, 2008) observed that user studies commonly analysed search queries and logs to understand user needs and behaviours when searching for images (e.g., Armitage & Enser, 1997; Chen, 2001; Sandom & Enser, 2002; Yang, Wildemuth, & Marchionini, 2004). In general, user image search behaviour can be categorised into three types: (1) “target search”, aiming to find a specific image; (2) “category search”, with some general criteria for an image; and (3) “search by association”, where users have no specific criteria in mind and rely on browsing to find an image (Smeulders, Worring, Santini, Gupta, & Jain, 2000).
Researchers have noted a lack of user studies of moving image retrieval through library catalogues. For example, Hume (1995) observed that while media users may have special needs when searching for media materials in catalogues, no user studies had been reported in this area. After conducting individual and focus group interviews with faculty and students, Hume identified several areas of confusion for users: library media holdings, media access points and searching features of the OPAC (Online Public Access Catalogue), and subject access to media.
In another study of faculty and students searching for videos in a library catalogue, Ho (2001) observed users looking for videos at the library, focusing on how they searched and which information in the record they found useful. The results showed that most searches were title searches and that users were satisfied with known-item searches. The record display elements users found useful included original author, aboutness of the video, language of the video, length (play time), actors / actresses, date of original release, audience level, time period of the storyline, geographic area of the storyline, director, recording format, series, other persons, and table of contents.
Although there have been active system development efforts in implementing FRBR, user research has been the least-addressed facet of FRBR research and development (Zhang & Salaba, 2009a, 2009b). Very few FRBR implementation projects actually conducted or reported user studies of the FRBR systems they developed, and there have been no evaluative comparisons of existing FRBR prototype systems. To a great extent, current FRBR applications and implementations have reflected designers’ and researchers’ views of user considerations, without direct user input or user validation.
In summary, a review of the related research and practice in access to moving images and user studies shows some exciting developments in moving image retrieval systems, such as the FRBR-based UCLA-Cinema catalogue and the non-traditional catalogue system IMDb. These developments help researchers and developers explore new approaches to accessing moving images. In addition, researchers have examined the principles of information organisation and representation behind new systems such as IMDb and compared them with the cataloguing principles used in current library catalogues. However, user research and evaluation of the effectiveness of these new developments have not kept pace with the development efforts. Such user research and evaluation are needed to identify improvements that would help future developments support user needs and tasks for moving images. The study reported in this chapter was designed and conducted to address this gap.
Methodology
This study adopted a comparative approach, in which participants investigated and compared two different systems for searching moving images by performing both prescribed tasks and self-assigned tasks.
Participants
The participants were 47 graduate students taking an entry-level required course in an MLIS programme. Table 1 summarises the participant profiles. A majority of the participants (72.3 %) were at the beginning stage of their programme, about a quarter (23.4 %) in the middle stage, and a few (4.3 %) in the final stage. Before their searches, the participants were asked to rate their skill level in computer use, Internet searching, and catalogue searching on a five-point scale from 1 (novice) to 5 (expert). In terms of computer skills, 66 % of the participants rated themselves above average or expert; only 6.4 % rated themselves below average. The participants’ Internet searching skills followed a similar pattern. Slightly over half of the participants considered themselves average catalogue searchers, 21 % above average, 17 % below average, 6 % expert, and 4 % novice.

Participants’ profiles
Stage in the MLIS programme (36 required credits to graduate)
  Beginning stage (0–6 credits)      72.3 %
  Middle stage (7–30 credits)        23.4 %
  Final stage (31–36 credits)         4.3 %
Computer skills
  Novice                              0.0 %
  Below average                       6.4 %
  Average                            27.7 %
  Above average                      53.2 %
  Expert                             12.8 %
Internet searching skills
  Novice                              2.1 %
  Below average                       4.3 %
  Average                            25.5 %
  Above average                      44.7 %
  Expert                             23.4 %
Library online catalogue search skills
  Novice                              4.3 %
  Below average                      17.0 %
  Average                            51.1 %
  Above average                      21.3 %
  Expert                              6.4 %
Table 1. Profile of participants.
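With N = 47, each participant accounts for about 2.1 %, so the rounded percentages in Table 1 can be converted back to whole participants as a consistency check. The text's figure of 66 % with at least above-average computer skills, for instance, is the sum of the Above average (53.2 %) and Expert (12.8 %) rows. The Python sketch below performs this back-calculation; the resulting counts are derived from the published percentages rather than reported by the authors, and the names `TABLE_1` and `to_counts` are illustrative.

```python
# Back-calculate approximate participant counts from the Table 1 percentages.
# Assumption: each percentage is a share of all 47 participants, rounded to one decimal.
N = 47

TABLE_1 = {
    "Stage in the MLIS programme": {"Beginning": 72.3, "Middle": 23.4, "Final": 4.3},
    "Computer skills": {"Novice": 0.0, "Below average": 6.4, "Average": 27.7,
                        "Above average": 53.2, "Expert": 12.8},
    "Internet searching skills": {"Novice": 2.1, "Below average": 4.3, "Average": 25.5,
                                  "Above average": 44.7, "Expert": 23.4},
    "Library online catalogue search skills": {"Novice": 4.3, "Below average": 17.0,
                                               "Average": 51.1, "Above average": 21.3,
                                               "Expert": 6.4},
}


def to_counts(percentages: dict[str, float], n: int = N) -> dict[str, int]:
    """Convert rounded percentages back to the nearest whole number of participants."""
    return {level: round(pct * n / 100) for level, pct in percentages.items()}


if __name__ == "__main__":
    for measure, shares in TABLE_1.items():
        counts = to_counts(shares)
        # Each measure should account for every participant exactly once.
        print(measure, counts, "total =", sum(counts.values()))
    computer = to_counts(TABLE_1["Computer skills"])
    at_least_above = computer["Above average"] + computer["Expert"]
    print(f"At least above-average computer skills: {100 * at_least_above / N:.0f} %")
```

Every measure in the table sums to 47 after rounding, which suggests the published percentages are internally consistent.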
Systems
The systems evaluated in the study were IMDb and UCLA-Cinema, which have adopted new approaches to moving image retrieval and display but had not been evaluated from the user perspective. The participants were divided into two groups, and each participant used two systems for comparison. Both groups used UCLA-Cinema (http://cinema.library.ucla.edu). For the second system, Group 1 participants used the local public or academic library catalogue they normally use, while Group 2 participants used IMDb (http://www.imdb.com/). Table 2 summarises the research design and shows the number of participants using each system. The user-chosen OPACs varied widely and included catalogues from a wide range of libraries.

Group      Number of participants   UCLA-Cinema   IMDb   User-chosen catalogue
Group 1    18                       Yes           No     Yes
Group 2    29                       Yes           Yes    No
Total      47                       47            29     18
Table 2. Participants and systems.
Tasks
The participants were asked to perform both prescribed tasks covering basic user tasks (e.g., find, identify, select, collocate, and obtain) and a self-assigned task based on their personal information needs and interests. The tasks involved the types of materials the systems had in common, so that the effectiveness of, and user preferences for, various implementation approaches to the description, retrieval, and display of moving images could be examined. The specific tasks participants performed are listed below by system.
IMDb tasks
– Task 1: Find a movie, documentary, filmed performance, or television show.
– Task 2: Find, identify, and select a season of a television show (e.g., American Idol, Friends, The Simpsons).
– Task 3: Try to find related items (collocate) based on Task 2 above.

UCLA-Cinema tasks
– Task 1: Find, identify, and select a motion picture. Try to figure out if a copy is available for checkout. Try to find related items (collocate).
– Task 2: Find, identify, and select a movie title with several different makes or versions (e.g., Jane Eyre). Try to find related items (collocate).
– Task 3: Find, identify, and select a season of a television show (e.g., American Idol, Friends). Then find, identify, and select an episode of the television show. Try to find related items (collocate).
– Task 4: Conduct a search to find films and television shows on a topic you are interested in (not a specific title). Identify and select a particular item of interest. Try to find related items (collocate).
– User self-assigned tasks: The participants were asked to think of a moving image, such as a film or TV show, that they were interested in finding, and to conduct the search using UCLA-Cinema.

User-chosen OPAC tasks
– The participants were asked to conduct the same self-assigned task using both UCLA-Cinema and a self-chosen OPAC.

The self-assigned tasks served two purposes: to elicit detailed user needs for moving image retrieval, and to evaluate the effectiveness of the new approaches in UCLA-Cinema compared with regular catalogues. Because IMDb was not designed to serve as a catalogue, participants were not asked to perform the self-assigned task in this system.
Procedure
The participants were asked to create search logs recording detailed search strategies (e.g., search options, search queries, query reformulations) with the corresponding search results, and to indicate how they identified and selected the best-matching item using the system’s features. In addition, participants were asked to reflect on their search experience and suggest possible system improvements after each search task. At the end of their search sessions, the participants had the opportunity to reflect on their overall experience with the two systems and to compare and evaluate the systems’ functions and features.
Results
The results of this user study of moving image systems are summarised in the following five areas, which correspond to the objectives laid out in the introduction: (1) user needs for searching moving images; (2) effectiveness of current systems in supporting user tasks; (3) helpful functions and features; (4) problems and difficulties users encounter; and (5) areas for system improvements.
1) User needs for searching moving images
In addition to the prescribed tasks, participants conducted self-assigned searches for moving images based on their own information needs and interests. The 47 self-assigned tasks serve, to some degree, as a small sample of user tasks reflecting users’ information needs when searching for moving images. These tasks are summarised in three categories based on the type of moving image sought: motion pictures, television programmes, and documentaries.
The majority of the search tasks fell into the motion pictures category (32 tasks; 68.1 %). Almost half of these searches (15) aimed to find and view a film with a specific title (11), with a specific title and in a particular format (DVD) (3), or featuring a specific actor / actress (1). The other 17 motion picture tasks involved finding information about a particular film (5), finding trivia about a motion picture (2), and one task each for the following ten kinds of information:
– a director, plot synopsis, and release date;
– a filming location;
– a motion picture description and reviews;
– an actor in a particular motion picture role, in order to collocate that actor’s other motion picture appearances;
– an author and producer;
– box office sales, planned sequels, actors, and feedback from other viewers;
– character significance;
– motion picture quotes;
– reviews and feedback; and
– screenwriters or screenplay development.
Search tasks related to television comprised 13 searches (27.7 %). Six of the television search tasks were to find and view a particular television series with a specific title, and two more were to find and view episodes from a specific season of a television series. The remaining five tasks in this category each occurred only once:
– find an old television series in a specific format (DVD);
– find a television show about a particular subject;
– find information about a particular television series and find VHS movies about the series;
– find information about a television series and the production (filming location) of the old series; and
– find information about a specific ballet performance previously aired on television.
Documentary searches comprised two (4.3 %) of the search tasks. One involved finding and viewing a documentary with a specific title by a specific filmmaker in a specific format (DVD), and the other involved finding a television documentary to view before the release of the next instalment. Notably, some tasks included genre-specific attributes, such as various credits for movies and episodes and the season of a television series. Some tasks were also collocative in nature, such as searching for movies featuring a particular actor / actress or for all television series by a particular creator. Some users were specific about the format of the moving image (e.g., DVD).
2) Effectiveness of current systems in supporting user tasks
The study examined how effectively the systems supported user tasks, measured by user success in completing tasks with each system. The researchers evaluated task success by replicating each task in its respective system and examining the search results. Tasks in which participants searched for materials not included in the system’s collection were excluded when calculating the success ratio. User success in completing tasks in UCLA-Cinema, IMDb, and self-chosen library catalogues is summarised in Tables 3, 4, and 5, respectively.
Overall, UCLA-Cinema was highly successful in supporting user tasks related to motion pictures in general (100 %), tasks involving multiple makes or versions (100 %), and searches by subject (98 %). This high degree of success may be due to the hierarchical FRBR-based display of search results, which groups related results by work, expression, and manifestation and thus made collocating easier for users. In addition, users reported that some of UCLA-Cinema’s functions and features were helpful; those reported most frequently included hyperlinks within records, title and keyword searches, topic or genre / form search, and multiple search options. However, users experienced problems when using this system for tasks related to television shows by season or episode (61 % success). The system did not display such information clearly, and users had to read descriptions in small font to recognise the details. Of the user self-assigned tasks, 83 % were completed successfully with this system; the types of self-assigned tasks are reported in the user needs section above.
IMDb was extremely successful in supporting user tasks related to movies, documentaries, filmed performances, and television shows (100 %). Finding a television show by season was somewhat challenging, because there were no specific fields for such information and users had to read plot summaries, but all participants still completed the task successfully, aided by the system’s user-friendly display of search results. The functions and features users reported most frequently as helpful included the rich hyperlinks throughout the system, search by title, and the rich information found in records. The collocating task to find related materials was performed successfully by 90 % of participants. Users reported some difficulties with collocating in IMDb: they had to scroll through long displays to identify related materials manually, and the system confusingly mixed different types of materials in its display (e.g., movies and video games).
By contrast, the users’ self-chosen local library catalogues were less effective in supporting user tasks: only 72 % of these tasks were completed successfully. This result served as a baseline for evaluating UCLA-Cinema on the same self-assigned tasks, for which UCLA-Cinema was more successful (83 %) than the local library catalogues. Overall, the systems demonstrated varying degrees of effectiveness in supporting user tasks.
Beyond the possible reasons highlighted above, user input on the systems’ helpful functions and features for performing these tasks, and on the problems and difficulties users encountered, is reported in the next two sections and sheds further light on user success with these systems.
Task (N = 47; Group 1 and Group 2 participants)                      Search success*
                                                                       Yes          No
Task 1: Find, identify, and select a motion picture. Try to figure
  out if a copy is available for checkout. Collocate related items.    45 (100 %)   0 (0 %)
Task 2: Find, identify, and select a movie title with several
  different makes or versions. Collocate related items.                47 (100 %)   0 (0 %)
Task 3: Find, identify, and select a season of a television show.
  Then find, identify, and select an episode of a television show.
  Collocate related items.                                              28 (61 %)    18 (39 %)
Task 4: Conduct a search to find films and television shows on a
  topic you are interested in (not a specific title). Identify and
  select a particular item of interest. Collocate similar items.       45 (98 %)    1 (2 %)
User self-assigned tasks                                                25 (83 %)    5 (17 %)
* Totals for individual tasks may vary due to the exclusion of cases in which participants searched for materials not in the collection.
Table 3. Success with UCLA-Cinema.
Task (N = 29; Group 2 participants)                                  Search success*
                                                                       Yes          No
Task 1: Find a movie, documentary, filmed performance, or a
  television show                                                      27 (100 %)   0 (0 %)
Task 2: Find, identify, and select a season of a television show       29 (100 %)   0 (0 %)
Task 3: Collocate related items based on Task 2                        26 (90 %)    3 (10 %)
* Totals for individual tasks may vary due to the exclusion of cases in which participants searched for materials not in the collection.
Table 4. Success with IMDb.
Task (N = 18; Group 1 participants)          Search success
                                               Yes          No
User self-assigned tasks                       13 (72 %)    5 (28 %)
Table 5. Success with user self-chosen library catalogues.
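The percentages in Tables 3 to 5 follow from the rule stated above: tasks targeting material outside a system’s collection were dropped, and each ratio is the number of successful tasks divided by the tasks that remained. For example, 28 of the 46 scorable UCLA-Cinema Task 3 searches succeeded, giving 61 %, and only 45 of the 47 Task 1 searches could be scored. The Python sketch below reproduces the published percentages from the Yes / No counts; the function and dictionary names are illustrative, and the exclusion step is assumed to have happened before the counts are passed in.

```python
# Success ratio as described in the study: tasks whose target material was not in
# the system's collection are excluded before the ratio is computed.
# The (Yes, No) counts below are taken from Tables 3 to 5.


def success_rate(succeeded: int, failed: int) -> float:
    """Return the percentage of in-collection tasks that were completed."""
    attempted = succeeded + failed
    if attempted == 0:
        raise ValueError("no in-collection tasks to score")
    return 100 * succeeded / attempted


RESULTS = {
    "UCLA-Cinema, Task 1": (45, 0),
    "UCLA-Cinema, Task 2": (47, 0),
    "UCLA-Cinema, Task 3": (28, 18),
    "UCLA-Cinema, Task 4": (45, 1),
    "UCLA-Cinema, self-assigned": (25, 5),
    "IMDb, Task 1": (27, 0),
    "IMDb, Task 2": (29, 0),
    "IMDb, Task 3": (26, 3),
    "Self-chosen OPAC, self-assigned": (13, 5),
}

if __name__ == "__main__":
    for label, (yes, no) in RESULTS.items():
        print(f"{label}: {yes}/{yes + no} = {success_rate(yes, no):.0f} %")
```

Computing the ratio over scorable tasks only means the denominators differ from row to row, which is why the footnotes to Tables 3 and 4 warn that totals vary.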
3) Helpful functions and features
Participants in this study were asked to identify helpful functions and features for tasks related to moving images based on their experience with UCLA-Cinema and IMDb. The functions and features that users found helpful are summarised below by system.
UCLA-Cinema
The system features and search options users identified as most helpful and most likely to be used in UCLA-Cinema are listed below in descending order of frequency:
– hyperlinks within records (15)
– keyword search (12)
– title search (12)
– topic or genre / form search (10)
– multiple search options (8)
– Boolean search (5)
– title variant search (5)
– credit search (4)
– search result listing with adequate details and display sorting (e.g., material, director, release date) (3)
– detail in the item records (3)
– credit variant search (2)
– inclusion of the MARC record (2)
IMDb
As reflected in the high success rate with IMDb, participants responded very positively to a number of the system’s functions and features. The system features and search options users identified as most helpful and most likely to be used in IMDb are listed below in descending order of frequency:
– links in the system (21): participants enjoyed the rich links provided in IMDb, including links to persons (e.g., actors, writers) (5), links to trivia (4), and links to reviews (2)
– good display and easy navigation (7)
– rich information found in records, with additional information in external links (6)
– search by title (6)
– images (e.g., movie posters, photos of actors) (4)
– separation of television series, seasons, and episodes (4)
– inclusion of “fun stuff” on film titles and actors, such as trivia, goofs, and quotes, as well as the ability to find out what happened on a particular date (e.g., birth dates, marriages) and what films are “now playing” in local theatres (3)
– search by plot (3)
– various search options (3)
– allowing quotation marks in a search (2)
– cast and crew feature (2)
– search by actor (2)
– search by character (2)
– separation of information into categories (2)
– tolerance of misspellings, as the system includes alternative spellings and side menus with hyperlinks to those items (2)
4) Problems and difficulties encountered by users
Participants were asked to report difficulties and problems they experienced when using the systems. These are summarised by system below.
UCLA-Cinema
The difficulties and problems users encountered when using UCLA-Cinema for various tasks fell into two broad categories: (a) searching, refinement, and collocating; and (b) display and general usability issues, mostly related to system features and search options. In the searching, refinement, and collocating category, participants expressed confusion about the inventory number search (11) and the holdings search (5), which are mainly intended for local users. Date search options such as broadcast date search and release date search were reported as a problem by five participants. Other reported problems concerned the pre-existing works search (3), keyword search (3), topic / genre search (3), credit variants search (2), the ability to collocate (2), the connection among various search options (2), and advanced search (2).
In the display and usability category, the most frequently reported issue was the system’s terminology (12), followed by unhelpful or confusing item records (4), interfaces with too much text (4), help pages offering little assistance (4), confusing search result lists with duplicate results (3), difficulty finding the correct version, format, or availability of an item (3), a lack of hyperlinks within records for collocating (1), and sorting options (1). Overall, the major search problem users encountered was that some search options seemed to have been designed mainly for staff or for people highly knowledgeable about the collection or the field (e.g., the inventory number search and the holdings search). While users appreciated the number of search options available, they also commented that most fields could not be combined in a single search or refinement. In terms of display, the major issue was the system’s terminology (e.g., xrefs, SPAC, nitrate, variants) for search options and system functions. In addition, users reported difficulties with selection and display features, the overall system design, and general usability.
IMDb
Similarly, user-reported difficulties and problems with IMDb can be summarised in two categories: (a) searching, refinement, and collocating; and (b) display and general usability issues. The most frequently reported difficulty in using IMDb was searching for television shows (14): when searching for a specific television episode or season, users had to read plot summaries to find details. Almost as often, users reported the unavailability of search or refinement options that would make searches for television shows and movies much easier (13). Such options included year, type of movie (e.g., documentary, mini-series), filming location, both subject and keyword search, and language. Collocating related television shows by episode content or by series was also a challenge (4). Users considered the system’s genre search unhelpful (5) because the terms tended to be too general or irrelevant, so these searches normally produced a surplus of results and irrelevant hits. The system also did not support effective subject or topic searching, because the subject terms tended to be too broad (3).
In the display and usability category, users reported that record information in IMDb was limited or not always accurate (6). For example, although there was a field for a plot synopsis, it was often empty, and the description was sometimes not adequate or specific enough to be helpful. When viewing a large number of search results, it was difficult to navigate a long page with lots of information (5). A mixed listing of various materials (e.g., movies and video games), rather than grouped and sorted search results, was confusing (4). The system seemed to order results by popularity rather than listing exact matches first, which also confused users. Users were frustrated that they had to follow hyperlinks for information that should have been on the same page (3); for example, DVD information was scattered across hyperlinks to separate pages or to external sites, such as Amazon. In addition, some labels were confusing (3): users were unsure of the difference between the review and comment sections and of the meaning of the “Movie Connections” label.
5) Areas for system improvements
While user-reported problems and difficulties may point to areas for system improvement, direct user input is valuable for identifying the solutions users themselves prefer. After finishing all tasks, users in this study were asked to suggest improvements that might have helped their searches during the various tasks. The suggestions are summarised below by system.
UCLA-Cinema
The user-suggested improvements for UCLA-Cinema's searching and refinement included adding search options or enhancing current ones (12), such as searching by character, audience (e.g., children's and young adult items), certain material types (e.g., television shows, episodes, movies), new releases, broadcast date range, broader, narrower, or related terms of a subject, and multiple combined criteria. Users noted that it would also be helpful to allow sorting, refining, limiting, and searching within results (7) and to add spell checking to searches (2). For collocating tasks, users would benefit from better collocation based on genre and film type, hyperlinks, and subject headings (5); additional links to related items based on awards or on works by the same author, creator, or director (5); and additional links for each form (e.g., television, motion picture) (4). Overall, the look, feel, and usability of the system's interface could also be enhanced (7). Users would benefit from more information in records (e.g., details including a searchable content overview, photos, abstracts, availability, and descriptions of films) (5); increased and enhanced help and more information about the collection (3); and more specific and helpful system feedback from searches (3).
Chapter 7. A user study of moving image retrieval systems
IMDb
The user-suggested improvements for IMDb focused primarily on two areas: (a) searching and refinement, and (b) record information. For searching and refinement, users would benefit from additional search options (e.g., by genre/topic or television season) (6); refining options by year and type of format (4); and sorting or grouping options (e.g., by movie, television show, or name) (1). For record information, users would benefit from more authoritative information rather than entirely peer-provided information (4); subject headings for records rather than user-submitted genre tags (2); links to local library holdings (4); and links to Amazon or other sites to purchase videos or DVDs (4). Additional suggested improvements to record contents included listings of current shows, movies, specials, and movie dates and times (4), as well as information about printed items related to search topics and information pertaining to a single season of a television series (2). Finally, it would also be helpful if format availability were included in the record (1).
Discussion
This study addressed the lack of research on user perspectives regarding new developments in moving image searching and retrieval, such as UCLA-Cinema and IMDb. The results have implications for developing retrieval systems in general, and library catalogues in particular, that better support user needs and tasks for moving images. First, the results illustrated that some genre-specific attributes are integral components of user needs and of users' search and selection criteria for moving images. These genre-specific attributes include actors/actresses, screenwriters or screenplay development, producers, filming locations, and/or directors. Additionally, the sample of user self-assigned tasks showed that most user tasks are “target searches” for a specific moving image. For example, among the 32 tasks for motion pictures, 31 (97 %) were “target searches” with specific criteria; the one remaining task was a “search by association” based on collocating. Among the 13 tasks for television, 10 (77 %) were “target searches,” while the remaining 3 (23 %) were “category searches.” Both tasks related to documentaries (100 %) were “target searches.” Current library catalogues do not adequately support “target search” with genre-specific attributes, nor do they offer genre-specific search or display options for moving images. It was therefore not surprising that the success rate for the same sample of user self-assigned tasks was lower in regular OPACs than in UCLA-Cinema (72 % vs. 83 %), a clear indication that current library catalogues are at a disadvantage in supporting moving image tasks. Second, both UCLA-Cinema and IMDb offered more moving image-specific functions and features for more effective interaction between users and the system. For example, UCLA-Cinema was highly successful in supporting user tasks related to motion pictures in general (100 %), tasks involving multiple makes or versions (100 %), and searches by subject (98 %). UCLA-Cinema provided a FRBR-based display of search results with a hierarchical approach that made browsing and collocating easy for users. Users found several UCLA-Cinema features particularly helpful: hyperlinks within records, title and keyword searches, topic or genre/form searches, and multiple search options that could be combined in a single search. IMDb was extremely successful (100 %) in supporting various user tasks related to movies, documentaries, filmed performances, and television shows. Users found the rich hyperlinks within the system and to external sites, the friendly display and easy navigation, search by title, and visual images most helpful. The collocating task of finding related materials was also very successful (90 %) in IMDb. Third, the results also suggest that both UCLA-Cinema and IMDb could be improved to better support various user tasks. In UCLA-Cinema, users experienced problems with tasks related to television shows by season or episode, and only 61 % of them were able to complete these tasks successfully.
The major reason cited was the lack of a clear display of such information: users had to read descriptions in a very small font to find the details. Additionally, the usability and display of UCLA-Cinema could be improved in general. In IMDb, the major difficulty users encountered was collocating: users had to browse through long displays to identify related materials manually. Users also experienced difficulties when IMDb mixed different types of materials within one search results display (e.g., movies and video games). These detailed user-reported problems and difficulties clearly point to areas for improvement in the two systems and will be helpful for future moving image system design.
Conclusions
UCLA-Cinema and IMDb are both a step ahead of current library catalogues in supporting moving image retrieval. UCLA-Cinema stands as a pioneer and an example of a moving image catalogue that applies the FRBR model in its cataloguing record structure and provides, in its user interface, a hierarchical display of works, expressions of a work, and manifestations of an expression (i.e., holdings records in the system) covering various formats and distribution information. This approach helps simplify the collocating process for users. IMDb has evolved to be not only a successful reference tool for libraries but also a source of data and inspiration for traditional library retrieval tools such as research databases and catalogues. IMDb pages are now part of mainstream research databases for images (Tenopir, 2008). Library professionals have also recognised its value for cataloguing: Naun and Elhard (2005) observed the conceptual similarities between library cataloguing principles and the IMDb approach. IMDb's rich links, work-level records, record content, and cross-references for collocating all illustrate the potential of library catalogues. Libraries are already taking advantage of this free resource by downloading data records from IMDb and integrating them into their catalogues for moving images (Valenza, 2011). User-centred system design and usability are important aspects of enhancing the user experience with a catalogue system, and user research is critical to developing systems that effectively support user tasks and information seeking. The user evaluations in this study identified an array of specific features and options for searching, refinement, display, and usability that are helpful and desirable. Some of these features could feasibly be incorporated into library catalogues to support moving image retrieval more effectively.
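The work, expression, and manifestation hierarchy that UCLA-Cinema displays can be pictured as a nested data structure. The Python sketch below is purely illustrative: the class names, fields, and sample titles are hypothetical and are not taken from UCLA-Cinema's record structure or from the full FRBR attribute lists. It only shows how grouping manifestations under expressions, and expressions under a single work, yields a hierarchical, collocated display.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Manifestation:
    """A physical or digital embodiment of an expression (roughly, a holdings record)."""
    carrier: str      # illustrative values such as "35 mm print" or "DVD"
    distributor: str


@dataclass
class Expression:
    """One realization of a work, e.g. an original release or a restored version."""
    label: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """The abstract creation, e.g. a film regardless of version or format."""
    title: str
    expressions: List[Expression] = field(default_factory=list)


def render(work: Work) -> List[str]:
    """Flatten the work/expression/manifestation tree into indented display lines."""
    lines = [work.title]
    for expression in work.expressions:
        lines.append("  " + expression.label)
        for m in expression.manifestations:
            lines.append(f"    {m.carrier} ({m.distributor})")
    return lines


# Hypothetical record: one work with two expressions and three manifestations.
film = Work("Example Film (1950)", [
    Expression("Original release", [Manifestation("35 mm print", "Studio A")]),
    Expression("Restored version", [
        Manifestation("DVD", "Distributor B"),
        Manifestation("Blu-ray", "Distributor B"),
    ]),
])

for line in render(film):
    print(line)
```

Because every version and format hangs off one work node, a user who reaches any single manifestation can see all related versions together, which is the collocating benefit the FRBR-based display offers.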
References
Armitage, L. H., & Enser, P. G. B. (1997). Analysis of user need in image archives. Journal of Information Science, 23(4), 287–299.
Atkinson, J. (2001). Development in networked moving images for UK higher education. Program, 35(2), 109–118.
Brancolini, K., & Provien, R. (1997). Video collections and multimedia in ARL libraries: Changing technologies. Washington, DC: ARL.
Cawkell, A. E. (1992). Selected aspects of image processing and management: Review and future prospects. Journal of Information Science, 18(3), 179–192.
Chen, H. (2001). An analysis of image queries in the field of art history. Journal of the American Society for Information Science and Technology, 52(3), 260–273.
Collins, B. R. (1996). Beyond cruising: Reviewing. Library Journal, 121, 125–126.
Cowgill, A. (1997). Entertainment resources on the Internet. The Reference Librarian, 57, 179–186.
Enser, P. G. B. (2008). Visual image retrieval. In B. Cronin (Ed.), Annual Review of Information Science and Technology, 42, 3–42.
Fallgren, N. J. (2007). Users and uses of bibliographic data: Background paper for the Working Group on the Future of Bibliographic Control. Retrieved from the Library of Congress website: http://www.loc.gov/bibliographic-future/meetings/docs/UsersandUsesBackgroundPaper.pdf
Ho, J. (2001). Faculty and graduate student search patterns and perceptions of videos in the online catalog. Cataloging & Classification Quarterly, 33(2), 69–88.
Ho, J. (2004). Cataloging practices and access methods for videos at ARL and public libraries in the United States. Library Resources & Technical Services, 48(2), 107–121.
Ho, J. (2005). Applying form/genre headings to foreign films: A summary of AUTOCAT and OLAC-LIST discussions. Cataloging & Classification Quarterly, 40(2), 73–88.
Horny, K. L. (1991). Cataloguing simplification: Trends and prospects. International Cataloguing and Bibliographic Control, 20, 25–28.
Hume, M. (1995). Searching for media in the online catalog: A qualitative study of media users. Journal of Academic Media Librarianship, 3(1), 1–28.
IFLA Study Group on the Functional Requirements for Bibliographic Records (1998). Functional Requirements for Bibliographic Records: Final report. München: K. G. Saur. Retrieved from http://www.ifla.org/files/cataloguing/frbr/frbr.pdf
Miller, D. (1997). Identical in appearance but not in actuality: Headings shared by a subject-access and a form/genre access authority list. Library Resources & Technical Services, 41(3), 190–204.
Naun, C. C., & Elhard, K. C. (2005). Cataloguing, lies, and videotape: Comparing the IMDb and the library catalogue. Cataloging & Classification Quarterly, 41(1), 23–43.
Sandom, C. J., & Enser, P. G. B. (2002). VIRAMI: Visual information retrieval for archival moving imagery (Library and Information Commission Research Report 129). London: The Council for Museums, Archives and Libraries.
Scholtz, J. C. (1995). Video acquisitions and cataloging. Westport, CT: Greenwood Press.
Smeulders, A. W. M., Worring, M., Santini, S., Gupta, A., & Jain, R. C. (2000). Content-based retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12), 1349–1380.
Tenopir, C. (2008, December). Online databases: Scenes from a database. Library Journal. Retrieved from http://www.libraryjournal.com/article/CA6618867.html
Tillett, B. B. (2005). FRBR and cataloging for the future. Cataloging & Classification Quarterly, 39(3/4), 197–205.
Turner, J. M. (1998). Some characteristics of audio description and the corresponding moving image. In R. Larson, K. Petersen, & C. M. Preston (Eds.), Proceedings of the 61st American Society for Information Science Annual Meeting (ASIS’98) (pp. 108–117).
Turner, J. M., Besser, H., Goodrum, A. A., & Hastings, S. K. (2002). Current research in digital image management. In E. G. Toms (Ed.), Proceedings of the 65th American Society for Information Science Annual Meeting (ASIST 2002) (pp. 525–526).
Valenza, J. (2011, May 31). Six tools to simplify cataloging [Blog post]. Retrieved from http://blog.schoollibraryjournal.com/neverendingsearch/2011/05/31/six-tools-to-simplify-cataloging/
Weihs, J., & Howarth, L. C. (1995). Nonbook materials: Their occurrence and bibliographic description in Canadian libraries. Library Resources & Technical Services, 39, 184–197.
Yang, M., Wildemuth, B. M., & Marchionini, G. (2004). The relative effectiveness of concept-based versus content-based video retrieval. Proceedings of the 12th Annual ACM International Conference on Multimedia (pp. 368–371).
Yee, M. M. (1991). System design and cataloging meet the user: User interfaces to Online Public-Access Catalogs. Journal of the American Society for Information Science, 42(2), 78–98.
Yee, M. M. (2007a). Moving image cataloging: How to create and how to use a moving image catalog. Westport, CT: Libraries Unlimited.
Yee, M. M. (2007b, February 7). A FRBR based catalog for moving images [Electronic mailing list message]. Retrieved from http://www.frbr.org/2007/02/07/ucla
Zhang, Y., & Salaba, A. (2009a). What is next for FRBR? A Delphi study. Library Quarterly, 79(2), 233–255.
Zhang, Y., & Salaba, A. (2009b). Implementing FRBR in libraries: Key issues and future directions. New York: Neal-Schuman Publishers.
Part III: Empirical knowledge organization studies
Abebe Rorissa, Diane Rasmussen Neal, Jonathan Muckell, Alex Chaucer
Chapter 8. An exploration of tags assigned to geotagged still and moving images on Flickr
Abstract: Digital libraries have become more distributed and more diverse in their collections. It is common for digital libraries to contain information sources that are multimedia in nature; they provide access to, among others, text and image (still and moving) documents as well as audio files. While digital cameras now capture some metadata automatically, user-assigned semantic tags are popular and indispensable. These include geotags, produced by geotagging: the process of tagging an image with either the latitude and longitude coordinates or the place name of the location where it was shot. Systematic analysis of geotagged images is timely and necessary because the phenomenon of social tagging and its true potential are new and not fully understood, several million geotagged photographs are uploaded to Flickr each month, and tags are frequently criticized as imprecise and have not been well investigated. To address this, we undertook an analysis of the tags assigned to a sample of geotagged still and moving images on Flickr, guided by basic level theory. Our findings showed that the tags assigned to geotagged still and moving images do not differ significantly with respect to their level of abstraction. We discuss the implications of these findings for the indexing and retrieval of still and moving images, demonstrating that tags can potentially help solve the indexing problem associated with the semantic content of multimedia documents. Tags also have the potential to bridge the semantic gap.
Keywords: Image indexing, video indexing, Flickr, geotagging, basic level theory
Abebe Rorissa (corresponding author), Associate Professor, Department of Information Studies, University at Albany, State University of New York
Diane Rasmussen Neal, Assistant Professor, Faculty of Information and Media Studies, The University of Western Ontario
Jonathan Muckell, Graduate Student, Department of Informatics, University at Albany, State University of New York
Alex Chaucer, Graduate Student, Department of Informatics, University at Albany, State University of New York
Rorissa, Rasmussen Neal, Muckell, Chaucer
Introduction
Geotagging “is the process of adding geographical identification metadata to resources (websites, RSS feed, images or videos). The metadata usually consist of latitude and longitude coordinates, but they may also include altitude, camera heading direction and place names” (Torniai, Battle, & Cayzer, 2007, p. 1). Semantic geotags include specific place names such as “Los Angeles” or more general place names such as “forest.” Geotagging's appeal to Internet users is readily apparent. Millions of geotagged items reside on the photo-sharing Web site Flickr (http://www.flickr.com). According to Flickr's website, about 3.2 million geotagged photographs were uploaded during the month of September 2009 alone. In February 2009, Flickr reported on its blog that it housed 100,000,000 geotagged photos (Champ, 2009). Flickr end-user searches for the term “geotagged” or a place name such as “Australia” consistently yield new uploads. Flickr, Zooomr (http://www.zooomr.com/), and Picasa (http://www.picasa.com/) are three of a number of photo-sharing services and/or personal photo and video organization systems that have geotagging tools allowing users to add latitude and longitude information to their photos and videos. Through these services and systems, tools such as Google Maps, Google Earth, Yahoo! Maps, and other in-house tools are used to select the location information for the photos and videos. Geotags can also be added automatically by cameras equipped with GPS devices. Geotagging is highly relevant to still and moving images (e.g., photos and videos): images capture a visual representation of a place, and people use geotags to identify the place where a photograph was shot.
Research on how users describe and tag (or index) multimedia information sources is as important as studying users' interactions with multimedia search and retrieval systems, including Web search engines, if we are to improve the development and performance of these systems (Tjondronegoro, Spink, & Jansen, 2009). Analyses of user-generated tags such as those on Flickr could make it possible to discover the attributes users frequently rely on to organize visual resources (Chung & Yoon, 2009). Wolfram, Olson, and Bloom (2009) recommended similar research when they stated that “it is worthwhile to explore potential parallel characteristics between indexing and tagging as these characteristics could influence the form of tagging offered in a given situation and how tagging influences retrieval” (p. 1996), and they called for a better understanding of tags and tagging. Bar-Ilan, Shoham, Idan, Miller, and Shachak (2008) have also called for studies of the potential usefulness, strengths, and shortcomings of tags. Social tagging and photo-sharing services such as Flickr offer researchers the opportunity to conduct large-scale analyses of both the tags and the taggers or users.
Before social tagging, image research typically studied descriptions of images from small collections, supplied by a small number of people. A number of studies have examined differences between user descriptions of images using tasks such as viewing, sorting, and describing, to mention a few (e.g., Jörgensen, 1995, 1996, 1998; Laine-Hernandez & Westman, 2006). Bearman and Trant (2005) and Trant (2006) have also examined users' tagging of museum objects. However, the question of whether the descriptions and tags assigned to still and moving images differ has not been addressed. Digital cameras can now capture some geographic metadata automatically, in the form of latitude, longitude, altitude, and other numeric representations of geographic location. While such information is very useful, a human-assigned semantic tag is a higher-level form of content than a machine-assigned numeric tag, and therefore serves a different purpose. While the latitude and longitude of Albany, New York (42° 39’ N 73° 45’ W) can tell a Web site where to display a photo taken in Albany on a map designed for browsing by location, a user is probably more likely to search for and understand the term “Albany” than the city's geographic coordinates. Semantic tags can easily be created automatically by “translating” geocoordinates into place names, and they are useful in the indexing and retrieval of still and moving images in digital libraries, especially those in the collections of social tagging services such as Flickr. An established framework, basic level theory, served as the basis for comparing the structure of Flickr tags assigned to geotagged still and moving images. Basic level theory was chosen in part because others have recommended using frameworks to study tags assigned to visual resources (Angus, Thelwall, & Stuart, 2008).
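The “translation” of coordinates into place names mentioned above is usually called reverse geocoding. The Python sketch below illustrates it under simplifying assumptions: it converts the degrees-and-minutes form used for Albany into signed decimal degrees, then returns the nearest entry in a tiny, hypothetical gazetteer by great-circle (haversine) distance. The gazetteer, its rounded coordinates, and all function names are illustrative only; real reverse-geocoding services draw on far larger place databases and more careful matching.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Hypothetical toy gazetteer; coordinates are rounded approximations.
GAZETTEER = {
    "Albany": (42.65, -73.75),
    "Los Angeles": (34.05, -118.24),
    "Tokyo": (35.68, 139.69),
}


def dms_to_decimal(degrees: float, minutes: float, hemisphere: str,
                   seconds: float = 0.0) -> float:
    """Convert degrees/minutes/seconds plus N, S, E or W to signed decimal degrees."""
    value = degrees + minutes / 60.0 + seconds / 3600.0
    return -value if hemisphere in ("S", "W") else value


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_place(lat: float, lon: float) -> str:
    """Return the gazetteer name closest to the given point (no distance threshold)."""
    return min(GAZETTEER, key=lambda name: haversine_km(lat, lon, *GAZETTEER[name]))


# Albany, New York, as given in the text: 42° 39' N 73° 45' W.
lat = dms_to_decimal(42, 39, "N")
lon = dms_to_decimal(73, 45, "W")
print(nearest_place(lat, lon))
```

Note that this sketch always returns some place, even for a point far from every entry; a practical system would typically also apply a distance threshold before assigning a semantic tag.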
Basic level theory also comes closest to a universal hierarchical taxonomy of concepts and objects, because it has been found to be reliable across cultures (Lakoff, 1987). It offers an opportunity to look at the contents of images and what they mean to viewers in the same way that Panofsky's (1955) “levels of meaning” does. Rosch (1975) first proposed basic level theory and, with her colleagues (e.g., Rosch & Mervis, 1975; Rosch, Mervis, Gray, Johnson, & Boyes-Braem, 1976), developed it into one of the most widely accepted theories for the study of categories of objects and their attributes. Researchers have previously used basic level theory and its categories in image research (Chung & Yoon, 2009; Hare, Lewis, Enser, & Sandom, 2007; Rorissa, 2008; Rorissa & Iyer, 2008). Rorissa (2008) and Rorissa and Iyer (2008) used basic level theory to show the similarities and differences between the labels image users supplied for groups of images and their descriptions of individual images. Hare et al. (2007) used a modified version of basic level theory to study keywords assigned to a museum's image collection by expert indexers. Chung and Yoon (2009) used basic level theory and other frameworks to compare Flickr tags with search query terms in the log of the Excite search engine. Hare et al.'s (2007) analysis of keywords assigned to 17,000 images from the Victoria and Albert Museum collection found that the majority of the keywords were names of objects, which are often at the basic level. Similar results were obtained by Rorissa (2008) and Rorissa and Iyer (2008). While the effectiveness of tagging in information retrieval has not yet been verified (Jeong, 2009), it is currently the best way for Web users to semantically index and search for the ever-increasing number of user-provided online documents; librarians and other information professionals cannot possibly provide tags for all of these documents (Peters, 2009). Given the prevalence of tagging, it seems that users do not mind providing their own descriptive terms and in fact enjoy doing so (Peters, 2009). Because geographically oriented images are so prevalent on Web sites such as Flickr, it is important to understand the practices used in geographic image tagging to inform future system design. From a user-centred perspective, it makes sense to design image access systems that capitalize on how users provide semantic image tags, rather than forcing users to conform to a predetermined method of description. A naturalistic understanding of tagging practices for geotagged images, using a proven cognitive theory such as basic level theory as the framework, provides a firm foundation for eventual participatory design studies and new approaches to accessing visual resources of geographic significance.
Problem statement
The main purpose of this work is to conduct a content analysis of the tags assigned to geotagged still and moving images on Flickr, using basic level theory as the framework, in order to determine the similarities and differences between the levels of abstraction of the tags. Content analysis, which has been used in previous image indexing and retrieval research (Chung & Yoon, 2009; Green, 2006; Hare, Lewis, Enser, & Sandom, 2007; Rorissa, 2008; Rorissa & Iyer, 2008), was used in conjunction with statistical tests to identify the most common level of abstraction (basic, superordinate, or subordinate) of the tags assigned to still and moving images. We hope that this work contributes to the understanding of social tagging, and that designers of systems and indexing tools for still and moving images will incorporate this knowledge to improve both the indexing and the retrieval of relevant documents from multimedia digital libraries. It is important to understand how users naturalistically describe images when no restrictions are imposed on them, so that authoritative systems of description can be aligned with innate human cognitive constructs. The research questions arising from this purpose were as follows: (1) What is the nature and number of tags assigned to geotagged still and moving images? (2) How do the numbers of tags assigned to geotagged still and moving images compare with each other? (3) How often are the basic, superordinate, and subordinate levels of description used in geotagged still and moving images? (4) How does the use of basic, superordinate, and subordinate levels of description differ between geotagged still and moving images? These questions and their answers are relevant because human indexers or automatic agents could benefit from knowing whether, in an integrated multimedia digital library, the choice of index terms vis-à-vis place/location is independent of or dependent on the format (still or moving). The questions have also become particularly important given the recent interest in social/collaborative tagging.
Literature review
Indexing still and moving images
Indexing can be document-oriented or user-oriented. The former is most common, while the latter has, at least so far, not been effectively utilized, in part because involving users in a priori indexing was not practical (Fidel, 1994). This is changing because of the nature of today's Web users, who are active in the creation and organization of information rather than merely passive consumers of it. Among these information sources are still and moving images, which share some attributes and differ in others: still images have spatial attributes, while moving images have both spatial and temporal attributes. User searches for still images are the dominant type of multimedia search on the Web, but searches for video are on the rise. This is due, in part, to the increase in the number of users with high-speed Internet connections, to social networking services that allow uploading and viewing of videos (e.g., YouTube, Flickr), and to the ubiquity of portable media devices such as the iPod (Tjondronegoro, Spink, & Jansen, 2009). Indexing the semantic content of moving images poses greater challenges than indexing text documents and even still images (Enser, 2008). This is due in part to the number of different levels of semantic structure in moving images, which include frames, shots, scenes, clips, episodes, and news stories (Enser, 2008). Another reason is that, because visual information sources such as still and moving images are seldom associated with textual descriptions, “the manual indexing of images has remained a matter of trying to represent visually encoded semantic content in a verbal surrogate” (Enser, 2008, p. 533). It is sometimes difficult to understand the context of an image without the elucidation of accompanying text (Barthes, 1964; Neal, 2010). Text-based indexing remains the preferred mode of indexing the semantic content of multimedia documents because “content-based multimedia search remains out of reach, and a simple tag like ‘Tokyo’ provides more information than we can possibly glean from content-based algorithms” (Weinberger, Slaney, & Van Zwol, 2008, p. 111). However, content-based image and video indexing mechanisms are gaining in popularity. Bertini, Del Bimbo, and Serra (2008) provide a comprehensive overview of recent work involving automatic ontology creation for the semantic annotation of videos. As the authors demonstrate, these formal hierarchies of conceptual description are created using a variety of metadata schemas, such as Dublin Core and the Resource Description Framework (RDF), or Semantic Web technologies such as the Web Ontology Language (OWL). Ontologies can be seen as complementary to user tagging.
Social tagging and indexing
Both still and moving images are spatial in nature (they depict a specific place or were taken or taped at that place) (Crandall, Backstrom, Huttenlocher, & Kleinberg, 2009), so it is no wonder that hundreds of millions of Flickr images are geotagged. Moving images also have an apparent temporal dimension: they are taken or taped at a particular time and/or over a period of time. Moving images, however, have further distinct dimensions, including motion and sound. Hence, still and moving images are more challenging to index than text documents because of their visual nature and these dimensions. In spite of this, textual descriptions are popular methods of indexing still and moving images (Enser, 2000; Jörgensen, 1995, 1996, 1998, 2003), and multimedia search queries are predominantly formulated using words (O’Connor, O’Connor, & Abbas, 1999). Proponents of user-centred indexing suggest a number of solutions to the indexing problem. Chief among these is involving users in the indexing process by encouraging them to supply index terms that reflect their future information needs (Fox, 2006). It should be pointed out that involving users in the indexing process assumes a certain level of motivation on their part. This assumption could be met if system designers provide an environment similar to social tagging, although social tagging works mainly because users are tagging their own resources (e.g., photos and videos) so that they will be able to retrieve them. O’Connor (1996) was among the first to recognize the role that user-generated tags/keywords could play in information representation and indexing. Recently, the merits of indexing through collaborative tagging, coupled with the popularity of folksonomies, have been emphasized (Olson & Wolfram, 2008). Social tagging and indexing have much in common; the main difference is that indexing is done by professionals with the help of professional indexing tools, while social tagging is performed by non-professionals (Olson & Wolfram, 2008). Tagging is a form of indexing, albeit one in which the end-user plays the role of the indexer, and it allows the indexer more freedom. Broadly speaking, research on social tagging can potentially inform indexing. Enser (2008, p. 534) recognized the potential of social tagging, and the challenges it poses, when he succinctly stated that “social tagging has brought a new dimension to the representation of the semantic content of visual materials.” On online services for sharing still and moving images, such as Flickr and YouTube, users assign personal tags to these visual materials, and this “challenges the supremacy of professionally sourced, authoritative subject representation, whilst introducing opportunities for beneficial enhancement of both exhaustivity and specificity in subject indexing” (Enser, 2008, p. 534). Because social tagging is a new phenomenon, it is understandable that there is scepticism about the role social or collaborative tagging can play in information organization (Trant, 2006), including the indexing of still and moving images. Granted, users' language can be imprecise and inaccurate (Guy & Tonkin, 2006; Hammond, Hannay, Lund, & Scott, 2005), and this is unavoidable because social tagging is free (uncontrolled) and open.
Also, “a major weakness (among other flaws) of user-generated tags is the lack of semantic relations between terms, which are often represented in controlled semantics as broader, narrower, and related terms; or, in ontologies as relations between classes of concepts and instances” (Chen, Liu, & Qin, 2008, p. 117). However, we should emphasize that one of the strengths of social tagging is that for the first time capturing the vocabulary of users on a large scale is possible. In addition, due to the power law of social tags (most used tags are likely to be used by others and the majority of tags are likely to be used by few) (Mathes, 2004), social tagging is an ideal tool for indexing purposes. Social tagging could also address some of the concerns raised by Fidel (1994) such as “sources of index terms” and “user language” because tags are supplied by users. Due to the popularity of social tagging, enough user-generated tags are also available for training automatic indexing algorithms (von Ahn & Dabbish, 2004) and the imprecision and accuracy issues could be resolved by improving the quality of user supplied tags through indexing system features that could correct misspellings and avoid inclusion of inappropriate tags (Guy & Tonkin,
192
Rorissa, Rasmussen Neal, Muckell, Chaucer
2006). In order to realize these benefits, an understanding of the nature of tags is necessary. Toward this end, a growing number of researchers are exploring various aspects of Flickr use and image tagging. An ongoing debate in information science questions whether user-supplied tags are useful by themselves for image retrieval, or whether authoritative assistance such as an ontology or thesaurus is beneficial (Neal, 2006; Rafferty & Hidderly, 2007; Yoon, 2009). The social aspect of Flickr activity has been emphasized. For example, Cox’s (2008) thoughtful essay positions Flickr as a Web site popular with hobbyists for social networking purposes. A qualitative study of Flickr tagging practices at the point of mobile capture concluded that social motivation was a primary reason for providing tags (Ames & Naaman, 2007). Neal (2010) found that social activity associated with Flickr photographs has a positive correlation with the image’s Flickr relevance ranking. Bar-Ilan, Zhitomirsky-Geffet, Miller, and Shoham’s (2010) research noted the convergence of image tags after several people had tagged the same images. A study of Flickr photoset or collection-level tags revealed a series of categories for classifying the tags used at this level (Stvilia & Jörgensen, 2009); Beaudoin (2007) conducted a similar study with popular tags provided by random Flickr users. Schmidt and Stock (2009) explored the consistency of emotions assigned to Flickr photographs. Lee and Neal (2010) propose a new model of photograph description, guided by the all-time most popular Flickr tags as well as participant-supplied tags and descriptions for a test set of photographs.
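Guy and Tonkin’s (2006) suggestion, cited above, that tagging systems could correct misspellings can be illustrated with Python’s standard-library difflib module. The sketch below is a minimal illustration under assumptions of our own: VOCABULARY is a hypothetical reference list, the function name suggest_tag_corrections is invented for this example, and the 0.8 similarity cutoff is arbitrary. A real system would draw on a much larger vocabulary, such as a gazetteer or thesaurus.

```python
import difflib

# Hypothetical reference vocabulary, for illustration only.
VOCABULARY = ["Toronto", "Taj Mahal", "New York City", "beach", "mountains"]


def suggest_tag_corrections(tag, vocabulary=VOCABULARY, n=3, cutoff=0.8):
    """Return up to n vocabulary entries that closely match a possibly misspelled tag.

    A case-insensitive exact match returns that entry alone; an empty list means
    no close match was found, so the tag is left as typed (or flagged for review).
    """
    normalized = tag.strip().lower()
    lowered = {v.lower(): v for v in vocabulary}  # lower-case form -> original form
    if normalized in lowered:
        return [lowered[normalized]]
    # get_close_matches ranks candidates by SequenceMatcher similarity ratio.
    matches = difflib.get_close_matches(normalized, list(lowered), n=n, cutoff=cutoff)
    return [lowered[m] for m in matches]
```

For example, suggest_tag_corrections("Torotno") suggests "Toronto", while a tag with no close match returns an empty list rather than being silently changed, which keeps the user in control of the final tag.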
Geospatial metadata and geotagging

Geospatial metadata, or information about the geographic data associated with a document, has been an area of growing interest in recent years, especially with the increased prevalence of geographic information systems (GIS). Many standards-based forms of geospatial metadata exist. These include ISO 19115, Geographic Information – Metadata – North American Profile (NAP) (Federal Geographic Data Committee, 2011), American National Standards Institute Codes (U.S. Census Bureau, Geography Division, 2010), the element in Dublin Core (Torniai, Battle, & Cayzer, 2007), and geographic data fields in the MARC 21 Format for Bibliographic Data (Library of Congress, 2010). The Exchangeable Image File Format (Exif), used for storing metadata in digital image files, accommodates Global Positioning System (GPS) data such as latitude, longitude, and altitude (W3C, 2003; Torniai, Battle, & Cayzer, 2007). Leaders of projects underway in the library and archival communities have noted the importance of using and maintaining geographic metadata standards for the purposes of long-term preservation and interoperability among geospatial repositories (Steinhart, 2006; Morris, Tuttle, & Essic, 2009). More research is needed to determine which standards best meet user needs, since metadata formats can influence relevance perceptions (Fraser & Gluck, 1999).

Geotags are one type of geospatial metadata. As previously mentioned, geotags frequently consist of numeric information such as latitude, longitude, altitude, and distance. Digital cameras with built-in Global Positioning System (GPS) capabilities can add numeric geotags automatically, or the photographer can add them using any of a myriad of freely available online tools. Geotags also appear as user-assigned semantic tags. Semantic geotags may include specific place names (“New York City,” “Taj Mahal”) or generic locations (“beach,” “mountains”). Elwood (2008) wrote that new geographic information opportunities such as geotagging offer new types of connections, noting the existence of “more ordinary people creating digital spatial data, and the rising potential to represent aspects of everyday life through geovisualization” (p. 261). Flickr offers browsing and searching tools that allow exploration of geotagged photos by place. Google’s Panoramio Web site, devoted exclusively to hosting and exploring place-oriented photographs, offers overlays of geotagged photos within Google Maps. Picasa, Google’s photograph editing and organization tool, has a “Where in the world?” game on its Web site, which allows players to indicate where they think geotagged photographs were taken by clicking on a world map. Despite the apparent interest in providing geotags and viewing photographs based on location, many geotagged photographs are shot close to home. Hecht and Gergle’s (2010) study found that “over 50 percent of Flickr users contribute local information on average, and over 45 percent of Flickr photos are local to the photographer” (p. 229).
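Numeric geotags written by GPS-enabled cameras are commonly stored in the Exif GPSLatitude and GPSLongitude tags as three rational numbers (degrees, minutes, seconds), with separate reference tags (GPSLatitudeRef, GPSLongitudeRef) giving the hemisphere. Map-based browsing tools generally expect signed decimal degrees, so a conversion step is needed. The sketch below assumes the rationals have already been read from the file as (numerator, denominator) pairs; reading them requires an Exif library, which is not shown, and the helper name dms_to_decimal is our own.

```python
from fractions import Fraction


def dms_to_decimal(dms, ref):
    """Convert Exif-style (degrees, minutes, seconds) rationals to signed decimal degrees.

    dms: three (numerator, denominator) pairs, as stored in GPSLatitude / GPSLongitude.
    ref: "N", "S", "E", or "W", as stored in GPSLatitudeRef / GPSLongitudeRef.
    Southern latitudes and western longitudes become negative.
    """
    degrees, minutes, seconds = (Fraction(num, den) for num, den in dms)
    value = degrees + minutes / 60 + seconds / 3600  # exact rational arithmetic
    return float(-value if ref.upper() in ("S", "W") else value)


# Illustrative coordinates: 40 deg 26' 46" N and 79 deg 58' 56" W.
lat = dms_to_decimal(((40, 1), (26, 1), (46, 1)), "N")
lon = dms_to_decimal(((79, 1), (58, 1), (56, 1)), "W")
```

Using Fraction keeps the intermediate arithmetic exact, which matters when seconds are stored with large denominators (for example, 3012/100).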
Researchers in computer science are developing and testing geotagged photograph applications that use API calls to extract geotags. Efforts include combining geotags with image recognition technology to enhance retrieval (Joshi & Luo, 2008; Yaegashi & Yanai, 2009; Yanai, Yaegashi, & Qiu, 2009), clustering geotagged photographs to allow browsing of similar photographs on a map (Quack, Leibe, & Van Gool, 2008; Chen, Battestini, Gelfand, & Setlur, 2009; Chen et al., 2009), and combining geotags with content-based retrieval methods in order to suggest geotags (Moxley, Kleban, & Manjunath, 2008). Amitay, Har’El, Sivan, and Soffer (2004) and Zong, Wu, Sun, Kim, and Goh (2005) have explored methods of resolving geo / geo ambiguities (Paris, Texas vs. Paris, France) and non-geo / geo ambiguities (turkey the bird vs. Turkey the country), such as extracting place names from a gazetteer and using the gazetteer entries to verify the accuracy of assigned semantic geotags. Conversely, Keßler, Maué, Heuer, and Bartoschek (2009) describe a system for creating a gazetteer from extracted geotags. Lee, Won, and McLeod (2008) noted a strong power law correlation between the similarity of non-geotags and geotags. Jaffe, Naaman, Tassa, and Davis (2006) developed an algorithm to support the visualization of multiple photographs on a map. While most geotagging studies have involved still images, Ay, Zimmerman, and Kim (n.d.) created a query method for geotagged moving images based on geographic location information as well as the direction and position of the camera.
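One way to combine numeric and semantic geotags, in the spirit of the gazetteer-based approaches above, is to choose, among the gazetteer entries matching a place-name tag, the one nearest the image’s latitude and longitude. The sketch below is our own simplified illustration, not the method of Amitay et al. (2004) or Zong et al. (2005). The two-entry GAZETTEER, its approximate coordinates, the 50 km threshold, and the function names are all assumptions for demonstration.

```python
import math

# Hypothetical gazetteer; coordinates are approximate city centres.
GAZETTEER = {
    "paris": [
        ("Paris, France", 48.8566, 2.3522),
        ("Paris, Texas, USA", 33.6609, -95.5555),
    ],
}


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometres, using a mean Earth radius of 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def resolve_place_tag(tag, lat, lon, gazetteer=GAZETTEER, max_km=50.0):
    """Pick the gazetteer entry for a semantic geotag that is closest to the numeric geotag.

    Returns None if the tag is not in the gazetteer or no candidate lies within
    max_km; such a tag may be non-geographic (e.g., turkey the bird).
    """
    candidates = gazetteer.get(tag.strip().lower(), [])
    scored = [(great_circle_km(lat, lon, c_lat, c_lon), name)
              for name, c_lat, c_lon in candidates]
    scored = [item for item in scored if item[0] <= max_km]
    return min(scored)[1] if scored else None
```

A photo tagged “Paris” whose numeric geotag is near 48.86° N, 2.34° E resolves to Paris, France. The same tag on a photo taken in New York resolves to nothing, which flags it for closer inspection instead of forcing a match.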
Methods

From the population of Flickr users who uploaded still and moving images during the first week of December 2009, Flickr’s API was used to select a simple random sample of geotagged still and moving images. To ensure that only geotagged images were returned, the “has_geo” argument of the flickr.photos.search API method was set; when this argument is true, only photos or videos that have been geotagged are returned. The tags were coded into the three levels of abstraction of basic level theory based on how abstract they are. Superordinate tags contain only the broadest terms and geographies, such as ocean or North America; basic level tags give a generic description of a location, such as beach; and tags describing a specific location, such as Myrtle Beach, were coded as subordinate. Our sample of still images consisted of 736 still images and 3826 associated tags, with 390 owners. The moving image sample consisted of 1467 videos, 4272 tags, and 791 owners. These numbers include all tags in the sample, both geotags and non-geotags. The majority of the tags were in English.
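As a rough illustration of how such a request might be assembled, the sketch below builds a flickr.photos.search request URL for the Flickr REST endpoint. Only has_geo is named in the text. The other parameters (media, min_upload_date, max_upload_date, extras, per_page, page, format, nojsoncallback), the date bounds, and the helper names are our assumptions, since the authors’ exact request and sampling procedure are not reported. Because search results are ordered rather than random, the sketch draws the simple random sample from the collected results with random.sample.

```python
import random
from datetime import datetime, timezone
from urllib.parse import urlencode

FLICKR_REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def build_geo_search_url(api_key, media, start, end, page=1, per_page=250):
    """Build a flickr.photos.search request restricted to geotagged items.

    media is "photos" or "videos"; start and end are timezone-aware datetimes
    bounding the upload date (sent as Unix timestamps).
    """
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "has_geo": 1,  # only return photos / videos that carry geotags
        "media": media,
        "min_upload_date": int(start.timestamp()),
        "max_upload_date": int(end.timestamp()),
        "extras": "tags,geo",  # ask for tags and coordinates with each result
        "per_page": per_page,
        "page": page,
        "format": "json",
        "nojsoncallback": 1,
    }
    return FLICKR_REST_ENDPOINT + "?" + urlencode(params)


def simple_random_sample(items, k, seed=None):
    """Draw k items without replacement; every k-subset is equally likely."""
    rng = random.Random(seed)
    return rng.sample(list(items), k)


# Example: geotagged videos uploaded in the first week of December 2009 (UTC assumed).
week_start = datetime(2009, 12, 1, tzinfo=timezone.utc)
week_end = datetime(2009, 12, 8, tzinfo=timezone.utc)
url = build_geo_search_url("YOUR_API_KEY", "videos", week_start, week_end)
```

Fetching the URL, paging through the results, and pooling them before sampling are left out, because they depend on network access and a valid API key.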
Data analysis

Content analysis, specifically coding, was the main method used to analyse the tags. Each tag was assigned to one of the three categories (subordinate, basic, and superordinate) using coding dictionaries created by Rorissa (2008) and Rorissa and Iyer (2008). Two research assistants each coded all the tags independently; where their coding differed, the authors together made the final determination of the appropriate coding category. For most tags, coding was conducted after the coders viewed the still and moving images to which the tags were assigned. Allowing the coders to view the images produced more accurate results, since ambiguous tags could be clarified based on the context of the images. For example, if a tag is a number, that number could refer to the date the image was taken, the age of a person in the image, or a specific number shown in the image. Viewing the image allows the coder to perceive the semantic context and thereby yields more reliable results.

The three steps in content analysis (Weber, 1990) were followed. First, a coding scheme previously used in image research by others (Rorissa, 2008; Rorissa & Iyer, 2008) was selected. Then, the consistency / reliability of the coding was assessed through inter-coder agreement analyses using percent agreement and Cohen’s kappa (Cohen, 1960). Finally, the entire corpus or set of tags was coded (Weber, 1990). Tags in both samples were the recording units (the basic unit of text coded or categorized). Chi-square and t-tests were used to determine the extent to which the levels of abstraction of tags assigned to geotagged still and moving images are similar or different.

Percent agreement and Cohen’s kappa are reported as measures of the amount of agreement between the two research assistants; consensus coding was subsequently used by the authors to reach 100 % agreement. Computed percent agreement and Cohen’s kappa values were 0.83 and 0.70, respectively, for still images, and 0.89 and 0.80, respectively, for moving images. All computed percent agreement and kappa values were at least 0.70. To interpret these values, we followed the benchmarks provided by Landis and Koch (1977), under which values between 0.21 and 0.40 indicate a fair level of agreement, values between 0.41 and 0.60 moderate agreement, values between 0.61 and 0.80 substantial agreement, and values above 0.80 almost perfect agreement. We conclude that the coding was consistent (reliable) because, according to Neuendorf (2002) as well, values of 0.70 or above are considered satisfactory. During coding, we noticed that a substantial number of semantic tags were geographic or place names (e.g., New York City, England).
To be exact, 805 semantic tags that are geographic names were assigned to 367 still images, and 779 were assigned to 543 moving images. This was understandable given that our sample consisted of geotagged still and moving images. Our subsequent analyses therefore report tags that are geographic names and tags that are not geographic names separately, as well as combined. In addition, the similarities and / or differences between tags assigned to still and moving images, in terms of the frequency and average number of tags at each of the three levels of abstraction (subordinate, basic, and superordinate), were evaluated using chi-square and t-tests.
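The two reliability measures reported above can be computed from the coders’ parallel category assignments, as in the minimal Python sketch below. It is our illustration, not the authors’ code, and the toy labels in the example are invented rather than taken from the study. The “poor” and “slight” bands in landis_koch_label come from Landis and Koch’s (1977) full scale; the other bands match the benchmarks quoted in the text.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of recording units (tags) placed in the same category by both coders."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same non-empty list of tags")
    return sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Cohen's (1960) kappa: observed agreement corrected for chance agreement."""
    p_o = percent_agreement(coder_a, coder_b)
    n = len(coder_a)
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: both coders independently choose the same category.
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1:  # both coders used one identical category throughout
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def landis_koch_label(kappa):
    """Verbal benchmark for a kappa value (Landis & Koch, 1977)."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Invented example: two coders assign ten tags to the three levels.
a = ["sub", "sub", "basic", "basic", "super", "sub", "basic", "sub", "basic", "sub"]
b = ["sub", "basic", "basic", "basic", "super", "sub", "basic", "sub", "sub", "sub"]
```

Here the coders agree on 8 of 10 tags (0.80). Chance agreement is (5·5 + 4·4 + 1·1) / 100 = 0.42, so kappa is (0.80 − 0.42) / 0.58 ≈ 0.66, which the Landis and Koch scale labels “substantial.” This shows why kappa is always lower than raw percent agreement.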
Results

Description of the samples of geotagged still and moving images and their tags

In total, our sample consisted of 736 geotagged still images and 1467 geotagged moving images, whose 390 and 791 owners / users assigned a total of 3826 and 4272 tags, respectively (see Table 1 for details). About half (49.86 %) of the 736 still images were assigned a total of 805 tags that are geographic names (21.04 % of all tags assigned to still images). Similarly, 543 (37.01 %) of the moving images were assigned 779 tags that are geographic names (18.24 % of all tags assigned to moving images). Our sample contained a comparable number of tags for still and moving images, even though there were about twice as many moving images and moving image owners / users as still images and still image owners / users. The samples are also comparable in the range of the number of tags assigned and in the relative variation of the number of tags per image and per owner. In both cases, users assigned a wide range of tags (between 1 and 133 per owner for still images and between 1 and 112 per owner for moving images). This could be because Flickr users are free to assign as many tags to their images as they wish, and tagging can be collaborative; that is, multiple users can assign tags to the same image.
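The per-image and per-owner statistics in Table 1 are standard descriptive statistics over lists of tag counts. The sketch below shows one way to compute them with Python’s statistics module. It is a reconstruction under assumptions: the dictionaries image_tags and image_owner are hypothetical input structures, and the chapter does not say whether its standard deviations are sample or population values (the sketch uses the sample form, statistics.stdev). The G columns appear to be computed only over images that have at least one geographic-name tag, since 805 / 367 ≈ 2.19 matches the reported mean, so any input list should be filtered the same way. From Python 3.8 onward, statistics.mode returns the first mode encountered when several values tie.

```python
import statistics
from collections import defaultdict


def tag_count_summary(counts):
    """Summary statistics for a list of per-image (or per-owner) tag counts."""
    return {
        "min": min(counts),
        "max": max(counts),
        "mean": statistics.mean(counts),
        "median": statistics.median(counts),
        "mode": statistics.mode(counts),
        "sd": statistics.stdev(counts),  # sample standard deviation (n - 1)
    }


def counts_per_owner(image_tags, image_owner):
    """Aggregate per-image tag counts into per-owner totals."""
    totals = defaultdict(int)
    for image_id, tags in image_tags.items():
        totals[image_owner[image_id]] += len(tags)
    return list(totals.values())


# Hypothetical toy data: three images owned by two users.
image_tags = {"p1": ["beach", "sunset"], "p2": ["paris"], "p3": ["dog", "park", "tree"]}
image_owner = {"p1": "u1", "p2": "u1", "p3": "u2"}
per_image = [len(tags) for tags in image_tags.values()]  # [2, 1, 3]
per_owner = counts_per_owner(image_tags, image_owner)    # [3, 3]
```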
                                 Still images (photos)                           Moving images (videos)
                                 G               NG              C               G               NG              C
Number of images (%)             367 (49.86 %)   686 (93.21 %)   736 (100 %)*    543 (37.01 %)   1307 (89.09 %)  1467 (100 %)*
Number of owners / users (%)     204 (52.31 %)   371 (95.13 %)   390 (100 %)*    276 (34.89 %)   732 (92.54 %)   791 (100 %)*
Total number of tags (%)         805 (21.04 %)   3021 (78.96 %)  3826 (100 %)    779 (18.24 %)   3493 (81.76 %)  4272 (100 %)
Tags per image: minimum          1               1               1               1               1               1
Tags per owner / user: minimum   1               1               1               1               1               1
Tags per image: maximum          16              60              67              9               32              34
Tags per owner / user: maximum   42              130             133             36              108             112
Tags per image: mean             2.19            4.4             5.2             1.43            2.67            2.91
Tags per owner / user: mean      3.95            8.14            9.81            2.82            4.77            5.4
Tags per image: median           2               2               3               1               2               2
Tags per owner / user: median    2               4               5               2               2               3
Tags per image: mode             1               1               1               1               1               1
Tags per owner / user: mode      1               1               2               1               1               1
Tags per image: SD               1.85            5.39            5.98            0.88            2.87            3.09
Tags per owner / user: SD        5.24            13.6            15.18           3.8             7.93            9.01

G = tags that are geographic names; NG = tags that are not geographic names; C = combined (total of G and NG); SD = standard deviation. *This is not equal to the sum of G and NG because some images / owners had both types of tags.
Table 1. Descriptive statistics for the number of tags that are geographic and non-geographic names assigned to geotagged still and moving images.
Overall, more tags for still images were at the subordinate level than at either the basic or the superordinate level: 53.35 % were at the subordinate level, 41.64 % at the basic level, and only 5.02 % at the superordinate level (see Table 2). A similar picture emerged for moving images. In total, 4272 tags were assigned to 1467 moving images. Once again, a slightly larger proportion of tags was at the subordinate level (51.33 %) than at the basic level (43.47 %), and very few of the tags assigned to moving images (5.20 %) were at the superordinate level. Of the combined set of 8098 tags for both still and moving images, 52.29 %, 42.60 %, and 5.11 % were at the subordinate, basic, and superordinate levels, respectively.
Type of tags  Level of abstraction  Still images (Photos)   Moving images (Videos)  Total
                                    Freq.     %             Freq.     %             Freq.     %
G             Subordinate           453       56.27         425       54.56         878       55.43
G             Basic                 235       29.19         219       28.11         454       28.66
G             Superordinate         117       14.53         135       17.33         252       15.91
G             Total                 805       100.00        779       100.00        1584      100.00
NG            Subordinate           1588      52.57         1768      50.62         3356      51.52
NG            Basic                 1358      44.95         1638      46.89         2996      45.99
NG            Superordinate         75        2.48          87        2.49          162       2.49
NG            Total                 3021      100.00        3493      100.00        6514      100.00
C             Subordinate           2041      53.35         2193      51.33         4234      52.29
C             Basic                 1593      41.64         1857      43.47         3450      42.60
C             Superordinate         192       5.02          222       5.20          414       5.11
C             Total                 3826      100.00        4272      100.00        8098      100.00

G = tags that are geographic names; NG = tags that are not geographic names; C = combined (total of G and NG).
Table 2. Frequency and percentage of tags assigned to geotagged still and moving images that are at the subordinate, basic, and superordinate levels.
When the tags were separated into tags that are geographic names and tags that are not, the proportions at the three levels did not change much. The exception was that, among tags that are geographic names, the percentage of basic level tags was lower and the percentage of superordinate level tags was higher. Interestingly, this pattern holds for the tags assigned to both still and moving images. For instance, there were more subordinate level tags that are geographic names (56.27 % and 54.56 % of all geographic name tags assigned to still and moving images, respectively) than basic level (29.19 % and 28.11 %, respectively) or superordinate level (14.53 % and 17.33 %, respectively) tags that are geographic names. There were also more non-geographic name tags at the subordinate level (52.57 % and 50.62 % of all non-geographic name tags assigned to still and moving images, respectively) than at the basic (44.95 % and 46.89 %, respectively) and superordinate levels (2.48 % and 2.49 %, respectively). Based on the figures in Table 2, it appears that, at least in terms of their frequencies, tags at the three levels of abstraction assigned to still and moving images on Flickr have a similar structure. However, further analysis is needed to test this apparent similarity, that is, to determine whether there is a statistically significant difference between the proportions of subordinate, basic, and superordinate level tags assigned to still and moving images. We discuss this in the next section.
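Once tags have been coded, the frequencies and percentages in Table 2 are simple tallies. The sketch below shows that bookkeeping in Python under clearly stated assumptions. The CODING_DICTIONARY entries are hypothetical examples drawn from this chapter’s own illustrations, not the Rorissa (2008) or Rorissa and Iyer (2008) dictionaries. Tags not found in the dictionary are returned for manual coding, reflecting the fact that the study’s coders also viewed the images.

```python
from collections import Counter

LEVELS = ("subordinate", "basic", "superordinate")

# Hypothetical coding dictionary for illustration only.
CODING_DICTIONARY = {
    "myrtle beach": "subordinate",
    "new york city": "subordinate",
    "beach": "basic",
    "mountains": "basic",
    "ocean": "superordinate",
    "north america": "superordinate",
}


def code_tags(tags, dictionary=CODING_DICTIONARY):
    """Split tags into dictionary-coded levels and tags left for a human coder."""
    coded, uncoded = [], []
    for tag in tags:
        level = dictionary.get(tag.strip().lower())
        if level is None:
            uncoded.append(tag)  # ambiguous or unknown: needs manual coding
        else:
            coded.append(level)
    return coded, uncoded


def level_distribution(levels):
    """Frequency and percentage (2 decimals) of coded tags at each level, as in Table 2."""
    counts = Counter(levels)
    total = sum(counts[level] for level in LEVELS)
    return {
        level: (counts[level], round(100 * counts[level] / total, 2) if total else 0.0)
        for level in LEVELS
    }
```

Feeding level_distribution the coded levels of the 805 geographic-name tags on still images (453 subordinate, 235 basic, 117 superordinate) reproduces the first column block of Table 2: 56.27 %, 29.19 %, and 14.53 %.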
Differences between tags assigned to geotagged still and moving images

Our main goal in this work was to determine the extent to which tags assigned to still and moving images are similar or different in their level of abstraction, using basic level theory and its three levels (subordinate, basic, and superordinate) as a framework. To achieve this goal, we compared our samples of tags assigned to still and moving images on Flickr using a chi-square (χ2) analysis and t-tests, based on the figures in Tables 2 and 4, respectively. For all three groups of tags (geographic name tags, non-geographic name tags, and all tags combined), the chi-square values were not statistically significant (Table 3). This is consistent with our observations on the proportions of tags at the three levels above: the data provide no evidence of a relationship between the type of geotagged image (still or moving) and the level of abstraction (subordinate, basic, or superordinate) of Flickr tags. In other words, the levels of abstraction of tags assigned to geotagged still and moving images do not differ significantly with respect to basic level theory.
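The chi-square values in Table 3 can be reproduced from the counts in Table 2 with a standard Pearson test of independence on each 2 × 3 table (image type by level of abstraction). The sketch below is our reconstruction, not the authors’ code. It relies on the fact that, for 2 degrees of freedom, the chi-square upper-tail probability is exactly exp(−χ2 / 2), so no statistics library is needed. The helper chi_square_p_value_df2 is only valid for that case.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square test of independence for an r x c table of counts.

    Returns (chi2, df); each expected count is row_total * column_total / n.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df


def chi_square_p_value_df2(chi2):
    """Upper-tail p-value; the closed form exp(-x / 2) holds only for df = 2."""
    return math.exp(-chi2 / 2)


# Geographic-name tags from Table 2: rows = still, moving; columns = sub, basic, super.
geo_tags = [[453, 235, 117], [425, 219, 135]]
chi2, df = chi_square_independence(geo_tags)
p = chi_square_p_value_df2(chi2)
```

For the geographic-name tags this gives χ2 ≈ 2.316 with 2 df and p ≈ 0.314, the first row of Table 3. Running it on the non-geographic and combined counts gives approximately 2.524 and 3.279, respectively.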
Type of tag                                 χ2       df    p-value
Tags that are geographic names (G)          2.316    2     0.314
Tags that are not geographic names (NG)     2.524    2     0.283
Combined (all tags)                         3.279    2     0.194

Table 3. Chi-square values for the relationship between type of geotagged image (still & moving) and level of abstraction of tags (subordinate, basic, & superordinate).
                          Subordinate     Basic           Superordinate
Group             n       Mean   SD       Mean   SD       Mean   SD
Still images (photos)
Images (G)        367     1.23   1.27     0.64   0.93     0.32   0.59
Images (NG)       686     2.31   3.25     1.98   2.99     0.11   0.41
Images (C)        736*    2.77   3.46     2.16   3.24     0.26   0.64
Owners (G)        204     2.22   3.74     1.15   2.01     0.57   1.19
Owners (NG)       371     4.28   8.10     3.66   7.62     0.20   0.65
Owners (C)        390*    5.23   8.97     4.08   8.07     0.49   1.16
Moving images (videos)
Images (G)        543     0.78   0.71     0.40   0.63     0.25   0.47
Images (NG)       1307    1.35   2.01     1.25   1.48     0.07   0.28
Images (C)        1467*   1.49   2.11     1.27   1.52     0.15   0.44
Owners (G)        276     1.54   3.00     0.79   1.57     0.49   1.43
Owners (NG)       732     2.42   4.29     2.24   5.56     0.12   0.68
Owners (C)        791*    2.77   5.23     2.35   5.48     0.28   1.18

G = tags that are geographic names; NG = tags that are not geographic names; C = combined (total of G and NG); n = number of still images, moving images, or owners / users; SD = standard deviation. *This is not equal to the sum of G and NG because some images / owners had both types of tags.
Table 4. The mean and standard deviation of the number of tags assigned to geotagged still and moving images vis-à-vis the three levels.
The chi-square (χ2) test results showed that, at least for our sample of Flickr tags, the distribution of tags assigned to still images across the three levels of abstraction does not differ from that of tags assigned to moving images. This does not mean, however, that the average number of tags assigned to the two types of images at the same level of abstraction is the same. We therefore needed to ascertain whether corresponding levels of abstraction had the same or different average numbers of tags assigned to still and moving images; for example, whether there were, on average, more (or fewer) subordinate level tags assigned to still images than to moving images. To address this, we conducted a number of unequal variance t-tests for corresponding levels of abstraction (that is, subordinate vs. subordinate, basic vs. basic, etc.).
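Table 4 reports means, standard deviations, and sample sizes, which is enough to recompute the t-statistics without the raw data. The sketch below implements both the pooled-variance (Student) form and the unequal-variance (Welch) form from summary statistics. It is our reconstruction, since the authors’ software is not reported. One observation should be treated with caution because the standard deviations in Table 4 are rounded. For subordinate geographic-name tags per image, the pooled form gives t ≈ 6.85 on 908 df, close to the reported t(908) = 6.87, whereas the Welch form gives t ≈ 6.19 on roughly 521 df. The df column of Table 5 (n1 + n2 − 2 throughout) is likewise the pooled-form value.

```python
import math


def pooled_t(m1, s1, n1, m2, s2, n2):
    """Student's two-sample t with pooled variance; df = n1 + n2 - 2."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's unequal-variance t with Welch-Satterthwaite degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


# Subordinate geographic-name tags per image (Table 4). Means are recomputed from
# the Table 2 counts (453 / 367 and 425 / 543) to avoid double rounding.
still = (453 / 367, 1.27, 367)
moving = (425 / 543, 0.71, 543)
t_pooled, df_pooled = pooled_t(*still, *moving)
t_welch, df_welch = welch_t(*still, *moving)
```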
              Subordinate                  Basic                        Superordinate
Group         t-value   df     p-value     t-value   df     p-value     t-value   df     p-value
Images (G)    6.87*     908    0.00        4.59*     908    0.00001     1.98**    908    0.048
Images (NG)   8.14*     1991   0.00        7.26*     1991   0.00        2.75*     1991   0.0059
Images (C)    10.74*    2201   0.00        8.85*     2201   0.00        4.71*     2201   0.00
Owners (G)    2.21**    478    0.027       2.19**    478    0.0289      0.68      478    0.495
Owners (NG)   4.99*     1101   0.00        3.53*     1101   0.00043     1.95***   1101   0.0509
Owners (C)    5.93*     1179   0.00        4.35*     1179   0.00001     2.92*     1179   0.0036

G = tags that are geographic names; NG = tags that are not geographic names; C = combined (total of G and NG). *p ≤ 0.01, **p ≤ 0.05, ***p ≤ 0.10.
Table 5. Comparison of the number of tags assigned to geotagged still and moving images at each level of abstraction.
As Table 5 shows, all but one of the 18 t-values reached at least the p ≤ 0.10 level (p ≤ 0.01 for 13 of them, p ≤ 0.05 for three, and p ≤ 0.10 for one). This indicates that, even though the overall structures are similar, the average numbers of tags assigned to geotagged still and moving images differ at the same levels of abstraction. In every comparison, the mean number of subordinate, basic, and superordinate level tags was higher for geotagged still images, and for owners / users of geotagged still images, than for geotagged moving images and their owners / users. For instance, the average number of subordinate geographic name tags assigned to still images (M = 1.23) was significantly higher than the average number assigned to moving images (M = 0.78; t(908) = 6.87, p < 0.01). The same is true for tags that are not geographic names and for all tags combined.
Discussion

Summary of results, limitations, and possible biases

Multimedia content currently uses far more data storage than traditional textual data. A 6-megapixel photo can be stored as a high-quality JPEG image using about 5 megabytes of data. To put this in perspective, storing a single such image is equivalent to storing roughly 2,500 pages of text (Joint Photographic Experts Group, 2010). In addition, multimedia accounts for a growing share of all data stored on the internet (Lyman & Varian, 2003). Despite this trend, techniques for storing the semantic information contained in these data have not been widely adopted in practice. With the extraordinary growth in social tagging, it is now possible to analyse large volumes of multimedia with user-defined content in order to develop improved indexing methods that support search and retrieval.

To this end, two random samples of 3826 and 4272 Flickr tags, assigned to 736 still and 1467 moving images by 390 and 791 owners / users, respectively, were downloaded from Flickr in December 2009 using the Flickr API. Our main objective was to determine the extent to which the levels of abstraction of tags assigned to still and moving images are similar or different. To this end, the tags were coded using basic level theory and its three levels of abstraction (subordinate, basic, and superordinate) as a framework, and the chi-square test was applied to the coded tags. To ensure the integrity of our analysis, coding was performed independently by two individuals; where the coders did not agree, the research team analysed the tag and reached a consensus. We also conducted a number of t-tests to see whether, on average, more tags at the same level of abstraction (e.g., subordinate) were assigned to still or to moving images. Our findings showed that tags assigned to geotagged still and moving images do not differ significantly in their levels of abstraction. That is, the type of image (still vs. moving) showed no significant relationship to the level of abstraction of the Flickr tags assigned to the images. However, pairwise, at all levels of abstraction, more tags on average were assigned to still images, and by owners / users of still images, than to moving images and by owners / users of moving images.
There were slightly more subordinate tags than basic or superordinate tags. This result conflicts with previous research showing that people most often gravitate toward the basic level when describing objects. The data in this study were collected unobtrusively, unlike previous research applying basic level theory to digital media, which was performed in laboratory settings where participants were asked to tag preselected images. The advantage of this data collection method is that users were not aware that their tags would be used in a study, which eliminates the potential for participant bias. Its limitation is that it offers no insight into users’ reasons for their tag choices, an area that deserves future research. A related possible bias is that people who use geotagging may share other traits and behaviours in organizing digital objects, or may have similar digital lifestyles. For instance, some geotagging systems require a user to walk around with a separate Global Positioning System (GPS) device, to attach a GPS tracker to their camera case, or to have purchased a camera with built-in GPS. These methods may or may not require additional software processing to attach a geotag to a moving or still image. These people may be geotagging early adopters, and it follows that there may be similarities among the people who used these early geotagging methods. As mobile internet devices become more location-aware and more widely accepted, tagging behaviours may shift from those of early adopters to those of mainstream users. Although there is no automated way of determining the method by which images have been geotagged, this is worth noting for understanding potential shifts or trends in this type of research.

A limitation of our research is that the scope of our data collection was restricted to still and moving images on Flickr. Our results therefore rest on the assumption that Flickr tagging is representative of the tagging of still and moving images across different platforms. Also, our work examined the implications of only one factor, image format (still vs. moving), and more factors need to be examined. Further research is needed to test our results on different datasets containing user-generated tags on images from various online sources and demographics.
Interpretation of results Without working with Flickr users directly to understand their reasons for choosing particular tags, we can only provide logical assumptions based on our results and explanations provided in prior research. With that caveat in mind, we offer some thoughts regarding potential explanations of our results here. Our hope is to not proclaim definitive conclusions in such an early stage of tagging research, but rather to stimulate further questioning and discussion in the wider image indexing and retrieval research community. We offer two possible reasons why more tags were provided for still images than for moving images. One reason may be that video is thought to be more difficult to describe and retrieve (Enser, 2008). In addition to the technical difficulties inherent in processing automatic extraction and detection for content-based video retrieval, challenges arise due to the intrinsic nature of a moving image. It may contain special visual effects, music, spoken dialogue, multiple scenes with a multitude of visual content, and a host of additional elements. While more content might exist in a moving image than in a still image, still images may require more elucidation due to the relative lack of content. On the other hand, still images can evoke more instant reactions; we can quickly “index” our first reactions to their content with little cognitive effort. The human brain can process many details present in a scene within the first 100 milliseconds of a glance (Biederman, 1976; Intraub, 1980). A second possible reason is that while
204
Rorissa, Rasmussen Neal, Muckell, Chaucer
there is an element of a photograph that cannot be expressed in words (Barthes, 1981; O’Connor & Wyatt, 2004), there is a benefit in reading associated textual descriptions such as tags and captions in order to understand a photograph’s content and context (Neal, 2010). When considering the wealth of information that a moving image conveys compared to a still image, Neal’s finding seems particularly relevant. Also, if in fact our findings that tags assigned to still and moving images are not statistically significantly different in terms of their structure are confirmed by future research, there are implications for whether there is a need for interoperability between terminologies used in the description / indexing of the various formats of visual resources. Various potential reasons for the presence of more subordinate tags than basic level or superordinate tags exist. People create emotional connections to photographs of interest, leading to a desire to exercise control over how they are described and retrieved (Neal, 2006). It is possible that people perceive a stronger sense of control over our photographic representation if they provide tags with a higher level of specificity; this is an area for future research. Generic authoritative systems of description do not allow users to be as specific as they might like, especially if personal context is desired. For example, “Muskoka, Ontario, Canada” might be a plausible authoritative semantic geotag, but “Aunt Sarah’s Muskoka cottage” would not be. Adding to this equation is Wilson’s (1968) observation regarding the individual subjectivity present in “reading” a document: “[w]hat seems to stand out depends on us as well as on the writing, on what we are ready to notice, what catches our interest, what absorbs our attention” (p. 82). 
Beyond such innate differences between viewers, single items and small details frequently capture our attention when we view a picture, as vision research has shown (Buswell, 1935; Gombrich, 1968; Mackworth & Morandi, 1967).
Concluding remarks and recommendations for future work
The results of this study may apply to general image collections and to collections of images built by individuals or groups of general users. However, future studies should compare them with results obtained from specialized image collections (e.g., medical images, news photographs) and users (e.g., doctors, journalists). Our findings also differ from those obtained by others with respect to the level of abstraction of image tags and descriptions (e.g., see Hare et al., 2007; Rorissa, 2008; Rorissa & Iyer, 2008). Future research needs to examine whether the sources and/or types of images and tags/descriptions have some effect on
Chapter 8. An exploration of tags assigned to geotagged still
the level of abstraction of the tags/descriptions. Researchers also need to look further into the similarities and/or differences between tags assigned to still and moving images, and to geotagged still and moving images, using theoretical frameworks other than basic level theory. Additionally, relationships between various types of tags, such as emotion-based tags (Neal, 2010; Schmidt & Stock, 2009) and geotags, should be investigated in order to better understand how people make lexical connections when assigning tags; such findings would have implications for improved browsing interfaces. As mentioned earlier, qualitative studies of users’ tagging behaviours would help researchers understand how to support tagging and how to design search mechanisms and result set interfaces. Geotags have tremendous potential for digital library implementation, but additional work is needed. Geotagged still and moving images could serve as training sets for indexing and retrieval of visual resources in digital libraries, so that still and moving images with similar latitude and longitude information could be automatically labelled and indexed. The fact that the tags assigned to the geotagged visual resources were assigned in a distributed manner, by several taggers/indexers, adds value to the automatically assigned tags and index terms. Researchers and practicing information professionals involved in the design and implementation of digital libraries might consider using both semantic and numeric geotags to automate the indexing of the location content of still and moving images. Another area for future investigation is when still and moving images are geotagged and tagged, and the associated effect on tag frequency and level of detail. In other words, are still or moving images being geotagged and tagged at the point of collection, or after the fact on a computer?
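The training-set idea above can be illustrated with a simple nearest-neighbour tag propagation. This is a minimal sketch under stated assumptions, not a method from this study: the function names `haversine_km` and `suggest_tags`, the 1 km radius, the `min_support` threshold, and the coordinates and tags in the usage lines are all hypothetical. Requiring several nearby items to agree on a tag is one way to exploit the distributed, multi-tagger nature of the training data.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def suggest_tags(target, training, radius_km=1.0, min_support=2):
    """Suggest tags for an untagged item from geotagged, tagged neighbours.

    target:   (lat, lon) of the item to label.
    training: list of ((lat, lon), set_of_tags) pairs.
    A tag is suggested only if at least `min_support` items within
    `radius_km` carry it, so tags shared by several taggers win.
    """
    counts = Counter()
    for (lat, lon), tags in training:
        if haversine_km(target[0], target[1], lat, lon) <= radius_km:
            counts.update(set(tags))
    return sorted(tag for tag, n in counts.items() if n >= min_support)


# Hypothetical training data: three items near one landmark, one far away.
training = [
    ((43.6426, -79.3871), {"cn tower", "toronto"}),
    ((43.6430, -79.3865), {"cn tower", "night"}),
    ((43.6420, -79.3880), {"toronto", "cn tower"}),
    ((45.4215, -75.6972), {"ottawa", "parliament"}),
]
print(suggest_tags((43.6425, -79.3870), training))  # ['cn tower', 'toronto']
```

The distance threshold and support count are the two knobs: a larger radius pulls in more evidence but risks mixing neighbouring places, while a higher `min_support` trades recall for agreement among taggers.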
With the proliferation of mobile internet devices with built-in geotagging capabilities, one would expect an increase in situational, point-of-collection geotagging and tagging. At the same time, personal image and moving image collection management software is improving in ways that allow post-collection geotagging and tagging. As these systems improve, both on mobile devices and in computer-based collection management software or websites, tagging behaviour may change as well. It is possible that situational tagging yields different types of tags than post-collection tagging, perhaps offering more tags relevant to the moment of image collection than tagging that relies on memory of that context. Additionally, the method by which tags are assigned in situ may also affect the tags. As speech-to-text capabilities improve on mobile internet devices capable of geotagging and tagging, the usage and nature of tag assignment may change when users can speak a tag after capturing an image instead of having to type it on a tiny mobile keyboard.
Understanding how geotagged still and moving images aggregate around locations may also be relevant to personal navigation. User-assigned tags on geotagged images offer an opportunity to learn something about the actual location where each image was collected. For instance, Flickr has visualized geolocated country-name tags, applied polygonal boundaries to such tags, and thereby “recreated” a map of the world from these tag-based boundaries. There may be other uses for geolocated tag-based information. Consider using a mobile internet device to find a famous landmark. If users could query a geolocated tag “cloud”, or some other aggregation of tags, while walking around a city, they would likely be able to use the device to find the landmark of interest. If the geotagging trend continues, geotags will likely be very useful for navigating the world and searching for objects and items of interest through digital maps or augmented reality systems.
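The landmark-finding scenario above amounts to a radius query followed by a frequency count. The sketch below is one minimal way to build such a local tag cloud; it is not how Flickr or any particular device does it, and the function name `local_tag_cloud`, the 0.5 km default radius, and the sample photos are hypothetical.

```python
import math
from collections import Counter


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(a)))


def local_tag_cloud(position, photos, radius_km=0.5, top_k=5):
    """Return the top_k (tag, count) pairs for photos taken within radius_km.

    Each photo counts a tag at most once; tags are lower-cased so that
    "Parliament" and "parliament" are aggregated together.
    """
    counts = Counter()
    for (lat, lon), tags in photos:
        if haversine_km(position[0], position[1], lat, lon) <= radius_km:
            counts.update({tag.lower() for tag in tags})
    return counts.most_common(top_k)


# Hypothetical geotagged photos around a landmark, plus one far away.
photos = [
    ((45.4236, -75.7009), {"Parliament", "Ottawa"}),
    ((45.4238, -75.7005), {"parliament", "peace tower"}),
    ((45.4240, -75.7010), {"parliament", "ottawa", "canada"}),
    ((43.6426, -79.3871), {"toronto"}),
]
print(local_tag_cloud((45.4237, -75.7008), photos, top_k=2))
# [('parliament', 3), ('ottawa', 2)]
```

A production system would use a spatial index rather than a linear scan, but the aggregation step stays the same.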
Acknowledgments
We thank Andrew Noone and Greg Lloyd for coding and help in data analyses.
References
Ames, M., & Naaman, M. (2007). Why we tag: Motivations for annotation in mobile and online media. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI 2007).
Amitay, E., Har’El, N., Sivan, R., & Soffer, A. (2004). Web-a-where: Geotagging web content. Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 273–280).
Angus, E., Thelwall, M., & Stuart, D. (2008). General patterns of tag usage among university groups in Flickr. Online Information Review, 32(1), 89–101.
Ay, S. A. A., Zimmermann, R., & Kim, S. H. (n.d.). Metadata organization and query optimization for large-scale geo-tagged video collections. Retrieved from http://eiger.ddns.comp.nus.edu.sg/pubs/TRC10-11.pdf
Bar-Ilan, J., Shoham, S., Idan, A., Miller, Y., & Shachak, A. (2008). Structured vs. unstructured tagging: A case study. Online Information Review, 32(5), 635–647.
Bar-Ilan, J., Zhitomirsky-Geffet, M., Miller, Y., & Shoham, S. (2010). The effects of background information and social interaction on image tagging. Journal of the American Society for Information Science and Technology, 61(5), 940–951.
Barthes, R. (1964). Rhetoric of the image. In S. Heath (Trans.), Image, music, text (pp. 32–51). Glasgow: William Collins Sons & Co Ltd.
Barthes, R. (1981). Camera lucida (R. Howard, Trans.). New York: Hill and Wang.
Bearman, D., & Trant, J. (2005, September). Social terminology enhancement through vernacular engagement: Exploring collaborative annotation to encourage interaction with museum collections. D-Lib Magazine, 11(9). Retrieved from www.dlib.org/dlib/september05/bearman/09bearman.html
Beaudoin, J. (2007). Flickr image tagging: Patterns made visible. Bulletin of the American Society for Information Science and Technology, 34(1). Retrieved from http://www.asis.org/Bulletin/Oct-07/beaudoin.html
Bertini, M., Del Bimbo, A., & Serra, G. (2008). Learning ontology rules for semantic video annotation. Proceedings of the 2nd ACM Workshop on Multimedia Semantics.
Biederman, I. (1976). On processing information from a glance at a scene: Some implications for a syntax and semantics of visual processing. Paper presented at the ACM/SIGGRAPH Workshop on User-Oriented Design of Interactive Graphics Systems.
Buswell, G. T. (1935). How people look at pictures. Chicago, IL: The University of Chicago Press.
Champ, H. (2009). 100,000,000 geotagged photos (plus) [Blog post]. Retrieved from http://blog.Flickr.net/en/2009/02/05/100000000-geotagged-photos-plus
Chen, M., Liu, X., & Qin, J. (2008). Semantic relation extraction from socially-generated tags: A methodology for metadata generation. Proceedings of the 2008 International Conference on Dublin Core and Metadata Applications (pp. 117–127).
Chen, W., Battestini, A., Gelfand, N., & Setlur, V. (2009). Visual summaries of popular landmarks from community photo collections. Proceedings of the 17th ACM International Conference on Multimedia (pp. 789–792).
Chen, Y., Chen, S., Gu, Y., Hui, M., Li, F., Liu, C., Liu, L., Ooi, B. C., Yang, X., Zhang, D., & Zhou, Y. (2009). MarcoPolo: A community system for sharing and integrating travel information on maps. Proceedings of the 12th International Conference on Extending Database Technology (pp. 1148–1151).
Chung, E., & Yoon, J. (2009). Categorical and specificity differences between user-supplied tags and search query terms for images: An analysis of Flickr tags and Web image search queries. Information Research, 14(3). Retrieved from http://InformationR.net/ir/14-3/paper408.htm
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.
Cox, A. M. (2008). Flickr: A case study of Web 2.0. Aslib Proceedings, 60(5), 493–516.
Crandall, D. J., Backstrom, L., Huttenlocher, D., & Kleinberg, J. (2009). Mapping the world’s photos. Proceedings of the 18th International Conference on the World Wide Web (pp. 761–770).
Elwood, S. (2008). Geographic information science: New geovisualization technologies – emerging questions and linkages with GIScience research. Progress in Human Geography, 33(2), 256–263.
Enser, P. G. B. (2000). Visual image retrieval: Seeking the alliance of concept-based and content-based paradigms. Journal of Information Science, 26(4), 199–210.
Enser, P. G. B. (2008). The evolution of visual information retrieval. Journal of Information Science, 34(4), 531–546.
Federal Geographic Data Committee. (2011). Geospatial metadata standards – The North American Profile (NAP) of the ISO 19115: Geographic information – metadata. Retrieved from http://www.fgdc.gov/metadata/geospatial-metadata-standards#nap
Fidel, R. (1994). User-centered indexing. Journal of the American Society for Information Science, 45, 572–576.
Fox, R. (2006). Cataloging for the masses. OCLC Systems & Services, 22, 166–172.
Fraser, B., & Gluck, M. (1999). Usability of geospatial metadata or space-time matters. Bulletin of the American Society for Information Science and Technology, 25(6), 24–28. Retrieved from http://www.asis.org/Bulletin/Aug-99/fraser_gluck.html
Gombrich, E. H. (1968). Art and illusion: A study in the psychology of pictorial representation. London: Phaidon Press.
Green, R. (2006). Vocabulary alignment via basic level concepts. OCLC/ALISE research grant report published electronically by OCLC Research. Retrieved from http://www.oclc.org/research/grants/reports/green/rg2005.pdf
Guy, M., & Tonkin, E. (2006). Folksonomies: Tidying up tags? D-Lib Magazine, 12(1). Retrieved from http://www.dlib.org/dlib/january06/guy/01guy.html
Hammond, T., Hannay, T., Lund, B., & Scott, J. (2005). Social bookmarking tools (I): A general review. D-Lib Magazine, 11(4). Retrieved from http://www.dlib.org/dlib/april05/hammond/04hammond.html
Hare, J. S., Lewis, P. H., Enser, P. G. B., & Sandom, C. J. (2007). Semantic facets: An in-depth analysis of a semantic image retrieval system. Proceedings of the 6th ACM International Conference on Image and Video Retrieval (pp. 250–257).
Hecht, B. J., & Gergle, D. (2010). On the localness of user-generated content. CSCW ’10: Proceedings of the 2010 ACM Conference on Computer Supported Cooperative Work (pp. 229–232).
Intraub, H. (1980). Presentation rate and the representation of briefly glimpsed pictures in memory. Journal of Experimental Psychology: Human Learning and Memory, 6(1), 1–12.
Jaffe, A., Naaman, M., Tassa, T., & Davis, M. (2006). Generating summaries and visualization for large collections of geo-referenced photographs. MIR ’06: Proceedings of the 8th ACM International Workshop on Multimedia Information Retrieval (pp. 89–98).
Jeong, W. (2009). Is tagging effective? – Overlapping ratios with other metadata fields. Proceedings of the International Conference on Dublin Core and Metadata Applications.
Joint Photographic Experts Group. (2010). The JPEG Committee home page. Retrieved from http://www.jpeg.org/
Jörgensen, C. (1995). Image attributes: An investigation (Unpublished doctoral dissertation). Syracuse University, Syracuse, NY.
Jörgensen, C. (1996). Indexing images: Testing an image description template. Proceedings of the 59th Annual Meeting of the American Society for Information Science, 209–213.
Jörgensen, C. (1998). Attributes of images in describing tasks. Information Processing & Management, 34(2/3), 161–174.
Jörgensen, C. (2003). Image retrieval: Theory and research. Lanham, MD: Scarecrow Press.
Joshi, D., & Luo, J. (2008). Inferring generic activities and events from image content and bags of geo-tags. CIVR ’08: Proceedings of the 2008 International Conference on Content-Based Image and Video Retrieval (pp. 37–46).
Keßler, C., Maué, P., Heuer, J. T., & Bartoschek, T. (2009). Bottom-up gazetteers: Learning from the implicit semantics of geotags. Proceedings of the Third International Conference on GeoSpatial Semantics (pp. 83–102).
Laine-Hernandez, M., & Westman, S. (2006). Image semantics in the description and categorization of journalistic photographs. Proceedings of the 69th Annual Meeting of the American Society for Information Science and Technology. Retrieved from http://www.asis.org/Conferences/AM06/proceedings/papers/48/48_paper.html
Lakoff, G. (1985). Women, fire, and dangerous things: What categories reveal about the mind. Chicago: The University of Chicago Press.
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174.
Lee, H-J., & Neal, D. (2010). A new model for semantic photograph description combining basic levels and user-assigned descriptors. Journal of Information Science, 36(5), 547–565.
Lee, S. S., Won, D., & McLeod, D. (2008). Tag-geotag correlation in social networks. SSM ’08: Proceedings of the 2008 ACM Workshop on Search in Social Media (pp. 59–66).
Library of Congress. (2010). MARC 21 format for bibliographic data. Retrieved from http://www.loc.gov/marc/bibliographic/
Lyman, P., & Varian, H. R. (2003). How much information 2003? Retrieved from http://www.sims.berkeley.edu/research/projects/how-much-info-2003/
Mackworth, N. H., & Morandi, A. J. (1967). The gaze selects informative details within pictures. Perception & Psychophysics, 2(11), 547–552.
Morris, S., Tuttle, J., & Essic, J. (2009). A partnership framework for geospatial data preservation in North Carolina. Library Trends, 57(3), 516–540.
Moxley, E., Kleban, J., & Manjunath, B. S. (2008). SpiritTagger: A geo-aware tag suggestion tool mined from Flickr. MIR ’08: Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval (pp. 24–30).
Neal, D. R. (2006). News photography image retrieval practices: Locus of control in two contexts (Unpublished doctoral dissertation). University of North Texas, Denton, TX.
Neal, D. M. (2010). Emotion-based tags in photographic documents: The interplay of text, image, and social influence. Canadian Journal of Information and Library Science, 34(3), 329–353.
Neuendorf, K. A. (2002). The content analysis guidebook. Thousand Oaks, CA: Sage Publications.
O’Connor, B. C. (1996). Explorations in indexing and abstracting: Pointing, virtue, and power. Englewood, CO: Libraries Unlimited.
O’Connor, B. C., O’Connor, M. K., & Abbas, J. M. (1999). User reactions as access mechanism: An exploration based on captions for images. Journal of the American Society for Information Science and Technology, 50(8), 681–697.
O’Connor, B. C., & Wyatt, R. B. (2004). Photo provocations: Thinking in, with, and about photographs. Lanham, MD: Scarecrow Press.
Olson, H. A., & Wolfram, D. (2008). Syntagmatic relationships and indexing consistency on a larger scale. Journal of Documentation, 64(4), 602–615.
Panofsky, E. (1955). Meaning in the visual arts: Papers in and on art history. Garden City, NY: Doubleday.
Peters, I. (2009). Folksonomies, indexing, and retrieval in Web 2.0 (P. Becker, Trans.). Berlin: De Gruyter Saur.
Quack, T., Leibe, B., & Van Gool, L. (2008). World-scale mining of objects and events from community photo collections. CIVR ’08: Proceedings of the 2008 International Conference on Content-Based Image and Video Retrieval (pp. 47–56).
Rafferty, P., & Hidderley, R. (2007). Flickr and democratic indexing: Dialogic approaches to indexing. Aslib Proceedings, 59(4/5), 397–410.
Rorissa, A. (2008). User generated descriptions of individual images versus labels of groups of images: A comparison using basic level theory. Information Processing & Management, 44(5), 1741–1753.
Rorissa, A., & Iyer, H. (2008). Theories of cognition and image categorization: What category labels reveal about basic level theory. Journal of the American Society for Information Science and Technology, 59(9), 1383–1392.
Rosch, E. (1975). Cognitive representations of semantic categories. Journal of Experimental Psychology: General, 104(3), 192–233.
Rosch, E., & Mervis, C. (1975). Family resemblances: Studies in the internal structure of categories. Cognitive Psychology, 7(4), 573–605.
Rosch, E., Mervis, C. B., Gray, W. D., Johnson, D. M., & Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8(3), 382–439.
Schmidt, S., & Stock, W. G. (2009). Collective indexing of emotions in images: A study in emotional information retrieval. Journal of the American Society for Information Science and Technology, 60(5), 863–876.
Steinhart, G. (2006). Libraries as distributors of geospatial data: Data management policies as tools for managing partnerships. Library Trends, 55(2), 264–284.
Stvilia, B., & Jörgensen, C. (2009). User-generated collection-level metadata in an online photo-sharing system. Library & Information Science Research, 31(1), 54–65.
Tjondronegoro, D., Spink, A., & Jansen, B. J. (2009). A study and comparison of multimedia Web searching: 1997–2006. Journal of the American Society for Information Science and Technology, 60(9), 1756–1768.
Torniai, C., Battle, S., & Cayzer, S. (2007). Sharing, discovering, and browsing geotagged pictures on the Web. Retrieved from http://www.hpl.hp.com/techreports/2007/HPL-2007-73.html
Trant, J. (2006). Exploring the potential for social tagging and folksonomy in art museums: Proof of concept. New Review of Hypermedia and Multimedia, 12, 83–105.
U.S. Census Bureau, Geography Division. (2010). American National Standards Institute (ANSI) Codes. Retrieved from http://www.census.gov/geo/www/ansi/ansi.html
von Ahn, L., & Dabbish, L. (2004). Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 319–326).
Weber, R. P. (1990). Basic content analysis. Newbury Park, CA: Sage Publications.
Weinberger, K. Q., Slaney, M., & Van Zwol, R. (2008). Resolving tag ambiguity. Proceedings of the 16th ACM International Conference on Multimedia (pp. 111–120).
Wilson, P. (1968). Two kinds of power: An essay on bibliographical control. Berkeley, CA: University of California Press.
Wolfram, D., Olson, H. A., & Bloom, R. (2009). Measuring consistency for multiple taggers using vector space modeling. Journal of the American Society for Information Science and Technology, 60(10), 1995–2003.
World Wide Web Consortium. (2003). Exif vocabulary workspace – RDF schema. Retrieved from http://www.w3.org/2003/12/exif/
Yaegashi, K., & Yanai, K. (2009). Can geotags help image recognition? Proceedings of the 3rd Pacific Rim Symposium on Advances in Image and Video Technology (pp. 361–373).
Yanai, K., Yaegashi, K., & Qiu, B. (2009). Detecting cultural differences using consumer-generated geotagged photos. LOCWEB ’09: Proceedings of the 2nd International Workshop on Location and the Web.
Yoon, J. W. (2009). Towards a user-oriented thesaurus for non-domain-specific image collections. Information Processing & Management, 45(4), 452–468.
Zong, W., Wu, D., Sun, A., Lim, E., & Goh, D. H. (2005). On assigning place names to geography related web pages. JCDL ’05: Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (pp. 354–362).
Maayan Zhitomirsky-Geffet, Judit Bar-Ilan, Yitzchak Miller, Snunith Shoham
Chapter 9. Exploring the effectiveness of ontology based tagging versus free text tagging
Abstract: In this paper, we investigate the effectiveness, for further tagging of these and other images, of an ontology built from the popular tags that users assigned to a selected set of images. In our experiment, we systematically compared three types of user interfaces for image tagging: free-text based, ontology-based, and a mixed interface that incorporates both free-text based and ontology-based tagging. We found that users see ontological concepts as more reliable and better suited to image tagging, even for new images, but tend to avoid working with the ontology when other, simpler interfaces are provided.
Keywords: User-centred design, experimentation, human factors, social tagging, image tagging, folksonomy, ontology, free-text tagging
Maayan Zhitomirsky-Geffet, Instructor Doctor, Department of Information Science, Bar-Ilan University
Judit Bar-Ilan (corresponding author), Head of Department, Department of Information Science, Bar-Ilan University
Yitzchak Miller, Teaching Fellow, Department of Information Science, Bar-Ilan University
Snunith Shoham, Associate Professor, Department of Information Science, Bar-Ilan University
Introduction
One of the major Web 2.0 activities is the collaborative social tagging of textual information (such as electronic news articles on Delicious), products for sale (Amazon), images (Flickr), and video data (YouTube). Tag visualization methods appear to be quite useful for improving image retrieval. Integrating tag clouds (Flickr) or the “related terms” display of a visual search engine (http://www.kartoo.com) might lead users to more results than their initial search terms alone would retrieve (Neal, 2008). This rests on the theory that the most popular choices are often the most correct ones (Surowiecki, 2004).
Chapter 9. Exploring the effectiveness of ontology based tagging
Effective management of digital image collections has been intensively discussed during the past decade (Kherfi, Ziou, & Bernardi, 2004; Rorvig, Turner, & Moncada, 1999), but it still raises many research questions. These discussions have included questions such as which types of tags are most useful for image retrieval: those provided by image creators, librarians, or common users; tags that name depicted objects and events; descriptions of what is physically present in an image; or what an image symbolically represents (Neal, 2008; Shatford, 1986). Traditionally, predefined controlled vocabularies and ontologies were used as high-quality semantic sources for image tagging (Stvilia & Jörgensen, 2010). However, controlled vocabularies created by professionals are not well suited to common users’ information needs or to narrow knowledge fields. Recently, folksonomies (taxonomies based on user tags) have become quite popular as an aid for tagging visual objects (Neal, 2008). For example, the goal of the Steve.museum project is to “encourage visitor engagement with museum objects” through social tagging (www.steve.museum). It has been shown that user-assigned tags are often quite different from the descriptions provided by curators (Trant, 2009; Trant, Bearman, & Chun, 2007). However, it is still unclear how effective free-text tags, folksonomy tags, or ontology tag selection can be for image tagging. Hence, the primary research goals of this work are:
1. A qualitative and quantitative analysis and comparison of various types of tags: those coming from the predefined ontology (ontological tags) versus free-text associations (free-text tags).
2. Understanding how working with different interfaces, which supply various types of information to aid the image tagging process, influences user behaviour.
In order to explore how the various types of supplementary information provided influence user behaviour, we compared free-text based and ontology based tagging interfaces, as well as an interface that incorporates both options. To this end, three different groups of users participated in the experiment, which was divided into three stages. The groups differed in the order in which they were exposed to each of the interfaces. In addition, at the last stage of the experiment, every user in each group could see the tags given by other users in the group and change her choices by adding tags to or deleting tags from her own tag list. Further, to analyze the quality of the tags produced by the different groups, we define several evaluation criteria, such as tag usage ratio, stability or modification rate, tag popularity ratio, and the number of unpopular tags. Previous research has shown that the wisdom of the crowds helps users eliminate identification mistakes, and thus tag popularity can be used for measuring quality (Bar-Ilan, Zhitomirsky-Geffet, Miller, & Shoham, 2010).
Zhitomirsky-Geffet, Bar-Ilan, Miller, Shoham
We found that ontological tags always achieved broader user agreement (less than 5 % of tags were given by only one user vs. at least 25 % for free-text tags). On average, they were used by at least 45 % more distinct users than the free-text tags, and they were much more stable during the tag modification stages of the experiment (only 2–3 % of added/deleted tags from the ontology vs. 30–40 % from the free-text tag set). When the ontology was the first option for tag selection and the free-text interface was available only at a later stage, most of the tags overall were ontology tags (57 %). On the other hand, the free-text interface, when available before or in parallel with the ontology, was perceived as the easier option and therefore produced more free-text tags (65–75 % of all tags). The main implications of these findings are:
1. An ontology can be employed very effectively for image tagging when no other interfaces are available at the time of, or before, seeing the ontology. Therefore, there is a strong need for system development to support the indexing of images through the application of ontological terms.
2. Our study also revealed the complementary nature of free-text and ontological tags, which might create a sustainable basis for a dynamic process of collaborative ontology extension.
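The chapter names its evaluation criteria without giving formulas here, so the definitions below are assumptions chosen to match the wording, not the authors' exact measures. A minimal sketch: "unpopular" is read as a tag given by exactly one user, usage as the mean number of distinct users per tag, and the modification rate as added plus deleted tags over all tags a user held across two stages. The function names and the sample tag sets are hypothetical.

```python
from collections import Counter


def tag_popularity(assignments):
    """Number of distinct users who assigned each tag to one image.

    assignments: dict mapping a user id to the set of tags that user gave.
    """
    counts = Counter()
    for tags in assignments.values():
        counts.update(set(tags))
    return counts


def unpopular_share(assignments):
    """Share of distinct tags assigned by exactly one user."""
    counts = tag_popularity(assignments)
    if not counts:
        return 0.0
    return sum(1 for n in counts.values() if n == 1) / len(counts)


def mean_users_per_tag(assignments):
    """Average number of distinct users per distinct tag."""
    counts = tag_popularity(assignments)
    return sum(counts.values()) / len(counts) if counts else 0.0


def modification_rate(before, after):
    """Share of a user's tags that were added or deleted between two stages."""
    before, after = set(before), set(after)
    union = before | after
    return len(before ^ after) / len(union) if union else 0.0  # ^ = added plus deleted


# Hypothetical tags for one image from an ontology and a free-text interface.
ontology_tags = {"u1": {"menorah", "temple"}, "u2": {"menorah", "temple"}, "u3": {"menorah", "candelabrum"}}
free_text_tags = {"u1": {"lamp", "gold"}, "u2": {"menorah", "seven branches"}, "u3": {"menorah", "light"}}

print(unpopular_share(ontology_tags), unpopular_share(free_text_tags))  # 0.3333333333333333 0.8
print(mean_users_per_tag(ontology_tags), mean_users_per_tag(free_text_tags))  # 2.0 1.2
```

Comparing the two interfaces then reduces to comparing these numbers per image, or averaging them across images.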
Literature review
Free-text tags are easy to use, widely employed, and readily available for tagging and indexing visual data. Another advantage of user-assigned tags is that they reflect common users’ viewpoints and information needs. Brown, Hidderley, Griffin, and Rollason (1996), who gathered free-text terms from users for an image collection, reported that the use of user-defined free-text terms was “likely to be more successful and objective than the more rigid approaches which are typified by image retrieval systems today” (p. 118). However, such free-text tags are not well suited to collaborative processes (Kim, Scerri, Breslin, Decker, & Kim, 2008), since they are vulnerable to semantic ambiguity, grammatical variations and typos, spamming, and the subjective or even erroneous views of individual users. Furthermore, as stated by Garcia Castro and García-Silva (2009), free-text tags and the folksonomies based on them suffer from a lack of explicit semantics; that is, the meaning of the terms used is not formally defined and may vary between users and systems.
With the advent of Web 3.0 (the Semantic Web), numerous formal ontologies for image tagging have been created and discussed in the literature, such as Gruber’s (2007, 2008) TagOntology, the Social Semantic Cloud of Tags Ontology (Kim et al., 2008), and the Meaning of a Tag Ontology (Passant & Laublet, 2008). Hollink, Schreiber, Wielemaker, and Wielinga (2003) experimented with the Art and Architecture Thesaurus (Peterson, 1994; see also http://www.getty.edu/research/tools/vocabulary/aat/) and Iconclass (van der Waal, 1985) and concluded that ontologies can be helpful for image tagging and search. However, ontologies are rather challenging for common users to use (Kallergi, Fons, & Verbeek, 2009), and they are often too general to fit specific domains, such as Jewish cultural heritage in our case. Recently, several researchers have suggested combining controlled and uncontrolled vocabularies for tagging, thereby benefiting from the advantages of both tag sources: free-text tags and ontologies (Gruber, 2007; Ménard, 2009; Stvilia & Jörgensen, 2010). Some previous studies have attempted to compare and evaluate the quality of free-text versus ontological tags. Ménard (2009) performed a retrieval simulation and reported that retrieval is more effective and more satisfactory for the searcher when the images are indexed with the approach that combines controlled and uncontrolled vocabularies. The results also indicate that indexing with a controlled vocabulary is more efficient (in terms of the number of queries needed to retrieve an image) than indexing with an uncontrolled vocabulary. A later study by Stvilia and Jörgensen (2010) proposed criteria for comparing a folksonomy based on Flickr tags with two existing ontologies: the Thesaurus for Graphic Materials (TGM) and the Library of Congress Subject Headings (LCSH).
The authors evaluated the intrinsic quality of the folksonomy as the ratio of valid terms, and its relational quality as the ratio of valid terms not present in the baseline controlled vocabularies, in order to identify the scale and nature of the overlaps and differences between the tags and the terms from these ontologies. While 37 % of the user tags were invalid, more than half of the tags did not appear in the existing ontologies, and 25 % of these tags were valid noun phrases. The authors therefore concluded that folksonomies can help extend TGM and LCSH with user-generated terms and category relationships, which can make these vocabularies more accessible to different user communities, not just to information professionals.
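The ratios described above can be sketched as simple set arithmetic. This is a hypothetical re-creation, not Stvilia and Jörgensen's procedure: in their study validity was a manual judgement, which the `is_valid` predicate merely stands in for, and the denominator of the "valid among missing" ratio follows the "25 % of these tags" phrasing as an assumption. The function name, the toy vocabulary, and the sample tags are invented for illustration.

```python
def folksonomy_quality(tags, is_valid, vocabulary):
    """Ratios in the spirit of the intrinsic/relational quality measures above.

    tags:       iterable of distinct folksonomy tags.
    is_valid:   predicate standing in for the (manual) validity judgement.
    vocabulary: set of terms from a baseline controlled vocabulary.
    """
    tags = set(tags)
    if not tags:
        return {"valid": 0.0, "not_in_vocabulary": 0.0, "valid_among_missing": 0.0}
    valid = {t for t in tags if is_valid(t)}
    missing = tags - set(vocabulary)  # tags the baseline vocabulary lacks
    return {
        "valid": len(valid) / len(tags),  # intrinsic quality
        "not_in_vocabulary": len(missing) / len(tags),
        # share of the missing tags that are valid, i.e. candidate additions
        "valid_among_missing": len(valid & missing) / len(missing) if missing else 0.0,
    }


def looks_valid(tag):
    """Crude hypothetical stand-in: alphabetic and longer than two characters."""
    return tag.isalpha() and len(tag) > 2


# Hypothetical tags and a toy vocabulary.
tags = {"sunset", "beach", "lighthouse", "img_0042", "dsc_1", "me"}
vocabulary = {"sunset", "beach"}
print(folksonomy_quality(tags, looks_valid, vocabulary))
```

The valid tags missing from the vocabulary (here only "lighthouse") are the candidates a curator might review for addition to the controlled vocabulary.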
Methodology and experimental setting
In this chapter, we employ an ontological model based on the most popular user tags from our previous research (Zhitomirsky-Geffet, Bar-Ilan, Miller, & Shoham, 2010). However, unlike the folksonomies reviewed above, these tags were manually normalized and organized into semantically disambiguated synonym sets (synsets). These synsets were then interrelated by a variety of semantic relationships, including the classic hyponymy and meronymy as well as several domain-specific relationships, such as “figure_in” and “manifestation”. The ontology was built from user tags assigned to twelve images in the domain of Jewish cultural heritage. In the current experiment we use six of these images (Passover Haggadah, Menorah, Sukkah, Israel Declaration of Independence, The Western Wall, Cave of the Patriarchs; see Figures 1.1–1.6), which we refer to as base images, along with six other images (Wooden dreidel – Chanukah Spinning Top, Tzedakah – Charity, Moses, Jewish National Fund, Happy New Year – a greeting card, Torah Reading; see Figures 1.7–1.12), referred to as new images throughout the paper.
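The ontology described above (normalized synsets linked by typed relations) can be modelled as a set of synsets plus a set of relation triples. The sketch below is a minimal illustration under that assumption and is not the authors' implementation: the class names, the `hyponym_of` relation label, and the sample concepts are hypothetical; only “figure_in” is a relation name taken from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A normalized, disambiguated concept: a preferred label plus synonyms."""
    label: str
    synonyms: set = field(default_factory=set)


@dataclass
class TagOntology:
    synsets: dict = field(default_factory=dict)  # label -> Synset
    relations: set = field(default_factory=set)  # (source, relation, target) triples

    def add_synset(self, label, *synonyms):
        terms = {t.lower() for t in (label, *synonyms)}
        self.synsets[label] = Synset(label, terms)

    def relate(self, source, relation, target):
        # Both ends must already be synsets, so relations stay disambiguated.
        if source not in self.synsets or target not in self.synsets:
            raise KeyError(f"unknown synset in ({source}, {relation}, {target})")
        self.relations.add((source, relation, target))

    def normalize(self, tag):
        """Map a raw user tag to the label of the synset containing it, or None."""
        tag = tag.strip().lower()
        for synset in self.synsets.values():
            if tag in synset.synonyms:
                return synset.label
        return None

    def related(self, label, relation):
        return sorted(t for s, r, t in self.relations if s == label and r == relation)


# Hypothetical fragment, not taken from the authors' ontology.
onto = TagOntology()
onto.add_synset("menorah", "candelabrum", "seven-branched candlestick")
onto.add_synset("ritual object")
onto.add_synset("moses", "moshe")
onto.add_synset("passover", "pesach")
onto.relate("menorah", "hyponym_of", "ritual object")
onto.relate("moses", "figure_in", "passover")

print(onto.normalize("  Candelabrum "))       # menorah
print(onto.related("menorah", "hyponym_of"))  # ['ritual object']
```

Storing relations as triples keeps the model open to new domain-specific relation types without changing the data structure.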
Chapter 9. Exploring the effectiveness of ontology based tagging
Fig. 1.1 Israel Declaration of Independence (www.knesset.gov.il/docs/eng/megilat_eng.htm)
Fig. 1.2 Cave of the Patriarchs (www.hebron.co.il/top/str_Machpela.html)
Fig. 1.3 Passover Haggadah (haggadahsrus.com/y.Art10-Kibbutz1955.htm)
Fig. 1.4 Sukkah (http://www.chelm.org/jewish/chags/sukkot/suk1pic.html)
Fig. 1.5 Menorah (http://www.biblelight.net/temple.htm)
Fig. 1.6 The Western Wall (http://en.wikipedia.org/wiki/Western_Wall)
Fig. 1.7 Wooden dreidel (http://en.wikipedia.org/wiki/Dreidel)
Fig. 1.8 Tzedakah (Charity) (http://samsonblinded.org/blog/the-best-jewish-charity.htm)
Fig. 1.9 Moses (http://www.ort.org.il/year/eshavuot/tor.htm)
Fig. 1.10 Jewish National Fund (http://palestineposterproject.org/category/special-collection/jewish-national-fund)
Fig. 1.11 Happy New Year (http://atlasshrugs2000.typepad.com/atlas_shrugs/2005/10/lshana_tovah.html)
Fig. 1.12 Torah Reading (http://www.catholica.com.au/gc1/dg/020_dg_070308.php)
Figure 1. The images used in our experiment.
Figure 2. A screenshot of the ontology-based tagging interface. The area below the search box displays the ontology, and the text box below the “Add tag” box gives users detailed instructions on the tagging process. Users could search for a concept in the ontology through the free-text search box at the top of the screen, or browse the ontology starting from a perspective. The ontology was displayed in four columns, with the perspectives listed in the rightmost column (note that Hebrew is written from right to left). Selecting a perspective displays all the concepts assigned to it in the third column from the right. Selecting a concept in the third column highlights the perspectives to which it belongs; in the screenshot above, the first and third perspectives, art and religion, were selected. It also shows related concepts in the second and fourth columns. The second column from the right shows only hierarchically related parent concepts. The fourth column from the right shows hierarchically related child concepts as well as otherwise related concepts (e.g. time, location, agent). Double-clicking a concept adds it as a tag to the displayed image.
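The browsing behaviour described in this caption can be modelled as a few lookups over perspective membership and hierarchical links. The sketch below is hypothetical: `OntologyBrowser`, its method names and the sample concepts are assumptions, not the interface's actual implementation. Column numbers count from the right, as in the caption.

```python
from collections import defaultdict


class OntologyBrowser:
    """Hypothetical model of the four-column ontology browser in Figure 2."""

    def __init__(self):
        self.members = defaultdict(set)     # perspective -> concepts (column 3)
        self.parents = defaultdict(set)     # concept -> broader concepts (column 2)
        self.children = defaultdict(set)    # concept -> narrower concepts (column 4)
        self.related = defaultdict(set)     # concept -> (relation, concept) pairs (column 4)
        self.image_tags = defaultdict(set)  # image -> concepts added as tags

    def add_concept(self, concept, perspectives):
        for perspective in perspectives:
            self.members[perspective].add(concept)

    def add_broader(self, concept, parent):
        self.parents[concept].add(parent)
        self.children[parent].add(concept)

    def add_related(self, concept, relation, other):
        self.related[concept].add((relation, other))

    def concepts_in(self, perspective):
        """Selecting a perspective fills the third column."""
        return sorted(self.members.get(perspective, set()))

    def select(self, concept):
        """Selecting a concept highlights its perspectives and fills columns 2 and 4."""
        return {
            "highlighted": sorted(p for p, cs in self.members.items() if concept in cs),
            "column_2": sorted(self.parents.get(concept, set())),
            "column_4": sorted(self.children.get(concept, set()))
            + sorted(f"{rel}: {other}" for rel, other in self.related.get(concept, set())),
        }

    def double_click(self, image, concept):
        """Double-clicking a concept adds it as a tag to the displayed image."""
        self.image_tags[image].add(concept)


# Hypothetical concepts and relations.
b = OntologyBrowser()
b.add_concept("menorah", ["art", "religion"])
b.add_broader("menorah", "ritual object")
b.add_broader("hanukkah menorah", "menorah")
b.add_related("menorah", "location", "Temple")
print(b.select("menorah"))
```

Reads use `dict.get` rather than indexing, so browsing never creates empty entries in the underlying `defaultdict`s.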
To evaluate the effectiveness of such an ontology for image tagging, we conducted an experiment with three separate groups of human assessors, denoted G_OM, G_FM, and G_M. Each group was presented with the same set of twelve images in the area of Jewish cultural heritage and with some of the following distinct annotation interfaces:
1. (O) – Ontology-based annotation: the ontology is presented to the users, who may only select suitable tags from the ontology for each of the given images (see Figure 2);
2. (F) – Free-text annotation: users can assign any tags that come to mind as associations with the given image; and
3. (M) – Mixed interface: users can pick tags from the predefined ontology and, in parallel, use their own free-association tags for each image.
All the participants in the experiment were students at the Department of Information Science at Bar-Ilan University. Group G_OM consisted of 23 people who used the ontology-based interface at the first stage of the experiment and the mixed interface at the second stage. Group G_FM included 28 users, who first worked with the free-text annotation interface and then with the mixed interface at the second stage. Finally, group G_M, with 29 users, worked with the mixed interface at the first stage of the experiment and therefore did not take part in the second stage. Group sizes vary because each group corresponds to the students in a given course. At the third stage of the experiment, each group was shown the tags its members had assigned at the previous stages; only tags assigned by at least two users were shown. The users could view the tags given by others, discuss them with other participants, and then modify their own earlier tag selections.

Because the image collection covers a highly specific domain, Jewish cultural heritage, our ontology cannot be evaluated by comparison with well-known existing ontologies, as was done in Stvilia and Jörgensen (2010). Hence, to analyse the results of the different groups, we define the following new criteria for evaluating tagging results:
1. Tag usage ratio: Which type of tag, free-text or ontological, did the users supply more often? The tag usage ratio is the number of tags of a given kind (ontological or free-text) at a given stage (stage 1 or stage 2) of the tagging process, divided by the total number of tags the group assigned at that stage. We assume that extensive use of ontological tags indicates that the ontology is an effective image tagging tool.
2. Tag stability rate: The number of changes (additions and deletions) to ontological tags versus the number of changes to free-text tags at the tag revision stage (stage 3) of the experiment. A large number of changes indicates that the tags given at the earlier stages were of lower quality. Hence, the more changes are made to a tag list, the less reliable that source of tags should be considered.
3. Tag popularity rank: The tag popularity rank (TPR) score of a tag is the number of different users who have assigned this tag to an image. We estimate the relative quality of tags by comparing their TPR scores. According to the theory of controlled vocabulary construction (Lancaster, 2003; Soergel, 1974; Stvilia & Jörgensen, 2010), preferred terms for concepts in a controlled vocabulary are usually selected from the most frequently used terms. Hence, we compare the average and maximal TPR values of ontological tags with those of free tags, assuming that tags with higher TPR scores are more reliable than those with lower TPR scores.
4. TPR-1 tags ratio: Compares the relative number of tags with a TPR of 1 (i.e. tags assigned to an image by only one user) among ontological tags versus free tags. Many TPR-1 tags indicate low user agreement, while few TPR-1 tags make a source of tags more reliable. There can, of course, be cases where an individual with an exceptional vocabulary assigns unique, high-level tags, but in most cases TPR-1 tags are of lower value.
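The four criteria can be computed from a log of tag assignments. The Python sketch below is one possible reading, not the authors' code. Several choices are assumptions:
- the `Tag` record;
- the function names;
- computing TPR-1 ratios over distinct (image, tag) pairs.

The toy data are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """One tag assignment; the field names are illustrative, not the study's schema."""
    user: str
    image: str
    label: str
    source: str  # "ontology" or "free"


def usage_ratio(stage_tags, source):
    """Criterion 1: share of one stage's tag assignments that come from `source`."""
    if not stage_tags:
        return 0.0
    return sum(t.source == source for t in stage_tags) / len(stage_tags)


def stability_changes(before, after, source):
    """Criterion 2: additions plus deletions of `source` tags during revision.

    `before` and `after` are sets of Tag objects for one group, taken before
    and after the revision stage.
    """
    added = sum(t.source == source for t in after - before)
    deleted = sum(t.source == source for t in before - after)
    return added + deleted


def tpr_scores(tags):
    """Criterion 3: TPR of each (image, label) pair = number of distinct users."""
    users = defaultdict(set)
    for t in tags:
        users[(t.image, t.label)].add(t.user)
    return {pair: len(u) for pair, u in users.items()}


def tpr_summary(tags, source):
    """Average and maximal TPR plus the TPR-1 ratio (criterion 4) for one source."""
    scores = list(tpr_scores(t for t in tags if t.source == source).values())
    if not scores:
        return {"avg": 0.0, "max": 0, "tpr1_ratio": 0.0}
    return {
        "avg": sum(scores) / len(scores),
        "max": max(scores),
        # Assumption: the ratio is taken over distinct (image, label) pairs.
        "tpr1_ratio": sum(s == 1 for s in scores) / len(scores),
    }


# Toy data for a single image (hypothetical users and tags).
stage = {
    Tag("u1", "menorah", "Temple", "ontology"),
    Tag("u2", "menorah", "Temple", "ontology"),
    Tag("u3", "menorah", "Temple", "ontology"),
    Tag("u1", "menorah", "gold", "free"),
    Tag("u2", "menorah", "candles", "free"),
    Tag("u3", "menorah", "candles", "free"),
}
print(usage_ratio(list(stage), "ontology"))  # 0.5
print(tpr_summary(stage, "free"))  # {'avg': 1.5, 'max': 2, 'tpr1_ratio': 0.5}
```

Representing assignments as hashable records lets the stability criterion be computed with plain set differences between the before and after snapshots.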
Analysis and results
Tag usage ratio
To compare the tagging results of the three groups, G_M, G_OM, and G_FM, we first calculated the average total number of tags each user group gave to each image. We also calculated the average number of those tags that were taken from the ontology (ontological tags). Comparing the groups' average totals across all the images at stage 1 shows that the largest average number of tags was given by group G_OM, whose users could only use the ontology at this stage of the experiment (see Table 1). A possible reason is that an ontology-based interface stimulates people to use more tags, because it reveals many tags they would not otherwise think of (see Figure 2).
| Group | Average total amount of tags per image at stage 1 | Average amount of ontological tags per image out of the total amount of tags at stage 1 |
|---|---|---|
| G_M (mixed interface) | 4.2 | 1.3 |
| G_OM (ontology only, then mixed) | 10.8 | 10.8 |
| G_FM (free text only, then mixed) | 4.6 | -- |

Table 1. The average amount of tags given to an image at stage 1 by different user groups.
We then examined the results of the second stage (mixed interface), comparing the base and the new images separately, since users might tag these two subsets differently. Table 2 summarizes the results of the first two stages for the different user groups and image sets. For group G_M (mixed interface), the table reports the first-stage results, as this group had no second stage.
| Group | Base images: ontological tags | New images: ontological tags | Base images: free-text tags | New images: free-text tags | All images: ontological tags | All images: free-text tags | All tags |
|---|---|---|---|---|---|---|---|
| G_M | 2.1 | 0.5 | 2.6 | 3.1 | 1.3 | 2.9 | 4.2 |
| G_OM | 13.9 | 7.7 | 2.8 | 3.6 | 11.2 | 3.2 | 14.2 |
| G_FM | 5.1 | 2.2 | 5.3 | 5.1 | 3.7 | 5.2 | 8.9 |

Table 2. The average amount of tags given to an image in the end of stage 2 of the experiment.
In group G_FM, users could use only free-text tags at stage 1; at stage 2 they could add either ontology tags or more free-text tags. At stage 1 (free-text tags only), the results for the two image sets were almost identical: 4.55 tags on average for the new images versus 4.65 for the base images. This is a reasonable result, since the users chose tags freely, without relying on an existing tag collection (which would fit the base images better than the new ones). At stage 2, however, the large majority of the tags added to the base images came from the ontology. The base images received 5.1 ontological tags per image, compared with 2.2 for the new images. This clear difference between the two image sets is also expected, for the same reason. Interestingly, the free-text picture from stage 1 remained almost unchanged: 5.3 free-text tags on average for base images versus 5.1 for new images. One can infer that the presence of the ontology affected only the selection of tags from it, and did not inspire users to add more free-text tags to the base images. Overall, 41 % of the tags group G_FM gave in stages 1 and 2 were from the ontology (49 % for base images and 30 % for new ones).

In group G_OM, users could use only the ontology at stage 1, and both ontology and free tags at stage 2. At stage 1, 10.8 tags were assigned per image on average (13.9 for base images, 7.7 for new ones). At stage 2, 90 % of the tags given per image were free-text tags for both base and new images, with slightly more tags on average for new images than for base images. Mainly free-text tags were added at the second stage, as expected, because the users had already assigned ontological tags at the first stage. Overall, across stages 1 and 2, 74 % of the tags per image on average were ontological (83 % for base images, 65 % for new ones).

Group G_M's results show that free-text tags dominated for the new images (86 % of all tags). For the base images, on which the ontology was built, the gap narrowed to 45 % ontological versus 55 % free-text tags. The main reason for this difference in tagging patterns between new and base images was that there were not enough matching ontological tags for the new images. Interestingly, one of the new images, “Torah Reading” (see Figure 1), consistently behaved like the base images. For every user group, this image received many more ontological tags than the other new images, in amounts similar to those of the base images. Careful inspection of the ontology reveals why: many tags relevant to “Torah Reading” were added to the ontology from other images, which present different themes but overlap considerably with this one. This in turn shows that an ontology built from one set of images can be used effectively to tag a larger set of images in the same domain. Finally, we also checked the overall number of tags given to new and base images separately for all the groups (see Table 3). Every group gave more tags to base images than to new ones, at every stage.
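As a worked check, the base-image and new-image percentages reported above for groups G_FM and G_M can be reproduced from the per-image averages in Table 2. Each one divides the ontological tags by all tags in the same cell. The function name below is illustrative.

```python
def ontological_share(onto_per_image, free_per_image):
    """Fraction of a group's per-image tags that came from the ontology."""
    return onto_per_image / (onto_per_image + free_per_image)


# (ontological, free-text) averages per image, copied from Table 2.
table2 = {
    ("G_FM", "base"): (5.1, 5.3),
    ("G_FM", "new"): (2.2, 5.1),
    ("G_M", "base"): (2.1, 2.6),
    ("G_M", "new"): (0.5, 3.1),
}
for (group, images), (onto, free) in table2.items():
    print(group, images, f"{ontological_share(onto, free):.0%}")
# G_FM base 49%
# G_FM new 30%
# G_M base 45%
# G_M new 14%
```

The G_M value for new images (14 % ontological) corresponds to the 86 % free-text share quoted above.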
| Group | After stages 1 & 2: average number of tags per image (base images) | After stages 1 & 2: average number of tags per image (new images) | After stage 3: average number of tags per image (base images) | After stage 3: average number of tags per image (new images) |
|---|---|---|---|---|
| G_M | 4.7 | 3.6 | 5.9 | 5.3 |
| G_OM | 16.7 | 11.3 | 20.1 | 14.8 |
| G_FM | 10.4 | 7.3 | 12.7 | 10.0 |

Table 3. Overall tag amounts for base and new images.
In the next step of the analysis, we counted the tags each individual user in every group gave to each image at the various stages of the experiment. These data reveal individual users' behaviour and preferences more clearly than the per-image averages in Tables 1 – 3. Table 4 summarizes this per-user analysis.
| Measure | G_M | G_OM | G_FM |
|---|---|---|---|
| Stage 1 – Average free tags per image | 2.9 | -- | 4.6 |
| Stage 1 – Average ontological tags per image | 1.3 | 5.8 | -- |
| Stage 1 – Free tags excluding those that appeared in the ontology | 2.2 | -- | 3.5 |
| Stage 2 – Average free tags per image | -- | 3.4 | 0.6 |
| Stage 2 – Average ontological tags per image | -- | 0.4 | 3.6 |
| Stage 2 – Ontological tags including those added via the free-text interface | -- | 0.9 | 3.9 |
| Stage 2 – Free tags excluding those that appeared in the ontology | -- | 2.9 | 0.3 |
| Overall ontological tags for both stages, including those added via the free-text interface | 2.0 | 6.1 | 4.3 |
| Overall free tags for both stages, excluding those that appeared in the ontology | 2.2 | 2.9 | 3.6 |

Table 4. Individual user results averaged for all images for the different groups and stages.
Our first finding, for all the groups, was that many of the tags given via the free-text interface actually appeared in the ontology, yet for some reason the users did not retrieve them from it. This suggests that users could apply ontological tags more extensively and effectively if the ontology were easier to access. Consider group G_M, base images only. Counting tags that were added through the free-text interface but also appeared in the ontology as ontological tags makes their share markedly higher than that of free-text tags: 70 % vs. 32 %, instead of 45 % vs. 55 %. Across all the images (base and new), this reclassification puts the two interfaces almost level (49 % vs. 51 %). In the individual users' averaged results for group G_M, 48 % (2.0 out of 4.2) of the given tags appeared in the ontology. We further noticed that 8 of the 29 users in group G_M chose no ontological tags for any of the images. This probably means that they did not understand how to use the ontology, or decided not to invest the time and effort needed to learn it. After removing these users' results, the ontological tags per image per user rose to 2.5 on average, including those given via the free-text interface. Free-text tags, excluding those appearing in the ontology, fell to only 2.1. These figures further support the assumption that users could exploit ontological tags more widely than free-text ones if there were a more convenient way to search and browse the ontology.
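Counting free-text tags that also occur in the ontology as ontological tags amounts to a vocabulary lookup. The chapter does not state how tags were matched, so the matching rule is an assumption: case-insensitive, with whitespace collapsed. The function names and tags below are hypothetical. Applied to the G_M averages in Table 4, the same idea gives 2.0 / 4.2 ≈ 48 %.

```python
def normalize(tag):
    """Hypothetical matching rule: case-insensitive, whitespace-collapsed."""
    return " ".join(tag.lower().split())


def split_free_tags(free_tags, ontology_terms):
    """Split free-text tags into those found in the ontology and the rest."""
    vocab = {normalize(t) for t in ontology_terms}
    matched = [t for t in free_tags if normalize(t) in vocab]
    unmatched = [t for t in free_tags if normalize(t) not in vocab]
    return matched, unmatched


def effective_ontology_share(n_ontology_tags, free_tags, ontology_terms):
    """Ontology share when free-text tags found in the ontology count as ontological."""
    matched, unmatched = split_free_tags(free_tags, ontology_terms)
    effective = n_ontology_tags + len(matched)
    total = effective + len(unmatched)
    return effective / total if total else 0.0


terms = {"Menorah", "Temple", "Candles"}
free = ["menorah", "gold", "Candles ", "lamp"]
print(split_free_tags(free, terms))  # (['menorah', 'Candles '], ['gold', 'lamp'])
print(effective_ontology_share(2, free, terms))  # 0.6666666666666666
```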
Comparison of the tag stability rates
A closer analysis of group G_OM shows that almost all the tags added at stage 2 were free-text tags. This was expected, since the ontology had already been used extensively at stage 1. There was one exception: a user who added almost no tags at stage 1 then added 90 % (66 out of 74) of all the ontological tags added at stage 2. Her results affected the final averages only slightly, raising the overall number of ontological tags from 6.0 to 6.1 on average. Interestingly, users added fewer tags at stage 2 (using the free-text interface) than at stage 1 (3.4 vs. 5.8). We also noticed that seven of the users in this group (30 %) chose no tags at stage 2, and one user chose no tags at all at stage 1. Their results were excluded when calculating the averages for those stages. These observations probably reflect user satisfaction with the ontological tags: having assigned them at stage 1, users felt no need to add free-text tags.

For group G_FM (Table 4, row 4), four of the users added no tags at stage 2. Their results were excluded when calculating the averages for this stage. In addition, two users chose no tags at all at stage 1, and two other users chose no ontological tags at stage 2; these four users instead added many free tags at stage 2. Most (74 %) of the free-text tags at stage 2 were given by these four users. If their stage 2 results are not counted, this group gave slightly more ontological tags on average per image per user across stages 1 and 2 (3.9 instead of 3.6). It also gave fewer free-text tags (4.8 instead of 5.2). Compared with group G_M (whose users had only stage 1, with the mixed interface), the results of group G_FM are somewhat more favourable to the ontology interface (40 % of ontological tags vs. 31 %).
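Excluding users who added no tags at a stage changes the per-user averages, and a few very active users can account for most of a stage's tags. The sketch below illustrates both effects. The counts and function names are hypothetical, and it assumes each user tagged the same 12 images.

```python
def per_user_average(tag_counts, n_images, exclude_inactive=True):
    """Average tags per image per user, optionally ignoring users who added none."""
    counts = [c for c in tag_counts.values() if c > 0 or not exclude_inactive]
    if not counts:
        return 0.0
    return sum(counts) / len(counts) / n_images


def contribution_share(tag_counts, users):
    """Fraction of all tags contributed by the given subset of users."""
    total = sum(tag_counts.values())
    return sum(tag_counts[u] for u in users) / total if total else 0.0


# Hypothetical stage-2 free-text tag counts over 12 images.
counts = {"u1": 24, "u2": 36, "u3": 0, "u4": 0}
print(per_user_average(counts, 12))                          # 2.5
print(per_user_average(counts, 12, exclude_inactive=False))  # 1.25
print(contribution_share(counts, ["u2"]))                    # 0.6
```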
| Group | G_M | G_OM | G_FM |
|---|---|---|---|
| Added ontology tags | 1 | 23 | 18 |
| Added free-text tags | 384 | 440 | 618 |
| Deleted ontology tags | 9 | 18 | 4 |
| Deleted free-text tags | 25 | 17 | 20 |
| Total amount of ontology tags | 445 | 1501 | 1053 |
| Total amount of free-text tags | 1354 | 1139 | 2191 |

Table 5. Summary of stage 3 results.
Next, we analysed the results of the third stage of the experiment, in which users could see other users' tags and change their own tag lists from the previous stages. Table 5 shows that in all the groups, users mostly added tags, and mainly free-text ones. This suggests that the ontology tags had already been used close to optimally at the previous stages. Most of the deletions of ontological tags at stage 3 were made by users who were new immigrants: 7 out of 9 for group G_M, 3 out of 4 for group G_FM, and 15 out of 18 for group G_OM. The same users also made many of the deletions of free-text tags: 12 out of 20 for group G_FM and 11 out of 17 for group G_OM. These users are probably less familiar with Jewish cultural heritage concepts and images. After seeing the other users' tags, they deleted some incorrect tags they had given at the previous stages. We also found that in group G_FM, all the added ontological tags came from two users who had added no ontological tags at stage 2. In group G_OM, all 23 additions from the ontology were made by one user, who had added relatively few ontological tags (2.8 on average) at stage 1.
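The stability criterion uses raw counts of additions and deletions. Because the groups' final tag totals differ, the sketch below also divides the changes by each source's total number of tags after stage 3. This normalization is an illustrative assumption, not a measure reported in the chapter. The input values are copied from Table 5.

```python
# Stage-3 changes from Table 5: (added, deleted, total after stage 3) per source.
table5 = {
    "G_M": {"ontology": (1, 9, 445), "free": (384, 25, 1354)},
    "G_OM": {"ontology": (23, 18, 1501), "free": (440, 17, 1139)},
    "G_FM": {"ontology": (18, 4, 1053), "free": (618, 20, 2191)},
}


def changes(added, deleted):
    """Raw stability count as defined in the chapter: additions plus deletions."""
    return added + deleted


def change_rate(added, deleted, total):
    """Assumed normalization (not from the chapter): changes per final tag."""
    return changes(added, deleted) / total


for group, sources in table5.items():
    for source, (added, deleted, total) in sources.items():
        print(group, source, changes(added, deleted), f"{change_rate(added, deleted, total):.1%}")
# G_M ontology 10 2.2%
# G_M free 409 30.2%
# G_OM ontology 41 2.7%
# G_OM free 457 40.1%
# G_FM ontology 22 2.1%
# G_FM free 638 29.1%
```

On this normalized view, ontology tags changed by roughly 2 – 3 % of their final count in every group, against roughly 29 – 40 % for free-text tags.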
| Group | Measure | Stage 1 – free tags | Stage 1 – ontological tags | After stages 2 & 3 – free tags | After stages 2 & 3 – ontological tags | Free tags that have also appeared in the ontology | Free tags that have not appeared in the ontology |
|---|---|---|---|---|---|---|---|
| G_M | Amount of tags | 1009 | 453 | 1354 | 445 | 563 | 791 |
| G_M | Maximal TPR value | 22 | 11 | 30 | 11 | 16 | 30 |
| G_M | Number of tags with TPR-1 | 322 | 16 | 323 | 20 | 42 | 281 |
| G_M | Average TPR value | 2.14 | 4.08 | 2.76 | 4.01 | 4.27 | 2.20 |
| G_OM | Amount of tags | -- | 1495 | 1139 | 1501 | 414 | 725 |
| G_OM | Maximal TPR value | -- | 60 | 18 | 62 | 12 | 18 |
| G_OM | Number of tags with TPR-1 | -- | 10 | 323 | 8 | 51 | 272 |
| G_OM | Average TPR value | -- | 10.91 | 2.29 | 10.96 | 2.82 | 2.07 |
| G_FM | Amount of tags | 1436 | -- | 2191 | 1053 | 1019 | 1172 |
| G_FM | Maximal TPR value | 23 | -- | 28 | 37 | 22 | 28 |
| G_FM | Number of tags with TPR-1 | 353 | -- | 339 | 13 | 29 | 310 |
| G_FM | Average TPR value | 2.58 | -- | 3.71 | 7.8 | 6.18 | 2.75 |

Table 6. Comparison of TPR values of free vs. ontological tags.
Tag popularity rank values comparison
In this subsection, we compare the quality of free-text and ontological tags in terms of their average tag popularity rank (TPR) values. A tag's TPR value for a given image is the number of different users who assigned that tag to the image. As shown in the literature, the most frequently used terms are the preferred source for thesaurus construction (Lancaster, 2003; Soergel, 1974; Stvilia & Jörgensen, 2010) and for object categorization (Lakoff, 1988; Rosch, Mervis, Gray, Johnson, & Boyes-Braem, 1976). Hence, we assume that higher TPR values reflect higher tag reliability. As Table 6 shows, ontological tags had higher average TPR values than free-text tags in all the user groups. Our results differ from those of Ménard (2009), who reported a superior quality for the combined approach. In our case, at the first stage of the experiment, the mixed interface (G_M) was outperformed by the two other interfaces: both its average TPR values and its number of tags per image (Table 1) were lower than the others'. We then examined tags that users gave through the free-text interface but that also appeared in the ontology. These tags made up 36 – 46 % of all the free-text tags in the different groups. Interestingly, their TPR values were always markedly higher than those of the free-text tags that were not present in the ontology (see Table 6, columns 7 and 8). It can thus be concluded that ontological tags are of higher quality even when they are given through the free-text interface rather than fetched from the ontology. During the tag modification process at stages 2 and 3, users mostly added free-text tags that had been popular at the previous stages. As a result, the maximal TPR values rose considerably for all the groups by the end of the tagging process. In contrast, the TPR values of ontological tags changed very little, because these tags changed very little after the first stage. This behaviour shows that, unlike free-text tags, ontological tags are quite reliable from the outset and need not be changed even after exposure to additional information. Another interesting finding was that the top free-text and top ontological tags overlapped hardly at all. In contrast, the top free-text tags from different groups overlapped considerably, as did the top ontological tags from different groups. To measure these overlap rates, we created lists of the top 20 tags, ranked by TPR value, for free-text and ontological tags in each group. We then computed the percentage of tags shared by each pair of lists.
For group G_OM, the overlap between the free-text and ontological tags was 15 % (3 out of 20 tags). The same result was obtained for G_FM (3 common tags), and for G_M only 2 of the top 20 free-text tags appeared among the top 20 ontological tags. By contrast, the ontological tag lists of groups G_M and G_FM had 70 % of their tags in common. For G_FM and G_OM, 50 % of the tags were identical, and G_M and G_OM had a lower overlap rate of 25 %. For free-text tags, the overlap rates were 50 – 60 %. G_M and G_FM shared 12 of their top 20 tags, and G_M and G_OM, like G_FM and G_OM, had half of their top 20 tags in common. In addition, most of the top free tags (18 out of 20) belong to new images: Jewish national foundation, Chanukah, Chanukah spinning top, Tablets of Testimony, Jewish charity box. Most of the top ontology tags, in contrast, belong to base images: Menorah, Ancestor graves, prayer, synagogue. We therefore conclude that free-association tags and the ontology can serve as complementary tag sources, which confirms the results reported in Stvilia and Jörgensen (2010). These results also show that the ontology could be extended dynamically through the collaborative tagging process.
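The overlap computation can be sketched in three steps: rank each tag list by TPR, keep the top k entries, and count the shared tags. The tie-breaking rule, the function names, and the TPR values below are assumptions made for illustration. With k = 20, three shared tags give the 15 % reported for G_OM.

```python
def top_k(tpr, k=20):
    """Tags ranked by TPR (highest first); ties broken alphabetically, an assumption."""
    ranked = sorted(tpr.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:k]]


def overlap_rate(list_a, list_b):
    """Share of list_a's tags that also appear in list_b (e.g. 3 of 20 -> 0.15)."""
    if not list_a:
        return 0.0
    return len(set(list_a) & set(list_b)) / len(list_a)


# Hypothetical TPR values; k=4 instead of 20 keeps the example short.
free = {"Chanukah": 9, "dreidel": 7, "charity box": 5, "Menorah": 4, "gold": 1}
onto = {"Menorah": 12, "prayer": 10, "synagogue": 8, "Temple": 6, "Chanukah": 2}
print(top_k(free, 4))  # ['Chanukah', 'dreidel', 'charity box', 'Menorah']
print(overlap_rate(top_k(free, 4), top_k(onto, 4)))  # 0.25
```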
TPR-1 tag analysis and ratios comparison
Graph 1 displays the TPR values for the various user groups. In all the groups, free tags (the dashed lines) show a long “tail” of low TPR values, while most ontological tags (the solid lines) are concentrated at the top TPR values. As Table 6 shows, there are usually more free-text tags than ontology tags in total, but many of them (25 % or more) appeared just once. By contrast, very few ontological tags have a TPR of 1 (less than 5 %). This means that ontology tags are typically more reliable and popular, while users agree little about free-text tags. A thorough analysis of tag content confirms the results of Neal (2008): most of the tags name objects and events associated with the given images. The most prominent tags, however, differ from the least frequent (TPR-1) tags. They provide the meaning or interpretation of the depicted object or event (“Israel state independence declaration”, “Sukkoth holiday”). Many free-text TPR-1 tags, by contrast, describe in detail what is physically present in the image (e.g. “a happy man ready to work”, “many people crowded in the small room”). Others simply state what is depicted (e.g. “beard”, “blue shirt”), without any interpretation of, or association with, the meaning of what is seen.
Graph 1. Comparison of TPR values of ontological vs. free tags. The x-axis shows the number of tags, and the y-axis shows the TPR values of those tags.
Conclusions
In this chapter, we examined and analysed typical user behaviour in tagging images with three kinds of interface: free-text tagging, ontology-based tagging, and a mixed interface combining both options. The main goal was to explore user behaviour and the quality of the results achieved with each type of tool. In particular, we wanted to learn how effectively an ontology could be applied in the tagging process. To this end, three groups of users took part in an experiment divided into three stages. The groups differed in the order in which they were exposed to the interfaces. Group G_OM first selected tags from the given ontology and then added more tags through the free-text interface. Group G_FM first worked with the free-text interface and then, at the second stage, added more tags from the ontology. Group G_M could use free-text tagging and the ontology simultaneously. Finally, at stage 3, every user in each group could see the tags given by the other users in the group and revise their own tag list by adding or deleting tags.
To measure the quality of tags produced via the different interfaces, we devised four new evaluation criteria: 1) tag usage ratio, 2) tag stability rate, 3) average tag popularity rank, and 4) TPR-1 tag ratio. We analysed the experimental data according to these criteria. The main result was that ontological tags outperformed free-text ones on all the qualitative criteria (2, 3, and 4), demonstrating the higher quality of the ontology tags. We also found that the order in which the interfaces were used crucially affected user performance. Group G_OM, whose users worked with the ontology at the first stage, produced the highest average number of ontological tags per image (11.2). It also assigned more ontological than free-text tags overall (57 %). However, users given the free-text interface as their first option chose only 25 – 35 % of their tags from the ontology overall. A thorough analysis of individual user behaviour revealed that many of the tags entered through the free-text interface actually appeared in the ontology, but users were unable or unwilling to look them up there. Furthermore, some users preferred not to use the ontology at all throughout the entire experiment. Another interesting finding concerns images with more suitable concepts in the ontology, i.e. the base images. For these, users selected many more tags from the ontology (45 % – 83 % of all assigned tags) than for the new images (14 % – 45 % of all assigned tags), which the ontology concepts represent less well. At stage 3, users in all groups mostly added more free-text tags and changed almost no ontology tags. We conclude that an ontology can be employed very effectively for image tagging when no other interfaces are available at the time of, or before, seeing the ontology.
Many users regarded their selection of ontological concepts as complete and sufficient to describe the given images, and saw no need to add free-text tags at all. However, when easier interfaces are available, users slightly prefer working with them to searching the ontology. In future work, we therefore suggest exploring and devising easier ways to search and browse the ontology.
Acknowledgements
This research was supported by The Israel Science Foundation (grant No. 307/07).
References
Bar-Ilan, J., Zhitomirsky-Geffet, M., Miller, Y., & Shoham, S. (2010). The effects of background information and social interaction on image tagging. Journal of the American Society for Information Science and Technology, 61(5), 940 – 951.
Brown, P., Hidderley, R., Griffin, H., & Rollason, S. (1996). The democratic indexing of images. New Review of Hypermedia and Multimedia, 2, 107 – 120.
Garcia Castro, R., & García-Silva, A. (2009). Content annotation in the future Web. The European Journal for the Informatics Professional, X(1), 27 – 32.
Gruber, T. (2007). Ontology of folksonomy: A mash-up of apples and oranges. International Journal on Semantic Web and Information Systems, 3(1), 1 – 11.
Gruber, T. (2008). Collective knowledge systems: Where the social web meets the semantic web. Journal of Web Semantics, 6(1), 4 – 13.
Hollink, L., Schreiber, G., Wielemaker, J., & Wielinga, B. (2003). Semantic annotation of image collections. Workshop on Knowledge Markup and Semantic Annotation (KCAP’03). Retrieved from http://www.st.ewi.tudelft.nl/~hollink/pubs/Hollink03_saic.pdf
Kallergi, A., Bei, Y., & Verbeek, F. J. (2009). The ontology viewer: Facilitating image annotation with ontology terms in the CSIDx imaging database. Proceedings of VISSW 2009, CEUR-WS Proceedings, 443. Retrieved from http://smart-ui.org/events/vissw2009/papers/VISSW2009-Kallergi.pdf
Kherfi, M. L., Ziou, D., & Bernardi, A. (2004). Image retrieval from the World Wide Web: Issues, techniques, and systems. ACM Computing Surveys, 36(1), 35 – 67.
Kim, H. L., Scerri, S., Breslin, J. G., Decker, S., & Kim, H. G. (2008). The state of the art in tag ontologies: A semantic model for tagging and folksonomies. Proceedings of the 2008 International Conference on Dublin Core and Metadata Applications (pp. 128 – 137). Retrieved from http://dcpapers.dublincore.org/index.php/pubs/article/viewPDFInterstitial/925/921
Lakoff, G. (1988). Cognitive semantics. In U. Eco, M. Santambrogio, & P. Violi (Eds.), Meaning and mental representations (pp. 119 – 154). Bloomington, IN: Indiana University Press.
Lancaster, F. W. (2003). Indexing and abstracting in theory and practice (3rd ed.). Champaign, IL: University of Illinois, Graduate School of Library and Information Science.
Ménard, E. (2009). Ordinary image retrieval in a multilingual context: A comparison of two indexing vocabularies. Paper presented at the ISKO UK 2009 Conference, London, UK. Retrieved from http://www.iskouk.org/conf2009/papers/menarde_ISKOUK2009.pdf
Neal, D. (2008). News photographers, librarians, tags, and controlled vocabularies: Balancing the forces. Journal of Library Metadata, 8(3), 199 – 219.
Passant, A., & Laublet, P. (2008). Meaning of a tag: A collaborative approach to bridge the gap between tagging and linked data. Proceedings of Linked Data on the Web (LDOW 2008). Retrieved from http://events.linkeddata.org/ldow2008/papers/22-passant-laublet-meaning-of-a-tag.pdf
Peterson, T. (1994). Introduction to the Art and Architecture Thesaurus. Oxford: Oxford University Press.
Rorvig, M. E., Turner, C. H., & Moncada, J. (1999). The NASA image collection visual thesaurus. Journal of the American Society for Information Science, 50(9), 794 – 798.
Rosch, E., Mervis, C. B., Gray, W. D., Johnson, D. M., & Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8, 382 – 439.
Chapter 9. Exploring the effectiveness of ontology based tagging
Shatford, S. (1986). Analyzing the subject of a picture: A theoretical approach. Cataloging and Classification Quarterly, 6(3), 39 – 62. Soergel, D. (1974). Indexing languages and thesauri: Construction and maintenance. Los Angeles: Wiley. Stvilia, B., & Jörgensen, C. (2010). Member activities and quality of tags in a collection of historical photographs in Flickr. Journal of the American Society for Information Science and Technology, 61(12), 2477 – 2489. Surowiecki, J. (2004). The wisdom of crowds. New York: Anchor. Trant, J., Bearman, D., & Chun, S. (2007). The eye of the beholder: Steve.museum and social tagging of museum collections. Proceedings of the International Cultural Heritage Informatics Meeting (ICHIM07). Retrieved from http://www.archimuse.com/ichim07/papers/trant/trant.html Trant, J. (2009). Tagging, folksonomies and art museums: Early experiments and ongoing research. Journal of Digital Information, 10(1). Retrieved from http://journals.tdl.org/jodi/article/viewArticle/270 van der Waal, H. (1985). ICONCLASS: An iconographic classification system. Amsterdam: KNAW. Zhitomirsky-Geffet, M., Bar-Ilan, J., Miller, Y., & Shoham, S. (2010). A generic framework for collaborative multi-perspective ontology acquisition. Online Information Review, 34(1), 145 – 159.
Kathryn La Barre, Rosa Inês de Novais Cordeiro
Chapter 10. That obscure object of desire: Facets for film access and discovery
Abstract: This project, Films and Facets, assessed the written responses (summaries and keywords) of participants (n=12) to three films across two cultures (American and Brazilian). The design sought to validate the utility of two parameters of analysis (facet analysis and filmic analysis) for identifying salient features that can be used to enhance access to moving images. The research also looked for patterns of similarity and divergence in participant responses across cultures. The results provide insights into the creation of enhanced indexing strategies that might support more robust access and discovery by highlighting subject-related attributes such as theme, action and genre for moving image resources across a variety of information environments (digital repositories, libraries and archives). Keywords: Film discovery and access, moving image retrieval, facet analysis, filmic analysis, digital moving images, film archives, digital libraries
Kathryn La Barre (corresponding author), Assistant Professor, School of Library and Information Science, University of Illinois at Urbana-Champaign. Rosa Inês de Novais Cordeiro, Professor, Department of Information Science, Universidade Federal Fluminense, Niterói, Brazil
Introduction Luis Buñuel’s film That Obscure Object of Desire (1977) tells the story of an aging French man who pursues a dysfunctional and frustrating romance with an elusive Spanish woman. The film serves as an apt metaphor for moving image access. Images speak a native language that evades understanding, even in a world culture that foregrounds the primacy of the visual. Robust solutions to discovery and access of visual resources have tantalized and eluded researchers in computer science, information science, and cognate disciplines. This chapter provides a brief overview of the current state of the art in information retrieval for moving images, and highlights the findings of the ongoing research project Films and Facets.
This exploratory project tested facet analysis and filmic analysis (CRG, 1957; La Barre, 2006, 2010; Ranganathan, 1933, 1937; Spiteri, 1998) as strategies for identifying ways to enhance access to and discovery of moving image resources. Developed jointly at the University of Illinois at Urbana-Champaign and the Universidade Federal Fluminense in Rio de Janeiro, Brazil, the study assessed similarities and divergences in the content of written participant responses (n=12) to a set of three films across two cultures (American and Brazilian). Participants were asked to identify salient film characteristics in the form of keywords and film summaries. The researchers conducted facet and filmic analysis of these written narratives to uncover insights that could inform enhanced strategies for representing moving images. These insights could in turn support more robust access and discovery interfaces for films across a variety of information environments (digital repositories, libraries and archives). One enhancement suggested by the findings is an indexing strategy for moving images that focuses on subject-related attributes such as theme, action and genre.
Background The following overview of the literature will discuss (1) theoretical approaches to image analysis and representation, and (2) approaches to information access and discovery in moving image repositories.
Image analysis and representation Access to information is a prevalent theme in library and information science literature. Creating access to information resources involves the basic processes of analysis, translation, and representation. Current practice in creating simple documentary representations often includes encoding information about content and indications of how and where content is available. Representations, such as abstracts or bibliographic records, may be provided to help users discover and retrieve complete documents or documentary fragments such as movie trailers, book chapters or journal articles. Format is another salient characteristic included in representations, since a given resource may be available in a variety of formats or media types. Creating access to image resources requires some forms of analysis that differ from the process of creating representations of textual materials. Any discussion of image analysis and representation would be incomplete without reference to Shatford (1986) and Panofsky (1991). Shatford draws upon Panofsky’s famous synthesis of the “Warburg School” approach to iconography. Panofsky’s approach relies upon three levels of analysis, or interpretation, of a work of art: pre-iconographic, iconographic and iconological. Analysis at the pre-iconographic level focuses on the description of primary objects and the natural significance of events represented by an image. At the iconographic, or secondary, level, the mythic and symbolic description of an image is highlighted. The iconological, or intrinsic, level was of most interest to Panofsky. Here the subject of a work of art is exposed, and its meaning revealed. This level of analysis, and the description or representation created from it, goes beyond the scope of the artistic object itself and attempts to situate the object in time and space (for example, historically and geographically). The iconological level “is apprehended by ascertaining those underlying principles which reveal the basic attitude of a nation, a period, a class, a religious or philosophical association – qualified by one personality and condensed into one work” (Panofsky, 1991, p. 52). According to Argan (1992), for Panofsky the world of images was an orderly world that could be uncovered by principled analysis. Shatford Layne’s (1994) approach to indexing and access of images considers two aspects: (a) qualities of images; (b) image groupings, not individual images. Each image has unique qualities within a given discipline; “these qualities can be classified according to a general classification theory, being applicable to all the images” (pp. 583 – 584). Some researchers posit that the creation of adequate image representations depends on a conceptual process that exposes connotative (or associative) levels of meaning.
Baxter and Anderson (1996) addressed the challenges of connotation and the efficacy of image indexing and retrieval in a database using controlled vocabulary access terms from thesauri, including the Art and Architecture Thesaurus (AAT) (n.d.) and Iconclass (n.d.). Armitage and Enser (1997) determined salient connotative image qualities through a user-focused approach grounded in analysis of user queries. Greisdorf and O’Connor (2002) and Jörgensen (2003), in turn, focus on a cognitive view of system interactions and of the qualities that are salient for users who seek image resources. Greisdorf and O’Connor (2002) confirmed that hierarchical levels of perception and analysis, similar to those discussed by Panofsky, assist indexers in the creation of image representations. These levels are ordered as follows: (1) primitive characteristics (colour, shape and texture); (2) objects (person/thing; place/location); (3) action (activity; event); (4) inductive interpretation (representation, subjective belief, symbolic value); (5) environment (general feeling); and (6) emotional stimuli (individual effect). Research by Greisdorf and O’Connor (2002) and Jörgensen (2003) has established that users frequently seek to find an image by attempting to formulate a search query that
expresses the connotative messages that exist within an image. Yoon (2008) corroborates the importance of connotative levels with cross-cultural research that sought to determine the importance of connotative image qualities to two groups of participants (Koreans and Americans). Connotative attributes, such as associations, emotional impressions, and the resulting mental states of viewers, were found to be a critical aspect of image searching for both cultural groups. Yoon (2008) also demonstrated that cultural differences significantly influence searcher expectations and consequent success in finding images. Prevalent models of information search behaviour, especially those that account for emotion and affect, are also central to the enterprise of creating useful and effective representations of information resources. In this vein, the most important model for the present study is Kuhlthau’s (2004) Information Search Process, which is often used to analyse the efficacy of information access and the affective and emotional aspects of information search. Although Kuhlthau’s research did not analyse collections of images or films, the model is promising for future research into creating enhanced digital access to such collections. Nahl and Bilal (2008) detail the trajectory of the “affective paradigm” in information behaviour research in their examination of information seekers across a variety of demographic features (age range, profession, socio-cultural context, etc.). By exploring the relationship between personal emotions and information seeking, Nahl and Bilal (2008) indicate how environment, personal experience and interests, level of expertise, prejudices and personality traits may result in unique experiences and relationships between users and the information they seek.
According to Mentis (2008), studies on the affective paradigm in library and information science frequently discuss emotional responses to lost efficiency or time wasted in searching. Intense emotions may occur even in the absence of poor task performance or a failed information seeking experience. Although some searchers experience strong frustration, even those who feel successful report a sense of disappointment in the way that impediments during the search process caused them to alter their initial information seeking goals. Mentis proposes that a more accurate description of disappointment in information seeking is needed to move forward the agenda of creating more efficient information systems that can account for differences in experience, understanding and outcome. Because of the critical importance of the emotional experience of information seeking, awareness of the “affective paradigm” underpins the research design for Films and Facets.
Approaches to image access in moving image repositories This section begins with a brief overview of research that examines the unique considerations of moving image access and discovery. The main focus is upon intellectual indexing approaches that represent hybrid human-machine efforts or wholly human-centred efforts. It will also describe four digital film repositories that were assessed in order to capture a snapshot of current practice: the video area of the Internet Movie Database (IMDb) (n.d.), Moving Image Archive (MIA) (n.d.), Petrobras’ Porta Curtas (n.d.), and the BBC Motion Gallery (n.d.). Enser (2008a, 2008b), while lamenting the dearth of moving image studies, provides a detailed overview of contemporary research in image indexing and retrieval. One approach, content-based image retrieval (CBIR), differs from traditional information retrieval because it does not rely on textual descriptions or controlled vocabularies. Rather, image retrieval via CBIR uses image-based content, including colours, shapes, textures or other information that can be extracted from an image. CBIR allows users to select an existing image and conduct a pattern-based search. Enser focuses on the innovative and automated approach taken by the Informedia Digital Video Library project that began in 1994 at the School of Information Science of Carnegie Mellon University in Pittsburgh as the first large-scale digital library project to focus on moving images. This project used an automated approach to full content search and retrieval, with a search engine that used speech recognition, natural language processing, and algorithmically generated summary descriptions. A number of contemporary projects are similar to this pioneering effort in their use of fully automated approaches. These initiatives are outside the scope of inquiry for this project. The research design of Films and Facets focuses on human-generated approaches to image representation. 
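To make the contrast with text-based retrieval concrete, here is a minimal sketch of query-by-example CBIR in Python. It is not drawn from Enser or from Informedia: it assumes each image has already been decoded into a plain list of (r, g, b) pixels, it uses only a quantized colour histogram (ignoring the shape and texture features mentioned above), and it ranks a collection by histogram intersection. The function names and the toy collection are hypothetical.

```python
from collections import Counter

Pixel = tuple[int, int, int]  # (r, g, b), each channel 0-255


def colour_histogram(pixels: list[Pixel], bins_per_channel: int = 4) -> dict:
    """Quantize each channel into equal-width bins and return a normalized histogram."""
    if not pixels:
        return {}
    width = 256 // bins_per_channel
    counts = Counter((r // width, g // width, b // width) for r, g, b in pixels)
    total = len(pixels)
    return {bin_: n / total for bin_, n in counts.items()}


def histogram_intersection(h1: dict, h2: dict) -> float:
    """Similarity in [0, 1]: the sum of the smaller share in each colour bin."""
    return sum(min(share, h2.get(bin_, 0.0)) for bin_, share in h1.items())


def query_by_example(query: list[Pixel], collection: dict, bins_per_channel: int = 4):
    """Rank images in `collection` (name -> pixels) by colour similarity to `query`."""
    q = colour_histogram(query, bins_per_channel)
    scores = {
        name: histogram_intersection(q, colour_histogram(pixels, bins_per_channel))
        for name, pixels in collection.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy images: the user supplies a mostly red example image.
red_scene = [(250, 10, 10)] * 80 + [(10, 10, 250)] * 20
collection = {
    "sunset": [(240, 20, 20)] * 70 + [(20, 20, 240)] * 30,
    "ocean": [(15, 15, 245)] * 90 + [(245, 245, 245)] * 10,
}
ranking = query_by_example(red_scene, collection)
print(ranking)
```

Histogram intersection is used here only because it is simple and bounded between 0 and 1; no textual description or controlled vocabulary is consulted at any point, which is the property that distinguishes CBIR from the human-generated representations this study focuses on.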
These approaches include those that may rely on free-text summary descriptions or user-generated tags (Ménard, 2009; Neal, 2007; Ransom & Rafferty, 2011). This project also operates with an awareness of the standards for subject description that privilege controlled vocabularies for creating access to visual materials. These vocabularies include the Art and Architecture Thesaurus, Iconclass, and the Moving Image Genre-form Guide (1998). Indexers and cataloguers also create representations using a variety of metadata standards, including VRA Core and Dublin Core (Enser, 2008b; Getty Research Institute, 2011; Jörgensen, 2003; Trant, 2009). Successful approaches, whether human or machine-based, must reconcile multiple agendas: they seek to satisfy a wide range of users while balancing costs and benefits, leveraging legacy or crowd-sourced metadata, and supporting disparate tasks. As Enser (2008b) observes, most studies of image retrieval concentrate on still images in familiar formats such as photography, painting,
drawing, and diagrams. This focus is slowly changing as libraries, archives, museums, cinematheques, and other information units digitize their holdings and mount them as digital collections or libraries of moving images. One example of such efforts is the Moving Image Archive, part of the non-profit Internet Archive. This digital library of textual, audio and moving image resources collaborates with stakeholders ranging from institutions such as the Prelinger Film Archive to individuals worldwide who upload their own independently created films. Perhaps the best-known digital moving image repository is the Moving Image Collection (MIC), which began in 2002 as a joint partnership of the Association of Moving Image Archivists (AMIA) and the Library of Congress. A distinctive feature of the MIC has been the creation of a flexible proprietary metadata format: institutions contribute their original descriptive records for films, and these records are mapped (or crosswalked) to the MIC format. This challenging approach supports diversity in metadata while promoting interoperability (Andreano, 2007). While institutionally created metadata remains important, inviting user-created metadata contributions can lessen the time and cost of moving image metadata creation while enhancing access. As a case study in best practice, Andreano (2007) describes the collaboration of the Prelinger collection and the Internet Archive. As a pilot, the Prelinger collection uploaded a series of ephemeral, public domain films and invited users of the Internet Archive to add comments and tags as a way to enhance the discoverability of these contributions. Strengths of this crowd-sourced approach include expanded information about content; drawbacks include flawed or misleading entry points. Hertzum’s (2003) research shifts from an institutional viewpoint to that of the searcher.
This project analysed one year of email reference queries (for texts, photos, film music and video) sent to the Deutsches Filminstitut (DIF), a film archive. Hertzum’s data included 275 email requests from student researchers, organizers of festivals and exhibitions, academic researchers and commercial enterprises. Nearly half (43 %) of the requests concerned known items; the rest contained subject-related information, such as theme or genre. Nearly 40 % of the subject requests also included production-related attributes, such as year, title or director. Subject-related requests also contained queries about contextual information (17 %), such as reviews or festival information, and content-related information, such as location. This result points to the importance of including subject attributes in moving image representations. Wildemuth, Oh, and Marchionini (2010) also focused on user queries and search tactics by assessing transaction logs for a video retrieval system. They found that image search strategies were more complex than those in text repositories, and that searchers frequently resort to browsing result sets as a primary
information seeking strategy. This corroborates the results of two earlier studies, Jörgensen and Jörgensen (2005) and Matusiak (2006), which indicated that forcing searchers to rely on textual descriptions of images is likely to create a disconnect between searcher expectations and retrieved result sets, and to push searchers toward increasingly complex search queries. These findings add further support for including both subject and connotative attributes in moving image representations. Other recent projects, e.g., Zhang and Li (2008), shift the focus from user queries and searching behaviours to the tasks that motivate the moving image seeker. Zhang and Li sought to validate whether the tasks they observed in moving image seekers mapped to the four user tasks (find, identify, select, obtain) that lie at the heart of the bibliographic entity-relationship model Functional Requirements for Bibliographic Records (FRBR). According to Tillett (2004, pp. 2, 5) “FRBR offers … a fresh perspective on the structure and relationships of bibliographic and authority records, and also a more precise vocabulary to help future cataloging rule makers and system designers in meeting user needs.” Zhang and Li found that the tasks of moving image seekers map directly to the four FRBR tasks. The tasks they observed were as follows: looking for particular attributes or object relationships (find), trying to confirm whether a found item is the correct item (identify), wanting to locate an item with the desired content and format (select), and wanting to acquire a found item (obtain). McGrath, Kules, and Fitzpatrick (2011) also took a task-based approach in their evaluation of the design of the Online Audiovisual Cataloger (OLAC) (n.d.) prototype discovery interface.
This interface provides facets for browsing and navigation that answer two sets of questions: (1) “What do you want to watch?” (2) “How do you want to watch it?” A facet for location (“Where do you want to watch it?”) is also available as a search refinement. This design was an attempt to allow information seekers to “start their search at any point in the FRBR hierarchy, from Item (location) to Work (genre, date), and easily transition between search and browse strategies, using facets to broaden or narrow their results and pivoting on facet values” (pp. 49 – 52).
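The facet-based browse-and-refine pattern described for the OLAC prototype can be sketched in a few lines of Python. This is an illustrative sketch, not OLAC’s implementation: the records, the facet names `genre`, `format` and `location`, and the helper functions are hypothetical. Narrowing adds a facet value to the current selection, broadening removes one, and pivoting means reading the counts of a different facet over the current result set.

```python
from collections import Counter

# Hypothetical catalogue records. The facet names loosely mirror the questions
# "What?" (genre), "How?" (format) and "Where?" (location); they are
# illustrative only and are not OLAC's actual schema.
RECORDS = [
    {"title": "Film A", "genre": "Drama", "format": "Streaming", "location": "Online"},
    {"title": "Film B", "genre": "Thriller", "format": "DVD", "location": "Main Library"},
    {"title": "Film C", "genre": "Drama", "format": "DVD", "location": "Branch Library"},
    {"title": "Film D", "genre": "Comedy", "format": "Streaming", "location": "Online"},
]


def facet_counts(records, facet):
    """Count the records carrying each value of `facet` (the numbers a facet panel shows)."""
    return Counter(r[facet] for r in records)


def refine(records, selections):
    """Keep records that match every selected facet value; {} means no filter."""
    return [r for r in records if all(r.get(f) == v for f, v in selections.items())]


# Narrow by "What do you want to watch?", then pivot to "How do you want to watch it?"
selections = {"genre": "Drama"}
print(facet_counts(refine(RECORDS, selections), "format"))

# Broaden by dropping the genre facet, then narrow on a format value instead.
selections = {"format": "DVD"}
print([r["title"] for r in refine(RECORDS, selections)])
```

Because every facet is just a filter over the same result set, a searcher can start from any facet and move between search and browse in any order, which is the flexibility the OLAC designers describe.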
Current approaches to image access in four moving image repositories To provide a framework for comparison between the findings in these aforementioned studies and the findings from the Films and Facets project, the researchers assessed access points provided by four major archives of moving images: IMDb, BBC Motion Gallery, Moving Image Archive, and Petrobras’ Porta Curtas.
Each of these four repositories provides free access to a minimum of 700 digital films through richly descriptive representations. The following table highlights the most common search or browsing features (or access points) for films on each site:
Videos-IMDb [US / UK+]
Moving Image Archive [US]
Porta Curtas [BRAZIL]
BBC Motion Gallery [UK]
Summary Date Title
X X X
X X X
X X X
X X X
Associated persons Genre Rights length
X X X X
X X X
X X X
Colour, format Keyword, subject Source
X X X
X X X X
X X X
X X X
X
Table 1. Common access points offered for search and browse on four moving image archive sites.
All four sites provide searchable film summaries. Most provide access by associated persons, genre, film length and type of access (rights). Also prevalent are access points such as colour (e.g., whether a film is black and white or colour), keyword, tag or subject heading, format (e.g., 16mm or digital file) and source (location of the original film). Less common access points (offered by two of the sites) include location (of either production or creation), awards, ratings, user-contributed reviews, and search by quotation (e.g., film dialogue). Two sites also allow viewers to search by related items, such as derivative items, reviews and news articles, or by associated persons, such as actors or directors. Search by language, identifier (such as class number or accession number), film character or festival is far less common, each being offered by only one site. Most of the sites allow visitors to filter search results by at least some of these access points, and all provide an advanced search mechanism. These search and discovery features will be revisited in the Discussion section and contrasted with the findings from the current study.
Films and Facets Films and Facets used facet analysis, in conjunction with applicable principles of film study, to help identify the salience of item features, such as format, and of user characteristics, such as language, experience, understanding and affect, in the textual descriptions created by study participants. Facet analysis is a useful tool that emphasizes the importance of domain knowledge and of the various aspects, dimensions, categories, characteristics and associations that are necessary for optimal information organization and access. Vickery (1980) notes that facet analysis is especially useful because a single object can be classified in many ways and with different objectives: a rabbit can be classified as a rodent, a fur-bearing animal, an herbivore, a pet, etc., in accordance with the point of view of the classifier or the needs of the information seeker.
Study design The following sections describe the design considerations that guided this study. The main aim of the field research was to analyse written viewer responses to three films in order to extract salient aspects that could improve retrieval of and access to films in a variety of information environments (digital repositories, libraries and archives). The research took a cross-cultural approach involving participants from America and Brazil, and had two specific objectives: (1) to assess whether facet analysis and filmic analysis can be used to interpret the information participants recorded about the films; and (2) to observe cross-cultural (national) similarities and differences in participant responses.
Films Moving images, the object of this study, are complex resources. Many different manifestations of a moving image resource may exist (e.g., a movie or a video game in multiple formats). Films and Facets uses the term ‘moving image’ in the sense commonly used in library and information science literature, and as a way to emphasize that the project is not concerned with still images, such as photographs or illustrations. The object of study is one type of moving image: films. A compendium of devices that include text, image, moving images and sound (noise, songs, and speech; dialogues and commentaries; pauses and silence) work together in the context of cinematic films. In order to provide a variety of
filmic experience, the sample for this study included two short silent films, one contemporary and one from 1909, alongside one full-length sound feature. Silent films, especially during the early days of cinema, usually contained intertitles (a frame of printed text inserted into a moving image at various points to convey dialogue, or narrative related to the material). Additionally, film exhibition spaces often provided live musical accompaniment. Films were selected for this study along the following parameters:
– Availability in the public domain or via Creative Commons license;
– Genre: fiction;
– Film duration (both short and full feature);
– Language of participants (English with Portuguese subtitles, or silent films);
– Image quality (no issues with viewability);
– Cinematographic language.
One film was chosen from each category:
– Classical narrative: A Corner in Wheat (1909) by Griffith (13:52), silent;
– Transition from classical to modern cinema: 39 Steps (1936) by Hitchcock (1:32), sound;
– Contemporary cinema: First Love (2006) by Flight and Camac (6:57), silent.
The selection combined two short films with one full-length feature so that all three could be watched in a single sitting, out of consideration for the participants and to ensure complete data collection.
Viewers The participants in this study constituted a sample of convenience. Volunteers were recruited via listserv invitations and flyers at an American and a Brazilian department of Library and Information Science. Twelve graduate students (six American students from the University of Illinois and six Brazilian students from the Universidade Federal Fluminense) took part. Participants were asked to view three films. After viewing each film, participants were given a film analysis form on which they were asked to: (1) write a summary of the film, including information they considered relevant to an information unit user (i.e., a user of the collections of archives, libraries, museums or repositories); and (2) list keywords they considered important for an information unit user to access information about the film. Participants were advised that they were not expected to conduct formal filmic analysis.
Cross cultural considerations Experiencing a film, whether by viewing it (in the case of the study participants) or by reading a description (in the case of an indexer), is a mode of communication through which an individual interprets or creates a sense of aboutness for the content of a film. During this process, film content is filtered through a viewer’s own cognitive processes, cultural background, and personal experiences. This research sought to capture each participant’s unique process of interpretation in the form of keywords and summaries. The process of interpretation implies high levels of interactivity between a viewer and a film (Kuhn, 1982; Machado, 2007). During the analysis of participant responses, several aspects were noted: the form and manner of descriptions of characters, objects, and events, as well as references to staging or narrative devices (mise-en-scène). The researchers applied facet and filmic analysis to these products of meaning in order to identify patterns of interpretation. The analysis phase followed Ricoeur’s (1994) observation that “to make up a plot is already to make the intelligible spring from the accidental, the universal from the singular, the necessary or the probable from the episodic” (pp. 70 – 71). Although the sample was quite small and the results cannot be generalized, it is hoped that this exploratory research might inform new procedures for studying film interpretation across cultures. The cross-cultural dimension of this study also adopted Benedict’s (1972) view of culture as a lens through which man observes the world and particularizes it to his own universe of experience. By observing American and Brazilian students, the study design attempted to elicit the presence or absence of cultural influences in written filmic interpretations. In addition to the small sample size, there are other limitations to this approach.
We recognize that social life and personal identity are frequently influenced and mediated by the global market of styles, advertising, international travel, media images and the globally interconnected communications system. As a result, social life and personal identity may become fragmented, disconnected, or displaced from time, personal stories and specific traditions (Hall, 2006). Even so, “there is, along with the impact of ‘global,’ a new interest in ‘local.’ Globalization … explores local differentiation. So instead of thinking of global as in ‘replacing’ the local, it may be more accurate to think of a new relationship between the ‘global’ and ‘local’” (Hall, 2006, p. 77). Thus, within the realm of culture, the records generated by the participants can be understood as material vehicles of perception, of emotion and of understanding of a number of symbols thought to be significant by each viewer. Each of these observations tempered the analysis of participant responses.
Methods Facet analysis The authors posit that facet analysis, in concert with filmic analysis, provides an ideal set of techniques for creating enhanced access to moving images. Vickery (1966, p. 15) pointed out a fundamental construct of information access: to “normalize the language [understood here as concepts] of the documents on the one hand and the language [concepts] of the [user’s] questions on the other.” The best possible outcome is a match between the concepts present in user queries and those in the indexed documents. Vickery asserts that an indexing language such as a thesaurus or subject-heading list is “not only a retrieval tool but a conceptual tool that defines what a person really wants to know.” Indexing languages also assist “indexers in the intellectual task of characterizing the subject contents of the document” and “provide a tool to the searcher in analysing and defining questions put to the file.” As per Vickery (1966) and Speziali (1973), facet analysis enables retrieval by providing maps of concepts and the relations among those concepts. The technique of facet analysis begins by asking a series of questions about a given item: What concept or concepts does it represent? In what conceptual category should this concept be included? What are the class relations between this concept and other concepts included in the same category? (La Barre, 2010; Vickery, 1966). Consequently, facet analysis “offers a set of principles and techniques that have now been applied in a variety of subject fields, and . . . shown to be workable and useful. It is potentially of considerable value to all prospective designers of retrieval systems in special fields” (Vickery, 1966, p. 10). Today, the greatest challenge to the future of facet analysis is to determine the extent and manner in which this technique might be adopted and adapted for use in rapidly developing digital environments (La Barre, 2010).
The approach taken in this study integrates traditional facet analytic models as delineated by Vickery (1966), and revisited by La Barre (2010). These require the researcher to clearly define both the (1) subject field (in this case, the audiovisual industry) and (2) the user group of interest. Only after this definitional process is complete, is it possible to formulate facets (in this case, salient aspects of selected films) that fit the information objectives and interests of the users. The next step is to examine a representative set of materials (in this case, participant responses to films). This study replicates the facet analytical model developed in the project Folktales and Facets (La Barre & Tilley, 2010; La Barre & Tilley, forthcoming), which sought to provide enhanced access to folktales while
246
La Barre, de Novais Cordeiro
giving special consideration to the shared and unique information seeking tasks of three distinct user groups: scholars, practitioners and laypeople. This project noted that it is often evident that indexing and retrieval of information resources occurs in several “layers” in order to permit access across different information tasks and multiple user profiles.
Filmic analysis
The research design extends facet analysis by incorporating principles of filmic study. Eisenstein carried out the first filmic analysis in 1934; other approaches to film analysis include those of Aumont (1996), Vanoye and Goliot-Lété (1994), and Jullier and Marie (2007). The operational definition of filmic analysis for this study is as follows: the method seeks to “transpose, transcode what belongs to the visual (description of filmed objects, colours, movements etc.), of the film (montage of the images), of the film score / soundtrack (music, noises, grains, tones, tonalities of voices) and of the audiovisual (relations between images and sounds)” (Vanoye & Goliot-Lété, 1994, p. 10). As per Journot (2005) as well as Aumont and Marie (2003), a film is analysed when one or more of the following forms of critical commentary are produced: description, structure, interpretation, or attribution. The keywords and summaries created by the participants were descriptions, in that participants were tasked with analysing the content of each film. The principles of filmic study proved useful for sharpening the focus of analysis along the following dimensions: intensity of participant response, and participant identification with, or distance from, themes in the films they watched. More generally, filmic analysis was applied along three parameters: intensity of participant discourse, identification with or distance from the plot, and the amount of detail and fidelity of descriptions. Film characteristics uncovered by filmic analysis are presented with the findings for each individual film.
Findings
The study sought to validate the utility of two analytical approaches (facet analysis and filmic analysis) for identifying salient features that can be used to enhance access to moving images. The research also looked for patterns of similarity and divergence in participant responses across cultures. The results provide insights into the creation of enhanced indexing strategies that might provide more robust access and discovery by highlighting subject-related attributes such as theme, action and genre for moving image resources across a variety of information environments (digital repositories, libraries, and archives).
Overview: Facets from participant responses
In order to determine facets, Vickery (1966, p. 45) encourages examination of a “representative range of material that interests the user group … as can be supplemented by examination of comprehensive texts in the fields of interest, and of existing wordlists.” He concludes that the result will be “a collection of terms that are candidates for inclusion.” Our candidate terms were drawn from content analysis of the written responses (keywords and summaries) that participants created after viewing a set of films. These terms indicate the conceptual aspects of a given field of knowledge, in this case the three films. After collocation and normalization, the remaining terms and concepts constituted the preliminary facets, or salient aspects of the selected films. The procedure followed for this study was as follows: (1) Each set of participant responses, consisting of a summary and a set of keywords or descriptive terms, was analysed. (2) Keywords were entered directly into a spreadsheet; summaries were reviewed and their main concepts noted. (3) Terms that were similar in meaning (e.g. crush and infatuation), or words existing in both plural and singular form (ball, balls), were combined in order to reduce duplication. (4) Concepts from the summaries were also entered into the spreadsheet. Keywords were compared across participants, as were summary concepts; these two data sets were not combined for analysis. Each participant’s country of origin was coded. Table 2 illustrates the numerical distribution of terms and concepts.
Film                      A Corner in Wheat   First Love      The 39 Steps
Country                   US       BR         US       BR     US       BR
Total keywords            97       39         63       43     51       44
Total summary concepts    60       70         55       42     73       79
Table 2. Comparative table of total assigned keywords (terms and phrases) and total concepts from movie summary descriptions by country (US – United States, BR – Brazil).
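The normalization steps in the procedure above (combining near-synonyms and singular/plural forms before comparing terms across participants) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors’ tooling: the study used a spreadsheet and manual judgement, and the names `SYNONYMS`, `normalize` and `tally`, the naive plural rule, and the choice to count each term once per participant are all hypothetical.

```python
from collections import Counter

# Hypothetical synonym table mapping a variant to the form kept for analysis.
# The study merged near-synonyms such as "crush" and "infatuation"; which
# form was retained is not reported, so this mapping is an assumption.
SYNONYMS = {"infatuation": "crush"}


def normalize(term):
    """Lower-case a keyword, fold a simple English plural, then apply SYNONYMS."""
    t = term.strip().lower()
    # Naive plural folding ("balls" -> "ball"). It misfires on words such as
    # "news", so merged terms still need manual review.
    if len(t) > 3 and t.endswith("s") and not t.endswith("ss"):
        t = t[:-1]
    return SYNONYMS.get(t, t)


def tally(responses):
    """Count normalized keywords per country.

    `responses` is a list of (country_code, [keyword, ...]) pairs, one per
    participant. Each normalized term is counted at most once per participant
    (an assumption about how the study's counts were made).
    """
    counts = {}
    for country, keywords in responses:
        counts.setdefault(country, Counter()).update({normalize(k) for k in keywords})
    return counts


if __name__ == "__main__":
    responses = [
        ("US", ["Ball", "Crush"]),
        ("US", ["balls", "wheat"]),
        ("BR", ["Infatuation", "Wheat"]),
    ]
    for country, counter in tally(responses).items():
        print(country, counter.most_common())
```

A rule-based pass like this only proposes merges; as in the study, a human indexer would still review the merged list before terms are assigned to facets.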
As per La Barre (2006) and Vickery (1960), content analysis of the spreadsheet of terms began with a set of fundamental categories drawn from Aitchison, Gilchrist and Bawden (2000): Entities, things, objects; Actions and activities; Space, place, location, environment; Kinds or types; Systems and assemblies; and the applications and purposes of these categories. This preliminary list guided analysis by suggesting possible characteristics (facets) that should not be overlooked. It was tailored to reflect user interests, domain knowledge, the characteristics of the entities being analysed, and existing principles of filmic analysis, in particular the work of Aumont (1996) on cognitive and cultural factors. The following list displays the facets found in the viewer responses (both summaries and keywords). Labels marked with an asterisk were supplied by filmic analysis; the rest are terms used by participants.
– Action: movement or events that propel the story forward.
– *Associated person: persons associated with the film, such as director, actor, cinematographer, etc.
– Character: performer of action in the film.
– *Commentary: audience narrative of a qualitative nature (e.g. “I loved this film.”). Features include intensity of discourse and identification with or distance from the plot.
– Emotion: descriptions of emotive aspects of the film.
– Feature: aspect of the film that is not related to genre, such as subtitles or the availability of special formats (streaming).
– *Genre: class or type of film that shares common, predictable or distinctive artistic and thematic elements.
– Object: props or other aspects of the mise en scène (everything that appears before the camera and its arrangement: composition, sets, props, actors, costumes, and lighting).
– *Relation: connections between the film and other entities, such as books.
– *Theme: the central characteristic, idea, concern or motif in a film.
– Time: indications of chronology within the film, or of when the film was produced.
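Once each normalized term has been assigned to one of the facets listed above, comparing the two groups (the kind of comparison reported in Figure 1) reduces to counting facet assignments per country. The sketch below is illustrative only: it assumes a hand-built mapping from term to facet (`FACET_OF` is a hypothetical excerpt using terms from the first film), and it reports proportions rather than the averages shown in Figure 1.

```python
from collections import Counter, defaultdict

# Hypothetical excerpt of a hand-built term-to-facet mapping (terms taken
# from the first film's responses). In the study, facets were assigned by
# the researchers during content analysis, not looked up automatically.
FACET_OF = {
    "capitalism": "Theme",
    "wheat": "Object",
    "silent": "Genre",
    "farming": "Action",
}


def facet_shares(assignments, facet_of):
    """Return, per country, the fraction of assigned terms in each facet.

    `assignments` is an iterable of (country_code, term) pairs. Terms missing
    from `facet_of` are counted as "Unassigned" so gaps stay visible.
    """
    counts = defaultdict(Counter)
    for country, term in assignments:
        counts[country][facet_of.get(term, "Unassigned")] += 1
    return {
        country: {facet: n / sum(counter.values()) for facet, n in counter.items()}
        for country, counter in counts.items()
    }
```

Proportions are used in this sketch because, as Table 2 shows, the two groups contributed different numbers of terms for each film, so raw counts would overstate the larger group.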
Overview of findings by film
Figure 1 provides an introductory overview of the data by comparing the average distribution of assigned facets across all three films according to participants’ country of origin. Here theme, character, action and, to a somewhat lesser extent, genre rank as the most salient aspects of the three films.
Figure 1. [A] Top facets assigned to keyword terms across movies by participant’s country of origin. [B] Average number of top facets assigned to summary concepts across movies by participant’s country of origin (US – United States, BR – Brazil).
Film 1: A Corner in Wheat
The Library of Congress deemed D. W. Griffith’s 1909 film of social commentary (based on Frank Norris’ novel The Pit) “culturally, historically, or aesthetically significant,” and it is now preserved as part of the United States National Film Registry. In this film, the worlds of a poor farmer and a wealthy but ill-fated commodity market speculator collide. Table 3 permits a visual comparison of the keywords and summary concepts assigned to the film by participants’ country of origin.
Keyword             Facet      US   BR   Total
Capitalism          Theme       6    2       8
Wheat               Object      4    3       7
Wealth              Theme       1    3       4
Social issues       Theme       2    2       4
Poverty             Theme       2    2       4
Commodity market    Object      3    1       4
Silent              Genre       3    1       4
Farming             Action      2    2       4
Summary concept     Facet      US   BR   Total
Accidental death    Action      6    6      12
Wheat               Object      6    5      11
*Price increases    Action      3    6       9
Speculation         Theme       4    5       9
Monopoly            Theme       4    3       7
*Opulence           Theme       2    4       6
Wealth              Theme       1    4       5
Bread               Object      2    3       5
Celebrate           Action      2    3       5
*Farmer             Character   3    2       5
Riot                Action      3    2       5
Table 3. A Corner in Wheat: Top keywords (terms and phrases) and summary concepts by participant’s country of origin (US – United States, BR – Brazil). *Word not assigned as a keyword.
The top keywords selected by the US participants include ‘capitalism’ [facet: theme], ‘wheat’ and ‘commodity market’ [facet: object], and ‘silent’ [facet: genre]. Both American and Brazilian participants selected ‘wheat’ [facet: object]. The keywords selected by the Brazilian participants were distributed across a wider range of facets, but favour terms like ‘wealth’ and ‘opulence’ [facet: theme]. The American group assigned more keywords in total, but the Brazilian terms tended to embody broad concepts such as family farming, economic crisis, cost of living and misery. The pattern is different for summary concepts: both groups favour the actions ‘accidental death’ and ‘price increases’ (the latter more heavily favoured by Brazilian than US participants), the object ‘wheat’, and the theme ‘opulence’. Filmic analysis provided an additional set of tools that allowed the researchers to unpack the intensity of participant discourse, the extent to which participants identified with the plot, and the level of detail of plot keywords and summary concepts. In the responses of both cultural groups, a clear matrix of conflict emerges around the social and economic differences of the society the film depicts. As a result, a series of antonyms emerged that embody the intensity of the filmic discourse: [American] the rich and the poor, wealth and poverty, the working class (proletariat) and the ruling class, farmers’ struggles – lower class – hunger; [Brazilian] opulence and poverty, cost of living and economic crisis, economic monopoly and wheat market, peasants and salaried workers, betting market and estates. To a lesser extent, a few participants from both groups noted the director, film duration, colour, sound, and date of production. Participants also used concepts and more contemporary terms, such as speculation, commodities and agribusiness, to identify issues raised in the plot. A clear overlap exists between keywords and summary terms. The top summary terms selected by both groups are quite similar, and include ‘accidental death’ and ‘wheat’. The Brazilian participants focused more clearly than their American counterparts on the film’s depiction of ‘price increases’, ‘speculation’, ‘opulence’ and ‘wealth’. Neither group discussed film devices such as angles, shots, scenes, or montage. Participants typically refrained from commentary about film images and plot sequences, such as the innovative use of scene repetition: the first and final scenes, showing two small farmers sowing their land, are identical, a device indicating that the miserable lives of these characters were not changed by the events in the film.
Another innovation was the use of parallel montage to contrast the situations of the rich and the poor (wealth and poverty), exploiters and exploited, abundance and waste, deprivation and extravagance, as well as “good” and “bad.” Concepts in the summaries of both groups emphasize social vision in recognizing the conflict and contrast of life between farmers and speculators. One American summary reflects a logic different from that of the Brazilians, using the term “enterprising man”: “An enterprising man builds a monopoly on the world’s wheat and drives up the price. One day while touring a processing plant for his wheat, he slips and drown[s] in a rapidly filling vat of wheat.” By contrast, two Brazilian summaries describe the conflict and contrast with less intensity, for example:
The film presents a countryfolk family … [T]he King of Wheat, merchant, … has a new idea to maximize his earnings. With the success of the speculation in the Stock Market, he doubles the revenues. The rise [in price] is felt throughout the production chain and [raises] the price of … bread. Over time … the population protests with hunger.
The summary concepts assigned to the two main characters in the film provide an interesting contrast between the wealthy speculator and the poor farmers. Americans favoured a number of descriptive terms for the wealthy businessman, calling him: King of Wheat, enterprising man, speculators, financial tycoons, man, greedy businessman, marketers, engineers, monopolist. Far fewer terms were selected to describe the poor characters. In contrast, the Brazilians favoured more terms for the poorer characters (farmer, small farmer, less favoured class (people), country folk families, peasants) than for the wealthy one. Analysis of this first film reiterates the importance of subject-related attributes for searchers and begins to indicate some differences between the two groups’ cultural experiences of this film.
Film 2: First Love
The second film is contemporary, produced by amateur film-makers in the traditional style of a silent movie, complete with intertitles and background music. In an early modern setting, it tells the story of a young boy who becomes infatuated with a young girl after following a trail of soap bubbles coming from her yard. After failing to gain her attention, he decides to earn money doing a variety of chores so he can buy her a gift (a sandwich for them to share).
Keyword             Facet      US   BR   Total
Love (first)        Theme       2    5       7
Black and white     Genre       3    4       7
Romance             Genre       3    3       6
Silent              Genre       4    2       6
Short film          Genre       3    2       5
Children            Character   2    3       5
Young love          Theme       4    1       5
Summary concept     Facet      US   BR   Total
Boy                 Character   6    3       9
Girl                Character   6    3       9
Earn money          Action      6    3       9
Chores              Action      5    3       8
Love (first)        Theme       2    5       7
Sharing             Action      4    3       7
Table 4. First Love: Top keywords (terms and phrases) and summary concepts by participant’s country of origin (US – United States, BR – Brazil).
For this film, the American participants chose more widely distributed keyword terms and demonstrated less consensus; ‘silent’ (an indication of the film’s genre) and the theme ‘young love’ were their most frequently occurring terms. The Brazilian participants also focused on the theme of ‘first love’ and on another genre attribute, ‘black and white.’ Some interesting anomalies appeared as participants tried to identify the location of the film, with two American viewers guessing Britain and one Brazilian viewer guessing America. One scene shows a car with the steering wheel on the right (as in Britain); the story is in fact set in Australia. Overall, the American viewers tended to focus more heavily on earning money and chores, while their Brazilian counterparts tended to emphasize the romantic theme (first or young love). As with the previous film, information about the film language was omitted from the summaries. No comments were made about salient objects. However, one participant in each group called attention to the intertitles in the story, such as: ‘She will be mine’; ‘Never give up!’; ‘Mum, can I do these chores?’; ‘Hope pays off …’; ‘While later …’ As with the first film, only one participant noted the opening sequence: “boy playing alone, drawn by bubbles from an unknown source. Follows them to girl playing.” Yet the summary concepts do attempt to highlight the action, the characters and the sequence of the opening scene, especially among the Americans and, to a somewhat lesser extent, the Brazilians. Notably, all participants, in their choice of summary concepts, described the film in general terms and with few details. Both groups highlighted the actions far more carefully:
– boy seeks attention from girl;
– the boy’s task list;
– tasks performed and money earned by boy;
– purchase of sandwich by the boy;
– encounter and sharing of the sandwich.
The summaries showed a certain detachment (impersonality) of the participants in describing the story, but one Brazilian participant resorted to commentary in the summary:
The film shows … also a naive and childlike feeling existing in pre-adolescents. Romance well constructed, with images in black and white, that refer to old times, giving the impression that feelings and attitudes so naive and pure are of a past, contrasting with today’s society.
This attempt at commentary is mirrored by one American participant’s choice of the keywords ‘gender roles,’ ‘stereotypes,’ ‘heterosexual,’ and ‘instructions on how to win affection.’ In the summaries, descriptions of the film production emerged: length, colour, sound, direction and year of production, along with indications of genre such as ‘romance’ and ‘amateur film’. In general, the summary concepts were a mix of general terms, specific descriptions, emotional experiences, possible genres (comedy childhood, comedy, romance) and production type (independent film). The keywords recurrently attributed by both groups were: first love, young love / juvenile love, love, boy, girl.
Film 3: The 39 Steps
This film, directed by Alfred Hitchcock, is a well-known, commercially produced, feature-length British spy thriller from 1935. Featuring the wrongful accusation and eventual exoneration of the main character, it is loosely based on the adventure novel The 39 Steps by John Buchan.
Keyword             Facet              US   BR   Total
Hitchcock           Associated person   5    3       8
Thriller            Genre               4    4       8
Espionage           Theme               3    3       6
England             Location            2    3       5
Scotland            Location            3    2       5
Black and white     Genre               3    1       4
Summary concept     Facet              US   BR   Total
Murder              Action              6    5      11
Spy                 Character           6    5      11
England             Location            4    5       9
Chase               Action              4    3       7
Exonerate           Theme               3    4       7
Scotland            Location            3    4       7
Wrongful accusation Theme               5    2       7
Triumph             Theme               5    1       6
Table 5. The 39 Steps: Top keywords and summary concepts by participant’s country of origin (US – United States, BR – Brazil).
For both groups, the name of the director [facet: associated person] ranks highly. All Americans and half of the Brazilians selected it as a keyword, perhaps reflecting Hitchcock’s fame. That this film is a thriller [facet: genre] is also of great importance to both groups. Other recurring keywords included title and location. Keywords frequently repeated to describe the plot were spy, agent or informer [facet: character], and espionage [facet: theme]. The summary concepts selected by both groups are also remarkably similar, with both selecting ‘murder’ [facet: action] and ‘spy’ [facet: character]. The American group heavily emphasized two themes, ‘wrongful accusation’ and ‘triumph.’ Location remains among the top-ranked summary concepts for both groups. When describing the story, participants frequently indicated where [facet: location] and when [facet: time] it happened. Both groups registered the suspense that runs throughout the film by listing its genre, ‘thriller’, among their keywords, although other genres also emerged in the summary concepts, for example adventure, light comedy, and drama with some romance. References to the time and place of plot events were also relatively common. In light of this, we might have expected a summary like this: On vacation in England, the Canadian Richard Hannay attends the presentation of Mr. Memory, who affirms that he can answer any question from the public thanks to his privileged brain. Suddenly, shots are fired and great confusion ensues. Heading for the exit, Hannay encounters a woman, Annabella Smith, who asks him for help, claiming that her life is in danger. In the analysed participant summaries, three participants alluded to this sequence:
– Canadian visiting London is caught up in espionage plot to smuggle military secrets out of England using Mr. Memory (a vaudville performer). At first performance involving Mr. Memory, the Protagonist meets a woman trying to stop the spy ring (US).
– R. Hannay (Robert Danot) attends a vaudeville show with Mr. Memory (Wylie Watson). During the presentation there is an uproar, and in this moment of panic he meets Anabella Smith (Lucie Mamheim) (BR).
– Canadian who goes to London. There, in a theatre, he meets a secret agent (US).
This analysis demonstrates the relative importance (and difficulty) of capturing important plot sequences or events in ways that capture complex associations of events and make them easily searchable. As with the previous two films, participant summaries did not emphasize information about the film language, images or sound. Of the twelve participants, eleven identified at least one of the plot locations in the film: England, Scotland or Canada (the nationality of the main character). Summaries from both groups of participants remained focused on a general description of the story and followed a temporal logic, that is, the “beginning-middle-end” of the events narrated in the film. However, the summaries did not always describe film sequences that could assist an information seeker interested in the passage of time (as signalled, for example, by a burning candle in one scene). Similarly, the precipitating event that launches the plot is missing from the descriptions. Also absent is any reference to the puzzling situation that develops and sustains suspense throughout the film.
Participants in both groups described this film as ‘confusing’, ‘gripping’ and ‘complicated’ [facet: commentary]. In fact, of all three films, this one elicited the most keywords and summary concepts that fit into the commentary facet. Filmic analysis uncovered, in both keywords and summaries, the impersonal nature of participant responses, which tended to take the form of notes on events and conceptual coincidences. This film received the fewest assigned keywords and the most general summary concepts. Perhaps this is an indication of the iconic status of the film, or perhaps it reflects the film’s length and the general fatigue of the participants (this was the third film). Despite these possibilities, the film was deeply resonant and enjoyable for most viewers, reflecting the fact that it is part of the legendary filmography of Alfred Hitchcock, whose ability to engage and surprise the viewer, regardless of gender or country of origin, is unquestionable. The skilful intrigue of The 39 Steps has long provided universal audience appeal.
Discussion
Implications for moving image repositories
In order to provide a point of comparison for the findings, we now revisit the access points for films provided by the four digital film archives reviewed earlier in Table 1: (1) Internet Movie Database (Videos-IMDb); (2) Internet Archive: Moving Image Archive (MIA); (3) Petrobras Porta Curtas (aka Short Port); and (4) BBC Motion Gallery. Access points used by the search and browsing functions of each repository were subjected to content analysis and subsequent facet analysis. As a result, several additional facets were identified and added to the original list of facets; these are interleaved in the list below and marked with a double asterisk [**]. Note that not all of these facets are equivalent. Some are sub-facets that link to more general or upper-level facets. In the following list, all terms marked > denote higher-level facets under which the identified sub-facets may be located.
> Attribute (of film) [attributes sorted by production value]
   – Color
   – Length
   – Format**
   – Rights [access]
   – Time [of production]
> Association [types of associations]
   – awards**
   – derivations**
   – collections**
   – rating**
   – persons [actors, director, screenwriter, etc.]
   – locations [production, setting, studio]
   – genre
> Audience commentary [types of commentary]
   – description
   – emotion
   – review
   – rating
> Mise en scène
   – Character
   – Object [item]
   – Location [of action]
   – Time [chronological of setting]
> Plot
   – Action
   – Setting / Location [setting of film]
   – Time [chronology within film]
> Theme
Comparison of this list with Figure 1 (facet occurrence across films according to participants’ country of origin) suggests several additional useful access points. One third of American and Brazilian participants noted theme as the most salient characteristic across both keywords and summary concepts. Genre was also considered an important feature by nearly one quarter of the participants in both groups as a keyword, and it ranked highly in the summaries for both groups. Action ranked as another highly salient feature in keyword terms for both sets of participants. The Brazilian participants favoured descriptions of objects, as did members of the other group, though to a lesser extent. Character was another highly salient feature for both groups. Other moderately important features were Associations, such as related persons or items; Emotion; and the ability to interject personal Commentary. All four repositories provide title, date of production and a summary, a finding consistent with Hertzum’s (2003) observations of desired access points in reference queries from a film archive. Three repositories provide access via association, though this is often limited in range to directors and, in some cases, actors. Participants expressed interest in a much wider variety of associations, including references to other films, books and cultural objects. Genre is also a valuable access point for the participants, and the findings suggest that it warrants further development across all repositories as both a search and a browsing mechanism. Genre is most often used as a user-generated tag for films on these sites, and though one may be able to click on this term to retrieve other films that are tagged similarly, it is too often impossible to search for films in a particular genre, or to filter large result sets according to this concept. Character is an especially salient characteristic that is not leveraged as an access feature across all four sites. Further, unless a particular Object appears in a film synopsis, it is impossible to search for films containing specific items of interest. Moreover, none of the sites supports coding of emotive film features. Although two of the sites permit the addition of audience commentary in the form of reviews and ratings, the interest participants showed in creating commentary is a strong indication that such features would appeal to users of digital moving image repositories.
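The missing genre filter described above is a standard faceted-search operation. As a minimal sketch, assuming each film record stores its facet values as sets (the `FilmRecord` type and both functions are hypothetical and are not features of any of the four repositories; the facet values loosely follow participant keywords), filtering a result set and counting facet values for a browse panel might look like this:

```python
from dataclasses import dataclass, field


@dataclass
class FilmRecord:
    """Hypothetical film record: each facet name maps to a set of values."""
    title: str
    facets: dict[str, set[str]] = field(default_factory=dict)


def filter_by_facets(records, selected):
    """Return records that carry every selected facet value.

    `selected` maps a facet name to one required value,
    e.g. {"genre": "thriller", "character": "spy"}.
    """
    return [
        record
        for record in records
        if all(value in record.facets.get(name, set())
               for name, value in selected.items())
    ]


def facet_counts(records, name):
    """Count how many records carry each value of one facet (for a browse panel)."""
    counts = {}
    for record in records:
        for value in record.facets.get(name, set()):
            counts[value] = counts.get(value, 0) + 1
    return counts


if __name__ == "__main__":
    # Illustrative records only; requires Python 3.9+ for dict[str, set[str]].
    films = [
        FilmRecord("A Corner in Wheat", {"genre": {"silent"}, "object": {"wheat"}}),
        FilmRecord("First Love", {"genre": {"silent", "romance"}, "character": {"boy", "girl"}}),
        FilmRecord("The 39 Steps", {"genre": {"thriller"}, "character": {"spy"}}),
    ]
    print([f.title for f in filter_by_facets(films, {"genre": "silent"})])
    print(facet_counts(films, "genre"))
```

Because values are stored per facet, the same two functions would serve character, object or emotion facets as readily as genre, which are the access points the comparison above found lacking.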
Conclusion
Filmic analysis, in combination with facet analysis, provided a useful set of analytical tools for understanding filmic responses in this project. This preliminary research calls for further applications that continue to test the utility of the analytical framework. One possible extension of this research, as discussed by Cordeiro (2000), is to focus on one film and the family of documents generated as part of its production (argument, synopsis, script, storyboard, script analysis technique, etc.). Another possibility would be to analyse the publications (articles, term papers, theses, books, etc.) that address a set of films under analysis, as a way to further extend the current facet analytical paradigm. Both similarities and differences between cultures were evident in the filmic analysis, but due to the small sample of participants it is far too early to draw conclusions. Replication of this study design across additional cultures may well yield further insights. A comparison of the facets elicited by this approach across films and across cultures indicates that subject-related content, especially theme, genre and action, provides highly salient access points for searchers in this group. This indicates a continued need to pay close attention to subject access in search and discovery systems. The findings of this project also indicate continued value in exploring more powerful ways to augment controlled vocabularies with crowd-sourced subject terms from folksonomies. With the increasing prevalence of digital multimedia resources, continued research is needed so that a more user-aware paradigm might replace the currently frustrating and dysfunctional relationship between ‘that obscure object of desire’ and those who seek moving images.
References
Aitchison, J., Gilchrist, A., & Bawden, D. (2000). Thesaurus construction and use: A practical manual. Cambridge: Psychology Press.
Andreano, K. (2007). The missing link: Content indexing, user-created metadata, and improving scholarly access to moving image archives. The Moving Image, 7(2), 287–299.
Argan, G. C. (1992). História da arte como história da cidade. São Paulo: Martins Fontes.
Armitage, L. H., & Enser, P. G. B. (1997). Analysis of user need in image archives. Journal of Information Science, 23(4), 287–299.
Art and Architecture Thesaurus. (n.d.). Getty vocabulary program. Retrieved from http://www.getty.edu/research/conducting_research/vocabularies/aat/about.html
Aumont, J., & Marie, M. (1996). À quoi pensent les films. Paris: Séguier.
Aumont, J., & Marie, M. (2003). Dicionário teórico e crítico de cinema. Campinas, SP: Papirus.
Baxter, G., & Anderson, D. (1996). Image indexing and retrieval: Some problems and proposed solutions. Internet Research, 6(4), 67–76.
Benedict, R. (1972). O crisântemo e a espada. São Paulo: Perspectiva.
BBC (British Broadcasting Corporation). (n.d.). Motion gallery. Retrieved from http://www.bbcmotiongallery.com/
Buñuel, L. (1977). That Obscure Object of Desire (Cet obscur objet du désir). Paris: Les Films Galaxie.
CRG (Classification Research Group). (1957). The need for faceted classification as the basis for all methods of information retrieval. Proceedings of the International Study Conference on Classification for Information Retrieval (pp. 137–147).
Cordeiro, R. (2000). Imagem e movimento. Rio de Janeiro: UFF: Programa de Pós-Graduação em Ciência da Arte.
Enser, P. (2008a). The evolution of visual information retrieval. Journal of Information Science, 34(4), 531–546.
Enser, P. (2008b). Visual information retrieval. Annual Review of Information Science and Technology, 42, 3–42.
Getty Research Institute. (2011). Getty vocabularies. Retrieved from http://www.getty.edu/research/tools/vocabularies/index.html
Greisdorf, H., & O’Connor, B. (2002). Modelling what users see when they look at images: A cognitive viewpoint. Journal of Documentation, 58(1), 6–29.
Hall, S. (2006). A identidade cultural na pós-modernidade. Rio de Janeiro: DP&A.
Hertzum, M. (2003). Requests for information from a film archive: A case study of multimedia retrieval. Journal of Documentation, 59(2), 168–186.
Iconclass. (n.d.). RKD: Netherlands. Retrieved from http://www.iconclass.nl/
Internet Archive. (n.d.). Moving Image Archive. Retrieved from http://www.archive.org/details/movies
Internet Movie Database. (n.d.). Videos. Retrieved from http://www.imdb.com/features/videos
Jörgensen, C. (2003). Image retrieval: Theory and research. Lanham, MD: Scarecrow Press.
Jörgensen, C., & Jörgensen, P. (2005). Image querying by image professionals. Journal of the American Society for Information Science and Technology, 56(12), 1346–1359.
Journot, M.-T. (2005). Vocabulário de cinema. Lisboa: Edições 70.
Jullier, L., & Marie, M. (2009). Lendo as imagens do cinema. São Paulo: Ed. Senac.
Chapter 10. That obscure object of desire
Kuhlthau, C. (2004). Seeking meaning: A process approach to library and information services. Westport, CT: Libraries Unlimited.
Kuhn, A. (1982). Women’s pictures: Feminism and cinema. London: RKP.
La Barre, K. (2006). The use of faceted analytic-synthetic theory as revealed in the practice of website construction and design (Unpublished doctoral dissertation). Indiana University, Bloomington, IN.
La Barre, K. (2010). Facet analysis. Annual Review of Information Science and Technology, 44, 243–284.
La Barre, K., & Tilley, C. (2010). Folktales and Facets: Final Report to OCLC/ALISE (for grant via OCLC/ALISE LISRG program). Retrieved from http://hdl.handle.net/2142/14887
La Barre, K., & Tilley, C. (forthcoming). The elusive tale: Leveraging the study of information seeking and knowledge organization to improve access to and discovery of folktales. Journal of the American Society for Information Science and Technology, 63(1).
McGrath, K., Kules, B., & Fitzpatrick, C. (2011). FRBR and facets provide flexible, work-centric access to items in library collections. In G. Newton, M. Wright, & L. N. Cassell (Eds.), Proceedings of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries (pp. 49–52).
Machado, A. (2007). O sujeito na tela: Modos de enunciação no cinema e no ciberespaço. São Paulo: Paulus.
Matusiak, K. (2006). Information seeking behavior in digital image collections: A cognitive approach. Journal of Academic Librarianship, 32(5), 479–488.
Ménard, E. (2009). Image retrieval: A comparative study on the influence of indexing vocabularies. Knowledge Organization, 36(4), 200–213.
Mentis, H. M. (2008). Memory of frustrating experiences. In D. Nahl & D. Bilal (Eds.), Information and emotion: The emergent affective paradigm in information behavior research and theory (pp. 197–201). Medford, NJ: Information Today.
Moving Image Genre-form Guide. (1988). The Library of Congress, Motion Picture & Television Reading Room. Retrieved from http://www.loc.gov/rr/mopic/migintro.html
Online Audiovisual Cataloger (OLAC) prototype discovery interface. (n.d.). Retrieved from http://blazing-sunset-24.heroku.com/
Nahl, D., & Bilal, D. (Eds.). (2008). Information and emotion: The emergent affective paradigm in information behavior research and theory. Medford, NJ: Information Today.
Neal, D. (2007). Folksonomies and image tagging: Seeing the future? Bulletin of the American Society for Information Science and Technology, 31(4), 7–11.
Panofsky, E. (1991). Significado nas Artes Visuais. São Paulo: Perspectiva.
Petrobras. (n.d.). Porta Curtas. Retrieved from http://www.portacurtas.com.br/index.asp
Ranganathan, S. R. (1933). Colon classification. Madras: Madras Library Association.
Ranganathan, S. R. (1937). Prolegomena to library classification. New York: Asia Publishing.
Ransom, N., & Rafferty, P. (2011). Facets of user-assigned tags and their effectiveness in image retrieval. Journal of Documentation, 67(6), 1038–1066.
Ricoeur, P. (1994). Tempo e narrativa. Campinas: Papirus.
Shatford, S. (1986). Analyzing the subject of a picture: A theoretical approach. Cataloging & Classification Quarterly, 6(3), 39–62.
Shatford Layne, S. (1994). Some issues in the indexing of images. Journal of the American Society for Information Science, 45(8), 583–588.
Speziali, P. (1973). Classifications of the sciences. In P. P. Wiener (Ed.), Dictionary of the history of ideas (pp. 462–467). New York: Scribners.
Spiteri, L. (1998). A simplified model for facet analysis. Canadian Journal of Information and Library Science, 23, 1–30.
Tillett, B. (2004). What is FRBR? A conceptual model for the bibliographic universe. Retrieved from http://www.loc.gov/cds/downloads/FRBR.PDF
Trant, J. (2009). Tagging, folksonomy and art museums: Early experiments and ongoing research. Journal of Digital Information, 10(1). Retrieved from http://journals.tdl.org/jodi/article/view/270/277
Vanoye, F., & Goliot-Lété, A. (1994). Ensaio sobre a análise fílmica. Campinas, SP: Papirus.
Vickery, B. C. (1960). Faceted classification: A guide to construction and use of special schemes. London: Aslib.
Vickery, B. C. (1966). Faceted classification schemes. New Brunswick, NJ: Rutgers, the State University of New Jersey.
Vickery, B. C. (1980). Classificação e indexação nas ciências. Rio de Janeiro: BNG/Brasilart.
Wildemuth, B. M., Oh, J. S., & Marchionini, G. (2010). Tactics used when searching for digital videos. Proceedings of IIiX ’10, Third Symposium on Information Interaction in Context (pp. 255–263).
Yoon, J. W. (2008). Searching for an image conveying connotative meanings: An exploratory cross-cultural study. Library & Information Science Research, 30, 312–318.
Zhang, Y., & Li, Y. (2008). A user-centered functional evaluation of moving image collections. Journal of the American Society for Information Science and Technology, 59(8), 1331–1346.
Olha Buchel
Chapter 11. Designing and visualizing faceted geospatial ontologies from library knowledge organization systems
Abstract: Geospatial ontologies may play an important role in facilitating access to location-based information. However, the design of content-rich geospatial ontologies is often cumbersome, labour-intensive, and incomplete because of the difficulty of collecting local knowledge about locations. Unlike other enterprises in the business of collecting local knowledge, large-scale library collections already hold a wealth of local knowledge about geospatial locations, hidden in their knowledge organization systems (metadata and classifications). In this paper, I explain the advantages of using metadata facets to construct geospatial ontologies. I also discuss the pros and cons of representing ontologies as text and in the prototype visualization VIsual COLlection EXplorer (VICOLEX).
Keywords: Geospatial ontologies, gazetteers, map-based visualizations
Olha Buchel, Doctoral Candidate, Faculty of Information and Media Studies, The University of Western Ontario
1 Introduction
We live in a time when location-based information is becoming ever more important to our information needs (Hill, 2006). This is evident from recent studies of online reference services (Bishop, 2011; Berry, Casado, & Dixon, 2003), which demonstrate that location-based questions make up a significant portion of online reference inquiries. Such questions concern physical locations, directions, demographics, subjects, government, employment, genealogy, history, and other local information. They account for approximately 36 % to 50.2 % of all questions in online reference, depending on the type of reference service (i.e., whether it is offered by an individual library or by a consortium). Notably, many location-based questions are answered by non-local librarians, who are not confident in answering questions about locations other than their own (Berry, Casado, & Dixon, 2003; Bishop & Torrence, 2007; Sears, 2001). As a result, many reference questions end up being referred elsewhere.
In order to improve access to local information, I suggest enhancing the organization of geospatial knowledge by means of geospatial ontologies. In the context of this paper, geospatial ontologies are knowledge organization systems (KOS) that describe geographic locations. In terms of their structure, geospatial ontologies can resemble dictionaries, encyclopaedias, classifications, or reasoning schemas. In terms of content, they can be either basic or content-rich. Basic geospatial ontologies mainly contain the core properties (i.e., place names, variant names, types, and coordinates or footprints), which have been discussed elsewhere (see, e.g., Buchel & Hill, 2009; Hill, 2006). This paper focuses on content-rich geospatial ontologies, which are studied less frequently. Unlike basic ontologies, which are used primarily for geographic retrieval, geoparsing, and visualization, content-rich ontologies pursue more ambitious goals (e.g., providing reference for researchers, educators, and learners, or supporting visual analytics). The Columbia gazetteer, The CIA world factbook, The world book encyclopedia of people and places, The Biography Light Ontology, The Semantic Web Technology Evaluation Ontology (SWETO, http://lsdis.cs.uga.edu/proj/sweto/), and some other ontologies are good examples of such ontologies.
Designing content-rich ontologies is typically not easy, because information about geographic locations is scattered across multiple sources, some of which are not readily accessible. Moreover, entries in such ontologies require extensive local knowledge of geographic areas. In the context of this paper, local knowledge is knowledge “strongly rooted in a particular place” (Geertz, 1983, p. 75).
Local knowledge of localities comprises the cultural meanings of places, their history, community characteristics, and other detailed and up-to-date information about named places (Hill, 2006). Such knowledge is difficult to collect in an authoritative ontology because it is scattered across various sources. Compared with other enterprises attempting to build geospatial ontologies of local knowledge, large-scale libraries appear to have a strong advantage with respect to local information, since their existing KOS, such as classifications and metadata, already contain extensive information about geographic locations. This paper demonstrates how content-rich local ontologies can be designed and visualized from library KOS. Whereas classifications and metadata focus on describing book collections, the derived geospatial ontologies represent the historic and economic peculiarities of locations. This is not surprising: the languages of publications about a place bear similarity to the languages spoken there, the subjects of those publications reflect social, economic, and historical events in the place, and so on.
Besides being extracted from KOS, any geospatial ontology must have a visualization, whether on screen or on paper. Visualizations facilitate understanding of geospatial ontologies, their objects, properties, relationships, and patterns (Budak, Sheth, & Ramakrishnan, 2004). For this reason, I also explain how the proposed geospatial ontology can be visualized, and how the proposed visualization enhances understanding of geospatial ontologies. The remainder of the paper is divided into three parts. In the Background information section, I provide conceptual background on various kinds of geospatial ontologies and the methods of their construction and visualization. In the Ontologies from KOS section, I describe the prototype collection, the geospatial ontology derived from it, and the visualization prototype VIsual COLlection EXplorer (VICOLEX). In the Conclusions section, I summarize the discussion and briefly outline the implications for libraries and the significance of this research.
2 Background information
Geospatial ontologies can be designed in one of the following ways: a) authoritatively (top down); b) programmatically (using natural language processing or first-order logic); c) by crowd-sourcing; or d) from facets. In the remainder of this section, I explain the merits of each of these methods, the difficulties associated with their construction, and how the resulting ontologies can be visualized. Although the ontology presented in this paper was built specifically for the prototype collection on the Local History of Ukraine, its principles can be useful in many other contexts and subject domains.
2.1 Geospatial ontologies
Examples of authoritative content-rich ontologies are print gazetteers such as The Columbia gazetteer of the world and The CIA world factbook. According to the description on its website, The Columbia gazetteer is “unrivaled in content scope and unmatched in authority” (Introduction to The Columbia gazetteer of the world, n.d., para. 33). Its 1998 print edition contains over 170,000 entries about places throughout the world. Most articles give details on where the place is physically located, its dimensions and borders, and information on economic activities, demographics, history, distance to relevant places, points of interest, transportation lines, physical geography, political boundaries, industry, trade, and service activities, agriculture, former or alternate names, and different spellings and pronunciations (Introduction to The Columbia gazetteer of the world, n.d.). The CIA world factbook likewise has highly structured summaries of the demographics, geography, communications, government, economy, and military of geopolitical entities, including U.S.-recognized countries, dependencies, and other areas of the world. However, The CIA world factbook has far fewer entries than The Columbia gazetteer: only 267 (Central Intelligence Agency, 2011).
Content-rich authoritative ontologies are not easy to construct and maintain. Constructing and maintaining The Columbia gazetteer requires major effort from editorial board members and research staff. The construction is “supervised by a board of 150 leading geographical scholars from all parts of the world … with personal knowledge of places and features that these sources have described” (Introduction to The Columbia gazetteer of the world, n.d., para. 33). Information for The CIA world factbook is likewise provided by numerous departments and agencies within the CIA, journals, the National Science Foundation, and other public and private sources. Even though content-rich ontologies require large teams of researchers and editorial boards, some of their entries are still incomplete. In The Columbia gazetteer, for example, some descriptions are longer than others, ranging from a brief notation to an essay; and in The CIA world factbook, not all countries include the same number of descriptors, because the information is simply unavailable (Central Intelligence Agency, 2011).
Some content-rich ontologies, however, are designed programmatically, using natural language processing techniques, first-order logic, or text mining. Examples are the Emma Goldman and the Ireland and Irish Studies ontologies, which comply with The Biography Light Ontology standard. Although these ontologies are about biographies, they can also be considered geospatial because they have multiple geospatial properties.
The Biography Light Ontology standard describes lives as series of events and activities, such as birth, education, marriage, and military service, where an event is an action or occurrence taking place at a certain time in a specific location (Larson, 2010). Extracting biographic events from texts is problematic in several respects: the time and location of events are often unspecified or only implied, and relations between events are not defined. All this makes automatic construction of biographical ontologies difficult (Larson, 2010). Another example of a programmatically designed ontology is the SWETO ontology of the SemDIS project (Budak, Sheth, & Ramakrishnan, 2004). This ontology merges several classes of entities (including geographic locations, airports, companies, terrorist attacks, persons, publications, journals, conferences, and books). Information for this ontology was mined from a broad variety of resources available on the open Web, The National Map, NASA, and members of UCGIS. All entities in SWETO are interconnected by explicit semantic relationships, such as located in, responsible for, listed author in, and paper published in. However, the designers of this ontology reported challenges in capturing imprecise relations among different entities and in defining proximity measures that accommodate thematic, spatial, and temporal dimensions.
Besides authoritative print and programmatic ontologies, crowd-sourced ontologies are becoming prominent in the Web 2.0 environment. Examples include Wikipedia entries about geographic locations and various context-rich map mashups. Some of these ontologies are intended to provide a public service, for example, the online maps that were generated to help connect people after Hurricane Katrina in 2005 (Singel, 2005), or to provide reference (e.g., place name entries in Wikipedia); others are made for personal and recreational purposes. Many of them include content-rich annotations (Barricelli, Iacob, & Zhu, 2009; Bellucci, Malizia, Diaz, & Aedo, 2010; Marcante & Provenza, 2008; Simon, Sadilek, Korb, Baldauf, & Haslhofer, 2010). Annotations take a variety of forms, ranging from audio and icon notes to graphic notes and scribbles on maps. Annotations help people create knowledge about locations by adding emotions, cultural information, blog entries, links, tags, and so on. Recently, Poore (2011) drew attention to the need for a critical ontology approach that would embrace folksonomies, taxonomies, and other user-contributed data. She proposed developing algorithms for linking crowd-sourced ontologies into a high-level ontology for a different kind of national map.
Although crowd-sourced ontologies are often considered amateurish because they are designed by neogeographers who have no formal training in map design, these geo-ontologies not only represent new ways of seeing and thinking with maps but also reveal local knowledge and personalized views of the world. Personalized views, much like local knowledge, are valuable for geospatial ontologies, since they allow the world to be viewed from the perspectives of different users (Fonseca, 2002). Moreover, as Vander Wal (2007) indicated, crowd-sourced data may prove superior to top-down ontologies because they capture local meaning better. The problem with linking distributed ontologies to a high-level ontology, however, lies in matching and integrating crowd-sourced ontologies, because metadata standards are not agreed upon and users’ ontologies use many different data types. In addition, crowd-sourced data are infamous for “sloppy tags”: low-quality, redundant, or nonsense metadata that could degrade the quality of the high-level ontology (Guy & Tonkin, 2006). Nonetheless, some researchers are experimenting with statistical methods for extracting ontologies from crowd-sourced data, especially tags (Budak, Sheth, & Ramakrishnan, 2004; Chen & Qin, 2008).
Ontologies from facets can be built either from metadata facets or from analytico-synthetic schemas. An example of an analytico-synthetic ontology is described in Farazi, Maltese, Giunchiglia, and Ivanyukovich (2011). This ontology includes facets about administrative divisions, bodies of water, and landforms; it is used for query expansion. Faceted metadata ontologies differ from all other ontologies in their role in information systems. Their role is not only cataloguing, organizing, or retrieving, but also improving user experience. Namely, they are used for fine-tuning search strategies, adding multiple directions to retrieval, generating multi-perspective views of search results, enhancing browsing and exploration, and providing flexible, work-centric access to resources (La Barre & Cochrane, 2006; McGrath, Kules, & Fitzpatrick, 2011; Rosenfeld & Morville, 2002; Tudhope, Binding, Blocks, & Cunliffe, 2002; Yee, Swearingen, Li, & Hearst, 2003; Zhitomirsky-Geffet, Bar-Ilan, Miller, & Shoham, 2010). Faceted metadata enhance navigation along the various conceptual dimensions that describe collection objects (Yee, Swearingen, Li, & Hearst, 2003). For example, when browsing the gallery of Nobel Prize winners in Hearst’s Flamenco project (http://bailando.sims.berkeley.edu/flamenco.html), users can learn not only about individual laureates but also about the properties that describe them (years, countries, and affiliations). Moreover, users can switch the conceptual dimension from Nobel Prize winners to countries and explore the properties of countries (e.g., how many laureates each country has had, which institutions are found in each country, and which Nobel laureates are associated with each country). This section presented an overview of the main methodologies used in the design of ontologies.
However, knowing how to extract ontologies with natural language techniques and facets only partially solves the task of creating content-rich geospatial ontologies. Other issues concern the visualization of ontologies, whose role is explained in the next section.
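The Flamenco example above, in which users switch the conceptual dimension from laureates to countries, can be illustrated with a small sketch of faceted counting and pivoting. This is not Flamenco’s implementation; the records are hypothetical placeholders, and the field names `name`, `country`, and `affiliation` are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical records: each item carries one value per facet.
laureates = [
    {"name": "Laureate A", "country": "Country X", "affiliation": "Institute 1"},
    {"name": "Laureate B", "country": "Country X", "affiliation": "Institute 2"},
    {"name": "Laureate C", "country": "Country Y", "affiliation": "Institute 3"},
]


def facet_counts(items, facet):
    """Count how many items carry each value of one facet."""
    return Counter(item[facet] for item in items)


def pivot(items, facet, attributes):
    """Switch the conceptual dimension: treat each value of `facet` as an
    entity and collect the values of the other attributes linked to it."""
    entities = defaultdict(lambda: {attr: set() for attr in attributes})
    for item in items:
        for attr in attributes:
            entities[item[facet]][attr].add(item[attr])
    return dict(entities)


print(facet_counts(laureates, "country"))  # laureates per country
by_country = pivot(laureates, "country", ["name", "affiliation"])
print(sorted(by_country["Country X"]["name"]))
```

Pivoting here simply re-keys the same records by a facet value, which is one reason a faceted collection can offer several conceptual dimensions without maintaining separate hierarchies.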
2.2 Visualizations
Traditionally, content in geospatial ontologies is encoded as text. Text and language have two inherent shortcomings when conveying geospatial information: first, the linear nature of language is ill-suited to representing the higher dimensionality of geospatial information; and second, language and text have a limited vocabulary for expressing geospatial and temporal relationships (Peuquet, 2002). Language is by nature categorical and is more reliable for conveying categorical relations (e.g., IsPartOf Ontario) than exact ones, such as metric distances or coordinates (e.g., 50 miles north of Chicago) (Leibowitz, Guzy, Peterson, & Blake, 1993; Hill, 2006). But linguistic relationships cannot fully substitute for geospatial footprints in information systems, since they are less accurate and often incomplete (Janée, 2006). For example, it is difficult to record all PartOf relationships because of the many geographic entities that a location can contain, overlap with, or be PartOf. Santa Barbara, CA, for instance, can be classified not only as PartOf Santa Barbara County, but also as a location that HasParts such as watersheds and wetlands.
To overcome the limitations of language and text, geospatial ontologies are often enhanced with illustrations and visualizations. Cartographers and information visualization researchers are convinced that images help perceivers derive patterns and relationships between objects, detect shapes, and make associations with prior knowledge much faster than any textual descriptions or tables (Card, Mackinlay, & Shneiderman, 1999; Spence, 2001; Peuquet, 2002). For example, The Biography Light Ontology has several visualizations for different contexts: the Emma Goldman itinerary interface (http://gray.ischool.berkeley.edu/emma/) and Contexts and relationships: Ireland and Irish studies (http://ecai.org/neh2007). The SWETO ontology has a 3D geovisualization that helps analysts extract critical spatiotemporal patterns (Budak, Sheth, & Ramakrishnan, 2004). Wikipedia entries are visualized on a special Wikipedia layer that can be added to almost any Google Maps or Google Earth mashup (http://code.google.com/apis/maps/documentation/javascript/v2/overlays.html).
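The contrast between recorded PartOf relations and geospatial footprints can be made concrete with a minimal sketch. It assumes footprints simplified to longitude/latitude bounding boxes (real footprints are often polygons) and uses invented placeholder coordinates rather than actual boundaries. Once footprints are stored, containment and overlap can be computed for any pair of places, whereas a purely textual ontology has to enumerate every PartOf or HasParts statement in advance.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Axis-aligned footprint in degrees (ignores boxes crossing the antimeridian)."""
    west: float
    south: float
    east: float
    north: float


def contains(outer: BBox, inner: BBox) -> bool:
    """True if `inner` lies entirely within `outer` (a derived PartOf)."""
    return (outer.west <= inner.west and inner.east <= outer.east
            and outer.south <= inner.south and inner.north <= outer.north)


def overlaps(a: BBox, b: BBox) -> bool:
    """True if the two boxes share any area; touching edges count as overlap."""
    return (a.west <= b.east and b.west <= a.east
            and a.south <= b.north and b.south <= a.north)


# Hypothetical footprints with placeholder coordinates, not surveyed boundaries.
footprints = {
    "County": BBox(0.0, 0.0, 10.0, 10.0),
    "City": BBox(2.0, 2.0, 4.0, 4.0),
    "Watershed": BBox(3.0, 3.0, 12.0, 6.0),
}

# Relations are computed from geometry instead of being recorded one by one.
print(contains(footprints["County"], footprints["City"]))       # True
print(contains(footprints["County"], footprints["Watershed"]))  # False
print(overlaps(footprints["City"], footprints["Watershed"]))    # True
```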
Visualizations bring to the fore not only the geospatial aspect of ontologies but other properties as well (e.g., time, descriptions of events, types of events, and so on). Visualizations allow users to interact with these properties and consequently gain a better understanding of spatio-temporal relationships and of the properties of events in space and time. For example, the Emma Goldman itinerary interface allows users to follow the chronology of Emma Goldman’s lectures on a temporal map and see patterns in her itineraries. In this paper, visualization is defined in terms of information visualization; that is, visualizations are conceptualized not simply as maps or illustrations but as complex systems composed of representations and interactions. Representations encode information in graphical form and make effective use of human visual perception (Fast & Sedig, 2005). They can take different forms (e.g., maps, graphs, tables, and diagrams). Interactions have two components: actions and reactions. Users act upon representations, and the representations react by giving a response (Fast & Sedig, 2005). For example, if a user sets filters on a map, the map responds by changing how it looks.
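The action/reaction model of interaction can be sketched as a toy representation in which the user’s action (setting or clearing a filter) changes state and the representation reacts by recomputing what it displays. The class `MapRepresentation` and its methods are hypothetical and are not drawn from VICOLEX or any mapping library; the sample data are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class MapRepresentation:
    """Toy map: markers keyed by place name, each with sets of facet values."""
    markers: dict
    filters: dict = field(default_factory=dict)

    def set_filter(self, facet, value):
        """Action: the user constrains one facet to a single value."""
        self.filters[facet] = value
        return self.render()

    def clear_filter(self, facet):
        """Action: the user removes a constraint (no error if absent)."""
        self.filters.pop(facet, None)
        return self.render()

    def render(self):
        """Reaction: return the place names whose markers remain visible."""
        return sorted(
            place
            for place, facets in self.markers.items()
            if all(value in facets.get(facet, set())
                   for facet, value in self.filters.items())
        )


# Hypothetical data: languages found among documents about each place.
view = MapRepresentation({
    "Place A": {"language": {"Ukrainian", "Russian"}},
    "Place B": {"language": {"Polish"}},
})
print(view.render())                          # ['Place A', 'Place B']
print(view.set_filter("language", "Polish"))  # ['Place B']
print(view.clear_filter("language"))          # ['Place A', 'Place B']
```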
The goal of interactions and representations in visualizations is to support and enhance various cognitive activities such as understanding, exploratory analysis, decision making, and sense making. From the point of view of geovisualization, many of these cognitive activities rely largely on visual tasks such as identifying, comparing, associating, and differentiating (Andrienko & Andrienko, 2006; Buchel & Sedig, 2011). The results of these activities are patterns or hypotheses. Patterns are not simply observations about similarities and dissimilarities; they are interesting observations. Interestingness is determined by relevancy, validity, credibility, plausibility, and manageability (Mirel, 2007). While users mainly find patterns in representations, they often cannot find interesting patterns without modifying the representations. Interactions help users perform the necessary modifications. In sum, visualizations enhance higher-level cognitive activities with ontologies and expedite trend and pattern recognition, especially when representations can be acted upon through interactions. The next section explains how this visualization theory is connected with ontologies.
3 Ontologies from KOS
This section describes the design and visualization of content-rich geospatial ontologies extracted from facets. This methodology has been implemented on a prototype collection. In what follows, I first describe the prototype collection; then I explain how geospatial ontologies can be constructed and represented as text; and finally, I illustrate how such ontologies can be visualized on map-based visualizations.
3.1 Prototype collection
The prototype collection is about the Local History of Ukraine. More specifically, the collection comprises 349 surrogate MAchine Readable Cataloguing (MARC) records about the local history and description of Ukraine from the Library of Congress’ online catalogue that were catalogued before July 2007. All records belong to lower-level classification numbers in the DK508 schedule (which covers the local history of Ukraine) of the Library of Congress Classification. These lower-level schedules correspond to place names in Ukraine. Overall, the collection has 32 place names, which are linked to metadata records via call numbers. While some locations have from one to a few records associated with them, others have up to a hundred. Since the Library of Congress collection is one of the largest in the world and includes almost all publications published in any part of the world since the end of the 19th century, it is reasonable to assume that the prototype collection contains most, if not all, of the books about the local history of these 32 Ukrainian cities that were published in the past century and at the beginning of this century.
From each MARC record, I selected only the high-frequency book properties. These are physical descriptions (illustrations and maps, recorded in MARC field 300), year and place of publication (recorded in the imprint, MARC field 260), languages (field 041), bibliographic notes (field 504), subjects (fields 650 and 651), titles (field 245), call numbers (field 050), and Library of Congress Control Numbers (field 010). All records were downloaded into a MySQL database. A large portion of the titles in this collection are in foreign languages, but the metadata descriptions of collection items contain categories in English (e.g., subjects, languages). Furthermore, many metadata fields chosen for this study use controlled vocabularies (e.g., descriptions of physical properties, subjects, languages, and authors), and for this reason many records share common descriptors. These common descriptors make it possible to group values by facets.
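The field selection described above can be sketched as a mapping from MARC tags to ontology properties. To keep the example self-contained, it does not parse MARC files: it assumes each record has already been reduced to a dictionary from tag to a list of field strings (a hypothetical simplification), and the sample record is invented. The property names are illustrative rather than taken from the prototype, and year and place of publication are left out for brevity.

```python
# Selected MARC tags mapped to illustrative ontology property names.
SELECTED_FIELDS = {
    "245": "title",
    "300": "physical_description",
    "041": "languages",
    "504": "bibliographic_notes",
    "650": "subjects",  # topical subject headings
    "651": "subjects",  # geographic subject headings
    "050": "call_number",
    "010": "lccn",      # Library of Congress Control Number
}


def select_properties(record):
    """Keep only the selected fields, grouping their values by property name.

    `record` is assumed to map a MARC tag to a list of already-extracted
    field strings; unselected tags are skipped.
    """
    properties = {}
    for tag, values in record.items():
        name = SELECTED_FIELDS.get(tag)
        if name is None:
            continue  # field not used in the ontology
        properties.setdefault(name, []).extend(values)
    return properties


# Hypothetical record, not an actual Library of Congress record.
sample = {
    "245": ["Sample title about a city"],
    "650": ["Topical subject heading"],
    "651": ["Geographic subject heading"],
    "500": ["General note (ignored)"],
}
print(select_properties(sample))
```

Mapping both 650 and 651 to one `subjects` property mirrors the text’s treatment of topical and geographic headings as a single subject facet.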
3.2 Prototype conceptualization and implementation
The conceptualization of the ontology and its visualization is based on interdisciplinary research. Following this path, I juxtaposed approaches from different disciplines in order to understand how people perform higher-order cognitive activities with collections about locations on map-based visualizations. This strategy allowed me to integrate insights from many disciplines and fields of study, especially studies on cognition, HCI, geovisualization, information visualization, and library and information science. The prototype was built using PHP, MySQL, the Google Maps API, and Fusion Charts. All components of the interface are dynamically generated from properties defined in the metadata records. First, metadata records are grouped by the place names associated with their call numbers; these groups are then divided into subgroups by the other properties and facets found in the metadata records. Properties and facets were extracted using the SQL COUNT(*) aggregate function and the GROUP BY clause.
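The grouping step can be sketched with a COUNT(*) ... GROUP BY query. The prototype used MySQL; this sketch substitutes Python’s built-in sqlite3 module so that it runs without a database server, and the single-table layout (`records` with `place` and `language` columns) and the sample rows are hypothetical. The same query pattern should carry over to MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, place TEXT, language TEXT)")
conn.executemany(
    "INSERT INTO records (place, language) VALUES (?, ?)",
    [
        ("Place A", "Ukrainian"),
        ("Place A", "Ukrainian"),
        ("Place A", "Russian"),
        ("Place B", "Polish"),
    ],
)

# Top level: one row per place name with the number of linked records.
top_level = conn.execute(
    "SELECT place, COUNT(*) FROM records GROUP BY place ORDER BY place"
).fetchall()

# Lower level: values of one facet and their counts for a selected place.
language_facet = conn.execute(
    "SELECT language, COUNT(*) AS n FROM records WHERE place = ? "
    "GROUP BY language ORDER BY n DESC, language",
    ("Place A",),
).fetchall()
conn.close()

print(top_level)       # [('Place A', 3), ('Place B', 1)]
print(language_facet)  # [('Ukrainian', 2), ('Russian', 1)]
```

In a fuller schema, multi-valued facets such as subjects would more likely sit in a separate table joined on the record id, so that one record can contribute to several facet values.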
3.3 Faceted geospatial ontology from KOS
At the very top level (Figure 1.a), the ontology represents geographic place names. Place names, or locations, can be considered the key entities in the ontology. Each place name is linked to a number of metadata records that represent documents about that place. The top level provides a broad overview of the ontology and of the density of documents linked to it. Such an overview offers many starting paths for exploration. Exploration starts when a user selects a place name and moves to the next level of the ontology, which represents faceted properties from metadata records (i.e., languages, places of publication, publication years, subjects, and authors). Figure 1.b illustrates this lower level of the ontology for Kyyiv.
Figure 1. Geospatial ontology represented as text.
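The two levels described in this section, place names with document counts at the top and per-place facet distributions below, can be represented as a nested structure. The sketch below is a minimal in-memory version under stated assumptions: records are plain dictionaries with hypothetical keys (`place`, `language`, `place_of_publication`, `year`), the sample values are invented, and each share is a value’s percentage among the records of that place that carry the facet.

```python
from collections import Counter, defaultdict

FACETS = ("language", "place_of_publication", "year")


def build_ontology(records):
    """Return {place: {"documents": n, "facets": {facet: Counter}}}."""
    ontology = defaultdict(
        lambda: {"documents": 0, "facets": {f: Counter() for f in FACETS}}
    )
    for rec in records:
        node = ontology[rec["place"]]  # top level: place name
        node["documents"] += 1
        for facet in FACETS:           # lower level: faceted properties
            if facet in rec:
                node["facets"][facet][rec[facet]] += 1
    return dict(ontology)


def shares(counter):
    """Share (%) of each value among the records that have this facet."""
    total = sum(counter.values())
    return {value: round(100 * n / total) for value, n in counter.most_common()}


# Hypothetical records, not drawn from the prototype collection.
records = [
    {"place": "Place A", "language": "Ukrainian", "year": 1990},
    {"place": "Place A", "language": "Ukrainian", "year": 2001},
    {"place": "Place A", "language": "Russian", "year": 2001},
    {"place": "Place A", "language": "English", "year": 2005},
    {"place": "Place B", "language": "Polish", "year": 1938},
]
ontology = build_ontology(records)
print({place: node["documents"] for place, node in ontology.items()})
print(shares(ontology["Place A"]["facets"]["language"]))
```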
At first glance, users might think that the description merely shows that documents about Kyyiv were published in five languages (Ukrainian, Russian, Multiple, English, and Polish); that they were published between 1916 and 2005 in various parts of the world (including Ukraine, Russia, Poland, and the US); and that they have multiple subjects and authors. However, facets in this ontology can be viewed not only as properties of collections about place names but also as properties of the locations themselves, since facets and their values resemble properties of locations. For example, the results of a poll conducted in Kyyiv in 2006 show that 23 % of the city’s population speak Ukrainian in everyday life, 52 % speak Russian, and 24 % switch between both (Київ: місто, його жителі, проблемне сьогодення і бажане завтра [“Kyyiv: the city, its residents, problems of today, wishes for tomorrow”], 2006). In the ontology, 50 % of books about Kyyiv are in Ukrainian, and 21 % are in Russian. Notably, despite the discrepancies between these figures, both the poll results and the facets in the ontology show that Ukrainian and Russian are the two leading languages in Kyyiv. Although this finding cannot be considered statistically significant given the size of the dataset, similar associations between the languages of locations and the languages in the ontology were observed for many other locations (e.g., Odessa, Donetsk, Kharkiv, Simferopol’, Chernihiv, and others). Such similarities were not observed for L’viv, where 53 % of documents were published in Polish, although it is known from Hrytsak (2005) that 77.6 % of L’viv’s inhabitants are primarily Ukrainian-speaking. However, this exception has a well-justified explanation: L’viv was part of Poland for long periods of its history, and today it has a large ethnic community of Poles.
Subjects in the ontology represent landmarks, celebrities, and the history of locations. While many subjects have no special association with locations, some carry special and unique local information about them. For example, in Kyyiv such subjects are:
Bratislava (Slovakia)
Kievan Rus
Khreshchatyk (Kiev, Ukraine)
Kuksa, Viktor Ivanovych, 1940-
Kyi͡evo-Pechersʹka lavra
Leipzig (Germany)
Luk’i͡anivsʹke t͡syvilʹne kladovyshche
Malhanov, Stanislav, 1935-
Moskalenko, Heorhiĭ Mytrofanovych, 1938-
Park-muzeĭ “Drevniĭ Kiev”
Pozhivanov, Mikhail, 1960-
Shuli͡avka (Kiev, Ukraine)
Sofiĭsʹkyĭ sobor (Kiev, Ukraine)
Strashenko, Olʹha, 1950-
Z͡hovtnevyĭ raĭon (Kiev, Ukraine)
Zmiĭovi valy (Ukraine)
The subject “Kievan Rus” recalls the city’s past: Kyyiv was the capital of Kievan Rus. The subjects “Bratislava (Slovakia)” and “Leipzig (Germany)” link Kyyiv to its sister cities. Subjects such as “Kyi͡evo-Pechersʹka lavra,” “Luk’i͡anivsʹke t͡syvilʹne kladovyshche,” “Park-muzeĭ ‘Drevniĭ Kiev’,” and “Sofiĭsʹkyĭ sobor (Kiev, Ukraine)” represent landmarks; the subjects “Khreshchatyk (Kiev, Ukraine),” “Shuli͡avka (Kiev, Ukraine),” “Z͡hovtnevyĭ raĭon (Kiev, Ukraine),” and “Zmiĭovi valy (Ukraine)” are names of streets and regions. The personal names among the subjects are not accidental either: many of these people played important roles in the history of the city. For example, “Kuksa, Viktor Ivanovych, 1940-” is remembered as a nationalist who hoisted the Ukrainian nationalist flag on a multi-story building in Kyyiv in 1967; “Pozhivanov, Mikhail, 1960-” is a Ukrainian politician and statesman; and “Strashenko, Olʹha, 1950-” is a writer and poet.
Places of publication also add tacit information to local knowledge about locations. In the prototype collection, they show the distribution of researchers and emigrants writing books about the history of locations in Ukraine. Emigrants and researchers in remote locations produce a different kind of “local knowledge” from that created by people living in those locations. This knowledge is valuable because it offers a different perspective on the local history of Ukrainian locations. Examining places of publication may prompt users to investigate the relationships between the locations described in books and the places where those books were published. For example, places of publication in the United States (e.g., Freeman, SD; Princeton, NJ; and New York, NY) may suggest where to look for Ukrainian or Russian communities in the U.S., or where to find publishers or research institutions that publish books in Ukrainian or Russian, or about Ukraine.
Finally, authors who write about locations represent communities of people interested in the local history of particular places. Members of these communities often have personal ties with the locations they write about; most of them live, or have lived, there. Other authors study the history or genealogy of locations, treating them as objects of research.
Thus far, I have explained how faceted geospatial ontologies derived from library KOS can generate rich local knowledge about geospatial locations. Specifically, they relate locations to historical landmarks, names of celebrities, and many other cultural concepts. However, this local knowledge would remain limited if the ontologies were presented only as text. Visualizations can potentially enhance knowledge about locations by making patterns in the data easier to perceive, understand, and make sense of. The next section therefore presents our visualization prototype.
3.4 VICOLEX
Like the ontology, VICOLEX is a product of interdisciplinary research. It is built on the principles of information visualization, geovisualization, computational models of cognition, and HCI, and it visualizes the faceted geospatial ontology described in the previous section. A detailed technical description of this visualization can be found in Buchel and Sedig (2011) and in Buchel and Sedig (in review). VICOLEX has a rich set of representations and interactions. Representations show the ontology and its parts from different perspectives; they allow users to compare, associate, and differentiate facet values, and to identify trends and patterns in various facets. Interactions serve as the glue that links all representations: they facilitate navigation among representations and allow users to modify the ontology’s facets according to their needs. Representations and interactions are discussed in greater detail below.
3.4.1 Representations
In VICOLEX, all representations are dynamically generated from the ontology. Locations are mapped on Google Maps, and markers on the map encode the number of documents linked to each location. Besides Google Maps, VICOLEX uses additional representations, such as pie charts, histograms, embedded maps, tag clouds, and Kohonen maps, to show different facets of locations. A snapshot of the visualization is shown in Figure 2. Each marker is linked to five additional representations: a pie chart, a histogram, an embedded map, a Kohonen map, and a tag cloud. The pie chart displays languages (Figure 2.a); the histogram shows years of publication (Figure 2.b); the embedded map visualizes places of publication (Figure 2.c); the Kohonen map represents subjects (Figure 2.d); and the tag cloud shows the names of authors (Figure 2.e). Displaying the ontology in VICOLEX allowed me to increase the number of facets that can be made salient. In particular, VICOLEX shows facets not only for discrete locations but also for the ontology as a whole: a) genres (Figure 2.h); b) subjects (Figure 2.i); c) physical properties (Figure 2.j); d) languages (Figure 2.k); e) publication years (Figure 2.g); and f) acquisition years (Figure 2.f). These additional facets show, for example, when the first book in the entire collection was published, when publishing peaked, and which language is most prevalent. They also allow comparison with the facets of individual locations (i.e., they facilitate comparing properties of the whole country with properties of individual locations).
Figure 2. Facets in VICOLEX: a) languages (Kyyiv); b) years of publication (Kyyiv); c) places of publication (Kyyiv); d) subjects (Kyyiv); e) authors (Kyyiv); f) years of acquisition (entire collection); g) years of publication (entire collection); h) genres (entire collection); i) subjects (entire collection); j) physical properties (entire collection); k) languages (entire collection); l) locations (i.e., place names).
Representations inside markers use size and colour encodings to make facet values easy to compare. For example, colour encodings on pie charts distinguish languages (Figure 2.a); on histograms, colour encodings separate years of publication belonging to different historical periods (Figure 2.b); size encodings on Kohonen maps show the density of subjects (Figure 2.d); and size encodings on tag clouds make the most prolific authors more prominent (Figure 2.e). Encodings are consistent across locations: if Ukrainian is encoded in blue for Kyyiv, it is encoded in blue for every other location as well. Such encodings scaffold visual comparison and help users make sense of local knowledge across locations. For example, seeing languages on colour-coded pie charts enhances understanding of language trends; one trend visible on the pie charts is that Russian is more common in Eastern Ukraine. Other trends can be found on histograms.
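One way to obtain colour encodings that stay consistent across locations is to assign colours from the collection-wide set of facet values rather than per location. The sketch below assumes that records are Python dicts keyed by facet name and uses an arbitrary, hypothetical palette; it also includes a simple linear font-size scaling of the kind a tag cloud might use. None of this is taken from VICOLEX's own code.

```python
from collections import Counter

# Hypothetical palette (not VICOLEX's); colours repeat if there are more
# facet values than palette entries.
PALETTE = ["#1f77b4", "#d62728", "#2ca02c", "#ff7f0e", "#9467bd", "#8c564b"]


def build_colour_map(records_by_place, facet):
    """Assign one colour per facet value across ALL locations, so a value
    such as 'Ukrainian' gets the same colour in every marker."""
    totals = Counter()
    for records in records_by_place.values():
        totals.update(r[facet] for r in records)
    # Most frequent values first; ties broken alphabetically for stability.
    ordered = sorted(totals, key=lambda v: (-totals[v], v))
    return {v: PALETTE[i % len(PALETTE)] for i, v in enumerate(ordered)}


def pie_slices(records, facet, colour_map):
    """Slices for one location's pie chart: (value, percent, colour)."""
    counts = Counter(r[facet] for r in records)
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(v, round(100 * n / total, 1), colour_map[v]) for v, n in ranked]


def tag_sizes(counts, min_px=10, max_px=32):
    """Linear size encoding for a tag cloud: more prolific authors get
    larger fonts. The pixel bounds are arbitrary assumptions."""
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {
        name: round(min_px + (n - lo) * (max_px - min_px) / span)
        for name, n in counts.items()
    }


if __name__ == "__main__":
    data = {  # invented sample records
        "Kyyiv": [{"language": "Ukrainian"}, {"language": "Ukrainian"},
                  {"language": "Russian"}, {"language": "English"}],
        "Kharkiv": [{"language": "Russian"}, {"language": "Ukrainian"}],
    }
    colours = build_colour_map(data, "language")
    for place, recs in data.items():
        print(place, pie_slices(recs, "language", colours))
```

Because the colour map is built once from all locations, "Ukrainian" receives the same colour in the Kyyiv and Kharkiv slices even though its share differs between them.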
During the Soviet period, publication rates were low; moreover, in the collection, publishing during this period started only in 1958. This is not surprising, because during the first half of the Soviet period Ukrainians lived through the Russian Civil War (1918 – 1922), collectivization, the Holodomor (1932 – 1933), World War II (1939 – 1945), and the Stalin era. During these events, the economy and infrastructure of Soviet Ukraine were largely destroyed. It took many years to rebuild the country, but after 1991 (when Ukraine became independent) publication rates increased significantly. This increase in the ontology can be explained mostly by the growth of the market economy in Ukraine, and a spike in publishing can be observed in almost every location. This similarity between patterns in the ontology and facts from history and social life reinforces the assertion that faceted ontologies from KOS can represent local knowledge about geographic locations. It is also important to note that such trends were hard to notice in text representations, whereas in the visualizations trends and patterns became apparent immediately. This demonstrates that graphical representations do not merely display facets; rather, they add another layer of granularity to ontologies and shed light on facets more efficiently than text representations.
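Colour-coding histogram bars by historical period, as described above, only requires a mapping from publication year to period. The sketch below is a minimal illustration; the period boundaries (1917 and 1991) and labels are assumptions chosen to match the periods mentioned in the text, not values taken from VICOLEX. The helper `share_from` is likewise hypothetical and simply measures what fraction of publications falls on or after a given year.

```python
from bisect import bisect_right
from collections import Counter

# Assumed period boundaries: each entry is the first year of the next period.
PERIOD_STARTS = [1917, 1991]
PERIOD_NAMES = ["Imperial Russia", "Soviet period", "Independent Ukraine"]


def period_of(year):
    """Return the named historical period containing `year`."""
    return PERIOD_NAMES[bisect_right(PERIOD_STARTS, year)]


def histogram_by_period(years):
    """One bar per publication year: (year, count, period). The period
    label can then be mapped to a bar colour."""
    counts = Counter(years)
    return [(y, counts[y], period_of(y)) for y in sorted(counts)]


def share_from(years, start_year):
    """Fraction of publications dated `start_year` or later."""
    years = list(years)
    return sum(y >= start_year for y in years) / len(years) if years else 0.0
```

With `bisect_right`, a boundary year belongs to the later period, so 1991 is labelled "Independent Ukraine"; a different convention would only require changing the boundary list.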
3.4.2 Interactions
VICOLEX offers numerous interactions for carrying out actions on facets in geospatial ontologies: filtering, selecting, annotating, and comparing. In VICOLEX, the purpose of these interactions is to support the discovery of transitional properties. Transitional processes in general involve movement, development, evolution, or change from one object, state, stage, subject, form, style, or place to another (Sedig, Rowhani, & Liang, 2005). As a result of transitions, facets in ontologies grow, shrink, or remain unchanged; examples include quantitative and qualitative changes over time and space. Transitional processes are not always easy to understand (Tufte, 1997). Very often they remain “hidden” until encountered (Dodge & Kitchin, 2001), because they cannot be conveyed by static representations alone (Sedig, Rowhani, & Liang, 2005; Tufte, 1997); rather, they require interaction to be understood and made sense of. While investigating transitions in library collections (Buchel & Sedig, in review), we found that transitional processes in library KOS are closely associated with transitions in locations. Here are a few examples. After 1991, subjects changed substantially in various locations. The newly emerged subjects are:
Americans
Armenians
Atrocities
Authors, Ukrainian
California
Commerce
Economic conditions
Ethnic relations
Ethnology
Hungary
Immigrants
International Executive Service Corps
Jews
Massacres
Minorities
Pilsudsky, Joseph 1867 – 1935
Political figures
Political prisoners
Political persecutions
Politicians
Politics and government
Prisoners of war
Rehabilitation
Religious life and customs
Shashkevych, Makiian
Social life and customs
Ukrainian students
Ukrainian, nationalism
Ukrainians
Vinnytsia Massacre, Vinnytsia, Ukraine, 1937 – 1938
Although these subjects occur too rarely to be statistically significant, they represent emerging subjects and reflect changes in the political, economic, and cultural landscapes of Ukraine. Many of these subjects were prohibited in the Soviet Union because of their democratic, nationalistic, or political nature. Subjects such as “Social life and customs,” “Ethnology,” “Ethnic relations,” “Ukrainians,” “Ukrainian students,” “Authors, Ukrainian,” “Ukrainian, nationalism,” “Minorities,” “Jews,” and “Armenians” provide evidence that the formerly repressed Ukrainian language and other ethnic languages in Ukraine flourished at the end of the twentieth century. The subject “Religious life and customs” reflects changes in the spiritual life of Ukrainians, who became more religious after religious freedom returned in 1991. The subjects “Vinnytsia Massacre, Vinnytsia, Ukraine, 1937 – 1938,” “Political prisoners,” “Rehabilitation,” “Political persecutions,” “Prisoners of war,” “Massacres,” “Atrocities,” “Pilsudsky, Joseph 1867 – 1935,” and “Shashkevych, Makiian” emerged after 1991 as well. Their appearance after 1991 is not surprising, because these subjects were taboo in the Soviet Union. They refer to the crimes of the Stalin era (1936 – 1953) and to political figures who played controversial roles in Ukrainian history and whom Ukrainians could not even mention under the Soviet regime.
Concerning transitional processes in languages, we found that books about Ukraine in Polish were generally not published before 1981, but that the number of Polish publications increased significantly after 1981. This can probably be explained by the pro-democratic social movements in Poland and the Soviet Union during the 1980s. In Poland, the Solidarność movement emerged in 1980 – 1981; it was followed by perestroika and glasnost in the Soviet Union in 1985 – 1991. These movements eventually led to the collapse of the Soviet Union and the Eastern Bloc, which in turn reduced censorship and led to market economies. After these movements had begun, Polish authors started documenting recollections of their own lives and the lives of their relatives in L’viv before 1939, when L’viv became part of Soviet Ukraine. They also started interpreting history from their own (Polish) perspective, in which L’viv is part of Poland. Presumably, it was difficult to publish books on such topics under Communist rule in Poland. Notably, this tacit knowledge became prominent in the visualization only after filtering.
The most striking fact we were able to extract with VICOLEX, however, is that only eight of the 58 books with maps were published before 1991, and one of these eight was published during the Imperial Russian era. This is consistent with an earlier observation (Monmonier, 1996) that during the Cold War map publishing in the Soviet Union occurred at only a low level.
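Filtering by publication year, as used above to surface subjects that appeared after 1991, amounts to comparing facet values on either side of a cut-off year. The following is a minimal, non-interactive sketch, assuming each record is a dict with hypothetical `year` and `subjects` fields; it is not VICOLEX's implementation, which performs such filtering interactively. It classifies each subject as emerged, vanished, or persisted, mirroring the grow/shrink/unchanged transitions described in Section 3.4.2.

```python
def subject_transitions(records, split_year):
    """Compare subjects of records published before `split_year` with those
    published in or after it, and classify each subject's transition."""
    before, after = set(), set()
    for r in records:
        target = after if r["year"] >= split_year else before
        target.update(r["subjects"])
    return {
        "emerged": sorted(after - before),    # appear only from the cut-off on
        "vanished": sorted(before - after),   # appear only before the cut-off
        "persisted": sorted(before & after),  # appear in both periods
    }
```

Calling it with `split_year=1991` on a location's records would list subjects such as those above under "emerged"; the same function with `split_year=1981` and a language field in place of subjects would support the Polish-language comparison.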
Another interesting feature of the maps is their coverage: maps in books published before 1991 cover only large cities (L’viv, Kyyiv, and Odessa), while maps in the remaining 50 books, published after 1991, cover both large and small locations. This is not surprising, since many locations in Ukraine were closed to foreigners until the mid-1990s, and the visualization reflects this fact clearly. These findings suggest that interactions can enhance understanding of local knowledge in faceted geospatial ontologies by making changes (transitions) in their facets more visible. None of these processes and behaviours of facets were noticeable in text or graphical representations alone; the transitions came to light only when the representations became malleable, dynamic, and flexible through interaction.
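The map-related findings above combine two filters: a physical-property facet (the book contains maps) and publication year. A minimal sketch, assuming hypothetical record fields `has_maps`, `year`, and `place`, and treating 1991 itself as part of the later group (the text does not say how boundary years were handled):

```python
def map_coverage(records, split_year=1991):
    """Split books that contain maps at `split_year` and report, for each
    group, how many books it has and which locations their maps cover."""
    with_maps = [r for r in records if r.get("has_maps")]
    groups = {
        "before": [r for r in with_maps if r["year"] < split_year],
        "after": [r for r in with_maps if r["year"] >= split_year],
    }
    return {
        name: {"count": len(rs), "places": sorted({r["place"] for r in rs})}
        for name, rs in groups.items()
    }
```

Applied to the collection described above, such a filter would correspond to a count of 8 in the "before" group and 50 in the "after" group, with the "before" places limited to L’viv, Kyyiv, and Odessa.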
4 Conclusions
Throughout this paper, the goal has been to demonstrate how content-rich geospatial ontologies of local knowledge can be designed from faceted metadata. Such ontologies can be represented as text or as visualizations. Both forms of representation reveal interesting information about local knowledge, but visual representation of geospatial information, and interaction with it, is more convenient and effective than text, since it enables users to draw immediate conclusions about historical and geospatial trends and transitions. This would be impossible, or very difficult, to achieve through textual indexing, browsing, and representation alone.
4.1 Implications for libraries
Although designing geospatial ontologies from facets seems to offer an interesting solution for accessing local knowledge, library KOS would have to be augmented with basic gazetteer properties for such ontologies to be built. The design of basic gazetteers, sometimes referred to as geospatial ontologies, has received due attention (Budak, Sheth, & Ramakrishnan, 2004; Hill, 2006; W3C, 2007). According to these sources, a basic gazetteer should at minimum contain coordinates or footprints, place names, feature types (e.g., river, mountain ridge, city), and possibly relationships between place names. It would also be useful to add relevant time periods and demographic information to each location. Time periods are needed for colour-coding histograms of publication years, while demographic information can help users validate patterns found in the ontology and learn more about places. Without demographics and colour-coded timelines, users might not be able to recognize patterns or assign meaning to them (Mirel, 2007); demographics and timelines serve as frames of reference for validating patterns. With the wide availability of gazetteer services such as the Google Places API, the GeoNames API, and the flickr.places API, building customized gazetteers has become easier than ever. These services provide high-quality footprint and feature-type information that can be downloaded in batches via their APIs. Information about the historical periods and demographics of different locations can be found in Wikipedia articles about places and in census records.
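The minimum gazetteer properties listed above, together with the suggested time periods and demographics, could be captured in a record like the following. This is a hypothetical Python data class, not a schema from any of the cited gazetteer standards or APIs. The point footprint, the period boundaries, and the end year 2012 are illustrative assumptions; the language shares are the 2006 poll figures quoted in Section 3.3, and the coordinates are approximate.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GazetteerEntry:
    """Minimal gazetteer record: place name, footprint, feature type, and a
    simple relationship, plus optional periods and demographics."""
    name: str
    footprint: tuple[float, float]  # (lat, lon) point; a box or polygon is also possible
    feature_type: str  # e.g. "city", "river", "mountain ridge"
    variant_names: list[str] = field(default_factory=list)
    part_of: Optional[str] = None  # parent place, e.g. a country or region
    periods: list[tuple[str, int, int]] = field(default_factory=list)  # (label, first year, last year)
    demographics: dict[str, float] = field(default_factory=dict)  # e.g. language shares in percent

    def period_for(self, year: int) -> Optional[str]:
        """Label of the period containing `year`, if any (e.g. to colour a
        histogram bar)."""
        for label, first, last in self.periods:
            if first <= year <= last:
                return label
        return None


kyyiv = GazetteerEntry(
    name="Kyyiv",
    footprint=(50.45, 30.52),  # approximate city centre
    feature_type="city",
    variant_names=["Kiev", "Kyiv", "Київ"],
    part_of="Ukraine",
    # Illustrative periods; the end year 2012 is arbitrary.
    periods=[("Soviet period", 1917, 1990), ("Independent Ukraine", 1991, 2012)],
    # Everyday-language shares from the 2006 poll cited in Section 3.3.
    demographics={"Ukrainian": 23.0, "Russian": 52.0, "Both": 24.0},
)
```

Keeping periods and demographics on the gazetteer entry means the same record can drive both the histogram colours and the frame of reference against which facet patterns are checked.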
4.2 Significance
The ideas discussed in this paper are significant not only for the library community and library users but also for historians, social scientists, genealogists, and other researchers studying social, historical, and cultural phenomena. This paper demonstrates that library catalogues (as well as genealogical records, archival materials, Google Books metadata, and other sources) can be used not only for indexing and retrieving books but also for exploring local knowledge and discovering interesting historical and social phenomena in various geospatial locations. In this way, they can become valuable data for researchers. This is especially important for libraries and collections that experience low usage (e.g., collections in foreign languages). While text representations in foreign languages create a barrier to accessing collections (because not all users can understand or submit queries in foreign languages), visual representations add meaning to items in foreign languages and allow users to explore social landscapes derived from book metadata without having to formulate queries.
References
Andrienko, N., & Andrienko, G. (2006). Exploratory analysis of spatial and temporal data: A systematic approach. Berlin: Springer.
Barricelli, B. R., Iacob, C., & Zhu, L. (2009). Map-based wikis as contextual and cultural mediators. Paper presented at MobileHCI’09.
Bellucci, A., Malizia, A., Diaz, P., & Aedo, I. (2010). Don’t touch me: Multi-user annotations on a map in large display environments. Proceedings of the International Conference on Advanced Visual Interfaces.
Berry, T., Casado, M., & Dixon, L. (2003). The local nature of digital reference. The Southeastern Librarian, 51(3), 8 – 15.
Bishop, B. (2011). Location-based questions and local knowledge. Journal of the American Society for Information Science and Technology, 62(8), 1594 – 1603.
Bishop, B., & Torrence, M. (2007). Virtual reference services: Consortium vs. stand-alone. College and Undergraduate Libraries, 13(4), 117 – 127.
Buchel, O., & Hill, L. (2009). Treatment of georeferencing in knowledge organization systems: North American contributions to integrated georeferencing. Proceedings of the North American Symposium on Knowledge Organization (NASKO). Retrieved from http://iskocus.org/publications/nasko2009/Buchel_Hill-NASKO2009.pdf
Buchel, O., & Sedig, K. (2011). Extending map-based visualizations to support visual tasks: The role of ontological properties. Knowledge Organization, 38(3), 205 – 230.
Buchel, O., & Sedig, K. (in review). Making sense of document collections with map-based visualizations: The role of interaction.
Budak, A. I., Sheth, A., & Ramakrishnan, C. (2004). Geospatial ontology development and semantic analytics. Transactions in GIS, 10(4), 551 – 575.
Card, S. K., Mackinlay, J. D., & Shneiderman, B. (1999). Readings in information visualization: Using vision to think. San Francisco, CA: Morgan Kaufmann Publishers.
Central Intelligence Agency. (2011, June 3). The world factbook. Retrieved from https://www.cia.gov/library/publications/the-world-factbook/
Chen, M., & Qin, J. (2008). Deriving ontology from folksonomy and controlled vocabulary. Proceedings of iConference 2008: iFutures: systems, selves, society.
Dodge, M., & Kitchin, R. (2001). Mapping cyberspace. New York, NY: Routledge.
Farazi, F., Maltese, V., Giunchiglia, F., & Ivanyukovich, A. (2011). A faceted ontology for a semantic geo-catalogue. The Semantic Web: Research and Applications, Lecture Notes in Computer Science, 6644, 162 – 189.
Fast, K., & Sedig, K. (2005). The INVENT framework: Examining the role of information visualization in the reconceptualization of digital libraries. Journal of Digital Information (JoDI), 6(3).
Fonseca, F. T., Egenhofer, M., Agouris, P., & Camara, G. (2002). Using ontologies for integrated geographic information systems. Transactions in GIS, 6(3), 231 – 257.
Geertz, C. (1983). Local knowledge. New York: Basic Books.
Guy, M., & Tonkin, E. (2006). Folksonomies: Tidying up tags? D-Lib Magazine, 12(1). Retrieved from http://www.dlib.org/dlib/january06/guy/01guy.html
Hill, L. (2006). Georeferencing: The geographic associations of information. Cambridge, MA: MIT Press.
Hrytsak, Y. (2005). Historical memory and regional identity. In C. Hann & P. Magocsi (Eds.), Galicia: A multicultured land (pp. 185 – 209). Toronto: University of Toronto Press.
Introduction to The Columbia gazetteer of the world. (n.d.). Retrieved from http://www.columbiagazetteer.org/static/about_gazetteer
Janée, G. (2006). Spatial footprint visualization. Retrieved from http://www.alexandria.ucsb.edu/~gjanee/archive/2006/footprint-visualization/
La Barre, K., & Cochrane, P. A. (2006). Facet analysis as a knowledge management tool on the Internet. In K. Raghavan & K. Prasad (Eds.), Knowledge organization, information systems and other essays: Professor A. Neelameghan festschrift (pp. 53 – 70). New Delhi: Ess Ess Press.
Larson, R. R. (2010). Bringing lives to light: Biography in context. University of California, Berkeley: Electronic Cultural Atlas Initiative.
Leibowitz, H. W., Guzy, L. T., Peterson, E., & Blake, P. T. (1993). Quantitative perceptual estimates: Verbal versus nonverbal retrieval techniques. Perception, 22, 1051 – 1060.
Marcante, A., & Provenza, L. P. (2008). Social interaction through map-based wikis. Psychology Journal, 6(3), 247 – 267.
McGrath, K., Kules, B., & Fitzpatrick, C. (2011). FRBR and facets provide flexible, work-centric access to items in library collections. Proceedings of JCDL 2011 (pp. 49 – 52).
Mirel, B. (2007). Usability and usefulness in bioinformatics: Evaluating a tool for querying and analyzing protein interactions based on scientists’ actual research questions. Proceedings of the IEEE International Professional Communication Conference.
Monmonier, M. (1996). How to lie with maps. Chicago: University of Chicago Press.
Peuquet, D. J. (2002). Representations of space and time. New York, NY: The Guilford Press.
Poore, B. (2011). WALL-E and the “many, many” maps: Toward user-centred ontologies for the national map. Cartographica, 45(2), 113 – 120.
Rosenfeld, L., & Morville, P. (2002). Information architecture for the World Wide Web. Sebastopol, CA: O’Reilly Media.
Sears, J. (2001). Chat reference service: An analysis of one semester’s data. Issues in Science and Technology Librarianship, 32. Retrieved from http://www.istl.org/01-fall/article2.html
Sedig, K., Rowhani, S., & Liang, H. (2005). Designing interfaces that support formation of cognitive maps of transitional processes: An empirical study. Interacting with Computers: The Interdisciplinary Journal of Human-Computer Interaction, 17(4), 419 – 452.
Simon, R., Sadilek, C., Korb, J., Baldauf, M., & Haslhofer, B. (2010). Tag clouds and old maps: Annotations as linked spatiotemporal data in the cultural heritage domain. Workshop on Linked Spatiotemporal Data (pp. 12 – 23).
Singel, R. (2005). A disaster map ‘wiki’ is born. Wired. Retrieved from http://www.wired.com/software/coolapps/news/
Spence, R. (2001). Information visualization: Design for interaction. Essex, UK: Pearson Education Limited.
Tudhope, D., Binding, C., Blocks, D., & Cunliffe, D. (2002). Representation and retrieval in faceted systems. In M. López-Huertas (Ed.), Proceedings of the 7th International Society of Knowledge Organization Conference (ISKO 2002), Advances in Knowledge Organization (pp. 191 – 197).
Tufte, E. (1997). Visual explanations: Images and quantities, evidence and narrative. Cheshire, CT: Graphics Press.
Vander Wal, T. (2007). Folksonomy coinage and definition. Retrieved from http://www.vanderwal.net/folksonomy.html
W3C. (2007). W3C geospatial ontologies: W3C incubator group report 23 October 2007. Retrieved from http://www.w3.org/2005/Incubator/geo/XGR-geo-ont-20071023/
Yee, K.-P., Swearingen, K., Li, K., & Hearst, M. (2003). Faceted metadata for image search and browsing. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’03) (pp. 401 – 408).
Zhitomirsky-Geffet, M., Bar-Ilan, J., Miller, Y., & Shoham, S. (2010). A generic framework for collaborative multi-perspective ontology acquisition. Online Information Review, 34(1), 145 – 159.
Київ: місто, його жителі, проблемне сьогодення і бажане завтра [«Kiev: the city, its residents, problems of today, wishes for tomorrow»]. (2006, April 29). Retrieved from http://dt.ua/ARCHIVE/kiyiv_misto,_yogo_zhiteli,_problemne_sogodennya_i_bazhane_zavtra-46651.html
Part IV: Case studies
Paweł Rygiel
Chapter 12. Subject indexing of images: Architectural objects with complicated history
Abstract: The purpose of this chapter is to present the issue of subject indexing of images of architectural objects located in regions that in the past belonged to various countries. The author discusses whether controlled vocabularies and authority files are sufficient for the subject indexing of images and, finally, considers how adequately such visual materials are described. With these aims in view, it is necessary to consider the difficulties connected with selecting and using appropriate terms or subject headings from authority files. The selected examples of architectural objects come from Poland and its neighbouring countries.
Keywords: Architecture, subject authority files, buildings, controlled vocabulary, image indexing, subject control, subject access, visual information, subject heading languages, image analysis, subject analysis
Paweł Rygiel, Senior Librarian, Bibliographic Institute, The National Library of Poland
Introduction
Presenting museum, library, and archive collections (or at least portions of them) online has become common practice in recent years (e.g., Digital Collections of the Library of Congress, Flickr Commons, Gallica – bibliothèque numérique of the Bibliothèque nationale de France, steve.museum, and Bildindex der Kunst und Architektur). As Jörgensen (1999) noted:
Rapid expansion in the digitization of images and image collections (and their associated indexing records) has vastly increased the numbers of images available to scholars and researchers through electronic means, and access to collections of images is a research focus of several of the Digital Library research initiatives underway at universities across the nations. (p. 293)
Rygiel
Similarly, Matusiak (2006) stated:
The expansion of digital technologies has enabled wider access to visual resources held by museums and libraries. In the last decade cultural institutions have undertaken large-scale digitization projects to convert their collections of historical photographs and art slides to digital format. Digitized images are presented to users on the web through digital collections that offer enhanced image manipulation and multiple search options. (p. 283)
Among the data accessible via the Internet are visual resources, such as paintings, drawings, engravings, and photographs, that constitute a rich and vital source of information about, for example, the history of various places. These sources frequently portray architectural objects whose names, forms, and functions may have changed over their history. Such alterations could have resulted from changes in ownership or from historical events that affected the geopolitical situation of the place where the objects are located. This chapter attempts to discuss the problem of subject indexing from the perspective of a library, an institution that is obliged to provide reliable and coherent information and must therefore follow existing indexing standards. The main problems reviewed are the subject indexing of iconographic documents, using the example of images of architectural objects, and the use of headings from authority files and controlled vocabularies in the subject headings that describe such documents. Two other vital issues, image retrieval and user studies, are not covered in this chapter. They have been discussed by many researchers (e.g. Armitage & Choi & Rasmussen, 2003; Enser, 1997; Enser, 1995, 2008; Jörgensen, 2003; Jörgensen, Jaimes, Benitez & Chang, 2001; Matusiak, 2006; Trant, 2009), and they require separate, detailed studies of retrieval and of the expectations and needs of users who search for information on images of architectural objects. Progress in information technology has made collections more accessible (Springer et al., 2008; Trant, 2009; Wyman, Chun, Cherry, Hiwiller, & Trant, 2006).
It has also increased interest in digital heritage:
Vast collections of both analog and digital images now exist in many types of organizations and institutions: libraries, museums, archives, information centers, research centers, hospitals, educational institutions, and newspapers, as well as in personal archives. While some image collections may be of interest only to researchers in a narrow field (such as those being created for medical diagnoses), other image collections are being created that will be of interest to many, including the artist, the scientist, the educator, the historian, the journalist, the student, and the leisure seeker. (Jörgensen, 2003, p. 1)
Chapter 12. Subject indexing of images
Uploading both digitized images of monuments and their descriptions has placed them in a new context, and the descriptions themselves have changed: new categories of data have been added and the existing categories have been adapted to the new circumstances (Besser, 2003). Ways of searching for information have also evolved (Jansen, 2008; Jörgensen, 2003). Moreover, descriptions are now subject to assessment and verification by the users who access the resources. Sometimes this benefits collections and institutions, as it improves the quality of descriptions and helps supply missing data (see Springer et al., 2008). It must be remembered, however, that most cultural institutions that upload their collections use, or are required to use, the norms and standards that apply to descriptions in order to avoid discrepancies and lack of clarity (Baca, 2002; Tillett, 2004). Nevertheless, cataloguing image collections that present architectural objects seems to require modifying the existing description methods, including the methods of creating and using authority files. This work is based on the practical experience of librarians working for the National Library of Poland and other Polish libraries. It reviews problems found in the representations included in the records of documents in the catalogues of Polish libraries. Certain aspects of subject description have helped us realise what kinds of problems the subject indexing of iconographic documents can cause. Questions have increasingly arisen about what should be described, how it should be described, and in how much detail. Photographs, postcards and engravings featuring architectural objects whose functions have changed over the course of history prompted this study, and led us to consider whether authority files and controlled vocabularies could be useful in creating descriptions of such images.
For some time now, iconographic collections have been catalogued in the National Library of Poland. At first, their descriptions were included in card catalogues, then in an OPAC. Currently, they are being digitised, provided with a set of data according to the Dublin Core scheme, and placed in the digital library created and maintained by the National Library (cBN Polona; http://www.polona.pl). The Europeana portal (http://www.europeana.eu), where these collections constitute a vital part of the European cultural heritage, places them in a wider European context. The data accompanying the digitised resources seem standardised and coherent; yet the problems with description discussed here can undeniably affect both the overall perception of the collection and the retrieval of information about these resources. Other institutions that place their collections in the Europeana portal can be expected to struggle with similar difficulties, especially given the turbulent history of many European countries. The same issues are very likely to prove problematic in other parts of the world as well.
Rygiel
Literature review
The problem of creating unified authorities for objects and of the subject indexing of images of architectural objects has attracted little attention in the context of library collections. The issue is absent both from works on using indexing tools such as the Library of Congress Subject Headings (Chan, 2005, 2007; Drabenstott & Vizine-Goetz, 1994; Lopes & Beall, 1999) and from the guidelines for creating subject authority files (Guidelines for Subject, 1993; Zeng, Žumer, & Salaba, 2011). Some information on the structure of headings for architectural objects can be found in manuals for creating subject headings. The Subject Cataloging Manual (1996), in the section on Buildings and other structures (H1334), describes how to construct such a subject heading. It specifies the form of entry (model, tagging, entry term, language) as well as geographic qualifiers, UF references and other entry elements. The Guide d’indexation RAMEAU (n.d.) has a chapter titled “Sciences humaines et sociales”, which covers geographical names (Noms géographiques), including the language and structure of headings for man-made structures (Constructions humaines). A book devoted to the indexing of art library resources (McRae & White, 1998) discusses the use of the MARC format for describing art collections (including architecture), but no chapter mentions creating authorities for architectural objects. There are a few works on creating subject authorities used for indexing architectural documents in art museums and similar institutions.
One of them is an article (Harpring, 1990) that presents the Architectural Subject Authority as part of the cataloguing rules and automated systems (under development by the Architectural Drawings Advisory Group (ADAG) and the Foundation for Documents of Architecture (FDA)) devoted to the cataloguing of architectural drawings and other documents related to architecture. According to Harpring (1990): The FDA ‘authority’ is not the more traditional authority lists only accepted name for the building. Rather, the FDA authority for Architectural Subjects (and similarly, authorities for Persons, Corporations, and Geographic Places) contains a full record of information describing and indexing the Architectural Subject. Such information includes the following: the FDA-preferred name in the vernacular, as well as all alternate names, e.g., former names or names in various languages; the dates of execution of the Architectural Subject; its location; indexing terms (such as “Latin cross buildings” and “churches”) drawn from the Art and Architecture Thesaurus (AAT); the names and full records for related persons or corporations (such as the architects and patrons); and the names and full records for all related Architectural Subjects. (p. 56)
The topic was developed further, and in more detail, in work on the description of architectural drawings (Porter & Thornes, 2000). These publications are linked to the work of the Getty Research Institute, a leading institution in setting standards for the description of works of art. These standards are directed mainly at museums but can be used by other institutions whose collections include works of art or other visual resources. Other publications of the Institute concentrate on problems connected with the analysis and interpretation of works of art, and on issues related to description itself: the use of various tools and standards (Baca, 2002; Baca & Harpring, 2009; Harpring, 2010). The Cultural Objects Name Authority (CONA), overseen by the Getty Research Institute, is currently in preparation. It is intended to encompass authority headings for works of art, including architectural objects (Cultural Objects Name, n.d.). As noted in About CONA: The focus of each CONA record is a work of art or architecture. In the database, each work’s record (also called a subject in the database, not to be confused with iconographical depicted subjects of art works) is identified by a unique numeric ID. Linked to each work’s record are titles / names, current location, dates, other fields, related works, a parent (that is, a position in the hierarchy), sources for the data, and notes. (para. 13)
On architecture in particular, it states: Built works within the scope of CONA are architecture, which includes structures or parts of structures that are the result of conscious construction, are of practical use, are relatively stable and permanent, and are of a size and scale appropriate for – but not limited to – habitable buildings. Most built works in CONA are manifestations of the built environment that is typically classified as fine art, meaning it is generally considered to have aesthetic value, was designed by an architect (whether or not his or her name is known), and constructed with skilled labor … Most fields in CONA records are written in English. However, the structure of CONA supports multilinguality insofar as titles / names and descriptive notes may be written and flagged in multiple languages. All information is written in the Roman alphabet (pending our conversion to Unicode in the future). (para. 16, 20)
The guidelines in the publications mentioned above contain useful information about indexing and authority records, but they do not resolve many of the problems discussed in this chapter.
Subject indexing and indexing tools
Indexing can be defined as a process of creating document representations and / or retrieval instructions. Lancaster (1998) notes:
The main purpose of indexing and abstracting is to construct representations of published items in a form suitable for inclusion in some type of database … The abstractor writes a narrative description or summary of the document while the indexer describes its contents by using one or several index terms, often selected from some form of controlled vocabulary. (pp. 1, 5)
Indexing consists of the following sequence of steps: subject analysis, selection of information, and translation into the indexing language (transforming the message, which entails changing its language while losing as little information as possible) (Chu & O’Brien, 1993; Lancaster, 1998). In the case of iconographic documents, this also involves transforming a visual code into a written one, as Matusiak (2006) notes: It involves translating the visual information into textual description to express what the image is about and what it represents … The process of translating the content of an image into verbal expressions poses significant challenges to concept-based indexing. (p. 285)
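The three steps can be illustrated with a minimal sketch. It assumes that subject analysis (a human task) has already produced a list of candidate concepts, models selection as a filter supplied by the indexer, and models translation as a lookup in an entry vocabulary that maps free-text concepts to controlled terms. The function name and the small vocabulary (including the term "City gates") are hypothetical; the point is only that whatever the indexing language cannot express is lost in translation, and that synonyms collapse into one term.

```python
from __future__ import annotations

from typing import Callable


def index_document(
    concepts: list[str],
    is_selected: Callable[[str], bool],
    entry_vocabulary: dict[str, str],
) -> tuple[list[str], list[str]]:
    """Select concepts, then translate them into controlled terms.

    Returns (controlled_terms, untranslated): the second list holds
    selected concepts for which the (hypothetical) vocabulary has no term.
    """
    # Step 2: selection. Step 1, subject analysis, produced `concepts`.
    selected = [c for c in concepts if is_selected(c)]

    # Step 3: translation into the indexing language.
    terms: list[str] = []
    untranslated: list[str] = []
    for concept in selected:
        term = entry_vocabulary.get(concept.casefold())
        if term is None:
            untranslated.append(concept)      # information the language cannot carry
        elif term not in terms:               # synonyms collapse into one term
            terms.append(term)
    return terms, untranslated


# Hypothetical entry vocabulary: free-text form -> controlled term.
vocab = {
    "city gate": "City gates",
    "town gate": "City gates",
    "gate of dawn": "Aušros Vartai (Vilnius, Lithuania)",
}
terms, lost = index_document(
    ["City gate", "Town gate", "Gate of Dawn", "morning mist", "cart"],
    lambda c: c != "cart",  # the indexer decides the cart is not worth indexing
    vocab,
)
```

Here "City gate" and "Town gate" end up as a single term, and "morning mist" cannot be expressed at all, which echoes the loss of nuance that critics of controlled vocabularies point to.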
Information retrieval languages used as tools for subject indexing can be divided, according to whether their lexicon is controlled, into controlled vocabularies, such as thesauri, subject headings and classifications, and uncontrolled vocabularies, such as keywords and tags (Lancaster, 1998). However, the tools most frequently used in libraries (subject headings, thesauri, classifications), despite their unquestionable benefits, are not well suited to the description of iconographic material; indeed, they are often completely useless for this purpose (O’Connor & Wyatt, 2004). Moreover, because images have very rich iconographic content, these tools are inadequate for creating full, comprehensive descriptions suited to the expectations of users with different levels of competence and various information needs. Libraries do not use all the existing indexing tools for describing iconographic documents. Some tools (such as subject headings) are too general to describe certain elements of the iconographic aspect of images; others, such as the Library of Congress Thesaurus for Graphic Materials, the Art & Architecture Thesaurus and Iconclass, encompass elements connected with both the technique of creation and the iconography of objects (Baca, 2002; Harpring, 2010).
Authority files and controlled vocabulary
Authority files are a vital element of any information system (Tillett, 1989, 2004). They contain unified, authoritative forms of headings (for example, for authors, titles or subjects) used as access points in library catalogues and other sources of information. High quality and coherence are ensured by the authority and vocabulary control procedures that the data must undergo. Consistent use of these headings ensures that data are uniformly indexed, which is vital both for those who catalogue and for those who search for information (Aluri, Kemp, & Boll, 1991; Drabenstott & Vizine-Goetz, 1994; Lancaster, 1998). Although the usefulness of controlled vocabularies has generally not been questioned, some weak points have been identified. Klingbiel made the following arguments against using controlled vocabularies (as cited in Aluri et al., 1991):
– Highly structured controlled vocabularies are obsolete for indexing and retrieval.
– The natural language of scientific prose is fully adequate for indexing and retrieval.
– Machine-aided indexing of natural language text is within the state of the art.
– Natural language retrieval can be conducted online if the request can be stated in a phrase or a sentence. (p. 35)
Other arguments against the use of controlled vocabulary are noted by Aluri, Kemp, and Boll (1991): There are nuances in ideas, and these nuances are expressed in the choice of words. By treating two or more words as synonyms and then choosing one of them to stand for the other(s), as happens when a vocabulary is controlled, the opportunity to express the nuances is lost … The indexer and the searcher are limited to the terminology used, to the scope of each concept (term or notation) in that system, and to the structure of the system, as decided by the designers. This is not always helpful for a given library or clientele. (p. 35)
Matusiak (2006) and McCutcheon (2009) have also pointed out the weaknesses of controlled vocabulary, and user studies have provided evidence supporting these earlier assumptions (Gross & Taylor, 2005; Ménard, 2009; Wyman, Chun, Cherry, Hiwiller, & Trant, 2006). Authority files are used for describing many types of resources, and subject authority control is a vital process. However, have all the aspects of iconographic material been considered? Are current solutions adequate for the description of images and suited to the expectations of users? What can cataloguers do with images that represent genuinely problematic objects? Are there any solutions? The examples of such images discussed in the remainder of this chapter should explain what I mean and why I have chosen to address this problem.
Subject analysis – image analysis
The subject analysis of a resource can be defined as the process of establishing the subjects of a document, during which the most suitable subject headings (or descriptors) for a comprehensive description of its content are found. Such analysis is not an easy task. The question is which features of documents should be considered in subject analysis (Albrechtsen, 1993). Visual materials may require a completely different approach. Certain features, such as title, author and appearance, can easily be identified and used as access points in the description. Furthermore, any image may be perceived and interpreted differently by various groups of users, who may have varying competencies and specific needs. What exactly is analysed? What is a picture or image? According to the definition in ODLIS, a picture is: a two-dimensional visual representation or image large enough to be easily viewed without magnification, usually rendered in black and white or colour on a flat, opaque surface. The term includes paintings, drawings, art prints, photographs, reproductions, illustrations, clippings of pictorial matter, etc., and is often used in a generic sense when a more specific word is inappropriate. (Reitz, n.d.)
The problem of analysing the meaning of a work of art has been discussed by, among others, the art historian Erwin Panofsky, who proposed three strata of meaning in an artwork. A work of art can be described on the pre-iconographic level, which consists of general aspects (the primary or natural subject matter); on the iconographic level, which encompasses concrete aspects (the secondary or conventional subject matter); and on the iconological level, which includes symbolic, abstract aspects (the intrinsic meaning or content). These three strata of interpretation (pre-iconographic description, iconographic analysis and iconological interpretation) require specific skills from the interpreter, namely practical experience, knowledge of sources, themes and concepts, and synthetic intuition (Panofsky, 1962). Panofsky’s scheme was extended by Sara Shatford, who adapted these ideas to image indexing (Shatford, 1986). She renamed Panofsky’s strata generic, specific and abstract, and subdivided each of them into four facets, each answering one of the following questions: Who?, What?, Where?, When?. In this way she created a model of description that encompasses all the aspects of a picture. Shatford also introduced the distinction between “of” (what a picture consists of) and “about” (what a picture is about). In another work, Shatford went beyond describing the visual contents of a picture and included in her description the elements she defined as biographical, exemplified and relationship (Shatford Layne, 1994).
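Shatford's model can be pictured as a grid of three levels by four facets, twelve cells in all, into which the index terms for a single image are sorted. The following Python sketch is one hypothetical way to hold such a grid; the class names and the sample terms for a photograph of the Gate of Dawn are illustrative assumptions, not part of Shatford's papers or of any standard.

```python
from __future__ import annotations

from collections import defaultdict
from enum import Enum


class Level(Enum):
    """Shatford's names for Panofsky's three strata."""
    GENERIC = "generic"    # pre-iconographic description
    SPECIFIC = "specific"  # iconographic analysis
    ABSTRACT = "abstract"  # iconological interpretation


class Facet(Enum):
    WHO = "who"
    WHAT = "what"
    WHERE = "where"
    WHEN = "when"


class ShatfordMatrix:
    """Index terms for one image, sorted into the 3 x 4 grid."""

    def __init__(self) -> None:
        self._cells: dict[tuple[Level, Facet], list[str]] = defaultdict(list)

    def add(self, level: Level, facet: Facet, term: str) -> None:
        self._cells[(level, facet)].append(term)

    def terms(self, level: Level, facet: Facet) -> list[str]:
        # .get() does not create empty cells in the defaultdict
        return list(self._cells.get((level, facet), []))

    def all_terms(self) -> list[str]:
        """Flatten the grid level by level, facet by facet."""
        return [t for lv in Level for f in Facet for t in self._cells.get((lv, f), [])]

    def empty_cells(self) -> list[tuple[Level, Facet]]:
        """Cells for which no term has been assigned yet."""
        return [(lv, f) for lv in Level for f in Facet if not self._cells.get((lv, f))]


# Illustrative terms for a documentary photograph of the Gate of Dawn.
m = ShatfordMatrix()
m.add(Level.GENERIC, Facet.WHAT, "city gates")
m.add(Level.SPECIFIC, Facet.WHAT, "Ostra Brama")
m.add(Level.SPECIFIC, Facet.WHERE, "Vilnius")
m.add(Level.SPECIFIC, Facet.WHEN, "ca. 1939")
```

Listing the empty cells is one way to see which aspects of an image remain undescribed; in this example the whole abstract level is left empty, as one might expect for a documentary photograph of a building.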
Ofness and aboutness
The process of indexing is connected with the concepts of ofness and aboutness. Aboutness is the more widely used of the two terms. It focuses on what a document conveys: what it is about, and what its content, subject or theme is. Aboutness depends on the interpretation of the themes, motifs, actions and events in a work. It is one of many terms used to express certain features of a text or document (features distinct from its form and from the so-called descriptive data). Aboutness (and other terms with the same meaning), as a vital element of knowledge organisation and information retrieval, has been discussed by many researchers (Beghtol, 1986; Bruza, Song, & Wong, 2000; Hjørland, 1992, 2001; Hutchins, 1978; Maron, 1977). Aboutness can be applied to both textual and non-textual resources, whereas ofness applies only to the analysis of pictures. Ofness refers to the elements that a picture consists of. The concepts of aboutness and ofness used in picture description are based on Panofsky’s theory of the three strata of meaning in a work of art. Aboutness and ofness are characteristic of the pre-iconographic and iconographic levels of description, while Panofsky’s third stratum, the iconological level, refers to the interpretation of the inner meaning or content of a work of art (Panofsky, 1962; Shatford, 1986).
Image indexing
First of all, it is necessary to ask whether it is possible at all to describe the subject and content of an iconographic document. If it is, how should it be described? What is the subject of such a document? What is it about? How should works of art be interpreted, and what should be included in their description (including the library index)? Which aspects of images could interest users? Svenonius (1994) asked: What then is a subject? Such a question addressed to a group of information specialists is likely to elicit the words “theme,” “topic,” or “aboutness” (Russell, 1982). This, however, is not very useful in that it begs the question asked; we must ask again what is meant by “theme,” “topic,” or “aboutness.” Philosophical confusions abound in attempting to define a subject and, in particular, what is meant by the subject of a document (Maron, 1977; Wilson, 1968). While these are interesting, they are not constructive. One way to sidestep some of the confusion is simply not to define but to postulate a model of subject indexing. Such a model might take its analog from grammar. In grammar the subject of a logical proposition denotes what is spoken of, a concept or thing. The predicate of a proposition then says something about that concept or thing. (p. 601)
In a similar context, Roberts (2001) noted: Like people, works of art become more interesting as one knows more about them, their lives, their experiences, their reputation. There is a lot to be known about a work of art. In the first place, it is a material object, consisting of pigment and canvas, metal or stone, paper or vellum, brushstrokes or chisel marks. It has a history of damage and perhaps conservation; it is of a certain size, weight, and density. Furthermore, a painting has a history. It was created by a certain artist at a particular time and place. Perhaps it was commissioned by a patron; this document may still exist. It cost something in materials and time to create; certain sums of money were paid for it. It may depict real persons or historic events for which documentary evidence exists. It may have been bought and sold many times, and listed in inventories, sales catalogs, and collection catalogs. At different times in its career it might have been attributed to different artists and given different titles. It may have been exhibited, and variously interpreted by critics. It may have been highly valued at one time or declared worthless during a different time. (pp. 913 – 914)
While creating an image description, the cataloguer may consider the formal aspects of the object (such as the medium, the technique and / or the material, e.g. albumen print, etching, ink drawing, oil on board painting) or other categories that are not directly related to the content of an image (Booth, 2001). These may include:
– type of document / field of art (photography, engraving, drawing, painting)
– dating (15th century, 1401 – 1500, the end of the 19th century)
– culture, epoch, art movement (Chinese art, Renaissance painting, impressionism)
However, the indexer’s main task is to specify content features, that is, to describe the “content” of a document. The basic elements of content include, among others:
– objects and actions (a child, a village, a car, a waterfall, a dog, harvest)
– type of picture (portrait, landscape, battle scene, still life, genre scene)
– scene or iconographic type (The Dance of Salome, Christ’s Descent into Limbo, Last Judgement, The Three Graces).
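The distinction between formal categories and content features can be sketched as a simple record type. This is a hypothetical Python structure, not a schema from Booth (2001) or from any of the standards cited in this chapter; all field names are assumptions, and the sample values are taken from the examples listed above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FormalAttributes:
    """Categories not directly related to what the image shows."""
    document_type: str                          # e.g. "photograph", "engraving"
    technique: Optional[str] = None             # e.g. "albumen print", "etching"
    dating: Optional[str] = None                # e.g. "19th century", "1401 – 1500"
    culture_or_movement: Optional[str] = None   # e.g. "Renaissance painting"


@dataclass
class ContentFeatures:
    """The 'content' that is the indexer's main responsibility."""
    objects_and_actions: list[str] = field(default_factory=list)
    picture_type: Optional[str] = None          # e.g. "landscape", "genre scene"
    iconographic_type: Optional[str] = None     # e.g. "Last Judgement"


@dataclass
class ImageDescription:
    formal: FormalAttributes
    content: ContentFeatures

    def access_points(self) -> list[str]:
        """Collect every filled-in value, formal first, as candidate access points."""
        f, c = self.formal, self.content
        candidates = [f.document_type, f.technique, f.dating, f.culture_or_movement,
                      *c.objects_and_actions, c.picture_type, c.iconographic_type]
        return [v for v in candidates if v]


photo = ImageDescription(
    FormalAttributes("photograph", technique="albumen print", dating="19th century"),
    ContentFeatures(["city gate", "street"]),
)
```

Keeping the two groups in separate structures makes it easy to see when a description carries only formal data and says nothing about what the image shows.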
Subject analysis and the description of the “content” of iconographic documents are sources of numerous problems (Roberts, 2001; Svenonius, 1994). Establishing and defining the subject and meaning of an image may be problematic, and identifying the objects shown and other elements of an image is frequently difficult. Furthermore, indexing multifaceted, often allegorical or symbolic images, which are interpreted on many different levels, presupposes a certain knowledge of iconography (Jacobs, 1999; Schroeder, 1998). In the case of images of architectural objects, knowledge of history and facts, as well as awareness of the context in which the resource was created, is also necessary. Abstract images, images related to foreign cultures and images created in a different epoch, all of which may contain elements of unknown meaning and undecipherable symbols and gestures, are yet another source of problems. Finally, the circumstances in which the work was created (usually known only to its author) may influence the interpretation of an image.
Architectural objects as the object of analysis
Static images that portray reality in a given historical period are a vital and frequent type of representation. Many of them show architectural objects (buildings, bridges, squares, etc.), whose names may have changed over time, depending on which countries claimed the area where they are located, or as a result of changes in their form, function or ownership. Not all such information is included in the headings in authority files. Quite often, the current name of the object is given, which is not always appropriate, especially for old objects or objects that have changed as a result of historic events such as war: an object is demolished or destroyed and then rebuilt in a new form or style. This is a serious problem not only for indexers but also for users, who do not obtain results relevant to their search. Architectural objects may undergo various transformations and, as a result, have their names changed, for instance:
– change of function and name,
– change of form and name,
– change of form without change of name,
– change of name without change of form.
Architectural objects are most often buildings (such as manor houses, palaces, churches, castles, railway stations) situated in or outside cities, having various features and forms and serving different, frequently changing purposes; structures with various functions (such as gates, walls, bridges); urban structures (squares, roads); etc. Iconographic documents may show one or more objects, or elements of one or more objects (such as details). Some such documents present the interiors of buildings or panoramas of places such as squares or streets. Postcards, too, may show places in a specific way (for example, as a collage of images of objects located in a city or country). Each of these types of publication is a source of different problems, connected especially with interpretation, representation and the choice of subject headings. Indexing images of architectural objects appears simple and straightforward. It does not seem to require interpreting, researching or defining the subject of the work; yet identifying the object is often a challenge. Architectural objects are more than what is visible, and more than the identification and interpretation of what is visible. Indexers must adapt the chosen term both to the representation of the object and to the existing rules, such as input conventions and metadata structures. They need not only knowledge of history but also the skills necessary to find the information required for identification. Moreover, they must be aware of how users might search for information about the object. Textual materials containing information about an object do not pose a problem. They usually include the history of the object or part of it, an account of the events that influenced the object, any changes in its name and function, etc. Because such material describes the architectural object in the context of its history, it is not difficult to create or find a suitable controlled vocabulary term for it. Problems arise if the resource to be indexed is, for example, an old print dating back to a time when the object in question had a different name or served a different purpose than today.
Will using the current name of the object, as subject indexing guidelines dictate, help the user searching for information about this object (Guide d’indexation RAMEAU, n.d.; Subject Cataloging Manual, 1996)? A similar problem arises when images are analysed. The visual message seems straightforward. A photograph, drawing or engraving shows the object as it was at the time the image was created, unless it is a design of an object to be built or an image of a non-existent structure. Unlike allegorical or symbolic representations, where the subject is often uncertain, there is no doubt as to what the subject of such an image is. What matters is the hic et nunc, the “here and now”, of the image. That is why accurate identification of the object is so important. What is much more difficult is choosing an appropriate heading from the authority file or, if there is no term suitable for the given representation, using the heading that has been authoritatively recognised as appropriate.
Problems with indexing architectural objects
The analysis of iconographic documents raises numerous doubts and questions that need to be addressed, however difficult and ambiguous the answers may be. One of the most serious problems is identification: determining what the image presents and when it was created (who owned the object at that time, what the object looked like, what its function was and within whose borders it was located). Unfortunately, it is often impossible to obtain reliable data for identification. Often a photograph is the only surviving representation of a building, for instance when a building has been demolished in wartime and the sources that could assist identification, such as local archives, have been destroyed or dispersed. There is also the question of which version of the name to choose: the name in the language of the country where the object is currently located, or the name the object bore when the image was created. It is also necessary to decide whether to make the heading appropriate to the image (referring to the time when the image was created and the name the object then had) or to use the latest (current) name of the object with a suitable chronological subdivision. This is connected with the rules for using proper names of objects within the file and the way such names function in the catalogue: should headings be appropriate to a particular time span of the object? Or should one name be selected and used consistently, even though it may not suit all the different images of the object? This choice significantly influences searching in the catalogue. If a heading appropriate to the object at the time shown in the image is used, users are more likely to find exactly what they are searching for, but the wider context is lost, information is dissociated from the object, and the material as a whole becomes more difficult to find.
Adopting the latter solution, however, leads to inadequate descriptions, while the material becomes overly concentrated under a single heading. It also needs to be decided which elements of a complex image should be included and which omitted (for example, a square with a palace and a statue in front of the palace). Should the description include a general heading for the dominant element (for example, the square), or should headings be added for each element of the image? Finally, there is the issue of complementarity between the physical and subject descriptions: should elements of the physical description (such as the title, date, or type of representation stated in the title) also be part of the subject description?
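The choice between a heading tied to the date of the image and the current name with a chronological subdivision can be made concrete with a small sketch. Everything in it is hypothetical: the object and its two names are invented, the policy labels "period" and "current" are ours, and the subdivision simply appends the year in the dash style of subject heading strings rather than following any particular rulebook.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NamePeriod:
    """One name an object bore, with the years it was in use (inclusive)."""
    name: str
    start: int
    end: Optional[int] = None  # None means the name is still in use


def name_at(periods: list[NamePeriod], year: int) -> Optional[str]:
    """Return the name the object bore in the given year, if recorded."""
    for p in periods:
        if p.start <= year and (p.end is None or year <= p.end):
            return p.name
    return None


def current_name(periods: list[NamePeriod]) -> str:
    """Treat the period with the latest start year as the current name."""
    return max(periods, key=lambda p: p.start).name


def heading_for_image(periods: list[NamePeriod], year: int, policy: str) -> str:
    """Build a heading for an image made in `year`.

    "period":  the name in use when the image was made (falling back to
               the current name if no recorded period covers the year);
    "current": the current name plus a chronological subdivision.
    """
    if policy == "period":
        return name_at(periods, year) or current_name(periods)
    if policy == "current":
        return f"{current_name(periods)} – {year}"
    raise ValueError(f"unknown policy: {policy!r}")


# Hypothetical building that changed function and name in 1919.
palace = [NamePeriod("Old Town Palace", 1750, 1918), NamePeriod("Municipal Museum", 1919)]
```

Under the "period" policy, images from 1900 and 1950 are filed under two different headings, so the material is scattered; under the "current" policy, both file together, but the 1900 image is described by a name the building did not bear at the time.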
Indexing of images using subject headings
Names of objects are usually proper names subject to authority control; headings that are names of objects can be found in subject authority files. Names are recorded in different ways depending on the information retrieval system, and systems may differ both in the structure of headings and in the structure of authority records. The example in Figure 1 shows an object situated in Vilnius, Lithuania, a city with a very complicated history. This object was chosen because authority records for it exist in several subject heading languages used in libraries.
Figure 1. The Gate of Dawn, Lithuania. Source: Cyfrowa Biblioteka Narodowa – cBN Polona. http://www.polona.pl/dlibra (public domain).
The image represents the Gate of Dawn (Lithuanian: Aušros Vartai, Polish: Ostra Brama, Belarusian: Вострая Брама), a city gate of Vilnius, today the capital of Lithuania. It was built between 1503 and 1522 as part of the defensive fortifications of Vilnius, then the capital of the Grand Duchy of Lithuania. After 1569 Vilnius became part of the Polish–Lithuanian Commonwealth, ruled by a common monarch. After the Third Partition of Poland in 1795, Vilnius was annexed by the Russian Empire. During World War I, Vilnius and the rest of Lithuania were occupied by the German Army from 1915 until 1918. After the withdrawal of German forces, the city was briefly controlled by Polish self-defence units. Vilnius then changed hands again during the Polish–Soviet War and the Lithuanian Wars of Independence; Poland and Lithuania both regarded the city as their own. In 1920 the city and its surroundings were annexed by Poland. In 1939 Vilnius was seized by the Soviet Union; at first, the city was incorporated into the Byelorussian SSR. In 1941 Vilnius was captured by the Germans. In 1944 Vilnius was taken from the Germans and once again incorporated into the Soviet Union, as the capital of the Lithuanian SSR. In 1990 Lithuania announced its secession from the Soviet Union and restored an independent Republic of Lithuania, with Vilnius as its capital. Cataloguers may therefore have to deal with images showing the object at different times, when it belonged to different countries. This image shows the object around 1939, when it belonged to Poland.
The following examples illustrate the approaches taken to the same object in different subject heading systems.
In the Library of Congress Subject Headings (LCSH) (Subject Cataloging Manual, 1996), the heading for an object is its latest proper name in noninverted form (natural order), with a geographic qualifier giving the place where the object is currently located. Names of objects dating from after 1500 are recorded in the language of the country where the object is currently located; names of objects dating from before 1500 are recorded in English or, if there is no known English name, in the language of the country where the object is currently located.
The versions of the name in other languages and the former names (if any) are included as see reference. Aušros Vartai (Vilnius, Lithuania) (preferred name) UF Gate of Dawn (Vilnius, Lithuania) Medininkų Vartai (Vilnius, Lithuania) Ostra Brama (Vilnius, Lithuania) Pointed Gate (Vilnius, Lithuania) BT Christian shrines – Lithuania
In the Répertoire d’autorité-matière encyclopédique et alphabétique unifié (RAMEAU) (Guide d’indexation RAMEAU, n.d.), names of architectural objects (and other man-made structures) are recorded in their original form in Roman or German languages; in the case of other languages, the names are recorded in French. They are given as subdivisions after the geographic name – the name of the country where the object is currently located. Versions of the name in other
302
Rygiel
languages and former names (if any) are included as see reference in the natural order. Vilnius (Lituanie) – Aušros vartai (preferred name) UF Vilnius (Lituanie) – Ostra Brama Aušros vartai (Vilnius) Gate of dawn (Vilnius) Medininkų vartai (Vilnius) Ostra Brama (Vilnius) Pointed gate (Vilnius) Porte de l’aurore (Vilnius) Sharp gate (Vilnius) Tor der Morgenröte (Vilnius) BT Portes de ville – Lituanie
In the Katalogi Automatyczne Bibliotek Akademickich (KABA) Subject Headings (JHP KABA), which is used by major Polish academic research libraries (Głowacka, 2000; see also Kotalska, 2002; Nasiłowska, 2007), names of objects are recorded in their original form if they are in one of the so-called Congress languages (English, French, German, Spanish and Italian). Other names are recorded in Polish, and the original name (sometimes transliterated) becomes a see reference. Objects are recorded as geographic names subdivided by the name of the specific object (building, statue, etc.). Versions of the name in other languages and/or former names are included as see references in the natural order.
Wilno (Litwa) – Ostra Brama (preferred name)
UF Aušros Vartai (Wilno).
UF Brama Miednicka (Wilno).
UF Brama Świętej Trójcy (Wilno).
UF Ostra Brama (Wilno, Litwa).
BT Bramy miejskie – Litwa.
In the Język Haseł Przedmiotowych Biblioteki Narodowej (JHP BN) (Klenczon & Stolarczyk, 2000; see also Klenczon, 2011), a name is recorded as a subdivision after a geographic name, the name of the place where the object is currently located. Foreign-language names are given in their original form or are transliterated. If the original name cannot be established, the Polish version of the name is given, provided that it can be found in reference resources; otherwise the name appearing in the document is used. For names of boroughs or other parts of cities, see references are given in the natural order; for other names of objects, see references in the natural order are not created.
Wilno (Litwa) – Aušros Vartai (preferred name)
UF Wilno (Litwa) – Ostra Brama
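The see-reference mechanism these systems share can be illustrated with a small sketch. It assumes a deliberately simplified record (one preferred heading, a list of UF variants, a list of broader terms), not the MARC 21 authority format or any system's actual data model; the class and function names are hypothetical. Whatever variant form a user types is resolved to the preferred heading, here using the LCSH record for the Gate of Dawn quoted above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AuthorityRecord:
    """Simplified subject authority record (hypothetical model)."""
    preferred: str                                       # authorised heading
    used_for: list[str] = field(default_factory=list)    # UF: variant and former names
    broader: list[str] = field(default_factory=list)     # BT: broader terms


def build_see_index(records):
    """Map every form of a heading, preferred or variant, to the preferred heading."""
    index = {}
    for record in records:
        index[record.preferred.casefold()] = record.preferred
        for variant in record.used_for:
            index[variant.casefold()] = record.preferred
    return index


def resolve(index, heading: str) -> Optional[str]:
    """Follow a see reference: return the preferred heading, or None if unknown."""
    return index.get(heading.casefold())


gate_of_dawn = AuthorityRecord(
    preferred="Aušros Vartai (Vilnius, Lithuania)",
    used_for=[
        "Gate of Dawn (Vilnius, Lithuania)",
        "Medininkų Vartai (Vilnius, Lithuania)",
        "Ostra Brama (Vilnius, Lithuania)",
        "Pointed Gate (Vilnius, Lithuania)",
    ],
    broader=["Christian shrines – Lithuania"],
)

see_index = build_see_index([gate_of_dawn])
print(resolve(see_index, "ostra brama (Vilnius, Lithuania)"))
```

Because each system builds the preferred heading differently, the same lookup run against a RAMEAU or KABA index would return a different preferred form for the same gate.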
None of these systems provides for creating authority records for an object represented at various stages of its development and linking these records to each other through references, e.g. see also references. Such a solution would be extremely helpful in describing visual resources, as the examples below show.
Examples
The examples of architectural objects presented below illustrate how difficult it can be to choose and apply appropriate headings from authority files. All the objects are located in Poland and neighbouring countries, territories that have had rather turbulent histories. Wars, occupations and partitions have shifted borders and made different languages official in these areas. As a result, many buildings and structures in this part of Europe have had their names and addresses changed and have belonged to numerous owners. As a consequence of treaties concluded after World War II, the eastern part of pre-war Poland, with all the objects located there, became part of neighbouring countries. Conversely, land that had belonged to Germany, together with a large number of buildings and structures, became part of Poland. The turbulent history of these areas is reflected in monuments whose images (drawings, engravings and photographs) can be found in collections held by various institutions.
Example 1: The change of function and name
Figure 2. Reichsbank. Source: Kujawsko-Pomorska Biblioteka Cyfrowa. http://kpbc.umk.pl/dlibra (public domain).
Figure 3. Collegium Maximum. Source: Wikipedia. http://pl.wikipedia.org/, by Pko (CC-BY-SA 3.0).
The postcard dating from ca. 1940 (Figure 2) shows the building housing the Reichsbank, built in 1905 in Thorn, Germany (currently Toruń, Poland). After World War I, in 1920, Toruń became part of Poland; in 1939–1945 it was German again; later it was returned to Poland. Since 1905 the building has had several names and served various purposes: originally a branch of the Reichsbank, it later housed a branch of the Polish National Bank, and it is currently the Collegium Maximum (Figure 3), part of Nicolaus Copernicus University. The authority record for the object is:
Toruń (woj. kujawsko-pomorskie) – Collegium Maximum (preferred name)
UF Toruń (woj. kujawsko-pomorskie) – Bank Rzeszy (budynek)
UF Toruń (woj. kujawsko-pomorskie) – Narodowy Bank Polski (budynek)
The name used in the record is the current name; the former names are included in UF fields as see references. Thus, even to describe the object as it was when it housed the Reichsbank, the cataloguer had to use the present-day name.
Example 2: The change of form, function and name
Figure 4. Radziwiłł Palace. Source: Wikipedia. http://en.wikipedia.org/ (public domain).
Figure 5. Presidential Palace. Source: Wikipedia. http://pl.wikipedia.org, by Marcin Białek (CC-BY-SA 3.0).
The two images show a Baroque mansion built in Warsaw by the Koniecpolski family in the mid-17th century. The building has had different owners and served a variety of functions, which have been reflected in its names. Figure 4 shows the palace as it was at the end of the 18th century, when it belonged to the Radziwiłł family. At the beginning of the 19th century it was rebuilt in the Classicist style and became a seat of government: first of the foreign rulers, then of the Polish government and finally of the President of Poland. Figure 5 shows the palace as it is now; it is currently pałac Prezydencki [the Presidential Palace]. The authority record for the object is:
Warszawa – pałac Prezydencki (preferred name)
UF Warszawa – pałac Koniecpolskich
UF Warszawa – pałac Lubomirskich (Krakowskie Przedmieście)
UF Warszawa – pałac Namiestnikowski
UF Warszawa – pałac Prezydenta RP
UF Warszawa – pałac Rady Ministrów
UF Warszawa – pałac Radziwiłłów
The heading for both images would be the Presidential Palace (the current name). Yet the form and function of the building have changed several times throughout its history. For a user searching for an image of the palace as it was in the Baroque period, the current name may be misleading. Figure 5 also shows a statue of Prince Józef Poniatowski, which was erected in the 1960s. This is a case in which a cataloguer must decide whether to index only the dominant part of the image (the palace) or other visible objects as well.
Example 3: The change of form and name
Figure 6. Warecki Square. Source: Author’s private collection.
Figure 7. Napoleon Square. Source: Author’s private collection.
Figure 8. Powstańców Warszawy Square. Source: Wikimedia Commons, http://wikimedia.commons.org, by Clo mortis (CC-BY-SA 3.0).
The three images, two postcards and a photograph, show a square in Warsaw. Both the name of the square and its form have changed over the years, as buildings have been built and demolished. Figures 6 and 7 show the square as it was before World War II. The captions on the postcards refer to the time when each photograph was taken: Figure 6, Plac Warecki [Warecki Square] (this name survived until 1921); Figure 7, Plac Napoleona [Napoleon Square] (until 1939). The photograph (Figure 8) shows the square as it is now; its current name is Plac Powstańców Warszawy [Warsaw Insurgents’ Square]. The authority record for the object is:
Warszawa – plac Powstańców Warszawy (preferred name)
UF Warszawa – plac Dzieciątka Jezus
UF Warszawa – plac Warecki
UF Warszawa – plac Napoleona
For all these images the current name of the square would be used. Is this appropriate, however? Should there be separate headings for the different stages in the history of the square, especially since the images show significant changes in the surrounding buildings that may matter to users searching for images of the square at different stages of its development?
Example 4: The change of name without the change of form
Figure 9. District Management of National Railway. Source: Cyfrowa Biblioteka Narodowa – cBN Polona. http://www.polona.pl/dlibra (public domain).
Figure 10. Lithuanian National Railway. Source: Wikipedia. http://ru.wikipedia.org/, by Alma Pater (CC-BY-SA 3.0).
The postcard (Figure 9) shows the seat of the District Management of National Railway in Wilno (currently Vilnius, Lithuania). The image shows the complex as it was in 1938, when the territory belonged to Poland. The complex was built between 1860 and 1914 for the management of the Russian Poleska Railway. Today (Figure 10) it houses the management of the Lithuanian National Railway (Lietuvos Geležinkeliai). The authority record for the object is:
Wilno (Litwa) – Dyrekcja Okręgowa Kolei Państwowych (budynek) (preferred name)
UF Wilno (Litwa) – Zarząd Kolei Poleskiej (budynek)
UF Wilno (Litwa) – Zarząd Kolei Państwowych (budynek)
The authority heading refers to the name of the object that was used at the time the image was created. This might be the best solution, as the description is appropriate for the image.
Conclusion
The indexing of iconographic resources depicting architectural objects is a challenging task for cataloguers, and people searching for information face numerous difficulties as well. As the examples show, authority files are inadequate for some iconographic collections, and their limitations may prevent users from finding relevant answers to their queries. Further user studies are needed in this area. What is the solution? Would additions, modifications or changes in the creation of authorities that take into account the characteristics and problems presented above help? Is another model of knowledge representation for architecture needed? It should be considered whether an authority record is intended for an object or for its representation. If it is for the representation, it should include the name changes resulting from various events, and authority records appropriate to the period in which the images originate should be created. Creating authority headings that include names appropriate for the images and, where necessary, linking them (as see also references) to the other names of the same object appears to be a good solution. Another option is to use uncontrolled vocabulary (keywords, tags) in indexing. The uncontrolled vocabulary would include names that are not headings in authority files but were in use when the various images of the object were created. It might be worthwhile to create an ontology for the phenomenon described here. It would also be helpful if the rules for creating authority headings for museum and library collections were standardised, since both types of institutions hold some similar resources.
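The proposal of period-specific headings linked by see also references can be sketched as a data structure. This is a hypothetical illustration, not a feature of LCSH, RAMEAU, KABA or JHP BN. It uses the headings from Example 3 and the only dates the postcard captions provide (plac Warecki until 1921, plac Napoleona until 1939). Plac Dzieciątka Jezus is omitted because no dates are given for it, and treating every later year as belonging to the current heading is a simplifying assumption, not a statement about the square's actual naming history.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PeriodHeading:
    """Hypothetical authority record for one stage in an object's history."""
    heading: str
    used_until: Optional[int]                           # last year the name was in use; None = current
    see_also: list[str] = field(default_factory=list)   # links to the other stages


def link_stages(stages):
    """Give every stage a see also reference to every other stage of the same object."""
    for stage in stages:
        stage.see_also = [other.heading for other in stages if other is not stage]
    return stages


def heading_for_year(stages, year):
    """Pick the heading in use in `year`, using only the known end dates.

    Assumptions: a name applies up to and including its `used_until` year,
    years before the first end date map to the earliest dated heading, and
    years after the last end date map to the current heading.
    """
    dated = sorted((s for s in stages if s.used_until is not None),
                   key=lambda s: s.used_until)
    for stage in dated:
        if year <= stage.used_until:
            return stage.heading
    current = [s for s in stages if s.used_until is None]
    return current[0].heading if current else None


square = link_stages([
    PeriodHeading("Warszawa – plac Warecki", 1921),
    PeriodHeading("Warszawa – plac Napoleona", 1939),
    PeriodHeading("Warszawa – plac Powstańców Warszawy", None),
])

print(heading_for_year(square, 1930))
print(square[0].see_also)
```

An image dated 1930 would then be indexed under the Napoleon Square heading, while the see also links still lead a user to the square's other stages.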
References
Albrechtsen, H. (1993). Subject analysis and indexing: From automated indexing to domain analysis. The Indexer, 18(4), 219–224.
Aluri, R., Kemp, D. A., & Boll, J. J. (1991). Subject analysis in online catalogs. Englewood, CO: Libraries Unlimited.
Armitage, L. H., & Enser, P. G. B. (1997). Analysis of user need in image archives. Journal of Information Science, 23, 287–299.
Baca, M. (Ed.). (2002). Introduction to art image access: Issues, tools, standards, strategies. Retrieved from http://www.getty.edu/research/publications/electronic_publications/intro_aia/index.html
Baca, M., & Harpring, P. (Eds.). (2009). Categories for the description of works of art. Retrieved from http://www.getty.edu/research/publications/electronic_publications/cdwa/index.html
Beghtol, C. (1986). Bibliographic classification theory and text linguistics: Aboutness analysis, intertextuality and the cognitive act of classifying documents. Journal of Documentation, 42, 84–113.
Besser, H. (2003). Introduction to imaging. Retrieved from http://www.getty.edu/research/conducting_research/standards/introimages/
Booth, P. F. (2001). Indexing: The manual of good practice. München: K. G. Saur.
Bruza, P. D., Song, D. W., & Wong, K. F. (2000). Aboutness from a commonsense perspective. Journal of the American Society for Information Science, 51(12), 1090–1105.
Chan, L. M. (2005). Library of Congress Subject Headings: Principles and application. Westport, CT: Libraries Unlimited.
Chan, L. M. (2007). Cataloging and classification: An introduction. Lanham, MD: Scarecrow Press.
Choi, Y., & Rasmussen, E. (2003). Searching for images: The analysis of users’ queries for image retrieval in American history. Journal of the American Society for Information Science and Technology, 54(6), 498–511.
Chu, C. M., & O’Brien, A. (1993). Subject analysis: The first critical stages in indexing. Journal of Information Science, 19, 439–454.
Cultural Objects Name Authority™ Online. (n.d.). Retrieved from http://www.getty.edu/research/tools/vocabularies/cona/index.html
Drabenstott, K. M., & Vizine-Goetz, D. (1994). Using subject headings for online retrieval: Theory, practice and potential. San Diego: Academic Press.
Enser, P. G. B. (1995). Progress in documentation: Pictorial information retrieval. Journal of Documentation, 51(2), 126–170.
Enser, P. G. B. (2008). Visual image retrieval. Annual Review of Information Science and Technology, 42(1), 1–42.
Głowacka, T. (2000). JHP KABA: Zasady tworzenia słownictwa. Warszawa: Wydawnictwo SBP.
Gross, T., & Taylor, A. G. (2005). What have we got to lose? The effect of controlled vocabulary on keyword searching results. College & Research Libraries, 66(3), 212–230.
Guide d’indexation RAMEAU. (n.d.). Retrieved from http://guiderameau.bnf.fr/html/rameau_0894.html#d11e67550
Guidelines for subject authority and reference entries. (1993). München: K. G. Saur.
Harpring, P. (1990). The architectural subject authority of the foundation for documents of architecture. Visual Resources, 8, 55–63.
Harpring, P. (2010). Introduction to controlled vocabulary: Terminology for art, architecture and other cultural works. Retrieved from http://www.getty.edu/research/publications/electronic_publications/intro_controlled_vocab/index.html
Hjørland, B. (1992). The concept of “subject” in information science. Journal of Documentation, 48(2), 172–200.
Hjørland, B. (2001). Towards a theory of aboutness, subject, topicality, theme, domain, field, content … and relevance. Journal of the American Society for Information Science and Technology, 52(9), 774–778.
Hutchins, W. J. (1978). The concept of “aboutness” in subject indexing. Aslib Proceedings, 30, 172–181.
Jacobs, C. (1999). If a picture is worth a thousand words, then … The Indexer, 20(3), 119–121.
Jansen, B. J. (2008). Searching for digital images on the Web. Journal of Documentation, 64(1), 81–101.
Jörgensen, C. (1998). Attributes of images in describing tasks. Information Processing & Management, 34(2/3), 161–174.
Jörgensen, C. (1999). Access to pictorial material: A review of current research and future prospects. Computers and the Humanities, 33(4), 293–318.
Jörgensen, C. (2003). Image retrieval: Theory and research. Lanham, MD: Scarecrow Press.
Jörgensen, C., Jaimes, A., Benitez, A. B., & Chang, S.-F. (2001). A conceptual framework and empirical research for classifying visual descriptors. Journal of the American Society for Information Science and Technology, 52(11), 938–947.
Klenczon, W. (2011). Język Haseł Przedmiotowych Biblioteki Narodowej (National Library of Poland Subject Headings) – From card catalogs to digital library: Some questions about the future of a local subject headings system in the changing world of information retrieval. In P. Landry, L. Bultrini, E. T. O’Neill, & A. K. Roe (Eds.), Subject access: Preparing for the future (pp. 169–180). Berlin: De Gruyter Saur.
Klenczon, W., & Stolarczyk, A. (2000). Hasło geograficzne. Wybór i zasady tworzenia w bibliografii narodowej i katalogach Biblioteki Narodowej. Zasady wypełniania rekordu wzorcowego. Warszawa: Biblioteka Narodowa.
Kotalska, B. (2002). The RAMEAU/KABA Network: An example of multi-lingual cooperation. Slavic & East European Information Resources, 3(2–3), 149–156.
Lancaster, F. W. (1998). Indexing and abstracting in theory and practice (2nd ed.). London: Library Association Publishing.
Lopes, M. I., & Beall, J. (1999). Principles underlying subject heading languages (SHLs). München: K. G. Saur.
Maron, M. E. (1977). On indexing, retrieval and the meaning of about. Journal of the American Society for Information Science, 28, 38–43.
Matusiak, K. (2006). Towards user-centered indexing in digital image collections. OCLC Systems and Services, 22(4), 283–298.
McCutcheon, S. (2009). Keyword vs. controlled vocabulary searching: The one with the most tools wins. The Indexer, 27(2), 62–65.
McRae, L., & White, L. S. (Eds.). (1998). ArtMARC sourcebook: Cataloging art, architecture, and their visual images. Chicago: American Library Association.
Ménard, E. (2009). Images: Indexing for accessibility in a multi-lingual environment – Challenges and perspectives. The Indexer, 27(2), 70–76.
Nasiłowska, M. (2007). KABA Subject Headings: The current situation and prospects for the future. Polish Libraries Today, 7, 55–59.
O’Connor, B. C., & Wyatt, R. B. (2004). Photo provocations: Thinking in, with, and about photographs. Lanham, MD: Scarecrow Press.
Panofsky, E. (1962). Studies in iconology: Humanistic themes in the art of the Renaissance. New York: Harper & Row.
Porter, V., & Thornes, R. (2000). A guide to the description of architectural drawings. Retrieved from http://www.getty.edu/research/publications/electronic_publications/fda/index.html
Rasmussen, E. M. (1997). Indexing images. Annual Review of Information Science and Technology, 32, 169–196.
Reitz, J. M. (n.d.). ODLIS: Online dictionary for library and information science. Retrieved from http://www.abc-clio.com/ODLIS/odlis_A.aspx
Roberts, H. E. (2001). A picture is worth a thousand words: Art indexing in electronic databases. Journal of the American Society for Information Science and Technology, 52(11), 911–916.
Schroeder, K. A. (1998). Layered indexing of images. The Indexer, 21(1), 11–14.
Shatford, S. (1986). Analyzing the subject of a picture: A theoretical approach. Cataloging & Classification Quarterly, 6(3), 39–62.
Shatford Layne, S. (1994). Some issues in the indexing of images. Journal of the American Society for Information Science, 45(8), 583–588.
Springer, M., Dulabahn, B., Michel, P., Natanson, B., Reser, D., Woodward, D., & Zinkham, H. (2008). For the common good: The Library of Congress Flickr pilot project. Retrieved from http://www.loc.gov/rr/print/flickr_report_final.pdf
Subject Cataloging Manual: Subject Headings. (1996–2008). Washington, D.C.: Library of Congress Cataloging Distribution Service.
Svenonius, E. (1994). Access to nonbook materials: The limits of subject indexing for visual and aural languages. Journal of the American Society for Information Science, 45(8), 600–606.
Tillett, B. B. (Ed.). (1989). Authority control in the online environment: Considerations and practice. New York: Haworth Press.
Tillett, B. B. (2004). Authority control: State of art and new perspectives. Cataloging & Classification Quarterly, 38(3/4), 23–41.
Trant, J. (2009). Tagging, folksonomy and art museums: Results of steve.museum’s research. Retrieved from http://conference.archimuse.com/files/trantSteveResearchReport2008.pdf
Wyman, B., Chun, S., Cherry, R., Hiwiller, D., & Trant, J. (2006). Steve.museum: An ongoing experiment in social tagging, folksonomy, and museums. In J. Trant & D. Bearman (Eds.), Proceedings of Museums and the Web 2006. Retrieved from http://www.museumsandtheweb.com/mw2006/papers/wyman/wyman.html
Zeng, M. L., Žumer, M., & Salaba, A. (Eds.). (2011). Functional Requirements for Subject Authority Data (FRSAD): A conceptual model. Berlin: De Gruyter Saur.
Renata Maria Abrantes Baracho, Beatriz Valadares Cendón
Chapter 13. An image based retrieval system for engineering drawings
Abstract: The study presents the conceptual model, the classification scheme and the prototype of an image retrieval system for engineering drawings. The model integrates concepts and techniques of information science and computer science, combining human interpretation with automated processing and using textual and visual metadata as access points for retrieval. The paper discusses the tests and the case study used to optimize and validate the proposed system. The tests, using a database of 332 drawings, show the feasibility of the system, which achieved an average recall of 90.3% at a similarity rate of 70%. Human and automatic processing for indexing, together with the classification scheme, were important factors in reducing processing time.
Keywords: Image retrieval, image based retrieval system, image retrieval system, concept based retrieval, engineering drawing retrieval system, information retrieval system, technical drawings, engineering drawings, technical drawing indexing, visual metadata
Renata Maria Abrantes Baracho (corresponding author), Assistant Professor, School of Information Science, Federal University of Minas Gerais
Beatriz Valadares Cendón, Associate Professor, School of Information Science, Federal University of Minas Gerais
Introduction
The motivation for the current study was the difficulty of retrieving the engineering drawings required for decision-making, as well as the need for greater efficiency in locating information in large sets of drawings. Current information retrieval systems for engineering drawings use textual data to index and retrieve drawing files and do not normally consider visual data. Retrieval in databases that may contain thousands of drawings frequently depends on someone who participated in the project, and it suffers when that person is no longer involved.
First, this chapter surveys concepts, techniques and research in image retrieval from the perspectives of both information science and computer science. Next, it discusses the categories and components of engineering and architectural design, including the geometric representation of objects, followed by an explanation of the contents of an engineering drawing that allow humans to interpret it. This material formed the basis for the conceptual model for indexing and retrieval of engineering drawings that underlies the proposed system. The chapter details the theoretical basis, the conceptual model and the classification scheme, briefly discussed in Baracho and Cendón (2009), and extends that publication by presenting the development of the prototype that implemented the model and the final validation of the system through a case study. The prototype brings together knowledge from three areas, resulting in a system that integrates concepts from information science, computer science and engineering, as shown in Figure 1.
Figure 1. Contributions of three disciplines to the model of a system for image retrieval: information science (information organization, searching, retrieval, classification, representation), computer science (information creation, editing, handling, retrieval, recovery; system implementation) and engineering (sources of information, area of knowledge, users), converging in an information system for engineering drawings.
Literature review
The literature review identified two different approaches in image indexing and retrieval research. The first, based on the concepts and foundations of information science, uses descriptive data to index and retrieve information in images. The second, based on the concepts and foundations of computer science, achieves content-based information retrieval through the graphical properties and shape of the image.
Organization and retrieval of images in information science
In information science, subject analysis, definition of access points, interpretation, categorization, classification and indexing of documents are used to organize and retrieve information. To describe a document, subject analysis, that is, the determination of the concepts that represent the content of a document, is conducted. Describing a document requires condensing its content into the concepts it contains, capturing its essential conceptual core. The concepts are then translated into indexing terms, which become subject descriptors once incorporated into an indexing language. Indexing terms, as well as classification schemes and categories, are metadata that represent the concepts and subject of a document (Frohmann, 1990). The interpretation, description and representation of the documents that will form part of the system are part of building online databases. In the process of representation, the document or set of documents in a database can be replaced by a set of metadata to enable its efficient location and retrieval by the user (Baeza-Yates & Ribeiro-Neto, 2011). The metadata used to represent electronic documents in databases containing text, images and other media constitute access points to the documents. According to Hjørland (1992), defining access points is one of the problems of retrieval. Information retrieval is often carried out by identifying the documents in a collection that contain the keywords of a user’s query. This is frequently not enough to satisfy users seeking information about a particular subject rather than a particular datum or word. In an attempt to satisfy the information needs of the user, the information retrieval system seeks to interpret the content of the information items in a collection.
Interpretation involves matching the syntactic and semantic information in the documents with the information needs of the user (Baeza-Yates & Ribeiro-Neto, 2011). The retrieval process is complete when the user is satisfied with the results of the search. Most existing image retrieval systems are based on information science concepts and use a textual model to describe images. However, because different people understand a picture in different ways, depending on the context, the description of an image may vary from one individual to another. In addition, all the problems arising from differences in the vocabulary used to index images come into play. In a system that handles images, it would be ideal if the user could express the search question using images. In this case, the model would lose less through abstraction and would be able to retrieve more relevant information.
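The simple keyword matching described above can be sketched as Boolean AND retrieval over an inverted index. The drawing identifiers and descriptions are invented for illustration, and whitespace tokenisation is a simplifying assumption; real systems typically add stemming, controlled vocabularies and ranking.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lower-cased keyword to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def keyword_search(index, query):
    """Boolean AND retrieval: ids of the documents containing every query keyword."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical textual surrogates for three drawings.
drawings = {
    "DWG-001": "bridge deck reinforcement detail",
    "DWG-002": "viaduct pier foundation plan",
    "DWG-003": "bridge pier elevation",
}
index = build_inverted_index(drawings)
print(sorted(keyword_search(index, "bridge pier")))
```

A query for "bridge pier" returns only DWG-003, although DWG-002 also shows a pier, because its description uses the word "viaduct": a small instance of the vocabulary problem mentioned above.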
Organization and retrieval of images in computer science
In computer science, content-based image retrieval (CBIR) is achieved by detecting image features and by identifying and classifying their visual characteristics. CBIR systems usually use algorithms that examine all or part of an image in order to identify similar images. The system detects the visual characteristics of the image based on colour, texture and form, and classifies these characteristics, which are then stored in the database and used to retrieve the desired image. Retrieval is achieved by comparing the visual content of the query image with that of the images in the database and detecting their similarity. To begin a search, the user selects the characteristic to search on and defines a similarity measure. The query image can be defined by the user or taken from an example, as shown in Figure 2. Zachary, Iyengar, and Barhen (2001) emphasize that the fundamental aspect of image retrieval systems is the determination and representation of a visual feature that effectively distinguishes between pairs of images. Because retrieval by picture similarity takes variation between images into account, it can also find images that resemble the query without being identical to it.
Figure 2. Example of search by colour, texture and form.
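Colour histograms compared by histogram intersection are one classic way to implement the colour part of such a similarity comparison; texture and form features are omitted here. The sketch below assumes tiny pure-Python RGB pixel lists and a 0.7 threshold chosen only to echo the 70% similarity rate mentioned in the abstract. How this chapter's prototype actually measures similarity is not described at this point, so the code should not be read as its method.

```python
def colour_histogram(pixels, bins_per_channel=4):
    """Normalised, quantised RGB colour histogram.

    `pixels` is a non-empty list of (r, g, b) tuples with channel values 0..255;
    each channel is split into `bins_per_channel` equal ranges.
    """
    step = 256 // bins_per_channel
    hist = [0.0] * bins_per_channel ** 3
    for r, g, b in pixels:
        # Quantise each channel; min() keeps 255 in the last bin for any bin count.
        qr, qg, qb = (min(c // step, bins_per_channel - 1) for c in (r, g, b))
        hist[(qr * bins_per_channel + qg) * bins_per_channel + qb] += 1
    return [count / len(pixels) for count in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for two normalised histograms (1 = identical)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def query_by_example(query_pixels, database, threshold=0.7):
    """Return (image_id, similarity) pairs at or above `threshold`, best first."""
    query_hist = colour_histogram(query_pixels)
    hits = []
    for image_id, pixels in database.items():
        score = histogram_intersection(query_hist, colour_histogram(pixels))
        if score >= threshold:
            hits.append((image_id, round(score, 3)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Tiny synthetic "images" given as flat pixel lists.
red_query = [(250, 10, 10)] * 8 + [(255, 255, 255)] * 2
database = {
    "mostly_red": [(240, 20, 20)] * 6 + [(255, 255, 255)] * 4,
    "blue": [(10, 10, 250)] * 10,
}
print(query_by_example(red_query, database))
```

The mostly red image scores about 0.8 and is retrieved; the blue image shares no colour bin with the query and is filtered out by the threshold.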
The basic principle of CBIR is the use of the visual properties of images, instead of a textual description, for image search and retrieval. Besides IBM’s Query by Image Content (QBIC) system, one well-known implementation of this approach, other systems have been developed with the same goal. Concepts and techniques of digital image processing are used for detection based on the shape of the object. According to Gonzales and Woods (2002), the term image, or monochrome image, refers to a two-dimensional light-intensity function f(x, y), where x and y denote spatial coordinates and the value of f at any point (x, y) is proportional to the brightness (or gray level) of the image at that point, as shown in Figure 3b. The image can be considered an array whose row and column indices identify a point (Figure 3a); the corresponding array value gives the gray level at that point. Currently, CBIR systems offer search features for organizing and representing images based on syntactic interpretation and recognition of attributes. They offer no solution for defining the concepts conveyed by an image, which, at this point, still depends on human interpretation.
Figure 3. (a) Image matrix. (b) Axis representation of a digital image.
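The relation between the continuous function f(x, y) and the image array can be sketched by sampling a brightness function on a grid and quantising each sample to a gray level. The function name, the [0, 1] brightness range and the 256 gray levels are assumptions made for the illustration; the point is only that image[x][y] holds the gray level at point (x, y).

```python
def digitise(intensity, rows, cols, levels=256):
    """Sample a continuous intensity function on a rows x cols grid and quantise
    each sample to an integer gray level in 0..levels-1.

    `intensity(x, y)` is assumed to return a brightness in [0, 1]; x indexes
    rows and y indexes columns, so image[x][y] is the gray level at point (x, y).
    """
    image = []
    for x in range(rows):
        row = []
        for y in range(cols):
            value = min(max(intensity(x, y), 0.0), 1.0)   # clamp to [0, 1]
            row.append(min(int(value * levels), levels - 1))
        image.append(row)
    return image


# A horizontal ramp from black (0) to white (255) across five columns.
ramp = digitise(lambda x, y: y / 4, rows=3, cols=5)
print(ramp[0])     # gray levels along the first row
print(ramp[2][1])  # gray level at point (2, 1)
```

Every row of the ramp is [0, 64, 128, 192, 255], so the value at point (2, 1) is 64.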
The gap
Enser (2000), Heidorn (1999), and Smeulders, Worring, Santini, Gupta, and Jain (2000) point out that there is a gap between these two approaches that must be bridged in the search for innovative solutions in image retrieval. According to Enser (2000), image collections have followed the paradigm of concept-based retrieval, in which the user verbalizes the image search and the search is resolved by means of text operations; the text is the verbalization of the image. Enser also reviews research on content-based visual retrieval, comments on the growth of this line of research since 1990 and underlines the importance of research on image retrieval. Such research should investigate hybrid systems in order to overcome the discontinuity between high-level human processing and low-level computational processing.
The study carried out by Smeulders et al. (2000) analyses 200 articles on image retrieval. It concludes that an obstacle remains in image comprehension: computational researchers should be able to identify the characteristics required for an iterative understanding of the subject of an image, in addition to using automatic techniques to compare image features. The authors attribute the growth of research in content-based image retrieval (that is, retrieval by the form of the image) to the greater availability of digital sensors, the growth of information available on the Internet and the drop in the price of storage. However, as the authors note, the problem of image comprehension still has to be overcome. To date, the gap between the high-level semantic concepts of images and their low-level visual features remains an unsolved challenge. In an effort to better understand the problem, Colombino, Martin, Grasso, and Marchesotti (2010) draw on ethnographic studies of design professionals who routinely engage in image search tasks. Several other recent studies address the gap and propose various approaches to it. For instance, Ajorloo and Lakdashti (2011) and Rosa et al. (2008) report on the use of relevance feedback from the user to reduce the semantic gap in a CBIR system. Guan, Antani, Long, and Thoma (2009) and Ion (2009) attempt to deal with the problem through learning-based schemes for extracting semantic meaning from image databases. Belkhatir and Thiem (2010) propose three abstract levels of representation to address the problem. Lakdashti, Moin, and Badie (2009) and Ciocca, Cusano, Santini, and Schettini (2011) propose fuzzy rule-based methods. Despite these efforts, the lack of connection between the two distinct lines of research in image retrieval remains.
One line considers the semantic understanding of the image and the determination of the concepts that represent the documents; in this line, the image is described through text. The other line considers the syntactic processing of the image, seeking to define the contents and properties of existing symbols and icons through digital image processing. Both approaches have shortcomings when used alone. In the current study, the two were merged and used as the basis for a model, a classification scheme, and a prototype of a retrieval system for engineering drawings. Currently, most engineering drawing retrieval systems do not incorporate the visual content of drawings and use only textual attributes to represent the documents. Rather than substituting one technology for another, it is worthwhile to conduct research that enhances existing systems, allowing them to consider visual attributes in addition to textual ones and thereby making them more efficient.
Baracho, Cendón
The engineering drawing
This section presents an overview of the object of this research: the engineering drawing. An engineering drawing is a type of technical drawing that communicates, through a graphical language, the ideas conceived by the designer of an object to the worker who will build it. The organization of information in engineering technical drawings raises specific issues. Engineering has various branches, such as agronomic, aeronautical, agricultural, food, environmental, civil, computer, electrical, structural, forest, mechanical, mining, naval, production, chemical, sanitary, safety, software, telecommunications, and transportation engineering, amongst others. Each branch has its own specific characteristics, which determine a series of subdivisions. For instance, civil engineering alone involves architectural, structural, hydraulic, electrical, fire fighting, fire prevention, and air conditioning projects, amongst others. Each branch of engineering produces a set of drawings needed for the construction of an object. An engineering project is a set of standardized technical drawings needed for the execution and representation of engineering works. It is usually developed in well-defined steps executed in a linear sequence, which include preliminary design, executive project, detailed design, and project presentation. Each step in the development of an engineering / architectural project is composed of a group of drawings that represent the object to be built in different views. The set of graphical records expresses or represents the form, dimensions, and location of objects, according to the different needs of the different branches of engineering and architecture. The drawing is a code for a language established between the sender (a professional in the field) and the receiver (the reader of the drawing), which enables its understanding. Its interpretation requires knowledge on the part of both the sender and the receiver (Chu, 2001).
Representing and interpreting technical drawings requires specific training. Engineering drawings use flat figures to represent spatial forms. A technical drawing consists basically of geometric representations such as lines and surfaces, together with a set of symbols, signs, dimensions, and texts that complement each other. Through its different views, the set of drawings provides a representation for the construction of the object. The icons present in a technical drawing allow inferences and conclusions about the project as a whole, defining the type of project (e.g. civil), the process (e.g. executive), and the form (e.g. elevation). A drawing usually contains information about constructive elements, such as the drawing scale, the position and measurements of walls, doors, and windows, the name of each room, and its respective floor level.
The development of technical drawings follows policies and standards that vary according to each field of engineering. The standardization of documents is an important step in the creation of a graphical language. It is achieved by means of technical norms, which result from the efforts of those interested in establishing technical codes. According to Maher and Rutherford (1997), engineering drawings are created using a convention of graphical elements, or a common syntax of symbolism, as shown in Figure 4. This standardized information improves the understanding of the drawing and the collaboration among the users involved in the process. The information contained in the drawing can be enriched by the explicit semantics, or meaning, of this common syntax of symbolism.
Figure 4. Example of document / technical drawing of an architectural project.
Technological progress and the standardization of drawings made possible the creation of computer aided design (CAD) software, which is now globally used for project development. To increase the performance of CAD software, it became necessary to develop libraries of icons. The libraries, used in project development, are made up of standard sets of icons. They typically represent objects that are repeated in the same or in different drawings and their definition depends on the context. The same library can be used in different projects, institutions, cities and states; however, each institution usually has its own. Figure 5 shows an example of a library of icons.
Figure 5. Example of a library of icons.
Conceptual model
The model proposed herein considers both human analysis and interpretation of the image and automated processing of visual content to define the metadata used for retrieving engineering drawings. The model therefore describes a hybrid system, consisting of textual and visual data. To develop the model, the two lines of research described above were considered. The line of research originating in information science relies, for representation and retrieval, on the semantics of the image expressed textually through categories or subject descriptors. Computer science, on the other hand, relies on low-level computational processing and interpretation for the representation and retrieval of images. The current research merges these two aspects by proposing the use of textual and visual metadata, as shown in Figure 6.
Figure 6. Two types of metadata in image retrieval.
The proposed model divides the interpretation of the drawing into two steps. The first involves the iterative understanding of the drawing, the determination of its subject at the semantic level, and its analysis, in order to define what it represents. The second is the syntactic interpretation, in which the administrative, technical, and visual metadata of the drawing are defined, as shown in Figure 7.
Figure 7. Two steps in drawing interpretation (Baracho, 2007).
Semantic interpretation takes place through the inferences made when a human observes the drawing and mentally defines the object depicted. It consists of the human definition of three categories (type, process, and form) that define the subject of the document. Syntactic interpretation takes place through the human reading of the textual, administrative, and technical metadata, as well as the automatic processing of the icons present in the document. The textual metadata, composed of the categories, the administrative metadata, and the technical metadata, are attributed by a human indexer. The visual metadata are composed of the icons present in the drawing, which are identified automatically using digital image processing techniques for extracting shape features. Administrative metadata contain the data usually used for administrative control of the drawing (e.g. number, name, file, company, date, address). Technical metadata refer to the technical characteristics of the project and can be obtained from the stamp or label (title block) of the drawing (e.g. scale, title, district, total area, area to be built). The visual metadata are the geometric representation of a symbol and are not contextualized: the attribute of a door, in top view, is simply the representation of a line and an arc. The icon of a door takes different geometric representations according to its position in the drawing. Through syntactic interpretation and the recognition of icons, the content of the image is defined. Icons can thus be recognized without considering their context. The creation of the database starts with the classification of the drawing: a human indexer interprets and analyses the document, defines the categories, classifies the document, and attributes the administrative and technical metadata. The classification scheme developed is presented in detail in the next section.
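To make this division of labour concrete, the record below sketches how one drawing's metadata might be held together: categories and administrative/technical fields entered by the indexer, and icon hits appended later by the automatic scan. This is a minimal illustration under our own assumptions; the class names, field names, and example values are hypothetical and are not taken from the prototype.

```python
from dataclasses import dataclass, field


@dataclass
class IconHit:
    """One automatically detected icon occurrence (visual metadata)."""
    icon: str          # name of the icon in the iconic metadata table
    row: int           # top-left pixel row of the hit in the drawing
    col: int           # top-left pixel column of the hit in the drawing
    similarity: float  # percentage of equal pixels, 0-100


@dataclass
class DrawingRecord:
    """Hypothetical record joining textual and visual metadata for one drawing."""
    number: str
    # Categories chosen by the human indexer (semantic interpretation).
    project_type: str
    process: str
    form: str
    # Administrative and technical metadata, also entered by the indexer.
    administrative: dict = field(default_factory=dict)
    technical: dict = field(default_factory=dict)
    # Visual metadata, filled in later by the automatic icon scan.
    icons: list = field(default_factory=list)

    def categories(self) -> tuple:
        """Return the (Type, Process, Form) triple for this drawing."""
        return (self.project_type, self.process, self.form)


# Example values are invented for illustration only.
record = DrawingRecord(
    number="D-001",
    project_type="architectural",
    process="executive project",
    form="floor plan",
    technical={"scale": "1:50"},
)
record.icons.append(IconHit("door", row=120, col=340, similarity=92.5))
print(record.categories())  # ('architectural', 'executive project', 'floor plan')
```

Keeping the visual hits in a separate list reflects the text's point that icons are extracted automatically and without context, after the human indexer has supplied the textual fields.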
As Figure 8 shows, the combination of the three categories determines a specific set of icons in a visual metadata table, which might be present in the drawing. The drawings are then automatically scanned to locate, index, and store these icons. Drawing files and their textual and visual metadata compose the database. The first step in the retrieval process is the choice of an image (icon) to be searched for, previously stored in a table of standard icons. To this end, the user defines the three categories (type, process, and form), which lead to the table of icons (see Figure 8). After selecting the icon to be searched for in this table, the user defines the similarity rate between the searched icon and the icons in the documents. Next, the user defines the directory where the search will be conducted. The directory contains a set of drawing files, which are then searched for the user's selected icon.
Classification scheme
The following classification scheme was first proposed by Baracho (2007) and is the result of observation and professional experience in engineering companies. The scheme encompasses the formal categories of engineering / architectural technical drawings, herein defined as Type, Process, and Form. In classification theory, formal categories are mutually exclusive, so that each object may belong to only one category at the moment of information organization and retrieval. The classification scheme thus funnels and filters technical drawings. According to Hjørland (1992), categories may be defined as access points.
Category: Type
The first category, Type, defines the type of project represented in the technical drawing, among the various branches of engineering: whether the project is architectural, structural, electrical, or hydraulic, amongst others. The project Type defines a set of symbols and representations that may be present in the drawings. By examining a technical drawing, a specialist is capable of identifying the type of a given project. A Type may be:
– architectural: includes the representation of elements to be built (in civil engineering) and the definition of spaces;
– structural: includes information concerning the structure to be executed, such as pillars, beams, and other structural elements;
– electrical: includes information concerning electrical circuits and energy distribution and supply;
– hydraulic: includes information concerning the entire hydraulic network to be laid and water and sewage distribution;
– fire prevention: includes information concerning fire prevention and fire fighting, in compliance with the standards established by the Fire Department;
– mechanical: includes information needed for the manufacturing of mechanical parts;
– others.
The "others" field broadens the classification scheme: if the project belongs to a type not on the list, it is classified as others. The same "others" option is used in the Process and Form categories.
Category: Process
The second category, Process, captures the development stage, which is determined by interpreting the level of detail of the project. Most engineering / architecture projects are divided into phases, and the development cycle of a project involves many steps. These steps are usually independent and follow a linear order: when one step ends, the next begins. Project development phases may vary according to the type of project and the type of user. A complete architectural project must contain the following steps:
– pre-project: usually the first study presented in the development of a project. In this phase, the parameters and general forms of the project are defined. It is considered a draft and is subject to change;
– preliminary project: the project itself, with all definitions ready for execution. This project is usually submitted to the departments or public agencies responsible for project approval. It is also called a "legal" or "licensing" project;
– executive project: the project used on the building site. It contains a higher level of detail, with complexity suited to building procedures;
– detail: also used in the execution of building works; it contains more information than the previous ones and is usually composed of specific parts of the project at a larger scale;
– presentation project: the kind of project used in presentations to clients and non-experts. It contains graphical representations that are easier to interpret and is also used for sales and publicity;
– others.
Category: Form
The third category, Form, defines the graphical representation of the design, which is divided into view, ground plan, section, perspective, and others. Each development stage of the project generates a series of technical drawings. These series may contain one or more representations, depending on the position in space of the object to be represented. For example, during the development of the executive project, an architectural project may contain some or all of the technical drawings listed below:
– floor plan: the most representative part of the project, seen from above. It is defined by a horizontal cross-section 1.40 m above the ground and represents the information cut and seen on this plane;
– upper view: representation of the object viewed entirely from above, as if at a 90° angle from the observer's standpoint;
– vertical section: representation of the object cut vertically, containing information on the heights of the project;
– elevation: representation of the object seen from outside, in a front or lateral view;
– front view: representation of the object in a front view;
– right lateral view: representation of the object in a right lateral view;
– left lateral view: representation of the object in a left lateral view;
– rear view: representation of the object in a rear view;
– perspective: representation of the object seen from a given angle, giving a perspective view;
– others.
Figure 8. Classification scheme (Baracho & Cendón, 2009).
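The three facets can be sketched as enumerations, each closed by an OTHERS member that keeps the scheme open as described above. This is an illustrative sketch only: the member names are our own transliteration of the lists, and mapping any unlisted label to OTHERS is our assumption about how the "others" option would be applied in software.

```python
from enum import Enum


class ProjectType(Enum):
    ARCHITECTURAL = "architectural"
    STRUCTURAL = "structural"
    ELECTRICAL = "electrical"
    HYDRAULIC = "hydraulic"
    FIRE_PREVENTION = "fire prevention"
    MECHANICAL = "mechanical"
    OTHERS = "others"


class Process(Enum):
    PRE_PROJECT = "pre-project"
    PRELIMINARY = "preliminary project"
    EXECUTIVE = "executive project"
    DETAIL = "detail"
    PRESENTATION = "presentation project"
    OTHERS = "others"


class Form(Enum):
    FLOOR_PLAN = "floor plan"
    UPPER_VIEW = "upper view"
    VERTICAL_SECTION = "vertical section"
    ELEVATION = "elevation"
    FRONT_VIEW = "front view"
    RIGHT_LATERAL_VIEW = "right lateral view"
    LEFT_LATERAL_VIEW = "left lateral view"
    REAR_VIEW = "rear view"
    PERSPECTIVE = "perspective"
    OTHERS = "others"


def classify(enum_cls, label):
    """Map a free-text label onto one facet value; unlisted labels become OTHERS."""
    try:
        return enum_cls(label.strip().lower())
    except ValueError:
        return enum_cls.OTHERS


# Each drawing receives exactly one value per facet.
print(classify(ProjectType, "Electrical"))  # ProjectType.ELECTRICAL
print(classify(Form, "isometric"))          # Form.OTHERS
```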
These categories are used in the model to classify the technical drawing and determine which textual and visual metadata should be used. The combination of the three categories (Type, Process, and Form) for a given drawing determines a set of symbols and representations that may be present in the drawing and, consequently, the iconic metadata table to be used when scanning it. For example, a drawing with Type 'architectural project,' Process 'executive project,' and Form 'floor plan' is classified as the technical drawing of an architectural executive project ground plan. It may contain representations specific to this classification, such as an upper view of walls, doors, windows, layout, and impermeable areas. These differ from the attributes of an electrical executive drawing, which contains symbols for outlets, lamps, circuits, and others. In Figure 8, each of the three axes of the graphic contains the "others" option, which keeps the model open so that it can be extended to categories not listed on each axis. This characteristic opens the classification scheme and enables its adaptation to other contexts. Visual and textual metadata are used both for indexing and for retrieving documents, taking into account both semantic and syntactic interpretation, that is, both the concept and the content of the drawing.
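The step from a (Type, Process, Form) combination to an iconic metadata table amounts to a keyed lookup. The sketch below is a hypothetical illustration: the table contents echo the examples in the paragraph above, but the data structure, the icon names, and the choice to return an empty table for an undefined combination are our own assumptions.

```python
# Hypothetical iconic metadata tables keyed by (Type, Process, Form).
# Real tables are institution-specific; these entries only echo the text's examples.
ICON_TABLES = {
    ("architectural", "executive project", "floor plan"):
        frozenset({"wall", "door", "window", "layout", "impermeable area"}),
    ("electrical", "executive project", "floor plan"):
        frozenset({"outlet", "lamp", "circuit"}),
}


def icon_table(project_type, process, form):
    """Return the icons that may appear in a drawing with this classification.

    An undefined combination yields an empty table, so no icon scan is
    scheduled for it until a table is added (an assumption of this sketch).
    """
    return ICON_TABLES.get((project_type, process, form), frozenset())


print(sorted(icon_table("electrical", "executive project", "floor plan")))
# ['circuit', 'lamp', 'outlet']
```

Because only the icons in the selected table are scanned for, the classification limits the work done per drawing, which is the efficiency argument the chapter makes for the scheme.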
Prototype
The prototype uses visual and textual metadata for information organization and retrieval. As explained in the previous section, the textual metadata consist of three categories, Type, Process, and Form, which a human indexer determines by analysing the drawing and enters through the prototype interface. These categories are used for conceptual, semantic retrieval. Other textual metadata, consisting of administrative (e.g. number, title, file, company, date, address) and technical (e.g. scale, title, district, total area) metadata, are also implemented in the prototype and provide additional access points. The visual metadata consist of icons that may be present in the drawings, according to the iconic metadata table defined for each combination of categories (Type, Process, and Form). In the drawings, the icons are flat, that is, two-dimensional monochrome images consisting of arrays of black and white points. The automatic identification of the icons provides content-based retrieval and is accomplished through techniques for extracting the shapes or forms of objects. The content-based retrieval used by the prototype therefore does not consider colour or texture, since these features are not present in the kinds of engineering drawings used.
The four types of metadata, that is, categories, icons, administrative data, and technical data, together compose the inverted index used to search the database. Figure 9 shows the complete process described in this section.
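An inverted index over the four metadata types can be sketched as a mapping from (kind, term) keys to sets of drawing numbers, with a conjunctive query intersecting those sets. This is a minimal sketch under our own assumptions: the record layout, key format, and example records (including the "hydrant" icon) are hypothetical, and the chapter does not describe the prototype's index structure at this level.

```python
from collections import defaultdict


def build_inverted_index(records):
    """Map (metadata kind, term) keys to the set of drawing numbers containing them.

    Each record is a dict with hypothetical keys: "number", "categories"
    (Type, Process, Form), "icons", "administrative" and "technical".
    """
    index = defaultdict(set)
    for rec in records:
        num = rec["number"]
        # Categories: one term per facet, e.g. ("category", "form=floor plan").
        for facet, value in zip(("type", "process", "form"), rec["categories"]):
            index[("category", f"{facet}={value}")].add(num)
        # Visual metadata: one term per icon found by the automatic scan.
        for icon in rec["icons"]:
            index[("icon", icon)].add(num)
        # Administrative and technical metadata: one term per field/value pair.
        for kind in ("administrative", "technical"):
            for key, value in rec[kind].items():
                index[(kind, f"{key}={value}")].add(num)
    return index


def search(index, *terms):
    """Conjunctive query: drawings that match every (kind, term) key."""
    if not terms:
        return set()
    return set.intersection(*(index.get(t, set()) for t in terms))


records = [
    {"number": "D1", "categories": ("fire prevention", "executive project", "floor plan"),
     "icons": ["extinguisher"], "administrative": {}, "technical": {"scale": "1:100"}},
    {"number": "D2", "categories": ("fire prevention", "executive project", "detail"),
     "icons": ["extinguisher", "hydrant"], "administrative": {}, "technical": {"scale": "1:25"}},
]
index = build_inverted_index(records)
print(search(index, ("category", "form=floor plan"), ("icon", "extinguisher")))  # {'D1'}
```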
Figure 9. Overview of the proposed system for engineering drawing organization and retrieval (Baracho & Cendón, 2009).
The prototype was developed in the C programming language. For better performance, the graphical user interface was developed in Java. The prototype implemented the conceptual basis of the model in a system that interpreted, classified, and indexed the images in a database. This section presents a general view of the logic of the algorithm and its implementation in the prototype; more technical details can be found in Baracho (2007). A sweeping algorithm was developed to locate, in a database, drawings containing icons equal or similar to the searched icon. The general idea of the algorithm is to retrieve the two-dimensional geometric shapes that compose engineering drawings based on their geometric characteristics. For instance, when looking for the icon of a door, which is represented by a line and an arc (Figure 10), the algorithm searches the drawing for similar arrays of black and white dots and retrieves all the documents in which they are present. The central idea of the algorithm is to detect shapes similar to a predetermined shape within a larger context.
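The core of such a sweep can be sketched as a sliding window: the icon is placed at every position where it fits inside the drawing, and each window is scored by the percentage of equal pixels, the similarity rate definition used in this chapter. This is a minimal Python illustration, not the authors' C implementation; the drawing and icon are assumed to be lists of rows of 0 (white) and 1 (black), and the example data are invented. Because white-on-white agreements also count as equal pixels, a sparse icon can score well over blank paper.

```python
def similarity(drawing, icon, top, left):
    """Percentage of pixels in the icon-sized window at (top, left) equal to the icon."""
    h, w = len(icon), len(icon[0])
    equal = sum(
        drawing[top + r][left + c] == icon[r][c]
        for r in range(h)
        for c in range(w)
    )
    return 100.0 * equal / (h * w)


def sweep(drawing, icon, min_similarity):
    """Slide the icon over every position where it fits entirely inside the drawing
    and return (top, left, score) for windows scoring at least min_similarity (%)."""
    H, W = len(drawing), len(drawing[0])
    h, w = len(icon), len(icon[0])
    hits = []
    for top in range(H - h + 1):
        for left in range(W - w + 1):
            score = similarity(drawing, icon, top, left)
            if score >= min_similarity:
                hits.append((top, left, score))
    return hits


# Invented 5 x 6 drawing containing two copies of a small L-shaped icon.
drawing = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 1],
]
icon = [[1, 0],
        [1, 1]]
print(sweep(drawing, icon, 100))  # [(1, 1, 100.0), (3, 4, 100.0)]
```

A drawing is retrieved when its hit list is non-empty; lowering `min_similarity` tolerates resolution differences and interfering lines at the cost of possible false hits.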
Figure 10. Principle of the algorithm (Baracho & Cendón, 2009).
The algorithm is designed for fast response times and handles images in vector or raster (matrix) format, as well as variations in scale, interfering lines, and rotation of the searched images (Figure 11), independently of the drawing.
Figure 11. Image with rotation and mirror variation. [Key image shown with 4 rotation options (normal, R=90, R=180 = EHV horizontal and vertical mirror, R=270 = R=-90) and 4 mirror options (EH horizontal mirror, EV vertical mirror, R=90 EH, R=90 EV).]
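The eight orientations of Figure 11 (four rotations, each with or without a mirror) can be generated exactly on a pixel grid, since 90° rotations and mirrors only permute pixels. The sketch below is our illustration, not the prototype's code, and it covers only these lossless orientations; 45° rotations would require resampling. It relies on the identities that a vertical mirror equals a left-right mirror followed by a 180° rotation, and that mirroring both ways equals R=180, which matches the figure's pairing of R=180 with EHV. Duplicates are dropped, so symmetric icons yield fewer than eight templates.

```python
def rotate90(img):
    """Rotate a binary image (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def mirror_lr(img):
    """Mirror the image left-right."""
    return [row[::-1] for row in img]


def orientations(icon):
    """Distinct variants of an icon under 0/90/180/270-degree rotations,
    each with and without a left-right mirror (at most eight templates)."""
    variants = []
    current = [list(row) for row in icon]
    for _ in range(4):
        for candidate in (current, mirror_lr(current)):
            if candidate not in variants:
                variants.append(candidate)
        current = rotate90(current)
    return variants


# An asymmetric 3 x 2 icon has eight distinct orientations;
# a 2 x 2 "L" has only four, because its mirrors coincide with rotations.
print(len(orientations([[1, 1], [1, 0], [0, 0]])))  # 8
print(len(orientations([[1, 0], [1, 1]])))          # 4
```

Each variant would then be swept over the drawing as an independent template, which is one reason the number of selected rotations affects processing time.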
The algorithm scans the drawings for any image identical or similar to the searched image under the basic transformations of computer graphics, that is, translation, scaling, rotation, and mirroring. Rotations of 0°, 45°, and 90° were defined. Rotations in other quadrants are derived from these three base rotations, so the algorithm is able to automatically recognize rotations of 0°, 45°, 90°, 135°, 180°, 225°, 270°, 315°, and 360°. The prototype also allows an image to be searched for at various scales determined by the user. For instance, the user could choose to search for a door at scales 1.0, 0.75, and 0.50, rotated by 0°, 45°, and 225°. The implementation of these search features was based on the classic mathematical principles of geometric transformations for computer graphics (Foley, van Dam, Feiner, Hughes, & Phillips, 1994; Hearn & Baker, 1996), the Fourier transform (Gonzales & Woods, 2002), and the affine transform (Chih-Chin & Ying-Chuan, 2011; Mehrotra & Gary, 1995; Mokhtarian & Abbasi, 2002; Yang & Yang, 1999). The prototype also lets the user define, as a parameter, a similarity rate between the searched icon and the icon found in the drawing. The ideal similarity rate would be 100 %, that is, identical images. However, even for nominally identical images, small differences may occur for various reasons, such as differences in resolution or the presence of other lines in the drawing that are not part of the image but overlap with it. Defining a similarity rate takes these differences into account in the search. As shown in Figure 12, the similarity rate is the percentage of pixels that are equal in the two images.
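The text does not say how the icon is resampled for the user-selected scales. As an assumption, the sketch below uses nearest-neighbour resampling on a 0/1 pixel grid to build one template per scale, each of which would then be swept over the drawing like the original icon. The function names are hypothetical, and arbitrary angles such as 45° (which need interpolation) are outside its scope.

```python
def rescale(icon, factor):
    """Nearest-neighbour resampling of a binary icon by `factor` (e.g. 0.75)."""
    h, w = len(icon), len(icon[0])
    new_h = max(1, round(h * factor))
    new_w = max(1, round(w * factor))
    # Each output pixel copies the source pixel it maps back onto (clamped to the edge).
    return [
        [icon[min(h - 1, int(r / factor))][min(w - 1, int(c / factor))]
         for c in range(new_w)]
        for r in range(new_h)
    ]


def scale_variants(icon, scales):
    """One template per user-selected scale, e.g. scales = [1.0, 0.75, 0.5]."""
    return {s: rescale(icon, s) for s in scales}


square = [[1] * 4 for _ in range(4)]
variants = scale_variants(square, [1.0, 0.75, 0.5])
print([len(variants[s]) for s in (1.0, 0.75, 0.5)])  # [4, 3, 2]
```

Every additional scale adds a full sweep of each drawing, which is consistent with the longer processing times reported below when several scales are selected.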
Figure 12. Similarity rate.
Figure 13 shows the interface of the prototype. Through the option “Icones”, in the top menu bar, the searched image can be selected. In Figure 13, the searched image is the icon for a computer in top view. The drawings found are displayed with the hits surrounded by a dotted rectangle. The interface also shows statistical data with the processing time and number of hits for each drawing retrieved.
Figure 13. Interface with presentation of the images found (Baracho, 2007).
Tests of the prototype
During the first stage of the research, the prototype was tested on a corpus of 10 engineering technical drawings. Six of these tests, which resulted in optimizations of the system, are discussed here. The results are presented in Figure 14. More details about the tests and the optimizations implemented can be found in Baracho (2007).
Figure 14. Results of the tests of the prototype.
Impact of the similarity rate on recall and precision
In tests 1, 2, and 3, the searched icon was a sink. The scale search parameter was set at 1.0, that is, the system should find images of the same size as the selected icon, and it should locate images rotated at both 0° and 45°. The similarity rate was set at 70 % in test 1, 80 % in test 2, and 50 % in test 3. In test 1, 24 icons should have been located; 9 were found and no incorrect icons were retrieved, so recall was 37.5 % and precision was 100 %. In test 2, 24 icons should have been located; 2 were found and no incorrect icons were retrieved, so recall was 8 % and precision was 100 %. In test 3, 25 icons should have been located; 18 were found and no incorrect icons were retrieved, so recall was 75 % and precision was 100 %.
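The recall and precision figures follow the standard definitions: recall is the share of the icons that should have been located that were actually found, and precision is the share of retrieved icons that are correct. The helper below is a small illustrative sketch (the function name is ours), checked against tests 1 and 2 above.

```python
def recall_precision(relevant, correct, incorrect):
    """Return (recall %, precision %) for one icon search.

    relevant  -- icons that should have been located
    correct   -- retrieved icons that are correct
    incorrect -- retrieved icons that are wrong (false retrievals)
    Precision is undefined when nothing is retrieved; None is returned then.
    """
    retrieved = correct + incorrect
    recall = 100 * correct / relevant
    precision = 100 * correct / retrieved if retrieved else None
    return recall, precision


# Test 1: 24 relevant sinks, 9 found, no false retrievals.
print(recall_precision(24, 9, 0))             # (37.5, 100.0)
# Test 2: 2 of 24 found; recall rounds to the reported 8 %.
print(round(recall_precision(24, 2, 0)[0]))   # 8
```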
Results of tests 1, 2, and 3 show that a similarity rate of 50 % increases the recall rate and does not affect precision. Test 3 also shows that recall was 0 % for drawings 3 and 10 while for the other drawings, it was 100 %. Examination of these drawings showed that the reason for the low recall was a difference in scale and rotation between the searched icon and the icons present in drawings 3 and 10. In test 6, shown in Figure 15, the icon searched for is a door. The result shows that a similarity rate lower than 50 % reduces precision, since two icons were incorrectly retrieved.
Figure 15. Drawing showing incorrect icons retrieved.
Impact of the scale parameter on retrieval
In test 4, the searched icon was a door. The scale search parameters were set at 1.0, 0.87, and 0.75; that is, in each drawing the prototype should look for the searched icon in three different sizes, since the same building may have doors of various widths. The prototype should locate images rotated at both 0° and 45°. Because the icon was a simple geometric form, the similarity rate was set at 70 %.
In test 4, 85 icons should have been located. Of these, 56 were correctly found, and 27 false retrievals occurred; recall was therefore 67 % and precision 67.5 %. This test, especially when compared with tests 1, 2, and 3, showed that searching for the same icon at several different scales reduces precision. The test also shows an increase in processing time when more than one scale is defined for the search: processing time was 5.8 minutes on average, whereas in tests 1, 2, and 3 it was less than 50 seconds.
Impact of geometric form complexity on processing time
In test 5, the searched icon was a computer, a more complex icon than those in tests 1 to 4. The scale was set at 1.0 (i.e. the found image should have the same size as the searched image), and the prototype should locate images rotated at both 0° and 45°. The similarity rate was set at 70 %. In this test, the average processing time was 2.2 minutes. This test shows that the size of the searched image in pixels affects processing time. Since the resolution and complexity of the searched icon (e.g. number of lines, presence of curves) influence the size of the image, this finding also leads to the conclusion that the resolution and complexity of the icons affect processing time.
The tests above allowed conclusions about the factors that affected recall, precision, processing time, and the overall feasibility of the model and prototype. Some of these findings resulted in optimizations, which were implemented in the prototype to improve the performance of the algorithm. The first optimization reduced processing time by skipping blank spaces: if the area corresponding to the searched image is white, the algorithm jumps a number of columns equal to the length of the image. With the second optimization, the algorithm was improved to ignore the edges of the whole image if a column or row of the image (mask) extends beyond the area of the drawing. With the third, the algorithm was modified to ignore the areas where the image had previously been found: if an image has been found in a given position of the technical drawing, that position is not considered in further searches. Other optimizations implemented the computer graphics transformations described in the previous section, for better retrieval of rotated images.
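The three optimizations can be read as changes to the basic sliding-window loop. The sketch below is our interpretation, not the prototype's code: a blank window makes the scan jump ahead by the icon width, windows that would extend past the drawing edge are never visited, and windows overlapping an earlier hit are skipped, which also suppresses near-duplicate hits one pixel apart. The blank-area jump is safe for exact matches of a tightly cropped icon (one whose leftmost column contains a black pixel), but at lower similarity rates it is a heuristic that can skip partial matches. The function name and example data are hypothetical.

```python
def sweep_optimized(drawing, icon, min_similarity):
    """Return (top, left) positions of accepted hits on 0/1 pixel grids."""
    H, W = len(drawing), len(drawing[0])
    h, w = len(icon), len(icon[0])
    found = []

    def overlaps(top, left):
        # Two h x w windows overlap when both offsets are smaller than the icon size.
        return any(abs(top - t) < h and abs(left - l) < w for t, l in found)

    # Optimization 2: ranges keep the icon mask entirely inside the drawing.
    for top in range(H - h + 1):
        left = 0
        while left <= W - w:
            window = [row[left:left + w] for row in drawing[top:top + h]]
            if not any(any(row) for row in window):
                left += w  # optimization 1: blank area, jump one icon width
                continue
            if overlaps(top, left):
                left += 1  # optimization 3: area already claimed by a hit
                continue
            equal = sum(window[r][c] == icon[r][c] for r in range(h) for c in range(w))
            if 100.0 * equal / (h * w) >= min_similarity:
                found.append((top, left))
            left += 1
    return found


drawing = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 1],
]
icon = [[1, 0],
        [1, 1]]
print(sweep_optimized(drawing, icon, 100))  # [(1, 1), (3, 4)]
```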
Case study: Military Fire Department of the State of Minas Gerais (MFDMG), Brazil
To validate the model and prototype in a real situation, a case study was developed with the Military Fire Department of the State of Minas Gerais (MFDMG), Brazil, using fire and panic prevention projects. The institution's archive holds about 30,000 projects for fire prevention and fire fighting, and each project contains an average of three technical drawings composed of plans, sections, details, specifications, and tables. The corpus used for validating the model and prototype comprised 332 drawings. From the Fire Department's archives, 100 projects were selected and copied, with proper authorization, to perform this research. The projects were selected by date of approval: all projects selected were approved in 2006 and delivered in DWG format, that is, developed in AutoCAD. The icons that comprised the visual metadata for this study, shown in Figure 16, were obtained from the table of graphic symbols in the Legislation for Security against Fire and Panic in Buildings and Risk Areas in the State of Minas Gerais.
Figure 16. Graphic symbols for safety projects.
After the prototype was tailored to receive the MFDMG data, the graphic symbols table containing the icons was implemented. Each of the 332 drawings in the corpus was edited in AutoCAD, checked for scales and standard icons, and placed in a directory. From AutoCAD, a Portable Network Graphics (PNG) print file measuring 3,600 × 2,550 pixels was created for each drawing. This size was defined after tests confirmed its feasibility: smaller files have lower resolutions, which can compromise the performance of the prototype, while bigger files take up too much memory and processing time. Each technical drawing in PNG format was then opened in IrfanView and saved in PGM format, which is the format used by the prototype. Each technical drawing was interpreted to determine the values of the three categories (Type, Process, and Form). For data entry in the prototype, the indexer entered the drawing number and the categories, filled in the textual administrative and technical attributes, and selected the icons to be located.
Test results in the case study database
The database described in the previous section was used to validate the model and the prototype. All 332 drawing files in the database belonged to the same Type (fire fighting) and the same Process (executive project). Regarding the Form category, they consisted of 213 floor plans, 74 cross-sections, and 45 details. Manually categorizing all technical drawings took 2100 seconds, that is, an average of 6.2 seconds of human processing time per technical drawing, whereas automatically accomplishing the same task takes 213.30 seconds of computational processing time on average. The time an expert needs to interpret and classify a technical drawing is thus lower than the time a machine would take for the same task, which confirms that the choice of a human indexer was correct. Defining the category Form of the Project, which in this corpus comprises ground plans, cross-sections, and details, reduced the number of drawings to be searched for a given query by 36 % to 86 % (Figure 17). This indicates the reduction in image retrieval processing time achieved through the use of the classification scheme.
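The reported 36 % to 86 % range can be reproduced from the Form counts, assuming (our assumption, since the calculation is not shown) that the reduction is simply the share of the 332 drawings excluded when a query is restricted to one Form value.

```python
def search_reduction(counts):
    """Percentage of drawings skipped when a query is restricted to one Form value."""
    total = sum(counts.values())
    return {form: 100 * (1 - n / total) for form, n in counts.items()}


# Form distribution of the 332 drawings in the MFDMG corpus, as reported above.
corpus = {"floor plan": 213, "cross-section": 74, "detail": 45}
for form, pct in search_reduction(corpus).items():
    print(f"{form}: {pct:.0f} % fewer drawings to scan")
# floor plan: 36 % fewer drawings to scan
# cross-section: 78 % fewer drawings to scan
# detail: 86 % fewer drawings to scan
```

Under this reading, restricting a query to floor plans gives the lower bound and restricting it to details gives the upper bound of the reported range.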
Figure 17. Processing time reduction – category Form.
Two tests of the system are reported here that demonstrate the success of the optimizations implemented in the prototype on the basis of the first set of experiments. Because the main concern was image retrieval, the administrative and technical metadata were not used in these tests, although they are present in the system. In the first test, the system looked for all seven types of fire extinguishers, which are represented as triangles, as shown in Figure 16. Because the shape of the icon was simple, the similarity rate was set at 70 %. The prototype retrieved 140 of the 155 icons in the drawings that corresponded to the searched image, a recall rate of 90.3 %. No incorrect items were found (100 % precision). The second test repeated the first with the similarity rate set at 80 %. The prototype retrieved 85 % of the icons in the drawings that corresponded to the searched images, and no incorrect icon was retrieved (i.e. recall was 85 % and precision was 100 %). Examination of the drawings showed that the 15 % of icons missed resulted mainly from low resolution, which prevented the prototype from distinguishing the mobile water extinguisher from the mobile powder extinguisher. Figure 18 shows an example of a technical drawing with the icons found surrounded by a dotted rectangle.
Chapter 13. An image based retrieval system for engineering drawings
Figure 18. Example of fire prevention drawings with icons found surrounded by dotted rectangles.
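Recall and precision are the standard retrieval measures used above: recall is the share of the icons actually present that were found, and precision is the share of the retrieved icons that were correct. A minimal Python sketch that recomputes the first test's figures from the reported counts (the function names are illustrative):

```python
def recall(relevant_retrieved: int, relevant_total: int) -> float:
    """Share of the icons present in the drawings that the system found."""
    return relevant_retrieved / relevant_total


def precision(relevant_retrieved: int, retrieved_total: int) -> float:
    """Share of the retrieved icons that actually match the searched image."""
    return relevant_retrieved / retrieved_total


if __name__ == "__main__":
    # First test (similarity rate 70 %): 140 of 155 icons found and no
    # incorrect icons returned, so all 140 retrieved items were relevant.
    r = recall(140, 155)
    p = precision(140, 140)
    print(f"recall = {r:.1%}, precision = {p:.0%}")  # recall = 90.3%, precision = 100%
```

Raising the similarity rate makes matching stricter, so fewer candidates pass; in the second test this cost recall (85 %) without affecting precision.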
Conclusions
This study proposed a model for an image retrieval system for technical engineering drawings, which was implemented in a prototype and tested on a real database of 332 drawings. The model is innovative in that the indexing of the drawings combines visual and textual metadata, which provides more access points to the user and improves retrieval. In addition to the technical and administrative metadata, such as number, author, or area, that are traditionally used for engineering drawing retrieval, the drawings are also indexed by the icons they contain. Both human and automatic processing were used to index the drawings: the textual metadata (categories, administrative, and technical) are defined and entered by a human indexer, while the visual metadata are extracted automatically. An inverted index containing the textual and visual metadata was created. A classification scheme, applied at the beginning of the indexing process, was central to the feasibility of the system. At indexing time, the combination of the three categories for each drawing defines a specific table of icons that may be present in the drawing, which reduces the time needed to scan each drawing and extract the visual metadata. At the search stage, the three categories chosen by the user determine the table of icons from which the icons to be located are selected, and they also reduce search processing time by limiting the number of drawings to be searched. Other important factors for the feasibility of the system were the combined use of a human indexer and the automatic extraction of the visual metadata. This combination offers great potential, as it makes the system more efficient by using computer processing or human participation where each is most effective. Human cognitive processing was used to interpret the drawings, determine their categories, and index the administrative and technical metadata. The validation of the model and the prototype through the case study in the MFDMG showed that an indexer needed an average of 6.2 seconds to interpret, classify, and index each technical drawing. With the similarity rate set at 70 %, the prototype achieved a recall of 90.3 %; recall drops to 85 % when the similarity rate is set at 80 %. To achieve a higher success rate in retrieving icons that are similar but not identical to the searched image, future versions of the prototype should implement further optimizations, with more accurate functions that account for variations in the scale and rotation of the icons as well as for interfering lines. Among the theoretical and methodological contributions of this research are the merging of knowledge and research from two fields, information science and computer science, which enabled the building of a system for retrieving images by images, and the creation of an iterative method for developing the prototype. The resulting system adds human interpretation to automated processing, links textual and visual metadata, and brings together techniques and concepts from information science and computer science. The current study can be applied to problems concerning the organization and retrieval of information in any field of engineering.
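The search strategy described above, narrowing by category before matching icons, can be illustrated with a toy inverted index. The Python sketch below is our own illustration, not the prototype's code: each drawing is posted under its category values (`type`, `process`, `form`) and under the ids of the icons detected in it; all drawing ids and icon ids are hypothetical.

```python
from collections import defaultdict


class DrawingIndex:
    """Toy inverted index mapping namespaced terms to sets of drawing ids.

    Terms look like "form:floor plan" (textual metadata entered by an indexer)
    or "icon:extinguisher_water" (visual metadata extracted automatically).
    """

    def __init__(self) -> None:
        self._postings = defaultdict(set)  # term -> set of drawing ids

    def add(self, drawing_id: str, categories: dict[str, str], icons: set[str]) -> None:
        for field, value in categories.items():
            self._postings[f"{field}:{value}"].add(drawing_id)
        for icon in icons:
            self._postings[f"icon:{icon}"].add(drawing_id)

    def search(self, categories: dict[str, str], icon: str) -> set[str]:
        """Drawings that match every given category value and contain the icon."""
        candidates = None
        # Narrow the candidate set by category first, mirroring the
        # category-first strategy described in the text.
        for field, value in categories.items():
            ids = self._postings.get(f"{field}:{value}", set())
            candidates = set(ids) if candidates is None else candidates & ids
        if candidates is None:
            # No category given: every indexed drawing is a candidate.
            candidates = set().union(*self._postings.values())
        return candidates & self._postings.get(f"icon:{icon}", set())


if __name__ == "__main__":
    index = DrawingIndex()
    index.add("D-001", {"type": "fire fighting", "form": "floor plan"}, {"extinguisher_water", "hydrant"})
    index.add("D-002", {"type": "fire fighting", "form": "detail"}, {"extinguisher_water"})
    index.add("D-003", {"type": "fire fighting", "form": "floor plan"}, {"hydrant"})
    print(sorted(index.search({"form": "floor plan"}, "extinguisher_water")))  # ['D-001']
```

A production index would usually intersect the shortest posting list first; here the order simply follows the text.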
The proposed model can be adapted by any institution that deals with engineering drawings, requiring only that the parameters, specific metadata tables, categories, and graphical symbols database be defined and adapted for each application. Examples of applications are:
1. How many computers are there in a building or group of buildings? For this question, the image to be searched would be the icon for a computer. From this information, the number of laboratories and computers could be determined, and factors such as energy consumption and the maintenance staff needed could be estimated.
2. How many fire extinguishers are there in a given region, and where are they located for direct access in case of an emergency? The system proposed here could answer this question immediately, rapidly locating these data within thousands of drawings, which would enable more efficient action by fire brigades and public safety services.
Broadly speaking, the proposed system can support services and decision-making in areas such as sanitation, energy, roads, infrastructure, safety, healthcare, and education. The model developed can also be adapted to other situations requiring the retrieval of images.
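Both example questions reduce to counting or listing the occurrences of one icon, grouped by an administrative attribute such as area. A minimal Python sketch, assuming the extracted visual metadata can be flattened into (drawing id, area, icon id) records; all records and ids below are hypothetical:

```python
from collections import Counter

# Hypothetical flattened visual metadata: one record per detected icon.
DETECTIONS = [
    ("D-101", "Building A", "computer"),
    ("D-101", "Building A", "computer"),
    ("D-102", "Building A", "fire_extinguisher"),
    ("D-201", "Building B", "computer"),
    ("D-202", "Building B", "fire_extinguisher"),
    ("D-202", "Building B", "fire_extinguisher"),
]


def count_icon(detections, icon: str) -> Counter:
    """How many times the icon occurs in each area (question 1)."""
    return Counter(area for _, area, found in detections if found == icon)


def locate_icon(detections, icon: str) -> list[str]:
    """Sorted ids of the drawings that contain the icon (question 2)."""
    return sorted({drawing for drawing, _, found in detections if found == icon})


if __name__ == "__main__":
    print(count_icon(DETECTIONS, "computer"))            # Counter({'Building A': 2, 'Building B': 1})
    print(locate_icon(DETECTIONS, "fire_extinguisher"))  # ['D-102', 'D-202']
```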
References
Ajorloo, H., & Lakdashti, A. (2011). A feature relevance estimation method for content-based image retrieval. International Journal of Information Technology & Decision Making (IJITDM), 10(5), 933 – 961.
Baeza-Yates, R. A., & Ribeiro-Neto, B. (2011). Modern information retrieval: The concepts and technology behind search. New York: Addison-Wesley Longman.
Baracho, R. M. A. (2007). Sistema de recuperação de informação visual em desenhos técnicos de engenharia e arquitetura: modelo conceitual, esquema de classificação e protótipo [Visual information retrieval system for technical drawings in engineering and architecture: Conceptual model, classification scheme and prototype]. (Unpublished doctoral dissertation). Federal University of Minas Gerais, Brazil.
Baracho, R. M. A., & Cendon, B. V. (2009, December). Information retrieval for engineering projects: Using images to search for images. Paper presented at the 1st International Conference on Information Science and Engineering (ICISE), Nanjing.
Belkhatir, M., & Thiem, H. C. (2010). A multimedia retrieval framework highlighting agents and coordinating their interactions to address the semantic gap. Expert Systems with Applications, 37(12), 8903 – 8909.
Chih-Chin, L., & Ying-Chuan, C. (2011). A user-oriented image retrieval system based on interactive genetic algorithm. IEEE Transactions on Instrumentation and Measurement, 60(10), 3318 – 3325.
Chu, H. (2001). Research in image indexing and retrieval as reflected in the literature. Journal of the American Society for Information Science and Technology, 52(12), 1011 – 1018.
Ciocca, G., Cusano, C., Santini, S., & Schettini, R. (2011). Halfway through the semantic gap: Prosemantic features for image retrieval. Information Sciences, 181(22), 4943 – 4958.
Colombino, T., Martin, D., Grasso, A., & Marchesotti, L. (2010). A reformulation of the semantic gap problem in content-based image retrieval scenarios. Proceedings of COOP 2010, Computer Supported Cooperative Work (pp. 45 – 56).
Enser, P. (2000). Visual image retrieval: Seeking the alliance of concept-based and content-based paradigms. Journal of Information Science, 26(4), 199 – 210.
Foley, J. D., van Dam, A., Feiner, S. K., Hughes, J. F., & Phillips, R. L. (1994). Introduction to computer graphics. Reading, MA: Addison-Wesley Professional.
Frohmann, B. (1990). Rules of indexing: A critique of mentalism in information retrieval theory. Journal of Documentation, 46(2), 81 – 101.
Gonzales, R. C., & Woods, R. E. (2002). Digital image processing. Upper Saddle River, NJ: Prentice Hall.
Guan, H., Antani, S., Long, L. R., & Thoma, G. R. (2009). Bridging the semantic gap using ranking SVM for image retrieval. Paper presented at the Sixth IEEE International Conference on Symposium on Biomedical Imaging: From Nano to Macro, Boston, MA.
Hearn, D., & Baker, M. P. (1996). Computer graphics. New Delhi: Prentice-Hall of India Private.
Heidorn, P. B. (1999). Image retrieval as linguistic and nonlinguistic visual model matching. Library Trends, 48(2), 303 – 325.
Hjørland, B. (1992). The concept of “subject” in information science. Journal of Documentation, 48(2), 172 – 200.
Ion, A. L. (2009). Algorithms for reducing the semantic gap in image retrieval systems. Paper presented at the 2nd Conference on Human System Interactions, Catania, Italy.
Lakdashti, A., Moin, M. S., & Badie, K. (2009). Reducing the semantic gap of the MRI image retrieval systems using a fuzzy rule based technique. International Journal of Fuzzy Systems, 11(4), 232 – 249.
Maher, M. L., & Rutherford, J. H. (1997). A model for synchronous collaborative design using CAD and database management. Research in Engineering Design, 9(2), 85 – 98.
Mehrotra, R., & Gary, J. E. (1995). Similar-shape retrieval in shape data management. Computer, 28(9), 57 – 62.
Mokhtarian, F., & Abbasi, S. (2002). Shape similarity retrieval under affine transforms. Pattern Recognition, 35(1), 31 – 41.
Rosa, N. A., Felipe, J. C., Traina, A. J. M., Traina, C., Rangayyan, R. M., & Azevedo-Marques, P. M. (2008). Using relevance feedback to reduce the semantic gap in content-based image retrieval of mammographic masses. Paper presented at the 30th Annual International Conference of the IEEE, Vancouver, BC.
Smeulders, A. W. M., Worring, M., Santini, S., Gupta, A., & Jain, R. (2000). Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 1349 – 1380.
Yang, J. D., & Yang, H. J. (1999). A formal framework for image indexing with triples: Toward a concept-based image retrieval. International Journal of Intelligent Systems, 14(6), 603 – 622.
Zachary, J., Iyengar, S. S., & Barhen, J. (2001). Content based image retrieval and information theory: A general approach. Journal of the American Society for Information Science and Technology, 52(10), 840 – 852.
Kathrin Knautz
Chapter 14. Emotion felt and depicted: Consequences for multimedia retrieval
Abstract: The great increase in multimedia documents in collaborative services like YouTube, Flickr and Last.fm brings with it the challenge of indexing and retrieving these documents. One aspect that has come into the focus of information science research is capturing emotions in videos, pictures and music. The goal is to identify emotions and enable the retrieval of documents based on these emotions. Research in this area is, however, still in its early stages, and the concept has yet to be implemented on the World Wide Web. Other fields, like psychology, film studies and musicology, on the other hand, pay great attention to the emotional description of films, images and music. In order to capture the emotional impact of multimedia documents satisfactorily and to make these captured emotions retrievable, an interdisciplinary approach is necessary. This chapter looks at the fundamentals of emotion research and applies them to the identification of emotions in multimedia documents. In the process, it becomes clear that it is necessary to distinguish between emotions depicted in the medium and emotions felt by the person viewing the medium. This has lasting consequences for indexing emotions and retrieving emotional content in multimedia documents, since it requires us to differentiate between these two types of emotions at both stages. We use the search engine MEMOSE to exemplify how such a differentiation between depicted and felt emotions can be implemented.
Keywords: Emotional information retrieval, multimedia documents, image, video, music, felt emotion, depicted emotion, cognitive theory, appraisal process, commotion, tagging, scroll bar, Media EMOtion Search, MEMOSE
Kathrin Knautz, Research Associate and PhD Student, Department of Information Science, Heinrich-Heine-University Düsseldorf
1 Introduction
Emotions are central phenomena of human life, since they occur frequently and are often connected to personally significant events (Lazarus, 1991). They are a vital part of human communication and interaction. People take an active part in the emotions of others by trying to understand what caused an emotion and by helping others to overcome certain emotions. Plutchik (1962), an important emotion researcher, writes: “The emotions have always been of central concern to men. In every endeavour, in every major human enterprise, the emotions are somehow involved” (Plutchik, 1962, p. 3). It is thus not surprising that researchers from various fields show an interest in emotions. In psychology especially, emotion research has a long tradition. Defining the concept of emotion is, however, rather difficult. Fehr and Russell (1984) put it very aptly: “Everyone knows what an emotion is, until one is asked to give a definition. Then, it seems, no one knows” (1984, p. 464). We all seem to know which emotions are typical for ourselves and can easily list examples. The difficulty lies in formulating necessary and sufficient conditions for emotions. In section 2.1, we will illustrate how definitions vary with different psychological orientations. Since no consensus has been reached on a common definition, we will also explain the alternative practice of using a so-called working definition. Because of the difficulties in defining what an emotion is, some researchers approach the problem through the features of emotions (2.2). Psychologists use various models for this, which can be combined. The approaches range from describing emotions through dimensions like valence and intensity to clustering procedures that assign similar emotion words like aversion, contempt and disgust to the same class. For the general distinction between emotions, moods, feeling states and affect, we refer the reader to the paper by Otto, Euler and Mandl (2000). The applications of emotion-psychological research results are wide-ranging.
While emotional aspects play a vital role in music therapy (Davis, Gfeller, & Thaut, 1992), for example, emotion-psychological research results are also used in advertising and marketing, since “media content can trigger the particular emotions and impulse buying behaviour of the viewer / listener / surfer” (Adelaar, Chang, Langendorfer, & Morimoto, 2003, p. 247) and emotions are quite important in brand perception and purchase intents (Morris & Boone, 1998). Web users also feel a need for emotional documents. Thanks to the emergence of collaborative services like YouTube (videos), Last.fm (music) and Flickr (pictures), we have massive amounts of user-generated content. In order to make these resources retrievable, the content is indexed via tags by the users of the service. Hastings, Iyer, Neal, Rorissa, and Yoon (2007) write the following about collaborative services and indexing via tags: “Image and video sharing services such as Flickr and YouTube pose new challenges in multimedia information indexing and retrieval and demand dynamic set of solutions” (p. 1026).
One of these challenges is identifying the emotional content of the documents. Recent studies (Bischoff, Firan, Nejdl, & Paiu, 2008, 2010) show that when the various tags are assigned to categories like time, place or type, the categories of the tags that users assign differ greatly from the categories of the tags that users search for, especially in the area of opinions / qualities. For example, 24 % of Flickr queries consist of emotional tags, but only 7 % of resources are indexed with emotional tags (Bischoff et al., 2010). We are thus confronted with the challenge of indexing emotions in multimedia documents and developing a system that facilitates their retrieval. To meet this challenge, a closer examination of media documents is necessary, because not all emotions are equal. An analysis of how emotions are formed when we perceive something (3.1) and of the relation between felt emotions and depicted emotions (3.2) provides important clues for indexing media documents. To elucidate the emotional impact of media, we will introduce the component process model by Scherer (1984) and illustrate its application to viewing pictures, watching videos and listening to music. In these so-called appraisal models, emotions are the result of the subjective appraisal of a situation. We will then integrate the emotion-psychological insights gained from these considerations into a new approach to the emotional indexing of videos, pictures and music in a collaborative Web 2.0 environment (4.1). Using the search engine MEMOSE (Media EMOtion Search) as an example, we will illustrate what a concrete realization of this approach might look like (4.2).
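The consequence for indexing can be made concrete with a small data model that keeps depicted and felt emotions in separate fields, so that every query must state which of the two it targets. This Python sketch is a hypothetical illustration, not MEMOSE's actual implementation; the 0 to 10 intensity scale, the mean-based score and the threshold of 5 are our assumptions.

```python
from dataclasses import dataclass, field
from statistics import mean

KINDS = ("depicted", "felt")


@dataclass
class MediaDocument:
    """A video, picture or song with separate user ratings for depicted and felt emotions."""

    doc_id: str
    # emotion word -> list of user-supplied intensities (hypothetical 0-10 scale)
    depicted: dict[str, list[int]] = field(default_factory=dict)
    felt: dict[str, list[int]] = field(default_factory=dict)

    def _ratings(self, kind: str) -> dict[str, list[int]]:
        if kind not in KINDS:
            raise ValueError(f"kind must be one of {KINDS}, got {kind!r}")
        return self.depicted if kind == "depicted" else self.felt

    def rate(self, kind: str, emotion: str, intensity: int) -> None:
        self._ratings(kind).setdefault(emotion, []).append(intensity)

    def score(self, kind: str, emotion: str) -> float:
        ratings = self._ratings(kind).get(emotion, [])
        return float(mean(ratings)) if ratings else 0.0


def search(docs: list[MediaDocument], emotion: str, kind: str, min_score: float = 5.0) -> list[str]:
    """Ids of documents whose mean rating for (kind, emotion) reaches min_score, best first."""
    hits = [(doc.score(kind, emotion), doc.doc_id) for doc in docs]
    return [doc_id for score, doc_id in sorted(hits, reverse=True) if score >= min_score]


if __name__ == "__main__":
    clip = MediaDocument("clip-1")
    clip.rate("depicted", "anger", 8)  # the clip shows an angry character ...
    clip.rate("depicted", "anger", 9)
    clip.rate("felt", "fear", 7)       # ... but viewers report feeling fear
    print(search([clip], "anger", "depicted"))  # ['clip-1']
    print(search([clip], "anger", "felt"))      # []
```

A single undifferentiated "anger" tag would return this clip for both queries; keeping the two kinds apart is what lets the second query come back empty.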
2 Theoretical aspects of emotions
Below, we provide a first insight into the concept of emotions. To this end, we will first examine various possible definitions. As a result of differing theoretical trends and changing scientific paradigms, psychological emotion researchers tend to prefer an (incomplete) working definition. We will then consider which emotions exist and how they can be differentiated.
2.1 The definition of emotion
Emotions play a vital role in social interaction and thus in the creation of social order. Defining the term is, however, rather difficult. Emotion research has a long history, but an exact, uniform definition of the concept of emotion has yet to be produced. Depending on their research focus, researchers emphasize different aspects in their definitions. We will mention some of the most popular and influential definitions as examples. Watson (1919, p. 165) said that “[a]n emotion is a hereditary pattern-reaction involving profound changes of the bodily mechanism as a whole, but particularly of the visceral and glandular systems”. In his definition, Watson does not differentiate between an emotion and its reaction pattern, which he understands to be inborn. William James (1884) says that “the bodily changes follow directly the PERCEPTION of the exciting fact, and that our feeling of the same changes as they occur IS the emotion” (p. 189). According to James, an emotion is thus an experienced state, namely the felt bodily reaction that follows the perception of a specific stimulus. In his two-factor theory, Schachter (1964) also assumes that feeling bodily changes is necessary in order to experience emotions. He adds, however, that this experiencing state emerges from the integration of a feeling of arousal and a context-dependent cognition that matches the arousal. Bernard Weiner (1986), on the other hand, does not recognize an influence of physical arousal on emotions. According to Weiner, emotions are experiencing states with a positive or negative quality. They are caused by cognitive assessments that have motivational and informational consequences. In his psycho-evolutionary emotion theory, Plutchik (1980) argues that the concept of emotion can be described through a combination of biological (ultimate) and psychological and physiological (proximate) explanations. Plutchik sees emotion as a syndrome that was created through natural selection in evolution and is characterized by an experiencing state, physiological reaction, cognition, impulses to act and observable behaviour.
These explanations and definitions show how different prevailing points of view have led to various emotion theories. These definitions only constitute a small part of the definitions of emotion found in the emotion-psychological specialized literature. Some notable newer approaches view emotions from a neurological point of view as neuro-physiological reactions (LeDoux, 1995) or as social constructs (Gergen, 1991) from the point of view of social constructionism. An exact definition accepted by all thus does not exist. According to Otto, Euler and Mandl (2000), such a definition would require an exhaustive exploration of the subject area, something that has yet to be achieved in emotion research. Meyer, Schützwohl and Reisenzein (2001) point out that an exact definition of emotions is not the prerequisite, but rather the result of scientific analysis. Thus, research uses a so-called working definition, which should be as uncontroversial as possible and accepted by many researchers. It serves to describe a phenomenon and delimit the research area.
Kleinginna and Kleinginna (1981), for example, examined 100 statements and definitions from relevant specialized works, dictionaries and introductory texts and came up with the following working definition: Emotion is a complex set of interactions among subjective and objective factors, mediated by neural hormonal systems, which can (a) give rise to affective experiences such as feelings of arousal, pleasure / displeasure; (b) generate cognitive processes such as emotionally relevant perceptual effects, appraisals, labelling processes; (c) activate widespread physiological adjustments to the arousing conditions; and (d) lead to behaviour that is often, but not always, expressive, goal directed, and adaptive. (p. 355)
With this definition, Kleinginna and Kleinginna try to do justice to and combine all traditional, significant aspects of the different psychological theories. To summarize, according to this definition, an emotion is a complex pattern that is characterized by changes. Physiological arousal, feelings, cognitive processes and behaviour patterns are parts of this pattern, which occurs in situations that are significant for the individual. Notable newer working definitions put a greater emphasis on the cognitive component. As a central process, cognitive assessment is important not only for the creation of emotions, but also for continuous and recursive information processing. In the context of his Component Process Model (CPM), Klaus Scherer (1993) defines an emotion as an episode of temporary synchronisation of all major subsystems of organismic functioning represented by five components (cognition, physiological regulation, motivation, motor expression, and monitoring / feeling) in response to the evaluation of an external or internal stimulus event as relevant to central concerns of the organism. (p. 4)
According to this working definition, emotions function as continuous intermediaries between the organism and its environment. Cognitive processes play a vital role here, since they assess events in the environment and judge their significance for the individual. Meyer, Reisenzein, and Schützwohl (2001) present another working definition. They define emotions as temporally dated, concrete occurrences that have certain attributes. In their view, emotions are current psychic states and are thus distinguishable from other concepts like dispositions and moods. Emotions are also directed towards a specific object (which need not actually exist), typically the object that causes them. The authors further say that emotions manifest themselves in a so-called reaction triad consisting of subjective, behavioural and physiological aspects. They also have a specific quality, intensity and duration.
2.2 Classification of emotions
If we want to analyse emotions or use emotional aspects in the context of multimedia retrieval, the question arises which emotions exist and how they can be delimited. In psychology, we find three approaches to this question. Dimensional models try to assess the features of emotions quantitatively and map them onto axes representing dimensions. When we analyse content emotionally on the basis of a dimensional model, we can use either a 2-dimensional model, which models the arousal and valence of emotions, or a 3-dimensional model as proposed by Russell and Mehrabian (1977). Mehrabian’s (1995) 3-dimensional model is very similar to the model proposed by Wundt (1906), although it is not based on Wundt’s work. The three dimensions used in the P-A-D model are as follows:
– pleasure and displeasure (valence; analogous to Wundt’s Lust (pleasure) and Unlust (displeasure))
– arousal and non-arousal (intensity; analogous to Wundt’s Erregung (excitement) and Beruhigung (calming))
– dominance and submissiveness (dominance; approximately similar to Wundt’s Spannung (tension) and Lösung (relaxation))
Whereas Wundt used his model to describe the progression of feelings, Mehrabian developed his dimensional model in order to categorize and group emotions. Arifin and Cheung (2007) hold a similar view of dimensional models: “This model does not reduce emotions into a finite set, but attempts to find a finite set of underlying dimensions into which emotions can be decomposed” (p. 69). The goal of dimensional models is thus to use a space defined by valence, intensity and dominance in order to identify and sort emotions. Another method to identify and distinguish between emotions is the use of classes (category model). This method sorts emotion words according to similarity and uses statistics to group similar words into one category. This proves helpful for finding word fields that contain similar emotion words (Schmidt-Atzert & Ströhme, 1983).
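The dimensional idea can be sketched as nearest-neighbour labelling in a three-dimensional space. In this Python sketch the coordinates are invented for illustration and are not Mehrabian's published values. It shows why a third axis can matter: anger and fear are placed close together on pleasure and arousal and are separated mainly by dominance.

```python
import math

# Illustrative pleasure, arousal, dominance coordinates on a -1..1 scale.
# These numbers are assumptions made up for this example, NOT published norms.
PAD = {
    "joy": (0.8, 0.5, 0.4),
    "anger": (-0.5, 0.6, 0.3),
    "fear": (-0.6, 0.6, -0.4),
    "sadness": (-0.6, -0.3, -0.3),
    "calm": (0.5, -0.6, 0.2),
}


def nearest_emotion(point: tuple[float, float, float]) -> str:
    """Label a point in P-A-D space with the closest emotion word (Euclidean distance)."""
    return min(PAD, key=lambda word: math.dist(point, PAD[word]))


if __name__ == "__main__":
    # Same pleasure and arousal, opposite dominance: only the third axis separates them.
    print(nearest_emotion((-0.55, 0.6, 0.3)))    # anger
    print(nearest_emotion((-0.55, 0.6, -0.35)))  # fear
```

A 2-dimensional valence and arousal model would place these two query points at the same position, so it could not tell them apart.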
Another approach to categorizing emotions, sometimes used by psychologists, is to reduce the number of emotions to a small, fixed set. These emotions are called fundamental emotions or base emotions. Advocates of base emotion theory use psychological and / or (evolutionary) biological arguments for the existence of such fundamental emotions (Ortony & Turner, 1990). However, owing to the differing research foci of emotion researchers, there is no consensus on their number (Table 1). Furthermore, the existence of base and secondary emotions has been increasingly questioned in recent years (e.g. Ortony & Turner, 1990; Meyer, Reisenzein, & Schützwohl, 2003).
Fundamental emotions | Basis for inclusion | Reference
rage and terror, anxiety, joy | hardwired | Gray (1982)
expectancy, fear, rage, panic | hardwired | Panksepp (1982)
pain, pleasure | unlearned emotional states | Mowrer (1960)
anger, interest, contempt, disgust, distress, fear, joy, shame, surprise | density of neural firing | Tomkins (1984)
acceptance, anger, anticipation, disgust, joy, fear, sadness, surprise | relation to adaptive biological processes | Plutchik (1980)
anger, aversion, courage, dejection, desire, despair, fear, hate, hope, love, sadness | relation to action tendencies | Arnold (1960)
anger, disgust, anxiety, happiness, sadness | do not require propositional content | Oatley & Johnson-Laird (1987)
anger, disgust, fear, joy, sadness, surprise | universal facial expressions | Ekman, Friesen, & Ellsworth (1982)
anger, disgust, elation, fear, subjection, tender-emotion, wonder | relation to instincts | McDougall (1908 / 1960; 1926)
anger, contempt, disgust, distress, fear, guilt, interest, joy, shame, surprise | hardwired | Izard (1971)
desire, happiness, interest, surprise, wonder, sorrow | forms of action readiness | Frijda (1986)
fear, grief, love, rage | bodily involvement | James (1884)
fear, love, rage | hardwired | Watson (1930)
happiness, sadness | attribution independent | Weiner & Graham (1984)
Table 1. Basic emotions (Ortony & Turner, 1990, p. 316).
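Table 1 also makes the lack of consensus easy to quantify. The Python sketch below transcribes the table and counts how many of the 14 lists contain each emotion word. It uses exact string matching, so related labels such as "rage" and "anger" or "joy" and "happiness" are deliberately not merged; that simplification is ours.

```python
from collections import Counter

# Emotion lists transcribed from Table 1 (Ortony & Turner, 1990, p. 316).
BASIC_EMOTIONS = {
    "Gray (1982)": ["rage and terror", "anxiety", "joy"],
    "Panksepp (1982)": ["expectancy", "fear", "rage", "panic"],
    "Mowrer (1960)": ["pain", "pleasure"],
    "Tomkins (1984)": ["anger", "interest", "contempt", "disgust", "distress",
                       "fear", "joy", "shame", "surprise"],
    "Plutchik (1980)": ["acceptance", "anger", "anticipation", "disgust", "joy",
                        "fear", "sadness", "surprise"],
    "Arnold (1960)": ["anger", "aversion", "courage", "dejection", "desire",
                      "despair", "fear", "hate", "hope", "love", "sadness"],
    "Oatley & Johnson-Laird (1987)": ["anger", "disgust", "anxiety", "happiness", "sadness"],
    "Ekman, Friesen, & Ellsworth (1982)": ["anger", "disgust", "fear", "joy", "sadness", "surprise"],
    "McDougall (1908 / 1960; 1926)": ["anger", "disgust", "elation", "fear", "subjection",
                                      "tender-emotion", "wonder"],
    "Izard (1971)": ["anger", "contempt", "disgust", "distress", "fear", "guilt",
                     "interest", "joy", "shame", "surprise"],
    "Frijda (1986)": ["desire", "happiness", "interest", "surprise", "wonder", "sorrow"],
    "James (1884)": ["fear", "grief", "love", "rage"],
    "Watson (1930)": ["fear", "love", "rage"],
    "Weiner & Graham (1984)": ["happiness", "sadness"],
}


def tally(table: dict[str, list[str]]) -> Counter:
    """In how many theorists' lists each emotion word appears (exact string match)."""
    return Counter(word for emotions in table.values() for word in set(emotions))


if __name__ == "__main__":
    print(tally(BASIC_EMOTIONS).most_common(3))  # [('fear', 9), ('anger', 7), ('disgust', 6)]
```

Even the most frequent entry, fear, appears in only 9 of the 14 lists.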
3 Felt emotions and depicted emotions
Emotions can be found in all kinds of multimedia documents. However, the emotions depicted in media are not always the emotions that these media arouse. This raises the question of how these two types of emotions, depicted and felt, are related. We begin our approach to the topic with an overview of the various emotion-psychological theories, which try to explain from various points of view how emotions are formed. Knowing how emotions are formed also means being able to influence people through depicted emotions. We will use a so-called appraisal model to show the influence that emotions depicted in multimedia documents have on people.
3.1 Emotions felt – theories of emergence
Different theories of emotion research try to explain how emotions are formed. We differentiate between
– behaviouristic theories of emotion;
– physiological-cognitive theories of emotion;
– evolution-psychological theories of emotion;
– attributional theories of emotion.
At this point we only want to provide a brief insight into the emergence of emotions from these diverse viewpoints.
(Classical) behaviourism
Behaviouristic approaches emphasize behaviour and the conditions that trigger emotions. Emotions are seen either as inborn (fear, anger and love) or as conditioned reaction patterns to specific stimuli (Watson, 1930). The most prominent representative of this theory, Watson (1913), rejects any form of introspection. According to him, aspects of intra-subjective experience are not scientifically acceptable, since they are not accessible to independent observers. Only inter-subjectively observable and measurable variables and stimuli can be used in a methodological approach. In behaviourist theory, a felt emotion is therefore only present if it is measurable, for example through physiological reactions (perspiration, skin temperature, blood flow indicators). Self-reports in the form of utterances or questionnaires are used only as rough indications and must be complemented by independent behaviour monitoring and physiological measurements. Specifically, Watson sees emotions as hereditary reaction patterns that occur when the triggering stimulus is presented (Watson, 1919). As basic reaction patterns, he postulates fear, anger, and love, which are modified by learning experience and stimulus substitution. Through classical conditioning, we acquire our own characteristic emotional reaction patterns in relation to environmental stimuli (e.g. Jones, 1924; Watson & Rayner, 1920).
Cognitive-physiological theories
Cognitive-physiological theories postulate that emotions are determined by the interaction between physiological and cognitive changes. Through the (in)direct perception of these changes, people develop and adapt their emotions (James, 1884; Lange & Kurella, 1887). An emotion is here characterized as an experiencing state whose emergence depends on subjective feelings and bodily reactions. James (1884) describes the relationship as follows: “We feel sad because we cry, angry because we strike, afraid because we tremble, and neither we cry, strike, nor tremble because we are sorry, angry, or fearful, as the case may be” (p. 190). According to James, physiological reaction patterns (e.g. heartbeat, pulse rate) specific to a certain emotion are reflexively triggered by the perception of an external stimulus (e.g. the appearance of a beautiful woman). The conscious perception of these physiological (motoric) changes is the emotion. In summary, in contrast to Watson, James puts a strong focus on the subjective aspect: physical changes are not the consequence but the cause of the emotion. In response to much criticism, James revised his theory ten years later. The new version held that not simply the perception of an object, but the assessment of the situation as a whole causes an emotional reaction. The belief, initially held by a majority of cognitive-physiological theorists, that the origin of emotions is largely physiological was gradually revised (e.g. Cannon, 1927; Marañon, 1924). Theories of the 1960s and 1970s (Schachter, 1964; Valins, 1966; Zillmann, 1978) are therefore more strongly influenced by the cognitive component. Thus, although Schachter (1964) retains some of James’s basic assumptions, for him the perception of physiological changes is not sufficient and, furthermore, does not determine the quality of the emotion. Rather, cognition comes into play at this point: it determines the quality of the emotion through an emotion-relevant assessment of the situation and causal attribution. The intensity of the emotion is determined by physiological arousal.
Emotions can therefore be viewed in Schachter’s theory as postcognitive phenomena. Other studies, such as those with paraplegics (Hohmann, 1966), indicate that physiological arousal is not necessary for the emergence of emotions. For this reason, recent research has turned to so-called cognitive appraisal theories. These assume that emotions result from the interpretation and explanation of an incident; in these theories, an emotion can arise even without physical arousal (e.g. Frijda, 1986; Lazarus, 1995; Scherer, 1988, 1998a). Figure 1 shows the emergence of emotions according to Scherer (1998b) in simplified form. In the example, a person appraises the relevance of a perceived stimulus (e.g. an event or object) for their needs. This appraisal process includes aspects like the novelty of the stimulus, its valence, its goal conduciveness, certainty about the consequences, etc. Thus, the subjectively assessed importance of the situation is critical for triggering the emotion. The result of this process leads to a specific reaction pattern, which may be marked by physiological reactions, motor expressions, etc.
Knautz
Figure 1. Emergence of emotions (according to Scherer (1998b)).
It has become increasingly established in research that emotions are based primarily on cognitions, appraisals and assessments. It is important, however, to conceptualize the relationship between emotion and cognition appropriately. Within the cognitive theories of emotion, attributional theories offer one distinct way of achieving this conceptualization.
Attributional theories of emotion
Attribution and attributional theories describe how humans try to understand and control their environment by means of causal attribution. According to these approaches, emotions are reactions to the results of this attribution process (e.g. Arnold, 1960; Weiner & Graham, 1984). Important groundwork on causal explanation goes back to Fritz Heider (1958). While attribution theories primarily deal with how attributions are formed, attributional theories are primarily concerned with the effects of attributions that have already been formed. According to Bernard Weiner (1986), a prominent representative of attributional theories, the emergence of emotions is a sequential cognitive information process. Before an emotion can be experienced, three cognitive steps must be completed once the emotion-triggering event has been perceived. In the first step, the event is assessed with regard to the achievement of goals (event-driven emotions such as happy, pleased or satisfied). The second step ascribes the event to an originating factor such as the person’s own ability, effort or chance (attribution-specific emotions such as surprise). The last step maps the originating factor onto dimensions (dimension-dependent emotions such as pride, shame or helplessness). These dimensions are – locus (personal dependency): am I the cause, or are other people the cause? – stability: how long-lasting is the cause? Is it talent or effort? – controllability: is the cause controllable (effort) or uncontrollable (aptitude)? Emotions ultimately result from the interaction of causal attribution and evaluation.
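Weiner’s three-step sequence can be illustrated with a deliberately simplified sketch. The dimension values assigned to ability, effort and chance follow the usual textbook classification, but the rules that map outcomes and dimensions to emotion labels are our own simplifying assumptions and cover only the emotions named above; the names `Cause` and `weiner_emotions` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Cause:
    """A perceived cause, described by Weiner's three causal dimensions."""
    name: str
    internal: bool      # locus: am I the cause (True) or others/circumstances?
    stable: bool        # stability: is the cause long-lasting?
    controllable: bool  # controllability: can the cause be influenced?


# Textbook classification of three typical causes (illustrative).
ABILITY = Cause("ability", internal=True, stable=True, controllable=False)
EFFORT = Cause("effort", internal=True, stable=False, controllable=True)
CHANCE = Cause("chance", internal=False, stable=False, controllable=False)


def weiner_emotions(goal_achieved: bool, cause: Cause) -> List[str]:
    """Run the three sequential steps and collect the resulting emotions."""
    emotions = []
    # Step 1: appraise the outcome against one's goals (event-driven emotion).
    emotions.append("happy" if goal_achieved else "sad")
    # Step 2: ascribe the outcome to a cause (attribution-specific emotion).
    if cause == CHANCE:
        emotions.append("surprise")
    # Step 3: map the cause onto its dimensions (dimension-dependent emotions).
    if goal_achieved and cause.internal:
        emotions.append("pride")
    elif not goal_achieved and cause.internal and not cause.controllable:
        emotions.append("shame")
    if not goal_achieved and cause.stable and not cause.controllable:
        emotions.append("helplessness")
    return emotions


print(weiner_emotions(False, ABILITY))  # ['sad', 'shame', 'helplessness']
print(weiner_emotions(True, EFFORT))    # ['happy', 'pride']
```

The point of the sketch is the ordering: the same outcome (failure) yields different dimension-dependent emotions depending on whether it is ascribed to ability or to effort.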
Chapter 14. Emotion felt and depicted
Evolution-psychological theories
Evolution-psychological theories emphasize the evolutionarily conditioned adaptive functions of emotions, such as their survival function. Besides proximate mechanisms and their distal origins, evolution-psychological approaches mainly describe the ultimate (biological) functions of emotions. They go back to Darwin (1872), whose main concern was to demonstrate the evolutionary development of emotions. Darwin sees the evolutionary advantage of emotions in improved fitness (better perception, possibilities for manipulation, effects on others). According to Darwin, emotions are innate mental states that occur automatically depending on the situation, and cognitive assessments cause the emotional expression. His work was continued by many emotion researchers, most of whom represent the best-known formulation of the evolutionary psychological position: the theory of discrete basic emotions (McDougall, 1926; Plutchik, 1980; Izard, 1971, 1977, 1994; Ekman et al., 1982; Tomkins, 1962, 1963). Among the most popular approaches is the psycho-evolutionary syndrome theory of the American psychologist Robert Plutchik (1980). He postulates inherited dispositions towards eight fitness-enhancing adaptive behaviour patterns. These eight dispositions are the foundation of all other emotions, which arise as their combinations (dyads or triads), and cannot themselves be traced back to more fundamental emotions. Emotions are thus biologically and psychologically fundamental. Figure 2 shows, as an example, a simplified reaction sequence for the emergence of the emotion “fear”.
Figure 2. Chain reaction for the emotion “fear” (adapted from Plutchik, 1980).
The reaction sequence starts with a perceived stimulus that signals a threat, e.g. the appearance of a bear in a forest. The cognitive assessment of this event concludes that the animal poses a threat and, accompanied by physiological reactions (increased autonomic activity), leads to the emotional state of “fear”. The next step in Plutchik’s sequential model is the activation of an action impulse, such as fleeing from the bear. This observable behaviour has the biological function of protection against threats.
3.2 Depicted emotions – emotions triggered by multimedia documents
How, then, do emotions depicted in multimedia documents influence people? Studies show that a significant number of emotions are caused by the representation of events in various media (Scherer, Wallbott & Summerfield, 1986; Cantor & Nathanson, 1996). The question of how media affect the behaviour of recipients is as old as the media themselves. The fundamental paradigm of research on the emotional effects of media is the stimulus-response model. In this model, the stimuli emitted by a communicator (media content) cause a reaction (response) in the recipient and may lead to emotional changes. Recipients are viewed as essentially passive, and the motives that led them to turn to the medium in the first place are neglected. Modern models of media effects research are audience-centred approaches, which place the recipients and their media selection at the forefront. The uses and gratifications approach (Katz & Foulkes, 1962), for example, tries to identify the motives for media use: the use of a medium is directed by its expected utility and by the extent to which it will satisfy certain needs (Schenk, 1987). Research on emotional media impact in the form of mood management goes back to Zillmann (1988). His theory of media selection is a specialization of the uses and gratifications approach. Its basic assumption is an innate desire for a positive emotional state, which is reached through a moderate level of stimulation. Media selection is experience-based and driven by individual, unconscious motives, and through the selected media content, emotions arise in the recipient. On the question of how emotions caused by real events differ from those caused by media-mediated events, we refer the reader to further reading on concepts such as Sense of Reality, Law of Apparent Reality, or Perceived Reality (Ortony, Clore & Collins, 1988; Frijda, 1988; Rothmund, Schreier & Groeben, 2001).
But how exactly are felt emotions and the emotions depicted in multimedia documents related? Appraisal models can be used to explain media effects in the emotional sphere (Mangold, Unz, & Winterhoff-Spurk, 2001; Schwab, 2001). In these models, emotions are the result of a subjective assessment of the situation (see 3.1, Cognitive theories of emotion). Component models, a particular variant of these, integrate the emotion triad derived from Izard’s (1977) three-component theory: the neurophysiological component, the subjective experience and the motor-expressive component. In what follows, the component process model by Scherer (1984, 2001b) is used and tested for its applicability in a multimedia context.
In his component process model, Scherer postulates five subsystems that are involved in the formation of an emotion (Table 2). These are – cognitive processes (appraisal) that evaluate objects; – physical reactions, produced in the neuroendocrine, autonomic and somatic nervous systems; – motivational changes brought about by the appraisal process; – facial and vocal expression; and – the subjectively experienced emotional state. Scherer explains the relations between the systems as follows: “The central assumption of the componential pattern theory is that the different organismic subsystems are highly interdependent and that changes in one subsystem will tend to elicit related changes in other subsystems” (Scherer, 2001b, p. 106). Since a complete discussion of all components with reference to videos, music and images is not possible here, we discuss the appraisal component in greater detail below as an example, because it is primarily responsible for triggering emotions and differentiating between them. The cognitive component is intertwined with all other components through feedback loops (Brosch & Scherer, 2009) and is fundamental to illuminating the relationship between felt and depicted emotions.
Emotion function | Organismic subsystem | Emotion component
Evaluation of objects and events | Information processing (CNS) | Cognitive component (appraisal)
System regulation | Support (CNS, NES, ANS) | Neurophysiological component (bodily symptoms)
Preparation and direction of action | Executive (CNS) | Motivational component (action tendencies)
Communication of reaction and behavioural intention | Action (SNS) | Motor expression component (facial and vocal expression)
Monitoring of internal state and organism–environment interaction | Monitor (CNS) | Subjective feeling component (emotional experience)
Note: CNS = central nervous system; NES = neuro-endocrine system; ANS = autonomic nervous system; SNS = somatic nervous system.
Table 2. Relationships between organismic subsystems and the functions and components of emotion (Scherer, 2005, p. 698).
The appraisal component serves information processing and comprises a subjective assessment of the relevance of the situation. The evaluation criteria (stimulus evaluation checks, SECs) are relevance, implications, coping potential and normative significance (Scherer, 2001b). The following questions motivate these appraisals: – relevance: How relevant is this event for me? Does it affect me or my social reference group? – implications: What are the implications or consequences of this event, and how do they affect my well-being and my immediate or long-term goals? – coping potential: How well can I deal with these consequences or adjust to them? – normative significance: How important is this event in relation to my self-concept and to social norms and values? (Brosch & Scherer, 2009, p. 197 [translated]). These information processing steps affect the individual sub-processes through multiple complex feedback and feed-forward processes. The result of this appraisal process is an emotion characterized by physiological symptoms and by movements (“motor-expressive movements”) of the face, body and voice (Figure 3). According to Scherer (1984), the emotional episode ends when the mutual influence that the subsystems exert on each other subsides. What is crucial for triggering the emotion is the subjective significance of the event for the organism’s current motivation.
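The structure of the four checks can be sketched as a small data type and a decision function. This is a toy illustration, not Scherer’s model: the numeric scale, the threshold and the rules that turn check outcomes into emotion labels are all hypothetical, and far coarser than Scherer’s actual predictions. What the sketch does reflect is the logic described above: relevance gates everything, and the later checks differentiate between emotions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Appraisal:
    """Outcomes of the four stimulus evaluation checks on a hypothetical -1..1 scale."""
    relevance: float               # How relevant is this event for me?
    implications: float            # Good (+) or bad (-) for my well-being and goals?
    coping_potential: float        # Can I deal with or adjust to the consequences?
    normative_significance: float  # Compatible (+) or not (-) with self-concept and norms?


RELEVANCE_THRESHOLD = 0.2  # hypothetical cut-off below which no emotion arises


def appraise(a: Appraisal) -> Optional[str]:
    """Toy mapping from SEC outcomes to an emotion label (illustrative rules only)."""
    # Relevance gates everything: an irrelevant event triggers no emotional episode.
    if a.relevance < RELEVANCE_THRESHOLD:
        return None
    # The later checks differentiate between emotions.
    if a.implications >= 0:
        return "joy"
    if a.normative_significance < 0:
        return "shame"  # negative outcome that conflicts with one's self-concept
    if a.coping_potential > 0:
        return "anger"  # negative outcome one feels able to act against
    return "fear"       # negative outcome with little coping potential


print(appraise(Appraisal(0.9, -0.8, -0.7, 0.0)))  # fear
```

Because the same event can receive different check outcomes from different people, the same stimulus can yield different labels, which is exactly why felt emotions must be indexed per user.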
Figure 3. Emergence of emotions (Scherer, 1998b).
Emotions can also arise in a person, A, when s/he sees another person, B, experiencing an event that is relevant to B. Scherer (1998b) calls these emotions, which we feel when observing other people’s emotions (depicted emotions), commotions. According to the general commotion model (Scherer, 1998b; Scherer & Zentner, 2001), these emotions arise in the recipient via induction (appraisal process), empathy or emotional contagion. Figure 4 describes how exactly these emotions and commotions emerge when watching or looking at media.
Figure 4. Model of ‘normal’ emotion induction via appraisal (upper part) and mediated ‘commotion’ due to empathy or other mechanisms of emotional communication (lower part) (Scherer & Zentner, 2001, p. 366).
Film reception
Videos and films are among the media through which an emotional impact can be achieved (Gross & Levenson, 1995). Such considerations date back to antiquity: Aristotle was concerned with the effect that depicted emotions have on the audience. According to Aristotle, the effect of tragedy is the arousal and cleansing of the affective states of fear and compassion (catharsis). The emotions depicted by the actors were supposed to evoke the same emotions in the audience. Scherer (2001a) formulates three possible origins of emotions when viewing television reports, videos, etc. on the basis of the content displayed (see Figure 5): induction, empathy and contagion. In real situations, induction corresponds to the appraisal process, in which a person assesses the relevance of a perceived stimulus (e.g. an event or object) for their goals or needs on the basis of the SECs. In the media environment, by contrast, the object is replaced by a fictional idea that the actor is trying to express, and the commotion is formed through a virtual appraisal. The emotional reaction of the audience is a direct result of an evaluation with regard to their own goals or values (induction). A good example is the outraged reaction (depicted emotion) of a person accused of murder in a crime series. The emotional reaction of the audience (felt emotion) results from a subjective assessment of the situation. At this point an important distinction must be made. The felt emotion may coincide with the depicted one, for example anger, because in his evaluation process the viewer has come to the conclusion that the accused person is
innocent. However, the felt emotion may also be diametrically opposed: if the alleged murderer is a person the viewer evaluates negatively, the felt emotion may be (malicious) joy. Empathic responses are another possible source of emotions in the viewer. This is the case if the recipient is not personally affected in any way but can still assess the situation with regard to goal relevance from the perspective of the actor or sender (empathy). Identification with the actor is possible even if the observer is aware that the emotion is only acted and thus not real. Here, too, we must distinguish between depicted and felt emotions. Empathic responses can lead to symmetric commotions (the same emotion as that of the sender, e.g. if the sender is likeable) or asymmetric commotions (a different emotion from the sender’s, e.g. if the sender is unlikeable). Emotional contagion is a third mechanism by which the behaviour of the sender causes emotions. This process has two parts and arises when strong motor-expressive reactions are observed. The first is motor mimicry, as when people yawn upon seeing someone else yawn. The peripheral feedback of the emotional expression serves as the second step: acting out emotions via expression signals (e.g. gestures) further strengthens the emotion. Hatfield, Cacioppo, and Rapson (1992, p. 153) define emotional contagion as a “tendency to automatically mimic and synchronize facial expressions, vocalizations, postures, and movements with those of another person and, consequently, to converge emotionally.” Emotional contagion is thus based neither on the appraisal of certain situations nor on a comparison with one’s own goals. Rather, it is a mostly unconscious motor reaction to the perceived behaviour of others. When emotional contagion occurs, the depicted and felt emotions match. Studies show that emotional contagion is independent of empathy and of identification with the sender (Hatfield et al., 1992).
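The relationship between depicted and felt emotions under the three mechanisms can be summarized as a small decision function. This is a sketch under stated assumptions: the `ASYMMETRIC` table is hypothetical and holds only the “malicious joy” example discussed above, and the function simply returns `None` where the text leaves the felt emotion open.

```python
from enum import Enum
from typing import Optional


class Mechanism(Enum):
    INDUCTION = "induction"  # viewer's own appraisal of the depicted event
    EMPATHY = "empathy"      # appraisal from the sender's point of view
    CONTAGION = "contagion"  # largely unconscious motor mimicry


# Hypothetical examples of asymmetric commotions towards an unlikeable sender.
ASYMMETRIC = {"anger": "malicious joy", "sadness": "malicious joy"}


def felt_emotion(mechanism: Mechanism, depicted: str,
                 sender_likeable: bool = True,
                 own_appraisal: Optional[str] = None) -> Optional[str]:
    """Return the felt emotion for a depicted one, or None if this sketch cannot tell."""
    if mechanism is Mechanism.CONTAGION:
        # Contagion: depicted and felt emotions match.
        return depicted
    if mechanism is Mechanism.INDUCTION:
        # Induction: whatever the viewer's own appraisal yields; it may
        # coincide with the depicted emotion or be opposed to it.
        return own_appraisal
    # Empathy: symmetric for a likeable sender, asymmetric otherwise.
    if sender_likeable:
        return depicted
    return ASYMMETRIC.get(depicted)


print(felt_emotion(Mechanism.EMPATHY, "anger", sender_likeable=False))  # malicious joy
```

Only under contagion does the felt emotion follow directly from the depicted one. This is why, in the sketch as in the argument above, depicted emotions alone cannot stand in for felt emotions.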
Figure 5. Emotion elicitation based on watching a fictional person’s emotional expression in a media context (Scherer, 2001a, p. 138)
In sum, an emotion may arise for various reasons (induction, empathy, contagion), and the distinction between depicted and felt emotions is fundamental for any analysis. The genre terms offer a starting point for determining which emotions films convey in general, since a genre’s name often reflects the emotion it triggers most strongly (Gross & Levenson, 1995). Wirth and Schramm (2005) give an overview of many genre-specific studies of the emotional impact of films. Research on the emotions generated by advertising is also extensive (e.g. Bagozzi, Gopinath, & Neyer, 1999; Edell & Burke, 1987). These studies find, for example, that any type of advertising evokes emotions (Zeitlin & Westwood, 1986) and that the product functions as a mediator (Holbrook & Batra, 1987; Mitchell & Olson, 1981). This raises the question of which cinematic techniques can be used to trigger or reinforce emotions during viewing. A comprehensive meta-study of emotion in vocal expression and music performance has been compiled by Juslin and Laukka (2003). Besides these acoustic parameters, visual features such as colour, texture, shape or camera motion offer another entry point. Apart from extracting content-based features, current information science studies use category models (e.g. Salway & Graham, 2003), dimensional models (e.g. Hanjalic & Xu, 2005; Soleymani, Chanel, Kierkels, & Pun, 2008) or hybrid models (e.g. Knautz, Neal, Schmidt, Siebenlist, & Stock, 2011) to identify emotions. Accompanying text-statistical methods (e.g. Chan & Jones, 2005) or Hidden Markov models (e.g. Kang, 2003; Xu, Jin, Luo, & Duan, 2008) can also provide good results. Other methods, particularly for identifying felt emotions, include neurophysiological approaches, observer and
self-report methods. Lopatovska (2011) and Lopatovska and Arapakis (2011) have compiled extensive summaries of these methods.
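As a rough illustration of the dimensional approach, the following sketch estimates a per-shot arousal curve for a video. It averages min-max-normalized low-level features (motion activity, cut frequency, sound energy) and smooths the result. The idea is loosely in the spirit of dimensional models such as Hanjalic and Xu (2005), but this is not their model. The equal weights, the simple moving-average smoothing and all function names are our own assumptions.

```python
from typing import List, Sequence


def normalize(values: Sequence[float]) -> List[float]:
    """Min-max normalize to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def moving_average(values: Sequence[float], window: int = 3) -> List[float]:
    """Centred moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        segment = values[max(0, i - half): i + half + 1]
        out.append(sum(segment) / len(segment))
    return out


def arousal_curve(motion: Sequence[float], cut_rate: Sequence[float],
                  sound_energy: Sequence[float],
                  weights=(1 / 3, 1 / 3, 1 / 3), window: int = 3) -> List[float]:
    """Per-shot arousal estimate in [0, 1] from three low-level features."""
    features = [normalize(f) for f in (motion, cut_rate, sound_energy)]
    combined = [sum(w * f[i] for w, f in zip(weights, features))
                for i in range(len(motion))]
    return normalize(moving_average(combined, window))
```

Smoothing reflects the assumption that arousal changes gradually rather than jumping from shot to shot. Note also that such a curve describes the stimulus, and at best approximates depicted rather than felt emotion.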
Picture reception
Just like films and videos, pictures can also induce emotions. Jörgensen (2003) claims that “[a]n image attribute is … not limited to purely visual characteristics, but includes other cognitive, affective, or interpretative responses to the image such as those describing spatial, semantic, or emotional characteristics” (p. 39). It can also be said that “[a]ll images are to be considered as emotional-laden, if they provoke emotions in the viewers, independent from the specific content of the picture” (Schmidt & Stock, 2009, p. 865). But what is the relation between the depicted content and the evoked emotion? Scherer’s (2001b) emergence mechanisms may come into play here as well. Viewing a picture can entail an emotional reaction to the illustrated event. If this reaction is a direct result of assessing the situation with regard to one’s own values, the emotion is created via induction. The relevance and assessment of the perceived image can cause symmetrical and asymmetrical reaction patterns. Depending on the assessed importance of events for the viewer’s current motivation, a war picture (see Figure 6 (a)), for example, may cause positive or negative emotions in the appraisal process. If the viewer is a war veteran who associates terrible memories with the picture, likely reactions are anger, sadness or aversion. Conversely, he might just as well be reminded of his youth and old comrades and feel longing or joy. The emotion felt may therefore vary considerably. Apart from differing assessments of the picture, the reaction may also be either symmetric (the same emotion is depicted and felt) or asymmetric (the opposite emotion is felt), just as with cinematic material. It is likewise possible to feel empathy. This is the case when a situation can be understood from personal experience.
Figure 6 (b) shows two boys whose relationship appears very amicable, judging from their posture, gestures and facial expressions. The depiction of friendship certainly has the potential to elicit empathy. Empathic reactions can, however, also lead to contrary commotions. Imagine an unfortunate situation (e.g. a child’s mishap with a bucket of water) that you can understand very well but that would nevertheless make you laugh. The emotions depicted in pictures may also have a contagious effect, as in Figure 6 (c), which shows a sad child with head bowed deeply. Because the viewer is unable to help this child, the negative emotion, e.g. sadness, can be transmitted to
the viewer. Through the perception of such expressive behaviour, emotions may be transferred between people (Le Bon, 1985). For the indexing and retrieval of emotions in pictures, low-level visual features are usually extracted and combined with content-based methods such as vector space approaches (Wang, Chen, Wang, & Zhang, 2003; Wang, Yu, & Zhang, 2004), neural networks (Dellandrea, Liu & Chen, 2010; Kim, Shin, Kim, Kim, & Shin, 2009), machine learning (Feng, Lesot, & Detyniecki, 2010) or fuzzy rules (Kim, Kim, Koo, Jeong, & Kim, 2005). These approaches are often based on dimensional models (Hanjalic, 2006), category models or hybrid models (Knautz, Siebenlist, & Stock, 2010). Interesting research approaches can also be found on the interaction of photographic documents and associated text such as titles and tags (Neal, 2010a, 2010b).
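To make the vector space idea concrete, the following is a generic sketch and not any of the cited systems. Each image is represented as a quantized RGB colour histogram and compared by cosine similarity. A query image then receives the emotion label of its most similar labelled example. The bin count, the one-nearest-neighbour rule and the function names are illustrative assumptions, and images are assumed to arrive as lists of 8-bit RGB tuples.

```python
import math
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # 8-bit RGB


def colour_histogram(pixels: Sequence[Pixel], bins: int = 4) -> List[float]:
    """Quantize each channel into `bins` levels; return a normalized bins**3 histogram."""
    step = 256 // bins
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        hist[(r // step) * bins * bins + (g // step) * bins + (b // step)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def nearest_emotion(query: Sequence[Pixel],
                    labelled: Sequence[Tuple[str, Sequence[Pixel]]]) -> str:
    """Return the emotion label of the labelled image most similar to the query."""
    q = colour_histogram(query)
    label, _ = max(labelled, key=lambda item: cosine(q, colour_histogram(item[1])))
    return label


examples = [("joy", [(250, 220, 40)] * 10), ("sadness", [(30, 40, 120)] * 10)]
print(nearest_emotion([(240, 210, 50)] * 5, examples))  # joy
```

Colour alone says nothing about who is looking at the picture. A classifier of this kind can therefore at most approximate depicted emotions, which is one reason the text turns to concept-based and user-based indexing.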
Figure 6. Possible images for emotion elicitation based on Scherer (2001).
Music
Music is present in every culture and plays an important role in it. It can elicit intense reactions (Gabrielsson, 2001; Gabrielsson & Lindström, 2001). According to Sloboda and Juslin (2001), the greatest power of music lies in representing and expressing emotions. Unlike psychology, musicology has a long tradition of research on musical emotion (Budd, 1985), but this research makes very few psychological assumptions. One reason for this could be the confusing picture created by the many emotion-psychological theories and postulates (Sloboda & Juslin, 2001). Sloboda and Juslin point out a need for multidisciplinary research, particularly with regard to a psychological approach. But what is the connection between the communication of emotions and the emotions felt? Here, too, Scherer’s appraisal approach is applicable (Figure 7). According to Scherer and Zentner (2001), this approach constitutes
a suitable model for reflecting the personal significance of an event in the music context on the basis of a number of criteria. In this process, the listener receives the musical expression of a piece of music, infers the intended emotion from the specific combinations of musical parameters used (Juslin & Laukka, 2003) and evaluates it in terms of relevance, implications, coping potential and normative significance. Besides emotion induction via appraisal, induction can also occur during music reception through memories of certain situations (recall from memory) (Lang, 1979; Lang, Kozal, Miller, Levin, & MacLean, 1990; Scherer, 2004). Just like videos and pictures, music can also evoke empathic responses. Imagine seeing an enthusiastic musician filled with emotion: although you are not personally affected, you can still understand the situation. As with pictures and videos, a positive emotion on the part of the observer requires sympathy (Scherer & Zentner, 2001). For emotions mediated by music, emotional contagion can arise via the so-called peripheral route, according to Scherer (1998a, 2004), that is, through direct effects on the somatic and autonomic nervous systems. As an example of this phenomenon, Scherer cites certain rhythms that generate emotions by activating the neurophysiological component and its feedback mechanisms with the other subsystems. Emotional music is a growing area of research in psychology and musicology. Its results are used, for example, in advertising and marketing (Bagozzi, Gopinath, & Neyer, 1999; Edell & Burke, 1987) as well as in music therapy.
Applications in music therapy range from the treatment of patients with clinical depression (Hsu & Lai, 2004; Lai, 1999) or geriatric psychiatric patients (Clair & Memmott, 2008; Lee, Chan & Mok, 2010; Short, 1995) to the treatment of neurological diseases in rehabilitation and to procedures for cancer patients (Burns, 2001). Information science and computer science researchers are also engaged in emotional music research. As with the indexing and retrieval of images, their approaches start with the extraction of low-level features. The results of the meta-study compiled by Juslin and Laukka (2003) could provide a very good starting point: they evaluated which acoustic parameters communicate which emotions, and in what way. Their results suggest that there is emotion-specific acoustic information which is used to communicate emotions in both vocal expression and musical performance (e.g. loud and/or fast music communicates anger; high pitch and little high-frequency energy (tone colour, timbre) convey fear). The emotions communicated in this way, particularly anger and sadness, are easily recognized by listeners from the musical parameters. We would like to point out once more that perceived emotions may differ from felt emotions. In this context, Sloboda and Juslin (2005) postulate that a greater consensus can be
reached in identifying depicted emotions than felt ones. Juslin and Västfjäll (2008) give a comprehensive summary of emotional reactions to music in relation to the other subsystems. Besides the extraction of low-level features, procedures similar to those for indexing and retrieving videos and images are applied. These include, among many others, vector-based approaches (Li & Ogihara, 2003), fuzzy logic (Yang, Liu, & Chen, 2006), software agents (Yang & Lee, 2004), regression (Yang, Lin, Su, & Chen, 2008), histograms (Li & Ogihara, 2006) and ontologies (Han, Rhon, Jun, & Hwang, 2010).
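The two cue patterns quoted from Juslin and Laukka (2003) can be expressed as simple rules over acoustic features, as the sketch below shows. It is a hypothetical rule-based sketch, not a published classifier. The 0..1 feature scales, the thresholds and the decision to require both cues for anger (rather than “and/or”) are our own assumptions. The rules estimate only the depicted (perceived) emotion; as stressed above, the felt emotion may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AcousticCues:
    """Low-level features on a hypothetical 0..1 scale."""
    loudness: float   # 0 = very soft, 1 = very loud
    tempo: float      # 0 = very slow, 1 = very fast
    pitch: float      # 0 = very low, 1 = very high
    hf_energy: float  # share of high-frequency energy (timbre brightness)


HIGH, LOW = 0.7, 0.3  # hypothetical thresholds


def depicted_emotion_candidates(c: AcousticCues) -> List[str]:
    """Return emotions whose cue pattern the excerpt matches (perceived, not felt)."""
    candidates = []
    if c.loudness >= HIGH and c.tempo >= HIGH:
        candidates.append("anger")  # loud and fast
    if c.pitch >= HIGH and c.hf_energy <= LOW:
        candidates.append("fear")   # high pitch, little high-frequency energy
    return candidates


print(depicted_emotion_candidates(AcousticCues(0.9, 0.8, 0.5, 0.6)))  # ['anger']
```

Returning a list rather than a single label leaves room for excerpts that match several patterns, or none.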
Figure 7. Emotion elicitation in a musical context (Scherer & Zentner, 2001, p. 370).
4 Consequences for multimedia indexing and retrieval
Below, we show how the models, designs and research results identified above can be applied to a multimedia search engine with emotional retrieval options. To this end, we first discuss the approach, in which the indexing component plays a particularly important role. The second part of this section presents the actual implementation of the approach in an emotional search engine for multimedia documents. The search engine presented here is a project of the Heinrich-Heine-University Düsseldorf, and the author of this chapter participates in its development and implementation. Further information on the search engine can be found in Siebenlist & Knautz (this volume) as well as in other publications (Knautz et al., 2010; Knautz et al., 2011).
4.1 Approaches to emotional indexing on the World Wide Web
For the indexing of multimedia documents, research distinguishes between two approaches: content-based and concept-based indexing. Automatic, content-based indexing of resources is neither time- nor labour-intensive. For images, colour, texture and shape are taken into consideration. For videos, shots with object and camera movement (panning or zooming), lighting and cutting frequency also matter. Music is indexed according to pitch, rhythm, harmony, timbre and, where available, lyrics. However, the information that can be deduced from these low-level features is limited and therefore only partly suitable for analysing the semantic and emotional content of the Web. Concept-based solutions are considered promising alternatives to content-based approaches. Concept-based retrieval works with terms (concepts) that are currently assigned primarily intellectually, i.e. by humans, although in principle they could be deduced from the content. One way of achieving this is through knowledge organization systems (e.g. thesauri) and professional indexing. Given a taxonomy of emotions, it would at least theoretically be possible to intellectually assign controlled vocabulary about feelings to documents such as videos. However, this method is highly dependent on the individual indexer. There is also the practical problem of having professional indexers evaluate the billions of pictures and videos available. Using user-generated tags, i.e. social tagging or cooperative indexing as found in various Web 2.0 services, has the distinct advantage of drastically reducing the time and manpower required (Peters, 2009). In broad folksonomies with a sufficiently large number of tagging users, consistency problems have been found to be negligible (Knautz et al., 2011). But how can users collaboratively index multimedia documents?
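Why a broad folksonomy tends towards consistency can be seen in a minimal aggregation sketch. When many users tag the same document, each tag can be weighted by the share of users who assigned it, and idiosyncratic tags fall below a cut-off. The function name, the lower-casing of tags and the `min_share` threshold are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def consensus_tags(taggings: Iterable[Tuple[str, str]],
                   min_share: float = 0.2) -> List[str]:
    """Return tags assigned by at least `min_share` of the users who tagged a document.

    `taggings` holds (user_id, tag) pairs for one document. Each user counts
    at most once per tag, so a tag's count equals the number of agreeing users.
    """
    pairs = {(user, tag.strip().lower()) for user, tag in taggings}
    users = {user for user, _ in pairs}
    if not users:
        return []
    counts = Counter(tag for _, tag in pairs)
    selected = [t for t, n in counts.items() if n / len(users) >= min_share]
    # Most widely shared tags first; ties broken alphabetically.
    return sorted(selected, key=lambda t: (-counts[t], t))


taggings = [("u1", "Dog"), ("u2", "dog"), ("u3", "dog"), ("u1", "dog"),
            ("u3", "beach"), ("u4", "cute")]
print(consensus_tags(taggings, min_share=0.5))  # ['dog']
```

Counting users rather than raw tag occurrences prevents a single user from inflating a tag by assigning it repeatedly.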
As specified in section 1.3, three approaches to classifying emotions can be found in psychology (dimension model, category model, base emotions). Comparable studies in emotional music retrieval (Lee & Neal, 2007) and emotional image retrieval (Schmidt & Stock, 2009) work with a fixed set of five base emotions: sadness, happiness, anger, fear and disgust. For this project, the set of base emotions was extended on the basis of the specialist psychological literature, and the following base emotions were selected: sadness, anger, fear, shame, disgust, surprise, longing, joy and love. Independently of the base emotions proposed in emotion psychology, we added humour (or wit) in order to do justice to this media component. The indexing of base emotions can readily be combined with the dimension model by taking the intensity of emotions into account. Following the approaches of Lee and Neal (2007), Schmidt and Stock (2009), and Knautz et al. (2010), sliders for setting this intensity have been implemented in the search engine.
A far-reaching consequence of the research discussed above is the need to explicitly distinguish between depicted and felt emotions in multimedia documents. This applies to both the indexing and the retrieval options. The concrete implementation of the presented approach is shown below.
4.2 Implementation: An emotional multimedia search engine
Since there is currently no satisfactory way of emotionally indexing videos, music and images on the Web with content- or concept-based approaches, a comparison of the different indexing approaches leads to the conclusion that the most appropriate method for indexing these resources on the Web is a broad folksonomy. In this approach, many different people index the same documents with tags, which in turn ensures that the resources can be found. The search engine MEMOSE (Media Emotion Search) presented here adopts this concept. In addition to a keyword search like that of other Web 2.0 services, it offers ten emotional (base) terms (Figure 8). The user can select one or more emotions via check boxes (e.g. fear) and combine them with other, non-emotional search terms (e.g. dog).
Figure 8. MEMOSE search interface.
The results of a query are presented in two lists (Figure 9), which follows from the above considerations on the relationship between depicted and felt emotions. The first page of each results list contains the four most emotional pictures. More results can be viewed by clicking the arrow at the
right edge. The results confirm the chosen approach of differentiating between depicted and felt emotions. The first results list contains photos indexed with the queried tag and the queried emotion as depicted, and these pictures clearly show the emotion fear. The second results list, in contrast, shows pictures that trigger the emotion when viewed. The results in the two lists are completely different, so the distinction is necessary. Video and music search results can be accessed via tabs.
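The two-list retrieval can be sketched as follows. The record schema (dicts with `doc`, `tags`, `emotion`, `kind`, `intensity`) is hypothetical, and so is ranking by mean intensity across users. The text says only that the four “most emotional” items appear first and that intensity serves as a ranking parameter. This is not MEMOSE’s actual implementation.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Set

PAGE_SIZE = 4  # the first results page shows four items per list


def search(index: Iterable[dict], tag: str, emotions: Set[str],
           page: int = 0) -> Dict[str, List[str]]:
    """Return separate ranked result lists for depicted and felt emotions.

    Each index record is a hypothetical dict with keys
    'doc', 'tags', 'emotion', 'kind' ('depicted' or 'felt') and 'intensity' (0..10).
    """
    scores = {"depicted": defaultdict(list), "felt": defaultdict(list)}
    for rec in index:
        # Combine the keyword (tag) with the selected emotion(s).
        if tag in rec["tags"] and rec["emotion"] in emotions:
            scores[rec["kind"]][rec["doc"]].append(rec["intensity"])
    results = {}
    for kind, per_doc in scores.items():
        # Rank by mean intensity across users; ties broken by document id.
        ranked = sorted(per_doc, key=lambda d: (-mean(per_doc[d]), d))
        results[kind] = ranked[page * PAGE_SIZE:(page + 1) * PAGE_SIZE]
    return results
```

Keeping the two kinds in separate score tables is what lets the same document appear high in one list and not at all in the other.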
Figure 9. Search results (shown and felt) for “fear and dog”.
Multimedia files can be indexed in two different ways. First, a user can index existing multimedia documents by clicking the “Memose me” button below a search result (Figure 9). Second, he can index
media he has uploaded himself, either by entering a URL from Flickr, YouTube or Last.fm or by using MEMOSE’s own uploader. In the first case, the tags are imported from the respective service; in the second, the user adds them himself (Figure 10). The next step, emotional indexing, is the same in both variants (Figure 11). To realize the retrieval options described above, the indexing tool covers three aspects: 1. general indexing with one or more emotions from a fixed set of emotions, 2. assessing the intensity of the emotion(s) on a scale from 0 to 10, 3. the distinction between depicted and felt emotions. These three aspects follow from the considerations in this chapter. During indexing, depicted and felt emotions must be distinguished in order to obtain relevant retrieval results. Since an emotion can be felt or communicated with varying intensity, recording the intensity provides a useful ranking parameter. Restricting indexing to a fixed set of emotions is a sensible approach on the Web, given the variety of emotions and the many ways of paraphrasing them. A future version is planned to identify emotion word fields (emotion clusters) and make them available to users via overlays on the individual emotions.
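The three aspects of the indexing tool translate naturally into a validated record type. In this minimal sketch, the emotion identifiers (e.g. `humour` for humour/wit), the `(emotion, kind)` slider keys and the rule that a slider left at 0 means “not assigned” are assumptions, not MEMOSE’s actual data model.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# The ten emotion terms described in the text; identifier spellings are ours.
EMOTIONS = frozenset({"sadness", "anger", "fear", "shame", "disgust",
                      "surprise", "longing", "joy", "love", "humour"})
KINDS = frozenset({"depicted", "felt"})


@dataclass(frozen=True)
class EmotionRating:
    """One user's judgement: which emotion, depicted or felt, and how intense."""
    emotion: str
    kind: str
    intensity: int

    def __post_init__(self):
        # Aspect 1: only emotions from the fixed set are allowed.
        if self.emotion not in EMOTIONS:
            raise ValueError(f"unknown emotion: {self.emotion!r}")
        # Aspect 3: every rating states whether the emotion is depicted or felt.
        if self.kind not in KINDS:
            raise ValueError(f"kind must be 'depicted' or 'felt', got {self.kind!r}")
        # Aspect 2: intensity on a scale from 0 to 10.
        if not 0 <= self.intensity <= 10:
            raise ValueError(f"intensity must be in 0..10, got {self.intensity}")


def ratings_from_sliders(sliders: Dict[Tuple[str, str], int]) -> List[EmotionRating]:
    """Turn slider positions keyed by (emotion, kind) into validated ratings.

    Assumption: a slider left at 0 means the emotion was not assigned.
    """
    return [EmotionRating(emotion, kind, value)
            for (emotion, kind), value in sliders.items() if value > 0]
```

Validating at construction time means that a rating outside the fixed vocabulary or the 0 to 10 scale never reaches the index, so retrieval can rely on both.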
Figure 10. Multimedia uploader.
Knautz
Figure 11. Indexing tool.
5 Concluding remarks
The aim of this chapter was to present the peculiarities of emotions in relation to multimedia documents and to point out the consequences for indexing and retrieving these documents. For this purpose, we gave a brief introduction to the foundations of emotion psychology. We found that there are many definitions of emotion and that modern research relies on working definitions. This variety reflects the large number of emotion-psychological theories, each with different emphases. We also gave a brief overview of how individual research orientations explain the origin of emotions. In this context, studies show that a considerable number of emotional experiences are due to the representation of events in various media. To explain emotional media effects, we used an appraisal model, the component process model by Scherer (1984, 2001b), in which emotions result from a subjective assessment of the situation. The subjective importance of the event for the subject’s current motivation is crucial for triggering the emotion. According to Scherer’s commotion model (Scherer & Zentner, 2001), however, emotions can also arise when viewing depicted emotions, via induction (appraisal process), empathy or emotional contagion. We examined Scherer’s model in relation to video, music and pictures. This showed that felt emotions emerge in different ways and that they can be completely different from the emotions depicted. This aspect is fundamental for any system that tries to make emotional content searchable. For lack of space, this chapter could not discuss in detail which parameters (low-level features) cause or strengthen which type of emotion in which media; however, we referenced the relevant literature at the appropriate places. Using results from psychology, musicology and information science, we designed a model for indexing and retrieving emotional content in multimedia documents. In addition to ten basic emotions, the model uses the dimensional model for setting intensity. In our view, valence is given by the concept of each basic emotion. The distinction between felt and depicted emotions, in both indexing and retrieval, forms the centrepiece of the approach and its implementation. In this way, the specific requirements of the media component are satisfied on the basis of the appraisal model. MEMOSE is a project of the Heinrich-Heine-University Düsseldorf (Germany). It is still a prototype, and new insights are continually being integrated into the system. We therefore do not claim that the system is perfect. Nevertheless, MEMOSE offers a first look at how different emotion-psychological approaches can be used for the indexing and retrieval of multimedia documents.
References
Adelaar, T., Chang, S., Langendorfer, K. L., & Morimoto, M. (2003). Effects of media formats on emotions and impulse. Journal of Information Technology, 18, 247 – 266.
Arifin, S., & Cheung, P. (2007). A computation method for video segmentation utilizing the pleasure-arousal-dominance emotional information. Proceedings of the 16th ACM International Conference on Multimedia (p. 68).
Arnold, M. (1960). Emotion and personality. New York: Columbia University Press.
Bagozzi, R., Gopinath, M., & Nyer, P. (1999). The role of emotions in marketing. Journal of the Academy of Marketing Science, 27(2), 59 – 70.
Bischoff, K., Firan, C., Nejdl, W., & Paiu, R. (2008). Can all tags be used for search? Proceedings of the 17th ACM Conference on Information and Knowledge Management (pp. 193 – 202).
Bischoff, K., Firan, C., Nejdl, W., & Paiu, R. (2010). Bridging the gap between tagging and querying vocabularies: Analyses and applications for enhancing multimedia IR. Journal of Web Semantics, 8(2 – 3), 97 – 109.
Brosch, T., & Scherer, K. (2009). A case for the component process model as theoretical starting point for experimental emotion research. In V. Brandstätter & J. H. Otto (Eds.), Handbuch der Allgemeinen Psychologie: Motivation und Emotion (pp. 446 – 456). Göttingen: Hogrefe.
Budd, M. (1985). Understanding music. Proceedings of the Aristotelian Society, 59, 233 – 248.
Burns, D. (2001). The effect of the Bonny Method of Guided Imagery and Music on the mood and life quality of cancer patients. Journal of Music Therapy, 38(1), 51 – 65.
Cannon, W. (1927). The James-Lange theory of emotions: A critical examination and an alternative theory. American Journal of Psychology, 100(3 – 4), 567 – 586.
Cantor, J., & Nathanson, A. (1996). Children’s fright reactions to television news. Journal of Communication, 46(4), 139 – 152.
Chan, C., & Jones, G. (2005). Affect-based indexing and retrieval of films. Proceedings of the 13th Annual ACM International Conference on Multimedia (pp. 427 – 430).
Clair, A., & Memmott, J. (2008). Therapeutic uses of music with older adults. Silver Spring, MD: American Music Therapy Association.
Darwin, C. (1872). The expression of the emotions in man and animals. London: John Murray.
Davis, W., Gfeller, K., & Thaut, M. (1992). An introduction to music therapy: Theory and practice. Dubuque, IA: Wm. C. Brown Publishers.
Dellandrea, E., Liu, N., & Chen, L. (2010). Classification of affective semantics in images based on discrete and dimensional models of emotions. Proceedings of the IEEE International Workshop on Content-Based Multimedia Indexing (CBMI) (pp. 1 – 6).
Edell, J., & Burke, M. (1987). The power of feelings in understanding advertising effects. Journal of Consumer Research, 14, 421 – 433.
Ekman, P., Friesen, W., & Ellsworth, P. (1982). What emotion categories or dimensions can observers judge from facial behavior? In P. Ekman (Ed.), Emotion in the human face (pp. 39 – 55). New York: Cambridge University Press.
Fehr, B., & Russell, J. A. (1984). Concept of emotion viewed from a prototype perspective. Journal of Experimental Psychology: General, 113(3), 464 – 486.
Feng, H., Lesot, M., & Detyniecki, M. (2010). Association rules to discover color-emotion relationships based on social tagging. Proceedings of Knowledge-Based and Intelligent Information and Engineering Systems (pp. 544 – 553).
Frijda, N. (1986). The emotions. New York: Cambridge University Press.
Frijda, N. (1988). The laws of emotion. American Psychologist, 43, 349 – 358.
Gabrielsson, A. (2001). Emotions in strong experience with music. In P. N. Juslin & J. A. Sloboda (Eds.), Music and emotion: Theory and research (pp. 431 – 449). New York: Oxford University Press.
Gabrielsson, A., & Lindström, E. (2001). The influence of musical structure on emotional expression. In P. N. Juslin & J. A. Sloboda (Eds.), Music and emotion: Theory and research (pp. 223 – 248). New York: Oxford University Press.
Gergen, K. (1991). Toward transformation in social knowledge. London: Sage.
Gray, J. (1982). The neuropsychology of anxiety: An inquiry into the functions of the septohippocampal system. Oxford: Oxford University Press.
Gross, J., & Levenson, R. (1995). Emotion elicitation using films. Cognition and Emotion, 9(1), 87 – 108.
Han, B., Rhon, S., Jun, S., & Hwang, E. (2010). Music emotion classification and context-based music recommendation. Multimedia Tools and Applications, 47(3), 433 – 460.
Hanjalic, A. (2006). Extracting moods from pictures and sounds. IEEE Signal Processing Magazine, 23(2), 90 – 100.
Hanjalic, A., & Xu, L.-Q. (2005). Affective video content representation and modeling. IEEE Transactions on Multimedia, 7(1), 143 – 154.
Hastings, S., Iyer, H., Neal, D., Rorissa, A., & Yoon, J. (2007). Social computing, folksonomies, and image tagging: Reports from the research front. Proceedings of the 70th Annual Meeting of the American Society for Information Science and Technology (pp. 1026 – 1029).
Hatfield, E., Cacioppo, J., & Rapson, R. (1992). Primitive emotional contagion. In M. S. Clark (Ed.), Emotion and social behavior (pp. 151 – 177). Newbury Park: Sage.
Heider, F. (1958). The psychology of interpersonal relations. New York: Wiley.
Hohmann, G. (1966). Some effects of spinal cord lesions on experienced emotional feelings. Psychophysiology, 3, 143 – 156.
Holbrook, M., & Batra, R. (1987). Assessing the role of emotions as mediators of consumer responses to advertising. Journal of Consumer Research, 14, 404 – 420.
Hsu, W., & Lai, H. (2004). Effects of music on major depression in psychiatric inpatients. Archives of Psychiatric Nursing, 18(5), 193 – 199.
Izard, C. E. (1971). The face of emotion. New York: Appleton-Century-Crofts.
Izard, C. E. (1977). Human emotions. New York: Plenum Press.
Izard, C. (1994). Die Emotionen des Menschen. Eine Einführung in die Grundlagen der Emotionspsychologie. Weinheim: Psychologie Verlags Union.
James, W. (1884). What is an emotion? Mind, 9(34), 188 – 205.
Jones, M. (1924). A laboratory study of fear: The case of Peter. The Pedagogical Seminary, 31, 308 – 315.
Jörgensen, C. (2003). Image retrieval: Theory and research. Lanham, MD: Scarecrow Press.
Juslin, P., & Laukka, P. (2003). Communication of emotions in vocal expression and music performance: Different channels, same code? Psychological Bulletin, 129, 770 – 814.
Juslin, P., & Västfjäll, D. (2008). Emotional responses to music: The need to consider underlying mechanisms. Behavioral and Brain Sciences, 31, 559 – 621.
Kang, H.-B. (2003). Affective content detection using HMMs. Proceedings of ACM Multimedia (pp. 259 – 263).
Katz, E., & Foulkes, D. (1962). On the use of the mass media as “escape” – clarification of a concept. Public Opinion Quarterly, 26, 377 – 388.
Kim, E., Kim, S., Koo, H., Jeong, K., & Kim, J. (2005). Motion-based textile indexing using colors and texture. Proceedings of Fuzzy Systems and Knowledge Discovery (pp. 1077 – 1080).
Kim, Y., Shin, Y., Kim, S., Kim, E., & Shin, H. (2009). EBIR: Emotion-based image retrieval. Digest of Technical Papers, International Conference on Consumer Electronics (pp. 1 – 2).
Kleinginna, P., & Kleinginna, A. (1981). A categorized list of emotion definitions, with suggestions for a consensual definition. Motivation and Emotion, 5(4), 345 – 379.
Knautz, K., Neal, D., Schmidt, S., Siebenlist, T., & Stock, W.G. (2011). Finding emotional-laden resources on the World Wide Web. Information, 2(1), 217 – 246.
Knautz, K., Siebenlist, T., & Stock, W.G. (2010). MEMOSE. Search engine for emotions in multimedia documents. Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 791 – 792).
Lai, Y. (1999). Effects of music listening on depressed women in Taiwan. Issues in Mental Health Nursing, 20, 229 – 246.
Lang, P. (1979). A bio-informational theory of emotional imagery. Psychophysiology, 16, 495 – 512.
Lang, P., Kozal, M., Miller, G., Levin, D., & MacLean, A. (1990). Emotional imagery: Conceptual structure and pattern of somato-visceral response. Psychophysiology, 17, 179 – 192.
Lange, C., & Kurella, H. (1887). Ueber Gemüthsbewegungen: Eine Psycho-physiologische Studie. T. Thomas.
Lazarus, R. (1991). Emotion and adaptation. New York: Oxford University Press.
Lazarus, R. (1995). Vexing research problems inherent in cognitive-mediational theories of emotion – and some solutions. Psychological Inquiry, 6, 183 – 196.
Le Bon, G. (1895). Psychologie des Foules. Paris: P.U.F.
LeDoux, J. (1995). Emotion: Clues from the brain. Annual Review of Psychology, 46, 209 – 235.
Lee, H., & Neal, D. (2007). Towards Web 2.0 music information retrieval: Utilizing emotion-based, user-assigned descriptors. Proceedings of the 70th Annual Meeting of the American Society for Information Science and Technology (pp. 732 – 741).
Lee, Y., Chan, M., & Mok, E. (2010). Effectiveness of music intervention on the quality of life of older people. Journal of Advanced Nursing, 66(12), 2677 – 2687.
Li, T., & Ogihara, M. (2003). Detecting emotion in music. Proceedings of the 4th International Symposium on Music Information Retrieval (pp. 239 – 240).
Li, T., & Ogihara, M. (2006). Toward intelligent music information retrieval. IEEE Transactions on Multimedia, 8(3), 564 – 574.
Lopatovska, I. (2011). Researching emotion: Challenges and solutions. Proceedings of the 2011 iConference (iConference ‘11) (pp. 225 – 229).
Lopatovska, I., & Arapakis, I. (2011). Theories, methods and current research on emotions in library and information science, information retrieval and human-computer interaction. Information Processing and Management, 47(4), 575 – 592.
Mangold, R., Unz, D., & Winterhoff-Spurk, P. (2001). Zur Erklärung emotionaler Medienwirkungen. Leistungsfähigkeit, empirische Überprüfung und Fortentwicklung theoretischer Ansätze. In Theoretische Perspektiven der Rezeptionsforschung (pp. 163 – 180). München: Reinhard Fischer.
Marañon, G. (1924). Contribution à l’étude de l’action émotive de l’adrenaline. Revue Française d’Endocrinologie, 2, 301 – 325.
McDougall, W. (1908 / 1960). An introduction to social psychology. Boston: Methuen.
McDougall, W. (1926). An outline of abnormal psychology. Boston: Luce.
Mehrabian, A. (1995). Framework for a comprehensive description and measurement. Genetic, Social, and General Psychology Monographs, 121, 339 – 361.
Meyer, W., Reisenzein, R., & Schützwohl, A. (2001). Einführung in die Emotionspsychologie. Band I: Die Emotionstheorien von Watson, James und Schachter. Bern: Verlag Hans Huber.
Meyer, W., Reisenzein, R., & Schützwohl, A. (2003). Einführung in die Emotionspsychologie. Band 2: Evolutionspsychologische Emotionstheorien. Bern: Huber.
Mitchell, A., & Olson, J. (1981). Are product attribute beliefs the only mediator of advertising effects on brand attitude? Journal of Marketing Research, 18, 318 – 332.
Morris, J., & Boone, M. (1998). The effects of music on emotional response, brand attitude, and purchase intent in an emotional advertising condition. Advances in Consumer Research, 25, 518 – 526.
Mowrer, O. (1960). Learning theory and behavior. New York: Wiley.
Neal, D. M. (2010a). Breaking in and out of the silos: What makes for a happy photograph cluster? Paper presented at the 2010 Document Academy Conference (DOCAM ‘10), Denton, TX, USA.
Neal, D. M. (2010b). Emotion-based tags in photographic documents: The interplay of text, image, and social influence. Canadian Journal of Information and Library Science, 34, 329 – 353.
Oatley, K., & Johnson-Laird, P. N. (1987). Towards a cognitive theory of emotions. Cognition & Emotion, 1, 29 – 50.
Ortony, A., Clore, G., & Collins, A. (1988). The cognitive structure of emotions. Cambridge: Cambridge University Press.
Ortony, A., & Turner, T. (1990). What’s basic about basic emotions? Psychological Review, 97, 315 – 331.
Otto, J., Euler, H. M., & Mandl, H. (2000). Begriffsbestimmungen. In J. Otto, H. A. Euler, & H. Mandl (Eds.), Handbuch Emotionspsychologie (pp. 11 – 18). Weinheim: Beltz, Psychologie Verlags Union.
Panksepp, J. (1982). Toward a general psychobiological theory of emotions. The Behavioral and Brain Sciences, 5, 407 – 467.
Peters, I. (2009). Folksonomies: Indexing and retrieval in Web 2.0. Berlin: De Gruyter Saur.
Plutchik, R. (1962). The emotions: Facts, theories, and a new model. New York: Random House.
Plutchik, R. (1980). A general psychoevolutionary theory of emotion. In R. Plutchik & H. Kellerman (Eds.), Theories of emotion: Emotion, theory, research, and experience (pp. 3 – 33). New York: Academic.
Rothmund, J., Schreier, M., & Groeben, N. (2001). Fernsehen und erlebte Wirklichkeit I: Ein kritischer Überblick über die Perceived Reality-Forschung. Zeitschrift für Medienpsychologie, 13(1), 23 – 41.
Russell, J., & Mehrabian, A. (1977). Evidence for a three-factor theory of emotions. Journal of Research in Personality, 11(3), 273 – 294.
Salway, A., & Graham, M. (2003). Extracting information about emotions in films. Proceedings of the 11th ACM International Conference on Multimedia (pp. 299 – 302). New York: ACM.
Schachter, S. (1964). The interaction of cognitive and physiological determinants of emotional state. In L. Berkowitz (Ed.), Advances in experimental social psychology (pp. 49 – 79). New York: Academic Press.
Schenk, M. (1987). Medienwirkungsforschung. Tübingen: Siebeck.
Scherer, K. (1984). On the nature and function of emotion. A component process approach. In K. R. Scherer & P. Ekman (Eds.), Approaches to emotion (pp. 293 – 318). Hillsdale: Erlbaum.
Scherer, K. (1988). Criteria for emotion-antecedent appraisal: A review. In Cognitive Perspectives on Emotion and Motivation (pp. 89 – 126). Dordrecht: Kluwer.
Scherer, K. (1993). Neuroscience projections to current debates in emotion psychology. Cognition and Emotion, 7, 1 – 41.
Scherer, K. (1998a). Appraisal theories. In T. Dalgleish & M. J. Power (Eds.), Handbook of cognition and emotion (pp. 637 – 663). Amsterdam: Elsevier Science.
Scherer, K. (1998b). Emotionsprozesse im Medienkontext: Forschungsillustrationen und Zukunftsperspektiven. Medienpsychologie, 10(4), 276 – 293.
Scherer, K. (2001a). Emotional experience is subject to social and technological change: Extrapolating to the future. Social Science Information, 40(1), 125 – 151.
Scherer, K. (2001b). Appraisal considered as a process of multilevel sequential checking. In K. R. Scherer, A. Schorr, & T. Johnstone (Eds.), Appraisal processes in emotion: Theory, methods, research (pp. 92 – 120). Oxford: Oxford University Press.
Scherer, K. (2004). Which emotions can be induced by music? What are the underlying mechanisms? And how can we measure them? Journal of New Music Research, 33(3), 239 – 251.
Scherer, K. (2005). What are emotions? And how can they be measured? Social Science Information, 44(4), 695 – 729.
Scherer, K., Wallbott, H., & Summerfield, A. (1986). Experiencing emotion: A cross-cultural study. Cambridge: Cambridge University Press.
Scherer, K., & Zentner, M. (2001). Emotional effects of music: Production rules. In P. N. Juslin & J. A. Sloboda (Eds.), Music and emotion: Theory and research (pp. 361 – 391). New York: Oxford University Press.
Schmidt, S., & Stock, W. G. (2009). Collective indexing of emotions in images. A study in emotional information retrieval. Journal of the American Society for Information Science and Technology, 60(5), 863 – 876.
Schmidt-Atzert, L., & Ströhme, W. (1983). Ein Beitrag zur Taxonomie der Emotionswörter. Psychologische Beiträge, 2, 126 – 141.
Schwab, F. (2001). Unterhaltungsrezeption als Gegenstand medienpsychologischer Emotionsforschung. Zeitschrift für Medienpsychologie, 13, 62 – 72.
Short, A. (1995). Insight-oriented music therapy with elderly residents. The Australian Journal of Music Therapy, 6, 4 – 18.
Sloboda, J. A., & Juslin, P. N. (2001). Psychological perspectives on music and emotion. In P. N. Juslin & J. A. Sloboda (Eds.), Music and emotion: Theory and research (pp. 71 – 104). New York: Oxford University Press.
Sloboda, J., & Juslin, P. (2005). Affektive Prozesse: Emotionale und ästhetische Aspekte musikalischen Verhaltens. In Allgemeine Musikpsychologie (pp. 767 – 841). Göttingen: Hogrefe.
Soleymani, M., Chanel, G., Kierkels, J., & Pun, T. (2008). Affective ranking of movie scenes using physiological signals and content analysis. Proceedings of the 2nd ACM Workshop on Multimedia Semantics (pp. 32 – 39). New York: ACM.
Tomkins, S. (1962). Affect, imagery, consciousness, Vol. I: The positive affects. New York: Springer Publishing.
Tomkins, S. (1963). Affect, imagery, consciousness, Vol. II: The negative affects. New York: Springer Publishing.
Tomkins, S. (1984). Affect theory. In K. Scherer & P. Ekman (Eds.), Approaches to emotion (pp. 163 – 195). Hillsdale, NJ: Erlbaum.
Valins, S. (1966). Cognitive effects of false heart-rate feedback. Journal of Personality and Social Psychology, 4(4), 400 – 408.
Wang, S., Chen, E., Wang, X., & Zhang, Z. (2003). Research and implementation of a content-based emotional image retrieval model. Proceedings of the Second International Conference on Active Media Technology (pp. 293 – 302).
Wang, W.-N., Yu, Y.-L., & Zhang, J.-C. (2004). Image emotional classification: Static vs. dynamic. Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (pp. 6407 – 6411).
Watson, J. (1913). Psychology as the behaviorist views it. Psychological Review, 20, 158 – 177.
Watson, J. (1919). A schematic outline of the emotions. Psychological Review, 26(3), 165 – 196.
Watson, J. B. (1930). Behaviorism. Chicago: University of Chicago Press.
Watson, J., & Rayner, R. (1920). Conditioned emotional reaction. Journal of Experimental Psychology, 3, 1 – 14.
Weiner, B. (1986). An attributional theory of motivation and emotion. New York: Springer.
Weiner, B., & Graham, S. (1984). An attributional approach to emotional development. In C. E. Izard, J. Kagan, & R. Zajonc (Eds.), Emotions, cognition, and behavior (pp. 167 – 191). New York: Cambridge University Press.
Wirth, W., & Schramm, H. (2005). Media and emotion. Communication Research Trends, 24(3), 1 – 39.
Wundt, W. (1906). Vorlesungen über die Menschen- und Tierseele. Hamburg: Voss.
Xu, M., Jin, J., Luo, & Duan, L. (2008). Hierarchical movie affective content analysis based on arousal and valence features. Proceedings of the 16th ACM International Conference on Multimedia (pp. 677 – 680).
Yang, Y., Liu, C., & Chen, H. (2006). Music emotion classification: A fuzzy approach. Proceedings of the 14th International Conference on Multimedia (pp. 81 – 84).
Yang, D., & Lee, W. (2004). Disambiguating music emotion using software agents. Proceedings of the 5th International Conference on Music Information Retrieval (pp. 52 – 58).
Yang, Y., Lin, Y., Su, Y., & Chen, H. (2008). A regression approach to music emotion recognition. IEEE Transactions on Audio, Speech and Language Processing, 16(2), 448 – 457.
Zeitlin, D., & Westwood, R. (1986). Measuring emotional response. Journal of Advertising Research, 26, 34 – 44.
Zillmann, D. (1978). Attribution and misattribution of excitatory reactions. In J. H. Harvey, W. J. Ickes, & R. F. Kidd (Eds.), New directions in attribution research (pp. 335 – 370). Hillsdale, NJ: Erlbaum.
Zillmann, D. (1988). Mood management through communication choices. American Behavioral Scientist, 31(3), 327 – 341.
Tobias Siebenlist, Kathrin Knautz
Chapter 15. The critical role of the cold-start problem and incentive systems in emotional Web 2.0 services
Abstract: Some content in multimedia resources can depict or evoke certain emotions in users. The aim of Emotional Information Retrieval (EmIR) is to identify knowledge about emotional-laden documents and to use these findings in a new kind of World Wide Web information service that allows users to search and browse by emotion. The cold-start problem arises in systems where users need to rate or tag documents but there is not yet sufficient information to draw inferences about these documents from existing ratings or tags. The cold-start problem can be divided into three situations: cold-start system, cold-start user and cold-start item. In this paper we address the first and the last of these. We discuss the cold-start problem of MEMOSE, a specialized search engine for emotional-laden multimedia documents. We propose a solution with two parts: content-based features help overcome the start-up phase, and an incentive system gets users motivated to use MEMOSE and keeps them motivated.
Keywords: Emotional information retrieval, indexing, retrieval, content-based information retrieval, concept-based information retrieval, cold-start, incentive system, gamification, Media EMOtion Search, MEMOSE
Tobias Siebenlist (corresponding author), Research Associate and PhD Student, Department of Information Science, Heinrich-Heine-University, [email protected] Kathrin Knautz, Research Associate and PhD Student, Department of Information Science, Heinrich-Heine-University
Introduction
In the early days of the World Wide Web in the 1990s, only a few experts were able to make content available online. In 1996 there were about 250,000 sites and 45 million users worldwide. Although these sites were accessible online, their use was passive in nature: publishing content was technically complex, so user-generated content was rather uncommon. This changed with the emergence of
Chapter 15. The critical role of the cold-start problem
new and easy-to-use services at the beginning of the 21st century, which allowed users to generate and publish content online by themselves. Using PCs and laptops, and increasingly mobile phones, smartphones, netbooks or tablet computers, users can nowadays produce content such as videos, images and music at almost any place and time and share it with the online community. Some content in multimedia resources can depict or evoke certain emotions in users. The aim of Emotional Information Retrieval (EmIR), and of our research, is to identify knowledge about emotional-laden documents and to use these findings in a new kind of World Wide Web information service that allows users to search and browse by emotion. For this purpose, the specialized search engine MEMOSE was developed (Knautz, Siebenlist, & Stock, 2010; Knautz, Neal, Schmidt, Siebenlist, & Stock, 2011). MEMOSE is the first prototype that attempts to index emotions in multimedia documents based on user input. Users select emotions from a controlled vocabulary of ten basic emotions and adjust the intensity of each emotion with slide controls. Studies show that users are able to index images (Schmidt & Stock, 2009) and videos (Knautz et al., 2010) consistently. Even so, MEMOSE faces a cold-start problem: if a new document is added to the system, how can it be found when it has not yet been rated sufficiently? Until a certain number of users have tagged the document emotionally, it is not possible to determine which emotions apply to it. We present an approach to this problem that combines content-based and concept-based methods. Content-based information retrieval works with low-level features contained in the document itself (e.g., the colour and shape of images). Concept-based retrieval, in contrast, relies on additional textual information such as tags.
Additionally, we consider implementing an incentive system (Roinestad et al., 2009; Wash & Rader, 2007) in MEMOSE in order to motivate users to rate more documents.
1 MEMOSE – a specialized search engine for emotional-laden multimedia documents
MEMOSE is a search engine that focuses on multimedia documents (images, videos and music) and their emotional indexing (Knautz et al., 2010; Knautz et al., 2011). Content is added solely by the users of MEMOSE. Multimedia documents can be imported from supported web services via their Application Programming Interfaces (APIs) or added through an upload form. For retrieval purposes the documents need to be tagged. Either the
Siebenlist, Knautz
tags are retrieved from the supported web service or the uploading user assigns tags in the upload form. What sets MEMOSE apart is its focus on emotions. When a new multimedia document is added to the system, it needs to be tagged emotionally. We use a controlled vocabulary of 10 basic emotions (sadness, anger, fear, disgust, shame, surprise, desire, happiness, fun and love). Users assign them with slide controls ranging from 0 to 10, where 10 represents the highest intensity of the respective emotion. Furthermore, we distinguish between the emotion that is depicted (what is displayed) and the emotion that is felt (what the user feels while interacting with the document). These choices lead to 20 slide controls (10 emotions × 2 perspectives), allowing the user to make accurate decisions about the emotions represented within a multimedia document. When searching for a document, the user can enter search terms and choose one or more emotions to search for. The search terms are compared with the documents’ tags, and the emotional values are compared using the average of the users’ ratings. Currently, the relevance ranking depends solely on the emotional intensity that users assign to the documents through emotional tagging. A document has to be emotionally tagged by a defined number of users in order to appear in the search results. When MEMOSE is launched as a publicly available service, hardly any results can be found, because the documents have not yet been rated by enough users to yield reasonable results. Moreover, since MEMOSE depends on the documents contributed by its users, we need mechanisms to overcome the start-up phase until the service is established with usable content. At launch, the database contains no documents, which makes the service unusable: a search engine in which nothing can be found serves no purpose.
Developers and students have added some sample data to MEMOSE, but satisfactory results require real users and their content. Beyond contributing content, users also need to tag the available documents emotionally. They must therefore be motivated to use MEMOSE regularly, to provide new content and to tag documents with emotions.
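The search procedure described above can be sketched in Python. This is a hypothetical sketch, not MEMOSE’s code. `MIN_RATERS = 3` is an invented stand-in for the unspecified “defined number of users”. The sketch also assumes that a document matches only if every search term appears among its tags. When several emotions are queried, their average intensities are simply averaged again. Passing `mode="depicted"` or `mode="felt"` produces the two separate results lists that the system keeps apart.

```python
from statistics import mean

MIN_RATERS = 3  # hypothetical value for "a defined number of users"


def average_intensity(ratings, emotion, mode):
    """Mean slider value (0..10) for one emotion and mode across all raters."""
    return mean(r.get((emotion, mode), 0) for r in ratings)


def search(docs, terms, emotions, mode, min_raters=MIN_RATERS):
    """Return (doc_id, score) pairs, highest emotional intensity first.

    docs: doc_id -> {"tags": set of str, "ratings": list of {(emotion, mode): value}}
    """
    hits = []
    for doc_id, doc in docs.items():
        if not set(terms) <= doc["tags"]:
            continue  # every search term must match one of the document's tags
        if len(doc["ratings"]) < min_raters:
            continue  # too few emotional ratings yet: a cold-start item stays hidden
        score = float(mean(average_intensity(doc["ratings"], e, mode) for e in emotions))
        if score > 0:
            hits.append((doc_id, score))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits


fear_shown = {("fear", "depicted"): 9, ("fear", "felt"): 1}
fear_felt = {("fear", "depicted"): 2, ("fear", "felt"): 7}
docs = {
    "a": {"tags": {"dog"}, "ratings": [fear_shown] * 3},
    "b": {"tags": {"dog"}, "ratings": [fear_felt] * 3},
    "c": {"tags": {"dog"}, "ratings": [{("fear", "depicted"): 10}]},  # one rater only
    "d": {"tags": {"cat"}, "ratings": [fear_shown] * 3},
}
print(search(docs, ["dog"], ["fear"], "depicted"))  # [('a', 9.0), ('b', 2.0)]
print(search(docs, ["dog"], ["fear"], "felt"))      # [('b', 7.0), ('a', 1.0)]
```

Document `c` illustrates the cold-start item. Its single rating gives it the highest depicted-fear value, but it stays invisible until enough users have rated it.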
2 The cold-start problem
When rolling out a new service to the public on the Web, we have to deal with the question: how can users be attracted and motivated to use the service actively and to spread the word about it? Especially when the service builds upon user-generated content and the users’ activities, it needs active users in order to be successful. Starting a new service without having a sufficient amount of
user-generated data to draw inferences about the content is known as the cold-start problem. According to Park, Pennock, Madani, Good, and DeCoste (2006), three cold-start situations exist: cold-start system, cold-start user, and cold-start item. The first describes a new service that starts with only small numbers of users and documents. The cold-start user situation exists when a new user registers and the service has little or no knowledge about that user. The third situation arises when a new document is added to the service and has no more than one emotional rating, given by the user who added it. The first and the last are the cold-start problems we have to deal with in MEMOSE. The cold-start user situation can be neglected in MEMOSE, because retrieval is based not on information about the users but on the emotional ratings that users give to the documents. The approach described in this paper addresses both the cold-start system and the cold-start item situations. When a new collaborative service appears on the Web, the number of users is usually small. The service must become known in order to attract more users. This is even more true if the service relies on the activities of its users, as MEMOSE does: people need to add media documents and tag them in order to make MEMOSE more interesting and useful for other users. The cold-start problem describes a lack of knowledge about items (users, documents, etc.) that prevents the service from making recommendations about these items. For MEMOSE, this raises the question of how a newly added document can be found before it has been rated sufficiently. Until a certain number of users have tagged the document emotionally, its appropriate emotions cannot be determined, and no further inferences or classifications can be drawn for it.
Users therefore need to be motivated to tag the documents and to use the service actively. To this end, we combine content-based and concept-based methods. In this part of the paper, we concentrate on content-based retrieval and the use of extracted features to support the users’ choices.
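One simple way for content-based information to support a sparsely rated document is to treat a content-based estimate as a prior and shrink the users’ average toward it. The estimate then dominates while few ratings exist and fades as ratings accumulate. The chapter does not specify such a formula. The Bayesian-average style blend below is an assumption for illustration. Both the `content_estimate` input (an intensity predicted from low-level features) and the `prior_weight` parameter are hypothetical.

```python
def blended_intensity(user_values, content_estimate, prior_weight=3.0):
    """Blend user ratings with a content-based estimate (hypothetical scheme).

    user_values: slider values (0..10) given by users so far; may be empty
    content_estimate: intensity (0..10) predicted from low-level features
    prior_weight: how many user ratings the content-based estimate is "worth"
    """
    if prior_weight <= 0:
        raise ValueError("prior_weight must be positive")
    n = len(user_values)
    return (sum(user_values) + prior_weight * content_estimate) / (n + prior_weight)


print(blended_intensity([], 6.0))                  # 6.0: no ratings, estimate only
print(blended_intensity([10], 6.0))                # 7.0: the first rating pulls it up
print(round(blended_intensity([2] * 30, 8.0), 2))  # 2.55: many ratings dominate
```

With such a blend, a new document could receive a provisional score from the moment it is added. It would no longer need to stay hidden until the rater threshold is reached.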
3 Content-based retrieval
The field of content-based retrieval (CBR) deals with the extraction of so-called “features” from multimedia documents. These features are characteristics of the documents and can be used to index, search and find those documents, for example within a retrieval system. In contrast to concept-based retrieval, which uses additional information in textual form such as tags, controlled terms or notations, CBR indexes the documents automatically from their own content, so no manual intervention is needed and it can be built into an automated indexing process. Besides the automatic indexing of textual data, the field of multimedia information retrieval deals with indexing multimedia documents (images, videos, music tracks) using these characteristics. One application of CBR is to find multimedia documents by using similar examples as the query. A notable problem is the huge amount of data that has to be processed in order to extract features from multimedia documents, which makes this a complex and time-consuming process. The features that can be extracted from characteristics within the documents are called low-level features. MEMOSE supports three multimedia types: images, videos and music. Frequently used features for images are colour, texture and shape. For videos, scenes and shots are used, and frames extracted from videos can be described with the same features as images. From music, features like pitch, rhythm and harmony can be extracted. In this paper, we focus on content-based image retrieval (CBIR) as a first step towards using content-based information in MEMOSE; accordingly, the following discussion concerns images only. According to Knautz et al. (2011, p. 218), “the information which can be deduced from these low-level features is limited, and therefore only partly suitable for an analysis of the semantic and emotional content”. Due to this limitation, we propose using content-based retrieval not as the single solution, but as a supportive element in the whole process of indexing and retrieval. A common CBIR method is to create summary statistics about an image document. These statistics represent, for example, colour usage, texture composition, shape, structure, and further values.
Siebenlist, Knautz
Regarding the features of an image, a distinction has to be made between global features and local features (Al-Khatib, Day, Ghafoor, & Berra, 1999). A global feature is a value extracted from the whole image, with every pixel included. Local features, in contrast, represent not the whole image but a smaller region of it.
3.1 Available features for content-based retrieval
The basic features that can be extracted from images are colour, texture, shape and spatial information. These low-level features are often used when working in the field of CBIR. We give a short description of these features below, before explaining which features are used in our approach. One limitation of this work is the possible presence of noise: the images are processed as they are.
Chapter 15. The critical role of the cold-start problem
3.1.1 Colour
The term “colour” describes light that comes from some kind of source, either directly or by being reflected or transmitted. Human perception of colour is three-dimensional and is modelled as a mixture of red, green and blue (RGB). Every pixel of an image has a colour value; thus, colour is a property of a pixel. According to Jörgensen (2003), “a color histogram is a representation of the distribution of colors in the image”. A histogram is a graphical representation of the distribution of data; a colour histogram of an image visualizes its tonal distribution and gives an estimate of how the colours are distributed in the image. Colour is a very obvious and very important feature of an image. Additionally, the colour values of an image can be extracted at low computational cost. Multimedia documents with similar content often have similar or identical colours (Jörgensen, 2003). For example, a polar bear has about the same colour in every image, though the surroundings can differ. Narrowing the setting down further to a fixed one (e.g. a polar bear with the sea in the background) leads to more similar histograms for different images. Colour histograms are described as one of the most basic features, and they are widely used in CBIR (Deselaers, Keysers, & Ney, 2008).
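A minimal Python sketch of a colour histogram, assuming pixels are given as (r, g, b) tuples with channel values 0 to 255 (with Pillow, `Image.open(path).convert("RGB").getdata()` yields such tuples). The quantisation into `bins_per_channel` bins per channel is an illustrative choice, not the binning used in MEMOSE.

```python
from typing import Iterable, List, Tuple

Pixel = Tuple[int, int, int]  # (r, g, b), each channel in 0..255


def colour_histogram(pixels: Iterable[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Joint RGB histogram with bins_per_channel**3 bins, normalised to sum to 1."""
    n = bins_per_channel
    hist = [0] * (n ** 3)
    count = 0
    for r, g, b in pixels:
        # Quantise each channel to 0..n-1, then combine the three indices into one bin.
        rq, gq, bq = r * n // 256, g * n // 256, b * n // 256
        hist[(rq * n + gq) * n + bq] += 1
        count += 1
    # Normalising makes histograms of images with different sizes comparable.
    return [h / count for h in hist] if count else [0.0] * len(hist)
```

Coarse binning keeps the histogram small and tolerant of slight colour shifts, at the price of merging perceptually distinct shades.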
3.1.2 Texture
An image can be described as a composition of different textures, which are distributed in regions within the image. These regions contain image features that can be extracted and used for indexing and retrieval. Every texture describes a pattern in an image, characterized by criteria like repetition, orientation and complexity. When extracting texture features from an image, the aim is to determine dominant components. Whereas colour can be extracted for every single pixel of an image, texture is a property of a region of pixels. Tamura et al. (1978) determined through psychological studies that humans respond best to coarseness, contrast, and directionality. The so-called Tamura textures deal with these three features. They are computed separately and can then be combined, for example, into a false-colour image in which each feature is mapped to one channel of the RGB scale.
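As an illustration of one Tamura feature, the sketch below computes contrast for a region of grey values using the usual definition: the standard deviation divided by the fourth root of the kurtosis, F_con = σ / α₄^(1/4) with α₄ = μ₄ / σ⁴. Representing a region as a flat list of grey values is an assumption for brevity; coarseness and directionality are omitted.

```python
import math
from typing import Sequence


def tamura_contrast(region: Sequence[float]) -> float:
    """Tamura contrast of a region of grey values: sigma / kurtosis**(1/4)."""
    n = len(region)
    mu = sum(region) / n
    var = sum((x - mu) ** 2 for x in region) / n
    if var == 0:
        return 0.0  # a uniform region has no contrast
    mu4 = sum((x - mu) ** 4 for x in region) / n
    kurtosis = mu4 / var ** 2  # alpha_4 = mu_4 / sigma^4
    return math.sqrt(var) / kurtosis ** 0.25
```

Dividing by the kurtosis term lowers the score for regions whose spread comes from a few extreme values rather than from a broad distribution of grey levels.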
3.1.3 Shape
Shapes within images describe objects with a physical structure. Using shape features, different shapes can be compared by means of their contour and region. To determine shapes in an image, segmentation or edge detection and analysis are frequently used. Due to lighting effects and the masking of objects, it is often impossible to determine the shape of an image's content reliably. Shapes can occur in many different forms. Global shape features are derived from the entire shape, for example roundness or circularity. Local features are computed by processing the shape and yield information like the size of the shape or its corners. The existence of a shape, and the differences between shapes, can be detected in images by searching for changes in signal intensity. Pavlidis (1988) describes the following five situations in which a change in signal indicates a shape:
1. Change in reflectivity of a surface
2. Change in orientation of a surface
3. Combined changes in the relation of the object and the viewer
4. Direct occlusion of one object by another
5. Indirect occlusion (shadow) of one object by another
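Circularity, mentioned above as a global shape feature, is commonly computed as 4πA/P², which equals 1 for a perfect circle and decreases for elongated or jagged shapes. The sketch assumes the shape's contour is already available as a closed polygon (for example, from segmentation or edge detection); area comes from the shoelace formula and perimeter from summed edge lengths.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def circularity(contour: List[Point]) -> float:
    """4*pi*area / perimeter**2 for a closed polygon given by its vertices."""
    area2 = 0.0  # twice the signed area (shoelace formula)
    perimeter = 0.0
    for i, (x0, y0) in enumerate(contour):
        x1, y1 = contour[(i + 1) % len(contour)]  # wrap around to close the polygon
        area2 += x0 * y1 - x1 * y0
        perimeter += math.hypot(x1 - x0, y1 - y0)
    area = abs(area2) / 2
    return 4 * math.pi * area / perimeter ** 2 if perimeter else 0.0
```

A unit square scores π/4 (about 0.785), while a polygon approximating a circle with many vertices approaches 1.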
3.1.4 Spatial information
An image as a whole displays some kind of content. The features discussed so far treat the image as one big entity, even when it is split into shapes or contains different textures. Spatial information can be used to identify different areas that carry different importance. For these areas of interest, separate histograms can be computed. A popular and quite promising idea for detecting local areas is to compute points of interest, that is, salient points in local areas of the image. A technique for detecting and encoding such local features is the scale-invariant feature transform (SIFT) developed by Lowe (1999, 2004); the resulting features are called SIFT features. To compute them, candidate points need to be found, orientations need to be assigned, and local reference coordinate systems need to be built around the points. The final SIFT features are extracted from the area around the candidate points, preserving information about this local reference, so that they can be encoded in a scale-, rotation- and location-invariant manner. According to Rüger (2010, p. 57), “a typical image exhibits in the order of 2000 key-points”.
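One simple way to add spatial information is to divide the image into regions and compute a separate histogram for each, as described above. The sketch below uses a fixed grid over a greyscale image (a list of rows of values 0 to 255); the grid size and number of bins are illustrative assumptions, and SIFT key-point detection is not shown.

```python
from typing import List


def grid_histograms(image: List[List[int]], grid: int = 2, bins: int = 4) -> List[List[float]]:
    """Normalised grey-value histogram for each cell of a grid x grid partition (row-major)."""
    height, width = len(image), len(image[0])
    cells = []
    for gy in range(grid):
        y0, y1 = gy * height // grid, (gy + 1) * height // grid
        for gx in range(grid):
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            hist = [0] * bins
            for y in range(y0, y1):
                for x in range(x0, x1):
                    hist[image[y][x] * bins // 256] += 1  # quantise 0..255 into bins
            n = (y1 - y0) * (x1 - x0)
            cells.append([h / n if n else 0.0 for h in hist])
    return cells
```

Comparing images cell by cell distinguishes, for example, a blue sky above a white field from the reverse layout, which a single global histogram cannot do.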
3.1.5 Face detection
On a higher level of feature extraction, faces can be detected reliably in images using the open-source library OpenCV (Open Source Computer Vision). Whenever an image contains faces that show at least part of the front of the face, the faces can be recognized, and the corresponding areas of the image are returned as results. Given the coordinates of a face, this area can be extracted and investigated further; within the face, the eyes can easily be recognized as prominent points. Several approaches to automatic emotion recognition have used facial expressions as a basis for gathering information about a person's emotions (Black & Yacoob, 1995; Essa, 1997; Mase, 1991; Tian, Kanade & Cohn, 2000; Yacoob & Davis, 1994). Azcarate, Hageloh, van de Sande, and Valenti (2005) describe a method for recognizing a person's facial emotion by processing video content and extracting motion units, which are compared to a fixed list of sample motion units. Chia (2010) works with images and uses a combination of support vector machines and multinomial logistic regression to compare and evaluate relevant extracted features. Generally, the positions and states of facial elements such as eyes, eyebrows and lips are extracted and compared to a sample image that prototypically expresses an emotion. According to Chia (2010, p. 1), “[f]acial expression recognition is concerned with the recognition of certain facial movements without attempts to determine or presume about the underlying emotional state of the agent”. This approach does not address the problem that facial expressions may result from physical exertion, or that more than visual data is needed to analyse an emotion. But in a system like MEMOSE, visual data is the only data we can build upon, so we neglect this issue here. Both approaches use the Cohn-Kanade database (Kanade, Cohn & Tian, 2000) as their dataset. This dataset consists of 500 image sequences from 100 subjects.
Accompanying metadata include annotations of FACS (Facial Action Coding System) action units and emotion-specified expressions. The database can be obtained free of charge for research and non-commercial use.
3.2 Deciding which features to use
Deselaers et al. (2008) carried out an experimental comparison of features for image retrieval, giving an overview of many image features, including the ones presented here. Their performance evaluation with five different benchmark image databases for CBIR showed that the colour histogram performed very well for all colour tasks (Deselaers et al., 2008, p. 15). According to Deselaers et al. (2008, p. 15), “color histograms … clearly are a reasonably good baseline for general color photographs.” Some approaches (mainly SIFT-based ones) outperformed the colour histogram approach, but they come at a much higher computational cost, whereas the colour histogram requires only minimal computation. Because of these strong results and the low computational cost, we decided to use colour histograms in MEMOSE. The low cost allows the service to compute the histogram immediately after an image document has been uploaded, so that the image can be used for recommendations right away, without waiting for an image processing period. Another interesting approach is the use of SIFT features; due to their higher computational cost and more complex image processing, they have not yet been implemented in MEMOSE. As we do not restrict the content of the documents to a specific domain, face detection algorithms are only rarely useful. Nevertheless, all images are examined automatically by a regularly repeating task. If faces are found, this information is stored in the database together with the coordinates where the faces occur in the image. Because some of these detections are false positives, users can report images in which faces have been recognized falsely.
3.3 Finding emotions in low-level features
Many multimedia documents depict emotions or evoke them in the observer. Because MEMOSE is focused on emotions, we briefly discuss emotions in the context of low-level features. We have already shown that there are numerous low-level features that can be extracted automatically from multimedia documents at higher or lower computational cost. Higher-level features (e.g. objects, names or even emotions) are much more difficult to extract automatically, because an observer who looks at or listens to a multimedia document understands it by connecting it with knowledge gathered previously. This difficulty is known as the semantic gap (Smeulders, Worring, Santini, Gupta, & Jain, 2000; Zhao & Grosky, 2001). Bridging this gap is one of the biggest and most difficult challenges in multimedia retrieval. According to Wang and Wang (2005, p. 1), “emotion semantics is the most abstract and highest, which is usually described in adjective form”. Wang and Wang further explain that emotions are closely related to cognitive models, cultural backgrounds and aesthetic standards. The interpretation of extracted low-level features like colours, shapes and textures is based on their significance in art (Arnheim, 1983; Itten, 1973). A typical approach to using knowledge from this field is to implement these connections as algorithms that recognize and categorize the low-level features accordingly. For example, “horizontal lines always associate with static horizon and communicate calmness and relaxation” (Wang, Yu, & Zhang, 2004, p. 6407). Wang and Wang (2005) mention four issues in emotion-based semantic image retrieval: performing feature extraction, determining users' emotions in relation to images, creating a usable model of emotions, and allowing users to personalize the model. These issues are widely noted in the literature on content-based work: “defining rules that capture visual meaning can be difficult” (Colombo, Del Bimbo, & Pala, 1999, p. 52). Further, Dellandrea, Liu, and Chen (2010) mention three issues in classifying the emotion present in images: “emotion representation, image features used to represent emotions and classification schemes designed to handle the distinctive characteristics of emotions” (para. 1). The methods developed in the field of content-based emotional image retrieval include Support Vector Machines (SVM) (Wang et al., 2003; Wang et al., 2004), machine learning (Feng et al., 2010), fuzzy rules (Kim et al., 2005), neural networks (Kim et al., 2007; Kim et al., 2009; Dellandrea et al., 2010), and evidence theory (Dellandrea et al., 2010). While most of these studies report promising results, the systems have received very little testing with users before being reported in research. Additionally, personal preferences and individual differences are not incorporated into the work (Knautz et al., 2011). As previously mentioned, our content-based approach concentrates on colour features and thus on colour histograms, because their extraction has low computational cost and low complexity.
Thus the extracted histograms can be used in the service directly after computation, without waiting for processing with more elaborate methods. As users tag documents with emotions, we can start using learning algorithms that are trained on the users' tagging behaviour, which could lead to better recommendations and improved relevance ranking. This needs to be investigated in a study with real users and real data, so that the learning can be based on a sufficient number of users who have tagged the documents with emotions.
4 Similarity measures
Since we have decided to use the colour histogram approach to extract features from images, the next step is to choose a similarity measure for comparing the images. According to Jörgensen (2003), the most common similarity metrics for evaluating colour similarity are histogram intersection and weighted distance between colour histograms. But before we concentrate on such elaborate methods, we need to start with rather basic similarity measures. The Minkowski or $L_p$ distance is, for $p \ge 1$, a metric on Euclidean space and is defined as follows for $x, y \in \mathbb{R}^d$:
$$L_p(x, y) = \left( \sum_{i=1}^{d} |x_i - y_i|^p \right)^{1/p}$$
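The Minkowski distance translates directly into Python. This is a straightforward sketch of the formula above, with input checks added as our own choice.

```python
from typing import Sequence


def minkowski(x: Sequence[float], y: Sequence[float], p: float = 2.0) -> float:
    """L_p distance between two equally long vectors (a metric for p >= 1)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension d")
    if p < 1:
        raise ValueError("p must be >= 1 for L_p to be a metric")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)
```

For example, `minkowski([0, 0], [3, 4], 1)` gives the Manhattan distance 7.0, and `p=2` gives the Euclidean distance 5.0.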
Special cases of the Minkowski distance are $L_1$, known as the Manhattan distance, and $L_2$, known as the Euclidean distance. Both are widely used to measure similarity between objects. They treat each dimension (image feature) as independent of the others and of equal importance. Thus, every bin of the colour histogram is treated independently, and these measures do not account for the fact that certain pairs of bins correspond to features which are perceptually more similar than others (Suhasini et al., 2009). Nevertheless, the Minkowski distance and its special cases are used in many fields of application, not only for comparing images or their histograms. Another common measure in information retrieval is the cosine similarity, defined as follows:
$$d_{\cos}(v, w) = \frac{v \cdot w}{L_2(v)\, L_2(w)}$$
where $L_2(v)$ denotes the Euclidean length of $v$. The cosine similarity measures the similarity between two vectors by the cosine of the angle between them. This kind of measure is often used in text mining and text statistics as well as in data mining. The next measure is designed specifically for histograms, the partial histogram intersection:
$$d_{\mathrm{phi}}(v, w) = \frac{\sum_i \min(v_i, w_i)}{\max(L_1(v), L_1(w))}$$
Here, v and w are not necessarily normalized histograms, but they should at least be in the same range to yield usable results. The components of v and w are expected to be non-negative (Rüger, 2010). Deselaers et al. (2008) follow Puzicha et al. (1999) in comparing histograms using the Jeffrey divergence (JD) or Jensen-Shannon divergence (JSD), which is defined as follows:
$$d_{\mathrm{JSD}}(H, H') = \sum_{m=1}^{M} \left( H_m \log \frac{2 H_m}{H_m + H'_m} + H'_m \log \frac{2 H'_m}{H'_m + H_m} \right)$$
where H and H' are the histograms to be compared and $H_m$ is the mth bin of H. The Jeffrey divergence is the symmetric and stable version of the Kullback-Leibler (KL) divergence. The KL divergence is regarded as a measure of the extent to which two probability density functions agree. It was suggested by Ojala et al. (1996) as an image dissimilarity measure, and it measures how inefficient, on average, it would be to code one histogram using the other as the true distribution. Choosing the right similarity measure for a task is not obvious. The results of the presented measures need to be examined within MEMOSE after an evaluation period in which users are faced with recommendations based on the different measures. Since Deselaers et al. (2008) obtained very good results using the Jensen-Shannon divergence, we took this measure as our first choice. The results for recommendations and relevance ranking using this measure seem promising, but due to the limited scope of colour histogram comparison, the recommendations, as well as the documents added to the list of results, are not always understandable and reasonable to the users.
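The three measures defined above can be sketched in a few lines of Python. Note the different directions: cosine similarity and partial histogram intersection grow with similarity, whereas the Jensen-Shannon divergence is a dissimilarity (0 for identical histograms). The natural logarithm and the treatment of empty bins (0 · log 0 = 0) are our assumptions; the formulas leave the log base open.

```python
import math
from typing import Sequence


def cosine_similarity(v: Sequence[float], w: Sequence[float]) -> float:
    """d_cos(v, w) = v.w / (L2(v) * L2(w)); 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(v, w))
    norm_v = math.sqrt(sum(a * a for a in v))
    norm_w = math.sqrt(sum(b * b for b in w))
    return dot / (norm_v * norm_w) if norm_v and norm_w else 0.0


def partial_histogram_intersection(v: Sequence[float], w: Sequence[float]) -> float:
    """Sum of bin-wise minima divided by the larger L1 mass; bins must be >= 0."""
    overlap = sum(min(a, b) for a, b in zip(v, w))
    larger = max(sum(v), sum(w))  # the L1 norm of a non-negative vector is its sum
    return overlap / larger if larger else 0.0


def jensen_shannon(h: Sequence[float], k: Sequence[float]) -> float:
    """d_JSD(H, H') as written above (natural log); 0 for identical histograms."""
    total = 0.0
    for a, b in zip(h, k):
        # Empty bins contribute nothing (0 * log 0 is taken as 0).
        if a > 0:
            total += a * math.log(2 * a / (a + b))
        if b > 0:
            total += b * math.log(2 * b / (b + a))
    return total
```

Two normalized histograms with no overlapping bins reach the maximum divergence of 2 log 2 under this definition.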
5 Using content-based features to solve the cold-start problem
5.1 Extraction and storage of the content-based features
Since we decided to use colour histograms as the main content-based feature in MEMOSE, we need to extract them from the image documents. This is achieved by computing the colour histogram of every image in the database. The resulting RGB histogram values are saved in the database so that they are accessible when needed. Right after that, a background task is started in which the new colour histogram is compared to every other colour histogram in the database. Having chosen one of the presented measures of similarity between two colour histograms, we can simply apply the respective formula to the stored values. Because the similarity values are needed in many search queries and will not change, they are also stored permanently in the database. As the database grows and computing the similarity values against every existing image document takes longer, this comparison is delegated to a background task that runs after a new image has been inserted, invisibly to the user. After the process has finished, the newly added image and its colour histogram are available for use within the service. The histogram for each image in the MEMOSE database is created using Python and the Python Imaging Library (PIL). All histograms are normalized. The histogram of each image consists of an array of integers that describe the colour values on an RGB scale. Another feature we decided to use is face detection with OpenCV. Every document that contains one or more faces is marked in the database, and the coordinates of every detected face are stored. For these faces, we try to detect facial expressions and match them to emotions. So far, we have not used the Cohn-Kanade database; its use, together with further algorithms for detecting facial expressions of emotion, will be implemented later.
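The storage step can be sketched as follows. This is a hypothetical, in-memory Python illustration, not MEMOSE's code: a dictionary stands in for the database, `add_image` runs synchronously where MEMOSE uses a background task, and the Jensen-Shannon divergence (smaller means more similar) serves as the measure, following the choice described in Section 4.

```python
import math
from typing import Dict, List, Sequence, Tuple


def jensen_shannon(h: Sequence[float], k: Sequence[float]) -> float:
    """Jensen-Shannon divergence as defined in Section 4 (natural log)."""
    total = 0.0
    for a, b in zip(h, k):
        if a > 0:
            total += a * math.log(2 * a / (a + b))
        if b > 0:
            total += b * math.log(2 * b / (a + b))
    return total


class SimilarityStore:
    """Hypothetical in-memory stand-in for the table of pairwise histogram distances."""

    def __init__(self) -> None:
        self.histograms: Dict[str, List[float]] = {}
        self.distances: Dict[Tuple[str, str], float] = {}

    def add_image(self, image_id: str, histogram: Sequence[float]) -> None:
        # Compare the new histogram with every stored one; the values never
        # change afterwards, so they are computed once and kept.
        for other_id, other in self.histograms.items():
            d = jensen_shannon(histogram, other)
            self.distances[(image_id, other_id)] = d
            self.distances[(other_id, image_id)] = d  # the divergence is symmetric
        self.histograms[image_id] = list(histogram)

    def nearest(self, image_id: str, k: int = 3) -> List[str]:
        """IDs of the k stored images with the smallest distance to image_id."""
        pairs = sorted((d, other) for (src, other), d in self.distances.items() if src == image_id)
        return [other for _, other in pairs[:k]]
```

Precomputing all pairs costs one comparison per existing image on insertion, but turns every later lookup into a simple read, which is the trade-off described above.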
5.2 Using content-based features for relevance ranking
The goal of this task is not to identify emotion-specific histograms, but to find images similar to those that have already been tagged emotionally, and to let the user decide whether the images are indeed similar. As described before, documents appear in the results only if a certain number of users have tagged them emotionally. This raises a problem: how can these documents be tagged by a sufficient number of users when they cannot be found? Our solution has two parts. First, new documents that have not yet been rated by a sufficient number of users are presented in a special area of the website, marked as “new documents” or “tag these documents”. Besides the documents recently added to the system, users get an overview of all documents that they have not tagged yet and that have not been tagged by enough users to enter the list of results. The second part uses the content-based features described earlier. Placing these images in the list of results gives them a better chance to attract attention and to get more users to tag them. When the user starts a query, the result is computed from the search terms and the chosen emotions. Before the ranked list of results is returned to the user interface, it is extended with documents whose colour histograms are similar to those of the retrieved results. As mentioned before, the comparison of every document's colour histogram with those of all other documents is stored in the database, so that a query can be processed very quickly. A script controls the addition of similar images. Only documents that reach a certain similarity to the documents retrieved for the query are added to the list of results. The required degree of similarity can be adjusted by a variable parameter, whose best value needs to be found by evaluation. If no document reaches this similarity threshold, a small number of documents are still added to the results, namely those with the highest similarity values below the threshold. Thus the results always contain at least some documents that need to be tagged by more users to become available as search results. Figure 1 visualizes this process.
Figure 1. Relevance ranking with addition of similar documents.
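The enrichment step shown in Figure 1 might look like the following Python sketch. It assumes a similarity function where higher values mean more similar (for a divergence such as JSD the comparison would be inverted); `threshold` and `fallback_count` are hypothetical names for the adjustable similarity parameter and the "small number" of fallback documents, whose values would have to come from evaluation.

```python
from typing import Callable, List, Sequence, Tuple


def enrich_results(
    results: Sequence[str],
    candidates: Sequence[str],
    similarity: Callable[[str, str], float],
    threshold: float = 0.8,
    fallback_count: int = 2,
) -> List[str]:
    """Append under-tagged candidates that resemble the retrieved documents."""
    scored: List[Tuple[float, str]] = []
    for cand in candidates:
        if not results or cand in results:
            continue
        # Score each candidate by its best match among the retrieved documents.
        best = max(similarity(cand, doc) for doc in results)
        scored.append((best, cand))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    above = [doc for score, doc in scored if score >= threshold]
    # Fallback: if nothing passes the threshold, still add the closest few.
    added = above if above else [doc for _, doc in scored[:fallback_count]]
    # Added documents follow the original ranking; a full re-ranking is not shown.
    return list(results) + added
```

Here `candidates` would be the documents that have not yet been tagged by enough users, and `similarity` would read the precomputed values from the database.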
As long as a document has not been tagged frequently enough, there is no information about its emotional tagging. Instead of a scale and the arithmetic mean value for the selected emotions, a message asks the user to tag the document. The message's text is adapted to the queried emotions; for example, the following message could be displayed instead of a scale: “Does this image depict or evoke love? Tell us what you think and tag it!” As the comparison of documents is based on colour histograms, documents with a high similarity value have a high probability of displaying similar content. Thus, it makes sense to ask the user whether the same emotions are appropriate for the proposed document. If there are no documents, or too few, to compare a document with, the colour histograms need to be investigated further. For this purpose, the colour distribution of the image can be divided into warm and cold colours, which can then be associated with groups of emotions.
5.3 Content-based recommendations
In addition to relevance ranking, the comparison of colour histograms can be used to make recommendations to users. The recommendations are shown alongside the list of results for a query. As the users' queries are logged, a search profile is built for every user. Using the information in this profile, users get recommendations based on their search behaviour and the emotional tagging they have done so far. The tagging behaviour of different users can be compared in order to find documents that one user has already tagged but a similar user has not; these documents are then recommended to the user who has not yet tagged them. To address the cold-start item situation, documents with too few ratings are recommended for tagging. For these recommendations, both the users' tagging behaviour and the document's image histogram are used.
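The comparison of tagging behaviour could be sketched as a simple user-based collaborative filter. Everything here is an assumption for illustration: each user's tags are reduced to one score per document, users are compared by cosine similarity over the documents both have tagged, and only the single most similar user is considered. The histogram component mentioned above is not included.

```python
import math
from typing import Dict, List, Optional

Tags = Dict[str, Dict[str, float]]  # user -> {document id -> tagging score}


def _overlap_cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity restricted to documents tagged by both users."""
    common = a.keys() & b.keys()
    dot = sum(a[d] * b[d] for d in common)
    norm_a = math.sqrt(sum(a[d] ** 2 for d in common))
    norm_b = math.sqrt(sum(b[d] ** 2 for d in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(target: str, tags: Tags) -> List[str]:
    """Documents tagged by the most similar user but not yet by the target."""
    best_user: Optional[str] = None
    best_sim = 0.0
    for user, ratings in tags.items():
        if user == target:
            continue
        sim = _overlap_cosine(tags[target], ratings)
        if sim > best_sim:
            best_user, best_sim = user, sim
    if best_user is None:
        return []  # no user with overlapping tags: nothing to recommend
    return sorted(doc for doc in tags[best_user] if doc not in tags[target])
```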
5.4 Concluding remarks about the use of content-based features
So far, we have only begun integrating content-based retrieval and colour histogram comparison into MEMOSE. First tests with users show promising results: the recommendations were correct in the majority of cases. But due to the small number of participants and documents, no reliable conclusion can be drawn from these early tests. An extensive evaluation of the new features is in preparation and is necessary to obtain conclusive results. Deselaers et al. (2008) showed that comparison using colour histograms does lead to usable results, although it should not be used as the sole method for giving recommendations. We have shown an approach for using content-based features to solve the cold-start problem. We have to deal with two situations in MEMOSE, cold-start system and cold-start item, and both can be addressed using the techniques described here. Face detection with OpenCV provides good results in most cases. Users are informed that faces have been recognized in an image and can report false positives. However, automatic emotion recognition from facial expressions is a hard task. The extraction of facial elements does work, though, and since we have a sufficient number of images that contain one or more faces, we will pursue further research using the Cohn-Kanade database. Therefore, we propose using the analysis of images with colour histograms as a supportive technique within a broader solution that combines this histogram comparison with an incentive system, which is explained in the following.
6 Incentive system
MEMOSE is designed to enable users to find videos, music and pictures that convey or evoke certain emotions. To facilitate this emotional search, the media files have to be attributed with emotional properties. This is achieved by providing users with a slider-based emotional tagging tool. As a direct consequence of this approach, a high degree of user activity becomes a necessity. This part of the paper presents a multi-layered approach to an incentive system that incorporates concepts from role-playing games and gaming platforms, with the purpose of motivating users to become active members of the community and provide the necessary emotional tagging.
6.1 Depending on active users
When the World Wide Web had just seen the light of day in the early 1990s, very few people could provide online content. While these websites could be accessed online, their use was purely passive, because creating user-generated content was too complex at that time. This changed at the beginning of the 21st century with the creation of new services that are much easier to use. With such services, users are now able to actively generate content instead of simply consuming it. The role of the user thus changed from that of a passive content consumer to that of an active content provider. With smartphones, computers and netbooks, today's users can produce content in the form of videos, images and music at nearly any time and place and distribute it through the online communities of social media services. As on other Web 2.0 platforms, there is a reciprocal, interdependent relationship between MEMOSE and its users. MEMOSE requires active users who provide content and thereby make the platform attractive.
On the other hand, users also want the platform to be a source of multimedia documents or a stage on which they can present their own works. Unfortunately, this dependency is usually unbalanced in Web 2.0. Because users can choose among a huge variety of services, the creator of a platform is forced to convince users of the uniqueness of the service and to provide incentives for its use. This imbalance is not necessarily a bad thing, since it forces the provider to continually improve the platform and thus make it more attractive to users. Active user participation is especially important for MEMOSE, because the emotional search of multimedia documents is largely based on user-generated metadata. Only when new images, videos or music pieces are indexed with a sufficient number of emotional ratings can the underlying concept of the platform unfold.
6.2 Gaming concepts in an information context
By indexing multimedia files emotionally, users provide a service that cannot be automated to the same extent. Besides the previously described extraction of low-level features, we therefore require intellectual indexing. Because a predetermined set of ten emotions is used, the users' emotional indexing follows the categorization model. This is supplemented by the possibility of judging the intensity of an emotion, following the dimension model. We differentiate here between the emotions depicted by the medium and those that a user experiences while looking at it or listening to it (Knautz et al., 2010; Knautz et al., 2011). MEMOSE not only enables users to search for media by these two types of emotions, but also allows them to find out more about the emotional expressiveness of their own media by sharing them with the community. The system thus provides an incentive for users to add their own content. However, tagging other users' content carries no similar incentive. Since indexing in MEMOSE follows a collaborative approach, we therefore have to create incentives for users. In the last few years, game mechanics from digital role-playing games implemented in online services have become a popular method of increasing user participation. Well-known examples are location-based games like foursquare.com, a service in which users transmit their current location through the GPS function of their smartphones. Status messages about the restaurants they have visited or the distances they have covered turn elements of everyday life into a game. Besides supplementary analysis features, users are also rewarded with points and badges for sharing with the community. The goal of such services is to learn, inexpensively, the habits, likes and dislikes of a new generation of users in order to bind these users to a product or brand.
Chapter 15. The critical role of the cold-start problem
Measures like these are often summed up under the controversial term gamification. Critics (e.g. Bogost, 2011; Robertson, 2010) begin with the neologism gamification itself. For them, the term is semantically inflated, since most applications offer only a point system in conjunction with achievements. According to Robertson (2010), “[g]ames give their players meaningful choices that meaningfully impact on the world of the game” (para. 6). In such applications, real game mechanics, desirable goals and rule systems are absent, which is why Robertson prefers the term pointification. Gamification thought leader Gabe Zichermann, on the other hand, defines gamification as “[t]he use of game thinking and game mechanics to engage users and solve problems” (Zichermann & Cunningham, 2011, p. XII). According to this definition, gamification does not mean turning elements of everyday life into a game, but rather using aspects of games successfully in other areas: Gamification makes it possible for big brands and start-ups alike to engage us in meaningful and interesting ways, with an eye on aligning our personal motivations with their business objectives. The net product of this effort will be more engagement, better products and – generally – more fun and play in all spheres of our lives. (p. XII)
On our platform, we therefore also use game mechanics to give users incentives to participate in emotional tagging. After all, it is important to remember that “[p]eople play not because they are personally interested in solving an instance of a computational problem but because they wish to be entertained” (von Ahn & Dabbish, 2008, p. 60).
7 Engage users – multi-layered incentive system
In their paper “Designing games with a purpose”, von Ahn and Dabbish (2008) identify three factors that argue for using game mechanics to generate data. First, a growing proportion of the population has access to the Internet. Second, humans can solve many problems that are (as yet) impossible for computers to solve. Third, people like to play games and spend a lot of time doing so at the computer (von Ahn & Dabbish, 2008). Taken together, these three factors open up the possibility of having the multimedia documents in MEMOSE indexed emotionally by networked individuals. This leaves the question of how we can successfully use the mechanics of gamification to motivate users to participate and interact. It should be noted that we differentiate between the intrinsic and the extrinsic motivation of a person
Siebenlist, Knautz
(Barbuto & Scholl, 1998). If a person is intrinsically motivated, they are doing something that satisfies their own interests, something they enjoy doing; the act is thus autotelic. If, on the other hand, a person is extrinsically motivated, they are doing something because they expect to gain certain advantages or rewards from it. Krystle Jiang (2011) summarizes the relation between gamification and motivation: “More and more companies are converting to gamebased marketing tools and are met with great success. The reason is because game mechanics are strong extrinsic psychological motivators in a domain where there is little to no intrinsic motivation” (p. 2). We can thus say that it is important to use extrinsic motivators in order to address intrinsic needs. The company Bunchball, which specializes in gamification, explains which game mechanics can be used to satisfy specific desires of users (Table 1). As the table shows, each game mechanic addresses one primary desire (points, for example, address reward), but each also covers other motivations (in the case of points: status, achievement, competition and altruism).
                     Human desires
Game mechanics       Reward   Status   Achievement   Self-expression   Competition   Altruism
Points               x        •        •                               •             •
Levels                        x        •                               •
Challenges           •        •        x             •                 •             •
Virtual goods        •        •        •             x                 •
Leader boards                 •        •                               x             •
Gifting & charity             •        •                               •             x
Table 1. Interaction of basic human desires and game play. The “x” signifies the primary desire a particular game mechanic fulfils, and the “•” indicates the other areas that are affected (Bunchball, 2010, p. 9).
In a prototype version of the incentive system of MEMOSE, we integrated points, levels and leader boards in order to cover as many motivations as possible with few mechanics. Below, we will show how the implementation works.
7.1 Points: Hatching a monster, rewarded by experience points
Points form an important, if not vital, part of incentive systems. Points and the rewards connected to them serve to motivate users to interact in the ways the operator desires. This form of game mechanics (token economy) has its
origin in behaviour therapy and is based on the principles of operant conditioning (e.g. Ayllon & Azrin, 1968; Kazdin, 1977). Points are positive, generalized reinforcers, which are used to develop and maintain a certain behaviour. Because points are awarded for every successful output of a user, they become an immediate motivational game mechanic. Points are rewards and provide feedback about past performance. Furthermore, they give users a way to make a name for themselves and to distinguish themselves from others by increasing their point totals. Depending on the criteria by which points are awarded, different kinds of points can be identified. Zichermann and Cunningham (2011) differentiate between five kinds of points: experience points (XP), redeemable points (RP), skill points, karma points and reputation points. XP in particular play an important role. They are awarded for every successful activity and inform both the players and the operator about rank, status and abilities. Redeemable points, on the other hand, work as a sort of currency that users can save up and subsequently spend on certain things. The third kind, skill points, can be earned via specific activities in the system and are closely linked to XP and RP; they function as bonus points awarded for meta-successes. According to Zichermann and Cunningham, karma points are used only rarely. They can be earned through evaluations of contributions and can be spent on polls for system improvements. The last and most complex kind of point that Zichermann and Cunningham (2011) mention is reputation points, which are awarded for receiving some form of positive evaluation from other users (e.g. a positive critique of the user's pictures). Integrity and consistency are very important to make reputation points work. As a first step, we integrated XP into MEMOSE. After registering, a user is prompted to choose a monster egg.
This egg basically contains a personal MEMOSE pet, and is linked directly to the user’s account (Figure 2). The user can now cause the egg to hatch by accumulating enough experience points. XP can be earned by tagging media with emotions; thus, we have developed a first incentive to use the system actively. Once the monster has hatched, further XP allow it to develop and reach higher evolutionary stages. Thus, a monster’s development stage becomes the indicator of the user’s activity and a sort of status symbol within the MEMOSE community (Figure 3). Amongst other things, newly uploaded media by users whose monster has reached a higher stage of development are given priority placement in the New Media view. As a result, copious tagging of other people’s media increases the probability that one’s own media will be emotionally tagged.
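The earn-and-hatch loop and the New Media priority rule described above can be sketched in a few lines of Python. This is a minimal illustration, not MEMOSE's actual code: the XP reward per rating (`XP_PER_TAG`), the hatching threshold (`HATCH_THRESHOLD`) and all class and function names are hypothetical, and evolutionary stages beyond hatching are left out.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical values; the chapter does not state MEMOSE's actual numbers.
XP_PER_TAG = 10        # XP awarded for one emotional rating of a medium
HATCH_THRESHOLD = 50   # XP needed before the monster egg hatches


@dataclass
class User:
    name: str
    xp: int = 0
    stage: int = 0     # 0 = unhatched egg, 1 = hatched (later stages omitted)

    def tag_medium(self) -> None:
        """Reward one emotional rating with XP and hatch the egg when possible."""
        self.xp += XP_PER_TAG
        if self.stage == 0 and self.xp >= HATCH_THRESHOLD:
            self.stage = 1


@dataclass
class Upload:
    title: str
    owner: User
    uploaded_at: datetime


def new_media_view(uploads: list[Upload]) -> list[Upload]:
    """Order new uploads: owners with a more developed monster first,
    then newest first among owners at the same stage."""
    return sorted(uploads, key=lambda u: (u.owner.stage, u.uploaded_at), reverse=True)
```

Sorting on the owner's stage before the upload time is only one simple way to realize "priority placement"; a real system might instead reserve a few top slots or weight the two criteria against each other.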
Figure 2. User profile with MeMonster.
Furthermore, experience points can be used as a form of currency on the platform, e.g. for buying a better placement of one's image in the New Media view (not to be confused with placement in the search results). Strictly speaking, redeemable points would be the appropriate mechanism for such transactions, but in order to avoid unnecessary complexity, we have decided to restrict ourselves to XP in the early stages. This policy confronts users with a choice between improving their placement in the short term by investing XP, or developing their monster further in order to achieve a better placement in the long term. The integration of experience points and their use as digital currency is one of the cornerstones of the MEMOSE incentive system and leaves ample room for extension.
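The short-term versus long-term trade-off can be illustrated with a single XP balance that serves both purposes. The sketch below assumes (the chapter does not say so explicitly) that the monster's progress is measured against the same unspent balance that is used to buy placement; the class name, method names and amounts are hypothetical.

```python
class InsufficientXP(Exception):
    """Raised when a user tries to spend more XP than they currently hold."""


class XPWallet:
    """Hypothetical single XP balance used both as currency and as evolution progress."""

    def __init__(self, balance: int = 0) -> None:
        self.balance = balance

    def earn(self, amount: int) -> None:
        """Credit XP, e.g. for emotional tagging."""
        self.balance += amount

    def buy_placement_boost(self, cost: int) -> None:
        """Spend XP on a better New Media placement (short-term gain)."""
        if cost > self.balance:
            raise InsufficientXP(f"need {cost} XP, have {self.balance}")
        self.balance -= cost

    def xp_to_next_stage(self, next_threshold: int) -> int:
        """XP still missing before the monster can evolve (long-term gain)."""
        return max(0, next_threshold - self.balance)
```

For example, a user holding 120 XP is 30 XP short of a 150 XP threshold; after spending 30 XP on a placement boost, the gap grows to 60 XP, which is exactly the tension between the two uses of XP.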
7.2 Levels: Noob or pro, show your progress
According to a theory by Locke and Latham (1990), human acts are goal-oriented. In their goal-setting theory, they assume that conscious behaviour is steered by individual goals and serves a purpose. Challenging goals raise performance and lead to satisfaction upon success. This satisfaction influences the level of commitment a person feels, which then initiates new actions.
Their research has shown that, once a goal has been established, it makes no difference whether it was assigned (by someone else) or self-set (Locke & Latham, 1990; Klein, Wesson, Hollenbeck, & Alge, 1999). What is important, though, is to provide feedback about the performance level with respect to the goals, in order to increase task-specific self-efficacy (Bandura, 1986). People with a higher level of self-efficacy set higher goals for themselves and show a higher level of commitment (Latham & Locke, 2002). These research results can be applied to the development of incentive systems. The accumulation of points can serve as positive reinforcement to motivate users to initiate further interactions, and attainable levels serve as a sign of progress. The pursuit of higher levels through the collection of points rests on the basic principle of goal-setting theory, namely that human acts are goal-oriented. It should be noted that higher goals are harder to reach and require more complex solutions (Locke & Latham, 1990; Zichermann & Cunningham, 2011). The various development stages that the MEMOSE monster can reach reflect the platform's level design: each successive level requires more new points than the previous one. The levels themselves function as a progress marker that reflects a user's performance (Figures 2 and 3). This level of performance indicates the user's personal status within the community, which gives users an incentive to improve it. At present, we have implemented six evolutionary stages of the MeMonster, which will be extended in the future. A progress bar enables the user as well as visitors to his profile page to see the current evolutionary stage and point total (Zichermann & Cunningham, 2011). The progress bar also shows how many points are needed to rise to the next level.
It is intended to encourage the user to earn the necessary points and discover the next evolutionary stage of his personal avatar.
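The level design (six stages whose thresholds grow with each step, plus a progress bar that shows the points still missing) can be sketched as follows. The threshold values and the text-based bar are hypothetical; the chapter only states that higher stages require more new points and that MEMOSE currently has six stages.

```python
from typing import Optional

# Hypothetical cumulative XP needed for stages 1..6 (stage 0 is the egg).
# The increments grow (50, 100, 150, 200, 250, 300), so each level costs
# more new points than the one before.
THRESHOLDS = (50, 150, 300, 500, 750, 1050)


def stage_for(xp: int) -> int:
    """Return the evolutionary stage, from 0 (egg) to len(THRESHOLDS)."""
    return sum(1 for t in THRESHOLDS if xp >= t)


def xp_to_next(xp: int) -> Optional[int]:
    """XP still needed for the next stage, or None at the final stage."""
    stage = stage_for(xp)
    if stage == len(THRESHOLDS):
        return None
    return THRESHOLDS[stage] - xp


def progress_bar(xp: int, width: int = 20) -> str:
    """Render progress within the current stage as a text bar."""
    stage = stage_for(xp)
    missing = xp_to_next(xp)
    if missing is None:
        return "[" + "#" * width + f"] stage {stage} (final)"
    lower = THRESHOLDS[stage - 1] if stage > 0 else 0
    upper = THRESHOLDS[stage]
    filled = (xp - lower) * width // (upper - lower)
    return "[" + "#" * filled + "-" * (width - filled) + f"] stage {stage}, {missing} XP to next"
```

Measuring the bar within the current stage (rather than against the final threshold) keeps it visibly moving even at higher levels, where each stage needs more points.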
Figure 3. MEMOSE monster in its first evolutionary stages (hatch by accumulating enough experience points).
7.3 Achievements: Rewards for XP and levels
A further way of motivating users is to let them unlock achievements. Achievements (or badges) reward the user for specific activities or for reaching certain milestones. Like the MEMOSE monster, achievements originated in video games and have been used by large gaming platforms like Xbox LIVE, PlayStation Network or Steam for some time now to motivate users (Jakobsson, 2011). Similar to points, achievements act as positive reinforcement. However, they motivate users who have already unlocked an achievement more than those who have not. It is thus important to award the first achievement for reaching a simple goal (e.g. registering). Zichermann and Cunningham (2011) list the following reasons for the success of achievements: In addition to signalling status, people want badges for all kinds of reasons. For many people, collecting is a powerful drive. Other players enjoy the sudden rush of surprise or pleasure when an unexpected badge shows up in a gamified system. A well-designed, visually valuable badge can also be compelling for purely aesthetic reasons. (p. 39)
This statement can be supplemented by the summary of Antin and Churchill (2011), who describe five basic functions of badges:
– Goal-Setting: human behaviour is goal-oriented, users like to collect points and unlock badges in order to reach their goals;
– Instruction: lists of achievements inform the user about the possibilities of using the system;
– Reputation: the number and nature of badges allows others to draw conclusions about the experience, competency and trustworthiness of the user;
– Status: badges represent past successes and are accessible to others without necessitating verbal interaction; difficult achievements can lead to a better status;
– Group Identification: attaining difficult achievements can cause a feeling of solidarity between users. (pp. 2 – 3)
These days, many online services use achievements to meticulously record when users reach various milestones. The user thus gains a sort of gallery in which to showcase his achievements and possibly gain recognition (Medler, 2009). While collecting achievements satisfies a human desire, analysing the unlocked achievements allows the service operator to gain valuable information: various user groups can be differentiated, depending on which achievements were unlocked. Jakobsson (2011) differentiates between casuals, hunters and completists. Zichermann and Cunningham (2011) identify explorers, achievers, socializers and killers. Identifying various user types makes it easier to adjust game mechanics and thus to satisfy and reinforce user needs. For these reasons, we have also implemented achievements in MEMOSE. If, for example, a user reaches the list of top taggers, he earns a corresponding achievement titled Tag it like it's hot. Even if the user later drops out of the list, the achievement remains permanently in his achievement gallery (Figure 4). The achievements are further linked to experience points, rewarding users with a certain number of points for each achievement. Like the concept of experience points, this system is easily extendable and should help to motivate users in the long term.
Figure 4. Achievement systems.
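Two properties of MEMOSE achievements, that an unlocked badge stays in the gallery even after the user drops out of the top-taggers list and that each badge carries an XP bonus, can be sketched as below. Apart from the title Tag it like it's hot, the catalogue entries (including the registration badge and all XP values) and the function names are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical catalogue: achievement id -> (title, XP bonus).
ACHIEVEMENTS = {
    "registered": ("Welcome", 10),               # easy first badge (hypothetical)
    "top_tagger": ("Tag it like it's hot", 100),
}


@dataclass
class Profile:
    xp: int = 0
    gallery: set[str] = field(default_factory=set)

    def unlock(self, achievement_id: str) -> bool:
        """Unlock a badge once and credit its XP bonus.

        Returns True only the first time; badges are never removed."""
        if achievement_id in self.gallery:
            return False
        self.gallery.add(achievement_id)
        self.xp += ACHIEVEMENTS[achievement_id][1]
        return True


def update_top_taggers(profiles: dict[str, Profile],
                       tag_counts: dict[str, int], n: int = 10) -> list[str]:
    """Recompute the top-n taggers and award the badge to everyone on the list."""
    top = sorted(tag_counts, key=lambda name: tag_counts[name], reverse=True)[:n]
    for name in top:
        profiles[name].unlock("top_tagger")
    return top
```

Because `update_top_taggers` only ever adds badges, recomputing the list never takes an achievement away; the ranking itself can change freely while the gallery acts as a permanent record.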
7.4 Leader boards: Social rewarding for users
According to Locke and Latham (1990), users can be motivated through challenging and specifically defined goals. The implementation of leader boards as a game mechanic is another way to formulate goals and motivate users (Hoisl et al., 2007; von Ahn & Dabbish, 2008; Zichermann & Cunningham, 2011). Furthermore, ranking lists show the service provider that a game is being played and how it is being played (Zichermann & Linder, 2010). Ranking lists compare the achievements of different users and are usually easy to understand. Competing with others and trying to reach first place can satisfy different human desires like status, grandstanding or
competitiveness. Hoisl, Aigner, and Miksch (2007) point out, however, that while this mechanic can stimulate motivation extrinsically, intrinsic motivation is a necessary prerequisite. The ranking lists implemented in MEMOSE cover different areas: most active tagger, most active uploader, and so on. We are planning another kind of ranking list, as proposed by Zichermann and Cunningham (2011), who consider comparison with direct friends in an online service more interesting for users than comparison with everyone. People tend to compete with the people around them, and this is true for the digital world as well. Furthermore, such a personal ranking list is very motivating, since one's own name is always listed. Leader boards that apply to the whole user base may promise greater reputation, but they can also have a demotivating effect on people who hold a very low rank.
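The difference between a global leader board and the planned friends-only ranking can be shown with two small functions. The score dictionary, the tie-breaking by name and the function names are assumptions made for illustration; the point is that the friends board always contains the user, whatever his global rank.

```python
def _ranked(entries):
    """Sort (name, score) pairs by score descending, then name ascending."""
    return sorted(entries, key=lambda kv: (-kv[1], kv[0]))


def global_board(scores: dict[str, int], n: int = 3) -> list[tuple[str, int]]:
    """Top-n users across the whole user base."""
    return _ranked(scores.items())[:n]


def friends_board(user: str, friends: list[str],
                  scores: dict[str, int]) -> list[tuple[str, int]]:
    """Ranking restricted to the user and his friends; the user always appears."""
    circle = {user, *friends}
    return _ranked((name, scores.get(name, 0)) for name in circle)
```

A low-ranked user who never appears in the global top three still sees himself, and a realistic next target, on the friends board.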
7.5 Comments: Expressing thoughts
Since the applications of emotionally indexed multimedia documents are manifold, very different user groups can be expected. For example, emotional aspects play a vital part in music therapy. Its fields of application range from the therapy of patients with clinical depression (Lai, 1999; Hsu & Lai, 2004) and of geriatric psychiatric patients (Lee, Chan, & Mok, 2010; Clair & Memmott, 2008; Short, 1995) to the rehabilitative treatment of neurological illnesses and the supportive treatment of cancer patients (Burns, 2001). Another important application of emotionally indexed multimedia documents can be found in the advertising industry. Targeted, deliberate influence on people creates conscious or subconscious desires, while factual product information fades into the background; methods of suggestion (McChesney, 2008) appeal to the subconscious mind. In this first instance, however, MEMOSE was developed for social media users. Recent studies (Bischoff, Firan, Nejdl, & Paiu, 2008, 2010) show that if we assign categories like time, place or type to tags, we find a large discrepancy between assigned tags and searched tags, particularly in the area of opinions/qualities. For example, 24 % of Flickr queries consist of affective tags, but only 7 % of the documents have been indexed with affective tags (Bischoff, Firan, Nejdl, & Paiu, 2010). Users are searching for emotionally charged content, but they cannot find it. MEMOSE aims to bridge this gap. It should also allow users to exchange opinions and comments, as they do on other platforms (Flickr, YouTube). We consider MEMOSE to be part of Web 2.0, a website “which encourage[s] user-generated content in the form of text, video, and photo postings along with comments” (Cormode & Krishnamurthy, 2008, p. 4). Registered
users can add a comment to pictures of their choosing. Unregistered visitors can view these comments but cannot add any. In this way, we pre-emptively reduce the number of undesirable comments and spam messages without resorting to user-unfriendly mechanisms such as CAPTCHAs. For logged-in users, a form under each image offers a quick means of commenting. Each comment is cleaned of malicious code and hyperlinks before being linked to its author and the image in the database. The comment is displayed in the browser via AJAX, eliminating the need to reload the page. The published comment also contains the publishing date and a link to the author's profile. In future versions, an avatar will accompany the nickname. Comments are sorted by date in descending order (newest first), allowing visitors to quickly trace the history of user reactions to a picture. The comment system is only a first step towards connecting users with each other; further measures to strengthen the community will follow (Kim, 2000; Zichermann & Linder, 2010; Zichermann & Cunningham, 2011). One of the many functions of MEMOSE is to encourage users “to create an account in order to more fully engage with the site” (Cormode & Krishnamurthy, 2008, p. 11).
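The comment pipeline described above (registered users only, removal of markup and hyperlinks before storage, newest comments first) can be sketched in Python. This is an illustrative assumption rather than MEMOSE's code, which the chapter does not show; the regular expressions are deliberately simple, and a production system would rather rely on a vetted HTML sanitizer.

```python
import html
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

TAG_RE = re.compile(r"<[^>]*>")                                # any HTML tag
URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)  # bare links


def clean_comment(raw: str) -> str:
    """Strip markup and hyperlinks, then escape what is left for safe display."""
    text = TAG_RE.sub("", raw)      # drops tags like <script> or <a href=...>; inner text stays
    text = URL_RE.sub("", text)     # drops plain URLs
    text = " ".join(text.split())   # collapses leftover whitespace
    return html.escape(text)        # neutralizes any remaining < > & " '


@dataclass
class Comment:
    author: str
    image_id: int
    text: str
    published: datetime


def add_comment(store: list[Comment], author: Optional[str], image_id: int,
                raw: str, now: datetime) -> Comment:
    """Store a cleaned comment; only registered (logged-in) users may comment."""
    if author is None:
        raise PermissionError("log in to comment")
    comment = Comment(author, image_id, clean_comment(raw), now)
    store.append(comment)
    return comment


def comments_for(store: list[Comment], image_id: int) -> list[Comment]:
    """Comments on one image, newest first."""
    return sorted((c for c in store if c.image_id == image_id),
                  key=lambda c: c.published, reverse=True)
```

Cleaning happens once, before storage, so every later view of the comment (AJAX response, profile page) works with already-sanitized text.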
8 Concluding remarks
Carrying game mechanics over from the gaming world into other economic and social areas is becoming increasingly common. The term used to describe this phenomenon, gamification, is controversial; the main point of criticism is the lack of rules and concrete goals. Nonetheless, implementing game elements in online services has proved very successful at motivating users to generate data. Companies and services then use these data to bind users to a certain brand, product or service. We believe that MEMOSE has the potential to be used by interested users even without such gamification elements. Studies show that emotional content is searched for, but not found, in Web 2.0 services like YouTube, Flickr and Last.fm. A need for emotionally indexed media thus exists. It is furthermore in the nature of many people to express their originality. By offering the possibility of uploading one's own media, MEMOSE gives users a way to act out their creativity and individuality. However, MEMOSE is still in the early stages of its development and depends on collaborative user input (e.g. indexing). Active user participation is important, because the emotional search is largely based on user-generated metadata; the extraction of low-level features plays more of a supporting role. In order to overcome the initial difficulties of a social media platform and motivate users to participate, we therefore decided to implement game mechanics. By rewarding users and using positive reinforcement, we hope to encourage the “desired” behaviour: the emotional tagging of media. Specifically, we have implemented points, levels, achievements, leader boards and comments in MEMOSE. Furthermore, we have added “Share on Facebook” buttons. We are planning to further enhance the social, community-oriented aspects by integrating a friendship system, personal high score lists and competitions for uploading media with emotional content. All of these measures are based on the idea that human behaviour has a purpose and is steered by individual goals and desires. These goals can be image cultivation, reputation or recognizing complex patterns, amongst others. Extrinsic methods (e.g. collecting points and achievements) are linked to intrinsic desires (e.g. the wish to complete something). We are aware that the game mechanics presented here are only a start and require extension. Nevertheless, game mechanics offer us a means to motivate users and generate data.
Acknowledgements
We wish to thank Daniel Guschauski, Daniel Miskovic, and Jens Terliesner, whose valuable support has helped us with working on MEMOSE.
References
Al-Khatib, W., Day, Y. F., Ghafoor, A., & Berra, P. B. (1999). Semantic modeling and knowledge representation in multimedia databases. IEEE Transactions on Knowledge and Data Engineering, 11(1), 64 – 80.
Antin, J., & Churchill, E. (2011, May 7 – 12). Badges in social media: A social psychological perspective [Blog post]. Retrieved from http://gamification-research.org/chi2011/papers
Arnheim, R. (1983). The power of the center: A study of composition in the visual arts. Berkeley, CA: University of California Press.
Ayllon, T., & Azrin, N. (1968). The token economy: A motivational system for therapy and rehabilitation. New York: Appleton-Century-Crofts.
Azcarate, A., Hageloh, F., van de Sande, K., & Valenti, R. (2005). Automatic facial emotion recognition. Retrieved from http://staff.science.uva.nl/~rvalenti/projects/mmis/Automatic%20Facial%20Emotion%20Recognition.pdf
Bandura, A. (1986). Social foundations of thought and action: A social cognitive theory. New York: Prentice Hall.
Barbuto, J., & Scholl, R. (1998). Motivation sources inventory: Development and validation of new scales to measure an integrative taxonomy of motivation. Psychological Reports, 82, 1011 – 1022.
Bischoff, K., Firan, C., Nejdl, W., & Paiu, R. (2008). Can all tags be used for search? Proceedings of the 17th ACM Conference on Information and Knowledge Management (pp. 203 – 212).
Bischoff, K., Firan, C., Nejdl, W., & Paiu, R. (2010). Bridging the gap between tagging and querying vocabularies: Analyses and applications for enhancing multimedia IR. Journal of Web Semantics, 8(2), 97 – 109.
Bunchball. (2010, October). Gamification 101: An introduction to the use of game dynamics to influence behavior [Blog post]. Retrieved from http://www.bunchball.com/gamification/gamification101.pdf
Burns, D. (2001). The effect of the Bonny method of guided imagery and music on the mood and life quality of cancer patients. Journal of Music Therapy, 38(1), 51 – 65.
Chia, J. (2010). Emotion recognition through facial expressions. Retrieved from http://www.cs.ubc.ca/~johnchia/emotion-recognition.pdf
Clair, A., & Memmott, J. (2008). Therapeutic uses of music with older adults. Silver Spring, MD: American Music Therapy Association.
Colombo, C., Del Bimbo, A., & Pietro, P. (1999). Semantics in visual information retrieval. IEEE MultiMedia, 6, 38 – 53.
Cormode, G., & Krishnamurthy, B. (2008). Key differences between Web 1.0 and Web 2.0. Retrieved from http://www2.research.att.com/~bala/papers/web1v2.pdf
Dellandrea, E., Liu, N., & Chen, L. (2010). Classification of affective semantics in images based on discrete and dimensional models of emotions. Proceedings of the IEEE International Workshop on Content-Based Multimedia Indexing (CBMI).
Deselaers, T., Keysers, D., & Ney, H. (2008). Features for image retrieval: An experimental comparison. Information Retrieval, 11(2), 77 – 107.
Feng, H., Lesot, M. J., & Detyniecki, M. (2010). Using association rules to discover color-emotion relationships based on social tagging. Proceedings of Knowledge-Based and Intelligent Information and Engineering Systems (pp. 544 – 553).
Hoisl, B., Aigner, W., & Miksch, S. (2007). Social rewarding in wiki systems – motivating the community. Proceedings of the 2nd International Conference on Online Communities and Social Computing (pp. 362 – 371).
Hsu, W., & Lai, H. (2004). Effects of music on major depression in psychiatric inpatients. Archives of Psychiatric Nursing, 18(5), 193 – 199.
Itten, J. (1973). The art of color: The subjective experience and objective rationale of color. New York: Van Nostrand Reinhold.
Jakobsson, M. (2011). The achievement machine: Understanding Xbox 360 achievements in gaming practices. Game Studies. International Journal of Computer Game Research, 11(1). Retrieved from http://gamestudies.org/1101/articles/jakobsson
Jiang, K. (2011). The dangers of gamification: Why we shouldn’t build a game layer on top of the world. Retrieved from http://krystlejiang.files.wordpress.com/2011/07/the-dangers-of-gamification.pdf
Jörgensen, C. (2003). Image retrieval: Theory and research. Lanham, MD: Scarecrow Press.
Kanade, T., Cohn, J. F., & Tian, Y. (2000). Comprehensive database for facial expression analysis. Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition (FG’00) (pp. 46 – 53).
Kazdin, A. (1977). The token economy. New York: Plenum Press.
Kim, A. J. (2000). Community building on the Web: Secret strategies for successful online communities. Boston: Addison-Wesley Longman Publishing Co.
Kim, E. Y., Kim, S., Koo, H., Jeong, K., & Kim, J. (2005). Emotion-based textile indexing using colors and texture. Proceedings of Fuzzy Systems and Knowledge Discovery (pp. 1077 – 1080).
Kim, N. Y., Shin, Y., & Kim, E. Y. (2007). Emotion-based textile indexing using neural networks. Proceedings of the 12th International Conference on Human-Computer Interaction: Intelligent Multimodal Interaction Environments (pp. 349 – 357).
Kim, Y., Shin, Y., Kim, S., Kim, E. Y., & Shin, H. (2009). EBIR: Emotion-based image retrieval. Digest of Technical Papers, International Conference on Consumer Electronics.
Klein, H., Wesson, M., Hollenbeck, J., & Alge, B. (1999). Goal commitment and the goal setting process: Conceptual clarification and empirical synthesis. Journal of Applied Psychology, 84, 885 – 896.
Knautz, K., Neal, D., Schmidt, S., Siebenlist, T., & Stock, W. G. (2011). Finding emotional-laden resources on the World Wide Web. Information, 2(1), 217 – 246.
Knautz, K., Siebenlist, T., & Stock, W. G. (2010). MEMOSE. Search engine for emotions in multimedia documents. Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 791 – 792).
Lai, Y. (1999). Effects of music listening on depressed women in Taiwan. Issues in Mental Health Nursing, 20(3), 229 – 246.
Lee, Y., Chan, M., & Mok, E. (2010). Effectiveness of music intervention on the quality of life of older people. Journal of Advanced Nursing, 66(12), 2677 – 2687.
Locke, E., & Latham, G. (1990). A theory of goal setting and task performance. Englewood Cliffs, NJ: Prentice-Hall.
Lowe, D. G. (1999). Object recognition from local scale-invariant features. Proceedings of the International Conference on Computer Vision (pp. 1150 – 1157).
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91 – 110.
McChesney, R. W. (2008). The political economy of media: Enduring issues, emerging dilemmas. New York: Monthly Review Press.
Medler, B. (2009). Generations of game analytics, achievements and high scores. Eludamos. Journal for Computer Game Culture, 3(2), 177 – 194.
Ojala, T., Pietikäinen, M., & Harwood, D. (1996). A comparative study of texture measures with classification based feature distributions. Pattern Recognition, 29(1), 51 – 59.
Park, S.-T., Pennock, D., Madani, O., Good, N., & DeCoste, D. (2006). Naive filterbots for robust cold-start recommendations. Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 699 – 705).
Pavlidis, T. (1988). Image analysis. Annual Review of Computer Science, 3, 121 – 146.
Puzicha, J., Rubner, Y., Tomasi, C., & Buhmann, J. (1999). Empirical evaluation of dissimilarity measures for color and texture. Proceedings of the International Conference on Computer Vision (pp. 1165 – 1173).
Robertson, M. (2010, October 6). Can’t play, won’t play [Blog post]. Retrieved from http://www.hideandseek.net/2010/10/06/cant-play-wont-play/
Roinestad, H., Burgoon, J., Markines, B., & Menczer, F. (2009). Incentives for social annotation. Proceedings of the 20th ACM Conference on Hypertext and Hypermedia (pp. 327 – 328).
Rüger, S. (2010). Multimedia information retrieval. San Rafael, CA: Morgan & Claypool Publishers.
Short, A. (1995). Insight-oriented music therapy with elderly residents. The Australian Journal of Music Therapy, 6, 4 – 18.
Smeulders, A. W. M., Worring, M., Santini, S., Gupta, A., & Jain, R. (2000). Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12), 1349 – 1380.
Suhasini, P. S., Krishna, K. S., & Krishn, I. V. N. (2009). CBIR color histogram processing. Journal of Theoretical and Applied Information Technology, 6(1), 116 – 122.
Tamura, H., Mori, S., & Yamawaki, T. (1978). Textural features corresponding to visual perception. IEEE Transactions on Systems, Man, and Cybernetics, 6, 460 – 472.
von Ahn, L., & Dabbish, L. (2008). Designing games with a purpose. Communications of the ACM, 51(8), 58 – 67.
Wang, S., & Wang, X. (2005). Emotion semantics image retrieval: An [sic] brief overview. Proceedings of the 1st International Conference on Affective Computing and Intelligent Interaction (pp. 490 – 497).
Wang, S., Chen, E., Wang, X., & Zhang, Z. (2003). Research and implementation of a content-based emotional image retrieval model. Proceedings of the Second International Conference on Active Media Technology (pp. 293 – 302).
Wang, W.-N., Yu, Y.-L., & Zhang, J.-C. (2004). Image emotional classification: Static vs. dynamic. Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (pp. 6407 – 6411).
Wash, R., & Rader, E. (2007). Public bookmarks and private benefits: An analysis of incentives in social computing. Proceedings of the Annual Meeting of the American Society for Information Science and Technology, 44.
Zhao, R., & Grosky, W. I. (2002). Bridging the semantic gap in image retrieval. In T. Shih (Ed.), Distributed multimedia databases: Techniques & applications (pp. 14 – 36). Hershey, PA: Idea Group Publishing.
Zichermann, G. (2011, April 26). The purpose of gamification [Blog post]. Retrieved from http://radar.oreilly.com/2011/04/gamification-purpose-marketing.html
Zichermann, G., & Cunningham, C. (2011). Gamification by design: Implementing game mechanics in Web and mobile apps. Sebastopol, CA: O’Reilly Media.
Zichermann, G., & Linder, J. (2010). Game-based marketing: Inspire customer loyalty through rewards, challenges, and contests. Hoboken, NJ: John Wiley & Sons.
Caroline Whippey
Chapter 16. Non-textual information in gaming: A case study of World of Warcraft
Abstract: This exploratory study investigated non-textual information in the Massively Multiplayer Online Role Playing Game World of Warcraft (WoW). Using autoethnography, it examines the ways in which visual and audio information is presented in WoW, the information this presentation provides, and the implications for information search and retrieval. The visual and audio information in the game provides an understanding of how the game should be played and how to react in specific in-game contexts. Browsing strategies are often used in the information searching process. The non-textual elements of game design may be used to develop better, more interactive information search and retrieval systems.
Keywords: Video game, visual information, audio information, World of Warcraft, browsing, information searching, information retrieval, engagement, virtual environment, cinematic, sound, music
Caroline Whippey, PhD Candidate, Faculty of Information and Media Studies, The University of Western Ontario
Introduction
Video games have become an increasingly important part of the entertainment industry, as well as of academic discourse. A multitude of research projects on video games have been conducted within the last decade, addressing topics including the political economy of gaming (e.g., Dyer-Witheford & de Peuter, 2009), gaming as consumption (e.g., Castronova, 2007), culture and identity within gaming (e.g., Corneliussen & Rettburg, 2008), and user-created modifications for games (e.g., Kow & Nardi, 2010). Within library and information science, the study of video games is just beginning. For example, Adams (2009) has examined the game City of Heroes as an information space, exploring the information behaviours of its players. Winget (2011) has explored the challenges of preserving video games, which involves preserving both the games themselves and the systems required to play them. Nonetheless, an element often overlooked in the discourse of video games is how information is presented through the game’s design, and what this implies for information searching and retrieval. The word ‘information’ may be defined in many different ways. Generally, information can be defined as anything that can alter a person’s knowledge (Marchionini, 1995). Buckland (1991) distinguished information-as-thing, the objects that may impart information, from other senses of the term. In this chapter, I will examine information as audio and visual digital objects found in a virtual world. Such objects may transfer information to a person’s cognitive system, altering the state of their knowledge (Marchionini, 1995). Non-textual information is ubiquitous within video games. Each game combines elements of images, video, music, audio, and text to create a rich and complex environment. This exploratory study will provide an overview of how visual and auditory information is presented in World of Warcraft (WoW). WoW, a Massively Multiplayer Online Role Playing Game (MMORPG), is a fantasy-focused game played with other people over the Internet. WoW is a full-scale, three-dimensional virtual world where people interact and play through an avatar without leaving their physical space. As of 2010, more than 12 million people across the world were playing the game (Blizzard Entertainment, 2010). The world is made up of a 3-D graphical environment, with visual special effects, cinematics, background music, audio speech, sound effects, and textual elements. In order to play the game, players must absorb and learn a great deal: they are in a constant state of learning and are always in need of new information (Nardi, 2008). Information is presented in a variety of ways through images, video, sound, music, and text. In this exploratory study, I used my own experience as a player to examine how non-textual information is presented in WoW, how this information may be accessed and used, and how this may influence information retrieval. First, an overview of WoW will be provided.
Second, I will examine the existing literature on visual and auditory information and processing in games, as well as on information searching. Using autoethnographic methods, I will then illustrate how non-textual information is used in WoW in both design and play. This study will add to the discourse on video games in the field of library and information science, aiding our understanding of non-textual information and furthering the discussion of information searching and retrieval.
World of Warcraft: An overview
World of Warcraft was created and is maintained by Blizzard Entertainment. WoW is divided into many realms, each of which holds a maximum of 20,000 players (Ducheneaut, Yee, Nickell, & Moore, 2006). Each realm is hosted on its own server; dividing the game in this way supports WoW’s large player base, which includes players from North America, Europe, Asia, and Oceania. There are four types of realms that cater to different playing styles. The first, Player versus Player (PvP), provides opportunities for players to fight each other throughout gameplay. Player versus Environment (PvE) realms require players to consent to PvP combat: players fight primarily against characters controlled by the game. Role Play (RP) realms operate in a manner similar to PvE realms; however, players are expected to act as they believe their character would during play. Finally, Role Play Player versus Player (RPPvP) realms combine the rules of PvP and RP realms. By selecting a realm type (the first step in entering the game world), players influence the kind of game experience they will have. The next step is to create a character, or avatar. The avatar can be viewed as a second self, with physical qualities and traits different from a player’s real embodied persona (Rettburg, 2008). Character creation is a multi-stage process. Players must first select one of two opposing factions: the Alliance or the Horde. After choosing a faction, players must then pick a race and gender for their character. There are currently six races in each faction, each providing a unique look as well as particular attributes for the character. The Alliance races are the Draenei, Dwarves, Gnomes, Humans, Night Elves, and Worgen. The Horde races are the Blood Elves, Goblins, Orcs, Tauren, Trolls, and Undead. Once players have chosen their character’s race, they must choose a character class.
There are ten classes in the game, some of which are only available to particular races. The classes are Death Knights, Druids, Hunters, Mages, Paladins, Priests, Rogues, Shaman, Warlocks, and Warriors. The class-based system means that players need each other, which aids in building community and social contact (Taylor, 2006). Each class plays to different strengths and methods of play. For example, a player who prefers to be in the middle of a battle may choose a Warrior, whereas another may choose a Mage in order to remain at the edges of combat. Typically, players choose a class based on the kind of game they wish to experience. Following race and class selection, players can customize the appearance of their character, allowing for an individualized look within set parameters, and name their character. With this, the player is ready to begin the game. WoW takes place in Azeroth, a full-scale, 3-D virtual world with a large population of non-player characters (NPCs), characters controlled by the computer, in addition to the many player-operated avatars. The rich graphical environment uses vivid colours and is cartoon-like. When playing WoW, players may choose to have audio turned on or off. While the audio is not essential, it adds substantially to the experience and provides information that cannot be gained through graphical or textual means. Azeroth is a product of its own recent events and history, of fantasy fiction (e.g., Tolkien’s The Lord of the Rings), of cultural events exterior to the game, such as holidays, and of the players themselves. It consists of four continents: Kalimdor, Eastern Kingdoms, Northrend, and Outland. The four continents are divided into more than 70 randomly scattered ‘zones’. Each zone is geared toward a specific range of character levels. Each character starts at level one equipped with simple armour and weapons and has little experience or skill, making the character weak and vulnerable in battle. There are twelve different starting zones, and each character’s starting place depends on the race the player selects. Players journey to different zones as their characters gain levels. Characters gain levels through gameplay by earning experience points. As the character levels, he or she becomes stronger and is able to obtain new abilities, armour, weapons, and money. Experience points are primarily gained by completing quests given by NPCs. There are thousands of quests in the game, each requiring that the player complete a specific task. Experience may also be gained by killing foes, exploring, or participating in group activities such as dungeons or raids. WoW is a diverse world with many different game activities. Players may complete the activities discussed above to gain experience points, battle other players in battlegrounds, enjoy player-versus-player combat, or spend their time working on their professions and skills.
Working on professions does not provide experience points, but it does allow the player to create useful items to either use or sell. This means that players are able to play at their own speed and in their own style, alone or in groups. Put simply, in WoW, the player experience is what each individual makes of it. On December 7, 2010, the Cataclysm expansion was released for WoW. The expansion introduced new areas to explore, a new maximum level, two new playable races (the Worgen and Goblins), and alterations to previously available areas. This expansion also included more cinematics within the game, vivid new backgrounds, new music, and much more spoken dialogue for NPCs. This has increased the amount of non-textual information available to players and released a wealth of new information into the game space. Further information on how to play WoW, as well as the rich history of the game, can be found on the WoW website by Blizzard Entertainment at http://www.worldofwarcraft.com. Videos of gameplay, as well as several cinematics from the game, can also be found on this website.
Literature review
Human existence consists of a series of interactions with our environments, both physical and virtual (Marchionini, 1995). We gain a great deal of knowledge simply by being aware, conscious, and sentient in our physical environments and social contexts (Bates, 2002). A great deal of information and knowledge can be obtained through the visual senses. This visual experience is dynamic: many factors affect it, including shape, colour, light, size, and movement (Arnheim, 1974). These elements affect our visual perception, which allows us to obtain knowledge of our environment. Perceiving is an active process: we seek out what we need through frequent eye movements, executed to obtain information to meet our needs (Ware, 2008). We make a visual query each time we search for some kind of visual information that we require to carry out a cognitive task (Ware, 2008). These queries acquire information which is then used to form a perceptual whole of our visual environment. When designing visual objects, it is important to create displays that enable rapid and correct processing for every important cognitive task that the display is intended to support (Ware, 2008). In video games, a number of factors affect the design of a three-dimensional virtual world. The world is primarily made up of polygons (shapes) and textures (colours), with the addition of effects or filters for particular instances. 3-D environments create a more complex sense of place (Järvinen, 2002). They employ complex 3-D architectures, objects, and characters that move and interact in the space (El-Nasr & Yan, 2006). There are many visual elements in a video game, such as the game environment, health indicators, maps, timers, and ammunition counters (Nitsche, 2008).
By using primary visual channels, such as colour, shape, size, orientation, and motion, displays can be designed to make particular elements of visual objects prominent (Ware, 2008). For example, if a designer wants to make something easy to find, he or she should make it different from its surroundings according to a primary visual channel, such as colour or size. Video games employ a number of different graphical elements, which afford different gameplay styles. The game space is presented by a camera, much as in film (Nitsche, 2008). However, video games differ from films in that people are free to explore and interact with the virtual world directly, rather than being passive observers (Nitsche, 2008). The point of perception affects the way that the player is introduced to and experiences the graphical environment. In video games, it is common to see either first- or third-person perspectives, or a combination of both (Egenfeldt-Nielsen, Smith, & Tosca, 2008). In the first-person perspective, the player views the action from the point of view of the protagonist; conversely, in the third-person perspective, the player watches the avatar that they control (Egenfeldt-Nielsen, Smith, & Tosca, 2008). Often, the user’s avatar, whether first- or third-person, operates as a main point of visual interest within the game space, and the camera refers to this avatar to establish its position and orientation within the virtual world (Nitsche, 2008). When experiencing a visual environment, players must then perceive, process, and interpret the information they encounter. According to Biederman (1987), we segment visual information into recognizable shapes and categories in order to identify what a particular object is or resembles. We arrange the identified components into a representation in memory. Everyday object recognition places constraints on possible recognition models. In any given scene, there are a number of visual objects which must be identified. In a video game, these include inanimate objects as well as creatures. Biederman, Mezzanotte and Rabinowitz (1985) state that this is not a simple list of objects and creatures; the mental representation we create also includes the relationships among the entities. Some relationships indicate where an object is relative to others in the scene. However, other relationships require an understanding of the referential meaning of the objects and creatures in question. Any violations of these relationships (for example, an object hanging in mid-air with nothing underneath it for support) are easily recognized. Sound is a second valuable source of information.
In audio-visual contexts, audio may be relegated to a supporting role. However, Jørgensen (2008b) notes that audio may also provide specific information about a situation or setting, or directly influence actions and events in an environment. In the context of video games, audio can support a specific mood and a sense of presence in the game, as well as serve a usability function. Egenfeldt-Nielsen, Smith, and Tosca (2008) identify four different kinds of audio in video games. Ambient sounds are non-specific sounds that contribute to the game atmosphere. Sound effects are sounds made by in-game objects. Vocalizations are the voices of characters in the game (both those controlled by the player and NPCs), and music is the soundtrack of the game. Game audio acts as a support for gameplay by providing different kinds of information (Jørgensen, 2008a). Much of this information must be understood in light of the specific context, providing an understanding of how the game should be played as well as how to act in a specific game context (Jørgensen, 2008a).
Ambient sounds, sound effects, and vocalizations can be grouped together as “game sounds.” A number of factors can affect sounds in games. Egenfeldt-Nielsen, Smith, and Tosca (2008) note that the environment of the game will affect audio information: the size of a location, the materials of walls, and weather conditions may alter the way in which sounds are heard. Spatiality is also a factor in sound: the sound of an enemy 30 yards away is quite different from the sound of an enemy right behind the player. Game physics will also influence sound, as sounds may be affected by relative movement. Jørgensen (2008a) states that the situation in which a sound is heard is important for interpreting its informative content. Audio information primarily operates on a micro level, as game audio may aid the player in determining his or her choices and actions. Collins (2007, 2008) and Jørgensen (2008a) use the concepts of interactive and adaptive audio in their discussion of video games. Interactive audio consists of sounds that occur in response to player action. This audio is reactive in that it occurs as an immediate response to player action. These notifications can be classified as positive, neutral, or negative (Jørgensen, 2008a). Adaptive audio reacts to events that occur in the environment (Collins, 2007, 2008; Jørgensen, 2008a). These sounds are not created by the player, but may require the player to evaluate his or her actions (Jørgensen, 2008a). Music is another form of audio, one that in video games has a different function from dialogue, sound effects, or ambient sound. Much of the music in video games can be seen as ambient pieces (Wood, 2009). These pieces are often low-key in nature, their purpose being to create a general emotional response or sense of place without distracting the player from gameplay (Wood, 2009). Music may also contribute to the narrative of the game (Zehnder & Lipscomb, 2006).
Wood (2009) notes that music must (1) be unobtrusive but not dull, (2) intrigue and encourage without being obnoxious, and (3) withstand repeated listening. A second musical technique used in video games is underscoring for non-interactive movies (cinematics). In cinematics, music may draw on the conventions of Hollywood film scoring. It is interesting to note that cinematics in games are becoming increasingly interactive. On December 20, 2011, Star Wars: The Old Republic (SWTOR), an MMORPG based on the Star Wars franchise, was released. The cinematic sequences in this game are incorporated into play, giving the player the opportunity to interact with NPCs in the game. When engaging in quests, players are able to watch scenes unfold between their characters and the NPCs. Players select a response for their character from a list, and the character then speaks it aloud, furthering the conversation. This gives players more agency in the story of their character and provides a higher level of immersion in the game.
Ultimately, sound effects, music, and speech have to be combined into one consistent overall soundscape that has qualities of its own, such as balance and timing between different elements and their relation to the moving image (Nitsche, 2008). This soundscape must also relate to the visual landscape. The synchronisation of screen sound with the image is a crucial element of the continuity of video game spaces (Nitsche, 2008). In video games, the visible world extends into the audible, and audio cues can give indications of the virtual space around the player (Nitsche, 2008). This is particularly important when informing players about something that is happening out of their direct view. Within video games, sound and image work together as information systems, contributing to the meaning-making process. It is important to examine how sound and images affect the game experience, as the relationship between sound and image is dynamic rather than fixed (Jørgensen, 2008b). It is also important to consider textual information in games, as information is also presented in that medium. Players learn about the game through text describing quests, through text chat with other players, and through textual descriptions of the images on the screen. Textual elements may help confirm suspicions about what is occurring in a scene (Neal, 2010). Audio elements may also assist in clarification. Sound and images, as well as text, also contribute to the narrative of the game. Truly successful games combine all of these elements to promote user engagement. Within this rich and complex environment, players must make sense of the world around them by searching for and retrieving relevant information. Information seeking is an interactive process: it depends on initiatives on the part of the seeker, feedback from the information environment, and decisions for subsequent queries based on this feedback (Marchionini, 1995).
These interactions are standardized by our own personal information infrastructures (Marchionini, 1995). User interactions with information from visual sources may be placed on a continuum, with focused, specific searching at one end and browsing at the other (Goodrum, Rorvig, Jeong, & Suresh, 2001). Browsing may require visual examination of images of interest, whereas search tasks may require the particularity of text (Goodrum et al., 2001). Browsing is a particularly relevant strategy for information searching in video game environments. Marchionini (1995) defines browsing as involving interactions between the information seeker and the system, heavily dependent on the information environment. Browsing provides an overview of a physical or conceptual space: people identify key landmarks and characteristics, and use them to form impressions of the scene and to make analogies to known scenes or concepts. It depends on human perceptual abilities to recognize and select relevant information. Browsing allows for discovery and learning: it may yield new insights or lead to serendipitous discoveries. Browsing is an activity of engaging in a series of glimpses, each of which may or may not lead to the closer examination of an object, which may or may not lead to the acquisition of that object (physical and/or conceptual) (Bates, 2007). Browsing may often be a faster method for finding a known item: if searchers are reasonably sure of the location of the information required, and are able to recognize that information when they encounter it, the simplest method is to go directly to the information source (Wilson, 1992). For Bates (2007), browsing consists of numerous stops and starts, with activities such as reading or surveying alternating with sampling and selecting. Bates based her discussion of browsing on Kwasnik’s (1992) identification of six browsing activities: orientation, place marking, identification, anomaly resolution, comparison, and transition. According to Bates (2007), humans depend strongly on vision; it is our predominant sense. We often examine something only visually, or in combination with other senses such as hearing or smell. Humans take in a scene all at once in a massively parallel glimpse, then select or sample a spot within the glimpsed area to examine more closely. The glimpse notes basic features such as colour and movement. Sometimes, we select a particular object to examine outside of the field of vision created by the initial glimpse. This selection allows for a higher level of processing, engaging in activities such as reading, object identification, or face recognition. Browsing allows for exploratory behaviour, something which often occurs in video game play. Marchionini (1995) identified several strategies that information seekers, and thus players, may employ while searching for and retrieving information.
Scanning allows people to compare a well-defined set of objects with an object that is clearly represented in the information seeker’s mind. Monitoring is similar to scanning, but also allows for attention to concepts that may be related to another topic of interest. Observational strategies are used when people assume they are in a promising environment and react to stimuli from that environment. Finally, navigation strategies balance the influence of the user and the environment: objects must be specifiable, and information seekers must know what they are seeking, actively interacting with the environment while reflecting and making decisions about progress. Bates (2007) refers to observational strategies as berrypicking or grazing. World of Warcraft is a game which uses visual, audio, and textual elements to provide information to the player in an immersive environment. It is a unique and complex virtual world, making it an ideal space for information research. It contains rich visual and audio experiences, multiple narratives, and a large, interactive player population. This environment lends itself to browsing strategies, allowing players to be surrounded by information and to retrieve what they deem necessary for play.
Methodology
This exploratory study uses my own experiences in WoW to examine how information is presented through non-textual means, as well as the searching strategies and retrieval methods used in play. The methods of autoethnography are particularly useful for this research. Autoethnography is an autobiographical genre of writing and research that displays multiple layers of consciousness (Ellis, 2004). As a form of ethnography, autoethnography places emphasis on the research process (graphy), on culture (ethno), and on self (auto) (Chang, 2008; Ellis & Bochner, 2000). Autoethnography enhances the cultural understanding of both self and others and has the potential to be transformative (Chang, 2008). This study is a reflexive autoethnography. It focuses on a culture and space (WoW) and uses my own life story in that culture to look more deeply at interactions within that world (Ellis, 2004). I have been playing WoW since 2005 and have spent over 2,100 hours in the game, the equivalent of roughly 90 days of continuous play. I have three characters at the highest level (level 85, at the time of writing) and 40 other characters at various stages of progress in the game. I have experienced the starting area of every race, as well as every character class. I have also experienced a variety of play, including player-versus-player combat (in which players of opposing factions fight against each other) as well as dungeons and raids (group activities made up of 5 to 40 people). This experience has allowed me to amass a great deal of knowledge about the game. It is important to have an in-depth understanding of any video game under study in order to navigate the game and understand the phenomena observed. Such understanding also limits illogical leaps and misunderstandings. In fact, gaming communities have begun to speak out against research on games done with little background knowledge (Lavigne, 2009).
For this study, I have drawn on my play experiences and knowledge, revisiting the non-textual information in the game through reflection as well as additional play. I have reviewed previous journals and my collection of over 300 screenshots. In my additional play, I recorded the audio information with a voice recorder, took further screenshots to record still images of the visual phenomena, wrote field notes, and recorded any textual information using the ‘chatlog’ command in WoW (a command that records all text that appears in the chat portion of the game screen). In the next section, I will describe some of what I encountered in my experience, focussing on non-textual information.
Journeying through World of Warcraft
When entering WoW for the first time as any character, I was shown a brief cinematic sequence introducing the situation in which I would be placed. Cinematics can be likened to movie trailers: they provide a brief overview of the story and build anticipation of what is to come. They include rich visual images, sound effects, and background music. When playing my Worgen Druid, once the cinematic was over, I found my avatar standing in the starting area. Each zone in WoW has its own distinct design and feel. For example, the architecture of the Worgen starting area is reminiscent of Victorian England. I saw an NPC standing nearby with a yellow exclamation mark over his head. A yellow exclamation mark signifies that an NPC has a quest available for the player to complete, while a yellow question mark indicates that a player has a completed quest to hand in to the NPC. This allows players to easily determine what activities may be available to them. When I clicked on the NPC, Prince Liam Greymane, a page of quest text describing the task at hand appeared on the screen, accompanied by the sound of a page turning. When I accepted the quest, a drumming sound played. I turned my character and began to walk away in order to complete my quest: to find another NPC named Lieutenant Walden. As I moved away from the prince, his voice faded until I could no longer hear it, and new ambient sounds (such as paper blowing down the street) became apparent. When I neared groups of friendly NPCs fighting enemies, I could hear the cries of battle and the clash of swords. A minimap is available to players in the right-hand corner of the screen to assist in navigation (see Figure 1, A). A bright yellow arrow on the minimap points to where a player needs to go to complete a quest. A larger, two-dimensional map of the area a player is in may also be viewed in the game by pressing the “M” key.
I found Lieutenant Walden lying dead on the cobblestone streets, surrounded by crows. The crows flew away as I approached. Details such as this (crows gathering at a dead body but flying away when approached by an avatar) contribute to the fully immersive experience of a game. Although WoW is a fantasy-based environment, many of its elements are akin to what would occur in the actual world (such as crows gathering around a corpse). Upon completion of the quest (finding the body), a triumphant trumpet sound played, signifying with audio that I had accomplished my task. I was also rewarded with experience points. The next quest I encountered required me to enter combat, a situation in which audio-visual information is extremely important. In order to enter combat with an enemy NPC, the player must first target the enemy by clicking on it. When the NPC is targeted, a large circle appears beneath it, coloured yellow (if the NPC is neutral to the player) or red (if the NPC is unfriendly to the player). Neutral NPCs will not attack until provoked, while unfriendly NPCs will attack as soon as players are within their range. In combat, spells and abilities are shown through visual and auditory means. For example, when a Druid in cat form attacks a target, the character’s paw swipes at the target and a growl is heard. As I progressed through the game on a character, I learned new spells, each with its own distinct animation. As soon as an enemy is engaged, it will begin to attack. My character occasionally grunted when hit by an enemy, and the enemy would also grunt, yell, or whine when hit. In the course of gameplay, most enemies (as well as player avatars) make some kind of audible noise when fighting, as well as when they die. When an enemy dies, items or money are often left on the corpse. This is commonly referred to as “dropping loot.” The corpse became highlighted by yellow sparkles, and the cursor changed to the shape of a bag when I moved it over the body. Clicking on the body opened a small window on the screen, accompanied by the sound of a bag opening. By clicking on the items, I transferred them into my own inventory. The sound of coins clinking played whenever I picked up loot from a corpse. Quests may also entail the gathering of particular objects. For example, one quest required me to salvage supplies. The boxes of supplies were highlighted by yellow sparkles.
As I “opened” a box, the sound of unwrapping paper played, followed by a creak as the box opened. A click signified that the item had been received, and a soft sound, as if something had been slid into a bag, indicated that it had been placed in my inventory. In major cities, where many players and NPCs can be found “hanging out,” there is a multitude of information to absorb. For example, I may want to find my class trainer so that I can learn new spells. In this case, I must determine where this NPC is by walking and searching through the city, by getting directions from a guard (see Figure 1, B), or by using the trainer tracking function on the map. Directions can be retrieved by clicking on the guard and selecting the appropriate trainer. Once a trainer is selected, a yellow arrow appears on the mini-map pointing in the trainer’s direction. If trainer tracking mode is used, an icon appears on the mini-map signifying the location of the trainer. However, this icon only appears once I am within visual range.
Whippey
Discussion
The visual
It is evident from the brief description of gameplay above that audio-visual elements of information are important for both new and experienced players. The game graphics, cinematics, sound, and music help the player learn about WoW, both how to play and the narrative of the game. In the following discussion of WoW as a source of non-text information, I examine the ways in which this information is presented in WoW and how players may search for and retrieve information through these means, drawing on examples from the description of my play experience. In WoW, the primary elements of colour, shape, size, orientation, and motion are used to create the visual environment. The orientation and physics of the virtual world are akin to those we are familiar with in the actual world. This familiarity allows players to identify and categorize the scene. Players can use their personal information infrastructures to regulate and standardize their interaction (Marchionini, 1995). The default point of perception is third-person; however, players can manipulate the in-game camera to adjust their view, and can choose to use a first-person perspective. The camera helps to narrate the space to the player: it selects, frames, and interprets the visual, aiding in the gathering of information (Nitsche, 2008). Players can rotate the camera in a full circle around the avatar, use a mix of first- and third-person perspectives, and zoom in and out. It should be noted, however, that players have an additional, co-existing point of perception on the game environment: the in-game map (Järvinen, 2002). The game’s cartoon-like appearance gives the impression that Azeroth is a caricature of the actual world, as well as of fantasy worlds with which most players are familiar, such as Tolkien’s Middle Earth.
Nardi (2010) found that players often discussed and evaluated WoW’s visuals, including game artwork, colours, character images, animations, buildings, and game geography, and I have seen this echoed in my own observations. Players highlighted the importance of the visual as a part of the play experience. Nardi (2010) likens WoW’s design to a kind of theatre in which the audience and performers are one: players have much to look at, and are ‘onstage’ simultaneously. The visual environment is extremely rich, with detailed architecture, landscapes, and objects. The architecture and landscape present an immersive spatial experience (McGregor, 2006). Players make their way into buildings and cities, through forests, across hills, through lakes and streams into underwater realms, and into mountain ranges. All terrain in WoW can be entered by players, with its functionality expressed to the player through simulated physical properties (McGregor, 2006). Each zone in the game has its own distinct character, habitat, and assigned level of difficulty (McGregor, 2006). Each zone enables, limits, and contextualizes the player’s perspectives and activities (Shaw & Warf, 2009). Players learn how to interpret the physical environments in order to obtain information about how to navigate the world and proceed in the game (Gee, 2007). The visual environment invites browsing: it is designed to encourage players to explore and try different methods of navigating the world.
The game uses colour and motion to identify objects that are of particular relevance to the player. For example, objects that the player has been asked to look for are surrounded by yellow sparkles, which draw the player’s visual attention. In Bates’ (2007) terms, the glimpse allows players to notice the change in texture, and players can choose to examine the sparkling spot more closely by approaching it and clicking on it with the mouse. Non-player characters are also an important part of WoW, and their visual presence conveys important information to the player. All characters in WoW are sexually dimorphic, so it is easy to identify a character’s gender. As previously mentioned, symbols above the head of an NPC (an exclamation mark or a question mark) indicate what the NPC may have to offer the player. The colour of the NPC’s target circle, as well as the colour of the NPC’s name on the screen, indicates whether the NPC is friendly (green), neutral (yellow), or unfriendly (red) to the player. The cursor may also change, depending on the function an NPC serves. For example, if an NPC is a vendor to whom a player can sell items, the cursor changes from a gauntlet (the default cursor) to a small bag.
The cursor may also change when hovering over a dead enemy: if there are items to be picked up from the corpse, such as money or armour, the cursor appears as a small bag. These visual changes provide feedback to the player, which influences the player’s next decision. Some NPCs do not move very much, while others walk a predetermined or random path in a set territory (Bainbridge, 2010). NPCs that walk around an area at random give the illusion that they have free will, contributing to the immersive experience of the game (Bainbridge, 2010). The realism of NPCs is enhanced by their physical movements and occasional speech (Bainbridge, 2010). The additional NPC speech introduced with the Cataclysm expansion has heightened the sense of immersion in NPC interactions.
The audio
Audio in video games makes an important contribution to the information that a player receives from the game system. Game audio supports the game world and also provides information about the player’s immediate surroundings. The four kinds of audio (ambient sounds, sound effects, vocalizations, and music) contribute to the information that a player receives, as well as to the overall immersive effect of the game. Ambient sounds in WoW tend to fade into the background. These include the sound of a piece of paper drifting down the cobbled street, or of rain hitting the ground. During my play, I found that I was not highly cognizant of these sounds, but they did contribute to my gameplay. For example, the sound of rain alerted me to the fact that weather had been introduced to a given area. Ambient sounds contribute to the atmosphere of the game. Many sounds are easily recognizable, allowing the player to quickly identify what they signify. Sound effects are an essential part of audio information in video games, as they provide players with cues about what is happening around them. For example, when I entered an area with spiders and heard a sound like a spider hissing, I knew that I might soon be under attack. I could also hear the shouts of NPCs in battle around me. The volume of these sounds indicates how close an enemy is, providing valuable information about what I should do in a combat situation. Sound is thus also useful in situations where the visual system is not available: it can provide information about events that are located out of the player’s line of sight (Jørgensen, 2008b). The comprehension of a sound also depends on context. In WoW, the same sounds can be made by friendly or unfriendly NPCs. For example, when a Rogue goes into “stealth mode” (a mode that allows the character to remain invisible), a fading sound is heard.
In a battle situation, this sound could be positive if the NPC is on the same side as the player, or negative if the NPC is on the opposite side. Players need to understand what event generates a sound and what it means in a given context (Jørgensen, 2008a). If a sound is not properly identified, players must browse further to determine the situation at hand. What Jørgensen (2008a) refers to as “earcons” (artificial noises, sound bursts, and musical phrases) are commonly used in WoW. WoW often combines these sound signals into hybrid sounds that are partly recognizable as actual-world sounds (Jørgensen, 2008a). For example, when my Druid in cat form swiped at a target, I heard the growl of a large cat such as a tiger, something recognizable from the actual world. Vocalizations are also a valuable source of information. NPCs may provide information to players through spoken dialogue. Many humanoid NPCs have some kind of pre-recorded speech appropriate to their role. For example, vendors may say “Hi, how are you?” when a player clicks on them. With the recent Cataclysm expansion, Blizzard has increased the number of NPC vocalizations. Player avatars also have vocalizations in the game. For example, if a player casts a spell without having targeted an enemy, the avatar will announce “I don’t have a target.” Such notifications can be classified as positive, neutral, or negative (Jørgensen, 2008a). Notifications like this one provide instant feedback, informing the player that an action has not been performed successfully. This is an example of reactive audio, as it occurs in response to a player action. Player avatars can also provide dialogue on command: players can type “/joke” or “/flirt” and the avatar will tell a pre-programmed joke or flirtatious line.
Music is another form of audio, and in WoW it has a different function from that of dialogue, sound effects, or ambient sound. Much of WoW’s music is orchestral. Each zone, as well as the introductory game menu, has its own distinct music. During my play, I found that music often faded into the background of my consciousness. I noticed when it was present and when it changed, and had a mild sense of the emotions it evoked in me, but it did not strike me as having a strong presence in the game. In WoW, players can choose to have the music fade in and out according to where they are, or to loop so that it plays continuously.
The audio-visual
Visual and audio information work together as an information system that provides functional, narrative, and navigational information, as well as immersion in the game world. Environmental storytelling creates the preconditions for a more immersive narrative experience (Jenkins, 2004). The narrative of a game (constructed through non-textual elements as well as text) provides a contextual framework through which actions gain meaning (Journet, 2007). As actions gain meaning, players become more engaged with the game. Engagement and immersion are key parts of what makes a good game. O’Brien and Toms (2008) state that user engagement is affected by aesthetic and sensory appeal, attention, feedback, interactivity, challenge, positive affect, endurability, and perceived user control. Aesthetic experiences, which stem from visual and auditory information, are intrinsically motivating, requiring focussed attention and stimulating curiosity. Engagement is often initiated when the aesthetic and informational aspects of the game resonate with players. These elements capture attention and interest, compelling players to become more engaged. While the importance of visual and audio information may seem obvious, it is interesting to note that this aspect of games has not been highlighted in the literature.
While I have not discussed the textual elements of WoW in great detail, they play a major role within the game. Text has always been a key element of video games: early online fantasy games were played entirely through text. The quests in WoW are all presented through text: currently, there is no mechanic that players can use to determine a quest objective without using the text provided. The text may also help to elucidate what players are learning visually. The tradition of text-based quests has only just begun to be challenged: SWTOR, in contrast to WoW, provides quest information through cinematics. In the future, games are likely to utilize greater amounts of audio-visual information. Communication with other players in WoW is also heavily text-based. While voice chat is available, text is the primary mode of communication. Thus, while the use of text in games may change in future game development, text cannot be forgotten: it is a part of the overall information system within the game. It is naive to believe that any single interface can serve the needs of all users for all tasks (Marchionini, 1995). WoW players can adjust the game’s interface to best suit their play. For example, players can add extra action bars (see Figure 1, C) to give themselves a quick, visual overview of the spells and abilities available to them. People often arrange their physical and social environments so as to provide the information they need when they need it (Bates, 2002). WoW players are also able to enhance or alter their interface, through both visual and audio means, by using user-created modifications, or mods. Mods alter the user interface, but cannot change game terrain, character design, quests, or character class abilities (Kow & Nardi, 2010).
Players with an interest in extending the game write mods and make them freely available to other players (Kow & Nardi, 2010). A popular mod used in raiding is Deadly Boss Mods, which provides additional visual, auditory, and textual information during fights with the hardest enemies in an instance. Mods allow players to customize their interface, arranging the environment to suit their information needs. This gives users control over the system, maximizing interactivity in the interface design. To support the information search and retrieval process, information system designers are challenged to provide representations of information along with mechanisms for accessing and manipulating them, to display these representations in ways that facilitate interpretation, and to support the extraction and manipulation of information from them (Marchionini, 1995). WoW is such an information system. Players can access and manipulate information through graphical, auditory, and textual means. Through this access and manipulation, players are able to extract the information that they require to progress in the game.
Bates (2007) notes that the design of interactive information systems also needs to take human browsing behaviour into account. During my play, I found myself frequently browsing the environment for information. Whenever I entered a new area, I examined it for key landmarks and characteristics, such as buildings that could be explored or NPCs. Some buildings in WoW, such as inns, may have similar layouts. By browsing the environment, I was able to determine whether a layout was familiar. This is akin to Marchionini’s (1995) scanning strategy. If I heard a particular sound, such as the fading sound that indicates a hidden enemy, I visually browsed the immediate area to determine quickly where the creature that made the sound was. This example also illustrates Wilson’s (1992) point that when browsing is employed to find a known item, the search process is often quickly completed. While playing WoW, I primarily browsed using observational or navigation strategies. When using an observational strategy, I recognize that I am in an interesting area and react to stimuli as they are brought to my immediate attention. For example, when I was completing quests to discover what happened in the Worgen city of Gilneas, I reacted to what I saw and heard as it became relevant. When I saw a friendly NPC running into battle, I could choose either to join the battle or to stand back and observe. An observational strategy allows players to explore their environment at their leisure, choosing what information is important at a particular point in play. In contrast, when looking for a specific piece of information (such as the location of an NPC), I employ a navigation strategy, actively interacting with the environment while reflecting on my progress and making decisions about what course of action to take.
For instance, when a quest asks me to find a particular object, I know that I am looking for an object surrounded by yellow sparkles. When engaging in this form of browsing, I explore my environment, examining the area for the object of interest. Once I find the object, I can choose how to interact with it: by clicking on it, or by continuing to explore. It is also possible to stumble across other objects of interest while browsing, which may cause the search task to change. WoW is not a linear game: players are free to choose what to explore, which quests to complete, and how they will experience the game. Thus, browsing is an ideal method for information seeking and retrieval in a video game environment. Information environments should allow searchers to act on their instinctive tendency to browse. As Bates (2007) states, good browsing interfaces should consist of rich scenes containing many potential objects of interest. This is what WoW, and video games more generally, provide: a complex, immersive environment with an enormous amount of information to explore. The eye can take in such a scene at once, through massive parallel processing, and then select items within it that require closer attention.
Conclusion
WoW is a rich and complex audio-visual environment. Players gain large amounts of information from the audio-visual elements of the game, as well as from text. The information within the game provides an understanding of how the game should be played and how to behave in specific in-game contexts. Through visual and audio information, players learn the rules of the game. Visual and audio elements also contribute to the immersion of the game, providing a more engaging experience. Video games and virtual environments have many positive implications for information searching and retrieval. As Marchionini (1995) notes, environments should invite browsing, encouraging people to exercise curiosity. Video games encourage browsing behaviour and information searching, as players constantly require information to progress in the game. These principles can be applied to the creation of new, richer information spaces. Currently, search engines are generally text-based: users type in their keywords and are provided with an itemized list of results. Some search engines, such as Google, are designed to be highly user-friendly: they do not require the creation of keywords or a formal searching strategy. But what if a virtual search environment were available? It would be far more interactive and immersive, for example, to search for artefacts at an archaeological site in a virtual world to gain information about the past, as opposed to reading about them on a website and viewing pictures. Three-dimensional representations of objects could be readily created, allowing users to view them from all perspectives. The technology and design of video games have the potential to be utilized for this kind of endeavour. This exploratory study has provided an overview of how non-textual information is presented in a video game setting.
Non-textual information increases the amount of knowledge the player has about the game, provides important contextual information, and enhances the immersive and engaging qualities of play. The non-textual elements of video games may be adapted to create better, more interactive, and more immersive information retrieval systems. As people spend increasing amounts of time in virtual worlds and games, it is imperative to better understand how they obtain information and how this information can be used. This study adds to the existing literature concerning non-textual information and information seeking, introducing a new environment, and new possibilities, to the ongoing discussion.
References
Adams, S. S. (2009). What games have to offer: Information behavior and meaning-making in virtual play spaces. Library Trends, 57(4), 676–693.
Arnheim, R. (1974) [1954]. Art and visual perception: A psychology of the creative eye: The new version. Berkeley: University of California Press.
Bainbridge, W. (2010). The Warcraft civilization: Social science in a virtual world. Cambridge: MIT Press.
Bates, M. J. (2002). Toward an integrated model of information seeking and searching. New Review of Information Behaviour Research, 3, 1–15. Retrieved from http://gseis.ucla.edu/faculty/bates/articles/info_SeekSearch-i-030329.html
Bates, M. J. (2007). What is browsing – really? A model drawing from behavioural science research. Information Research, 12(4). Retrieved from http://informationr.net/ir/12-4/paper300.html
Biederman, I. (1987). Recognition-by-components: A theory of human image understanding. Psychological Review, 94(2), 115–147.
Biederman, I., Mezzanotte, R. J., & Rabinowitz, J. C. (1982). Scene perception: Detecting and judging objects undergoing relational violations. Cognitive Psychology, 14, 143–177.
Blizzard Entertainment. (2010, October 7). World of Warcraft subscriber base reaches 12 million worldwide. Retrieved from http://us.blizzard.com/en-us/company/press/pressreleases.html
Buckland, M. (1991). Information and information systems. New York: Praeger.
Castronova, E. (2007). Exodus to the virtual world: How online fun is changing reality. New York: Palgrave Macmillan.
Chang, H. (2008). Autoethnography as method. Walnut Creek, CA: Left Coast Press.
Collins, K. (2007). An introduction to the participatory and non-linear aspects of video games audio. In S. Hawkins & J. Richardson (Eds.), Essays on sound and vision (pp. 263–298). Helsinki: Helsinki University Press.
Collins, K. (2008). Game sounds: An introduction to the history, theory, and practice of video game music and sound design. Cambridge: MIT Press.
Corneliussen, H. G., & Rettberg, J. W. (Eds.). (2008). Digital culture, play, and identity: A World of Warcraft reader. Cambridge: MIT Press.
Ducheneaut, N., Yee, N., Nickell, E., & Moore, R. J. (2006). Building an MMO with mass appeal. Games and Culture, 1(4), 281–317. doi:10.1177/1555412006292613
Dyer-Witheford, N., & de Peuter, G. (2009). Games of empire: Global capitalism and video games. Minneapolis: University of Minnesota Press.
Egenfeldt-Nielsen, S., Smith, J. H., & Tosca, S. P. (2008). Understanding video games: The essential introduction. New York: Routledge.
Ellis, C. (2004). The ethnographic I: A methodological novel about autoethnography. Walnut Creek, CA: AltaMira Press.
Ellis, C., & Bochner, A. P. (2000). Composing ethnography: Alternative forms of qualitative writing. In N. K. Denzin & Y. S. Lincoln (Eds.), Handbook of qualitative research (2nd ed.) (pp. 733–768). Thousand Oaks, CA: Sage.
El-Nasr, M. S., & Yan, S. (2006). Visual attention in 3D video games. In Proceedings of ACE ’06: The 2006 ACM SIGCHI International Conference on Advances in Computer Entertainment Technology. Hollywood.
Gee, J. (2007). What video games have to teach us about learning and literacy. New York: Palgrave Macmillan.
Goodrum, A. A., Rorvig, M. E., Jeong, K., & Suresh, C. (2001). An open source agenda for research linking text and image content features. Journal of the American Society for Information Science and Technology, 52(11), 948–953.
Järvinen, A. (2002). Gran stylissimo: The audiovisual elements and styles in computer and video games. In F. Mäyrä (Ed.), Proceedings of Computer Games and Digital Cultures Conference (pp. 113–128).
Jenkins, H. (2004). Game design as narrative architecture. In N. Wardrip-Fruin & P. Harrigan (Eds.), First person: New media as story, performance, and game (pp. 118–130). Cambridge, MA: MIT Press.
Journet, D. (2007). Narrative, action and learning: The stories of Myst. In C. L. Selfe & G. E. Hawisher (Eds.), Gaming lives in the twenty-first century: Literate connections (pp. 93–120). New York: Palgrave Macmillan.
Jørgensen, K. (2008a). Audio and gameplay: An analysis of PvP Battlegrounds in World of Warcraft. Game Studies, 8(2). Retrieved from http://gamestudies.org/0802/articles/jorgensen
Jørgensen, K. (2008b). Left in the dark: Playing computer games with the sound turned off. In K. Collins (Ed.), From Pac-Man to pop music: Interactive audio in games and new media (pp. 163–176). Hampshire: Ashgate Publishing Limited.
Kow, Y. M., & Nardi, B. A. (2010). Culture and creativity: World of Warcraft modding in China and the US. In W. Bainbridge (Ed.), Online worlds: Convergence of the real and the virtual (pp. 21–41). London: Springer-Verlag.
Kwasnik, B. H. (1992, August). A descriptive study of the functional components of browsing. In J. A. Larson & C. Unger (Eds.), Proceedings of the IFIP TC2/WG2.7 Working Conference on Engineering for Human-Computer Interaction (pp. 191–203).
Lavigne, C. (2009, May 25). Why video game research is flawed. Maisonneuve: A Quarterly of Arts, Opinions & Ideas. Retrieved from http://www.maisonneuve.org/pressroom/article/2009/may/25/why-video-game-research-is-flawed/
Marchionini, G. (1995). Information seeking in electronic environments. Cambridge, UK: Cambridge University Press.
McGregor, G. L. (2006). Architecture, space and gameplay in World of Warcraft and Battle for Middle Earth 2. In Proceedings of the 2006 International Conference on Game Research and Development (pp. 69–76).
Nardi, B. A. (2008). Mixed realities: Information spaces then and now. Information Research, 13(4). Retrieved from http://informationr.net/ir/13-4/paper354.html
Nardi, B. A. (2010). My life as a night elf priest: An anthropological account of World of Warcraft. Ann Arbor: University of Michigan Press.
Neal, D. M. (2010). Emotion-based tags in photographic documents: The interplay of text, image, and social influence. Canadian Journal of Information and Library Science, 34(3), 329–353.
Nitsche, M. (2008). Video game spaces: Image, play, and structure in 3D game worlds. Cambridge: MIT Press.
O’Brien, H. L., & Toms, E. G. (2008). What is user engagement? A conceptual framework for defining user engagement with technology. Journal of the American Society for Information Science and Technology, 59(6), 938–955.
Rettberg, S. (2008). Corporate ideology in World of Warcraft. In H. Corneliussen & J. W. Rettberg (Eds.), Digital culture, play and identity: A World of Warcraft reader (pp. 19–38). Cambridge: MIT Press.
Shaw, I. G. R., & Warf, B. (2009). Worlds of affect: Virtual geographies of video games. Environment and Planning A, 41(6), 1332–1343.
Taylor, T. L. (2006). Play between worlds: Exploring online game culture. Cambridge: MIT Press.
Ware, C. (2008). Visual thinking for design. Burlington, MA: Morgan Kaufmann Publishers.
Wilson, P. (1992). Searching: Strategies and evaluation. In H. D. White, M. J. Bates & P. Wilson (Eds.), For information specialists: Interpretations of reference and bibliographic work (pp. 153–181). Norwood, NJ: Ablex Publishing Corporation.
Winget, M. A. (2011). Videogame preservation and massively multiplayer online role-playing games: A review of the literature. Journal of the American Society for Information Science and Technology, 62(10), 1869–1883.
Wood, S. (2009). Video game music: High scores: Making sense of music and video games. In G. Harper (Ed.), Sound and music in film and visual media: An overview (pp. 129–148). New York: The Continuum International Publishing Group Inc.
Zehnder, S. M., & Lipscomb, S. D. (2006). The role of music in video games. In P. Vorderer & J. Bryant (Eds.), Playing video games: Motives, responses and consequences (pp. 241–258). Mahwah, NJ: Lawrence Erlbaum Associates, Publishers.
Appendix A: Figures
Figure 1: The World of Warcraft Interface. Screenshot taken by author.
A: The mini-map. Players can use this map to orient themselves in the game environment. The mini-map is always present on the screen. Players can press ‘M’ to view a larger map.
B: A guard from the city of Orgrimmar. Guards provide information about the location of NPCs and important buildings.
C: Action bars. The action bars contain visual icons of the spells and abilities characters can access. Players can choose to display extra action bars on the screen.
Index
ambient sounds 411, 420 appraisal process 343, 351, 355, 356, 357, 360, 368 architecture 42, 52, 57, 78, 119, 283, 287, 290, 291, 310, 312, 313, 320, 326, 416, 418, 426 audio information 406, 412, 415, 420, 421, 424 autoethnography 9, 406, 415, 425 basic level theory 57, 185, 187, 188, 199, 202, 205, 210 bilingual environment 40 blogs 35, 83, 88, 89, 93, 96, 101 browsing 6, 17, 43, 44, 47, 83, 84, 85, 87, 93, 94, 95, 97, 101, 103, 106, 109, 139, 150, 164, 178, 187, 193, 205, 210, 231, 239, 240, 241, 257, 258, 268, 280, 283, 406, 413, 414, 419, 423, 424, 425, 426 buildings 287, 290, 291, 297, 302, 303, 308, 340, 418, 423, 428 CDWA 67, 75, 77 cinematic 242, 359, 360, 406, 412, 416 classification 9, 15, 16, 17, 19, 22, 23, 31, 32, 33, 34, 35, 36, 37, 38, 39, 42, 48, 52, 53, 75, 83, 84, 233, 236, 260, 261, 262, 270, 311, 314, 315, 316, 317, 319, 324, 325, 328, 337, 339, 370, 374, 375, 379, 385, 404, 405 cognitive theory 188, 343, 372, 402 cold-start 376 commotion 343, 356, 357, 368 content analysis 8, 55, 83, 84, 88, 89, 188, 194, 209, 210, 247, 248, 257, 374, 375 controlled vocabulary 21, 41, 42, 53, 54, 57, 215, 221, 236, 259, 282, 287, 292, 293, 298, 312, 313, 364, 377, 378 cross-genre 15, 16, 30, 31, 32, 37, 91, 104 depicted emotion 343, 357 digital libraries 40, 57, 77, 107, 138, 139, 156, 157, 185, 187, 188, 205, 234, 282
Dr. Seuss 72, 78 Dublin Core 67, 77, 141, 157, 190, 192, 207, 208, 232, 238, 289 editorial cartoons 59, 60, 61, 65, 66, 67, 68, 69, 71, 72, 73, 74, 75, 76, 77, 78 emotional information retrieval 9, 210, 343, 374, 376 engagement 29, 62, 86, 94, 95, 115, 116, 122, 126, 127, 135, 207, 213, 393, 406, 413, 421, 427 engineering drawings 314, 315, 319, 321, 322, 328, 329, 339, 340 experimentation 128, 129, 212 facet analysis 58, 234, 242, 245, 246, 257, 259, 262 faceted metadata ontologies 268 felt emotion 343, 350, 357 film archives 234, 257 film discovery and access 234 filmic analysis 234, 235, 242, 243, 244, 245, 246, 248, 259 Flickr 186, 194, 280, 313 folksonomies 208, 209, 261, 282, 373 FRBR 141, 143, 157, 158, 159, 160, 161, 163, 164, 165, 170, 178, 179, 180, 181, 240, 261, 262, 282 free-text tagging 212, 230 game sounds 412 gamification 9, 376, 393, 394, 401, 402, 403, 405 gazetteers 208, 263, 265, 280 genre 6, 8, 15, 16, 17, 18, 19, 21, 22, 23, 24, 26, 27, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 86, 87, 91, 92, 96, 97, 101, 103, 104, 113, 137, 138, 139, 140, 141, 143, 144, 145, 148, 149, 151, 152, 154, 155, 156, 160, 161, 162, 170, 173, 174, 175, 176, 177, 178, 180, 234, 235, 239, 240, 241, 247, 248, 250, 253, 254, 255, 258, 259, 296, 359, 415
geospatial ontologies 263, 264, 265, 267, 268, 269, 270, 274, 277, 279, 280, 283 geotagging 185, 186, 192, 193, 194, 202, 205, 206 historical documents 59, 61, 68, 69, 79 image access 59, 188, 234, 238, 240, 311 image analysis 59, 235, 236, 294 image based retrieval system 314 image description 40, 59, 62, 63, 72, 73, 75, 208, 296 image indexing 40, 41, 45, 46, 48, 53, 59, 72, 78, 185, 188, 203, 236, 238, 287, 294, 315, 341 image queries 59, 64, 179 image retrieval system 65, 160, 208, 314, 339, 341 image tagging 46, 188, 192, 206, 207, 212, 213, 214, 215, 219, 220, 231, 232, 261, 370 IMDb 160, 161, 163, 165, 166, 167, 168, 170, 171, 173, 175, 177, 178, 179, 180, 238, 240, 241, 257 incentive system 376, 377, 391, 393, 394, 396 indexing 1, 5, 6, 7, 8, 9, 11, 15, 17, 21, 22, 33, 41, 42, 46, 48, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 61, 63, 70, 71, 72, 75, 76, 92, 97, 98, 134, 185, 186, 187, 188, 189, 190, 191, 202, 204, 205, 208, 209, 210, 214, 215, 232, 234, 235, 236, 238, 245, 246, 260, 261, 280, 281, 287, 288, 289, 290, 291, 292, 293, 295, 296, 297, 298, 299, 310, 311, 312, 313, 314, 315, 316, 328, 339, 340, 341, 343, 344, 345, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 374, 376, 377, 380, 381, 392, 401, 404 information behaviour 8, 83, 425 information practices 111, 112, 114, 115, 116, 125, 130, 131, 132, 133, 134 information retrieval 6, 7, 8, 16, 30, 35, 36, 37, 38, 50, 56, 57, 78, 83, 86, 93, 97, 106, 107, 108, 111, 113, 158, 159, 180, 188, 207, 234, 238, 260, 295, 300, 311, 312, 314, 316, 317, 341, 372, 377, 380, 386, 403, 404, 406, 407, 424 information science 1, 4, 5, 7, 32, 37, 102, 141, 161, 192, 207, 234, 235, 237, 242, 271, 312, 313, 314, 315, 316, 322, 340, 342, 343, 359, 369, 372, 406, 407 information searching 93, 406, 407, 413, 424 information seeking behaviour 56 information systems 9, 83, 84, 85, 89, 93, 97, 100, 106, 111, 112, 134, 192, 237, 268, 269, 282, 413, 423, 425 interface 44, 72, 73, 121, 144, 151, 153, 155, 176, 212, 213, 214, 219, 220, 221, 222, 224, 225, 226, 228, 230, 231, 240, 261, 269, 271, 328, 329, 331, 365, 388, 391, 422 language issues 40 library catalogue 5, 137, 141, 142, 160, 163, 164, 165, 180 Mandeville Special Collections Library 72 map-based visualizations 263, 270, 271, 281 MEMOSE 9, 343, 345, 365, 367, 369, 371, 376, 377, 378, 379, 380, 383, 384, 387, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 404 metadata 5, 9, 20, 31, 40, 41, 43, 49, 56, 58, 59, 67, 73, 75, 77, 78, 84, 86, 91, 95, 97, 99, 101, 102, 103, 104, 105, 107, 159, 162, 185, 186, 187, 190, 192, 193, 207, 208, 210, 238, 239, 260, 263, 264, 267, 268, 270, 271, 272, 280, 281, 283, 298, 314, 316, 322, 323, 324, 328, 329, 336, 338, 339, 340, 392, 401 methodological approach 40, 45, 46, 350 moving image retrieval 9, 160, 161, 163, 164, 165, 166, 168, 179, 234 moving images 152, 190, 198, 200, 242 multilingualism 40, 50 multimedia documents 185, 190, 343, 345, 349, 354, 363, 364, 365, 366, 368, 369, 371, 376, 377, 379, 384, 392, 393, 400, 404 music catalogue 137, 139, 155 music education 113, 120
music information retrieval 35, 37, 38, 83, 85, 106, 107, 108, 139, 157, 158, 159, 372, 375 music knowledge 111, 112, 113, 114, 115, 117, 118, 125, 126, 127, 128, 129, 130, 132, 134, 135, 136 music system features 137 navigation strategies 414, 423 non-player characters 419 observational strategies 414 online catalogue 158, 160, 163, 166, 270 ontology 42, 56, 190, 192, 207, 212, 213, 214, 216, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 264, 265, 266, 267, 268, 269, 271, 272, 273, 274, 275, 277, 280, 282, 283, 310 point of perception 410, 418 political cartoons 8, 59, 60, 61, 62, 63, 71, 72, 76, 77, 78 recommender systems 15, 29, 38 scanning 103, 328, 414, 423 self-directed learning 111, 112, 114, 117, 118, 121, 123, 126, 129, 130, 131, 132, 134 social tagging 46, 54, 185, 187, 188, 190, 191, 202, 210, 212, 213, 233, 313, 364, 370, 403 sound 5, 9, 17, 19, 20, 28, 30, 88, 100, 107, 118, 121, 140, 143, 146, 147, 148, 149, 155, 190, 242, 243, 251, 254, 256, 406, 407, 411, 412, 413, 416, 417, 418, 420, 421, 423, 425, 426 soundscape 413 speech 205, 238, 242, 407, 413, 419, 421 subject access 55, 164, 259, 287 subject authority files 287, 290, 300 subject heading languages 287, 300, 312 system design 8, 83, 84, 85, 88, 97, 100, 105, 111, 112, 160, 175, 178, 179, 188
system evaluation 137, 160 tagging 6, 7, 15, 21, 23, 28, 32, 34, 36, 40, 54, 56, 185, 186, 188, 189, 190, 191, 192, 196, 203, 205, 206, 208, 212, 213, 214, 215, 219, 220, 221, 223, 228, 229, 230, 232, 290, 343, 364, 369, 378, 385, 389, 390, 391, 392, 393, 395, 402, 403 taxonomy 8, 37, 40, 42, 43, 44, 45, 46, 47, 48, 49, 52, 53, 54, 55, 56, 87, 187, 364, 402 technical drawings 314, 320, 321, 325, 326, 332, 336, 337 UCLA-Cinema 160, 161, 163, 165, 166, 167, 168, 170, 171, 172, 173, 174, 176, 177, 178, 179 user-centred design 44, 111, 212 user needs 75, 83, 103, 138, 141, 160, 161, 164, 165, 168, 171, 177, 193, 240, 399 user perspective 49, 137, 140, 160, 161 user research 160, 164, 165 user study 137, 159, 160, 177 user tasks 42, 137, 140, 143, 144, 160, 161, 167, 168, 169, 170, 171, 177, 178, 179 video 4, 5, 21, 34, 65, 100, 105, 120, 153, 154, 162, 164, 171, 176, 178, 181, 185, 186, 189, 190, 194, 203, 206, 207, 212, 238, 239, 242, 343, 344, 368, 369, 370, 383, 398, 400, 406, 407, 410, 411, 412, 413, 414, 415, 420, 422, 423, 424, 425, 426, 427 video game 427 virtual environment 406 visual environment 410, 411, 418, 424 visual information 54, 56, 189, 207, 260, 287, 292, 403, 406, 410, 411, 417, 422 visual metadata 323, 324, 328, 339 VRA Core 67, 75, 79, 238 World of Warcraft 4, 406, 407, 408, 414, 416, 425, 426, 427, 428