English Pages 210 Year 2023
Amnesia Remembered
Digital Archaeology: Documenting the Anthropocene
Series Editor: Andrew Reinhard (American Numismatic Society)
The archaeology of the late twentieth and twenty-first centuries supplements traditional landscapes, sites, and artefacts with those that are digital. People increasingly inhabit digital places, investing time and money into spaces accessed only through screens. People and corporations continue to create these digital built environments and their supporting physical architecture at an astonishing rate for a rich diversity of purposes. This series aims to answer the questions of what the heritage of digital things and places looks like and how it can be understood archaeologically.
Volume 2
Amnesia Remembered: Reverse Engineering a Digital Artifact
John Aycock
Volume 1
An Enchantment of Digital Archaeology: Raising the Dead with Agent-Based Models, Archaeogaming and Artificial Intelligence
Shawn Graham
Amnesia Remembered
Reverse Engineering a Digital Artifact
John Aycock
berghahn NEW YORK • OXFORD www.berghahnbooks.com
First published in 2023 by Berghahn Books
www.berghahnbooks.com
© 2023 John Aycock
All rights reserved. Except for the quotation of short passages for the purposes of criticism and review, no part of this book may be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system now known or to be invented, without written permission of the publisher.
Library of Congress Cataloging-in-Publication Data
A C.I.P. cataloging record is available from the Library of Congress
Library of Congress Cataloging in Publication Control Number: 2023004643
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
ISBN 978-1-80073-867-6 hardback
ISBN 978-1-80073-868-3 ebook
https://doi.org/10.3167/9781800738676
For Kathryn
Contents
List of Figures
List of Tables
Preface
Part I. Pre-Excavation
Introduction
Chapter 1. Reconnaissance
Chapter 2. Evaluation
Chapter 3. Strategy and Research Questions
Part II. Excavation
Chapter 4. Fragments
Chapter 5. Publisher Logo
Chapter 6. Text Encoding
Chapter 7. Interpreter
Chapter 8. Text Encoding, Revisited
Chapter 9. Parser
Chapter 10. Finding Locations
Chapter 11. Copy Protection
Part III. Post-Excavation
Chapter 12. Analysis
Conclusion
Index
Figures
Figure 1.1. Text adventure example.
Figure 1.2. Mystery House (1980) added rudimentary graphics to text.
Figure 1.3. Thomas M. Disch in 1986.
Figure 1.4. Front cover of Harper & Row Amnesia packaging.
Figure 1.5. Front of the final Amnesia packaging.
Figure 1.6. Amnesia floppy disks.
Figure 2.1. Crack screen for “Dragon Lord” version of Amnesia.
Figure 2.2. Apple IIe computer system with Amnesia floppy disks for scale.
Figure 2.3. MAME debugger view of Amnesia.
Figure 2.4. Excerpt of printable strings in Amnesia .dsk images.
Figure 2.5. Applesauce disk imaging device.
Figure 4.1. Excerpt of printable strings in Apple DOS 3.3 .dsk image.
Figure 4.2. Excerpt of 7-bit printable strings in Amnesia .dsk images.
Figure 4.3. Sample of probable development-related strings in Amnesia .dsk images.
Figure 4.4. An interesting Amnesia fragment.
Figure 4.5. One screen of Amnesia Forth code.
Figure 4.6. Basic addition in postfix notation.
Figure 4.7. A more elaborate postfix expression.
Figure 4.8. Interpreted code for a classical interpreter.
Figure 4.9. Interpreted code for a direct threaded code interpreter.
Figure 5.1. Exporting memory data from the emulator.
Figure 5.2. Injecting memory data into the emulated machine.
Figure 5.3. Debugger view when the display RAM watchpoint is reached.
Figure 5.4. Altair 8800b microcomputer, released in 1976.
Figure 5.5. Disassembly excerpt.
Figure 5.6. Another disassembly excerpt.
Figure 5.7. Logo data decompression algorithm.
Figure 5.8. Graphical logo reconstruction using a plotting program.
Figure 5.9. Graphical logo reconstruction using a spreadsheet.
Figure 6.1. Breakpoint at $fded and 6502 stack contents.
Figure 6.2. Disassembly of call to COUT.
Figure 6.3. Plotting data reads and writes.
Figure 6.4. Annotated log file excerpt.
Figure 7.1. Partial dynamic execution trace versus static disassembly.
Figure 7.2. Disassembled instructions from $d016.
Figure 7.3. The initial dictionary entry decoded.
Figure 7.4. Partial dictionary entry information.
Figure 7.5. Decompiling Forth source code from bytes.
Figure 7.6. Decompiled code for RESTORE versus the found source code.
Figure 8.1. Detailed Forth dynamic trace and mapping onto decompiled code.
Figure 8.2. Forth dynamic trace excerpt with wayfinders in bold.
Figure 8.3. Decompiled definition for TES_______.
Figure 8.4. Repeated division by powers of 10 to extract digits, with quotients in bold.
Figure 8.5. Decoding example for Mon.
Figure 9.1. Abstract state machine example.
Figure 9.2. Vocabulary table entry example.
Figure 9.3. Partial output from vocabulary table dumper.
Figure 9.4. Real-time Forth code execution visualization tool overview.
Figure 9.5. Two steps of Forth execution visualization.
Figure 9.6a. Amnesia parser modeled as an augmented transition network, part 1.
Figure 9.6b. Amnesia parser modeled as an augmented transition network, part 2.
Figure 10.1. Automatic location mapping to disk 1, side B.
Figure 10.2. Some possible little-endian byte orderings for 32-bit values.
Figure 10.3. Searching for a disk address and length in a sea of bytes.
Figure 10.4. Byte pattern to heuristically identify disk-loading code.
Figure 10.5. Disk bytes accounted for on disk 1, side B.
Figure 10.6. X-Street Indexer code wheel in use.
Figure 10.7. Second layer of X-Street Indexer.
Figure 11.1. Hex byte comparison between the Dragon Lord and my .dsk versions.
Figure 11.2. Excerpt of disassembled code in the copy protection interpreter language.
Figure 11.3. Copy protection error screen, actual versus debugged.
Figure 11.4. Contemporaneous and modern views of Amnesia’s disk data.
Figure 11.5. Annotated magnetic flux visualization of Amnesia’s disk 1, side A.
Figure 11.6. Comparative flux visualizations with fat tracks.
Figure 12.1. Excerpt from original Hat Trick source code, and its representation in typical assembly code.
Figure 12.2. Magnetic flux visualizations for two Accolade games with fat tracks.
Figure 12.3. Code comparison between Apple II and Commodore 64 Amnesias.
Figure 12.4. Amnesia with tracing enabled.
Tables
Table 3.1. Example MD5 hashes.
Table 3.2. MD5 hashes for Amnesia .dsk files.
Table 5.1. Frequency of byte values in logo data.
Table 5.2. Numbers in base 10, base 2, and base 16.
Table 9.1. $d000 vocabulary statistics.
Table 10.1. Empirical mapping of location numbers to disks.
Preface
The idea for this book dates back some years. I had presented some of my video game reverse engineering work at an archaeology conference in 2017, and afterward, archaeologist Andrew Reinhard offhandedly said he’d like to be able to do that kind of work too. This made me wonder why he couldn’t do reverse engineering. The more I’ve worked with archaeologists, the more I’ve seen that they think and work in many of the same ways I do; the missing link is an understanding of how to approach and study digital artifacts, a gap this book takes a first step toward closing.
The primary audience is archaeologists, although my hope is that the book acts as a more general reverse-engineering primer for people in other fields too, like history, game studies, and media studies. Computer scientists may also find the material interesting, if only because low-level topics there have gradually been deemphasized over time and relegated to niche status. While some previous exposure to computer programming might be helpful, I have tried my best to explain concepts as they are encountered, and the text plus an occasional Internet search or two should be sufficient to make it through. Fundamentally we are dealing with languages, which humans can understand with practice.
I should caution that this was not written as a typical technical book. Martin Carver’s field research procedure, which this book follows, presented such a strong storyline in conjunction with reverse engineering that this book is more like a novel in many ways, telling the saga of Amnesia and what, if anything, set it worlds apart. The chapters are logical divisions in the story but not standalone entities; one would not expect to start reading a novel at Chapter 6 and have it make sense. Similarly, jumping to Chapter 6 in this book to learn about text encoding will seem abrupt without the wrap-up from Chapter 5 and the concepts and definitions Chapter 6 relies on from earlier chapters. The notion of a technical book as a gripping page-turner may seem a ridiculous conceit, but bear with me. It’s a good story.
This work would not have been possible without the Internet Archive, along with hardware and software galore, including Apple ][ Graphic, Applesauce, DirMaster, DOSBox, Kryoflux, MAME (MESS), VICE, Wine, and wozardry. But archaeology is ultimately about people, and many people were extremely generous to me with their time and resources. Specifically, I would like to thank 4am, Jennifer Baranowski, Katie Biittner, Peter Brown, Don Daglow, John Durno, Anton Ertl, Peter Ferrie, Darcy Grant, Glen Hartley, Jane Isay, Charlie Kreitzberg, Hugo Labrande, Jimmy Maher, Kevin McDonnell, John Morris, Julia Novakovic, Jason Scott, Stefan Serbicki, James Terry, and Dean Yergens. Apologies if I have forgotten anyone! I am grateful to 4am, Jörg Denzinger, and Jim Uhl for proofreading and extensive comments; any errors, of course, remain my own. Special thanks to Stephane Racle for providing pictures and measurements of the unimaginably rare Harper & Row packaging. Thanks to series editor Andrew Reinhard, both for his thoughtful feedback and for his enthusiasm and advocacy for this project from the outset. He made the initial suggestion of applying Carver’s procedure to reverse engineering. Last but certainly not least, Kathryn Kawalec provided her loving encouragement and support as I clattered on my clicky keyboard.
John Aycock
Trademarks are the property of their respective owners, even in the absence of explicit statements to that effect.
Part I
Pre-Excavation
Introduction
I am not an archaeologist. I am a computer scientist; what I lack in knowledge of pottery sherds and lithics I make up for in knowledge of things digital. And there is a storm coming.¹
Currently the Internet Archive, a major aggregator of digital content, contains 60 million items: effectively, digital “artifacts.” Only a year ago it held 45 million, and next year it will have millions more. Some of these artifacts will be items from other forms of media, newly digitized, and some will have been born digital. The Internet Archive sees just a small fraction of what actually exists in digital form, of course, and yet even this one sample makes it clear that there is a staggering amount of material.
Why does this matter? Traditionally, archaeology has sought to understand humans through their material culture: things that people created or affected, and things that in turn affected them. Through that lens, a digital artifact is an artifact. A digital artifact is the product of human ingenuity and creativity, one that affects the course of human lives. Much of modern culture is expressed in digital form, and this trend is increasing. To remain relevant in the modern world, archaeology must make sense of contemporary artifacts in a meaningful, knowledgeable way. Without exploring digital artifacts, we are effectively and willfully experiencing amnesia.
As an outsider versed in digital technology, I am naturally biased, but this external, independent vantage point also allows me to observe and critique archaeological efforts to study the digital. Digital archaeology and computational archaeology are a good place to start: their names sound promising, but both areas are staunchly committed to using computers and computation to understand the old rather than to examine the new. To exemplify the problem, we need look no further than the international organization Computer Applications and Quantitative Methods in Archaeology (CAA), one of whose stated goals is “to encourage communication and collaboration” between, amongst others, archaeologists and computer scientists (n.d.: item 2). Yet the CAA’s Journal of Computer Applications in Archaeology considers only “a wide variety of topics in digital applications to archaeology” (n.d.: para. 2, emphasis added): computers in servitude. While using the computer as a tool to advance traditional archaeology is laudable, this narrow definition of digital applications limits their value for understanding the contemporary. Even a discussion of “grand challenges” for digital archaeology (Huggett, Reilly, and Lock 2018) fails to consider that the relationship between archaeology and the digital can be much more than an inwardly focused affair.
One might object that the study of digital artifacts aligns with so-called media archaeology rather than archaeology proper. This is incorrect for two reasons. First, media archaeology has declared, in no uncertain terms, that it is not archaeology (Huhtamo and Parikka 2011: 3). Second, the stated scope of media archaeology explicitly omits the possibility of an overlap with science and engineering (Huhtamo and Parikka 2011: 3), the very fields necessary to help make sense of digital artifacts; media archaeology’s lack of a clear methodology has also drawn harsh criticism (Elsaesser 2016). As much as archaeology might wish to cede digital artifacts to media archaeology, this critical piece of contemporary culture cannot and should not be outsourced. Indeed, Finn argues for the archaeological importance of being able to see “technological evolution over years and decades, rather than centuries or millennia” (2013: 657).
If there is no agreed-upon home for the study of digital artifacts within archaeology, we can instead examine specific pieces of research that seek to understand these artifacts.
Moshenska (2014), for example, recounts the discovery of a buried USB flash drive during a school-outreach excavation. Approaching it initially as a physical artifact, he applied conservation techniques, then addressed its digital content by plugging the still-functional drive into a computer. This revealed predominantly music and video files, along with a smattering of schoolwork. While a deeper dive into the flash drive’s digital strata would have been possible with the appropriate equipment and software, the point of the exercise is well made: digital artifacts can be viewed in an archaeological manner.
Perry and Morgan continue in this vein, noting the absence of “a robust research process and an adapted methodological toolkit” (2015: 96) for handling digital media. One of their key contributions was an attempt to formally record the “strata” they encountered (both physical and digital) when examining a hard drive and its contents, although they stopped short of computer code “excavation,” rightly believing it to be a necessary but nontrivial undertaking. A vital but understated point in their work is the transition their investigation made from nondestructive to destructive processes (2015: 97); the digital work took place before that transition. A digital artifact is a site that can be excavated in a nondestructive way, far removed from Barker’s characterization of excavation as “an unrepeatable experiment” (1993: 13). This opens the door to an archaeology that is reproducible and verifiable by others: cornerstones of the scientific endeavor.
Even archaeology itself can leave digital artifacts to study. Reilly, Todd, and Walter (2016) tell the fascinating story of some early archaeological computer modeling work in the mid-1980s, visualizing the Old Minster in Winchester, UK. Presumed lost, the model’s files were rediscovered by chance some thirty years later and converted into a modern format. This afforded the opportunity not only to render them in new ways (e.g., 3D printing), but also to reflect on the original project in view of newer developments in cultural heritage, such as the London Charter that outlines best practices for computer visualization (Denard 2016). While this particular story had a happy ending, it does emphasize that digital artifacts do not necessarily persist forever, and that they may be far from “future proof” when their content is locked into proprietary or now-obsolete formats.
As an outsider to the field, I find it striking how little archaeological work studies digital artifacts compared to the volume of digital material we are generating. Why is there such an aversion to digital artifacts within archaeology when we are awash with them and when they are so central to our contemporary lives? Will archaeologists one or two hundred years from now lament that the field of today not only failed to engage with these ephemeral artifacts, but also failed to lay the groundwork to train new generations of archaeologists in how to deal with them?
There is, however, a relatively new area of study within archaeology that has been actively working to fill this vacuum: archaeogaming. Reinhard (2018b: 2) defines this as “the archaeology both in and of digital games,” observing that its earliest threads stretch back to 2002 (Watrall 2002). More recently, Graham (2020) has contextualized archaeogaming by drawing a connection to agent-based modeling: for Graham, the archaeogaming he practices allows archaeologists/players to be participants in a simulation model. This is one of many understandings of archaeogaming.
While it may seem counterproductive to restrict the scope of digital artifact study to games, or frivolous to study digital games at all, there is a compelling case to be made. The video game industry is enormous, a multi–billion-dollar global affair (e.g., Entertainment Software Association 2019; Entertainment Software Association of Canada 2019; TIGA n.d.) rivaling or surpassing, by some estimates, the traditional entertainment industry. At that size, the game industry cannot help but influence society and culture. Admittedly, financial statements may not be the best metric, but it is much harder to argue with the sheer number of people who play these games, a substantial proportion of the population (Entertainment Software Association 2019; Bankhurst 2020; TIGA n.d.). Really, it should come as no surprise that humans play games (evidence for play stretches back to Neolithic times [Rollefson 1992]), and it makes sense for archaeology to study modern as well as ancient games. Furthermore, a focus on games maximizes the overlap with allied nonarchaeological fields of study, including game studies, platform studies, game history, and computing history.
Archaeogaming as practiced is notably inclusive in terms of the topics that fall under its umbrella. Most familiar might be examples where the games themselves are treated as physical artifacts to be recovered by excavation: a longstanding urban legend, for instance, held that Atari had buried game cartridges in a New Mexico landfill site in the early 1980s, a story verified by excavation in 2014 (Reinhard 2015). Moving from the physical to the virtual, game worlds are locations of human activity, so we can treat them as virtual sites and adapt archaeological techniques to study them. The 2016 game No Man’s Sky offered its players an enormous, effectively infinite, procedurally generated universe to explore. In response, the No Man’s Sky Archaeological Survey was formed to document machine-created culture (Reinhard 2018b), followed by the Legacy Hub Archaeological Project, which aimed to record human in-game material cultural remains following a cataclysmic event in the game universe (Reinhard 2019). Such in-game work, whether single- or multiple-player, also requires consideration of how it can be conducted ethically (Dennis 2016; Flick, Dennis, and Reinhard 2017), considerations that extend to digital archaeology in general (Dennis 2020).
The content of games can also be studied in archaeogaming. We see, for example, surveys of how archaeology and archaeologists are represented in video games, highlighting inaccurate portrayals as well as the need to address them, because “to deny the power of [Lara] Croft and [Indiana] Jones in encouraging young archaeologists would be futile” (Meyers Emery and Reinhard 2015: 144). By contrast, Copplestone (2017) advocates taking control of the video game narrative in order to create games that communicate and help critically reflect on the archaeological process.
While these brief examples illustrate the breadth of archaeogaming work, they do not explore the implementation of the games: the technology that brought these digital artifacts to fruition. Fortunately, archaeogaming is not lacking in this regard. Extending Perry and Morgan’s aforementioned 2015 work, Reinhard (2018a) has experimented with expressing software versions of No Man’s Sky in a Harris matrix. He later undertook an extensive archaeological study of the source code for the early text adventure game Colossal Cave (Reinhard 2019). In collaborations with archaeologists and others, I myself have conducted archaeogaming research into the implementation of copy protection (Aycock and Reinhard 2017) and full-motion video (Aycock, Reinhard, and Therrien 2019), along with in-depth examinations of single games (Aycock and Copplestone 2019; Aycock and Biittner 2019) and entire game corpora (Aycock and Biittner 2020). Digital artifacts can be individually examined, and with the right tools, they can be studied en masse.
It is instructive to examine why we have yet to see more technical work with digital artifacts, either in archaeogaming or in archaeology more generally. The above phrase, “collaborations with archaeologists,” is key. Both archaeology and computer science are full-time pursuits, and it is only by combining forces in an interdisciplinary effort that these digital artifacts can be thoroughly examined. This echoes one of Moshenska’s conclusions following his flash drive study (2014), and represents a natural extension of how archaeologists increasingly work with specialists in other subject areas, except that here the aim is to apply archaeology to computer science rather than the other way around. However, there are systemic issues outside archaeology that affect collaborative opportunities. Computer science, for example, glorifies the new and novel with relatively little consideration of the past; as historian Thomas Haigh puts it, “the technical history of computer science is greatly understudied” (2014: 43). It is heartening, though, that computer scientists and archaeologists follow some similar methodologies. Moshenska (2016) has explored how the reverse engineering of technological artifacts and contemporary archaeology are intertwined, and in the absence of a game’s source code (which is often inaccessible), reverse engineering the game’s binary code is exactly the path a computer scientist would take. This is a specialized skill in computer science, albeit an important one, because it is used extensively in the computer security industry to analyze malicious software, such as computer viruses and worms. The process of reverse engineering a video game can even be considered to parallel Carver’s “field research procedure” (Carver 2009): reconnoitering to select a site (a game); preliminarily evaluating the game using extant software tools; planning strategy and defining research questions; and excavating via reverse engineering to address those questions. In many ways, we are not so far apart.
In the time it has taken to read this section, the Internet Archive has added, on average, several hundred new digital items to its trove. What a wonder it would be to discover that many physical artifacts in the same amount of time during an excavation! Digital artifacts pose enormous challenges but also enormous opportunities. Archaeology should be at the forefront.
Fetch the Torches and Pitchforks
The preceding text appeared as a debate article in the journal Antiquity (Aycock 2021a); as part of that process, three responses were solicited (Huggett 2021; Whitcher Kansa and Kansa 2021; Morgan 2021), and I was allowed the last word (Aycock 2021b). As a foreigner to the land of archaeology, I was unsure what the reception would be, and whether I would even be smart and well-read enough to understand these responses from experts in a wholly different field. Obviously there were minor quibbles, such as my abbreviated characterization of digital archaeology (Huggett 2021), and questions about what new perspective archaeology could bring to the discourse on digital artifacts (Huggett 2021; Whitcher Kansa and Kansa 2021). Yet there were also clear points of agreement, both acknowledging the uncertainty in extending beyond the bounds of traditional archaeology (Huggett 2021) and asserting that “archaeology can and should be more than a consumer of the outcomes of computer science” (Whitcher Kansa and Kansa 2021: 1,596). What struck me most is that we all agreed on the need for an archaeology of the digital, but that this archaeology looked different for each of us. The range was notable: bigger-picture, systemic concerns about digital technology (Morgan 2021); “data literacy” and digital artifacts generated from within archaeology (Whitcher Kansa and Kansa 2021); and, of course, my own computer-science-centric view of understanding the workings of digital artifacts.
There are also connections between the argument in the previous section and nondigital matters within archaeology.² For example, Perry (2019) posits a future for archaeology wherein “enchantment” inspires public consumers of the archaeological record to take positive action; this can be viewed as an argument for giving archaeology greater modern-day relevance. Goading archaeology into broader engagement with digital artifacts has an end goal that is not dissimilar, and in fact is not incompatible with Perry’s approach. Work on digital artifacts also adds fuel to the discussion over whether objects have agency (Hoskins 2006). With any computer-based artifact, but particularly with highly interactive software like video games, the object operates apart from its creator in a very visible, tangible way. The agency of objects in the video game context is much easier to imagine than it would be for a static, inanimate physical object. At the same time, reverse engineering a digital artifact arguably makes the case against agency equally well, because it allows us to peer behind the veil and reveal the artifact as a clever artifice of human endeavor. At any rate, no vexing questions pertaining to archaeological thought will be settled within these pages because, again, I am not an archaeologist. What I can do, however, is deliver on a promise to “document, in an accessible fashion, the techniques and thought processes I use when analysing digital artefacts” (Aycock 2021b: 1,600).
About This Book
In front of me is a copy of Martin Carver’s book Archaeological Investigation (2009), its pages bristling with yellow sticky notes. Carver delineates field archaeology as a six-step process: reconnaissance to locate a likely site, site evaluation, and strategy (excavation planning and choosing research questions), followed by excavation, analysis of the results, and publication. To back up my frivolous assertion above, that reverse engineering a video game parallels Carver’s process, the remainder of this book illustrates and explains the reverse engineering of the 1986 game Amnesia. I will be following Carver’s field research procedure as a guide, and the organization of the book reflects this. The book is divided into three parts (pre-excavation, excavation, and post-excavation), with the majority devoted to the reverse engineering: the “excavations” I will be performing.
Doubtless to the dismay of trained archaeologists, I will be using the terms “site” and “artifact” interchangeably to reflect the reality of digital excavations. On the one hand, Amnesia seems like a discrete artifact, with a physical manifestation that can be held in the hand and a digital image that corresponds to the game. On the other hand, this is a superficial view, because the digital form of Amnesia is a multilayered site with many diverse parts and locations into which we can dig, and Amnesia is not unique in this respect. It is simply digital.
I am starting my examination of the Amnesia artifact-site with no special information. I do not possess a copy of the human-readable source code that the game’s programmers wrote, only the ones and zeros present in the binary image of the game that was published decades ago. Everything that will be revealed about this binary image will be learned through reverse engineering, the gradual process of reconstructing the knowledge that went into Amnesia’s creation. A reverse engineer is a puzzle-solver and detective, roles archaeologists are well used to playing.
What makes reverse engineering of computer code particularly exciting and challenging is that the site is, in a sense, alive. We are not simply staring at a static, lifeless pottery sherd sitting in the dirt, imagining what it might have been part of and how it was used, although studying (fragments of) code and data is certainly one technique a reverse engineer might employ. Instead, we often have the option of observing the code as it runs and watching the data as it is used. In other words, we can view the sherd as part of the whole pot from which it came, see the context in which that pot was used, and even ask and answer questions about what might happen at this site if the pot had been crafted somewhat differently. As if this ability were not magical enough, digital artifacts have another property that might seem straight from the pages of Harry Potter: the data, and even the code itself, may change as the game’s code (the instructions for the computer to perform) runs. Reverse engineering binary code is like solving a puzzle whose pieces can move and transform themselves before our eyes.
Reverse engineers are not only puzzle-solvers and detectives but may be toolsmiths as well. In a sense, building tools might be considered akin to experimental archaeology; here, however, the purpose is not only to demonstrate the plausibility of hypotheses but to create bespoke software tools that amplify our ability to analyze a program. The suggestion of forging software tools may strike fear into the hearts of archaeologists with little or no programming experience, and that apprehension is reasonable. However, remember that the study of digital artifacts can and should be an interdisciplinary endeavor, and the archaeologist may not be the one creating such tools. The critical takeaway here is understanding when and why new software tools may be necessary, and what benefits they might offer.
Toolsmithing aside, archaeologists and other non–computer science readers may find parts of this book to be a challenging read. A number of technical details are needed to perform and comprehend binary reverse engineering, and as a result this book contains a good deal of tutorial background on the necessary computer science concepts: a crash course that will permit archaeologists to begin their own meaningful examinations of the digital. I should also point out that what I am about to describe is not the only way to reverse-engineer Amnesia, and I have tried to show a variety of techniques, not all of which may be needed for a given artifact. Using different (existing or novel) software tools may also make the reverse engineering process easier or harder, although ultimately the underlying principles and thought processes remain the same and can be learned.
Let us begin.
Notes
1. The initial part of this chapter was first published as Aycock (2021a) and appears here with minor modifications. The original article is © John Aycock, 2021, and is reprinted with permission. The acknowledgments for that article read: “Thanks to Jason Scott at the Internet Archive for supplying the statistics. Katie Biittner and Andrew Reinhard provided insightful comments on a draft of this article, and the anonymous reviewers made many helpful suggestions. The author’s work was supported in part by the Natural Sciences and Engineering Research Council of Canada, grant RGPIN-2015-06359.” I am also grateful to the Antiquity editors for their superlative editing of the article.
2. I am grateful to an anonymous reviewer of the book draft for this observation and other suggestions for this section.
References
Aycock, John. 2021a. “The Coming Tsunami of Digital Artefacts.” Antiquity 95(384): 1,584–89.
———. 2021b. “The Coming Tsunami of Digital Artefacts: Moving Forward.” Antiquity 95(384): 1,600–1,601.
Aycock, John, and Katie Biittner. 2019. “Inspecting the Foundation of Mystery House.” Journal of Contemporary Archaeology 6(2): 183–205.
———. 2020. “LeGACy Code: Studying How (Amateur) Game Developers Used Graphic Adventure Creator.” 15th International Conference on the Foundations of Digital Games, article no. 23. DOI:10.1145/3402942.3402988.
Aycock, John, and Tara Copplestone. 2019. “Entombed: An Archaeological Examination of an Atari 2600 Game.” The Art, Science, and Engineering of Programming 3(4). DOI:10.22152/programming-journal.org/2019/3/4.
Aycock, John, and Andrew Reinhard. 2017. “Copy Protection in Jet Set Willy: Developing Methodology for Retrogame Archaeology.” Internet Archaeology 45. DOI:10.11141/ia.45.2.
Aycock, John, Andrew Reinhard, and Carl Therrien. 2019. “A Tale of Two CDs: Archaeological Analysis of Full-Motion Video Formats in Two PC Engine/TurboGrafx-16 Games.” Open Archaeology 5: 350–64.
Bankhurst, Adam. 2020. “Three Billion People Worldwide Now Play Video Games, New Report Shows.” August 14, 2020. https://www.ign.com/articles/three-billion-people-worldwide-now-play-video-games-new-report-shows.
Barker, Philip. 1993. Techniques of Archaeological Excavation, 3rd ed. London: Routledge.
Carver, Martin. 2009. Archaeological Investigation. London: Routledge.
Computer Applications and Quantitative Methods in Archaeology. n.d. “Constitution.” Retrieved May 14, 2021, from https://caa-international.org/constitution/.
Copplestone, Tara Jane. 2017. “Designing and Developing a Playful Past in Video Games.” In The Interactive Past: Archaeology, Heritage, and Video Games, ed. Angus A. A. Mol, Csilla E. Ariese-Vandemeulebroucke, Krijn H. J. Boom, and Aris Politopoulos, 85–97. Leiden: Sidestone Press.
Denard, Hugh. 2016. “A New Introduction to the London Charter.” In Paradata and Transparency in Virtual Heritage, ed. Anna Bentkowska-Kafel, Hugh Denard, and Drew Baker, 57–71. London: Routledge.
Dennis, L. Meghan. 2016. “Archaeogaming, Ethics, and Participatory Standards.” SAA Archaeological Record 16(5): 29–33.
———. 2020. “Digital Archaeological Ethics: Successes and Failures in Disciplinary Attention.” Journal of Computer Applications in Archaeology 3: 210–18. DOI:10.5334/jcaa.24.
Elsaesser, Thomas. 2016. “Media Archaeology as Symptom.” New Review of Film and Television Studies 16(2): 181–215.
Entertainment Software Association. 2019. “2019 Essential Facts About the Computer and Video Game Industry.” https://www.theesa.com/wp-content/uploads/2019/05/ESA_Essential_facts_2019_final.pdf.
Entertainment Software Association of Canada. 2019. “The Canadian Video Game Industry 2019.” http://theesa.ca/wp-content/uploads/2019/11/CanadianVideoGameSector2019_EN.pdf.
Finn, Christine. 2013. “Silicon Valley.” In The Oxford Handbook of the Archaeology of the Modern World, ed. Paul Graves-Brown, Rodney Harrison, and Angela Piccini, 657–70. Oxford: Oxford University Press.
Flick, Catherine, L. Meghan Dennis, and Andrew Reinhard. 2017. “Exploring Simulated Game Worlds: Ethics in the No Man’s Sky Archaeological Survey.” ORBIT Journal 1(2): 1–13. DOI:10.29297/orbit.v1i2.46.
Graham, Shawn. 2020. An Enchantment of Digital Archaeology: Raising the Dead with Agent-Based Models, Archaeogaming and Artificial Intelligence. New York: Berghahn Books.
Haigh, Thomas. 2014. “The Tears of Donald Knuth.” Communications of the ACM 58(1): 40–44.
Hoskins, Janet. 2006. “Agency, Biography and Objects.” In Handbook of Material Culture, ed. Christopher Tilley, Webb Keane, Susanne Küchler, Michael Rowlands, and Patricia Spyer, 74–84. London: SAGE Publications.
Huggett, Jeremy. 2021. “Archaeologies of the Digital.” Antiquity 95(384): 1,597–99.
Huggett, Jeremy, Paul Reilly, and Gary Lock. 2018. “Whither Digital Archaeological Knowledge? The Challenge of Unstable Futures.” Journal of Computer Applications in Archaeology 1(1): 42–54. DOI:10.5334/jcaa.7.
Huhtamo, Erkki, and Jussi Parikka. 2011. “Introduction: An Archaeology of Media Archaeology.” In Media Archaeology: Approaches, Applications, and Implications, ed. Erkki Huhtamo and Jussi Parikka, 1–21. Berkeley: University of California Press.
Journal of Computer Applications in Archaeology. n.d. “About.” Retrieved May 14, 2021, from https://journal.caa-international.org/about/.
Meyers Emery, Kathryn, and Andrew Reinhard. 2015. “Trading Shovels for Controllers: A Brief Exploration of the Portrayal of Archaeology in Video Games.” Public Archaeology 14(2): 137–49.
Morgan, Colleen. 2021. “An Archaeology of Digital Things: Social, Political, Polemical.” Antiquity 95(384): 1,590–93.
Moshenska, Gabriel. 2014. “The Archaeology of (Flash) Memory.” Post-Medieval Archaeology 48: 255–59.
———. 2016. “Reverse Engineering and the Archaeology of the Modern World.” Forum Kritische Archäologie 5: 16–28.
Perry, Sara. 2019. “The Enchantment of the Archaeological Record.” European Journal of Archaeology 22(3): 354–71.
Perry, Sara, and Colleen Morgan. 2015. “Materializing Media Archaeologies: The MAD-P Hard Drive Excavation.” Journal of Contemporary Archaeology 2: 94–104.
Reilly, Paul, Stephen Todd, and Andy Walter. 2016. “Rediscovering and Modernising the Digital Old Minster of Winchester.” Digital Applications in Archaeology and Cultural Heritage 3: 33–41.
Reinhard, Andrew. 2015. “Excavating Atari: Where the Media Was the Archaeology.” Journal of Contemporary Archaeology 2(1): 86–93.
———. 2018a. “Adapting the Harris Matrix for Software Stratigraphy.” Advances in Archaeological Practice 6(2): 157–72.
———. 2018b. Archaeogaming: An Introduction to Archaeology in and of Video Games. New York: Berghahn Books.
———. 2019. “Archaeology of Digital Environments: Tools, Methods, and Approaches.” PhD diss., University of York, UK.
Rollefson, Gary O. 1992. “A Neolithic Game Board from ʿAin Ghazal, Jordan.” Bulletin of the American Schools of Oriental Research 286: 1–5.
TIGA. n.d. “About the UK Video Games Industry.” Retrieved May 15, 2021, from https://tiga.org/about-tiga-and-our-industry/about-uk-video-games-industry.
Watrall, Ethan. 2002. “Digital Pharaoh: Archaeology, Public Education and Interactive Entertainment.” Public Archaeology 2(3): 163–69.
Whitcher Kansa, Sarah, and Eric Kansa. 2021. “Digital Archaeology and ‘D’ Transforms?” Antiquity 95(384): 1,594–96.
CHAPTER 1
Reconnaissance
The first step of many leading to excavation is obviously the choice of a site to excavate. This is Carver’s reconnaissance stage: identifying possible site candidates, a process which may incorporate information already gathered by others (Carver 2009). How can a single video game be chosen to examine, and how can information be gleaned about a game and its as-yet-invisible internals?1 For archaeologists accustomed to thinking about physical sites, it is worth noting that digital sites do not require travel to a particular location.2 In fact, there is a plethora of information about games that can be readily browsed from the comfort of a chair, in addition to steady streams of game-related information constantly flowing past. Social media, for all its attendant ills, is one valuable source of information. There are many game enthusiasts, along with organizations dedicated to preserving games and game history, who generously share their findings and activity with the public; a carefully curated Twitter feed, for instance, has led me to many interesting games to study. Let me be quick to stress that social media, and indeed all these information sources, may be unreliable. The purpose of performing reconnaissance is to narrow down the set of all possible games, and ideally to find a number of likely candidates for further study. Claims made about games should be viewed as statements to be verified rather than unvarnished truth. Another source of information is the game developers themselves. Since the creation of video games falls within the recent past, we currently have the good fortune to have many of the game creators still with us. In the early throes of reconnaissance, before a single game site has been selected, we can make use of extant interviews with game developers scattered across web sites and published in game magazines.
More extensive game retrospectives can sometimes be found, such as the game “postmortems” presented at the annual Game Developers Conference (GDC). Again, these should be treated as a starting point: human memory is fallible and selective, but developers’ recollections provide hints as to what might have been done in a game’s implementation, hints that can be verified through reverse engineering. Looking to traditional academic sources of information may be of limited value for reconnaissance. As might be expected, peer-reviewed, scholarly publications containing technical details about game implementation are thin on the ground. Online sources of information can be a wealth of material, though. There are searchable and browsable online databases of games generally (e.g., MobyGames) in addition to databases for specific platforms (e.g., World of Spectrum). Discussion forums and blogs can contain anything from isolated clues to extensive deep dives on game minutiae. Videos can dissect some aspects of game implementation and game platforms. But ultimately, while all these sources can contribute greatly to reconnaissance, they need to be taken with a grain of salt. The primary source is what we will eventually be excavating: the game’s code and data. My selection of Amnesia as a site began with one of these online sources of information: Wikipedia. To understand what caught my eye about Amnesia’s Wikipedia entry during an otherwise banal late-night link-clicking session, we need a bit of context. Amnesia is a text adventure game, a genre of game also referred to as interactive fiction. A text adventure game in its purest form consists of text only: the player is given a textual description of their location and must communicate their actions by typing a restricted set of natural language commands. The input and output text of Amnesia happens to be in English, but games of this genre exist in many languages.
Strictly speaking, text adventures may be thought of as computer games rather than video games, because the “video” element is optional—they could be played on a teletype device that marries an input keyboard to output printed on paper. Figure 1.1 shows a brief excerpt from a play session of a text adventure game. Part of the challenge is the game itself, and part is guessing how to phrase input commands, determining what vocabulary words the game understands and how to string those words together. Simpler text adventures may accept only two-word verb-noun sentences, like GET LAMP, and some words may be ignored completely. For example, so-called stop words like articles (i.e., a, an, the) could be quietly overlooked by the game, making the command GET THE LAMP equivalent to GET LAMP; unfortunately, this would typically have the side effect of allowing GET THE A LAMP and A GET A A LAMP A and other nonsensical alternatives. Needless to say, the sophistication of a text adventure game’s parser and the extent of a game’s vocabulary could be both a selling feature for commercial games and a point of criticism for players and reviewers.
Figure 1.1. Text adventure example, where the player’s input is in uppercase. From Tristam Island (2020), © Hugo Labrande, used with permission.
The usual historical narrative is that Will Crowther created the first text adventure game in 1975–76, and Don Woods later extended that game (Jerz 2007), although there are relatively recent indications that Crowther’s Colossal Cave game may have had an antecedent (Ant 2015; Dyer 2015). In an era of large mainframe computers where graphics displays would have been a rarity, combined with outside influences like imagination-based role-playing tabletop games (Dungeons & Dragons was first released in 1974), it is easy to see why these text adventure games would have been compelling. Text adventure games made the jump to microcomputers soon after, with Scott Adams developing the text adventure Adventureland in 1978 for the TRS-80,3 although microcomputer text adventure games could be abbreviated compared to their mainframe relatives because of memory and storage constraints.4
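To make the stop-word behavior described above concrete, here is a minimal Python sketch of a two-word verb-noun parser that silently discards articles before checking the remaining words against a small vocabulary. It is an illustration under stated assumptions, not Amnesia’s parser, whose workings are not yet known at this stage. The names `parse`, `STOP_WORDS`, `VERBS`, and `NOUNS`, and the tiny vocabulary itself, are hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical vocabulary for illustration only; not taken from any real game.
STOP_WORDS = {"a", "an", "the"}   # articles the parser quietly ignores
VERBS = {"get", "drop", "look"}
NOUNS = {"lamp", "key", "door"}


def parse(command: str) -> Optional[Tuple[str, str]]:
    """Reduce a typed command to a (verb, noun) pair, or None if not understood.

    Stop words are removed wherever they appear, which is why nonsensical
    input such as "A GET A A LAMP A" is accepted just like "GET LAMP".
    """
    words = [w for w in command.lower().split() if w not in STOP_WORDS]
    if len(words) != 2:
        return None               # only two-word sentences are supported
    verb, noun = words
    if verb not in VERBS or noun not in NOUNS:
        return None               # a word outside the game's vocabulary
    return verb, noun


if __name__ == "__main__":
    for typed in ["GET LAMP", "GET THE LAMP", "A GET A A LAMP A", "GET BRASS LAMP"]:
        print(f"{typed!r:20} -> {parse(typed)}")
```

The first three commands all yield `('get', 'lamp')`. GET BRASS LAMP is rejected because three words remain after filtering. Filtering before counting is what makes the parser lenient and also what lets the nonsensical variants through.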
Figure 1.2. Mystery House (1980) added rudimentary graphics to text. Emulator screenshot by author.
Technically, text adventure games can be viewed as a juxtaposition of two opposing ideas. First, basic text adventures are so simple that even amateur programmers could create them. In the early 1980s, there were any number of books detailing how people could construct their own text adventures (e.g., Hartnell 1984; Horn 1984; Tyler and Howarth 1983). Software tools specifically for the creation of text adventures were marketed (Aycock and Biittner 2020), and text adventure games are still used as assignments when teaching introductory computer science (Kussmaul 2017; Sharmin et al. 2019). Second, in contrast to their core simplicity, text adventure games aspired to use human language as their user interface, something we are arguably only starting to get good at now, decades later and relying on massively increased computing power. Some amount of player frustration and disappointment with these games was inevitable. As early as Mystery House in 1980 (figure 1.2), these games moved beyond text alone, illustrating the text with graphics in the same manner as a book might be illustrated. This transition made sense on microcomputers, which as time went on supported not only graphics but increasingly better graphics: the Apple Macintosh, the Commodore Amiga, and the Atari ST debuted by the mid-1980s, and IBM PC graphics cards improved at roughly the same time. Along with graphics also came a shift away from textual input in adventure games. Infocom, a well-known producer of notably high-quality text adventure games, was one holdout dedicated to text-only games, making their lack of graphics a focal point of advertising (Infocom, Inc. 1983). By the mid-1980s, however, even Infocom’s days were numbered.5
Let us return to my Wikipedia expedition. When I came across the Wikipedia entry for Amnesia (Wikipedia n.d.), it struck me as extremely odd, given this historical context, that Electronic Arts published it—a new, text-only adventure game—in 1986. It was initially released for the Apple II and IBM PC computers, with a Commodore 64 version following a year later. The game got more interesting when I realized Amnesia was one of the remarkably few text adventure games whose text was authored by a “real” writer, Thomas M. Disch (1940–2008); figure 1.3 shows him holding the finished product. Disch was an award-winning writer of science fiction in addition to poetry, plays, and children’s books (Martin 2008; Schudel 2008). This authorship places Amnesia into a rarefied category of text adventure, whose most notable entry may be Infocom’s Hitchhiker’s Guide to the Galaxy (1984), which involved Douglas Adams.
Figure 1.3. Thomas M. Disch in 1986. Bernard Gotfryd collection, Prints & Photographs Division, Library of Congress, LC-DIG-gtfy-01120.
Even these two factors, the late release date and the authorship, might not have tipped the scale from treating Amnesia as a passing curiosity to treating it as a potential digital excavation site. What completed the trifecta and clinched it for me, as a computer scientist interested in game implementation, was part of a sentence in its Wikipedia entry asserting that the game was implemented with something I had never heard of: “the King Edward Adventure game authoring system” (Wikipedia n.d.). In-house, proprietary game development tools are intriguing because the exact form they take from the game developer’s point of view is not at all apparent in the finished product the public sees, and yet logically we must assume these tools are highly tuned to the types of game being produced or there would be no purpose in using them. While we know about some of these tools, such as Infocom’s Zork Implementation Language (Blank and Galley 1980) and Scott Adams’ menu-driven environment (Aycock 2016), there may be many others, a King Edward system among them, that have gone unnoticed because we rarely have access to games’ source code.
How did Amnesia come to be? Having chosen Amnesia as a site, I focused my reconnaissance efforts on that one game. Recall that the eventual goal is excavation through reverse engineering, and therefore any information about the game’s creation and design may prove useful later. Starting in 2019, I conducted five interviews with people involved with the game to piece together the story of Amnesia.6 The tale begins with the book publishing industry. An article in The Atlantic, reminiscing about that industry in the 1980s, has a striking quote: “In 1984, orders were still delivered by phone, fax, and mail. At Random House, I received the daily sales numbers on a mimeographed sheet, transcribed from some master ledger” (Osnos 2011).
For reference, 1984 was the year Apple released the Macintosh; it was well into the era when computers could routinely be found in homes, schools, and businesses. Clearly book publishing at that time was a slow-moving, traditional industry, yet there was some sense that change was necessary. A 1982 essay confidently entitled “The Future of the Book” predicted that “if today’s publishers do not seize the opportunity presented by electronic media, new publishers will inevitably arise to do so” (Sigel 1982: 31). In fact, change was already coming to the industry: in 1983 a number of publishers were establishing software development divisions, Harper & Row among them (McDowell 1983). Jane Isay was one of two publishers at Basic Books, which was owned by Harper & Row, and she was being recruited by Oxford US. She had lunch with the president of Harper & Row, who had caught wind of the recruitment and was keen to retain her. For her part, Isay was excited by the possibilities of home computers, and had recently bought one for her children. The deal struck in the end saw Isay become the director of a new Electronic and Technical Division at Harper & Row, which she would start, and which would publish software along with books about computers. She began in this position in May 1983 (McDowell 1983). In turn, Isay asked Glen Hartley, the associate director of publicity at Simon & Schuster, to be the associate publisher and marketing director of her new division. He joined Harper & Row in 1983 as well. The credits on the physical packaging of the final published version of Amnesia say, “dedicated to Glen Hartley,” and he plays a fairly pivotal role throughout.
During the existence of the Electronic and Technical Division, Harper & Row had a number of software products in various stages. The Write Stuff and Fishies were both published in 1984; the former title was a word processor developed by the company that made the Bank Street Writer word processor (Renne 1984), and the latter turned the Apple II computer into a graphical, screen-saver-like fish tank.7 In development—and now almost certainly lost—were Dollhouse, a house-designing program for children, and a Fraggle Rock game based on the television series. Harper & Row was apparently approached about software based on the book In Search of Excellence (Needle 1984), but there is no evidence that the pitch made any headway. And, of course, Harper & Row’s Electronic and Technical Division was developing Amnesia.
Hartley oversaw the Amnesia project at Harper & Row. He had come to the company with the idea of interactive fiction in mind, and in addition Hartley knew Thomas Disch and was a fan of Disch’s work. Hartley was able to persuade Disch to give interactive fiction a try, a fact corroborated by Disch himself, who would later extol the virtues of the interactive format (Lehman 1988).
The subject of amnesia came from Disch, who, when interviewed about the published game, said “I forget my own life all the time. Books I read twenty years ago I remember much more clearly than the details of my own life. So amnesia was a natural subject for me” (Lehman 1988: 220). This is perhaps an oversimplified view: amnesia appeared in passing in his 1968 book Camp Concentration (Disch 1973), and also in his novel The Prisoner from that same period (Disch 1969), so the idea was not new to his work. Disch approached the task of writing Amnesia in the fashion one would expect of a writer, which is to say he wrote a script. Remarkably, a copy of the script was found (Scott 2008), signed by Disch, and I have confirmed the veracity of the script with multiple interviewees; its copyright date is 1984, which coincides with the Harper & Row timeline.
The script contains the bite-sized pieces of text that would reasonably fit onto the computer screens of the time, linked together by what can best be described as pseudocode. Pseudocode is not actual code the computer would be able to understand and need not be written in any computer language; it is a high-level, informal description of what the computer code should do, a technique used by programmers to describe and reason through the code they must write. Pseudocode may also abstract away concrete details that the computer requires. Disch’s script, for example, contains conditional statements that govern how pieces of the game are connected along with responses to the player’s actions:
[IF response to 14> is YES, move to (18); if NO:] (14A) Well then?
Omitted from the script’s pseudocode representation are details like how a player response is acquired, how text messages are displayed for the player, and how moving to another chunk of game text is managed. Other pseudocode directives in the script elide even more, such as one breezily describing how random variations should be introduced for the text of in-game television channels’ content. While the script is an extremely interesting find, it has limited value from the reverse engineering point of view: we do not (yet) know how this was implemented, and allusions to a King Edward system suggest that there was probably an as-yet-indeterminate intermediate form between the script and the computer-runnable version. What we see in the binary may be quite far removed from Disch’s manuscript.
None of the people in Amnesia’s story so far were computer programmers, though, and someone would need to implement Disch’s script and bring it to life. Enter the company Cognetics, which, amazingly enough, still exists today (as Cognetics Interactive) with the same person at the helm, Charlie Kreitzberg.
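Disch’s conditional line leaves the mechanics unstated. As a purely hypothetical illustration of what an implementer must decide, the Python sketch below models script chunks as labelled records with YES and NO branches and follows them in response to player answers. The labels 14, 14A, and 18 come from the quoted fragment. The texts of chunks 14 and 18 are placeholders because they are not quoted here. The class `Chunk`, the functions `next_chunk` and `play`, and the rule that an unrecognized answer repeats the current chunk are all assumptions. None of this describes King Edward, whose form is still unknown at this point.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Chunk:
    """One bite-sized piece of script text plus its (hypothetical) branches."""
    text: str
    on_yes: Optional[str] = None  # label to move to when the player answers YES
    on_no: Optional[str] = None   # label to move to when the player answers NO


# Labels follow the quoted fragment; the texts of 14 and 18 are placeholders.
SCRIPT: Dict[str, Chunk] = {
    "14": Chunk("[placeholder: question text of chunk 14]", on_yes="18", on_no="14A"),
    "14A": Chunk("Well then?"),
    "18": Chunk("[placeholder: text of chunk 18]"),
}


def next_chunk(label: str, answer: str) -> Optional[str]:
    """Return the label to move to, the same label to re-ask, or None at a dead end."""
    chunk = SCRIPT[label]
    reply = answer.strip().upper()
    if reply == "YES":
        return chunk.on_yes
    if reply == "NO":
        return chunk.on_no
    return label  # assumption: any other answer re-displays the current chunk


def play(start: str, answers: List[str]) -> List[str]:
    """Show chunks starting at `start`, consuming one player answer per step."""
    shown = [SCRIPT[start].text]
    label = start
    for answer in answers:
        target = next_chunk(label, answer)
        if target is None:
            break  # this chunk has no branch for that answer
        label = target
        shown.append(SCRIPT[label].text)
    return shown


if __name__ == "__main__":
    for line in play("14", ["NO"]):
        print(line)
```

Even this toy version must choose how answers are normalized and what happens on unexpected input. Those are exactly the kinds of decisions the script leaves to whoever implements it.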
Tracing the network of influences on a work is rarely straightforward, and while Glen Hartley came to Harper & Row with interactive fiction in mind, Dr. Kreitzberg had exposure to hypertext research that was leading in a similar direction. Kreitzberg employed a number of people through Cognetics; perhaps the most important for this story are James Terry and Kevin Bentley. Indeed, Kreitzberg and Bentley’s 1980s likenesses are captured in a photo contained inside Amnesia’s final packaging, along with fellow Cognetics employees Pat Reilly (who worked on the eventually unreleased Fraggle Rock game) and technical writer Lis Romanov. As for James Terry, he created the King Edward system that was essential to Amnesia’s implementation, along with a variant of it, called Prince Edward, for Fraggle Rock.8 Why “King Edward”? It was the name on a handy matchbook cover when they needed to christen the language, a language that was never used for anything apart from those two games.
The realization of Amnesia proceeded, with Kevin Bentley using King Edward to implement Disch’s script on the Apple II computer, and James Terry performing King Edward development in parallel to add necessary game-related support. At Harper & Row, meanwhile, Amnesia production advanced to the point where packaging was created for the game by designer Irving Freeman. A picture of the front of the Harper & Row packaging is shown in figure 1.4. The format, an 8¾ × 8¾ inch folding “album cover,” opened to reveal liner notes, mostly prose explaining how to play the game; the package included a street map of Manhattan and an address book. The front cover speaks to the premise of the game, with the image of its noseless and mouthless denizen cleverly reflecting how the player begins the game in a hotel room with no memory of who they are. Besides Disch’s participation, a selling point for the game was its extensive inclusion of Manhattan locations—hence the street map—and the back cover claimed that the game “could virtually serve as a guided tour of Gotham.” The inside copy was even more boastful, touting Amnesia’s “almost one-to-one simulation of central Manhattan, incorporating thousands and thousands of street intersections.” All this was possible thanks not to King Edward but to “UR, the programming language specially developed for this project by Cognetics.”
Figure 1.4. Front cover of Harper & Row Amnesia packaging. Image courtesy Stephane Racle.
It seemed that things were in order for Amnesia, with a game in progress, packaging for the game, and a publisher. However, Harper & Row closed the Electronic and Technical Division after only eighteen months, at the end of 1984.9 Recollections vary as to why the shutdown happened, but what it meant for Amnesia was that it was now publisher-less. In came Electronic Arts, a company formed only a few years earlier, in 1982, and one of their producers, Don Daglow. Daglow’s name might ring a bell for game history aficionados because he is a game author himself: among others, he created the 1981 game Utopia for the Intellivision, one of the first “god games.” He played Cognetics’ game-in-progress, and he was already familiar with Disch’s written work; Daglow became a champion for Amnesia inside Electronic Arts, which became the game’s new publisher. Daglow takes responsibility for the decision to publish a text-only game so late in the genre’s commercial life. While he knew Infocom’s text-only adventure games were struggling, he had hoped Disch’s distinctive writing style would revitalize the genre. In the end, the game was completed during “crunch time” by Kevin Bentley with help from James Terry at an apartment located near Electronic Arts in California. The final development push was described as “brutal”;10 but at last, Amnesia was finished.
The final packaging for Amnesia was . . . an 8¾ × 8¾ inch folding album cover. Electronic Arts was not copying Harper & Row’s design, as it happens, because Electronic Arts had been releasing software in that format since the company’s inception. For this project, I acquired six copies of Amnesia via eBay, two for each of the three platforms for which the game was released. The most significant of these, though, was an IBM version that had never been opened: still enclosed in its plastic shrink wrap, it allowed me to definitively document the packaging contents. Figure 1.5 shows that the person adorning the front cover is less anonymous than in the Harper & Row version, with a greater emphasis on the city setting. In a pocket inside the unfolded cover are two 5¼-inch double-sided floppy disks (figure 1.6), the most Daglow could get budget approval for, and a constraint that would ultimately limit how much of Disch’s script could be squeezed into the final game. The disks’ magnetic media contain the digital images of the game, and thus they are of special importance for reverse engineering; as was common practice, each disk was stored in a protective sleeve. Alongside them is more pedestrian fare—a warranty card, command summary, and command summary addendum—which supplied, among other things, platform-specific instructions for installing (if necessary) and starting the game. Tucked in the left-hand side of Amnesia’s album cover, where a vinyl record would reside in the analogous musical packaging, is a stack of paper items.
Figure 1.5. Front of the final Amnesia packaging. Photo by author.
Not one but two identical sheets advertise the range of available Electronic Arts products; they were evidently prepared before the game’s release, since Amnesia is listed as “available October 1986.” A street map is present, similar to the Harper & Row packaging, with the earlier standalone address book now found inside “A Visitors [sic] Guide to New York City.” The guide provides multiple pages of diegetic information, and flipping the guide to the reverse side reveals “Thomas M. Disch’s Amnesia: The Manual.” Here can be found information on playing the game, its vocabulary, and finally game hints that are trivially “encrypted,” for example, “Tryxaxtogaxparty!x.” The last item in the package is a round “X-Street Indexer” that is part of the game’s copy protection; we revisit this in Chapter 10.
Figure 1.6. Amnesia floppy disks. Photo by author.
The reviews of Amnesia were mixed. Newsweek gushed over the “new art form” (Lehman 1987), and some computer magazine reviews were equally flattering, trumpeting Amnesia’s “extensive and sophisticated parser,” the vastness of Manhattan that could be explored, and the “fine prose” (Trunzo 1987). Another gave it the highest praise for a text adventure game, likening it to Infocom titles (“Amnesia” 1987). Besides complimenting the parser and the vocabulary, reviews spoke to the realities of running software at the time; Cohn pointedly observed that “Amnesia’s [floppy] disk operations rarely take more than a few seconds” (Cohn 1988). Toward the other end of the spectrum, Amnesia received the backhanded compliment that “the text is so rich and the story so interesting that one hardly notices that this is probably the least interactive piece of interactive fiction ever made” (Ardai 1987: 41), along with an outright vitriolic review in that same magazine (Scorpia 1987). A primary theme of the criticism seems to be that the game drags the player through the story rather than letting the player interact freely with their environment; perhaps this is the inevitable result of Disch, a career writer but interactive fiction neophyte, writing the script. Regardless, Amnesia ended up as a finalist in the Software Publishers Association awards, which Robin Williams presented.11 Amnesia had movie rights optioned (Hartley 1985), although it seems that nothing came of that, and Glen Hartley went on to start his own literary agency and ended up representing Thomas Disch about a decade later.
At this point, reconnaissance has led to a site, and a focused effort has allowed the complicated backstory of Amnesia to emerge along with an understanding of the book publishing context in which it arose. There are still notable omissions, however. A physical game has been acquired, yet there has been no mention of digital images, no attempt to run the game, and no progress on the technical front. Shifting to the digital manifestation of Amnesia is the next step.
Notes
1. The beginning of this chapter is based on the more extensive discussion in Aycock (2018).
2. The lack of physical fieldwork and travel to remote locations makes the archaeology of digital sites an archaeology that is highly suited to people who have accessibility, harassment, and discrimination concerns. Thanks to Katie Biittner for pointing this out to me.
3. Adams places Adventureland before Pirate’s Adventure, for which he gives a copyright date of 1978 (Adams 1980b). The source code has a December 1978 copyright, the same year given for his adventure interpreter with the “Adventure Land” moniker (Adams 1980a).
4. For a fuller discussion of text adventure games and interactive fiction, see Montfort (2003).
5. Although, to be fair, Infocom’s ill-advised expansion into business software did not do their company any favors (Briceno et al. 2000).
6. The interviews took place via email and had ethics approval from the University of Calgary’s Conjoint Faculties Research Ethics Board, file REB161235. The interview transcripts are published (Aycock 2019a, 2019b, 2019c, 2019d, 2019e), and this section draws upon their contents unless otherwise noted, although for readability I have omitted explicit references to them to avoid littering the text with citations.
7. Fishies’ concept is credited to James Terry, who was amused by the notion of a pointless computer application. He joins the Amnesia story shortly.
8. Kreitzberg recalls the full title of the Fraggle Rock game as Fraggle Rock: The Mystery of the River of Song, and that it ended up with CBS Software. The Prince Edward variant of King Edward added graphics support, meaning Fraggle Rock would not have been a text-only game.
9. The publishing industry would once again flirt with “new media” within the span of a decade in the form of multimedia CD-ROMs. That foray did not go particularly well either (Clark and Phillips 2019: 53).
10. This is corroborated by Maher’s account (2014), which used information supplied by Kevin Bentley, whom unfortunately I was unable to interview.
11. The Software Publishers Association records are sparse, making this hard to confirm, but it does appear that Williams hosted it in 1987.
References
Adams, Scott. 1980a. “Adventure Interpreter.” SoftSide 2(10): 44–49.
———. 1980b. “Pirate’s Adventure.” Byte 5(12): 192–212.
“Amnesia.” 1987. Your Computer (November): 51.
Ant. 2015. “Wander (1974)—a Lost Mainframe Game Is Found!” Retroactive Fiction blog. April 22, 2015. https://ahopeful.wordpress.com/2015/04/22/wander-1974-a-lost-mainframe-game-is-found/.
Ardai, Charles. 1987. “Titans of the Computer Gaming World, Part II of V: Electronic Arts.” Computer Gaming World: The Journal of Computer Gaming 37 (May): 28–29, 40–41.
Aycock, John. 2016. Retrogame Archeology: Exploring Old Computer Games. Cham, Switzerland: Springer.
———. 2018. “Finding the Invisible: An Experience-Based Methodology for Selecting Retrogame Archaeology ‘Sites.’” Kinephanos (August): 34–50.
———. 2019a. “Interview with Glen Hartley Re: Amnesia.” TR 2019-1109-01. University of Calgary, Department of Computer Science. DOI:10.11575/PRISM/36468.
———. 2019b. “Interview with Jane Isay Re: Amnesia.” TR 2019-1110-02. University of Calgary, Department of Computer Science. DOI:10.11575/PRISM/36614.
———. 2019c. “Interview with Charlie Kreitzberg Re: Amnesia.” TR 2019-1111-03. University of Calgary, Department of Computer Science. DOI:10.11575/PRISM/36732.
———. 2019d. “Interview with Don Daglow Re: Amnesia.” TR 2019-1113-05. University of Calgary, Department of Computer Science. DOI:10.11575/PRISM/36733.
———. 2019e. “Interview with James Terry Re: Amnesia.” TR 2019-1112-04. University of Calgary, Department of Computer Science. DOI:10.11575/PRISM/36756.
Aycock, John, and Katie Biittner. 2020. “LeGACy Code: Studying How (Amateur) Game Developers Used Graphic Adventure Creator.” 15th International Conference on the Foundations of Digital Games, article no. 23. DOI:10.1145/3402942.3402988.
Blank, Marc S., and S. W. Galley. 1980. “How to Fit a Large Program into a Small Machine, or How to Fit the Great Underground Empire on Your Desk-Top.” Creative Computing 6(7): 80–87.
Briceno, Hector, Wesley Chao, Andrew Glenn, Stanley Hu, Ashwin Krishnamurthy, and Bruce Tsuchida. 2000. “Down from the Top of Its Game: The Story of Infocom, Inc.” Infocom. December 15, 2000. http://web.mit.edu/6.933/www/Fall2000/infocom/infocom-paper.pdf.
Carver, Martin. 2009. Archaeological Investigation. London: Routledge.
Clark, Giles, and Angus Phillips. 2019. Inside Book Publishing, 6th ed. London: Routledge.
Cohn, Jesse. 1988. “Amnesia.” Compute!’s Gazette 6(5): 42.
Disch, Thomas M. 1969. The Prisoner. New York: Ace Books.
———. 1973. Camp Concentration. London: Panther.
Dyer, Jason. 2015. “Wander (1974) Release, and Questions Answered.” Renga in Blue: Interactive fiction and All the Adventures project (blog). April 23, 2015. https://bluerenga.wordpress.com/2015/04/23/wander-1974-release-and-questions-answered/.
Hartley, Glen. 1985. “Toward the Ultimate Participatory Novel.” Electronic Publishing and Bookselling 3(4): 12–13, 15.
Hartnell, Tim. 1984. Creating Adventure Games on Your Computer. New York: Ballantine Books.
Horn, Delton T. 1984. Golden Flutes and Great Escapes: How to Write Adventure Games. Beaverton, OR: dilithium Press.
Infocom, Inc. 1983. “We Stick Our Graphics Where the Sun Don’t Shine (Advertisement).” Softline 2(July–August): 6–7.
Jerz, D. G. 2007. “Somewhere Nearby Is Colossal Cave: Examining Will Crowther’s Original ‘Adventure’ in Code and in Kentucky.” Digital Humanities Quarterly 1(2). http://www.digitalhumanities.org/dhq/vol/1/2/000009/000009.html.
Kussmaul, Clif. 2017. “An Extended Series of Assignments in CS2 Involving a Text Adventure Game.” Journal of Computing Sciences in Colleges 32(6): 14–23.
Lehman, David. 1987. “You Are What You Read: Mastering the Possibilities of the Electronic Novel.” Newsweek, January 12, 1987, 67.
———. 1988. “A Conversation with Tom Disch.” Southwest Review 73(2): 220–31.
Maher, Jimmy. 2014. “Amnesia.” The Digital Antiquarian (blog). September 29, 2014. https://www.filfre.net/2014/09/amnesia/.
Martin, Douglas. 2008. “Thomas Disch, Novelist, Dies at 68.” New York Times, July 8, 2008, A19.
McDowell, Edwin. 1983. “Publishing: The Computer Software Race Is On.” New York Times, April 22, 1983, C27.
Montfort, Nick. 2003. Twisty Little Passages: An Approach to Interactive Fiction. Cambridge, MA: MIT Press.
Needle, David. 1984. “From the News Desk.” InfoWorld 6(24): 9.
Osnos, Peter. 2011. “How Book Publishing Has Changed Since 1984.” The Atlantic, April 12, 2011. https://www.theatlantic.com/entertainment/archive/2011/04/how-book-publishing-has-changed-since-1984/237184/.
Renne, Mark E. 1984. “The Write Stuff: A Word Processing Program for the Apple II.” InfoWorld 6(43): 64, 66.
Schudel, Matt. 2008. “Thomas Disch; Sci-Fi Writer Was Part of ‘New Wave.’” The Washington Post, July 9, 2008, B05.
Scorpia. 1987. “Thomas M. Disch’s Amnesia.” Computer Gaming World: The Journal of Computer Gaming, no. 34: 44–45, 64–65.
Scott, Jason. 2008. “The Amnesia Manuscript.” October 11, 2008. http://ascii.textfiles.com/archives/1458.
Sharmin, Sadia, Daniel Zingaro, Lisa Zhang, and Clare Brett. 2019. “Impact of Open-Ended Assignments on Student Self-Efficacy in CS1.” Proceedings of the ACM Global Computing Education Conference 2019, 215–21. https://doi.org/10.1145/3300115.3309532.
Sigel, Efrem. 1982. “The Future of the Book.” In Books, Libraries, and Electronics: Essays on the Future of Written Communication, 9–31. White Plains, NY: Knowledge Industry Publications.
Trunzo, James V. 1987. “Amnesia.” Compute! 9(5): 46.
Tyler, Jenny, and Les Howarth. 1983. Write Your Own Adventure Programs for Your Microcomputer. London: Usborne.
Wikipedia. n.d. “Amnesia (1986 Video Game).” Retrieved July 3, 2021, from https://en.wikipedia.org/w/index.php?title=Amnesia_(1986_video_game)&oldid=996854346.
CHAPTER 2
Evaluation ---
The next stage in Carver’s field research procedure is evaluation. In a physical setting, having reconnoitered a site, evaluation might entail scouring the site’s surface or digging some small sample test pits to gain a better-informed view of where the eventual excavations should take place and what they might reveal. Evaluation can be seen as gauging the likely return on investment that excavation will yield and as shaping the research questions that are asked.1
We need to consider how physical evaluation translates into the digital realm, however. Prior to reverse engineering, it would not be out of place to perform a quick assessment of a binary image, one that is superficial compared to the later excavations. Here I will be evaluating Amnesia or, more precisely, Amnesias plural, using extant tools to glean information about the game images and to gauge how easy or hard later excavations are likely to be; this information will also help refine or outright reject some research questions going forward.
But first, I need binary images of the game to examine. While I have physical copies of the disks, those disks must be read and their binary data made available before they are useful for evaluation and excavation, and that requires additional equipment. Extracting data directly from the physical disks has the advantage of producing binary images with a known provenance, and those images will be the eventual target of our reverse engineering. But there is an even faster route: it is hardly a secret that copies of software may be found on the Internet. Pirated software, realistically, is part of the software ecosystem and is how some humans interact with software, making it worth archaeological study in and of itself. It is easy to imagine pirating software as a shadowy, underground activity, but in fact it was commonplace in otherwise-reputable organizations at the time of Amnesia’s release and was a threat to software publishers’ profits,
leading to elaborate copy-protection schemes that we will see later. There was even a magazine, Hardcore Computing (later the Computist), devoted to publishing instructions on how to defeat software copy protection under the guise of anticensorship and of allowing “the user to make legal backups of protected disks” (Haight 1981: 2). Sidestepping copy protection is still of interest today and overlaps with modern software preservation efforts.
Cracked versions of software need to be approached with caution, technically speaking, because changes may have been introduced into the binary image. Game introductions present in the original may have been omitted for space reasons, for example, potentially allowing multiple games to reside and be distributed on the same disk. Cheat keys may have been added to the game. Most visibly, crack screens crediting the software cracker(s) might be added, or more elaborate, animated crack intros could appear (Reunanen, Wasiak, and Botz 2015). Figure 2.1 shows the crack screen present on an Apple II version of Amnesia cracked by “Dragon Lord” that I will be using for some analyses.
Figure 2.1. Crack screen for “Dragon Lord” version of Amnesia. Emulator screenshot by author.
The Dragon Lord crack I found is definitely incomplete, consisting of only two disk images; Amnesia had two physical double-sided disks, meaning there should be four binary
disk images, since the Apple II disk drives could only read one side of a physical disk at a time. I also found a separate four-disk crack of Amnesia for the Apple II, without credits, as well as Commodore 64 and IBM versions. Unless otherwise stated, I will focus on the Apple II version of Amnesia in this book, as it was the first version of the game to be developed, with the other platforms serving for the occasional comparative analysis.
Are these cracked versions of Amnesia functional? If we had original copies of Amnesia on floppy disks, we could answer that question directly by trying to use them with a real, physical computer system. In my case, I happen to have a working Apple IIe system sitting behind me as I write this (figure 2.2), but I am led to believe that I am atypical in this regard.
Figure 2.2. Apple IIe computer system with Amnesia floppy disks for scale. Photo by author.
Binary disk images, on the other hand, are most readily used in software emulators.2 A software emulator is software that aspires to faithfully imitate enough of a platform to run that platform’s software. An Apple II emulator, for instance, lets Amnesia run on what appears to it to be an Apple II, when in reality it is running inside emulator software on a modern desktop computer. For archaeologists unused to working with digital artifacts, emulators may appear to be a substantial compromise, and while real hardware does a better job
of conveying the experience of using software, including the (lack of) speed, the tactile feel, the sights, and the sounds, it is largely useless for reverse engineering. Real hardware may provide very little, if anything, in the way of analysis tools; and in any event, copy-protected software is expressly designed to thwart reverse engineering on the platform itself. Emulators, by contrast, give the reverse engineer a view of Amnesia that is omniscient with respect to the platform, and many incorporate a “debugger” feature that allows the code and data to be examined, poked, and prodded as they run virtually inside the emulator. For the work described here, I used multiple emulators: MAME (for the Apple II primarily, but also for the Commodore 64 and the IBM PC), VICE (Commodore 64), and DOSBox (IBM PC). This is far from a complete list of emulators; I chose these because they emulate the platforms of interest, clearly, but there are other factors to consider. These emulators are cross-platform and run on my computer (whereas others might be Windows- or Mac-only), I have experience using them and understand the reliability and limitations of their emulation, and they have debugging facilities that I can use for analysis. Pragmatically, I already have them installed on my computer, and cajoling an emulator to install and run the first time is not always a frictionless experience. Testing these unofficial versions of Amnesia in emulators provides a first experience of playing the game and raises some potential research questions. Both Apple II versions start in MAME, and a description early in the game invitingly mentions a pen near the hotel room phone, yet when the command GET PEN is issued, the game replies, It is not here.
While this sort of interaction would be de rigueur in poor-quality text adventure games, it seems like a surprising failure for Amnesia, and in fact the IBM version running in DOSBox permits the pen to be acquired. This curious difference could be explored, although it seems too minor to raise to the level of a research question. The Commodore 64 version running in MAME sports an animated crack intro: an eagle’s head occupies the bulk of the screen, a not-to-scale floppy disk clutched in its beak. Below, a horizontally scrolling text marquee displays credits (“CRACKED BY EAGLE SOFT INCORPORATED ON AUGUST 19TH, 1987”) and Rush lyrics (“Jacob’s Ladder”) to musical accompaniment.3 In MAME, we get no further; the Electronic Arts logo flashes briefly before game loading freezes or the computer’s BASIC language prompt reappears. In the VICE emulator, however, Amnesia loads and runs correctly, and the pen is GETtable.
This is one reason multiple emulators can be necessary: perfect emulation is not a straightforward task, and software may work in one emulator but not another. During reconnaissance, James Terry mentioned that the Commodore 64 version had code running on the disk drive in addition to the computer (Aycock 2019), and the crashing could be a side effect of MAME not emulating this correctly. If the scope of the project were extended to non-Apple-II versions of Amnesia, one research question might involve verifying Terry’s recollection against the primary source: the game code.
Other subtle differences between the versions of Amnesia manifest themselves. The legitimate, non-crack game credits vary slightly from version to version, as does the Electronic Arts logo; the monochrome graphic logo in the Apple II version is replaced by a color-cycling logo on the Commodore 64 and by no logo at all in the IBM version. In-game, the player finds a computer in their amnesiac avatar’s hotel room, and the type of computer matches the computer the player is using. Recounting this find implies that I have played the game, which I have, but only to a limited extent. Being able to play a game well, or at all, is not a prerequisite for reverse engineering, and what progress I did make in the game was due to an online walkthrough (“Amnesia Walkthrough” n.d.). In archaeological terms, playing the game might be likened to an exploratory field-walking survey. I have treated it as part of the evaluation stage because it was done in concert with running Amnesia in-emulator, although where easily playable versions exist (using the Internet Archive’s ability to play games in a web browser, for instance) it might be partially conducted during reconnaissance instead.
Crucially, the ability to run Amnesia in an emulator unlocks a wealth of reverse engineering techniques.
Static analysis refers to studying code and data without running the code, and were Amnesia unable to run in an emulator, we would be restricted to static methods. Dynamic analysis, by contrast, lets the code and data be analyzed, and even manipulated, while the code runs inside the emulator. Ultimately we will be using both static and dynamic approaches, switching back and forth between them to best answer the questions we pose.
The interface we will use to perform dynamic analysis is the emulator’s debugger. While debuggers tend to have some common features, there is no standardization, and even the same feature will be expressed differently in different emulators. For example, the debugger command bpset 1234 would tell the MAME debugger to stop running the game when it reaches memory address 1234, whereas VICE’s debugger would expect break 1234 for the same purpose. This is another reason to have multiple emulators at the ready: a debugging feature to assist analysis might be supported in one but not the other. Figure 2.3 shows how Amnesia looks in MAME’s debugger: there is a window displaying what the Apple II screen would contain, a debugger window (left) conveying the emulated computer’s state and the instructions it is running, and a separate window (lower right) showing a region of the computer’s memory contents.
Figure 2.3. MAME debugger view of Amnesia. Screenshot by author.
A final reason to have multiple emulators for the same platform is file formats. Binary disk image files can be stored in different ways, much as word processing documents might have the same content in either .doc or .docx format, and a lesser-known corollary of Murphy’s Law is that the emulator of choice never seems to support the requisite disk image file format. Luckily, the Apple II disk images mentioned so far are all in the commonplace .dsk format. One logical way of viewing an Apple II floppy disk is as a consecutive sequence of sectors, where each sector is a discrete chunk containing 256 bytes of data, and the .dsk file format simply captures that sequence of sectors and their data. This format cannot encompass the additional information that copy-protection schemes will be looking for, which correspondingly restricts the viable research questions we can pose, but the tradeoff is that the disk data is unobscured and readily available for analysis.
Often in reverse engineering, an easy way to start is by looking for printable strings. A string is a sequence of characters, and a character can loosely be thought of as a single letter, number, or other symbol that is seen on the screen, although strictly speaking there are also “whitespace” characters like spaces and tabs and “control” characters that would not be directly visible. Printable strings in the binary image files often correspond to messages that are shown onscreen to players, and they can convey an initial idea of what’s present in the image.
In other words, if a program prints Hello world! when it is run, it is reasonable to expect to find the characters H, e, l, l, o and so on stored consecutively somewhere in that program. A computer deals with numbers, however, not characters; each byte in our binary image file is a number with a value between 0 and 255. For a computer to work with characters, somehow all the possible characters must be mapped into numbers, a mapping referred to as a character encoding. There are many different character encodings, and in theory we could try them one by one; but given that Amnesia is an English text game from the 1980s, a logical place to start is the American Standard Code for Information Interchange, or ASCII for short.4 ASCII encoding, and variants thereof, was endemic in microcomputers
from that era, and ASCII lives on today within the now-predominant Unicode encoding. Indeed, randomly selecting a computer book or manual from that time period yields a better-than-average chance of finding an ASCII encoding table in an appendix; today, a quick Google search for “ascii table” suffices. In ASCII, there is a one-to-one mapping between characters and values: for example, the character 0 corresponds to the value 48 (not the value 0 as might be expected); A is 65; a is 97; space is 32.
Programs exist that heuristically look through files for printable strings by searching for consecutive sequences of printable ASCII characters of some minimum length. This method tends to produce false positives galore, but because these programs are readily available (here, I have used the strings program on Linux) and they do find actual printable strings, their erroneous results can be tolerated at this stage. Figure 2.4 shows an excerpt of the output when we look for printable strings in the .dsk images.
Figure 2.4. Excerpt of printable strings in Amnesia .dsk images. Image created by author.
The strings all appear to be garbage, and this is a representative sample: no recognizable strings manifest themselves this way in any of the images. This completely negative result is unexpected. Amnesia is a text-heavy game, and the lack of any text whatsoever as printable strings suggests we must look for a more complicated encoding. It also presages the fact that not all reverse engineering efforts will have a happy, Hollywood ending, and that patience and dogged persistence are good attributes for a reverse engineer to possess. Since the search for low-hanging fruit in our evaluation of the illicit .dsk images has reached a dead end, we can turn to the task of extracting binary disk images from our physical Amnesia floppy disks; these have a known provenance and will be the primary disk images with which we will be working.
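The printable-strings heuristic described above is simple enough to sketch. The following Python function is a minimal approximation of what a tool like strings does, not its actual implementation: it reports runs of bytes in the printable ASCII range 32 to 126 that are at least `min_len` long, with a default of 4 (the same default GNU strings uses). Tabs and other whitespace control characters are left out for simplicity. The `mask_high_bit` option is a hypothetical extension, not something used in the text: it clears bit 7 of each byte before testing, because Apple II software commonly stored characters with the high bit set, which makes it an inexpensive ASCII variant to rule out.

```python
from typing import Iterator, Optional, Tuple

# Printable ASCII: 32 (space) through 126 (~). In ASCII, "0" is 48, "A" is 65, "a" is 97.
PRINTABLE = range(32, 127)


def find_strings(data: bytes, min_len: int = 4,
                 mask_high_bit: bool = False) -> Iterator[Tuple[int, str]]:
    """Yield (offset, text) for each run of at least min_len printable ASCII bytes."""
    start: Optional[int] = None   # offset where the current run began
    chars = []                    # characters collected for the current run
    for i, b in enumerate(data):
        if mask_high_bit:
            b &= 0x7F             # hypothetical option: treat "high-bit ASCII" as plain ASCII
        if b in PRINTABLE:
            if start is None:
                start = i
            chars.append(chr(b))
            continue
        # A non-printable byte ends the run; report it only if it is long enough.
        if start is not None and len(chars) >= min_len:
            yield start, "".join(chars)
        start, chars = None, []
    # Flush a run that extends to the very end of the data.
    if start is not None and len(chars) >= min_len:
        yield start, "".join(chars)
```

For example, `list(find_strings(b"\x00\x01Hello world!\x02ab\x03"))` yields only `(2, "Hello world!")`, because the two-character run "ab" is shorter than the minimum length. Scanning a whole disk image is a matter of passing it the file's bytes (for instance via `pathlib.Path(...).read_bytes()`); run over the Amnesia .dsk images, a scan like this would presumably fare no better than strings did.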
Floppy disk data is encoded as magnetic field fluctuations on the disk’s magnetizable surface. As the disk spins inside the floppy disk drive, the disk’s surface passes by a read/write head in the disk
Figure 9.5. Two steps of Forth execution visualization. Image created by author.
SAFE BIBLE. While this was good progress, I started to wonder if I could get a higher-level understanding of what the parser was doing. It was time for more traditional research. When I interviewed him, James Terry mentioned, “I did find an ACM paper on text parsing that I recall was useful” (Aycock 2019). “ACM” is an abbreviation for the Association for Computing Machinery, a large, old organization (in computer science terms!) for computing professionals, which publishes a variety of academic journals and conference proceedings. Initially I had thought Terry might have been misremembering, because I knew of a 1979 paper published in an IEEE journal (IEEE is another large professional organization, with a focus on engineering) that spoke in part about how Infocom games parsed their input (Lebling, Blank, and Anderson 1979). It would be easy to conflate the two after so many years. However, the Infocom paper lacked the detail that would have let me connect it to Amnesia’s parser beyond mere conjecture, so I decided to follow the ACM lead. The flagship publication of the ACM is Communications of the ACM (CACM), which, as an ACM member myself, I know is sent to
ACM members monthly; it seemed as promising a place as any to start looking for a needle in ACM’s haystack of publications. I began a brute-force search of all issues from 1980–85, reasoning that this was the span of years immediately prior to when Amnesia’s parser would have been developed. Going through that set of CACM papers one by one turned up only a single possible candidate, a paper describing “a large and complex grammar used in an artificial intelligence system for interpreting English dialogue” (Robinson 1982: 27). Even then, that paper was quite far removed from Amnesia. On a whim, I followed its references to earlier publications and discovered a CACM paper on something called “transition network grammars” (Woods 1970). Having studied Amnesia’s code, I felt a sense of déjà vu reading Woods’ paper, because the two bore glaring similarities. For a start, the first example of a “transition network” in the paper showed graph-based state machines whose node labels combined sentence structure (NP and PP, for instance) with numbered states prefixed with “Q.” Moreover, the real point of the paper was to introduce what Woods called “augmented transition networks,” which added the ability to conditionally traverse graph edges, not unlike how I was seeing the ?-prefixed Forth words being used, along with stashing values in “registers” as the parsing unfolded, again something done by the Amnesia parsing code. Treating Amnesia’s parser as an augmented transition network provided the high-level model I was seeking. I later asked Terry about the paper, and he thought it was indeed the design inspiration.7 Armed with my increased knowledge of the code’s operation from the real-time visualization, and with the lens of the augmented transition network model, I was able to reverse engineer the model of the parser as a graph-based state machine.
It was apparent that the graph was going to be far too complex to draw out by hand, so I entered the description into an automated graph-layout program, and the result is rendered in figures 9.6a and b. It is not terribly legible at this size; the important thing to notice is that the parser model is complicated, aligning with what we saw written about the parser during the reconnaissance stage. Some intermediate states had to be introduced to match the reality of how the Forth code operated, but apart from that, my translation of the Forth parser code into the graph was a mechanical process. For example, the code for the Forth word [S] (sentence) begins with a conditional test for a ?VErb and, if true, calls ADVance then Q1. In the graph, this is represented by two nodes, [S] and Q1, connected by an edge labeled “?VErb *” (for brevity, the asterisk denotes a word being consumed by ADVance).
Figure 9.6a. Amnesia parser modeled as an augmented transition network, part 1. Image created by author.
Figure 9.6b. Amnesia parser modeled as an augmented transition network, part 2. Image created by author.
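The translation from Forth words to graph edges can be illustrated with a toy augmented transition network in Python. This is a hypothetical sketch, not a reconstruction of Amnesia’s graph: only the [S] to Q1 edge labeled “?VErb *” comes from the text, while the lexicon, the state Q2, and the noun-phrase arcs are assumptions, chosen so that the network reproduces the noun-sequence behavior reported later in this chapter (GET KEY BIBLE PEN gets only the key; GET PEN, BIBLE, AND KEY gets the pen and key). Each arc pairs a condition on the current word, like the ?-prefixed tests, with an action that stashes values in registers, and every arc consumes its word, as ADVance does. Full augmented transition networks can explore alternative paths; this sketch simply takes the first arc whose test succeeds.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical lexicon; Amnesia's real vocabulary is far larger.
LEXICON = {"GET": "VERB", "TAKE": "VERB", "PEN": "NOUN",
           "KEY": "NOUN", "BIBLE": "NOUN", "AND": "CONJ"}

Registers = Dict[str, Any]
Arc = Tuple[str, Callable[[str], bool], Callable[[Registers, str], None], str]


def is_a(category: str) -> Callable[[str], bool]:
    """Build an arc condition, analogous to a ?-prefixed test such as ?VErb."""
    return lambda word: LEXICON.get(word) == category


def set_verb(regs: Registers, word: str) -> None:
    regs["verb"] = word


def add_object(regs: Registers, word: str) -> None:
    regs["objects"].append(word)


def ignore(regs: Registers, word: str) -> None:
    pass  # the word is consumed but not recorded


# state -> arcs as (edge label, condition, register action, next state).
# Only "?VErb *" is taken from the text; the other labels and states are hypothetical.
NETWORK: Dict[str, List[Arc]] = {
    "[S]": [("?VErb *", is_a("VERB"), set_verb, "Q1")],
    "Q1": [("?NOUN *", is_a("NOUN"), add_object, "Q2")],
    "Q2": [("?NOUN *", is_a("NOUN"), ignore, "Q2"),   # later nouns in a sequence are dropped
           ("?CONJ *", is_a("CONJ"), ignore, "Q1")],  # AND starts a new noun phrase
}
FINAL_STATES = {"Q2"}


def parse(text: str) -> Optional[Registers]:
    # Strip punctuation first, as the text reports the game doing.
    cleaned = "".join(ch for ch in text.upper() if ch.isalnum() or ch.isspace())
    regs: Registers = {"verb": None, "objects": []}
    state = "[S]"
    for word in cleaned.split():
        for _label, test, action, next_state in NETWORK[state]:
            if test(word):
                action(regs, word)
                state = next_state
                break
        else:
            return None  # no arc accepts this word: the parse fails
    return regs if state in FINAL_STATES else None
```

Under these assumptions, `parse("GET PEN, BIBLE, AND KEY")` returns `{'verb': 'GET', 'objects': ['PEN', 'KEY']}`, while `parse("GET PEN, BIBLE, KEY")` records only the pen. Inputs such as TAKE ALL BUT KEY are not modeled at all.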
I could now use the graph model, in addition to the code, to predict how the game would parse different player inputs. There were some curiosities, although not all would necessarily have been experienced during normal gameplay. Only the first noun of a sequence was retained, with the effect that GET KEY BIBLE PEN only gets the key. Punctuation was removed from the player input, which could also trigger the noun-sequence issue: GET PEN, BIBLE, KEY only takes the pen, whereas GET PEN, BIBLE, AND KEY acquires just the pen and key. TAKE ALL BUT KEY takes the key, of course. Despite these quirks, the parser was fairly sophisticated overall.
There is one Forth word in the parser of special note, invoked at the very end of an [S] sentence. In the decompiled code, the name of this word is KLU___, and I contend that it was originally KLUdge, a derogatory term for an ugly but functional solution to a problem. The KLUdge code looks for cases where the parser has misclassified input words and shuffles the words around to the correct spots, presumably instead of fixing the parser itself. Once again, the Amnesia developers’ choices speak to us through the code.
We have lingered for some time now on the reverse engineering of Amnesia’s input and output, which, for a text adventure, are arguably some of the most important aspects of the game. They are not the only aspects, though. We have largely ignored the physical side of Amnesia, such as its placement and distribution on floppy disks, and for the last two excavations it is time to pivot to a more physical view.
Notes
1. I see this as akin to a chosen-plaintext attack in cryptography (Schneier 1996), because we can select the input and observe the results of that input.
2. For clarity, I have omitted the MEH___ PRI___________ that prefaced each string.
3. *SE_______________ might be *SEarch-vocabulary, in retrospect, but this is only speculation.
4. Referring to units of 1,024 bytes using “K,” which really means 1,000, was common during Amnesia’s time. The modern way to express this is with the newer unit “KiB,” where 1 KiB is exactly 1,024 bytes.
5. Copying the ROM contents from $f800–$ffff into RAM was done very early in Amnesia’s boot sequence and was easy to find by setting a write watchpoint on $f800, the start of the ROM region where routines like COUT are located.
6. The SPE____ words were AND, ALL, BUT, EXCE__, THEN, &, and +.
7. James Terry, email to the author, October 4, 2020.
References
Apple Computer, Inc. 1985. Apple IIe Technical Reference Manual. Reading, MA: Addison-Wesley.
Aycock, John. 2019. “Interview with James Terry Re: Amnesia.” TR 2019-1112-04. University of Calgary, Department of Computer Science. DOI:10.11575/PRISM/36756.
Brodie, Leo. 1987. Starting Forth, 2nd ed. Englewood Cliffs, NJ: Prentice-Hall.
Lebling, P. David, Marc S. Blank, and Timothy A. Anderson. 1979. “Zork: A Computerized Fantasy Simulation Game.” IEEE Computer 12(4): 51–59.
Moore, Edward F. 1956. “Gedanken-Experiments on Sequential Machines.” In Automata Studies, ed. C. E. Shannon and J. McCarthy, 129–53. Annals of Mathematics Studies 34. Princeton, NJ: Princeton University Press.
Robinson, Jane J. 1982. “DIAGRAM: A Grammar for Dialogues.” Communications of the ACM 25(1): 27–47.
Schneier, Bruce. 1996. Applied Cryptography, 2nd ed. New York: Wiley.
Woods, W. A. 1970. “Transition Network Grammars for Natural Language Analysis.” Communications of the ACM 13(10): 591–606.
CHAPTER 10
Finding Locations ---
Amnesia has multiple entangled layers of physical and digital locations. The player was expected to be situated facing, in our case, a physical Apple II computer system with certain properties: a floppy disk drive and a specified amount of RAM. Recall that the hotel room computer found in-game matched the type of physical computer the player was using, one deliberately designed echo of the physical into the virtual. We have actually added another layer, because a physical Apple II computer is no longer needed to run Amnesia; the emulator supplants the physical computer system of old. Then there is Amnesia itself, which strove to provide a digital simulation of physical Manhattan within the confines of 64K and four disk images. This means that somewhere on the two double-sided floppy disks that held Amnesia, there were magnetic fluctuations containing the code and data that brought each of the digital Manhattan locations to life. Can we connect the fiction and the physical by identifying where each Amnesia location is found on the disks? We have digital versions of the physical floppies to work with, of course, and we will be using those images for this task. Furthermore, we know that Amnesia’s implementation takes a Forth-right view of its universe, treating floppy disks as a sequence of numbered, 1,024-byte blocks.
Other applicable information has surfaced during the course of reverse engineering, starting with the code fragments found earlier. The Forth QAZ word’s source code (figure 4.5) ends with START *GOTO, and “goto” is a command that causes an unconditional transfer of control to another location in a number of programming languages, such as the BASIC language on the Apple II (Apple Computer, Inc. 1978). While *GOTO is not a standard Forth word, the name is certainly evocative of its potential purpose. Decompilation had also advanced, and I could see the decompiled code responsible for displaying “You wake
up” and the game credits, a code sequence that ended in a similar way: 3BE_________ *GOto. STArt and 3BE_________ each had an associated value in the decompiled code, 0 and 1 respectively, and both were defined using a Forth word *NO_________.1 When I looked for other words in the Forth code defined with *NO_________, it quickly became clear that STArt and 3BE_________ were not alone; in total, there were 121 of these, each with a different value between 0 and 120. It seemed plausible that these numbers corresponded to in-game locations, and I could do an experiment to test that. In the debugger, before the 3BE_________ *GOto could be run, I changed the value of 3BE_________ to 5, and suddenly I began Amnesia in the bathroom instead of the hotel room bed.2 What I did not know was whether this effect was limited to a subset of locations, such as those on the same disk image. As a second experiment, I arbitrarily chose a high-numbered location with an impressive-sounding Forth word name (FBI) and was greeted with the prompt Insert the flip side of disk 1. A virtual disk change later, I was starting the game in the F.B.I. Club. I could teleport myself to different locations in Amnesia.
With my newfound ability, I could perform an initial, coarse-grained mapping of game locations to disk images. The procedure was to boot the game, teleport to a numbered location, and observe the effect. If teleportation succeeded without a disk-change prompt, I could assume the location was on disk 1, side A, the disk image from which I had booted. If I was prompted for a different disk, that disk was the location of the location. The only uncertain cases were those where I saw no disk prompt and teleportation seemed to fail, leaving me with a blank screen and no prompt. It is hard to determine why teleportation fell short without further investigation, because this teleportation procedure treats Amnesia as a whole as a “black box.”
For example, the code at a failed location may have expected variables to have been set up in a certain way, perhaps indicating that specific objects had been acquired or actions had been performed, and that did not happen when I sent the game directly to the location. Fortunately, most locations did not fail, as the results in table 10.1 show. The final three disk images held a fairly balanced number of locations, and the lower number found on the first disk image might be explained by its also needing to contain the code and data to boot the game.
Table 10.1. Empirical mapping of location numbers to disks.
Disk image        Number of locations
Disk 1, side A    20
Disk 1, side B    29
Disk 2, side A    33
Disk 2, side B    29
Unknown           10
I have not yet explained why I thought the locations' code and data were spread across the disks, although these empirical results provide weak evidence for that notion. It could be, for instance, that all the text message data printed by the game was centralized in one place, as was the case for the text adventure Mystery House (Aycock and Biittner 2019). However, the program I had written to extract encoded strings from .dsk images showed the game's text distributed throughout. The decompiler could also now decompile pieces of Forth code from .dsk images, not just from saved Apple II memory contents, and I could see portions of code for different parts of Amnesia's storyline in different disk images. Somehow the game's code had to know where to find the locations on disk, and I suspected there was an internal data table containing that information, which I would find if I reverse engineered enough of the code. But I thought it would be interesting to build on teleportation and try to get the game to reveal where the locations were on disk. My scheme relied on three assumptions. First, that when Amnesia *GO[es]to a numbered location, that location's necessary code and data would be loaded into the Apple II's memory from (a) disk. Second, that the location's code and data would be largely kept together in memory as a discrete chunk and not dispersed in small morsels throughout. Third, that the location's code and data would not be changed in any substantial manner when placed into memory. For example, this final assumption would be violated if addresses embedded in the code were altered to reflect where the code had been loaded into memory, a technique used to run programs in other situations (Levine 2000). The idea was to capture two Apple II memory images for each location: one taken before teleportation and one after. I could then perform differential analysis on the two memory images to find out what had changed, which, according to my assumptions, should be related to the new location I had teleported to. Then I could use the bytes in the changed area(s) as a signature to locate where in the .dsk files they had originated.
Conceptually this is straightforward, but as always, the devil is in the details. With 121 numbered locations, I wanted to automate the process as much as possible, and I took the time to write a program that would start Amnesia in the emulator, issue the appropriate debugger commands to teleport on my behalf, swap the emulator's disk image if necessary, and save the post-teleportation memory image. This usually worked as a labor-saving device, although the debugger's cue to save the second memory image was the player being prompted for input; as I mentioned, that did not occur at all locations, and sometimes I had to intervene and save the post-teleportation memory manually. Then there was the matter of comparing two saved memory images. Some memory areas, like the memory for the Apple II's text display, would change with the different text displayed, and those areas were excluded from consideration. For the remainder, I took a heuristic approach to try to ignore minor, isolated changes unrelated to teleportation (such as individual variables' values naturally fluctuating) while still spotting wholesale memory changes. I wrote a program to step through both memory images 16 bytes at a time, and if at least 14 of the 16 bytes differed between the two, that 16-byte chunk was marked for later reference. Then, if eight or more consecutive 16-byte chunks were marked, the final verdict was that the memory area had changed. The bytes in the areas thus decreed "changed" could be searched for piecewise in the .dsk images. This is certainly not a perfect method, and it would not properly handle instances where pre-teleportation memory contents had simply been shifted to a new place in memory; a more sophisticated algorithm for computing differences would be required to take that into account. Still, the results were acceptable even with a simple method and without extensive tuning. Figure 10.1 shows the numbered locations that were mapped to disk 1, side B. One observation, looking at the projection of used disk blocks below the X-axis (in gray), is that many disk blocks' contents are not accounted for.
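The chunk-based comparison described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's program: the function names, the `excluded` parameter, and the decision to test exclusion by each chunk's start address are all hypothetical. Only the thresholds (16-byte chunks, at least 14 differing bytes, at least eight consecutive marked chunks) come from the text. "Searched for piecewise" is also interpreted here, as an assumption, to mean looking up each 16-byte piece of a changed area separately.

```python
from typing import Iterable, List, Tuple

CHUNK = 16     # bytes compared per step
MIN_DIFF = 14  # a chunk is marked if at least this many of its bytes differ
MIN_RUN = 8    # an area counts as changed if this many consecutive chunks are marked

Range = Tuple[int, int]  # half-open (start, end) byte range


def changed_areas(before: bytes, after: bytes,
                  excluded: Iterable[Range] = ()) -> List[Range]:
    """Return half-open ranges judged to have changed wholesale between two memory images."""
    if len(before) != len(after):
        raise ValueError("memory images must be the same size")
    excluded = list(excluded)

    marked = []
    for start in range(0, len(before) - CHUNK + 1, CHUNK):
        if any(lo <= start < hi for lo, hi in excluded):
            marked.append(False)  # e.g. the text screen, which always changes
            continue
        diffs = sum(x != y for x, y in zip(before[start:start + CHUNK],
                                           after[start:start + CHUNK]))
        marked.append(diffs >= MIN_DIFF)

    areas = []
    run_start = None
    for i, is_marked in enumerate(marked + [False]):  # sentinel closes a final run
        if is_marked and run_start is None:
            run_start = i
        elif not is_marked and run_start is not None:
            if i - run_start >= MIN_RUN:
                areas.append((run_start * CHUNK, i * CHUNK))
            run_start = None
    return areas


def locate_pieces(image: bytes, area: bytes) -> List[int]:
    """Search a .dsk image for each 16-byte piece of a changed area; return hit offsets."""
    hits = []
    for i in range(0, len(area) - CHUNK + 1, CHUNK):
        offset = image.find(area[i:i + CHUNK])
        if offset != -1:
            hits.append(offset)
    return hits
```

A call such as `changed_areas(pre, post, excluded=[(0x400, 0x800)])` would skip the Apple II's primary text page. As the text notes, a byte-wise comparison like this cannot recognize memory contents that merely moved.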
A second observation about figure 10.1 is that the plot is quite sparse: when a numbered location is detected, it maps to only a handful of disk blocks, which considerably narrows the range of disk blocks for further analysis. There are, however, some duplicate matches. Block 0, for instance, has parts of four distinct numbered locations mapped to it; whether this is a false positive or whether all four locations really do draw upon the contents of that block is a good follow-up question. Overall, only one numbered location apart from STArt could not be mapped to any disk block, but it is reasonable to wonder how accurate the results are. As an example, let us focus on numbered location 119.
[Figure 10.1: plot of numbered location (y-axis, 0 to 120) against disk block (x-axis, 0 to 140).]
Figure 10.1. Automatic location mapping to disk 1, side B. Chart created by author.
Teleporting to location 119, the player is approached by an innocuous-looking man asking for directions. Amnesia reminds the player, who has assuredly purchased the game, that the game packaging comes equipped with an X-Street Indexer "for just this very purpose." Answering the character's question incorrectly twice is fatal, as it transpires that the man is armed and takes a dim view of misdirection. This is obviously part of the game's copy protection, based on the supposition that someone copying the game would not have a replica of this physical device or its information. Location 119, where the player is presented with this challenge, was mapped to two different areas on disk 1, side B; it is the topmost row in figure 10.1's plot. I extracted the two regions of identified bytes from the .dsk image. Using my program to search for and print encoded strings, I discovered that the first set of .dsk bytes held the street names listed on the X-Street Indexer and the second set contained the dialogue with the easily perturbed man. Decompiling those regions confirmed the findings. The automated mapping thus seems to provide some good starting points even without recourse to reverse engineering the code or data.
Having said that, finding the ground truth would require reverse engineering. I used my real-time Forth code execution visualizer to step through the code for *GOto, and during the code's stack manipulations, I spotted the value $c38 being used in a manner suggesting that it was the start of a table with six-byte entries. Flipping from dynamic to static analysis, I searched for the $c38 value in the decompiled code and found that 1,020 bytes were copied to that address when Amnesia started up; 1,020 is the largest multiple of 6 that fits in a 1,024-byte Forth disk block, enough for 170 six-byte entries. This increasingly looked like a location mapping table. I studied the bytes at $c38 in the debugger and noticed that the first byte in each six-byte entry often seemed to correlate with the coarse-grained mapping I had done initially to link numbered locations with disk images, although not always in an obvious, direct manner. For instance, some locations I had placed on disk 2, side A had the value 4 in the first byte of their six-byte entry, whereas others had 5. Perhaps these entries had information packed into bits that was throwing the values off from what I expected. I reran the visualizer, now watching how the values from this alleged table were being used, and I discovered that the first four bytes of each table entry were being treated as 32-bit numbers. Previously when I referred to little-endian values, it was always 16-bit values that would fit into two bytes, and two bytes only have two possible orderings. Little-endian systems can exhibit more variety when representing 32-bit values, though (Cohen 1981). The left side of figure 10.2 shows how a 32-bit little-endian value might logically be expected to be laid out in memory; the right side shows how I was actually seeing the 32-bit numbers in the table being stored.3
[Figure 10.2: the value $12 34 56 78 in byte order. Expectation: $78 $56 $34 $12. Reality: $34 $12 $78 $56.]
Figure 10.2. Some possible little-endian byte orderings for 32-bit values. Image created by author.
The first byte of each table entry, which related somehow to the disk image, was in reality the middle byte of a 32-bit number: the high-order 16-bit half was stored first, and each half was itself little-endian. To make this more concrete, the first four table entry bytes for location 119 were $02 $00 $99 $a9, which after byte swapping was the 32-bit value $0002a999.
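Decoding an entry's first four bytes, together with the division by the size of a disk side that the text goes on to describe, can be sketched as follows. The function names are hypothetical. The byte-order rule (high-order word first, each word little-endian) is inferred from the $02 $00 $99 $a9 to $0002a999 example and from the orderings in figure 10.2, and the side order follows note 4.

```python
DISK_SIDE_BYTES = 1024 * 140  # 140 Forth blocks of 1,024 bytes = 143,360 bytes per .dsk image
SIDES = ("disk 1, side A", "disk 1, side B", "disk 2, side A", "disk 2, side B")


def middle_endian_32(b: bytes) -> int:
    """Decode four bytes stored high 16-bit word first, each word little-endian.

    Under this ordering $12345678 is stored as $34 $12 $78 $56.
    """
    if len(b) != 4:
        raise ValueError("need exactly four bytes")
    high = b[0] | (b[1] << 8)
    low = b[2] | (b[3] << 8)
    return (high << 16) | low


def locate(virtual_address: int) -> tuple:
    """Split a virtual disk address into (disk side, byte offset within that .dsk image)."""
    side, offset = divmod(virtual_address, DISK_SIDE_BYTES)
    return SIDES[side], offset


va = middle_endian_32(bytes([0x02, 0x00, 0x99, 0xA9]))  # location 119's entry
print(hex(va), locate(va))  # 0x2a999 ('disk 1, side B', 31129)
```

The offset 31,129 is $7999, so location 119's primary area would start $7999 bytes into the disk 1, side B image.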
Amnesia was using each entry's 32-bit value as a "virtual address" of sorts that could identify any spot on any of the four disk images. Translating the virtual address to a disk number was a matter of dividing it by the maximum number of bytes on a disk, which for the Apple II was 1,024 × 140 = 143,360. For location 119, $0002a999 divided by 143,360 gives a quotient of 1, which signifies disk 1, side B.4 The remainder from the division ($7999 in this case) was the position in that disk image. I wrote a program to print out the location mapping table at $c38, and all the known coarse-grained mappings summarized in table 10.1 matched the $c38 data perfectly. Upon closer examination, another reason for teleportation failure became apparent: some locations were missing, with entries in the $c38 table that were all zeros, and this accounted for all but one of the "unknown" locations. Interestingly, these missing numbered locations had been assigned names in the code, so they might be intended content that was cut due to disk space limitations (Aycock 2019). It was also clear that multiple numbered locations could coexist in the same Forth disk block, and that, taken together with the 32-bit virtual disk address, suggested the division into Forth disk blocks was a lower-level implementation detail; otherwise, we might have expected the numbered locations all to align perfectly with disk block boundaries.
When I compared the automated, fine-grained location mapping results to the $c38 ground truth, there were definite discrepancies. Some were to be expected, given the heuristic approach, but there were sizable portions of disk blocks attributed to numbered locations that were not shown in the $c38 table. Returning to the running example of location 119, the automated mapping had found it in two distinct on-disk areas, yet the $c38 table only tracked one area for each location. At the same time, decompilation and string decoding had verified that both areas were legitimate. So what was going on? Because the differential analysis had found both areas in memory, Amnesia must have loaded that second portion from disk separately from the $c38-based load. I went back to the visualizer, watched the code reading from the disk, and saw where the second area was being read; the visualizer showed me it had a disk address of $ab45 and a length of $021f. I suspected there was Forth code in the first loaded area that caused the second area to be loaded, and to find it I searched statically in the .dsk image for those values, appropriately byte-swapped to $45 $ab and $1f $02. Sure enough, that was exactly what I found (figure 10.3), preceded by a $d045 that indicated the start of some Forth code. It appeared that the $c38 table specified the primary mapping to disk for each numbered location, with other necessary portions loaded into memory on demand later.
[Figure 10.3: hex dump of .dsk image addresses $7DC0 to $7E2F, annotated in the original to mark the start of Forth code, the address of the second disk area, and the length of the second disk area.]
00007DC0  D3 04 00 AA D2 87 49 16 00 07 D3 63 00 59 48 67
00007DD0  78 07 D3 02 00 59 48 2A A3 AA 49 0A 00 67 78 7F
00007DE0  D9 59 48 2A A3 AA 49 0A 00 07 D3 02 00 59 48 2A
00007DF0  A3 04 D0 45 D0 07 D3 02 00 2A A3 AA D2 78 4F EC
00007E00  D2 45 AB 00 00 07 D3 1F 02 D8 4E 45 D0 2A A3 07
00007E10  D3 15 00 AA D2 78 4F EC D2 BA A9 00 00 07 D3 8B
00007E20  01 D8 4E 45 D0 2A A3 07 D3 16 00 AA D2 78 4F EC
Figure 10.3. Searching for a disk address and length in a sea of bytes. Image created by author.
In terms of deciphering the mapping of numbered locations to the disk images, finding disk loading performed by code would normally sound a death knell. A table with a fixed structure like the one at $c38 is relatively easy to decode and map, but code can be arbitrarily complicated, and in general there would be no way to identify all the disk-loading code. However, I noticed that the Forth code to load these additional areas from disk was formulaic and followed a distinctive pattern (figure 10.4), and I enhanced my $c38-printing program to look for these patterns and extract data from them. A second instance of this pattern may be found in figure 10.3, in fact, with disk address $a9ba and length $018b. While this pattern search is a heuristic approach and might miss some disk loads, in practice it mapped out the majority of the .dsk images' contents.
[Figure 10.4: the byte pattern $ec $d2 ? ? ? ? $07 $d3 ? ? $d8 $4e, where the two wildcard bytes after $ec $d2 are the address of the disk area and the two after $07 $d3 are its length.]
Figure 10.4. Byte pattern to heuristically identify disk-loading code. Image created by author.
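The figure 10.4 pattern maps naturally onto a bytes regular expression in Python, with the two wildcard pairs captured as little-endian values. This is a minimal sketch, not the author's $c38-printing program, and the function name is hypothetical. The test data is transcribed from figure 10.3 and yields both instances the text mentions.

```python
import re
from typing import List, Tuple

# $ec $d2, two address bytes, two unknown bytes, $07 $d3, two length bytes, $d8 $4e.
# re.DOTALL lets "." match any byte, including $0a.
LOAD_PATTERN = re.compile(rb"\xec\xd2(..)..\x07\xd3(..)\xd8\x4e", re.DOTALL)


def find_disk_loads(image: bytes) -> List[Tuple[int, int, int]]:
    """Return (offset of match, disk address, length) for each pattern hit."""
    loads = []
    for m in LOAD_PATTERN.finditer(image):
        address = int.from_bytes(m.group(1), "little")
        length = int.from_bytes(m.group(2), "little")
        loads.append((m.start(), address, length))
    return loads


# Bytes transcribed from figure 10.3 (.dsk offsets $7DC0 through $7E2F).
FIG_10_3 = bytes.fromhex(
    "D3 04 00 AA D2 87 49 16 00 07 D3 63 00 59 48 67 "
    "78 07 D3 02 00 59 48 2A A3 AA 49 0A 00 67 78 7F "
    "D9 59 48 2A A3 AA 49 0A 00 07 D3 02 00 59 48 2A "
    "A3 04 D0 45 D0 07 D3 02 00 2A A3 AA D2 78 4F EC "
    "D2 45 AB 00 00 07 D3 1F 02 D8 4E 45 D0 2A A3 07 "
    "D3 15 00 AA D2 78 4F EC D2 BA A9 00 00 07 D3 8B "
    "01 D8 4E 45 D0 2A A3 07 D3 16 00 AA D2 78 4F EC"
)

for offset, address, length in find_disk_loads(FIG_10_3):
    print(f"${0x7DC0 + offset:04x}: address ${address:04x}, length ${length:04x}")
# $7dff: address $ab45, length $021f
# $7e17: address $a9ba, length $018b
```

Like the original heuristic, this only finds disk loads written in exactly this shape.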
Figure 10.5 shows how many bytes in each block of disk 1, side B are accounted for this way: over 82 percent of that one disk alone, with each of those bytes associated with a single numbered location. The answer to whether game and disk locations can be connected is a resounding yes.
[Figure 10.5: number of bytes used (y-axis, 0 to 1,024) in each disk block (x-axis, 0 to 140).]
Figure 10.5. Disk bytes accounted for on disk 1, side B. Chart created by author.
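A per-block tally like the one plotted in figure 10.5 could be computed as below. The sketch assumes the mapped areas have already been converted to (offset, length) pairs within a single .dsk image. It counts overlapping areas only once and reports the covered fraction of the side. The function names are hypothetical.

```python
from typing import Iterable, List, Tuple

BLOCK_BYTES = 1024
BLOCKS_PER_SIDE = 140


def bytes_used_per_block(areas: Iterable[Tuple[int, int]],
                         blocks: int = BLOCKS_PER_SIDE) -> List[int]:
    """Count covered bytes in each 1,024-byte block; overlapping areas count once."""
    covered = bytearray(blocks * BLOCK_BYTES)
    for offset, length in areas:
        start = max(offset, 0)
        end = min(offset + length, len(covered))
        if start < end:
            covered[start:end] = b"\x01" * (end - start)
    return [sum(covered[i * BLOCK_BYTES:(i + 1) * BLOCK_BYTES]) for i in range(blocks)]


def fraction_used(counts: List[int]) -> float:
    """Share of the whole disk side covered by at least one area."""
    return sum(counts) / (len(counts) * BLOCK_BYTES)
```

For example, `bytes_used_per_block([(0, 1024), (1500, 100), (1550, 100)])[:3]` gives `[1024, 150, 0]`, because the two overlapping areas in block 1 cover 150 distinct bytes.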
There are two conspicuous, location-related physical/digital tie-ins in Amnesia, both involving physical objects. First, the game is packaged with a fold-out Street and Subway Map to Manhattan, which, if the packaging's inner liner notes are to be believed, is provided for orientation purposes: a physical map of physical Manhattan, to facilitate navigating a virtual Manhattan. In a digital nod to the physical map, the in-game hotel lobby has a "pile of street maps," which reveal they are "compliments of the hotel," consistent with the inscription on the physical map. Second is the aforementioned X-Street Indexer, a physical device referencing physical yet virtually simulated places, used to prevent unauthorized digital copies of content stored on physical floppy disks. It is so fundamental to Amnesia that the corresponding in-game Indexer starts out in the player's inventory even though the player initially begins sans clothes, and the Indexer cannot be DROPped. Any attempt to LOOK at the Indexer breaks the fourth wall to announce that the Indexer is "part of your game's packaging." The physical Indexer is a method of copy protection known as a code wheel. The one in Amnesia has two layers, although the code wheels in other games could have up to three (Aycock 2016). Figure 10.6 shows the Indexer turned to answer the in-game challenge "1910 Broadway." The dash beside Broadway is aligned with the one indicating 1900–1999, and the answer to supply to Amnesia (115) is shown in the window on the right of the Indexer. Peering behind the upper wheel reveals all possible answers as a ring of numbers (figure 10.7). In theory, the X-Street Indexer alone would arguably have been sufficient copy protection, but it was only one of the copy protection methods employed. The copy protection in Amnesia was an elaborate affair.
Figure 10.6. X-Street Indexer code wheel in use. Photo by author.
Figure 10.7. Second layer of X-Street Indexer. Photo by author.
Notes
1. Maybe *NOde-global, given the GLOBAL OR ACTIVE NODE ? entreaty to the developer left in the code (figure 4.3), but other completions (e.g., *NOde-define) would also fit.
2. I knew the address of 3BE_________'s value based on information from the Forth dictionary found earlier.
3. This byte ordering could be referred to as "middle-endian" (Raymond 2003).
4. Each disk side had 140 Forth blocks, each of which was 1,024 bytes. The quotients 0 and 1 identified the two sides of disk 1; 2 and 3 were for the two sides of disk 2.
References
Apple Computer, Inc. 1978. Applesoft II BASIC Programming Reference Manual. Reading, MA: Addison-Wesley.
Aycock, John. 2016. Retrogame Archeology: Exploring Old Computer Games. Cham, Switzerland: Springer.
Aycock, John. 2019. "Interview with Don Daglow Re: Amnesia." TR 2019-1113-05. University of Calgary, Department of Computer Science. DOI:10.11575/PRISM/36733.
Aycock, John, and Katie Biittner. 2019. "Inspecting the Foundation of Mystery House." Journal of Contemporary Archaeology 6(2): 183–205.
Cohen, Danny. 1981. "On Holy Wars and a Plea for Peace." IEEE Computer 14(October): 48–54.
Levine, John R. 2000. Linkers & Loaders. San Francisco: Morgan Kaufmann.
Raymond, Eric S., ed. 2003. "Middle-Endian." The Jargon File, version 4.4.7. http://www.catb.org/jargon/html/M/middle-endian.html.
CHAPTER 11
Copy Protection
The most fascinating and challenging part to reverse engineer has been left for last. We saw that Amnesia's copy protection included a physical component with the X-Street Indexer, and we know from the failure of multiple copying programs during the initial evaluation stage that some form of on-disk protection exists; it is time to digitally excavate the latter. Any difficulties posed when reverse engineering other parts of Amnesia were simply an inadvertent side effect of two things: starting from a collection of bytes from which we had to derive meaning and structure, and a game implementation that blended assembly language with interpreted Forth code. Copy protection code, on the other hand, is deliberately designed to make the reverse engineering and analysis we are attempting hard. We can expect the copy protection code to be opaque and perhaps purposely misleading. As always, we need to know where to look. One method is boot tracing, where we would watch the game step by step as it booted, on the assumption that the copy protection code would run and reveal itself before the game started. Strategically, boot tracing is a direct assault on the copy protection code and one of the analysis methods that such code would be crafted to contravene. It is important to keep in mind that what we are doing is not an impossible task, though. We are in an excellent position to witness the inner workings of the copy protection from the emulator's debugger, and apart from that, copy protection does not afford absolute security; it only delays the inevitable. We see proof of this in the fact that cracked versions of Amnesia exist, and this brings us back to the Dragon Lord version of Amnesia I found during evaluation.
We have a potential shortcut for determining where the copy protection code is in Amnesia, because presumably it was bypassed or otherwise disabled in the cracked version; we can perform some differential analysis between it and my own .dsk images to ascertain exactly what was changed. Using fine-grained, sector-by-sector MD5 hashes of disk 1, side A, I could see that of the 560 sectors on the .dsk images, only 18 differed between the two versions. Some changes would be related to the addition of the crack screen and would have no bearing on copy protection. We could embark on a more detailed analysis of all the changed sectors to find the relevant one(s), but there was no need, thanks to traditional research. In 1987, Computist magazine published a method to bypass Amnesia's copy protection by changing exactly one byte on the disk (Wiegley 1987), information referred to euphemistically as a "softkey" in Computist's parlance. It would be more precise to say the published method allegedly bypassed the copy protection, because three issues later an amendment appeared, citing the same problem with GET I had noticed when playing the Dragon Lord version of Amnesia: "you couldn't take anything" (Wiegley 1988). Whatever the copy protection was doing, we have proof that it tripped up at least two people. The sector where the original, one-byte softkey surgery was to be performed matched one of the changed, outlying Dragon Lord sectors. I compared the 256 bytes in that sector with the corresponding bytes in my .dsk image; this sounds tiresome, but it can be done visually almost instantaneously by flipping rapidly back and forth between the two sets of hexadecimal bytes. Figure 11.1 highlights the three-byte difference. Although the changes occurred in the same disk sector, the Dragon Lord crack was distinct from the single-byte softkey, and the copy protection was effective in guarding against both.
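The sector-by-sector comparison can be sketched with Python's standard `hashlib`. The function names are hypothetical, and this is an illustration of the technique rather than the author's tooling. Sectors are numbered in the order the .dsk file stores them, and a 143,360-byte image gives the 560 sectors mentioned above.

```python
import hashlib
from typing import List

SECTOR_BYTES = 256


def sector_hashes(image: bytes) -> List[str]:
    """MD5 digest of each 256-byte sector, in the order the .dsk file stores them."""
    return [hashlib.md5(image[i:i + SECTOR_BYTES]).hexdigest()
            for i in range(0, len(image), SECTOR_BYTES)]


def differing_sectors(a: bytes, b: bytes) -> List[int]:
    """Indices of sectors whose digests differ between two same-sized images."""
    ha, hb = sector_hashes(a), sector_hashes(b)
    if len(ha) != len(hb):
        raise ValueError("images are different sizes")
    return [i for i, (x, y) in enumerate(zip(ha, hb)) if x != y]
```

Comparing the raw sector bytes directly would give the same answer. Keeping the digests, however, makes it cheap to compare many images against one another.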
Looking at only the stream of bytes, we cannot say with absolute certainty what Dragon Lord altered, because we do not know whether the $20 at the beginning of $20 $ff $37 is the start of an instruction. Having said that, I suspected it was: the value $20 is the first byte of a 6502 jsr instruction, and that would mean the three bytes were jsr $37ff.1 If so, the $18 $60 that Dragon Lord replaced them with is the pair of instructions clc and rts. We have already seen the rts instruction, which returns from a subroutine; clc is a 6502 instruction that clears the carry bit. In addition to the 6502 registers described so far, the 6502 had a "status" register that contained a number of Boolean values, among which was the carry bit. The carry bit would be implicitly used by some instructions, signifying a carry from an addition operation, for instance, but it could also be used on its own where a single-bit value was all the programmer required. For example, whether a copy protection check was successful is a single-bit, Boolean value. It is possible that Dragon Lord's three-byte patch sidestepped a copy protection subroutine being called with jsr and hardwired it to always claim success via the carry bit value.
[Figure 11.1: hex bytes at .dsk addresses $9390 to $93bf in the Dragon Lord version and in my version; the only difference is $18 $60 $de in the Dragon Lord version where my version has $20 $ff $37.]
Figure 11.1. Hex byte comparison between the Dragon Lord and my .dsk versions. Image created by author.
In any case, speculation about the nature of the three-byte change was not yet necessary, because those three bytes only needed to serve as a search signature to help find the copy protection code. I had already saved the Apple II's memory contents at different points throughout Amnesia's startup during my earlier reverse engineering, and I looked through those for the $20 $ff $37 sequence. Interestingly, it was not present when the Forth interpreter first dispatched an interpreted instruction at $d016, but it was present in memory when Load a saved game? appeared. This delay suggested two things. First, there might be some Forth connection to the copy protection code, since that code had not run before the interpreter started. Second, boot tracing would not have been an effective approach for Amnesia, because we would have needed to wade through the unintentional obfuscation of interpreted Forth code to find the game's copy protection. The $20 $ff $37 was located in memory at $384e. To discover what it was, a debugger watchpoint could catch all reads from that address, or a breakpoint could catch the more specific case where the 6502 read the bytes in order to execute them as an instruction. A watchpoint would cast the net wider; the breakpoint was more direct, but it would never trigger if the bytes were not an instruction. I was fairly confident that the $20 was a jsr instruction, and I chose the breakpoint. I was correct, as it happened: the debugger stopped precisely at a jsr at $384e, surrounded by instructions that used Apple II memory locations for controlling the disk drive (Sather 1983), as one might expect of copy protection code for a floppy disk.2
To understand how Amnesia's code made it to $384e, I gathered an execution trace of its assembly code from $d016, when the Forth interpreter started, until $384e. Working backward through the trace, I could see that a dramatic shift took place in the code addresses, as the 6502 transitioned abruptly from executing instructions in the $d000 range to instructions in the $3000 range. I knew the Forth interpreter was in the higher range, and the trace showed that the jumping-off point from there to the $3000 range was three assembly instructions located at $4105, which configured some soft switches and then did a jsr to $3700. What was the link to the Forth code? I combined the debugger and my Forth code visualizer, running the visualization with a debugger breakpoint set at $4105, and it halted precisely at the last Forth word, /CH________, on the following line:
: CHE______ $14 BLOck DROp FLUsh /CH________ ;
This gave me enough information to round up a cluster of Forth words likely related to copy protection by searching for their names in the decompiled Forth code, starting with CHE______. The preserved letters of these Forth word names were suggestively CHE(ck something?), PRO(tection?), and COP(y protection?), so I seemed to be on the right track; furthermore, one called COP_____ was invoked by *GOto, positively connecting it to Amnesia's teleportation portal. One of the Forth words, CHE________, was particularly interesting. CHE________ began by adding together all the bytes between $3700 and $39df, inclusive, and comparing the result to $5669. If $3700 sounds familiar, that is because it was the target of the jsr instruction I had found at $4105 in the $d000 to $3000 transition.
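The additive checksum that CHE________ performs can be sketched in a few lines of Python. The sketch assumes 16-bit wraparound, as Forth's single-cell arithmetic on this platform would suggest; the text does not state how overflow was handled. The range and expected value come from the text, and the function names are hypothetical. The toy memory image, which is zero everywhere except the patched bytes, shows why the Dragon Lord patch's otherwise unused third byte leaves the sum unchanged.

```python
def additive_checksum(memory: bytes, first: int, last: int) -> int:
    """Sum the bytes from first to last inclusive, keeping the low 16 bits (assumed)."""
    return sum(memory[first:last + 1]) & 0xFFFF


def assembly_checksum_ok(memory: bytes) -> bool:
    """Range and expected value as reported for CHE________."""
    return additive_checksum(memory, 0x3700, 0x39DF) == 0x5669


memory = bytearray(0x10000)                          # toy memory: all zeros
memory[0x384E:0x3851] = bytes([0x20, 0xFF, 0x37])    # jsr $37ff
original = additive_checksum(memory, 0x3700, 0x39DF)
memory[0x384E:0x3851] = bytes([0x18, 0x60, 0xDE])    # clc; rts; never-executed $de
patched = additive_checksum(memory, 0x3700, 0x39DF)
print(original, patched)  # 342 342
```

Because addition is commutative, any replacement bytes with the same total pass the check. This is exactly the weakness that CHE________'s additional address checks guard against.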
The Forth code was computing an addition-based checksum on some assembly code, and in the realm of copy protection, nothing says "this code is important" like computing a checksum on it to guard against changes. We can now understand the choice of the third byte of the Dragon Lord crack: the byte after the rts ($60) would never be executed, and that unused byte was selected to ensure that the post-crack code checksum remained the same. In the original version, $20 + $ff + $37 = 342, and in the Dragon Lord version, $18 + $60 + $de also equals 342. But CHE________ did not stop there. It also explicitly verified the destination addresses of three jmp and jsr instructions inside the assembly code that had already been checksummed. In other words, the copy protection anticipated that a cracker might be wise enough to make the checksum match and expressly guarded against changes to certain critical addresses. Then a second checksum was computed to verify some of the Forth code, including /CH________ and part of CHE________ itself.3
The checksums had helpfully highlighted critical code, and I followed up this lead by studying the assembly code at $3700. At first glance, static analysis of the disassembled code from $3700 showed many undocumented instructions. These could be data or, given that this was copy protection code, the result of a deliberate attempt to throw a disassembler off track. Ignoring those for the time being, I followed the execution path the CPU would take from $3700 onward and quickly found myself looking at code located at $38c3 that performed a distinctive series of steps:
1. It loaded a byte from memory.
2. Using the loaded byte value, the code indexed into a table to determine an address to jump to. The address was used to modify a jmp instruction just past the range of bytes used for the first checksum; this self-modifying code was clearly designed not to vary the checksum's value.
3. The code jumped to the self-modified jmp.
I could also see that many places in this general area of code jumped back to $38c3, which would make these three steps recur. What I was seeing was the shape of a classical interpreter: another interpreter, separate from the Forth code interpreter, that was interpreting its own distinct language. The byte loaded from memory in the first step was an instruction in that language. This also explained the prevalence of "undocumented instructions" I had seen in the disassembly, which were probably sequences of code in this new, non-6502, non-Forth language.
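The three-step loop at $38c3 is the classic fetch and table-dispatch structure. The toy Python interpreter below illustrates that structure only. Its opcodes, handlers, and single accumulator are hypothetical, since the real instruction encoding is not given here. Also, a table lookup followed by a function call stands in for the 6502 code's self-modified jmp.

```python
from typing import Callable, Dict

State = dict
Handler = Callable[[bytes, State], None]


def op_halt(code: bytes, s: State) -> None:
    s["running"] = False


def op_load(code: bytes, s: State) -> None:
    s["a"] = code[s["pc"]]                   # one-byte operand into the accumulator
    s["pc"] += 1


def op_add(code: bytes, s: State) -> None:
    s["a"] = (s["a"] + code[s["pc"]]) & 0xFF
    s["pc"] += 1


# Hypothetical opcode numbering, for illustration only.
HANDLERS: Dict[int, Handler] = {0x00: op_halt, 0x01: op_load, 0x02: op_add}


def run(code: bytes, pc: int = 0) -> State:
    state = {"pc": pc, "a": 0, "running": True}
    while state["running"]:                  # looping back plays the role of "jump to $38c3"
        opcode = code[state["pc"]]           # step 1: load an instruction byte
        state["pc"] += 1
        handler = HANDLERS[opcode]           # step 2: index a table with that byte
        handler(code, state)                 # step 3: transfer control to the handler
    return state
```

For example, `run(bytes([0x01, 5, 0x02, 3, 0x00]))["a"]` evaluates to 8. A 6502 disassembler fed the same bytes would decode them as unrelated 6502 instructions, which matches the kind of confusion the text describes.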
The use of custom interpreted languages is a known software obfuscation technique (Collberg and Nagra 2010). Its effectiveness stems from the fact that it forces a reverse engineer to decipher and understand code written in a completely new language, for which they have no existing tools. Further static analysis showed that the language had a scanty thirteen instructions, with a design meant to interoperate well with assembly code.4 For example, the interpreter had an instruction that would call a subroutine written in 6502 code. Any doubts about this interpreted language being used for obfuscation were dispelled by the fact that instructions' operands were concealed using a reversible logical transformation called exclusive-or. This would make telltale addresses and values in the interpreted code harder to see or search for directly. Something I could see directly was that the middle of the noninterpreted copy protection code contained a fragmentary admonition in Apple ASCII characters: DON'T BREAK T. Copy protection would sometimes leave embedded messages to would-be crackers, like the overconfident YOU'LL NEVER BREAK IT of Cannonball Blitz (1982), which, needless to say, was incorrect. In Amnesia's case, I conjecture that this was not a cracker communiqué but instead a reminder to the copy protection programmer, the programmer integrating the copy protection into the game, or both.5 All the interpreter code for handling different instruction types had to be located within the same 256-byte page of memory because of an assembly code optimization used by the interpreter, and the truncated DON'T BREAK T pads that code out to the start of a 256-byte boundary. It is an intriguing message, and probably a remnant of development. Messages in virtual bottles aside, I still had no easy way to read the interpreted copy protection code. The solution for a lack of tools is to create tools, and once my static analysis of the interpreter was complete, I wrote a disassembler that would provide a (more) human-readable version of the interpreted code. A portion of my disassembler's output is shown in figure 11.2. Note that I had no idea what the instructions were named in the original code, and consequently the representation of the instructions shown here is purely my own invention. The language had a single register that could be used to store values, which I called A for "accumulator" following 6502 convention, hence the appearance of the 6502-esque lda instruction in my fictitious language.
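A disassembler for a language whose operands are exclusive-or masked might look like the Python sketch below. Everything specific here is hypothetical, including the mask value, the opcode numbers, the mnemonics, and the assumption that opcode bytes themselves are left unmasked. The text gives none of these details, and this is not a reconstruction of the author's tool. It shows only why exclusive-or suits this purpose: applying the same mask twice restores the original bytes, so a single function both hides and reveals operands.

```python
from typing import List

KEY = 0x5A  # hypothetical XOR mask; the real mask is not given in the text

# Hypothetical opcode table: byte -> (mnemonic, operand length in bytes).
OPCODES = {
    0x00: ("halt", 0),
    0x01: ("lda", 1),
    0x02: ("cmp", 2),
    0x03: ("call", 2),
}


def xor_mask(data: bytes, key: int = KEY) -> bytes:
    """Exclusive-or every byte with key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)


def disassemble(code: bytes, origin: int = 0, key: int = KEY) -> List[str]:
    """Decode opcodes in sequence, unmasking each operand before printing it."""
    lines = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        name, n = OPCODES.get(op, (f".byte ${op:02x}", 0))
        if n:
            raw = code[pc + 1:pc + 1 + n]
            operand = int.from_bytes(xor_mask(raw, key), "little")
            text = f"{name} ${operand:0{2 * n}x}"
        else:
            text = name
        lines.append(f"{origin + pc:04x}: {text}")
        pc += 1 + n
    return lines
```

Without unmasking, a byte search for an address such as $384e would fail, because its masked form appears in the code instead.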
Disassembly in hand, I commenced static analysis of the interpreted code. The first nine instructions in figure 11.2 mirror something we saw in the checksumming Forth code: verifying that key instructions in the 6502 assembly code had not been changed. The $60 value is an rts instruction (as also used in the Dragon Lord patch), and the $38 values are for checking sec instructions, which set the carry bit. These 6502 instructions were being used to indicate a copy protection error, and a cracker might want, for instance, to change the sec instructions to clc instead. If any of the three instructions had changed, more interpreted code at $388d would run.
3773: lda $3761
3776: A = A - #$38
3778: if A != 0 goto $388d
377b: lda $3762
377e: A = A - #$60
3780: if A != 0 goto $388d
3783: lda $3768
3786: A = A - #$38
3788: if A != 0 goto $388d
378b: call $37a8
Figure 11.2. Excerpt of disassembled code in the copy protection interpreter language. Image created by author.
In my static analysis notes, I labeled the $388d code “where things go to die”: an infinite loop wiping out some contents of the computer’s memory. This would ordinarily not warrant further comment were it not for two distinguishing characteristics. First, the interpreted code is self-modifying. There is an address in the interpreted code controlling the memory location being wiped, and the interpreted code increments this address as it runs. Because the address starts at $400, where on-screen text is stored, there is a visible indication that the copy protection check has gone awry. Second, the self-modifying interpreted code is buggy. The interpreted code that modifies the address uses the wrong values and, as a result, does not wipe memory in the manner intended.6 The error is visible on-screen: only three characters appear before the address reaches text-memory escape velocity. I fixed this bug by using the debugger to hand-adjust the interpreted code, and figure 11.3 compares the buggy error screen with how it should appear.
Figure 11.3. Copy protection error screen, actual (top) versus debugged (bottom). Screenshots by author.
Peeling away the copy protection interpreter’s obfuscation was a major step forward, although I still didn’t understand what the interpreted code was checking for on the disk, or why. And to know what the code was looking at, it helped to first know where on the disk it was reading. Until now, we have seen disk sectors—and the Forth disk block abstraction built atop them—as a linear parade of data in .dsk files; in reality, though, a disk surface is two-dimensional. The Apple II disk drive allowed the drive’s read/write head to be positioned in stepwise intervals, effectively dividing the disk surface into concentric circles called tracks, and each track held a certain number of sectors. The likely number of sectors per track was 16, the usual amount for Apple operating systems of the time (Worth and Lechner 1981, 1984). During evaluation, the unsophisticated copy program COPYA had copied three of four disks without complaint, implying a standard disk layout, and 16 sectors per track also matched the analytics provided by the Applesauce disk-imaging software.
In poring over the copy protection code, it occurred to me that I might already know where it was looking on the disk. Right before the descent into the code of the copy protection interpreter with the Forth word /CH________, there was an odd sequence of Forth code: $14 BLOck DROp. This would read Forth disk block number $14 and then immediately throw away that disk block’s data in memory.7 One reason a programmer might do this is that the data itself was unimportant: what they were after was a side effect of reading it, such as positioning the disk drive’s read/write head at a specific location. Thanks to the discovery of Forth screens/blocks #12 and #13 in Amnesia’s code fragments, and to verifying them with the restarted ATILA interpreter, I knew how Forth blocks related to disk locations. Block $14 would be at offset $14 × 1,024 in the .dsk file, and that is 16 × 256 × 5: with 16 sectors of 256 bytes per track, this means track 5.
We need to break some abstractions to understand what track 5 contains. Conceptualizing a sector as simply 256 bytes of data has sufficed for reverse engineering Amnesia until now, but copy protection code for floppy disks operates at a lower level. The 256 bytes of sector data, as stored on a disk, were encoded in different ways to accommodate the constraints of the floppy disk hardware.8 In a standard Apple II format, the 256 bytes expanded out to 342 bytes when encoded, and those bytes were bracketed by unique, distinctive sequences of additional bytes signifying the start and end of a sector’s data; for example, $d5 $aa $ad was a standard data prologue. The sector data’s prologue, in turn, was preceded by metadata: an address field that included the track and sector number of the data, surrounded by its own unique prologue ($d5 $aa $96) and epilogue bytes.
We can see this lower-level data representation in several ways. One approach is to use modern tools, like Applesauce; another is to leverage the extant disk-analysis tools of Amnesia’s time. Figure 11.4 shows the two side by side, and the address and data prologues are clearly visible. Regardless of the method used to view the encoded data bytes on the disk, what becomes apparent is that the encoded sector data within track 5 is always the value $af, for all the sectors on that track. Furthermore, exactly the same track contents are also found at tracks 5.25, 5.5, 5.75, and 6, and that is highly unusual.
Figure 11.4. Contemporaneous and modern views of Amnesia’s disk data. Screenshots by author.
It may be surprising that tracks 5.25, 5.5, and 5.75 existed at all; they did because the Apple II hardware gave the programmer an extraordinary degree of control over the floppy disk mechanism. The disk read/write head would ordinarily be positioned on whole-numbered tracks, but could also be stopped reliably at intervals in between, yielding the quarter- and half-tracks. This did need to be done with caution, however. The Apple II disk drives were not precision instruments: when magnetic patterns were written to the disk surface, the patterns would effectively ooze onto adjacent fractional tracks. Disk tracks could be written and read accurately as long as they were a full track’s width apart; track 5 and track 6 could be used, or tracks 5.5 and 6.5, or 5.25 and 6.25, but not all at the same time.
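The block arithmetic and the address-field layout above can be sketched in Python. This is a minimal illustration under stated assumptions: a standard 16-sector .dsk image, 1,024-byte Forth blocks laid out contiguously from the start of the image (consistent with the block $14 calculation above), and the standard DOS 3.3 address field, in which the volume, track, sector, and checksum following the $d5 $aa $96 prologue are each stored as two “4-and-4” encoded bytes. That 4-and-4 detail comes from the standard format rather than from this chapter, and all function names are hypothetical. Sector numbers here are logical .dsk positions, which can differ from physical positions on the disk because of sector interleaving.

```python
# Sketch: locating Forth blocks and sector address fields on an Apple II disk.
# Assumes a 16-sector .dsk image, contiguous 1,024-byte Forth blocks, and the
# standard DOS 3.3 address field. Function names are hypothetical.

SECTOR_BYTES = 256
SECTORS_PER_TRACK = 16
BLOCK_BYTES = 1024
ADDRESS_PROLOGUE = bytes([0xD5, 0xAA, 0x96])

def block_to_track_sector(block: int) -> tuple[int, int]:
    """Map a Forth block to (track, first logical sector) in the .dsk image."""
    offset = block * BLOCK_BYTES
    track, within_track = divmod(offset, SECTORS_PER_TRACK * SECTOR_BYTES)
    return track, within_track // SECTOR_BYTES

def encoded_length(raw_bytes: int = SECTOR_BYTES, bits_per_disk_byte: int = 6) -> int:
    """6-and-2 encoding carries 6 data bits per disk byte: ceil(256 * 8 / 6) = 342."""
    return -(-raw_bytes * 8 // bits_per_disk_byte)

def encode_4and4(value: int) -> bytes:
    """First byte holds bits 7,5,3,1 (shifted right), second holds bits 6,4,2,0;
    the remaining bit positions are forced to 1 by OR-ing with $AA."""
    return bytes([(value >> 1) | 0xAA, value | 0xAA])

def decode_4and4(first: int, second: int) -> int:
    """Recombine a 4-and-4 pair into the original byte."""
    return ((first << 1) | 1) & second

def find_address_fields(nibbles: bytes) -> list[tuple[int, int]]:
    """Return (track, sector) for each complete address field in a nibble stream."""
    found = []
    i = nibbles.find(ADDRESS_PROLOGUE)
    while i != -1:
        field = nibbles[i + 3 : i + 11]  # volume, track, sector, checksum (2 bytes each)
        if len(field) == 8:
            track = decode_4and4(field[2], field[3])
            sector = decode_4and4(field[4], field[5])
            found.append((track, sector))
        i = nibbles.find(ADDRESS_PROLOGUE, i + 1)
    return found
```

For block $14, block_to_track_sector returns (5, 0), matching the track 5 result worked out above. The scanner deliberately skips checksum verification and 6-and-2 data decoding; it only shows how the metadata that names a track and sector is recovered from the raw byte stream.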
The fractional tracks of disk 1, side A of Amnesia were impossible to create with a standard floppy disk drive and, therefore, impossible to copy perfectly with one. Incidentally, we can now surmise why copy programs flagged errors on this disk during our evaluation, specifically on track 6: as an exact copy of track 5, the metadata physically located on track 6 would have been lying, falsely claiming that it was really track 5.
One exciting feature of Applesauce is that it provides a visualization of the magnetic flux patterns on the disk surface, so we can “see” the encoded data, in a sense. The flux visualization of Amnesia’s disk 1, side A in figure 11.5 shows the flux patterns in grayscale; for reference, an all-black area would indicate the absence of fluctuations. The concentric circles of the tracks are apparent, and we can see that not only did tracks 5–6 have the same contents, they had the same contents perfectly aligned with one another, yielding the bands of uniform flux. Again, this was not a feat that could be duplicated using standard disk drives: while floppy disks had an index hole that could theoretically be used to identify the orientation of the disk as it spun, the Apple’s disk hardware studiously ignored it.
Figure 11.5. Annotated magnetic flux visualization of Amnesia’s disk 1, side A, with tracks 5–6 and the sector boundaries marked. Image created by author.
The identical data on these tracks, and its alignment, meant that Amnesia’s copy protection was using a so-called fat track.9 Producing a fat track required special disk duplicating equipment with a read/write head wider than that of consumer drives.10 First, the fat track would be written with the wide-head drive; then the remainder of the disk’s contents would be written with a normal duplication drive, stepping around the previously written fat track. The extra step of fat track creation would obviously have added to the production costs. The same distinctive pattern appears in the flux images for other Electronic Arts games (figure 11.6), suggesting that fat track usage was not limited to Amnesia and continued for a number of years. The copy protection interpreter was also not unique to Amnesia, and other analyses of Electronic Arts games date variants of the interpreter back to 1983 or 1984 (Aycock 2016; Henrikson n.d.). The unused copy protection interpreter instructions and unused copy protection code functionality imply that Amnesia’s copy protection code was not bespoke to Amnesia.
Figure 11.6. Comparative flux visualizations with fat tracks: One-on-One (1983), Skyfox (1984), Adventure Construction Set (1985), Arcticfox (1986), and PHM Pegasus (1987). Image created by author.
Conceptually, the copy protection code’s job must be to verify that the fat track exists; if it does not, the disk is a copy. Static analysis showed that 24 sectors would need to be read flawlessly from the fat track, both their address and data fields, to pass the test. My analysis needed to span the interpreted code along with the assembly code, because while the interpreted code controlled the overall procedure, it called a 6502 subroutine to read data from the disk. The use of machine language was likely more a matter of necessity than an attempt at complicating cracking: even if the interpreted language had been expressive enough to read the disk’s data, interpreters add a nontrivial performance overhead (Ertl and Gregg 2003), and interpreted code would probably have been too slow to keep up with even the relatively slow flow of disk data. What the interpreted code did hide from the casual observer11 was that the 24 sectors were being read from different places on the fat track, thereby checking that the fat track was intact. Reading from tracks 5, 5.5, or 6 was not surprising. What was unexpected was that the disk head–moving mechanism was engaged immediately prior to some of the disk reads, meaning the data was physically read on an arc transiting between two tracks. It seems inconceivable that anything apart from a fat track could have satisfied the copy protection code, which made bypassing the code an essential strategy for crackers.
This returns us once again to Dragon Lord. Given that the three-byte patch to Amnesia made the disk-reading code always claim success, and the post-patch checksum matched the pre-patch checksum, why did the Dragon Lord crack fail? The key lies in the fact that the checksum was computed and checked after the interpreted copy protection code was run. The checksum was computed over unvarying bytes of code—almost. There was one byte of data, ironically right next to where Dragon Lord patched Amnesia, whose value changed as the disk-reading code ran. The checksum stored on the disk therefore did not match the code as stored on the disk; only by running the copy protection code, which changed the data byte in memory, was the checksum’s value rectified, allowing the later checksum test to succeed. If the third byte of the Dragon Lord patch had had the value $dd instead of $de, compensating for the value the data byte should have had, the crack would have worked. According to the decompiled code, the Forth word invoked when the checksum failed was AXE____, probably either AXE-get or AXEobj[ect].
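The Dragon Lord failure can be made concrete with a toy model in Python. Everything in it is an assumption made for illustration: the checksum is a hypothetical 8-bit sum of bytes, the four-byte code region and the patch bytes are invented, and the copy protection is modeled as decrementing the data byte by one, which is the direction that makes a $de to $dd adjustment compensate under this particular checksum. Only the $de and $dd values come from the text; this is not the checksum algorithm Amnesia actually used.

```python
# Toy model of why a checksum-preserving patch can still fail.
# Hypothetical: the checksum algorithm, the byte values, and the assumption
# that the copy protection decrements the data byte by one when it runs.

def checksum(code: bytes) -> int:
    """Hypothetical checksum: sum of all bytes modulo 256."""
    return sum(code) % 256

def patch(code: bytes, new_bytes: bytes, at: int = 0) -> bytes:
    """Return a copy of code with new_bytes written starting at index at."""
    out = bytearray(code)
    out[at:at + len(new_bytes)] = new_bytes
    return bytes(out)

DATA = 3                                   # index of the data byte next to the patch site
ON_DISK = bytes([0x20, 0x52, 0xDC, 0x05])  # three patchable bytes, then the data byte

# Running the copy protection changes the data byte in memory, and the stored
# checksum matches that post-run state, not the bytes as they sit on the disk.
after_protection = bytearray(ON_DISK)
after_protection[DATA] -= 1
STORED = checksum(after_protection)

# A crack that skips the disk check leaves the data byte untouched, so
# preserving the on-disk checksum aims at the wrong value.
crack = patch(ON_DISK, bytes([0x40, 0x30, 0xDE]))
# Lowering the third patch byte by one compensates for the unchanged data byte.
fixed = patch(ON_DISK, bytes([0x40, 0x30, 0xDD]))
```

In this model, checksum(crack) equals checksum(ON_DISK) yet differs from STORED, while checksum(fixed) equals STORED: the crack preserved the checksum of the code as it sat on the disk, which was exactly the wrong target.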
Whatever the original name of AXE____ was, the code caused all in-game objects to be relocated to a nonexistent location, causing subsequent GETs to fail.12 It was a clever idea, effectively reducing the game to an unwinnable demo version, and at the same time not overtly cruel, because it would be obvious to a player from the outset that the game was broken.
In terms of reverse engineering, the copy protection code in Amnesia did not disappoint; it was a complex web of code with numerous self-checks, intentional obfuscation, and misdirection that allowed would-be crackers to incorrectly assume they had succeeded. The goal of this excavation is not to train a modern generation of software crackers, of course, but to illustrate how reverse engineering can be done even in an adversarial setting. It is worth pointing out that it can be hard to gauge intent, and sometimes apparently obfuscated code is the result of optimizations that yield opaque code as a side effect. With that, the digital excavation season is complete, and we can step back and consider what can be made of all these reverse engineering results.
Notes
1. As with reading any language, common words (or in this case instruction codes) become familiar with practice. The correspondence between numbers and assembly instructions is also provided in an assembly language reference manual.
2. The code was hardwired to assume that the Apple II’s disk controller card had been placed in a specific physical slot in the computer. Tsk, tsk.
3. Curiously, the checksum does not cover all of CHE________. Perhaps the Forth code had been changed and updating the length of the checksum was simply overlooked.
4. There is an independent analysis of the copy protection interpreter’s instruction set (Goetz 1988), with which I compared my completed analysis. That analysis had a few details incorrect but was otherwise in agreement with mine.
5. This Electronic Arts copy protection scheme has been attributed to the late Jim Nitchals (Maynard 2019).
6. The self-modification being done has its addresses off by one, and the most-significant byte of the address along with the opcode of the next instruction end up being incremented by mistake (the interpreter uses little-endian format for the address).
7. Strictly speaking, DROp pops from the stack the buffer address that BLOck left there.
8. Unless otherwise noted, hardware and encoding details in this section are drawn from Worth and Lechner (1981) and Sather (1983), with quarter-track information from Jones (1985).
9. Pre-Applesauce, I verified fat track usage indirectly: by manipulating the disk image that an emulator used, I caused identical, aligned fractional tracks to virtually exist and could ascertain that the formerly failing copy protection checks would then succeed (Aycock 2016: 165).
10. This description of fat track production is based on Voelcker and Wallich (1986), who refer to them as “wide tracks,” along with information from people who previously worked in the professional duplication industry (Aycock 2020; Kevin McDonnell, email to the author, July 30, 2020).
11. Insofar as anyone looking at disassembled 6502 code could be said to be a casual observer.
12. Or, equivalently, the same location where in-game hamburgers go when eaten. Perhaps it’s best not to dwell on this.
References
Aycock, John. 2016. Retrogame Archeology: Exploring Old Computer Games. Cham, Switzerland: Springer.
———. 2020. “Interview with Peter Brown Re: XEMAG.” TR 2020-1115-01. University of Calgary, Department of Computer Science. DOI:10.11575/PRISM/38021.
Collberg, Christian, and Jasvir Nagra. 2010. Surreptitious Software: Obfuscation, Watermarking, and Tamperproofing for Software Protection. Upper Saddle River, NJ: Addison-Wesley.
Ertl, M. Anton, and David Gregg. 2003. “The Structure and Performance of Efficient Interpreters.” Journal of Instruction-Level Parallelism 5 (Paper 12).
Goetz, Phil. 1988. “Electronic Arts Protection Language.” Computist 57 (July): 10–12.
Henrikson, Keith. n.d. “Electronic Arts C64 Fat Track Loader.” Retrieved September 17, 2021, from https://c64preservation.com/files/EaLoader.txt.
Jones, Bruce Wayne. 1985. “Demystifying the Quarter Track.” Hardcore Computist 21: 12–14.
Maynard, David S. 2019. “Artist Tribute Jim Nitchals.” Software Artist blog. August 4, 2019. https://www.software-artist.com/artist-tribute-jim-nitchals/.
Sather, Jim. 1983. Understanding the Apple II. Chatsworth, CA: Quality Software.
Voelcker, John, and Paul Wallich. 1986. “How Disks Are ‘Padlocked.’” IEEE Spectrum 23(6): 32–40.
Wiegley, John. 1987. “Softkey For . . . Amnesia.” Computist 48 (October): 31.
———. 1988. “Softkey For . . . Amnesia.” Computist 51 (January): 26.
Worth, Don, and Pieter Lechner. 1981. Beneath Apple DOS. Reseda, CA: Quality Software.
———. 1984. Beneath Apple ProDOS. Chatsworth, CA: Quality Software.
Part III
Post-Excavation
CHAPTER 12
Analysis
Analysis is Carver’s penultimate stage of field research procedure, and the last one covered in depth in this book; his final stage is publication, which is implicit here. He takes a poetic turn with analysis, framing the data and initial interpretations from the field as a “harvest” to be reaped (Carver 2009: 36). And then, just as we are primed to imagine a fieldwork-stuffed cornucopia, from which the grain will be ground to craft artisanal baguettes of knowledge, Carver abruptly drops the metaphor. More pragmatically, he says that “out of this stage [analysis] will come a synthesis, a story” (Carver 2009: 36). What is the story of Amnesia?
The story began with too much story. Something that stayed with several of the people involved with Amnesia, after all the years since its completion, was that Disch’s script was ambitious for the time. As Charlie Kreitzberg put it, “Tom Disch’s script was probably too sophisticated for the computers available. Manhattan was too large. And the story too vast. . . . The job was just larger than the technology at that time could support” (Aycock 2019a: 8). The hardware that Amnesia required on the Apple II, or for that matter in any of its versions, was a pittance by modern standards, and yet in many ways it was wholly sufficient for realizing an even larger incarnation of Amnesia. The speed of the computer’s CPU was demonstrably not at issue, because Amnesia existed as a playable game. In a text adventure game, where the pace of play was dictated to a large extent by the player typing commands, there were no especially onerous demands on the performance of the CPU or the system generally; an internal Infocom document advises that “the design goal also requires no more than a few seconds response time for a typical move” (Berez, Blank, and Lebling 1989: 5). Any limitations on the amount of memory (RAM) that the CPU had available and could address were mitigated, both by the hardware itself and its bank switching, and by the game’s ability to load additional game content from disk.
This brings us to the crux of the problem. As the game inevitably exceeded the bounds of the computer’s memory, it needed to spill out of RAM into secondary storage, which in Amnesia’s case meant floppy disks; more capacious hard disks existed at the time but were still expensive and not yet in ubiquitous use. Floppy disks were relatively slow to access and were limited in the amount of “game” that could be stored on each, although theoretically the full vision for Amnesia could have been realized given enough floppy disks. Practically speaking, floppy disks added production costs, and producer Don Daglow recalled that his “least favorite part of the job on Amnesia was having to supervise cutting so much content when we could only get budget approval for two floppy disks” (Aycock 2019b: 3).
Time was also a constraint, not in terms of the speed of floppy disks but rather the time to market. Daglow mentioned, “Like most games, we were also under a lot of schedule pressure” (Aycock 2019b: 4). More content would have meant more code to develop, debug, and playtest; as it was, Amnesia had a narrow market window as an all-text game in 1986, a problem exacerbated by the delay caused by changing publishers. Juggling multiple disks made for a less-than-optimal player experience, too, and James Terry worked on “general compression issues to lessen the need for disk swapping” (Aycock 2019c: 3). Changing one floppy disk for another would be an imposition on the game-playing experience, and a player would be rightly annoyed to be prompted for a disk change only to be prompted again a short while later, an event that does happen from time to time in Amnesia.
Terry points to compression as a way to address disk swapping, a technique that would also have increased the amount of Disch’s script that could be implemented within a finite amount of disk space; after reverse engineering, we are in a position to understand it technically. The base 40 text encoding was used extensively throughout Amnesia; as a first approximation, there were almost 70,000 words encoded this way, although not all those words were narrative game text per se. It was a more concise encoding than plain ASCII: because three base 40 digits fit in 16 bits, each character took two-thirds of a byte rather than a full byte, a substantial reduction in space.
We can see compression choices that were not made, however, especially in comparison with other text adventure games of the time and even with the compression of the Electronic Arts logo. The text compression in Infocom games and Level 9 games (another producer of text adventure games) employed what can be thought of as “abbreviations” in their compression schemes (Aycock 2016). An abbreviation was a frequently used string in the text (a word, a part of a word, even a phrase), and the abbreviation would replace the longer string with a short encoding. Upon decoding, the short encoding would be replaced with the full-length string it abbreviated, and the regenerated text would look the same.
This can be made more concrete with some examples. In the base 40 Amnesia text, the string “the” appears 4,636 times, each occurrence taking three base 40 digits (which, as we know, fit in 16 bits). A hypothetical alternate encoding could represent “the” with only two base 40 digits: one digit to announce a forthcoming abbreviation and the next to select “the” from a small pool of strings. That alone would save one digit per occurrence. The string “ the ”, with spaces on either side of the word, occurs 3,834 times, and the same alternate method would save three base 40 digits for each occurrence, a nontrivial amount overall. While Amnesia’s base 40 encoding did not exploit abbreviations and was oblivious to recurring sequences in the text it was compressing, we do see a related idea, exploiting how often data recurs, at play in the Electronic Arts logo compression: the Huffman code used there gave the most frequent bytes the shortest representations. Generally, the multiple forms of compression that were found illustrate both that size was a consideration throughout and that there were different ways of addressing the problem.
Could more of Disch’s script have been fit into Amnesia? Yes, unquestionably: we have discovered many things in Amnesia that do not need to be present, the omission of which would have freed up space for game content. For instance, the Forth dictionary, the development code remnants, and the Forth source code screens are all unnecessary, although they did assist greatly with our analysis.
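The base 40 arithmetic and the abbreviation savings discussed above can be checked with a short Python sketch. The 40-symbol alphabet, the digit order within a 16-bit word, and the names pack3, unpack3, encode, and digits_saved are hypothetical stand-ins, not Amnesia’s actual tables; the occurrence counts are the ones given in the text, and the abbreviation scheme assumed is the two-digit escape-plus-index method described there.

```python
# Sketch of base 40 packing (three characters per 16-bit word) and of the
# savings from a hypothetical two-digit abbreviation scheme.
# The alphabet and digit order are assumptions, not Amnesia's actual tables.

ALPHABET = " abcdefghijklmnopqrstuvwxyz0123456789.,'"  # 40 symbols (assumed)
assert len(ALPHABET) == 40 and 40 ** 3 <= 2 ** 16      # 64,000 <= 65,536

def pack3(chars: str) -> int:
    """Pack exactly three characters into one 16-bit word."""
    d0, d1, d2 = (ALPHABET.index(c) for c in chars)
    return (d0 * 40 + d1) * 40 + d2

def unpack3(word: int) -> str:
    """Inverse of pack3."""
    rest, d2 = divmod(word, 40)
    d0, d1 = divmod(rest, 40)
    return ALPHABET[d0] + ALPHABET[d1] + ALPHABET[d2]

def encode(text: str) -> list[int]:
    """Pad text with spaces to a multiple of three and pack it word by word."""
    text += " " * (-len(text) % 3)
    return [pack3(text[i:i + 3]) for i in range(0, len(text), 3)]

def digits_saved(count: int, plain_digits: int, abbrev_digits: int = 2) -> int:
    """Digits saved if every occurrence became an escape digit plus an index digit."""
    return count * (plain_digits - abbrev_digits)

saved_the = digits_saved(4_636, 3)         # "the": 3 digits -> 2
saved_spaced_the = digits_saved(3_834, 5)  # " the ": 5 digits -> 2
# Three digits occupy two bytes, so bytes saved is digits * 2 / 3.
```

Under these assumptions, abbreviating “ the ” saves 11,502 digits, or 7,668 bytes at two bytes per three digits (ignoring word-boundary effects). The two savings are alternatives rather than a sum, since every occurrence of “ the ” also contains “the”.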
The four-byte MEH___ PRI___________ accompanying each base 40 string seems a magnanimous use of space, and other compression techniques could have been used. What is debatable is whether a more svelte, streamlined implementation would have justified the time and effort of rejigging Amnesia’s internals, and whether the benefits would have outweighed the cost. Reverse engineering has permitted an informed discussion on the matter.
Reverse engineering has also revealed that Amnesia was a master class in languages. The player would have been aware of two languages in Amnesia: the English prose of the game and the artificially restricted language of input. Adding to these is the language of pseudocode found in Disch’s script for the game. Even the locations in the game, tying in as they do to the physical world, are evoked through language in Amnesia’s textual format. However, we see many more languages in the game’s implementation. The use of Forth brought two more languages into the mix: the language of its interpreted code—which is still a language in a formal sense even if humans do not use it directly—and the language of Forth itself, which we saw in the code fragments and the dormant Forth interpreter left in Amnesia. The parts of the game implemented in lower-level 6502 code were expressions in yet another language, albeit a dialect, because the low-level Apple II code would have varied slightly from the low-level code for the Commodore 64 in terms of how I/O was accomplished, for example. The copy protection, not to be outdone, incorporated another language through its interpreted code.
Furthermore, we can see the interplay between languages. Forth source code was translated into interpreted code, which was interpreted by 6502 code, implementing the languages in Disch’s script to bring forth the languages of input and output the player experienced. Forth words themselves could be implemented either in Forth or in 6502 code. In the copy protection, Forth code invoked 6502 code that ran the interpreted copy protection code, which called back to 6502 code.
The one language missing from this discussion is the one that helped trigger my work on Amnesia to begin with: King Edward. The game’s credits say it is “programmed in the King Edward adventure language,” and yet no obvious traces of another, distinct language came to light during the excavations. Kreitzberg, in my interview with him, said: “We did not program the script directly. Instead we came up with the King Edward programming language. King Edward was a shell which interpreted the script” (Aycock 2019a: 5). That makes King Edward sound like a separate, domain-specific language and not simply a collection of game-specific Forth words.
James Terry, the person who implemented King Edward, was less certain, remembering it as “very much a mixture” and recalling that programmer Kevin Bentley “needed to generally know Forth but mostly worked in the DSL [domain-specific language]” (Aycock 2019c: 2). Both accounts may be valid, thanks to Forth. Forth was a highly extensible language, and the “standard” Forth words had no special status, unlike the reserved words of other programming languages. It was perfectly possible to add new words to Forth’s vocabulary that behaved like built-in words (Brodie 1984: 29), thus tailoring the Forth language to the application domain. Moreover, this practice was encouraged: “Writing a Forth program is equivalent to extending the language to include all functions needed to implement an application. Therefore, programming in Forth may be thought of as creating an application-specific language extension” (Koopman 1993: 358). Brodie similarly wrote, “FORTH has been called a ‘meta-application language’—a language that you can use to create problem-oriented languages” (Brodie 1981: 4).
To see how this was done in Forth, we can look at assembly code. Programmers preferred writing assembly code to raw machine code for low-level work, and while we have made ample use of disassembly (the conversion of machine code to assembly code) when reverse engineering, we have not considered the other direction. In other words, how was a programmer’s assembly code translated into machine code? For non-Forth users, the answer was to employ a separate program called an assembler; Forth users had another option, though. An assembler could be built as a language extension to Forth, permitting a Forth programmer to write assembly code directly in their Forth programs, expressed in Forth’s stack-based fashion. For example, Hat Trick was a 1987 game for the Atari 7800 game console that was programmed in Forth, and for which the source code exists. Figure 12.1 shows an excerpt from the Hat Trick assembly code as written in Forth, together with a representation I made of how that same code would have been written using a non-Forth assembler.1
L: NEXT
  1 # LDY   IP )Y LDA   W 1+ STA
  DEY       IP )Y LDA   W STA
  CLC       IP LDA      2 # ADC   IP STA

NEXT  LDY #1
      LDA (IP),Y
      STA W+1
      DEY
      LDA (IP),Y
      STA W
      CLC
      LDA IP
      ADC #2
      STA IP
Figure 12.1. Excerpt from original Hat Trick source code (above), and its representation in typical assembly code (below). Image created by author.
The assembler-in-Forth concept was mentioned in Loeliger’s book (Loeliger 1981), the one James Terry recalled using, and various in-Forth assemblers appeared in the early 1980s (Cassady 1982; Duncan 1982; Perry 1983); there was even a public-domain implementation of a 6502 assembler in Forth (Ragsdale 1982). It would not be a stretch to imagine that ATILA, the version of Forth underlying Amnesia, had a Forth-based 6502 assembler as part of its development environment.
Fortunately, we do not have to imagine, because while reverse engineering Amnesia, I happened across the Forth words for a 6502 assembler.2 An assembler definitely exists in Amnesia’s code, another possible relic of game development. The problem was that Forth assemblers leave their assembled output directly as binary code, and there would normally be no trace of their usage. In other words, it was uncertain whether the Forth assembler had been used to create any of Amnesia’s code.
Forth-based assemblers would often implement higher-level program structuring facilities like if-else conditional statements, letting them support “structured” assembly code (Crespi-Reghizzi, Corti, and Dapra’ 1980), and the ATILA assembler was no exception. The ELSe of its if-else was of particular note, because it needed to assemble an unconditional jump to the corresponding ENDif terminating the if-else statement, and it did so in an unusual way. The 6502 had an overflow bit in its status register, which in my experience was relatively infrequently used in 6502 assembly code. ELSe was assembled into a two-instruction sequence: an instruction clearing the overflow bit (CLV), immediately followed by an instruction to branch if the overflow bit was clear (BVC). The branch is therefore always taken, a tautology in the code. A programmer writing unstructured assembly code would be unlikely to use this sequence, because they would be much more likely to rely on the context of the instructions preceding the branch than on an explicit CLV. The Forth-based assembler did not keep track of this prior context, though.
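The stack-based style of Forth assemblers, and the CLV–BVC form of ELSe, can be illustrated with a toy postfix assembler. It is written in Python rather than Forth for readability, and it is not ATILA’s assembler: the word names IF-EQ, ELSE, and THEN, the zero-page addresses chosen for IP and W, and the Assembler class itself are hypothetical. Only the opcode values are standard 6502 encodings. As in a Forth assembler, operands and addressing-mode markers come before the mnemonic that consumes them, and the pending forward branches of an if-else are kept on the same stack and patched once their targets are known.

```python
# Toy postfix ("Forth-style") 6502 assembler. Word names and the zero-page
# addresses for IP and W are hypothetical; opcode values are standard 6502.

OPCODES = {
    ("LDA", "imm"): 0xA9, ("LDA", "zp"): 0xA5, ("LDA", "izy"): 0xB1,
    ("STA", "zp"): 0x85, ("LDY", "imm"): 0xA0, ("ADC", "imm"): 0x69,
    ("DEY", None): 0x88, ("CLC", None): 0x18, ("CLV", None): 0xB8,
}
BNE, BVC = 0xD0, 0x50
SYMBOLS = {"IP": 0xAE, "W": 0xB0}  # hypothetical zero-page locations

class Assembler:
    def __init__(self) -> None:
        self.code = bytearray()
        self.stack: list = []  # shared operand/control-flow stack, as in Forth
        self.mode = "zp"       # addressing mode for the next operand mnemonic

    def emit(self, *values: int) -> None:
        self.code.extend(values)

    def branch_placeholder(self, opcode: int) -> int:
        """Emit a branch with a dummy offset; return the offset byte's index."""
        self.emit(opcode, 0)
        return len(self.code) - 1

    def patch(self, at: int) -> None:
        """Point the branch whose offset byte is at `at` to the current location."""
        self.code[at] = (len(self.code) - (at + 1)) & 0xFF

    def word(self, token: str) -> None:
        if token.isdigit():
            self.stack.append(int(token))
        elif token in SYMBOLS:
            self.stack.append(SYMBOLS[token])
        elif token == "1+":
            self.stack.append(self.stack.pop() + 1)
        elif token == "#":
            self.mode = "imm"
        elif token == ")Y":
            self.mode = "izy"
        elif token == "IF-EQ":   # skip the IF body when the Z flag is clear
            self.stack.append(self.branch_placeholder(BNE))
        elif token == "ELSE":    # CLV then BVC: a branch that is always taken
            self.emit(OPCODES[("CLV", None)])
            forward = self.branch_placeholder(BVC)
            self.patch(self.stack.pop())  # IF's branch lands just after the BVC
            self.stack.append(forward)
        elif token == "THEN":
            self.patch(self.stack.pop())
        elif (token, None) in OPCODES:    # implied-mode instruction
            self.emit(OPCODES[(token, None)])
        else:                             # mnemonic that consumes one operand
            self.emit(OPCODES[(token, self.mode)], self.stack.pop())
            self.mode = "zp"

    def assemble(self, source: str) -> bytes:
        for token in source.split():
            self.word(token)
        return bytes(self.code)
```

For instance, Assembler().assemble("IF-EQ 1 # LDA ELSE 2 # LDA THEN") produces D0 05 A9 01 B8 50 02 A9 02: the BNE skips to the ELSE body, and the B8 50 pair is the always-taken CLV–BVC jump over it. Because the assembler emits that pair mechanically, without looking at what came before, it leaves exactly the kind of fingerprint described above.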
When reverse engineering some assembly code related to Amnesia’s parser, I had come across the CLV–BVC pair multiple times, in places where a human assembly-language programmer would have used a different, context-appropriate sequence of code. This is compelling evidence that Amnesia’s code was written using the Forth-based assembler, and, incidentally, it adds yet another layer of language to Amnesia.
This diversion into assembly code has illustrated how Forth could be extended to implement other languages, and we can revisit King Edward in this light. The nature of Forth meant King Edward could well have been a set of Forth words that customized the basic Forth language for Amnesia’s development. Were that the case, the “King Edward” code would have been Forth and its extension words interwoven, resulting in Terry’s “mixture”; at the same time, a complete enough set of extension words could have curtailed the amount of standard Forth required, creating a veneer over Forth that Kreitzberg remembered as “a shell.”
Weighing against this argument is MEH___ PRI___________. Given how frequently base 40 strings were output by Amnesia’s code, an efficient (extension) language design would surely facilitate that operation by making output a short, easy sequence for the programmer, not a twenty-one-character incantation. There is some leftover development code in Amnesia that does the work of encoding base 40 strings and placing the addresses of MEH___ and PRI___________, although that Forth word is twenty characters long and would have been no less exhausting to program with. It could be that there was a shorter way to delineate strings during development that I overlooked, or that the Forth code editor offered support, such as a one-button keyboard shortcut that would easily spew out a long, repetitive code sequence. Overall, though, the balance of evidence leans toward King Edward being a Forth language extension and not a completely separate language.
Finally, an integral part of Amnesia that should not be forgotten is its physicality, a surprising amount of it for a digital artifact. The game had its simulated Manhattan locations, which we connected to places in the floppy disk images that would have their own physical manifestation as magnetic flux. The elaborate copy protection turned out to be linked to the physical world in numerous ways: the code wheel, the fat track. Reverse engineering helped unfold the protection and its complexity: without it, we could superficially understand and even verify that copy protection existed in Amnesia, but with it, we gained a much deeper technical understanding of the extensive measures taken by Electronic Arts and the threats they anticipated.
And yet, Amnesia’s disk copying protection could be waved away with the change of a handful of bytes, as if by magic, just as the aspiration of reconstructing Manhattan within 1980s computers was ultimately a chimera.
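The CLV–BVC pair noted at the start of this discussion is easier to recognize as an assembler's fingerprint once one sees how mechanically it can be produced. The 6502 has no unconditional relative branch, so a structured assembler that must compile ELSE without knowing which flags are set can clear the overflow flag with CLV and then branch with BVC, which is then always taken. The following Python sketch is hypothetical: the class and method names are invented, and it is not a reconstruction of Amnesia's assembler or of any published Forth assembler. Only the opcode values ($D0 for BNE, $50 for BVC, $B8 for CLV) are real 6502 encodings.

```python
# Hypothetical sketch of a Forth-style structured assembler for the 6502,
# showing how ELSE can be emitted mechanically as CLV followed by BVC.
# Opcode values are real 6502 encodings; all names are invented.

BNE = 0xD0  # branch if Z flag clear
BVC = 0x50  # branch if V flag clear
CLV = 0xB8  # clear V flag


class StructuredAssembler:
    def __init__(self):
        self.code = bytearray()
        self.unresolved = []  # offsets of branch operands still to be patched

    def emit(self, *values):
        self.code.extend(values)

    def _branch_forward(self, opcode):
        self.emit(opcode, 0x00)  # placeholder displacement
        self.unresolved.append(len(self.code) - 1)

    def _resolve(self, operand_offset):
        # 6502 displacements are relative to the instruction after the branch.
        displacement = len(self.code) - (operand_offset + 1)
        if not -128 <= displacement <= 127:
            raise ValueError("branch target out of range")
        self.code[operand_offset] = displacement & 0xFF

    def if_zero(self):
        # Body runs only when Z is set, so branch past it when Z is clear.
        self._branch_forward(BNE)

    def else_(self):
        # No "branch always" on the 6502: clearing V guarantees BVC is taken.
        if_operand = self.unresolved.pop()
        self.emit(CLV)
        self._branch_forward(BVC)
        self._resolve(if_operand)  # IF's branch lands at the start of the ELSE body

    def then(self):
        self._resolve(self.unresolved.pop())


asm = StructuredAssembler()
asm.if_zero()
asm.emit(0xA9, 0x01)  # LDA #$01
asm.else_()
asm.emit(0xA9, 0x02)  # LDA #$02
asm.then()
print(asm.code.hex())  # d005a901b85002a902
```

A programmer writing by hand, who knows the flag state at that point, would usually pick a shorter or more natural sequence, which is why repeated CLV–BVC pairs where they are not needed hint at machine generation.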
The Untold Story
This book is far from a complete examination of Amnesia; there are many potential research questions left unexplored, some of which relate back to recollections of the interviewees. For instance, Don Daglow remembered that Disch had been busy with a screenplay when they “needed some extra text for the security check software” (Aycock 2019b: 3). This likely refers to the interactions with nonplayer characters in the game that require the player to use the X-Street Indexer.
Figure 12.2. Magnetic flux visualizations for two Accolade games with fat tracks: Law of the West (1985) and The Dam Busters (1985). Image created by author.
Daglow said he “wrote the short sections ‘in the style of Thomas Disch’ as drafts he [Disch] could work from to understand what we were doing,” and, most importantly, Disch “approved them verbatim” (Aycock 2019b: 3). That means there should be identifiable, discrete pieces of text appearing in Amnesia that are not Disch’s; and given the reverse-engineered ability to extract text from the game, it would be interesting to perform stylometry on the text and see if Daglow’s sanctioned masquerade is detectable.
We can also apply what we have learned through reverse engineering the Apple II version of Amnesia to expand into comparative analyses. It appears, for example, that variants of the elaborate copy protection employed by Amnesia existed in other Electronic Arts products, and one possible direction would be to examine how the copy protection and its code evolved over time, contrasting that with contemporaneous copy protection schemes; other publishers employed fat tracks as well (figure 12.2).3 However, we need not look beyond Amnesia at all for comparative analyses. Two additional versions of Amnesia were released for different platforms: the Commodore 64 and the IBM PC. Some subtle player-visible changes exist, like the type of in-game computer changing between versions, but what about the code? One advantage of working in a higher-level language like Forth rather than in assembly code is that much of the Amnesia Forth code could theoretically transfer from one platform to another unchanged; all that would be needed was a new version of ATILA specific to the target platform. In cases where two platforms shared the same type of CPU, even some of the assembly code might have been reused, and this is the situation for the Apple II and the Commodore 64. The Apple II had a 6502 CPU, as mentioned, and the Commodore 64 had a 6510 CPU (Commodore Business Machines, Inc. 1982). The major difference was simply the addition of I/O to the 6510; apart from that, the machine language and assembly code were exactly the same.
To illustrate, I examined the Commodore 64 Amnesia as it ran in the VICE emulator. Gathering execution traces from the VICE debugger helped me acclimatize to the different ranges of memory addresses that version used, and with disassemblies of those areas I quickly found a connection to the Apple II version. In particular, Commodore 64 instruction addresses ending in “016” reminded me of the $d016 of the Apple II Forth interpreter, and figure 12.3 has a side-by-side comparison of the assembly code in that “016” vicinity. The correlation is apparent: the instructions are the same, and only the memory addresses have changed. This suggests that the Commodore 64 Amnesia may be rapidly understood in terms of its differences from the Apple II version, rather than by starting anew.
Apple II                       Commodore 64
D000: 4C 8B E0  jmp $e08b      .C:e000  4C 13 F2  JMP $F213
D003: 00        brk            .C:e003  00        BRK
D004: 06 D0     asl $d0        .C:e004  06 E0     ASL $E0
D006: A6 EB     ldx $eb        .C:e006  A6 1E     LDX $1E
D008: BD 00 B5  lda $b500,x    .C:e008  BD 00 CF  LDA $CF00,X
D00B: 85 EE     sta $ee        .C:e00b  85 21     STA $21
D00D: E8        inx            .C:e00d  E8        INX
D00E: BD 00 B5  lda $b500,x    .C:e00e  BD 00 CF  LDA $CF00,X
D011: 85 ED     sta $ed        .C:e011  85 20     STA $20
D013: E8        inx            .C:e013  E8        INX
D014: 86 EB     stx $eb        .C:e014  86 1E     STX $1E
D016: A0 00     ldy #$00       .C:e016  A0 00     LDY #$00
D018: B1 ED     lda ($ed),y    .C:e018  B1 20     LDA ($20),Y
D01A: 85 FD     sta $fd        .C:e01a  85 27     STA $27
D01C: C8        iny            .C:e01c  C8        INY
D01D: B1 ED     lda ($ed),y    .C:e01d  B1 20     LDA ($20),Y
D01F: 85 FE     sta $fe        .C:e01f  85 28     STA $28
Figure 12.3. Code comparison between Apple II (left) and Commodore 64 (right) Amnesias. Image created by author.
The analysis programs created previously may be easily adapted as well. I took my program that used the addresses of MEH___ PRI___________ to locate and print encoded strings and made a one-line modification to print out the sequence of bytes that comprised each encoded string on the Apple II. The “You wake up” sentence, for example, started with the bytes $16 $a0 $b2 $11 $3d. My hypothesis was that the Commodore 64 version used the same encoding, and only the addresses of MEH___ and PRI___________ would have changed. I searched for the byte sequence in a saved image of the Commodore 64’s memory; the four bytes preceding it were the new MEH___ PRI___________ addresses. Making the string-seeking program look for these new addresses was a quick change, and I then had a new analysis program to extract encoded strings from the Commodore 64 disk images. This approach exhibited a satisfying symmetry, too: before, I had used the MEH___ PRI___________ signature to find the text, and now I was using the text to find the MEH___ PRI___________ signature.
Regardless of the Amnesia version, there is still much left unexamined in the decompiled Forth code, including aspects that are adjacent to research questions already answered. Remnants of game development practices have been observed throughout our digital excavations, and there are more. The decompiled code uses a variable named TRAce, and when the debugger is used to change the variable’s value, reams of debugging information suddenly appear; figure 12.4 captures the output after the player types GET BIBLE. Early on, in the recovered source code in figure 4.5, we saw how this debugging output could be enabled (or not) in development with the Forth code TRACE C0SET; a minor change of that to TRACE C1SET would have flipped on the figurative debugging switch.
Figure 12.4. Amnesia with tracing enabled. Screenshot by author.
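The signature trick described above, finding the text from the MEH___ PRI___________ addresses and then the addresses from the text, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the author's actual program: it assumes a flat memory dump in which file offset equals address, and that the two Forth word addresses occupy the four bytes immediately before the encoded string as little-endian 16-bit values (the 6502 stores addresses low byte first). The function names are hypothetical, the five-byte prefix is the Apple II “You wake up” sequence from the text, and the addresses $1234 and $5678 in the usage lines are invented.

```python
# Minimal sketch (hypothetical helper names): recover the MEH___ PRI___________
# address pair preceding a known encoded string, then find every string
# compiled after that pair. Assumes a flat dump where offset == address.

# First bytes of the encoded "You wake up" string, as reported for the Apple II.
KNOWN_PREFIX = bytes([0x16, 0xA0, 0xB2, 0x11, 0x3D])


def word_addresses_before(memory: bytes, known: bytes = KNOWN_PREFIX):
    """Return the two 16-bit addresses stored just before `known`, or None."""
    index = memory.find(known)
    if index < 4:  # not found (-1) or too close to the start of the dump
        return None
    lo1, hi1, lo2, hi2 = memory[index - 4:index]
    # The 6502 stores addresses little-endian: low byte first.
    return (hi1 << 8) | lo1, (hi2 << 8) | lo2


def encoded_string_offsets(memory: bytes, first: int, second: int):
    """Return the offset just past every occurrence of the address pair."""
    signature = bytes([first & 0xFF, first >> 8, second & 0xFF, second >> 8])
    offsets = []
    position = memory.find(signature)
    while position != -1:
        offsets.append(position + len(signature))
        position = memory.find(signature, position + 1)
    return offsets


memory = bytearray(64)                           # synthetic stand-in for a dump
memory[10:14] = bytes([0x34, 0x12, 0x78, 0x56])  # invented addresses $1234, $5678
memory[14:19] = KNOWN_PREFIX
memory[30:34] = bytes([0x34, 0x12, 0x78, 0x56])  # a second string site
pair = word_addresses_before(bytes(memory))      # (0x1234, 0x5678)
print(encoded_string_offsets(bytes(memory), *pair))  # [14, 34]
```

Because the search works on raw bytes, the same two functions would apply unchanged to an Apple II or a Commodore 64 dump; only the recovered address pair differs.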
The physical game artifacts leave open questions to be answered by the unexamined code. The Amnesia manual saw fit to announce that “the program actually contains two parsers,” one for interacting with nonplayer characters, and a main parser for everything else. Why this technical detail was vital for a player to know is unclear, but it does foreground the fact that the parser excavation overlooked this bifurcation to avoid complicating the situation. There is, in fact, a small cluster of Forth words in the decompiled Forth code that are not otherwise used by the main parser, and these may constitute the second parser the manual proclaims. It is also asserted, on the back of the packaging, that the game has “close to 4000 separate locations in Manhattan, including 650 streets.” Developing a finer-grained teleportation method, in conjunction with the decompiled code, would be desirable to verify the marketing claim and additionally see if there were clever tricks at play to represent all those places in condensed form. Manhattan, like Amnesia’s code, is extensive, and there are still back alleys and side streets to explore in both.
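The search for the manual's second parser could be narrowed mechanically with a reachability query over the decompiled code's call graph: words reachable from the program's entry point but never from the main parser are candidates. The sketch below is a hypothetical approach, not one used in this book; it assumes a mapping from each Forth word to the words it calls has already been extracted from the decompiler output, and the toy words are invented rather than taken from Amnesia.

```python
from collections import deque


def reachable(calls, root):
    """Return every word reachable from `root` in a word -> callees mapping."""
    seen = {root}
    queue = deque([root])
    while queue:
        word = queue.popleft()
        for callee in calls.get(word, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def not_used_by(calls, main_root, entry):
    """Words reachable from the program entry but never from the main parser."""
    return reachable(calls, entry) - reachable(calls, main_root)


# Invented toy call graph; these are not Amnesia's actual Forth words.
calls = {
    "GAME": ["PARSE", "TALK"],
    "PARSE": ["TOKEN", "VERB"],
    "TALK": ["TOKEN", "ASK"],
    "ASK": ["TOPIC"],
}
print(sorted(not_used_by(calls, "PARSE", "GAME")))  # ['ASK', 'GAME', 'TALK', 'TOPIC']
```

Shared words such as TOKEN drop out, but the result still includes unrelated code like GAME, so it yields a cluster to inspect by hand rather than a definitive answer.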
Epilogue
The source code for Amnesia still exists. Electronic Arts has an archive, and an archivist, whom I contacted to see if there was any way the Amnesia code could be released. Electronic Arts had someone from their legal department look into the matter, and all that was necessary was the approval of joint copyright holders Cognetics . . . and Thomas M. Disch. Disch, of course, had died over a decade prior in 2008, and this quest would entail finding out who oversaw Disch’s estate. Having no idea where to begin, I contacted Disch’s former agent, Glen Hartley, whom I had previously interviewed regarding Amnesia. This turned out to be a stroke of luck: Hartley’s agency represented Disch’s estate! Unfortunately, I was told in no uncertain terms that the copyright release was not possible, with no explanation as to why: a dead end.
Meanwhile, in my exchanges with Don Daglow, he was so effusive with his praise of Disch’s writing that I thought I should read some of the late author’s work as I was researching Amnesia. Among the books of Disch’s I read was his last one, which was published mere days before Disch took his own life (Disch 2008a: 245). That book, The Word of God, bore the usual disclaimer inside its front cover that “any resemblance to real people or events . . . is purely coincidental” (Disch 2008b). That said, Disch concluded the book with an epilogue in which he disparaged his agent (or, as he pointedly wrote, “ex-agent”) Glen Hartley by name, going on to list Hartley’s perceived faults in representing Disch, and recounting how Disch exacted some subtle revenge (Disch 2008b: 175–77). This was most unexpected. It would seem there may be nuances to this story extending well past Amnesia’s publication date.
Notes
1. The assembly representation here was constructed using the actual Hat Trick disassembly of this Forth/assembly code.
2. The Amnesia assembler is definitely different in structure from the extant public-domain one; there does not appear to be a direct connection between the two.
3. Thanks to 4am for pointing out these and other fat-track titles. The Dam Busters visually appears to have a second fat track midway through the disk, but further examination showed that the tracks in the second band were not exact duplicates of each other.
References
Aycock, John. 2016. Retrogame Archeology: Exploring Old Computer Games. Cham, Switzerland: Springer.
———. 2019a. “Interview with Charlie Kreitzberg Re: Amnesia.” TR 2019-1111-03. University of Calgary, Department of Computer Science. DOI:10.11575/PRISM/36732.
———. 2019b. “Interview with Don Daglow Re: Amnesia.” TR 2019-1113-05. University of Calgary, Department of Computer Science. DOI:10.11575/PRISM/36733.
———. 2019c. “Interview with James Terry Re: Amnesia.” TR 2019-1112-04. University of Calgary, Department of Computer Science. DOI:10.11575/PRISM/36756.
Berez, Joel M., Marc S. Blank, and P. David Lebling. 1989. “ZIP: Z-Language Interpreter Program.” Infocom internal document, draft.
Brodie, Leo. 1981. Starting FORTH. Englewood Cliffs, NJ: Prentice-Hall.
———. 1984. Thinking FORTH: A Language and Philosophy for Solving Problems. Englewood Cliffs, NJ: Prentice-Hall.
Carver, Martin. 2009. Archaeological Investigation. London: Routledge.
Cassady, John J. 1982. “8080 Assembler.” Forth Dimensions III(6): 180–81.
Commodore Business Machines, Inc. 1982. Commodore 64 Programmer’s Reference Guide. Wayne, PA: Commodore Business Machines, Inc.
Crespi-Reghizzi, Stefano, Pierluigi Corti, and Alberto Dapra’. 1980. “A Survey of Microprocessor Languages.” IEEE Computer 13(1): 48–66.
Disch, Thomas M. 2008a. The Wall of America. San Francisco: Tachyon Publications.
———. 2008b. The Word of God: Or, Holy Writ Rewritten. San Francisco: Tachyon Publications.
Duncan, Ray. 1982. “FORTH 8086 Assembler.” Dr. Dobb’s Journal 64 (February): 14–18, 33–46.
Koopman, Philip J. 1993. “A Brief Introduction to Forth.” Second ACM SIGPLAN Conference on History of Programming Languages, April 20–23, 1993, Cambridge, MA, 357–58. https://doi.org/10.1145/154766.155395.
Loeliger, R. G. 1981. Threaded Interpretive Languages: Their Design and Implementation. Peterborough, NH: BYTE Books.
Perry, Michael A. 1983. “A 68000 Forth Assembler.” Dr. Dobb’s Journal 83 (September): 28–42.
Ragsdale, William F. 1982. “A Forth Assembler for the 6502.” Forth Dimensions III(5): 143–50.
Conclusion
This journey began with a provocation. There are so many digital artifacts that define and shape modern culture, and us as humans, and there is so little work in archaeology that meets the challenge of digital artifacts. And challenging they are, whether considering their great abundance, the specialized knowledge required to interpret them, or their depth. We have seen through Amnesia that a digital artifact may equally be viewed as a site with many layers, and that there can even be linkages between the physical and digital worlds. There are, of course, many different kinds of digital artifacts, and I have focused on one type in particular, the humble video game, and one method of excavation: reverse engineering. Reverse engineering has given us “x-ray vision,” allowing us to see what would ordinarily be hidden. Starting with little more than a featureless bag of bytes, we have gradually been able to discern game development practices, the digital machinery responsible for implementing key parts of Amnesia, and how the game was protected (unsuccessfully) from the slings and arrows of the 1980s software ecosystem. Moreover, this has been a nondestructive study, and the reverse engineering can be reproduced and verified independently by others; the results can be built upon or debated based on fresh analyses that start from the original, unblemished artifact. It is, truly, archaeology as science.
Moshenska wrote of the “thought-processes of the archaeologist-as-reverse-engineer” (2016: 26), something I cannot elucidate because I am not an archaeologist. However, what I have tried to provide here is insight into the thought processes of a computer-scientist-as-reverse-engineer, along with a demonstration of how reverse engineering parallels archaeological method by following the stages of Carver’s field research procedure. In other words, this is not completely unfamiliar territory, and while reverse engineering may require learning new languages and methodology, this is a task that is well within reach. Furthermore, reverse engineering need not be a solitary endeavor, and it provides excellent opportunities for interdisciplinary collaboration.
It may seem from the case study of Amnesia that reverse engineering requires a lot of time spent programming tools, but this impression is somewhat misleading. The multitude of languages underlying Amnesia presented situations in which no adequate tools existed, but many (most?) digital artifacts will not present this linguistic potpourri. My own reverse engineering practice is programming-heavy because of my own background and biases, although I would be remiss not to mention that industrial-strength reverse-engineering tools are available. For example, IDA is a tool that originated only a few years after Amnesia was published, and the NSA has released the more recent, freely available Ghidra.1 Both have voluminous books explaining their use (Eagle and Nance 2020; Eagle 2011), and while I have used them for reverse engineering, such tools do tend to focus on computer security and prevalent modern computer architectures. I would advocate against the sole use of leviathan reverse engineering tools, though, because I think there is an important role to be played by visualizations and reconstructions in reverse engineering, whether they are produced by custom-written programs or by existing software that has been repurposed. Visualizations can help induce the aha! moment for tricky analysis jobs, as well as help share an analysis with others, whereas post-analysis reconstructions increase confidence that an analysis is correct.
Humans are good at deluding themselves, and a reconstruction precludes details from being waved away when reverse engineering; in addition, reconstructions can sometimes be built incrementally while reverse engineering is ongoing, highlighting what is known and what is as yet unknown. Tools enhance human abilities, and ultimately what they are enhancing here is the power of thought. Reverse engineering is a way of thinking. Specifically, reverse engineering is a way of thinking about code and about data that is different from the norm: it is distinct from creation, maintenance, and debugging (Aycock et al. 2018), the three usual types of interaction programmers have with code and data. As we have seen with Amnesia, studying a digital artifact casts the archaeologist as investigator as they untangle the puzzle of the bytes, and as we have also seen, there are multiple techniques that can be employed apart from reconstructions and visualizations; it is helpful to recap them here:
• Static analysis: examining code and data when the code is not running. This is the closest analogy to reading traditional, nondigital textual material, with the caveat that a static view is a snapshot in time and can be fleeting: both code and data may be mutable and change when the code runs.
• Dynamic analysis: studying code and data while the code is running. Recall that “running” does not necessarily imply use of the original hardware, and an emulator, particularly one equipped with a debugger, can be enormously advantageous. Debugger facilities like watchpoints, breakpoints, single-stepping, and execution traces permit the code’s own execution to give insight.
• Switching between static and dynamic analysis. While static and dynamic analysis are important distinctions by themselves, it is beneficial to know when to switch between the two. Some questions may be much easier to answer statically or dynamically, and selecting an appropriate mode of analysis can make the investigation progress more rapidly. Put another way, a reverse engineer should always be mindful of the best and most direct way to answer any given question, which may change over time with experience and available tools.
• Instrumenting code and using the debugger to siphon data from the code as it runs. This technique can be seen as an extension to dynamic analysis and a way to gain a bigger-picture view by tracking specific aspects of code execution and its side effects. The resulting data can then be exported to other tools or visualized as necessary.
• Experimenting with code and data in the debugger to test reverse engineering hypotheses. Minor alterations, often to data but sometimes to code, can directly induce a desired situation in the artifact under study, permitting the result to be quickly witnessed and a hypothesis to be confirmed or denied.2
• Performing traditional research. Particularly for situations where the context of development is unclear, background reading about the programming language(s), tools, and platform used may be helpful to understand the constraints the original programmer faced and the solutions that were available to them. At a high level, traditional research may shed light on algorithmic decisions; at a low level, it can help explain “magic numbers” that may correspond to platform-specific code or hardware.
• Conducting interviews. Information from interviews, whether conducted by the reverse engineer or others, may yield insight into what parts of an artifact might be interesting and worthy of study, as well as explaining aspects of development and practices that were not captured in the artifact, or that were captured but would be uninterpretable without additional information. Obviously interviews will not always be obtainable, and they are subject to the usual cautions about the reliability of human memory, yet they can still provide useful starting points.
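Instrumentation often ends with a trace that needs summarizing before it can be visualized or exported. The following sketch counts how often each address executes and records the order in which addresses are first reached. It assumes a simple text trace with one instruction per line in an “ADDR: bytes mnemonic” layout, like the Apple II column of figure 12.3; the function name is hypothetical, and real emulator formats differ (the Commodore 64 column’s “.C:” prefix, for instance, would be skipped by this parser and need its own handling).

```python
from collections import Counter


def summarize_trace(lines):
    """Count executions per address in a trace of "ADDR: ..." lines.

    Returns (hits, first_seen): a Counter mapping address -> execution count,
    and a list of addresses in the order they were first executed. Lines that
    do not begin with a hexadecimal address followed by a colon are skipped.
    """
    hits = Counter()
    first_seen = []
    for line in lines:
        head, separator, _ = line.partition(":")
        if not separator:
            continue  # no colon: not an instruction line
        try:
            address = int(head.strip(), 16)
        except ValueError:
            continue  # text before the colon is not a hex address
        if address not in hits:
            first_seen.append(address)
        hits[address] += 1
    return hits, first_seen


trace = [
    "D016: A0 00     ldy #$00",
    "D018: B1 ED     lda ($ed),y",
    "(keypress)",
    "D016: A0 00     ldy #$00",
]
hits, order = summarize_trace(trace)
print(hits[0xD016], [hex(a) for a in order])  # 2 ['0xd016', '0xd018']
```

The counts can feed a heat map of hot code, while the first-seen order shows which routines a given player action newly wakes up.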
Some techniques deserve further discussion. For example, there is the dilemma of when interviews should occur, a tension between conducting interviews before or after reverse engineering. While interviews should always be prepared for, there is a practical limit to how much speculative technical work can precede interviews. This presents a chicken-and-egg problem in that a good interview might reveal information that will assist with reverse engineering, but reverse engineering may raise technical questions that are suitable for an interview. And, on the topic of information garnered from others, the relationship between archaeologists studying digital artifacts and amateur enthusiasts conducting similar work needs to be considered, a situation highly likely to arise in the study of popular culture like video games, where there may be an engaged fan community as well. For some, this could be used as a deliberate move into the collaborative realm of community archaeology (Marshall 2002); at minimum, there is little sense in ignoring information that originates elsewhere. Carver’s examples of this type of information variously place it in reconnaissance or site surveying (Carver 2009), and I have used independently gathered technical information at times to verify parts of my own analyses. Whether working with community members or other archaeologists, there is also the question of whether a formalized recording method should be imposed for reverse engineering work and, if so, what form it should take. For the research on Amnesia, I was not working with others, and standard notetaking on the reverse engineering process and its results sufficed.
Typically I use computer-based, plain text files that can be easily edited, have disassembly and tool output copied and pasted into them, and can be tracked using version control software.3 However, a large project involving multiple reverse engineers may demand shared documents and standards for documentation; some reverse-engineering tools have facilities to annotate disassembly and decompiler listings that may assist such efforts (Eagle and Nance 2020).
People-centric issues aside, technical complications exist beyond those presented in this book; Amnesia acted as a vehicle to present a representative but not complete set of reverse engineering problems. For instance, more modern artifacts are likely to have been implemented using high-level languages, and an understanding is required of the abstractions those languages provided to the programmer and the distortive effects of their translation into machine language. Even on older platforms there could be complications. The Apple II’s polling-based input and output spared me from analyzing code that involved interrupts, where code could be run suddenly and unexpectedly in response to an external event like a keypress, causing extra instructions to be disgorged into the midst of an execution trace. There is also the matter of scale, which returns us to the problem posed at the beginning: the quantity and rapid accumulation of digital artifacts. Reverse engineering a single digital artifact clearly takes time and commitment, and there are limits to how much of that can be accomplished manually. This hearkens back to my tool-centric bias: searching for ways the computer can be leveraged to analyze entire corpora of games with minimal effort.4
How did Carver’s field research procedure hold up in an adaptation to reverse engineering? Surprisingly well. Recall that Carver’s stages were reconnaissance, evaluation, strategy, excavation, analysis, and publication (Carver 2009). What struck me is that Carver’s stages followed essentially the order in which I would normally approach a reverse engineering project; in fact, I found slotting my work into the field research procedure was beneficial in that it gave me a clearer way to articulate my practice.
The biggest discrepancy was in the excavations themselves: traditional excavation imposes a certain amount of linearity due to the physical reality of, for example, not being able to start excavating immediately at an arbitrarily chosen depth. By contrast, digital excavations offer great flexibility in where to begin and ease in alternating between different excavations. Even in this single case study of Amnesia we have seen instances where progress or discoveries in one excavation advanced others.
A few words should be said about Carver’s publication stage, especially since “publication” is a term he interprets very broadly (Carver 2009: 316). Certainly this book constitutes publication, as would venues familiar to academics like conferences, workshops, and journals, but with digital artifacts there is more we could consider publishing. While I have not devoted a separate chapter to it, some of the challenges associated with publication of digital artifacts have reared their heads: a digital artifact under study might have its distribution legally encumbered, a workaround for which was discussed in Chapter 3. The programs written to facilitate reverse engineering are also digital artifacts, and could be made publicly available, although the primary audience for those would be other researchers. Visualizations have great potential for dissemination to the public at large, though, particularly interactive visualizations; for instance, some colleagues and I created a system that allows a user to play an Atari 2600 game in a web browser while simultaneously seeing a real-time visualization of selected parts of the game’s internal workings (Kaltman, Osborn, and Aycock 2019). That or similar systems could bring the results of digital excavations into classrooms and museums.
Lastly, I am grateful that I have the privilege to do this work, and that I am able to help uncover and preserve the knowledge and human creativity that is encased in digital artifacts. I was also extremely lucky in my choice of an artifact to study in that Amnesia had interesting things to find buried in its artifact-site, and that there was a bounty of information remaining in Amnesia that assisted with the excavations; this could easily have been far harder and less productive. I do want to use my soapbox to reiterate a lesson of Chapter 6, though: there will be failures. The archaeologist and digital humanist Shawn Graham wrote that “to fail gloriously is to use the privileges that you have, as you are able, to make it safe for others to fail” (Graham 2019: 3). In the context of reverse engineering, I confess that I have at times stared down a reverse engineering problem for weeks, poking and prodding the bytes in numerous ways until I could find a path forward.
This is why I wanted to give an honest portrayal of my first attempt at finding the text encoding, because it would be disingenuous to fast-forward past it to the later success. There is value in knowing that even more experienced reverse engineers will falter. Reverse engineering, like any kind of puzzle-solving, involves persistence, frustration, and elation. This is the end of the book, but really it is only the beginning. I hope this work has helped lay the foundation for future work on digital artifacts within archaeology, and that it will encourage additional reverse engineering through which we can remember more than Amnesia.
Notes
1. Yes, that NSA.
2. Making changes on a larger scale moves closer to injection, where chunks of data would be swapped out wholesale for ones constructed by the reverse engineer for the express purpose of testing some aspect of the code (like an interpreter, or the encoding of level data) as a “black box” (Aycock and Reinhard 2017). Regardless of scale, some restraint needs to be exercised with these experiments to avoid creating situations wildly outside the scope of what the code was designed to handle.
3. Invariably the files are supplemented with an assortment of hex addresses and values jotted down on bits of paper, whose meaning is incomprehensible afterward.
4. For an example of a tool-assisted analysis, see Aycock and Biittner (2020).
References
Aycock, John, and Katie Biittner. 2020. “LeGACy Code: Studying How (Amateur) Game Developers Used Graphic Adventure Creator.” 15th International Conference on the Foundations of Digital Games, article no: 23. DOI:10.1145/3402942.3402988.
Aycock, John, Andrew Groeneveldt, Hayden Kroepfl, and Tara Copplestone. 2018. “Exercises for Teaching Reverse Engineering.” 23rd ACM Annual Conference on Innovation and Technology in Computer Science Education, 188–93. DOI:10.1145/3197091.3197111.
Aycock, John, and Andrew Reinhard. 2017. “Copy Protection in Jet Set Willy: Developing Methodology for Retrogame Archaeology.” Internet Archaeology 45. DOI:10.11141/ia.45.2.
Carver, Martin. 2009. Archaeological Investigation. London: Routledge.
Eagle, Chris. 2011. The IDA Pro Book: The Unofficial Guide to the World’s Most Popular Disassembler, 2nd ed. San Francisco: No Starch Press.
Eagle, Chris, and Kara Nance. 2020. The Ghidra Book: The Definitive Guide. San Francisco: No Starch Press.
Graham, Shawn. 2019. Failing Gloriously and Other Essays. Grand Forks, ND: The Digital Press at the University of North Dakota.
Kaltman, Eric, Joseph C. Osborn, and John Aycock. 2019. “S4LVE: Shareable Videogame Analysis and Visualization.” 14th International Conference on the Foundations of Digital Games, article no: 61. DOI:10.1145/3337722.3341826.
Marshall, Yvonne. 2002. “What Is Community Archaeology?” World Archaeology 34(2): 211–19.
Moshenska, Gabriel. 2016. “Reverse Engineering and the Archaeology of the Modern World.” Forum Kritische Archäologie 5: 16–28.
Index
Note: In page numbers, “f,” “n,” and “t” denote figures, notes, and tables, respectively. Where multiple locators in an entry identify the same page, only the page number is given for brevity.
6502 and 16-bit addresses, 99, 101 and address space, 126 in Apple II, 77, 170, 175 and division, 127 and little-endian, 78 and stack, 89–90 and undocumented instructions, 91 variant in Commodore 64, 170, 175 See also register; zero page; individual instruction names A. See register: accumulator abstract machine, 124–25 abstraction, 22, 65, 86, 90, 101, 155, 157, 184 ACM, 132–33 Adams, Douglas, 19 Adams, Scott, 17, 20 Adventure Construction Set, 160f Adventureland, 17, 27n3 agent-based modeling, 5 algorithm, 46, 50, 84f, 141, 182 American Standard Code for Information Interchange. See ASCII amnesia (condition), 3, 21, 35 analysis (field research procedure), 9, 167–79, 184. See also field research procedure
annotation disassembly, 110n3, 183–84 log file, 94–95 Antiquity, 8, 11n1 Apple DOS, 40, 57–58 Macintosh, 18, 20 ProDOS, 40 version of Amnesia, 19, 32–35, 40, 44, 51, 167, 174–75 See also ASCII: Apple Apple II disk drive, 33, 39, 155, 158 (see also Applesauce) emulation (see MAME) graphics, 45–46, 70–72, 81–82 I/O (see polling) keyboard, 108 memory, 60, 70–71, 81, 88, 93, 126, 140–41 series computer, 33, 108 text display, 63, 70, 93, 141 Apple Threaded Interpretive LAnguage. See ATILA Applesauce, 39–40, 51, 155, 157–58, 162n9 archaeogaming, 5–7 Archaeological Investigation (Carver), 9 Arcticfox, 160f ASCII, 37–38, 57, 94–95, 116, 118, 168 Apple, 58, 89, 102, 114, 117, 119, 127, 154 nonstandard, 58 printable, 57, 59, 123 assembler, 171–72, 178n2
188 assembly. See language: assembly Association for Computing Machinery. See ACM Atari, 6 Atari 2600, 185 Atari 7800, 171 Atari ST, 18 ATILA, 107–9, 157, 172, 174 augmented transition network, 133–35 Bank Street Writer, 21 bank switching, 126–27, 168 Barker, Philip, 5, 43–44, 49 base 10, 74–76, 79–80, 82, 87n7, 108, 116–17 base 16. See hexadecimal base 2. See binary base 40, 116–20, 168–69, 173 BASIC, 34, 58, 72, 104, 138 Basic Books, 20 Bentley, Kevin, 22–23, 25, 27n10, 170 big-endian, 78 binary (number), 39, 74t, 76–77 bit, defined, 45 bit-stream, 39, 84 black box, 139, 186n2 bne, defined, 79 BNF grammar, 48 Boolean, 94, 131, 150–51 boot tracing, 149, 151 breakpoint, defined, 81 Brodie, Leo, 63, 103–4, 107–8, 171 Starting Forth, 103, 107 buffer, 94, 96, 111, 114, 120, 122–23, 162n7 bug, 118, 155–56, 162 byte capacity, 58 defined, 57 CAA, 3–4 CACM, 132–33 Camp Concentration (Disch), 21 Cannonball Blitz, 154 Carver, Martin, 8–9, 15, 31, 43, 49, 167, 180, 183–84 Archaeological Investigation, 9 central processing unit. See CPU character defined, 37
encoding, 37–38, 57 checksum, 50–53, 152–54, 161–62. See also hash clc, defined, 150 code (programming). See language code wheel, 146–48, 173. See also X-Street Indexer Cognetics, 22, 24, 177 Colossal Cave, 7, 17 command line, 59, 73, 110n2 comment code, 47, 62, 105, 108, 111–12, 156 stack effect, 112 Commodore 64, 34, 39 version of Amnesia, 19, 33–35, 39, 44, 170, 174–76 Commodore Amiga, 18 Communications of the ACM. See CACM community archaeology, 183 comparative analysis, 33, 108–9, 159–60, 174–76 compilation defined, 60 of Forth code, 60, 66, 103, 106 time, 60 compression, 45–46, 70, 73, 83–84, 118, 168–69 abbreviations in, 168–69 computational archaeology, 3 Computer Applications and Quantitative Methods in Archaeology. See CAA computer architecture, 99, 181 computer science, 4, 7–8, 11, 18, 48, 120, 124, 132 computer security, 8, 50, 53n3, 149, 181 Computist, 32, 150 conservation, 4, 49 container file, 52 contemporary archaeology, 7 Copy ][ Plus, 41, 157f copy program, 41, 155, 158. See also Copy ][ Plus; COPYA; Nibbles Away II
copy protection, 44, 46, 48–49, 149–63, 170 and disk images, 39–41, 48, 51, 53n4, 173–74 and software preservation, 32 See also X-Street Indexer COPYA, 40–41, 155
CPU, 66, 78, 86n4, 99, 126, 167, 174. See also 6502 crack (software), 32–33, 51–52, 149–50, 152–54, 160–61 intro, 32, 34 screen, 32, 51, 150 credits, 21, 27n7, 32–36, 139, 170 Crowther, Will, 17 cryptographic hash. See hash Daglow, Don, 24–25, 168, 173–74, 177 Dam Busters, The, 174f debugger and breakpoints, 35, 81, 112–14, 152 choosing watchpoint versus breakpoint, 151 and data import/export, 71–72, 89, 103, 123, 127, 141 and disassembly, 78–81, 83, 122 and execution traces, 99–100, 130, 175 and experiments, 68n2, 71–72, 82, 94, 102–3, 109, 139, 155, 176 and named pipes, 129–30 as part of emulator, 34–37, 70, 89, 149 and single stepping, 74 summary of usage, 182 use for instrumentation, 91, 93 and watchpoints, 75, 91, 99, 102, 114, 123 decimal. See base 10 decompilation, defined, 105 decompiler accuracy, 108 and .dsk images, 140 and dynamic analysis, 112–13 and false positives, 110n4 and Forth visualization, 129–31 principles of operation, 105–7 searching output of, 120, 125, 139, 143, 152, 176–77 and static analysis, 112, 114–15, 118, 125, 136, 138–39, 161 understanding output from, 111–12 and unknown bytes, 106, 116, 118–19, 121n3 development environment, 20, 23, 25, 60–62, 108, 128, 169, 172–73, 176, 182–83 dex, defined, 79
dictionary, 103–9, 112–14, 117, 120n2, 122, 169 differential analysis, 140, 145, 150 digital archaeology, 3–4, 6, 8 digital artifact, 3–10, 33, 44, 48–49, 51, 173, 180–81, 183–85 as site, 5, 9–10, 180, 185 disassembly correctness of, 79, 91, 153 defined, 78 for interpreted language, 154 and self-modifying code, 81 and symbolic names, 81, 91 versus execution trace, 99–100 Disch, Thomas M., 19, 21–26, 60, 88, 167–70, 173–74, 177–78 Camp Concentration, 21 Prisoner, The, 21 Word of God, The, 177–78 disk block, 111, 141–44, 146f, 155–56 disk duplication (authorized), 159, 162n10 dispatch, 98, 102, 104–5, 112, 122, 151, 153 Dollhouse, 21 DOSBox, 34 double-sided disk, 25, 32, 138 Dragon Lord, 32, 51, 149–52, 154, 161 .dsk, 37–40, 51–53, 57–61, 73–74, 83, 86, 106, 108–9, 118, 129, 140–42, 144–45, 150–51, 155, 157 DSL. See language: domain-specific dynamic analysis active, 94 alternation with static analysis, 35, 74, 78, 81, 86, 91, 120, 143, 182 defined, 35, 182 EAGLE SOFT, 34, 41n3 eBay, 25, 39 EBCDIC, 57 Electronic Arts, 19, 24–25, 48–49, 159, 162n5, 173–74, 177 logo, 34–35, 41, 45–47, 70–87, 168–69 emulator, 33–35, 37, 182. See also DOSBox; MAME; VICE enchantment, 9 encoding. See character: encoding encryption, 25 epilogue (data), 158
error detection, 50 evaluation (field research procedure), 8–9, 31–42, 44, 47–48, 51, 149, 155, 158, 184. See also field research procedure excavation (field research procedure), 8–9, 20, 31, 41, 43, 49, 55–163, 170, 176–77, 180, 184–85. See also field research procedure exclusive-or, 154 execution trace, 99–102, 129–31, 152, 175, 182, 184 of Forth code, 112–15, 122–23, 129 experimental archaeology, 10 failure, 34, 38, 96, 103, 114, 123, 139, 144, 149, 185. See also bug false positive, 38, 59, 61, 110n4, 114, 141 field archaeology, 9 field research procedure, 8–9, 31, 167, 181, 184. See also Carver, Martin; names of individual stages field-walking, 35 finite state machine, 124 Fishies, 21, 27n7 flag bit, 57–58 floppy disk, 25–26, 33–34 capacity of, 168 and copy protection, 39, 146, 152 (see also copy protection) imaging (see Applesauce; Kryoflux) location of data on, 48, 138–48, 173 modern replacement of, 41n2 and polling, 100 speed of, 60 structure of, 37–38, 52, 138, 155, 157–59 Forth introduction to, 62–66 use in Amnesia, 46–47 See also ATILA; dictionary Fraggle Rock, 21–23, 27n8 fragment, 10, 45–46, 57–69, 73, 108, 138, 154, 156, 169–70, 176 Freeman, Irving, 23 frequency, 73t, 83–84, 100–101, 121n4 Game Developer’s Conference, 16 game studies, 6 Ghidra, 181
goto, 138, 140. See also jump, unconditional Graham, Shawn, 5, 185 graph, 61, 124, 133, 136 graphical image viewer, 71–72 graphics high resolution, 45–46, 70–73, 75, 80–81 low resolution, 45, 70 Gulliver’s Travels, 78 Haigh, Thomas, 7 hard disk, 5, 168 Hardcore Computing. See Computist Harper & Row, 20–25 Harry Potter, 10 Hartley, Glen, 21–22, 26, 177–78 hash, 50. See also checksum; MD5 Hat Trick, 171, 178n1 header, 127–28 heuristic, 38, 57, 62, 92, 128–29, 141, 144–45 hex. See hexadecimal hexadecimal, 74t, 76–77, 79, 110n4, 150 notation, 76 history code execution, 132f computing, 6–7 game, 6, 15, 24 oral, 60 (see also interview) Hitchhiker’s Guide to the Galaxy, 19 Huffman code, 84, 169 IBM and EBCDIC, 57 PC, 18–19, 34, 174 version of Amnesia, 19, 25, 33–35, 39, 44, 174 IDA, 181 IEEE, 132 if-else statement, 22, 172 In Search of Excellence, 21 inc, defined, 79 index hole, 159 industry book publishing, 20, 26–27 computer security, 8 entertainment, 6 video game, 6
Infocom, 19–20, 24, 26–27, 116, 118, 132, 167–68 injection, 186n2 interactive fiction, 16, 21–22, 26–27. See also text adventure interdisciplinary, 7, 10, 181 Internet Archive, 3, 8, 35 interpreter classical, 66–67, 98, 153 direct threaded, 67, 98–99 indirect threaded, 67–68, 98–99, 102 loop, 98 outer, 108–9 interrupt, 184 interview, 15, 20–21, 27, 45–46, 60–61, 132, 170, 173, 177, 183 iny, defined, 79 Isay, Jane, 20–21
jmp, defined, 91. See also jump, unconditional Journal of Computer Applications in Archaeology, 4 jsr, defined, 81. See also subroutine, defined jump, unconditional, 102, 138, 172 KansasFest, 71 King Edward, 20, 22–24, 27n8, 46, 107, 170, 172–73 KLUdge, 136 Kreitzberg, Charlie, 22, 27n8, 60, 68n3, 167, 170, 173 Kryoflux, 39 language application-specific, 171 assembly, 77–79, 83, 86n4, 89–90, 96, 104, 112, 122–23, 126–27, 129–30, 149, 152–54, 160, 162n1, 171–72, 174–75 domain-specific, 170 extension, 171–73 high-level, 184 (see also pseudocode) interpreted, 66–67, 77, 89, 93, 96, 102, 104–5, 113 for copy protection, 153–55, 160–61, 170
low-level, 90, 96, 101, 170–71 machine, 77–78, 86n4, 160, 171, 175, 184 (see also language: assembly) natural, 16 numeric, 66, 77, 91, 153 programming, 24, 45–46, 64, 87n9, 103, 138, 170, 182 (see also BASIC; Forth) Law of the West, 174f lda, defined, 79 in interpreted language, 154 ldx, defined, 79 ldy, defined, 79 Legacy Hub Archaeological Project, 6 Level 9, 168 Linux, 38, 53n2, 59, 68n4, 73, 86n2, 110n2 lithic, 3 little-endian, 78, 90, 104f, 106, 117, 127, 143, 162n6 Loeliger, R. G., 107–9, 171 log file, 91, 93–95, 99, 114, 122, 129 loop detection, 100 machine-created culture, 6 magazine, 15, 32, 60, 150 review, 25–26 magic number, 182 magnetic flux, 38–39, 138, 158–59, 173–74 magnetic media, 25, 39, 49–50 malicious software, 8 MAME, 34–37, 40, 70–72, 131f Manhattan, 23–25, 48, 138, 146, 167, 173, 177 manuscript. See script material culture, 3, 6 MD5, 50–53, 150 media archaeology, 4 memory bank. See bank switching metadata, 52, 158 middle-endian, 143f, 148n3 mnemonic, 78–79, 81, 91 MobyGames, 16 Moshenska, Gabriel, 4, 7, 180 movie rights, 26 Mystery House, 18, 140 named pipe, 130–31
naming constraint, 103–4, 112 convention, 104, 124 negative space, 46, 73 Neolithic, 6 New York (city), 25. See also Manhattan newline, 62–63 Newsweek, 25 Nibbles Away II, 41 No Man’s Sky, 6–7 node, 60, 124, 133, 148n1 nondestructive, 5, 180 obfuscation, 44, 59, 91, 118, 151, 153–55, 161 One-on-One, 160f operating system, 40, 52, 62, 129, 155 optimization, 44, 91, 99, 154, 162 Oxford (publisher), 20 packaging, 21–25, 48, 128, 142, 146, 177 parser in Amnesia, 25–26, 47–48, 122–37, 172, 177 in Forth interpreter, 109 in text adventure games, 16–17 part of speech, 124, 127–28 PC. See register: program counter PHM Pegasus, 160f physical artifact, 4, 6, 8, 48. See also packaging physical excavation, 43 pirated software, 31, 49. See also crack pixel, 45, 71, 86 platform studies, 6 plot (data), 85–86, 92–93, 96, 141–42, 146f polling, 99–100, 122, 184 popular culture, 183 postfix notation, 63, 65 postmortem, 16 practice programming, 62–63, 83, 108, 111–12, 120, 127, 136, 170, 176, 180, 183 reverse engineering, 68, 83, 87n7, 110n3, 181, 183–84, 186n3 preservation, 32, 49 Prince Edward, 23, 27n8 Prisoner, The (Disch), 21 prologue (data), 158
pseudocode, 22, 169 publication (field research procedure), 9, 167, 184–85. See also field research procedure pushdown automaton, 124 QWERTY keyboard, 63 RAM. See random-access memory Ramworks, 60, 68n3, 119 Random House, 20 random-access memory, 60, 68n3, 88, 95, 126–27, 136n5, 138, 167–68 for display, 70–71, 74–75, 78, 80–81, 84, 86n1, 93, 96 read-only memory, 88, 94–95, 122–23, 126–27, 136n5 reconnaissance (field research procedure), 8–9, 15–30, 35, 45–47, 60, 133, 183–84. See also field research procedure reconstruction, 84–87, 107–8, 181 reference manual, 58, 70, 79, 81, 90, 111, 162n1 register accumulator, 79, 154 index, 79 program counter, 78 stack pointer, 89–90 status, 150, 172 Reilly, Pat, 22 Reinhard, Andrew, 5 reserved words, 170 return address, 90–91 as bookmark, 66, 89 reverse engineering and code complexity, 44, 48, 149, 161 and contemporary archaeology, 7 as field research procedure, 8–9, 184 as incremental process, 47, 96, 106, 120–21, 181 and object agency, 9 method summary, 182–83 as puzzle, 10, 65, 185 using hardware vs. emulation, 34 reversed string, 59–60 ROM. See read-only memory Romanov, Lis, 22–23 rts, defined, 89. See also subroutine, defined Rush, 34
Saga, xiii scientific archaeology, 5, 49 script, 21–23, 25–26, 60, 88, 167–70 sec, defined, 154 secondary storage, 168 sector -by-sector hash, 51–52, 150 boundaries, 159f defined, 37 identifying location of, 61–62, 150, 155 low-level representation, 157–58 self-modifying code, 80–83, 99, 153, 155, 162n6 semigraphics, 58 sherd (pottery), 3, 10 Simon & Schuster, 21 Skyfox, 160f soft switch, 81, 127, 152 softkey, 150 Software Publisher’s Association, 26–27 source code, 7, 20, 66, 77, 105, 170 Adventureland, 27n3 Amnesia, 10, 45–47, 60, 63–64, 66, 106–9, 112, 138, 169–70, 176–77 Hat Trick, 171 SP. See register: stack pointer space character, 37–38, 63, 96, 114, 169 (see also whitespace) limitation, 17, 25, 32, 45, 57, 63, 99, 118, 144, 167–69 negative (see negative space) spreadsheet, 85–86, 110n2 square brackets, 108 sta, defined, 79 stack data, 63–66, 111–12, 130, 132f dual-stack model, 65–66, 111 explained, 64 return stack, 66, 89 Starting Forth (Brodie), 103, 107 state, 124, 133 static analysis, defined, 35, 182. See also dynamic analysis strategy (field research procedure), 8–9, 43–54, 184. See also field research procedure strings (program), 38, 68n4 stylometry, 174 subroutine, defined, 81
survey, 6, 35, 183 symbolic name, 81, 101–2, 110n3 systemic concerns, 7–8 tabletop game, 17 teleporting, 139–42, 144, 152, 177 Terry, James, 22–23, 25, 27n7, 35, 45, 48, 60, 107–8, 132–33, 168, 170–72 text adventure, 7, 16–19, 26–27, 34, 45, 47–48, 116, 122, 136, 140, 167–68. See also interactive fiction Threaded Interpretive Languages: Their Design and Implementation. See Loeliger, R. G. time pressure, 24–25, 168 timestamp, 52, 95f toggle switch, 76 tool for Amnesia analysis, 52, 96n1, 103, 118, 128–29, 141, 144, 154, 175–76 (see also decompiler; reconstruction; visualization: code) building, 10, 181 total excavation, 43 track defined, 155 fat, 159–62, 173–74, 178n3 half, 158 quarter, 158 Tristam Island, 17f TRS-80, 17 Turing machine, 124 Twitter, 15 undocumented instruction, 91, 153 Unicode, 38 Utopia, 24 variable Boolean, 94 name, 125 usage in Forth, 111–12, 125, 139, 176 version control, 183 version number, 52, 120 VICE, 34–35, 175 virtual address, 143–44 visualization, 5, 181–82, 185 code, 129–33, 142–43, 145, 152 flux, 158–60, 174f vocabulary
6502, 79 Amnesia, 25–26, 123, 125–30 Forth, 111, 170 text adventure, 16–17 walkthrough, 35 watchpoint, defined, 75 wayfinder, 114–15 whitespace, 37, 105 Wikipedia, 16, 19–20 Williams, Robin, 26 Woods, Don, 17 Word of God, The (Disch), 177–78
World of Spectrum, 16 .woz, 40, 52–53 Write Stuff, The, 21 writing style, 24, 26, 174 X. See register: index X-Street Indexer, 25, 48, 142, 146–49, 173 Y. See register: index zero page, 92
.zip, 52 Zork Implementation Language, 20