The use of computers in anthropology 9783111718101, 9783111189505



English Pages 558 [564] Year 1965


Table of contents :
PREFACE
INTRODUCTION
PART ONE
STRUCTURE OF THE COMPUTER AND COMPUTER USE
I. COMPUTER STRUCTURE
AN ANTHROPOLOGIST’S INTRODUCTION TO THE COMPUTER
Discussion I
II. MODES OF USE: GENERAL
A TYPOLOGY OF COMPUTER USES IN ANTHROPOLOGY
COMPUTER PROCESSING AND CULTURAL DATA: PROBLEMS OF METHOD
Discussion II
III. MODES OF USE: SPECIFIC
COMPUTERS AND THE STORAGE AND RETRIEVAL OF ANTHROPOLOGICAL INFORMATION
LINGUISTIC DATA PROCESSING
STATISTICAL PROCESSING
Discussion III
PART TWO SPECIAL RESEARCH AREAS
IV. TEXT-ORIENTED
COMPUTERS AND LEXICOGRAPHY
DIACRITICAL AND STATISTICAL MODELS FOR LANGUAGES IN RELATION TO THE COMPUTER
THE COMPUTER AS A TOOL IN FOLKLORE RESEARCH
A METHODOLOGICAL INVESTIGATION OF CONTENT ANALYSIS USING ELECTRONIC COMPUTERS FOR DATA PROCESSING
Discussion IV
V. CLASSIFICATION AND GROUPING
SURVEY OF NUMERICAL CLASSIFICATION IN ANTHROPOLOGY
COMPUTER METHODS FOR CLASSIFICATION AND GROUPING
AUTOMATIC CLASSIFICATION IN ANTHROPOLOGY
RECONSTRUCTING AN ECONOMIC NETWORK IN THE ANCIENT EAST WITH THE AID OF A COMPUTER
Discussion V
VI. EXPERIMENTAL
SIMULATION: AN INTRODUCTION FOR ANTHROPOLOGISTS
THE COMPUTER AS A TOOL FOR THEORY DEVELOPMENT
REQUEST-ANSWER INTERACTION IN RELATION TO MAN-COMPUTER INTERACTION
SUGGESTIONS FOR ANTHROPOLOGY: THE MACHINE WHICH OBSERVES AND DESCRIBES
Discussion VI
APPENDIX A. Participants, and Papers Prepared for the Conference
APPENDIX B. Some Current Uses of the Computer in Anthropology
NAME INDEX
TOPICAL INDEX

THE USE OF COMPUTERS IN ANTHROPOLOGY

STUDIES IN GENERAL ANTHROPOLOGY
edited by

DAVID BIDNEY, Indiana University, Bloomington
DELL HYMES, University of California, Berkeley
P. E. DE JOSSELIN DE JONG, Leiden University
E. R. LEACH, Cambridge University

II

1965
MOUTON & CO. · LONDON · THE HAGUE · PARIS

THE USE OF COMPUTERS IN ANTHROPOLOGY
edited by

DELL HYMES

1965
MOUTON & CO. · LONDON · THE HAGUE · PARIS

© Copyright 1965 by the Wenner-Gren Foundation for Anthropological Research, Inc., New York, N.Y., U.S.A.

Printed in The Netherlands

"... the fundamental requirement of anthropology is that it begin with a personal relation and end with a personal experience, but ... in between there is room for plenty of computers." Claude Lévi-Strauss

PREFACE

This book is the result of a conference of the same title, at Burg Wartenstein, Austria, held during June 20-30, 1962, as symposium no. 18 of the program of summer symposia sponsored by the Wenner-Gren Foundation for Anthropological Research. The conference had its beginning in conversations in 1960-1961 at the Center for Advanced Study in the Behavioral Sciences, and at Stanford University, between Thomas A. Sebeok, A. Kimball Romney, Sydney Lamb, and myself. Sebeok, a Fellow at the Center, was testing the possibilities of the computer for his research in Cheremis folklore; Romney at Stanford was introducing computer processing into a program of social anthropological field work among the Mayan Tzeltal; Lamb at Berkeley was developing both the practical and theoretical aspects of verbal data processing. The need for a general consideration of the place of the computer in anthropological research was strongly felt. The need to pool experience on an international basis, and the long-standing personal interest of the late Paul Fejos, director of the Wenner-Gren Foundation, made the Foundation's program of symposia at its summer headquarters a logical occasion. Having been an interested party to the conversations, I was asked to serve as organizing chairman, and prospective editor, working with the close collaboration of the other three. On a visit of Dr. Fejos to Stanford in the spring of 1961 a date at Burg Wartenstein was agreed upon. The conference took place pleasantly and profitably, with a focus both upon the general topic, and upon its communication to anthropologists by means of the present book. The schedule was kept flexible and open, and topics whose importance emerged during the course of the conference were taken up at the end: typology of research problems; the results of a specially organized submeeting of those most concerned with numerical classification; prospects for organization and cooperation in research. 
The interest and commitment of the participants was such that new or extensively revised papers were sometimes planned on the spot, so as to
better meet the needs of the subject and the anthropological audience. Insofar as those needs are usefully served in this book, the credit must be shared by all. Having met, and talked, and taped the talk, all such conferences face the question of what best to do with the stacked boxes, several days worth, of stored voice. Few have decided from the outset to dispense with recording, or to throw the reels away. Anthropologists, like Indians, talk no less frankly and impulsively with a recorder running, especially one discreetly hidden; and they often enough say things whose import, or style, it seems culpable to let go to waste. In this instance it was decided at the conference to incorporate remarks from the discussion selectively. To do so was made possible partly by the recordings, but in equally important part by the detailed notes taken by the rapporteuse. These notes were later augmented by a complete rehearing of the tapes on her part; occasionally, a point of wording or content was checked against the notes taken independently by myself and by Paul Garvin (who generously made his set available). There can be considerable confidence, then, in the essential faithfulness of the report of discussion to what was said. At the same time, I must take the responsibility, rather than the individual participants, for the phrasing and organization, and hence for any misrepresentation due to choice of words or context. Because one or two papers were not ready until well into 1963; and because the tapes did not reach us for use when anticipated, but rather after the birth of our son; it was not possible to decide finally on the contents of the sections of discussion in time to circulate them among the participants for approval. For this reason, and to make the account as coherent as possible, the discussion has been cast primarily in the third person, and attributed rather than quoted. 
The changes of wording consequent upon this decision, as well as the necessary translation from the spoken to the written mode of English (including occasional modifications for reason of redundancy, explicitness, and the like), are chargeable to me, as are the phrases supplied for continuity, and the grouping of remarks on the same or related topics from different sessions in one place. A good many points from the discussions have been incorporated in the papers; if the reports of discussion which follow the papers occasionally repeat, it is because the point seemed important in itself or in its context. I apologize to readers and participants for any remarks of value that faulty judgment or format may have omitted, and hope that readers will appreciate, as I do, how many there are to include. Credit for pungency and felicity
of expression must go to the participants; editorial modulation works the other way. With regard to the papers proper, editorial intervention has chiefly been to provide numbered, named section headings in many of the papers, throughout for some, occasionally for others, in the belief that such headings, especially as gathered together at the beginning of each article, make the contents much more comprehensible and accessible. I have prepared the index with the same end in view. This is not a book which it is necessary, or even desirable, to read through from beginning to end, but one which different readers will use in different ways. Some may want to begin with the kind of subject-matter problem that most concerns them; others with a kind of processing; some will turn first to an overview of the logic and typology of problems from the viewpoint of computer use; others will want to begin with the paper which by right of logic begins the book and deals with the instrument itself. Some, indeed, may scan the discussion first. Whatever the case, it is hoped that the lists of section headings, and the index, will guide the reader appropriately, and that each will find something among the variety of backgrounds and vantage points represented that speaks to his or her condition. Two reports of the conference have appeared: "L'utilisation des ordinateurs en anthropologie (à propos d'un colloque)", L'Homme, 2 (1962), 125-127 by Madame Colette Piault, and "The use of computers in anthropology", Current Anthropology 4, 1 (1963), 123-129, by myself. We thank the National Science Foundation for supplementary grants which facilitated the participation of some of the American scholars, and made possible my editorial work. On my own behalf, I should like to thank my wife, for her indispensable aid; Robert Scholte; and Laura Gould, for valiant and intelligent typing of much of the manuscript.
We thank most the Wenner-Gren Foundation, and its staff, for the conference, and for the hospitality which made it so enjoyable. On behalf of the others and myself, I should like to dedicate the book to the memory of Paul Fejos. At our final session, he spoke of his concern to interest anthropology in the use of the computer, of his gratification that the conference had come about, and of his hope for the published result. May I hope in turn that the result has deserved his confidence, and that it will contribute to the future he envisioned.

DELL H. HYMES

TABLE OF CONTENTS

PREFACE
INTRODUCTION

PART ONE
STRUCTURE OF THE COMPUTER AND COMPUTER USE

I. COMPUTER STRUCTURE
AN ANTHROPOLOGIST'S INTRODUCTION TO THE COMPUTER
Sydney M. Lamb and A. Kimball Romney (University of California, Berkeley, and Stanford University)
Discussion I

II. MODES OF USE: GENERAL
A TYPOLOGY OF COMPUTER USES IN ANTHROPOLOGY
J. C. Gardin (Centre National de la Recherche Scientifique, Paris)
COMPUTER PROCESSING AND CULTURAL DATA: PROBLEMS OF METHOD
Paul L. Garvin (The Bunker-Ramo Corporation)
Discussion II

III. MODES OF USE: SPECIFIC
COMPUTERS AND THE STORAGE AND RETRIEVAL OF ANTHROPOLOGICAL INFORMATION
Robert Bruce Inverarity (Adirondack Museum)
LINGUISTIC DATA PROCESSING
Sydney M. Lamb (University of California, Berkeley)
STATISTICAL PROCESSING
Wilhelm Milke (Soest/Westfalen, Germany)
Discussion III

PART TWO
SPECIAL RESEARCH AREAS

IV. TEXT-ORIENTED
COMPUTERS AND LEXICOGRAPHY
Roy Wisbey (Cambridge University)
DIACRITICAL AND STATISTICAL MODELS FOR LANGUAGES IN RELATION TO THE COMPUTER
Pierre Guiraud (University of Groningen)
THE COMPUTER AS A TOOL IN FOLKLORE RESEARCH
Thomas A. Sebeok (Indiana University)
A METHODOLOGICAL INVESTIGATION OF CONTENT ANALYSIS USING ELECTRONIC COMPUTERS FOR DATA PROCESSING
Madame Colette Piault (Centre National de la Recherche Scientifique, Paris)
Discussion IV

V. CLASSIFICATION AND GROUPING
SURVEY OF NUMERICAL CLASSIFICATION IN ANTHROPOLOGY
Harold E. Driver (Indiana University)
COMPUTER METHODS FOR CLASSIFICATION AND GROUPING
R. M. Needham (Cambridge University)
AUTOMATIC CLASSIFICATION IN ANTHROPOLOGY
Peter Ihm (EURATOM, Ispra (Varese), Italy)
RECONSTRUCTING AN ECONOMIC NETWORK IN THE ANCIENT EAST WITH THE AID OF A COMPUTER
J. C. Gardin (Centre National de la Recherche Scientifique, Paris)
Discussion V

VI. EXPERIMENTAL
SIMULATION: AN INTRODUCTION FOR ANTHROPOLOGISTS
David G. Hays (RAND Corporation)
THE COMPUTER AS A TOOL FOR THEORY DEVELOPMENT
John T. Gullahorn and Jeanne E. Gullahorn (Michigan State University)
REQUEST-ANSWER INTERACTION IN RELATION TO MAN-COMPUTER INTERACTION
Robert Pagès (University of Paris)
SUGGESTIONS FOR ANTHROPOLOGY: THE MACHINE WHICH OBSERVES AND DESCRIBES
Silvio Ceccato (Centro di Cibernetica e di Attività Linguistiche del Consiglio Nazionale delle Ricerche, Università degli Studi di Milano, Italy)
Discussion VI

APPENDIX A. Participants, and Papers Prepared for the Conference
APPENDIX B. Some Current Uses of the Computer in Anthropology
NAME INDEX
TOPICAL INDEX

INTRODUCTION

It is easy simply to extoll the virtues of the computer, and the prospects of its use, but that is the purpose neither of this book nor its introduction. The virtues and prospects are real, and recognized throughout, but the aim is to see them, not in speculative isolation, but in real relation to the state and needs of anthropology. The belief on which this book rests is that the development of the electronic computer, and the diffusion of it among the sciences concerned with man, confront anthropology with a challenge that must be met, yet whose full nature is not yet generally grasped. The aim, then, is to evaluate the role of the computer (in effect, the digital computer) in anthropology.¹

¹ Analog computers are not considered here. The difference between the two main types of computer, analog and digital, might be said to be something like the difference between sound symbolism and the arbitrary relationship between sound and meaning that predominates in languages. The digital computer, counting, sorting, and the like stands in a highly abstract and generalized relation to particular data; hence its great flexibility and utility. What resemblance there may be between the processing and the nature of the data processed inheres in the program, not in the instrument. An analog computer, on the other hand, is designed to bear some intrinsic similarity to the data on which it operates; the instrument itself has something of an iconic relationship to material. Obviously enough, such a computer can in principle have special kinds of power, but at the cost of a utility limited to special purposes. The prospective role of analog computers in anthropology was introduced at the conference by Sebeok. The present preeminence of digital computers was ascribed mainly to the greater ease of programming them. While analog computers do not seem effectively a prospect for anthropological use at present, it should be kept in mind that a good deal of work on them continues, including plans for coupling analog and digital machines to obtain the joint benefits of both. Analog computers may well come to play a significant role in anthropology for specialized purposes. Cf. Cowan, 1963, p. 1075.

The book is not a handbook, or how-to-do-it kit. Computers and their programs continue to change and develop rapidly enough so that any effort to present a complete last-word would be doomed to failure. For any particular anthropological research, moreover, collaboration with a particular local facility must determine the precise shape of the design. A good deal about computer use, it is true, can be learned from the
pages that follow. The emphasis, however, is upon the import of the computer for the intellectual content, and to some extent, the social structure, of anthropological research, questions which are likely to grow in importance for some years to come. Nothing could be further from the intent of any of us than to encourage an indiscriminate stampede to use of the computer in anthropology. A notable characteristic of the conference from which this book results was the vigor and frankness of the discussion, in which none were quicker than specialists in the computer to argue against its use, where unnecessary or ill-prepared. To participate in such a conference and book of course does imply some positive interest in the computer's role, but the interest has as context concern with the computer as an instrument of anthropology, rather than as end in itself, and with its proper use as an instrument. Succinctly put, the extremes of our ambition, so far as persuasion is concerned, can be said to be: to encourage and assist that small number of anthropologists who are willing to invest the energy and time necessary to explore the computer's anthropological uses; to convey to the profession generally the experience of the participants, so as to dispel stereotypes, and permit a balanced appraisal of what will become a major impact of technological change on human values and cultural pattern within our own domain. I myself write as someone not engaged in computer work, but who, as an organizer and chairman for the conference, has tried to take the role of a concerned but objective participant observer. In this role I should like to single out certain themes which have emerged from the papers and discussions and which, though variously treated in the rest of the book, are of such general importance, and so likely to recur in discussions of the computer in anthropology, as to demand an attempt to deal with them together here by way of introduction. 
The starting point of most discussion, and properly so, is likely to be an anthropologist's question, what can use of the computer do for me? The whole of this book is in one sense an effort to answer that question, insofar as present experience permits, not only in terms of general prospects but also of specific examples, especially in Part Two and Appendix B. Yet a factor often implicit in discussion of specific examples (Romney pointed out) can vitiate the outcome, namely, confusion between one's interest in, or evaluation of, the content of a known example, and one's evaluation of the general potential of the computer. To form a just estimate of the computer's utility for his or her own work, an anthropologist must abstract its general capacities from particular tasks.
One may be indifferent or hostile to a project, yet discover that it embodies procedures one finds exciting if applied to different ends. Here, as throughout computer use, a demand made on the computer entails a reciprocal demand on oneself. The anthropologist is likely to think of data processing in terms of familiar subject areas, such as the broad divisions of physical, social, and linguistic anthropology, archaeology, ethnology, and folklore, or in terms of a specific corpus, whether Trobriand texts, Nazca pottery, a Tongan census, or Pecos skulls. To determine what use the computer can be, however, requires linking such categories with the operational categories of computer processing, both in terms of such broad divisions of processing as numerical and nonnumerical (or verbal and non-verbal) and in terms of specific steps of the order represented in a flow chart of the analytic sequence one has in mind. (See the papers by Gardin, Garvin, and Lamb.) One may find oneself broaching novel links and organizations of anthropological subject-matter, and coping, however tangentially, with problems of the unity and classification of the human sciences, not in ontological, but in methodological terms. In any event, if the anthropologist begins by putting the primary question, this is what I want to know, how can you help? - he or she must be prepared to answer in return the second question, what do you know about the steps of finding out what you want to know? When anthropologists discuss freely with specialists the relation between a body of data and computer processing, especially if the advisability of the latter is in question, certain differences of outlook may subtly affect the result. This is the more true especially if the distinction between particular tasks and general capacities is not made, and if attitudes as to scientific strategy and tactics are elicited. 
Differences of outlook here seem linked in part to different relations to the computer and to types of data. Specialists in the use of the computer naturally value the instrument's capacities, as has been remarked, and of course are familiar with what it can now do. Hence they may tend to find a task interesting or significant to the extent that it exploits such capacities, or challenges and develops them. That is, there may be a tendency to be oriented toward novel demands, rather than toward routine application of what is already known. Often enough, because of the training such specialists are likely to have, problems are thought of preferably in terms of what are for anthropologists relatively advanced concepts of methodology, theory, and scientific relevance. In contrast, most anthropologists are obviously and naturally immersed in a particular body of content, often enough
valued for its own sake. They will tend to find a suggestion interesting to the extent that it promises to help realize the potentialities of their data. Methodology and theory tend to be relatively less advanced and explicit, and more closely tied to particularistic ends. Moreover, it is likely to be uses on their way to becoming routine that will be most attractive, since the link between type of data and type of operation will have already been generally worked out. In this contrast there is a basis for difficulty or misunderstanding occasionally on the score of the scientific and the computational interest of a particular piece of work. It is certainly true that a task can be advanced or novel in terms of the development of the skills and content of anthropology, given its minimal present use of computers, and at the same time be elementary or routine in terms of the level of development of the computer field. Anthropological problems can, however, be a challenge and source of advance in the latter. What is needed, as several participants in the conference have stressed, is first of all development of anthropology. An anthropological theory sufficiently explicit for its consequences to be tested with the aid of the computer broaches frontiers for both. And all considerations of methodological interest apart, there cannot but be general agreement that in many cases where the task is not theoretical, but the interpretation of a particular body of data (as is often the case in the humanistic branches of anthropology), the sheer power of the computer to save drudgery is a human gain that cannot be gainsaid, and its ability to integrate analysis of masses of data an empirical advantage no field concerned with the progress of its knowledge can refuse.
What is essential is a conception of the use of the computer in anthropology which is flexible enough to start from where anthropology, its data and analyses, now are, and to link the immediate applications, most often clerical, to applications, most often prospective, in which theoretical analysis and explication can fruitfully predominate. There may always be a tendency for some to be interested only in methodologically advanced and exploratory uses, and for others to respond only to prospects of immediate practical payoff, but this will only reflect the range and character of anthropology itself. While many of those who pioneer in the use of the computer in anthropology will be primarily methodologists at heart, some must have a conception general enough to accommodate both poles, and be able to mediate wisely between advances in the state of the computer art, on the one hand, and in the content of anthropology, on the other. The main points so far made have all to do with assessing the potential
of the computer for one's needs. But even though one may have distinguished general capacity from particular tasks, have related data to operations, and, by sticking to one's research guns, avoided embroilment in questions of how "interesting" the research may be to whom, there remain other considerations before a judgment can be wisely made as to when, and when not, to use the computer. One of the most important of these, especially for the prospective anthropological user, is that of time. Often the time saved by the computer's speed must be spent coping with its stupidity, for it can only do exactly what it is told, and so must be told exactly what to do. The need to be explicit may bring great rewards, but obviously it can make work with the computer more demanding, not less. Gardin notes in Discussion I that the development of the analysis and program for the Akkadian tablets took two years; and after experience with computer analysis of folktale themes (at the Department of Social Relations, Harvard University), Benjamin Colby reports his realization (p.c.) that any innovating computer work requires one's full time, since anything less than full time on the job is wasteful even of what time one does spend. Clearly, then, any anthropological uses of the computer other than application of standard programs can come about profitably only slowly, as individuals here and there devote themselves intensively to the task. Even where programs of a standard type are to be applied, a good deal of the anthropologist's time is obviously required, and from an early point. To think of the computer as a panacea is to misconceive it just as much as to think of it as a mechanizing monster. Both stereotypes overestimate the role of the computer in research (complementing each other in that respect), for computers do not hand down verdicts, but report results which one must judge, and which can be no better than the preparation for them. 
Use of the computer is but one part of a research design; as with use of statistics, one can't show up with a wheelbarrow of data and expect to get what one wants by dumping it in someone else's hopper. If the computer can be expensive of time, it is also expensive of money (though not necessarily of that of the anthropologist). From a scientific point of view, the money matter is not one of absolute amounts, but of the relation of cost to real needs and to alternatives in equipment and procedure. Waste can result from misjudgment in either direction, from deciding to use the computer when it is not needed, and from deciding not to use it, or failing to use it, when it is. Less expensive and less complicated means often will do the job. There
is a knack worth cultivating, that of recognizing when the real value of the processing inheres in the preparation of the material. Punch-cards, properly coded, may themselves be all one needs, to be sorted adequately by manual means; the constraint of thinking through the sequence of operations necessary for processing may itself answer one's question; or a combination of the two, using the cards and an explicit flow-chart to manually simulate a computer run, may solve a problem. Also, the time may not be ripe in a particular field of research. Here there enters a consideration that sets anthropology somewhat apart from other behavioral or social sciences. In many areas of social research, the masses of data obtained are of little or no intrinsic interest; one set of sophomores or suburban housewives is enough like another, and the aims of the research are so conceived, that once the correlations, rankings, or whatever have been obtained, the data, while it may be preserved, is likely to be forgotten, and deservedly so. New ideas are tested on new sets of data. In much of anthropology, on the other hand, the data do have intrinsic interest and are preserved, even religiously, as part of an all too thin record of the range of human nature and culture under conditions already past or rapidly disappearing. The corpus is all too limited, and may remain significantly the same for the testing of whatever new ideas may occur (e.g., North American ethnology, or archaeological sites (which cannot be dug twice)). In short, there are many areas of anthropology in which the present data, though it may continue to be augmented, remains a permanent resource; and computer processing, if it is to be done, should be done once, so to speak, and done right or at least done well enough so as to serve usefully for some period of time. 
To do it right, however, achieving the comprehensive scale and serviceableness that makes a project reasonable both as a use of the computer and a contribution to the field of research, may mean agreement on the pooling of data, agreement on a center to take the lead, and agreement on a system of coding. Such agreement may be slow to come. In the interval, pilot projects help, and may be essential, and can often be conducted with simpler equipment, yielding some immediate benefit while preparing the way for the more complex activity. The fact remains that for many purposes of anthropology it must seem wasteful, from any reasonable viewpoint, not to use the computer, or to plan for its use. Some kinds of research, indeed, cannot or will not be done otherwise: certain kinds of classification, for example, and, generally, any statistical or verbal data processing, where the amount of data and/or the intricacy of the operations is of a certain scale. Where the
boundary lies, of course, is not fixed, but relative to the state of a field and of alternative means. Knowledge of the existence of the computer as an alternative today may indeed discourage as pointless efforts that an earlier period might have lauded (cf. Wisbey's account of the history of concordance making). Probably the boundary varies with persons too, and one anthropologist will continue to do what another considers pointless to begin. Here we return to the primary theme of how matters look to anthropologists planning research. Even if a project is agreed to be feasible only with the aid of a computer, not all anthropologists will conclude that it should be undertaken. Of the reasons there may be for such opposition, the principal ones seem two: style of work, eschewing the novel relationships to data and colleagues that computer use entails; and, social or intellectual outlook, such that the computer is judged, not as an instrument, but as a symbol of ulterior forces. In considering these reasons for opposition, it is worth noting that literary scholars, often in the forefront of those opposed to mechanization, have been quick to accept the computer in one respect, the production of concordances. The human cost in years of drudgery, often enough coupled with failure even to finish the work, has made the computer a humanizing agent here, freeing scholars for their proper work, and providing them with more efficient and accurate materials with which to do it. There are clearly equal advantages for anthropology in concordances of native texts, not only for linguistic and folkloristic analysis, but also for study of social structure, values, etc. (Note the recent discussion of the relation between mechanical translation and ethnography by Metzger, 1963.)

In all branches of anthropology, indeed, there are large bodies of basic data suitable to the clerical advantages of computer processing. Moreover, it is difficult to believe that some problems of basic research can ever be solved without the efficient and systematic processing of large masses of data that the computer now makes possible. If, for example, all the native languages of the New World are ultimately genetically related, then their classification is a question of subgrouping, of determining the degree and hierarchy of relationship between pairs and groups of languages within the total set. Merely discovering that two languages are in fact related provides of itself no answer to the anthropologically interesting question, the closeness or level of relationship. To determine subgrouping on such a scale, however, requires systematic comparison of large numbers of languages and quantities of data, and the use of the


computer. (Such work is being undertaken by Swadesh; see his report, 1963, and the discussion in the paper by Lamb.) Comparable examples exist in work with folktales, potsherds, and cultural elements of many kinds. Indeed, the computer may be seen, if properly used, as essential to what Kroeber felt the study of both culture and language to stand in crying need of, "far more systematic classification of their multifarious phenomena. Perhaps we have had a surplus of bright ideas and a shortage of consistent ordering and comparison of our data" (Kroeber, 1960, 17). One could continue for some time with straightforward opportunities and possibilities.²

² One obvious need, of course, and the opportunity most likely to be taken, is that of using the computer merely to keep track of the literature of the field (see the papers by Inverarity, and Lamb, and Discussion III). Even here, there can be opposition on the grounds of established styles of work and scholarly principle. The historian Eric Boehm, Editor of Historical Abstracts, admits that a few years ago he reacted to the suggestion of data processing equipment with the statement, "Indexing is an intellectual activity and not a machine process" (Boehm, 1963, 8). He found, however, no loss (except in visual appearance) and a good deal of gain: the machine took three hours to sort index entries, where it had taken twelve man-weeks before. Boehm's is indeed an excellent statement from a related discipline of the need to study and plan for data processing, based on a realization that the problem is not to keep the machine in its place, but to use the machine to keep the data in its place, accessible and pertinently used. As Inverarity's paper indicates, to eschew the machine in this regard is not to preserve humanistic values in scholarship, but to allow drudgery, duplication of effort, waste and parochialism to triumph over selectivity and creativity as responses to the accelerating flood of publication.

It remains the case that for many anthropologists the relation of means to end may not seem as clear as it does to the literary scholar in the case of a concordance of a major poet. Literary scholars know clearly why they want and need concordances and indexes, and none, it appears, regrets loss of the chance to copy out an author word by word, slip by slip. It is not so certain that anthropologists know what they need and want to do with their data; and there may well be anthropological chores, which, even though ostensibly also only means, some may not wish to surrender. A way of obtaining or handling data may be valued for its own sake, and an intellectual goal may come to seem inseparable from a particular way of reaching it. This way of stating the situation should not prejudice evaluation. One of the values of anthropology is the kind of anthropologist it produces. It would be rash to maintain that style of work and kind of anthropologist are without connection, and apprehensions as to the effect of a computer on style of work are understandable. That computer use need imperil essential values, as to the qualities of either anthropologists or anthropology, is, however, an apprehension I believe to be sadly mistaken. The conference which resulted in this book is itself evidence. It had the tone of shirtsleeves, rather than of stuffed shirts, the tone of men concerned with the proper use of tools (physical and mental), and with the basic needs of basic research. There may be promoters, opportunists, and the like in the computer field, attracted by its prestige and funding, but those who pioneer the use of the computer in anthropology are more likely to be a minority having to defend itself. It is still true that the anthropologist who decides to make use of the computer will find himself entering into a somewhat different style of work, involving novel relationships, physical, mental, and social. Nor can a single strategy be recommended. If the benefits in principle of computer processing are clear to scholars such as those who participated in the conference, the way and the degree to which the anthropologist should participate in computer processing in practice remain controversial, as Discussion I shows. On the basis of their experience and outlook, some will highlight the degree to which the operations of the computer, and programming for it, are irrelevant to the anthropologist's proper work, which is the theoretical and methodological development of his own subject-matter. The stress will be on danger of bad work, if the anthropologist tries to become expert in all aspects of the processing, and to write his own programs. Others, however, will highlight the dangers of the anthropologist not knowing enough, such that, for example, the results of his research are shaped in ways he did not intend by someone else's programming. 
It may be pointed out that the anthropologist needs to know how the programming language works, and about the information capacity of various components of the computer, simply to understand better the costs, which underlie decisions on research strategy and sequence of operations (B. Colby, personal communication). The actual outcome as to degree and kind of participation is likely to vary with personal situation and choice. If Kroeber's depiction of the personality of anthropologists, as wanting contact with data through their own senses, extends to use of the computer, many anthropologists are likely to get their hands quite dirty. Two points, at least, are clear. First, the work with the computer is cooperative, involving communication with a staff - if not that of the anthropologists, then of the computer center whose services are used. The communication should begin at the earliest possible point, preferably before the collection of data, if field work is in question, and before coding of the material in any case. For effective communication, the


anthropologist needs some knowledge of the operation of the machine, and programming. Some knowledge of mathematics, statistics, and logic also helps, variously according to the nature of the research. These aids to communication are simply aids in discharging the responsibility to see to the effective and appropriate outcome of research. No program can be used blindly, and often the anthropologist will be taking part in a communicative exchange intended to design a novel program for his particular purpose. Here enters the second point: the anthropologist's primary responsibility is to his data and its analysis, and his primary difficulty is likely always to be to make the nature of the data and analysis adequately clear and precise. Mathematics, statistics, and logic may help, and certainly those anthropologists attracted to the computer are more likely than not to know something of them. Use of the computer in American anthropology also is likely to increase as the next generation, having advantage of new approaches in the teaching of mathematical thinking, reach professional careers. The central thing, however, is the ability to explicate one's own processes of analysis, whether the result is couched mathematically or not. Here indeed we find that the computer, far from displacing the human mind, places even greater demands upon it. If there is one theme pervading the conference and the prospect for the computer in anthropology, it is that the chief problems are not technological, but intellectual. The potentialities of machines that exist or can be built are much greater than our capacity to tell them what we want them to do. Some may find the intellectual demand of computer processing for precise detail at every point alien to their conception of anthropology, or at least personally uncongenial. Some may see in this intellectual demand, rather than in the facts of machinery and staff research, the real if subtle ulterior menace. 
It is part of the great tradition of American anthropology to recognize two complementary modes of approach, one aesthetic or appreciative, one analytic (one need only name Boas, Kroeber, Sapir and Redfield), and a threat may be felt to the place of the former. Or opposition may go further, to the standpoint that "we murder to dissect" and hence, if dissecting, have murdered; to Sapir's counsel that we cannot afford to make too much conscious; or to some expression of the view that the essence of reality is forever beyond our categories. I subscribe to the view myself, much as expressed by Suzuki in an account of the Buddhist concept of "emptiness", or "suchness", śūnyatā, e.g.,


The more thoroughly "logicized", the more thoroughly is śūnyatā destroyed. The proper way to study śūnyatā is to experience it, to become aware of it, in the only way śūnyatā can be approached (Barrett, 1956, p. 262). I would only note two things. First, in a thorough conception, such as that of Suzuki, reasoning itself is recognized as participating in the essence of reality (here, śūnyatā), and as "quite efficient in dealing with things of this world of relativities" (which is where computer processing goes on). It is only "when we want to get down into the very bedrock of reality, which is śūnyatā" (and to which computer processing makes no claim) that "we must appeal to another method" (Barrett, 1956, p. 262). (As a matter of fact, debugging a computer program and wrestling with a koan are not altogether dissimilar. For a more serious resolution of the relation between reasoning and the transcendence of reasoning, see Lévi-Strauss, 1955, pp. 372-373.) Second, it is hard to believe that an anthropologist seriously committed to knowledge of a problem or subject-matter can long refuse an instrument that enables him to know more. Nor is it easy to believe, if the commitment to anthropology is serious and thoughtfully based, that there can be a genuine fear that the dual richness of anthropology, humanistic and scientific, is endangered, either in one's own person or the field at large. When we discuss the use of computers, we are discussing that roomy area referred to by Lévi-Strauss in this book's epigraph as lying between the personal relation with which anthropology begins and the personal experience with which it ends. In another sense, it is a middle zone, linking the humanistic and scientific aspects of experience, one which largely constitutes the distinctive heritage of anthropology. 
The methodological import is especially clear in the career of Kroeber, whose work stands as an example of the attitude that should characterize anthropology as it steers its course between partisans of a narrow view of either science or the humanities: Never reject significant data to maintain the purity of certain methods; never reject useful methods to maintain the purity of certain data. There are indeed proponents of science who would seem to prefer a stance of ignorance rather than deal with messy data, and proponents of the humanities who would seem to prefer not to know things about their subject-matter that would have to be found out by distasteful means, statistics, computers, or whatever. Anthropology, however, can hardly survive as a major field if it should adopt either position. If the computer is feared as alien or inimical to a humane intellectual outlook, it is an irony to realize that the exact opposite has proven true.


Far from inducing a mechanistic reduction in conceptions of human nature, computer processing, and its associated concept, the feedback loop, have aided a humanized expansion of psychological conceptions of the nature of man. Neisser (1962, pp. 57-58) describes the development in the following way: Recently, a new language has appeared in which cognitive processes can be described. This is the language of information processing, which was first developed to deal with problems in electronic communication, and has proven its value for the theory and technology of high-speed computers. It is based on the systematic exploitation of a rather simple principle: any description (or message) is itself a sequence of events. It can be considered in the context of the alternative events (statements, messages) that might otherwise have occurred, within the existing constraints. From this point of view, every statement is a choice, or a series of choices, among possibilities. These considerations lead first to a quantitative measure ... More important for the study of cognition is that this language permits a precise description of the transformations and condensations which information can undergo. ... However elaborate the information processing may be, it can be fully specified without any reference to the "hardware" of the computer that carries it out. A single program can be run on many machines, physically very different from one another. The sequence or pattern of processes remains the same, and can be rewardingly studied in its own right. It is thus possible to work with symbolic processes of great complexity in a relatively direct way, without falling back on any of the classical metaphors. As psychologists have come to realize that information and its vicissitudes are the subject matter of a real and flourishing branch of science ... the "higher mental processes" have been viewed in a new light. 
Galanter and Miller (1960) write to similar effect: "When we think of the Tote hierarchy [their central model of human planning behavior - D.H.H.] as a sequence of instructions for an organism to execute, it becomes relatively clear that we are talking about the organism in much the same language that we would use to talk about a digital computer" (p. 291). Underscoring the general point as to the humanizing effect, Galanter and Miller note in their critique of stimulus response theory, and its associated concept, the reflex arc, "Since the development of servomechanisms it must be obvious to everyone that there is nothing mentalistic, anthropomorphic, or occult about teleological machines" (p. 285) - and hence no grounds for ruling the purposive and planning aspects of human behavior out of science. (For a general theoretical discussion, admitting of no radical distinction between humanistic and mechanistic perspectives, see Northrop, 1949.) As for the effect of the computer, or the automation and organization of which it has become a symbol, on social values, I can say only that to


adopt a position equivalent to that of Luddites would be to betray the history and nature of anthropology. Avoidance for the sake of social values is tantamount to refusing to seek to sustain them, an elitism of withdrawal hardly better than the elitism many fear an automated society will breed. Within the social structure of anthropology itself, there would seem to be a prospect that use of the computer can have a democratic and decentralizing effect, in fact, in the face of the increasing complexity and bureaucracy that sheer growth of numbers will bring. Partly the effect can be through facilitating dissemination of knowledge (cf. Boehm, 1963), partly through the fact that any initial centralization of computer processing can lead to greater effective decentralization, accessibility to enriched stores of data, since the ease of reproducing the punched cards used by computers makes feasible efficient exchange. Within the social structure of the population at large, the effects of computer use can best be studied and humanely channeled, perhaps, by men who understand rather than fear them. Within anthropology and without, the relevant question is not whether, but how, computers will be used. (For recent comments on the computer and social policy, cf. Cowan, 1963, pp. 1070, 1074; Howe, 1962; Michael, 1962; Michael et al., 1963, among many.) I must turn now to consider the use of computers in anthropology as not only a choice confronting individual anthropologists, or as even a matter internal to the field as a whole, but as also a question of the place of anthropology among other disciplines. No Utopian vision or promises need be entertained to be sure that a new tool with the capacity of the computer will, as it in fact already has begun to do, change radically the face of the various disciplines concerned with man. Scholars in the traditionally humanistic fields of literature seem already to be doing more with their texts than we with ours. 
A good deal of attention is coming to be paid to the import of the computer for the study and practice of law, and for decision making as a general problem (Jones, 1962; Cowan, 1963). On a front affecting archaeology, attention is coming to be paid to computer processing by geologists (Krumbein, 1962) in terms quite analogous to situations in anthropology.³

³ Krumbein notes, for example, that: "It is significant in the growth of geology as a science that increased use of quantification and computer techniques commonly directs ... attention back to the field [my emphasis - D.H.H.] ... from which ... data come. The occurrence of unexpected deviations or seeming inconsistencies in the analyzed data may suggest that some features merely recorded in passing need to be examined more carefully as an important part of the larger problem." And the prospect Krumbein entertains has clear analogy: "Perhaps it is too early to suggest that the advent of the computer, with its capability for processing qualitative as well as quantitative data, may pave the way for broadened use of the method of multiple working hypotheses (a method for qualitative evaluation and control of multivariate phenomena set forth by T. C. Chamberlin in 1897), this time on an even more comprehensive basis, by means of formal models adapted to a wide variety of ... problems. In this framework the computer becomes an essential part in a sequence of acquisition, storage, retrieval, and analysis that makes possible the assimilation into geology of the continually increasing flood of observational and experimental data. The presently dominantly empirical aspect of much data analysis ... is not disturbing in a science where much effort, both qualitative and quantitative, must still be directed toward a search for controls and responses in a web of intricately interlocked data. Out of these methods will arise an understanding of functional relationships that can be used in developing more analytical models that increasingly reflect the "real-life" world of ... phenomena." [I have deleted the manifest references to geology in order to emphasize the analogy to anthropology - D.H.H.].

In the life sciences use of the computer is already well under way. A conference, for example, was held some three or four years ago at the Massachusetts Institute of Technology to stimulate use of computers in biology; a major cooperative center for the use of computers in the biological sciences is being planned in New England under the direction of Walter Rosenblith. Developments in the life sciences have already affected physical anthropology, primarily in the application of statistical and mathematical programs (enough indeed to seem to some a fashion about to run its course (Hunt, 1963, p. 22, arguing in effect for return to the primacy of theoretical analysis immanent to the field over reliance on externally obtained numerical devices: "now that such intricate mathematical models are so easy to apply and so difficult to interpret intuitively, perhaps disillusionment will set in")). Turning to the behavioral science fields, sociologists and psychologists are exploring the implications of models and theories with the aid of the computer in ways that are likely to continue to affect the assumptions current about human behavior and the nature of the human animal (e.g., Feigenbaum, 1962; Newell, Shaw, Simon, 1962; Tomkins and Messick, 1963; and cf. the report of the Committee on Simulation of Cognitive Processes, 1963). As for cross-cultural comparative studies of a sociological order, note Walter Goldschmidt (1962): "The research here reported would have been all but impossible without modern electronic computer technology." The most extensive, productive cross-cultural study of which I know is now being conducted by a social psychologist, Charles Osgood, at the Institute for Communications Research at the University of Illinois. The project is remarkable for the amount of theoretically relevant cross-cultural data efficiently obtained, and for the way in which its conduct contributes to the international development of social research. Two other features also stand out: it is precisely the sort of enterprise one would expect (like that of Gouldner and Peterson) to be the province of anthropologists, and it is feasible only because of the existence of the computer for processing the large masses of data. (Other applications of the computer in the human sciences can be found discussed in Proceedings, 1962, and Borko, 1962.) The same general picture, that of the development of computer use in fields adjacent to anthropology (but not especially in anthropology), emerges if one inspects the record of an active center. In its report for July 1961-June 1962 the Computer Center at the University of California, Berkeley, listed 75 projects under Psychology; 4 under Sociology; 3 under Linguistics; 2 under Political Science; 1 under French; and 0 under Anthropology. Such a survey of related disciplines, although cursory and impressionistic, is supported by consideration of a sheerly economic factor. It has been considered not unreasonable to guess that in the United States the government will invest in the use of computers in the next three years a sum of the same order of magnitude as the total present budget of the National Science Foundation. To put the prospect facing anthropology in another way, let us compare, as have others, the use of the computer to the use of the telescope and microscope in their effect on the earlier history of knowledge. The telescope and microscope of course did not of themselves work revolutions, but came into being in response to needs and conditions of their times. Once available, however, they became indispensable adjuncts of new ways of seeing the universe, especially in the physical and life sciences, but ultimately also for all men. 
So far as the physical and life sciences today are concerned, the computer has become equally indispensable (and may affect anthropology through the connections between physical anthropology and the life sciences, archaeology and the natural sciences). The impact of the computer on the human sciences, however, is likely proportionately to be far more revolutionary in the long run. Partly this is because, "There is reason to suppose that the computer is the same sort of breakthrough instrument that must be used if it can be used" (Cowan, 1963, 1071, on whom the preceding remarks draw). Our society, and the sciences adjacent to anthropology, are such that it seems inevitable that the computer will be used extensively, willy-nilly, so that the choice is only whether in the immediate future the computer will be used well or ill. Some of this has to do simply with willingness to take advantage of an opportunity, or predisposition through already extensive use of processes,


especially statistical, facilitated by the computer. More, perhaps, has to do with what the computer, in a sense like the telescope and microscope, can enable us to see. In simplest terms, computer processing, properly prepared, can enable us to see relations and patterns in masses of data previously too large to comprehend; and to see the literal consequence of an idea applied to data, if not uniquely, then certainly far more inexorably and quickly. What if anthropology should leave these practical and intellectual opportunities to others? Can it not continue to concentrate on its traditional tasks? I fear that the answer must be in the negative, if the implication is that anthropology can so continue and still retain its present status as a peer among the human sciences. Other changes than the development of the computer pose for anthropology an entirely novel competitive situation. Let me briefly review them. If two themes can be said to have sustained the importance and the image of cultural anthropology during the generation just past, at least in the United States, they are the themes of cultural relativity (as opposed to ethnocentrism) and of field work in distant or difficult places (as opposed to library work at home). In both respects the disciplinary environment has changed drastically. Regarding the former, hardly any self-respecting scholar in any discipline is ethnocentric in the sense of the stereotypes and misconceptions anthropology has combatted. Indeed, our student audiences are so versed from so many sources in the facts of difference and variation about the world that they need and want much more to be shown (without euphemism and superficiality), respects in which peoples and cultures have a common ground. 
A world-wide conception and interest has triumphed to the point that the problem often is not for the concerned individual to break out of a local shell, so much as to find some principle of selectivity of response to permit the growing of one. Regarding field work, it becomes increasingly the case that the friend or colleague who has just returned from a stay in an African, Asiatic or Latin American country is as likely to be a political scientist, psychologist, sociologist, or other representative of some part of the human sciences, as to be an anthropologist. In sum, two main rationales for the anthropological discipline, a universal perspective of sympathetic tolerance, and first-hand knowledge of remote peoples, have become almost common coin. Add to this the rate of the disappearance of the peoples, or their ways of life, which have formed traditional anthropological subject matter, to become accessible


only through documents and often enough shared property with historians; and the extent to which the development of general theory in the behavioral sciences is vested in sociologists and psychologists; and there would seem to be some cause for alarm. The surge of popularity for anthropology as a subject in American colleges, and for anthropological books, should not be mistaken for a necessary sign of health and vigor in anthropological research and thought. The popularity is most likely a response in the present generation of students and public to the causes which made anthropology vital in the past generation. Such popularity is an opportunity to attract good minds, but something more is needed to hold them. To detail that something more would involve a good deal of history and speculation, and more of a personal vision of anthropology than is appropriate here. I can and must say, however, that I believe the computer to be an opportunity, not a threat, within the situation in which anthropology now finds itself. For the answer to the situation, however one specifies the details, must in general terms be a heightening of the quality of work. That heightening, it seems clear, must entail increased attention by anthropologists to two things: the logic and practice of quantitative and qualitative analysis, and the forms of cooperation and integration needed to make our stores of data systematic, comparable, accessible to each other and to theory. These demands, for formalization of analysis, and exchange of data (as this introduction has repeatedly stressed) are precisely the demands made by efficient use of the computer. The story of the computer in anthropology will be the story of how these two demands are met.

REFERENCES

Barrett, William (ed.), Zen Buddhism. Selected writings of D. T. Suzuki (Garden City, New York, Doubleday Anchor Books, 1956).
Boehm, Eric H., "Dissemination of knowledge in the humanities and social sciences", ACLS (American Council of Learned Societies) Newsletter, 14 (5) (New York, 1963), 3-12.
Borko, Harold C. (ed.), Computer applications in the behavioral sciences (Englewood Cliffs, N. J., Prentice-Hall, 1962).
Committee on Simulation of Cognitive Processes, Report. Social Science Research Council Annual Report 1961-1962 (New York, 1963), pp. 56-57.
Computer Center, Project abstracts 1961-1962 (Berkeley, University of California, Computer Center, 1962).
Cowan, Thomas A., "Decision theory in law, science, and technology", Science, 42 (1963), 1065-1075.
Feigenbaum, Edward A., "An experimental course in simulation of cognitive processes", Behavioral Science, 1962, 244-245.
Galanter, Eugene, and Miller, George A., "Some comments on stochastic models and psychological theories", in Arrow, Kenneth J.; Karlin, Samuel; and Suppes, Patrick (eds.), Mathematical methods in the social sciences 1959 (Stanford, Stanford University Press, 1960), pp. 277-297.
Goldschmidt, Walter R., "Foreword", in Gouldner, Alvin W. and Peterson, Richard A., Technology and the moral order (Indianapolis, Bobbs-Merrill, 1962).
Howe, Irving, "Cybernation: the trauma that awaits us", Dissent, 9 (1962), 107-110.
Hunt, E. E., Jr., Comment on J. Brozek, "Quantitative description of body composition: physical anthropology's fourth dimension", Current Anthropology, 4 (1963), 22.
Jones, Edgar A., Jr. (ed.), Law and electronics: the challenge of a new era (Albany, San Francisco, New York, Matthew Bender and Co., 1962).
Kroeber, A. L., "Statistics, Indo-European and taxonomy", Language, 36 (1960), 1-21.
Krumbein, C. W., "The computer and geology", Science, 136 (1962), 1087-1092.
Lévi-Strauss, Claude, Tristes tropiques (Paris, Librairie Plon, 1955).
Metzger, Duane, Review of H. P. Edmundson (ed.), Proceedings of the National symposium on machine translation held at the University of California, Los Angeles, February 2-5, 1960, in American Anthropologist, 65 (1963), 755-757.
Michael, Donald M., Cybernation: the silent conquest (Santa Barbara, Cal., Center for the Study of Democratic Institutions, 1962).
Michael, Donald M.; Johnson, David L.; Kobler, Arthur L., "Computers and human values", Science, 139 (1963), 1231-1234.
Neisser, Ulric, "Culture and cognitive discontinuity", in Gladwin, Thomas and Sturtevant, William C. (eds.), Anthropology and human behavior (Washington, D.C., Anthropological Society of Washington, 1962), pp. 54-71.
Newell, Allen; Shaw, J. C.; Simon, Herbert A., "The process of creative thinking", in Gruber, Howard E.; Terrell, Glenn; Wertheimer, Michael (eds.), Contemporary approaches to creative thinking (New York, Atherton Press, 1962), Ch. 3.
Northrop, F. S. C., "Ideological man in his relation to scientifically known natural man", in Northrop (ed.), Ideological differences and world order (New Haven, Yale University Press, 1949), Ch. 19.
Proceedings of a Harvard symposium on digital computers and their applications (Cambridge, Harvard University Press, 1962).
Swadesh, Morris, Application del equipo electromecanico a la comparicion linguistica (Mexico, D. F., Centro de Calculo Electronico, Universidad Nacional Autonoma de Mexico, 1963). (Mimeographed.)
Tomkins, Silvan S., and Messick, Samuel (eds.), Computer simulation of personality. Frontier of psychological theory (New York, John Wiley, 1963).

PART ONE
STRUCTURE OF THE COMPUTER AND COMPUTER USE

I. COMPUTER STRUCTURE

AN ANTHROPOLOGIST'S INTRODUCTION TO THE COMPUTER*

SYDNEY M. LAMB AND A. KIMBALL ROMNEY

1. Introductory. 1.1. Availability of Computers. 1.2. Examples of Anthropological Application. 1.2.1. Simulation of marriage rules. 1.2.2. Ordering, chronological and other. 1.2.3. Multiple uses of text material. 1.2.4. A hypothetical marriage problem. 2. Fundamentals of Computer Structure. 3. The Coding and Input of Verbal Data. 4. Registers. 5. The Program. 6. Types of Machine Words. 7. Instructions and Operations. 8. Loops. 9. Survey of Important Operations. 10. Additional Uses of Index Registers. 11. Programming Aids. 12. Conclusions.

* This work was supported in part by the National Science Foundation and by the National Institute of Mental Health (M-3937).

1. INTRODUCTORY

The symposium on "The Use of Computers in Anthropology" was organized primarily to evaluate recent developments in data processing technology in relation to the field of anthropology. This book may serve as a means of informing the anthropologist of the types of problems to which the computer and related devices may be applicable, and of introducing him to the computer and its capabilities in a manner that will facilitate the use of such devices in solving problems on which he is currently working.

The present paper consists of two sections. In the introductory section, the availability of computers and how the anthropologist can best obtain working access to them are discussed, together with some illustrations of programs and results of anthropological interest. In the second section, the structure of digital computers is described, and their operation is explained with actual examples of various processes. Further examples of computer processes and operations are found in several of the succeeding papers (Garvin, Wisbey, Sebeok, Piault, Gardin II, Needham, Ihm, Gullahorn and Gullahorn).

1.1. Availability of Computers

The era of automatically sequenced digital computers began some fifteen years ago and has revolutionized many aspects of university research. Many problems which formerly had to be neglected because of the sheer magnitude of the task may now be handled routinely. In addition, a whole new set of problems has been conceived and is currently under investigation. Inverarity discusses these developments in his paper in the present volume.

Scholars from many different fields have been attracted to the use of the computer in their research. For example, in one university a scholar in classics has used the computer in the analysis of Greek poetry; in another university a historian has examined hypotheses about old shipping records. As would be expected, business schools, medical schools, and schools of education are major computer users today. Even so, the use of computers is in its infancy, and the end of its mushrooming is nowhere in sight.

Most anthropologists have shied away from the use of computers, in part because of the impression that access to them is exceedingly difficult and that their use requires facilities and knowledge beyond the capability of most anthropology departments. A brief review of the availability of facilities provided by university data processing centers will dispel the mistaken notion that computer facilities are beyond reach for anthropological research.

The May 1962 issue of Datamation presents a compilation (entitled Computing in the University) based on two recent surveys, one conducted for the American Mathematical Society, the other by the Ohio State Research Center. The results show that 167 campuses in the United States had computing facilities either installed or on order, many of them with more than one computer. The total number of machines involved was 298. The figures grow so rapidly that they are undoubtedly significantly larger by now. This means that there are considerably more universities which have adequate computer facilities than there are universities which have departments of anthropology offering advanced degrees.

Many of the campus computers are of the small- and medium-scale varieties, but an appreciable number of institutions now have large-scale machines of the type exemplified by the CDC 1604 and the IBM 709 and 7090.

An interesting feature of the university picture is the remarkable extent to which IBM outranks its competitors in number of installations. Of the 298 campus computers reported to be installed or on order, 154 were made by IBM. In other words, IBM is responsible for as many as all other manufacturers combined.
This domination is actually of lesser extent than that found in the commercial world, where it is variously reported that IBM has as much as 70% to 80% of the computer market.¹

¹ An article in Business Week, Feb. 2, 1963, states that "IBM has installed more than three-fourths of the computers in the world - an estimated 13,000 to 14,000 - or more than ten times the tally of its nearest competitor, Univac Division of Sperry Rand Corp."

The most popular of the large-scale computers in American universities are the CDC 1604 and the IBM 709 and 7090. The latter two are practically identical except in speed. (The 7090 is transistorized and therefore about five times as fast as the 709.) Institutions reported as having a 709 or 7090 installed or on order at the time of the surveys included Harvard, Indiana, Johns Hopkins, Michigan, MIT, New York University, Northwestern, Ohio State, Princeton, Stanford, Texas A & M, UCLA, Washington, Washington State, and Yale. By now the list is longer. For example, the Berkeley campus of the University of California, which had an IBM 704 at the time of these surveys, acquired a 7090 in the summer of 1962. The University of Toronto also has a 7090. (Canadian universities were not included in the surveys.) The CDC 1604, which is similar to the 7090, was reported present or on order at the University of California at San Diego, Michigan State, Minnesota, Cornell, New York University, Texas, and Wisconsin.

Although computers are not as widespread among European universities as in the United States, they are becoming available to an increasing extent, and are used for research of anthropological interest at a number of institutions, such as the Laboratoire de Psychologie Sociale of the University of Paris, and the Centre National de la Recherche Scientifique (Paris); the Laboratoire d'Analyse Lexicologique of the Faculté des Lettres et Sciences Humaines, Université de Besançon; the Centro per l'Automazione dell'Analisi Letteraria, of the Pontifical Faculty of Philosophy, Aloisianum, Gallarate (Varese); the Centro di Cibernetica e di Attività Linguistiche, of the Università degli Studi di Milano; the Gmelin Institute Documentation Center in Frankfurt am Main; the Cambridge Language Research Unit in England; and various activities of the section on Recherches et Enseignement of the Communauté Européenne de l'Énergie Atomique (Euratom), including its support of research at some of the centers listed. In Mexico, one can call attention to the Centro de Cálculo Electrónico of the Universidad Nacional Autónoma de México.

The pattern of computer usage within a university is most frequently set by the financial structure of the computing center.
Many universities have free computer usage for all faculty and staff, while other schools charge for machine time and staff service. An important factor for many university computing centers has been the very helpful financial support provided by the National Science Foundation. The Report on a Conference of University Computing Center Directors makes the following recommendations on financing:

A university or college should be able to support the basic operating costs of its computing center from its operating funds, though it may need special aid for the initial cost. Under certain circumstances, it may be advisable to sell "second shift" time to research projects (or nearby industrial organizations) which can afford to pay; however, this sold time should never be more than half the available computing time, and it must never interfere with the educational and unsponsored research activities of the center.²

Whatever the local university policy for financing may be, the computer has become a necessary part of research. Even where the price of the machine is high, the cost for work accomplished is low. In fact, a research person with large amounts of data can hardly afford not to use the computing equipment. The power and speed of the machine take much clerical and mechanical work away from the researcher, leaving him relatively free to improve his research design and therefore his theoretical contribution. A. E. Beaton (1961), in a survey of university data processing centers, has said, "It is surely more extravagant and more costly not to take advantage of the computing machine" (p. 246). Beaton summarizes the operation of the university centers and their relationship to the researcher as follows:

Most computing centers today are run on an "open shop" basis. An "open shop" is a center which requires a researcher to do his own programming, coding, and possibly even his own machine operating. A totally "closed shop" would be one in which all the programming and machine operating was done by the center's own professional group. Anyone who has programmed realizes the advantages of encouraging the scholar to program. There is no better way to have him see the power of the machine and to open his eyes to new research approaches. On the other hand, is it purposeful to have every scholar in a university spend large amounts of time in debugging computer programs? This author believes that while it is certainly important to encourage researchers to program for themselves, there should be a staff in the university for programming for those who have need of it.

Optimum usage of a computing center requires some advanced planning.
A research person should feel free to approach the center, preferably before the data are collected, for information about center policy. A center may have preferred methods of coding experimental results which must be known in advance. Since programming is a slow, tedious process that can easily take weeks or months depending on the problem, it may be possible to have the program prepared while the data are collected. Furthermore, ambiguities in research design and superfluous data collection can often be remedied by advance planning, and costs and long delays may be reduced (Beaton 1961, 246-247).

(Cf. discussion of "The University Computing Center", by Wrigley, 1962.)

² American Mathematical Society, Report on a Conference of University Computing Center Directors. A Report to the National Science Foundation, August, 1960.

Some universities which have large computers provide free use of them to faculty members of other academic institutions in their geographic areas, as long as time is available. This means that most anthropologists in the United States are potentially in a position to make use of large-scale computers.

One reason that computers are not yet being widely used in anthropology is that their presence on the academic scene is so recent a phenomenon that anthropologists, for the most part, have not had the opportunity to become acquainted with them. Moreover, the process of becoming acquainted has been impeded by various prevalent misconceptions regarding the range of applicability of the computer, the difficulty of learning how to use one, etc.

Perhaps the most widespread misconception is that the computer is primarily, if not entirely, a glorified mathematical calculator. This misconception is widespread even among mathematicians who use computers. Since they use the machines only for purposes of numerical calculation, many of them have remained unaware of the numerous nonnumeric applications which are possible. Related to this misconception is the one that it is necessary to know a great deal of mathematics in order to know how to make use of a computer. This notion is correct only if one intends to use the computer for work in advanced mathematics. By the same token, if one wants to use the computer for anthropological work, the most important thing he needs to know is anthropology, and the mathematical prerequisites involve little more than a smattering of arithmetic. Later on in this paper (Table 4) there is a survey of 709-7090 operations which are of particular importance for applications in anthropology and linguistics, and these are primarily non-arithmetic in nature.

Most anthropologists will not become computer programmers, nor is there any need for a researcher to be a trained programmer in order to utilize computers in processing his data. For the anthropologist interested in exploring the use of computers, our strongest and most emphatic recommendation is to seek advice from knowledgeable personnel in his university.
Most anthropologists cannot be expected to have intimate familiarity with the machines, but they must realize that if data processing and statistical analysis are involved, competent advice must be obtained from the start. The computer follows a set of specifically prescribed arithmetical and logical steps. The expectation that the computer can, in some mysterious way, make sense out of a pile of data is not uncommon, but it is certainly unwarranted.

A second recommendation is related to the first. When seeking advice, one should have one's problem stated in as precise a manner as possible. The value of the outcome is almost totally dependent on the quality of the anthropological know-how that goes into the formulation of the problem.


The machine is not creative, and the programmer asks of the machine only those questions that have been explicitly stated. Thus, for example, one would not approach the programmer for advice on a general topic such as, "How do I analyze kinship by machine?" Before the programmer could be of aid, he would have to see the problem broken into small, discrete steps with explicit directions as to what is desired at each stage. (Cf. 1.2.4. below, and Garvin 2.1., Gardin II, 3, 4.)

1.2. Examples of Anthropological Application

A review of a few selected applications of computers to anthropological studies will serve to exemplify the above recommendations. The first example represents the product of collaboration between a social anthropologist (Hammel) and a statistician-programmer (Gilbert).

1.2.1. Simulation of marriage rules

One of the potentialities of the computer lies in the possibility of its simulating the behavior of populations according to particular rules. In one recent application, Eugene Hammel of Berkeley and John Gilbert of the Center for Advanced Study in the Behavioral Sciences have simulated marriage behavior in a population through twenty generations (personal communication). Assuming random mating patterns, they were interested in determining the percentage of parallel patrilateral cousin marriage that would occur in a small society with specified residence rules. In effect, they were asking whether the percentage of parallel patrilateral cousin marriage empirically observed in certain Arab societies could be accounted for by demographic factors without invoking a preferential marriage rule phrased in kinship idiom. In one problem they began with a population of forty couples distributed equally in four "villages". The rules governing the operation of the model are as follows:

1. Descent is traced bilaterally (within limits noted below) although it may be phrased patrilineally if desired.
2. "Villages" have a preference for endogamy, i.e., a man will marry outside his village only if there are no marriageable women available in his own village.
3. Postmarital residence is patrilocal, i.e., in the village of the husband.
4. Marriages are monogamous.
5. Married couples are assigned children according to a Poisson distribution (with a mean of 2.5 and truncated at 10) on a random basis.
6. The sex of children is assigned on a random basis according to a binomial distribution with p of .5.

Operating with the above rules, the 7090 simulated twenty generations of marriage and computed the relationship of every married pair at each generation. To begin the simulation, four villages with ten couples each are set up such that no individual is related to any other. Each couple is then given children according to rules 5 and 6 above, i.e., each couple has a family.

The procedure for marriage is as follows. For each village, two lists are formed, one of eldest marriageable males and one of eldest marriageable females. A village is selected at random. Within that village, two random selections are made in sequence, one from the list of eldest marriageable males and one from that of females. In the model, this pair is married, forming a new couple. If there is no eligible male child in the village, a different village is chosen (and the village dies out since residence is patrilocal). If there is no eligible female child to pair with a chosen male child, a second village is selected at random and searched for a female. As soon as an individual is married, the next oldest sibling of the same sex in that family takes his or her place on the list of those available for marriage. Thus, only one child of a family can get married at a time. These procedures are repeated until all available individuals are married.

These procedures simulate one generation of marriage and residence behavior that follows the rules enumerated earlier. To follow the simulated population through further generations requires only that the procedures be repeated, utilizing the generated set of couples in place of the original population or an earlier generation, as the case may be. An important feature of the program is its ability to keep track of relationships among individuals. (To save storage space only selected "best" relationships are "remembered" by the computer.)
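The starting state and rules 5 and 6 can be illustrated with a short modern sketch in Python. It is not the Hammel and Gilbert program, which is not reproduced in this paper; the function names are hypothetical, and reading "truncated at 10" as "redraw any count above 10" is an assumption. The marriage procedure itself is left out.

```python
import math
import random

MEAN_CHILDREN = 2.5  # rule 5: mean of the Poisson distribution
MAX_CHILDREN = 10    # rule 5: truncation point


def truncated_poisson(rng, mean=MEAN_CHILDREN, cap=MAX_CHILDREN):
    """Draw a Poisson count, redrawing whenever it exceeds cap (assumed reading)."""
    limit = math.exp(-mean)
    while True:
        # Knuth's method: multiply uniform draws until the product falls to limit.
        count, product = 0, rng.random()
        while product > limit:
            count += 1
            product *= rng.random()
        if count <= cap:
            return count


def assign_children(rng):
    """Rules 5 and 6: a number of children, then a sex for each with p = .5."""
    return ["M" if rng.random() < 0.5 else "F"
            for _ in range(truncated_poisson(rng))]


def founding_population(rng, villages=4, couples_per_village=10):
    """Four villages of ten unrelated couples, each couple given a family."""
    return {village: [assign_children(rng) for _ in range(couples_per_village)]
            for village in range(villages)}


population = founding_population(random.Random(1965))
print(sum(len(family) for families in population.values() for family in families))
```

The printed figure is the number of children in the first generation for one arbitrary seed; with forty couples it should usually fall near 100, since the mean family size is 2.5.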
Note that the kinship relationship between spouses does not explicitly enter into the procedure except that brother-sister or cross-generational matings are taboo. Keeping track of the relationship between individuals enables the program to compute the percentage of patrilateral parallel cousin marriages that are a "natural" reflex of the residence and territorial rules.

Operating with the above rules, the machine simulated twenty generations of marriages and computed the relationship between married pairs. In this particular simulation, one village died out while the three remaining villages tripled in size (the increase in population is accounted for by the fact that each couple was given an average of 2.5 rather than 2 children). The percentage of parallel patrilateral cousin marriage varied between 3 and 8 percent but had not reached a stable asymptote. Total machine time on the 7090 was less than two minutes.

Hammel and Gilbert are planning a series of simulations as variations on the original basic program, which did demonstrate the practicality of population simulation (see Appendix B). By regularly varying the rules and procedures, the program can simulate populations with a variety of parameters. For example, the Poisson distribution can be altered so that the population increases, decreases, cycles, or remains stable. The number of villages can be increased or decreased. Patrilocal rules of residence could be relaxed or changed. Polygyny can be introduced.

Simulation of this type offers many possibilities in the study of kinship and social organization. The ability to study variations in one part of a system as a reflex of variation in other parts of the system has long been a goal in social anthropology. Simulation is one ideal tool for such studies. Hammel and Gilbert are exploring other problems, including the expected rate of lineage extinction given unilineal descent under various combinations of population parameters. These parameters include population size, rate of population growth, and depth of generational reckoning.

1.2.2. Ordering, chronological and other

Another general problem facing anthropologists is the ordering of a series of tribes, artifacts, etc., in terms of some measures of similarity. For example, in a stratigraphic study, archeologists might want to order various strata in terms of overall similarity among strata in order to make chronological and cultural inferences. Another example would be the ordering of a series of tribes in the plains area in terms of similarities in Sun Dance elements for the purpose of inferring diffusion patterns (cf.
the paper by Driver on the history of such work, and his paper, with those of Needham, Ihm, and Milke, on present prospects).

In an interesting paper entitled "Chronological Ordering by Computers", the Aschers (1963) have devised a program that orders a series of objects (tribes, archeological stratigraphic levels, etc.) in terms of similarities on a set of chosen criteria (e.g., cultural traits, percentages of pot sherd types, etc.). In essence, their program arranges a matrix of coefficients of similarity so that the coefficients decrease as distance from the diagonal increases. They have applied their program to four sets of previously analyzed anthropological materials. The first was a set of archeological data used by Flanders (1960), in which the Ascher program processed a 9 by 9 coefficient matrix and reproduced Flanders' original order, which had been arrived at by other methods. They also replicated two examples from Driver (1956) in which the matrix analysis was used to determine the best serial order of patri-centered traits and of functional relationship averages. In a final test case, the program was applied to Robinson's (1951) archeological data. In this case, three stratigraphic levels in each of three different trenches were ordered chronologically. The Aschers' program corroborated Robinson's original inferences as to the proper temporal order of the deposits.

1.2.3. Multiple uses of text material

Another area of computer application that has been relatively unexplored is the use of computer-produced data for purposes beyond those in the original design. Thus, an anthropologist might be alert for opportunities to extract anthropological information from studies designed for other purposes. For example, in the course of most linguistic studies, it is necessary to store text materials on tape. It is sometimes feasible to extract relevant social and cultural data from such texts. (On text processing in general, see Lamb's paper.)

In a study carried out under the direction of Dr. David Hays at RAND and Duane Metzger at Stanford, some 50,000 running words of Tzeltal texts were phonemically transcribed and transferred to tape for analysis. The texts consisted of tape recordings of about 90 Tzeltal interviews concerning illness. In the course of linguistic analysis, the material was run through standard programs producing a dictionary, a glossary, and a concordance for the total text. (See, for example, C. H. Smith and T. W. Ziehe, Tzeltal text in the 7090 language-data processing system, The RAND Corporation, February 27, 1961.)

On several occasions, in carrying out social anthropological investigations, the results of the computer analysis proved to be of great aid. The concordance, for example, provided material for a study on terms of reference and address that covered not only kinship terms, but other role and status terms as well. The collection of a large number of such occurrences would have been virtually impossible to carry out by hand. By utilizing the concordance, several hundred such occurrences were isolated in a matter of hours. The concordance has proved to be especially useful in syntactic analysis of Tzeltal. Patterns of distribution are made immediately available for inspection.

In another study concerning the semantic implications of numeral classifiers in Tzeltal, it became necessary to check systematically on possible Tzeltal monosyllabic roots. In producing the concordance, one sub-routine of the computer program had generated all allowable monosyllabic combinations of phonemes for purposes of assigning root numbers to each morpheme in the text. This generated list provided a convenient check list that was used to elicit information from informants by rotating all possible morphemes through a systematic set of frames. Utilizing these procedures, it was possible to isolate 624 functioning numeral classifiers from the systematic list of 4410 theoretically possible forms.

1.2.4. A hypothetical marriage problem

As an example of the sort of problem one might take to a programmer for formulation, we give a hypothetical presentation in the following paragraphs. The actual process has to do with computing coefficients of inbreeding from genealogical records. The example is purposely not stated in machine language. It is meant only to demonstrate the suggested degree of explicitness in the statement of a problem, together with an illustration of how it might be broken into discrete steps. It is an example of what could be discussed with profit by an anthropologist and a computer programming expert.

We plan to study the marriage patterns of the Mik society in Southern Mexico. We are particularly interested in the amount of inbreeding produced by the marriage patterns of the Mik. There are about 8,000 of these people, and they are divided into twenty-one villages. People tend to marry mostly within their own village; so the villagers are endogamous. We will gather complete genealogical data on the group including all remembered ancestors. Our problem will be to search, using computers if practical, all these genealogies for all marriages between consanguineal relatives, i.e., marriages between blood relatives. We will want to keep track of all relationships and the degree or distance of relationship.
In order to illustrate the problem, we have worked out a sample problem together with a possible approach to its solution. The problem comes from Haldane and Moshinsky (1939). They present the genealogy or pedigree in Fig. 1. The problem is to compute the coefficient of inbreeding between female m and male N. The formula for f, the coefficient of autosomal inbreeding, is $f = \frac{1}{2} \sum_{r} 2^{-m_r}$, where $m_r$ is the number of steps in the $r$-th path of blood relationships linking m and N, and $r$ ranges over all such paths.


Haldane and Moshinsky (pp. 321-322) explain the situation as follows (using W for wife, m, and H for husband, N):

Fig. 1. Sample genealogy adapted from Haldane and Moshinsky.

We can express the relationship between two individuals W and H by the number m of steps in each path of relationship connecting them. Each path passes from W to H through a latest common ancestor, unless one is an ancestor of the other. A step is the relation of parent and offspring. Thus a parent and child are connected by a path of one step, a grandparent and grandchild or a half-brother and half-sister by a path of two steps, and so on. In general there are several paths of relationship. In human pedigrees these usually occur in pairs of equal paths, owing to the practice of monogamy. Thus an uncle and niece are connected by two paths of three steps each. Where there are several paths they may coincide to a greater or less extent. In particular, if the latest common ancestor is inbred, and is thus not a random sample of the population, an extra path (or paths) of relationship runs through him or her and the latest common ancestor of his or her parents. But since one generation of outbreeding wipes out the effects of inbreeding as measured by homozygosis, such extra paths only occur when the parents of the latest common ancestor are related, and not when his grandparents or other ancestors are related. We now have the following theorem: "If paths are specified as above, then if W and H are connected by paths of lengths $m_r, \ldots$ steps the probability that an a gamete of W will be fertilized by an a gamete of H is $p + fq$, where $f = \frac{1}{2} \sum_{r} 2^{-m_r}$. In particular, if $p = 0$, i.e., the gene is very rare, the probability is f." The proof follows.


In Figure 1, where m and N are connected by four paths, two passing through each of their latest common ancestors, we have $m_1 = m_2 = m_3 = m_4 = 6$. Thus $f = \frac{1}{2} \times 4 \times 2^{-6} = 2^{-5} = \frac{1}{32}$.

For machine processing, we need to reduce Figure 1 to some kind of linear code. Our suggestions may be outlined briefly. Assign each individual a unique code number for identification. In the example the letters constitute such a code, although in actual practice we would need to reserve space for longer code numbers. In the example, let capital letters indicate males and lower case letters indicate females. We suggest the following code for representing relationships between individuals:

= marriage
0 sibling link
- child link
+ parent link

Thus in the example, the relationship between i and J may be represented by the expression i + C 0 F - J, which may be read as father's brother's son. Two of the symbols are symmetrical or transitive while two are intransitive. Thus in Figure 1, i = J is the same as J = i, and C 0 F is the same as F 0 C, but C - i is the same as i + C, and i + C is the same as C - i. We may find the reciprocal of any expression by writing the elements in reverse order and changing each + to a - and each - to a +. Thus the relationship between J and i is the reciprocal of the relationship between i and J given above, and may be written as J + F 0 C - i.

In coding the data in Figure 1 in linear form, we need only use the child link, since the paths as defined in Haldane and Moshinsky use only + and - links, and information on + links may be derived from the - links. For convenience we can code only dyadic links. Figure 2 presents the data of Figure 1 in linear form.

A - C    e - J    G - L    A = b    L = k
A - F    F - J    h - L    C = d    N = m
A - h    i - m    L - N    F = e    b - C
C - i    J - m    k - N    G = h    b - F
d - i    m - o    N - o    J = i    b - h

Figure 2. A linear coding of data in Figure 1.
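The linear coding can be tried out in a few lines of modern Python. The sketch below is only an illustration under stated assumptions: the child links are those of Figure 2 as read here, the function names are hypothetical, and the search (list every ascending line of each person, then pair lines that end in the same ancestor and share no one else) is one possible way of finding the paths, not a procedure given by Haldane and Moshinsky. It ignores the extra paths that arise when a common ancestor is himself inbred.

```python
from fractions import Fraction
from itertools import product

# Child links of Figure 2, written (parent, child); capitals are males.
CHILD_LINKS = [
    ("A", "C"), ("A", "F"), ("A", "h"), ("C", "i"), ("d", "i"),
    ("e", "J"), ("F", "J"), ("i", "m"), ("J", "m"), ("m", "o"),
    ("G", "L"), ("h", "L"), ("L", "N"), ("k", "N"), ("N", "o"),
    ("b", "C"), ("b", "F"), ("b", "h"),
]


def reciprocal(expression):
    """Reverse the elements and interchange + and -."""
    swap = {"+": "-", "-": "+"}
    return " ".join(swap.get(tok, tok) for tok in reversed(expression.split()))


def parents_of(links):
    """Derive the + (parent) links from the - (child) links."""
    table = {}
    for parent, child in links:
        table.setdefault(child, []).append(parent)
    return table


def ascending_lines(person, parents):
    """Every chain person, parent, grandparent, ... of any length."""
    lines = [[person]]
    for parent in parents.get(person, []):
        lines += [[person] + rest for rest in ascending_lines(parent, parents)]
    return lines


def paths_between(x, y, parents):
    """Paths from x up to a common ancestor and down to y, no one repeated."""
    found = []
    for up, down in product(ascending_lines(x, parents),
                            ascending_lines(y, parents)):
        if up[-1] == down[-1] and not set(up[:-1]) & set(down[:-1]):
            found.append(up + down[-2::-1])
    return found


def inbreeding_coefficient(paths):
    """f = 1/2 times the sum over paths of 2 to the minus (number of steps)."""
    return sum(Fraction(1, 2 ** (len(path) - 1)) for path in paths) / 2


paths = paths_between("m", "N", parents_of(CHILD_LINKS))
print(len(paths), inbreeding_coefficient(paths))  # 4 1/32
print(reciprocal("i + C 0 F - J"))                # J + F 0 C - i
```

Exact fractions are used so that the result can be compared directly with the value 1/32 worked out by hand above.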


Figure 2 contains exactly the same information as Figure 1. The problem is to program a search procedure that will discover all possible paths between m and N, ignoring all affinal links. Since all such paths must go through a common ancestor, we suggest generating a list of all ancestors for m and N. One possible series of steps for the calculation of the inbreeding coefficient would be the following: For each Mi (male) and Fi (female) in the current population list Aij, his ancestors. A unit record is , where Xi is Mi or Fi, Si is Xi's spouse, and Aij is one of Xi's ancestors of the g-th generation. Sort records by Aij; sort all records with common Aij by putting Xi, Si in the order Mi, Fi and sorting on that. If Aij is a common ancestor of Xi, Si, then there will be two records sorted under Aij with identical Mi, Fi - one for the husband and his wife, the other for the wife and her husband. Call these Mi Fi and Mi1 Fi1. Form a single record:


Greek: (Greek shift plus) α a, β b, γ g, δ d, ε e, ζ z, η h, θ j, ι i, κ k, λ l, μ m, ν n, ξ x, ο o, π p, ρ r, σ ς s, τ t, υ u, φ f, χ c, ψ, ω

Punctuation: open paren close paren open or close quotes open or close brackets period comma semi-colon colon exclamation mark question mark / (slash) - (hyphen) — (dash; put space on both sides) asterisk (put space on both sides)

/ ?

.$ # # •

+

.$. *! —

*

Shifts (governing following characters): cap. letter cap. sequence to space or ) cap. sequence to ) (over a space) > > italic sequence to space or + + italic sequence to + (over a space) + * Greek letter 1 Latin letter to space or ) 1 Arabic number to space or ) superscript = subscript $ close cap. or Latin sequence (deshift) ) close italic sequence (deshift) + •

y q

+

Miscellaneous: =0 (degree sign) 1 * (prime sign) > (after vowel) % (percent sign) ( (after vowel) table, fig., or plate in text omitted 0 (after vowel) equation or formula in text omitted (= begin heading (to be followed by space) * 0

AN ANTHROPOLOGIST'S INTRODUCTION TO THE COMPUTER


( Table 2 continued)

Shift Sequences: Wherever two or more shifts go into effect simultaneously they are punched according to the following order from left to right: Order 1

2

3

any punctuation sign

$ (subscript) + (word = (superitalics) script) + + (long italics)

4

5

' (word Latin. (short cap) or Arabic), (word cap) * (Greek) , (long cap)

Deshift Sequences: Where two or more closing marks (deshift or punctuation) have effect simultaneously, they are punched in the following order, the reverse of that above: Order 1 ) (close cap or Latin sequence)

2

3

+ (close italics)

any punctuation sign

Mathematical and Chemical Symbols (to have a space on both sides when punched) *3 $ *7= > 1 *7 ± - > oo »1 *8 $= *2 χ (times) *9 —



way of describing the capacity of core storage would be to say that there are 36 times 2^15 (i.e. over one million) cores capable of storing that number of bits. Since there are 2^15 cells and each one has its distinctive address, exactly 15 bits are needed to designate the address of any cell. Of the 36 bit positions in a cell, the first is used to indicate the sign (plus or minus) whenever a number is stored. The state which is symbolized by the digit 0 represents a plus while 1 corresponds to minus. Because it has this function, the first bit of a word is usually called the sign bit. The remaining 35 bits are numbered consecutively 1-35 from left to right (see Figure 4). The sign bit position may also be designated

SYDNEY M. LAMB AND A. KIMBALL ROMNEY

Fig. 4. Diagram of a cell in core storage (bit positions 0 through 35).

by the number 0, and we shall follow this practice here, since we shall be concentrating on the non-arithmetic functions of the computer. In the arithmetic section there are two registers, known as the accumulator and the multiplier-quotient register (see Figure 5). The latter, which

Fig. 5. Registers in the arithmetic section: the accumulator (positions S, Q, P, 1-35) and the M.Q. register (positions 0-35).

is generally called the MQ for short, is used in multiplying and dividing; and the accumulator gets its name from the fact that it is used in addition. Both of these registers, however, have numerous non-arithmetic functions. The MQ register has 36 bit positions, which may be labeled 0-35 (or S and 1-35) as in a storage register. It may be thought of as being situated directly at the right of the accumulator. The latter has two extra bit positions. The bits in the accumulator are labeled S, Q, P, 1-35. (S stands for "sign", but Q and P apparently don't stand for anything.) When a machine word is taken from core storage and placed in the accumulator, its leftmost bit (i.e. that in position zero) may be placed either in the S position or the P position of the accumulator. In the latter case, the word is usually referred to as a logical word. Similarly, if a word in the accumulator is placed in a storage location, either the P-bit or the S-bit may be placed in the 0 bit position of the storage location, depending on the requirements of the specific situation. The reason for the two possibilities is, of course, that the S-bit and the P-bit are treated differently by the computer in some operations. For example, when the accumulator is being used for addition, its S-bit indicates the sign of the number in the accumulator. If as the result of an addition a magnitude is obtained which is too large to be contained in bit positions 1-35, then the overflow goes into the P-bit. Thus, if the number consisting of a 1 in the first bit position and zero in all the other positions (i.e. 2^34) is added to itself, then the result will be a 1 in the P position, all the rest zeros (i.e. 2^35). If more addition is done and there is further overflow, it goes into the Q position. Any additional overflow is lost. To take another example, it is possible and often very useful to shift the contents of the accumulator to the left or right. When a left shift is performed, bits shift into position P from position 1 and into Q from P, but the sign bit is unaffected. An accumulator right shift works in exactly the opposite way; that is, bits are shifted from Q to P and from P to 1, but the S-bit is unaffected.
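
The overflow and shifting behaviour just described can be traced in a few lines of Python. What follows is only a sketch under stated assumptions, not a model of the actual circuitry: the class name `Accumulator` and its methods are hypothetical, the positions Q, P, 1-35 are held as a single 37-bit integer with the sign kept separately, and addition is restricted to non-negative magnitudes.

```python
# Toy model of the accumulator described above (not a 7090 emulator).
# Assumption: the magnitude is one integer laid out as Q, P, 1..35 (37 bits);
# the sign bit S is stored separately and is never shifted.

WORD_BITS = 36             # bits in one storage cell
CELLS = 2 ** 15            # number of addressable cells
P_BIT = 1 << 35            # first overflow position
Q_BIT = 1 << 36            # second overflow position
ACC_MASK = (1 << 37) - 1   # anything carried beyond Q is lost


class Accumulator:
    def __init__(self, magnitude=0, negative=False):
        self.bits = magnitude & ACC_MASK
        self.negative = negative

    def add_magnitude(self, magnitude):
        """Add a non-negative magnitude; overflow spills into P, then Q."""
        self.bits = (self.bits + magnitude) & ACC_MASK

    def shift_left(self, places=1):
        """Bits move 1 -> P -> Q; bits leaving Q are lost; S is untouched."""
        self.bits = (self.bits << places) & ACC_MASK

    def shift_right(self, places=1):
        """Bits move Q -> P -> 1; S is untouched."""
        self.bits >>= places

    @property
    def p(self):
        return bool(self.bits & P_BIT)

    @property
    def q(self):
        return bool(self.bits & Q_BIT)


total_bits = WORD_BITS * CELLS   # 36 times 2**15 bits of core storage
acc = Accumulator(2 ** 34)       # a 1 in position 1, zeros elsewhere
acc.add_magnitude(2 ** 34)       # 2**34 + 2**34 = 2**35: a 1 in P, rest zeros
```

Keeping the sign outside the shifted integer is a design choice made here purely to mirror the statement that the S-bit is unaffected by accumulator shifts.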

5. THE PROGRAM

We now come to consideration of one of the most important properties of the digital computer, namely what is known as the stored-program feature. Computers are quite unlike machines of the type with which people are generally familiar with regard to the nature of the basic function which they perform. The ordinary machine is built to do a certain thing or perhaps a limited variety of related things. When it is turned on or when the start button is pushed, it does what it has been built to do. The operation of the digital computer is one step removed. The function which it has been built to perform is simply to follow instructions. What it does at any particular time, therefore, depends not so much on what it has been built to do as on what the instructions being executed tell it to do. In other words, it is an instruction-following machine. A complete list of instructions provided for the machine for the execution of a set of operations is called a program. The machine will do exactly and only what the program specifies, and aside from the simple individual operations which it performs, such as adding or transferring information from one place to another, it does nothing on its own. At the time it is in operation, the program is contained in core storage along with the data being operated on. The operation of the computer consists in taking an instruction from core storage, executing it, and then taking the next instruction, and so forth. In the 7090, the typical instruction is executed in 4.36 microseconds. (A microsecond is a millionth of a second.) Since the actual functioning of the machine is governed entirely by the program, it is capable of performing a limitless variety of tasks, and it can turn from one type of work to an entirely different kind immediately, provided a different set of instructions is furnished to it. No rewiring or any other physical manipulation is necessary to enable the computer to

turn from calculation of orbits of earth satellites to analysis of a Mayan text. All that is required is a new set of instructions. To say "all that is required" is perhaps to make the machine operations appear simpler than they actually are. Coupled with the enormous flexibility and freedom which one has in designing machine programs is the weighty responsibility of providing one's instructions for the machine in complete detail. Since the machine can do practically nothing on its own, it must be told by the program exactly how to do everything that is desired of it. Nothing can be left for the machine to take for granted, and every possible set of circumstances which might come up during the execution of a program must be provided for.

The most important of the operations performed by the machine are concerned in one way or another with material contained in one or more of the following three places: core storage, the arithmetic section, and one or more index registers. The program, or that portion of it which is being executed at a particular time, is present in core storage along with the data being operated upon. Except for their distinct locations and the information which happens to be contained in them at a given time, all cells in core storage are exactly alike. Therefore, as far as the machine is concerned, there is no difference between the program and the data except for the fact that the machine words which make up the program are being taken as instructions, while the others are not. This means, among other things, that instructions may be operated upon as if they were data. The opportunity is thus provided for modifying the program during its own execution. This is one of the properties of the computer which provide for great flexibility within the scope of a single program. And it is because of this feature that it is possible to construct game-playing programs which can improve themselves on the basis of experience.
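
The point that instructions may be operated upon as if they were data can be made concrete with a very small simulated machine. The sketch below is an illustration under stated assumptions and is not the 7090 instruction set: the seven operation names, the encoding of a word as `opcode * 1000 + address`, and the layout of the 33-cell memory are all hypothetical. The program adds three data cells by repeatedly altering the address part of its own second instruction.

```python
# Toy stored-program machine (hypothetical; not the 7090 instruction set).
# Every cell holds one integer. A cell counts as an instruction only because
# the machine happens to fetch it: word = opcode * 1000 + address.
HALT, LOAD, ADD, SUB, STORE, TRA, TZE = range(7)


def word(op, addr=0):
    return op * 1000 + addr


def run(memory, max_steps=10_000):
    acc, pc = 0, 0                      # accumulator and instruction counter
    for _ in range(max_steps):
        op, addr = divmod(memory[pc], 1000)
        pc += 1                         # default: next cell in sequence
        if op == HALT:
            return memory
        if op == LOAD:
            acc = memory[addr]
        elif op == ADD:
            acc += memory[addr]
        elif op == SUB:
            acc -= memory[addr]
        elif op == STORE:
            memory[addr] = acc
        elif op == TRA:                 # unconditional transfer
            pc = addr
        elif op == TZE:                 # transfer only if accumulator is zero
            if acc == 0:
                pc = addr
    raise RuntimeError("step limit reached")


memory = [0] * 33
memory[0:12] = [
    word(LOAD, 30),   # 0: take the running total
    word(ADD, 20),    # 1: add one data cell (this address gets modified)
    word(STORE, 30),  # 2: save the total
    word(LOAD, 1),    # 3: fetch instruction 1 as if it were data
    word(ADD, 31),    # 4: add the constant 1, so it points one cell further
    word(STORE, 1),   # 5: put the modified instruction back
    word(LOAD, 32),   # 6: take the count of cells still to be added
    word(SUB, 31),    # 7: subtract 1
    word(STORE, 32),  # 8: save the count
    word(TZE, 11),    # 9: leave the loop when the count reaches zero
    word(TRA, 0),     # 10: otherwise go round again
    word(HALT),       # 11: stop
]
memory[20:23] = [5, 7, 9]   # data cells
memory[31] = 1              # the constant 1
memory[32] = 3              # number of data cells
run(memory)
```

After the run, cell 30 holds the sum of the three data cells and cell 1 no longer contains the instruction that was originally loaded there.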
If the program or the material to be operated upon is very bulky, there may not be room for all of both to be kept in core storage at one time. Under such circumstances parts of either the program or the data which are not in use at a particular point in the execution of the program may be kept on magnetic tape until needed. When the proper time comes, the program can instruct the machine to read in, say, the next portion of the program from one magnetic tape and the body of data which it is to operate on from another, after which the machine can start executing the portion of the program just read in. Each individual instruction in a program constitutes one machine word, and it therefore occupies one cell when it has been placed in core storage. Whenever directions to the contrary are not given, instructions are


executed in sequence, control passing from one instruction to the one occupying the next storage location. However, some instructions, known as transfers, cause the machine to take its next instruction from a specified location instead of the following one in sequence. Most transfer operations are conditional; that is, the transfer is taken if a particular condition is present, but otherwise the machine passes to the next instruction in sequence, as usual. This feature allows the machine to follow different courses of action depending upon the nature of the results obtained in some portion of the analysis or computation. It also provides the means for performing a given set of operations over and over again, since transfer instructions can be provided which will send control back to a preceding instruction. The term program is almost interchangeable with the term routine, but some people distinguish the two on the basis of scope, a routine being less complete. A program covers an entire set of operations on a complete body of data from the input of the data and the program itself to the output of the final result. It may include several routines and subroutines. Subroutines are constructed for sequences of operations that must be used many times in many different programs. Computer installations generally have on file a large collection of subroutines which can be incorporated into specific programs where needed, thus saving the programmer a great deal of work. For verbal data such a subroutine might be one which would go through a body of text picking out occurrences of a particular word. In making use of this subroutine, the user would specify precisely which word (or stem) to look for as well as perhaps what is to be done to the examples found. 
A subroutine of this type might, for example, take the following form: Find all occurrences of the item (to be specified), and for each occurrence list the n (to be specified) preceding or following (to be specified) items. The information to be specified, which has to be provided separately in the program each time a subroutine is used, is known as the calling sequence. Subroutines are usually kept separate from the main program in core storage. Part of the calling sequence, therefore, has the function of transferring control to the subroutine, and after the execution of the latter there is ordinarily another transfer back to the main routine. (The operation "Transfer and Set Index" provides a very simple means for getting back to the right place. It sets an index register with a number identifying the location transferred from.)
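
In a present-day language the subroutine just outlined might look roughly as follows. This is a minimal sketch, not a reconstruction of any library routine of the period: the function name `find_contexts`, its parameter names, and the sample sentence are hypothetical. The three arguments after the text play the part of the calling sequence: the item sought, the number n, and the choice between preceding and following.

```python
def find_contexts(items, target, n, direction="following"):
    """For each occurrence of target, list the n items before or after it."""
    if direction not in ("preceding", "following"):
        raise ValueError("direction must be 'preceding' or 'following'")
    contexts = []
    for position, item in enumerate(items):
        if item != target:
            continue
        if direction == "preceding":
            # Fewer than n items are returned near the start of the text.
            window = items[max(0, position - n):position]
        else:
            # Fewer than n items are returned near the end of the text.
            window = items[position + 1:position + 1 + n]
        contexts.append((position, window))
    return contexts


text = "the jaguar saw the deer and the deer saw the jaguar".split()
after_deer = find_contexts(text, "deer", 2, "following")
before_jaguar = find_contexts(text, "jaguar", 2, "preceding")
```

The return to the main routine, which the text assigns to a transfer instruction, is here handled by the language's ordinary function call and return.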


6. TYPES OF MACHINE WORDS

As already noted, a machine word can serve as an instruction or as a sequence of six BCD characters. Generally there is nothing in the nature of the word itself to distinguish its function from that of any other word. The environment in which these words occur in the program is what determines their function. Figure 6 shows two interpretations which can be placed on a particular word.

Fig. 6. Two interpretations of a machine word.
As BCD: H I P P O S (hippos)
As instruction: TXH 31154,4,1639 (i.e., take next instruction from location 31154 if the number in index register 4 is greater than 1639.)

If by some slip on the part of the programmer a data word turns up in the location following an instruction of the program, or if a transfer instruction sends control to a data word, then the machine will interpret this word as an instruction and will perform whatever operation is indicated. In other words, computers too have a language in which one and the same unit of expression may have different meanings in different contexts.

In addition to the types of words mentioned above, there are two types of numbers3 (see Figure 7). A fixed point number is made up of a sign, in the sign position, and a magnitude, which is represented in binary digits in the rest of the word (i.e. bits 1-35). If one is doing arithmetic on the machine, numbers expressed in this way are easy to deal with as long as it is either unnecessary or simple to keep track of the binary point (cf. decimal point), as in working exclusively with integers. Otherwise one uses floating point numbers and floating point arithmetic operations, which keep track of the binary point automatically. A floating point number is made up of three parts: (1) the sign, in the sign position; (2) the characteristic, occupying positions 1-8; and (3) the fraction, in positions 9-35. Such a number is expressed as a binary

3 Thus the word of Figure 6 could have two additional meanings. As a fixed-point number with binary point to the right of position 35, it has a value of about 26 billion (2^34 + 2^33 + 2^28 + 2^27 etc.). As a floating point number, its value is in the neighborhood of 30 quintillion.
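
The four readings of the word of Figure 6 can be checked mechanically. The sketch below rests on assumptions that are not stated in the text and should be treated as such: the word is written as the octal constant `303147474662`, obtained by packing the instruction fields prefix 3, decrement 1639, tag 4, address 31154 into the positions the text gives; the letter codes in `BCD` (A-I as octal 21-31, J-R as 41-51, S-Z as 62-71) are an assumed character table; and the characteristic is assumed to be the binary exponent plus 128. All function names are hypothetical.

```python
WORD = 0o303147474662   # assumed octal form of the 36-bit word of Figure 6

# Assumed 6-bit letter codes (octal): A-I = 21-31, J-R = 41-51, S-Z = 62-71.
BCD = {}
for start, letters in ((0o21, "ABCDEFGHI"), (0o41, "JKLMNOPQR"), (0o62, "STUVWXYZ")):
    for offset, letter in enumerate(letters):
        BCD[start + offset] = letter


def as_bcd(w):
    """Read the word as six 6-bit characters, leftmost first."""
    return "".join(BCD.get((w >> shift) & 0o77, "?") for shift in range(30, -1, -6))


def as_decrement_instruction(w):
    """Split into prefix (0-2), decrement (3-17), tag (18-20), address (21-35)."""
    return (w >> 33) & 0o7, (w >> 18) & 0o77777, (w >> 15) & 0o7, w & 0o77777


def as_fixed_point(w):
    """Sign in position 0, magnitude in positions 1-35, binary point at the right."""
    sign = -1 if (w >> 35) & 1 else 1
    return sign * (w & ((1 << 35) - 1))


def as_floating_point(w):
    """Sign, characteristic in positions 1-8, fraction in positions 9-35."""
    sign = -1 if (w >> 35) & 1 else 1
    characteristic = (w >> 27) & 0o377
    fraction = (w & ((1 << 27) - 1)) / 2 ** 27
    return sign * fraction * 2.0 ** (characteristic - 128)   # excess-128 assumed


letters = as_bcd(WORD)
prefix, decrement, tag, address = as_decrement_instruction(WORD)
fixed = as_fixed_point(WORD)
floating = as_floating_point(WORD)
```

Under these assumptions the same 36 bits yield the letters of Figure 6, the instruction fields of Figure 6, a fixed point value in the tens of billions, and a floating point value in the tens of quintillions, in line with the footnote.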


Fig. 7.
1. Instruction
a. With decrement (TIX, TNX, TXH, TXL, TXI): prefix, positions 0-2; decrement, 3-17; tag, 18-20; address, 21-35
b. Without decrement: operation, positions 0-11; flag, 12-13; tag, 18-20; address, 21-35
2. BCD word (six characters): 1st, positions 0-5; 2nd, 6-11; 3rd, 12-17; 4th, 18-23; 5th, 24-29; 6th, 30-35
3. Fixed point number: sign, position S; magnitude, 1-35
4. Floating point number: sign, position S; characteristic, 1-8; fraction, 9-35

Fig. 1. Flowchart of System Organization (based on Ruesch and Bateson, p. 1).

Fig. 2. Flowchart of System Organization (based on Ruesch and Bateson, p. 2).

COMPUTER PROCESSING AND CULTURAL DATA


the explanations, the latter indicate the direction of the "flow". Two general conventions are worth mentioning:8 (1) boxes with rounded edges enclose entrances, exits, and questions, boxes with pointed edges enclose instructions or information; (2) "yes" arrows start out horizontally to the right, "no" arrows start out vertically downward. We introduced a few special conventions for our particular flowchart: we have numbered all the boxes consecutively on both pages, and we have indicated the relation between the flowchart and the original definition by drawing different borders around the boxes. Solid borders indicate questions and information taken directly from the verbal definition; broken lines bordering a box indicate questions or information not explicitly contained in the text but implied by it as hidden assumptions necessary to the logical flow; wavy lines bordering a box indicate questions or information contained in the text and deemed redundant. The questions on the flowchart are addressed to the input, which means that the input data must be chosen and formatted so as to provide the information needed to answer the questions. In our case, the notation "Enter with system" in the unnumbered box at the top of page 1 indicates that the input to the program represented by our flowchart would be a system. This means that the system which we would wish to study by means of our program would have to be specified precisely enough so it both could be formatted for input, and would provide the information required to answer the questions on the flowchart. The flowchart shown in Figures 1 and 2 represents one interpretation of the original verbal definition. We can now compare our flowchart to this interpretation. 
There are two aspects in which we consider such a comparison meaningful: (1) we can compare the overall structure of the flowchart to that of the verbal definition; (2) we can compare certain details of the flowchart with the corresponding portions of the verbal statement. It is important to note that a different interpretation of the original definition will result in significant differences in the flowcharting. The differences in flowcharting due to a different interpretation of one portion of the definition are shown in Figure 3. First, a brief comparison of the two structures. The verbal definition contains four parts, labeled (a), (b), (c), (d) respectively by the authors. These parts correspond to boxes on the flowchart as follows:
Part (a) - boxes 1, 2, 3, 4, 6, on page 1

8 Cf. "Proposed Standard Flow Chart Symbols", Communications of the Association for Computing Machinery, vol. 2, no. 10 (Oct. 1959), pp. 17-8.


PAUL L. GARVIN

Fig. 3. Alternate Branching of Flowchart of System Organization (based on Ruesch and Bateson).

Part (b) - boxes 7, 8, 9, 10, 11, 12, 13, 14, 15, on page 1
Part (c) - box 1 on page 2
Part (d) - boxes 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, on page 2

Note that the different parts of the verbal definition correspond to rather unequal portions of the flowchart: part (c) corresponds to a single box, while part (d) corresponds to as many as 12 boxes. We might want to conclude from this that the logical structure of the original definition is uneven, in the sense that the conditions set forth in the different parts of the definition are not of the same order. This interesting characteristic of the verbal statement both permits and encourages us to explore whether the unevenness in structure constitutes a logical flaw in the definition or reflects a significant and hitherto unnoticed property of the defined object. In the details, the flowchart differs from the verbal definition primarily by being more explicit, as is shown by the special flowcharting conventions that we have adopted. By way of illustration we want to point out the hidden assumptions implicit in part (b) of the definition which reads: "if among these entities certain similarities and differences occur." Boxes 7 through 10 of page 1 of the flowchart indicate that this statement either implies the assumption that neither all nor some of the elements are identical, or ignores the possibility of a condition of identity altogether (this is indicated by the notation "Undefined consequence" in boxes 8 and 10). Boxes 12 and 14 indicate that the term "certain" in the verbal statement implies a particular type of similarity or difference which requires specification. Finally, we want to point out that boxes 2 through 6 on page 1 of the flowchart are based on interpreting part (a) of the verbal definition to mean that "cells, organs, individuals, and so forth" are merely redundant elaborations of the requirement for "active entities". An alternative interpretation is shown on Figure 3 which is based on assuming that "cells, organs, individuals, and so forth" are meant to be an incomplete definition of the requirement for "active entities". Note that while this interpretation removes the redundancy, it creates an "Undefined consequence" due to the statement "and so forth".
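
The alternative interpretation can be written out as a short routine, which makes the gap left by "and so forth" visible as a branch for which no rule is available. This is only a sketch under assumptions: the names `Outcome`, `LISTED_KINDS`, and `alternative_reading` are hypothetical, and the routine covers just the chain of questions attributed to Figure 3, not the rest of the flowchart.

```python
from enum import Enum


class Outcome(Enum):
    ACTIVE = "the entities are active"
    UNDEFINED = "undefined consequence"


# The kinds of entity that the verbal definition actually names.
LISTED_KINDS = ("cells", "organs", "individuals")


def alternative_reading(kind):
    """Treat the list as an (incomplete) definition of 'active entities'."""
    for listed in LISTED_KINDS:        # one question per named kind, in order
        if kind == listed:
            return Outcome.ACTIVE
    # "and so forth" supplies no test, so nothing can be concluded here.
    return Outcome.UNDEFINED


organs_result = alternative_reading("organs")
groups_result = alternative_reading("groups")
```

Any entity not named in the list, such as the hypothetical "groups" above, falls through to the undefined branch, which is exactly the consequence the text points out.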

3. PROCESSING: TEXTUAL DATA

The basic distinction between textual and nontextual data, which we mentioned in the discussion of formatting further above, applies to the entire field of the computer processing of cultural data. Although we wish to concentrate upon the problems of processing nontextual data, we shall first give some attention to the question of textual data. The textual data of the anthropologist can be divided into two major categories: (1) text produced by the culture under study, such as folklore; (2) text produced by anthropologists, such as ethnographic descriptions. The processing of text produced by the culture raises the question of computer applications for purposes other than simple tabulation, which, while it may unquestionably be useful, does not require further discussion. In other words, can the computer be used for purposes of analysis, and more specifically, for an analysis that is primarily cultural rather than linguistic? At our present state of knowledge we are not yet in a position to set forth the conditions for a cultural analysis of text for more than the automatic production of concordances. In concordance automation, text size and cost are significant considerations. Unless a simple general concordance program is available ready for use in some program library, the amount of text at hand for a particular culture may not warrant the time or effort required to write such a program. Once a concordance has been produced, it is a useful added tool for analysis - it assembles the pieces of the text in a form


convenient for inspection. But it must be remembered that a concordance doesn't of itself constitute an analytic result. Texts produced by anthropologists, that is, field notes or other ethnographic documents, are from the standpoint of computer processing merely technical documents in the particular field of anthropology. Hence, the question of treating them automatically is simply a question of information processing - an automated information retrieval activity or an automatic abstracting activity, dealing with cultural anthropological documents rather than with the more usual documents of physics or chemistry. The anthropologist here would become the customer of an automatic system the same as the physical scientist or librarian or government official. The serious question which arises here is whether or not the enormous cost of such automatic systems, particularly in view of their present imperfections, is warranted by the comparatively limited needs of the profession. We take a more positive view of the opportunities for research on the structure of cultural anthropological terminology afforded by the requirements of information retrieval or automatic abstracting. This promises to be productive and to lead to interesting insights, but we consider the study of terminology a linguistic rather than a cultural problem.

4. PROCESSING: NONTEXTUAL DATA

Turning now to the processing of nontextual data, we first want to repeat that we are interested in nonstatistical processing. We also wish to stress again that under nontextual data we include all data that do not consist of connected text. This means that data which consist of isolated verbal material without the connectedness provided by linguistic relations are classed as nontextual, since our differentiative criterion is the ordering imparted to the data by the structure of a natural language, rather than the verbal substance as such. It is also worth noting that our distinction of textual and nontextual does not coincide with the usual differentiation of verbal and nonverbal behavior: data on either form of behavior can be textual or nontextual.

4.1. Degrees of Computer Participation

The nonstatistical computer processing of nontextual data in an anthropological frame of reference has, to our knowledge, so far not been


attempted on a serious scale. We are therefore justified in considering a related frame of reference, that of linguistics, and examining its possible bearing on the question at hand. In this consideration, we shall take as our point of departure an earlier paper in which we suggest that there might be three basic degrees of computer participation in linguistic research.9 We propose that the lowest degree is data collection; an intermediate degree is the verification by a computer program of the results of research obtained through other means; and the highest degree is testing the validity of a method by a computer program. As an important example of the first degree, we used the automatic compilation of a concordance. As an example of the second degree we used machine translation, and as an example of the third degree we used automatic linguistic analysis, in particular, our own conception of it which is one of distributional analysis (first cited further above to illustrate the effects of computer processing on the assumptions and aims of research, see pp. 122-3 and op. cit., in fn. 2). The reasons for considering that these are different degrees of computer participation are as follows: In the compilation of a concordance or other means of data collection, the computer is used in what is essentially a bookkeeping and filing function. While it is true that all computer operations can ultimately be reduced to a form of bookkeeping and filing, it is equally true that many computer programs have a logical structure far transcending that of the bookkeeping and filing components of which they are made up. This certainly applies to the computer programs used in machine translation and automatic linguistic analysis. 
A machine translation program in the context of linguistic research serves as a tool to verify the correctness of a particular analysis: if the analysis on which the translation program is based is correct, the program will produce acceptable translation, and if not, it will not. Finally, it is clear that in the case of automatic linguistic analysis the program will carry out, with the necessary logical consistency, the analytic instructions built into it. Clearly, if these are good instructions the output of the program will be acceptable, and if they are not, it will not. We can now ask whether the frame of reference provided by these degrees of computer participation can help us in developing a systematic approach to the nonstatistical processing of nontextual data. As a first approximation, we can explore whether the three processes which we cited as examples of the three degrees - namely, concordance-making,

9 Paul L. Garvin, "Computer Participation in Linguistic Research", Language, 38 (1962), 385-9. (Originally presented at the Wenner-Gren Symposium.)


translation, and distributional analysis - are in any way applicable to nontextual as well as textual data. The concrete question will be: is it possible to conceive of a concordance, a translation, a distributional analysis, of nontextual data? The consideration of this question will of necessity have to be even more speculative than the discussion so far. But even if an answer turns out to be impossible to give or trivial, the question is worth asking, because it may lead us to an understanding of the nature of cultural data in a new light.

4.2. Music as an Example

From this standpoint an interesting division suggests itself. Upon brief consideration, it will be apparent that of all the varieties of nontextual data, music may lend itself most readily to the above named processes. This is not surprising, since music is comparable to language in the sense that it, too, forms a separate and self-contained system; furthermore, music permits the use of a discrete notation which can readily, though not at this stage mechanically, be transformed into computer input. It is quite conceivable that a musical concordance can be constructed. The requirements for such a concordance would be: (1) some clearly delimitable units, the environments of which are to be set forth in the concordance; (2) some frame unit, if the environments are to be defined by more than just a certain mechanically determined number of units adjacent to each side. The required units exist in music in the form of its well-known rhythmical stretches. As in the case of language, we would not consider a musical concordance an analytic result, but rather a tool for further analysis. It would exceed the scope of this paper to speculate how such a tool might be used in ethnomusicology. Translation can likewise be envisioned in the case of music. It is common knowledge that musical pieces can be transposed from one scale into another (for instance, from diatonic into pentatonic or conversely), or that scores can be rewritten for different instrumentation (for instance, from chamber quartet to symphony orchestra). From an anthropological standpoint, it would be important to differentiate between cross-cultural and intracultural forms of translation. It would also be interesting to inquire what particular property (such as, for instance, the same melody) forms the basis on which two pieces of music would be considered translations of each other by a given listening or performing community. 
We can conceive of the meaningful application of such an approach to both descriptive and comparative problems.


Finally, we can envision a form of musical analysis similar to distributional analysis in linguistics, namely, the analysis of the occurrence pattern of smaller musical units, such as bars, within larger musical units. This might be an interesting question to consider in its own right, and the methods and results of distributional analysis might be compared to those of other forms of analysis. We are convinced that any one of these operations could be automated. The significant question, in our opinion, would be not only how automation could best be implemented, but what research advantages would be gained from automation that go beyond the results which could be obtained by attempting concordance-making, translation, and distributional analysis of music "manually". In the case of a musical concordance (just as in the case of language concordances) one could state that automation only makes sense for a body of musical data large enough to warrant the use of computing equipment. The interest of automatic translation in music would be similar to that of automatic language translation. A program which would automatically transpose one form of music into another would produce the logically consistent results of certain assumed rules of transposition, which would have to be stated with the requisite explicitness. The output of the program would provide a mode of verification comparable to that obtained in language translation: by performing this output, the logical consequences of the transposition rules could be listened to and the adequacy of the rules could be judged. We can most readily conceive of automatic musical analysis along lines similar to automatic linguistic analysis, that is, as a computer program for distributional analysis (cf. p. 123 and op. cit. in fn. 2).
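
What it means to state a rule of transposition "with the requisite explicitness" can be suggested by a small example. The sketch below is an illustration under assumptions and makes no claim about any musical tradition: pitches are represented by integers counting semitones (middle C taken as 60), the target scale `MAJOR_PENTATONIC` and the rule "move each pitch to the nearest pitch of the target scale, the lower one in case of a tie" are invented for the purpose, and the name `to_pentatonic` is hypothetical.

```python
# Assumed target scale: semitone offsets above the tonic within one octave.
MAJOR_PENTATONIC = (0, 2, 4, 7, 9)


def to_pentatonic(pitches, tonic=60):
    """Move each pitch to the nearest pentatonic pitch; ties go to the lower one."""
    result = []
    for pitch in pitches:
        relative = pitch - tonic
        octave = relative // 12
        # Scale pitches in this octave, plus the tonic of the octave above.
        candidates = [octave * 12 + step for step in MAJOR_PENTATONIC]
        candidates.append((octave + 1) * 12)
        nearest = min(candidates, key=lambda c: (abs(c - relative), c))
        result.append(tonic + nearest)
    return result


melody = [60, 62, 64, 65, 67, 69, 71, 72]   # a diatonic scale upward from the tonic
transposed = to_pentatonic(melody)
```

The output is nothing more than the logical consequence of the stated rule; whether a listening or performing community would accept it as a rendering of the original is the question that performing the output would put to the test.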
It may be possible to consider the automation of the better known forms of musical analysis; we are not in a position to judge whether the procedures of musical analysis have ever been set forth in sufficient detail to permit computer use. Here again, the most conspicuous advantages would be explicitness and logical consistency.

4.3. Other Examples (kinesics, culture change, etc.)

We may now go on to speculate about the possible application of concordance-making, translation, and distributional analysis to nontextual data other than music. Such nontextual data might consist of symbolic notation such as that of kinesics10 or choreography, or of direct representations such as motion picture film. While the notations are discrete to begin with, direct representation raises the problem of segmentation into the units required for the desired applications. It seems fairly obvious in the case of film, for instance, that such technologically given segments as individual frames do not have cultural significance. The problem here is a direct correlate of the more general problem of the segmentation of the nonverbal behavior which is represented by the nontextual data. Once we are able to provide the necessary segmentation, we may look upon this nonverbal behavior as a temporal sequence of segments similar to the linguistic or musical units in their temporal sequence. We can then consider the application of concordance-making, translation, and distributional analysis to this sequence of behavioral segments. As in the case of musical data, it is clear that a concordance of nonmusical nontextual data is conceivable, given the appropriate segmentation of behavior. As before, it is worth asking the question of the use to which such a concordance could be put in analyzing cultural behavior. Whether properly segmented data will be available in sufficient quantity to warrant automating a concordance can at present not even be asked. In considering the translation of nonmusical nontextual data several interesting questions suggest themselves. Thus, for instance, the question of the comparison constant for determining what segment of behavior in one culture is the translation of some equivalent behavior in another culture has to our knowledge as yet not even been posed.

10 Ray L. Birdwhistell, Introduction to Kinesics (Louisville, 1954).

As a very crude approximation one might assume, for instance, that the famous list of needs given by Malinowski is a set of such comparison constants,11 and that behavior in one culture in response to a given such need under a certain statable set of physical conditions, is the translation of behavior in another culture filling that same need under comparable physical conditions. For instance, behavior at a particular meal in one culture may be considered the "translational equivalent" of behavior at a comparable meal in another culture. It might be instructive to view certain aspects of acculturation as instances of "cultural translation". In this light, it might not be too far-fetched to speculate about a "cultural translation" program which might have the aim of verifying a hypothesis about culture change. The program could be based on the rules of culture change as set forth by the hypothesis; it could be made to operate on a set of data drawn from a situation the outcome of which is historically known. The program could be expected to devise the logical consequences of the effect of the rules of change on the original situation. A comparison of the output of the program with the historically known outcome might contribute to confirming or infirming the hypothesis. Conceivably, the same program could be applied to more than one situation, serving to test a more ambitious hypothesis. Needless to say, the many procedural safeguards required to ensure the validity of such an approach cannot even be foreseen at present. In regard to a distributional analysis of cultural behavior outside of language and music, the basic questions are again worth posing: that of the units to be examined, and that of their distributional frame. It is conceivable, for instance, to consider the particular separable portions of a ceremony the units under consideration, and consider the ceremony as a whole, the distribution frame for these units. Something along these lines has been suggested by Pike in his description of the football game and the breakfast in his theoretical discussion of culture.12 The rather cool reception that Pike's approach has found among culturally oriented workers13 shows, if nothing else, the difficulty of applying the analytic concepts of linguistics to nonverbal culture. Nonetheless, the question of distributional analysis seems worth posing,14 although at this stage it seems premature to speculate about its automation.

11 I am indebted to M. G. Smith for this suggestion.

5. COMPUTER PROGRAMS AND CULTURE THEORY

We are now ready for our most ambitious speculation: we propose to consider certain problems of the theory of culture in the light of a basic characteristic of computer programs. We are thinking of the fundamental debate in cultural anthropology regarding the concepts of function, structure and process. We will, however, limit ourselves to the latter two concepts, since we are not as yet able to suggest an interesting way of speculating about the concept of function in the frame of reference which we are proposing. The need for a further clarification of the concepts of structure and process becomes evident once one leaves such well traveled areas as kinship in the case of structure, or ethnohistorical change in the case of process. We base our speculation on the observation of some gross similarities between the distinction of structure and process on the one hand, and the distinction of two basic programming techniques, namely, table lookup and algorithm, on the other.

12 Kenneth L. Pike, Language in Relation to a Unified Theory of the Structure of Human Behavior, part 1 (Glendale, 1954), pp. 44-63.

13 Cf. Stanley Newman, review of op. cit. in fn. 12, IJAL, 22 (1956), 84-8, particularly p. 87: "At the present time, however, the formulation of a theory to coordinate the behavioral sciences is scarcely a task which a single individual could be expected to perform successfully."

14 Stanley Newman, ibid.: "... for example, his concept of 'spots and classes,' an elaboration of the substitution-frame procedure, appears sufficiently practicable and potentially useful to merit testing in the nonverbal area of behavior." For an analogous analysis of ceremonial behaviour, see now K. S. French, "Ceremonial organization", VIe Congrès international des sciences anthropologiques et ethnologiques, t. II: Ethnologie, 1er volume (Paris, Musée de l'Homme, 1963), 101-106.

We can illustrate the difference between table lookup and algorithm by an elementary example. Given the task of multiplying two by four, we can either add up four twos (2 + 2 + 2 + 2) - which is the algorithmic approach - or look up 2 × 4 in a multiplication table. Even in this elementary context, algorithm and table lookup are not mutually exclusive but complementary: given the problem of multiplying numbers greater than those contained in the multiplication table, we use an algorithm to decompose the problem into a series of elementary multiplications, the results of which can be looked up in the multiplication table (although by now we have memorized the table and look it up in our memory), and we combine these part results by an additional algorithm to obtain the final result. A table in a present-day computer program can become quite complex, to the extent of allowing the analogy with a structure; so can an algorithm, to the extent of permitting the analogy with a process. Although the difference between table lookup and algorithm is somewhat more clearly defined than that between structure and process in anthropology, we were able to point out on the elementary level of our initial example that even here the two opposites are not always clearly separated, although they are more precisely definable. In any reasonably complex computer program, there will be algorithms containing tables or calling for tables, and a large-scale table lookup scheme will ultimately require an algorithm for finding one's way around in a table. At one time in linguistic computer applications there was a serious dispute about the advantages of a predominantly algorithmic versus a predominantly table-lookup approach. At present more general considerations of efficiency are applied, and algorithm and table lookup are no longer looked upon as mutually exclusive alternatives.
We may want to carry this conception over into our analogy with cultural anthropology and consider that structure and process likewise are not necessarily mutually exclusive alternatives but can be considered as two aspects of the same phenomenon.
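The multiplication example can be restated as a short Python sketch; it is offered only as an illustration, and the names `multiply_by_addition`, `TIMES_TABLE`, and `multiply_long` are assumptions introduced here. It shows the purely algorithmic approach, the pure table lookup, and the complementary case in which an algorithm decomposes a larger problem into elementary products that are looked up and then recombined.

```python
def multiply_by_addition(a, b):
    """Algorithm: add a to a running total, b times over."""
    total = 0
    for _ in range(b):
        total += a
    return total


# Table: single-digit products computed once and afterwards only consulted.
TIMES_TABLE = {
    (a, b): multiply_by_addition(a, b) for a in range(10) for b in range(10)
}


def multiply_long(x, y):
    """Algorithm plus table: long multiplication of non-negative integers.

    Each pair of digits is resolved by table lookup; the part results are
    shifted to their decimal place and summed.
    """
    total = 0
    for i, digit_x in enumerate(reversed(str(x))):
        for j, digit_y in enumerate(reversed(str(y))):
            part = TIMES_TABLE[(int(digit_x), int(digit_y))]
            total += part * 10 ** (i + j)
    return total


print(multiply_by_addition(2, 4))  # algorithmic approach
print(TIMES_TABLE[(2, 4)])         # table lookup
print(multiply_long(23, 47))       # algorithm that calls for a table
```

In `multiply_long` neither technique suffices alone: the table covers only single digits, and the algorithm contains no products of its own.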

We now come to the high point of our speculation. Let us take our analogy seriously to the extent of adopting the oversimplifying assumption that structure is table lookup and algorithm is process, and that table lookup and algorithm are always clearly distinct. We are now in a position to use table lookup and algorithm as operational definitions of structure and process and consider that anything in a cultural description that lends itself to a table-lookup approach is structure, anything that lends itself to an algorithmic approach is process, and finally anything that lends itself to both approaches contains elements of both structure and process. This final speculation is not completely unrealistic. It is possible to conceive of a computer program simulating, for instance, culture conflict along the lines of the computer programs that are now being written for management games (i.e., the simulation of business competition for purposes of predicting the outcome of management decisions). It would then be possible to consider that, in this conflict simulation, the elements which enter into a stored table represent the structures of the conflicting cultures, and the rules of which the algorithm consists represent the relevant processes. Similarly to what was suggested in the acculturation example further above, the outcome of the simulation program could be compared with an observed outcome of a conflict, and thereby serve to verify the assumptions built into the program. But the aspect which we want to stress in the present context is the operational consideration of the question of structure and process allowed by our analogy in the case of the example of culture conflict. Our decision to store the structures in tables and to reserve the algorithm for the actual process of conflict may give us a means of isolating process from structure and of studying the two separately. 
By varying the content of our tables and our algorithm we may perhaps be able to achieve a controlled variation of both structure and process. By comparing the outputs of these variant programs to each other and to the observed outcome used for purposes of verification, we may conceivably increase our understanding of the elements of structure and process represented by the variations which we introduced into the tables and algorithms. Needless to say, it is premature to speculate about how such a series of computer programs could be implemented in a manner which would be both realistic from the standpoint of what we know about culture, and manageable in the light of what we know about computers and programming.
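Purely to make the operational idea concrete, and with no claim to cultural realism, the separation of stored tables from the algorithm might be sketched in Python as follows. Every name and value here is a hypothetical assumption: the two trait tables, the single borrowing rule, and the "observed" outcome are invented toy data.

```python
# Structure: one stored table per culture (toy values, invented).
CULTURE_A = {"tool": "stone axe", "greeting": "bow"}
CULTURE_B = {"tool": "steel axe", "greeting": "handshake"}


def borrow_listed_traits(receiver, donor, borrowable):
    """Process: the receiver adopts the donor's form of each listed trait."""
    result = dict(receiver)
    for trait in borrowable:
        if trait in donor:
            result[trait] = donor[trait]
    return result


def agreement(predicted, observed):
    """Share of observed traits that the program's output reproduces."""
    matches = sum(
        1 for trait, form in observed.items() if predicted.get(trait) == form
    )
    return matches / len(observed)


# Outcome assumed to be known from observation (invented for this sketch).
observed = {"tool": "steel axe", "greeting": "bow"}

# Two variants of the process, run against the same tables.
only_tools = borrow_listed_traits(CULTURE_A, CULTURE_B, ["tool"])
everything = borrow_listed_traits(CULTURE_A, CULTURE_B, ["tool", "greeting"])

print(agreement(only_tools, observed))
print(agreement(everything, observed))
```

Changing the tables while holding the rule fixed, or the rule while holding the tables fixed, gives the controlled variation described above. The agreement score is one crude way of comparing each variant with the observed outcome.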

DISCUSSION II

In discussion of verbal data processing, Garvin raised a question of the types of computer use. He expressed doubt as to whether any use of verbal data processing for non-linguistic purposes was more than information retrieval plus a dash of fact correlation, and stressed a distinction between clerical and nonclerical uses of the computer, according to the extent to which the program itself is more than a mechanized tool for research. A concordance or any simple-minded processing of data is useful, but is it more than an automatic file? The interesting question is what one can do with a computer in anthropology that is more than a purely clerical operation. Needham queried whether there were anything one could do in any field with a computer which was not in fact purely clerical. Garvin responded that combinations of instructions in a program were not purely clerical in import, although the instructions individually were. Hymes emphasized the great value for some anthropological purposes of the purely clerical operations, not only in terms of handling quantities of data and economies of time, but also in terms of the rewards of the explicitness demanded by computer processing. Gardin described as problems which have been tested, and are neither Utopian nor merely clerical, those of classification; of establishing networks; and of content analysis. Lamb stated that a great part of the emphasis in anthropology at the present time should indeed be on clerical applications, because such are needed. Garvin responded that he was not rejecting any valuable use of the computer, but merely concerned to make the distinction in question, and to point up the non-clerical potential of the computer. 
Hays proposed a somewhat different order of classification of the kinds of things one can do with a computer: (1) filing and sorting; (2) data reduction for simulation, or realization of formal properties of data not otherwise obtainable (e.g., numerical integration, numerical solution of differential equations). Many analyses of social systems (simulation) fall here - theory developed but not realizable by mathematics. Data reduction includes Gardin's three types of problem (classification, network analysis, content analysis); statistics also falls on the side of data reduction, which includes estimation of parameters, identification of values and variables, and transformations of data, and recoding. Most linguistic processes are of that kind, content analysis, sentence structure determination, seeing what attitudes, values, beliefs, notions, etc., are expressed in a text. One can go on for a long time, and usefully so, from many different angles and points of view. What Garvin has been expressing is a preference for

cases in which the structure of the program reflects the structure of the anthropological thinking. Gardin's presentation of an original typology of computer uses and anthropological problems was in response to earlier discussion, and a consequent need felt for systematic attention. The presentation in turn elicited a vigorous exchange, and a number of suggestions, including fresh interpretations of the scheme, being offered. There was some agreement that more dimensions existed than could easily be represented in a plane, but that selecting the most significant would serve. (Much of the discussion has been assimilated by Dr. Gardin in the present paper.) There was agreement on the need to move from thinking of problems only in terms of conventional anthropological fields, to thinking of problems in terms of the general features they share from a theoretical and methodological point of view, features which may cut across conventional lines. In a typology of problems, the chief concern should be (as stated by Gardin in the introduction to his paper) to link formulations familiar to the anthropologist, usually in subject-matter terms of type of data and of result desired, with formulations in the operational terms involved in computer processing, so that anthropologists can better see the possibilities of the latter.

III. MODES OF USE: SPECIFIC

COMPUTERS AND THE STORAGE AND RETRIEVAL OF ANTHROPOLOGICAL INFORMATION

ROBERT BRUCE INVERARITY

1. Introduction: The Problem of Proliferation. 2. Scientific Journals. 2.1. The Situation. 2.2. One Answer: Microreproduction. 3. Libraries. 3.1. The Situation: Bottleneck. 3.2. The Situation: Attitudes. 3.3. One Answer: Take the Lead. 4. Development and Uses of Instrumentation. 4.1. Earlier Background. 4.2. An Example: Visual Files. 4.3. Confusions of Terminology as to Instrumentation and Its Uses. 4.4. Present Nature and Uses of Computers. 4.4.1. Nature. 4.4.2. Uses. 5. The Human Problems. 6. Uses for Anthropology. 6.1. Storage and Retrieval of Basic Literature. 6.2. Machine Translation. 6.3. Need for Centers and Cooperation. 6.4. Simulation. 6.5. Data Storage (e.g., Mayan). 6.6. Art Analysis. 7. Conclusions: Costs and Prospects.

1. INTRODUCTION: THE PROBLEM OF PROLIFERATION

Not too many years ago an intelligent and educated person could understand and retain a large part of the important writings of the world. In fact at the turn of this century it was possible for the few existing anthropologists to understand and retain most of the available anthropological literature. But man is limited by the capacity of his memory - no amount of stretching, squeezing, or pounding is going to increase this capacity enough to matter. Quite naturally, instead of storing information within his memory man can store it outside himself, and he has done just that. However, no amount of stretching, squeezing, or pounding is now going to make it possible for him to amass the available literature, besides being able to locate all the available information on any insignificant item. It would be depressing to compare the reservoir of knowledge on a subject, now in the literature, with the tiny amount that we personally put to use - we are indeed ignorant.

2. SCIENTIFIC JOURNALS

2.1. The Situation

For the past century the technological collections of the Library of Congress have nearly doubled every twenty years and presently contain over a million and a half volumes of books and periodicals (Committee on Government Operations, 1960). The increase in scientific journals is obviously proportionate to greater specialization and constant fracturing of scientific fields. In turn the scientist, in an attempt to keep abreast, subscribes to more journals and is faced with the reality of being unable to intelligently read them all and subsequently misses a greater number of

articles of concern to him. An attempt to solve this dilemma has been instigated in various disciplines primarily by printing organized indexes of certain journals. In turn the scientist tends to forego his subscriptions to journals and relies on journals of indexes. However, he now finds that the indexes may not include articles he needs or that it is difficult to secure a reprint of a desired article.

2.2. One Answer: Microreproduction

The whole problem of original publication in scientific journals is a major one, not only from the viewpoint of the reader's inability to find, or learn about, what is published, but also the increasing costs of publication are making the journals more expensive. One of the obvious answers to the latter problem is to publish in microform or in miniprint. At the moment microreproduction, either microfilm or microcards, is excellent for the reproduction of a limited number of copies, but as the number of copies increases the cost is no longer competitive with standard printing procedures. Therefore, microreproduction, at this time, unless we develop cheaper processes as well as greater reduction ratios and adequate means of reading such reductions, is most suitable for original publication of from one to five hundred copies of an article. In many instances this is all the copies that are needed and, if distribution techniques were developed so these could reach the people concerned, there would no longer be a need to print articles of restricted interest in a journal with an edition of, say, 15,000. The editing, proofreading, typesetting, paper, press time, binding, addressing and mailing of 14,500 copies of such a specialized article multiplied by other such articles, probably included in the same journal, should present cost figures of interest to any group publishing scientific journals.

3. LIBRARIES

3.1. The Situation: Bottleneck

I have tried to indicate briefly the problem which exists with journals, but the greater problem is the bottleneck which exists in libraries. The libraries may collect and preserve knowledge for the present and future but, unless such information is readily available, the library is a place of dead storage. The cost of purchasing, cataloging, binding and storing a

book on the library shelf is rapidly increasing but more frightening is the fantastic growth of library collections and our inability to make real use of these collections. Statistics are readily available for library usage, but has anyone tried to develop statistics indicating what information is in the library that is not or cannot be used?

3.2. The Situation: Attitudes

The lag between man's ability to develop new ideas and equipment to meet these needs and his ability to put them to use is no better demonstrated than in the whole area of information retrieval. The world of business, the biological, physical and chemical sciences must be given due credit for their rapid development and use of new techniques and instrumentation. Naturally, in all fields of endeavor, there are those who anticipate and attempt to lead but, unfortunately, except for a small minority, the very field of librarianship and documentation which should be most concerned with these problems appears to be apathetic.

3.3. One Answer: Take the Lead

As recently as 1960 the Society of American Archivists at its 24th annual convention was reported as arriving at an interesting conclusion: "It was generally conceded that, from the viewpoint of the scholar, automation could not replace the personal touch of the accomplished archivist, and that the indexing of data stored by electronic process would be hardly suitable in future years for the demands of the researcher" (Bishop, 1960). [See now Automation and the Library of Congress (Washington, D.C., U.S. Government Printing Office, 1964); Walsh (1964); and Boehm (1963).] This point of view leads me to believe that the various disciplines must proceed on their own and hope that, in time, the librarians and archivists will follow unless they wish to watch their professions crumble to that of mere custodianship. It is the users of knowledge, not the organizers, who are presently making progress.
For a moment let us forget the constantly expanding flood of printed words, concepts, ideas and illustrations which threaten to inundate us. The problem is still there whether we choose to see it or not. In turn let me momentarily give some hope to the scientist and scholar who may be unaware of what is happening in this great field of information storage and manipulation by saying that, although we are only in the beginning stage, the ability of man to make

instrumentation to meet these problems has been adequately demonstrated. In fact at this time we do not have the procedures for handling information that will allow us to adequately use the instrumentation now available. I firmly believe that, if we can explain our problems to the instrument designers, it is likely that they have constructed, or can construct, instruments to meet our needs.

4. DEVELOPMENT AND USES OF INSTRUMENTATION

4.1. Earlier Background

There is evidence to suggest that some of the basic principles currently used in information storage and retrieval are not as new as we might like to think. Clay tablets discovered in ancient Babylonia at Sumer are very similar in concept to one of the successful, flexible and inexpensive punched card systems presently finding acceptance. Each clay tablet is concerned with one symptom of illness, but also lists the diseases in which the symptom is found. By comparing tablets of symptoms, it is possible to form an idea of what the disease might be (International Study Conference on Classification for Information Retrieval, 1957).

4.2. An Example: Visual Files

Just what does all this mean to the anthropologist and how does it relate to his own activities? Perhaps some frame of reference can be supplied by relating an actual example. In the late 1940's I had an idea and successfully demonstrated a technique for creating a great central depository of all types of visual material to which anthropologists and others concerned with such material could come for information necessary to their studies, or, to which written requests could be made (see Inverarity, 1960). It was part of my thesis that, by such an organization and structuralization, relationships not possible by other means could become apparent. Visual material as envisioned then was from the literature of anthropology, depositories and private collections of photographs from all parts of the world. Access to the file was originally achieved by using the punched-card technique then primarily used in business. For a period of about two years, assisted by a small staff, research was conducted to explore many different potential processes in order to develop the most efficient method. Among others an extremely successful technique was developed, using a then unique punched card system in conjunction with microreproduction, and successful tests were made of one culture. At this point one of the two sponsors of the project, although dedicated to such pursuits, decided that it all had no value to them and the project unfortunately terminated. At that time I was constantly amazed at the lack of foresight among scientists of stature who questioned the value of such a program - happily, some of these people have now reversed their opinions.

4.3. Confusions of Terminology as to Instrumentation and Its Uses

There is confusion in the terminology to describe these new resources and what they do. The early use of machines as calculators, punched card devices and so forth was in the field of business and accounting and they were called business machines. Then the word automation slipped into use but in reality it would seem more intelligent to confine this word to the mechanical handling and control of materials. Data processing is now commonly found in use and here data is information and processing is the use of this information by some machine or instrument. The magic word electronic indicates the change that has taken place from mechanical to electric components or combinations of both. However, computer added to this list does indicate a major change. The computer has its own memory system for receiving instructions and it can calculate, take the conclusions, compare them with other data in its memory or look for more instructions in its memory and then, on the basis of what instruction it receives, continue to do more calculating. There are many more specialized words such as analog, digital, readers, sensing, storage media, recording, transcribing, input, output, converters, and so on, but of these most can be skipped at this time, leaving them to the highly trained specialists including the software men and programmers who deal with all this instrumentation familiarly called hardware. Even the whole general field as yet has not received a commonly accepted name.
Information storage and retrieval indicates only a part of the capacity of contemporary instrumentation and, until the objectives and possible results are further clarified by time, the confusion will no doubt continue.

4.4. Present Nature and Uses of Computers

4.4.1. Nature

For the moment let us be concerned with data, in the form of some language. We are concerned with storage and the ability to recall from storage the information we have placed there. Shortly we can conjecture

as to other approaches, but let us consider what these instruments can do and what they are doing, remembering that there now exists a great variety of equipment to handle a great variety of problems - no one piece of equipment does everything, but many of these instruments can be interlocked to accomplish amazing results.

4.4.2. Uses

Now, what are these instruments doing, and where are they used? Federal, State and city governments, as well as universities, appear to be at this moment the largest users of computers. In California, the police put the modus operandi as well as the descriptions of criminals into a computer and, when a crime is committed, can often pin-point the possible criminal by comparative methods. Los Angeles city does its accounting and has also determined the best routing of its garbage pick-up by computer planning.

5. THE HUMAN PROBLEM

A vast wonderful world of whirling, clicking and flashing instruments that appear to be able to cope with a great range of problems leads many to argue whether these machines in themselves constitute a threat to man. Norbert Wiener wrote not too long ago: "It is my thesis that machines can and do transcend some of the limitations of their designers, and that in doing so they may be both effective and dangerous. It may well be that in principle we cannot make any machine the elements of whose behavior we cannot comprehend sooner or later. This does not mean in any way that we shall be able to comprehend these elements in substantially less time than the time required for operation of the machine, or even within any given number of years or generations" (Wiener, 1960). Others do not agree with Wiener's thesis.

6. USES FOR ANTHROPOLOGY

6.1. Storage and Retrieval of Basic Literature

It is now possible to store the basic literature of anthropology in a computer once an acceptable technique is chosen and a language of description and retrieval developed. One method which might be used is storage and retrieval in the form of full-text indexing which will permit access to key words; however, related and non-related material will be

retrieved due to language problems. Bibliographical information so obtained will permit the anthropologist to arrange subject analyses in a multitude of new ways which are impossible to do with today's vertical file indexes. But some thought must be given as to what the anthropologists want. An oversimplification of this problem is: if the material is to be worked comparatively by such equipment, store it in a computer; but if you wish merely to recall the article, reference or paper, it may be simpler and less expensive to use other equipment. Equipment to read typewriter copy is on the market, and in the laboratory work is under way on machinery to read handwriting. The anthropologist should think about the techniques, available equipment, costs, frequency of use, and the type of result desired before launching on any program of information storage and retrieval. I believe that without considering these factors the potential user tends to be emotionally swayed by the publicity, whirling wheels, lights, and prestige value of computers and compulsively institutes a computer program, or becomes lost wandering in this forest of vacuum tubes and transistors and eventually emerges discouraged and confused in his attempt to organize the information he has at hand. Consequently the data continues to pile up in the same old disordered manner which, rationalized by the user, becomes an acceptable order. The potential of the manually sorted edge or internally punched card has been demonstrated in the field of business and some of the sciences. The anthropologist can well afford to explore this potential to see if it will meet his needs as the cost is low and the various systems are adaptable to a wide variety of problems. Systems and equipment are available from a number of manufacturers both in Europe and America.
A variety of coding methods, whether by numbers, letters, symbols or words, may be used to refer to documents and references, and in some cases the card itself may carry abstracts, bibliographical information, the actual reference, or reproductions of text or object. Mechanically and electrically operated systems obviously mean greater expense and often greater speed but can, in turn, handle larger masses of data. However, if tremendous masses of data need to be stored and returned or complex numerical manipulations are needed, the computer is the answer.

6.2. Machine Translation

Throughout the world a movement is afoot to evolve means of mechanical translation using a cross-discipline approach, including electronic
engineers, linguists, philosophers, mathematicians and logicians. Anthropology will certainly benefit from this development, although it is part of a larger concept as expressed by one Russian worker: "Today machine translation is regarded only as the first stage toward solving a more general and important problem: by most fully using electronic machines as auxiliary tools of human thinking, to make the machines capable of performing the widest possible operations with texts written in different languages, to enable it not only to translate but also to edit, make abstracts, furnish bibliographical and other references, etc. All these operations boil down to extracting from the text required information and to recording that information in some other place" (Melchuk, 1959).

6.3. Need for Centers and Cooperation

The appalling fact about the use of these techniques and devices remains: much is written and talked about and comparatively little is actually started or completed. The Royal Society Scientific Information Conference, held in London in 1948, discussed and published at length regarding the need for and ways to store information and retrieve it (Royal Society, 1948). At the International Conference of Scientific Information, held in Washington, D.C. in 1958, many of the same problems were presented and discussed with indications of progress in equipment but not necessarily in application (National Academy of Sciences - National Research Council, 1960). We are still developing small clusters of uncoordinated data - tiny pools of knowledge scattered like drops of oil on a pond. Let us hope these drops will clump into larger masses until we will evolve an international memory center. However, this will be years in the making and now is the time for the various scientific segments to start developing their own memory centers.
Anthropologists, I feel, should immediately initiate a united effort to develop intelligently machine language, codes and whatever else may be necessary to utilize existing equipment and begin to store the knowledge that is now available. (See 6.7 below.) At this writing I do not feel that emphasis should be placed on the techniques or equipment used but rather that simple beginnings could be made by individual groups, at various levels, in all parts of the world. Only one point of caution need be stressed and it is that these beginnings, regardless of the coding systems, should have some form of compatibility
so that the stored and organized information may at a later date be transferred or encoded into other systems on present or forthcoming equipment. In this way a scientist using a simple punched-card system costing only a few dollars could actually be preparing material which might be integrated with a much larger project and fed into a computer by someone else at another place at a later time.

6.4. Simulation

At this time, it seems to me, one of the marked opportunities for the anthropologist is to make use of the ability of computers to simulate. By taking data now available and feeding it into computers, one may test theories, processes and the like with an accuracy, speed and encompassment beyond the abilities of the human brain. Certainly the known changes, tensions and complexities resulting from the impact on a culture by a more complex culture could be programmed for computer use to anticipate similar cultural clashes which are occurring today and which will no doubt occur in the future. It is perhaps through such studies that the anthropologist using a cross-discipline approach could make a continuing contribution to the easing of contemporary tensions.

6.5. Data Storage (e.g., Mayan)

The storage of large amounts of statistical and other data should alone provide an enviable memory laboratory in which the computer becomes the key instrument. Or consider the specialist who is concerned with research on one cultural group - for example, the Maya. At one time he may be working on variances between cities; another time on the variations between the hundreds of houses of Mayapan and some other city; he might relate the known trade routes and the influences of trade with different areas and people (cf. Gardin's paper in this volume on such a study for the Near East); trace the use and movement of a particular color in ceramics; organize a giant file on Maya art; compare Maya beliefs with a different culture; check the archaeological facts against the vast reservoir of historical writings; attempt further clarification of undeciphered glyphs; compile statistical records of physical anthropology; establish kinship patterns and relations; produce pot sherd sequences; develop cultural trait lists, or organize the mass of literature written about the Maya - all this and so much more is food for the insatiable computer.

6.6. Art Analysis

In the area of art analysis, on a large scale, the computer may be the breakthrough for further knowledge, as the discovery of the murals of Bonampak, on a smaller scale, was a breakthrough for further knowledge of Mayan life. It occurs to me there are two minor points pertinent to the above that should be noted. In art analysis the reproduction of the object is one thing and what the analyzer may say or code for computer use is a different thing. As the computer is not a substitute for a book, in turn the storage of an image for retrieval is but an approximation of the object - truly an abstraction. A photograph of a ceramic pot will give the observer a monocular abstracted image limited by the artificial range of the photographic medium. The other sides of the pot are not seen. It is possible to take sequence photographs of other views of the pot or a continuous strip photograph of the whole exterior surface, but nothing will replace seeing the actual vessel itself and the sequential views correlated by our brain as we handle and examine such a ceramic. The pot has a configuration which is unique and any reproduction is merely an approximation of this configuration. To a sensitive trained observer, there is a great difference between a fine color reproduction of a painting and the painting itself.

What one individual may decide to code and program into a computer regarding the pot naturally will be determined by his acuteness, personal interest and coding discipline. What a user of such data may be interested in may not have been coded. A series of criteria for coding must be established and retrieval must be couched in the same terms. The dichotomy which now exists between coding for visual images and coding for verbal material may not be as great as some people think.

7. CONCLUSIONS: COSTS AND PROSPECTS

It is almost impossible to keep up to date with developments in computer design and use that are spewing forth in all parts of the world. Even the present use or potential of such equipment in the field of anthropology is little known. The cost of computers and computer time is in itself so frightening to the anthropologist, who is perhaps living on an assistant professor's salary, that to toy with the idea of using such a research tool is beyond his wildest dreams. But need this be so - with instrumentation being miniaturized, with greater production of computers and greater use,
obviously costs must decrease. Yet when one is talking about such instrumentation, the costs are so astronomical that even large reductions still leave computers out of the range of most anthropologists. This factor could be alleviated if the anthropological profession as a whole could work with a group of universities to develop a computer program and utilize university computers when not in use. As computer programming takes the largest amount of time, there are periods when most computers are not in use. One company has been formed that buys from computer owners this inactive time and sells it to groups that do not have their own computers. Some parallel arrangement could be started now so that members of the anthropological profession could begin to think in terms of computer use and problems. Unless this or some similar step is taken this generation of anthropologists may be in time unkindly referred to as stone-age anthropologists. Earlier I mentioned the lag which occurs between discovery and use. If this lag did not exist and we could forget the costs, it would now be possible to have a world computer center for anthropology - in which we could store not only all the knowledge we now have in the literature, but also research notes and other untapped files of data now scattered around the world. Data to and from such a center could be sent via the present telephone systems. Recently the phone numbers, names and addresses of 1,000 subscribers listed in the Manhattan phone directory were sent from New York to Poughkeepsie and return, a distance of 136 miles, in a few seconds (International Business Machines, 1961b). Such equipment makes it possible for various smaller computer centers to be linked together. These smaller centers might each be specialized but when necessary they could be tied together and make all the units as accessible as if they were in one building. Such a facility could take over the present practice of publishing journals. 
The information could be stored in computers and the anthropologist would have available to him a far broader spectrum of material than otherwise possible. With computers now able to store and print, large bibliographies of current references could be quickly produced and distributed monthly. Access to such computers could be by telephone dial and codes for simple requests could be dialed. Or, by adding present-day television techniques and office copying machines, the anthropologist could be in direct contact with the center and have placed on his TV-like screen the actual pages he desired to read and make immediate copies for himself when he so desired. With the advent of communication satellites which are now feasible, it has been established that one orbital "Post Office", using present-day
facsimile equipment, could easily process all of today's transatlantic correspondence (Clarke, 1962). A large number of such satellites will certainly change the world's sense of time and subsequently patterns of living. It may be that all of the above will take place at a snail's pace and the anthropologists' use of these instruments will be equally slow. Let us hope that the already successful attempts of computers to simulate human thinking will not lead to computers regenerating themselves and thus make man obsolete.

REFERENCES

Bergamini, David, "Government by computers", The Reporter Magazine, New York, 17 August 1961.
Bishop, C. Nelson, "Archivists conservative on automatic aids", Christian Science Monitor, Boston, 8 October 1960.
Boehm, Eric H., "Dissemination of knowledge in the humanities and social sciences", ACLS (American Council of Learned Societies) Newsletter, 14 (5) (New York, 1963), 3-12.
Brindley, C. E., Indexing methods as a means of increasing the fertility factor of medical literature (Rahway, N. J., Merck and Company, Inc., Medical Division, 1953).
Clarke, Arthur C., "The social consequences of communications satellites", Horizon Magazine, New York, January, 1962.
[Committee on Government Operations], Documentation, indexing, and retrieval of scientific information (Report prepared by the Staff of the Committee on Government Operations, United States Senate, 86th Congress, 2nd Session) (Washington, D. C., Government Printing Office, 1960).
[International Business Machines], "The WALNUT information retrieval system", Stockholders Quarterly Report, New York, International Business Machines, July, 1961. (a)
[International Business Machines], "Linking far-flung computer centers", Stockholders Quarterly Report (October). New York, International Business Machines, October, 1961. (b)
[International Study Conference on Classification for Information Retrieval], Proceedings of the International Study Conference on Classification for Information Retrieval (London, Aslib; New York, Pergamon, 1957).
Inverarity, Robert Bruce, Visual files coding index (=Publications of the Research Center for Anthropology, Folklore, and Linguistics, No. 15; Supplement to International Journal of American Linguistics 26:4, Part III) (Bloomington, Indiana University, 1960).
McCulley, W. R., "Univac compiles a complete Bible concordance", Systems Magazine, March-April 1956, 20.
Melchuk, I. A., [Article in] Computers and Automation, 8 (1959), 23.
[National Academy of Sciences - National Research Council], Proceedings of the International Conference on Scientific Information (Washington, D. C., National Academy of Sciences - National Research Council, 1960). [Cited from the preprints of papers for the conference, November 1958.]
[Royal Society], Reports and papers submitted, The Royal Society Scientific Information Conference (London, The Royal Society, 1948).
[Standard Register Company], Paperwork simplification (=Standard Register Company. No. 36) (1954).
Walsh, John, "Library of Congress: Automation Urged for Bibliographic Control But Not Prescribed as a Panacea", Science, 143 (1964), 452-455.
Wiener, Norbert, "Some moral and technical consequences of automation", Science, Washington, D. C., 6 May, 1960.

LINGUISTIC DATA PROCESSING*

SYDNEY M. LAMB

1. Scope of the Field.
2. Classification of the Field.
3. Simple Text Processing.
4. Automatic Linguistic Analysis.
5. Linguistic Automation.
6. Processing of Linguistic Files.
7. Simulation of Language Dynamics.
8. Knowledge Retrieval.
9. Statistical Studies of Verbal Data.
10. Concluding Remarks.

* This work was supported by the National Science Foundation.

1. SCOPE OF THE FIELD

The term linguistic data processing is ambiguous. In fact, it is actually triguous, since it can be interpreted in three ways. I have intentionally used this triguous term as the title of this paper and as the name of a field of intellectual activity, since I want it to be taken in all three ways. That is, the scope of the area covered in this paper is the union of the three areas defined by the three meanings of linguistic data processing (LDP). These three meanings can be expressed as (1) processing of linguistic data (whether for linguistic or for non-linguistic purposes) (LDP1); (2) processing of data (whether verbal or not) for linguistic purposes (LDP2); (3) linguistic processing of data (i.e. operating on data with linguistic processes) (LDP3).

The term linguistic process may be taken as applying to those processes which are part of the central concern of linguists, in particular the processes involved in the production and understanding of discourse and in linguistic analysis. The boundaries of the area covered by this term are somewhat vague, and this is not necessarily undesirable. Linguistic analysis inevitably includes certain simple clerical operations in addition to those complex analytical processes which are part of the central concern of linguists. The former are presumably to be excluded from the class of linguistic processes, but it is not immediately apparent where the line is to be drawn. Different linguists would doubtless disagree on this point. Consequently, the scope of the area covered by LDP3 (linguistic processing of data) is not fixed. However, since linguistic processing of data presumably always involves linguistic data, LDP3 is presumably included within LDP1 (processing of linguistic data). Thus we may take the scope of linguistic data processing to comprise all of LDP1 plus all of LDP2, with assurance of automatically including all of LDP3.

LDP1 and LDP2 have, as might be expected, a large area of overlap, but each includes some area not included in the other. The relationships among all three kinds of linguistic data processing are shown in Table I, in which we have two dimensions based on the first two meanings of linguistic data processing, and two subdivisions of LDP1 based upon the third meaning. The table divides the universe of computer uses into six sections, five of which come under the heading of linguistic data processing. Obviously, the areas of the boxes in the figure have no relation whatever to the size of the corresponding areas of computational activity; section six of the chart, which is excluded from linguistic data processing, represents a much greater sphere of current activity than all of the other five combined. On the other hand, there is a rapidly growing realization of the importance of various types of linguistic data processing which should eventually lead to a tremendous increase in the amount of activity going on in LDP, so that it may eventually become one of the most important areas of computational activity.

TABLE I
The Scope of Linguistic Data Processing and of Mechanolinguistics

PROCESSING                           For Linguistic Purposes    For Non-linguistic Purposes

Of Linguistic Data
  With Linguistic Processes          1. LDP1,2,3; ML            4. LDP1,3; ML
  With Non-linguistic Processes      2. LDP1,2; ML              5. LDP1

Of Non-linguistic Data               3. LDP2; ML                6. (outside LDP and ML)

Examples:
1. Research on automatic parsing as an approach to the study of linguistic structure.
2. Statistical studies of texts to throw light on linguistic structure.
3. Simulation of language-area dynamics.
4. Machine translation of Russian ethnographic literature (for anthropologists).
5. Construction and use of an automatic citation index for ethnography.
6. Calculating orbits of earth satellites.

The table also delimits the scope of a somewhat more restricted area than linguistic data processing, namely mechanolinguistics (ML). Mechanolinguistics (or, as an alternative designation, computational linguistics) is the interdisciplinary area concerned with the fields of linguistics and automatic computation. As such, it can be taken to include all processing of data for linguistic purposes (i.e. all of LDP2) plus all linguistic processing of data (i.e. all of LDP3); but it excludes the non-linguistic processing of linguistic data for non-linguistic purposes; i.e. it excludes section five of Table I as well as section six. Obviously, the field of linguistics is not concerned with all that involves verbal data, since if it were it would have to include all of literary criticism, most of history, and a great deal of most other fields. So such computer applications as the simple processing of folklore text for purposes of ethnographic research, or simple processing of bibliographic information as an aid in keeping track of ethnographic literature, belong under the heading of linguistic data processing but not under the heading of mechanolinguistics. Mechanolinguistics includes any data processing which is directly applicable to the work of the linguist (therefore all of LDP2) plus any work involving operation upon linguistic data with linguistic processes even if the results are useful outside of linguistics, as in the case of a linguistically sophisticated retrieval system for ethnographic literature that would be able to do grammatical analysis of the sentences contained in this literature in order to enhance its effectiveness.
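
The inclusion relations described in this section can be checked mechanically. The sketch below is only an illustration of the reasoning, not part of the original paper: the function name `classify` and its three Boolean parameters are hypothetical, and it assumes, as the text does, that linguistic processes always operate on linguistic data. Given whether a computer use involves linguistic data, linguistic processes, and linguistic purposes, it returns which of the labels LDP1, LDP2, LDP3 and ML apply.

```python
def classify(linguistic_data, linguistic_process, linguistic_purpose):
    """Return the set of labels (LDP1, LDP2, LDP3, ML) that apply to a computer use."""
    if linguistic_process and not linguistic_data:
        # Assumption taken from the text: LDP3 lies wholly within LDP1.
        raise ValueError("linguistic processes are assumed to operate on linguistic data")
    labels = set()
    if linguistic_data:
        labels.add("LDP1")   # processing of linguistic data
    if linguistic_purpose:
        labels.add("LDP2")   # processing for linguistic purposes
    if linguistic_process:
        labels.add("LDP3")   # linguistic processing of data
    if linguistic_purpose or linguistic_process:
        labels.add("ML")     # mechanolinguistics = all of LDP2 plus all of LDP3
    return labels


# The six sections of Table I as (data, process, purpose) flags.
sections = {
    1: (True, True, True),     # automatic parsing for linguistic study
    2: (True, False, True),    # statistical studies of texts
    3: (False, False, True),   # simulation of language-area dynamics
    4: (True, True, False),    # machine translation for anthropologists
    5: (True, False, False),   # automatic citation index
    6: (False, False, False),  # orbits of earth satellites
}

for number, flags in sections.items():
    print(number, sorted(classify(*flags)))
```

Only section six yields an empty set, and only section five carries an LDP label without also falling under mechanolinguistics, which matches the exclusions stated above.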

2. CLASSIFICATION OF THE FIELD

One way of classifying LDP is that shown in representing its scope in the figure above, since it divides the field into five sections. I find this, however, to be rather unsatisfactory as a classification. The dimensions on which it is based are not the most useful for dividing the area into interesting sub-areas. So I attempt a more interesting set of dimensions. Computer applications may differ from or resemble one another with regard to (1) the type of use to which they are put, (2) the type of process involved, (3) the type of objective being aimed at, and (4) the type of data being operated on. The first, third, and fourth of these are related to the categories shown above in Table I, but here I intend to be more specific with regard to them. For example, under type of data we want to distinguish not just whether the data are linguistic or non-linguistic, but rather whether we are dealing with texts, lists of grammatical rules,
or bibliographical references (all of which are linguistic data). Similarly, instead of merely distinguishing linguistic processes from non-linguistic processes we want to distinguish simple data reorganization from sentence generation, from automatic segmentation, etc. These four dimensions are relatively independent of each other. For a given type of use, e.g. for purposes of synchronic linguistic analysis, there is a wide variety of applicable processes. A given type of process, e.g. simple processing of texts such as concordance-making, has a very wide variety of uses, since concordances are useful in linguistic analysis, in the study of folklore, in the study of literary style, in ethnographic research, etc. A single type of objective, say machine translation, can have not only a wide variety of uses (e.g. in Linguistics, Anthropology, Chemistry, etc.) but can also be associated with a variety of types of process, since there are various approaches, using various different procedures, to the problem of machine translation. Similarly, there are various different procedural approaches aimed at the objective of synchronic linguistic analysis, so that if we were classifying by type of process, these would fall under different headings, while if we classified by type of objective they would come under the same heading. And operations on a given type of data can be associated with a wide variety of processes, a wide variety of uses, and a wide variety of objectives. For example, texts as a type of data to be operated on can be subjected to such processes as concordance-making, indexing, or automatic decoding; for purposes of linguistic analysis, ethnographic research, etc.; or they can be operated on in the context of different objectives, such as machine translation, information retrieval, automatic linguistic analysis.
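
Concordance-making, named above as a typical simple text process, can be sketched in a few lines. The fragment below is a minimal illustration, not a procedure taken from the paper: the function name `concordance`, the `width` parameter, the sample sentence, and the decision to treat any run of letters as a word are all assumptions made for the example. It records every occurrence of each word form together with a few words of context on either side.

```python
import re
from collections import defaultdict


def concordance(text, width=3):
    """Map each word form to its occurrences, each with `width` words of context."""
    # Simplifying assumption: a word is any run of letters or apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    entries = defaultdict(list)
    for i, word in enumerate(words):
        left = " ".join(words[max(0, i - width):i])
        right = " ".join(words[i + 1:i + 1 + width])
        entries[word].append((left, word, right))
    return dict(entries)


# Invented sample text, used only to exercise the function.
sample = "The pot has a configuration which is unique. The pot is old."
index = concordance(sample, width=2)

for left, key, right in index["pot"]:
    print(f"{left:>20} [{key}] {right}")
```

The same index serves a linguist studying distribution and a folklorist tracing a motif, which is the sense in which one process is spread widely along the use dimension.
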
One way to classify objects is to isolate whatever dimensions appear to be most meaningful as a basis for the classification, and then intersect the dimensions to obtain various compartments in a multi-dimensional space, each of which is a category of the classification. This approach to classification has the advantage of tending to insure that everything in the area to be classified is provided for, but it has the disadvantage that some of the compartments which it provides may be empty or nearly empty while others are very crowded. It is a deductive approach, and may be contrasted with the inductive approach, which I use here, in which we base the classification on the objects to be classified rather than on the framework provided by the dimensions. We can still use our dimensions and we can even still think of our n-dimensional space, but our primary categories will come from observation of the objects. In this survey I divide the field of linguistic data processing into eight
areas on the basis of various similarities and differences having to do with the four dimensions named above; but with reference to different ones in different cases. For example, one of the areas can be called simple text processing. The various related computer uses in this area are closely related to one another because they involve the same type of data and similar types of processes. But they differ widely from one another with regard to uses to which they can be put. Thus, with reference to the dimensions, the systems in this area are clustered together in one part of the process dimension and in one part of the data dimension but they are spread very widely throughout the use dimension. An entirely different situation holds for the category of automatic synchronic linguistic analysis. Here too it is meaningful to consider this a single category even though the field as a whole can be broken down into several processes (e.g. segmentation, tactic analysis) and a variety of procedures is available for each of these processes. The eight areas, two of which are divided into two sub-areas each, are shown in Table II below, together with information showing various properties which characterize each. Column 1 names the areas. Column 2 states what it is that the various uses within each area have in common that makes it convenient to group them into the same area; that is, it names the dimensions (in terms of the above discussion) in terms of which the uses in each area are clustered. Column 3 is concerned with what it is about each area that makes it valuable or interesting to the anthropologist (discussed in the following paragraph). The next three columns relate to the three meanings of linguistic data processing. An X means yes, a space means no. Column 4 is concerned with whether or not the area involves linguistic processes, i.e. whether or not it comes under the heading of LDP3. Column 5 is concerned with whether or not the area involves linguistic data, i.e. 
whether it is part of LDP1. Column 6 is concerned with whether or not each has linguistic uses (all of them do). I give also column 7 to indicate whether or not each area has applicability outside of linguistics; and here I have in mind particularly anthropology (by which I mean that part of anthropology which is not also linguistics). If for an area there is an X in both columns 6 and 7, it means that the area is partly within LDP2 and partly outside of it. Thus compartment 5 in Table I above (that part of LDP which is not also mechanolinguistics), the processing of linguistic data with non-linguistic processes for non-linguistic purposes, is partly contained within the area of simple text processing and partly within the area of statistical studies of verbal data. But since each of these areas also has linguistic uses they are also partly
