English Pages 2990 [2995] Year 2014
THE UTTAL TETRALOGY OF COGNITIVE NEUROSCIENCE
Volume 1
THE PSYCHOBIOLOGY OF SENSORY CODING
WILLIAM R. UTTAL
First published in 1973
This edition first published in 2014 by Psychology Press, 27 Church Road, Hove BN3 2FA, and by Psychology Press, 711 Third Avenue, New York, NY 10017
Psychology Press is an imprint of the Taylor & Francis Group, an informa business
© 1973 William R. Uttal
All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.
Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.
British Library Cataloguing in Publication Data: A catalogue record for this book is available from the British Library
ISBN: 978-1-84872-428-0 (Set)
eISBN: 978-1-315-76930-1 (Set)
ISBN: 978-1-84872-429-7 (Volume I)
eISBN: 978-1-315-76929-5 (Volume I)
Publisher's Note The publisher has gone to great lengths to ensure the quality of this book but points out that some imperfections from the original may be apparent. Disclaimer The publisher has made every effort to trace copyright holders and would welcome correspondence from those they have been unable to trace.
THE PSYCHOBIOLOGY OF SENSORY CODING
WILLIAM R. UTTAL
University of Michigan
Harper & Row, Publishers New York, Evanston, San Francisco, London
Sponsoring Editor: George A. Middendorf
Project Editor: Sandra G. Turner
Designer: Michel Craig
Production Supervisor: Robert A. Pirrung
THE PSYCHOBIOLOGY OF SENSORY CODING
Copyright © 1973 by William R. Uttal
All rights reserved. Printed in the United States of America. No part of this book may be used or reproduced in any manner whatsoever without written permission except in the case of brief quotations embodied in critical articles and reviews. For information address Harper & Row, Publishers, Inc., 10 East 53rd Street, New York, N.Y. 10022.
Standard Book Number: 06-046737-1
Library of Congress Catalog Card Number: 73-15972
For Michan
CONTENTS
Preface
Chapter 1. An Orientation
  I. INTRODUCTION
    A. Psychology, Physiology, and Psychobiology
    B. The Scope of "Sensory" Sciences
  II. VARIETIES OF SCIENTIFIC QUESTIONS AND APPROACHES
    A. Some Thoughts on the Nature of Scientific Activity
    B. A Hierarchy of Questions of Sensory Psychobiology
  III. PLAN OF THE BOOK
SECTION ONE: FUNDAMENTAL MATERIALS
Chapter 2. The Nature of Physical Stimuli
  I. INTRODUCTION
    A. What Is a Stimulus?
    B. The Logical Necessity for Dealing with the Potential Stimulus as an Initial Reference
    C. The Notion of the Adequate Stimulus
  II. MECHANICAL STIMULI
    A. The Nature of Mechanical Stimuli
    B. Metrics of Kind
    C. Quantity Measurements of Acoustic Stimuli
    D. Sources of Acoustic Stimuli
    E. The Nature of Vestibular Stimuli
    F. Sources of Vestibular Stimuli
    G. The Nature and Sources of Mechanical Cutaneous Stimuli
  III. THERMAL STIMULI
    A. The Nature of Thermal Stimuli
    B. Quantity Measurements of Thermal Stimuli
    C. Sources of Thermal Stimuli
  IV. PHOTIC STIMULI
    A. The Nature of Photic Stimuli
    B. Metrics of Kind
    C. Quantity Measurements of Photic Stimuli
    D. Sources of Photic Stimuli
  V. CHEMICAL STIMULI
    A. The Nature of Chemical Stimuli
    B. Quantity Measurements of Chemical Stimuli
    C. Sources of Chemical Stimuli
  VI. ELECTRICITY: THE UNIVERSAL STIMULUS
    A. The Nature of Electricity
    B. Sources of Electrical Stimuli
  VII. THE SPECIFICATION OF PATTERN
Chapter 3. The Anatomy of Receptors and the Sensory Pathways
  I. INTRODUCTION
  II. THE ELECTRON MICROSCOPE
  III. SOME BASIC RESEARCH TECHNIQUES IN NEUROANATOMY
    A. Additional Optical Microanatomical Techniques
    B. Evoked Potential Techniques
    C. Degeneration Techniques
  IV. THE GROSS STRUCTURE OF THE CENTRAL NERVOUS SYSTEM
  V. THE VISUAL SYSTEM
    A. The Anatomy of the Visual Receptor
    B. The Ascending Visual Pathway
  VI. THE AUDITORY SYSTEM
    A. The Anatomy of the Auditory Receptor
    B. The Ascending Auditory Pathway
  VII. THE VESTIBULAR SYSTEM
    A. The Anatomy of the Vestibular Receptors
    B. The Ascending Vestibular Pathway
  VIII. THE SOMESTHETIC SYSTEM
    A. The Anatomy of the Somesthetic Receptors
    B. The Ascending Cutaneous and Proprioceptive Pathways
  IX. THE OLFACTORY SYSTEM
    A. The Anatomy of the Olfactory Receptor
    B. The Ascending Olfactory Pathway
  X. THE GUSTATORY SYSTEM
    A. The Anatomy of the Gustatory Receptor
    B. The Ascending Gustatory Pathway
  XI. AN INTERIM SUMMARY
Chapter 4. Sensory Transduction
  I. INTRODUCTION
  II. TRANSDUCER ACTION IN THE EYE
    A. Nonneural Stimulus Modifications
    B. The Primary Sensory Action in Vision
    C. The Production of the Visual Receptor Potential
  III. TRANSDUCER ACTION IN THE EAR
    A. Nonneural Stimulus Modifications
    B. The Primary Sensory Action in Audition
    C. The Production of the Auditory Receptor Potential
  IV. TRANSDUCER ACTION IN SOMESTHESIS
    A. Nonneural Stimulus Modifications
    B. The Primary Sensory Action in Somatosensation
    C. The Production of the Somesthetic Generator Potential
  V. TRANSDUCER ACTION IN OLFACTION
    A. Nonneural Stimulus Modifications
    B. The Primary Sensory Action in Olfaction
    C. The Production of the Olfactory Generator Potential
  VI. TRANSDUCER ACTION IN GUSTATION
    A. Nonneural Stimulus Modification
    B. The Primary Sensory Action in Gustation
    C. The Production of the Gustatory Receptor Potential
SECTION TWO: SENSORY CODING
Chapter 5. An Introduction to the Basic Concepts of Sensory Coding
  I. INTRODUCTION
  II. DISCRIMINABLE DIMENSIONS OF THE PHYSICAL STIMULUS (THE COMMON SENSORY DIMENSIONS)
    A. Perceived Quantity
    B. Perceived Quality
    C. Temporal Discriminations
    D. Spatial Discriminations
  III. POSSIBLE DIMENSIONS OF THE NEURAL CODE (THE CANDIDATE CODES)
    A. Place
    B. Number of Activated Units
    C. Neural Event Amplitude
    D. Temporal Pattern
  IV. CAUTIONS IN THE ASSOCIATION OF SENSORY DIMENSIONS AND CANDIDATE CODES
    A. A Distinction Between Signs and Codes: The Merely Concomitant Versus the Truly Relevant
    B. Dimension Alterations
    C. Boundary Conditions Result
    D. Multiple and Overlapping Coding in Two or More Dimensions
    E. Species and Intraindividual Variability
    F. Attentional Limits on Our Perspective
    G. False Analogies
  V. A SUMMARY
Chapter 6. The Coding of Sensory Magnitude
  I. INTRODUCTION
  II. THRESHOLDS AND SIGNAL DETECTION
    A. The Theory of Signal Detection
    B. Spontaneous Neural Activity
    C. The Detection of Threshold Visual Stimuli
    D. Adaptation of Visual Thresholds
  III. THE RANGE OF THE INTENSITY DIMENSION AND THE RESPONSE DYNAMIC
    A. Introduction
    B. Mathematical Descriptions of Response Dynamics
    C. The Neurophysiology of Response Dynamics
    D. What Is the Site of Response Compression?
  IV. IS INTERVAL IRREGULARITY A CODE FOR SENSORY INTENSITY? A MODEL ANALYSIS
    A. Demonstrations of the Natural Occurrence of Interval Irregularity
    B. The Effect of Interval Irregularity on Synaptic Transmission
    C. The Effect of Interval Irregularity on Psychophysical Judgments
  V. AN INTERIM SUMMARY
Chapter 7. The Neural Coding of Space and Time
  I. INTRODUCTION
    A. Some Complexities
    B. Time and Space
  II. SPATIAL AND TEMPORAL ACUITY
    A. Convergence and Divergence in Neural Nets
    B. Spatial Acuity and Receptive Fields
    C. Temporal Acuity and the Psychological Moment
  III. SPATIAL INTERACTIONS
    A. Perceptual Phenomena Related to Spatial Interactions
    B. The Neurophysiological Data
    C. Theories of Lateral Interaction
  IV. THE NEURAL CODING OF TIME PATTERNS
    A. A Comment
    B. Mountcastle's Studies of the Physiological Basis of Flutter and Vibration Sensitivity
  V. SPATIAL LOCALIZATION: REPRESENTATION BY TEMPORAL AS WELL AS SPATIAL CODES
    A. The Psychophysical Data of Spatial Localization
    B. Neurophysiological Data of Relevance to Spatial Localization
Chapter 8. Feature Detection: Neurophysiology and Psychophysics
  I. INTRODUCTION
  II. A GERMINAL STUDY: "WHAT THE FROG'S EYE TELLS THE FROG'S BRAIN"
  III. ANOTHER GERMINAL STUDY: HUBEL AND WIESEL'S MOVING BAR DETECTOR
  IV. DIRECTIONAL SENSITIVITY IN THE SUBCORTICAL VISUAL CENTERS OF MAMMALS
  V. FEATURE DETECTION IN AUDITION
  VI. DO LATERAL INTERACTION AND FEATURE DETECTION MECHANISMS ADEQUATELY MODEL HUMAN PERCEPTUAL PHENOMENA?
    A. The Psychobiological Theories
    B. Some Discrepancies
    C. Conclusions
  VII. AN INTERIM SUMMARY
Chapter 9. The Neural Coding of Sensory Quality: Vision
  I. INTRODUCTION
  II. THE KEY PSYCHOPHYSICAL DATA
    A. The Duplex Retina and Its Psychophysical Correlates
    B. Trichromatic Color Mixture
    C. Stiles' and Wald's Increment Threshold Experiments
    D. Hue Discrimination and Color Blindness
    E. "Fundamental Yellow," Complementary and Paired Colors, and Neutral Loci
  III. THE COMPETITIVE THEORIES
    A. The Trichromatic Theories
    B. The Opponent Color Theory
  IV. THE BIOLOGICAL DATA
    A. The Photoreceptor
    B. Color Coding Beyond the Photoreceptors
  V. A CONTEMPORARY MODEL
Chapter 10. The Neural Coding of Sensory Quality: Audition
  I. THE KEY PSYCHOPHYSICAL DATA
    A. Frequency Analysis and Pitch Discrimination
    B. The Pitch of Combined Stimulus Frequencies
    C. Masking and the Notion of Critical Bands
  II. THE COMPETITIVE THEORIES
    A. Place Theories
    B. Pure Frequency, Periodicity, or Telephone Theories
    C. Combined Place and Frequency Theories
  III. THE BIOLOGICAL DATA
    A. The Cochlea
    B. The Response Area at Various Levels of the Ascending Auditory Pathway
    C. Tonotopy: The Spatial Localization of Frequency Dependent Responses at Various Levels of the Auditory Pathways
    D. Phase Locking at Various Levels of the Auditory Pathways
    E. Neurophysiological Data Pertaining to the Problem of Combined Stimuli
  IV. A CONTEMPORARY MODEL
Chapter 11. The Neural Coding of Sensory Quality: The Other Senses and a Summary
  I. SOMATOSENSATION
    A. The Key Psychophysical Data
    B. The Competitive Theories
    C. The Biological Data
    D. A Contemporary Model
  II. GUSTATION
    A. The Key Psychophysical Data
    B. The Competitive Theories
    C. The Neurophysiological Data
    D. A Contemporary Model
  III. OLFACTION
    A. The Key Psychophysical Data
    B. The Competitive Theories
    C. The Neurophysiological Data and a Contemporary Model
  IV. AN INTERIM SUMMARY
Chapter 12. Epilogue: Emerging Principles of Sensory Coding
Suggestions for Further Reading
Bibliography
Index
PREFACE
Why would anyone want to write a textbook? After having done so, I feel impelled to once again ask myself that question. The answer to which I return time after time is that there is a message to be delivered, and it is often difficult to make that message heard at the level of the senior experimental scientist; rather, it is to the young people who are beginning their careers in science that this book is mainly addressed: those graduate and advanced undergraduate students who have not yet crystallized their personal answers to some of the basic questions often ignored later in one's career. Briefly, the message of this book is that the relationship between the neurophysiological and the behavioral is a researchable problem, particularly if one works in the domain of the sensory processes. An additional major premise of this work is that the real psychobiological sensory-coding issue is embodied in comparisons made between behavioral and neurophysiological responses rather than between the stimulus and either of these two response domains. Furthermore, it is clear that a more complex set of conceptual issues is involved in this combined aspect of the mind-body problem than either neurophysiology or behavior must deal with separately. An additional fact is that this psychobiological domain must be framed in a more multidimensional manner than is often implied by some of the more conventional textbooks. Finally (and delightfully), all workers in the field would agree with me that there are still many gaps in our knowledge concerning the neurophysiological coding of sensory processes.
Neither this book, nor any other in the foreseeable future, is going to tell the whole story of the psychobiology of sensory coding. I hope it is not too extravagant to express my conviction that this field is one of the most exciting and interesting in modern science and yet also only one new facet of one of the grand and ancient issues in man's perpetual concern with his own nature. As such, its topical relevance in today's world is great in spite of the fact that some might feel that the world we live in has other more pressing needs. I believe in this work sufficiently to be convinced, however, that the world would gradually run out of its intellectual "fuel" if studies of this and other similarly "esoteric" fields were not pursued with all possible energy by a substantially large portion of our scientific community.
The subject matter of this book is broad, but it is not intended to be encyclopedic. Rather than an unending list of a host of experimental studies, I have tried, as judiciously as possible, to select the prime examples of research that make key points and then to present them in detail. In doing so, I am sure that I have missed quite a few important contributors, and I offer whatever apologies are appropriate. The content matter of this book must necessarily be selective. Topic selection has been influenced in large part, frankly, by what I am interested in as well as constrained and delimited by the main goal: a definition of the relationship between sensory neurophysiology and sensory psychophysics. Thus, some important material has been intentionally omitted or deemphasized.
This book is the culmination of a long-term plan. I suppose I first began thinking about doing it in graduate school, but I did not formally begin work on a manuscript until 1967. In the years that have elapsed, it has consumed a large portion of my energies and, as any author might note, there were times when I was not so sure that it ever would be finished.
But it is now, and I present it to the community with a sense of completion, but also with a profound sense of obligation to all of those people who have contributed in one way or another to the development of the ideas expressed here or to the preparation of this volume.
First, it is most appropriate for me to acknowledge the enormous debt that I owe my mentors at the Ohio State University. Professor Donald R. Meyer must bear the responsibility for having converted me from physics to physiological psychology and also for alerting me to the grandeur of the mind-body problem. Professor Philburn Ratoosh specifically whetted my appetite for the sensory sciences, and Professor Leo Lipetz introduced me to the delights of the neurophysiological approach. The influence of the late Professor Paul M. Fitts, both as a scientist and as a responsible human being, I hope will stick with me for the rest of my life.
More recently, I have had the advantages of support from a number of other colleagues and agencies. Parts of this book were written during visits away from the University of Michigan. I enjoyed the hospitality and support of the Laboratory of Sensory Sciences at the University of Hawaii in the winter of 1968, a glorious sun-drenched period for which I am especially grateful to Professor A. Leonard Diamond, who was the director at that time. In 1970-1971 I spent an extraordinarily productive and pleasant academic year among the wonderful people and stimulating intellectual environment of the Department of Psychology at the University of Western Australia. The cordiality and friendship of Professors John Ross and Aubrey Yates, Dr. Vincent DiLollo, and all of my other Australian friends made that year a memorable one both professionally and personally.
For the last decade at the University of Michigan, my research has been supported by the university, in particular the Mental Health Research Institute, and also by federal grants. I am particularly happy to acknowledge continued research support over the last nine years from the Psychobiology Branch of the National Science Foundation and, more recently, a National Institute of Mental Health Research Scientist Award.
I have also had the benefit of editorial comments from some very capable people. Professor Paul Witkovsky of New York University did a Herculean job of providing advice (thus cleaning the "Augean Stables" of some early drafts) in a way which has made this a much better book than it would otherwise have been. Professor H. Philip Zeigler, editor of the Harper & Row Physiological Psychology Series, was indefatigable in his comments and advice over the entire period in which I worked on the book. George Middendorf, executive editor of the Harper & Row College Department, also "helped," to say the least, all of us over some of the rough spots. Others of my colleagues have read lesser portions or provided specific bits of advice, which I hope I have fully acknowledged in the text. Needless to say, however, none of these people can be held accountable for the deficiencies that certainly will be included even in the final version of this book.
The astute reader will also note the influence of many other workers with whom I am not personally well acquainted. I often know them only through their writings, and many of them rightfully deserve a more specific acknowledgment than is possible here.
Ideas spread through the scientific community along paths that are sometimes hard to identify precisely, and I am sure that, especially in an effort such as this, the multiple effects of other people's work are both profound and quite clear. I hope that each has been adequately cited in the body of the text, although I am sure that there is no way in which this could be done completely. No omission was intentional, but no list of citations could be complete either. If they should ever read this book, such distinguished scientists as Theodore Bullock, Frank Geldard, Ragnar Granit, Henri Pieron, Walter Rosenblith, and Burton Rosner will be able to detect their influence on my general perspective. Perhaps most particularly, I have in many ways been influenced by the prolific and stimulating writing of the late Georg von Bekesy, who passed away this summer after a long and distinguished career. Many others share a sense of loss with me at his passing. To those others I may have missed, my apologies and gratitude for helping me toward an understanding of this subject matter.
Many students who have been through my course on sensory processes at the University of Michigan have also contributed, though unknowingly, to the development of this book. Their comments and discussions, as well as the vigorous controversies which somehow emerged even in this highly technical subject matter, have all added to my understanding of this material and, I hope, to the quality of my presentation.
The manuscript of this book has been typed (and retyped so many times, it seems) by so many secretaries that some of their names have faded off into the past, but I am especially grateful for the dedicated efforts of Ms. Lynn Gore and Ms. Patricia Eaton, who withstood the brunt of the most difficult final period of work on the manuscript. I am also indebted to Mr. Paul Laszlo and Ms. Maureen Powers, who assisted and advised in the preparation of the figures.
Last but not least, my family, my wife, May, and three daughters, Taneil, Lynet, and Lisa, who have traveled with me to unfamiliar places or simply done without me on those evenings and weekends I became overly engrossed in the manuscript, not only have my affection and love but also my appreciation for making all of this possible. May, better than anyone else, you know that without you this never could have come about.
William R. Uttal
CHAPTER 1: AN ORIENTATION
I. INTRODUCTION
A. Psychology, Physiology, and Psychobiology
This book is about the acquisition and transmission of information patterns from the external environment to the central nervous system, where presumably these patterns become the substance of our mental life. It deals with a subject matter that thus transcends the conventional borders of neurophysiology and psychology as well as those of related disciplines such as anatomy and physics. From this diversity of substance has come both the richness of content and the major difficulty in the preparation of a satisfactory text of this sort. It is impossible to conceive of a single volume that would be capable of comprehensively covering the broad spectrum of theory and data that now make up such a vast literature. This present book, therefore, is selective and incomplete in a number of important ways. It has not been possible to comprehensively cover either the phenomena of sensory psychology or the abundant detail of sensory physiology to the depth desired. On the contrary, this book is built on a single axiom, and only material specifically relevant to that axiom has been included. That axiom states that the external world is represented within the organism by patterns of neural activity that, while the patterns themselves differ greatly from the external world in terms of the physical energies involved, do maintain some sort of an equivalence of pattern or organization, if not an isomorphism, of neural activity. The persistence of patterns of information, independent of the physical medium or specific neural dimension in which they exist, is the essential idea of the notion of neural coding.
To concretize this idea, let us consider a specific example of what it is that we mean by equivalence and isomorphism. Consider the problem of auditory pitch. Physical signals, varying primarily in the temporal characteristics of waves of compression and rarefaction of air pressure, are interpreted as sounds of different pitches by a listener. The original signal information is, in this example, in the form of the temporal pattern of the air wave. But as we shall see, these temporal information patterns are encoded largely by spatial distributions of activity in the inner ear. Thus, while the spatial neural patterns convey the same information and are hence equivalent, they are in no sense isomorphic (of the same shape), nor do they even possess the same dimensionality as the stimulus. Rather, they are encoded representations of the original pattern.
The concept of coding has in recent years come to be, for sensory studies in particular, the contemporary expression of a continuing intellectual perplexity: how do the body and its parts relate to, represent, or in some way provide a receptacle for the self-awareness that every man has of his own existence? We shall detail in later parts of this chapter the specific nature of this new outlook, but for the moment let us consider the problem of mind-body relationships in their more classical sense. The problem has been considered by some thoughtful philosophers (see, for example, Feigl, 1958) to be a question inadequately stated and inappropriately asked. Certainly, when one considers the fact that the intrapersonal self-awareness that all men seem to possess is not subject to interpersonal examination and comparison, the perplexities of how one can handle the problem of the construction of an internally consistent science of psychology appear substantial.
But before one should be discouraged from attempting the task, it should be remembered that in fact most of our sciences, physical and biological as well as behavioral, share this same difficulty. The physics of nuclear structure, for example, is a science in which the components under investigation can never be seen directly. We can only infer their existence (and transient indeed they may be) from the tracks of a bubble or spark chamber or even more abstractly from the necessity to balance an energy equation. Astronomy shares this same dilemma. It appears that the facts surrounding the speed of light as an upper limit for the movement of either physical objects or energetic waves through space make travel to distant galaxies just as unlikely as a direct observation of one man's awareness by another. If the theory that there exists an upper limit of the speed of light is sustained in the future, we shall always be forced to limit our observations of the nature of distant universes to those measurements that can be made on the ancient light signals that have been traveling toward us in some cases for millions of years. It should be remembered that what we may be observing with telescopes now may long ago have ceased to exist. This temporal dilemma, at least, is a problem not faced by psychology.
Psychology, a science whose proper content is the set of the inner awarenesses which Feigl (1958) refers to as Raw Feels, is therefore in no way unique from the point of view of invisible subject matter. Psychology, as a science, has responded to the limitations on interpersonal comparison by emphasizing behavioral measures. This is often so, it seems to some, to the exclusion of the subject matter itself. But even though it seems clear that the great initiative for studying awareness or consciousness comes from the individual's self-awareness, nevertheless, in order to provide a reasonable foundation for replicable and organizable observations, it is to the behavior of others that we must turn for our experimental data. This methodological twist often obscures the fact that behavior is not the subject matter of the psychological sciences, but it is rather only an approach to the real content, which is represented by such symbolic terms as consciousness, awareness, thought, perception, or any of the other terms that have gained momentary popularity over the centuries. In the same vein, who would classify astronomy as either the study of telescopes or the study of ancient light? The objects of attention of most astronomers are planets, stars, and galaxies and not the instruments or measurements necessary to acquire information about these objects. Similarly, the objects of attention of psychology are thought, perceptions, sensations, and other aspects of conscious experience.
The other major approach to sensory science is the neurophysiological one. The objects of concern to this science are the processes that are performed by the specialized tissues and cells of the nervous system. From our particular frame of reference, that of the problem of sensory communication, the objects of attention are the physiological processes of the highly specialized nervous tissues responsible for the transduction, transmission, and integration of incoming information patterns. The scientific accomplishments of the last half century have made it quite clear that it is the brain and the other components of the nervous system to which we must look for a reductive explanation of psychological acts.
It may be somewhat surprising to realize how recently in human history this idea gained a solid foundation. We still are all too apt, in the vernacular at least, to refer to the "heart" as the seat of emotional awareness.
There is another idea which has also been current in our modern infatuation with electrophysiological techniques. This idea is that a physiological investigation of the activity produced in the nervous system by peripheral stimuli is sufficient in itself to explain the coding processes of sensory systems. Such a notion is probably untrue. A substantial number of misconceptions have been introduced into our understanding of the mind-body problem by this overattentive concern with physiological recording. On the other hand, it is important not to minimize the enormous contributions that have been made using these techniques when they are placed within a psychological framework. It is also true, however, that the psychological approach is, by itself, incapable of providing a definitive answer to problems of sensory coding. In fact, because of the intrinsic limitations of the two techniques when used separately, a combined approach has become more and more popular in recent years. This approach has gone under many names (biopsychology, neuropsychology, or physiological psychology), but we have chosen the term psychobiology as the one that seems most appropriate to the emphasis we shall place on the physiological substrates of sensory experience.
Psychobiology represents a merger of techniques from both the psychological and the physiological laboratories. Indeed, we see today behavioral techniques being used by physiologists, and electrophysiological recording proficiency developed to a high level among people originally trained in psychology. It is often difficult to distinguish the original training of a practitioner of the new combined art. The adoption of new techniques in either field reflects the increasing tempo of the interdisciplinary trend in many other sciences. But more important, in this particular field, it reflects the emerging awareness of the significant conceptual pressure that an integrated psychobiology of sensory activity exerts on our perspective of the classic mind-brain controversy.
My aim in this book is to tie together the work of the psychologist and the neurophysiologist into a coordinated approach to the problems of sensory experience. Some have pointed out that the psychological approach is a necessary precursor to electrophysiological studies so that the neurophysiologist will know what there is to look for. Neurophysiologists have properly pointed out the inability of the psychological techniques to solve the problems of sensory representation within the "black box" of the closed organism. In fact, both points are correct, and just as a sensory psychology text that ignored the neurophysiological contributions would be meaningless in the terms of today's knowledge, similarly a text of sensory neurophysiology would be incoherent if it ignored the massive contributions of the psychophysical laboratories.
B. The Scope of "Sensory" Sciences
It is appropriate at this point that we attempt to define the field of inquiry with which we shall be concerned. Sensory processes in biological systems are defined, in large part, by an intrinsic directionality of the flow of information.
The sensory system is designed, as we have said, to pick up patterns of information from the external world, transduce this information from any of a number of physical forms of energy to the electrochemical forces of neural activity, and transmit that pattern toward the complex central portions of the nervous system. The nature of the subsequent processing of sensory information by the higher nervous centers in the exquisitely complicated ways cataloged by the cognitive descriptors evolved by psychologists will not, by intent, be among the topics we shall consider. It is, of course, not possible to draw a sharp line of demarcation between the content of this book and that of these other areas of subjective activity. Indeed, as our understanding progresses, the line should be pushed further and further upward. A cogent argument can be made that the "simpler" sensory processes, with which we shall concern ourselves, are in reality no simpler than the more complex processes. They may represent only a sub set of cognitive functions which, for a number of reasons, are more susceptible to experimental control and probing. Furthermore, we must recognize that in defining the field to be covered by this book, we are specifically faced with the knotty problem of distinguishing between those subject matters that are usually classified as sensory and those that are usually classified as perceptual. It seems most appropriate to ignore any real difference between the two. Rather, for the purposes of this volume, we shall make a very artificial
distinction. We shall talk about the sensory phenomena that are closely related to well-defined stimulus patterns and known or plausible neural mechanisms. We shall simply exclude from our discussion those more complex perceptions that are apparently based upon such complex neural interactions and weakly defined stimulus conditions that the reductive aspects of our science have not yet come to grips with them. For example, we shall not deal with the well-known fact that familiarity enhances word recognition. There is no physiological theory that can even begin to account for such a phenomenon. The conceptual and anatomical straightforwardness of the sensory functions has been their great asset in establishing their preeminence in the reductionism of modern psychobiology. We understand information processing better in sensory systems in large part because they are much simpler in both anatomical and functional organization than the systems underlying, for example, such processes as problem solving. They are in some cases so simple that they can be considered characterizations or abstractions of the operation of more complex and general neural nets, just as simple invertebrate "model" nervous systems often help us to understand mammalian neural mechanisms. For this reason, through the study of sensory processes, we are often able to advance our knowledge not only of the immediate problems surrounding sensory coding, but also of the possible modes of action of the more complicated mechanisms. We shall be looking at the sensory systems mainly in terms of their function of communicating information from the periphery toward the central nervous structures. This has been called, by some, communication in the ascending direction (the pun being associated with the notion of the "higher" centers of the brain), even though in some cases the "ascending" nerves may actually travel downward during at least part of their route. 
Others have used the term afferent, emphasizing the inward-going direction of these communication links as distinguished from the efferent lines of the effector or motor systems. The word sensory is always used in this book as a synonym for either input, ascending, or afferent and not, as some would choose, as a synonym for simple perceptions. Thus, it is entirely consistent to speak of sensory neurophysiology or sensory coding without referring to the psychological connotations. However, this simple categorization is inadequate to handle some recent data. We shall be concerned with information processing between and among units of the afferent pathways in ways that require awareness of such concepts as feedback and reverberation. In fact, we shall also be concerned with some signal pathways that are intimately tied to the sensory systems but are patently efferent or descending. We refer here to observations which indicate that there is centrifugal (from the center) activity which operates to powerfully modulate the sensory input pattern. From a more psychological point of view, the material with which we shall be most concerned will deal with those relatively simple awarenesses of relatively simple stimulus patterns. Psychophysical techniques are in a rapid stage of metamorphosis as the traditional techniques of measuring the responses to brief and transient unitary stimuli give way to more interestingly designed experiments with more elaborately patterned stimulus situations. But in spite of this, there is a clearly defined group of experimental paradigms, which will be seen to fit the problem area we have chosen for this book more closely than others involving other more complex, although equally interesting, stimuli. The key test of relevance will be: is a given sensory or perceptual phenomenon an input function with a plausible neural explanation? As we have said, the main goal of this book is to integrate the two sets of data from psychology and neurophysiology into a unified discussion. In addition to selecting a circumscribed topic area, sensory processes, for consideration, we shall also be making a number of implicit assumptions in the development of this book, which should be overtly stated as fundamental premises. One of these premises is our point of view concerning the nature of the interaction between the psychological and the neurophysiological. Our approach is one that would certainly be classified within the general rubric of monisms. It is, explicitly, an expression of the belief that sensory activities, and for that matter all psychological activity, are ultimately reducible to the terms, linguistic and conceptual, of patterns of neural activity. In addition to the premise of monism, which underlies our attempt to integrate the two approaches to the study of sensory processes (the psychological and the neurophysiological), we shall also attempt a synthesis at another level of discussion. Traditionally (Troland, 1930; Geldard, 1953; Wyburn, Pickford, and Hirst, 1964), sensory textbooks have been organized in chapters, which discuss one sensory modality at a time. Although Osgood (1953) and Corso (1967) do partially deviate from this tradition, we usually see conventional chapters on vision separated from those on audition, which are likewise separate from those on the chemical or somatic sensations.
It is a further premise of the present work that there are general principles of sensory organization that can and should be emphasized independent of a specific modality. The processes of transduction or of transmission, for example, have features common to all the senses, and these commonalities become self-evident when all of the senses are discussed together. Similarly, the coding of sensory quantity, space and time, or even quality have so many common attributes among the different modalities that we believe it is meaningful to consider these dimensions free of the restrictions imposed by the limitation of our attention to a single modality at a time. It is important to acknowledge at this point that this is not the first book to deviate from the traditional pattern of considering the senses separately. I have been particularly influenced by the writing of Henri Piéron, particularly as he presents his discussion of the sensory processes in his elegant treatise, The Sensations (1952).
II. VARIETIES OF SCIENTIFIC QUESTIONS AND APPROACHES
A. Some Thoughts on the Nature of Scientific Activity
As we have said, the general notion of coding provides a framework for consideration of the mind-body problem that is much more concrete than
the purely philosophical speculations of the past. But, paradoxically, each of the experiments, which are performed in the belief that they are relevant to the problem of sensory coding, must be considered relatively insignificant and unimportant in itself. A sensory-coding experiment, like any other one, is important only in terms of its contribution to our general appreciation of the system we are studying and in direct relation to its ability to add to or modify the general outlook. It is all too easy, in a highly technological problem area such as this, to lose sight of the reason that experiments are performed and to overemphasize the techniques or resulting set of data. The end product of this scientific tunnel vision is wasted effort and resources. It is my belief that the reasons that physiological psychologists do their work are generally misrepresented by their own writings. It is incomplete to say that one is doing an experiment to determine the response function of a given receptor in the eye of the frog, for example, without some realization of the more general problems which surround that purely technical issue. There exists, in fact, a hierarchy of questions which are asked about any scientific issue. In general, those scientists who are unaware of the greater implications of the specific technical questions asked day to day in their laboratories must ultimately stagnate in a pool of trivial and pointless experimental manipulations. Not everyone has to be a full-time philosopher of science, but all technically oriented scientists must be aware of the conceptual implications of their work. We would all be better men and better scientists if there were a wider appreciation of the great issues that surround the small steps we make in our laboratories. Coding theory, as it represents the major cutting edge into the mind-body problem, represents perhaps the pinnacle of man's contemporary intellectual efforts.
There may be fields of science that have made greater technical progress, or ones in which the formal structure is better understood, but there is no other field that more clearly epitomizes the most human of endeavors: the analysis by man of his own intellect. Just as the field is important, it is also diversified. It is obvious that no single professional scientist is fully capable of the mastery of all of the skills necessary to study the full range of sensory phenomena. Similarly, few of our great insights into sensory processes have been based on a single key experiment from a single approach. Most have been based on an intentional effort to integrate a variety of data bases into a single theoretical structure. The point that is being made is that there is probably no one single "best" way to approach a problem like sensory coding. Indeed, there is probably no single strategy that can be applied to any subproblem in science which is uniquely capable of providing a solution. Platt (1964) calls for an increase in the technique of hard inference as the single approach to experimental research. Yet one cannot read the eloquent plea of Hebb (1959) for an eclectic approach to science without appreciating the value of multiple approaches to problems of interesting complexity. Perhaps the best argument for multiple approaches to scientific research is the inherent disorganization of the frontiers of science. The exciting cutting edge of scientific inquiry is not the precisely laid out studies explicated in the modern journal format, but the often confused and disorganized ideas of a
man working with an idea for the first time in human history. The simple fact is that the meat of science is the formless and the unknown, and not the known and collectively understood. As fast as new knowledge is exposed and integrated into a community perspective, it diminishes in importance and interest to any scientist worth his salt. The best scientists in the business are the ones who eventually lose interest, become bored, and move on to other new areas of inquiry. The role of each new discovery is to change our world as it changes our perspectives and subtly alters our approach to the next problem. Because of this unstructured chipping away at the boundaries of our ignorance, I believe it remains a practical impossibility to predetermine which will be the best strategy or to make long-range advance plans in the truly explorative aspects of science. If the process of science is to be considered from the point of view of a search rather than as an accomplishment, if the content of science is to be considered as the unknown rather than the known, then it is clear that science can be best appreciated when it is formulated in terms of hierarchical series of questions rather than as answers or specific research methods. These questions arise in many different ways. There are some questions whose history is as long as man's and arose as soon as the unique capability of man to concern himself with his relationship to the world around him appeared. The emergence of self-awareness probably did not substantially predate the awareness that there was a difference between a living and dead organism that could apparently be explained only in terms of some "thing" missing. The observation of the death of a member of one's own species is a powerful force first toward asking questions about life and death and then answering them in ways that do not violate the obvious "fact" of one's own consciousness.
Primitive considerations such as this have evolved into the extremely intricate philosophies of the mind-brain problem. The mind-brain problem is still of interest because it has not yet been answered, not because its riddle has been solved. Physiological psychology, in particular, is formulated more in terms of the quest than in the results. So these grand and historic questions can provide one set of motives for the thrust of scientific inquiry. But contemporary surprises and the unexpected findings of some accident of observation also provide a rich source of motivation for scientific endeavor. This is very often associated with the introduction of a new observational instrument. The invention of the microscope or of a new electrophysiological recording instrument have both been extremely powerful situational factors in the development of new knowledge by exposing a portion of our environment that had been unappreciated previously. The laboratory use of computers promises to be equally important. New sciences spring up almost immediately as the world around us enlarges. The questions of the past are reformulated in terms of new observations; new approaches emerge that could never have been seriously considered solely in terms of earlier speculations and less effective tools. Another rich source of questions for science, which must not be underestimated particularly in our own pragmatically oriented world, is the search for the solution of specific problems that face mankind from time to time. How can we cure this disease? How can we provide enough food for
our population? How can we build a device capable of operating at 4000°F? All of these issues, in addition to their applied engineering aspect, bring a larger group of people into contact with sets of natural phenomena. They thus increase the possibility of discovery as they open new vistas. We can, therefore, discern three major sources of motivation for scientific research:
1. Classic issues of speculative philosophy.
2. The unexpected discovery from an undirected exploration of the general environment, often due to the introduction of a new instrument.
3. The attempted elimination of some clearly identified obstacle to human comfort, health, or happiness.
It is unlikely that any of the three can be given priority in the past motivation of science. Nevertheless, it is possible to distinguish general stages in the development of any particular scientific inquiry. These stages must be considered only in their historical sense, for no individual scientist progresses rigidly through the sequence any more than any experiment actually fits into the rigid format of today's standard scientific journal article. Different scientists work at different levels of this paradigm at different times in history. To illustrate the nature of the proposed hierarchy of levels of inquiry, let us consider a major problem within the field with which this book is concerned. We have chosen a scientific problem, which has really been motivated by all three of the sources mentioned above. The problem of color vision has had a long history. It originally was formulated in terms of the philosophical issues of the mind-brain problem. But later, the anomalies of color blindness were noted, and the facts of color addition were observed and formulated.¹ More recently, applied technological motivations have led to additional effort to explain the phenomenon of color vision. But whatever the source of the effort, the first question in the hierarchy is: how do we see colors? This is a question that is, however, rarely overtly asked by modern students of color vision. They are more often operating at a finer level. For example, the next lower echelon of questions might well be described in terms of different receptor theories. One might ask the question: are there separate red and green receptors? This is a relatively technical issue based upon a specific hypothesis or prejudgment about the nature of color receptors which has gone far beyond the general philosophical notions originally being considered.
Later, both historically and in the individual experience, even more detailed questions must be asked; for example, what is the nature of the interaction between the different kinds of receptors? Finally, we find the most specific questions of all formulated in terms of the solution of some very specific issues of measurement. How can we measure some particular phenomenon? What is the effect of manipulating independent variable x? The hierarchy of questions might be carried on to even more detailed levels. The important point is that there are two ways to consider such a hierarchy. The first is the historical one already expressed. As we gain more and more information about a specific problem, we tend to deal with more specialized subportions of it. On the other hand, the other way to look at the hierarchy is that there are, at any given instant in time, people working simultaneously at several different levels. We still must have our philosophers of science even if their profession is currently in somewhat low regard among the scientific pragmatists. We certainly also must have plenty of the instrument makers at the other end of the hierarchy. The unfortunate thing is that a worker, whose commitment is to one or another level, loses sight of how dependent he is on progress at another level.
¹ The formularization of a specific series of rules is not, of course, a prerequisite for the implicit utilization of the rules or any subset of them. Artists have been aware of the laws of color and visual space and have been able to use both for tens of thousands of years without explicit formularization.
We must reiterate, at this point, the notion with which we started. The approaches to science are manifold. The sources of the motivations of scientists are neither simple nor single. A very distinguished sensory psychobiologist, Georg von Békésy (1960), has made a somewhat tongue-in-cheek listing of the approaches to science. First, he distinguishes the theoretical approach in all its formal majesty. This is usually the format presented to scientific neonates. He then contrasts this theoretical approach with the more realistic mosaic approach, in which problems are attacked as they arise with any available tool. Von Békésy goes on to discuss a number of different types of questions ranging from the "classical questions still unsolved" to the "embarrassing question frequently arising at a meeting and serving no useful purpose."
To his list we might also add the "greedy approach" reflecting the motivation of some scientists to pursue their research not for its own intrinsic sake, but for the potential value it might have in helping them achieve some irrelevant ambitions and desires. But even this set of "ulterior motives" must not be completely deprecated. They, like many of the other sources of scientific effort, have been effective in providing incentives for scientific research and have often contributed to our understanding of nature in surprising ways.
B. A Hierarchy of Questions of Sensory Psychobiology
So far we have dealt with a number of very diffuse issues. We have stressed the inherently chaotic and disorganized nature of the true scientific frontier. We have also stressed the great variety of questions that scientists ask about the unknown. Now let us turn our attention to another set of questions based on a far more concrete model. This other hierarchy of questions concerns the sequence of physiological steps occurring in the afferent pathway. The hierarchy is one which is constantly changing as our knowledge of the sensory processes becomes more and more complete. Later in this book we shall very specifically define the nature of the neural coding processes in the afferent chain. We shall point out that one of the most serious problems faced by the sensory psychobiologist is an inadequately precise definition of the question that he is asking and the level at which he is asking it. As our knowledge becomes more detailed, it becomes clear that
what had once been considered as a single process might later be interpreted as two or more distinguishable subprocesses. We shall now list some of the more important of these questions concerning the sequence of physiological processes of sensory transduction and transmission. The general guiding sequence will be the afferent one we described earlier, starting from the most peripheral measurements of the physical stimulus and working toward the central nervous system. These are general questions, which seem meaningful to ask today about any sensory modality and which must be answered for each sense in order to complete a description of its function. We shall briefly mention some terms, which will be discussed in great detail later in this book. For the moment, the definitions are unavoidably vague, and the words are only introduced as a necessary means of formulating the questions.
1. What is the nature of the stimulus? Each sense organ is designed to be maximally sensitive to a certain type of physical energy. The description of the range of the physical dimensions of the most effective stimulus is probably the first question that must be asked of any sensory modality. The answer to this question may come either from psychophysical or electrophysiological studies.
2. How is the physical energy transmitted through the accessory sensory structures to the site of transduction? Most sense organs are designed to transmit the physical stimulus to the transduction site by means of accessory and nonneural apparatus, which do not alter, in any substantial way, the nature of the physical energy of the stimulus. For example, the mechanical compression waves, which we call sound, are transmitted to the cochlea by means of a guiding and collecting horn (the external ear), a system of levers and membranes (the middle ear), and finally by a hydraulic medium in the inner ear. The nature of the physical stimulus (mechanical energy) remains, however, the same throughout. There may be some modifications of the pattern of the stimulus by certain selective filtering properties, but it is still mechanical energy with the same physical dimensions that originally arrived at the external ear. Similarly, the optical properties of the eye, while important in defining the visual sensations, do not alter the nature of the photic stimulus, only the relative amounts of each wavelength that ultimately arrive at particular points in the retina.
3. What is the site of the primary sensory action? This question refers to
the precise location at which the primary sensory action for each of the sensory modalities occurs. Surprisingly enough, it is not yet clear exactly where incoming stimuli exert their influence when one considers the issue at the level of the microanatomy of individual receptor cells of the various senses. The question has been answered with varying degrees of completeness in the different receptor systems. One of the most important developments in the cutaneous senses has been the observation that there are highly specialized areas of the membrane of the terminal portion of an axon that behave differently than adjacent regions with apparently identical structure. In the auditory sense, the most interesting current hypothesis is that
the site of action is the cuticular plate at the base of the cilialike hairs. In the chemical senses, we have only the beginnings of the information needed to conclusively specify where in the cell membrane the critical effects occur. This question is a fundamental one and clearly distinguishable from the others that precede and follow it.
4. What is the nature of the initial energy absorption process underlying transducer action? All receptors perform the process of transduction, converting one form of energy to another as a result of the primary sensory action. The actual transduction process involves a number of steps, the first of which is the alteration of the chemical equilibrium of the receptor by the physical energy of the stimulus. The nature of the actual process underlying this disequilibriation must be defined and analyzed. As examples, the sequential decomposition of large organic molecules has been identified as the key process for the visual receptor, and the alteration of hair cell membrane permeability by purely mechanical deformation has been implicated in the mechanoreceptor action of touch receptors in the skin.
5. How does the energy absorption process lead to the generator potential? One of the most exciting discoveries of recent years in the sensory sciences has been the elucidation of the generator or receptor potential mechanism of sensory receptors. (See the Introduction to Chapter 4 for a discussion of the differences in receptor and generator potentials.) It is now clear on the basis of a variety of experiments in the various modalities that the propagated spike action potentials conducted in long axons are actually always produced through the medium of some precursor potential of quite different properties. This precursor potential in receptors, which has come to be called the generator or receptor potential, is the first electropotential produced in the transduction process. It turns out that although we know fairly well what the energy absorption process is for most senses, the actual manner in which this absorption leads to the generator potential is, in general, one of the least-well-known stages in the sequence of processes that we have been discussing. The elucidation of the mechanisms responsible for the production of the generator potentials will be one of the main issues of inquiry during the next decade for sensory neurophysiology.
6. How are propagated action potentials generated by the generator or receptor potential? The rapid developments in our understanding of the ionic chemistry of nerve cell membranes and the nature of the regenerative spike action potential, which can be propagated over long distances, have also been accompanied by new insights into the processes underlying the way in which the generator potential produces spike action potentials. The intervention of synaptic effects makes this one of the most interesting and well-known parts of the story.
7. How are information-bearing electropotentials propagated from peripheral portions of the nervous system to other more central portions? This question is seemingly too general for the fineness of the sequence we have
been describing. However, we are referring to a more restricted notion. The question of specific interest here concerns the mechanism of spike action potential propagation over the long axons evolved specifically for transmission purposes within the body. We shall only note here that there has been an enormous amount of progress in our understanding of spike action potential propagation throughout the nervous system, and for the purposes of sensory-coding theory, it may even be possible to go so far as to note that a sufficiently detailed answer has been given to this question.
8. How do signals cross from one cell to another at synapses, the points of anatomical discontinuity between neurons in the afferent chain? Synaptic action is, in itself, also a long and exciting story, which is gradually becoming understood. However, it seems as if synaptic action is one of those issues that are in the process of breaking apart into a number of more finely defined subquestions. Of special interest to us is the fact that there appears to be a very strong analogy developing between those portions of the synaptic process that occur on the upstream side of the synapse and the action of the peripheral receptor itself. It is only necessary to substitute a few words like "postsynaptic" potential for "generator" or "receptor" potential for a similar story to be told for each.
The preceding questions are issues of peripheral action and transduction and, in part, represent the subject matter of the first part of this book. There is, of course, also a host of questions which pertain to the central processing of this information once it arrives at the brain. For example, questions of the organization of the cerebral cortex and of the pathways intercommunicating among the various central brain structures are under active scrutiny.
Similarly, the large class of psychophysical questions concerning the overall information-processing characteristics of the entire afferent and interpretive systems represents another major grouping of significant issues. Unfortunately, we have not been able to cover all of this and other material of equivalent importance to the depth some would have preferred. Answers to questions 7 and 8, unfortunately, represent the main part of an entirely separate text. What we have done is to emphasize another specific question, that of coding or representation, to the maximum possible extent. The archetypical question might be formulated as follows: how are the patterns of information defined by the external stimuli represented at each of the levels of neural processing? And, as a corollary of this general question, what are the neural events that are used as symbols in this representation? These latter two issues represent the main content of the second part of this book. How we have put this all together can be best introduced by reviewing its general plan.
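The eight questions listed above follow the afferent sequence from stimulus to synapse, and it may help some readers to see that ordering written out as a simple data structure. The following is only an illustrative sketch in Python, not a model of any physiological process. The names Stage, AFFERENT_CHAIN, and first_electrical_stage are hypothetical, the question wording is abbreviated, and the "electrical" flag is a simplifying assumption based solely on the statement that the generator potential is the first electropotential produced in transduction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One question in the afferent hierarchy (hypothetical helper type)."""
    number: int
    question: str
    # Assumption for illustration: True once the signal is carried by an
    # electropotential, i.e. from the generator potential onward.
    electrical: bool


# Abbreviated restatement of the eight questions, in afferent order.
AFFERENT_CHAIN = (
    Stage(1, "What is the nature of the stimulus?", False),
    Stage(2, "How is the energy transmitted through accessory structures?", False),
    Stage(3, "What is the site of the primary sensory action?", False),
    Stage(4, "What is the initial energy absorption process?", False),
    Stage(5, "How does absorption lead to the generator potential?", True),
    Stage(6, "How does the generator potential produce spikes?", True),
    Stage(7, "How are spikes propagated toward central portions?", True),
    Stage(8, "How do signals cross from cell to cell at synapses?", True),
)


def first_electrical_stage(chain):
    """Return the earliest stage flagged as involving an electropotential."""
    return next(stage for stage in chain if stage.electrical)


if __name__ == "__main__":
    for stage in AFFERENT_CHAIN:
        marker = "electrical" if stage.electrical else "pre-electrical"
        print(f"{stage.number}. {stage.question} [{marker}]")
    print("First electropotential at question",
          first_electrical_stage(AFFERENT_CHAIN).number)
```

Writing the chain as an ordered tuple simply makes explicit that each question presupposes an answer to the one before it; nothing in the sketch depends on a particular modality.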
III. PLAN OF THE BOOK
This book is divided into two major parts. The first part provides what is believed to be a relatively complete foundation of the basic and relevant physical and physiological materials necessary for a complete and in-depth understanding of transduction. The second part is devoted to an analysis
of the specific coding patterns that have been associated with various attributes of sensory continua. Now let us consider these two sections in detail, looking at the specific goal of each of the constituent chapters. The first section of this book is introduced by Chapter 2, an introduction to the basic physics of external stimuli. We classify stimuli in this chapter according to the conventional taxonomies of physics and point out the ways in which stimuli are generated. But, most important for later materials, we define the metrics used for specifying the amount and kind of each form of physical energy. We emphasize in this chapter the importance of using our well-established physical measures of energy as the starting point in any sensory experiment whether psychological or physiological. Chapter 3 is dedicated entirely to neuroanatomy with special reference to the structure of the sensory systems. After introducing some of the techniques used by the modern neuroanatomist (with particular attention to modern developments in electron microscopy), the gross anatomy of the nervous system is presented to provide a frame of reference for later discussions. This chapter then collects together, in a single location, what are considered to be some of the finest drawings and photographs of receptor structure. Each modality is presented separately, first at the macroscopic level of organization and then, with increasingly magnified micrographs, in greater detail of fine structure. It is hoped that this brief atlas will not only introduce the various structures in an orderly way, but also emphasize the common anatomical features of this set of remarkable receptor cells. An added feature of the chapter is a set of drawings and discussions of the relay points in the ascending pathways from each modality.
So much of the material in later chapters will assume knowledge of their anatomy that it was felt that it would be judicious to present it in a coherent form in this chapter. The next chapter (Chapter 4) deals specifically with the transduction process or, as some have called it, interenergy transfer. The general problem area is one to which we have already alluded several times. Physical energy is transformed by the receptor organ into electrochemical energy. This chapter is concerned specifically with the processes that are thought to underlie the transformation action up to the stage at which the propagated spike action potential is generated. This chapter treats each of the senses individually for what are believed to be sound biological reasons. It is at the site of transduction that one modality differs more from another than at any later level. We mention the nonneural modification of physical stimuli by the accessory portions of each of the receptor organs. There is a wide variety of processes that tend to alter the physical stimulus before it arrives at the actual transducer tissue. It is important to consider these nonneural processes in terms of their simple physics rather than in the pseudophysical sense implied by a redefinition of the "proximal" and "distal" stimuli. The material so far presented has been intended to provide an introduction for the second and main part of the book. In this second part, our thesis that there is an orderly relationship between the sensory experiences and the underlying pattern of neural activity is explored and developed in
An Orientation
15
detail. The first chapter of this second section (Chapter 5) presents a statement of what it is that we mean by coding. We distinguish between those discriminable sensory dimensions common to all the modalities and those possible parameters of the neural language that may be associated with them. But the association of a specific neural dimension with a specific parameter of experience is a difficult and sometimes treacherous undertaking. We, therefore, point out some of the hazards faced by the psychophysiological cryptographer as he attempts to unravel the neural code from the multidimensional uproar of activity. The next six chapters present our view of what is an up-to-date statement of the current views of coding for quantity, time and space, and quality, respectively. Chapter 6 presents the case for the coding of sensory magnitudes. It is initiated by a discussion of the relevant psychological data which must be explained. While it is clear that we cannot completely solve each and every problem or unequivocally define each and every code, enough data have accumulated in the last 20 years to provide at least a tentative statement of the significant coding variables and particularly to emphasize some of the most important consistencies between the psychological and physiological studies of the response dynamic. We then survey the neurophysiological data for as many of the senses for which data have been obtained as a function of variations in stimulus intensity, noting especially the peculiar status of quantity co ding in the auditory modality. Stimulus-intensity, response-magnitude relations are considered for the generator potential, peripheral nerve, and central responses in each case. Based on this data, an answer can be proposed to a very important question: at what level does the major nonlinearity of the neural response occur? 
Finally, to illustrate a general method of inquiry as well as for its own substantive sake, we consider whether or not interval irregularity is a "true code" for the representation of sensory magnitudes.

Chapter 7 presents a varied mixture of studies, which illustrate how drastically our attitudes toward the coding of space and time have changed in recent years. We speak, in this chapter, of such phenomena as spatial localization and the interaction of temporally separated stimuli as examples of the loss of fidelity inherent in sensory processing. It is also in this chapter that we discuss the elegant research on the physiological basis of spatial interaction carried out by a large number of physiologists stimulated by the works of H. Keffer Hartline. We then turn to the perception of time, particularly emphasizing the newly emerging awareness of the limitations of time as a neural code.

We then, in Chapter 8, discuss what has come to be a most important field of modern sensory psychobiology: highly selective neural responses to specific spatio-temporal features of the stimulus pattern. Finally, since this type of physiological finding has so often been misapplied to the explanation of perceptual phenomena, we discuss, in a critical vein, the relevance of these microscopic neural phenomena to molar behavior.

Chapter 9 presents the various theories of quality coding for vision. It is introduced with a discussion of what it is that we mean by a quality. The dictionary definition that quality is that discriminable difference which remains after one accounts for intensive and extensive dimensions in time and space is weak, yet it hardly seems possible to give a more compelling definition. We first consider some of the key psychophysical data and the resulting alternative theories which have been proposed over the years. Not all of these theories of visual quality coding have a neurophysiological basis. Some are merely descriptive and are not therefore germane to the topic of this book. But for those that are relevant, an attempt is made to extract the key neurophysiological elements so that we are really sure what the controversies are all about, at the level of discourse in which we are engaged. Having done so, we then review the neurophysiological data and on that basis describe what is believed to be the best possible contemporary theory of quality coding.

Audition is discussed in Chapter 10 in the same way, and the other senses in Chapter 11. In concluding Chapter 11, we seek the generalities common among all of the senses and specifically discuss the ways in which Müller's classic law of specific nerve energies has been modified under the impact of these modern findings. Clearly, the problems of quality coding are among the most complex to be discussed in this book. While the physics of the stimulus helps us to order our thoughts about visual and auditory quality coding, we do not have appropriate or simple physical dimensions on which we can found our analysis in the somesthetic and the chemical senses. There is no simple analogue in somesthesis, gustation, or olfaction for the roles played by wavelengths of the photic stimulus or the frequency of the acoustic stimulus.

In Chapter 12, the Epilog, we sum up the various facts and theories that we have considered so far in the book. We ask the question: is a unified theory of sensory activity possible in the context of our current state of knowledge of sensory processes?
And finally we present a set of emerging "principles" of sensory coding, which help to further sum up current knowledge concerning the neural representation of sensory messages.

The reader should now appreciate that this book covers a wide swath through the sensory psychobiological literature. He should also be made aware of the fact that a single volume can, at best, only do this superficially. To emphasize the point, nothing could be more effective than to note that at about the same time that this book is published, one of the major publishing events of the century will also occur. A many-volumed handbook of sensory physiology will begin to appear from the presses of the Springer-Verlag Publishing Company of Berlin, Heidelberg, and New York. The full extent of this project, which will probably not be fulfilled for some years, is not yet clear, but it is evident at this point that there will be at least 20 volumes, each of approximately the same size as this present text. Obviously, this important handbook will be a virtually inexhaustible and valuable source of information for those who want a much more complete story than this much more modest text can possibly offer. Equally obviously, however, such a handbook is hardly the place to start for the student who is being newly introduced to the psychobiology of sensory coding.

There are, furthermore, a number of important and fundamental areas which we have not covered in detail. For example, if it had not been for the fact that there are numerous good summaries of psychophysical techniques, it would have been necessary for this book to cover that ground too. There are many instances in which we have assumed a knowledge of the procedures and philosophy of psychophysical experimentation, which at least some of the readers of this book may not have. A particularly good discussion, which covers this important material, is to be found in Chapter 7 of Corso (1967). Some readers will also miss a review of the relevant electrochemistry and neurophysiology of transmission in axons and across synapses. Indeed, the draft of this book included chapters on these topics. Limits on the size of the volume as finally published required that they be deleted, and therefore the reader is directed to other sources (for example, Thompson, 1967; Katz, 1966; Stevens, 1966)² for modern surveys of the relevant material.

² More detailed discussions of this material can be found in D. J. Aidley, The Physiology of Excitable Cells, Cambridge, 1971; V. B. Mountcastle (Ed.), Medical Physiology, VII. Mosby, 1968; or the distinguished volume by K. S. Cole, Membranes, Ions and Impulses, University of California, 1968.
SECTION ONE: FUNDAMENTAL MATERIALS
CHAPTER 2: THE NATURE OF PHYSICAL STIMULI
I. INTRODUCTION
A. What is a Stimulus?

Now that we have set the stage for our discussion by placing the scientific and technical aspects of sensory psychophysiology in their appropriate philosophical context, we can begin our journey up the ascending pathways. We must start, however, with some nonphysiological facts, for it is always some inorganic aspect of the external world of physical energies that is the precursor for each sensory response. Though sense organs in many cases do respond without the presence of some identifiable external physical action, we shall not be concerned with such "spontaneous" or "persistent" activity for the moment. Rather, in this chapter, we shall try to provide the necessary foundation material so that the reader will be able to interpret the metrics used for the specification of stimuli.

The word stimulus has a very specific meaning in sensory psychobiology. It is defined as a pattern of physical energy which produces activity in the sensory pathways. There are a number of restrictions implicit in this definition, which we should discuss explicitly. First of all, note that we have not said that a stimulus is a pattern of physical energy that is merely capable of producing activity. To be a stimulus, the physical energy must actually produce activity. We, therefore, must distinguish between those patterns of physical energy that are potential stimuli and those that are actual stimuli. Potential stimuli are physical energies that lie within the range of sensitivity of some receptor organ, but that have not yet produced activity in the neural portions of the sensor. Actual stimuli are those potential stimuli that actually have been transduced into forms of neural energy. This chapter, within the scope of these definitions, is concerned with potential stimuli, for we shall be discussing the nature of physical energies as measured with instruments independent of their possible biological effectiveness.

An important point made in the definition of an actual stimulus is that it must produce some subsequent neural response. The organism is constantly exposed to a wide variety of physical energies, not all of which produce electrical activity in the receptors and transmission pathways. Furthermore, not all of these neural responses are either behaviorally or metabolically significant. There appear to be some important neural mechanisms whose function is to gate or allow only a small number of these signals to actually arrive at the level of consciousness or awareness, or to otherwise affect the function of the organism. Those that are not so allowed are actual stimuli only in a limited physiological sense, not in a psychological one. Electrophysiologists might be unhappy with this definition, but it is necessary for psychobiologists to further restrict the meaning of the term stimulus to behaviorally significant patterns of energy in order to avoid having it take on such an enormous variety of meanings that it loses all specific meaning.

Another important point to note concerning the definition of a behaviorally significant stimulus is that we have made no specific limitation on the temporal dimension of the term subsequent. The response, which is engendered by a stimulus, may be relatively immediate, occurring within a few milliseconds of the time the signal arrives at an appropriate anatomical level, or it may be greatly delayed. Those delays may be as great as several decades.
Some of the experiments conducted by Penfield and Roberts (1959) have suggested that when the brain is appropriately stimulated, there may be detailed recall of sensory messages that were recorded many years previously.

Another significant restriction in our definition is that the mere presence of physical energy does not always constitute a behaviorally effective stimulus even if the physical energy is well within the limits of sensitivity of the appropriate sense organ. The physical energy, in most cases, must be patterned or modulated both temporally and spatially to be effective. The notion of pattern is interjected for very specific empirical reasons. Continuous or unchanging modes of physical stimulation seem to lose their efficacy as stimuli very quickly, unless there is some fluctuation of one or another of their dimensions. This fluctuation may be due either to the nature of the stimulus or be introduced by the receptor organ itself. For example, Riggs, Ratliff, Cornsweet, and Cornsweet (1953) have shown that visual patterns, projected into the eye in such a way that they do not move with respect to the retina, tend to disappear very quickly. Their interpretation is that the natural eye movements (saccades) are very important in making a pattern of physical energy effective as a stimulus, because they modulate the energy falling on any given receptor. It is not at all clear why the fading of the stabilized image should take place, but it is clear that in the later stages of fading, the same pattern of physical energy is no longer effective as a stimulus. Furthermore, we shall be discussing instances later in this book in which physical energy patterns that differ only in their shape, speed, or direction of movement may be either vigorously effective or totally ineffective (at least with regard to certain specialized neural elements) purely on the basis of that spatial or temporal difference. These are the sorts of phenomena that make us restrict and limit the meaning of psychobiologically effective stimuli.
B. The Logical Necessity for Dealing with the Potential Stimulus as an Initial Reference

In any formal logical structure, there is a set of axioms or fundamental beliefs which is used as a starting point for discussion. We do not usually identify all of the axioms upon which we build our formal system. In science this is no less true, even though there is usually at least an illusion of a clearly defined set of definitions and basic measurements which are assumed to be commonly understood by all conversants. We spend a large amount of our collegiate experience learning about the parameters of these basic measures and assumptions. For each of the sciences there may be quite a different set of these axioms. For example, in chemistry many phenomena may be explained even though the unit of discussion is no more detailed than the atomic structures and their associated laws of combination. In sensory psychobiology it is particularly necessary to have a clearly defined starting point, since the general subject matter is one which, as we pointed out in Chapter 1, is defined in terms of a natural sequence of events which occur in the afferent direction.

So where do we begin? What is the set of references to which we can turn when we measure the responses of either some neural step in the chain, or even the most complicated psychophysical judgment? In spite of the fact that we have defined stimuli in terms of their effectiveness in generating a subsequent response, it is only fair to say that in many cases this is not an operational definition. This is so because it is often not possible to specify whether or not a response has actually occurred, or will ultimately occur. Thus, the theoretically pure definition of an actual stimulus cannot provide the concreteness we need in the day-to-day manipulations in the laboratory. The notion of a potential stimulus, however, can be so used.
The discussion, which makes up the greater part of this chapter, is therefore a discussion of potential stimuli. It is a discussion of physical energies as they exist, are generated, and are measured in the external world prior to their reception, modification, and transduction by the receptor mechanism. The class of potential stimuli, of course, enlarges or contracts occasionally as new receptor mechanisms, which are sensitive to unexpected forms of physical energy, are uncovered, or when known receptors are shown to be sensitive to narrower or wider ranges of known potential stimuli. Nevertheless, we feel that it is a better starting point than some confounded measure of actual stimuli. This chapter is, therefore, really a short course in the physics of potential stimuli, independent of their subsequent sensory effects. Nonetheless, the specific subject matter is selected on the basis of known receptor sensitivities.

A precise definition of physical energy patterns is the only meaningful place to start our discussion of sensory systems. It makes little sense, for example, to add the artificial dichotomy of the proximal or distal stimuli in the light of present-day knowledge. The use of these terms was an attempt to emphasize the fact that the physical energy that falls upon the surface of the receptor may be modified in many ways by the time it arrives at the transducer itself. There is, in many cases, a substantial difference in energy content of the stimulus, though usually not in energy type, as a function of passive absorption and modifications performed by the nonneural accessory structures of the various sense organs. We prefer to emphasize the same point in a different way: we shall view the problem as one of a series of energy modifications and transformations, in which the passive properties of the accessory structures are considered in their own right, just as much a biological phenomenon as the neural transduction processes themselves. Throughout this analysis it must be remembered that the actual transduction mechanism has not been definitely identified for all receptors, and many questions remain concerning the specific mode of action of any given stimulus. For these reasons, and as the reference point from which all other sensory phenomena must be considered, it seems that the only choice is the potential stimulus: the physics of the variety of biologically significant energies.

C. The Notion of the Adequate Stimulus

Having defined a stimulus and distinguished between actual and potential stimuli, it is now necessary to turn to another definition, which has long played an important role in sensory psychobiology: the notion of the adequate stimulus. The adequate stimulus is defined, for each sense modality, as the type of physical energy to which the particular receptor is most sensitive. Thus, conventionally, photic energy was considered to be the adequate stimulus for the eye. However, as we have learned more and more about the receptive range of a given receptor, the notion of the adequate stimulus has changed.
In vision, for example, it turns out that "photic energy in general" is not a term that has sufficient precision to add very much to our description of the physical process. Rather, it has been found that there are different types of visual receptors, each responsive to a slightly different band of wavelengths of electromagnetic energy. Thus, the adequate stimulus for a long wavelength-sensitive receptor in the eye is, for all practical purposes, a band of wavelengths less than the total visual range, but greater than a single wavelength. The specialized visual receptors may, therefore, be said to be relatively broadly tuned, but yet not sensitive to the full range of visual stimuli. In actual fact, the efficiency of the "quantal catch" for any photopigment is never completely zero for any wavelength of light, visible or not. There is always the possibility of some absorption if the intensity of the photic stimulus is high enough, but the approximation of finite bandwidths for each of the visual pigments is a good one for most of what is known about human vision. Later in this book we shall discuss the important implications of this recurring notion of broadly, yet not infinitely, tuned receptors, but for the moment consider a definition of the most adequate stimulus in terms of a well-specified, moderately broad range of physical energies.
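The idea of a broadly, yet not infinitely, tuned receptor can be made concrete with a small numerical sketch. The Gaussian sensitivity curve, the 560 nm peak, the 50 nm width, and the function names below are illustrative assumptions, not measured pigment data. The sketch shows only that, under such a curve, the relative quantal catch falls off smoothly without ever reaching exactly zero, and that a sufficiently intense off-peak light can produce the same catch as a dim light at the peak.

```python
import math

# Hypothetical tuning parameters, chosen for illustration only.
PEAK_NM = 560.0   # assumed wavelength of maximum sensitivity
WIDTH_NM = 50.0   # assumed standard deviation of the tuning curve


def relative_sensitivity(wavelength_nm: float) -> float:
    """Fraction of incident quanta caught, relative to the peak (Gaussian assumption)."""
    z = (wavelength_nm - PEAK_NM) / WIDTH_NM
    return math.exp(-0.5 * z * z)


def quantal_catch(wavelength_nm: float, intensity: float) -> float:
    """Catch = incident intensity (arbitrary units) times relative sensitivity."""
    return intensity * relative_sensitivity(wavelength_nm)


def matching_intensity(wavelength_nm: float, target_catch: float) -> float:
    """Intensity needed at a given wavelength to produce a target catch."""
    return target_catch / relative_sensitivity(wavelength_nm)


if __name__ == "__main__":
    for wl in (460, 510, 560, 610, 660, 760):
        print(f"{wl} nm: relative sensitivity {relative_sensitivity(wl):.6f}")
    # An off-peak light can match a unit-intensity light at the peak
    # if it is made intense enough.
    needed = matching_intensity(660, quantal_catch(560, 1.0))
    print(f"intensity at 660 nm matching unit intensity at 560 nm: {needed:.2f}")
```

In this toy model, treating the receptor as having a finite bandwidth amounts to ignoring the small but nonzero tails of the curve.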
Closely related to this notion of the adequate stimulus is a law of sensory activity, which has been only infrequently challenged since it was put forth by Müller in 1840. This is the famous law of the specific energies of nerves. This doctrine states that the sensation resulting from stimulation is a result of the activity of a given set of nerves, rather than the nature of the external stimulus. A modern corollary of this notion is that regardless of the nature of the stimulus, activation of a given nerve produces a sensation characteristic of that nerve. The all-or-none law of nerve spike action potentials is also closely related to this doctrine. It states that the amplitude and duration of the pulsatile response of an axon are independent of the stimulus characteristics once the threshold of activation is exceeded.

There is reason to suggest, however, that the time has come to modify the connotations, if not the denotations, of the specific nerve energy doctrine at the individual cellular level. The reasons behind such a change arise out of the accumulating mass of data, which indicate that most receptors are broadly, rather than sharply, tuned. Furthermore, there is additional evidence that suggests that it requires a comparison by some decoding mechanism of the relative amount of activity in two or more receptors to establish the quality of a stimulus. Thus, the information conveyed by a single neural element is not sufficient to define a unique stimulus quality. The implications of these developments may profoundly affect the contemporary extrapolation of Müller's law to the idea of single cell specificity. If no single element could carry information adequate to define a sensory quality, then in this sense Müller's law is misleading, for implicit in the modern view of his doctrine is the notion that the particular neuron or the "place in the nervous system" is the critical code for sensory quality.
If, on the other hand, this is not the case, and a comparison of the rate of activity in two or more loci is important in addition or instead, then we do not really have the pure place coding of quality that is implied by Müller's law. Rather, a mixture of spatial and temporal comparisons is required, which is quite a different concept. The tale of the particular dimensions of neural activity, which are responsible for coding sensory quality, is one we shall tell in detail in Chapters 9, 10, and 11, but the points we have made in this section are important adjuncts to our appreciation of what is meant by the term stimulus.

Now let us consider another aspect of stimulus specification: the dimensions of the physical energy itself. This consideration will be formulated in terms of a brief review of the physics of various forms of energy. For each of the types of physical energy that has been shown to be a potential stimulus for one or another sensory modality, we shall consider the measures that are used to define the kind or quality of the stimulus when applicable, and the typical laboratory sources of the physical energy for experimental purposes. There is no particular order to the discussion other than the one used in the traditional elementary physics course.

Table 2.1 is presented as a general preview and summary of much of the material contained in this chapter on stimuli. In addition to listing the various kinds of physical energies in this table, we have also briefly noted the nature of the adequate stimulus, the metric of quantity or amount,
TABLE 2.1. THE DIMENSIONS, SOURCES, AND UNITS OF POTENTIAL PHYSICAL STIMULI
[Table body not recovered; surviving column headings: Sensory Modality, Adequate Stimulus.]
FIGURE 2.16 Diagram showing the shape of the current produced in a circuit that has both resistive and capacitive components (like the skin) by the application of a perfectly square voltage pulse. This differentiation of the current waveform is the reason that constant current stimulators must be used in electrostimulation experiments. [Traces: voltage and current plotted against time.]
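The waveform described in the caption can be approximated with a short calculation. The sketch below assumes the simplest possible load, a single resistor in series with a single capacitor, and the component values, pulse amplitude, and function name are hypothetical; real skin and electrode impedances are more complicated. It is meant only to show why a square voltage pulse yields a spiked, sign-reversing current rather than a square one.

```python
import math


def rc_current(t, v_pulse=10.0, t_on=1e-3, r_ohm=10e3, c_farad=20e-9):
    """Current (amperes) at time t (seconds) through a series RC load.

    The load is driven by an ideal square voltage pulse of height v_pulse
    that starts at t = 0 and ends at t = t_on. All defaults are illustrative.
    """
    tau = r_ohm * c_farad  # time constant of the load
    if t < 0:
        return 0.0
    if t < t_on:
        # Pulse on: current jumps to V/R, then decays as the capacitor charges.
        return (v_pulse / r_ohm) * math.exp(-t / tau)
    # Pulse off: the charged capacitor discharges back through R,
    # so the current reverses sign and again decays exponentially.
    v_cap = v_pulse * (1.0 - math.exp(-t_on / tau))
    return -(v_cap / r_ohm) * math.exp(-(t - t_on) / tau)


if __name__ == "__main__":
    for t_ms in (0.0, 0.1, 0.5, 0.99, 1.0, 1.1, 1.5):
        i_ma = rc_current(t_ms * 1e-3) * 1e3
        print(f"t = {t_ms:4.2f} ms  current = {i_ma:+.4f} mA")
```

With these assumed values the current is far from constant during the pulse, which is the distortion that a constant current source is intended to avoid.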
Probably, as so often happens, this argument is spurious, since voltage and current are directly related by an impedance coefficient. What is important is that the stimulus be adequately defined and controlled so that experimental conditions are repeatable. For these reasons, rather than any subtle special property of current or voltage, most experimenters have chosen to define their electrical stimuli in terms of current. Applied voltage pulses are extremely susceptible to distortion due to the impedance characteristic of the tissue or electrodes. Therefore, while applied voltages may be constant, the current through the tissue and, therefore, the voltage across some interior portion of the tissue may vary considerably. The difference between the waveform of an applied voltage pulse and the current produced in a capacitive load is shown in Figure 2.16. The most desirable electrical stimulus is, therefore, a constant current applied through some appropriate electrodes. Thus, even if the impedance of the skin or of the electrodes does vary, the current flowing through the underlying tissue will still be adequately defined.

The timing and shaping of electrical signals is a subject within the realm of electronic instrumentation and is a long and detailed story in itself. Assuming the availability of a given waveform, the key issue for sensory psychophysiology is how one goes about connecting these waveforms to the receptor tissue. There are several further technical issues with which the experimenter must be concerned when he specifies electricity as the experimental stimulus. First of all, very often electrical stimulation is used in conjunction with very-high-gain preamplifiers, whose purpose is to detect and amplify the resulting neural signals. The signals from biologically active tissue may be, at the very largest, a few millivolts, and are more likely to be of the order of a few tens of microvolts. The stimulus voltages associated with the typical impedance conditions of the electrode-tissue interface may, however, be as large as a couple of hundred volts. This means that the very sensitive amplifier tuned for extreme sensitivity to those low-level biopotentials would suddenly be exposed to these enormous stimulus amplitudes. In most cases, high-gain amplifiers simply "block" or are saturated with such high signal levels for periods that may last up to several seconds.

The best way to protect the detecting amplifier and prevent this blockade is to isolate the electrical stimulus from the amplifier electrically. This can be accomplished by being sure that the voltages of the stimulators are not referenced to the same ground used by the preamplifier. This process is called voltage isolation. Thus, the amplifier will be unable to sense the stimulus voltage except for the small amount due to capacitive leakage to the common ground. Electrical stimulus isolation can be accomplished with special transformers, or by radio-frequency isolation units, various kinds of which are now commercially available.
This latter type of unit, originally developed by Schmitt and Dubbert (1949), is actually a very small radio transmitting and receiving station built into a common circuit. The Schmitt isolator accomplishes the same function that the transformer does, isolating the stimulus from ground by transmitting a ground referenced signal across an air gap, but differs from the transformer in that it actively generates the high-frequency carrier signal.

With constant current regulation and stimulus isolation both present, the electrical stimulus is now ready to be presented to the subject. The final task is the selection of appropriate electrode or contact materials. It is at the interface between this electrode and the biological tissue that many of the most serious obstacles to efficient and controlled electrical stimulation occur. Stimulating electrodes may be classified into one of two types. The first category includes metallic electrodes, and the second category includes electrolytic solutions. The latter group is most commonly made of a solution of some salt, while metallic electrodes of zinc, copper, stainless steel, silver, platinum, and tungsten, among others, have been used. But even metallic electrodes are often used in conjunction with some sort of a salt solution or jelly to decrease the contact resistance. Silver, tungsten, and stainless steel are the most often used metal stimulation electrodes. Figure 2.17(a) is a photograph of an elegant set of stainless steel stimulating electrodes developed by Gibson (1968) for electrocutaneous stimulation.

Electrolytic electrodes can be very simple. Figure 2.17(b) is a photograph of a stimulator used by the author to stimulate nerves in the fingers. The subject inserts his fingers into two test tubes of a salt solution. Contact is made with the electronic apparatus by simply immersing a metallic contact into the solution. No electrode jelly or any other special preparation is required by this type of electrode, since a very large area of immersed skin is involved, resulting in a very low interface impedance. Figure 2.17(c) is a picture of an alternative form of electrolytic electrode, used by Buchthal and Rosenfalck (1966). Flannel bags surrounding a soft lead electrode were soaked with a salt solution and then bent around the finger. The lead acted both as the contactor and as the support for the salt-soaked flannel electrode. The electrode could be easily kept wet by application of the salt solution to the flannel.

FIGURE 2.17 (a) An elegant set of electrodes used for electrostimulation of the skin (courtesy of Dr. Robert Gibson). (b) A simple electrical stimulator consisting of two test tubes of body normal NaCl solution into which the fingers are inserted. Note the insulated electrical wires terminating in metallic electrodes (from the author's laboratory). (c) Electrostimulators made up of cloth bags soaked in saline solution and held to the hand by lead bands. Note also the electrodes used for the recording of the induced nerve action potentials (from Buchthal and Rosenfalck, 1966).

We have now discussed the essential elements of an electrical stimulator capable of providing constant current and isolated electrical stimuli. The waveform of the signal applied to the electrodes is dependent upon the particular experimental design. In the large number of situations in which the purpose of the experiment is to excite nerve action potentials, the waveform of choice is a brief pulse lasting for less than the duration of the action potential. This type of signal has a number of advantages, not the least of which is that a single pulse electrical stimulus, if brief enough, results in only a single nerve action potential. It is, therefore, possible to control the temporal sequence of the nervous response by patterning the external stimulus. This is a special condition quite unobtainable in those situations in which the usual adequate stimuli are activating the usual receptors. The broad implications of this technique will be discussed later in this book (in Chapter 6) when we discuss the neural coding of sensory intensity.
Other workers have used ac voltages as their stimuli for studies of cutaneous amplitude and frequency discrimination. But in general, the skin has been shown to be insensitive to frequency changes using this technique, and it seems to some that ac voltages are inappropriate stimuli in light of the pulse coded mechanisms of the transmitting neurons. Primarily for these reasons, ac stimuli have been used less and less in recent years. VII. THE SPECIFICAnON OF PATTERN In previous sections of this chapter, we concentrated on the quantification of the quality and quantity of the physical energy of the stimulus. However, it has repeatedly been shown to be the case that a simple statement of only the physical energetics of the stimulus is often grossly insufficient to account for the complexity of a response. Very often two stimuli with identical average physical energies can produce extraordinarily different responses due to other organizational properties. The patterning of equal amounts of photic energy into different geometrical patterns can strongly affect such diverse responses as our attitudes toward a painting or the evoked brain potential. Figure 2.18, for example, shows a set of stimulus patterns from an experiment (Beatty and Uttal, 1968) in which an attempt was made to specifically test the hypothesis that pattern itself was a significant determinant of the amplitude of the evoked brain potential. The patterns shown all contain the same amount of physical energy. The only
FIGURE 2.18 A set of visual stimulus arrays (Grid 1 through Grid 4) in which the energy content is constant, but the organization of the four grids differs. This difference in pattern is sufficient to produce substantial differences in the shape of the cortical potentials evoked by tachistoscopic exposure of these stimuli, even though the energy content of all four is constant (from Beatty and Uttal, 1968).
difference among them is in the overall organization (in this case, grouping) of the stimulus lines. Yet the results of the experiment showed that there was a very strong effect on the evoked potential as a function of the grouping, namely, that the larger the number of lines in a group, the smaller the evoked potential. Indeed, this effect was considerably stronger and more reliable than the effect produced by varying the intensity of a visual stimulus. Equally certainly, the patterning of a musical composition is a far more significant determiner of our response than the absolute frequencies and amplitudes of its constituent sounds. The difference between the Appassionata and the same set of sounds presented in another order or pattern is so profound as to be almost unnecessary to mention. Neurophysiologists and psychologists, working with the senses of many different animals, have found what seem to be specific sensitivities to pattern (Lettvin, Maturana, McCulloch, and Pitts, 1959; Roeder and Treat, 1961; Segundo, Moore, Stensaas, and Bullock, 1963; Uttal and Krissoff, 1968) that transcend, even at the receptor level, the sensitivity to the simpler metrics of physical energy. These and other related observations over the last few years have led to a very much increased interest in the use of stimuli with complex patterns and a decrease in interest in the use of impulsive unitary stimuli. In Chapter 8 we shall discuss this problem in very great detail. The experimental specification of stimulus pattern, however, is often difficult and is, in some cases, an underdeveloped technology in spite of its demonstrated importance. In other realms, it is almost overdeveloped, but unfortunately underused in psychological experimentation. Consider, for example, the musical notation system. A musical scale is one precise statement of a sequence of sound stimuli.
As more and more experimenters begin to use patterns of musical tones as their stimuli, we shall probably see more precise means of stimulus specification. The frequency analytical techniques, which are used to characterize
signals as sums of sinusoid components, and the related correlational techniques, which extract general periodic tendencies, must also be considered as representational systems for pattern. They do, however, eliminate a large amount of the critical detail information of a given temporal pattern. Nevertheless, they do emphasize the pattern aspects in an important way and ought not to be overlooked in our present discussion. The specification of temporal acoustic patterns, as we have said, has the rich background of musical notation systems to depend upon. Spatial patterns, however, have no equivalent artistic notational schema. The spatial arts, sculpture, painting, and architecture, have never used anything other than their own pictographic representation. Sketches or starkly diagrammatic blueprints are about all that has classically been available as a spatial shorthand. Recently, however, some of the automatic machine tool techniques have required that geometric forms be represented by a sequence of coded instructions. This technique may have the germ of a future notational system for geometrical forms implicit in it. Presently, however, the representation of two- and three-dimensional objects in space is usually done by a direct analogue, a map, which pictures the form in almost its complete detail. But it is often as difficult to deal with a map as it is to deal with the multidimensional object itself. Consider the small differences between an original painting and a reproduction. About the only advantages that are forthcoming from maps are reductions or enlargements of the scale of the original object, or the removal of an extraneous dimension. This latter notion is exemplified by a map of a geographical area in which the height of various items is ignored. The map is smaller, and it can be carried in one's pocket, but for the purpose of this discussion, it is still very much a pictographic representation of the original object.
Maps, of course, are used frequently to specify visual stimuli for psychological experiments. A collection of slides of different geometrical forms is essentially a set of maps in which no information reduction or coding scheme is used. Many investigators, on the other hand, have searched for algorithms, that is, computation rules or formulas that generate sets of geometrical forms according to certain generative rules. In many of the procedures, there is a random element inserted so that a general algorithm can produce a large number of similar but unpredictable geometrical forms. In this case, the algorithm itself cannot be used to uniquely define any given character, a priori, since the specific figure generated will depend, to a great extent, upon the specific random numbers that are produced during the generation process. The final output of such a routine, however, may represent a geometrical form in a reduced fashion in the sense we use when we refer to a notational or coding system. Some of the more interesting figure generation routines are described in the following paragraphs:
1. The "quasi-random histogram." Fitts and his colleagues (Fitts, Weinstein, Rappaport, Anderson, and Leonard, 1956) suggested a system for making random forms that has a number of advantages for psychological research. All shapes generated by this system are similar, yet differ enough to provide a sufficiently variable dimension for experimental manipulation. Figure 2.19 shows two of these patterns. The rule for generating
FIGURE 2.19 Two stimulus patterns, (a) and (b), formed by random selection of the height of columns in an 8 × 8 matrix. The pattern in (a) has been formed by true random sampling with replacement, while the one in (b) was formed by constrained random sampling without replacement (from Fitts, Weinstein, Rappaport, Anderson, and Leonard, 1956).
this type of pattern is that a number between 0 and the maximum allowable height of a bar is randomly selected for each of the bar positions. This results in a pattern like that in Figure 2.19(a). If the random selection process is constrained so that a given height can be chosen only once (random selection without replacement), then a form like that in Figure 2.19(b) results. It is interesting to note, in passing, that the figures formed by the latter procedure are much more difficult to identify in a group of similar patterns than are those that are generated by the first fully random rule. Considering that the information content of the constrained random group is less because of the introduced redundancy, this is not too surprising a result.
2. The random polygon. Attneave and Arnoult (1956) developed an algorithmic technique for generating a variety of polygonal visual stimulus patterns. Random locations were selected on a sheet of graph paper. These locations were then connected, forming an exterior limit on the size of the polygon. Then any remaining interior points were used to make indentations in the otherwise relatively regular polygon. Figure 2.20 shows one of the shapes and the generation process. Attneave and Arnoult in their publications have been especially adamant in their appeal for precise metrics of geometrical form, for they, like the author of this book, feel that without an adequate specification of the nature of the stimulus, little formal progress is possible in our understanding of sensory processes.
3. The "random" dot pattern. The problem of form can also be approached in a three-dimensional context as well as a two-dimensional one. Julesz (1960, 1971) has been interested in the study of depth perception, specifically in the cues that are introduced by retinal disparity independent of the other stereoscopic cues.
To provide stimulus materials that had no other cue present but retinal disparity, he chose to use a novel and ingenious display composed of what appeared, when seen monocularly, to be an absolutely random dot pattern. Figure 2.21 shows a pair of these patterns. However, when the objects are viewed dichoptically, certain regions of the dot pattern can, under the proper convergence conditions, fall on corresponding points of the retina. Because certain regions of the dual pictures are not really completely random, but rather are identical in both pictures except for uniform lateral shifts in plotted position, they can be
FIGURE 2.20 Steps (a), (b), and (c) in the construction of a random polygon, using the procedure invented by Attneave and Arnoult (1956). Random points are first plotted, and then the exterior points are connected. Then slices are taken out of the external polygon to connect the internal points, thus forming irregular polygons with both convexities and concavities.
interpreted by the visual system in exactly the same way as the slightly shifted aspects of a more conventional stereoscopic pair produced by differences in the position of the two eyes or two camera lenses. Images that had not been obvious to any degree in the original pattern suddenly emerge in sometimes awesome complexity. It is obvious that the very powerful responses elicited by these random dot patterns have little to do with the simple physical energetics of the stimulus and that the "pattern" of dots is the all-important variable. Julesz has done some work in quantifying these patterns in terms of the introduced disparity, but the best models of all, of course, are the computer generation routines, which plot out the pictures. There is no question that these algorithmic generation routines do play an important and useful role in defining stimuli for psychological experiments. However, they do not completely define a metric that can be used to specify the exact configuration of the generated characters because of the random factors that appear throughout all of the methods. Thus, while a given routine might generate the members of a particular class of patterns, it is not known which particular item will occur with any given evaluation of the algorithm. This limitation may, indeed, not be a handicap if one is interested in the problem of a general theory of form perception, but it does not help us in the specification of the critical differences in the perception of forms like the alphabetic characters, which are already predetermined. This same problem occurs in an analogous form in the study of speech, where, of course, the same generalization holds; namely, the energetics of the speech sound are secondary and almost incidental compared to the pattern of sound and meanings involved in an utterance.
Stochastic procedures and analytical techniques are commonly used in speech analysis to help describe the sequential dependencies and general statistical structure of speech sounds, but again statistical models fall short of the desired goal: a specific and full metric for the representation of pattern and form.
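The quasi-random histogram rule described earlier in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the routine of Fitts and his colleagues: the function names, the choice of heights running from 1 to the maximum, and the text rendering are all hypothetical. It does show the point made in the text, that the algorithm defines a class of figures while the particular figure depends on the random numbers drawn.

```python
import random


def random_histogram(n_columns=8, max_height=8, with_replacement=True, rng=None):
    """Return the column heights of one quasi-random histogram figure.

    Assumption for this sketch: heights run from 1 to max_height.
    """
    rng = rng or random.Random()
    if with_replacement:
        # Fully random rule: every column is drawn independently,
        # so the same height may occur more than once.
        return [rng.randint(1, max_height) for _ in range(n_columns)]
    if n_columns > max_height:
        raise ValueError("not enough distinct heights for sampling without replacement")
    # Constrained rule: each height may be chosen only once.
    return rng.sample(range(1, max_height + 1), n_columns)


def render(heights, max_height=8):
    """Draw the figure as text, one '#' per filled cell of the matrix."""
    rows = []
    for level in range(max_height, 0, -1):
        rows.append("".join("#" if h >= level else "." for h in heights))
    return "\n".join(rows)


if __name__ == "__main__":
    rng = random.Random(1973)  # fixed seed so the same figure can be regenerated
    print(render(random_histogram(with_replacement=True, rng=rng)))
    print()
    print(render(random_histogram(with_replacement=False, rng=rng)))
```

Recording the seed alongside the routine is one way to make a particular generated figure reproducible, which the generation rule alone cannot do.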
FIGURE 2.21 A dichoptic stimulus pattern in which no figure is apparent when viewed monocularly. When viewed in a stereoscope, however, the single cue of retinal disparity generated by systematic variation of dot position gives rise to a strong impression of a center square rising above the background (from Julesz, 1971).
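The construction of a random-dot pair of the kind described above can be sketched as follows. This is a minimal illustration, not Julesz's own generation routine: the function name, the matrix size, the size of the central square, and the amount of lateral shift are all assumed values chosen for the example. The two pictures are identical random matrices except that a central square is displaced sideways in one of them, and the strip uncovered by the displacement is refilled with fresh random dots.

```python
import random


def random_dot_stereogram(size=32, square=12, shift=2, seed=None):
    """Return (left, right) dot matrices of 0s and 1s.

    Hypothetical parameters: `size` is the side of each matrix, `square`
    the side of the displaced central region, `shift` the lateral
    displacement in columns.
    """
    rng = random.Random(seed)
    left = [[rng.randint(0, 1) for _ in range(size)] for _ in range(size)]
    right = [row[:] for row in left]  # start as an exact copy

    start = (size - square) // 2
    end = start + square
    if start - shift < 0:
        raise ValueError("shift is too large for this matrix size")

    for y in range(start, end):
        # Copy the central region into the right picture, displaced laterally.
        for x in range(start, end):
            right[y][x - shift] = left[y][x]
        # The strip uncovered by the displacement gets new random dots.
        for x in range(end - shift, end):
            right[y][x] = rng.randint(0, 1)
    return left, right


if __name__ == "__main__":
    left, right = random_dot_stereogram(seed=1)
    for row_l, row_r in zip(left, right):
        print("".join(" #"[d] for d in row_l), "  ", "".join(" #"[d] for d in row_r))
```

Viewed one at a time, neither matrix contains any outline of the square; the only information about it is the disparity between the two.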
Perhaps the next closer approximation to a complete map is a computer program. Unfortunately the problems of communicating this particular longhand description of a geometrical form have still not been solved, and few psychological experiments report the construction of the computer program defining the stimulus in the methods section.
In this chapter we have attempted to gather together the basic measures and fundamental principles of physics that are necessary for a full understanding of the sensory process. We have stressed the point that the physical stimulus must be the starting place for all discussions of neural coding of sensory messages. We have distinguished between a potential stimulus and a realized actual one and have, in conclusion, stressed the point that the gross energetics of the physical stimulus may often play a secondary role to those distributions of energy in time or space which we call patterns, but for which we have not yet developed an adequate descriptive notation. With this introduction to physical stimuli, we can now proceed to the biological aspects of sensory processing and consider the anatomy and physiology of the information input mechanisms in the remainder of the first section of this book.
CHAPTER 3: THE ANATOMY OF RECEPTORS AND THE SENSORY PATHWAYS
I. INTRODUCTION
The discussion in the preceding chapter has provided a foundation of the necessary physics, which must serve as a reference point to which we shall anchor all studies of sensory communication. The energetics and organization of the physical stimulus are, as has been pointed out, the only meaningful referents for all of the stages of transduction and coding that follow. However, before we can meaningfully discuss transduction and the coding of neural signals, it is necessary that we make some further preparatory steps. In this chapter, we shall concern ourselves with the structure of those highly specialized receptor structures in which these transduction actions occur and the neural pathways which convey information from them to the central nervous system. There are three levels of anatomical information concerning the receptors with which we must deal: the macroscopic, the microscopic, and the ultra or electron microscopic. In recent years there has been great progress in our understanding of these cells, highly specialized as they are for the conversion of almost unimaginably slight amounts of energy into transmittable neural signals, particularly at the ultramicroscopic level. This is primarily the result of some extraordinary progress in the field of electron microscopy. In addition, there has also been considerable progress made, at a grosser level, in tracking the ascending sensory pathways. The emphasis in this latter case is on the location of the synaptic interconnections and the tracing of the sensory nerves and tracts from the receptor organs to
the areas of the upper reaches of the central nervous system at which, presumably, the communicated information becomes, in some unknown manner, the stuff of which experience is made. In presenting this information, we shall deal with the anatomy of the receptors and the associated pathways of a single modality as units, rather than adopt the alternative strategy of describing all of the receptors and then, separately, all of the pathways. As far as possible, the anatomy of ascending pathways and of receptor structures presented will be that of man himself. Occasionally infrahuman data will be used where human findings are not available for one reason or another or to make a particular point. To fully understand the nature of modern neuroanatomical knowledge, there are three pieces of preparatory material that must first be discussed. We must first consider the need for, and the operation of, the electron microscope itself, the tool, par excellence, of the microanatomist. Second, we must consider some of the techniques that are routinely used for the elucidation of neuroanatomical structure. Finally, we must also consider some grosser features of the general organization of the central nervous system, without which our subsequent discussion of the ascending pathways would be very difficult to fully understand.
II. THE ELECTRON MICROSCOPE
In the rest of this chapter, we shall be dealing with the sensory organs at several different levels of magnification. Gross structural analyses and the application of optical microscopy have both contributed much and will continue to contribute substantial amounts to our knowledge. The continued fertility of optical microscopic techniques is well illustrated by Johnsson and Hawkins' (1967) recent work using phase contrast microscopy to elucidate the structure of the cochlea. Nevertheless, the real cutting edge of ultrastructural research in the last decade has certainly been the application of the electron microscope to the study of receptor and neural anatomy. It is for this reason that we shall pause for a moment to consider exactly why one has to use an electron microscope, and then how this incredibly effective machine actually operates. To understand the operation of the electron microscope, we must first understand something about the wave properties of electromagnetic energy and the limitations of the optical microscope. We must consider the nature of the concept of resolving power and its relation to the wavelength of the radiation being used in the magnification process. It is not possible to review completely the principles of optics that underlie the optical microscope; however, a few reminders of the basic mechanism of the refraction of light by lenses might help the reader to understand some of the material that will follow. We have already discussed, in the previous chapter, some of the ways in which the quantal characteristics of light are exhibited (for example, in the line spectrum of a luminescent light source) and have mentioned how it is possible to convert from quantal energy measures to the wavelength or frequency measures. Discussion of the optical properties of lenses is one that is always framed
FIGURE 3.1 Diagram showing the change in the direction (refraction) of a beam of light as it passes from Medium #1 into Medium #2, a medium of a different index of refraction than that of the one in which it had been traveling. Here θi is the angle of incidence and θr is the angle of refraction, and sin θi / sin θr = constant. Refraction is the basis of all optical lens effects including magnification and is analogous to the refraction of electron beams by magnetic fields.
in terms of the wave properties of light and constitutes the science that is usually called geometrical optics. The basic premise of geometrical optics is that light is diverted in direction as it enters at an angle a medium with a different index of refraction than that in which it had been traveling. Since the velocity of the light changes with the index of refraction, the effect is to alter the direction of the wave front by retarding one portion earlier than the others (see Figure 3.1). A lens will divert or refract rays of light by an amount which depends not only upon the index of refraction of the component glass but also upon the curvature of the lens. Thus, each lens has a characteristic "focal length," which is defined in terms of the distance from the lens at which parallel rays (coming from infinitely far away) are brought to a focus. For light sources (the object) that are not infinitely far away, the distance Q at which an image is formed depends upon the distance P to the object and the focal length F of the lens in accord with the following relation:
$$\frac{1}{P} + \frac{1}{Q} = \frac{1}{F} \tag{3.1}$$
These notions are summarized by Figure 3.2 for a simple thin lens. The degree of magnification M of a lens can be defined as the ratio of the image size to the object size, and it can be shown that this is simply equal to the ratio of P and Q. Therefore:
$$M = \frac{P}{Q} \tag{3.2}$$
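A short numerical sketch may make the lens relation concrete. It assumes that Equation (3.1) is the usual thin-lens relation 1/P + 1/Q = 1/F, solved here for the image distance Q, and it applies the multiplicative rule for combining several lenses. The function names and the sample values (a 10 mm lens, an object 15 mm away, 40× and 10× stages) are hypothetical and chosen only for illustration.

```python
import math


def image_distance(p, f):
    """Solve 1/P + 1/Q = 1/F for Q (assumed thin-lens form of Equation 3.1).

    p is the object distance and f the focal length, in the same units.
    """
    if p == f:
        raise ValueError("object at the focal point: the image forms at infinity")
    return p * f / (p - f)


def total_magnification(stage_magnifications):
    """Combine several lenses by the multiplicative rule."""
    return math.prod(stage_magnifications)


if __name__ == "__main__":
    q = image_distance(p=15.0, f=10.0)
    print("image distance:", q)                    # 30.0 for these sample values
    print("check 1/P + 1/Q:", 1 / 15.0 + 1 / q)    # close to 1/F = 0.1
    print("two-stage magnification:", total_magnification([40, 10]))  # 400
```

The first function shows why, for a given lens, the image position depends only on where the object is placed.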
FIGURE 3.2 Diagram of the action of a simple thin lens of focal length F, with the object at distance P and the image at distance Q. Light will be imaged at a distance Q from the lens, which is a function both of the distance of the object from the lens P and the focal length F of the lens. The magnification will be exactly equal to P/Q. Several lenses of this sort can be used to gain additional magnification according to a multiplicative rule (total magnification = magnification 1 × magnification 2 × ... × magnification n), but only up to the resolving power limit which, for an optical microscope, is approximately equivalent to a magnification of 2000.
Since the relation between P and Q is a function only of F, it is clear that the degree of magnification of a glass lens will be solely a function of its focal length and where the object is located. The magnification of a lens or lens system (the magnification of one lens can be concatenated with that of another to give even larger magnifications than those possible with a single lens), however, is unrelated to the so-called resolving power of a microscope. Magnification can, for all practical purposes, be increased without limit, but a limit is imposed upon any microscope by the fact that the light being used is also being magnified as much as the object under study. Thus, there comes a time at which the individual wavelengths of the light begin to be large enough to interfere with the imaging of the object. This limit is the basis of the idea of the "resolving power" of any microscope. Because of diffraction effects produced by the interaction of wave fronts from the much magnified individual light waves, there is no realizable way of improving the optical properties of any microscope to distinguish between two adjacent points that are closer than a certain minimum amount. That is to say, even with perfect optical elements the wavelength of the utilized radiation specifies a minimum separable distance (resolving power) for any magnifying system. An expression, which can be used to estimate the maximum possible resolving power of any given magnifier, can be derived and shown to be a function of both the wavelength of the radiation and the geometry of the magnifying system:
$$r = \frac{0.5\lambda}{NA} \tag{3.3}$$
where r is the resolving power (the minimal separable distance achievable by the magnifier); λ is the wavelength of the utilized radiation; and NA is the "numerical aperture" of the magnifying system. For optical microscopes, the numerical aperture is defined by:
$$NA = \mu \sin\theta \tag{3.4}$$
where μ is the index of refraction of the medium through which the light passes, and θ is half the angle subtended by the lens. Under optimum conditions of viewing, using an appropriate fluid immersion system and appropriately slanted light pathways, the numerical aperture of a microscope using visible light can only be as great as 1.5. Therefore, the resolving power of the best quality optical microscope is limited to about a third of the wavelength of the magnifying light. The shortest visible blue light that can be seen under normal conditions is about 380 or 400 nm. Resolving power is thus limited to separations of about 130 or 140 nm. By use of even shorter invisible lights, such as ultraviolet radiation, and special photographic or electronic detection systems, resolving power can be extended even further. Finally, however, the problems of lens transparency and of lens aberrations begin to take over, and the limits of the optical microscopes cannot be extended by any means. Unfortunately, this technological limit is not identical to the limit of the sizes of many of the elements of biological concern. As one important example, the lamellar layers of the outer segment of a retinal rod are only 50 Å thick (see Porter and Bonneville, 1968, p. 181), and it is at this ultramicroscopic level that most significant transductive phenomena occur. Alternative means of increased magnification are, therefore, necessary if one is to have fuller information about the details of microstructure. Equation (3.3) leaves two different routes open in the search for a means to improve the resolving power of a magnifying system: the wavelength of the radiation may be decreased, or the numerical aperture of the system may be increased. For many practical reasons, it has generally not been possible to greatly increase the numerical aperture of any magnifier. The optical or electrical properties of lenses and magnets seem to be recalcitrant to much improvement in this direction.
Therefore, it has been necessary to work on the other factor in the equation, the wavelength of the radiation used in the magnifying process. This strategy is facilitated by the fact that visible light is but one small region of the continuum of electromagnetic energies and wavelengths, which spans the energy spectrum from long, low-frequency radio waves to the ultrahigh-frequency equivalents of some of the basic particles of matter when they are accelerated to high velocities. It is not germane to the present discussion to detail further the relationship between the particle and wave properties of the basic building blocks of matter. Let us simply reiterate that modern physical theory treats electrons, along with all other elementary particles, as packets of waves, which exist in only a certain statistical sense in any place at any given time, the limit on their precise localization being a function of the Heisenberg uncertainty principle. The critical fact in the context of the present
discussion is that a beam of electrons, just like a light ray, has associated with it wave properties which result in the fact that it can be refracted, reflected, and focused by appropriate "lenses" just as can visible light. The lenses that can accomplish this for the electron beam, however, are not made of glass or quartz, but must be either electrostatic plates or electromagnetic coils of exactly the same nature one would find in an oscilloscope or a commercial television set. By applying electrostatic or magnetic fields to a beam of electrons, the path of the particles can be altered in a selective manner, depending upon their course and the shape and magnitude of the deflecting field. Electrons from a point source, spherically divergent in their initial paths, can be collimated into a nearly parallel stream or focused by the action of, say, an appropriate magnetic field. The amount of deflecting force exerted on each electron is proportional to the component of its velocity, which is at right angles to the local magnetic field. This is, of course, the famous right-hand rule of motors. Deflection control by magnetic force is exactly analogous to the focusing action of a glass lens in the optical microscope and can be used in the same way to magnify an image. Magnetic lenses also have "focal lengths" and refractive properties analogous to glass lenses. But in the case of the electron beam, we are dealing with an electromagnetic wavelength that is far shorter than that of visible light, and thus the resolving power of the electron micrographic system can be much greater. In practical fact, assuming a relatively constant numerical aperture (and, as we have said, there is relatively little that can be done in modifying the numerical aperture of magnifying systems), rather than 150 nm resolution ability, typical of a good optical microscope, a good electron microscope will have an ability to resolve two objects only 1 nm apart. 
This separability can be even further improved to a few angstrom units as the energy of the electron beam is increased beyond a million volts as it is in some modern machines. The associated decrease in equivalent wavelength as a function of electron energy is the basic reason that the more highly resolving electron microscopes are also much larger physical instruments. They simply require higher accelerating voltages and the correspondingly more complex and massive instrumentation to achieve and maintain these higher-energy electron beams. It is also a practical result that electron microscopes do not operate as near their theoretical limit on resolving power as does the optical microscope. The art of designing the deflecting coils for an electron microscope simply has not achieved the same relative level of freedom from aberrations of one sort or another that the equivalent optical arts have. A focused and magnified beam of electrons from the electron microscope, after passing through or being reflected from a specimen, of course, is still invisible to the human eye. The information contained in such a beam must be converted to a visible image or a photographic record by projection of the beam of electrons onto a luminescent screen capable of emitting light within the visible spectrum. Additional amplification is often achieved by image intensifiers, closed-circuit television, or other electrooptical methods of light enhancement.
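The arithmetic behind these figures can be checked with a brief sketch. The first function applies Equation (3.3) to visible light with a numerical aperture of 1.5. The second uses the relativistic de Broglie relation for an electron accelerated through a given voltage; that relation is standard physics but is not given in the text, and the function names and sample voltages are assumptions made for illustration.

```python
import math

PLANCK = 6.62607015e-34              # Planck constant, J s
ELECTRON_MASS = 9.1093837015e-31     # electron rest mass, kg
ELECTRON_CHARGE = 1.602176634e-19    # elementary charge, C
LIGHT_SPEED = 2.99792458e8           # speed of light, m/s


def resolving_power(wavelength, numerical_aperture):
    """Minimum separable distance r = 0.5 * wavelength / NA (Equation 3.3)."""
    return 0.5 * wavelength / numerical_aperture


def electron_wavelength(accelerating_volts):
    """Relativistic de Broglie wavelength, in meters, of an accelerated electron.

    Not from the text: wavelength = h / p, with the momentum p obtained from
    the kinetic energy eV gained in the accelerating field.
    """
    energy = ELECTRON_CHARGE * accelerating_volts
    rest_energy = ELECTRON_MASS * LIGHT_SPEED ** 2
    momentum = math.sqrt(2 * ELECTRON_MASS * energy * (1 + energy / (2 * rest_energy)))
    return PLANCK / momentum


if __name__ == "__main__":
    # Visible light at 400 nm with NA = 1.5: roughly a third of the wavelength.
    print("optical limit (nm):", resolving_power(400e-9, 1.5) * 1e9)
    # Electron wavelengths shrink as the accelerating voltage rises.
    for volts in (1e4, 1e5, 1e6):
        print(volts, "V ->", electron_wavelength(volts) * 1e12, "pm")
```

The electron wavelengths come out in the picometer range, several orders of magnitude shorter than visible light. The resolution actually achieved is far coarser than these wavelengths alone would suggest, in keeping with the remark above that electron microscopes operate well short of their theoretical limit.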
FIGURE 3.3 A photograph of the outside of a modern electron microscope (courtesy of Carl Zeiss, Inc.).
Electron microscopes, like optical ones, may be either reflection or transmission types. In either case, a special and appropriate technology of specimen preparation has developed. To minimize the absorption and thus the associated heating of the electron beam in the transmission electron microscope, specimen tissues of extremely thin cross section must be prepared. Ultramicrotomes capable of slicing a section of a fraction of a micron in thickness are routinely used. These ultramicrotomes may require blades made from specially fractured pieces of glass. For reflection electron microscopy, a process specialized for the examination of the surface detail of a specimen, special metallizing techniques have been invented. In this process a thin coating of a metal, especially selected for its ability to reflect electrons, is sputtered onto the surface of the specimen. The reflected electrons are then guided to the fluorescent screen or photographic film. Figure 3.3 shows a photograph of a modern electron microscope. An important new development in electron microscopy has been the recent availability of the scanning electron microscope. This device has the advantage of enormous depth of field, so great, in fact, that in many instances objects appear to be in focus over their entire volume. Figure 3.4(a), for example, is an example of what would have been an impossible micrographic task prior to the development of this instrument. An entire receptor organ is seen at a relatively low level of magnification. This same sort of three-dimensional accuracy is also available, furthermore, at magnifications of many thousands as shown in Figures 3.4(b) and (c). The scanning electron microscope operates on the basis of the secondary ejection of electrons from the specimen after the surface is bombarded with the primary electrons produced by the electron gun of the microscope.
The key point is that not all loci on the surface of the specimen are bombarded simultaneously; rather, a highly focused beam of electrons "scans" each point in sequence and in synchrony with the painting of an image on a cathode-ray oscilloscope. It is this sequential scanning procedure, similar to that used in an ordinary television, that is the basis of the name of this form of the electron microscope, as well as of the great depth of focus. Figure 3.5 is a cutaway diagram of the main features of a scanning electron microscope, showing the electronic synchronization and control circuitry, the specialized detector capable of selectively sensing the number of secondary electrons ejected from the specimen, and the display oscilloscope. Very often, the specimen is coated with some material to increase the number of secondary electrons. This material is usually some metal capable of being vaporized and deposited in a coating only a few angstrom units thick. As we shall see later in this chapter, some important insights into receptor anatomy have emerged through judicious application of scanning techniques.

The technical details of the construction of an electron microscope are, however, only secondary in interest to the astonishing pictures these elegant machines have provided for students of many of the biological sciences and particularly for those of us interested in the senses. There is no better way to make this point than to present in later sections of this chapter a sampling of some of the better ones and some of the interpretive drawings made from them.

The general scheme we shall follow in this chapter will be to proceed from the gross to the ultramicroscopic anatomy. In a very true sense we shall thus be saving the best for the last. But all of the levels of structural complexity at which we shall look in this chapter will appeal to the esthetic senses of the reader in a way quite distinct from the impact of some of the later chapters. However, before we can look at sensory anatomy in particular, we must consider central neural anatomy in general. The next two sections accomplish this function. We first consider some of the general techniques used for neuroanatomical research and then the gross structure of the central nervous system as it is presently known.

FIGURE 3.4 A series (a), (b), and (c) of scanning electron micrographs of the visual organ of a fruit fly in increasing order of magnification showing the astounding three-dimensional effects produced with this instrument. (a) The whole head (X 108). (b) Several ommatidia (X 302). (c) A single ommatidium (X 11,200) (courtesy of Dr. Lloyd Beidler, Florida State University).

FIGURE 3.5 A cutaway drawing of a modern scanning electron microscope showing the additional scanning and display circuitry necessary to scan the matrix of points on the specimen in sequential order. The bottom electromagnetic lens controls the scanning of the electron beam. The serial information emitted from the specimen in the form of a stream of secondary electrons of varying intensity is then measured and used to plot out a picture on an oscilloscope for photographic purposes (courtesy of Kent Cambridge Scientific, Inc., Morton Grove, Illinois). Labeled components: electron gun, high voltage supply, electron beam, first condenser lens, lens supplies, second condenser lens, magnification control, scanning lens, scanning circuits, final lens, signal amplifiers, specimen, vacuum system, secondary electron collection system, display and record unit.
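The sequential scanning principle of the scanning electron microscope can be illustrated with a short sketch. The Python below is a toy model under stated assumptions, not a description of any real instrument: `toy_emission` is a hypothetical stand-in for the secondary-electron yield at each point of the specimen, and the grid size is arbitrary. It shows only the essential idea that the beam visits one point at a time and that the display is painted in step with it.

```python
def raster_scan(emission, rows, cols):
    """Visit each specimen point in sequence and paint the matching display point.

    emission(r, c) is a hypothetical function giving the secondary-electron
    signal measured when the beam rests on point (r, c).
    """
    display = [[0.0] * cols for _ in range(rows)]
    order = []  # record of the sequence in which points were bombarded
    for r in range(rows):          # one scan line at a time
        for c in range(cols):      # one point at a time along the line
            signal = emission(r, c)    # only this point is bombarded now
            display[r][c] = signal     # display spot is painted in synchrony
            order.append((r, c))
    return display, order


def toy_emission(r, c):
    """Assumed checkerboard specimen: alternating high and low yield."""
    return 1.0 if (r + c) % 2 == 0 else 0.25


display, order = raster_scan(toy_emission, rows=3, cols=4)
for line in display:
    print(line)
```

Because the image is assembled point by point from a serial signal, the picture is a map of signal strength against beam position rather than an optically focused image of the whole surface at once.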
III. SOME BASIC RESEARCH TECHNIQUES IN NEUROANATOMY
A. Additional Optical Microanatomical Techniques

The optical microscope, as we have said, has played and will continue to play an important role in mapping the central nervous system. To use a microscope to observe what are normally transparent tissues, special techniques must be employed to make the cells visible. The classic technique has been to use some sort of a dye or stain that is relatively specific to a particular kind of tissue. Thus, there are stains like toluidine blue, which are highly specific for cell bodies alone, and others like the silver-based stains, which act in little-understood ways to selectively stain some parts of some cells, including the axons and dendrites. Many pictures of whole neurons have been processed with a technique known as the Golgi impregnation method. In this procedure, crystals of silver chromate fill the cell completely, and thus it is really not so much a membrane "staining" technique as it is a sort of "fossilization" procedure. In other silver staining techniques like the Holmes and Bodian procedures, the silver atoms combine with neurofibrillar elements, and a true stain is thus achieved.

More recently, new techniques that allow unstained nervous tissue to be observed have been developed. Phase contrast optical techniques take advantage of the minute differences in the refractive index of the different parts of individual cells, so that the microscopist can observe almost completely unprepared tissue. In fact, it is possible in some instances to actually see a living cell going through its normal metabolic activities with this type of microscope. Fluorescence techniques, which allow examination of tissues in a form very close to their natural states, have also been developed. The fluorescence technique usually involves some specialized preparation of the tissue, however, including the injection of special fluorescent dyes, and living cells cannot usually be observed in this manner.
Typically the tissue is frozen and then dried in a vacuum. The prepared tissue is then placed in a special microscope in which the light source is rich in ultraviolet rays. Different tissues pick up different amounts of dyes and thus fluoresce in different colors when they are exposed to ultraviolet light. It is this secondary emission of light that is seen by the observer, rather than transmitted light filtered by a stained slice of tissue, as is the case in a conventional optical microscope. (The particular color of the cell can thus indicate which cells are histochemically alike and, perhaps even more important, functionally similar.)

A very recent development (Stretton and Kravitz, 1968) in fluorescence microscopy takes advantage of a special dye known as Procion yellow. This dye has a number of important advantages. One of the most significant is the fact that it is relatively harmless to a living neuron. It is, therefore, possible to inject this material into a physiologically responsive cell through a micropipette and, for a short time, observe the anatomy of the living cell while simultaneously recording its electrophysiological signals. Another important advantage is that the neuronal membrane seems to be very impermeable to the Procion yellow dye. Thus, all of the dye remains within the single cell into which it is injected, yet it is also capable
of diffusing to most of the branches of the neuron. A cell can, therefore, be anatomically studied as if it were perfectly isolated from its neighbors if the injection is appropriately done. Procion yellow fluoresces in the yellow portion of the spectrum when irradiated by a bluish light, while most cells that do not contain dye will fluoresce with a greenish color. The injected cell will stand out clearly, therefore, including most of its smaller ramifications. The dye is also stable during normal fixation procedures and remains in serial sections so that it is still useful for postmortem microscopic examination.

The microanatomical techniques that we have so briefly mentioned here help to directly distinguish one anatomical structure from another. Some fiber bundles and nuclei are separated from their neighbors in sharp and distinct patterns when observed through the microscope. The constituent cells of each region may be quite different from one area to another. In other cases, however, the microanatomical difference may be quite small, and no obvious line of demarcation may exist between two adjacent regions. Furthermore, microanatomical techniques also may show greater differences in sequential portions of the same pathway than in adjacent portions of different pathways. Since so much of our neuroanatomical interest at this microscopic level is based upon our interest in the functional rather than the structural properties, neuroanatomists often turn to the following procedures to define the limits of a tract even when it shows very little visible anatomical differentiation from its neighbor.

B. Evoked Potential Techniques

Stimulating and recording procedures and electronic instrumentation are often used to trace out the anatomy of the nervous system. The procedure typically involves the activation of one portion of the nervous system with some stimulus and the probing for electrical responses in other locations.
The stimulation may be accomplished in a number of ways. Adequate stimuli may be applied to the normal receptor channels; electrical stimuli may be used to activate almost any region of the nervous system; or chemicals such as strychnine may be applied to a portion of the brain. A relatively large electrode may then be used to determine the regions of the brain or spinal cord that are activated by the stimulus. One of the major problems with this technique is that the number of remote regions that may be activated by any stimulus may be very large due to abundant interconnectivity. In the cat, for example, it seems as if stimulating any one part of the brain activates a large number of other regions of the brain. For this reason, the most powerful application of this evoked potential technique has been to the analysis of sensory systems that are sufficiently simple so that afferent conduction can be very well traced. Much of what we know of sensory localization in the cortex has come about through this sort of approach.

C. Degeneration Techniques

We mentioned earlier that if an axon is disconnected from its cell body, the axon will die. This is due to the fact that the metabolically important structures of a cell are located in the cell body; peripheral structures, like axons,
FIGURE 3.6 Micrograph of a degenerating tract in the superior colliculus of a hamster. This degeneration was produced by removing the contralateral eye of the animal a week prior to the time it was sacrificed and the histological sample taken. The evidence of degeneration is the accumulation of the dense granular material, a phenomenon which is shown in more detail in the more highly magnified lower photograph. (A photograph of a sample collected by Dr. G. Schneider, reproduced with the courtesy of Dr. L. Heimer, 1970.)
cannot survive alone. This fact has served as a useful adjunct to the stimulation and recording and microanatomic techniques for tracing sensory pathways. If a cut is made between the originating cell bodies and the axons of a major pathway, there will appear, over the next few days and weeks, a neurological lesion along the entire pathway, ending only at the next synaptic junction. The degeneration of the myelin sheath is the microscopically observable result of axonal degeneration. In some of the ascending pathways of the spinal cord that are quite long, this provides an exact means of mapping the course of the tract. Figure 3.6 shows a prepared slice of the
superior colliculus of a hamster showing the granulated appearance of a sharply circumscribed degenerated region produced by removing the contralateral eye. Axonal degeneration of this sort is known as prograde degeneration. Degeneration of a cell body can also occur when the axon is cut loose. This is known as retrograde degeneration and is exhibited as a systematic change in the intracellular structure of the cell body or perikaryon. In the peripheral nervous system, retrograde degeneration is often reversible, and the perikaryon may return to normal after regeneration of the axon. In the primate central nervous system, however, the process is not reversible, and once the axon is disconnected, the entire cell usually dies. Transsynaptic degeneration has also been reported, adding a further complication to some of the obvious difficulties encountered when one depends too much upon degeneration techniques as a source of anatomical information.

These, then, are some of the basic techniques that are used to trace out the functional anatomy of the nervous system. Before we consider the current state of our knowledge of the specific anatomy of the individual sensory pathways, we must digress to a discussion of the general organization of the central nervous system.
IV. THE GROSS STRUCTURE OF THE CENTRAL NERVOUS SYSTEM

Later in this chapter, when we discuss the anatomy of the various sensory channels, we shall refer to various relay stations or nuclei along the ascending pathways where, for example, synaptic associations between sequential neurons may occur. In many previous instances in which this sort of material was presented, the discussions were sometimes obscured by the fact that no accompanying presentation was made of the general gross anatomy of the central nervous system. In this section, we shall present a brief gross anatomy of the central nervous system, introducing the various centers in a way which, it is hoped, will make the subsequent detailed discussion of the ascending tracts more intelligible.

The central nervous system is made up of three main portions: the brain, the spinal cord, and the retinae. The retinae are associated for embryological reasons with the other two portions, but will be considered separately in our discussion of receptor anatomy. In this section, we shall concentrate only on the brain and spinal cord. Figure 3.7 is a drawing of the central nervous system as it would appear after an initial gross dissection. Because of the evolutionary growth of the cerebral hemispheres in particular, this drawing obscures the overall structure and important relationships among many of the various parts that are going to be referred to later.

When the brain and spinal cord are dissected and observed without any special preparation, there is little to indicate the rich and orderly array of pathways that convey sensory information from the periphery to the highest portions of the brain or among the constituent parts of the brain. Some regions of the central nervous system are particularly rich in myelin sheaths and display a whitish cast. These regions are predominantly
FIGURE 3.7 Drawing of the central nervous system as it would appear in a gross dissection approached from the back (from Goss, 1973, after Hirschfeld and Leveille). Labeled structures: dura mater, superior sagittal sinus, falx cerebri, olfactory bulb, optic chiasma, internal carotid artery, pons, basilar artery, medulla oblongata, vagus nerve, sternocleidomastoideus, vertebral artery, scalenus anterior, superior cervical sympathetic ganglion, brachial plexus, sympathetic chain, 1st thoracic vertebra, spinal nerve, greater splanchnic nerve, ramus communicans, spinal ganglion, 1st lumbar vertebra, conus medullaris, filum terminale, cauda equina, sacrum, sacral plexus.
axonal bundles and thus mainly serve the function of transmitting information from one portion of the brain to another. Other regions are pinkish grey and are made up mostly of neuronal cell bodies and other unmyelinated processes. But the boundaries between these two regions are often not obvious in the gross dissection. Yet when one looks at a chart of a slice of the central nervous system in a conventional anatomy text, a complex pattern of ganglia and tracts is usually seen. To emphasize this difference, we have presented in Figure 3.8 both an unretouched photograph of a lightly stained slice of the rat's brain and a more detailed stereotaxic grid and anatomy indicating some of the many known centers involved in the same region. The discrepancy between the two parts of the figure, of course, is due
FIGURE 3.8 Photograph (a) showing the appearance of a lightly stained section of the brain of a rat and a stereotaxic map (b) with abbreviations of some of the many known centers and nuclei of the same section. Since most of these centers and nuclei cannot be seen in the original photograph, it is clear that they have had to be tediously traced by degeneration, evoked potential, and other anatomic locating procedures (from Pellegrino and Cushman, 1967).

FIGURE 3.9 Diagrammatic sketch that maintains the topological relation of the central nervous system, but distorts the actual physical dimensions. The central nervous system is seen to be organized as a tube with various hypertrophied centers (from Morgan and Stellar, 1950, after Lickley, 1919). Labeled structures: cerebral hemisphere, lateral ventricle, pineal body, olfactory bulb, thalamus, colliculi, cerebellum, IV ventricle, crus, pons, corpus striatum, optic tract, medulla, III ventricle, mamillary body, pituitary body.
to the enormous amount of work that has been done to plot the structure of the central nervous system by anatomists using some of the techniques described above.

Another useful pictorial display of the central nervous system is shown in Figure 3.9. In this figure, a plan proposed and first used by Lickley (1919) and later adapted by Morgan and Stellar (1950), the topological relationships of the central nervous system have been maintained, but in so doing the actual physical dimensions have been distorted to emphasize the tubelike structure and the relatively linear order in which the various major centers are organized. The original neural tube, the primitive and simple precursor of the nervous system seen in the first few weeks of embryological development, has had an extraordinary growth by the time a vertebrate is an adult. The structure has swollen at several places and is, in the adult animal, represented by a series of major relay and integrative centers at various levels along the course of the brain or spinal cord.

The grandest development from the original neural tube is certainly the great cerebral hemispheres, which now physically overlie almost all portions of the rest of the brain, completely hiding some of the lower centers. Not only do the cerebral hemispheres physically overshadow the lower centers, but it is, of course, also thought that it is in these giant cerebral ganglia that the mechanisms underlying the highest accomplishment of organic evolution, cognitive behavior, reside.
Neural connections between the hemispheres are conveyed through the cerebral commissures: the anterior and posterior commissures and the corpus callosum. Signals into the hemispheres come via a variety of ganglia and tracts conveying information upward mainly from the thalamus and the ascending reticular formation. The thalamus is the great sensory relay station of the brain, passing messages from the lower centers to the cortex.

Below these more anterior portions of the brain, a region which is often collectively referred to as the forebrain (composed mainly of the cerebral hemispheres, the thalamus, and the hypothalamus), lies the anterior portion of the brain stem, which is also known as the midbrain. The midbrain is primarily made up of a series of relay stations for both ascending sensory and descending motor information. On the dorsal (or back) side of the human midbrain lie the two pairs of inferior and superior colliculi, which together make up a general region known as the tectum or roof. The colliculi are especially important relay stations for sensory information, and we shall find many synapses here of especial importance to the auditory and visual pathways. Ventral to these structures, or on the front of the human midbrain, lies the crus, a region through which many sensory and motor tracts pass up and down the central nervous system.

In the hindbrain, the pons contains a number of important nuclei and also has many important tracts passing through it without synapsing. The cerebellum of the hindbrain is primarily a center for motor control, and while it does receive some sensory information, notably from the proprioceptive and vestibular receptors, we shall not give it any extensive coverage in this book. The medulla, on the other hand, is richly endowed with a wide variety of sensory ganglia and tracts and will appear as one of the important relay stations in many of the discussions of the ascending pathways.
The spinal cord is mainly a collection of giant tracts, but some cell bodies and synaptic connections, of which we shall speak in detail later, are also present there.

Table 3.1 summarizes the anatomy of the central nervous system as discussed in the preceding paragraphs. It also does two other things: it introduces certain additional terminology used to describe the various structures, and it acts as a locator for certain of the major centers of interest in our later discussions. I am glad to acknowledge that Table 3.1, and much of this present discussion, are based upon the presentation in Morgan and Stellar's (1950) textbook on physiological psychology.

Another major feature of the central nervous system is the complex net of nerves that enter and leave it. Neuroanatomists distinguish two different classes of these nerves, the spinal nerves and the cranial nerves, depending upon where the nerves enter or leave the nervous system. Table 3.2 lists the various cranial nerves, many of which are mainly sensory. The main distinction of importance to our present discussion is that the spinal nerves, unlike the cranial nerves, travel at least a portion of their course in the spinal cord, and all are related to bodily sensory or motor functions. Depending upon the level of their entry into the spinal cord, the spinal nerves are classified as cervical, thoracic, lumbar, sacral, or coccygeal.

We now should have a general notion of the organization of the central nervous system, including the sequence of centers and tracts to be
TABLE 3.1
THE VARIOUS NUCLEI AND CENTERS OF THE CENTRAL NERVOUS SYSTEM

MAJOR CENTERS

The Brain
  Forebrain
    Telencephalon:   Cerebral hemisphere, Olfactory bulb, Corpus striatum, Basal ganglion
    Diencephalon:    Thalamus, Optic tract and retina, Hypothalamus
  Midbrain
    Mesencephalon:   Superior colliculus*, Inferior colliculus*, Crus (the floor)*
  Hindbrain
    Metencephalon:   Cerebellum, Pons*
    Myelencephalon:  Medulla*
The Spinal Cord

FURTHER SUBDIVISIONS OF SENSORY IMPORTANCE

Sensory regions of cerebral cortex; Association regions of cerebral cortex; Lateral geniculate body; Lateral nucleus; Medial geniculate body; Arcuate nucleus; Ventrolateral nucleus; Red nucleus; Substantia nigra; Mesencephalic nucleus; Basis pedunculi; Oculomotor nucleus; Trochlear nucleus; Trigeminal nerve nucleus; Parabrachial nucleus; Gracile nucleus; Cuneate nucleus; Dorsal, ventral, and cochlear nuclei; Solitary nucleus; Superior, medial, and lateral vestibular nuclei; Superior olive; Inferior olive; Trapezoid body
TABLE 3.2
THE CRANIAL NERVES (ADAPTED FROM THOMPSON, 1967)

Number   Name                  Functions
I        Olfactory             Smell
II       Optic                 Vision
III      Oculomotor            Eye movement
IV       Trochlear             Eye movement
V        Trigeminal            Masticatory movements; sensitivity of face and tongue
VI       Abducens              Eye movement
VII      Facial                Facial movement
VIII     Auditory vestibular   Hearing; balance
IX       Glossopharyngeal      Tongue and pharynx
X        Vagus                 Heart, blood vessels, viscera
XI       Spinal accessory      Neck muscles and viscera
XII      Hypoglossal           Tongue muscles
found. Once again, we repeat that while our emphasis will be heavily on the communication properties of the brain, most of the centers to be described have other functions besides. The thalamus, for example, which has been referred to as the great sensory transmission center, contains several different nuclei that do not directly receive afferents from the sensory pathways. The distinction that we make here is slightly artificial, because even the process of integration, for example, is based upon the communication of information from place to place. Nevertheless, we feel it is necessary in the context of modern neurophysiology to separate out those processes that represent stages in the transmission of information from the peripheral receptors to the cerebral cortices from purely internal communication links.

The sensory tracts through the brain and spinal cord, which we shall discuss, are the most direct access routes to the specific upper portions of the brain associated with each of the senses. We must also take note of an additional common pathway through the reticular system, which also plays an important role in sensory functions. The reticular system has been known anatomically for many years, but only recently has been shown to have important functional properties associated with arousal and wakefulness (Moruzzi and Magoun, 1949). It is a central tube of poorly defined anatomy lying within the direct ascending pathways, which we shall discuss in detail in the following section. Anatomically it has been described as an almost structureless "reticulum" of cells with relatively short fibers and lacking the linear ordering of the usual afferent pathways. In addition to the specific sensory nerve tracts that pass to the sensory cortical areas, there are collateral sensory inputs from all of the sense organs to the reticular formation which have been physiologically, if not structurally, defined. The reticular formation has, as its output, a general and nonspecific pattern of signals, which are conveyed to almost all parts of the forebrain. These nonspecific outputs can produce a state of activation or excitement in these cortical regions, which many researchers believe may correspond in some way to the psychological states of attention, awareness, and consciousness. The many complex properties and functions of the reticular system are fully discussed in another volume of this series (Thompson, 1967).

This then completes our discussion of the preliminary anatomic material. We shall now consider the specific material of importance to an understanding of the anatomy of sensory receptors and the ascending sensory pathways.

V. THE VISUAL SYSTEM
A. The Anatomy of the Visual Receptor

From some points of view, the paired photoreceptors of the vertebrate visual system may be considered to represent the highest evolutionary development of any sensory system. Certainly if one used some arbitrary comparative measure of, say, the possible amount of information transmitted, the visual system would stand at the top of the list. Yet, in some curious way, people are often able to cope with blindness more easily than loss of some of the other less rich senses. Total deafness, for example, cuts one off from human communication in ways that some would consider even more profound than blindness. Loss of the proprioceptive system, which has a very low information-processing rate, on the other hand, is probably lethal, as is a total loss in cutaneous sensitivity. The point is that it is not always possible to use information-processing capacity as the single measure of sensory importance; rather, the adaptive utility of a sensory system may be more important to the organism in ways that go beyond richness of information flow.

Our plan for this chapter, as noted above, is to go from the macroscopic to the increasingly microscopic for each sense organ. Therefore, let us first consider the structure of the eye as it might be observed by another naked eye or with a low-power hand magnifying glass. Figure 3.10 is a cross section of the eye observed from the top. The most anterior portion of the eye is the transparent cornea, the interface between the external and internal worlds. The cornea performs a passive information-transmission function in addition to the protective and supportive functions it shares with the two other layers of the eye, the sclera and the choroid. Behind the cornea lies the anterior chamber of the eye, filled with a clear watery fluid known as the aqueous humor. Behind this anterior fluid-filled
FIGURE 3.10 A cross section of the eye from the top, showing many of the important structures involved in the visual process. See text for full details (from Brown, 1965, after Walls, 1942, as modified from Salzmann, 1912). Labeled structures: cornea, iris, anterior chamber, posterior chamber, limbal zone, conjunctiva, canal of Schlemm, ciliary muscle, lens, ciliary process, zonule fibers, retrolental space, rectus tendon, ciliary epithelium, ora serrata, optic axis, visual axis, canal of Cloquet, vitreous humor, retina, sclera, choroid, lamina cribrosa, macula lutea.
chamber lie two mechanically active units of extraordinary capability: the iris and the crystalline lens. The iris, colored by the deposition of various pigments, is a diaphragm composed of two counteracting muscular bands, the sphincter and the dilator. The sphincter is an anterior layer composed of muscle fibers running parallel to the perimeter of the circular iris. It is capable, upon contraction, of reducing the diameter of the pupil (the aperture in the iris) to less than 25 percent of its relaxed dimensions. The dilator, on the other hand, is composed of radial muscle fibers which, when contracted, tend to pull the pupil to its maximum opening.

Pupil diameter is a function of many different variables. The amount of incident light can directly affect pupillary diameter, but the internal emotional state of the animal can also have dramatic effects. Hess (1965), for example, has shown that the "attractiveness" of a male or female portrait correlates highly with pupil dilation. Kahneman and Beatty (1966) have also shown that such intellectual activities as mental arithmetic affect the pupil diameter. It is not too surprising then to realize that the muscles of the iris are smooth muscles controlled involuntarily by the autonomic nervous system. Dilation is stimulated by sympathetic signals, and the constrictor action of the sphincter muscle by parasympathetic nerve action.

The action of the iris, like any other mechanical diaphragm in an optical system, reduces the amount of light entering the eye. However, pupil area can vary only over a range of about 16 times the minimum size. Other neural and photochemical factors must be invoked to explain the full range of light level adaptation. Reducing the size of the pupil, furthermore, can also increase the depth of field of clear vision and the overall sharpness of the retinal image by cutting off the rays coming from the periphery. These peripheral, oblique rays contribute more to the spherical, cylindrical, and astigmatic aberrations of the visual image than the central direct rays. Their absence, therefore, greatly decreases the amount of blur, and in a crisis situation, this may be a truly useful advantage for an animal.

Another mechanically active tissue behind the anterior chamber is the crystalline lens. Its actions are as remarkable as those of the pupil, but are mainly associated with retinal focusing rather than the control of stimulus intensity. The lens, as it necessarily must be, is also a transparent object in the normal eye and is made up of a series of laminae very much like those of an onion. The lens is suspended by a series of "zonule" fibers from another muscular ring, the ciliary muscle, which lies just under the sclera below and to the outside of the iris.
The action of this ciliary muscle is to automatically adjust the shape of the lens to provide the appropriate focusing action for objects at differing objective distances. The lens must be flattened to provide increased focal length for objects that are far away. This is achieved by a relaxation of the ciliary muscle, producing an increased tautness of the zonule fibers, which overcomes the natural elasticity of the lens and thus flattens its shape. For near objects, the focal length of the lens must be shortened by making the lens less flat. This is achieved when the ciliary muscle contracts with a resulting decrease in the strain on the zonule fibers. Then the lens' natural elasticity tends to make it more spherical. The ciliary muscle is, like the iris, also a smooth muscle controlled by the autonomic nervous system. Accommodation is, therefore, not under voluntary control. Rather, information about the degree of retinal blur is conducted from the central nervous system to the autonomic nervous efferents and then to the muscles that control the lens' shape. The exact nature of this communication link is not fully known.

Behind the lens lies the great posterior cavity of the eye, which is filled with the gelatinous vitreous humor. The canal of Cloquet, the dim structure shown running through the posterior chamber from the lens to the retina in Figure 3.10, is a vestige of the pathway of the hyaloid artery, which actually connects the lens to the blood supply during the embryonic development of the eye. It degenerates during childhood.
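Two quantitative statements in the preceding discussion of the iris and lens can be checked with simple arithmetic. The sketch below is illustrative only: the 17 mm lens-to-retina distance and the single thin-lens model are simplifying assumptions that are not taken from the text (the real eye also refracts strongly at the cornea), and the function names are hypothetical.

```python
import math


def diameter_ratio_from_area_ratio(area_ratio):
    """The area of a circular pupil scales with the square of its diameter."""
    return math.sqrt(area_ratio)


def log_units(ratio):
    """Express an intensity ratio in log10 units."""
    return math.log10(ratio)


def required_focal_length_mm(object_distance_mm, image_distance_mm=17.0):
    """Thin-lens equation, 1/f = 1/d_o + 1/d_i, solved for f.

    image_distance_mm is an assumed, fixed lens-to-retina distance.
    """
    return 1.0 / (1.0 / object_distance_mm + 1.0 / image_distance_mm)


# Pupil: a 16-fold range of area, as quoted in the text.
area_ratio = 16.0
diameter_ratio = diameter_ratio_from_area_ratio(area_ratio)
pupil_log_units = log_units(area_ratio)

# Lens: a very distant object versus one at an assumed 250 mm.
far_focal = required_focal_length_mm(math.inf)
near_focal = required_focal_length_mm(250.0)

print(f"diameter ratio: {diameter_ratio:.1f}")
print(f"light control by pupil: {pupil_log_units:.2f} log units")
print(f"focal length, far object: {far_focal:.2f} mm")
print(f"focal length, near object: {near_focal:.2f} mm")
```

A 16-fold change in area corresponds to a 4-fold change in diameter, roughly in line with the statement that the sphincter can reduce the pupil to about a quarter of its relaxed diameter, and it amounts to only a little more than one log unit of light control. Under the thin-lens assumption, the focal length needed for a distant object is longer than that for a near one, which matches the flattening of the lens for far vision.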
FIGURE 3.11 A photograph of the fundus of a human eye as it might be seen through an ophthalmoscope. The fovea is at the center of the macula, the dark area at the center of the picture (courtesy of Mr. C. L. Martonyi of the University of Michigan).
Other gross features are also visible without magnification to an observer when he views the retina through an ophthalmoscope. One of these features, which can be seen in Figure 3.11, is the optic disk or, as it is better known, the "blind spot." It is in this region that the retinal blood vessels and the optic nerve both enter or leave the eye through a region in which there are no photoreceptors present. Another easily seen feature is the tiny bright spot, which indicates the position of the fovea. The fovea is a highly specialized region of the retina, mediating the most spatially and chromatically acute visual processes, due mainly to the fact that it is here that the maximal density of cones occurs. It is also interesting to note that the arborization of the blood vessels can actually be observed in one's own eye by shining a small light into its corner in a darkened room.
Most important of all ocular structures, and the raison d'être for the presence of all of the other accessory structures, is the retina. This tissue, visible only as a slight discoloration at the gross level at which we are
FIGURE 3.12 A photograph of a cross section of the monkey retina at the fovea. The arrow indicates the direction from which light enters the eye (× 40) (courtesy of Dr. M. Glickstein of Brown University).
examining the eye, contains the critically important receptor cells and the neural plexus, which mediate the transductive and initial neural processes of vision. To appreciate the structure and the significance of this astonishing receptor tissue, we must take our first step down into the microscopic world. Our guide in this exploration initially will be the two volumes that sum up the monumental work of Stephen Polyak (1941 and 1957). Polyak dedicated his entire life to the study of the optical micrography of the primate retina, and recent progress had added little to the story he told until the application of the electron microscope to the problem of retinal anatomy.
Figure 3.12 is a microscopic cross section of the human retina taken from a region near the fovea. First, the reader should consider the thickness of this tissue. The retina is exceedingly delicate, varying in thickness from 125 μ at the fovea to about 300 μ at the periphery, and this of course includes ten distinguishable structural layers. Even more surprising than the delicacy of the tissue, however, is the fact that the retina seems to be inverted. The broad arrow in this figure shows the direction of light coming in from the outer world after having passed through the ocular media. Before the light arrives at the photoreceptors themselves, it must traverse the entire thickness of the other neural and supporting cell layers of the retina. The absorption of visible light by these initial layers is relatively slight, fortunately, for they are nearly all completely transparent. Nevertheless, it is measurable and, therefore, it is especially important that the absorption spectrum of this tissue, as well as that of the lens and vitreous, be understood and corrected for in any calculations that attempt to distinguish absorption spectra of the retinal receptors from that of the whole eye. Looking at Figure 3.12 once again, we see that there are three main
[Figure 3.13 labels: choroid; Bruch's membrane; pigment epithelium; outer and inner segments of the rod and cone layer; outer limiting membrane; outer nuclear layer (receptor cell bodies); outer plexiform layer; horizontal cells; inner nuclear layer (bipolar cell bodies); inner plexiform layer; ganglion cells; layer of optic nerve fibers; inner limiting membrane; electrode.]
FIGURE 3.13 Diagrammatic rendition of a cross section of the retina at a region that has both rods and cones present. The normal direction of incident light and of an electrode penetration are shown. (This diagram, from Ruch and Patton [1965], has been modified from an earlier drawing by Polyak [1941] and is used through the courtesy of Mrs. Stephen Polyak.)
cellular layers in the vertebrate retina. The initial layer, defined functionally in terms of the photic action, if not anatomically, is the photoreceptor layer. The second layer is composed of short bipolar cells which convey information from the receptors to the third layer, the ganglion cells, whose axons make up the optic nerve and pass without a synapse to the thalamus. Figure 3.13 is a much more diagrammatic sketch, from Ruch's (1965) modification of a Polyak drawing, which defines the 10 different regions of the retina to which we have referred. This division is based upon the cellular anatomy and also indicates some other structures we have not described in any detail. In addition, both of these figures show the presence of various kinds of horizontally connecting cells.
The photoreceptor layer in all vertebrate eyes is made up of one or both of two highly specialized types of neurons. On the basis of their shape, these cells have been distinguished and named rods and cones. However, it is clear from surveying the anatomical literature that this dichotomy refers to what is actually a continuously graded series of cells of varying shape. There are both rodlike cones and conelike rods to be found in the primate retina. Nevertheless, the two groups also differ in their function, and the duplicity theory of retinal function is one of the basic tenets of retinal psychophysiology. The rods and cones themselves have only minimal axons, if their terminal axonal brushes can be dignified at all with that nomenclature. They synapse almost immediately with the layer of bipolar cells and other horizontally interconnecting cells. Horizontal cells are found in the second layer of the retina and interconnect among rods and cones, often synapsing at several different points in a horizontal plane. Amacrine cells are also horizontally conducting units, but interconnect at the level of the junction of the bipolar and ganglion cells.
This, briefly, was the anatomical story as it was known up to about the early 1950s. In the last 10 or 20 years, the electron microscope has begun to contribute much more to our knowledge. This is particularly so in regard to the ultramicroanatomy of the rods and cones and the synaptic interconnections in the retina. Our current knowledge of the anatomy of the rods and cones has been summarized in some recent contributions of Young (1967, 1969). He describes not only what appears to be a major anatomical distinction between rods and cones, but also presents a description of how these delicate photoreceptive cells repair themselves after damage. Figure 3.14 shows the structure of a typical rod and cone from the retina of a frog.
The obvious difference in the shape of the outer segment is the primary feature that had led to the original dichotomy of retinal receptors into these two categories; however, it can also be seen that there are a number of other differences between the two types of cell. Most notably, the outer segment of the rod is considerably longer. There is also an oil droplet typically found in the frog cone's inner segment, which may also have some optical properties associated with color vision. A major structural aspect in Young's drawing is the fact that the outer segments of both the rod and the cone are connected to the rest of the cell by a very thin stalk. The presence of this stalk was first reported by De Robertis (1956). It is presumably through this connecting cilium that the metabolic necessities and newly formed protein must be transported.
An important related fact is illustrated in Figure 3.15. While the outer segments of both rods and cones had long been known to be composed of flat lamellae or disks, Young's electron-micrographic studies showed that there was a major difference in the construction of the rod's disks as compared to those of the cone. The disks in the rod are free-floating toward the terminal end of the outer segment, and their outer membranes are apparently quite separate from the outer cell membrane. Near the base of the rod's outer segment, however, the outer membrane of the disk is
FIGURE 3.14 Drawing of a single rod and a single cone from a frog, showing the characteristic difference in shape. This figure also shows the connecting cilium, a feature not visible until electron micrographs were taken of this cell (from Young, 1970).
apparently continuous with the outer membrane of the cell. The arrangement of the membrane of the disks in the cone, however, is solely like that of the rod disks near the base; the outer membrane of the cone disks always retains physical continuity with the membrane of the outer cell wall. Careful further microscopic examination of the region beyond rod outer segments showed debris, which was apparently made up of fragments of cast-off, free-floating rod disks. Radiographic tracer techniques, which tracked the course of newly formed rod disks, also showed that new protein was first incorporated into disks near the bottom of the outer segment, and then this new disk gradually migrated to the end of the outer segment, where it disappeared. The ultimate disappearance of the labeled disk at the terminus of the outer segment was apparently associated with its "casting off" at the end of its useful lifetime. Such a renewal process in the rod is one way in which the outer segments of photoreceptors can be maintained and rejuvenated over the lifetime of the animal. Young found, however, that the cone, quite to the contrary, does not incorporate new protein in the same way. In the cone,
[Figure 3.15 labels: free-floating disks; folding of outer cell membrane; connecting cilium.]
FIGURE 3.15 Drawing of an even further magnification of the outer segments of a rod and of a cone. There is a characteristic difference between the two in the manner the disks are formed. In a rod the disks are formed from invaginations of the cell membrane, but gradually become detached from it as they migrate toward the terminal end. In the cone there is no such disk migration, and the disks remain fused to the outer cell membrane throughout the life of the cell (from Young, 1970).
there is apparently no regeneration of disks and no migration. In fact, he believes that the cones are cone shaped primarily because the outermost segments were created early in the life of the animal and the more proximal, larger ones later in its development. No further development of new disks occurs after the animal has matured. New protein is formed and introduced into the disk structure, but rather than being entirely incorporated into a newly generated disk at the base of the cone outer segment, which then migrates outward, the new protein diffuses outward and is uniformly incorporated into all of the cone's permanent disks throughout the outer segment simultaneously.
Dowling and Werblin (1969) have taken advantage of the electron microscope and the very large cells of the retina of the mud puppy (Necturus) to demonstrate some of the details of the interconnections between the various cells. Figure 3.16 is one of their electron micrographs showing a number of important features. One of the most notable details is that there are several different kinds of synaptic contacts present on the receptors. Synapses may be invaginated, with protrusions of one cell actually extending into the body of another cell. The synapses may, on the other hand, only be superficial, merely "denting" the surface of the recipient cell. Dowling and Werblin believe, on the basis of their anatomical studies, that rods often have to make synaptic contact with bipolar and horizontal cells, whose perikarya lie in positions laterally displaced from the rod
FIGURE 3.16 Electron micrograph of the synaptic region at the base of a cone of the mud puppy retina. Two different kinds of synapses have been observed. One makes only a superficial contact (SC), while the other makes contact within an invagination of the cell wall (IC). The arrows indicate synaptic vesicles in what may be a chemical synapse from a horizontal cell to a cone (from Dowling and Werblin, 1969).
itself. In fact, they believe that many rods terminate in the region under a cone, while the cone typically has its synapse directly beneath itself in this animal. The two types of ganglion cell, which are believed to occur in the mud puppy, differ in that one type is mainly connected to bipolar cells, while the other is mainly connected to amacrine cells, which are themselves directly activated by bipolar-conducted information.
Dowling and Boycott (1966) have also used electron-micrographic techniques to study the primate retina, and their findings are of special interest in the context of this book. Their summary sketch is displayed in Figure 3.17. Some of the important features of this drawing are:
1. The primate rods and cones each have specifically shaped terminal arborizations.
2. The rods and cones in the primate retina connect to three differently shaped classes of bipolar cells. The rods connect only to "rod mop" bipolars, while the cones connect to either "midget" or "flat" bipolars.
3. Each synaptic invagination of a primate rod usually contains between four and seven synaptic filaments, while each synaptic invagination into a cone contains no more than three filaments.
4. The horizontal connections at the base of the primate receptors (mediated by the horizontal cells) are far less complex than those at the bases of bipolars (mediated by amacrine cells). Presumably this could allow substantially greater integrative information processing in the latter than in the former.
FIGURE 3.17 A schematic drawing showing the organization of the primate retina. Note particularly that Dowling and Boycott have distinguished among midget bipolars (MB), flat bipolars (FB), rod bipolars (RB), midget ganglion (MG), and diffuse ganglion (DG) cells. Note also the several different varieties of synaptic contact that are evident in this figure (from Dowling and Boycott, 1966).
[Figure 3.17 labels: C, R, H, A, MB, FB, RB, MG, DG.]
B. The Ascending Visual Pathway
Figure 3.18 is a diagrammatic presentation of the anatomy of the primary visual pathway. For all its awesome information-processing capacity, the ascending visual pathway is among the easiest of the sensory tracts to describe. We have already discussed the triple-layered retina with its two synaptic layers. Beyond the retina there are at least three different pathways through which signals can get to the cortex. The classical pathway is through the optic nerves and tracts to a single synaptic contact in the lateral geniculate bodies. From there, the signals project to at least three visual areas in the occipital cortex, the areas known as 17, 18, and 19. An additional pathway passes through the reticular system, which is thought to be associated with nonspecific activation of widely distributed portions of the cortex. A third major pathway carrying relatively specific information to the cortex, although less well known, has recently received additional attention. Schneider (1969) and Trevarthen (1968) have emphasized the importance of a pathway from the retina through the superior colliculus with somewhat different properties than those of the geniculate pathway. This collicular pathway is characterized by very large receptive fields and seems to be more involved in gross spatial localization than in fine form perception. Kaplan (1970) has suggested that it serves a number of functions, which are quite important and observable in human psychophysical processes such as walking or texture sensitivity.
Superficially, this is all that seems necessary to be said about the neuronal pathways of the visual system. However, there are a number of other extremely interesting points of anatomy, which are vital to our understanding of visual perception. For example, though the ganglion cell axons pass without synapse to the lateral geniculate body, there is a most important and significant sorting out of these axons at the optic chiasma, the junction point that separates the part of the visual pathway known as the
[Figure 3.18 labels: three neurons in retina; optic nerve; optic chiasma; optic tract; lateral geniculate of thalamus (LGN); superior colliculus (SC); reticular formation; occipital cortex; to other parts of cortex; circled neuron-order numbers 3 and 4.]
FIGURE 3.18. A schematic drawing of the entire ascending visual pathway. Small circled numbers indicate the order of the neuron in the chain from the receptor to the cortex.
optic nerve from that known as the optic tract. Each retina is functionally divided into two regions, which gather visual information from the left and right visual field respectively. In the right eye, the right field of view is mediated by the nasal hemiretina. In the left eye, the same right view is mediated by the temporal hemiretina. On the other hand, the left-hand view for the right eye is mediated by the temporal hemiretina and in the left eye by the nasal hemiretina. So many different perceptual phenomena are mediated by fusion of the corresponding images from the two left or the two right viewing hemiretinas that it is not surprising to discover that the evolutionary forces operating have produced a system in which the necessary crossovers provide exactly this sort of image conjunction. As shown in Figure 3.18, ganglion cell axons from the left temporal and the right nasal hemiretinas project to the left cerebral hemisphere, and ganglion cell axons from the right temporal and left nasal hemiretinas project to the right cerebral hemisphere. Interhemispheric connection almost certainly exists through the corpus callosum, but such processes as stereoscopic depth perception can presumably be handled from each field of view within a single hemisphere. As we shall see later, there is no known interaction between the two eyes at any level more peripheral than the geniculates of the thalamus.
As we have said, visual signals project primarily to the visual centers
in the occipital lobe through the lateral (and perhaps also through the medial) geniculate bodies, as well as through the colliculi. In addition, it is probably also true that there are other cortical destinations of visual signals. Visual information is also capable of producing generalized brain activation by virtue of its connections to the nonspecific reticular system as noted above. Furthermore, there have been many reports indicating that many regions of association cortex contain "polysensory" cells, which can be activated by stimuli of several different modalities. Thus, vision, audition, and somatosensation may all be acceptable inputs to a single cell of this type.
VI. THE AUDITORY SYSTEM
A. The Anatomy of the Auditory Receptor
The visual system excels at making fine spatial discriminations. The auditory mechanism, on the other hand, has evolved into the system par excellence for the interpretation of very fine temporal patterns. However, a very curious and important fact is that the ear's ability to deal with temporal patterns is, in fact, also mediated by spatial encoding processes. Later in this book, we shall emphasize in greater detail that the sensory systems generally seem to operate in this manner; that is, they do well with spatial discriminations, but badly with temporal ones. In those instances in which fine temporal discriminations are made by a sensory system, one generally finds that such performance is achieved by converting temporal codes into spatial ones. To understand how such conversions can come about in the ear, we must understand the structure and the ultrastructure of the acoustic mechanism and, in particular, the anatomy of the elegantly organized array of receptor cells within the coiled cochlea.
Surprisingly, even many of the relatively macroscopic features of the anatomy of the auditory mechanism were not known until the last few hundred years. This is probably due to the fact that some of the most important parts of the auditory mechanism are embedded within the bony structure of the skull. Initial attempts at dissection often obliterated the delicate inner structures, which we now know are critical to auditory transduction. The massive destruction of bony tissue required to observe the middle and inner ear demanded special care so that the delicate inner structures could be salvaged. It was not until the sixteenth century, therefore, that the three ossicles, the three tiny bones of the middle ear, were discovered, and the general spiral arrangement of the cochlea and the orthogonal arrangement of the tubes of the bony vestibular labyrinth described.
Even then, however, no one understood the significance of the tiny droplets of water that emerged when the cochlea was opened. It was not until 200 years later that it was confirmed that the inner ear was entirely filled with fluid in its normal state. This discovery was the key fact in the development of the several theories of auditory quality coding, which were subsequently to emerge. The major technological development, which was to so significantly affect knowledge of auditory anatomy, of course, was the invention of the
[Figure 3.19 labels: auditory area; temporal lobe; pinna; epitympanic recess; cochlea; acoustic nerve; ossicles; pharynx; external auditory meatus; middle ear; tympanic membrane; Eustachian tube.]
FIGURE 3.19 Gross anatomy of the auditory system showing the external, middle, and inner ears and the associated vestibular mechanism (courtesy of B. J. Melloni and Abbott Laboratories, N. Chicago, Illinois).
compound microscope by Janssen in 1590. But it was not until the beginning of the nineteenth century that it was being regularly applied to auditory structure analysis, and those years were filled with one startling discovery after another concerning cochlear microanatomy. In 1851 Reissner discovered the delicate membrane now known by his name. That same year Corti made extraordinary contributions to an understanding of the auditory receptor tissue itself. His findings resulted in his name being associated with several parts of the cochlear apparatus, as well as the inclusive term describing the actual sensitive tissue itself: the organ of Corti. We shall consider the history of some of the theories of auditory pitch encoding that have appeared over the years in Chapter 10 of this book. But for the moment, let us turn our attention to some diagrams of the auditory apparatus that are currently accepted as examples of the anatomists' best work on the auditory system.
Figure 3.19 is a drawing of the gross anatomy of the human auditory mechanism as it might be seen with the unaided eye or with a hand magnifying glass. The pinna of man is a vestigial organ, no longer capable of any substantial amount of directional movement, but still serving effectively as a hearing horn. The pinna collects signals from a wide area and feeds them in a concentrated form into the external ear passage, the external auditory meatus. It also may aid in directional hearing by selectively attenuating signals coming from behind or from one side of the head. The eardrum or, as it is otherwise known, the tympanic membrane, separates the outer ear cavity from that of the air-filled middle ear. In the middle ear is found the chain of three tiny bones, the auditory ossicles, which convey mechanical perturbations from the tympanic membrane to the oval window. The ossicles not only transmit the mechanical deformations of the tympanic membrane, but they also considerably amplify the forces applied to it. This occurs because the three ossicles are arranged as a series of levers. Thus, while the amplitude of motion is greatly reduced, the applied force can be proportionately increased. The reduction in acoustic signal amplitude, coupled with the exquisite nervous sensitivity of the auditory system, allows displacements of the oval window as small as one-tenth the diameter of a hydrogen atom to be detected in psychophysical experiments.
The air of the middle ear is continuous with that of the outside environment. This continuity is normally achieved through a most unusual pathway. If the tympanic membrane is not perforated, the continuity of the air in the middle ear with the outside air is solely through the eustachian tube, which connects the middle ear to the throat. The existence of this tube explains why swallowing can equalize or relieve the pressure in the ears and why sore throats sometimes lead to middle ear pain or, indeed, infection.
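The trade between amplitude and force that the text attributes to the lever arrangement of the ossicles can be sketched as follows. This is a minimal illustration that assumes an ideal, lossless lever; the lever ratio used is a purely hypothetical number, since the text gives no value, and the function name `ideal_lever_output` is illustrative only.

```python
def ideal_lever_output(force_in: float, displacement_in: float,
                       lever_ratio: float) -> tuple[float, float]:
    """Ideal (lossless) lever: force is multiplied by lever_ratio while
    displacement is divided by it, so work (force x displacement) is conserved."""
    force_out = force_in * lever_ratio
    displacement_out = displacement_in / lever_ratio
    return force_out, displacement_out

# Hypothetical ratio chosen only for illustration (not a measured value).
force_out, displacement_out = ideal_lever_output(force_in=1.0,
                                                 displacement_in=1.0,
                                                 lever_ratio=2.0)
print(f"force out: {force_out}, displacement out: {displacement_out}")
```

Whatever the actual ratio, the product of force and displacement is unchanged in this idealization, which is the sense in which the increase in force is "proportionate" to the reduction in amplitude.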
Both the external and the middle ear are nonneural processors and are capable only of modifying the amplitude of the mechanical action of the acoustic stimulus. There is no energy transduction in these outer regions. It is only in the fluid-filled inner ear that the actual neurologically active elements reside. The inner ear is made up of two regions, which are functionally quite distinct: one, the cochlear mechanism, transduces acoustic signals, whereas the other, the vestibular mechanism, is associated only with the sense of balance. However, the spiral cochlea can be seen in Figure 3.19 to be continuous with the vestibular apparatus, that is, the three semicircular canals and the two oval chambers. Therefore, both systems are filled with the same fluid. For the moment we shall concentrate only on the anatomy of the cochlea, the auditory receptor structure.
As noted previously, the optical microscope is still a profoundly useful instrument, contributing important information to our knowledge of cochlear anatomy, even though the newer electron-microscope techniques are becoming increasingly important. Foremost among the optical microscopists working on the auditory mechanism today are Joseph Hawkins and Lars Johnsson of the University of Michigan's Kresge Hearing Research Institute. Their new methods allow very fresh tissue from cadavers to be studied before the inevitable deterioration allowed by prolonged staining preparation procedures occurs. To introduce us to this level of magnification of the cochlear microanatomy, let us consider Figure 3.20(a), which is a slightly enlarged photograph of a dissection of the human cochlea carried out and photographed by Johnsson and Hawkins (1967). The two and a half turns of the human spiral cochlear structure can be clearly seen. It is also possible to see dark traces, which are the bands of fibers of the cochlear nerve, somewhat better in Figure 3.20(b).
FIGURE 3.20 (a) Photograph of a slightly enlarged dissection of a human cochlea showing the spiral structure, the oval window (OW), and the round window (RW) (from Johnsson and Hawkins, 1967). (b) A more detailed view showing the helicotrema (H), the organ of Corti (OC), the spiral ligament (SL), and the auditory nerve fibers (N) (from Johnsson and Hawkins, 1972).
A plane view of a cochlea at a slightly increased magnification is shown in Figure 3.21. In this figure we begin to see the structure of the mechanically sensitive tissue on the basilar membrane itself. The dark structures indicated by N are auditory neurons lying within the bony shelf or osseous lamina. The lighter band is the basilar membrane, while the slightly darker band indicated SL is the spiral ligament. IHC and OHC indicate the inner and outer hair cells, respectively. Figure 3.22(a) shows, at an even further degree of magnification, the general arrangement and the surprising regularity of the four rows of hair cells as observed with a scanning electron microscope. Unlike the previous three figures, this is of a cat. Figure 3.22(b) completes the series by showing a scanning electron micrograph of a tuft of stereocilia on a single hair cell. Johnsson and Hawkins point out that these views from the top are somewhat unusual, yet most instructive in their presentation of the orderly
FIGURE 3.21 A more highly magnified microphotograph of a top view of the interior of a human cochlea, showing the auditory nerve fibers (N), the limbus (L), and the spiral ligament (SL), as well as the rows of inner (IHC) and outer (OHC) hair cells (courtesy of Dr. Lars Johnsson, University of Michigan).
array of receptor cells. Most students, they say, are indoctrinated into the anatomy of the inner ear from a different perspective, that of the cross section of the cochlea. The cross section is indeed better suited for helping to explain the next smallest level of microscopic reduction to which we now turn. Figure 3.23 is a very up-to-date cross-sectional drawing, showing in detail what the dark and light surface bands of Figure 3.21 can only suggest. It can be seen in this figure (which, incidentally, was also prepared by Hawkins) that the cochlea is really a tube within a tube. The scala vestibuli and the scala tympani are really two parts of a single continuous outer cavity, which interconnect through a small hole called the helicotrema at the apical end of the cochlear coil. This continuity means that the perilymph, the fluid that fills these two scalae, is virtually identical throughout both, although small potential differences indicating some ionic concentration differences have been reported by some investigators. The scala media, however, is separated from the other two scalae by two barriers. The superior wall of the scala media is made up of a very thin tissue known as Reissner's membrane. Reissner's membrane is believed to be only a single cell in thickness and has virtually no mechanical properties. However, it does act as an ionic barrier and thus helps to maintain substantial ionic concentration differences between the perilymph filling the scala vestibuli and tympani and the endolymph filling the scala media. Also preventing perilymphatic-endolymphatic mixing is the tripartite inferior wall of the scala media, a group of tissues which we observed from the top in Figure 3.21. One major portion of the scala media's inferior wall is the bony shelf, the osseous lamina, through which the fibers of the cochlear nerve pass.
From the terminus of the bony shelf, the basilar membrane, the structure supporting the other tissues of the acoustic receptive mechanism itself, constitutes the next portion of this inferior wall. Finally, when the basilar membrane terminates, the spiral ligament, attached to the outer bony wall of the skull, completes the barrier.
FIGURE 3.22 (a) A scanning electron micrograph at an even greater magnification, showing the highly ordered arrangement of the three rows of outer hair cells (OHC) and the one row of inner hair cells (IHC), as well as indicating the pillar cells (P) in the cat's basilar membrane (× 1000). Note the V-shaped tufts of stereocilia on the outer hair cells, which are visible only because the tectorial membrane has been removed (courtesy of Mr. Robert E. Preston of the University of Michigan). (b) A scanning electron micrograph at an even higher magnification (× 15,000), showing the stereocilia of a single outer hair cell from a baboon (courtesy of Dr. Hans Engström of the University of Göteborg).
The receptor cells and the associated supporting cells that make up the organ of Corti, a major portion of the basilar membrane, are of the greatest importance for us, and we shall now consider them in some detail. In addition to the specific nervous tissues, the receptor cells, and the axons of the cochlear nerve, various kinds of supporting cells are also found within the organ of Corti. Deiters' cells, Hensen's cells, and the cells of Claudius have all been identified in Figure 3.23. While these cells perform important functions, they are not thought to be involved in the auditory transductive process. Another important accessory structure, the triangular and cartilaginous arches of Corti, at one time was thought to be tuned resonators primarily responsible for acoustic quality encoding (see Chapter 10). It is now thought, however, that they merely are supportive structures.
The elements that are thought to be of the greatest significance in the transductive process are the tectorial membrane and the hair cells. The tectorial membrane is a flap of tissue growing from the spiral limbus. What is now known to be its outer edge, the reticular lamina, was once thought to be an outer sheaf on the layer of Hensen's cells, but most acoustic anatomists now believe this to be an error due to artifacts in the older microscopic techniques. The tectorial membrane is loosely attached to the limbus and is able to move in relation to the hair cells, which are more rigidly
FIGURE 3.23 Cross section of one coil of the cochlea, showing the elaborate structure of the organ of Corti (courtesy of Dr. Joseph Hawkins, University of Michigan; from Best and Taylor, 1966). [Labeled compartments: scala vestibuli (perilymph), [Na+] > [K+]; scala media (cochlear duct; endolymph), [K+] > [Na+]; scala tympani (perilymph), [Na+] > [K+].]
supported by the basilar membrane. This relative movement of the tectorial membrane with regard to the hair cells results in a shearing action, in which the tiny cilia (the hairs) of the hair cells are bent or otherwise displaced. It is this action that is considered to be the primary sensory action in acoustic transduction. Hair cells lie in four parallel rows as shown in both Figure 3.21 and Figure 3.22. Three of the rows are outside (away from the center of the spiral) of the arches of Corti and are known as the outer or external hair cells. There is but a single row of inner or internal hair cells located inside (toward the center of the spiral) the arches. To understand the structure of these hair cells and the significance of their structure in developing an explanation of the transductive process, we must increase our magnification and descend to the even more microscopic level of cellular ultrastructure. It is at this level of magnification that the electron microscope is the only instrument capable of providing the necessary degree of resolution.

FIGURE 3.24 Drawing of a guinea pig's auditory inner hair cell, showing the cilia or hairs (H), the basal body (B), the microvilli (MV) on the surrounding supporting cells (SC), the cell's nucleus (Nu), two kinds of synaptic terminals, one of which appears to be afferent (NE 1) and the other efferent (NE 2), and some granulated structures (Gr) which are possibly axo-axonal junctions (from Engström, Ades, and Hawkins, 1965).

Figure 3.24 (taken from the works of Engström, Ades, and Hawkins [1965], the basis of much of the rest of our discussion on acoustic hair-cell microanatomy) is a drawing made from electron micrographs of an inner hair cell. An important feature of the acoustic receptor cell is that it has no elongated axonal process. It is unquestionably a true neuron, but like the rods and cones of the eye, it synapses almost immediately. Another extremely interesting anatomical observation made by Engström and his colleagues, which could be of special significance in our interpretations of receptor interaction (see Chapter 7), is that there are two different classes of synapse among the several that are usually observed to make connection at the base of each inner hair cell. The first type is a heavily granulated structure, while the second exhibits much less of this granular intracellular material. It has been suggested that this anatomical difference
FIGURE 3.25 Drawing of a guinea pig's auditory outer hair cell. Abbreviated captions are the same as in Figure 3.24 (courtesy of Dr. Hans Engström of the University of Göteborg).
reflects the fact that both afferent and efferent neurons synapse with the hair cell, since the granules may be synaptic transmitter substance. If this supposition is correct, it would be a most important fact, for it would mean that central nervous activity and activity from adjacent receptors would both be potentially capable of modulating the sensitivity of the receptor cell itself. The complications that such feedback mechanisms would introduce into the stimulus-response relationships even at the receptor level are profound, indeed. Perhaps the most important portion of the hair cell is the almost invisible body that lies under the basal end of the hairs or, as they are more correctly known, the stereocilia. Whatever mechanism of auditory transduction is ultimately accepted as the actual one, there is not much doubt that the basal body will be found to play a very significant role. It is almost certain that it is the junction at which the mechanical action is finally converted into a neural signal. Figure 3.25 (also from Engström, Ades, and Hawkins, 1965) depicts a typical outer hair cell. The cilia of the outer hair cell, more numerous than on the inner hair cell, lie atop a similar basal body, which once again is thought by many of the workers in this field to play the key role in auditory transduction. The synaptic arrangements at the base of the outer
hair cell are somewhat more regular than those found at the base of inner hair cells, but both the afferent and efferent types of synapse also appear to be present. The efferent cells, if that is indeed what they are, appear to envelop the afferent ones, the latter being connected more centrally to the base of the outer hair cell. Such an arrangement would allow considerable interaction. It should also be noted that the very intimate relationship of the possible candidate afferent and efferent cells at the base of the hair cell raises another possibility. The two may actually be communicating with each other directly rather than indirectly via the altered properties of the receptor.

This, then, is a structural analysis of auditory receptor cells. As with any other sensory system, this sort of transductive structure would be irrelevant unless information could be conveyed to the central nervous system. We shall now consider the anatomy of the ascending auditory pathway, which serves this communication function.

B. The Ascending Auditory Pathway

Figure 3.26 diagrams the ascending auditory pathway. As we have seen, cochlear hair cells, the primary auditory receptors, do not have any axons of their own. Nevertheless, they certainly must be considered to be the first neuron in the auditory chain. Synaptic connections at their base connect the hair cells to the dendrites of the bipolar cells of the auditory or cochlear nerve. The cell bodies of the fibers in the cochlear nerve collectively make up the spiral ganglion, which is embedded in the bone of the skull adjacent to the cochlea. The cochlear nerve itself bifurcates in the ventral cochlear nucleus immediately upon entering the medulla. At this branching point, one of the resulting bands of fibers immediately synapses with other neurons, which then pass horizontally across the medulla, sending some fibers to the ipsilateral and some to the contralateral superior olivary nuclei. The other branch of the cochlear nerve goes directly to the dorsal cochlear nucleus and, after an additional synapse, crosses over to the superior olive. Ascending fibers from the superior olives begin their ascent through the midbrain along the great tracts known as the lateral lemnisci. Once again, however, these tracts bifurcate, and as can be seen in Figure 3.26, some portion of the fibers synapses again at the mesencephalic nucleus of the inferior colliculus. Again the band of fibers bifurcates. Some fibers go directly to the medial geniculate body of the thalamus, while others cross over to the opposite inferior colliculus before ascending to the contralateral medial geniculate body. From the geniculates, the final neuron, the fifth or sixth in the chain, projects directly to the auditory cortex, which is buried within the lateral fissure of the cerebral hemisphere.

VII. THE VESTIBULAR SYSTEM
A. The Anatomy of the Vestibular Receptors

Figure 3.19, in addition to depicting the gross structure of the cochlea, also showed the macrostructure of the rest of the fluid-filled inner ear. The additional structures, of course, do not serve an auditory function, even though they share a common fluid medium and a common embryological history with the cochlea. Rather, the canals and bulbs of the vestibular system are the basic receptors of a sensory system responsible for the maintenance of balance and of our spatial relations to accelerative fields. The vestibular receptors are made up of five cavities: two nearly perfectly oval bulbs, and the three semicircular canals.

FIGURE 3.26 A schematic drawing of the entire ascending auditory pathway. Small circled numbers indicate the order of the neuron in the chain from the receptor hair cells to the auditory cortex. [Labeled structures: medulla, trapezoid body, ventral cochlear nucleus, dorsal cochlear nucleus, superior olivary nucleus, lateral lemniscal nucleus, lateral lemniscus, reticular formation (to all portions of cortex), inferior colliculus, superior colliculus, auditory thalamus, auditory cortex.]
FIGURE 3.27 Drawing of the crista ampullaris of the vestibular system's semicircular canals. The gelatinous cupula transmits mechanical energy to the hair cells (from Flock, 1971, after Wersäll). [Labeled structures: cupula, hair cells.]
The two oval structures, the sacculus and the utriculus, are immediately adjacent to the basal end of the cochlea. The sacculus is actually connected to the cochlear inner ear by a tiny duct, the canal of reuniens, thus providing continuity of the endolymphatic fluid throughout the inner-ear structure. The utriculus, in turn, is connected to the sacculus through another tiny duct. The three semicircular canals all terminate on the surface of the utriculus in enlarged bulblike regions known as ampullae. The inner surface epithelium of these ampullae and of the utriculus and the sacculus contains the vestibular receptor structures, which are capable of being activated by the accelerative forces operating on the fluid medium. Figure 3.27 is an interpretive drawing of a micrograph of a cross section of the ridgelike crista ampullaris, the receptor structure found in the
FIGURE 3.28 Drawing of the receptor area of the utricle and saccule, showing the gelatinous membrane (analogous to the cupula of the crista ampullaris) supporting calcium carbonate crystals or otoconia (from Iurato, 1967). [Labeled structures: otolithic membrane, hair cell Type I, hair cell Type II, supporting cell, nerve fibers.]
ampulla of the semicircular canals. (This figure is taken from Flock, 1971, but originally had been drawn by Wersäll, 1956.) The gelatinous cupula acts as an intermediary mechanical linkage to convey the forces imposed on the semicircular canal fluid to the epithelium of the crista and, thus, to the receptor cells. In this figure can also be seen the epithelium containing the receptor cells on the surface of the crista.

In the utricular and saccular regions, a different sort of receptor structure exists. Their surfaces are also covered with a specialized epithelium containing receptor hair cells, but the epithelium is arranged more in a flat sheet, and the gelatinous material is organized differently. Figure 3.28 (from Iurato, 1967) is a greatly enlarged drawing of a large region of the sensory epithelium from the utricle. This figure shows that the gelatinous mass, which covers the sensory epithelium, is in turn covered by a fine layer of calcium salts known as otoconia. The difference in the structure of this otoconial layer and the cupula is the basis of a major difference in the function of the vestibular receptors of the saccule and utricle on the one hand and the ampullar cristae on the other. The ampullar cristae are activated entirely by velocity changes resulting from rotary or linear motion of the head. The three semicircular canals, therefore, are primarily sources of information about changes in the velocity (dv/dt) of the head, and thus can be thought of as being the transducers for discontinuous accelerations. The receptors of the utricle and saccule, on the other hand, because of the overlying layer of otoconia, are continuously activated by constant accelerative fields like gravity. As such, they constantly provide information about the orientation of the head rather than about discontinuous accelerations, as do the semicircular canals.
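The functional distinction just drawn can be illustrated with a small numerical sketch. It is only a toy model, not a description of receptor physiology: the sample values, the function names, and the assumption that the steady signal is proportional to the component of gravity lying in the plane of a flat receptor sheet are all hypothetical choices made for illustration.

```python
import math

G = 9.81  # standard gravitational acceleration in m/s^2 (assumed constant)


def velocity_changes(velocity, dt):
    """Finite-difference estimate of dv/dt between successive samples."""
    return [(later - earlier) / dt
            for earlier, later in zip(velocity, velocity[1:])]


def gravity_along_sheet(tilt_degrees):
    """Component of gravity in the plane of a flat receptor sheet tilted
    by tilt_degrees from horizontal (illustrative geometric assumption)."""
    return G * math.sin(math.radians(tilt_degrees))


# Hypothetical head velocity sampled once per second:
# at rest, speeding up, then moving at a steady velocity.
velocity = [0.0, 0.0, 1.0, 2.0, 2.0, 2.0]
dvdt = velocity_changes(velocity, dt=1.0)

# Canal-like signal: nonzero only while the velocity is changing.
print(dvdt)

# Otolith-like signal: present for as long as the head remains tilted.
print(gravity_along_sheet(0.0), gravity_along_sheet(30.0))
```

In this sketch the first signal returns to zero as soon as the velocity becomes steady, whereas the second persists for any constant tilt, which is the contrast the text draws between discontinuous accelerations and a constant accelerative field.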
Thus, armed with the general anatomy of the vestibular macrostructure, we are now prepared to descend to a more highly magnified level of microscopic examination. This particular body of knowledge results from the electron-microscopic investigations of Ades and Engström (1965) and Wersäll (1956). As in the cochlea, there are two distinguishable varieties
FIGURE 3.29 A Type 1 vestibular hair cell, showing the kinocilium (KC), the stereocilia (H), some mitochondria (M), the cell's nucleus (Nu), and, as with the auditory hair cell, two kinds of synaptic connections. One afferent type occurs in several locations (arrows) on the interface between the hair cell and a surrounding "calyx" (NC), while the other appears to be efferent (NE 2) (from Ades and Engström, 1965).
of vestibular hair cells. The two types are distinguished on the basis of their shape as well as the nature of the synaptic connections between the vestibular hair cell and its connecting axons. As Engström and Ades point out, however, these two categories may be but two extremes of a continuum, which actually includes many different intermediate varieties.

Figure 3.29 shows a typical Type 1 vestibular receptor cell. It is seen to be flask shaped and to contain all of the usual cellular metabolic apparatus necessary for the maintenance of any cell, such as mitochondria and the nucleus. Like so many other of the receptor cells we have seen so far, it is a hair or ciliated cell. Each cell possesses a single kinocilium and as many as 70 smaller stereocilia arranged in an orderly array. While the kinocilium appears to contain an extension of the cytoplasm of the receptor cell, the stereocilia appear to be extensions of the cuticular base plate overlaying almost all of the distal end of the cell. Just as with the auditory hair cell, it is possible that this cuticular plate is, in fact, the critical organ of transduction for vestibular sensitivity. Thus, it is suspected that this is the point at which the primary sensory action occurs, but this is not certain and is extremely difficult to establish one way or the other with our current technology.

Type 1 cells have two very distinct kinds of synaptic connections. The first, according to Ades and Engström, is a "calyx," which almost completely encloses the proximal end of the receptor cell body itself. This structure is indicated by the label NE1 in Figure 3.29. Synaptic contact is believed to be made at the slight indentations where the membrane of the calyx seems to enter invaginations of the membrane of the receptor cell. In electron micrographs little, if any, granular structure can be observed in the calyx and, thus, it is presumed to be afferent. However, the second type of axonal connection, the smaller synaptic knob indicated by NE2 in Figure 3.29, has rich granulation. Ades and Engström infer from this structural difference that this second type of synapse is an efferent one and may provide a means of modulating receptor sensitivity by central mechanisms in a way similar to that hypothesized in the auditory system.

Figure 3.30 shows the structure of a Type 2 vestibular epithelial hair cell. The terminal end of the Type 2 cell is also covered with a microforest of cilia. They appear to be organized almost identically to the array found on the Type 1 cell. The major differences between the Type 1 and the Type 2 cells are found in the overall shape of the cell and the nature of the synaptic junctions. Type 2 cells are much more regular cylinders with little evidence of the bulbous or flask shape of the Type 1 cell. Similarly, the synaptic calyx typically enclosing the Type 1 cell is a feature completely absent from the Type 2 cell. Synaptic connections for the latter type are, in general, far simpler in structure. The tiny invaginations, where the synaptic junction is thought to occur, can still be seen, however.
These synapses also seem to be divided into two different categories, on the basis of the amount of granular structure seen in electron micrographs, and, therefore, both afferent and efferent communications are also indicated for Type 2 vestibular hair cells.

FIGURE 3.30 A Type 2 vestibular hair cell; nomenclature as in Figure 3.29 (from Ades and Engström, 1965).

An interesting feature of both types of vestibular hair cells emphasized by Ades and Engström is that the second, smaller type of hair, the stereocilia, are very regularly graded in length. The further a stereocilium is from the kinocilium, the shorter it tends to be. This arrangement, Ades and Engström speculate, may serve an important function by "polarizing" the hair cell, so that forces applied in one direction are more effective than in another. In such a manner, directionality might be introduced into vestibular sensation. Other investigators (Wersäll and Flock, 1965) have suggested that the polarization is effective enough to entirely reverse the neural effect of displacements in opposite directions. According to them, displacement of the hairs in one direction will inhibit the cell's activity, while displacements in the other direction will lead to excitation. Considering the possible direct relation between the mechanical distortion of the base plate by the hairs and the permeability to ion flow, such a possibility seems eminently feasible.

Another interesting and slightly different point of view has been expressed by Hillman and Lewis (1971), based on scanning electron micrographs of the vestibular hair cells. They have observed that the many stereocilia are attached to a bulblike growth on the end of the single kinocilium. This arrangement is clearly seen in Figure 3.31, a reproduction of one of their scanning electron micrographs of the hairs of a single vestibular receptor cell. As we saw earlier, the stereocilia all emerge from the cuticular base plate, while the kinocilium is simply attached to a portion of the cell membrane. Hillman and Lewis' suggestion is that the mechanical linkage of the stereocilia and kinocilia acts to produce a "plungerlike" motion, maximizing the conversion of the mainly transverse motion of the stimulus into the up-and-down motion at the base of the kinocilium. From Hillman and Lewis' point of view then, the base plate may not be the key factor in the transduction, but may merely serve a mechanical role assisting the nearby cell membrane to do that important job. There are also a number of other anatomical curiosities that may be
FIGURE 3.31 A scanning electron micrograph of the cilia of vestibular hair cells, showing the bulbous ending on the end of the kinocilium and the arrangement whereby back-and-forth motion is converted into a plungerlike up-and-down motion
FIGURE 6.25 The data of Figure 6.24 converted to log-log coordinates. (a) Spike frequency. (b) Generator potential amplitude. Both functions display power function exponents of 0.28. [Abscissa: log stimulus intensity.]
presented, namely, a log-linear plot. Fuortes noted that these curves were not perfectly straight lines on this scale and, therefore, were not strictly logarithmic functions. Fuortes' data have been replotted on log-log scales in Figure 6.25 (a) and (b). In this case, the data for both the receptor potential and the spike frequency are much more closely approximated by a
FIGURE 6.26 The linear relation between spike frequency and generator potential amplitude (from Fuortes, 1958). [Abscissa: generator potential amplitude, in mV.]
straight line. Both functions are, therefore, better fit by a power function with an exponent of about 0.28 than by the logarithmic function originally suggested by Fuortes. (When two power functions of the same stimulus have the same exponent, and thus the same slope on a log-log plot, they must be linearly related, and therefore the frequency of spike firing is a linear function of receptor potential in this preparation.) This improvement in fit, it should be noted, is not intended as a criticism of Fuortes' original analysis. It is simply a historic fact that much of the early work was influenced by the Fechnerian psychophysics, and it was not until the early 1960s that Stevens' demonstrations of the apparent ubiquity of the power function began to influence neurophysiology. Thus, many of the older reports, for example, Hartline and Graham (1932) or MacNichol (1956), report approximations to log functions, while more recent experiments (see below) seem most often to report power functions. It is hardly likely that the biology of the organisms has changed over the years. Rather, the perspective of the investigators and the mathematical models available to them changed in ways that have led to different descriptive models. This is, once again, evidence of the compelling force of a perspective or an idea in shaping even our view of raw data.

The most important result of Fuortes' experiment was that the receptor potential amplitude and the spike action potential frequency are linearly related. The extreme linearity of this relationship is evidenced in Figure 6.26, also a reproduction of one of Fuortes' graphs. The implications of this important set of observations are clear. The response compression, which is observed in this system, is almost entirely accounted for within the initial transductive process.

c. Other peripheral visual nerve compression functions.
As an additional example of the sort of response dynamics that can occur in peripheral nerves emanating from photosensitive receptors, let us consider a completely different invertebrate preparation. Uttal and Kasprzak (1962) studied the stimulus-response relationship in the axons conducting impulses from a curious photoreceptor located in the most caudal ganglion of the ventral nerve cord of the freshwater crayfish. This unusual receptor organ has an absorption spectrum which is very much like that of the rhodopsin found in the human eye, but apparently the photopigment resides in only two cells, one on either side of the terminal ganglion in this bilaterally symmetrical ventral nervous system.

FIGURE 6.27 The relationship between the logarithm of the stimulating light intensity and the logarithm of the frequency of the spike response of the caudal photoreceptor of the crayfish, displaying a power function exponent of about 0.5 (from Uttal and Kasprzak, 1962). [Plot annotations: n = 0.5; approximate threshold. Abscissa: log stimulus.]

A pair of scissors can be used simply to cut away the chitinous shell, exposing the ventral nerve cord and the caudal ganglion. The entire nerve cord was then hung over a gross hooklike platinum recording electrode, which fed signals into an ac-coupled preamplifier. Since there were only two cells present that were photosensitive, illumination of the crayfish's tail gives rise to spikes in only a pair of large axons in the ventral cord. The preparation could be made into a single rather than double cell one, even though no microdissection or microelectrodes were used, by simply splitting the easily separated nerve cord into two parts. Illumination of the caudal ganglion gives rise to a stream of regularly spaced responses with, as we have mentioned before, very long latencies and relatively modest frequencies. Simple counting procedures were used to measure the number of responses that occurred in a period of 5 sec after the response had settled down from the initial turn-on transient. Figure 6.27 is a plot on a log-log scale of the number of responses that were obtained in that period as one varied the stimulus intensity. The form of the function is once again a straight line with the slope indicating an exponent of 0.53. Similar compressed response dynamics are obtained in the visual system of vertebrates, as well as the invertebrate systems we have described.
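The log-log analyses described in this section rest on a simple property: a power function R = kI^n plots as a straight line of slope n when log R is plotted against log I. The following sketch, offered only as an illustration, estimates the exponent by an ordinary least-squares line in log-log coordinates and checks that two power functions sharing an exponent are proportional to each other. The data are synthetic (an exponent of 0.5 and arbitrary scale constants), not values from any of the studies cited, and the function and variable names are hypothetical.

```python
import math


def fit_power_exponent(intensities, responses):
    """Least-squares line through (log10 I, log10 R).

    Returns (slope, intercept); the slope estimates the power function
    exponent n and the intercept estimates log10 of the constant k.
    """
    xs = [math.log10(i) for i in intensities]
    ys = [math.log10(r) for r in responses]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic, illustrative data only: two responses that follow power
# functions with the same exponent (0.5) but different scale constants.
intensities = [1.0, 10.0, 100.0, 1000.0]
spike_rate = [5.0 * i ** 0.5 for i in intensities]
generator_mv = [2.0 * i ** 0.5 for i in intensities]

exponent, log_k = fit_power_exponent(intensities, spike_rate)
print(exponent, 10 ** log_k)

# With a shared exponent, the ratio of the two responses is constant,
# so one response is a linear function of the other.
ratios = [s / g for s, g in zip(spike_rate, generator_mv)]
print(ratios)
```

With real recordings the points would scatter about the fitted line, and the same fit would be compared against a logarithmic function before preferring one description over the other.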
As convenient as invertebrate preparations are with their large cells and easy dissection procedures, it still would be more interesting to observe the same functions in an animal much closer in the phylogenetic tree to man than Limulus or a crayfish. Hartline, who won a Nobel Prize for his extensive contributions to visual knowledge on the basis of his work on Limulus, did not limit his career to that single animal. He has also done some very
FIGURE 6.28 The relationship between the logarithm of light intensity and three different measures of the spike action potential frequency in the frog's optic nerve fibers. Data marked with circles are for the maximum frequency; data marked with solid dots are for the initial frequency spurt; and those marked with x are for a stable period 4 sec after the stimulus was turned on (from Hartline, 1938). [Curve labels: Max, Initial, 4 sec. Abscissa: log light intensity.]
important work on the eye of the frog, another one of those extremely useful model systems in visual neurophysiology. Over three decades ago he published a paper (Hartline, 1938) reporting, among other characteristics, the relation between the spike action potential response of single axons in the optic nerve and stimulus intensity. This paper is still relevant and important in the context of our current discussions. Hartline dissected out small bunches of fibers from the frog's optic nerve and laid them across a wick electrode saturated with a conductive electrolyte. The wick electrode technique is not used very often these days, and the quality of the recordings, which were obtained with the galvanometric oscillograph used by Hartline, would not be considered to be of the same high level routinely achieved with metal or glass micropipettes and the modern oscilloscope. Nevertheless, his findings have retained their utility because of their clarity and insightfulness. The general nature of the results of Hartline's experiments begins to give a hint of why some of the data that we would like to have, namely, simple functional relationships between stimulus intensity and response magnitude, are so difficult to obtain in the vertebrate visual system. The pattern of response is not as simple as the invertebrate data of the preceding discussion. Some cells were found to produce responses only when the light was turned on, others only when it was turned off. Others were found to produce bursts at both the onset and offset of the stimulating lights, while still others gave some combination of the two, with a more or less constant maintained response during the period of stimulation. Figure 6.28 is a reproduction of one of Hartline's graphs for a fiber that produces an on transient and then a maintained discharge during the course of the stimulus.
The three curves drawn on this figure represent only three of the different measures that might be examined to determine the effect of stimulus intensity. The curve indicated by the open circles is based upon measures of the maximum frequency achieved during the brief
FIGURE 6.29 The data for the 4-sec stable period in Figure 6.28 converted to log-log coordinates, displaying a power function exponent of 0.30. [Abscissa: log stimulus intensity.]
on burst. The curve indicated by the filled circles is for the initial frequency of the burst of spikes in the on transient, while the curve indicated with x is the function for the sustained activity after 4 sec of steady illumination. The curve for the sustained activity is the one that we have chosen to emphasize for the purposes of the present discussion. But, is it fair to do so? Why this measure rather than any other, and why this type of cell rather than a simple on cell or a simple off cell? Is there anything special about the sustained activity? In general, we cannot answer these questions any better than to simply say that the choice of the maintained activity in this type of cell is more in conformity with both the subjective experience and the data from the simpler nervous systems of lower animals as well as the findings from the somewhat simpler structures of other vertebrate senses. In Figure 6.29 we replot Hartline's data for this maintained activity on a log-log scale. It, too, appears to be well fitted by a straight line, suggesting a power function with an exponent of about 0.30, a low value typical of many visual systems. According to Lipetz (1971), it is often the case in vertebrate preparations that the optic nerve ganglion cell axons do not respond in proportion to stimulus intensity. Rather, the responses of most of the cells seem to be dependent upon some specific feature of the spatio-temporal or quality pattern of the stimulus, a matter which we shall discuss in great detail in Chapters 7 and 8. A few attempts to study the response dynamic of the ganglion cell have been made, however. Most notably Easter (1968) obtained this sort of data as an adjunct to a most interesting study of spatial summation among photoreceptors.
He used isolated goldfish retinas in his experiments and limited his experimental samples to those ganglion cells that gave on responses to red light, thus simplifying what could have been a much more complicated set of findings. Flashes of light, which were
no longer than 0.1 sec, were used as stimuli to minimize lateral inhibitory interactions between the two stimulus spots that were used in some other parts of the experiment. Easter's findings go far beyond the single point that we are interested in here, but for the purposes of our discussion, it is most significant to note that he found an average exponent of 0.55 for the 130 ganglion cells he sampled in this unusually rich experiment.

d. Central visual nervous activity. In spite of all the work that has been done on the central portions of the visual system, there is also little that can be specifically cited as pertinent to the problem of intensity coding. This is, perhaps, also due to the fact that so many variables other than simple intensity are important in the specification of the amplitude of the response that intensity actually plays only a minimal role. On the other hand, it may just be a matter of where the energies of neurophysiologists have been expended in recent years, and the pertinent data will appear sometime in the near future. Perhaps the only study relevant to the specification of the response dynamic in the visual cortex is a report on the relationship between visual stimulus intensity and the response amplitude of the evoked cortical potential made by Loewenich and Finkenzeller (1967). They found that the response dynamic could be characterized by a power function with a very flat slope. The exponent of this function was only about 0.2. Low exponents of this sort also seem to be very characteristic of the few other studies of central visual intensity functions, according to Stevens (1971). It should also be noted that the opponent mechanisms, which seem to operate in the central visual nervous system, make it difficult to specify the nature of the relationship between stimulus intensity and response amplitude.
Some cells respond to increased light intensity by decreasing their rate of activity, while others respond by increasing their rates. The nonopponent cells observed by De Valois (1965), which are presumed to be associated with the representation of the overall brightness levels in the lateral geniculate body, seemed to be encoded with power function response dynamics, but the slopes could be either positive or negative in accord with the inhibitory or excitatory role played by a given cell. Again the reader's attention is directed to Table 6.2, where the key details of this discussion are summarized.
3. Intensity-response Relations in the Chemoreceptive Systems. In somewhat surprising contrast to the paucity of visual data, there is an extensive literature on olfactory and gustatory intensity functions. This is, at least in part, due to the fact that although we find considerable diversity of response as one samples from cell to cell in the chemical senses, there is little evidence of any of the very complex specific sensitivity to features of the spatio-temporal pattern of the stimulus as is found in the visual receptors. A related but contrary reason why the chemical senses have been particularly well-trodden avenues of research for studies along the intensity dimension is that it is very difficult to control the spatial and temporal aspects of their adequate stimuli. A puff of odorous vapor cannot
be as easily limited to a duration of only a few milliseconds as can an acoustic, a visual, or even a somatosensory one. Because of the very small size of some of the fibers conveying chemoreceptive information, many of the studies reported use summated or integrated measures of large numbers of spike action potentials recorded simultaneously from many different axons. This sort of measure is not the same as the conventional compound action potential, which is the result of a sort of physiological integration, but rather is a pseudocompounding produced by electronically integrating the observed neural activity. Thus, individual spikes, and perhaps also graded potentials, are synthetically accumulated into a single overall global estimate of the magnitude of the response.
a. Olfactory generator and gustatory receptor potentials. The classic studies of olfactory receptor potentials were originally carried out by Ottoson (1956). He first demonstrated the slow potentials now known as the electroolfactogram (EOG). We have already pointed out in our chapter on transducers that there have been some questions raised with regard to the validity of Ottoson's claim that the slow potentials recorded with his technique are, in fact, summations of the individual olfactory receptor generator potentials; however, we shall still consider his results as the best possible candidate so far proposed for this role. In fact, as we shall see, the Ottoson potentials' response dynamics are sufficiently similar to those obtained with other better established generator potentials to add to the credibility of his hypothesis. To briefly recall his technique, Ottoson simply placed silver-silver chloride electrodes on the exposed olfactory epithelium of a frog and led off the signals to a dc coupled amplifier system. Stimulation was usually accomplished in his experiments by simply blowing small quantities of air loaded with the odorous substances across the olfactory epithelium.
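The electronic integration described above, in which spikes from many axons are accumulated into one global measure, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the Poisson-like spike generator, the leaky running sum, the time constant `tau`, and the function names are hypothetical and do not reproduce the circuitry of any study cited in this chapter.

```python
import random


def simulate_spike_times(rate_hz, duration_s, rng):
    """Return Poisson-like spike times for one hypothetical axon."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate_hz)  # random interval with mean 1/rate_hz
        if t >= duration_s:
            return times
        times.append(t)


def integrated_response(spike_trains, duration_s, dt=0.001, tau=0.05):
    """Leaky running sum of every spike from every axon (one value per step)."""
    n_steps = int(round(duration_s / dt))
    counts = [0] * n_steps
    for train in spike_trains:
        for t in train:
            counts[min(int(t / dt), n_steps - 1)] += 1
    decay = 1.0 - dt / tau  # fraction of the accumulated level kept each step
    level, trace = 0.0, []
    for c in counts:
        level = level * decay + c
        trace.append(level)
    return trace


rng = random.Random(0)
# Assumed values: 50 axons firing at 5 or 20 spikes/sec for 1 sec.
weak = [simulate_spike_times(5.0, 1.0, rng) for _ in range(50)]
strong = [simulate_spike_times(20.0, 1.0, rng) for _ in range(50)]
weak_peak = max(integrated_response(weak, 1.0))
strong_peak = max(integrated_response(strong, 1.0))
print(weak_peak, strong_peak)
```

In this sketch no single axon need be resolved; only the pooled level is read out, which is why such a measure suits nerves whose fibers are too small for single-unit recording.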
While Ottoson's monographs deal with a wide variety of different topics, we shall concentrate only on the relative magnitude of the response produced by varying concentrations of butanol, a substance chosen by him for extended investigation because of its high solubility in water. Figure 6.30 is a reproduction of Ottoson's data for concentrations ranging from those that were so low that only immeasurably small amounts of the stimuli were present in otherwise pure air to molar concentrations of the stimulus as high as 0.1. His original data are plotted on a log-linear scale in Figure 6.30 and can be seen to deviate considerably from a straight line, indicating that a logarithmic fit is not very satisfactory. In Figure 6.31 we have replotted these data on a log-log scale, and the fit to a straight line is much better. The slope of the data in this figure indicates an exponent of about 0.4 for the power function. This is comparable to 0.6 found by Stevens (1961) for olfactory subjective magnitude estimates for heptane and similar substances. Kimura and Beidler (1961) have studied the intensity function of the receptor potential in the gustatory system. Using KCl-filled glass microelectrodes and dc coupled preamplifiers, they were able to evaluate the induced amplitude of the receptor potential as a function of the kind and concentration of the stimulating substance. They were able to show that
FIGURE 6.30 The relationship between the amplitude of the olfactory generator potential and the logarithm of odorant stimulus strength (from Ottoson, 1956).
even at the level of the receptor cell itself, there was sensitivity to a wide variety of substances in each and every cell, and the transductive mechanism can be said, therefore, to be broadly tuned (see Chapter 11). This is illustrated in Figure 6.32, which shows the responses to several basic taste stimuli by a single cell and also depicts the time course of the elicited receptor potentials. Since the stimulating substance was simply allowed to flow onto the tongue and was not immediately removed, the elongated duration of this response should not be misinterpreted as being equivalent to a prolonged response to a brief stimulus. The stimulus was simply present for that entire period. All of the 10 cells studied by Kimura and Beidler were compressed in their response dynamics. Summary curves showing the average of both the receptor response amplitude and the integrated chorda tympani response to various concentrations of NaCl are plotted in Figure 6.33 on
FIGURE 6.31 The data of Figure 6.30 converted to log-log coordinates, displaying a power function exponent of 0.4.
FIGURE 6.32 Receptor potentials produced by NaCl, quinine hydrochloride, sucrose, and HCl, respectively, applied to the taste bud of a hamster
FIGURE 6.37 The data of Figure 6.36 have been converted en masse to log-log coordinates and replotted in this figure, indicating that they are well fit by a power function with an exponent of about 0.67.
power function is seen to be relatively good, and an exponent of about 0.67 seems to be a reasonably good approximation for the four odorants. Pfaffmann, Fisher, and Frank (1967) have carried out analogous studies on various taste nerves in the rat. Stimuli (quinine hydrochloride, hydrochloric acid, sodium chloride, and sucrose) were applied through glass pipettes inserted directly into the cavities of taste buds. Recording techniques were similar to the second type used by O'Connell and Mozell; gross wire electrodes picked up the signals and fed them into an integrating circuit, the output of which displayed the integral of the total amount of activity. Figure 6.38 (a) and (b) are two of their raw data records, showing the integrated response (the upper traces) produced by stimuli timed as shown by the deflection of the lower traces. Figure 6.39 presents the summarized results obtained in the experiments for the four different sapid substances. These recordings were made from the chorda tympani. However, recordings from other nerve trunks often showed different response patterns, and our choice of the chorda tympani data is, admittedly, selective. These data, originally plotted on a log-linear scale by Pfaffmann and his colleagues, are very similar to those reported by O'Connell and Mozell for the olfactory nerves, and when they are replotted on a log-log scale in Figure 6.40, they give relatively good fits to straight lines. The exponents of the power functions represented by these straight lines for the four stimuli are all very close to 0.53. Another relevant study has been reported by Borg, Diamant, Oakley, Ström, and Zotterman (1967) and Borg, Diamant, Ström, and Zotterman (1967). This work is particularly rich in its implications for our analysis of the relationship between stimulus intensity, on the one hand, and neural
FIGURE 6.38 Integrated glossopharyngeal nerve responses (rat circumvallate) to NaCl and sucrose of various concentrations with water rinses at the indicated times. Note the difference in the time course of the response to the two different stimuli (from Pfaffmann, Fisher, and Frank, 1967).
and psychophysical response magnitudes on the other, for an unusual experimental animal was used in this experiment: man himself! Certain middle ear operations require that the chorda tympani be cut, and it was only because of the therapeutic value of the operation that these extremely interesting data became available. This work is of special importance not only because Borg and his colleagues made general comparisons of neural and psychophysical responses under identical stimulus conditions, but also because both these types of data were collected on exactly the same subjects. In the course of the corrective surgery, it was necessary to cut away
[Figure: rat chorda tympani responses; axes and caption not recoverable.]
[Figure: response amplitude as a function of sound pressure level (db), with a fitted exponent of n = 0.42; caption not recoverable.]
dredge, and Davis (1962). They inserted macroelectrodes into the cochlea of a guinea pig, one on either side of the basilar membrane, and stimulated with brief tone bursts. This sort of recording electrode configuration responds to microphonics, summating potentials, and compound nerve action potentials. Stevens (1970) has replotted the data obtained by Teas and his colleagues on log-log coordinates to demonstrate the relationship between the amplitude of this compound spike action potential and stimulus intensity. These data, presented in Figure 6.44, show a reasonably close approximation to a power function with an exponent of 0.42, a value considerably less than the corresponding psychophysical power function, but at least one which is monotonic over a considerable range.
c. Central auditory nervous activity. The nonmonotonic and irregular nature of the acoustic nerve action potential as a function of stimulus intensity is also reflected as higher levels of the afferent pathway are examined. Rose, Greenwood, Goldberg, and Hind (1963) examined single cell responses in the inferior colliculus, one of the important midbrain relay stations for acoustic information. They found that over 50 percent of the cells in their sample had a nonmonotonic relation to stimulus intensity; these cells gradually increased, and then decreased their rate of firing as stimulus intensity continuously increased. Rose and his colleagues speculate that the decrease in firing rate at higher intensities is due to the simultaneous evocation of inhibitory and excitatory signals. In contrast, working at the level of the medial geniculate body of the cat, Gross and Thurlow (1951) found that the responses they recorded with microelectrodes from a small sample of cells were monotonic, although, as in the periphery, individual cells exhibited the same narrow dynamic range at this level. The acoustic response pattern, not unexpectedly, reaches its utmost level of complexity at the level of the cortex.
Goldstein, Hall, and Butterfield (1968) have explored the patterns of responsiveness of single cells in the acoustic cortex to a wide variety of stimulus conditions and have concluded that the behavior of these single cells is highly idiosyncratic. Some cells increased their activity; some decreased theirs; and others changed from increases to decreases as the frequency of the stimulating tone changed or as its intensity varied. It seems apparent that simple stimulus intensity-response amplitude correlations may simply be inapplicable in the acoustic modality. The situation seems to be much more complex than that found in the other modalities and depends upon interacting patterns of neural discharge at central levels as well as at the lower ones. The reader should rest assured that there is no need to throw out coding theory in general even if this is the case. We simply must acknowledge the fact that the intensity codes in the acoustic sense are unknown at the present time. The only other speculation that seems timely is that the difficulty encountered in an analysis of the temporal codes in the acoustic sense should not be too surprising if we remember that this sense, more than any other modality, must cope with the temporal dimensions of its adequate stimulus.
FIGURE 6.45 The relationship between compound potentials evoked at the level of the cochlear nucleus and at the auditory cortex and the log of stimulus intensity, expressed in db re 1 dyne/cm² (from Saunders, 1970).
What about integrated response measures? In spite of the fact that the single cell data from the central auditory system are so idiosyncratic, is there any possibility that some collective measure like the evoked brain potential might reflect wide ranging monotonic intensity functions? Keidel
FIGURE 6.46 The data of Figure 6.45 have been converted to log-log coordinates, displaying a two-segment response curve in each case. Over the larger portion of the range, the cochlear nucleus and the auditory cortex responses are fit by straight lines indicative of power function exponents of 0.8 and 0.7, respectively.
and Spreng (1965) examined the rise of specific components of the acoustically evoked brain potential and found that at least one of them, a late nonspecific component occurring at about 150 msec after the stimulus, had a fairly wide range of response (90 db) and also exhibited a response dynamic that could be well represented by a power function with an exponent of 0.36. However, other components with different latencies, or in different stages of adaptation, or even just different criteria of measurement, exhibited different exponents for their representative power functions. Whether Keidel and Spreng had actually discovered the key dimension or had simply found one that was fortuitously in agreement with the psychological data is moot. Saunders (1970) has also used compound action potentials recorded with macroelectrodes as his measure of the activity level of both the cochlear nucleus and the auditory cortex in the cat. His findings, like Keidel and Spreng's, show that these potentials, evoked with auditory clicks of varying magnitude, covaried at least over a 90-db range and appeared to be fairly linear on a log-linear plot. Figure 6.45 shows his original data, and Figure 6.46 shows the data replotted on a log-log plot to give a general estimate of the trend of his data. These data, once again, exhibit a two-segment power function, but adhering to our previous criterion, we shall only consider the higher-intensity branches. These exponents were 0.7 and 0.8 for the auditory cortex and cochlear nucleus, respectively. It is quite clear that while individual cells seem to behave idiosyncratically and even nonmonotonically, as well as displaying narrow response ranges, there are cumulative bioelectric potentials associated with these levels of the nervous system that do seem to vary over a good portion of the auditory psychophysical range and to correlate with behavioral data to at least a first approximation.
5. A Summary of the Neurophysiological and Psychophysical Data on Response Dynamics. This, then, completes our survey of the relevant neurophysiological data relating stimulus intensities and response parameters. Admittedly, our selection of "relevant" studies has been restricted. In selecting the data used to exemplify the response of the various sensory modalities, there has been a bias in our choice, for we have almost certainly overemphasized those cells that do respond monotonically to changes in stimulus intensities, as well as steady-state conditions of response. The complexity of the findings in the acoustic system emphasizes that the picture may, in fact, be quite a bit more intricate than this sampling might suggest. For example, some cells in different nerves of the gustatory tract behave differently than the chorda tympani responses that we have selected for our discussion. Furthermore, in all sensory modalities there is clearly a very great degree of interaction between the quality of the stimulus and its intensity. There are, of course, also some further questions about the applicability of invertebrate data to theories of mammalian perception. But all in all, the depth of the coverage presented here does seem to support the conclusions we shall draw below. Another reminder: our use of the power function nomenclature in these sections and our replotting of data into log-log plots are not to be considered to be equivalent to an acceptance of Stevens' power function theory. We have already noted that there are a number of problems in its general application. But, in a descriptive sense, it does provide a unique and simple way to characterize the general degree of compression (or of expansion) that any given function is exhibiting.
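The replotting procedure used throughout these sections, in which the slope of a straight line on log-log coordinates is read as the power function exponent, can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic and noise-free, the scale factor of 20 is arbitrary, and the names `fit_power_exponent` and `response_log_units` are hypothetical; none of the numbers are measurements from the studies cited.

```python
import math


def fit_power_exponent(stimulus, response):
    """Least-squares line through (log10 S, log10 R); slope = exponent n."""
    xs = [math.log10(s) for s in stimulus]
    ys = [math.log10(r) for r in response]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, 10 ** intercept  # exponent n and scale factor k in R = k * S**n


def response_log_units(stimulus_log_units, exponent):
    """Log units of response spanned by a given span of stimulus log units."""
    return exponent * stimulus_log_units


# Synthetic example: R = 20 * S**0.4 over four log units of stimulus strength.
stimulus = [10 ** e for e in (-4, -3, -2, -1)]
response = [20.0 * s ** 0.4 for s in stimulus]
exponent, scale = fit_power_exponent(stimulus, response)
print(exponent, scale, response_log_units(4, exponent))
```

The second function makes the descriptive use of the exponent explicit: with an exponent below 1, a given number of log units of stimulus is mapped onto proportionally fewer log units of response, which is what we mean by compression. A formal goodness-of-fit test would be a separate step not shown here.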
With these cautions in mind, it is still possible to draw certain general conclusions from our consideration of at least four of the five sensory modalities we have considered. Table 6.2 lists the sense modalities and the exponents of the power function used to describe their response dynamics for each of the several neural levels at which we have the appropriate data. The exponents have been noted for the level of the generator potential, and the single cell peripheral neuron action potential, compound or integrated peripheral nerve action potential, and central nervous single cell and compound action potentials whenever possible. Finally, we have also indicated those psychophysical power functions that are felt to be closely analogous to the electrophysiological experiment. As we look over this summary table, a number of important points are immediately apparent. First, there appears to be no simple relationship between the neurophysiological exponents at any level of the nervous system and the psychophysical ones. There are virtually no neural exponents in this table or reported elsewhere in the neurophysiological literature that exceed 1.1 for any but a small portion of their total range. Second, the receptor potentials are typically as compressed or more compressed than any later stages in the neural processing. Therefore, most response compression apparently occurs in the peripheral transductive process. Third, even evoked brain potentials, presumably much closer to the behavior than
the peripheral signals, do not correspond to the psychophysical exponents. The evoked potential exponents are typically lower and thus must be nonlinearly related to the psychophysical functions. Fourth, there is a paucity of data on dynamic functions from the central nervous system. This is in part due to technical reasons, but also in part due to the biology of the respective situations in each of the modalities. Let us now consider the implications of these findings in more detail. The fact that so much of the compression appears immediately at peripheral levels and the fact that there is relatively little further change as one ascends the sensory pathways compel us to conclude that the major portions of the response dynamic are accounted for by the very processes that produce the generator potential. It is probably the case that the specific mechanism of compression is quite different from one type of receptor to another, but whatever the underlying process, most of the compression appears at the transductive level. We also see that even some cortical evoked potentials, the neural signal that some would say comes closest to being equatable to the psychophysical judgments, are usually compressed even though corresponding psychophysical judgments of the same stimulus dimensions often exhibit expanded power functions with exponents greater than 1. A critical and germane issue is thus raised: is it necessary to assume isomorphism between the neural response functions and subjective magnitudes? The answer to this query is clearly no. The whole idea of coding is based upon the notion that symbols in one domain can represent information patterns in another if there are appropriate transformation rules. Thus, in the spatial domain, linear perceptions of the spatial relations on the body can be represented by wildly distorted but topologically constant spatial representations on the cortex. (See, for example, Woolsey, 1958.)
But even this topological constancy, as one goes from one point to another on the somatosensory cortex, is not necessary. There is no a priori reason why spatial relations need necessarily be maintained. As long as there is some sort of coded arrangement and the decoding mechanisms, whatever they are, were programmed with the rules describing the nature of the disorder, a spatially random system is as good as a linear and well-ordered one from the point of view of representation or coding theory. From this same perspective, it can be seen that there is no a priori need to assume that compressed neurophysiological responses in the intensity domain could not represent or encode linear or expanded psychophysical functions. For example, the compressed response dynamic of the evoked potential may be an entirely valid representation of a linear or even an expanded psychophysical function or vice versa; isomorphism need not be maintained. Another important consideration is that there may be a sort of bias on the psychophysical estimate that is not present in the electrophysiological data. The issue here is that the form of the function for psychophysical judgments may really be a mixture of several different phenomena. One influence on the psychophysical judgment would be the pure sensory magnitude as encoded by the incoming signals in the various manners we have already considered. In addition, and overlaid on top of this purely
TABLE 6.2
THE EXPONENTS OF THE POWER FUNCTIONS THAT DESCRIBE THE RELATIONSHIP BETWEEN THE STIMULUS INTENSITY AND BOTH NEUROPHYSIOLOGICAL AND PSYCHOPHYSICAL RESPONSES
Modality
Vision
Audition
Somatosensation
Generator or Receptor Potential
0.04, 0.28, 0.58 0.7 1.0 (then nonconstant)
0.38
Gustation
0.32
Olfaction
0.4
Peripheral Single Cell
(4) 0.30 0.28 0.53 0.55 (2)
Cumulated
(1)
0.8 0.42
0.59 0.52
0.53
0.53 (All) 0.5 to 0.85 (Sour) 1.0 (Salt) 1.1 (Sweet) 0.31 (Salt) 0.67
Data have been tabulated for each of the senses at several different levels of the ascending pathways. Numbers inserted in the tables are the power function exponents, which have been discussed earlier in the text, and entries have been made without regard to species differences. The conclusions that this table compels us to accept include: (a) Receptor or generator potentials usually have as low or lower exponents as any higher level. Most neurophysiological compression occurs, therefore, at that most peripheral level. (b) The exponents of evoked brain potentials are usually different from the corresponding psychophysical exponents, and thus the two sets of data are nonlinearly related. (c) In general, no simple isomorphic relationship exists between the psychophysical and neurophysiological data. Note: (1) In vertebrates, response magnitude is usually sensitive to temporal or spatial features rather than to absolute stimulus intensity. (2) Often nonmonotonic, narrow range, and idiosyncratic. (3) Depends on the specific component of the evoked potential measured. (4) Barlow finds three different functions that describe different portions of the overall visual intensity range.
sensory influence, are fluctuations in the decision criteria, which are certainly operating in the production of magnitude estimates by the subject. Subjects may be thrust into quite a different operating condition when, as one drastic example, small increases in stimulus intensity at high levels suddenly begin to produce painful experiences or other qualitative changes. Simply put, subjects may use different decision criteria for strong
TABLE 6.2 (cont.)
Brain Stem Single Cell
Cortex
Cumulated
Single Cell
Cumulated
0.2 (3)
(2)
0.73 0.43
0.8
(2)
0.7 0.36 (3)
0.6 (mech.) 1.0 (elec.) (3)
Psychophysical
0.33 (5° target) 0.5 (Flash)
2.0 (elec.) 0.67 (mech.) 1.1 (vocal sound) 0.6 to 0.95 (mech.) 2.0 to 7.0 (elec.) 0.45 (mech.)
0.5 to 0.85 (Sour) 1.0 to 1.9 (Salt) 1.3 to 2.0 (Sweet)
0.6 (heptane)
than for weak stimuli. In other words, the meaning or expected value of a stimulus may influence the magnitude estimate as much or more than its pure sensory value. Such a set of notions of the complete nature of psychophysical judgments is not meant to invalidate the subjective magnitude estimation techniques, but simply to point out that factors other than the most simple forms of sensory intensity representation may have to be considered in the analysis of the meaning of some of the data obtained with such psychophysical methods. There is no question that these techniques give rise to data that are reliable indicators of the way in which people behave (barring routine experimental confoundings), but the way in which people behave in estimating magnitudes may be only partially dependent upon sensory intensities. In sum, psychophysical measures are believed to reflect the influences of several different processes. In addition to the contribution of the sensory magnitude, per se, there are also decisional and criterion influences that make the psychophysical data deviate from what one would obtain if there were some direct way to assess sensory magnitude in isolation. It would be a grievous oversight if at this point we did not consider the viewpoint of S. S. Stevens himself with regard to this sort of data. Stevens has published two recent statements of his views (Stevens, 1970,
1971) concerning the neural correlates of the power functions, which he has found so frequently in the results of psychophysical experiments. In these papers Stevens discusses two points, which must be considered separately. First, he concerns himself with the question of whether or not the neurophysiological data are adequately represented by power functions. As we have seen, this is an empirical issue, which is clouded with some uncertainties, but which will ultimately be answered by critical tests of goodness of fit. Unfortunately, goodness of fit is only rarely tested in a formal fashion in most experiments, and future investigation will have to resolve this particular controversy. The second issue, which is repeatedly alluded to by Stevens throughout his two papers, concerns an entirely separate point: the presence or absence of an isomorphic relationship between the psychophysical and the neurophysiological data. How congruent, he asks (as we have in the preceding portions of this chapter), are the findings from the psychophysical and neurophysiological laboratories when the stimuli are identical? Stevens' answer to this question is wisely equivocal. His analysis of the data from a number of senses suggests to him that in some instances there is isomorphism with regard to the exponents of the psychophysical power functions. In other instances, he acknowledges that there is no congruence and that the psychophysical data and the neurophysiological data do not agree. He says, for example, in various places in the two papers:
. . . the cochlear microphonic with its exponent 1.04 is probably not the instigator of the loudness response with its exponent 2/3. (STEVENS, 1970, p. 1047.)
Seen by electrodes in the cochlea, then the growth of the summated nerve impulses proceeds at a slower pace, with a lower exponent, than the growth of loudness in the human ear. (STEVENS, 1970, p. 1047.)
It appears, then, that there are at least four sense modalities in which some particular aspect of the human cortical potential has been shown to follow a power function, and in which the four exponents exhibit the same relative values as those obtained in psychophysical experiments. (STEVENS, 1970, p. 1048.)
The failure of the cortical V-potentials to exhibit growth functions having the same exponents that govern perceived sensory magnitude . . . (STEVENS, 1970, p. 1048.)
Footnote 4: As we have seen, this is not an entirely correct statement.
And finally:
That outcome has thus far proved to be the most general finding with averaged evoked potentials: in those rather numerous instances in which the growth of a cortical response has seemed to follow a power function, the value of the exponent has fallen systematically below the corresponding value of the exponent obtained in psychophysical studies. When two power functions differ in exponent, they are nonlinearly related. (STEVENS, 1971, p. 237.)
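The closing sentence of the 1971 quotation can be checked with one line of algebra. The symbols below are ours and are introduced only for illustration: $\psi$ is the psychophysical magnitude, $N$ the neural response amplitude, $S$ the stimulus intensity, $a$ and $b$ arbitrary scale constants, and $n$ and $m$ the psychophysical and neural exponents.

```latex
\psi = a\,S^{n}, \qquad N = b\,S^{m}
\;\Longrightarrow\;
S = \left(\frac{N}{b}\right)^{1/m}
\;\Longrightarrow\;
\psi = a\left(\frac{N}{b}\right)^{n/m}
```

The relation between $\psi$ and $N$ is therefore itself a power function with exponent $n/m$, and it is linear only when $n = m$. As a hedged numerical illustration, pairing the loudness exponent of 2/3 quoted above with the exponent of 0.42 reported for the compound cochlear nerve potential gives $n/m \approx 1.6$, so that subjective magnitude would have to grow as an expanding function of that neural measure.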
Later, after commenting on a study by Easter (1968), in which the goldfish retinal ganglion cell was shown to exhibit a power function with regard to stimulus intensity that had the same exponent as the human visual psychophysical data, Stevens says:
The temptation is great to conclude from the coincidence of exponents that a powerful method for the analysis of the operating characteristics of the visual transducer has at last been formulated by Easter's splendid experiments, and that the site of the psychophysical power law has been pushed into the retina. Caution must prevail, however, for the abundant richness of current physiological findings do no more, at the present stage of knowledge, than signal directions for future excursions. (STEVENS, 1971, p. 239.)
Stevens' advice is well taken in part. Looking over the papers we have reviewed in this chapter and the additional papers cited by Stevens, it is quite clear that there is at best only a partial isomorphism between the psychophysical and neurophysiological data. Stevens makes no claims to the contrary and, in addition, implies in this last quotation that he, too, has doubts about the significance of what even perfect isomorphism among corresponding exponents from the two data domains would mean. We do, however, believe that the data do speak to the problem of the locus of neural response compression, the topic of our next section.
D. What Is the Site of Response Compression? Finally, let us return to the question of the site of the compression. It seems safe to conclude that almost all of the compression measured in the neurophysiological studies is introduced by the receptor itself. Whenever specific comparisons are made of the generator potential amplitude with the rate of peripheral nerve spike activity, for example [in the visual system (Fuortes, 1958), in the gustatory system (Borg et al., 1967, or Kimura and Beidler, 1961), or in mechanoreceptors (Katz, 1950, Terzuolo and Washizu, 1962, and Wolbarsht, 1960)], nearly linear relations obtain. The general fact that the values of the exponents in Table 6.2 for generator potentials are less than or equal to those measured for any of the later stages of neurophysiological processing also speaks strongly for the conclusion that the response dynamic of the generator potential can be credited with contributing most to the enormous dynamic range of all sensory systems. As a final comment concerning the locus of the compression function,
let us reconsider the specific comparison of somatosensory evoked potential data discussed above. When one compared the somatosensory cortical potentials evoked by electrical stimuli (Uttal and Cook, 1964) to those evoked by natural mechanical stimulation (Ehrenberger, Finkenzeller, Keidel, and Plattig, 1966), it was found that both the dynamic range and the exponent of the representative power function were quite different. Electrical stimulation produces a very narrow dynamic range with an exponent very close to 1, but stimulation with natural mechanical stimuli produces quite a wide dynamic range and an exponent of 0.6. The explanation of these differences was assumed to be based on the fact that the receptors had been bypassed by the electrical stimulus. Implicit in this assumption is the conclusion that the receptors are mainly responsible for the dynamic compression observed in the entire somatosensory system. A similar sort of comparison had been made earlier by Stevens (1961) for the auditory system using a psychophysical indicator, the exponent of subjective magnitude scales, rather than the evoked brain potential. Stevens had noted that when one evaluated the power function relating variations in the intensity of natural acoustic stimulation to subjective magnitudes, the exponent was usually found to be 0.6. However, when electrical stimuli were used (Jones, Stevens, and Lurie, 1940) to directly stimulate the cochlear nerve in subjects without tympanic membranes, the auditory experience resulting was best described by a loudness power function, which had an exponent of over 2.0. Stevens' earlier conclusion was exactly the same as the point made here, namely, that the receptors are the major source of the compression function.
IV. IS INTERVAL IRREGULARITY A CODE FOR SENSORY INTENSITY? A MODEL ANALYSIS
So far in our survey in this chapter on the coding of sensory magnitudes, we have concerned ourselves mainly with only two candidate codes: the frequency of the spike action potential train and the amplitude of either some composite wave or some graded portion of the single cell's response. Both of these candidate codes have been seen to be associated with stimulus intensity in many different physiological experiments. A new question now must be considered. Are there any others among the candidate codes discussed in Chapter 5 that might conceivably be associated with stimulus intensity and the related sensory magnitudes? The answer to this question, in brief, is that the variance of the interspike interval of axonal spike action potentials does seem to correlate with the intensive dimension in a manner that appears to be independent of their mean frequency. Before we discuss the general point, though, we should note an important simple statistical fact. It is entirely possible for the standard deviation (one possible index of variability) to vary independently of the mean. To illustrate the implications of this statement, we have presented in Figure 6.47 two pulse trains. Each of these trains lasts for the same duration as the other, and each contains the same number of impulses. Thus, both have the same mean frequency. However, it is clear that the two trains are quite different along some other dimension.
FIGURE 6.47 Two trains of spike action potentials (or stimuli) of equal mean interval but differing interval regularity. [Panels (a) and (b), each spanning 154 msec.]
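The statistical point illustrated by Figure 6.47 can be sketched numerically. The following is only a minimal illustration under stated assumptions: the number of pulses (eight) and the particular perturbations are hypothetical and are not taken from the figure; only the 154 msec duration is. The perturbations sum to zero, so the two trains share the same duration, pulse count, and mean interval, and differ only in the standard deviation of their intervals.

```python
import statistics

def intervals_to_times(intervals, start=0.0):
    """Convert interpulse intervals (msec) to cumulative pulse times (msec)."""
    times = [start]
    for gap in intervals:
        times.append(times[-1] + gap)
    return times

# Hypothetical train: 8 pulses (7 intervals) spanning 154 msec.
regular = [22.0] * 7

# Zero-sum perturbations (assumed values), so the total duration is unchanged.
offsets = [9.0, -9.0, 5.0, -5.0, 12.0, -12.0, 0.0]
jittered = [base + delta for base, delta in zip(regular, offsets)]

for name, train in (("regular", regular), ("jittered", jittered)):
    print(name,
          "| pulse times:", intervals_to_times(train),
          "| mean interval:", statistics.mean(train),
          "| SD:", round(statistics.pstdev(train), 2))
```

Both trains report a mean interval of 22 msec, while only the second has a nonzero standard deviation. The population form of the standard deviation is used here simply because the train is treated as the complete set of intervals of interest.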
That dimension is the interval irregularity or variability, or the "jitter" as it is often called, of the size of the interval between successive pulses. The fact that the mean frequency and the standard deviation of the spike train are statistically independent suggests the possibility that they may also be independent biologically, and each might act as a separate and distinct code. However, before this hypothesis can be accepted, there are a number of questions that must be asked and answered in sequence. First, it is necessary to establish that the dimension of interpulse variability is indeed a candidate code; that is, it must be demonstrated that interval irregularity actually occurs in natural neurophysiological situations as a consequence of some stimulus manipulation. Second, we should also establish that this irregularity information has a differential effect on synaptic junctions as the degree of jitter varies. Otherwise the information contained in the variability dimension would simply be blocked at the first synapse encountered during its ascent along the sensory pathway. Third, as we have said so often, in order to fully establish this candidate code as a true code, useful to the system for the communication of sensory information, we must show that it has some sort of behavioral effect. In the following sections, each of these questions is discussed in turn and answered affirmatively. In doing so, it is hoped that this material will also serve as an illustration of the general procedure necessary to confirm a candidate code as a true code.
A. Demonstrations of the Natural Occurrence of Interval Irregularity
The history and the theory of the general problem of spike action potential interval irregularity have been reviewed by Moore, Perkel, and Segundo
FIGURE 6.48 The linear relationship between the standard deviation of intervals between spikes and the mean interval for peripheral neurons driven by brief mechanical stimuli. The equation of the best fitting straight line is given in the slope-intercept form: SD = 0.012 + 0.141 X. [Abscissa: mean interval (msec).]
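The fitted line reported with Figure 6.48 can be evaluated directly. The sketch below assumes that both the mean interval X and the standard deviation are expressed in msec, which appears consistent with the range of the figure's axes but is not stated explicitly; the function name is ours.

```python
def predicted_sd(mean_interval_msec, intercept=0.012, slope=0.141):
    """Standard deviation predicted by the fitted line SD = 0.012 + 0.141 X.

    Units are assumed to be msec for both X and SD.
    """
    return intercept + slope * mean_interval_msec

for x in (10, 20, 30, 40, 50, 60):
    sd = predicted_sd(x)
    print(f"mean interval = {x:2d} msec   SD = {sd:.2f} msec   SD/mean = {sd / x:.3f}")
```

Because the intercept is so small, the ratio of standard deviation to mean interval stays close to the slope across the whole range plotted.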
FIGURE 6.52 The three stimulus artifacts and the three neural responses in the experiment in which human psychophysical responses were compared with human peripheral compound nerve response magnitudes (from Uttal, 1960). [Abscissa: time (msec).]
of the evoked experience. In this way direct comparisons could be made between the physiological and the psychophysical response. Now let us consider the results of this experiment. It can be seen in Figure 6.53 that for four subjects the sums of the amplitudes of the evoked nerve responses change in a relatively systematic way as these stimulus patterns vary. If the first two pulses are close together, then the second response will occur in the refractory period of the first, and it, the second response, will be of a lower amplitude than it might otherwise be. If, on the other hand, the second pulse is close to the third one, then the third response will be in the refractory period of the second, and it, the third response, will be smaller than it might otherwise be. Unfortunately, the oscillographic displays of the three stimulus artifacts (the deflections produced at zero latency by the stimulus pulses) and the three neural responses overlap to a considerable degree. Not all of the three response amplitudes can, therefore, be measured for all conditions. The three sets of curves in Figure 6.53 are thus necessarily incomplete. Fortunately, the sum of the amplitudes of these three neural responses can be predicted on the basis of what is known of the interaction between two pulses (Uttal, 1959). This has been done for a hypothetical "standard subject" and plotted in Figure 6.54. The result of this calculation is a family of dome-shaped curves with the maximum summed amplitude of the three responses occurring when the second stimulus is midway between the first and the third stimulus. The data summarizing the psychophysical magnitude estimates obtained for this set of patterns are shown in Figure 6.55. They also exhibit the same domed shape.
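Why a two-pulse interaction should yield dome-shaped curves can be seen in a minimal sketch. The recovery function used below (an exponential approach to full amplitude with a 3 msec time constant) is purely hypothetical and is not the function measured by Uttal (1959); the sketch also assumes that each response is reduced only by the pulse immediately preceding it. Any recovery function that rises with diminishing slope would give the same qualitative result.

```python
import math

TAU_MSEC = 3.0  # hypothetical recovery time constant, not taken from the source

def recovery(gap_msec, tau=TAU_MSEC):
    """Relative amplitude of a response that follows the previous pulse by gap_msec."""
    return 1.0 - math.exp(-gap_msec / tau)

def summed_amplitude(p2_delay, p3_delay):
    """Sum of the three response amplitudes; the first response is taken as 1.0.

    p2_delay and p3_delay are measured from the first pulse, as in Table 6.3.
    """
    first = 1.0
    second = recovery(p2_delay)
    third = recovery(p3_delay - p2_delay)
    return first + second + third

for p3 in (8, 6, 4):
    curve = [round(summed_amplitude(p2, p3), 3) for p2 in range(1, p3)]
    print(f"P3 = {p3} msec:", curve)
```

Under these assumptions each curve rises to a peak when the second pulse lies midway between the first and the third and falls off symmetrically on either side, since moving the second pulse toward either neighbor costs more amplitude in the shortened gap than is gained in the lengthened one.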
The important conclusion to be drawn from this experiment is that the two sets of data, the predicted sum of the neural response amplitudes (the validity of which was at least partially confirmed for those recordings in which the stimulus artifact did not obliterate neural responses) and the subjective magnitude of the sensation, agree quite well.
TABLE 6.3 STIMULUS CONFIGURATIONS IN THE THREE-PULSE ELECTROCUTANEOUS EXPERIMENT (FROM UTTAL, 1960).
            Delay in msec
Config.     P2     P3
   1         1      8
   2         2      8
   3         3      8
   4         4      8
   5         5      8
   6         6      8
   7         7      8
   8         1      7
   9         2      7
  10         3      7
  11         4      7
  12         5      7
  13         6      7
  14         1      6
  15         2      6
  16         3      6
  17         4      6
  18         5      6
  19         1      5
  20         2      5
  21         3      5
  22         4      5
  23         1      4
  24         2      4
  25         3      4
  26         1      3
  27         2      3
In other words, for stimulus triplets, the interval irregularity does not appear to be carrying very much useful information, and the subjects seem to be insensitive to the temporal pattern of the stimuli. The main conclusion, therefore, is that the subjective magnitude of the sensation produced by the stimulus triplet is almost completely accounted for in terms of the summed amplitude of the neural response, when the three pulses occur within the temporal region in which the three stimuli feel to the subject like a unified sensation. Subjective magnitude in this time region (less than
FIGURE 6.53 The average summated neural response magnitude for the various stimulus conditions shown in Table 6.3. Data are pooled from four subjects. Not all values are present, because under certain conditions the stimulus artifacts obscured the neural responses (from Uttal, 1960). [Panels labeled AB, BB, DT, and JT; separate curves for P3 = 3, 4, 5, 6, 7, and 8 msec; abscissa: delay of P2 (msec).]
FIGURE 6.54 Theoretically calculated summated response amplitudes of the neural responses to a stimulus triplet, assuming a standard form of the interaction between two pulses (from Uttal, 1960). [Abscissa: delay of P2 (msec).]
about 10 msec) is, thus, apparently encoded by some counting procedure, which is sensitive only to the number of axonal responses that are reflected in the amplitude of the compound action potential, and not to the interval pattern. Interval seems only to exert its influence indirectly as it modulates the amplitude of the individual compound responses. But this conclusion is valid only for the situation in which all of the responses occur within the region of temporal fusion. Does this same insensitivity to interpulse irregularity obtain with stimulus pulse patterns that are more prolonged and in which the individual pulses are greater than 8 or 10 msec apart? The answer to this question seems to be no: this conclusion is not valid in the situation with longer bursts and with wider interpulse separations. The study (Uttal and Smith, 1967) that examined this question utilized much the same equipment and methodology as the three-pulse study just described, but asked the subject to make a judgment directly of irregularity. As we have mentioned, the general question that was being asked was, "Can the effects of interval irregularity as such be observed in psychophysical judgments?" Nevertheless, to answer this general question, the specific experimental question asked had to be: can a subject make use of the additional information in many irregular intervals to give a finer discrimination than that possible in a stimulus train with only one irregularity? Figure 6.56 presents the results of two experiments in a way that allows this second comparison to be directly made. The upper curve plots the results of the first experiment, in which a series of pulse stimuli was used having only one irregularity, a single gap. A two-alternative forced
FIGURE 6.55 The psychophysical data showing averaged magnitude estimates for the stimulus conditions shown in Table 6.3. Note the correspondence between this response pattern and the calculated neural response of Figure 6.54 (from Uttal, 1960). [Panels labeled AB, BB, DT, and JT; separate curves for P3 = 3, 4, 5, 6, 7, and 8 msec; abscissa: delay of P2 (msec).]
FIGURE 6.56 Psychophysical interval irregularity and gap detection experiments compared. Subjects do have a significantly better ability to detect overall irregularity as compared to the detection of a single gap in a stimulus train (from Uttal and Smith, 1967). [Abscissa: basic interpulse interval (msec).]
choice experimental procedure was used as the psychophysical method such that the subject was forced to say which one of two sequential bursts contained the single gap. The size of the gap was controlled by a contingent computer program, which reduced the gap size if it was detected, and enlarged it if it was not. In the second experiment, the degree of interval irregularity of an entire stimulus train was controlled by a computer procedure, which was also contingent upon the subject's prior responses. In this case, rather than varying the size of a gap, the whole pattern of interpulse intervals varied according to a simple computational algorithm. Figure 6.47 (presented earlier in this chapter) illustrates the jittered stimulus pattern. In this case, the subject's ability to discriminate the irregularities was considerably greater than in the case in which only a single gap discontinuity was present. This result indicates that the subject is able to make use of the additional information contained in the statistics of the interval irregularity in this situation. This conclusion was further substantiated by looking at the effect of the number of gaps in an irregular train with varying numbers of pulses. If the subject is acting as a sort of statistical analyzer, we should expect his performance to improve as larger samples of intervals are presented to him. Figure 6.57 shows that this is exactly what happens. As the number of pulses in the stimulus train increases, his sensitivity to irregularity also increases, but only up to a limit of about 8 pulses.
FIGURE 6.57 The increase in the ability of subjects to detect irregularities as the number of pulses in the train increases. The fact that all three curves asymptote at the same point indicates that this is a counting process and not one dependent upon the overall timing of the trains (from Uttal and Smith, 1968). [Data for 5 Ss; separate curves for interpulse intervals of 20, 16, and 12 msec; abscissa: number of pulses.]
Another important point concerns the general shape of the irregularity detection function of Figure 6.56 and the apparent discrepancy between the data for the three-pulse situation (where interval was not important) and the longer burst situation (in which irregularity detection seems to be a well-developed capacity). The disagreement between the two situations is only superficial, for the three-pulse data deal solely with total intervals that are less than the temporal fusion threshold. Furthermore, an examination of the data obtained with irregular bursts suggests that there may even be three, rather than two, distinguishable
regions of temporal response, each of which follows different rules: one region extending from 0 to 10 msec, one from 10 to 20 msec, and one for longer interpulse intervals. This multiple branch characteristic of the response curve is very similar to one of the results reported by Mountcastle, Talbot, Darian-Smith, and Kornhuber (1967) for the detection of vibratory stimuli. They found that their response curve was made up of two segments with the point of inflection between the two occurring at about the same interpulse interval, 20 msec, as the second break in Figure 6.56. They concluded from their results that there were two different somatosensory receptor systems operating in the detection of vibratory activity. The first was sensitive to low frequencies and was mediated by receptors located in the skin. The second was sensitive to higher frequencies and was mediated by receptors buried deep in the tissues of the hand. Which receptor system was activated, they suggested, was directly dependent upon the frequency of the mechanical vibration. It is, however, possible that the two-segment curve obtained in their experiment was not a function of two different sets of nerve fibers, but rather two branches of a bipartite response curve of a single set of neurons. Considering the nature of the electrical pulse stimulus of the Uttal and Smith experiment, there is no reason to assume that the individual pulses affect different fibers or follow different current paths as their frequency of occurrence is increased. It is more likely that the segments of the response curve do not represent different groups of receptors or fibers, but rather represent a curve with multiple branches or characteristic responses for temporal processing by a single family of fibers. Thus, the multiple arms of the response curves in both the Uttal and Smith and the Mountcastle et al. experiments may be alternatively explained by the following hypothesis.
The temporal response of peripheral nerves exhibits a continuum of two or three regions of differing temporal processing capabilities as the frequency spectrum is scanned. Below interpulse intervals of 10 msec (100 Hz), there is a region of fusion in which there is generally a very low sensitivity to the temporal characteristics of the signal. In this time zone, the integrated number of impulses is more important than the interval pattern. Above 10 msec and below 20 msec, there is a region in which there is an independent sensitivity to the second-order statistics, the irregularity, of the interpulse intervals, and the mean frequency does not exhibit any substantial effect. Above 20 msec, one crosses into a region in which the mean interpulse interval is so large that the integrative powers of the nervous system are no longer capable of taking advantage of such higher-order statistical data, and judgments are made on the basis of mean interval or some other independent measure of the size of local intervals. Do these data and conclusions correspond to those obtained in the experiments with Aplysia? In fact, and assuming that the differences in time scale of the invertebrate and vertebrate preparations can be ignored, the two sets of data do agree pretty well. Segundo, Moore, Stensaas, and Bullock (1963) point out that for "short durations and, thus, high frequencies . . . all trios of subthreshold shocks of moderate intensity evoke
a spike, irrespective of timing"; that there is "an intermediate range of durations (for example, 500 msec) and frequencies in which different positions of S2 are reflected by significantly different degrees of depolarization (to the third stimulus) and at higher frequencies, different spike probabilities." They conclude, "Consequently, it is justifiable to state that, under fixed conditions of nerve, cell, and EPSP size, there exists an intermediate input frequency range in which output is critically dependent on timing." No specific comment is made of even longer intervals, but with suprathreshold stimuli, the analogy can be easily made between the third branch of the Uttal and Cook response data and some gross frequency discrimination by Aplysia. Thus, we have demonstrated in the course of our discussion that the variability of the intervals between spike action potentials:
1. is a natural consequent of stimulation with stimuli of varying intensity;
2. is a parameter which is capable of being transmitted across synaptic junctions; and
3. does differentially affect behavior in the form of psychophysical judgments.
In doing so, we have established that this candidate code is also available to be used as a code over at least part of the temporal domain, even though we have not finally established that interval variability is a code specifically for sensory magnitudes. The implication is that it is, but the analysis is admittedly not complete at this point.
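The three-region hypothesis advanced in this section can be restated compactly as a lookup. This is only a schematic sketch: the hypothesis speaks of a continuum, so the sharp boundaries, the assignment of exactly 10 and 20 msec to the upper region in each case, and the function names are all our assumptions.

```python
def dominant_cue(interpulse_interval_msec):
    """Stimulus property that the hypothesis says governs judgments at this interval."""
    if interpulse_interval_msec < 10:
        return "integrated impulse count (fusion region)"
    if interpulse_interval_msec < 20:
        return "interval irregularity"
    return "mean interval or local interval size"

def interval_to_hz(interpulse_interval_msec):
    """Repetition rate in Hz corresponding to an interpulse interval in msec."""
    return 1000.0 / interpulse_interval_msec

for interval in (5, 10, 15, 20, 40):
    print(f"{interval:2d} msec ({interval_to_hz(interval):5.1f} Hz): {dominant_cue(interval)}")
```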
V. AN INTERIM SUMMARY
This chapter has been an extended discussion of the factors involved in the neural coding of sensory magnitudes. We began our discussion by considering the contemporary view of the "threshold." Where once the threshold had been considered to be a sharp line of demarcation, it is now considered to be a region in which, at the very least, statistical processes operate and, at most, may possess no lower bound. We then discussed signal detection theory and showed how it could be advantageously applied to both psychophysical and neurophysiological data in the light of this new view. We can summarize the material on the coding of the full range of stimulus and subjective magnitudes by noting that in neurophysiological experiments, almost all functions relating stimulus intensity and response tend to be compressed on the average. This compression holds true from the initial transduction and does not increase noticeably from the most peripheral signals recorded from the receptors to the highest levels of the central nervous system. This result directly leads us to the general idea that response compression is a feature of neural coding, which is primarily attributable to the transductive processes themselves. Further, the candidate codes that seem most often to be used for the representation of intensive dimensions include the number of responding cells, response amplitude (as measured either with the amplitude of a graded potential or with the integral of a pooled response), the mean frequency of spike action potentials, and the interpulse irregularity of the spike train. Another important generalization, which we can draw from a comparison of the psychophysical data and even the central nervous system's neurophysiological data, is that there is no direct linear relationship between the two. The psychophysical functions are described by power functions, whose exponents range from 0.2 to 7.0, while the neurophysiological functions almost always seem to be compressed functions with averaged or integrated exponents that range only from 0.2 to at most 1.1. It seems fair to say, then, that the psychological exponents reflect something more complex than the simple sensory events evoked by the stimulus. Decision criterion, judgmental factors, and emotional overtone all seem to be terms that probably describe some of the processes involved in psychophysical magnitude estimation, and thus far more complex systems of neurons are being assayed by the psychophysical procedures than by the neurophysiological ones. We have also carried out an analysis of the acceptability of a particular candidate code, spike interval irregularity, as a true code and have found that it can so be accepted, although it is not possible to specifically associate it yet with a particular common sensory dimension.
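The sense in which an exponent below 1 "compresses" a response can be made concrete with a small sketch. It assumes the usual power-function form, response = k × (stimulus intensity)^n; the six-log-unit stimulus range, the constant k = 2.0, and the synthetic data are hypothetical choices made only for illustration. The second function recovers an exponent as the slope of the line relating the logarithm of response to the logarithm of intensity.

```python
import math

def response_range(stimulus_ratio, exponent):
    """Ratio of largest to smallest response under response = k * intensity**n."""
    return stimulus_ratio ** exponent

def fitted_exponent(intensities, responses):
    """Least-squares slope of log10(response) against log10(intensity)."""
    xs = [math.log10(v) for v in intensities]
    ys = [math.log10(v) for v in responses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator

stimulus_ratio = 1e6  # hypothetical stimulus range of six log units
for n in (0.2, 0.6, 1.1):
    print(f"exponent {n}: response range = {response_range(stimulus_ratio, n):.3g}")

# Synthetic data generated with an exponent of 0.6, then recovered by the fit.
phi = [1, 10, 100, 1000]
psi = [2.0 * p ** 0.6 for p in phi]
print("recovered exponent:", round(fitted_exponent(phi, psi), 3))
```

With an exponent of 0.2 a millionfold range of stimulus intensity maps onto a response range of less than twentyfold, whereas an exponent above 1 expands the range rather than compressing it.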
CHAPTER 7: THE NEURAL CODING OF SPACE AND TIME
In this chapter, we shall begin our discussion of a different set of dimensions and a different set of problems from the intensive ones emphasized in Chapter 6. The temporal and spatial dimensions we shall discuss are surprisingly closely linked in any psychobiological analysis, and often it is impossible to separate them without losing the meaningfulness of the data. It is the main purpose of this chapter to show how these two domains, the temporal and the spatial, are linked and then to present both psychophysical and neurophysiological evidence that will help to establish the relationship between the perceptual and neurophysiological findings. We shall first consider two ways in which fine discriminations become blunted. There is a loss in both temporal and spatial acuity as one ascends from the peripheral nerve response to the percept. This loss can be accounted for in terms of neural convergence and divergence, respectively, and we shall introduce and discuss each of these processes prior to the discussion of the psychophysical data themselves. The next topic in spatial coding considered will be the emerging body of evidence that suggests that there is a marked interaction between nearby events at least at some peripheral levels of the afferent pathway. The important point being made is that signals do not, in general, travel along isolated and private lines, but do affect each other in numerous ways through a sort of parallel "cross talk" or interaction. We shall consider some of the psychophysical evidence that seems to reflect
behavioral effects of this cross talk, and also the observed details of the neural nets that might mediate these interactions. We shall then consider two related questions, the answers to which turn out to be surprisingly equivocal in the light of so much of what is taken for granted in today's psychobiological theorizing:
1. Does lateral interaction exist in other senses besides vision?
2. Does a similar form of lateral interaction exist centrally as well as peripherally?
We shall then discuss some of the mathematical nerve net theories of lateral interaction. These theories are very highly developed and play an important role in the formal modeling of both perceptual and neurophysiological findings in contemporary thinking. Next, we shall discuss the neural correlates and the psychophysics of purely temporal discriminations. Finally, we shall discuss the neural coding of spatial discriminations and show that in audition, in particular, perceptual space is best explained in terms of dynamic temporal and intensive codes, while in vision, static spatial codes seem to be most closely associated with spatial percepts.
I. INTRODUCTION
A. Some Complexities
We now consider the spatial and temporal aspects of perception, an area which is so broad and for which there is such an abundance of material that at the very best our coverage can be only partial. Therefore, almost any critical reader will find that some of the topics or studies in which he might have been interested are going to be missing from this chapter. The coverage of this material will necessarily be selective, and we shall survey only those aspects of spatial and temporal experience that may be reasonably subjected to some kind of neural coding analysis. This reservation is necessary because, while all sensory and perceptual phenomena, without doubt, are represented and encoded by patterns of neural activity, not all of these problems are equally amenable to study using the concepts and methodologies of our contemporary neurophysiology. Unfortunately, however, there is no simple a priori way to separate those sensory and perceptual phenomena that we cannot treat in neurophysiological detail from those that we can. It is a matter of historical accident and ad hoc decision making. Some phenomena, such as the Mach band, for example, clearly have been prototypes for physiological analysis since their initial description, while other similar visual spatial illusions (for example, the Poggendorff illusion) have not yet been subject to a comparable analysis. In the next chapter we shall also deal with the problem of whether or not some of the current physiological theorizing has been, in fact, premature. This problem arises because some physiological processes are superficially similar in functional form to some perceptual
phenomena, and thus analogues of functional form have often been misperceived as homologous neural mechanisms by some authors. Very often, properties initially explained by simple geometric neural net interactions are found later to depend upon very complex cognitive properties such as meaning or form similarity. Neural net analyses are hard pressed, in spite of their superficial relevance, to handle such situations. The point that is being made here is that interactions based upon the geometry of a stimulus pattern may not be satisfactory explanations when the stimulus information has been encoded and represented at some symbolic or semantic level, where a square is no longer a square geometrically but only in terms of certain cognitive implications. For example, where an entire object is masked as a total entity rather than only the portion of it that is close to a masking stimulus, it is hard to see how one can apply a theory of lateral interactions that is heavily dependent upon geometrical propinquity and well-defined distance functions.
We shall have little to say in this chapter about the spatial or temporal aspects of the chemical senses. Because of the difficulty of stimulus control, experiments involving spatial and temporal manipulations in these modalities are very hard to instrument. Moreover, one might also suspect, a priori, that these chemical modalities are not primarily designed to cope with the spatial and temporal dimensions. Those senses that have so evolved (hearing, vision, and to a lesser degree the somatosensory senses) will, therefore, be the main source of experimental information as we pursue our analysis.
Any generalization of the sort described in the previous paragraph is likely to fall victim to the inventions of some ingenious experimenter.
Indeed, von Bekesy (1964a) seems to have pushed the notions of lateral interaction and spatial localization to their extremes in his demonstrations of the ability of the olfactory and gustatory senses to localize chemical stimuli. This feat seems to be accomplished on the same basis (differences in time of arrival of the stimulus to different portions of the tongue or olfactory receptors) that underlies acoustic spatial localization. However, this occurs under some highly unusual laboratory conditions, which do not seem to obtain regularly in human experience. The general notion that the chemical senses are weak as discriminators of time and space still seems to be valid. Another difficulty in organizing this chapter arises out of the fact that space and time interact so strongly and are traded off, one for the other, in many different situations. It is not always going to be satisfying to the reader that a particular topic has been suitably placed if it is considered solely in terms of either spatial or temporal dimensions. Very often stimulus conditions that are varying in the temporal domain will be encoded by spatial parameters and, in fact, be perceived as varying along a spatial continuum. Similarly, we shall also often see instances in which spatial dimensions are encoded by signals varying in the temporal domain and examples of temporal neural response patterns critically dependent on spatial aspects of a stimulus. It is important, therefore, that we distinguish between a number of related notions. We must speak of the spatial or temporal pattern of the stimulus independently of the spatial or temporal
pattern of the neural code. In some instances we must even keep these notions separate from the temporal and spatial dimensions of the perceived experience. To do so, of course, is to take advantage of the great strength of sensory-coding theory. It allows, and in fact emphasizes, that stimulus, neural, and experiential patterns need not be isomorphic at their respective levels of representation.
B. Time and Space
What do we mean by time and temporal patterns? What do we mean by space and form? In considering the answers to these questions, we must first consider whether they really are two separate domains. A common sense answer is that they certainly are. We use rulers (rulers which, of course, may be as sophisticated as the wavelength of a certain kind of light) to measure distance and clocks to measure times. But this common sense answer is deceptive. The initial answer is common sense only in the terms of a particular physical model of reality, Newtonian physics and Euclidean geometry, which has been superseded among physical scientists for over half a century by constructs that are, at first glance, very much contrary to "common sense." Classical Newtonian physics dealt with the three dimensions of space as if they were constant. The shape of the coordinate system did not depend upon the events going on in time in the system or the objects that were contained therein. Time was a separate factor, which could be measured or not in any given experiment. Yet, even in a timeless snapshot, the spatial coordinates remained rigid and fixed. Parallel lines were parallel (except for errors of measurement) out to infinity, and indeed the notion of infinity was axiomatic in this system, since no one could imagine what could lie on the other side of the end of the universe. Mass was constant and the laws of the conservation of mass and energy almost religiously adhered to.
The more modern relativistic theories, largely based on the contributions of that unique intellect Albert Einstein, have changed our perspective considerably. Not only has the notion of the interchangeability of mass and energy, along with the attendant enormous social consequences, been introduced, but even the parallel universe of Euclidean geometry has been replaced by one in which parallel lines meet somewhere "out there." Furthermore, and getting to the present point, the coordinate systems themselves came to be seen as depending upon the nature of the bodies contained within a volume and their states of motion. Modern relativistic physics (the Einsteinian model of reality) further suggests that time and space are not as separate as our older common sense notions would suggest.

The popular persistence of the "common sense" Newtonian notions of space and time is probably largely due to the fact that Einsteinian relativistic notions are primarily concepts that become relevant at levels of speed and size at which most people infrequently operate. The dimensions of our daily lives are well enough described by Newtonian ideas so that we never really deal with the discrepancies that physicists encounter when they are dealing with enormous or very small masses or great speeds.
In point of fact, even the idea that we use differing measuring instruments to measure time and space falls by the wayside in the illuminating light of the new clocks, which turn out to be the very same devices used as a ruler for the spatial dimensions. A modern clock is nothing more (or less) than a specific light emitted by some selected atomic substance. This is done by a process which, not too surprisingly, turns out to be the measurement with a spectroscope of the wavelength of the emitted line. The wavelength and the frequency of vibrations are then related by Equation (3.19) presented earlier in this book. The basic unit of time, then, is defined as the period it takes for a certain number of these vibrations to occur. In the world of atomic dimensions, space and time are no longer even separated by different measurement operations.

Thus, the major new contribution of Einsteinian relativistic physics may be viewed as the fact that space and time are now considered to be but alternate attributes of a multidimensional continuum, and neither can be considered independently of the other. Capek (1961) points out that this notion of the inseparability of space and time has often been misunderstood to mean that time has been reduced to simply another spatial dimension ("the fourth dimension"), but he emphasizes that in fact this is a misinterpretation of the real significance of the idea. He suggests, rather, that the new unity is "more accurately described as a temporalization or dynamization of space than a spatialization of time." One of the advantages of the approach that considers time as not just another spatial dimension is that time need no longer be forced to fit the same set of characteristics exhibited by any of the other three spatial dimensions. It need not be required, for example, that time be bidirectional, a notion implicit in its assignment of the status of simply another spatial dimension.
In bidirectional time, time machines and perpetual motion and a host of other science fiction ideas take on validity which still seems questionable. If Capek is right that we should dynamize or temporalize the three spatial dimensions rather than adding time as a fourth dimension, then we are able to hold on to a very important notion: the unidirectionality of time. In a notable modern consideration of the directionality of time, Time's Arrow and Evolution, Blum (1955) points out that there is a directionality imposed on time that is not imposed on a spatial dimension. That is the second law of thermodynamics, a physical law which he believes is time's "arrow," imparting a directionality to the sequence of events. The idea is that, on the whole, the universe must be running downhill to a state of greater and greater entropy or disorganization. In local regions, such as areas of time-space where ontogenetic or phylogenetic development is occurring, decreases in entropy (increasing order) can appear to occur, but this is at the expense of energy resources at other places in the universe. Thus, while organic evolution can produce an increasingly organized and more complex species, and while the individual grows from a state of minimum order to a greater degree of order, these local reversals of time's arrow represent anomalies, and the true overall state of the universe is one of increasing disorder. The implications of modern relativistic theory linking time and
space and of this notion of unidirectional time are enormous, and many philosopher-physicists have spent and will continue to spend a not inconsiderable sum of intellectual energy trying to fully interpret their significance. It is most important to note that psychobiological data and theory may also be impelling neurophysiologists and psychologists in exactly the same direction. If, as we have noted, temporal stimulus patterns are often represented by spatial neurophysiological patterns and spatial stimulus patterns by temporal neurophysiological patterns, then it is meaningless to talk about the spatial code independent of time. A simple spatial plot of the events involved in the neurophysiological representation of the localization of a sound source, for example, obscures the most critical information involved. Similarly, a "temporal snapshot" is not too unreasonable a description of the time function usually plotted as a result of single microelectrode experiments.

These ideas suggest the current emergence of a new sort of relativism in psychobiology, just as, at the turn of the twentieth century, physics evolved a new concept of space and time in its domain. Time and space (as well as, to some degree, intensity and quality) should probably be looked at together rather than separately as we have in the past. Of course, this sort of conceptual revolution is not going to come about immediately, and we shall have to depend upon the existing sorts of data until new methodologies, new equipment, and, perhaps most important, new perspectives evolve.

As for the questions of what is space, what is time, or even what is psychobiological space-time, we still are confronted with very much the same problem that we had when we tried to define the intensive dimension. There just are no adequately precise definitions possible.
There are only conventions of general agreement and certain classes of operations that act as exemplars of what the concepts of time and space mean to experimenters. It is probably the case that a better picture of what is meant by time and space will emerge from the discussion that follows in this chapter than from any attempt to specifically define such abstruse concepts.

II. SPATIAL AND TEMPORAL ACUITY
Of all the neurophysiological correlates of perceptual phenomena, perhaps none is so simple to handle conceptually as is the problem of the loss of temporal and spatial acuity that occurs as one passes from the peripheral neural response to the psychophysical response. Yet, perhaps, no single concept is as fundamental and with such widespread implications. It is a fact that we are not able to separately feel or see two stimulus points that are closer than a certain minimum distance or to distinguish, as successive, two events that are closer than a certain minimum time interval. This is so in spite of the fact that the neural responses are quite distinct either spatially or temporally when one looks at the peripheral neural response. Yet centrally the neuronal response does display a corresponding loss in discriminability as it is distributed in both time and space far beyond the limits of the peripheral response.
We refer to the perceptual phenomena, which we shall discuss generally, as temporal and spatial acuity. The notion of spatial acuity includes, among other more restricted notions, the historic and fundamental ideas implicit in the measurement of the two-point threshold. The notion of temporal acuity has been generalized, on the other hand, into the more comprehensive concept of the "psychological moment." The moment is the indivisible period of time in which, at least some workers thought, subjects lost the ability to determine both simultaneity and successiveness. We shall discuss later the variations on the theme of the psychological moment that have emerged in recent years.

In the neurophysiological domain, psychophysical spatial acuity is embodied in the notion of the "receptive field." The receptive field is defined for some specific neurological structure, however, and this is sometimes forgotten in the psychophysical literature. A receptive field of a neuron, for example, is defined as that portion of the total receptive surface that is capable of eliciting a response from that neuron. It is important to remember that the fields of interaction, which are measured psychophysically, may not be identical to the receptive field of a specific and single neural unit. The psychophysical finding may, on the other hand, reflect the action of complex systems of overlapping neural receptive fields. Nevertheless, there are conceptual, if not anatomical, correspondences between the two sets of data, which make a coding analysis such as the one that follows useful. Again the reader is reminded that the relation between the spatial extent of the percept and of the nerve is just as much a part of the coding problem as is the relation between subjective magnitude and action potential frequency, with which he is more familiar.
The general problem of the loss of information as the signal ascends the nervous system is emphasized in the subsequent discussion of temporal and spatial acuity. Such data are critically important to our understanding of the rate and flow of information processing in the nervous system. Furthermore, the clear contrast between the acuity evidenced in the peripheral nervous response and that evidenced in the psychophysical response is one of the most obvious clues to the sort of pitfalls one can encounter when one attempts to explain psychophysics by referring to neurophysiological codes far removed from the central level at which the final decoding actually takes place. A model of coding based upon the characteristics and codes of the peripheral nervous response ignores the fact that a considerable amount of further processing, including the loss of information, has occurred between the two levels. It ignores the fact that information has, even at that level, already been encoded several different ways. It thus attributes a uniqueness to one level or one code, which is far from the reality of the situation.

A. Convergence and Divergence in Neural Nets

To begin our consideration of the neural coding of time and space, we shall consider two analogous concepts limiting sensory acuity. First, we must note the existence of the spatial receptive field, which tends to reduce spatial acuity in several sensory modalities. Because of the anatomical organization of many receptor surfaces, it is a general fact that stimuli applied to any portion of a relatively large area do not initiate activity solely in a single afferent neuron. Notwithstanding the presence of other neural processes which tend to sharpen the representations of stimulus patterns, the anatomical convergence of many inputs on a single output tends to make it less likely that the precise locus of incoming information can be identified.

Second, we should also note that there is a similar sort of diffusing effect limiting temporal acuity. Two stimuli that are separated from each other by less than some minimum interval cannot be resolved. This loss of acuity in the temporal domain is probably due to neural mechanisms which are direct opposites of those responsible for the loss of spatial acuity. Spatial acuity is mainly lost because of the convergence of many input fibers onto a single output fiber. Temporal acuity, on the other hand, is lost primarily because of the great divergence of input fibers onto a far greater number of output fibers, the responses of which are extended over a considerable period of time. A single neural response in a peripheral somatosensory fiber, which may last for only about 1 msec, can lead to the activation of literally millions of central neurons, whose cumulative action lasts for 1 sec or more.

The important physiological constructs of divergence and convergence that underlie these losses of temporal and spatial acuity were first described by Sherrington, a neurophysiologist who probably comes as close as anyone to playing the role for this science that Einstein did for physics. Figure 7.1 (a) and (b) are simplified sketches of neural convergence and divergence, respectively. As shown in part (a) of this figure, the basic premise of convergence is that many input fibers from different locations synapse upon a single output fiber.
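The two mechanisms just described can be made concrete with a small sketch. The Python fragment below is only an illustration under simplifying assumptions, not a model proposed in this chapter: spike trains are represented as lists of 0s and 1s in arbitrary time bins, convergence is treated as a logical OR of the input fibers, and temporal divergence is caricatured by stretching each spike over a fixed number of bins. The function names (converge, prolong, count_events) and every numerical value are hypothetical.

```python
def converge(input_trains):
    """Pool several input fibers onto one output fiber.

    Assumption: the output fires in a time bin if ANY input fired in it.
    """
    return [int(any(bins)) for bins in zip(*input_trains)]


def prolong(train, spread):
    """Stretch every spike over `spread` consecutive bins.

    A crude stand-in for temporal divergence (feedback loops, long chains).
    """
    out = [0] * len(train)
    for t, spike in enumerate(train):
        if spike:
            for k in range(t, min(t + spread, len(train))):
                out[k] = 1
    return out


def count_events(train):
    """Count separate runs of activity, i.e. resolvable events."""
    return sum(
        1 for i, s in enumerate(train) if s and (i == 0 or not train[i - 1])
    )


# Same spike time, different input fiber: the pooled output is identical.
from_fiber_a = converge([[0, 0, 1, 0], [0, 0, 0, 0]])
from_fiber_b = converge([[0, 0, 0, 0], [0, 0, 1, 0]])

# Two peripheral spikes three bins apart, before and after prolongation.
peripheral = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
central = prolong(peripheral, spread=5)
```

In this sketch a spike on either input fiber yields the same output train, so the timing survives but the locus does not. Likewise, the two peripheral spikes are counted as two events, but once each is prolonged over five bins they merge into a single run of activity.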
In such a system, while information can be preserved concerning the pooled timing of the responses of the family of input cells, it is not possible to tell, by observing the activity of the output cell, whence the input signal came. This holds true not only for the electrophysiologist who might be observing the output of the cell with a microelectrode but also for the decoding mechanisms of some psychophysical subject's central nervous system. The information needed to do so is simply no longer available, having been lost at the interface between the input and the output neurons.

Divergence, the opposing mechanism, is represented in Figure 7.1(b). Although divergence, as originally conceived by Sherrington, was a purely spatial concept (one input cell produces responses in a larger number of output cells), the notion as we shall use it is a more complex one. We shall be concerned with both spatial and temporal divergence such that both the number of cells involved in the response and the duration over which they are active are expanded. Spatial divergence is a simple notion and is explained fully by a single mechanism, but it should be noted that response duration can be expanded by two or more mechanisms. For example, a potent temporal extender would be a system of feedback loops, which maintain reverberatory activity. In such a network, a given cell would be reactivated either directly or indirectly by its own output. The reactivation might be mediated by a single synaptic relay or by the action of a highly complex neural net, but in each case the end result would be a prolongation
FIGURE 7.1 (a) A schematic diagram of spatial convergence in neural systems (labels: input cells, output cell, CNS decoder). (b) A schematic diagram of spatial and temporal divergence in neural systems (labels: input cell, spatial divergers, temporal divergers, output cells, CNS decoder).
of the time course of the original response. Another mechanism, which might also contribute to the prolongation of a neural response, is simply that chains of neurons of varying length may be involved. Though there are but a few (four or five) neurons in the typical direct sensory pathway, there is no assurance that very long sequences are not activated as responses course through the various levels of the central nervous system.

The end result of neural divergence, feedback, and the activation of chains of varying length is that a single spike action potential in a peripheral nerve can be extended into a central nervous system response that may last for many seconds. Information is certainly lost in this manner as the response is prolonged, for the ability to distinguish when a stimulus occurred, or whether there was more than a single stimulus, is diminished as the response it evokes is extended over a long period of time. The loss of psychophysical temporal acuity [for example, two action potentials that are
perfectly resolved at 1 or 2 msec in peripheral somatosensory nerves (Uttal, 1959; Rosner, 1961) can only be psychophysically resolved at 10 to 12 msec separation] is probably due to this sort of prolongation of the response duration at the level of the central nervous system.

In sum, neural divergence and convergence, while both necessary as integrating mechanisms, do tend to reduce the amount of available information: convergence by tending to establish a spatial receptive field in which differences in localization cannot be discriminated, and divergence by tending to establish a temporal period in which differences in time of occurrence cannot be discriminated. We shall now consider these two related notions in detail.

B. Spatial Acuity and Receptive Fields

1. The Two-Point Threshold: A Psychophysical Correlate of the Neural Receptive Field. The notion of the cutaneous two-point threshold, one kind of spatial acuity, was introduced into psychological research by Weber (1834). The classical two-point limen experiment involved the placement of the two points of a compass on the skin and the determination of the threshold separation required for the two stimuli to be discriminated as two rather than one point. One interesting early discovery was that the size of the two-point limen varies over the surface of the body.

The most modern determination of these size differences at different locations on the surface of the body has been made by Weinstein (1968). His study was a modern replication of the classic work done over a century earlier by Weber. It was intended to extend the measurements of the two-point threshold by examining the effect of sex and of laterality in addition to the dimension of body part used by Weber. Weinstein used a group of 24 men and 24 women, all of whom were right-handed, and the standard two-point compass as an aesthesiometer. Figure 7.2 presents the results of his work for men. An analysis of his data shows that, as Weber observed, by far the strongest effect was produced by varying the part of the body examined. With regard to the parameters of sex and laterality, Weinstein found that few of the comparisons showed significant differences.

From the very beginning, Weber hypothesized that the two-point threshold was probably due to neural convergence. His notion was that the terminal arborizations of a given cutaneous afferent fiber spread over a relatively large area (clearly an antecedent of the modern idea of a receptive field) and that stimulation anywhere within this area led to indistinguishable sensations. The drawings of these areas, sensory circles in Weber's terminology, suggest, however, that he did not believe that they were overlapping. Overlap is a relatively modern concept and is now an integral part of the basic notion of the receptive field.
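Weber's nonoverlapping sensory circles lend themselves to a very simple computational caricature. The sketch below rests on assumptions that are ours and not Weber's or Weinstein's: the skin is treated as a one-dimensional strip tiled by circles of equal width, positions are whole numbers in arbitrary units, and two points are taken to be indistinguishable whenever they fall within the same circle. The function names (same_circle, always_resolved) and the widths used are hypothetical and are not measured thresholds.

```python
def same_circle(p1, p2, width):
    """True if both points fall inside the same nonoverlapping circle."""
    return p1 // width == p2 // width


def always_resolved(separation, width):
    """True if a pair with this separation lands in different circles
    no matter where on the strip the first point is placed."""
    return all(
        not same_circle(start, start + separation, width)
        for start in range(width)
    )


# Illustrative widths only: a coarse region and a fine region of skin.
coarse_width = 40
fine_width = 2
```

Under these assumptions the smallest separation that is resolved at every placement equals the width of the circle, so a region tiled by wide circles shows a large two-point limen and a region tiled by narrow circles a small one. The sketch deliberately omits overlap, which the modern receptive field concept requires.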
A further development, which has become apparent since Weber's early studies, has been the emerging realization that the sensory "circle" on the limbs is not a circle after all, but probably more generally has an oval outline. This is particularly noticeable when small electrical shocks are used rather than mechanical stimuli. Presumably the oval shape of receptive fields reflects the fact that the nervous system is organized such that
FIGURE 7.2 A drawing showing two-point discrimination values as a function of body location for males (from Weinstein, 1968). [Plotted separately for the right side and the left side; vertical scale from 0 to 50; body locations: hallux, calf, thigh, belly, breast, upper lip, sole, back, cheek, shoulder, forehead, fingers 1 to 4, forearm, upper arm, palm.]
fibers run more along the limbs than across them. On the forearm particularly, the difference in two-point thresholds can be as great as 10 to 1 when one measures along rather than across the arm with electrical stimuli.

The visual analogue of the somatosensory two-point threshold is acuity. The general problem here is to determine how close two photic stimuli can be to each other and still be distinguished as separate entities. A wide variety of acuity measuring tests have been developed over the years, including grids, letters, and broken circles. These devices usually indicate that the maximum visual acuity is about 1 min of visual angle. However, by far the most sensitive acuity measures seem to be obtained for the "hanging wire." Thin lines, 2 or 3 sec of visual angle wide, a width which is far smaller than the width of a single cone in the primate eye [which according to Polyak (1941) varies from 12 to 18 sec of visual angle], can be detected if they are long enough! The last phrase, "if they are long enough," suggests that the limits on visual acuity are not solely determined by the dimensions of the retinal mosaic, or by receptive field size, but that some sort of integrative or statistical processing associated with interaction among adjacent regions is possible. Thus, a series of adjacent regions may each contribute to the collective process that detects the wire, even though no single region alone would have been able to elicit that response.

Visual acuity monotonically decreases as the stimulus moves further away from the fovea as shown in Figure 7.3, but most of the falloff occurs within the central 4 deg; after that the acuity function is relatively flat. Peripheral regions of the retina, where rods predominate, seem to have particularly poor acuity. This fact is probably attributable to the high ratio of convergence of rods on optic nerve ganglion cells, a ratio which may be as high as several thousand to one. An excellent and complete discussion of many other of the psychophysical details of visual acuity and of the determining parameters is available in Riggs' chapter in Graham's (1965) important book on vision and visual perception.

FIGURE 7.3 Visual acuity plotted as a function of visual angle. Note that once away from the central 4-deg region, there is little further decline in acuity out to 16 deg of visual angle (from Alpern, 1969). [Horizontal axis: from 16 deg in the temporal field, through the rod-free fovea, to 16 deg in the nasal field.]

Another means of formally specifying the ability of the eye to discriminate between adjacent objects involves the measurement of the subject's ability to tell whether a grating differs from a uniform field. The grating is generated either by means of photographic plates or on the face of a CRT in such a way that a time-stable intensity pattern varies sinusoidally across the field of the image as shown in Figure 7.4.

FIGURE 7.4 A spatial sine wave grating and a graph showing the brightness level across the grating (from Cornsweet, 1970). [Graph axes: intensity against position.]

Schade (1956) was probably the first to use this type of modulation transfer function (MTF) or contrast sensitivity measure (CSM), but recently the notion has been further developed by Campbell and Green (1965), Green (1968), and by Davidson (1968). A particularly lucid and complete discussion of the technique has recently been given by Cornsweet (1970). In brief, the contrast threshold of a sinusoidal grating of varying spatial frequencies (usually measured in cycles per degree of visual angle) is determined using the modulation ratio (the ratio of brightnesses of the darkest trough to the brightest peak) as a dependent variable. Figure 7.5 is a typical plot of the modulation ratio required for visual detection of the contrast of a green colored grating on a green background as a function of the spatial frequency of the grating. The ratio required is modest at low frequencies (broad bands), increases at intermediate frequencies (indicating that there is an optimal or resonant region), and then decreases steeply as
FIGURE 7.5 The results of an experiment in spatial sine wave detection. The subject's task was to detect a green grating against a bright green background. Data are presented for two different background brightnesses, with the open circles being the results for the brighter of the two backgrounds (from Green, 1968). [Panel label: green test, green background. Horizontal axis: spatial frequency (cycles/deg).]
the grid lines narrow and the spatial frequency increases beyond 10 cycles/deg of visual angle. Cornsweet has applied this notion to a far wider range of subject matters than just the notion of separable stimuli. His discussion also indicates how the simple modulation transfer function or contrast sensitivity measure can be used to predict the Mach band illusion. The reader is directed to that discussion for further details. However, it should also be noted that though the mathematical model of contrast sensitivity may adequately describe the phenomenon, this is not a necessary and sufficient proof that any particular neural interactive mechanism is in action. Several analogous mechanisms could produce similar results without showing any homologous structure.

Are there analogous phenomena in the auditory system? The answer to this question, which might initially come to mind, is the discrimination of differences in acoustical spatial localization, but as we have repeatedly pointed out, this phenomenon is probably not mediated by equivalent spatial neural mechanisms. Auditory spatial localization is a discrimination which, from a neural point of view, is quite unlike visual acuity; rather, it seems to be encoded by differences in the phase and intensity of the two transmitted dichotic signals. When considered from the perspective of the similarity of neural codes, the true analogue for spatial acuity on the skin or in the eye is auditory frequency discrimination itself. Two auditory stimuli produce different spatial patterns on the basilar membrane with maxima located at nearby points. This phenomenon is supposed to be the basis of pitch encoding and frequency (quality) discrimination and will, therefore, be discussed mainly in Chapter 10 when we consider theories of auditory quality coding.
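Returning for a moment to the grating stimulus used in the contrast sensitivity measure, its construction can be sketched in a few lines. The fragment below is an illustration only, under assumptions of our own: brightness is modeled as a mean level multiplied by one plus a sinusoidal term, position is sampled at equal steps in degrees of visual angle, and the modulation ratio is computed, as defined in the text, as the brightness of the darkest trough divided by that of the brightest peak. The function names (grating, modulation_ratio) and all parameter values are hypothetical and are not taken from Green (1968) or Cornsweet (1970).

```python
import math


def grating(mean, contrast, cycles_per_deg, width_deg, samples):
    """Brightness profile of a sinusoidal grating across `width_deg`.

    Assumed form: mean * (1 + contrast * sin(2 * pi * f * x)).
    """
    profile = []
    for i in range(samples):
        x = i * width_deg / samples  # position in degrees of visual angle
        profile.append(
            mean * (1.0 + contrast * math.sin(2 * math.pi * cycles_per_deg * x))
        )
    return profile


def modulation_ratio(profile):
    """Ratio of the darkest trough to the brightest peak."""
    return min(profile) / max(profile)


# One cycle per degree, mean brightness 100 units, 20 percent modulation.
test_grating = grating(100.0, 0.2, 1.0, 1.0, 400)
uniform_field = grating(100.0, 0.0, 1.0, 1.0, 400)
```

With these assumed values the trough and peak are 80 and 120 units, so the ratio is two thirds, whereas a uniform field gives a ratio of one. In an experiment of the kind described, the contrast term would be varied at each spatial frequency until the subject could just tell the grating from the uniform field.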
But the discrimination of similar tones must also be considered in the context of the present discussion, for this frequency discrimination appears to depend upon the ability to distinguish the patterns of stimulation on the cochlea in a way quite similar to that in which Weber's two-point limen was conceived to be operating on the skin. The corollary notion of receptive fields, namely, that they are overlapping, is also an especially important consideration in this auditory context, because the cochlear response patterns produced even by pure tonal stimuli are not highly localized, but rather are distributed across almost the entire surface of the cochlea.

Auditory frequency discrimination curves, which are the best analogues of the two-point threshold, can be measured in a number of different ways. All of the methods agree that the size of a just detectable change in frequency depends not only on the frequency of the stimulus, but also upon its intensity. The data, which are considered definitive in the description of auditory frequency discrimination, were obtained by Shower and Biddulph (1931) and have been replotted in an information-packed fashion by Licklider (1951a). This chart is reproduced in Figure 7.6. This three-dimensional representation shows that frequency discrimination can be extremely fine; over a large region of the surface, the subject is able to distinguish frequency differences less than 4 cycles/sec. A first approximation to explaining these data may be obtained by also plotting in Figure 7.7 data adapted from von Békésy (1949), in which the locus of the maximum
FIGURE 7.6 A plot of the surface representing auditory frequency discrimination as a function of acoustic stimulus frequency and amplitude. Auditory frequency discrimination is as good as 2 or 3 Hz in many parts of this space (from Licklider, 1951a, after Shower and Biddulph, 1931). [Horizontal axis: frequency (Hz), 63 to 16,000.]
vibration of the cochlear partition is plotted as a function of the frequency of the vibrating tone. The function is roughly linear, and this could be used to explain the flattened portion of the frequency discrimination curve. Equal distances on the cochlea could, possibly, be associated with equal just noticeable differences in the psychophysical domain. Clearly, however, the notion of simple two-point discrimination cannot be pushed too far as a detailed explanatory model of auditory frequency discrimination.

Nevertheless, the general notion of spatial interactions as an explanatory mechanism for frequency discrimination has been furthered by the models developed by von Békésy. The difficulties of working directly at the microscopic level of the cochlea led him to develop a hydraulic model of the cochlea, upon which a subject could lay his arm. The arm played the role in this model of the array of hair cells aligned along the basilar membrane, while the fluid-filled rubber model (shown in Figure 7.8) modeled the fluid-filled cochlea. Vibratory patterns introduced at one end of the tube (just as they would have entered the oval window in the real cochlea) set up traveling wave patterns on the surface of the rubber "cochlea," which were then picked up and "processed" by the somatosensory system.

Von Békésy's notions of the significance of the various parts of the model system have been misunderstood in part by at least some observers. The actual physical localization of a band of maximum activity at a particular place on the rubber model was a function of the rubber tube, and not
FIGURE 7.7 A graph showing the location of the maximum amplitude of deflection on the basilar membrane as a function of the frequency of the stimulus (from von Békésy, 1949a). [Curve label: localization as measured on basilar membrane. Horizontal axis: frequency (Hz), 20 to 2000; vertical scale from 5 to 35.]
of any "neural interaction on the forearm." This point is well made by running one's finger along the tube. A maximum point of oscillation is clearly detectable at a particular place on the tube purely as a function of the frequency of stimulation and independently of whether the forearm is on the rubber tube or not. That maximum point is determined by the pattern of the traveling waves within the fluid-filled rubber chamber.

Von Békésy did, quite separately, emphasize the fact that there is further processing of this spatially localized information by the subject's nervous system. This sort of processing was assumed by him to be typified by a sharpening process, which tends to make the spatial experiences felt by the subject more precisely localized than the actual pattern of the stimulus transduced from the rubber tube would be expected to invoke. Such sharpening mechanisms were largely thought by him to be due to interactions among adjacent neural units, but as we shall see later, it is not completely clear that such neural sharpening really exists!

2. Neurophysiological Evidence for the Existence of Receptive Fields. Now
that we have discussed the psychophysics of the two-point threshold and of some related perceptual phenomena, we shall turn our attention to those neurophysiological mechanisms that may possibly underlie or encode those experiences. Basic neuroanatomical evidence suggests that some sort of
FIGURE 7.8 Photograph showing von Békésy's water-filled rubber tube model of the human cochlea. In this model, the arm is the model of the array of hair cells, and it is there that any lateral inhibitory interaction is analogized, but the tube does tend to spatially localize the different frequencies at different places independent of the presence of the arm (courtesy of Dr. Robert Cole, University of Hawaii).
convergence is present in all sensory systems and that, therefore, the receptive field type of organization is ubiquitous. For example, in the retina Polyak (1941) has estimated that there are over 130 million rods and over 4 million cones, but that in the optic nerve there are probably few more than 1 million ganglion cell axons. On the average, then, the responses of multiple retinal locations must be signaled along single axons, and this is exactly what is meant by the notion of a receptive field. More recent evidence has indicated that perhaps a few foveal cones have private communication lines, but that virtually all rods share ganglion cell axons. At the periphery, the information from several hundred rods may all ultimately converge on a single ganglion cell fiber. Similarly in the somatosensory system, although there are no synapses prior to the spinal cord, there are abundant ramifications of the cutaneous nerve fibers. This is also a sort of convergent mechanism for signals originating at different spatial locations. In this instance also, even though there are no many-to-one synapses, the information from the various branches is pooled along a single afferent nerve fiber. The number of auditory nerve neurons has been placed at about 50,000 by Gacek and Rasmussen (1961) for many mammals (pig, cat, and monkey), while the number of receptor hair cells in their cochleas is only about 25,000. This is exceptional and, thus, once again points out the somewhat peculiar features of the auditory receptor mechanism. Nevertheless, as we saw earlier in the chapter on receptor anatomy, there is an overlapping sort of innervation which allows a specific region of the organ of Corti (a "receptive field"), including many hair cells, to be serviced by a single cochlear nerve axon. Furthermore, according to Stevens and Davis
Sensory Coding
FIGURE 7.9 A schematic drawing of the arrangement of nerve fibers in the basilar membrane, showing the extensive opportunity for lateral inhibitory interaction and the convergence of the outputs of the receptors onto single auditory nerve fibers (from Wever, 1949).
(1938), single cochlear nerve fibers make contact with a very large number of external hair cells, although only two or three internal hair cells are connected to a single cochlear nerve neuron. Figure 7.9 shows the elaborate nature of this convergence.

It should be noted that the only way to determine the shape of the receptive field of a single neuron is to carry out an experiment with a particular and characteristic design. An appropriate stimulus is used to explore a region that might possibly be associated with the responses of a certain cell. By observing the evoked activity of the single neuron as the stimulus is moved about in the potential receptive field, its actual receptive field can be very precisely determined and its extent plotted.

However, the first sensory modality in which receptive fields were plotted was the somatosensory system, and single neuron responses were not the physiological indicators used. Sherrington conceived of the idea of using reflex action as a means of mapping the "dermatomes" of whole somatosensory nerves. Dermatomes were clearly precursors of the modern notions of receptive fields. In a volume of his collected works, Sherrington (1940) describes the techniques for mapping the dermatomes on the limbs of Macaque monkeys, using blood pressure or the response of antagonistic muscle groups as the indicator of a response. His drawings, one example of which is given in Figure 7.10, illustrate how precise these techniques were for the definition of the field of action of an entire nerve. The technique typically involved the surgical isolation of a single spinal
FIGURE 7.10 A drawing of the hand of a monkey, showing the distinct borders between the regions serviced by different sensory nerves (from Sherrington, 1940). Labeled in the figure: border of the 7th cervical skin-field.
nerve trunk, whose activation would give rise to the specific reflex. The skin was then probed and stimulated and a map prepared of the regions that would give the reflex. The area within which a response could be evoked was considered to be served by the single remaining nerve trunk. This work was done in the late 1800s and antedated the development of the modern electrophysiological techniques for single cell analysis.

In more recent times, electronic microtechnique has been applied to the same problem, and some important new insights into the degree of convergence (and, as we shall also see, the degree of divergence) have been obtained. A most graphic representation of the size and shape of peripheral receptive fields in single neurons is given in Figure 7.11, which shows photographs of the hand of a monkey with two types of receptive fields drawn in. This picture (from Talbot, Darian-Smith, Kornhuber, and Mountcastle, 1968) shows the details of receptive fields of neurons in the median nerve of a Macaque monkey. Mountcastle (1961) also points out that the receptive fields for somatosensory cortical cells are up to 100 times larger than the peripheral neuron ones, indicating further spatial convergence at higher levels of the nervous system.

Perhaps the most extensive set of measurements and the most precise specification of receptive fields have been obtained in experiments on the visual system. While a number of relatively conventional recording systems have been used by numerous investigators, one of the most modern and most interesting is the computerized system developed by Spinelli (1967) to plot the receptive fields of the cat's optic nerve fibers. Figure 7.12 displays his complete system. Stimuli were presented on a 25 deg by 25 deg field by positioning a spot (either black on white or white on black) by means of an x-y coordinate servosystem, controlled by the output of a small digital
FIGURE 7.11 Drawings, superimposed on photographs of the monkey's hand, showing the shape and extent of the receptive fields of single afferent fibers (from Talbot, Darian-Smith, Kornhuber, and Mountcastle, 1968).
computer. The computer also picked up spike potential activity with microelectrodes inserted into the optic nerve and counted the number of spike action potentials occurring when the stimulus was positioned at a particular location. To determine the extent and shape of the receptive field of a given optic nerve neuron, Spinelli programmed the computer to move the spot along 50 horizontal lines, each 1/2 deg apart and each 25 deg long. Counts were made of the amount of activity evoked at each 1/2-deg separation in both directions. The data were then plotted on a two-dimensional CRT display in the following manner. A spot of light was plotted in each of the 50 X 50 regions defined by the scanning procedure only if the number of nerve action potentials exceeded some criterion. Figure 7.13 is a sample of this sort of plot. The two displays in this figure result from an experiment in which the direction of scan [(a) horizontal, (b) vertical] was varied, while all other dimensions were held constant to determine if direction of motion of the stimulus spot was an important parameter in defining receptive field shape. The similarity in shape and distribution of the dots in each case illustrates the relative indifference of the shape of the cat's optic nerve receptive field to the direction of movement of a stimulus. It should be noted, at this point, that this result
FIGURE 7.12 A diagram of the experimental apparatus used by Spinelli to automatically plot out the receptive field of visual system neurons. Note the PDP-8 computer, which automatically controls the presentation and location of the stimuli as well as tabulating the pulse modulated data generated by the Schmitt trigger, a device that emits a standard pulse whenever a nerve impulse crosses a criterion amplitude (from Spinelli, 1967). Components labeled in the diagram include the microelectrodes in the optic nerve, the reference electrode, an audio amplifier, the Schmitt trigger, the PDP-8 analogue outputs, the x- and y-axis servo amplifiers, the servomotors and servocontrol potentiometers, a four-channel monitor CRT, a slave CRT, and the event pulse line.
is species specific and that we shall discuss later in this chapter retinal directional sensitivity in both the rabbit and the frog. This sort of plot also begins to give us an idea of the complexity of the visual receptive field. The particular field shown is not uniformly excitatory, but in this case there is a central region in which a stimulus seems to inhibit the response to a level below spontaneous activity. Spinelli found this concentric organization to be very common, thus confirming Kuffler's (1953) earlier work. Other cells, particularly those with oval receptive fields, were organized in just the opposite fashion. An active central region
FIGURE 7.13 Two plots (a) and (b) of the same receptive field of a ganglion cell from the cat's retina. The two figures were plotted with horizontal and vertical scanning, respectively. Slight difference between the two indicates a modest directional sensitivity of this particular ganglion cell (from Spinelli, 1967).
was surrounded by an annular region of inhibition, in which the activity, rather than being evoked by stimulation, was actually depressed below spontaneous levels. Spinelli's work, as well as illustrating a new computer technique that is an important advance over the usual single cell microelectrode time function paradigm, also illustrates, once again, the possible complexities of receptive field organization. In the cat's retina, at least, receptive fields appear not to be either simply shaped or monopolar. Regions of increased excitation or inhibition of ongoing activity may be enclosed by regions of the opposite polarity. Furthermore, Spinelli's attempt to determine if any directional sensitivity was present reflects the most important fact that the shapes of the receptive fields in some animals are determined by other features of the stimulus than simple position. Thus, a stationary spot might define one sort of receptive field for a given neuron, while a moving slit might give an entirely different shape and size for the receptive field of the same cell. Independent of artifacts, such as scattered light, the intensity of the stimulus may also alter the shape or size of the measured receptive field. As we shall see later in this chapter, this sort of data has been generalized to a very important notion, namely, that some specific features of a stimulus may lead to activation of a given cell, but in the absence of that feature, the cell may be totally unresponsive to any sort of stimulation. The emergence of the notion that the receptive field structure of a given neuron is not rigid, but labile, and may also depend upon the space-time pattern of the stimulus has been one of the most important neurophysiological conceptual developments in the last decade.

The receptive field structure of neurons of the visual cortex has been studied by Spinelli and Barrett (1969); Baumgartner, Brown, and Schultz (1964); and by Hubel and Wiesel (1962).
Unfortunately, the three sets of data do not always agree, probably due to differences in technique. For example, Hubel and Wiesel (1962), whose work we shall discuss in greater detail later in this chapter, reported that all neurons in the cat's visual cortex (area 17) had elongated receptive fields, whose axis of symmetry was a line. As we shall more fully develop later, they also thought this to be associated with the fact that these cells were selectively sensitive to elongated "bar"-like stimuli. Many of their receptive fields were organized side by side, with adjacent inhibitory and excitatory regions.
On the other hand, Spinelli and Barrett (1969) found that very large numbers (at least 44 percent) of their recorded units had circular receptive fields with the concentric and antagonistic arrangement of inhibitory and excitatory regions with which we have already become familiar in the retina. Baumgartner, Brown, and Schultz (1964) found that fewer than 20 percent of the cells in the cortex had the characteristics described by Hubel and Wiesel and provided substantive support for the notion that large numbers of cells in the cortex had circular or disk-shaped rather than linearly organized receptive fields. It is thus clear that the last word has not yet been heard on this problem of the shape of the cortical receptive field. The possibility that shape changes under different dynamic conditions of stimulation remains with us. What is clear, though, is that neurons at all levels of the nervous system do exhibit receptive field organization resulting from the convergence of large numbers of receptor unit outputs onto the inputs of single more central cells.

Is there an auditory analogue of the neural receptive field? Coding considerations have already led us to the conclusion that auditory frequency discrimination is the analogue of the two-point acuity threshold in somatosensation and vision. Similarly, we might also argue that the receptive field notion is perhaps best represented in audition by the tuning curves of individual auditory neurons. Galambos and Davis (1943) had shown that each neuron in the cochlear nerve is characteristically activated by a band of frequencies that is progressively wider as the intensity of the stimulus increases. This leads to a V-shaped response area when threshold is plotted as a function of frequency and intensity for each neuron. Though we shall have much more to say of this phenomenon when we discuss neural quality coding, we can briefly introduce the idea here.
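The widening of the response band with intensity can be illustrated with a small numerical sketch. The fragment below is only an idealization under stated assumptions: the fiber's threshold is taken to rise linearly, in decibels, with the distance in octaves from a characteristic frequency, and the response area is taken to be symmetric. The characteristic frequency, best threshold, and slope are hypothetical values chosen for illustration and are not taken from Galambos and Davis; real response areas are generally asymmetric, and levels here are expressed in dB above an arbitrary reference rather than as attenuation.

```python
import math

def threshold_db(freq_hz, cf_hz=2000.0, best_threshold_db=20.0,
                 slope_db_per_octave=30.0):
    """Threshold of a hypothetical fiber at freq_hz.

    Assumption: threshold rises linearly (in dB) with the distance, in
    octaves, from the characteristic frequency cf_hz. This produces a
    symmetric V when threshold is plotted against log frequency.
    """
    octaves_from_cf = abs(math.log2(freq_hz / cf_hz))
    return best_threshold_db + slope_db_per_octave * octaves_from_cf

def response_band(level_db, cf_hz=2000.0, best_threshold_db=20.0,
                  slope_db_per_octave=30.0):
    """Return (low_hz, high_hz), the band of frequencies that drive the
    fiber at the given stimulus level, or None if the level is below the
    fiber's best threshold."""
    if level_db < best_threshold_db:
        return None
    half_width_octaves = (level_db - best_threshold_db) / slope_db_per_octave
    return (cf_hz / 2 ** half_width_octaves, cf_hz * 2 ** half_width_octaves)

if __name__ == "__main__":
    # The band of effective frequencies widens as the level is raised.
    for level in (10, 20, 50, 80):
        print(level, response_band(level))
```

In this idealization the "receptive field" in frequency is a single point at the best threshold and grows on both sides of the characteristic frequency as the stimulus is made more intense, which is the sense in which the response area is not fixed.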
Figure 7.14 shows the general pattern of these response areas for a few cochlear nerve cells. The V-shape of the response areas once again emphasizes the fact that the response area or receptive field extent is not fixed, but varies in width as a function of the amplitude of the signal. The shape of the typical response area was thought to vary as a function of the level of the auditory pathway at which neurons are being examined. Katsuki (1961) had suggested that the width of the response curves decreases the higher one gets in the nervous system until one reaches the level of the cortex. At the cortex, as in somatosensation but unlike vision, there appears, however, to be further convergent interaction, which once again broadens the receptive field, and so some workers now feel that the hypothetical sharpening does not exist (Simmons, 1970).

This brief introduction to the concept of receptive fields has been intended to emphasize a number of points. First, the general ubiquitousness of receptive field organization excludes the primitive idea of private line signaling from each point on a receptive surface to a set of corresponding points in the central nervous system. A more modern statistical notion of pattern interaction and average or relative neural activity then emerges. Second, the notion of the receptive field is essentially one that emphasizes the sequential loss of information. Two spatial points, though perfectly resolvable in terms of the stimulus or early neural response, cannot always
FIGURE 7.14 The classic representation of the response areas of auditory nerve fibers (from Galambos and Davis, 1943). Axes: frequency (Hz), from 100 to 20,000; stimulus level in dB below 2 V from the oscillator, from 0 to 110.
be discriminated in terms of either the later neurophysiological or psychophysical responses. As we shall see, the notion of a uniform receptive field independent of activity in its neighbors is, however, also a gross oversimplification. We also have to consider the fact that there are very important neural interactions occurring among the neurons both within and between receptive fields. Before we discuss these spatial interactions, however, we must consider another way in which sensory information is lost. In this case we shall not be concerned with the problems of spatial acuity and spatial receptive fields, but rather with the fusion of temporally discrete stimulus events into events that appear to be perceptually simultaneous.
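Before leaving spatial receptive fields, the automated plotting procedure described earlier for Spinelli's (1967) system can be sketched in code. This is a minimal simulation under stated assumptions, not a reconstruction of his program: the neuron is a hypothetical unit with an excitatory center and an inhibitory annulus, the radii, spike counts, and criterion are invented for illustration, and the scan simply visits a 50 X 50 grid of positions 1/2 deg apart and marks a position only if the count exceeds the criterion.

```python
import math
import random

GRID = 50        # 50 scan lines, 50 positions per line
STEP_DEG = 0.5   # spacing between positions, in degrees

_rng = random.Random(0)  # fixed seed so the sketch is repeatable

def spike_count(x_deg, y_deg):
    """Spike count of a hypothetical concentric unit centered at (0, 0).

    Assumed values: excitatory center (radius 2 deg) gives about 20 spikes,
    inhibitory annulus (2 to 4 deg) about 1, and spontaneous activity
    elsewhere about 5, each with a jitter of at most one spike.
    """
    r = math.hypot(x_deg, y_deg)
    if r < 2.0:
        expected = 20
    elif r < 4.0:
        expected = 1
    else:
        expected = 5
    return max(0, expected + _rng.randint(-1, 1))

def scan(count_fn, criterion):
    """Raster-scan the field and return a GRID x GRID map of booleans,
    True wherever the evoked count exceeds the criterion."""
    field = []
    for row in range(GRID):
        y = (row - GRID / 2 + 0.5) * STEP_DEG
        line = []
        for col in range(GRID):
            x = (col - GRID / 2 + 0.5) * STEP_DEG
            line.append(count_fn(x, y) > criterion)
        field.append(line)
    return field

def render(field):
    """Text stand-in for the CRT display: '*' for a plotted spot."""
    return "\n".join("".join("*" if hit else "." for hit in line)
                     for line in field)

if __name__ == "__main__":
    print(render(scan(spike_count, criterion=10)))
```

With a criterion above the spontaneous level only the excitatory center is plotted; lowering the criterion below the spontaneous level would instead plot everything except the inhibitory annulus, which shows how the choice of criterion shapes the displayed field.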
C. Temporal Acuity and the Psychological Moment

1. Masking and the Psychological Moment. Two sequential stimuli may be fused into a single sensory experience with no perceived temporal microstructure if they are separated by less than a certain minimum interval. The particular size of this minimum interval varies from one modality to another. These psychophysical data seem generally to be a function of the fact that very prolonged cortical responses are elicited by even the briefest of impulsive stimuli. As we have noted, a 1-msec spike can evoke a 1-sec-long cortical response. The ability to distinguish between two sequential
temporal events, a notion we might refer to as temporal acuity, is analogous to the spatial two-point threshold acuity measure. Is there also a temporal analogue to the notion of the spatial receptive field? The answer to this question is yes, and an extensive literature has grown in the last decade concerned with the problem of what has been called the psychological moment or quantum. The psychological moment is conceived of as a period within which there is an inability either to distinguish simultaneous from sequential events or to determine the order in which events occurred. In the following discussion, we shall concentrate mainly on the visual system, but it is clear that the lack of temporal acuity illustrated in this discussion is also a common feature of the other senses, although perhaps with different time constants. The interested reader is referred to Pieron (1952) for a comprehensive and comparative statement of the temporal acuity of the various senses.¹

It should be noted first that studies of the loss of temporal separateness, that is, of perceptual simultaneity (Lichtenstein, 1961; Fraisse, 1966) or of short-term visual storage (Averbach and Coriell, 1961; Eriksen and Collins, 1965) or of the psychological moment (Stroud, 1955; Allport, 1968) all essentially deal with a similar general problem: how are the finely divided moments of real time transduced into the far less highly resolved temporal units within our perceptual system? The studies just mentioned and a host of other similar ones make it clear that the ability to deal with time, as time, is also successively reduced in all sensory systems as one passes from the stimulus to the percept.
We do not yet know the complete details of this loss; however, a central hypothesis, which may help explain experimental findings concerning simultaneity, sequential masking, and the psychological moment, is that there is a temporal dispersion of brief stimulus events into longer neural and subjective events due to the neural divergent mechanisms described above. There is ample direct and indirect evidence that this temporal defocusing or slurring does occur, and that micromoments of real time in the world of the stimulus are extended into much more prolonged psychological and neurophysiological phenomena.

The slurring of temporal distinctiveness and the dimensions of the psychological moment are often measured in an experiment in which the masking effects of a subsequent or preceding stimulus on a test signal are assayed with some sort of a masking procedure. Masking experiments are of several different kinds, and each seems to assay the effects of one or more different kinds of physiological mechanism. For example, one of the earliest forms of masking was that in which a very bright flash followed and masked a dimmer one (Crawford, 1947). This sort of masking occurs only monoptically and is thus thought to reflect the effects of some peripheral mechanism. Masking with random dotted noise (which we shall discuss in greater detail later) is dichoptic (the mask may be presented to the eye opposite to that receiving the signal with little diminution in the masking effect), suggesting a central locus. Metacontrast masking does not require overlap of stimulus and mask; the other types do.

¹ Much of the material in the following section is adapted from Uttal (1971a).
The general result is that there is some sort of a functional relationship between the degree of masking and the interval between the two stimuli. This period of interaction is, at least in part, identified with the psychological moment and the prolongation of the stimulus event into a longer lasting perceptual event. Similar moments, but with different time courses, can be measured in the other senses too. The interested reader may want to look at Kahneman (1968) for one of the most comprehensive and thoughtful discussions of the methods, findings, and theories of the visual masking literature.

Most important, it is clear that during the prolongation of the stimulus event, there is no a priori reason to assume that each successive portion of the prolonged event is weighted equally with regard to its perceptual efficiency. Experiments on visual masking of the several different kinds (Kinsbourne and Warrington, 1962a, 1962b; Schiller and Smith, 1965; Kahneman, 1968; Uttal, 1969c) suggest that there is, quite to the contrary, a weighting of the efficiency of any given masking stimulus, depending upon how close it is to the masked stimulus. Herein lies the critical issue of the controversy surrounding the notions of psychological moments and perceptual simultaneity. The integrative mathematics, which is often used to analyze these events, is generally unable to distinguish the phase relations of the integrating period or moments (or whatever other name has been applied to the period of interaction surrounding the stimulus). Thus, like almost any correlation technique, the consolidation of experimental data into a simple statistical measure means a loss of some of the original information. Unfortunately, the phase information, which is lost when one pools data or performs a correlation, is the critical dimension in the controversy over discrete discontinuous moments and continuously sliding moments.
From this point of view, all experiments that pool data over many trials must be considered ill-equipped to resolve the controversy.

What is meant by these two terms, the discrete discontinuous moment and the continuous sliding moment? Both refer to an integrating period, in which information about the temporal order of stimuli is lost. This term "moment" was probably first used by Stroud (1955), when he described his ideas of a discrete and discontinuous interval in which order information is lost, but the general notion of an integrating period has been used by many other authors. The term is used here in the more general sense, denoting the temporal interval between two sequential stimuli in which they interact in one way or another. Simply stated, the two kinds of moments describe different ways in which the degree of interaction may be weighted as a function of the size of the interval between the two stimuli.

The notion of the continuously sliding moment is characterized by the assumption that a region in which interaction occurs surrounds every stimulus. This region is weighted differentially so that the closer one stimulus is to another, the greater will be the interaction. This idea is an analogue of the weighting function of the lateral inhibitory interaction described by Hartline, Wagner, and Ratliff (1956), but it operates in time rather than space. It might well be called the sliding window hypothesis. It should be noted that the sliding window hypothesis is very similar
to the notions of the temporal summation model of masking proposed by Eriksen and Hoffman (1963) and Kinsbourne and Warrington (1962a and 1962b), and further developed by Kahneman (1967), although the notion that weighted effectiveness of interaction is a function of the temporal distance from the stimulus is an added parameter not considered by some of these workers. The sliding window notion also has great similarity to the term "spread function" suggested by Baron and Krantz (1970);² one occasionally also hears the related terms "temporal receptive field," "psychological time quantum," and "sensory trace." Sperling and Sondhi's (1968) "impulse function" and Shallice's (1967) "moving average" also seem to denote similar ideas.

Another important consideration is that while we shall often refer to the window as a region surrounding the stimulus, implying that it occurs both prior to and following the stimulus, and while we often measure the function in this manner, the interaction is not truly bilateral. A period follows a stimulus event in which the trace of that event is capable of interacting with another stimulus event. Thus, a leading mask appears as a region of interaction preceding a test stimulus, if the criterion is the effect on the test stimulus; yet clearly what is really meant is the persistent effect of the leading mask on the trailing stimulus. The region of interaction surrounding a stimulus is, therefore, the period following it plus the period following any other stimulus that may have preceded it. This simplistic notion is weakened by the fact that even those masking situations that exhibit both leading and trailing masking (not all do) do not produce identical behavior in the forward and backward directions (Schiller and Smith, 1965; and Uttal, 1969a). This discrepancy is yet to be explained.
The usual alternative formulation, the discontinuous discrete moment notion, assumes that the weighting function is not differentially weighted and sliding with the stimulus, but is, rather, rectangular and not time-locked to the stimulus. From this point of view, any two events that occur within the same rectangular summation period would interact equally strongly, regardless of the interval between the two. The continuous functions of interval obtained in the several experiments mentioned above would, therefore, be produced by the statistics of the results of randomly placed periods of interaction over the many trials of any given experiment. As we have said, the main criterion that distinguishes the continuously sliding moment from the discrete discontinuous moment is whether the period of interaction is rectangular and unrelated to the stimulus or whether it is a continuous differentially weighted function surrounding and time-locked to the stimulus event. Both of these possibilities are schematically illustrated in Figure 7.15.

There is only one experiment (Allport, 1968), with which the author is familiar, that specifically tests this issue; all others, to the best of our present knowledge, are ambiguous. This elegant work of Allport is necessarily in the form of a demonstration rather than a statistical study using pooled data. Allport's display was a series of 12 horizontal lines, which were plotted on the face of an oscilloscope in ascending or descending order.

² J. Baron and D. Krantz, personal communication, 1970.
FIGURE 7.15 Two drawings suggestive of the alternative models of the psychological moment. (a) The discrete, nonoverlapping, rectangular, nonstimulus-related moment. (b) The sliding and weighted moving window closely linked to stimulus events. Both panels are plotted against time on the horizontal axis.
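The two alternatives of Figure 7.15, and the reason pooled data cannot separate them, can be made concrete with a short numerical sketch. Everything specific in it is an assumption made for illustration: the exponential form and 50-msec time constant of the sliding window, the 100-msec duration of the rectangular moment, and the use of evenly spaced phases to stand in for the random placement of moments across trials.

```python
import math

def sliding_window_weight(interval_ms, time_constant_ms=50.0):
    """Interaction strength under the sliding, weighted window model.
    The exponential fall-off with interval is an assumed form."""
    return math.exp(-abs(interval_ms) / time_constant_ms)

def discrete_moment_single_trial(t1_ms, t2_ms, moment_ms=100.0, phase_ms=0.0):
    """Interaction on one trial under the discrete moment model: 1.0 if
    both events fall within the same rectangular moment, otherwise 0.0.
    phase_ms sets where the moment boundaries lie on that trial."""
    same_moment = (math.floor((t1_ms - phase_ms) / moment_ms)
                   == math.floor((t2_ms - phase_ms) / moment_ms))
    return 1.0 if same_moment else 0.0

def discrete_moment_pooled(interval_ms, moment_ms=100.0, n_phases=1000):
    """Average of the single-trial result over evenly spaced phases,
    imitating the pooling of many trials whose moment boundaries are
    unrelated to the stimulus."""
    total = 0.0
    for k in range(n_phases):
        phase = moment_ms * k / n_phases
        total += discrete_moment_single_trial(0.0, interval_ms,
                                              moment_ms, phase)
    return total / n_phases

if __name__ == "__main__":
    for interval in (0.0, 25.0, 50.0, 75.0, 100.0):
        print(interval,
              round(sliding_window_weight(interval), 3),
              round(discrete_moment_pooled(interval), 3))
```

On any single trial the discrete model yields only all-or-none interaction, but once trials are pooled it yields a smoothly declining function of interval (here approximately 1 minus the ratio of interval to moment duration). A smoothly declining pooled masking function is therefore compatible with either model, which is the ambiguity described in the text.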
Once the topmost or bottommost line was drawn, the display recycled starting at the lowest or highest line. The subject's task was to observe this pattern of moving lines when certain ones were omitted, thus producing a moving dark band. Allport's analysis involved the timing of the omitted bands. If psychological moments were randomly placed, then the dark band would appear to move in a direction opposite to its real physical movement. If, on the other hand, the psychological moment were continuous and "traveling" or "sliding," then it should move in the same direction as its real physical movement. All subjects participating in the demonstration reported the dark band to move in the direction of the real physical movement. Allport concludes, therefore, that there is no support for the former of the two alternatives mentioned above, which he refers to as the "Discrete Moment Hypothesis" (following Stroud, 1955). He believes that his results support the notion that he calls the "Continuous or Traveling Moment Hypothesis."

It is now believed that the perspective of modern neurophysiology also renders implausible the notion of the randomly placed, rectangular, and discrete moment. Allport's carefully conceived experiments add further credence to the alternative point of view: that each stimulus event, no matter how brief, is surrounded by a temporal region in which it can interact with other stimuli. The work of many investigators in the field
suggests further that this region of interaction is weighted by a function, which is dependent upon the interval between the two stimuli. We further assume, although this is subject to doubt, that this region of interaction slides continuously along the real-world time line. Questions of simultaneity arise because of indiscriminably small differences in the weights of sequential events.

Recently, Ross³ has pointed out that while the moments may be continuous and traveling or sliding, it is also possible that they may not be constant in duration in different experimental contexts. It may be that the duration of the moment can be affected by the duration of the stimulus event, for example; if so, this would add a great deal of complexity to the problem as we know it today. It also would tend to emphasize the central nature of short-term sensory storage at the expense of peripheral explanations.

If, indeed, moments are continuous and sliding, some possible experimental questions become nonsensical, while others become empirical issues eminently susceptible to quantitative measurement. The search for the phase relationships of rectangular and discrete moments, in this light, is a vain search for a nebulous chimera. At the very best, we shall always be bounded by the problems of data pooling; at the very worst, discrete moments may simply not exist. On the other hand, if we accept the notion of the sliding moment as a conceptual model and as an initial axiom, we can make meaningful statistical measurements of temporal interaction.

2. An Experimental Analysis of the Moment. While it is not, by any means, yet clear what the specific neurophysiological mechanisms are, other than the general notion of divergence, which have to be invoked to explain the psychological moment, it is possible to measure its dimensions. Perhaps one of the most direct ways is to use a computer controlled display and data acquisition procedure.
In the following section, we shall discuss a relatively novel experimental masking approach to the problem of the psychological moment, which will help to concretize some of the preceding discussion. The author (Uttal, 1969c, 1971a) has studied the duration of the psychological moment by a masking test, which involved the discrimination of a set of dots making up a test pattern (like an alphabetical character) from bursts of random noise dots presented at some interval following the pattern. The general idea of this procedure is that the perceptual events initiated even by the very brief stimulus presentations are prolonged, and the responses initiated by the dots of the stimulus and the dots of the mask will interact to a degree dependent upon this prolongation. Presumably the subject's discrimination is based upon an ability to discriminate between different levels of the residual brightness of the fading sensory trace associated with each of the two stimuli. This computer-controlled experimental procedure is capable of giving especially precise control over stimulus presentation because all of the stimulus materials were generated on the face of an oscilloscope. The oscilloscope had a special phosphor, the light output of which decayed in less than 50 μsec to imperceptibly low levels. The interaction between the two stimuli measured by the masking technique was, therefore, purely a function of temporal dispersion introduced by the various parts of the nervous system.

³ J. Ross, University of Western Australia, personal communication, 1971.

FIGURE 7.16 (a) An example of the dotted characters used in the dot masking experiments. (b) An example of the dotted noise used in the dot masking experiment. (c) An actual photograph taken from the oscilloscope of the letter E embedded in 100 random noise dots. This character can be seen better in tachistoscopic exposure than when viewed continuously.

Figure 7.16(a) shows a typical dotted alphabetical character. Figure 7.16(b) shows a typical random masking dot pattern of intermediate density. This random dotted noise may either be presented in a single burst or as a series of dots distributed over an extended period of time. In the latter case, it is referred to as dynamic visual noise or DVN. Depending upon the interval between the individual dots in the DVN, the eye is capable of integrating or slurring a certain number so that many appear to be simultaneously present. Figure 7.16(c) shows the mixture of the dotted character and the random noise. Only Figure 7.16(a) and (b) are really physical entities. The mixture in Figure 7.16(c) is produced by the same temporal slurring we have spoken about. Depending upon the numbers of signal and noise dots in the mixture, the alphabetic character will be more or less recognizable. It should be noted that this technique will tend to underestimate the duration of the persistence of the visual image. Most certainly there will be some interval between the two stimuli at which the subject will be able to completely discriminate the dots of the character from those of the noise (the residual brightness will be sufficiently different), even though both may still appear to be present to some degree at that and longer durations. Nevertheless, with this caveat in mind, a good idea of the minimum duration of the interaction between sequential visual stimuli can be obtained with this technique.
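The construction of the stimuli in Figure 7.16 can be illustrated with a minimal sketch. It is not the program used in the original experiments; the 5 x 7 dot matrix, the 64 x 64 display grid, the dot spacing, and the function names are all hypothetical choices made only for illustration. The sketch builds a dotted letter E, a burst of 100 random noise dots, and the mixture an observer would see if the two were slurred together in time.

```python
import random

# Hypothetical 5 x 7 dot-matrix letter "E"; '#' marks a lit dot.
LETTER_E = [
    "#####",
    "#....",
    "#....",
    "####.",
    "#....",
    "#....",
    "#####",
]


def character_dots(rows, origin=(20, 20), spacing=2):
    """Return the set of (x, y) display positions of the lit dots of a character."""
    ox, oy = origin
    return {
        (ox + col * spacing, oy + row * spacing)
        for row, line in enumerate(rows)
        for col, cell in enumerate(line)
        if cell == "#"
    }


def noise_dots(count, size=64, rng=None):
    """Return `count` distinct random dot positions on a size x size display."""
    rng = rng or random.Random()
    cells = [(x, y) for x in range(size) for y in range(size)]
    return set(rng.sample(cells, count))


signal = character_dots(LETTER_E)
noise = noise_dots(100, rng=random.Random(0))  # fixed seed for repeatability
mixture = signal | noise  # the percept if character and noise fuse in time
```

Only `signal` and `noise` correspond to physical displays; `mixture` stands for the fused percept, which in the experiment is produced by the observer's nervous system rather than by the oscilloscope.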
Figure 7.17 presents the results of a typical masking experiment (Uttal, 1969c), in which the interval between the stimulus and a mask consisting of a series of randomly positioned DVN dots was varied in both the leading and trailing configuration in separate experiments. As the interval increased in either case, there was a corresponding increase in the
FIGURE 7.17 The degree of masking as a function of the interval between leading and trailing dynamic visual noise (DVN) of two different densities. Leading masking is seen to have a slightly more persistent effect than trailing (from Uttal, 1969c). [Graph not reproduced. Horizontal axis: interval between DVN and character (msec), shown separately for leading and trailing noise; curves: 3-msec DVN and 1-msec DVN.]
proportion of the number of presented characters that are correctly identified. Though the results are not exactly symmetrical, it does appear that there is nearly perfect identification at about a 30-msec interval when the mask trails the character and at about a 40-msec interval when the mask precedes the character. This is necessarily an estimate of the duration of the period of interaction between the character and the interfering noise that must be on the low side for the reasons mentioned above.

When the masking dots are all presented in a very brief burst following the signal character, there is also a period of interaction measured, which is almost identical to that measured with the distributed DVN masking noise. Figure 7.18 shows this result only for the backward case, but a comparison of these data with those of the right-hand side of Figure 7.17 confirms the near identity of the duration of the period of interaction measured with either of these two methods.

Perhaps a better estimate of the true duration of the moment can be obtained in an experiment in which the signal character is both followed and preceded by a masking train of dots (see Figure 7.19). The character in this case will be in a temporal hole between the two masking DVN trains. In this case, the period of interaction is not simply the sum of the periods obtained for the forward and backward cases, but a much longer period (about 150 msec) as shown in Figure 7.20. Thus, hole size is much more
FIGURE 7.18 The effects of variation of the number of masking dots and the interval between the character and the burst of masking dots on the recognizability of alphabetic characters (from Uttal, 1971a). [Graph not reproduced. Horizontal axis: interval (msec); curves: 10, 20, 40, 60, 80, and 100 masking dots.]

FIGURE 7.19 The design of the "character in the hole" experiment, in which an alphabetic character was inserted in the temporal gap between a train of leading and trailing dynamic visual noise. [Diagram not reproduced. Labels along the time axis: leading noise, the hole containing the character, trailing noise.]

FIGURE 7.20 Results of the "character in the hole" experiment for two DVN densities (from Uttal, 1969a). [Graph not reproduced. Horizontal axis: hole size (msec); curves: 3-msec DVN and 1-msec DVN.]
than twice the sum of the durations of the forward and backward effects separately.

In summary, there is a loss of both spatial and temporal acuity because of both convergent and divergent organization of the sensory pathways and central sensory structures. The notions of receptive fields and psychological moments, however, only begin to suggest the full complexity of the spatial and temporal interactions that occur among neurons at various levels of the sensory pathways. A major recent development in neurophysiological thinking has been the emergence of ideas concerning the spatial interaction of adjacent spatial stimuli in a way that transcends the simple fusion of the receptive field. Similarly, integration of or interaction among stimuli arriving at different times also has emerged as a major new concept related to a large number of psychological phenomena, especially those that have to do with the localization of some stimulus in space. As we discuss these processes, we shall see further examples of situations in which spatial aspects of stimuli are encoded by time, as well as ones in which temporally fluctuating stimuli are encoded by spatial patterns.

The general problem that we now face is to describe and analyze these interactive processes and to draw from them the general principles that will be most instructive in helping the reader understand the operation of all sensory systems, regardless of the modality in which the observation was initially made. We shall turn first to an extensive consideration of studies of spatial interactions; then we shall briefly mention temporal interactions.

III. SPATIAL INTERACTIONS
It has probably been implicitly known for millennia by artists that their works must be slightly different from pure pictographic representation to give a realistic view of the world. The use of linear perspective, a mathematical notion formalized by Brunelleschi in the early days of the Renaissance, for example, was introduced into drawing by Masaccio in the early fifteenth century, thus adding a third dimension to an art form, which had been mainly two-dimensional up to that time. Even earlier, many artists had realized that the apparent colors and brightnesses of a given area were dependent not only upon the constituent pigments, but also upon the spatial relationships of the area to its surroundings. This was an implicit, if not an overt, expression of the fact that nearby areas interacted to give different perceptual effects than would otherwise have been expected.

A simple display, which illustrates this sort of spatial interaction, is the famous Hermann grid illusion reproduced in Figure 7.21. At the intersections of the white grid lines, there appear to be greyish patches everywhere except at the point of fixation. The effect is more pronounced for smaller grids than for broader ones. The most important characteristic of this sort of "illusion" is that it is very strongly influenced by the geometry of the stimulus pattern. Other visual illusions seem not to be so strikingly dependent upon the local geometry, but rather vary in a uniform way over large regions. The classic simultaneous contrast illusions are of this sort, and later we shall discuss why it is believed that these two sets of phenomena may not be as closely connected as they have been thought to be by many current writers. In the following material, we shall concern ourselves with the class of effects that are dependent primarily upon local geometry and that, therefore, seem amenable to a neural coding analysis. To do so, we must first consider the nature of the perceptual phenomena, for which we shall seek a physiological explanation.

FIGURE 7.21 The Hermann grid illusion, demonstrating the grey areas at the intersection of the white grid lines.
FIGURE 7.22 A graph showing the lateral inhibitory interaction (B), which can be observed in a human visual psychophysical experiment using a stimulus as shown in (A), as a function of the spatial separation between the test (T) and "inducing stimuli" (I) and the brightness of the inducing field, measured with a reference field (R). P is a fixation point (from Leibowitz, Mote, and Thurlow, 1953). [Graph not reproduced. Horizontal axis: log inducing field (mL); separate curves for different angular separations.]
A. Perceptual Phenomena Related to Spatial Interactions

As noted, it is believed that the types of perceptual spatial interaction that do seem amenable to explanation by a simple nerve net interaction model are characterized by some functional relationship to the local geometry of the interacting stimuli. Two alternative experimental paradigms may be used to elucidate these geometrical functions. Inspections may be made of the interactions at regions of high information content, such as the boundary between large adjacent fields of different brightness or color. Alternatively, measurements may be made of the variation in the degree of interaction between two small regions that are placed at various distances from each other on the receptive surface. This latter category of experiment has often been used to establish the distances over which interactive forces can be exerted in visual experiments; for example, Leibowitz, Mote, and Thurlow (1953) used a pattern of rectangles arranged as shown in Figure 7.22 to measure the distance function in human vision. The task of the subject was to match a "reference field" in brightness to a "test field," which was in close proximity to an "inducing field." As the experimenter decreased the separation between the inducing field and the test field, the inducing field became more effective in lowering the subjective brightness of the test field of constant illuminance. Thus, the brightness of the test field had to be increased for the match. A measurable decrease in the subjective brightness of the test field could be
FIGURE 7.23 A graphic presentation of the perceptual effect produced by two mechanical stimuli applied to skin. When the two stimuli are presented close together, only a single fused sensation of low intensity and narrow spatial extent is felt, but when the two stimuli are separated by a somewhat greater distance, both the amplitude and the spatial extent appear to increase. At even larger separations, the sensation is no longer fused, and the subject feels two points of modest amplitude (from von Bekesy, 1957). [Drawing not reproduced. Sensations on the lower arm are sketched for stimulus separations from 0.5 to 3.5 cm.]
measured even though the separations were as great as 9 deg between the test and inducing fields. As the illuminance of the inducing field increased, the perceived brightness of the test field also decreased. These findings are also shown in Figure 7.22. MacKavey, Bartley, and Casella (1962) in a related study showed a similar inhibitory interaction in human vision and also discovered that the effects of the first inducing field could be reduced if it, in turn, was inhibited by a second one. This latter finding is analogous to the physiological disinhibition phenomenon to be discussed below.

Von Bekesy (1957) has carried out a series of studies on the skin, which demonstrate phenomena very similar to those functional relationships obtained in visual studies. His findings are very important in furthering the notion of complex interaction in receptive fields, an idea that he has used as the theme of an entire book (von Bekesy, 1967). The experimental paradigm in this study involved the use of two-point stimuli applied to the skin, but von Bekesy, instead of using the conventional two-point threshold as a dependent variable, chose to draw what was the subject's impression of the magnitude and shape of the resulting sensation as a function of the separation of the two points. Though only quantitative in a limited graphical sense, this sort of data presentation can present an enormous amount of information in a clear and interesting manner. The results of this particular experiment, for example, are shown in Figure 7.23. Von Bekesy noted that there appears to be systematic variation in the
magnitude of the single sensation produced by two stimuli (within the two-point threshold) as they were moved closer and closer together. However, when the single percept separated into two distinguishable points (when the two stimuli were separated by a distance greater than the two-point threshold), the sensory magnitude of each point dropped substantially and did not increase again until the two points were quite far apart. This waxing and waning of the cutaneous magnitudes suggests that the notion of a purely convergent receptive field is probably a considerable oversimplification. Rather, there are further spatial integrations and interactions, which are superimposed upon the basic idea of convergent pooling.

If there is any single idea that has most captured the fancy of neurophysiologists in the last two decades, it may be said to be the notion of mutual lateral interaction among adjacent neural units. The interaction is said to be mutual because each unit seems to be depressed (or, less frequently, enhanced) reciprocally by the neighbor it inhibits. The polarity of the interaction, whether it is inhibitory or excitatory, is another issue of paramount importance. Some of the earlier studies suggest that the interaction is mainly inhibitory, but one of the important results emerging from von Bekesy's psychophysical experiments on human subjects is that the interaction often reverses polarity at different separations. These data suggest the notion of a central field, in which the two stimulus intensities can be summated to produce a sensory magnitude greater than that produced by either alone. This central region, it is further suggested, may be surrounded by an inhibitory annular field, in which a stimulus would produce a response which, in turn, might tend to actually diminish the magnitude of the response produced by a second stimulus.
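The reversal of polarity with separation can be made concrete with a small numerical sketch. It is an illustration only and makes several assumptions not found in the studies cited: the weighting function is taken to be a narrow excitatory Gaussian minus a broader, weaker inhibitory Gaussian, the widths and gain are arbitrary, and separation is in arbitrary units. It is not intended to reproduce von Bekesy's drawings, only the idea of a summating center surrounded by an inhibitory annulus.

```python
import math


def center_surround_weight(separation, center_sigma=1.0,
                           surround_sigma=3.0, surround_gain=0.5):
    """Net influence of a second stimulus at the given separation.

    Positive values mean summation, negative values mean inhibition.
    All parameter values are hypothetical.
    """
    center = math.exp(-(separation ** 2) / (2 * center_sigma ** 2))
    surround = surround_gain * math.exp(
        -(separation ** 2) / (2 * surround_sigma ** 2))
    return center - surround


def sensory_magnitude(separation, intensity=1.0):
    """Magnitude at one stimulus site when an equal second stimulus is present."""
    return max(0.0, intensity * (1.0 + center_surround_weight(separation)))


for separation in (0.0, 1.0, 3.0, 6.0, 12.0):
    print(f"separation {separation:5.1f}  "
          f"magnitude {sensory_magnitude(separation):.3f}")
```

With these assumed parameters the magnitude exceeds that of a single stimulus at small separations, falls below it at intermediate separations, and returns toward the single-stimulus value when the two points are far apart.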
As we shall see, there is substantial neurophysiological evidence to confirm this idea of concentric rings of opposite polarity in the vertebrate nervous system. To anticipate a bit, such an organization has become known as an antagonistic center-surround receptive field.

One of the most interesting, and nowadays familiar, perceptual phenomena associated with the interaction of spatial stimuli is the enhancement of contours, a phenomenon which was first described in detail by E. Mach (1865) and which is now known in his honor as the "Mach band." Recently, a very important book reviewing Mach's contributions to our understanding of the general problem of spatial interactions as well as his specific description of the contour enhancement effects has been published by Ratliff (1965). It would be hard to exceed the competent level or the sensitive and perceptive commentary that characterizes Ratliff's book, and the interested reader is, therefore, directed to that source for his full treatment. In the brief section that follows, we shall simply summarize the nature of the Mach band phenomenon as one compelling example of a psychophysical observation, which is probably mediated by simple lateral interactions among neurons.

In brief, if a stimulus pattern is presented across a retinal field such that a cross section of the physical light distribution can be represented by the solid line in Figure 7.24, the perceptual experience does not exactly follow the stimulus. Rather, a plot of the induced subjective magnitude would look more like the dotted curve in that figure. At the region joining
FIGURE 7.24 A drawing illustrating the basic phenomenon of the Mach band. A physical stimulus, which is simply a gradient, is seen (or heard or felt) as if there were a more intense band at the upper end of the gradient. The contour intensifications so illustrated are now believed by many psychobiologists to be mainly a function of lateral interactions in the nervous system. [Drawing not reproduced. Amplitude is plotted against distance for the physical and the subjective distributions.]
the area of constant low-level light with the beginning of the upward-going gradient, there is a depressed apparent brightness level, and a dark band thus appears. At the other end of the gradient where the gradient borders on the constant bright area, an apparent bright band occurs. Just how "real" these apparent bright and dark Mach bands can be is best illustrated by some of the anecdotes related by Ratliff in his book. He tells how astronomers, artists, and weavers, as well as psychologists, have all had to deal with some curious problems of credibility as a result of these illusory bands.

The important point in the demonstration of Mach bands is that the function representing the perceived brightness distribution varies considerably from the function describing the physical stimulus luminance distribution. It is sometimes difficult to get this notion across, for even in photographs the effect can come streaming through in a way that makes it difficult to convince an observer that the effect is mainly perceptual. This is so for two reasons. The first reason is that the photographic film, like the human eye, often has its own contrast enhancement mechanisms, not neural but chemical. The second reason is that the stimulus conditions, which are necessary for the evocation of the Mach band enhancement, are maintained even in a photograph.

Many auditory phenomena that appear to be analogous, at least in part, to these visual spatial interaction phenomena may also be cited. For example, the general masking of one tone by another is probably due to spatial interactions along the cochlea comparable to those producing the Mach bands. Figure 7.25 is Licklider's adaptation of Fletcher's interpretation of Wegel and Lane's (1924) now classic summary graph of the tonal interactions that occur between a 1200-Hz tone and a "secondary tone" of varying frequency.
In this drawing, the secondary tone is masked most effectively when its frequency is higher than that of the masking tone. On the other hand, when the masking tone is of a higher frequency than the secondary tone, then the masking is relatively ineffective. A wide range of other auditory interactions, probably also attributable
FIGURE 7.25 The classic masking curve, showing the variety of interactions that can occur between two acoustic stimuli. When the two tones are close in frequency, beating occurs. (In some instances, combination, difference, or summation tones can occur.) In other instances, one tone tends to mask the other, requiring a larger-than-normal masked tone before it can be detected in the presence of the masker. Many of these interactions can be best understood in terms of spatial interactions on the basilar membrane (from Licklider, 1951b, after Fletcher, 1929, and Wegel and Lane, 1924). [Graph not reproduced. Horizontal axis: frequency of secondary tone, 400 to 4000; the level of the primary tone is marked, and regions are labeled "primary only," "primary and secondary tones," "primary and difference tone," "primary, secondary, and difference tone," and "mixture of tones."]
to spatial interactions along the cochlea, is also represented on this figure. At some frequency combinations, difference and mixture tones, as well as beats, occur.

One of the most interesting elaborations of the auditory masking paradigm as an analogue of visual spatial interactions is Carterette, Friedman, and Lovell's (1969) demonstration of what they believe to be auditory "Mach bands." These workers demonstrated that there was a heightened masking between a narrow band of white noise and a pure tone when the frequency of the test tone was at either border of the band of masking white noise. The noise in their experiment was generated by a computer technique, which produced a stimulus that had very sharp cutoffs at the upper and lower limits (edges) of a 100-Hz bandwidth. The general technique involved the measurement of the increase in loudness, which was required to compensate for the masking as one scanned a frequency domain varying from 0.4 to 0.8 kHz. Figure 7.26 is a sample of their data, showing the gradual increase in loudness required for the detection of the test tone as its frequency is increased toward the lower edge of the noise band. In many cases, a decrease in the masking effect is observed when the tone is in the middle regions of
FIGURE 7.26 An auditory phenomenon that may be analogous to visual Mach bands. There is a peak in the masking function for tones near the edge of a band of masking white noise, the width of which is indicated by the two vertical dotted lines (from Carterette, Friedman, and Lovell, 1969). [Graph not reproduced. Horizontal axis: frequency (kHz); curves: 60 db, 40 db, and 20 db; the peaks are labeled auditory "Mach" bands.]
the noise band. Subsequently, there often appears another increase in the required amplitude of the test tone for detection as the test tone passes over the high-frequency edge of the noise band. It is the increases in the test tone level required for absolute detection at the edges of the noise band that Carterette and his colleagues believe to be analogues of the Mach band edge effects in vision.

In sum, there is a reasonably large number of spatially interacting processes observable in the various senses that are characterized by a dependence upon the local geometry of adjacent stimulus spaces. There are also, on the other hand, overall pattern effects such as simultaneous contrast, in which the entire stimulus object seems to be uniformly affected by its surround without any differentiation as a function of propinquity. It
is possible that these two sets of data represent two completely different classes of phenomena, the latter possibly mediated by central mechanisms and the former by peripheral ones. In the following sections, we shall concern ourselves only with those neurophysiological data that clearly display distance and propinquity functions and that, therefore, appear to be plausible models of the former class of geometry-sensitive illusions.

B. The Neurophysiological Data

1. Lateral Inhibitory Interaction in the Horseshoe Crab Eye. From his earliest observations on the Mach band and related visual phenomena, Mach stressed the point that the subjectively bright and dark bands were produced by neural interactions in the retina. His insights, interestingly enough, were based almost exclusively on the psychophysical data, yet they sound quite modern and familiar in the light of contemporary neurophysiological findings. A sentence from one of Mach's original papers reads: "Two retinal points stand in a reciprocal relation which is determined by a function of their separation." Mach's mathematical model of perceived contour enhancement was founded mainly on continuous second-order differential equations, which described the process thought to be transforming the stimulus distribution into the perceived brightness distribution. Although he frequently speaks of the reciprocal interaction of retinal points (in fact, one of the papers reproduced in Ratliff's book is specifically entitled "On the Dependence of Retinal Points on One Another"), the neurophysiology of his time did not allow him to be more specific about the anatomical or physiological nature of the interdependence. Nevertheless, his writing suggests that he was fully aware of the necessity for postulating neural interactions on a very local basis.
For example, on page 267 of Ratliff (1965), Mach is quoted as saying: "It appears to me that the phenomena discussed can only be explained on the basis of a reciprocal action of neighboring areas of the retina."

Most modern physiological theories of sensory lateral interaction, however, are founded upon the pioneering neurophysiological findings originally obtained by H. Keffer Hartline and his colleagues from their studies of the compound eye of the horseshoe crab Limulus polyphemus. Hartline and a number of his associates, most notably F. Ratliff and, more recently, B. Knight and F. Dodge, have used this remarkable animal as a model preparation in one of the most genuinely germinal and stimulating experimental programs ever to have come from any biological laboratory. As we shall see below, their findings have had widespread ramifications, not only in explaining the sort of spatial effects in perception that enchanted Mach so thoroughly, but also as a possible explanation of some other more recently discovered spatio-temporal interactions. While not all of these newer theories or microtheories will probably hold up under continued scrutiny, it is certainly a clear indication of the very great impact that Hartline's work has had to observe in how many different contexts other researchers have tried to use it as an explanation of their own data.

The anatomy of an individual ommatidium is presented in detail in
FIGURE 7.27 A drawing (A) of the neural connections at the base of a horseshoe crab ommatidium, showing two types of cross connectives. One connective type includes collateral fibers from the eccentric cell axon, and one includes collaterals of retinula cell axons: ret. c. = retinula cell, rhab. = rhabdom, sh. cell nuc. = sheath cell nucleus. (B) and (C) are cross sections viewed at the level of the lines labeled B and C (from Bullock and Horridge, 1965, after MacNichol, 1956). [Drawing not reproduced. Labeled structures include the corneal lens, glassy cells, eccentric cell, pigment, and neuropile.]
Figure 7.27. For the purposes of the present discussion, all that is really required now is that we consider each ommatidium as a unitary receptor unit, interconnected with other similar units by the relatively simple linear collaterals shown in the figure. The lateral interconnectives between the ommatidia are now known to be made up of collaterals from both the axons of the retinula cells and of the eccentric cell. A peculiar feature of this network of interconnecting fibers is that whatever synaptic connections exist between them appear to be unpolarized. Granular material, apparently transmitter substance, is found on both sides of the possible synaptic connectives, and this means that either cell can be the presynaptic or, for that matter, the postsynaptic partner.

An important caveat, which should be remembered in this discussion, is that while the Limulus eye is a most instructive experimental paradigm for the analysis of visual processes, it exhibits only inhibitory interactions. As we shall see, the data for both physiological and psychophysical experiments in the vertebrate eye seem better explained by networks in which both inhibitory and excitatory influences occur.

The work of Hartline's group was originally reported in four very important papers in the Journal of General Physiology (Hartline, Wagner, and Ratliff, 1956; Hartline and Ratliff, 1957, 1958; Ratliff and Hartline, 1959).
The general procedure used throughout the experiments was described in the first paper in this series. Photic stimulation was accomplished by appropriately shuttered light sources that were focused by an optical system that consisted mainly of an inverted microscope. In this manner, a controlled demagnification, which allowed a very small stimulus spot to be
sharply focused on a single ommatidium, could be achieved. Individual unit records were usually obtained, not with microelectrodes, but by recording from single axons of eccentric cells dissected out of the optic nerve and laid across a wick electrode soaked with electrolyte solution. Occasionally, Hartline and his colleagues did use microelectrodes, but this technology was not really necessary for this preparation because of the ease with which single fibers could be isolated. Similarly, the significance of the information picked up from the single optic nerve fiber was very straightforward. This fiber was the direct extension of the eccentric cell of one ommatidium and reflected the activity solely of that single cell, modulated only by the stimulus intensity and any inhibitory influences exerted by its neighbors. The interaction between two stimulated ommatidia could then be observed by stimulating each with a separate beam of light and observing the mutual effects. Hartline, Wagner, and Ratliff have described the general nature of the response that was evoked when two cells were stimulated in words which can hardly be bettered:
Illumination of regions of a Limulus lateral eye in the vicinity of any particular ommatidium reduces the ability of that receptor unit to discharge impulses in response to light. During such illumination, the threshold of the receptor unit is raised, the number of impulses it discharges in response to a suprathreshold flash of light is diminished, and the frequency with which it discharges impulses during steady illumination is reduced. (HARTLINE, WAGNER, and RATLIFF, 1956, p. 655.)
Another important aspect of their initial discovery was that the interactive effects were reciprocal or mutual. The inhibition exerted by one ommatidium on a nearby one was paralleled by an inhibition of the first by the nearby one. Thus, when two cells were simultaneously activated, the resulting level of activity in each was less than would have been obtained if only that one had been stimulated.

The nature of the anatomical mechanism underlying these inhibitory interactions was then explored by Hartline and his associates by a direct and definitive method. The interconnecting plexus, which hung between the responding ommatidia, was carefully dissected away. As increasing amounts of the tissue were severed, there was a corresponding decrease in the amount of inhibition, and the cell increased its rate of firing until finally all traces of the inhibition disappeared and the cell responded as it did originally without the inhibiting stimulus. The gradual diminution of the response weakly suggests that the effect is probably not mediated by propagated spike activity, but rather by graded potentials electrotonically conducted along the connectives between the ommatidia.

Having established the general nature of the phenomenon, Hartline, Wagner, and Ratliff went on to vary several of the critical parameters that defined the magnitude of the inhibitory effect. We can best summarize their results by again quoting from the 1956 paper directly:
1. The degree to which the activity of an ommatidium is inhibited by illumination of regions of the eye is greater the higher the intensity of that illumination. (p. 659)
2. The more intense the inhibiting illumination, the deeper and longer lasting was the initial depression of the frequency of the discharge and the greater was the depression of the steady level that was reached after the inhibiting light had been shining for a second or more. (p. 660)
3. The larger the area of the eye illuminated by the inhibiting beam, the greater is the slowing of the rate at which the ommatidium discharges nerve impulses. (p. 662)
4. The response of an ommatidium is most effectively inhibited by illumination of other ommatidia located close to it; the effectiveness decreases with increasing distance. Usually however, some degree of inhibition can be produced by illumination anywhere within a region surrounding it that may cover as much as one half of the total area of the eye. (p. 662)
With regard to the shape of the inhibitory receptive field, they also found that the regions were not perfectly circular, but were oval, like the eye itself. Hartline, Wagner, and Ratliff were able to map a typical inhibitory field around a given ommatidium for two different light intensities. These results from their work are shown in Figure 7.28 and indicate clearly that the size of the inhibitory field is also a function of stimulus intensity. In the next paper in their series (Hartline and Ratliff, 1957), two additional important points were made. First, Hartline and Ratliff determined that the degree of inhibition of one ommatidium on another was a linear function of the activity in the inhibitor. They also established the existence of an important indirect form of interaction, which they called disinhibition. If a stimulus situation with three aligned points A, B, and C is set up and the test activity recorded from point A, the following sequence of effects can be obtained. If point A is stimulated in isolation, there will be an amount of activity induced that is purely dependent upon the intensity of the stimulating light. If, in addition, point B is stimulated, then the activity at point A will be reduced in the manner we have seen, due to the lateral inhibitory interaction of B on A. If, however, point C is subsequently stimulated with a similar light source, the activity at point A will, quite to the contrary, increase. This will be so in spite of the fact that separate stimulation of only point C and point A will often show that the two are so far apart as to be outside of the area of mutual interaction. Hartline and Ratliff's explanation of the disinhibition phenomenon was that point C acted to inhibit point B. As B's activity was reduced, the effective inhibitory forces exerted by point B on point A were correspondingly reduced.
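The three-point arrangement can be made concrete with a minimal numerical sketch. It assumes the piecewise-linear steady-state description usually associated with Hartline and Ratliff, in which each unit's response equals its own excitation minus a weighted sum of its neighbors' responses. The coupling coefficient of 0.3, the excitation of 10, the zero threshold, and the name `steady_state` are illustrative assumptions and are not values taken from their papers.

```python
def steady_state(excitation, coupling, threshold=0.0, iterations=200):
    """Solve r[p] = max(0, e[p] - sum_j K[p][j] * max(0, r[j] - threshold))
    by repeated substitution. All numbers used below are hypothetical."""
    n = len(excitation)
    response = list(excitation)
    for _ in range(iterations):
        # Update every unit simultaneously from the previous estimate.
        response = [
            max(0.0, excitation[p] - sum(
                coupling[p][j] * max(0.0, response[j] - threshold)
                for j in range(n) if j != p))
            for p in range(n)
        ]
    return response


# Points A, B, C in a line: A and B inhibit each other, B and C inhibit
# each other, and A and C are too far apart to interact directly.
K = [[0.0, 0.3, 0.0],
     [0.3, 0.0, 0.3],
     [0.0, 0.3, 0.0]]

a_alone = steady_state([10.0, 0.0, 0.0], K)[0]
a_with_b = steady_state([10.0, 10.0, 0.0], K)[0]
a_with_b_and_c = steady_state([10.0, 10.0, 10.0], K)[0]

print(f"A alone:        {a_alone:.2f}")
print(f"A with B:       {a_with_b:.2f}")
print(f"A with B and C: {a_with_b_and_c:.2f}")
```

Under these assumed values the response at A falls when B is illuminated and partly recovers when C is added, although every coupling in the matrix is inhibitory and C has no direct connection to A.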
In this manner a facilitatory or summative effect could be simulated even though all of the interactions were, in reality, inhibitory. The important point to be made here is that disinhibition exists and can mimic an excitatory interaction. Thus, it represents the prototype of a very important class of neural interactions, which includes superficially
FIGURE 7.28 A drawing of the horseshoe crab eye, showing the isometric lines of constant amount of inhibitory effect for two different light intensities on a single excited ommatidium located at X. This figure tells us that the distance function of inhibition is not constant in all directions (from Hartline, Wagner, and Ratliff, 1956).
complicated processes that turn out to be synthesizable from neural processes of a much simpler kind. For example, Ratliff and Mueller (1957) have shown how simple lateral inhibitory interactions can give rise to such phenomena as the "on," "off," and "on-off" responses in visual fibers. In the light of this sort of emergent complexity when only a few cells with simple properties are combined, it is quite clear why Hartline's work has had such a fundamental impact and why lateral inhibitory interaction is a potentially powerful construct in expanding our understanding of neural interactions. This point was further emphasized in the third of their series of papers. Hartline and Ratliff (1958) considered the problem of the sort of interactions that occur in a situation that was a simple but very significant modification of the three stimulus disinhibition paradigm described above. In this modified experiment, point C was brought close enough to point A (from which the recordings are made) to exert its own inhibitory influence. The question is then: how do the inhibitory influences of both point C and point B combine to jointly reduce the responses elicited by a given stimulus presented to point A? Hartline and Ratliff determined that these inhibitory influences combined in a simple and direct manner: they were simply cumulative. Though Hartline and Ratliff use the word additive, the situation, as they describe it, is a bit more complicated than a simple algebraic addition of the inhibitory effects of point B or point C acting alone. The additional complication, of course, is due to the fact that the two inhibiting points are not always independent of each other in this situation, but may also mutually inhibit the response of each other as well as the test point. Thus, the effect of both is less than the effect of the sum of each presented individually. Nevertheless, the combined effect is the sum of the inhibitory effects of each, but as established after the level of activity of the inhibitor has been set by the combination of excitatory and inhibitory forces exerted on it. The effect is still simple, merely a summation, and in no way invokes mysterious or separate processes or the magical emergence of totally novel mechanisms, even though this additional consideration must be taken into account. Perhaps there is some insight to be gained here for later consideration of the more impressive processes of cognition, awareness, and intelligence, each of which also may be assumed to be a concatenation of essentially simple processes. While they are quantitatively large in number, there is no necessity for assuming that there is anything involved that is qualitatively different from the basic processes even at the more complex cognitive level. It is just this sort of pregnant possibility that is the basis of the wide acclaim the Hartline work has so richly earned. In their fourth paper, Ratliff and Hartline (1959) took two very important further steps. First, they established the details of the distance function and demonstrated that both the threshold of the inhibitory effect and its magnitude (once the lower threshold was crossed) decreased the further away the inhibiting site was from the test site.
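A row of mutually inhibiting receptors whose inhibitory coefficients fall off with separation can be sketched numerically to see what such a distance function does to a graded stimulus. The sketch below is only an illustration under stated assumptions: the linear fall-off of the coefficient, its peak value of 0.1, its reach of four receptors, the zero threshold, and the ramp-shaped stimulus are all hypothetical choices, and the function names are invented for this example.

```python
def inhibitory_coefficient(distance, peak=0.1, reach=4):
    """Hypothetical coefficient that falls linearly with receptor separation."""
    if distance < 1 or distance > reach:
        return 0.0
    return peak * (1.0 - (distance - 1) / reach)


def ramp_stimulus(n, low, high, start, stop):
    """Low plateau, linear ramp between `start` and `stop`, then high plateau."""
    pattern = []
    for i in range(n):
        if i <= start:
            pattern.append(low)
        elif i >= stop:
            pattern.append(high)
        else:
            pattern.append(low + (high - low) * (i - start) / (stop - start))
    return pattern


def network_response(excitation, iterations=100):
    """Each receptor's response is its excitation minus the weighted sum of
    the responses of its neighbors, found by repeated substitution."""
    n = len(excitation)
    response = list(excitation)
    for _ in range(iterations):
        response = [
            max(0.0, excitation[p] - sum(
                inhibitory_coefficient(abs(p - j)) * response[j]
                for j in range(n)))
            for p in range(n)
        ]
    return response


stimulus = ramp_stimulus(60, low=5.0, high=15.0, start=25, stop=35)
response = network_response(stimulus)

for i in range(15, 46, 2):
    print(f"receptor {i:2d}  stimulus {stimulus[i]:5.2f}  response {response[i]:5.2f}")
```

With these assumed parameters the response dips below the level of the dim plateau near the foot of the ramp and rises above the level of the bright plateau near its top, which is the kind of contour accentuation that mutual inhibition of this form would be expected to produce at the inflections of an intensity gradient.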
Most excitingly, in this paper they also ingeniously simulated the perceptual Mach band and thus presented one of the most compelling of the few direct pieces of evidence confirming a neurological model of a perceptual process. The ideal procedure would have involved the presentation of the classic gradient of stimulus intensity used in the psychophysical Mach band experiment (see Figure 7.24) and then a measurement of the amount of neural activity at a large number of different points across the surface of the horseshoe crab eye. Obviously, this is one of those places where the logistic limitations of the single microelectrode technique force the experimenter to deviate from the ideal. To use a spatial array of microelectrodes would involve either a very large amount of equipment or a very large amount of electrode manipulation. Ratliff and Hartline ingeniously suggested, as an alternative, a technique that substituted time for space. They set up a single cell preparation, as usual, and then moved the stimulus gradient across the eye at a rather slow pace so that transient effects would be minimized. In this manner, they were able to observe the response of that single cell in what was effectively many different positions with reference to the stimulus pattern. Thus, it was as if they had moved their electrode sequentially from one receptor to another across the whole pattern. Figure 7.29 is a plot of the three important parts of their findings. The insert shows the typical energy gradient of the physical stimulus. The curve indicated with triangles shows the amount of activity induced when the
FIGURE 7.29 An analogue of the Mach band experiment set up in the horseshoe crab eye. The insert shows the physical stimulus intensity pattern. The curve marked with triangles is the response obtained if all except the single ommatidium, from which the neural response is recorded, are shielded from the stimulus as it is moved across the eye. The curve marked with small circles shows the typical "Mach band" contour enhancement, which occurs when the whole eye is illuminated with the moving pattern. It is important to remember in this experiment that the recording is from only one ommatidium and that it is the stimulus that is being moved from place to place rather than the electrode (from Ratliff and Hartline, 1959).
illumination was restricted solely to the region of the test receptor. In this case there is no activity elicited in the area surrounding the test ommatidium, and the response, predictably, simply reproduces the pattern of the stimulus intensity. However, when the entire stimulus pattern is allowed to fall on the horseshoe crab eye, then quite a different situation obtains. First, the overall amount of activity is decreased, thus confirming the general nature of the inhibitory interaction among the receptors of the limulus eye. More significantly, at the points of inflection of the stimulus gradient, there now appears a local increase or decrease in the amount of induced neural activity corresponding to the subjective reports of the bright and dark bands in the human psychophysical experiment. It would be nice if we could at this point say "QED" and assume that the issue is completely resolved and that in fact a compelling theoretical model of the Mach band and related phenomena has been given. But it is still important to remember that this is a model system. The horseshoe crab
has told us nothing of what he is "perceiving," and up to this point there has been no discussion of similar mechanisms in the vertebrate retina. It is to this matter we shall now turn and discover that, not unexpectedly, there is more to be said about the situation in higher animals.
2. Lateral Interaction in the Vertebrate Retina. The key data, which began
to answer some of the questions of lateral neurophysiological interaction in the vertebrate retina, were first reported by Kuffler (1952, 1953) and were obtained from experiments carried out in the cat's eye. Prior to his studies of the interaction effects, Kuffler had been exploring the organization of receptive fields in the eye of this animal, using glass-insulated metal microelectrodes. These tiny electrodes were inserted into the retina and were able to pick up, extracellularly, spike action potentials from ganglion cells and their axons. Thus, the data obtained by Kuffler were based upon a flow of information, which had already passed through at least two levels of synaptic interaction and had been already subject to considerable processing and integration. (The reader may want to refer back to Figure 3.17 in our section on retinal anatomy to reflect upon the diversity of neural connectives in and about the inner and outer plexiform layers of the retina at this point and consider the possible interactions suggested by such a net.) Clearly, the anatomy of the situation is considerably more complicated than that of the limulus eye, and correspondingly, the physiological behavior of the system was also found to be far more complex. Kuffler made a number of important observations concerning receptive field organization. He had, for example, discovered that the receptive fields, while averaging 1 to 2 mm in diameter on the retina, varied considerably in size as the stimulus intensity changed, a fact reminiscent of Hartline's observations that the size of the field of inhibitory interaction in the limulus eye also varied with stimulus intensity. Kuffler further discovered that not only did the size of the field change, but the general organization and polarity of the interactions within the field could also vary with stimulus intensity or the size of the stimulus.
The receptive field of a typical ganglion cell was characteristically organized as shown in Figure 7.30. A series of three concentric regions could activate the ganglion cell in three different ways. Small stimuli applied to the central region would produce a pure "on" response; that is, spikes would be evoked only at the onset of the stimulus. Small stimuli applied to the outermost ring would produce a pure "off" response; that is, spikes would be evoked only at the offset of the stimulus, while stimuli applied to the intermediate ring would produce both "on" and "off" responses, with spikes being evoked at both the onset and offset of the stimulus. Given this type of receptive field organization, it is not surprising to learn that Kuffler discovered that the type of interaction observed between two stimuli was also more involved than that observed in the limulus eye. In fact, the best generalization of his results that can be made is that he found almost all possible forms of interaction that might have been hypothesized among the "on," "off," and "on-off" type of responses. It was possible, for example, to inhibit the "off" response of a test stimulus by the presence of a stimulus that produced an "on" response and vice versa, even
FIGURE 7.30 Organization of a receptive field in the cat's retina. In this particular ganglion cell's receptive field, a center region capable of only producing "on" responses is surrounded by a middle ring of "on-off" sensitivity and then a larger annulus of purely "off" responses (from Kuffler, 1953).
though the results varied depending upon the intensity of the stimulating lights, and where the mask and test stimuli were located in the tripartite receptive field. Figure 7.31 (adapted from Kuffler, 1953) shows the three different types of interaction patterns, which could be obtained by slight changes in the intensity of simultaneous stimuli in the same cell. In the first series of three plates (indicated as Series I), one stimulus is presented to a pure "on" region and another to a pure "off" region. If they are presented separately, the evoked responses are as shown in the two upper photographs. If the two stimuli are presented together, then the "off" responses can be inhibited if the intensity relations of the two stimuli are such that the stimulus evoking the "on" response is more intense, and the stimulus evoking the "off" response is less intense. The same cell will show an inhibition of the "on" response when the two stimuli are presented simultaneously, but the intensity relation reversed (Series II). Mutual inhibition of both the "on" and the "off" responses could occur if the two stimuli were both increased in intensity. The respective responses to both of the stimuli would be less than that to either alone (Series III). Kuffler also found that if the two stimuli were both placed adjacent to each other in the part of the receptive field producing "on" responses, one "on" response could also inhibit the other. However, superimposing the two stimuli leads to a summation of their effects rather than a mutual inhibition.
FIGURE 7.31 This set of pictures shows the complicated interactions that occur between space and intensity in the ganglion cell response of the cat's retina. Spot a was placed in the center of the receptive field and produced a pure "on" response. Spot b was placed in that portion of the periphery of the receptive field, which produced a pure "off" response. Depending upon the relative intensities of the two spots, the combined response might be either "on," "off," or a combined "on-off" response as shown in the three columns labeled I, II, and III (from Kuffler, 1953).
Perhaps the most important result of this work was the emergence of the notion of the center-surround antagonistic receptive field organization. While Kuffler observed center "on" and surround "off" organization, other investigators have also observed cells that are organized in just the opposite way: an inner region that, when stimulated, produces only "off" responses surrounded by an outer region which produces only "on" responses. This sort of organization has been referred to as an "off center" field just as the former arrangement Kuffler originally observed is now known as an "on center" type of receptive field. Rodieck and Stone (1965a, 1965b) have, in recent years, pursued the same problem of receptive field organization in the cat's retina pioneered by Kuffler, and their findings are generally in agreement with his older results. In addition, however, Rodieck and Stone have shown that the antagonistic center-surround receptive fields are most effectively activated by moving figures (although as we shall see later, there was no evidence of a specific directional preference in the cat). Dark spots on light fields and light spots on dark fields produced identical results when used to activate inhibitory center and excitatory center receptive fields, respectively, regardless of the direction of movement. Velocity specificity also seemed to be
present, but no shape-specific sensitivity was observed except for sensitivity to the width (in whatever the direction of movement) of the stimulus. Perhaps their most important contribution in the present context was Rodieck and Stone's speculation that the "on," "off," and "on-off" mechanisms and the center-surround receptive field organization are produced by the summation of two dome-shaped processes, one inhibitory and one excitatory, overlapping each other in the receptive field as shown in Figure 7.32, in such a way that the weighted sum of their effects determined the response. Each process may actually cover large and overlapping areas of the receptive field. "On," "off," and "on-off" responses are produced, from their point of view, by the summation of relative magnitudes of the two opposing effects. Thus, the surround is only functionally an annulus, and, in fact, the underlying mechanisms are of the same sort of uniform dome shape as the inhibitory fields observed in the limulus eye. The more elaborate receptive field structure in the vertebrate eye is presumably imposed because of the pooled effects of a dome-shaped excitatory pattern of interaction and the equally monopolar inhibitory pattern similar to that found in the horseshoe crab eye.
3. Do Lateral Interaction Mechanisms Exist in the Other Senses? The case
seems fairly conclusive that there is a wide variety of spatial interactions possible among laterally displaced stimuli in vision. These effects can be reflected in many different ways in the peripheral evoked neural responses. The question now arises: is a similar sort of lateral interaction possible in the other senses? Naturally, it is to hearing we must turn first to consider this issue, and it is in this modality that, second only to vision, the most is known about possible lateral interactive mechanisms. Figure 7.9 showed why the plexus underlying the hair cell receptors may provide the necessary anatomical substrate for this interaction. We have also considered the psychophysical evidence earlier in this chapter. But what of the physiological evidence? Galambos and Davis (1943) demonstrated that the response pattern of single auditory nerve fibers could be well represented by a V-shaped response region on an intensity-frequency plot, within which a stimulus is capable of evoking a response in that neuron. We have already discussed this point briefly in the section on receptive fields and shall present a fuller discussion in the next chapter on quality coding. Galambos (1944), with the collaboration of Davis, subsequently discovered that tones with frequencies and intensities outside of the V-shaped response area could also have an effect on the response of these auditory nerve fibers. Generally, the effect was manifested by an inhibition of the spontaneous noise level in these fibers. All combinations of inhibitory arrangements were found with some cells inhibited by frequencies higher than those included in their response areas, some by tones lower and some by tones both higher and lower. Recently some important additions to this notion of lateral inhibition of tones by other tones have been made by Greenwood and Maruyama (1965). They were also interested in the induced effects of stimuli that fell
FIGURE 7.32 A further demonstration of the complex effect of stimulus location and receptive field geometry on the response that may be obtained from even a simple type of field. The top figure shows Rodieck and Stone's interpretation of the concentric center-surround antagonistic receptive field as being the result of two dome-shaped and monophasic areas of inhibition and excitation. Depending upon where a stimulus was placed in the field, it might produce different degrees of "on" and "off" responses as indicated by the second and third row of drawings. The sum of the two, directly a function of the spatial location of the stimulus, determines the nature of the output (from Rodieck and Stone, 1965a).
outside of the response area of a given cell on the activity of that cell. As we have seen, Galambos, as well as defining the response area in collaboration with Davis, had also found that there were inhibitory effects of stimuli outside the response area. Greenwood and Maruyama have refined this preliminary observation and shown that if spatial coding analogies are kept in mind, the type of interaction that occurs is very similar to that found in the vertebrate retina. Indium-filled microelectrodes were inserted
FIGURE 7.33 The pattern of response of a cochlear nerve neuron to a 1.1 kHz stimulus (a 200-msec tone) at several different intensities. The response is plotted horizontally for many runs at each stimulus intensity. Increasing the stimulus intensity first increases the response, but after 40 db SPL it begins to decrease again (from Greenwood and Maruyama, 1965).
into single cells of the cochlear nucleus of the upper medulla, and the acoustically evoked spike responses plotted as a series of dot patterns. Each dot represented the occurrence of a single spike action potential. These response records were plotted on an oscilloscope with the horizontal axis representing the time after stimulation. Each time a new stimulus was presented, the oscilloscopic trace was lowered a small amount, and in this manner a full account of the activity following each of a series of stimuli could be recorded. If the dotted line is very dense, then the spike activity was also great; if it is sparse, then so was the neural activity. Changes in the pattern of activity could then be observed as a function of other conditions as demonstrated in Figure 7.33, a sample record. Greenwood and Maruyama usually replotted their data to display the relationships between the inhibitory and excitatory response areas of the individual cochlear nucleus cell (see, for example, Figure 7.34, below). In this case, the horizontal axis on their figures has a dual meaning. First, it indicates the frequency of the stimulating signal, and second, it indicates the approximate
FIGURE 7.34 The response area of a single cochlear nucleus neuron with a center frequency of 21.5 kHz. The isometric lines in the middle define equal regions of excitation. The crosshatched regions indicate points in the intensity-frequency space that inhibited the spontaneous activity of this cell by at least three standard deviations below spontaneous activity. The stippled area indicates inhibitory regions in which there was somewhat less inhibition, between one and three standard deviations below spontaneous activity (from Greenwood and Maruyama, 1965).
point on the cochlea at which it is thought the maximum deflection occurs for that stimulus frequency. A wide variety of cellular response types was observed by Greenwood and Maruyama, and the significance of the response patterns is not always immediately obvious. Not the least difficulty was caused by the fact that stimuli often exerted both inhibitory and excitatory influences sequentially as the stimulus intensity or frequency was varied. Nevertheless, there was a general or, perhaps it would be better to say, a prototypical form of response found in many cells that does seem to emphasize the basic form of the interaction. As measured by their influence on the amount of spontaneous activity, stimuli with intensity and frequency outside of the excitatory response areas of a given cochlear nucleus cell usually inhibit if the difference is not too great, but when the frequencies differ greatly, they exert no
influence. Figure 7.34 displays this sort of interaction for a high-frequency cell, in which the response pattern was of this prototypical type. This complex figure contains a wealth of information. First, all of the contour lines connect points of equal response amplitude and polarity. Thus, a 21.5-kHz stimulus with an intensity 10 db above the reference sound pressure level produces just as great a response as an 18-kHz tone 50 db above that reference. More pertinent to our present discussion, however, is the fact that opposite effects can be produced by stimuli of identical frequency but different amplitudes. The middle unshaded region of Figure 7.34 is a reconstruction of the usual V-shaped response area as defined by Galambos and Davis (1943). On the other hand, there is also a surrounding region, as indicated in this figure with dots or cross-hatching, in which an excitatory stimulus, when only slightly shifted in intensity, will begin to inhibit spontaneous activity or activity induced with some other tone. The potential intricacy of this type of interaction is profound, for a stimulus can also change from an excitatory to an inhibitory one merely by a small shift in its frequency, even though its intensity is held constant. Conversely, stimuli with constant frequency can be made to alter the polarity of their effect by a slight variation in their intensity. The complexities that such simple mechanisms can and probably do introduce into the coding of such auditory patterns as music, for example, are quite beyond the analytical powers of our science at the present time. A further important generalization that can be drawn from this sort of data is that the cochlea exhibits the same sort of antagonistic center-surround organization observed in the retina. There is a central region, always excitatory in the case of this auditory response area, within an antagonistic surround that is always inhibitory.
This antagonistic center-surround organization is observed through the medium of tonal (temporal) interactions, but, hopefully, it should be clear by now that auditory tone pattern discrimination is, in large part, mediated by spatial neural patterns as a result of the mechanical analyzing action of the cochlea. A similar sort of antagonistic center-surround organization has been observed by Mountcastle and Powell (1959) in cells of the somatosensory area of the monkey's cerebral cortex. A microelectrode was inserted into the cortex near a cell, which was activated by a specific receptive field on the forearm of the experimental animal. The receptive field could be traced out in the usual fashion. When a second stimulus was applied to a region surrounding that receptive field, the neural activity in the cortical cell was drastically reduced. Figure 7.35 shows the general shape and size of the antagonistic excitatory and inhibitory receptive fields on the monkey's arm of that cortical cell. As usual, the process seemed to be quite reciprocal. Mountcastle and Powell's extracellular technique of recording also picked up the responses of other nearby cells, and in some records they were able to observe other cells activated by the same stimulus that inhibited the action of the test neuron. For these cells, the "inhibitory stimulus was an excitatory one." Furthermore, these cells could be seen to decrease their activity during the time the original excitatory stimulus was on. Figure 7.36 shows the effects
FIGURE 7.35 The antagonistic center-surround receptive field arrangement (excitatory center, inhibitory surround) of a somatosensory cortical cell on the monkey's arm (from Mountcastle and Powell, 1959).
of the two stimuli on the responses of both cells. The interaction between the two is seen to be reciprocal in very much the same sense as the mutual inhibitory interaction in the limulus eye. The general conclusion that may be drawn from all of this work is that there is interaction of spatially separated stimuli in vertebrate receptor systems just as there was in invertebrates. Although the pattern of interaction is far more complicated than the simple lateral inhibition found in the limulus eye, it is clear that some sort of lateral interaction is present at least peripherally and that it represents a very important mechanism, which must be taken into account in all accounts of vertebrate information processing. In sum, the vertebrate lateral interaction seems to be characterized by an important geographic property. There usually seems to be some sort of an antagonistic center-surround organization such that a central region, in which stimuli may either excite or inhibit, is surrounded by a region of the opposite polarity. These receptive regions are defined, as we have noted, only in terms of a given input channel (an axon or a nerve). The same retinal locus may be inhibitory for one channel, but excitatory for some other one. Furthermore, it should be appreciated that the general result of simultaneous stimulation of the center and the surround is to reduce the effect of either stimulus presented alone.
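The center-surround organization summarized here, in the form Rodieck and Stone suggested (two overlapping dome-shaped processes of opposite sign whose weighted sum sets the response), lends itself to a small numerical sketch. The source says only that the processes are dome-shaped; the Gaussian profile, the amplitudes and widths, the width of the "balanced" band, and the function names below are all assumptions made for illustration.

```python
import math


def dome(distance_mm, amplitude, width_mm):
    """A dome-shaped sensitivity profile (Gaussian shape is an assumption)."""
    return amplitude * math.exp(-(distance_mm / width_mm) ** 2)


def net_effect(distance_mm, center=(1.0, 0.3), surround=(0.3, 1.0)):
    """Weighted sum of a narrow, strong center process and a broad, weak
    surround process of opposite sign. Parameters are (amplitude, width_mm)
    and are hypothetical."""
    return dome(distance_mm, *center) - dome(distance_mm, *surround)


def dominant_mechanism(distance_mm, balance_band=0.05):
    """Label which process dominates for a small spot at this distance."""
    net = net_effect(distance_mm)
    if net > balance_band:
        return "center"
    if net < -balance_band:
        return "surround"
    return "balanced or negligible"


for distance in (0.0, 0.2, 0.345, 0.6, 1.0, 3.0):
    print(f"{distance:5.3f} mm  net {net_effect(distance):+.3f}  "
          f"{dominant_mechanism(distance)}")
```

In this sketch the surround appears as an annulus only because the broad process outweighs the narrow one beyond a certain distance; both underlying profiles are single domes centered on the same point, and the net effect at the very center is smaller than the center process would give alone.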
FIGURE 7.36 The response patterns of two somatosensory cortical cells that are linked in an antagonistic pattern. The excitatory stimulus turns on the cell whose response is marked with the dotted line. While the excitatory stimulus was on, another stimulus applied to the annular inhibitory region shown in Figure 7.35 excited an "inhibitory cell" and inhibited an "excitatory cell" whose response is marked with the solid line. This pattern of interaction is quite similar to the concentric center-surround antagonistic receptive field found in the eye and the ear (from Mountcastle and Powell, 1959).
4. Do Central Lateral Interaction Mechanisms Exist? A point that is usually neglected, but that is of the utmost importance as attempts are made to utilize the lateral interaction mechanisms as a neurophysiological model of various perceptual phenomena, concerns whether or not such interaction occurs among central neurons or groups of neurons. Although there is satisfactory and sufficient evidence of peripheral lateral interaction, this same sort of study does not, unfortunately, directly confirm the existence of central interaction. Although central cells often reflect the results of simple geometrical lateral interaction, this finding does not tell us that such interaction necessarily occurred centrally. As with any other analysis of a closed input-output system, it is not possible to tell at what stage in the system a given transformation occurred by simply observing the relationship between the input and the output. To reduce the argument to its most extreme form (and one which we do not necessarily wish to champion at the moment), it may be that reciprocal lateral interaction is a geometrical process, which is only really possible within the limits of a simple neural plexus in the periphery. It may be, in other words, that this form of neural integration is meaningful only in the initial portion of the communication process, which conveys information from the periphery to the CNS. The question is not, it must be emphasized, simply a matter of whether or not inhibitory synapses occur in the central nervous system, but rather whether mutual lateral inhibition exists centrally in a way that is analogous to the geometrical relations that have been established for the periphery. Because the type of experiment that is necessary to definitively answer
this question is difficult to do, central reciprocal lateral inhibition has been only very infrequently discussed in a critical manner. The ideal experiment would require some way to separately stimulate, for example, cortical regions that were close enough to interact but that were sufficiently independent to avoid simultaneous activation when a stimulus was applied. Stimulus effects would have to be produced on the cortex, for example, with the same sort of topological relations as they are presented to the retina, and mutual interaction observed in the absence of peripheral lateral interaction. But central structures are well known to be very heavily cross-connected, and this sort of experiment may, in fact, be impossible to perform.
A thought experiment, which might provide some possible approaches toward an answer to this question, may be proposed. Hubel and Wiesel (1963) have demonstrated that the ends of the cortical columns are laid out across the surface of the cortex in an irregular mosaic pattern. It should be possible to locate two adjacent areas that are preferentially activated by stimuli moving in opposite or perpendicular directions. In this manner, neither would be activated by the effective stimulus for the other area, and the change, if any, observed in the response pattern of one region when the second stimulus evoked activity in the adjacent region would be a true indication of the possible interactions. A corollary experiment would be to observe the interactions of columns that are not adjacent physically, but that are stimulated by stimuli moving in only slightly different preferential directions. Without evidence of interactions in either of these cases, the notion of lateral interaction among cortical columns, a hypothesis frequently invoked by some psychophysiologists in their models of such phenomena as metacontrast or sequential blanking, becomes equivocal.
Von Bekesy (1967), in his most insightful and interesting book on sensory inhibition, has also considered this question of whether lateral interactions occur centrally. He also was faced with a paucity of direct experimental evidence of interactions at higher levels of the nervous system and was able to refer only to demonstrations of interactive effects recorded at the higher levels, but induced with natural peripheral stimuli. As noted, we do not believe that the question can be answered with this type of experimental paradigm. Another way to demonstrate central lateral interaction, although somewhat indirectly, would be to show that there is an increased sharpening of the spatial spread of a particular patterned response at sequential levels. Indeed, it is the need for this sort of sharpening or "funneling" that is at the heart of von Bekesy's assertion that such central lateral inhibitory interaction "must" occur. However, as noted by Simmons (1970), there appears to be no such sharpening in the auditory system. He says, "In fact, most such tuning is already nearing completion in the cochlear nucleus" (Simmons, 1970, p. 359). We shall deal with this notion in greater detail in Chapter 12, when we discuss auditory neuron tuning curves and their relationship to quality coding.
A similar situation seems to obtain in the somatosensory system, according to Kenneth Casey of the University of Michigan. He notes (personal communication, 1971) that there is no reduction in the size of somatosensory receptive fields as one ascends the nervous chain. Iggo (1960) has shown that the receptive fields of C fibers are no more than about 2 × 2 to 5 × 5 mm at the periphery, and Winter (1965) has shown that peripheral A fibers have receptive field sizes that are usually no larger than 0.1 cm² for the cat's digits. In the dorsal column nuclei, Winter also describes the same receptive fields as being about 1 cm². In the monkey thalamus, they appear to be even larger; according to Poggio and Mountcastle (1963), they can be as large as 2 to 20 cm². In the monkey's cortex, Mountcastle and Powell (1959) describe receptive fields varying up to 35 cm². Clearly, rather than being funneled into smaller and smaller receptive fields, the receptive fields of the somatosensory system cells are generally becoming ever larger as one moves more centrally. Another important finding in all of these reports was the relative rarity of center-surround organization: no more than 10 percent of the cells at any of the somatosensory levels exhibited that now-familiar form of lateral inhibitory organization.
Similarly, visual receptive field sizes in the cortex seem not to be any smaller than those at the peripheral ganglion cell layer in the cat. (Compare the findings of Spinelli, 1967, with those of Spinelli and Barrett, 1969.) If anything, they are generally larger (and some diffuse types are quite a bit larger), suggesting that rather than sharpening, there has been an increase in the receptive field area due to the convergence of multiple inputs. It should be noted, on the other hand, that a few neurophysiologists have directly concerned themselves with the problem of central lateral inhibition and have what they consider to be positive evidence for such processes.
Krnjevic, Randic, and Straughan (1964) specifically have shown a reduction in the responsiveness to stimulation with electrophoretically deposited L-glutamate when other nearby regions of the cortex were electrically stimulated. This is evidence for a sort of general inhibitory mechanism, some form of which would be necessary to keep brains from being in a constant state of activity or status epilepticus. Whether this generalized inhibition is analogous to the geometrical processes observed in the peripheral receptor net of limulus or the retina of a cat is, of course, moot. More to the specific point is the work of Singer and Creutzfeldt (1970) concerning reciprocal lateral inhibition of neurons in the lateral geniculate body of the cat's visual system. They found that cells with a center-surround antagonistic field arrangement did appear to be inhibited by nearby cells with the opposite polarity of organization. Thus, a center-surround unit with an excitatory center was inhibited, it seemed, by a nearby unit with an inhibitory center receptive field organization. However, these authors are also careful to note that the type of reciprocal inhibition they observe is not necessarily the same as the type we have already discussed. For example, they note:
The organization of "reciprocal lateral inhibition" between the on- and the off-system will not improve spatial contrast like the classical type of lateral inhibition in Limulus and in the cat's retina. (SINGER and CREUTZFELDT, 1970, p. 329.)
Thus they clearly separate the mechanism they have studied from the simple geometrical sort of lateral interaction found in the periphery. They do, however, suggest that it might be a useful mechanism for sensitivity and successive contrast enhancement. It is clear that this issue cannot be resolved at the present time. It seems that the receptive field and tuning curve data provide no support for continued lateral reciprocal interaction (of the peripheral sort) at higher levels, but it is also certain that there must be some sort of inhibitory process centrally to stabilize cortical activity. There is little evidence of further central sharpening in any sense modality, however, and that is the main point which this section is intended to make. (But note the discussion of Gardner and Spencer's [1972a, 1972b] work below.)
C. Theories of Lateral Interaction
1. Ratliff's Summary. Ratliff's (1965) book has been especially useful, since it clearly points out the similarities among several theories that have been presented by different authors to explain lateral interactions in both the invertebrate and the vertebrate eye. Ratliff is careful to emphasize that while there are important differences between the simple inhibitory nets of the horseshoe crab eye and the combined excitatory and inhibitory interactions typical of the vertebrate retina or cochlea, all of the theories converge on a few common major ideas. The most important discrepancy among the different theories is, of course, the presence or absence of the central excitatory zone. Depending upon what particular mechanism in which particular species is being modeled, some of the authors ignore the presence of the central zone and assume that summation is a process that only occurs when two or more stimuli fall directly on a single receptor. In their models, summation, therefore, is only a reflection of the fact that the single receptor has spatial extent itself. On the other hand, other theorists, usually dealing with vertebrate receptors, assume that there is a network of excitatory interaction very much like the network of inhibition, which is the only sort of interaction found in the limulus eye. The nature of the differences and of the similarities has been best summed up by Ratliff in a chart which is reproduced in Table 7.1, showing both the assumed inhibition-excitation functions and the mathematical relations assumed in six of the most important theories. Although Ratliff feels that the difference between Taylor's (1956) or Hartline and Ratliff's (1958) theories, on the one hand, and those of von Bekesy or Huggins and Licklider (1951), on the other, can be minimized by simply assuming a blur function that spreads the stimulus, such a notion probably does not take into account a more fundamental difference between the two approaches.
The theories that assume a very narrow excitatory region, corresponding only to the width of a single cell, do not necessarily assume that there is an excitatory plexus of interconnecting fibers. On the other hand, those with broad central regions always implicitly, if not explicitly, assume some sort of a central receptive field organization with neural interconnections organized such that the activation of many individual receptors can elicit a response in a single ganglion nerve fiber. From
this latter perspective, excitatory neural nets are assumed to be present, while from the former, they are not. It seems clear that this difference is fundamental and that the theories may be assumed to be of two clearly distinguishable types. The more general one includes both dispersed excitation and inhibition mediated by the appropriate neural nets, and the less general one involves only inhibitory neural interactions. Ratliff further emphasizes that there are also some other differences among the several theories that are of note. The Hartline and Ratliff model (1958) and that of Taylor (1963) are the only two of the six theories described in Table 7.1 that are "recurrent"; that is, the inhibitory force exerted by each receptor is dependent upon both the stimulus applied to it and the inhibitory forces that are exerted on it by its neighbor. In this case, since the neighbor is presumably also recurrently affected by the response level of the first cell itself, the first cell's output is also indirectly a function of its own level of activity. A nonrecurrent system, on the other hand, is one in which the inhibitory action that is exerted by a cell is purely a function of the stimulus acting on that cell. The difference between the two has been graphically shown by Ratliff in a drawing, which is reproduced in Figure 7.37. Unfortunately, as Ratliff points out, it is difficult to determine from the overall response of the system whether the actual neural interaction, in the vertebrate retina, for example, is recurrent or nonrecurrent, since both single recurrent stages of lateral interaction and dual stages of nonrecurrent interaction can produce almost identical performance. Since the models are nearly equivalent, if one disregards certain of the considerations we have mentioned, let us consider Hartline and Ratliff's own model as a simple example of the type of theory that is both plausible and useful in describing their important findings.
The basic equation of their model, which is, in fact, a system of simultaneous equations, can be expressed as follows:

$$r_p = e_p - \sum_{j=1}^{n} k_{p,j}\,\bigl(r_j - r^{0}_{p,j}\bigr) \tag{7.1}$$

This equation provides a means of computing the response $r_p$ of the pth axon, which is defined by this relation to be equal to the amount of excitation $e_p$ produced by the stimulus falling on the receptor minus the sum of the inhibitory forces that are exerted on it by its neighbors. The sum of the inhibitory forces is calculated by computing, for all of those cells (from j = 1 to j = n) that inhibit the response of the pth cell, the term $k_{p,j}(r_j - r^{0}_{p,j})$, where $k_{p,j}$ is a constant specifying the magnitude of the interaction between cell p and cell j; $r_j$ is the response of cell j; and $r^{0}_{p,j}$ is the threshold value of the response of the jth cell for any interaction between cells p and j. The constant $k_{p,j}$ is necessary because of the empirical fact (noted above) that the degree of interaction between any two cells varies considerably from one pair to another. Fortunately, it is a constant for a given pair of cells, and the interaction is linear over wide ranges. Otherwise, a very complicated system of nonlinear equations would be required to adequately model this system. The necessity to subtract a
TABLE 7.1 GEOMETRICAL AND MATHEMATICAL CHARACTERISTICS OF SIX MODELS OF LATERAL INHIBITORY INTERACTION (FROM RATLIFF, 1965)

Model: Mathematical relation
Huggins and Licklider: $r_p = \sum_{j=1}^{n} k_{p,j} I_j$
von Bekesy: $r_p = \sum_{j=1}^{n} k_{p,j} I_j$
Mach: $r_p = I_p \, \dfrac{K I_p}{\sum_{j=1}^{n} k_{p,j} I_j}$
Fry: $r_p = \log \dfrac{I_p}{1 + \sum_{j=1}^{n} k_{p,j} I_j}$
Hartline and Ratliff (threshold): $r_p = e_p + \sum_{j=1}^{n} k_{p,j}\,\bigl(r_j - r^{0}_{p,j}\bigr)$
Taylor: $r_p = e_p + \sum_{j=1}^{n} k_{p,j} r_j$

In this table $W(\xi)$ is the two-dimensional inhibition-excitation function of which a cross section is shown; j is the parameter (that may assume values of p, m, n, etc.) scanning the series of adjacent receptors; $k_{p,j}$ is the coefficient of inhibition between any point j and the central point p; $r_p$ is the response of the central point p; $I_p$ and $I_j$ are the illuminations of points p and j respectively; $r^{0}_{p,j}$ is the threshold of interaction between points p and j; $e_p$ is the excitation produced by the stimulus on point p; $r_j$ is the response of point j; and K is a constant.
threshold term $r^{0}_{p,j}$ is also a direct result of the observed biology of the situation. Hartline and his colleagues had determined in their experiments that the threshold for the response of a cell to stimulation was generally lower than the level at which it could begin to exert inhibitory influence on another cell. Thus, there was a region in which the cell j could be responding to a stimulus without exerting any inhibitory influence on cell p. Such a system of simultaneous equations can be solved by hand, but it is a laborious and difficult process. This is particularly so if the interaction
FIGURE 7.37 Two drawings that show the difference between (a) recurrent and (b) nonrecurrent lateral inhibitory interaction processes. The distinction is, simply, based upon whether or not the inhibitory action of a neighboring unit affects the excitatory process or impinges upon the cell beyond the point (marked x) at which the level of activity is defined (from Ratliff, Hartline and Miller, 1963).
functions are assumed to be recurrent. In that case, a series of iterative steps approaching a stable solution to the problem must be used. A computer is, therefore, by far the superior means of solving the system of equations. Either digital or analogue processes can be used, but in this type of problem, in which simultaneous linear algebraic equations must be evaluated, there are considerable advantages to using a digital computer. Very rapid solutions are possible, but perhaps even more important, the ability to rapidly change the parameters of the equation or even the nature of the hypothesized neural interactions may help to facilitate the observations of a number of mathematical "experiments." The present author tried his hand at a computer simulation of the Hartline-Ratliff model some years ago. One important result that quickly emerged was that the distance function (that is, the set of $k_{p,j}$) that had to be used was of critical importance in determining the closeness of fit of the computed response to the observed response. If the distance function did not extend out far enough (for example, if cells were only inhibited by their immediate neighbors), then the Mach-band-like contour intensification oscillated in space, and rather than one dark band, a series of dark and light bands with successively diminishing amplitude appeared at both ends of the gradient of stimulation. It should be emphasized that the equations that we have presented here are descriptive only of the steady-state stable situation that obtains after the initial transients have disappeared. An entirely different formulation would be required to describe these transient responses, and there is little question that, particularly in human vision, they must play an important role in encoding dynamic visual stimulus patterns.
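The iterative solution of Equation 7.1 described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reconstruction of the author's original simulation: the receptors are arranged in a one-dimensional row, the coefficients depend only on the distance between cells, a single threshold `r0` is shared by every pair, inhibition is exerted only by the part of a response above that threshold, responses are clipped at zero, and all numerical values (and the function name `solve_lateral_inhibition`) are hypothetical.

```python
def solve_lateral_inhibition(e, k, r0=0.0, n_iter=200):
    """Iterate r_p = e_p - sum_j k[|p-j|-1] * max(0, r_j - r0) toward a steady state.

    e  : excitations e_p, one per receptor in a one-dimensional row
    k  : inhibitory coefficients by distance; k[0] applies to immediate neighbors
    r0 : a single threshold shared by every pair of cells (a simplification)
    """
    n = len(e)
    r = list(e)  # initial guess: the uninhibited responses
    for _ in range(n_iter):
        updated = []
        for p in range(n):
            inhibition = 0.0
            for d, k_d in enumerate(k, start=1):
                for j in (p - d, p + d):
                    if 0 <= j < n:
                        inhibition += k_d * max(0.0, r[j] - r0)
            # Responses are not allowed to fall below zero (an assumption).
            updated.append(max(0.0, e[p] - inhibition))
        r = updated  # all cells are updated together from the previous estimate
    return r

# A step in illumination: 20 dim receptors followed by 20 bright ones.
excitation = [1.0] * 20 + [2.0] * 20

broad = solve_lateral_inhibition(excitation, k=[0.10, 0.05, 0.025])
narrow = solve_lateral_inhibition(excitation, k=[0.30])  # immediate neighbors only

for p in range(16, 24):
    print(p, round(broad[p], 3), round(narrow[p], 3))
```

With these illustrative coefficients the iteration settles quickly because the total inhibitory weight acting on any cell is less than one. In both runs the response just on the bright side of the step exceeds the response deep inside the bright field, and the response just on the dim side falls below the interior of the dim field. In the nearest-neighbor run the deviations from the plateau also alternate in sign from cell to cell, which is reminiscent of the oscillating bands mentioned in the text.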
2. Von Bekesy's Theory of Funneling. Though von Bekesy's theory is one of the six models discussed in the previous section, his formulation of the notion is sufficiently unique that an additional comment on it is appropriate. It is particularly of interest in the context of this present book, because von Bekesy has applied this notion to such a wide variety of different sensory phenomena. In one of his many important papers describing the similarity
between cutaneous and acoustic sensations, von Bekesy (1958) had coined the word "funneling" to describe the sharpening of diffuse stimulus patterns applied to a spatially extensive receptor surface within which there was lateral interaction. His model essentially involves a concentrically organized system, in which a central region of summative excitation is surrounded by a concentric ring-shaped area in which sensations are inhibited. Von Bekesy clearly feels that this sort of organization is attributable to lateral interconnections at all levels of the nervous system. The important function of this funneling mechanism is to alter both the spatial distribution of a response and the intensity of the response. Stimuli that occur in the surrounding ring of suppression tend not to be felt, but are able to increase the amplitude of the sensation produced by stimuli in the central region, while stimuli presented in the central region summate and positively interact. Thus, a widely spread stimulus pattern tends to be channeled or funneled into a narrow region of sensation as indicated in Figure 7.38. It should be noted that von Bekesy's funneling model is not really a neural net model and is certainly not one that can be solely explained by peripheral mechanisms. Rather, funneling is a general statement of sensory phenomena based on his speculations about the kind of inhibitory and excitatory interactions at all levels of the nervous system that might account for them. This problem has recently been directly attacked by comparing neural responses from peripheral and central levels of the cat's somatosensory system to human psychophysical responses (Gardner and Spencer, 1972a, 1972b). These workers found that there were no peripheral neural interactions that mimicked the psychophysical phenomena of funneling. However, Gardner and Spencer did find that central neurons exhibited an analogous funneling effect.
They do state, nevertheless, that this central neural funneling is due entirely to excitatory summations and that there is no evidence in their data that lateral inhibitions occur centrally.

FIGURE 7.38 A sketch showing the funneling hypothesis according to von Bekesy. An inhibitory region surrounding an excitatory region has the effect of "funneling" the psychophysical experience into a narrower region than that defined by the physical extent of the stimulus. This is another way of expressing the lateral interaction process in which contours are enhanced (from von Bekesy, 1958). [Sketch labels: Sensation; Stimulus]

IV. THE NEURAL CODING OF TIME PATTERNS

A. A Comment
The study of time as an objective of psychophysical inquiry is a topic with many ramifications. Students of the subject like Fraisse (1963) or Frankenhaeuser (1959) have included within the rubric such topics as the perception of the duration of passed time, temporal order, effect of the environment, and even the threshold for the perception of duration. However, the neurophysiology of all of these topics is far beyond our current grasp, and thus they are irrelevant in a discussion of how time is neurally encoded. We have to turn to a much more circumscribed context to find a set of data that is meaningful in current neurophysiological terms. If there is any single topic that we might consider to fill this bill, it is the perception of rhythm or repetitive regularity. How are the rhythmic characteristics of a repeating stimulus perceived and encoded? It is unusually important in this case that we establish what the limits of perception are before we attempt to find neural correlates, because it is quite evident that a good bit of the temporal information present in stimulus patterns early in the encoding process is totally lost at higher nervous levels. We have already considered the problems of temporal acuity and simultaneity in our discussions of the psychological moment. In that case, the nervous system simply discards the details of the temporal pattern, and it would be pointless to attempt to find a central neural code for intervals less than those psychophysically detectable. In other experimental paradigms, the same situation obtains, even though it may not be so obvious. Irregularities in the pattern of intervals within a repetitive stimulus train are often beyond the limits of detection. In some instances, however, even though the temporal pattern per se is lost, some other common sensory dimension may encode some of the temporal information contained in the stimulus pattern and thus allow a psychophysical discrimination of two temporal patterns. For example, Uttal (1970) has reported that the frequency of a flashing light can affect the ability to discriminate the presence of a temporal gap, even though the frequency of the train of flashes is far above the critical fusion frequency. A small dot of light, lasting for about 30 μsec, was repetitively flashed on the face of an oscilloscope at frequencies of 1000 and 333 Hz. A gap (one irregularly large interval) inserted in the train of flashes was detected at significantly different threshold durations for these two repetition rates. Apparently, some vestige of the information contained in the high-frequency train was retained, even though each of the two trains was perfectly fused. Probably, the subject was responding to differences in the average brightness of the train of light flashes that was a function of the total energy content of the stimulus. Thus, an intensive judgment was substituted for a temporal one. The detection of irregularities has already been discussed in the previous chapter.
In doing so, we bypassed, however, the problem of frequency discrimination itself, the first-order phenomenon of which interval irregularity is but a further perturbation. The ability to discriminate stimulus frequency varies considerably for the various senses for reasons that are fairly obvious. There are fundamental limits imposed on frequency discrimination in vision (and we are, of course, referring here to the frequency
of a repetitive flash rather than to the frequency of the electromagnetic wave) by the physiological processes in the receptor and the rest of the retina that define the flicker fusion threshold. In audition, the high degree of frequency discrimination ability (and in this case we are speaking of the frequency of the acoustic signal), as we have seen, seems in large part to be mediated by a spatial recoding. When electrical stimuli are applied to the auditory nerve directly (Simmons, Epley, Lummis, Guttman, Frishkopf, Harmon, and Zwicker, 1965), then the frequency discrimination seems to be very much of the same order as that defined for electrical stimulation of the skin by Anderson and Munsen (1951). For low-frequency (50-300 Hz) electrical auditory stimulation (Simmons et al.), subjects required about a 30 pulse/sec difference before they could discriminate two stimuli. On the skin (Anderson and Munsen), the subjects also required a difference of 30 Hz below 100 Hz, but the threshold difference rapidly enlarged as the frequency increased. If there is any generality that we might draw from these studies, it is that if one closely examines the purely temporal discriminative capabilities of the human observer, independent of recodings to intensive or spatial dimensions, they seem to be severely restricted. Presumably, the auditory system uses space as an additional code for frequency discrimination to achieve the high levels of performance obtained in psychophysical experiments. In the preceding, present, and following chapters, we have seen and shall see many other instances in which temporal codes are used to represent spatial, intensive, and even qualitative discrimination. We shall also note instances in which temporal differences are represented by spatial or intensive codes. This is a curious paradox, but entirely consistent with the basic premises of coding theory.
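To put the 30 pulse/sec figure quoted above in relative terms, the just-discriminable difference can be expressed as a fraction of the base rate. The short sketch below does only that arithmetic. It assumes, for illustration, that the difference stays at about 30 pulses/sec across the whole 50-300 Hz range reported for electrical auditory stimulation; the function name `relative_limen` is hypothetical.

```python
# Assumed: the ~30 pulse/sec just-discriminable difference is treated as
# constant across the 50-300 Hz range (a simplification of the reported data).
DIFFERENCE_LIMEN = 30.0  # pulses/sec, the value quoted in the text

def relative_limen(base_rate_hz, limen=DIFFERENCE_LIMEN):
    """Just-discriminable difference expressed as a fraction of the base rate."""
    return limen / base_rate_hz

for rate in (50.0, 100.0, 300.0):
    print(rate, round(relative_limen(rate), 2))
```

On this assumption the relative difference falls from 0.6 at 50 pulses/sec to 0.1 at 300 pulses/sec, which gives a sense of how coarse purely temporal frequency discrimination is when no spatial recoding is available.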
To illustrate the trade-offs among space, intensity, and time, we shall now turn to a discussion of an interesting set of studies concerning the experience of somatosensory flutter and vibration. In addition, these studies are a model of the useful contribution that can emerge from a combined psychobiological approach to the problems of sensory coding.

B. Mountcastle's Studies of the Physiological Basis of Flutter and Vibration Sensitivity
Workers in Mountcastle's laboratory at Johns Hopkins University have long been concerned with temporal coding in the somatosensory system. They have been making psychophysical and neurophysiological comparisons of the responses in a cutaneous modality, which they refer to as "flutter" at low frequencies and "vibration" at high frequencies. Psychophysical data obtained in human experiments and neurophysiological data obtained both at the level of peripheral nerves (Talbot, Darian-Smith, Kornhuber, and Mountcastle, 1968) and at the level of the cerebral cortex (Mountcastle, Talbot, Sakata, and Hyvärinen, 1969) in the monkey have been compared to determine how the stimulus pattern is represented as it is communicated up the ascending pathways. The mechanical stimulus waveform with which they worked was a combination of a step or pedestal constant level displacement and a sinusoidally driven oscillation of displacement as shown in Figure 7.39. This
FIGURE 7.39 A sketch of the stimulus pattern used in the study of the neural correlates of flutter and vibration sensitivity. A pedestal of constant stimulus amplitude was used to raise the stimulus into an active portion of the intensity range. Superimposed on the intensity pedestal is a sinusoidal fluctuation of intensity which, although small in amplitude, selectively acts to produce wide ranges of neural and psychophysical responses (modified from Talbot, Darian-Smith, Kornhuber, and Mountcastle, 1968). [Sketch labels: Sine wave amplitude; Pedestal height]
combination waveform was necessary to avoid the establishment of a high level of adaptation prior to the experimental stimulation, yet still allowed a reasonable absolute depression of the skin. The only other alternative would have been to use very massive oscillations, which could potentially damage the receptors and which were certainly far above thresholds. The pedestal stimulus also served to establish a base of background activity, against which the effects of the sinusoidal components could be evaluated. In this meticulous manner, typical of the great care that is characteristic of the work from Mountcastle's laboratory, single fibers were kept active for three or more hours. This same sort of stimulus was used in both psychophysical experiments on human subjects (in which the threshold of the sense of flutter-vibration was the main dependent variable) and neurophysiological studies on macaque monkeys (in which several measures of the frequency pattern of the spike action potential response were the major dependent variables). The psychophysical data recorded by Talbot, Darian-Smith, Kornhuber, and Mountcastle can be quickly summarized. The important part of their findings indicated that the threshold of detectability of the oscillating portion of the stimulus, as measured by the amplitude of the sine wave, varied monotonically, but usually with a point of inflection between two segments of the curve. Figure 7.40 shows typical results for two different locations on the hand: the finger and the thenar eminence (the root of the thumb). Talbot and his colleagues have interpreted the break in the curve of the data obtained from the thenar eminence as indicating that flutter-vibration sensitivity is mediated there by two sets of receptors.
They also report corresponding neurophysiological evidence which, they believe, supports their notion of a dual somatosensory flutter-vibration system.⁴ In fact, they found three different classes of peripheral neurons that responded in different ways to this kind of stimulus. The characteristics of the three different types are summarized in Table 7.2, which is reproduced here from their paper.

⁴ But, note our earlier discussion on this matter in Chapter 6.
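The pedestal-plus-sinusoid stimulus described earlier in this section (Figure 7.39) can be written down as a simple function of time. The sketch below is only an illustration: the numerical values, the units, and the names `displacement`, `PEDESTAL`, `AMPLITUDE`, `FREQUENCY_HZ`, and `SAMPLE_RATE_HZ` are hypothetical and are not taken from the experiments.

```python
import math

def displacement(t, pedestal, amplitude, frequency_hz):
    """Skin displacement at time t (sec): a constant pedestal plus a sinusoid."""
    return pedestal + amplitude * math.sin(2.0 * math.pi * frequency_hz * t)

# Hypothetical values chosen only for illustration (arbitrary displacement units).
PEDESTAL = 500.0       # steady indentation that brings the skin into its active range
AMPLITUDE = 10.0       # small sinusoidal oscillation riding on the pedestal
FREQUENCY_HZ = 40.0
SAMPLE_RATE_HZ = 4000

# One second of stimulus, sampled at SAMPLE_RATE_HZ.
samples = [displacement(i / SAMPLE_RATE_HZ, PEDESTAL, AMPLITUDE, FREQUENCY_HZ)
           for i in range(SAMPLE_RATE_HZ)]

mean_level = sum(samples) / len(samples)      # stays at the pedestal height
peak_to_peak = max(samples) - min(samples)    # twice the sine wave amplitude
print(mean_level, peak_to_peak)
```

The point of the sketch is that the two parameters are independent: the pedestal sets the mean indentation, while the sine wave amplitude, the quantity whose threshold was measured, sets the excursion about that mean.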
[Figure graphics not reproduced. Surviving labels: "Thenar eminence", "Finger", "2 S.E.'s", "Area of sensation", "Right", "Left", "12 cm".]
FIGURE 7.46 A diagrammatic sketch of the difference in the spatial localization phenomena in hearing and on the skin. Part (b) is another way of plotting the data presented in Figure 7.45. When the two stimuli occur close enough together in time, they are fused into a single sensation, which is differentially located in space depending upon the interval between the two stimuli. In hearing, unlike its cutaneous analogue, there is some sort of an automatic gain control so that the subjective magnitude of the fused sensation always remains the same (from von Bekesy, 1957).
Von Bekesy has also demonstrated an analogous spatial localization phenomenon in gustation. Using controlled streams of salt solutions asynchronously applied to two areas of the tongue, he was able to demonstrate that the two stimuli led to two sensations if they were very much separated in time, but to a single more diffuse, yet spatially localizable, fused experience if they were separated by only a short interval. Figure 7.47 (reproduced from von Bekesy, 1964b), using the same sort of chart as Figure 7.46, demonstrates that the precision of detection of interstimulus interval in gustation, using spatial localization as an indicator, was on the order of 1 or 2 msec. Von Bekesy (1964a) has also extended this work to the olfactory modality and shown similar results. Thus, we can see that the precision of temporal measurements, which has been so poor when human subjects were asked to make purely temporal judgments, turns out to be greatly improved when spatial judgments are required of the subject. This finding illustrates once again that isomorphism
[Figure 7.47 graphic not reproduced. Surviving labels: stimulus, NaCl; 2.6 cm; time difference between stimuli (msec), from +3 to -3; local sensation magnitude.]
FIGURE 7.47 A plot (similar to Figure 7.46) of the localization of a gustatory stimulus on the tongue. Surprisingly, the temporal discriminability in this case is also very fine in spite of the absence of fine temporal experience in the gustatory modality (from von Bekesy, 1964b).
of stimulus, neural code, and percept is not at all necessary. Freeing our perspectives from this particular trap can have some curious effects. Not only are we relieved of the constraints of the search for isomorphic codes for stimulus variables, but we also no longer find it quite so necessary to assume a priori that an isomorphic correlate is a more likely code for some stimulus dimension than a nonisomorphic one.
B. Neurophysiological Data of Relevance to Spatial Localization
As we have seen, the psychophysical data suggest that with the exception of visual depth perception, dynamic temporal dimensions play a preeminent role in the spatial localization of a stimulus. The neurophysiological data appear to correspond with these findings in all of the senses. The temporal relations of the nerve impulse pattern also appear to be preeminent everywhere except vision. In this latter modality, relatively static interactions of the retinally disparate images seem to play the key role. In the following section, we shall consider some of the static properties
of neural interaction underlying depth perception in the ascending visual pathways, and then review some of the highly significant contemporary neurophysiological data, which emphasize the temporal encoding of acoustic spatial perception.
1. Static Visual Codes for Spatial Localization. Because of the anatomy of the visual pathway, binocular stereopsis, the most powerful of the cues for visual depth perception, can only be mediated by mechanisms located at the very highest levels of the nervous system. It is generally agreed by most contemporary authorities that there is no neural binocular interaction at any level lower than the lateral geniculate body of the thalamus. In the lateral geniculate body, the presence of true binocularly sensitive cells has been disputed for many years. DeValois (1960) had been unable to detect any binocular interaction in the thalamus. His work supported a number of earlier studies, although there was at least one report (Bishop and Davis, 1953) that a few binocularly sensitive cells were occasionally observed in the cat's thalamus. In such a controversial situation, the reader may quite justifiably suspect that the phenomenon, if present, is rare. A definitive study on the matter of lateral geniculate binocular interaction by Lindsley, Chow, and Gollender (1967) supports the notion that, though infrequent, binocular interaction is present in the cat's thalamus. Using stainless steel electrodes, they recorded extracellular spikes from a relatively large sample of lateral geniculate neurons and found that somewhat less than 20 percent of the cells were able to respond to some form of binocular or dichoptic stimulation. Interestingly, some of the cells behaved in a manner with which we have become quite familiar; stimulation of one eye would give rise to an increase in the activity of a particular cell, while stimulation of the other eye would reduce the activity of that cell. This antagonistic interaction is reminiscent of much of the data described earlier concerning receptive field structure and interaction.
Nevertheless, it is to the cortex itself that we must turn for most of the detailed analyses of the possible mechanisms of binocular and dichoptic spatial perception. Hubel and Wiesel (1962) had been among the first to look specifically at the problem of the representation of binocular depth perception in the cells of the visual cortex. However, Barlow, Blakemore, and Pettigrew (1967) have recently described some findings that update our current knowledge of this matter, and we shall concentrate upon these more recent data. Using microelectrodes inserted into the exposed primary visual cortex, they observed the response of single cells to combinations of binocularly presented visual stimuli. To obtain adequate responses, they had to use moving bars as visual stimuli, reflecting a preferential feature sensitivity described earlier by Hubel and Wiesel. The success of Barlow, Blakemore, and Pettigrew's studies depended upon the precise specification of the angular degrees of retinal disparity⁵ of the visual stimuli applied to the two eyes. For many technical reasons, this was not an easy thing to do, but they were
5 Retinal disparity is said to exist when the two retinal images fall on retinal points that will not produce fusion of the two images or, alternatively, fuse, but so as to produce a difference in the apparent depth of the stimulus.
[Figure 7.48 graphic not reproduced. Surviving panel labels: Unit 13/19, slow movement of narrow slits; Unit 13/20, slow movement of narrow bars; left eye alone; right eye alone; binocular facilitation; binocular occlusion, targets too close; binocular occlusion, targets too far apart; distance between binocular centres; time scale, 1 sec.]
FIGURE 7.48 Data from two cells in the cat's visual cortex are presented to show the fine tuning to specific spatial disparities. Each cell responds weakly to a stimulus to one eye and weakly to binocular stimuli that are either too close or too far apart. When the two binocular stimuli have a specific disparity (5.2 deg for Unit 13/19 and 3.3 deg for Unit 13/20), then a substantial amount of activity is generated in each binocular cell. Such a mechanism may explain how objects are coded for depth on the basis of retinal disparity (from Barlow, Blakemore, and Pettigrew, 1967).
able to give a pretty good approximation of both the vertical and horizontal disparities by controlling eye position and by estimating the position of the nearly invisible fovea by physiological methods. The eye was immobilized by a continuous injection of Flaxedil, a muscle action depressant, as well as by a mechanical restraint. Estimates of the retinal disparity were based upon the assumption that the common point of view of the two foveas represented zero disparity. The results of their experiments showed that binocularly activated single cells in the cat's visual cortex responded most vigorously when binocular stimuli were presented at a characteristic angular disparity. Individual cells responded weakly to monocular stimuli (to either eye) and even less strongly to stimuli that were either further apart or closer together than the characteristic optimum disparity for each cell. Figure 7.48 is a sample set of records for two cells with different optimum disparity angles, showing various combinations of stimulus conditions and the resulting neural activity.
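The idea of a characteristic optimum disparity for each cell can be made concrete with a small numerical sketch. What follows is only an illustration under stated assumptions, not a model taken from Barlow, Blakemore, and Pettigrew: the Gaussian tuning shape, the tuning width, the firing rates, and the function names `disparity_response` and `decode_disparity` are all hypothetical. Only the two optimum disparities (5.2 and 3.3 deg) are borrowed from the caption of Figure 7.48.

```python
import math


def disparity_response(disparity_deg, preferred_deg, width_deg=0.5,
                       off_rate=0.5, peak_rate=20.0):
    """Firing rate of a hypothetical disparity-tuned binocular cell.

    Gaussian tuning is an assumption made for illustration only.
    The rate peaks at the preferred disparity and falls toward
    off_rate when the two targets are too close or too far apart.
    """
    gain = math.exp(-0.5 * ((disparity_deg - preferred_deg) / width_deg) ** 2)
    return off_rate + (peak_rate - off_rate) * gain


def decode_disparity(stimulus_deg, preferred_list):
    """Return the label (preferred disparity) of the most active cell."""
    return max(preferred_list,
               key=lambda preferred: disparity_response(stimulus_deg, preferred))


if __name__ == "__main__":
    cells = [3.3, 5.2]  # optimum disparities quoted for Units 13/20 and 13/19
    for stimulus in (3.4, 5.0):
        print(stimulus, "deg is signaled by the cell tuned to",
              decode_disparity(stimulus, cells), "deg")
```

In this sketch the stimulus disparity is read out simply from which cell is most active, so the identity of the cell, rather than its firing rate alone, carries the information.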
Barlow, Blakemore, and Pettigrew conclude their paper with a suggestion that may be the basis of the key code in binocular depth perception. They point out that with a fixed convergence, the images of objects at different distances will have different disparities. Thus, a different population of cells sensitive to a different degree of disparity will be maximally activated for objects at different distances. In this manner, the phenomenal distance of an object (and the spatial stimulus dimension) may be encoded by a specific set of labeled cells (another quite distinct spatial dimension in the neural domain). Even though both dimensions are spatial, this situation represents still another form of coded representation, in which an information pattern has been transformed from one dimension to another while still maintaining the critical features of the stimuli. If this theoretical organization seems to be sketchy, we can be fairly sure, considering the high level of the nervous system at which binocular stimuli first mix, that the primary neural mechanism for binocular depth perception is a spatial one located at the very highest cortical levels.
2. Dynamic Auditory Codes for Spatial Localization. In the visual system, as we have seen, there is no neural evidence of binocular interaction at levels below the thalamus, and little evidence of acute temporal sensitivities is evident in psychophysical findings. In the auditory system, however, the psychophysical data suggest a very high degree of sensitivity to the temporal dimensions of the stimulus, and the neurophysiological data consistently provide supporting evidence of possible convergent mechanisms from the two ears at the most peripheral levels. The auditory system is anatomically organized quite differently than the visual system. As we have seen in Chapter 3, there appear to be anatomical possibilities for binaural synaptic interaction at very early levels of the ascending auditory pathway.
This early anatomical convergence from the two ears might be vital to the detection of very fine asynchronies, since as signals ascend toward the central nervous system, they would be expected to increase their temporal dispersion, thus reducing the fidelity of reproduction of the stimulus timing. Galambos, Schwartzkopff, and Rupert (1959), for example, had originally established the fact that the superior olive contained cells that were capable of being binaurally activated. Cells in this nucleus were shown to be selectively sensitive to both the phase and the intensity differences of dichotic acoustic stimuli, the usual cues for spatial localization. The complexity of the binaural interaction at this level has been further emphasized by a more recent study reported by Wernick and Starr (1968). They showed that such complex interaural interactions as the production of neurophysiological beats (equal to the difference frequency of the two dichotic signals) corresponding to human psychophysical phenomena could be obtained at this level. Gross evoked potentials, used as the electrophysiological indicator in their experiment, clearly showed slow modulations, which varied as a function of the stimulus frequency differences. The nature of the interaction was explored further by Boudreau and Tsuchitani (1968), who found that there were cells of the superior olive that were excited by stimuli to the ipsilateral ear and inhibited by stimuli
to the contralateral ear, an antagonistic relationship similar to the receptive field organization with which we have already become familiar. Interestingly, their analysis indicated that the inhibitory and excitatory response areas of a single cell were both of roughly the same shape, and both had approximately the same center frequency. More recently, Goldberg and Brown (1969), also working in several of the olivary nuclei, have extended many of these findings and have found specifically that cells most sensitive to high-frequency stimuli (that is, with high-frequency centered response areas) were typically more sensitive to intensity differences than to phase differences. Those cells most sensitive to low-frequency signals (that is, with low-frequency centered response areas) were typically more sensitive to phase than to intensity differences. This finding is entirely congruent with the psychophysical data, which have also shown this same association. Within this dichotomy, however, they found that there were other important classifications of cellular behavior worthy of note. For example, cells sensitive to binaural intensity differences and presumably associated with high-frequency localizations could be excited by stimuli from either ear or inhibited by the stimulus from one ear while being excited by the other. However, only those cells that were inhibited by one ear seemed to be specifically sensitive to differences in the intensities of the binaural stimuli. Cells excited by both ears, by contrast, seemed to respond on the basis of the average sound level of the two stimuli. It is interesting to note that a comparison of the average or absolute sound level with interaural intensity differences would be an important means of providing additional information relevant to auditory localization. It possibly might also be the basis of the compensatory mechanism that maintains constant loudness as an auditory percept moves about in space.
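The contrast between the two kinds of intensity-sensitive cells just described can be illustrated with a minimal sketch. The linear rate functions, the gains, the baseline, and the names `ei_cell_rate` and `ee_cell_rate` are assumptions introduced here for illustration; they are not taken from Goldberg and Brown or from Boudreau and Tsuchitani.

```python
def ei_cell_rate(ipsi_db, contra_db, gain=2.0, baseline=10.0):
    """Hypothetical cell excited by the ipsilateral ear and inhibited by
    the contralateral ear: its rate follows the interaural level difference."""
    return max(0.0, baseline + gain * (ipsi_db - contra_db))


def ee_cell_rate(left_db, right_db, gain=1.0):
    """Hypothetical cell excited by both ears: its rate follows the
    average sound level and ignores which ear is the louder one."""
    return max(0.0, gain * 0.5 * (left_db + right_db))


if __name__ == "__main__":
    # Same average level (55 db), three different interaural differences.
    for ipsi, contra in ((60.0, 50.0), (55.0, 55.0), (50.0, 60.0)):
        print(ipsi, contra, ei_cell_rate(ipsi, contra), ee_cell_rate(ipsi, contra))
```

Under these assumptions the first cell type changes its output when the sound is louder in one ear but is indifferent to the overall level, while the second does the reverse, which is the kind of complementary information the comparison mentioned above would require.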
The issue of whether or not analogous response patterns at the next level of the auditory pathway are specifically related to the problem of sound localization has been specifically considered in an important paper by Rose, Gross, Geisler, and Hind (1966). Recording spike action potential patterns from single cells in the inferior colliculus of a cat, they measured the effects of binaural stimuli, which differed in phase and intensity. They found that there were a number of different kinds of cells present that were affected by differential binaural stimuli but in several different ways. Some cells turned out to be exquisite detectors of slight phase differences between two continuous tones. For example, Figure 7.49 shows the response rate of a cell that varied its frequency of firing from virtually zero to over a thousand spike action potentials per second as the phase angle of a 500-Hz tone presented to both ears shifted, even though the intensities of both the original and shifted tones remained constant. Rose and his colleagues found that the same cell, when stimulated with signals applied to either the right ear or the left ear alone, produced the nonmonotonic functions shown in the inset drawing of Figure 7.49. Both of these monaural responses, however, were at a level of activity that was considerably less than that produced by the combined stimulus. A sampling of a wide variety of these phase sensitive cells by Rose and his associates showed that each cell has its own "characteristic delay time." Because of this individual characteristic, different cells would respond maximally at different phase angles between the two stimuli.
[Figure 7.49 graphic not reproduced. Surviving labels: panel (a), both ears, 45 db SPL, 500 Hz, period = 2000 µsec, delay of RE stimulus and delay of LE stimulus (µsec); panel (b), LE alone, tone intensity (db SPL).]
FIGURE 7.49 Response of a cell in the inferior colliculus of a cat to phase differences in dichotic stimulation. (b) Shows the negligible response, which is generated by stimuli presented only in one ear. When two stimuli are presented dichotically (a) the response over 10 sec is very much a function of the phase angle between the two. Properly phased dichotic stimuli can elicit a response magnitude that is very much larger than that produced by any monaural stimulus. Such a mechanism may be the basis of spatial localization in the auditory system (from Rose, Gross, Geisler, and Hind, 1966).
Once again, this is a clear example of a case in which the temporal differences have been re-encoded into a spatial representation. Spatial encoding of this sort is sensitive to time differences as small as a few microseconds, a value that once again corresponds to the known limits of psychophysically measured spatial localization. Rose and his associates also observed another class of cells that were relatively insensitive to the phase difference between dichotic stimuli, but that were highly sensitive to their intensity differences. Sensitivities to intensity differences as fine as 1 or 2 db were observed in some of these cells. Unfortunately, one correlation, which one might have hoped to find, was not observed. These intensity sensitive cells were not solely those that responded at the higher center frequencies. Such a result would have been expected since, as we have seen, intensive differences seem to be more important for higher-frequency signals. In this case, intensity sensitive cells
of this type seemed to be found across the entire range of the frequency spectrum. Rose's group has also considered the issue of dichotic interactions of cells at the level of the cortex (Brugge, Dubrovsky, and Rose, 1964). These investigators found a small sample of cells that seemed to behave very much like the phase sensitive cells of the inferior colliculus. Small phase differences between dichotic stimuli led to substantial differences in both spike counts and the latent period before the initial response in these cells. Whether these cells were simply following the response of the cells at the lower centers or were truly responding to dichotic convergent inputs themselves is, of course, moot. The details of the anatomy suggest, however, that this may simply have been a following of an interactive process occurring at a lower level. On the other hand, it is known that the presence of cortical cells is critical to the ability of an animal to detect phase differences and presumably also to the localization of sound sources. Masterson and Diamond (1964) have shown that extirpation of the acoustic cortical tissue, in which these types of cell occur, drastically reduced a cat's ability to discriminate intervals between clicks. Yet intensity discrimination per se seems to be very much unaffected by massive bilateral extirpations of the acoustic areas (see Neff, 1961, for a review of these topics). In sum, then, we have seen that the encoding of spatial localization in vision and in hearing is done by mechanisms of fundamentally different organization. There is little anatomical opportunity for convergence of dichoptic stimuli in the visual system until the highest centers. On the other hand, the auditory system shows clear evidence of the mixing and interaction of dichotic signals at the lowest possible level (the second-order neuron).
Nevertheless, both systems seem to be able to extract information about the environment and encode the perceptions of space and depth with a high degree of precision. This evidence for different levels of interaction and possibly different mechanisms for spatial localization in vision and audition does suggest that there may be limits to the generality of any unified theory of sensory coding.
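The re-encoding of an interaural time difference into the identity of the most active cell, as described above for cells with a "characteristic delay time," can be sketched numerically. This is an illustration under assumptions and not the analysis of Rose and his colleagues: the raised-cosine tuning curve, the peak rate, and the names `delay_tuned_rate` and `most_active_cell` are hypothetical. The 500-Hz tone, with its 2000-µsec period, is the stimulus of Figure 7.49.

```python
import math


def delay_tuned_rate(itd_us, characteristic_delay_us,
                     freq_hz=500.0, peak_rate=100.0):
    """Rate of a hypothetical phase-sensitive binaural cell.

    Assumes a raised-cosine dependence on the interaural delay of a
    continuous tone, maximal when the delay equals the cell's
    characteristic delay and repeating once per tone period.
    """
    phase = 2.0 * math.pi * freq_hz * (itd_us - characteristic_delay_us) * 1e-6
    return peak_rate * 0.5 * (1.0 + math.cos(phase))


def most_active_cell(itd_us, characteristic_delays_us, freq_hz=500.0):
    """Return the characteristic delay of the cell that fires most."""
    return max(characteristic_delays_us,
               key=lambda cd: delay_tuned_rate(itd_us, cd, freq_hz))


if __name__ == "__main__":
    population = [0.0, 200.0, 400.0]  # hypothetical characteristic delays (µsec)
    print(most_active_cell(190.0, population))
    print(delay_tuned_rate(1200.0, 200.0))  # half a period away from the optimum
```

Because a continuous tone repeats every period, a single cell of this kind cannot distinguish delays that differ by a whole period; the sketch makes no attempt to resolve that ambiguity.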
CHAPTER 8: FEATURE DETECTION-
NEUROPHYSIOLOGY AND PSYCHOPHYSICS
In this chapter, we continue our discussion of temporal and spatial coding by emphasizing a very important subtopic: the specific sensitivities that certain neurons seem to display to certain spatio-temporal features of the stimulus. The discussion is centered around the very important findings of two research groups that have been especially influential in recent years. While their empirical findings are very interesting in themselves, perhaps more important is the change in perspective that they have introduced into contemporary psychobiological thinking. We try to stress this theoretical contribution in the discussion. We, then, consider the issue of specific spatio-temporal sensitivity at several different levels of the nervous system, as well as the differences observed among species. Finally, we conclude the chapter with a discussion of the relevance of these data to human perception, the main purpose of which is to suggest that there are some limits on the applicability of the basic idea of neuronal feature detection to perceptual phenomena that seem to some to be identical but that, in fact, are only superficially analogous.
I. INTRODUCTION
Perhaps the most exciting development in the field of sensory psychobiology in the last decade has been the emergence of the important notion
of feature detection. The basic idea of feature detection is that very specific spatio-temporal patterns of stimuli are almost uniquely capable of initiating activity in particular cells. For example, a dot of light may be an effective stimulus for a particular visual cortex cell if, and only if, it is moving in a particular direction or expanding at a certain rate. An identically shaped stimulus will not be effective, unless its dynamics are appropriate, and a differently shaped object will not work even if it is moving at the appropriate rate. Both the spatial and temporal attributes of a stimulus must be right for it to do the job. As a further example, a particular sequence of tones may be able to activate a sequence-specific cell, while another quite similar one, perhaps even containing the same constituent notes but in a different order, may not be able to do so. Single cell response specificity to particular patterns in time and space of this sort supports the notion of neurophysiological relativism discussed at the beginning of Chapter 7. It is often difficult to separate the temporal aspects of a stimulus from its spatial ones without losing everything. We are beginning to appreciate that the temporal and spatial codes are not simply summated, but are inseparable aspects of each other. Unfortunately, the last decade, the period in which this notion of feature detection has been evolving, has not been a sufficiently long time for data to accumulate to equally describe all of the sensory modalities. We shall be mostly dependent upon the work of investigators who have been working in the field of vision for the subsequent discussion. There are only a few important references from auditory neurophysiology that can be considered to be relevant, and, sadly, there is little of interest to report from any of the other senses.
Hopefully, this is a temporary state of affairs which will change soon, for psychophysical data make it almost certain that similar mechanisms must be operating in the other sensory modalities. The psychophysics of somatosensation, as von Bekesy's work has so eloquently shown, exhibits much the same sort of response to spatio-temporal patterns that appear in vision and audition. Presumably, spatio-temporal patterns, which can key or trigger specific neural systems in olfaction or gustation, are also present. But the peculiarities of their stimulus environments suggest that it will be many years before we can reasonably expect knowledge to be available at the depth already obtained in vision. It is very important for the reader to remember that in introducing this material on pattern detection, we are not introducing completely novel neural mechanisms. Most of the interpretations of specific feature sensitivities are based upon models involving simpler neural mechanisms of the sort we have already discussed. Edge detection is one form of feature detection and is certainly associated with the lateral inhibitory interactions previously mentioned. The detection of a spot or bar moving in a particular direction also, as we shall see, can be easily understood in terms of lateral interactions and neural delays. Thus, feature detection mechanisms sensitive to specific spatio-temporal patterns represent a somewhat more complex level of discourse, but one which is simply a direct derivation from the concatenations of simpler mechanisms.
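How a lateral interaction combined with a neural delay could yield direction selectivity can be shown with a very small discrete-time sketch. It is offered only as one possible arrangement under stated assumptions (a delayed inhibitory input from a single neighboring receptor that vetoes excitation), not as the specific model discussed in this book; the function name `direction_selective_response` and the unit-step timing are hypothetical.

```python
def direction_selective_response(excite, inhibit, delay=1):
    """Total output of a hypothetical direction-selective unit.

    excite  -- activity over time of the receptor that excites the unit
    inhibit -- activity over time of a neighboring receptor whose signal
               arrives `delay` time steps late and subtracts from excitation
    The output at each step is rectified (never negative).
    """
    total = 0
    for t, excitation in enumerate(excite):
        delayed_inhibition = inhibit[t - delay] if t - delay >= 0 else 0
        total += max(0, excitation - delayed_inhibition)
    return total


if __name__ == "__main__":
    # Spot crosses the excitatory receptor first, then the inhibitory one.
    preferred = direction_selective_response([0, 1, 0, 0], [0, 0, 1, 0])
    # Spot crosses the inhibitory receptor first; its delayed signal
    # arrives just as the excitatory receptor is stimulated.
    null = direction_selective_response([0, 0, 1, 0], [0, 1, 0, 0])
    print(preferred, null)
```

With a delay of one time step, motion in one direction lets the excitation through before the inhibition arrives, whereas motion in the opposite direction makes the two coincide and cancel.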
The main point of this chapter will be to discuss the fact that sensory systems have evolved to be selectively sensitive to specific spatio-temporal features of the environment. A corollary of this general premise is that the features selected are those that have adaptive significance.
II. A GERMINAL STUDY: "WHAT THE FROG'S EYE TELLS THE FROG'S BRAIN"
Prior to the 1960s, the typical stimulus used in many neurophysiological experiments was an impulsive click or flash of light (or as H. L. Teuber put it, "bolts of lightning and flashes of thunder") with little complex structure and no temporal or spatial patterning. In 1959, Lettvin, Maturana, McCulloch, and Pitts published a very important paper that was to influence research for the next decade. The major impact of this paper lay in its suggestion that since the stimuli, with which the organism usually deals, are not simple impulsive stimuli, there was no reason to assume that the neural organization was evolved to deal with such stimuli either. Their paper concerned the responses recorded from single cells of the optic nerve and of the superior colliculus of the frog. Most insightfully, they entitled their paper "What the Frog's Eye Tells the Frog's Brain," a whimsically anthropomorphic title, which emphasizes the notion of complex processing in the periphery, rather than merely passive mosaic reproduction and transmission of the stimulus patterns. This work has stimulated a decade of research, in which the temporal and spatial pattern of a stimulus came to be considered to be as important, if not more important, than its intensity, extent, or its onset and offset times. There is no question of the fundamental importance of this new idea, and it would be well to quote specifically the sort of thinking that led these investigators to this most significant conceptual breakthrough:
The assumption has been that the eye mainly senses light, whose local distribution is transmitted to the brain in a kind of copy by a mosaic of impulses. Suppose we held otherwise, that the nervous apparatus in the eye is itself devoted to detecting certain patterns of light and their changes, corresponding to particular relations in the visible world. If this should be the case, the laws found by using small spots of light on the retina may be true and yet, in a sense, be misleading. (LETTVIN, MATURANA, McCULLOCH, and PITTS, 1959, p. 1942.)
Their paper initiated an era in which neurophysiologists have begun to take into account a number of important facts. First, peripheral neurons are capable of responding differentially to complex stimulus patterns in a manner transcending simple registration and transmission. Second, complex stimulus patterns may produce responses that are not simply predictable from the notion of a mosaiclike replication of the spatial pattern. Third, the types of responses that are observed neurophysiologically, in the frog's visual system at least, are associated in a meaningful way
with the environmental tasks faced by the animal. The frog does not live in a world of static spots of light or of diffuse fields. It lives in a world of flies and dragon flies and, for that matter, snakes and hawks. These stimuli are better modeled by moving spots or moving convex edges (in other words, by complex spatio-temporal patterns) than by dots or stationary forms briefly exposed. The surprising result that has emerged from their work and that of other laboratories exploring similar processes is that this sort of specific pattern detection mechanism occurs as peripherally as the retina in at least some species. Lettvin and his colleagues made other important conceptual contributions in that important and germinal paper. At the time it was published, it certainly seemed somewhat farfetched to make the statement that different anatomical types of retinal neurons could be associated with specific classes of behavioral responses. Yet, as we shall see below, evidence is accumulating that this is exactly the case. Specifically, and by way of completely clarifying this point, it should be noted that the key idea is that the different anatomical types are functionally different by virtue of the differing spatial distributions of their dendritic trees. Such differences in the extent and nature of the arborization allow various manners of interconnection to the more peripheral neural levels, and this means that each type can receive and emphasize a specific spatio-temporal distribution of excitation and inhibition. Now that we have mentioned the fundamental changes of perspective stimulated by Lettvin, Maturana, McCulloch, and Pitts' paper, let us consider the specific findings. Their unique contribution, as we have said, depended upon the realization that the older stimulus paradigms were incomplete from both a behavioral adaptation and a neurophysiological point of view. To make more realistic visual stimuli, they hit upon the idea of using a 14-in.
aluminum hemisphere upon which small dots, rectangles, and other stimulus shapes could be attached and moved by external magnets. Figure 8.1 is a drawing of the stimulus arrangement, showing the hemisphere upon which the stimulus was placed and the arrangement of the apparatus, which held the platinum black coated metal electrodes inserted into the optic nerve or the superior colliculus through a hole in the frog's skull. By examining the effects of a variety of different kinds of stimulus patterns, four different classes of optic nerve fibers were found. First, a certain group of unmyelinated fibers, which was responsive to contrast per se, was found. A difference in the brightness of two portions of the visual field led to continuous firing of this type of cell. These "sustained contrast" cells had the additional property of being able to maintain their activity for over a minute after the stimulus was removed. Furthermore, the output of these cells saturated at low stimulus intensity and did not vary with further increases in stimulus intensity. The second type of unmyelinated cells, which Lettvin and his colleagues referred to as the "net convexity detectors," responded only to small moving spots (a fly?) but not to large moving objects, unless they entered the receptive field of the cell corner first. Dotted or checked patterns, even though containing small units, were not able to activate these
[Figure 8.1 graphic not reproduced. Surviving labels: experimenter moves external magnet, moving stimulus objects on inside of hemisphere; amplifiers and oscilloscope; Pt or Ag microelectrode; frog; flap of skull removed; 14-inch diameter aluminum hemisphere.]
FIGURE 8.1 The experimental arrangement used in the Lettvin, Maturana, McCulloch, and Pitts (1959) experiment.
cells. The adaptive utility of such a fiber can best be expressed in the authors' words:
A delightful exhibit uses a large color photograph of the natural habitat of a frog from a frog's eye view, flowers and grass. We can move this photograph through the receptive field of such a fiber, waving it around at a seven inch distance: there is no response. If we perch with a magnet a fly-sized object 1° large on the part of the picture seen by the receptive field and move only the object, we get an excellent response. If the object is fixed to the picture in about the same place and the whole moved about, then there is none. (LETTVIN, MATURANA, McCULLOCH, and PITTS, 1959, p. 1945.)
A third class, but in this case made up of myelinated neurons, was solely responsive to moving edges. A stationary edge, though of exactly the same shape, was totally ineffective in activating members of this third class of neuron. The fourth type included myelinated cells that responded to any overall dimming of the field. This cell type, with its characteristically large receptive field, also possessed the fastest conducting axons. It may be that this fourth cell type is behaviorally associated with the detection of predators. Whatever the evolutionary history of
FIGURE 8.2 The four types of ganglion cell dendritic arborizations found in the adult frog and in the tadpole and the associated spatio-temporal specificities suggested by Lettvin and his colleagues. Physiological class and associated anatomy: Class 1 edge detector, constricted tree (*); Class 2 convex edge detector, E tree (†); Class 3 moving contrast detector, H tree; Class 4 dimness detector, broad tree. (*) Absent from tadpole; (†) absent from tadpole's retinal periphery (from Pomeranz and Chung, 1970).
such a mechanism, it is a highly useful feature of the frog's neural mechanism, since this mechanism appears to be associated with some sort of an escape response, moving edges being typical of moving "hawks." The four sets of fibers observed in the optic nerve by Lettvin, Maturana, McCulloch, and Pitts sort themselves out at the level of the superior colliculus. Each type projects to one of the four collicular layers, each of which seems to topologically represent the layout of the whole retinal mosaic. Thus, the retinal image is represented four different times, but in layers that are particularly responsive to four different features of the stimulus pattern. Lettvin and his colleagues have suggested that the remarkable ability of the optic nerve (ganglion cell) fibers to respond to specific aspects of the spatio-temporal stimulus patterns is due to the fact that differently shaped dendritic trees of the four types of ganglion cells sample a variety of different inputs from the bipolar and receptor layers of the retina. Figure 8.2 shows the four types of ganglion cells that these investigators considered to be prototypical of the four cell types of the frog's retina as classified by their neurophysiological behavior. Lettvin and his colleagues associated the sustained contrast detector with the constricted field ganglion cell, the convexity detector sensitive to small moving spots with the E type of ganglion cell, the moving edge detector with the H type, and the dimming detector with the broad tree. Recently, Pomeranz and Chung (1970) have carried out some developmental studies, which demonstrated that the four types of cells do not all appear at the same time in the metamorphic development of the tadpole.
They further conclude that the associations suggested above by Lettvin and his colleagues are supported when one compares the presence or absence of a specific type of cell with the ability of the tadpole or adult frog's retina to produce one of the prototypical feature-specific responses. Pomeranz and Chung
found, with microscopic examination, that during the tadpole stage of the frog's development, the constricted field ganglion cell is totally absent and that E tree ganglion cells are absent from the retinal periphery, as summarized in Figure 8.2. Their electrophysiological recordings showed, correspondingly, that the sustained contrast type of response pattern was totally absent in the tadpole and that the small spot convexity detection process could not be observed in the tadpole's peripheral retina. This correlation adds substantial support to the earlier speculations of Lettvin's group, relating specific cellular anatomy and physiological response pattern.

III. ANOTHER GERMINAL STUDY: HUBEL AND WIESEL'S MOVING BAR DETECTOR

About the same time that work was going on in Lettvin's laboratory at M.I.T., down the street at the Harvard Medical School a pair of researchers were beginning to report what was to be a most remarkable series of studies of the organization and specific sensitivities of visual cells in the mammalian nervous system. Hubel and Wiesel had quickly moved to take advantage of a very sturdy new tungsten electrode developed earlier by one of them (Hubel, 1957) that allowed very long and stable periods of observation from single cells in the central nervous system. The papers that have been forthcoming from the continued collaboration of these two investigators have been most compelling in establishing the fact that mammalian nerve cells also reflect coding mechanisms that are sensitive to the spatio-temporal pattern of visual stimuli rather than simply to the distribution of light on the retina. In the next part of this section, we shall trace the difference in response patterns at different levels of the ascending visual pathway. In this section, we shall, so to speak, start at the top to try to give a brief introduction to the type of electrophysiological techniques that formed the conceptual basis of much of Hubel and Wiesel's later work.
In the first application of the new electrode, Hubel (1959) observed specific sensitivity to spatio-temporal features of stimulus patterns in cells of the visual cortex of the cat similar to the response patterns Lettvin and his colleagues had discovered in the frog's optic nerve and tectum. Recording from the striate cortex of unrestrained cats, Hubel had found large numbers of cells that were unaffected by the general level of retinal illumination, but that produced large amounts of spike activity when a small spot of light was moved across the retina. Not only was the movement of the spot necessary, but those cells also seemed to have preferential directions of movement for maximal activation. In a later paper that year, Hubel and Wiesel (1959) mapped out the shape and polarity of the receptive fields of these cells and showed that these fields were organized somewhat differently than those observed in the retina by Kuffler (1953). Rather than concentric rings of inhibition and disinhibition, the antagonistic regions in the cortex seemed to be usually arranged in side-by-side patterns as shown in Figure 8.3. Although not all of the cortical cells mapped required movement to elicit responses, the responses were enhanced from almost all cells when the stimulus was moving in a particular direction. It should be noted that the response of many of these cells seemed to be dependent upon not only movement and a preferential direction, but also upon the shape of the moving stimulus. While bars of light were effective for some cells, dark bars or moving edges were found to be the best stimuli for other cells. Presumably, the effectiveness of these elongated stimuli is associated with the shape of the receptive field and the adjacent elongated regions of inhibition and excitation.

FIGURE 8.3 A typical linearly arranged receptive field of a cat's visual cortex neuron with side-by-side antagonistic excitatory and inhibitory regions. The figure marks the regions of inhibition and excitation and the preferred orientation (from Hubel and Wiesel, 1959).

Shortly thereafter, Hubel and Wiesel (1962, 1963) made another important discovery about the cortical organization of the moving bar detectors. Suggestions had come from work in some of the other senses (for example, Mountcastle, 1957), as well as from anatomical studies showing vertical structural organization, that the cells of the cerebral cortex were arranged in functional, as well as structural, columns oriented normal to the surface of the brain. The notion was that these vertical columns, while not distinguishable by any anatomic boundaries from neighboring columns, did have functional properties that were quite different from column to column and that there would be abrupt changes as a microelectrode crossed the invisible boundaries separating them. Hubel and Wiesel showed that if a microelectrode was pushed into the cortex in such a way that it penetrated into the successive layers of a single vertical cortical column, the experimenter would pick up spike trains from a succession of individual cells, all of which behaved in the same way. An important observation was that each cell in the penetration series was specifically sensitive to the same direction of movement of a bar-shaped visual stimulus. Of course, it was not always possible to keep the electrode within one column in their experiments, but it was clear when the microelectrode emerged from one column and entered another: the preferential direction of activation would change to a new orientation and remain so oriented until the electrode entered another column. Figure 8.4 shows the results for two typical deep penetrations. One of the penetrations has fortuitously slipped down a single column, and all of the cells were maximally sensitive to stimuli moving in the same direction. The other penetration, however, crossed from one column to another several times, as indicated by the succession of preferential orientations that was encountered.

FIGURE 8.4 Two parallel pathways of microelectrodes into the cat's visual cortex, running from layer 1 toward the white matter. One of the tracks (1) apparently has slid down a column, in which all of the cells have specific sensitivities in the same direction. The other track (2) crosses from column to column, hitting many cells with different directional sensitivities (as indicated by the small cross lines) in sequence (from Hubel and Wiesel, 1963).

From the surface of the cortex, another view obtains. Figure 8.5 shows the shape of the surface mosaic of the columns. The dotted lines outline the areas that Hubel and Wiesel believed to be associated with single columns. The preferential direction of the cells sampled in that column is shown by the small bars. Note carefully that the cross-sectional area of these columns is only 1 mm or so on the cortex and that adjacent regions seem not to have any special relation with regard to their preferential direction to that of their neighbors. The shape of the columns, furthermore, was quite irregular; some were round, and some more elongated. The major correlated anatomical fact is that the individual columns do appear to be organized along the direction of the fibers of the optic radiation, which extend from the white matter to the superficial grey matter of the cortex. Hubel and Wiesel have made another anatomic point of considerable interest. The classic notions of cortical localization assumed some sort of topologically constant projection of the retinal mosaic. However, their new findings and the discovery of the orientation-specific columns have made it quite clear that the retinal mosaic is, in fact, represented many different times over on the cortex.
Each retinal point is represented in many different columns, each of which corresponds to a specific direction of movement of the barlike stimulus across the point. This multiple representation is in addition to the multiple representation of the retinal mosaic in several different regions of the cortex (for example, the three classic visual areas in the cortex regions known as 17, 18, and 19).

FIGURE 8.5 The organization of cortical columns as viewed on the surface of the brain. There is no apparent organization or relation between adjacent columns. The small lines indicate preferred directions. Numbers refer to the sequence of penetrations (from Hubel and Wiesel, 1963).

Hubel and Wiesel (1965) have also discovered that the spatio-temporal specificity of the cells in the cat's cortex involves more than a single type of response. In fact, they determined that there were at least four different kinds of cellular feature extractors present. Some cells displayed the simple shape and direction of movement specificity that we have already discussed. The three other classes, however, seemed to include cells that received multiple convergent inputs from this primitive type of cell. In fact, there appeared to be a hierarchy in which each successive class (simple, complex, lower-order hypercomplex, and higher-order hypercomplex) produced responses that presumably resulted from integration of the inputs from many cells of the next simpler type. Each had a specific stimulus geometry that maximally activated it. Some (the simplest ones) responded only to stimuli in a particular location and at a particular orientation, while others (the more complicated ones) seemed to be able to respond to multiple positions and directions, but still required some sort of particular movement and shape for their optimal activation. The hierarchy and specific sensitivities of each of the four types are presented in Table 8.1, which has been summarized from Hubel and Wiesel (1965). In recent years, this table of visual cortex cellular organization has been challenged by other workers. Spinelli and Barrett (1969), in particular, have challenged the notion that line-shaped stimuli are of particular importance, even when one is dealing with elongated receptive fields. In their study, they found that some of the cells that Hubel and Wiesel had thought to be uniquely stimulatable with lines could also be stimulated with moving spots or dots of light. Nevertheless, whatever the outcome of specific aspects of the controversy, it is clear that Hubel and Wiesel's
TABLE 8.1 CORTICAL CELL TYPES SENSITIVE TO SPECIFIC ASPECTS OF THE TEMPORAL-SPATIAL PATTERN OF VISUAL STIMULI (ADAPTED FROM THE DISCUSSION OF HUBEL AND WIESEL, 1965)

Cortical Cell Type: Simple
  Best Stimulus Shape: Edges, bars, slits, all with preferred orientation of movement
  Other Stimulus Conditions: Critically dependent on location within receptive field
  Possible Mechanism: Interaction of inhibitory and excitatory influences of receptive fields

Cortical Cell Type: Complex
  Best Stimulus Shape: Edges, bars, slits, all with preferred orientation of movement
  Other Stimulus Conditions: Independent of location within receptive field
  Possible Mechanism: Integration of the output of a number of simple cells

Cortical Cell Type: Lower-order hypercomplex
  Best Stimulus Shape: Edges, corners, tongues, and angles of particular sizes
  Other Stimulus Conditions: Specific to length of stimulus. Some cells required a stimulus exactly the width of the receptive field. Others required one that ended (a tongue) within the field. Stimulus must not extend beyond a certain point.
  Possible Mechanism: Integration of the output of both inhibitory and excitatory complex cells

Cortical Cell Type: Upper-order hypercomplex
  Best Stimulus Shape: Edges, corners, tongues, and angles of particular size
  Other Stimulus Conditions: Similar to lower-order hypercomplex. But in addition, cells of this group responded in two preferred directions 90° apart.
  Possible Mechanism: Integration of the output of two or more lower-order hypercomplex cells
work has been of the greatest significance in opening the door to a new understanding of neuronal circuitry. By their emphasis on the specific sensitivity to spatio-temporal features of the stimulus, they have helped to reorient many of the basic premises of sensory-coding theory. Another very important issue concerning these feature-specific receptive fields has been emerging in recent years. How do these receptive fields change as a function of maturation and experience? Clearly a great deal of plasticity is possible in receptive field structure, even to the point that environmental stimuli may be able to determine the shape of the field. Though this topic is not directly germane to the main concern of this book, it is clear that the subject will be extremely important to students of development.
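The first two rows of Table 8.1 can be made concrete with a small numerical sketch. It is offered only as an illustration under stated assumptions, not as Hubel and Wiesel's model: the 5 x 5 binary "retina," the weights (+1 for the central strip, -0.5 for each flanking strip), the rectification, and the use of a maximum to pool simple cells are all hypothetical choices, and stimulus movement is ignored entirely. The function names are likewise invented for the example.

```python
# Illustrative sketch only: a "simple" unit with side-by-side excitatory and
# inhibitory strips, and a "complex" unit that pools several such units.
# All weights, sizes, and names are assumptions made for this example.

def vertical_bar(size, col):
    """Binary image containing a one-pixel-wide vertical bar at column `col`."""
    return [[1 if x == col else 0 for x in range(size)] for _ in range(size)]

def horizontal_bar(size, row):
    """Binary image containing a one-pixel-wide horizontal bar at row `row`."""
    return [[1 if y == row else 0 for _ in range(size)] for y in range(size)]

def simple_cell(image, col):
    """Vertically oriented simple unit centered on column `col`.

    The central column is excitatory (+1) and the two flanking columns are
    inhibitory (-0.5 each); negative drive is rectified to zero.
    `col` must have a neighbor on both sides.
    """
    def column_sum(x):
        return sum(row[x] for row in image)

    drive = column_sum(col) - 0.5 * (column_sum(col - 1) + column_sum(col + 1))
    return max(0.0, drive)

def complex_cell(image):
    """Complex unit: the largest response among same-orientation simple units
    centered on every interior column (position-independent by construction)."""
    size = len(image[0])
    return max(simple_cell(image, col) for col in range(1, size - 1))

if __name__ == "__main__":
    for col in (1, 2, 3):
        bar = vertical_bar(5, col)
        print(col, simple_cell(bar, 2), complex_cell(bar))
    print("horizontal:", complex_cell(horizontal_bar(5, 2)))
```

In this toy arrangement the simple unit responds only when the bar falls on its central strip, whereas the pooled unit responds to a correctly oriented bar anywhere in its field and to neither unit does a horizontal bar give any drive. That is the location dependence and location independence contrasted in the table, and nothing more is claimed for it.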
IV. DIRECTIONAL SENSITIVITY IN THE SUBCORTICAL VISUAL CENTERS OF MAMMALS
The work of Lettvin, Maturana, McCulloch, and Pitts and that of Hubel and Wiesel have stimulated many other investigators to examine other centers of the ascending visual pathways for evidence of feature detectors. Subsequently, a number of investigators began to find evidence of similar directionally sensitive cells at all other levels of the visual system, although with some curious species differences. Arden (1963) was examining the receptive fields of lateral geniculate cells of the rabbit when he, too, discovered that small spots of light, which were relatively ineffectual stimuli when immobile, became remarkably effective when they were moved about in the receptive field. Not only was the movement of the spot necessary for the effective elicitation of responses, but many cells seemed to exhibit preferential sensitivity to movement in a given direction. Kozack, Rodieck, and Bishop (1965) reported a comprehensive study of the effect of moving stimuli in the lateral geniculate body of the cat. They found units that were directionally sensitive, but without the same narrow range of specificity reported by Hubel and Wiesel in the cat's cortex. However, when this work was extended to the cat's retina (Rodieck and Stone, 1965b), they found that, while the center-surround arrangement was present and the responses were sensitive to such factors as size, shape, contrast, and speed of movement of stimulus objects, retinal cells were, in fact, not directionally sensitive! This finding is in sharp contrast to the observation of directional sensitivity in the retina of the pigeon (Maturana and Frenk, 1963) or of the rabbit (Barlow and Hill, 1963), as well as many other animals. How can directional sensitivity arise out of the interaction of simple neurons, which are not themselves directionally sensitive? Perhaps the most instructive example of such an analysis, using the rabbit retina as a model system, has been presented by Barlow, Hill, and Levick (1964).
First, let us discuss the details of their experimental procedure and some of their findings, and then we shall consider their neural model of directional sensitivity. Fine tungsten electrodes were inserted directly through the rabbit's sclera into the retina. Extracellular ganglion cell spikes were recorded in this manner, and this means, of course, that the responses observed reflect the integrative activity of at least two preceding synaptic levels and probably several other horizontal interactions. Barlow, Hill, and Levick report the presence of ganglion cells organized with antagonistic center-surround receptive fields, as well as ganglion cells that seemed to be activated solely on an "on" and "off" basis without the antagonistic surround arrangement. These latter cells, in particular, seemed to have specific sensitivities not only to the direction, but also to the speed of the moving spot. Figure 8.6 (adapted from Barlow, Hill, and Levick) shows the differential sensitivity to direction of movement of a particular center-surround type of cell. It can be clearly seen that this cell is maximally sensitive when the stimulus is moving upward and declines in sensitivity to virtually zero response levels when the stimulus is moving downward.
FIGURE 8.6 Directional sensitivity in the ganglion cell of the rabbit retina, showing the response of a cell which is maximally sensitive to upward moving stimuli and minimally sensitive to downward moving ones. Numbers indicate spike count, and ± marks indicate the size of the receptive field in which both "on" and "off" responses were elicited by stationary spots; curved lines indicate movement of stimulus; straight lines indicate periods of stimulus illumination. Time calibration: 0.5 sec (from Barlow, Hill, and Levick, 1964).
Intermediate directions of movement produce intermediate numbers of spike action potentials. Barlow, Hill, and Levick also observed two other types of cells in the rabbit retina, in which the speed rather than the direction of a moving stimulus was critical in determining the amount of induced spike activity. One type had a rather large receptive field and proved to be maximally activated by stimuli that moved very rapidly. Figure 8.7 shows the pattern of response, indicating a greater sensitivity to rapidly moving stimuli. On the other hand, there were also ganglion cells present that had very small fields and were maximally activated by very slow moving stimuli, a most unusual sort of specific sensitivity, the adaptive utility of which is not immediately obvious. Figure 8.8 demonstrates the heightened sensitivity of this type of cell to slowly moving stimuli. Barlow and Levick (1965) have also shown that retinal direction sensitivity in the rabbit is primarily due to the retinal differentiation of the response on the basis of the sequence in which various portions of the receptive field have been stimulated. If a sequence is presented in the opposite order, then the effect of the stimulus would be nil or, in some cases, it might even inhibit the level of spontaneous activity. They believe that some sort of inhibitory lateral interaction is the basis of the sequence
FIGURE 8.7 Response of a cell with a large receptive field. This cell responds best to movements that are moderately rapid. (a), (c), and (e) are the spike responses resulting from the movement shown in (b), (d), and (f), which are photoelectric traces of the stimulating light (also showing considerable 50-Hz ripple from the power source). The small numbers indicate the number of spikes in the preceding burst. This cell obviously responds best to intermediate speeds of movement. Symbols as in Figure 8.6 (adapted from Barlow, Hill, and Levick, 1964).
detection process. This is not a conclusion that can be drawn a priori, for there are two complementary mechanisms (one excitatory and one inhibitory) which could produce exactly the same sort of directional sensitivity. Other tests must be made to determine whether an excitatory or inhibitory process accounts for this particular feature detection process.

FIGURE 8.8 Response of a cell with a small receptive field. This cell type responds best, surprisingly, to very slowly moving stimuli, as indicated by the small number of spikes to rapidly moving stimuli and the large number evoked by slowly moving stimuli. Symbols as in Figure 8.6 (from Barlow, Hill, and Levick, 1964).

Perhaps the distinction between the two possible mechanisms can be made more clearly by considering two possible logical systems, as Barlow and Levick have done, each of which is capable of producing a response only when the stimulus is moving in a preferred direction. Figure 8.9 (adapted from Barlow and Levick) shows the two hypothetical nerve nets. One operates on the basis of an excitatory interaction; the output indicative of movement in the preferred direction occurs only when the stimulus activates a given receptor and the receptor that preceded it in that order and at a particular interval. The other possible mechanism is based upon an inhibitory process; an output will occur if a stimulus activates a given receptor, but only if the prior receptor has not been activated in the period defined by the time delay. In this case, the directionally sensitive mechanism is one based on the inhibition of a response that otherwise would have occurred. In the former case, a response can occur only if gated or allowed by an appropriately timed excitatory response. Barlow and Levick carried out experiments that showed that two spatially disparate stimuli activated in sequence produced a smaller number of spike action potentials than when the two were presented separately. This suggested to them that the mechanism for the sequence detection, and thus the directional sensitivity, was, in fact, more likely to be an inhibitory mechanism than an excitatory one. To complete the story, some correlation must be shown between the hypothetical logical mechanisms and the known anatomy of the rabbit's retina. Figure 8.10 is Barlow and Levick's schematic diagram of the anatomy of the rabbit retina. The function of the logical "and" (the gating or allowing function) units is performed by the bipolar cell, such that no output will occur unless both of the inputs are activated. Inputs from the horizontal cells are assumed to be mainly inhibitory and able to prevent the bipolars from firing if a group of cells has already been activated at an appropriate prior interval. The conduction time of the horizontal cells is supposed to be identifiable with the time delays indicated in Figure 8.9. As Barlow and Levick note: The strength of the proposed scheme arises from the fact that a function can naturally be assigned to the neural elements that are known to exist without making esoteric or revolutionary assumptions about how they work. (BARLOW and LEVICK, 1965, p. 497.)
For the purposes of our discussions, this model represents a sample of the sort of neural interaction that will probably have to be invoked to explain the action of feature-sensitive cells of varying degrees of complexity and hypercomplexity at all levels of the nervous system.
FIGURE 8.9 Two equivalent logical circuits, both of which may display directional sensitivity, but which are based on (a) summation and (b) inhibition mechanisms, respectively. The excitatory mechanism uses "and" (conjunction) gates; the inhibitory mechanism uses "and not" (veto) gates. Receptors are labeled A, B, and C, and the preferred and null directions of movement are marked. Δt is a time delay unit, which delays the signal from a preceding receptor sufficiently to allow either synchrony (thus enabling the "and" gate) or asynchrony (thus enabling the "veto" gate) of inputs from adjacent receptor units (from Barlow and Levick, 1965).
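The two logical schemes described above can be written out as a short discrete-time sketch. It is only an illustration under simplifying assumptions and not Barlow and Levick's own formulation: the receptors form a one-dimensional row, time advances in whole steps, the delay is exactly one step, the preferred direction is taken to be toward higher receptor indices, and the "response" is simply a count of gate outputs. All function names are hypothetical.

```python
# Illustrative sketch only: the "and" (conjunction) and "and not" (veto)
# schemes on a row of receptors. Delay, geometry, and names are assumptions.

def spot_frame(n, position):
    """One time step: n receptors, only the one at `position` is illuminated."""
    return [1 if i == position else 0 for i in range(n)]

def moving_spot(n, direction):
    """Frames of a spot crossing n receptors, one receptor per time step.
    direction > 0 moves toward higher indices (taken here as preferred)."""
    positions = range(n) if direction > 0 else range(n - 1, -1, -1)
    return [spot_frame(n, p) for p in positions]

def was_active(frames, t, i, delay):
    """True if receptor i exists and was active `delay` steps before step t."""
    if t - delay < 0 or not 0 <= i < len(frames[t]):
        return False
    return frames[t - delay][i] == 1

def conjunction_response(frames, delay=1):
    """Excitatory scheme: a gate fires when receptor i is active AND the
    delayed signal from receptor i - 1 arrives at the same moment."""
    return sum(
        1
        for t, frame in enumerate(frames)
        for i, active in enumerate(frame)
        if active and was_active(frames, t, i - 1, delay)
    )

def veto_response(frames, delay=1):
    """Inhibitory scheme: a gate fires when receptor i is active AND NOT
    vetoed by the delayed signal from receptor i + 1."""
    return sum(
        1
        for t, frame in enumerate(frames)
        for i, active in enumerate(frame)
        if active and not was_active(frames, t, i + 1, delay)
    )

if __name__ == "__main__":
    preferred, null = moving_spot(5, +1), moving_spot(5, -1)
    print("conjunction:", conjunction_response(preferred), conjunction_response(null))
    print("veto:       ", veto_response(preferred), veto_response(null))

    # Two-spot test: receptor 3, then receptor 2 (null order), versus each alone.
    paired = [spot_frame(5, 3), spot_frame(5, 2)]
    alone = veto_response([spot_frame(5, 3)]) + veto_response([spot_frame(5, 2)])
    print("veto, paired vs. separate:", veto_response(paired), alone)
```

Both schemes respond far more to movement toward higher indices than to the reverse, so direction preference alone does not separate them. The two-spot comparison does: in this sketch only the veto scheme gives fewer outputs for the null-order pair than for the two spots presented separately, which is the kind of result that led Barlow and Levick to favor inhibition.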
V. FEATURE DETECTION IN AUDITION

As we said in introducing this section, neurophysiological feature detection mechanisms in the other sensory modalities have been only infrequently reported. Nevertheless, there is at least one body of evidence that suggests that mechanisms analogous to the spatio-temporal pattern recognizers of the visual system are also present in the auditory system. Whitfield and Evans (1965) obtained such data in a sampling of neurons in the auditory cortex, which selectively responded to complex tonal patterns. Like Hubel and Wiesel, as well as Lettvin, Maturana, McCulloch, and Pitts, they had become convinced that simple stimuli, either in the form of impulsive clicks or constant tones, did not evoke the true diversity of responses of which cortical cells were capable. Instead of constant tones or impulsive clicks, Whitfield and Evans, therefore, used frequency modulated tones as their stimuli. If we proceed under the assumption that frequency is to cochlear locus what stimulus position is to retinal locus, it is clear that these frequency modulated stimuli are close analogues of moving spots or slits in the visual domain. And, indeed, the data obtained did seem to exhibit many of the properties we have already become familiar with in the visual sense. Figure 8.11 shows one of the typical records obtained in their experiments. The lower trace in this figure reflects the modulations of the frequency of the stimulating tone. The early unmodulated flat portion of the stimulus represents a period during which its frequency was held constant. Clearly, the neuronal responses adapt very little. On the other
FIGURE 8.10 A more anatomically reasonable model of the inhibitory circuit which, Barlow and Levick feel, shows how certain known structures could fill the role of the logical unit shown in Figure 8.9. (R) receptor layer, (Td) teleodendria of the horizontal cells, (H) horizontal cells, (B) bipolar cells, (A) amacrine cells, (G) ganglion cells. The null direction is marked. The ganglion cells act as the logical veto cells of Figure 8.9, and the horizontal cells serve to insert the time delay (from Barlow and Levick, 1965).
hand, it can be seen that elicited responses are almost in perfect synchrony with the later modulations of the frequency of the stimulating tone, a sort of entrainment that we will also see occurring with tonal stimuli of constant frequency (Rose et al., 1967) in Chapter 10. In fact, Whitfield and Evans report that as many as 10 percent of the cells that they investigated were not capable of being stimulated by any constant tone under any conditions, but did vigorously respond when the frequency was modulated in some manner. A most interesting part of their data concerns a group of cells that were excited only when the stimulus tone was varying its frequency in a particular direction. The response areas and the preferential direction of frequency change of some of these cells are shown in Figure 8.12. As can be seen, some of the cells produce spike activity only when the frequency is increasing; others only when the frequency is decreasing; and others differ in their directional sensitivity, depending on whether the signal is in the high- or low-frequency region of the response area. The analogies between such auditory "directional" sensitivities and those of the visual system discussed in the earlier section are self-evident. On the other hand, there are newly discovered examples of feature-sensitive cells in the nervous system that do not seem to be analogues of any other known process. Recently, for example, Thompson, Mayers, Robertson, and Patterson (1970) have encountered some curious cells in
FIGURE 8.11 The response of an auditory cortex cell to a frequency modulated acoustic signal. The lower curve is the frequency envelope, not the signal itself. The response shown in the upper trace quickly adapts to a constant 11.6-kHz tone, but becomes responsive again when the frequency is modulated as shown. Calibration marks: 2 mV, 1 sec (from Whitfield and Evans, 1965).
the association areas of the cat's brain that seem to be able to count a certain characteristic and specific number of input stimuli before emitting their own spike potential response. Interestingly enough, these highly specific counting responses were produced regardless of which sensory modality was used as an input, and the cells, therefore, are examples of polysensory neurons. Nor for that matter did the interstimulus interval seem to be important, a dimension which should have affected the counting behavior if it were due simply to some "charging up" sort of process. Whatever the mechanism, this is an example of another sort of feature-sensitive process; in this case, the feature is the number of input stimuli.

VI. DO LATERAL INTERACTION AND FEATURE DETECTION MECHANISMS ADEQUATELY MODEL HUMAN PERCEPTUAL PHENOMENA?¹
The purpose of the following section is a somewhat digressive one in light of the main theme of this book, but it is perhaps one of the most important parts of all that has to be said. The neurophysiological findings that we have just discussed have been remarkably effective in stimulating a considerable amount of psychobiological theorizing, particularly in regard to perceptual phenomena. However, not all of this theory, some believe, has been justified by adequate identifications of the neural and psychophysical mechanisms being compared. Often, it seems as if a sort of psychobiological "silly season" has taken hold of many would-be theorists and that they have fallen victim to what is just the most recent of a long history of reductionistic fads. All of this, of course, is nothing new. From the earliest recorded notions up to contemporary theories, the prevailing technology has usually been the source of many of the basic premises. [See Brazier (1959) for a more complete history of the psychobiological thinking which permeates the history of physiology.] Thus, Aristotle and Galen turned to the circulatory system (the heart and blood) as the potential source of the various "vapors and spirits" (a hydraulic-gaseous model) for their very persistent views of mental life. The view that the blood was the basis of
Much of the following material has been adapted from Uttal, 1971b.
440
Intensity
Sensory Coding
Low
High
(al
Low
High
Frequency (bI
Low
High
(cl
FICURE 8.12 Simplified plots of the response area of three cells that respond only when the frequency of the input signals varies. (a) The response area of a cell that responds only when the frequency of the stimulus increases. (b) The response area of a cell that responds cmly when the frequency of the stimulus decreases. (c) The response area of a cell that responds to any frequency change (from Whitfield and Evans, 1965).
the mind lasted from their time up until the seventeenth century, and even such a genius as William Harvey still accepted it. His contemporary, Oescartes, though taking a slightly different approach and assuming that the pineal body was the seat of man's mental life, thought that the mind operated on a hydraulic basis. An interesting twist to his theory was the supposition, however, that various fluids were pumped about through the then recently discovered to-be-tubular nerves. Ouring the latter part of the eighteenth century, scientific theorizing was domina ted by Newtonian thinking. His postulation of an "ether" led many biologists to seek similar ethereal models as explanations of nervous function. In the nineteenth century, the emergence of electrical telephonic technology led some to speculate that a similar mechanism was the basis of nervous activity. The two decades following World War II saw the rise of conceptual models based upon similarities and analogies between the brain and the recently developed electronic digital computers. In retrospect, these hydraulic, mechanical, ethereal, telephonic, and computer-based theories all seem relatively naive; from our current vantage point, it is quite easy to see how each generation of reductionistically oriented theorizers was so easily seduced into using the most current, exciting, interesting, and potentially useful technology as the basis for its ideas. This was natural enough, for what was past had already been tested and found wanting, and even in our currently advanced times, there is no way to build a theory on an unborn idea. But the fact to be emphasized is that each generation thought for a while, at least, that they had the answer denied to their predecessors, only to see it rejected later. It is not quite so easy to realize that one's current fad may only be that-only one more link in a growing chain of interesting ideas, of which the most recent link is momentarily, if not ultimately, the terminal
one. In each generation, the temptation becomes stronger. Each new technology is more powerful than its predecessor and explains more, and more tempting yet, it promises to explain still more. As a further inducement, it is certainly also true that, according to some measures, the technologies are converging in the right direction. This brief history sets the stage for the thesis of this section, namely, that there is a current fad in neurophysiological explanation of perceptual phenomena that is not yet recognized as such and that also may be unjustified. Just as in recent years we have begun to be suspicious of simpleminded computer-based models of mental processes, we should begin to consider the possibility that perhaps some of the psychological models based upon neurophysiological data have been extended beyond reasonable explanatory limits. In the following sections, we shall first consider some of the specific instances in which feature detection models have been, it is believed, applied inappropriately to perceptual phenomena, and then point out some of the psychophysical data that seem inconsistent with such an approach. A few discrepant neurophysiological data will then be considered. Perhaps the point of this section was best summed up by Charles Harris of the Bell Telephone Laboratories when he said:
Some people forget that when neurophysiological data are used as explanations for psychological phenomena, they become neurophysiological theories.

A. The Psychobiological Theories

The impact of the compelling and exciting neurophysiological data concerning the feature specific mechanisms we have already described was to suggest to some psychologists that certain of the processes, which had been handled only descriptively at best by the qualitative theories of the previous decades, might be better explained by hypothesizing similar simple neurological nets in the human nervous system. Some of the earliest examples of this sort of neurophysiological modeling were deceptively straightforward. McCullough (1965) found what appeared to be a near analogue of the directional sensitivity of the Hubel and Wiesel cortical columns when she reported color adaptation effects of hypothesized edge detectors. Whether the effects are peripheral (retinal) or central is still not clear, and the interested reader is directed to Blakemore and Campbell (1969) and Gilinsky and Doherty (1969), who considered this point in detail. Andrews (1965), basing his conclusions mostly upon preferred perceived orientations in a psychophysical experiment, points out a number of similarities between his data and the Hubel and Wiesel results. Much to his credit, he does point out that "alternative hypotheses could be invoked." Blakemore and Sutton (1969) reflect an analogous, though reverse, direction in their thinking when they say that grating adaptation experiments of this sort may allow a means for "the study of the trigger features of optimum stimuli of human sensory neurons."
Mayzner's group (for example, Mayzner, Tresselt, and Helfer, 1967; Buchsbaum and Mayzner, 1969; and Mayzner and Tresselt, 1970) has vigorously supported the notion that lateral inhibitory interaction between Hubel and Wiesel type cortical columns is responsible for the "sequential blanking" effects found in their experiments, using a computer controlled cathode-ray oscilloscope to produce stimulus patterns. They hypothesize a form of interaction between columns that are not only specific to direction of line movement (as reported by Hubel and Wiesel), but also to the number of sides in a stimulus polygon or even to word meaning. Campbell and Kulikowski (1966), examining the effect of orientation on visual resolution, also link their data to Hubel and Wiesel type directional sensitivities. Noting that "it is, of course, not possible to argue convincingly from psychophysical data to neurophysiological descriptions of the visual system," they find that some of the characteristics of the Hubel and Wiesel data are very similar to some of the data obtained in their experiment, not only qualitatively, but in the details of the range of angles of sensitivity of the compared effects as well. Dember and Purcell (1967) feel that the Hartline type of lateral inhibition is the most likely candidate to explain the psychophysical disinhibition effects they observed in a visual masking study. Perhaps the most sophisticated theoretical position in this area has been taken by Weisstein (1968). Pursuing her own earlier experimental work (Weisstein, 1966) and that of others, she has developed a very compelling mathematical model for metacontrast effects, based upon notions of lateral inhibitory interaction. Metacontrast is a phenomenon first observed by Stigler (1910) and later popularized by Alpern (1953), in which nonoverlapping stimuli, presented subsequent to a test stimulus, decrease the apparent brightness of that preceding test stimulus.
The two classic stimulus patterns used in the metacontrast experiment are the concentric ring and central disk, and the three side-by-side rectangles, as shown in Figure 8.13 (a) and (b), respectively. The central disk or central rectangle is, respectively, the test stimulus, which is suppressed under appropriate conditions in each case. Alpern (1953) discovered that the flanking rectangles, for example, were able to mask or diminish the brightness of the central rectangle only when they followed it. When the flanking rectangles preceded the central one, there was no inhibition. The metacontrast phenomenon has been closely linked with apparent movement by Kahneman (1967) and has been shown to be present when the test and masking stimuli were applied to opposite eyes (Kolers and Rosner, 1960). Metacontrast is a visual phenomenon that has been beset with controversy since Alpern's original studies. There is considerable argument concerning whether the response curves relating the metacontrast phenomena are U-shaped or monotonic (see Eriksen, Becker, and Hoffman, 1970), as well as with regard to their physiological basis.

Recently, Weisstein (1969) published a most provocative paper dealing with the same general problem discussed in this section of this chapter. In it, she spoke rather more hopefully of psychophysical techniques as a means of exploring single cell mechanisms than some other investigators would be willing to accept. While Weisstein does appreciate the difficulties involved in going from a behavioral description to a neuronal analysis, she seems willing to assume that many behavioral tests do reflect the operation of simple neural networks. Rothberg (1968) has also used Weisstein's data to develop a computer model based upon similar notions of lateral inhibitory interaction.

Pantle and Sekuler (1968) postulated the existence of a hierarchy of size-detecting feature filters in an attempt to explain the results of some experiments in which only some grating line widths inhibit the detection of subsequently presented gratings of similar size. An abundance of other recent data using sinusoidal stimuli has been accumulated in the past few years by an increasing number of investigators. Many of them also assume that spatial frequency analyses are made by neural units in the brain sensitive to specific frequency features. In sum, we can see that the idea that psychophysical data directly reflect the operation of simple feature filters has gained wide currency in the psychological literature.

FIGURE 8.13 Two standard stimulus patterns that can be used to produce metacontrast effects. (a) Concentric disk and annular ring. (b) Three side by side rectangles.

B. Some Discrepancies

But consider that, in fact, there have been no actual identifications of physiological nets underlying any of these results in other than invertebrate preparations. The mode of argument of all the papers cited above is always based upon analogous forms of response. Nevertheless, all have strongly suggested that the observed results are largely due to very specific neural micromechanisms. From a general epistemological point of view, such assumptions are based upon the most treacherous grounds. The main idea behind analogue computers, for example, is based upon the fact that certain systems, which are organized in similar manners, may exhibit comparable behavior even though they are made of completely different components. The typical electrostatic office copier, for example, displays behavior that is very much like the Mach band observed in human vision.
In the copier, however, this contour intensification is due to the lateral inhibitory interaction of powders and electric fields, rather than to neural interaction. Yet the conclusion drawn by the cited physiological
psychologists is not one of process or analogue similarity, but specifically one of structural identity. Earlier in this book, attention was called to the fact that there must necessarily be a distinction made between the neural signals that are identifiable with behavioral processes (true codes) and those that, while they may correlate with the stimulus, actually are only irrelevant concomitants (signs). Presently we are specifically concerned with the question of whether or not the effects of many of the fascinating and exciting single cell interactions we have already discussed are exhibited in psychophysical responses or whether the contemporary suggested correlations are spurious and misleading. In this discussion, attention has been concentrated on those papers that do not support the notion that single cell behavior is reflected in molar behavior to present the other side of the case. Once again, it must be emphasized, this is not meant to discredit the more general notion that some sort of neural activity underlies all behavior, but rather this approach represents a critique of the very special associations made between predominantly peripheral and relatively simple neural net interactions on the one hand, and some very special perceptual phenomena on the other. Our discussion is essentially a plea against premature oversimplification of what probably will turn out to be very complex physiological mechanisms of only superficially simple perceptual phenomena.

1. Missing Parts. A necessary axiom of the general thesis that the operations of simple feature-filtering networks are reflected in molar behavior is that the features to be detected must be present. This axiom might be said to be "syntactical," for it deals with the specific geometrical placement and structure of the parts of the pattern. From this point of view, the meaning or significance (semantic content) should not be an effective variable if a feature-filtering model is correct. However, some recent experiments, in which the critical stimulus material is actually missing, suggest that some kinds of form perception may be influenced more by general organization and context than by the presence or absence of specific features or the simple geometrical information content of the stimulus pattern.

Warren (1970) reports the results of a very interesting experiment, in which a masking sound (such as a cough or a tone) is used to completely replace a speech sound in a recorded sentence. Because of the sequential redundancy, as expected, listeners in this situation have no difficulty in reproducing the sentence, including the missing sound. However, surprisingly, they also report that they do hear the missing speech sound. The redundancy built into the recorded sentence is not only sufficient, therefore, to convey the meaning, but also to allow the listener to perceive the missing speech sound, even though it is not there. Interestingly enough, Warren also shows that if the missing speech sound is not replaced with an extraneous noise, but simply clipped out of the recording, it is easy to detect and locate the gap. This auditory study is an analogue of a number of visual phenomena. Our inability to deal with lacunae, as lacunae, is a striking phenomenon, which has been too infrequently studied. The minimum thresholds for visual temporal gap detection (Uttal and Hieronymus, 1970), the inability
to detect the "blind spot" of the eye, and other similar phenomena all speak to this point. Leeper's (1935) classic studies of the perception of fractured figures also clearly show that the perception of form is influenced by overall organizational factors other than specific features. The general significance of the remaining portions of partial figures becomes instantly clear when appropriate clues of meaning are given in a way that could hardly be conceived of as being due to the action of simple neural feature filters. Man's perceptual history is filled with many other instances in which missing parts are filled in or incongruous parts ignored by the perceiver. A whole vocabulary of such descriptive terms as "perceptual filling" has evolved to describe these perceptual phenomena in the tradition of Gestalt psychology.

The dot patterns used in some of the experiments mentioned earlier (Uttal, 1970, 1971a) also are forms with missing parts, for no continuous features are actually present, only arrangements of dots. Thus, the features are only suggested by the statistical relation of the dots, and any explanatory model of these phenomena based on feature analysis alone would have to involve higher level statistical or global evaluation of the cumulative response of individual local receptive areas. One of the initial results of these studies (Uttal, 1969c) was that not all characters were identified with equal ease. At very high noise levels, at which most characters could not be detected with any greater success than chance levels, the four characters I, K, L, and X were still surprisingly recognizable. The character X was on this list because of a rather curious artifact. In the font used in that experiment, X, unlike all other characters of the special alphabet, had no long straight lines of dots (see Figure 8.14 for a complete set of the characters) and was, therefore, almost invisible even at moderate noise levels.
The subjects learned quite early that when they saw nothing at all, they had probably been presented with an X and thus they reported that character. Quite artifactually, therefore, the "recognition score for X" was elevated. The characters I, K, and L, however, were recognized at supernormal levels for two different reasons. First, their confusion with other members of the alphabet stimulus set was lower than that of the other characters; second, they contained long straight dotted lines, which uniquely defined these three characters. Long lines of dots appeared to be more easily detected than the shorter fragments that provided the distinguishing criteria for many of the other characters. When other members of the character set containing long lines of dots received low recognition scores, it was usually due to the difficulty of detecting the shorter line segments that were critically necessary to define which of several confusable characters was actually presented. The parameters of line detectability are thus of considerable interest in relation to the specific problem of geometric form recognition. In one study (Uttal, Bunnell, and Corwin, 1970), the parameters of line density, orientation, and dot numerosity were studied to determine what effect, if any, these parameters had on line detectability. Dynamic visual noise (DVN) as described earlier was used to degrade the dotted line stimulus, which otherwise would have been almost perfectly detectable at all brightness levels above the threshold for the individual dots.
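To make the stimulus parameters just listed concrete, the following is a minimal sketch of a dotted line (with adjustable dot spacing, orientation, and dot numerosity) embedded in randomly placed noise dots. It is only an illustration under stated assumptions: it builds a single static snapshot rather than the temporally varying DVN used in the study, the units are arbitrary, and every function and parameter name (`dotted_line`, `noise_dots`, `nearest_is_noise`, `half_width`) is hypothetical rather than taken from the original experiment.

```python
import math
import random


def dotted_line(n_dots, spacing, angle_deg, center=(0.0, 0.0)):
    """Place n_dots points, `spacing` units apart, on a line through `center`."""
    theta = math.radians(angle_deg)
    ux, uy = math.cos(theta), math.sin(theta)
    half = (n_dots - 1) / 2.0
    return [(center[0] + (i - half) * spacing * ux,
             center[1] + (i - half) * spacing * uy)
            for i in range(n_dots)]


def noise_dots(n_noise, half_width, rng):
    """Scatter n_noise points uniformly over a square field centered on the origin."""
    return [(rng.uniform(-half_width, half_width),
             rng.uniform(-half_width, half_width))
            for _ in range(n_noise)]


def nearest_is_noise(line, noise):
    """Count line dots whose nearest neighbor is a noise dot, not another line dot."""
    count = 0
    for p in line:
        d_line = min(math.dist(p, q) for q in line if q is not p)
        d_noise = min(math.dist(p, q) for q in noise)
        if d_noise < d_line:
            count += 1
    return count


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the illustration is repeatable
    line = dotted_line(n_dots=7, spacing=35.0, angle_deg=45.0)
    for n_noise in (10, 100, 1000):
        noise = noise_dots(n_noise, half_width=200.0, rng=rng)
        print(n_noise, nearest_is_noise(line, noise))
```

The count returned by `nearest_is_noise` gives a rough sense of how often, at a given noise density, a dot belonging to the line lies closer to a noise dot than to its own neighbors on the line.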
FIGURE 8.14 A sample set of alphabetic characters that can be used as test stimuli in the dot masking experiment (from Uttal, 1969c).
The task of the S was to identify the orientation of a dotted line embedded at the center of a two-second-long burst of dotted visual noise of variable density. One of the main results of the experiment is shown in the curves plotted in Figure 8.15. The data have been separated into a family of four curves as a function of the spacing between the dots that make up the straight line. The variation in the recognizability of the various patterns is shown as an effect of various levels of masking DVN by plotting the percentage of the total number of dotted lines that were correctly oriented against the noise level. It is clear from this rendition of the data that the spacing of the dots is a key factor in the detection and correct orientation of the straight lines. It was also shown that beyond four or five dots, dot numerosity had no effect on the recognition score.

The main impact of this experiment arises out of its requirement that the subject select, from a number of isolated point stimuli, a particular set that is aligned, in some statistical manner, along a common axis. The fact that the dots of the line are no more interconnected with one another than they are with the dots of the noise and are, actually, in some cases less so makes it hard to understand how a peripheral spatial filter or feature-extracting mechanism sensitive to this form or organization could operate at the noise densities used. On the contrary, we would have to hypothesize a very elaborate set of receptor signal analyzers, which would have to operate on statistical distributions in a way that comes very close to modern discussions of decision making and other cognitive interpretative functions, and is increasingly distant from the simple geometrical neural interactions that probably underlie such phenomena as Mach bands.

FIGURE 8.15 Results of an experiment on the recognizability of dotted lines of various dot spacings. Separate plotting symbols mark dot spacings of 17.5 min, 35 min, 52.5 min (X), and 70 min. The horizontal axis (noise interval, 2 to 10 msec) indicates the interflash interval between dots in the DVN, while the vertical coordinate (% correct, 0 to 100) represents the percentage of the total number of lines presented that were correctly oriented by the S (from Uttal, Bunnell, and Corwin, 1970).

2. Some Other Psychophysical Data. The missing part experiments are only one subset of a much larger group of psychophysical experiments, which suggest that the obtained psychophysical results do not concur with microtheories based on the action of simple nerve nets. Mayzner's support of the relevance of Hubel and Wiesel's data to sequential blanking has been noted above. However, it is not at all clear that his results have been interpreted appropriately. While Mayzner's notions are extremely stimulating, it is all too easy to see how they are contradicted by his own data. In a recent elaboration of his views (Mayzner and Tresselt, 1970), he points out that the meaning of the stimulus materials is very effective in defining the sequential masking effects he has observed in so many different forms. To quote his interpretation:

The implication of these results seems clear. If the first five letters displayed form a word, the inhibitory field effects of the last five letters displayed prove almost totally ineffective in producing sequential blanking effects, while if the first five letters do not form a word, whether the second five letters form a word or not, inhibitory effects are very strong and sequential blanking occurs most readily. Thus, it would appear that not only can input content or geometry greatly modulate sequential blanking effects, but word meaning can also produce equally powerful modulating or attenuating effects on sequential blanking. (MAYZNER and TRESSELT, 1970, p. 611.)

He then goes on to discuss how this means that the higher cortical centers must be deeply implicated in the sequential blanking effects. In this regard he is correct, for experiments in the present author's laboratory have shown that a sequential masking effect first observed by Schoenberg, Katz, and Mayzner (1970) with five simple dots is obtained very strongly if the
three masking dots are viewed by the right eye and the two masked dots are viewed by the left eye. Since there is no possible neural interaction prior to the lateral geniculate body in this case, it is at this level or at the level of the cortex that the effects must be mediated. Mayzner and Tresselt explain this sort of sequential blanking effect in terms of lateral inhibitory interaction between cortical columns of the sort described by Hubel and Wiesel. It seems that Mayzner's suggestion requires that columns be available which are selectively sensitive, not only, as observed, to line movement directionality, but also to meaning or polygonal geometry, in a way that has not been observed in the neurophysiological laboratory. These hypothetical columns go so far beyond the observed data that it seems more reasonable to talk about inhibitory interaction of quite a different kind. While there may be mutual inhibition, it is a sort of semantic inhibition based upon the activities of nervous mechanisms that are very, very complex and might indeed have little structural similarity to the highly delimited sensitivity of a column of cortical cells. The inhibitory interactions may be analogous in some limited sense, but are certainly not homologous.

An important related fact is that even in some of the quasi-geometrical masking paradigms such as metacontrast, it seems unlikely that lateral inhibitory interaction works in the simple way hypothesized by some theorists. For example, form similarity appears to be necessary for simple metacontrast. It is sometimes forgotten that when this particular kind of backward masking was discovered, it was extremely difficult to find stimuli that exhibited the classical metacontrast described by Alpern (1953) or Kolers (1962). The many studies done with disks and concentric rings or with the three adjacent rectangles in the last few years obscure the fact that the effect is not ubiquitous.
In fact, metacontrast is a somewhat rare phenomenon, which occurs only in a few highly specific situations, all of which necessitate form similarity. Figure 8.16 is a set of masking figures, which have been used in the author's laboratory (Uttal, 1970) to test for metacontrast effects. Only the combinations labeled (a), (k), and (m) produced substantial masking of the central figure. Thus, prior form recognition seems to be very important in this type of masking situation, again suggesting that very-high-level cognitive functions may be more important than simple geometrical propinquity in the understanding of these effects. In that same paper (Uttal, 1970), specific attention was given to whether or not three specific characteristics of peripheral lateral inhibition, which are usually reported, were obtained in backward masking with dot patterns. Though this stimulus material is quite special, it is of some interest to note that none of the three conditions led to the usual results. Stimuli interacted only when they were overlapping in the same visual space; disinhibition was not obtained when a second masking pattern followed the first; and dichoptic masking was nearly as strong as under monoptic and binocular conditions. Other higher cognitive mechanisms, such as limitations on the information-processing capacity of the nervous system (pattern confusion), were thus implicated, but hardly the simple sort of lateral inhibitory interaction found in the eye of Limulus polyphemus.
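For contrast with the higher-level mechanisms just invoked, the simple kind of lateral inhibitory interaction referred to here can be sketched numerically. The following is a deliberately simplified, feedforward illustration and not the recurrent formulation used in the Limulus literature: each unit's output is its own input minus a fixed fraction of its neighbors' inputs. The function names and the values of `k` and `radius` are assumptions chosen only for illustration. Applied to a ramp between two plateaus, the operation yields a dip at the foot of the ramp and an overshoot at its top, a pattern resembling Mach bands and dependent on the geometry of the input.

```python
def ramp_profile(low=1.0, high=2.0, plateau=10, ramp=10):
    """Low plateau, linear ramp, then high plateau (a one-dimensional luminance profile)."""
    step = (high - low) / (ramp + 1)
    rising = [low + step * s for s in range(1, ramp + 1)]
    return [low] * plateau + rising + [high] * plateau


def lateral_inhibition(excitation, k=0.2, radius=2):
    """Feedforward sketch: output = own input - k * (sum of neighbor inputs), floored at 0.

    Neighbor indices are clamped at the array ends so the borders of the
    array do not produce spurious enhancement of their own.
    """
    n = len(excitation)
    out = []
    for i in range(n):
        surround = 0.0
        for d in range(1, radius + 1):
            surround += excitation[max(0, i - d)] + excitation[min(n - 1, i + d)]
        out.append(max(0.0, excitation[i] - k * surround))
    return out


if __name__ == "__main__":
    profile = ramp_profile()
    response = lateral_inhibition(profile)
    for x, (e, r) in enumerate(zip(profile, response)):
        print(f"{x:2d}  input={e:.3f}  output={r:.3f}")
```

Because the enhancement in this sketch depends only on the distances between neighboring points of the input, it illustrates the class of geometry-bound effects, not the form-dependent and dichoptic masking results described above.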
FIGURE 8.16 A group of stimuli, panels (a) through (m), used to test whether form similarity was necessary for the metacontrast effect. Only (a) and (m) produce a strong effect, suggesting that the phenomenon is dependent upon prior form identification and not so much upon the geometry or the propinquity of lines (from Uttal, 1970).
Kahneman (1967) has called attention to the fact that the metacontrast type of masking and apparent movement are very closely related. Implicit in his suggestion is also the notion that many of these masking effects are mediated at the highest levels of the nervous system through mechanisms of the utmost complexity, rather than through relatively simple peripheral interaction. Metacontrast, he suggests, may be more a difficulty in the interpretation of "impossible" apparent motion than a lateral interaction process. Furthermore, Julesz and Hesse (1970) have performed an interesting experiment, in which areas of rotating short line segments (best described by their term "needles") were plotted by a computer for a motion picture display. They discovered that differences in speed of rotation of the needles in a circular central test patch and a surrounding annular region of similar line segments would lead to the perception of the test patch as a distinct subunit. However, if the test region differed from the surround only in the direction of rotation of the line segments, subjects were not able to perceive it as a separate entity. Julesz and Hesse had designed their experiment specifically to test whether or not the directional sensitivity of single cells observed in the Hubel and Wiesel, and Barlow and Hill work could exhibit a molar psychological effect. Their final paragraph is most interestingly stated:
The finding that clusters of locally rotating line segments which are seemingly well matched to the neurophysiological feature extractors do not yield a global psychological percept may be of some interest. It outlines for the neurophysiologist some of the limitations of the human visual system. (JULESZ and HESSE, 1970, p. 244.)
Their work is of the utmost importance to the argument being presented here. Julesz and Hesse argue that their results indicate the absence of "integrating mechanisms that can extract the directionality of a set of similarly moving edges." It might be further suggested that these studies outline not so much limitations of the human visual system as of that brand of physiological theorizing that originally linked such simple physiological mechanisms to such complex perceptions. Their findings suggest that the microscopic characteristics of single cells may not, contrary to the suggestions of the workers to whom we referred earlier, be simply or directly reflected in certain kinds of molar perceptual behavior.

From another perspective, we might consider observations on perceptual illusions. Weintraub and Krantz (1971), for example, have conducted some experiments on the directionality of the Poggendorff illusion. When the two parallel lines are either horizontal or vertical, the illusion is maximal, but tilting the entire display away from these axes leads to a precipitous drop in the strength of the illusion. Although some acuity measures have been shown to be axis-sensitive to a slight degree (Taylor, 1963), it is hard to conceive of some simple cellular axial sensitivity of the visual system that could lead to the very gross effects that Weintraub and Krantz observed. These effects must be a function of much more complicated cognitive mechanisms. Similarly, the Zöllner and Hering illusions are examples of phenomena in which the overall pattern dominates the geometry and leads to the perceptual distortion of straight lines into curved lines. While one could postulate relatively simple neural net mechanisms exhibiting analogous behavior to explain these illusions, it seems far more likely that very complicated central neural mechanisms mediate such field effects.
Another relevant psychophysical datum, which seems to have been often overlooked, was obtained by Nachmias (1967). He showed that visual contrast sensitivity effects, thought to be typical of lateral inhibitory interaction, generally do not occur when very brief exposures are used. This finding is a behavioral analogue and confirmation of neurophysiological data obtained earlier by Barlow, Fitzhugh, and Kuffler (1957). In the light of the brief tachistoscopic exposures used in the experimental design of most of the masking experiments, it becomes increasingly difficult to understand how lateral inhibitory interaction can serve as a model for some of these spatiotemporal perceptual phenomena.
Another discordant note is struck by the classic simultaneous contrast experiment with either monochrome or colored stimuli. These phenomena have been frequently cited as effects of lateral neural interactions. For example, Ratliff (1965) has linked the classic simultaneous contrast stimulus (as shown in Figure 8.17) with Mach bands and has suggested that all must share some underlying physiological mechanisms in common. But it should also be noted that there are very substantial differences between the contour effects and the broad field effect. The Mach band is clearly affected by the details of the geometry of the stimulus situation. The brightness at any point is specifically a function of the distances of the various components from one another. On the other hand, the simultaneous contrast phenomenon, like the metacontrast one described earlier, is nongeometrical in a very important sense. Stimuli come and go or change in apparent brightness as wholes, almost independent of the propinquity of the conditioning fields to specific portions of the test field. This difference is fundamental and suggests that there may be two very different classes of similar perceptual phenomena: one, including Mach bands, specifically dependent upon geometry and, thus, upon simple neural interactions; and the other, including simultaneous contrast, more global and dependent upon far more complex interactions at higher neural levels.
FIGURE 8.17 The classic simultaneous contrast stimulus. The lower grey square should appear darker (because of the lighter background) than the upper grey square.
Koppitz (1957), for example, has shown that Mach bands cannot be produced by dichoptic mixing of two stimulus patterns which, if presented jointly to either or both eyes, would produce the Mach band effect. Figure 8.18 shows his experimental paradigm. A ramp or gradient of visual intensity is presented to one eye and a level plateau of visual intensity to the other. When the two are brought together in the dichoptic field so that their edges touch, the conditions for the appearance of a bright Mach band
FIGURE 8.18 The design of Koppitz' stimuli to determine if Mach bands could be produced binocularly. (a) Left-eye and right-eye stimuli, plotted as intensity against distance. (b) The fused binocular image. When the two upper monocular stimuli are combined in binocular fusion to produce the usual gradient, there is no observable enhancement, indicating that the Mach band effect is mediated by neuronal interactions in the peripheral retina (courtesy of Dr. Werner Koppitz, Mt. Kisco, New York).
would be satisfied. However, no Mach band appears. The visual system apparently is not able to carry out the same interactions centrally that are performed in the retina to produce the Mach band. A closely related contour effect is the famous Hermann grid experiment reproduced earlier in Figure 7.22. If a similar attempt is made to reproduce this phenomenon by dichoptic presentation of the two parts of the stereoscopic slide shown in Figure 8.19, there is no evidence of the grey spots at the intersections of the white lines, which were so evident at all locations (except at the fixation point) in the monocular presentation. This latter stereoscopic slide, incidentally, is quite hard to keep stably fused, and a considerable amount of rivalry occurs. After some practice, however, there are periods of stable fusion, and it is within these time periods that the absence of the grey spots is noted.
On the other hand, Figure 8.20 is a drawing of a picture pair which, when presented in a stereoscope, produces a simultaneous contrast effect even though the stimuli presented to each eye do not and cannot produce any such effect individually. When the two figures are jointly presented, the upper square formed from the mixing of the grey square from the left-hand picture and the black square from the right-hand picture is apparently darker than its counterpart below. This difference is attributed to the difference in the mixed brightnesses of the background fields. A similar demonstration has been made by Levelt (1965), who feels that the effect is due to the difference in contours presented to each of the eyes. Nevertheless, he, too, agrees that the effect is central, which is the main point being made by this demonstration. Julesz (1971) has also shown similar effects with random dot stereograms.
FIGURE 8.19 A stereoscopic slide prepared to determine if the Hermann grid effect can be produced dichoptically. Although this slide is difficult to fuse because of a strong retinal rivalry effect, during the stable periods there is no evidence of the grey spots. The suggestion, therefore, is that the Hermann grid, like the Mach band, is mediated by neuronal interactions in the periphery.
In sum, there appear to be many differences between the simultaneous contrast and contour enhancement phenomena, which suggest that they are mediated by mechanisms at vastly different levels of the nervous system. Contour enhancement does appear to be peripheral and dependent upon local geometry, while the simultaneous contrast mechanism appears to be central and global rather than local. As we have noted, metacontrast and apparent motion also can be obtained under dichoptic presentation conditions. Therefore, since we know that binocular representation does not occur below the level of the thalamus, these phenomena must be mediated by mechanisms well up in the central nervous system. All of this seems to support the notion of two classes of phenomena: one geometrical and peripheral, and the other nongeometrical and central. It should also be noted that analogies based on the Hartline data often tend to confuse the spatial notions of lateral interaction, that is, simultaneous presentation for prolonged periods of spatially disparate stimuli, with sequential presentation of overlapping visual form, without regard to some of the major differences between these two experimental paradigms.
Another difference between the Mach band and simultaneous contrast phenomena relates to the additivity of two or more inducing fields or of increases in the area of a single contrast field. Hartline and Ratliff (1958) had shown that in Limulus, the inhibitory effects of two stimuli are linearly additive, although one must take into account the reciprocal interaction between them to arrive at a correct numerical value. Hartline's group has also shown (Hartline, Wagner, and Ratliff, 1956) that as the area of the inhibiting field increases, the inhibitory effect also increases. In fact, all of the lateral inhibitory interaction models that Ratliff (1965) discusses exhibit this sort of additivity. By contrast, Cole and Diamond (1971) have shown that the proportion of the total area of the surrounding
FIGURE 8.20 A stereoscopic slide prepared to determine if the simultaneous contrast illusion can be produced dichoptically. This slide does give rise to a simultaneous contrast effect in spite of the fact that neither monocular view could possibly produce any brightness difference. See text for complete discussion.
contrast field used in a simultaneous contrast situation does not affect the magnitude of the contrast. Diamond's (1960) theory of brightness enhancement, it should also be noted, was only able to describe adjacent rectangle interaction and not the contrast effect of a surrounding circular inducing field on a circle. This difference in mathematical description also suggests that different neural processes may underlie each of these phenomena.
Simultaneous contrast, though, displays another important effect, which suggests that simple geometry is not the basis of the phenomenon. Gogel and Mershon (1969) and Mershon and Gogel (1970) have shown that the contrasting effect of a surround on a target is diminished if the two appear not to lie in the same stereoscopic plane, even though the lateral relations hold. Their interpretation is that this finding argues against any theory that places the contrast effect in the periphery. Simultaneous contrast, thus, according to their view, as well as the other arguments already cited, must occur at a level at or above that mediating stereoscopic depth perception.
We may also refer to a recent paper by Weisstein (1970) herself for additional evidence that simplistic physiological models do not apply to some of the very complex perceptual experiences. In this most intriguing paper, she has shown that the masking effects of a grating on a subsequently displayed test pattern occur even if the region of the grating in which the test patch is shown is obscured by a cubelike drawing, as shown in Figure 8.21. Weisstein claims that this result means that there probably is a neural mechanism that encodes the information "in back of" in this stimulus situation. It is almost certainly true that she is correct in this
FIGURE 8.21 An adaptation field (a) and a test stimulus (b), which can be used to show that the grating adaptation effects do not require that the grating specifically stimulate the tested region. The perception of the grating is still interfered with to some degree even if it is placed in the region obscured by the cube (from Weisstein, 1970).
assumption, but it is probably also true that this hypothetical mechanism is far more complex than the simplistic notions of neuron interaction or columnar structure that Weisstein, among others, has invoked to explain some of the masking effects found in this and related masking paradigms. She is careful to note that these observations are related to higher level "symbolic" mechanisms in coding the concept "in back of" but, in doing so, fails to note that many of the simpler correlations observed between other perceptual phenomena and neural mechanisms also might be mediated by such symbolic mechanisms. Rather than adaptation of directionally sensitive receptors, the symbolic value of "similarity" may be the key issue. Just as she has had to extend the existing theoretical structure to include highly complex "in back of" coding mechanisms, other results superficially analogous to known physiological mechanisms may also require "symbolic mechanisms" that go far beyond the observed neural data.
3. A Few Discrepant Neurophysiological Data. Attempts to correlate specific neurophysiological data with specific perceptual phenomena have also led to some data that contradict the notion that simple neural net mechanisms provide satisfactory explanatory models for some of the perceptual phenomena we have discussed. One line of research dealing with visual sequential interference has been pursued very effectively by Schiller. Distinguishing among three different kinds of sequential masking (masking in which a bright diffuse light reduces the likelihood of detection of a dimmer, smaller light; masking of complex patterns that overlap; and metacontrast, where the stimuli do not overlap), Schiller (1968, 1969) has shown that it is only when stimuli of the first two categories are used that the single cells of the lateral geniculate body and cortex exhibit any analogous interactive effects. The metacontrast situation, which has so often been subject to this quasiphysiological kind of theorizing, shows no lateral geniculate cellular response analogues that might be correlated. Schiller goes on to say, ". . . metacontrast is a complex phenomenon, which may depend on levels in the visual system above the LGN [lateral geniculate nucleus]." (Schiller, 1968, p. 865.) In related work, Fehmi, Adkins, and Lindsley (1969) find very peripheral suppression of the neural response, but only when the stimuli overlap in the same manner as in the Schiller experiments. DeValois and Pease (1971) have shown, furthermore, that single cell responses from lateral geniculate neurons definitely separate simultaneous contrast and the border effects leading to such phenomena as the Mach band into two separate and distinct processes, just as suggested by the psychophysical data. Comparable lateral interactions among these cells were observed when stimuli that led to psychophysical contour enhancement were used.
However, there was no evidence of any suppression of the neural response when the stimulus was similar to that producing the simultaneous contrast effect described earlier. DeValois and Pease conclude that these latter data are compelling arguments for much more complicated and more central cortical mechanisms for simultaneous contrast. The implication is that the geometry of the situation is not critical, and therefore mechanisms such as lateral inhibitory interaction and center-surround organization probably play a small role, if any, in such phenomena.
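The geometric dependence attributed above to contour enhancement can be illustrated with a small numerical sketch. It is our own simplification, not a model taken from Hartline, Ratliff, or DeValois and Pease: each unit's output is its input minus a fixed fraction of the mean input to its near neighbors (a nonrecurrent approximation), and the function names, the inhibitory coefficient of 0.2, and the neighborhood radius of 2 are all assumed values chosen only for illustration. Applied to a ramp joining two plateaus, such a scheme would be expected to yield a dip at the foot of the ramp and a peak at its top, as in Mach bands, while a uniform field is merely scaled.

```python
def lateral_inhibition(intensity, k=0.2, radius=2):
    """Return each unit's input minus k times the mean input of its neighbors.

    Assumed, illustrative parameters: k is the inhibitory coefficient and
    radius is the number of neighbors considered on each side.
    """
    n = len(intensity)
    response = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        neighbors = [intensity[j] for j in range(lo, hi) if j != i]
        inhibition = k * sum(neighbors) / len(neighbors)
        response.append(intensity[i] - inhibition)
    return response


def ramp_profile(low=1.0, high=2.0, plateau=10, ramp=10):
    """Build a low plateau, a linear ramp, and a high plateau (hypothetical stimulus)."""
    step = (high - low) / (ramp + 1)
    rising = [low + step * (i + 1) for i in range(ramp)]
    return [low] * plateau + rising + [high] * plateau


if __name__ == "__main__":
    stimulus = ramp_profile()
    for position, (s, r) in enumerate(zip(stimulus, lateral_inhibition(stimulus))):
        print(f"{position:2d}  stimulus={s:.3f}  response={r:.3f}")
```

Because the inhibition in this sketch depends only on near neighbors, it has nothing to say about effects in which a field changes in brightness as a whole, which is the distinction the preceding data draw.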
C. Conclusions
It is important to express clearly a significant caveat at this point. The following concluding comment is not intended to suggest that the entire monistic philosophy underlying modern psychobiology should be rejected or that the neurophysiological data are, in any way, less than fully valid within their own realm. Rather, it is intended to stress the idea that neurophysiological reductionism for the particular aspects of visual perception discussed in this section may have been somewhat premature. The general question of whether or not aspects of single cell activity can be detected at the molar psychological level is critically important. But this section
attempts to raise the more specific question of whether or not these very particular perceptual phenomena can be meaningfully ascribed to lateral inhibitory interaction, feature filtering, or cortical columns in the light of some of this recent psychological and physiological evidence. Someday, future psychobiologists may consider this current trend in "explanations" exactly as we now look upon the computer models, the telephonic models, or the even more ancient hydraulic and pneumatic models of brain function, as simply the subsequent stage in a series of reductionistic fads. It is entirely possible that they will also make some judgments about reductionism itself, which may not differ too drastically from those judgments made by physicists in the field of gas dynamics. Few physicists would consider a microscopic analysis of the dynamics of individual gas atoms as a possible and practical means of predicting the overall behavior of a container of gas. Instead, they use the external metrics of pressure, volume, and temperature, which are statistical estimates of the central tendencies of the individual particles. Psychologists may, in the not-too-distant future, also come to the same sort of conclusion: that complex mental activities, though admittedly based on the pooled statistical properties of ensembles of individual cells, still cannot, in any practical sense, all be explained in terms of the individual behavior of these microscopic structural units. This may be more obvious in some aspects of psychology than in others. The communication of information patterns through the ascending pathways does seem amenable to some types of single cell analysis. The explanation of complex perceptual phenomena, however, may lie at a level of complexity with which no current neurophysiology can deal.
It may also be true that we shall have a difficult time distinguishing between processes which, superficially, seem closely related by experimental designs, but which may have quite different underlying mechanisms. For example, it is probably true that retinal lateral inhibitory interaction does, in fact, operate in some ways that can be detected behaviorally, as in the perception of Mach bands or the Hermann grid. Nevertheless, other mutual inhibitory mechanisms, which are superficially quite similar, may be based on far more complex mechanisms. It is for this reason that the taxonomies that Schiller (1969) and, similarly, Fitzgerald (1970) developed to distinguish the several different kinds of masking are so important in helping us understand these phenomena.
It is also important to note differences in the meaning of the terms feature extraction and coding at the cellular level and at the operational or psychological level. Feature-extraction theories of visual perception may deal with greatly mediated representations of visual forms. Words, for example, are wonderful mediators for the coding and subsequent recognition of visual forms. The word square, for example, is a nongeometrical code for a certain geometric form, as in [drawn examples of squares]. Once appropriately coded, the features (for example, corner, side, corner, and so on) that make up any particular geometric form can be stored and retrieved, or allowed to interact with one another in ways that are not directly related to the spatial location of the parts or overall geometry of the stimulus. Mutually antagonistic processes could be present, although
not based on the classical geometrical lateral inhibitory interaction. Thus, recognition schemata operating on coded features are entirely possible and might even exhibit processes and dimensions analogous to many neurogeometric effects, although the schemata may be mediated by entirely different neural mechanisms of, for all practical purposes, infinite complexity. Psychobiology may yet have to face the probability-determinism schism that physics faced a few decades ago and, like physics, finally come to the conclusion that there can be little fruitful outcome from dealing deterministically with the microscopic elements of a complex ensemble.
VII. AN INTERIM SUMMARY
We have now completed our discussion of the psychophysical and neurophysiological topics that revolve around the many issues of spatial and temporal coding in the various sensory modalities. One of the most important summary points, which should be self-evident by now, is that it is very difficult to unravel the spatial and temporal dimensions from each other. Space and time seem to be, in many instances, interconvertible one into the other, or linked in such a way that a temporal or spatial snapshot alone loses the most critical parameters of the sensory code. Sensory-coding theory may, therefore, be required to postulate some sort of a relativism axiom concerning the temporalization of spatial dimensions in much the same way as has happened in physics. In the last two chapters, we have considered a number of special problems, such as the neurological basis of depth perception in vision and spatial localization in audition, which emphasize the fact that isomorphy is not necessary in a neural coding schema. In the nervous system, it seems, time often represents spatial dimensions. For example, auditory space is coded in large part along a temporal continuum. Auditory pitch, a correlate of the temporal frequency of the stimulus, seems to be transformed, on the other hand, into spatial relationships by nonneural preprocessing. It is these sorts of findings that have made it impossible to separate the material in this chapter into two separate categories.
Another general outcome, which must be noted here, is that spatial and temporal discriminative abilities as expressed in psychophysical experiments usually are quite imprecise in relation to the fineness of the stimulus world. The general neural properties that appear to account for this loss of discriminative ability are the divergent and convergent network processes of the nervous system.
Convergence of many sensory receptor inputs into a few transmission neurons leads to a loss of spatial acuity, a general property which is reflected in the ubiquitous presence of "receptive fields" for almost all neurons more central than the receptor itself. Divergent mechanisms, on the other hand, account for the elongation of brief real-time micromoments into psychophysical macromoments. A 1-msec nerve response in the periphery may give rise to central neural activity that lasts for seconds. With this elongation comes a loss in the ability to discriminate both the order and the simultaneity of successive stimuli. Thus is introduced the notion of the psychological moment, the temporal analogue of the spatial receptive field.
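The cost of convergence in spatial acuity can be made concrete with a toy calculation. The sketch below is a deliberately crude assumption of ours, not a model drawn from the physiological literature: a row of receptors is summed in fixed, nonoverlapping groups onto transmission neurons, and two point stimuli count as resolved only if at least one silent neuron lies between the active ones. The function names and pool sizes are hypothetical.

```python
def converge(receptors, pool):
    """Sum each nonoverlapping group of `pool` receptors onto one transmission neuron."""
    return [sum(receptors[i:i + pool]) for i in range(0, len(receptors), pool)]


def count_separate_peaks(outputs):
    """Count runs of active neurons separated by at least one silent neuron."""
    peaks = 0
    previous_active = False
    for value in outputs:
        active = value > 0
        if active and not previous_active:
            peaks += 1
        previous_active = active
    return peaks


def two_point_stimulus(n, first, second):
    """Return n receptors, silent except for unit activity at two positions."""
    receptors = [0.0] * n
    receptors[first] = 1.0
    receptors[second] = 1.0
    return receptors


if __name__ == "__main__":
    stimulus = two_point_stimulus(24, 10, 12)
    for pool in (1, 2, 4):
        peaks = count_separate_peaks(converge(stimulus, pool))
        print(f"pool size {pool}: {peaks} separate peak(s)")
```

In this sketch the pool plays the part of a receptive field: the larger it is, the farther apart two points must be before any neuron between them stays silent.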
Clearly, one of the most important sets of notions that have emerged from the last two decades of neurophysiological research is that the overall spatio-temporal pattern of the stimulus is important, probably far more important than the simpler dimensions of occurrence of an impulse stimulus in defining the neural response. Lateral interactions between adjacent receptors and the time sequence of these interactions allow a number of highly specific feature detection properties to emerge in neural response patterns. Contour sensitive mechanisms, critical time-space patterns capable of eliciting activity where generalized stimulation is not, and special sensitivity to moving objects of specific shapes all emerge as fundamental mechanisms of the nervous system. The elucidation of the physiological mechanisms and of related psychophysical phenomena has been one of the most important emerging traditions of this period. However, there are a number of caveats that must also be observed. It is not yet certain that some of the more superficial process analogies between behavior and neurology are truly structurally related. Some processes, only superficially similar, have been wrongly assumed by some workers to be identical. In this chapter, we considered whether this is a valid approach. We turn now to another area, those problems that concern the representation of the quality of a stimulus, and consider both the basic psychophysical evidence, which has to be explained, and the neurophysiological findings, which are relevant to the specification of a coding schema for quality.
CHAPTER 9: THE NEURAL CODING OF SENSORY QUALITY-VISION
I. INTRODUCTION
We now turn to the other great experiential domain: sensory quality. As in the previous chapters, we would like to begin our discussion with some attempt to define what is meant by sensory quality. Disappointingly, we are in no better shape than when we attempted to define quantity or to unravel the often intermingled temporal and spatial dimensions of sensory experiences. Sensory qualities also seem to be elusive entities which, although everyone seems to have reached common agreement concerning their nature, evade a precise and rigorous definition. We are tempted to add to the circularity of the definitions we have already given and say that if one holds the sensory magnitude constant and fixes the perceptual temporal and spatial pattern, any further differences that can be discriminated by subjects are by exclusion qualitative in nature. But this sort of definition, of course, adds little to our understanding beyond the generally accepted popular definitions. It is equally unsatisfying to attempt a definition in terms of the nature of the physical stimulus, but it is probably necessary. Each of the sense organs has evolved to display a maximum sensitivity to a particular kind of physical energy. Perhaps the best we can do is to define qualities for each of the senses individually in terms of the stimulus. For example, visual quality is defined as the range of experiences that are produced by stimulating the eye with photic energy varying over the wavelength range of 400 to 780 nm. To the degree that different stimuli can be distinguished
from one another on the basis of wavelength alone, each visual stimulus wavelength or indiscriminably narrow band of wavelengths is associated with a particular visual quality. Equivalent definitions can be made for some of the other senses, although such definitions break down when there is no single dimension of stimulus variation. The cutaneous and chemical senses respond to multidimensional stimuli, and the experiences produced in each case are not so simply classified.
Another major distinction, which must be kept in mind during our discussion of the neural coding of quality, is that there are probably two separate ways in which the term quality is used. In some gross manner, we must distinguish among the macroqualities of sensory experience, the great sensory modalities: vision, touch, smell, taste, and hearing. On the other hand, within each one of these modalities, we also have to deal with microqualitative differences among experiences, a domain in which we would use such terms as distinguishable hues or pitches. The distinction between macro- and micromodalities, of course, becomes less clear when we talk about mixed experiences such as that collectively referred to as cutaneous sensation. In this context, it is not always clear whether there are many macromodalities or a set of micromodalities. The different cutaneous experiences, such as heat, touch, pressure, tickle, and itch, do not seem to be neatly analyzable or synthesizable one from the other. Similarly, the classic but probably incorrect qualitative descriptions of the olfactory and gustatory modalities usually involve the classification of these senses into "fundamentals" such as sweet, sour, bitter, and salty or a set of fundamental odors, but not all would agree that these "fundamentals" play the same role as the primary colors of vision.
In olfaction and gustation, each of the "basic" smells and tastes appears, from some points of view, to be more comparable to color vision in toto than to individual hues. One olfactory or gustatory quality does not continuously grade into another as do the colors or pitches. It seems, in general then, that our definition of quality is less than precise not only with regard to macromodalities, but also with regard to what actually constitutes a micromodality in somatosensation, gustation, and olfaction.
It is probably also true that there is a closer relationship between the language we use to describe sensory quality and our analyses of what constitutes a fundamental or primary experience than some solely physiological or psychophysical sensory scientists would be willing to admit. It is still moot whether the language reflects the biology, or the theoretical models of sensory quality merely reflect the language. Amoore's analysis of basic smells, for example, is based upon commonly accepted linguistic usage (the words people use to describe smells), and although, hopefully, there is in this language some reflection of the basic biology and chemistry of olfactory receptors, it is difficult to say with certainty that the number "seven" has any special significance. Furthermore, there are two issues involved even in this statement that should be separated and that have interacting physiological, psychophysical, and semantic overtones. First, are the seven basics that Amoore identifies really fundamentals? Second, even if they are, are they the only ones or are there more? The familiar anecdote about the many names for the different kinds of snow
that exist in the Eskimo language is relevant here. As we shall see, the number of "primary" smells one accepts may be more a matter of the precision of the decision criteria applied by the experimenter than the biology of the subject. The point being made is that the search for primaries in the somatosensory, gustatory, and olfactory senses may be quite artificial and stimulated only by the success of the notion of primaries in color vision. In fact, however, there may be very little to support the notion of primaries in the sense that a small number of basic smells, touches, or tastes can be used to synthesize all other possible experiences. Certainly such an idea is not a part of auditory theorizing, where analysis rather than synthesis is more often the framework of discussion.
The interaction between the biology and the descriptive models of sensory quality coding has been a basic issue for many centuries. In the material of the next three chapters, we shall try to present and discuss those alternative theories of sensory quality coding that have a specific neurophysiological or anatomical point to make. In doing so, we shall pass over without mention those theories of sensory quality coding that are merely descriptive. Since most of the theories of sensory coding, even those with a specific physiological or anatomical premise, were inspired by certain aspects of the psychophysical data, we shall attempt to make explicit both what the specific psychophysical data are and what their neural implications are.
In the last 20 or 30 years, a very important revolution has occurred in sensory quality theories. Electrophysiological and anatomical techniques have been used to provide explicit and direct criteria for the selection of which of several previously suggested and equally plausible alternative theories is, in fact, the most useful one.
The indirect speculations and deductions from behavioral data in the past have thus been augmented by these new approaches in a way that has lifted a veil of confusion or speculation from many important problems. The direct observation of the spectral absorption of human cone pigments, for example, has made it possible for us now to be absolutely explicit about which of an infinitely large set of potential trichromatic fundamentals, each perfectly plausible from some mathematical or psychophysical point of view, actually is valid. From now on, all theories must conform to the absorption spectra of the photochemicals. Another important generalization, which has become increasingly evident, is that the classical models were not, in fact, mutually inconsistent. We now know, for example, that alternative theories of color vision both seem to be correct, but each at a different level of the afferent visual pathway. Similarly, alternative theories of pitch encoding seem to hold in audition, but at different regions of the acoustic spectrum. The fact that alternative theories are correct in alternative domains once again reminds us of the basic tenet of neural coding theory, namely, that information may be represented by entirely different coding mechanisms in different situations, yet all such representations are equivalent with regard to the message content. The prime axiom of quality coding has traditionally been Müller's
law of specific nerve energies. This statement referred to the fact that regardless of what stimulus was applied to a sensory nerve, the sensation so induced was of a quality specific to that nerve. Müller's law does seem to be correct when we discuss coding at the level of macroquality. There appears to be no way to stimulate the cochlear mechanism so that a visual impression is produced, or the retina so that an auditory experience obtains. Electricity, the universal stimulus, always seems to produce sensations specific to the particular nerve being stimulated. However, no matter how comprehensive Müller's law is in terms of the macromodalities, it seems completely inadequate in explaining the coding mechanisms that account for our ability to distinguish between micromodalities within one of the great senses. In the rest of this chapter, we shall see that all sensory neurons at all levels of the ascending nervous system are, to a greater or lesser degree, broadly tuned. That is, all sensory neurons respond not to a single narrowly defined stimulus, but rather to a range of stimuli, each of which is able to invoke widely differing microqualities. Obviously, in this case the neurons are not specific in the sense Müller meant. Activity in a given visual neuron, for example, can be interpreted as red, green, or blue by the central nervous system, depending upon the relation between its activity and that of other associated neurons. In the subsequent sections of this chapter, one of the main generalizations we shall be able to draw concerns this sort of broad tuning at all levels of the nervous system and in all of the sensory modalities.
The general plan of our discussion in the next three chapters will be to deal with each sensory modality individually. We shall first consider the principal psychophysical data for each modality that must be explained by a reductive theory.
We shall then discuss the classic theories of quality coding themselves and then the modern neurophysiological contributions that bear specifically on the problem of that modality's quality representation. Finally, we shall present what is, in our view, the most up-to-date contemporary theory of quality coding for each of the senses. In Chapter 13, we shall sum up what appear to be the generalities common to all of the quality coding premises.
II. THE KEY PSYCHOPHYSICAL DATA
A. The Duplex Retina and Its Psychophysical Correlates
A large number of different pieces of anatomical, biochemical, and behavioral data present convincing evidence that the retina contains two different kinds of receptor cells. This finding is referred to as the theory of the duplex retina. From the earliest recorded times, we have evidence that men were aware that their daytime vision was different in some critical ways from their nighttime vision. Thus, the earliest evidence for the duplex retina theory was behavioral. Early microscopic investigations showed an anatomic basis for this behavioral differentiation. There were two distinct forms of retinal receptors present, which could be seen to be either cylindrical or conical in form. This observation confirmed earlier speculations that there might, in fact, be two distinct photoreceptor systems simultaneously resident in the human eye. One, now known to be mediated by the rods, is monochromatic and able to operate at lower light intensities, while the other, mediated by three kinds of foveal cones, operates best at higher levels of illumination.
In Chapter 4, when we discussed visual transduction, we also noted that the most widely accepted theory of photoreception asserted that four different kinds of retinal photopigments were to be found in certain kinds of vertebrate eyes. Direct observations of the effects of these photopigments have been made in two closely related teleosts (the carp and the goldfish) as well as in primates. The outer segments of the rods of the fishes were loaded with a substance, which Wald has referred to as porphyropsin, a derivative of retinal₂, while the rods of the primate contain a substance known as rhodopsin, derived from retinal₁. In each animal, each of three types of cone apparently contains one of three other pigments, the absorption spectra of which are assumed to underlie trichromatic vision.
There are many different psychophysical indications that support the notion of a duplex retina. For example, during the adaptation process in the dark following exposure to a brief light, the threshold sensitivity undergoes a gradual change so that dimmer and dimmer lights can be seen as time passes. Repeated measurements over the years have shown that under certain conditions, the increase in sensitivity during this dark adaptation process is a two-limbed curve with a noticeable break occurring after about 7 min in the dark. These conditions include:
1. that a sufficiently bright adapting light is used;
2. that the area of the retina illuminated includes both rods and cones;
3. that a broad spectral band is used to light-adapt the eye prior to the dark adaptation; and
4. that a normal observer is used without any form of color blindness that might obscure the break.
Figure 9.1 shows a typical decrease in threshold intensity (known as dark adaptation) as a function of time in the dark under these conditions. The first part of the curve is assumed to be due to a rather rapid increase in the low sensitivity cone system, while the second part of the graph is attributable to the increasing sensitivity of the slower adapting but far more sensitive rod system. The break in the dark adaptation curve is assumed to be caused by the crossover of the dark adaptation curves of the rods and the cones, respectively. Such assumptions are based, in part, on data obtained from subjects who suffer from a congenital absence of all three kinds of cones (rod monochromats) and who thus exhibit only the slow segment attributed to rod dark adaptation, or from peripheral regions of normal retinae. Foveal dark adaptation curves, on the other hand, are almost pure cone responses and display only the more rapid but less sensitive portion. Similarly, if red light is used prior to dark adaptation, the dark adaptation curve is almost solely the product of the more rapid cone response. Both of these latter observations also support the idea of the duplex retina.
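The crossover account of the rod-cone break given above can be sketched numerically. The following is a minimal illustration, not a fit to Figure 9.1: it assumes that each receptor system recovers exponentially toward its own final threshold, and every constant in it (floors, spans, time constants) is hypothetical, chosen only so that the break falls near the 7 min mentioned in the text. The observed threshold is taken to be whichever of the two systems is more sensitive at a given moment.

```python
import math

def log_threshold(t_min, floor, span, tau):
    """Log threshold of one receptor system after t_min minutes in the dark.

    Assumed form: exponential recovery from (floor + span) toward floor.
    """
    return floor + span * math.exp(-t_min / tau)

# Hypothetical parameters (log units, minutes); not measured values.
CONES = dict(floor=0.0, span=3.0, tau=1.5)   # fast recovery, high final threshold
RODS = dict(floor=-3.0, span=7.0, tau=8.0)   # slow recovery, low final threshold

def observed(t_min):
    """The flash is detected by whichever system has the lower threshold."""
    return min(log_threshold(t_min, **CONES), log_threshold(t_min, **RODS))

def break_time(step=0.01, t_max=40.0):
    """First time (in minutes) at which the rod curve falls below the cone curve."""
    t = 0.0
    while t <= t_max:
        if log_threshold(t, **RODS) < log_threshold(t, **CONES):
            return t
        t += step
    return None

if __name__ == "__main__":
    print("rod-cone break at about", round(break_time(), 1), "min")
    for t in (0, 2, 5, 10, 20, 30):
        print(t, "min:", round(observed(t), 2))
```

Removing the cone system from `observed` leaves only the slow limb, which corresponds to the rod monochromat case described above; removing the rod system leaves only the fast limb, as in a purely foveal measurement.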
[Figure 9.1 plot: log threshold as a function of minutes in the dark after a full bleach (0 to 40 min); 1° blue (λmax = 453 nm) test stimulus, 5° temporal retina; curve annotations "Test appears blue" and "Test appears white."]
FIGURE 9.1 A typical two segment dark adaptation curve clearly showing the rod-cone break (courtesy of Dr. Matthew Alpern, University of Michigan).
Another type of data that reflects the duplex nature of the retina concerns the wavelength function of absolute sensitivity under conditions of high illumination and under conditions of low illumination. If one determines the relative spectral sensitivity of human vision at high illuminances, it is the spectral sensitivities of the cones that predominate. Such a spectral sensitivity curve is referred to as the photopic luminosity curve. If, on the other hand, the spectral sensitivity is determined in a dark-adapted subject at low light intensities, then the data, referred to as the scotopic luminosity curve, exclusively reflect the spectral absorption characteristics of the rods. There are two ways in which these two curves can be plotted. If both the photopic and scotopic data are normalized on a graph in such a way that full scale represents the maximum response for the optimum stimulating wavelength of each, then the curves appear as shown in Figure 9.2. This representation, however, is often quite misleading, for although it emphasizes the difference in peak spectral sensitivity, it ignores the fact that the absolute sensitivity of the scotopic curve is so much greater (that is, so much less light is needed) than the photopic one. A somewhat better way of plotting these same data that avoids this difficulty
[Figure 9.2 plot: logarithmic vertical axis (0 to -7) against wavelength (400 to 800 nm); two curves, labeled "1924 CIE photopic luminosity function for standard observer" and "1951 CIE scotopic luminosity function for young eyes."]
FIGURE 9.2 The CIE relative luminosity curves for scotopic and photopic vision normalized by assuming equal maximum sensitivities for rods and cones and plotted as a logarithmic attenuation from that reference level. But see also Figure 9.3 (from Graham, 1965).
is to utilize an absolute radiant energy scale. This was done by Wald (1945), for example, and his version has been replotted in our Figure 9.3. In this plot, the difference in absolute levels of sensitivity is much more clearly shown, and thus the true relationship between the two curves is made clear. It should be noted that these two curves reflect two different photochemical situations. The scotopic curve is a psychophysical correlate mainly of the absorption spectrum of a single substance, rhodopsin. On the other hand, the photopic curve is the cumulative effect of the absorption spectra of three different photochemicals, those that we have already referred to in Chapter 4 as erythrolabe, chlorolabe, and cyanolabe (the three cone pigments). The photopic luminosity curve can be seen to peak at about 560 nm and the scotopic curve at about 500 nm. This difference in the peak absorption of the rhodopsin, on the one hand, and the mixture of the three cone pigments, on the other, leads to several other important effects in addition to the break in the dark adaptation curve already mentioned. There is, in addition to the increasing absolute sensitivity of the retina as it dark-adapts, a shift in the wavelength to which the eye is relatively most sensitive (the Purkinje shift). A light-adapted eye is most sensitive to lights
[Figure 9.3 plot: logarithmic sensitivity axis against wavelength (350 to 750 nm); curves labeled "foveal cones" and "8° above fovea," with separate symbols for rods and cones.]
FIGURE 9.3 The same data shown in Figure 9.2 but not normalized. This means of graphing the scotopic and photopic curve emphasizes the fact that cone vision is three log units less sensitive than rod vision. Data for cones at the fovea and at 8 deg from the fovea are shown. Rod measurements are taken 8 deg from the fovea (from Wald, 1945).
in the yellow region of the visual spectrum, while a dark-adapted eye is most sensitive to blue or green lights. The effects of this shift as one gradually dark-adapts can be remarkable. If red and blue stimuli are matched for equal brightness in intense light, the blue will become subjectively brighter than the red under scotopic conditions. Whereas lights containing only monochromatic wavelengths remain relatively constant in their chromaticity as the intensity of the light changes, lights composed of mixed wavelengths tend to change colors as their intensity changes because of this shift in peak sensitivity. Mixed colors, such as combinations of yellow, orange, and red, tend to be yellowish at high intensities, but to take on definite blue and green tones at low intensities. This phenomenon is known as the Bezold-Brücke phenomenon (or shift) and may be related to the shift in the relative contribution of each of the four retinal photopigments to the luminosity function at differing levels of light adaptation.
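The red and blue brightness example given above for the Purkinje shift can be made concrete with a rough sketch. The peak wavelengths (560 nm photopic, 500 nm scotopic) come from the text; everything else is an assumption made purely for illustration: the luminosity curves are modeled as Gaussians of a hypothetical common width, which the real CIE curves are not, and the two test wavelengths are arbitrary.

```python
import math

PHOTOPIC_PEAK_NM = 560.0   # peak quoted in the text
SCOTOPIC_PEAK_NM = 500.0   # peak quoted in the text
WIDTH_NM = 45.0            # hypothetical width; real curves are not Gaussian

def sensitivity(wavelength_nm, peak_nm):
    """Relative sensitivity under the assumed Gaussian shape (peak value 1)."""
    return math.exp(-((wavelength_nm - peak_nm) ** 2) / (2.0 * WIDTH_NM ** 2))

def scotopic_brightness_ratio(blue_nm, red_nm):
    """Blue/red ratio of scotopic effectiveness for two lights whose radiances
    were first adjusted to be equally effective under photopic conditions."""
    p_red = 1.0
    # Photopic match: p_blue * V(blue) == p_red * V(red)
    p_blue = (p_red * sensitivity(red_nm, PHOTOPIC_PEAK_NM)
              / sensitivity(blue_nm, PHOTOPIC_PEAK_NM))
    blue_effect = p_blue * sensitivity(blue_nm, SCOTOPIC_PEAK_NM)
    red_effect = p_red * sensitivity(red_nm, SCOTOPIC_PEAK_NM)
    return blue_effect / red_effect

if __name__ == "__main__":
    # 470 nm and 650 nm are illustrative choices, not values from the text.
    print(scotopic_brightness_ratio(470.0, 650.0))
```

Under these assumptions the ratio comes out far greater than 1, which is the direction of the effect described in the text; the magnitude depends entirely on the assumed curve shape and should not be read as a prediction.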
Differences in critical flicker fusion characteristics as a function of the level of light adaptation have also been noted. At high light intensities, all colored lights seem to exhibit the same flicker fusion frequencies, while at lower intensities, each spectral color seems to follow its own characteristic flicker fusion curve as shown in Figure 9.4 from Hecht and Schlaer (1936). The implication of this finding is that at high levels of light, the cones all have the same flicker fusion characteristics. In fact, however, this can be shown not to be true. Brindley, Du Croz, and Rushton (1966), for example, were able to show that the portion of the chromatic system that responds to blue has a lower critical flicker fusion frequency than those portions responding to the longer wavelengths. To bring out this difference, however, they had to preadapt the eye with red and green lights to selectively desensitize these receptors. It was only in this way that the blue system with its lower absolute level of sensitivity could be studied in isolation. The technique of preadapting the retina with a given colored light to reduce the sensitivity of one of the three chromatic receptor classes was invented by Stiles (1949).
In sum, the retina exhibits a wide variety of psychophysical functions that are dependent upon the fact that the retina is a duplex system of one type of rod and three types of cones. The generality of this duplexity theory is well established and in large part explicable on the basis of the photochemical differences between these two types of retinal photoreceptors.
B. Trichromatic Color Mixture
It is a fact that a person with normal color vision can match or reproduce any color or color combination by manipulating a color mixer with only three controls. This is the basic fact of trichromaticity and is the basis for much of the theorizing about the neurological mechanisms that underlie this most important quality dimension.
This notion can be expressed mathematically in the following way:
$$C = aC_1 + bC_2 + cC_3 \tag{9.1}$$
where C is the color to be matched, and = represents the notion of "can be matched by" rather than any notion of numerical equality. In this equation, the coefficients a, b, and c represent (to a first approximation) the percentage of each of the three colors $C_1$, $C_2$, and $C_3$, which are used for the color mixing. The terms $C_1$, $C_2$, and $C_3$ refer to a set of three "fundamentals" or "primary" colors that can be used to achieve the match. In theory, any set of three colors can be used as the fundamentals, and the basic notion of trichromaticity has often been phrased in the following manner: any color can be matched by appropriate amounts of any fixed set of three colors. The only restriction is that the set be orthogonal; that is, no one of the three matching colors may be a mixture of the other two. The choice of what the triad of primary colors is to be, therefore, is theoretically arbitrary. Indeed, they need not be monochromatic or even equal in
[Figure 9.4 plot: critical flicker fusion frequency (0 to 60) against log retinal illuminance (trolands, -3 to 5); separate curves for test lights labeled 450, 490, 535, 575, 605, and 670 nm.]
FIGURE 9.4 A graph showing the relationship between critical flicker fusion frequencies for different colored lights and the intensity of the stimulating lights. Above a certain threshold, there is no differential effect of color, while at low light intensities, each color displays its own characteristic flicker fusion curve (from Brown, 1965, after Hecht and Schlaer, 1936).
brightness. However, certain practical considerations (and empirical data, as we shall see below) compel us to choose triads of primaries that are somewhat restricted. For example, if three primaries are chosen, all of which are in the long wavelength end of the visual spectrum, then one of the coefficients a, b, or c will have to be negative. A negative coefficient means that that particular one of the triad of fundamentals has to be added to the color that is being matched rather than to the other two members of the triad. Negative coefficients may also be necessary when monochromatic primaries are used in some instances, but the existence of a negative coefficient does not diminish the basic notion of the fact of trichromaticity, namely, that three and only three variable light sources must be manipulated to match all other colors. The choice of the triad of primaries, as we have said, is completely arbitrary. The triad may consist of any three monochromatic colors, any three bandwidths of the visual spectrum, or any combination of monochromatic wavelengths and extended bandwidths. Not only is the spectral composition of the triad arbitrary, but so too is its intensity or luminosity. However, the choice of a particular triad will determine the magnitude of the set of coefficients a, b, and c.
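The meaning of a negative coefficient described above can be written out explicitly. As a sketch, suppose that a is the coefficient that turns out to be negative in Equation 9.1 (the choice of a rather than b or c is arbitrary here). Moving that term to the other side of the match gives the physical arrangement in which the first primary is added to the color being matched, with = still read as "can be matched by":

```latex
C = aC_1 + bC_2 + cC_3, \quad a < 0
\qquad\Longrightarrow\qquad
C + |a|\,C_1 = bC_2 + cC_3
```

Three adjustable lights are still all that is required; only the side of the comparison field on which one of them appears has changed.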
The coefficients a, b, and c, defining the characteristics of a color match of some unknown, may be considered from a number of different points of view. Given a certain triad of primaries, they may be defined as nondimensional ratios by the following three equations:
$$a = \frac{C_1}{C_1 + C_2 + C_3} \tag{9.2}$$
$$b = \frac{C_2}{C_1 + C_2 + C_3} \tag{9.3}$$
$$c = \frac{C_3}{C_1 + C_2 + C_3} \tag{9.4}$$
where $C_1$, $C_2$, and $C_3$ reflect, in this case, the absolute amounts of the three primaries in a given color mixture. Since each of the coefficients is defined in terms of the proportion of one colored light to the sum of all three, it is immediately clear that the sum of the coefficients must be equal to 1 when defined in this manner. Thus,
$$a + b + c = 1 \tag{9.5}$$
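Equations 9.2 through 9.5 amount to a simple normalization, which can be sketched as follows. The function name and the example amounts are illustrative only; the amounts are taken to be in whatever common unit was used to measure the three primaries.

```python
def chromaticity_coefficients(c1, c2, c3):
    """Return (a, b, c) as defined by Equations 9.2-9.4.

    c1, c2, c3 are the absolute amounts of the three primaries in a mixture.
    """
    total = c1 + c2 + c3
    if total <= 0:
        raise ValueError("the total amount of the three primaries must be positive")
    return c1 / total, c2 / total, c3 / total

if __name__ == "__main__":
    a, b, c = chromaticity_coefficients(2.0, 3.0, 5.0)  # illustrative amounts
    print(a, b, c, a + b + c)
```

Because only ratios enter, doubling all three amounts leaves a, b, and c unchanged, which is why the coefficients carry chromatic information but not overall intensity.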
Another way in which the same information may be represented has been recently developed by Cornsweet (1970). Rather than using a formula that is specified in terms of the relative proportion of the three primaries, he defines a system of color matching, which is specified in terms of the quantal absorption of three retinal pigments. His model is based upon the notion that a mixture of three primaries will match some other color if the number of quanta absorbed by each member of the set of three photosensitive pigments is the same for the matched and the matching stimuli. Or expressed formally,
$$N_{a0} = N_{a1} + N_{a2} + N_{a3} \tag{9.6}$$
$$N_{b0} = N_{b1} + N_{b2} + N_{b3} \tag{9.7}$$
$$N_{c0} = N_{c1} + N_{c2} + N_{c3} \tag{9.8}$$
where N, in general, refers to the number of quanta of light absorbed by one of the three photopigments. Each of the three pigments (subscripted a, b, and c) will absorb a certain number of quanta from each of the four involved lights: the unknown (subscripted 0) and the three primary colors used in the color match (each of which is subscripted 1, 2, or 3, respectively). Thus, the number of quanta absorbed by the b photoreceptor when stimulated by the first of the primaries would be indicated $N_{b1}$. A light will be matched by adjusting the intensities of a set of primaries until the quantal absorption in each of the three photopigments is equal for the unknown and the combined effect of the three matching colors.
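The matching rule of Equations 9.6 through 9.8 can be sketched as a small linear problem. The sketch below adds one assumption that is not stated in the text: that the quanta each pigment absorbs from a primary are proportional to that primary's intensity, so that $N_{p j} = k_{p j} x_j$ for pigment p and primary j. The absorption table `ABSORPTION` is entirely hypothetical. Given the absorptions produced by the unknown light, the three primary intensities are then found by Cramer's rule.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def match_intensities(k, n0):
    """Solve k @ x = n0 for the primary intensities x (Cramer's rule).

    k[p][j]: quanta absorbed by pigment p per unit intensity of primary j.
    n0[p]:   quanta absorbed by pigment p from the unknown light.
    """
    d = det3(k)
    if abs(d) < 1e-12:
        raise ValueError("primaries are not independent for these pigments")
    solution = []
    for j in range(3):
        replaced = [row[:] for row in k]
        for p in range(3):
            replaced[p][j] = n0[p]   # replace column j with the target vector
        solution.append(det3(replaced) / d)
    return solution

# Hypothetical absorption table: rows are pigments a, b, c; columns are primaries 1, 2, 3.
ABSORPTION = [
    [0.60, 0.30, 0.02],
    [0.25, 0.55, 0.05],
    [0.01, 0.05, 0.40],
]

if __name__ == "__main__":
    unknown = [75.4, 53.5, 11.5]   # illustrative absorptions from the unknown light
    print(match_intensities(ABSORPTION, unknown))
```

A negative entry in the solution corresponds to the negative coefficient discussed earlier: that primary would have to be added to the unknown rather than to the other two primaries.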
Both of these formulations, one in terms of the ratio coefficients and one in terms of quantal absorption, are, of course, equivalent, and it is possible to go from one to the other easily. Each is an expression of the basic fact that we wish to emphasize here, namely, that trichromacy is the basic characteristic of normal color vision. This means that when color matching and color mixing psychophysical experiments are carried out, only three degrees of freedom are required. The tridimensionality of color space allows us to represent it in a three-dimensional spatial plot in a compressed and simple manner. There are two ways in which this can be done, one based upon the ratio coefficients, and one based upon Cornsweet's quantum absorption notions.
The method based upon ratio coefficients allows us to display in a two-dimensional plot many of the three-dimensional features of color mixing and trichromatic vision. One form of this plot, usually known as the CIE [Commission Internationale de l'Éclairage (International Lighting Commission)] chromaticity diagram, is illustrated in Figure 9.5. This particular chromaticity diagram is one based upon a set of monochromatic primaries with wavelengths of 460, 530, and 650 nm. A different set of primaries would produce a chromaticity diagram that differs in shape from this one to the degree that the primaries differ from this standard set. An important practical criterion for the choice of a set of primaries is that when they are mixed in equal proportions, the mixture will produce a relatively good white. On this CIE chromaticity diagram, the horizontal coordinate represents one of the coefficients of the trichromatic equation [Equation 9.1], which specifies how much of the long wavelength primary (usually red) is present. The vertical coordinate, in turn, represents the coefficient of the intermediate wavelength (usually green) primary.
Knowing these two numbers, the coefficient of the third short wavelength component (actually a blue) is uniquely specified, since a + b + c = 1 and no third dimension need be specified. The CIE chromaticity diagram is packed with an astounding variety of information. One of the most important subsets of data contained within it is the definition of the required combinations of the three primaries that must be used to match the spectral colors. Spectral colors, of course, are the sensations produced by the spectrum of visible monochromatic stimuli. The continuous curve shown in Figure 9.5 is the locus of the triads required to reproduce each of the spectral colors ranging from the near infrared to the near ultraviolet. This locus of points also essentially represents the outer limits of the color world, for each spectral color represents the purest or most saturated color possible. Under some conditions, such as those following chromatic adaptation, combinations of coefficients, which seemingly produce colors that lie outside of the spectral locus, can be obtained. Such combinations of color coefficients represent supersaturated colors, but the interpretation of such phenomena is equivocal. As one moves along any line connecting the locus of spectral points and the center of the chromaticity diagram, one is approaching the other end of the saturation continuum-the purest white in which no color
[Figure 9.5 plot: coefficient b (vertical, 0 to 0.85) against coefficient a (horizontal, 0 to 0.70); spectral locus labeled by wavelength; circled "Best white area"; straight line joining colors A and B with resultant C.]
FIGURE 9.5 The standard 1931 CIE chromaticity diagram plotting the a coefficient against the b coefficient. Since a + b + c = 1, c is also thus uniquely defined. The circled center region approximates the most white of the whites obtained in a three-color mixture experiment. The straight line shows a means of calculating the chromatic resultant C of the mixture of two colors, A and B (adapted from Optical Society of America, 1953).
tones are observable. Thus, one is moving from the most saturated chromatic stimulus to the most desaturated chromatic stimulus, and points lying along this line represent increasing degrees of desaturation.
Two-color mixing is also simply represented on the chromaticity diagram shown in Figure 9.5. For any two colors lying anywhere within the spectral locus, the center of gravity of a straight line joining the points that represent each color is representative of the color of their combination. For those special cases in which the two colors lie at the ends of a straight line that passes through and is centered in the central white region, the two colors are said to be complements of each other.
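The center-of-gravity rule for two-color mixing described above can be sketched in a few lines. In this illustration, the "weights" are taken to be the relative amounts of the two colors, and the white point is assumed to sit at a = b = 1/3 (equal proportions of the three primaries); both the function names and the example coordinates are hypothetical rather than read from Figure 9.5.

```python
WHITE = (1.0 / 3.0, 1.0 / 3.0)   # assumed white point: equal proportions of the primaries

def mix(point_a, weight_a, point_b, weight_b):
    """Chromaticity of a two-color mixture: the weighted center of gravity
    of the two points on the (a, b) diagram."""
    total = weight_a + weight_b
    return tuple((weight_a * pa + weight_b * pb) / total
                 for pa, pb in zip(point_a, point_b))

def complement_weights(point_a, point_b, white=WHITE, tol=1e-9):
    """If white lies on the segment joining the two colors, return the
    fractions (of A, of B) whose mixture lands on white; otherwise None."""
    ax, ay = point_a
    bx, by = point_b
    wx, wy = white
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    cross = dx * (wy - ay) - dy * (wx - ax)   # zero when A, white, B are collinear
    if length_sq == 0 or abs(cross) > tol:
        return None
    t = ((wx - ax) * dx + (wy - ay) * dy) / length_sq   # position of white along A -> B
    if not 0.0 <= t <= 1.0:
        return None
    return 1.0 - t, t

if __name__ == "__main__":
    color_a = (1.0 / 3.0 - 0.2, 1.0 / 3.0 + 0.1)    # illustrative coordinates
    color_b = (1.0 / 3.0 + 0.3, 1.0 / 3.0 - 0.15)
    weights = complement_weights(color_a, color_b)
    print(weights, mix(color_a, weights[0], color_b, weights[1]))
```

Two colors whose joining line misses the white point have no complementary proportion, and `complement_weights` returns `None` for them.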
[Figure 9.6 plot: coefficient b (vertical, 0 to 0.80) against coefficient a (horizontal, 0 to 0.70); spectral locus labeled by wavelength from 380-410 nm to 690-780 nm, with the MacAdam ellipses drawn inside it.]
FIGURE 9.6 The system of MacAdam ellipses, which show the regions in which colors are not differentiated from one another. The dimensions of the ellipses are defined by a measure of the standard deviation of the wavelength of indiscriminable colors in a two-color discrimination experiment (from MacAdam, 1942).
In other words, when these two colors are mixed in appropriate amounts, the resultant is a white. Another interesting set of data, which is included on some drawings of the chromaticity diagram, is the set of MacAdam ellipses. MacAdam (1942) had shown that there were areas on the chromaticity diagram within which the normal human observer could not distinguish any difference between the set of color mixtures that were so defined. MacAdam demonstrated that these areas were in the form of ellipses, in which the orientation of the major axis varied systematically depending upon where, in the chromaticity diagram, one was working. Figure 9.6 shows one plot of a system of MacAdam ellipses. The small size of the ellipse
in the lower left-hand corner of the curve suggests a very great sensitivity to color differences when the slightest changes are made in the amount of red or green light. On the other hand, at the top of the chromaticity diagram, the very large ellipses suggest that, in the yellow-green region, there is a reduced sensitivity to the effects of changes in the amount of red, and even less effect on the perceived color when very substantial changes are made in the amount of green in the color mixture. In the lower right-hand corner of the chromaticity diagram are ellipses of intermediate size, indicating that the observer's sensitivity to changes in either two or three of the triads of fundamentals is intermediate.
The chromaticity diagram can also be used as a starting point for discussing many other detailed rules of color mixture. It is quite obvious that any given point on the diagram can be produced by a number of different color combinations. For example, any given point on the CIE chromaticity diagram can be produced by the combination of any of the pairs of colors at the ends of any of the family of straight lines that pass through and have their center of gravity at that point. Yet, regardless of the colors of the pair of component colors that are used to produce the color represented by that point, all of the mixtures are indistinguishable. Such combination colors, of equal chromaticity, but produced by even the most widely varying components, are called metameric matches.
The experiments and data concerning color mixture are varied and extensive. Much of the data has been summarized in a set of rules generally known as Grassmann's laws, and over the years, the rules have been extended and rephrased to reflect current theory and experiment. Perhaps the best modern statement of these rules has been summarized by Graham (1965, p. 372):
1. Any mixed color, no matter how it is composed, must have the
same appearance as the mixture of a certain saturated color with white (Helmholtz, 1866, 1924-25, vol. 2). (The wavelength corresponding to the saturated color is called the dominant wavelength.)
2. When one of the two kinds of light that are to be mixed together changes continuously, the appearance of the mixture changes continuously also (Helmholtz, 1866, 1924-25, vol. 2).
a. For every color there can be found another complementary or antagonistic color which, if mixed with it in the right proportion, gives white or gray, and if mixed in any other proportion, an unsaturated color of the hue of the stronger component (Titchener, 1924).
b. The mixture of any two colors that are not complementaries gives an intermediate color, varying in hue with the relative amounts of the two original colors and varying in saturation with their nearness or remoteness in the color series (Titchener, 1924).
3. The mixture of any two combinations which match will itself match either of the original combinations, provided that the illumination of the colors remains approximately the same (Titchener, 1924).
4. The total intensity of the mixture is the sum of the intensities of the light mixed (Grassmann, 1854).
While the CIE chromaticity diagram has been the standard spatial representation of the data of color mixture, Cornsweet's (1970) tridimensional representation presents a new alternative, which may prove to have considerable advantages in years to come. As we noted previously, Cornsweet plotted his three-dimensional graphs in terms of the number of quanta absorbed. The basic assumption underlying this approach is that all quanta that are absorbed by any of the three possible cone pigments have an equal physiological effect, regardless of the wavelength of the illuminating light. Since it uses absolute quantal absorptions, the Cornsweet plot must be a true three-dimensional one, for the quasi-two-dimensionalization inherent in the equation, a + b + c = 1, is not present in this situation. Thus, the number of quanta absorbed by each of the three photopigments is not normalized and must be represented as independent degrees of freedom.
Figure 9.7 shows an example of Cornsweet's method of plotting the chromatic trispace and also the locus of the spectral colors. Although the curve representing the locus of spectral colors looks, at first glance, as if it were simply a two-dimensional plot, this is merely a consequence of the fact that the absorption coefficient of the short wavelength or blue receptor substance is quite small compared to the others. It is a general result that the blue-sensitive receptor contributes less per incident quantum to the overall luminous experience than do the medium wavelength (yellow-green) or the longer wavelength (red) sensitive receptors. Each point on the spectral locus on Cornsweet's chromaticity diagram (Figure 9.7) is plotted by determining the relative absorption of each spectral wavelength by the three photopigments.
The three coefficients of absorption are thus defined. It is then assumed that 1000 quanta/sec of that particular wavelength compose the incident signal. It is, therefore, a simple and direct calculation to specify the three coordinates. One simply multiplies each of the three determined coefficients by 1000 to determine the number of quanta absorbed by each of the three chromatic color pigments. In Cornsweet's words, this is the "effect on system" A, B, or C. The locus of the spectral colors represents the same information as the locus of spectral colors on the CIE chromaticity diagram, and much of the other color mixture information is also inherent in Cornsweet's plot in more or less the same manner.
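The calculation just described is a single multiplication per pigment, sketched below. The figure of 1000 incident quanta/sec is from the text; the absorption coefficients in the example table are hypothetical placeholders and are not read from Figure 9.7.

```python
INCIDENT_QUANTA_PER_SEC = 1000   # the incident signal assumed in the text

def system_effects(absorption_coefficients, incident=INCIDENT_QUANTA_PER_SEC):
    """Coordinates ("effect on system" A, B, C) for one spectral wavelength:
    each pigment's absorption coefficient times the incident quanta."""
    return tuple(coefficient * incident for coefficient in absorption_coefficients)

# Hypothetical coefficients (A, B, C) for a few wavelengths, for illustration only.
EXAMPLE_COEFFICIENTS = {
    480: (0.03, 0.06, 0.04),
    560: (0.16, 0.18, 0.00),
    620: (0.12, 0.04, 0.00),
}

if __name__ == "__main__":
    for wavelength_nm, coefficients in sorted(EXAMPLE_COEFFICIENTS.items()):
        print(wavelength_nm, system_effects(coefficients))
```

Plotting the resulting triples for every wavelength traces out the spectral locus in the three-dimensional space; because the C coefficients are small, the locus lies close to the A-B plane, as the text notes.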
C. Stiles' and Wald's Increment Threshold Experiments
In spite of the fact that color mixing data provide the basis for a number of very useful models of color mixing, there is, in fact, little within that entire body of knowledge that speaks to the problem of the human color receptor spectral response characteristics. The fact that almost any triad of fundamentals can be used in a color mixing experiment to produce a match for an unknown color means that there is no way to determine from the color mixing data alone anything unique about the characteristics
[Figure 9.7 plot: three axes labeled "Effect on system A," "Effect on system B," and "Effect on system C" (each 0 to 200); spectral locus labeled by wavelength from 440 to 660 nm.]
FIGURE 9.7 An alternative means of plotting the chromaticity diagram in a three-dimensional space. The three coordinates indicate the proportion of 1000 quanta, which are absorbed by the three photochemicals, respectively. Though the curve appears to be primarily in the A-B (red-green) plane, it is truly three-dimensional. The distortion is due to the low level of absorbance of the blue (C) system (adapted from Cornsweet, 1970).
of the basic absorption spectra of the three types of cones. There are, however, some psychophysical data that do suggest a possible solution of this problem. Stiles (1949, 1959) has used a technique of increment thresholds in a way that is analogous to the technique used in the usual dark adaptation experiment. In this way, he was able to demonstrate a multiple segment curve similar to the rod and cone segments of the conventional dark adaptation curve. He interprets these multiple segments as a reflection of differences in the absorption spectra of the three different cone photochemicals. From these multisegment curves, and assuming a certain mathematical relation between the energy of the adapting field and the energy of the just detectable test increment, Stiles has been able to predict a set of curves that, he believes, are specifically the spectral sensitivities of the three photoreceptors.
Let us consider Stiles' technique and findings in detail because of the almost unique role they play as psychophysical indicants of receptor photoabsorption characteristics. The general psychophysical technique was the detection of a small spot of light of one color on a background of another color. The stimulus consisted of a 10-deg background adapting field of wavelength μ and a briefly illuminated 1-deg test flash of wavelength λ. The technique is rather straightforward once these conditions have been established. The amount of energy of a just perceptible test stimulus is simply measured as a function of the intensity of the background. This can be done for many combinations of the test and background colors.
The constant background illumination performs an important additional function beyond merely setting the level of light adaptation. It also acts as a selective adaptor of the three photoreceptors. Thus, a red adapting light will tend to diminish the sensitivity of the red and green receptors, but leave the blue cone sensitivity relatively intact. Since the overall threshold response of the whole system is obviously a function of the pooled influences of the three receptor types, this selective adaptation will have the effect of enhancing the relative contribution of the blue in comparison to the red and green. The overall response will, thus, be biased more in the direction of the unadapted blue receptor than it was when none of the three was light-adapted. Thus, just as workers once obtained a break in the dark adaptation curve by using different levels of white light to selectively bias the response of the dark adaptation curve in a way that resulted in the rod-cone break, Stiles now does the same thing with chromatic adaptation and obtains the sort of curve that is shown in Figure 9.8a. In this particular case, a bluish test light (476 nm) was superimposed on a yellowish green adapting light. The curve shows three distinct segments, which Stiles, in this case, has named $\pi_4$, $\pi_1$, and $\pi_3$ ($\pi_2$ and $\pi_5$ being additional segments, which occur under certain other experimental conditions). The $\pi_1$, $\pi_2$, and $\pi_3$ segments obtained in these and in other related experiments are all thought by Stiles to be due to the action of the blue cone.
He attributes the π4 segment to the action of a green cone and π5 to the action of a red cone. The critical next step is to go from the "adaptation" curves, of which Figure 9.8a is one example, to the individual spectral sensitivity curves of the photoreceptors. Unfortunately, this involves a number of mathematical considerations and some intervening functions. The full details of Stiles' derivation are beyond the scope of our present discussion, but fortunately he has summed them up graphically in Figure 9.8b. Assume that the three curves on the left-hand side of this figure are the three spectral absorption curves for the three photoreceptors. Assume also that there is another spectral function involved, which is displayed by the three curves in the lower right-hand corner. This second set of three curves represents the change in the background field intensity that has to be made to raise the threshold for a test flash by one log unit as a function of wavelength for each of these three receptor systems. These two spectral functions will, then, be related by the third set of "linking" functions in the right-hand quadrant of the figure. Because this is a mixed system, the threshold in the increment experiment will be determined by the lowest value of any of these three linking functions. The bottommost envelope indicated by the combined dotted and solid line is the current example. The combined envelope is exactly the sort of curve that was obtained and displayed in Figure 9.8a.
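The lowest-value rule just described can be sketched numerically. The following Python sketch is only an illustration under stated assumptions and is not Stiles' derivation: each mechanism is given a hypothetical threshold-versus-field-intensity curve of the form log T = log T0 + log(1 + W/W0), and the values of T0 and W0 are invented for the example. The observed increment threshold is taken to be the lowest threshold among the mechanisms, which yields a curve made of several segments.

```python
import math

# Hypothetical mechanisms: (name, absolute threshold T0, field intensity W0 at
# which the background begins to raise that mechanism's threshold).
# All numbers are illustrative assumptions, not values taken from Stiles.
MECHANISMS = [
    ("pi4", 1e-6, 1e-4),  # most sensitive, but adapted by weak backgrounds
    ("pi1", 1e-5, 1e-1),  # less sensitive, adapted only by stronger backgrounds
    ("pi3", 1e-4, 1e2),   # least sensitive, most resistant to adaptation
]


def log_threshold(t0, w0, field):
    """Assumed threshold-versus-intensity curve for one mechanism (log10 units)."""
    return math.log10(t0) + math.log10(1.0 + field / w0)


def observed_threshold(field):
    """Return (log threshold, name) of the mechanism with the lowest threshold."""
    candidates = [(log_threshold(t0, w0, field), name) for name, t0, w0 in MECHANISMS]
    return min(candidates)


# Walk up the background intensity and report which mechanism sets the threshold.
for log_field in range(-6, 5):
    log_t, name = observed_threshold(10.0 ** log_field)
    print(f"log field {log_field:+d}: log threshold {log_t:6.2f} set by {name}")
```

With these invented parameters the controlling mechanism changes as the background is raised, which is the kind of segmented envelope the text attributes to a mixed system. The particular order and positions of the breaks depend entirely on the assumed parameters.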
FIGURE 9.8(a) A plot of the threshold of the test stimulus as a function of the background illumination level. The curve obtained in this experiment is a multifaceted one, in which each segment reflects some aspect of multiple receptor mechanisms. See text for full details (from Stiles, 1959). [Foveal measurements; test wavelength λ = 476 mμ, background wavelength μ = 578 mμ. Vertical axis: log Nλ (erg/sec/deg); horizontal axis: log Mμ (erg/sec/deg).]
By processing a large set of such adaptation threshold curves (typified by the dotted curve of the present case), Stiles was able to reason backward to the shape of the three photoreceptor curves. Although we shall not go into the details of this analysis, we have tabulated his conclusions in Table 9.1, which shows the general characteristics of the spectral response of the three cone types as well as the rod. While it is most important to note that Stiles' π functions are not identified with the individual absorption curves, it is on the basis of these π functions that he was able to derive one of the few reasonable sets of cone spectral sensitivities based purely on psychophysical data. His three spectral sensitivity curves for the cones have peaks at 440 nm, 540 nm, and 575 nm. These values should be compared with the biological data presented later in this chapter. On the other hand, Wald (1964, 1966) has used Stiles' preadaptation and increment threshold technique to produce what he believes are direct psychophysical correlates of the absorption curves of the three photopigments. His technique was very much like that used by Stiles, but by selecting a set of adapting lights that included yellow (to reduce the sensitivity to long wavelength stimuli), purple (to reduce sensitivity to both long and short wavelength stimuli without affecting the middle of the spectrum to any great degree), and blue (to reduce sensitivity to short wavelength stimuli), he produced three curves that seemed to closely approximate the
FIGURE 9.8(b) A graphical means of showing how the individual photoreceptor spectral absorption curves are derived from the increment threshold data shown in Figure 9.8(a) (from Stiles 1959, after Stiles 1949). [Axes: log (field intensity), wavelength of test stimulus (mμ), and log (field intensity to raise threshold by 1 log unit); curves labeled "Red," "Green," and "Blue."]
responses found with direct absorption type measurements (see below). The three response curves peaked at 430, 540, and 575 nm, respectively, values in close agreement with the Stiles data. This same procedure has also been used more recently (Wooten and Wald, 1973) to show that the various types of cones are present in extremely peripheral portions of the retina, a surprising result in terms of the usually accepted notions of a color-blind retinal periphery.

D. Hue Discrimination and Color Blindness

One of the most basic of the psychophysical problems with which any theory of color vision must deal is the nature of hue discrimination. How small a difference in the wavelength of a given illuminant can be distinguished as a hue difference? The experiment necessary to answer this question is usually carried out using a device that allows side-by-side comparison of a test field and an adjustable comparison field. Experimental data pertaining to this problem have been repeatedly obtained over the years by several different investigators. The major interesting feature of the data is that they form a very irregularly shaped curve. Figure 9.9 presents what is now considered to be the classic plot of these data, a summary of several experiments by Judd (1932). The w-
TABLE 9.1 THE RELATIONSHIP BETWEEN STILES' "π" COMPONENTS OF THE ADAPTATION CURVE AND THE FOUR PHOTORECEPTOR MECHANISMS

Mechanism       Symbol   Wavelength of Maximal Sensitivity (nm)   Remarks
Rod             π0       503                                      Absent at the fovea
"Blue" cone     π1       440                                      at approx. 2.6 log units
                π2       ? (Between 440 and 480 mμ)               at approx. 1 log unit
                π3       440                                      at approx. 4.0 log units
"Green" cone    π4       540
"Red" cone      π5       575 (very flat max.)

Note: This table also indicates the peak sensitivities of the spectral absorptions of the four photoreceptors in the human eye (adapted from Stiles, 1959).
shaped curve is seen to exhibit its highest sensitivity (a minimum detectable hue difference associated with a wavelength shift of less than 1 nm) at wavelengths of the incident light of about 480 and 580 nm. The lack of an ability to discriminate colors in this normal manner is known as color blindness. A more precise definition, necessitated by the many different kinds of color blindness, is based on the number of degrees of freedom required by a subject to match all the colors in the space defined by either the CIE or the Cornsweet chromaticity diagram. Normal trichromatic vision, as we have seen, requires the presence of three independently controllable primary colors in the color mixture for the perfect matching of all other colors. Various forms of weak or anomalous trichromatic color vision have been observed. Subjects may be protanomalous (with a lowered sensitivity to long or red wavelengths) or deuteranomalous (with a lowered sensitivity to the median or yellow-green wavelengths). In both of these forms of anomalous trichromatism, the subject is still required to use three colors to match all samples of the chromatic visual space. Wide differences are found, however, between the amounts of the three primaries required for the matching of a given color by an anomalous and by a normal trichromatic subject. A protanomalous trichromat will tend to use far more than normal amounts of red for his matches, for example. A deuteranomalous trichromat will use more yellow-green than normal for his color matches. On the other hand, there also exists a group of people with deficient
FIGURE 9.9 A summary of several experiments measuring the differential threshold for changes in stimulus wavelength as a function of wavelength (from Judd, 1932). [Horizontal axis: wavelength in nm, 450 to 650. Data sources named in the legend: Koenig, Dieterici, Exner, Steindler, Uhthoff, and Jones, together with a curve from an empirical law.]
color vision such that only two primaries are required for matching any other color in the chromaticity space. People with such a deficiency are called dichromats. As was the case with anomalous trichromats, there are several distinguishable types of dichromatism. Subjects may exhibit behavior that classifies them as suffering from dichromatic protanopia, deuteranopia, or tritanopia. Protanopes are incapable of responding adequately to reds, deuteranopes to yellow-greens, and the very rare tritanopes to the blues. As with anomalous trichromats, dichromats exhibit deficiencies in other psychophysical tests, such as hue discrimination, as well as in color mixing. Hue discrimination in dichromats may be 10 times less sensitive than in normal trichromats (that is, a difference in wavelength 10 times normal is required). This is an important factor in diagnosing the specific neural basis of each condition as well as the nature of the chromaticity confusion. On the other hand, the specific nature of the changes in the luminosity curves of dichromats has been a matter of considerable debate over the years. Protanopes seem generally to exhibit a diminished sensitivity at the red end of the spectrum, deuteranopes in the green region, and tritanopes seem to have a slight diminution in their luminosity curves at the blue end of the spectrum. However, the complexities of the changes in the luminosity curve, both in relative amplitude and in the degree to which the entire
luminosity curve is shifted among dichromats, are very involved matters and will not be dealt with further in this chapter. Sufferers from the most extreme form of color blindness, monochromats, are, for all practical purposes, incapable of discriminating among hues on any basis other than brightness difference. Thus, any single color can be used to match all other colors anywhere in the chromaticity space. There are two types of monochromat that can be distinguished on the basis of their luminosity curves. One type displays a photopic luminosity curve, which is indistinguishable from the scotopic luminosity curve. The other type displays a photopic luminosity curve, which is midway between the normal scotopic and the normal photopic luminosity curves. It has been suggested that the first type of luminosity curve is produced by a retina with no cones at all, and the second by a retina with only one of the three normal types of cones present. For this reason, the term rod-monochromat has been used as a descriptor of the former type, and the term cone-monochromat as a descriptor of the second type. For purposes of color mixture, both types still act as if a single color could match all other colors. Both kinds of monochromats also typically show many other kinds of visual difficulties related to their abnormal retinas. It is not known whether there are three different kinds of cone-monochromats, a possibility which is suggested by the fact that any one of the normal three types of cones might be present alone in this system. The relative rarity of this type of color blindness makes any such hypothesis extremely speculative, and we shall have to wait until a sufficiently large sample of monochromats has been studied to definitely answer that question.

E. "Fundamental Yellow," Complementary and Paired Colors, and Neutral Loci

As one scans the literature of color vision, there constantly recurs a most interesting statement.
This statement is that there is something special about yellow, so that it is not "perceived" in psychological experiments as a mixture of other colors as are such colors as greenish-blue or yellowish-red (orange). Rather, some workers in the field consider yellow to be "psychologically" just as primary as red, green, and blue. This statement is difficult to interpret, for at first glance it is not exactly clear why other color naming situations might not lead to a wide variety of other so-called psychologically fundamental colors. The idea of a "perceptual analysis" of mixed colors, such as yellowish-green into their constituent yellows and greens, is also a notion that is not completely clear in light of the general nature of synthetic color mixture and metameric matches. Many authors have used the idea of a "primary yellow" as the starting point in their arguments for opponent color theories of color vision (see below), but this point is probably a misunderstanding of the basic idea of the trichromatic data we have already discussed. The basic fact of color mixture is that any color can be produced by appropriate mixtures of three primaries, and at least from that phenomenological point of view, there is no a priori way to convincingly argue that any particular color is more fundamental than any other. In spite of the confusion regarding the definition and existence of a
"fundamental yellow," there is a certain amount of hard data that do suggest that there are links between certain color pairs such that they either operate together or in opposition. This sort of data is more compelling than the vaguely defined "fundamental yellow." One of the most important of these pieces of evidence, suggestive of linked processes, is found on the chromaticity diagrams themselves. Complementary colors are defined as those pairs of colors which, when mixed in appropriate amounts, produce a completely colorless or unsaturated white light. Complementary colors are represented on the diagram as the colors at the ends of any straight line whose center of gravity lies in the central white region. Thus, there are many colors that tend to cancel out the chromaticity of a linked partner and that, therefore, presumably may be linked at some physiological or anatomical level. A number of other ways in which pairs of colors seem to be linked have been summarized by Hurvich and Jameson:
How can this system of three independent processes be made to account, for example, for the apparent linkages that seem to occur between specific pairs of colors as either the stimulus conditions or the conditions of the human observer are varied? Why should the red and green hues in the spectrum predominate at low stimulus levels, and the yellow and blue hue components increase concomitantly as the spectrum is increased in luminance (von Bezold, 1873)? Why, as a stimulus size is greatly decreased, should discrimination between yellow and blue hues become progressively worse than that between red and green (Farnsworth, 1955; Hartridge, 1949)? Why should the hues drop out in pairs in instances of congenital color defect or when the visual system is impaired by disease (Judd, 1949; Köllner, 1912)? (HURVICH and JAMESON, 1957, pp. 384-385)
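The center-of-gravity description of complementary colors given before this quotation can be made concrete with a small computation. The Python sketch below is only an illustration: the chromaticity coordinates and the white point are hypothetical numbers chosen for convenience, not values read from any chromaticity diagram, and the "amounts" are treated simply as the weights of the center-of-gravity rule. It checks whether two points lie on opposite sides of white along one straight line and, if so, finds the ratio of amounts whose mixture falls on white.

```python
import math


def mix(c1, a1, c2, a2):
    """Center-of-gravity mixture of two chromaticity points with weights a1, a2."""
    total = a1 + a2
    return ((a1 * c1[0] + a2 * c2[0]) / total,
            (a1 * c1[1] + a2 * c2[1]) / total)


def complement_ratio(c1, c2, white, tol=1e-9):
    """Return a2/a1 such that mixing c1 and c2 lands on white.

    Returns None when the two points are not on opposite sides of the white
    point along a single straight line, i.e. when they are not complementary.
    """
    v1 = (c1[0] - white[0], c1[1] - white[1])
    v2 = (c2[0] - white[0], c2[1] - white[1])
    cross = v1[0] * v2[1] - v1[1] * v2[0]   # zero when the three points are collinear
    dot = v1[0] * v2[0] + v1[1] * v2[1]     # negative when white lies between them
    if abs(cross) > tol or dot >= 0:
        return None
    # Lever rule: a1 * |v1| = a2 * |v2| balances the mixture on the white point.
    return math.hypot(*v1) / math.hypot(*v2)


# Hypothetical coordinates, chosen only so the arithmetic is easy to follow.
WHITE = (0.3, 0.3)
C1 = (0.5, 0.4)
C2 = (0.2, 0.25)

ratio = complement_ratio(C1, C2, WHITE)
print("amount of C2 per unit of C1:", ratio)
print("mixture:", mix(C1, 1.0, C2, ratio))
```

The lever rule makes the point of the text explicit: the farther a color lies from white, the less of it is needed to cancel its partner.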
It has also been noted (Linksz, 1964), as well as by Hurvich and Jameson (1957), that it is impossible to conceive of or to find words to describe a reddish-green hue or a yellowish-blue hue. The difficulty in finding a color for which we could use such color names, they believe, reflects the fact that the relationship between blue and yellow, on the one hand, is different than that between yellow and red, on the other. But this sort of data also suffers from the same difficulties as does the distinction of yellow as a primary color: color names are based on word usage and subjective judgments that are peculiarly elusive when one attempts to precisely define the operations involved in their elicitation. A more compelling set of data has been developed by Jameson and Hurvich (1955) in their attempt to develop a quantitative opponent color theory. Noting that a relatively wide range of spectral hues produces a sensation of yellowishness, blueness, greenness, and redness, they attempted to determine the relative strength of each of these sensory experiences by mixing in with each hue-inducing wavelength varying amounts of a postulated opponent color until all traces of the original color disappeared. Thus, for example, a band of stimulus wavelengths varying from
about 500 to 700 nm would produce color responses that were reported by the subject to have some amount of yellowish tone. Various amounts of blue light would then be mixed with each of a series of wavelengths within this band, and the amount of blue required to completely cancel any "yellowishness" measured as an indicator of the strength or chromatic valence of the yellow response at each wavelength. Figure 9.10 shows a sample set of cancellation data for the visible spectrum, at various places in its course, that are capable of eliciting some red, yellow, green or blue experience. The data are plotted in terms of the amount of the opponent color that had to be added at each wavelength to eliminate any residual "redness, yellowness, greenness or blueness." On this graph, it can be seen that the maximum amount of blue required to cancel yellowishness from one band varying from 490 nm to 650 nm was at about 530 nm and that the function dropped off on both sides. A wavelength band varying from 480 nm to 580 nm elicited some green experience, which had to be neutralized with a red, the largest amount of which was required also at about 520 nm. The fact that a single wavelength should produce some green and some yellow should not be too surprising. There is a range of wavelengths whose color names include greenish-yellow and yellowish-green, for example. When adequate amounts of red had been introduced to completely neutralize the green, the residual color would be a yellow. When adequate amounts of blue had been introduced to completely neutralize the yellow, the residual color would be a green. The curve also shows that the band of wavelengths that induces blue color tints runs from 430 nm to 480 nm. These colors had to be neutralized with yellows, and maximal amounts of yellow were required at about 450 nm. The red curve, on the other hand, is somewhat peculiar because reddishness is an experience that is introduced by both long and short wavelengths.
At the shorter wavelengths, the experience of purple includes tints that most people describe as including some reddishness. At the longer wavelengths, the sensations are color named reds and oranges. To remove all of the reddishness from a short wavelength from about 400 to 470 nm, green light had to be added, peaking in the amount required for neutralization at about 440 nm. At the longer wavelengths, the range of red-inducing stimuli was about 580 to 700 nm, and the peak amount of green required for neutralization occurred at about 620 nm. It should be noted that there are a number of behavioral difficulties with the Jameson and Hurvich neutralization procedure. First of all, it was both necessary and difficult to define, for each of these basic colors, exactly what the bandwidth of spectral wavelengths is that evokes the particular sensation. Secondly, upon mixing the postulated opponent color in with the chosen wavelength, it was not always the case that the mixture turned out to be white. It often became an unsaturated version of one of the other opponent pair. Thus, blue and a yellowish tone could be mixed together, and the observer might be faced with deciding whether there was any yellow in a resulting green or red field. These, then, are a few of the psychophysical data of the sort on which the various theories of color quality coding are based. In the
FIGURE 9.10 Chromatic response functions for a single subject, indicating how much opponent color must be added to eliminate the residual chromatic effect of the colors indicated. The photopic luminosity curve is also shown for this subject (from Jameson and Hurvich, 1955). [Horizontal axis: wavelength (nm), 400 to 700; vertical axis runs from -1.00 to +1.00. Curves: blue, yellow, red, green, and white (photopic response).]
following sections, we shall discuss the alternative theories, the biological evidence that constrains these theories, and finally make a statement of what is believed to be the most widely accepted contemporary model.
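The logic of the cancellation measurements summarized in Figure 9.10 can be sketched in a few lines. The Python sketch below rests on two assumptions that are not taken from Jameson and Hurvich: the yellow-blue response is represented by an invented difference of two Gaussian curves, and the responses to the test light and to the cancelling light are assumed to add linearly. Under those assumptions, the amount of cancelling light that nulls the response serves as the measure of chromatic valence.

```python
import math


def gaussian(wavelength_nm, peak_nm, width_nm=40.0):
    """Bell-shaped curve used only as a convenient, hypothetical response shape."""
    return math.exp(-0.5 * ((wavelength_nm - peak_nm) / width_nm) ** 2)


def yellow_blue(wavelength_nm):
    """Hypothetical yellow (+) / blue (-) response to unit energy at a wavelength."""
    return gaussian(wavelength_nm, 575.0) - gaussian(wavelength_nm, 440.0)


def cancelling_amount(test_nm, canceller_nm):
    """Energy of canceller, per unit test energy, that nulls the yellow-blue response.

    Assumes linear additivity: test + amount * canceller = 0.
    """
    test = yellow_blue(test_nm)
    canceller = yellow_blue(canceller_nm)
    if test == 0.0:
        return 0.0
    if test * canceller >= 0.0:
        raise ValueError("the cancelling light must evoke the opponent hue")
    return -test / canceller


# Valence of "yellowishness" across part of the spectrum, nulled with a 450 nm light.
for wavelength in range(520, 661, 20):
    amount = cancelling_amount(float(wavelength), 450.0)
    print(f"{wavelength} nm: {amount:.3f} units of 450 nm light cancel the yellow")
```

A light that evokes the same hue as the test cannot cancel it, which is why the function raises an error in that case. The difficulty noted in the text, that the nulled mixture is often not white but a desaturated member of the other opponent pair, is outside the scope of this one-dimensional sketch.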
III. THE COMPETITIVE THEORIES

As one reviews the enormous literature on color theory, it quickly becomes clear that no brief review or summary is going to be comprehensive enough to do justice to all of the distinguished workers who have contributed to this field. It is also clear, however, that much of the intellectual effort expended on the controversies in this field was aimed at models that were
totally descriptive and unrelated to the coding problem. This was necessitated by the fact that, until the advent of modern neurophysiology, theorists were forced to argue only from the findings of psychophysical experiments to the underlying neural codes. As in so many other instances involving a "black box" approach, this is a procedure which has often been fruitless, since so many alternative proposed internal mechanisms can give rise to the same external behavior. In the following discussion, we shall concentrate our attention on those aspects of the competitive theories that contain specific neural hypotheses and ignore those that are formulated only in terms of mathematical theories of color space. A major physiological issue has always been the nature of the trichromatic fundamentals and the ways in which they might be added together. In particular, the question was repeatedly asked: were the absorption spectra of the primaries closely spaced or widely dispersed? From the psychophysical data, many theorists attempted to reason back to the specific shape and number of the response curves of the group of primary colors. In the following section, this will be one of the main themes of our discussion.

A. The Trichromatic Theories

The basic nature of trichromatic color mixture was recognized relatively early, and by 1801 Young was able to state a visual theory, which postulated specifically that color vision is mediated by three retinal receptor systems. A major premise of his theory, which has proved to be most durable, is that the ratio of activation of the three systems is the key cue for the various color sensations. Young postulated specifically that there are three, and only three, different chromatic systems in the retina, and that each has a different spectral absorption curve. It is important to note, however, that there is nothing fixed in the various trichromatic theories about the nature of the absorption curves of the three systems.
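Young's premise that the ratio of activation, rather than the absolute level, carries the color information can be illustrated with a small computation. The Python sketch below is an illustration only: the peak wavelengths are the 440, 540, and 575 nm values quoted earlier from Stiles, but the Gaussian shape and the common width are assumptions introduced here, since, as just noted, the theory itself fixes nothing about the form of the curves.

```python
import math

# Peak wavelengths quoted earlier in the text from Stiles' psychophysical estimates.
PEAKS_NM = {"blue": 440.0, "green": 540.0, "red": 575.0}
WIDTH_NM = 50.0  # assumed common width; purely illustrative


def activations(wavelength_nm, intensity=1.0):
    """Activation of each hypothetical receptor system by a monochromatic light."""
    return {
        name: intensity * math.exp(-0.5 * ((wavelength_nm - peak) / WIDTH_NM) ** 2)
        for name, peak in PEAKS_NM.items()
    }


def activation_ratios(wavelength_nm, intensity=1.0):
    """Each system's share of the total activation (the proposed code for hue)."""
    acts = activations(wavelength_nm, intensity)
    total = sum(acts.values())
    return {name: value / total for name, value in acts.items()}


# The same wavelength at two intensities gives the same ratios;
# a different wavelength gives different ratios.
for wavelength, intensity in [(500.0, 1.0), (500.0, 10.0), (600.0, 1.0)]:
    ratios = activation_ratios(wavelength, intensity)
    shares = ", ".join(f"{name} {share:.3f}" for name, share in ratios.items())
    print(f"{wavelength:.0f} nm at intensity {intensity:g}: {shares}")
```

Because the ratios are unchanged when the intensity is scaled, a code of this kind separates hue from brightness, which is one reason the premise has proved so durable.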
Young, Helmholtz, and a host of others who have followed, for many indirect and sometimes misleading reasons, suggested one after another triad of primaries that conceivably could serve to explain color vision effects. Indeed, the empirical fact that almost any triad allows color matching of any spectral hue (given the only exception that one be allowed to mix one of the three with the to-be-matched color) prohibited the direct specification of the nature of the triads from psychophysical evidence. Some workers, notably Young himself as well as Helmholtz (1924-25), König and Dieterici (1893), and Thomson and Wright (1953) among others, have suggested and developed theories on the basis of absorption spectra that are relatively broad and very widely spaced. Figure 9.11 (a) and (b) show two sets of fundamentals as proposed by some of these workers. On the other hand, over the years some investigators have suggested that the spectral absorption curves of the cones are very narrowly spaced. For example, Figure 9.11 (c) shows one of the sets used by Hecht (1934) in his theory of a mathematical color space.

An important conceptual point is that the trichromatic theory is, in fact, only a theory of the receptors. Few of the proponents of the trichromatic theory over the years have ever said anything about any higher level of neural coding. The sole biological assumption of all trichromatic theories concerns the absorption spectra of the receptor photochemicals.

What, then, are the basic ideas of trichromatic theory? First, it is postulated that color perception is mediated by the cones, specialized photoreceptors found mainly within or near the fovea of the retina. Second, it is asserted that there are specifically three kinds of cones, no more and no less. Third, the outer segment of each type of receptor cone is presumed to be filled with a photochemical that differs in its absorption spectrum from that of the other two. Fourth, on the basis of differences among the three receptors in absorption coefficients for the same wavelength, it is thought that relatively different amounts of neural activity are generated in each by different colored lights. Fifth and finally, there is a psychobiological coding assumption that on the basis of this difference in the relative rates of activity in different neural pathways or groups of afferent neurons, different chromatic sensations occur.

FIGURE 9.11 (a) The three fundamental spectral sensitivity curves according to the original Young and Helmholtz trichromatic theory (from Judd, 1951). (b) The three fundamental spectral sensitivity curves according to Koenig and Ladd-Franklin theories (from Judd, 1951). (c) The three fundamental spectral sensitivity curves according to Hecht (1934). [In each panel the horizontal axis is wavelength (nm), 400 to 700, and the curves are labeled B, G, and R.]

The important point to note in this brief summary of the trichromatic theories is that from their first formulation by Young to the most modern treatment, they have all been restricted to a very limited kind of physiologizing. The main biological assumption, common to all of the theories, was that the retinal receptors were of three kinds. All theories imply that relative amounts of activity in adjacent neural pathways are the key code for quality, but the details are rarely spelled out. The major controversy among those who championed one or another form of the trichromatic theory was the breadth and overlap of the spectral absorption curves of the three primary colors, yet little empirical psychophysical evidence (see our previous discussion of Stiles' work as one extraordinary example) could be developed that uniquely supported any particular triad or degree of overlap. The empirical data only said that three independent manipulations of the color mixture were necessary to match any other color, but contributed nothing to the choice of any particular triad. Thus, as one compares the various trichromatic theories of color vision that appeared over the years, it can be seen that, for the most part, they were nonphysiological, once having passed the assumption of a specific triad of absorption curves. Helmholtz's (1892) original treatment of hue discrimination on the basis of trichromatic receptors introduced the notion of a mathematical color space, in other words, a mathematical description of the color that was perceived on the basis of activation of the presumed set of three primaries. This notion of a color space, and the orientation toward a theory that was primarily a mathematical description of color vision, was continued by Hecht (1930) and by Stiles (1946), and both of their descriptions are also marked by a remarkable absence of physiological detail or speculation about what is going on beyond the receptors.

B. The Opponent Color Theory

Hering (1878), influenced mostly by considerations of the psychological nature of perceived primary colors, felt that the notion of three-color receptors, all of which were essentially dormant until stimulated, was an inadequate model. Noting the linkages of various kinds that seemed to occur among various colors, he proposed the first of what are now known as the opponent color theories. Unfortunately, controversy and debate over the years have obscured a very important fact, namely, that the original version of the Hering opponent color theory with its four primary sensory colors (red, green, blue, and yellow) was also a trichromatic theory at the retinal photochemical level, and if carefully analyzed, it can be seen that Hering's original formulation really represents nothing more than an alternative set of the three retinal receptor absorption curves. He assumed the existence of a blue-yellow receptor as well as a red-green one and a black-white one. An important difference in the details of his theory, however, was that he assumed that luminous stimulation could not only lead to an increase in the amount of neural activity elicited from visual receptors, but in some situations could lead to a reduction. For example, in his hypothetical red-green unit, red increased and green decreased the base level of neural activity. In other words, Hering's theory deeply involves the notion of spontaneous ongoing neural activity in the absence of a stimulus, which could be inhibited by lights of particular wavelengths or enhanced by other wavelengths. Notwithstanding this new twist, it is often overlooked that Hering's theory is also a trichromatic one in that it also assumes that there were only three different types of photoreceptor pigments in the retina. However, rather than receptors with maximal monophasic excitatory responses in the red, green, or blue portions of the spectrum, he assumed that two contained a substance that was capable of being either broken down or regenerated, depending upon the color of the incident light. He further assumed that one of the three photochemicals was generally sensitive to either darkness or light. Hering assumed that this substance was broken down by stimulation with light of any wavelength and regenerated in the dark, thus, he thought, explaining some of the information about light and dark adaptation. Another substance was sensitive to both red and green lights. In red light the substance "degenerated," but it was regenerated in green light. The third member of the triad was regenerated by blue, but degenerated under the influence of yellow light. The breakdown and regeneration of the three photopigments meant that the absorption curves of at least two had to have both negative and positive values. Figure 9.12 shows the generation-degeneration absorption curves suggested by the Hering theory. It is important to note that where the curves cross the horizontal axis, it was assumed by Hering that there was no stimulated neural activity. Thus, a blue light of about 475 nm produced no action in the red-green system, even though it substantially increased the amount of neural activity in the blue-yellow system due to the breakdown of the blue-yellow receptor substance. Similarly, there is a point on the spectrum (about 500 nm) that strongly regenerates the red-green substance, producing, presumably, a sensation of pure green, but with no yellow or blue tones. Similarly, at 575 nm, it is assumed that a pure yellow is produced without any red or green tones.
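The bookkeeping implied by these crossing points can be sketched as follows. In this Python sketch only the zero crossings (about 475 and 575 nm for the red-green system, about 500 nm for the blue-yellow system) and the statement that red raises and green lowers the base level of activity come from the text. The polynomial shapes, the scale factors, and the spontaneous rate are hypothetical choices made solely so that the example runs.

```python
SPONTANEOUS_RATE = 10.0  # hypothetical base level of activity, arbitrary units


def red_green(wavelength_nm):
    """Signed red (+) / green (-) response; zero at about 475 and 575 nm.

    The parabola is an assumed shape. It is positive at both ends of the
    spectrum, like the early Hering R curve, and negative in between.
    """
    return (wavelength_nm - 475.0) * (wavelength_nm - 575.0) / 2500.0


def blue_yellow(wavelength_nm):
    """Signed yellow (+) / blue (-) response; zero at about 500 nm (assumed linear)."""
    return (wavelength_nm - 500.0) / 50.0


def firing_rate(signed_response):
    """Activity riding on a spontaneous base level; it cannot fall below zero."""
    return max(0.0, SPONTANEOUS_RATE + signed_response)


def hue_names(wavelength_nm, tol=1e-9):
    """List the opponent hues signalled at a wavelength by the two chromatic systems."""
    names = []
    rg = red_green(wavelength_nm)
    by = blue_yellow(wavelength_nm)
    if rg > tol:
        names.append("red")
    elif rg < -tol:
        names.append("green")
    if by > tol:
        names.append("yellow")
    elif by < -tol:
        names.append("blue")
    return names


for wavelength in (430.0, 475.0, 500.0, 540.0, 575.0, 650.0):
    rate = firing_rate(red_green(wavelength))
    print(f"{wavelength:.0f} nm: {hue_names(wavelength)}, red-green activity {rate:.2f}")
```

At the three crossing wavelengths only one chromatic system responds, so a single hue name results. Everywhere else two names appear, never both members of one opponent pair.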
At intermediate points, the colors produced are mixtures of two of the four colors of the two opponent pairs. The similarity of these curves to the Hurvich and Jameson psychophysical data should be noted, but also the differences. In sum, then, the original Hering opponent color model was also a theory whose main biological assumption concerned the nature of the photochemicals in the peripheral receptor. Like the classic trichromatic theories, it assumed three and only three photochemicals, but said nothing at all about the nature of coding at higher levels. Hering's great contribution was to introduce the notion of opponent mechanisms into color vision discussions.

IV. THE BIOLOGICAL DATA
There is only one way to resolve many of the controversial issues concerning the neural and photochemical basis of color vision, and this is to make direct measurements of the necessary parameters from the point of view of chemistry, physics, and physiology at the various levels in the ascending pathway. To determine the nature of the true primary absorption curves of the photochemicals as well as the actual number of different types, chemical and optical studies of the wavelength-dependent response of each must be carried out. To determine whether the trichromatic notion of three overlapping excitatory response systems or the opponent hypothesis with both
FIGURE 9.12 The three photoreceptor absorption curves of the Hering opponent color theory, showing that both excitation and inhibition of spontaneous activity (unlike the solely excitatory processes of Figure 9.11) were thought to occur specifically in the three types of photoreceptor. The early Hering theory proposed that the R receptor was excited at both ends of the visual spectrum. The reader is cautioned to carefully note the differences between this set of hypothetical absorption curves and the superficially similar psychophysical data of Figure 9.10 (from Judd, 1951). [Plot: horizontal axis, wavelength (nm), 400 to 700; vertical axis scaled from -6 to +10.]
excitatory and inhibitory processes holds at any given level of the nervous system, electrophysiological experiments must be carried out. It is the purpose of this section to specify the direct biological correlates of stimulus wavelength at each level of the ascending visual pathway.
A. The Photoreceptor
1. Rushton's Reflection Densitometry. One of the first direct optical measurements of the characteristics of cone pigments was reported by Rushton (1958), using a device that he had developed a few years earlier (Rushton, 1956). Rushton's device was essentially a very large automatic ophthalmoscope as shown in Figure 9.13. The technique depends upon the fact that the pigments of the dark-adapted eye tend to absorb light, while a light-adapted eye reflects more of the incident light back from the rear of the eye, less light being absorbed by the bleached than by the unbleached pigment. The amount of light reflected back at the various wavelengths of the visual spectrum was determined first in the dark-adapted eye and then during the dark interval of a flickering adapting light. The difference between the two spectral absorption curves (the so-called difference spectrum), Rushton initially felt, directly reflected the absorption characteristics of the photoreceptor pigments.
FIGURE 9.13 A photograph of Alpern's modification of the Rushton reflection densitometer, showing part of the optical chain, part of the electronics, and the subject in position (courtesy of Dr. Matthew Alpern, University of Michigan).
Rushton's technique required several successive steps, which we should clarify by spelling them out in detail. First, an absorption spectrum was determined by measuring the reflected light from the dark-adapted eye. Then a second absorption curve was determined following bleaching with red light, for example. The second absorption curve was then subtracted from the first to give the difference spectrum. Then, after all of the red-sensitive pigment had presumably been removed in this manner, a second bleaching was carried out, using a strong white light, and a second difference spectrum was calculated by subtracting a third absorption curve from the first. A typical pair of these two difference spectra is shown in Figure 9.14. The key datum is that the peak of the two difference curves has been shifted by the successive bleaches. Rushton's early interpretation of these data was that the two difference curves predominantly reflected the absorption spectra of two different pigments, the first mainly that of a yellowish-orange-sensitive pigment with a peak absorption at about 590 nm, and the second that of a yellow-green-sensitive pigment with a peak absorption at about 540 nm. Pointing out that there was no practical way to measure the blue curve (if there indeed was one), Rushton (1958) concluded that he only had direct evidence of the absorption spectra of two of the cone pigments.
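The subtraction steps just described can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Rushton's procedure or data: the function names `difference_spectrum` and `peak_wavelength`, the wavelength grid, and every density value are hypothetical, chosen so that the two difference curves peak near the 590 nm and 540 nm values quoted in the text.

```python
def difference_spectrum(dark_adapted, after_bleach):
    """Subtract a post-bleach absorption curve from the dark-adapted curve,
    wavelength by wavelength. Both curves map wavelength (nm) -> density."""
    if dark_adapted.keys() != after_bleach.keys():
        raise ValueError("both curves must be sampled at the same wavelengths")
    return {wl: dark_adapted[wl] - after_bleach[wl] for wl in sorted(dark_adapted)}


def peak_wavelength(spectrum):
    """Return the wavelength (nm) at which the curve is largest."""
    return max(spectrum, key=spectrum.get)


# Hypothetical absorption curves (arbitrary density units), for illustration only.
dark = {500: 0.050, 540: 0.070, 590: 0.080, 640: 0.040}         # dark-adapted eye
after_red = {500: 0.045, 540: 0.055, 590: 0.045, 640: 0.025}    # after red bleach
after_white = {500: 0.020, 540: 0.015, 590: 0.030, 640: 0.020}  # after white bleach

# First difference spectrum: first curve minus second curve.
first_difference = difference_spectrum(dark, after_red)
# Second difference spectrum: first curve minus third curve.
second_difference = difference_spectrum(dark, after_white)

print(peak_wavelength(first_difference), peak_wavelength(second_difference))
```

With these made-up numbers the peak moves between the two difference curves, which is the kind of shift the text identifies as the key datum.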
In 1959, Weale used a similar technique and obtained similar data. However, sometime later, both Rushton (1964) and Ripps and Weale (1964) reinterpreted the meaning of these difference spectra and stated that they now felt that the retinal densitometry techniques do not reflect accurately the individual absorption
FIGURE 9.14 Difference spectra originally believed to be of two of the normal cone pigments of the normal eye. These data are no longer believed to have that significance, but rather to be a more complicated mixture of cone response characteristics. Originally, Rushton believed that the right hand curve (tracing the course of points marked with ○ and ●) reflected the absorption of a long wavelength sensitive pigment and the left hand curve (tracing the course of the rectangles) reflected the absorption of a medium wavelength sensitive pigment (from Rushton, 1958). [Plot: horizontal axis, wavelength (nm), 500 to 650.]
spectra of any of the classes of the foveal cones when more than one pigment is present. In the special situation in which one of the two measurable pigments may be missing, it is probable, they believe, that valid estimates of the remaining system can be made, since any influence of the blue or short wavelength system seems to be undetectable with this technique. But, for mixtures of pigments, the results are now considered to be equivocal. Retinal densitometry is now used mainly as a means of tracing the temporal course of dark adaptation rather than as a means of specifying absorption spectra (Rushton, 1962, 1964). Brown and Wald (1964) have also questioned the validity of the techniques of reflection densitometry as a means of specifying the spectral absorption curves of cone pigments.
2. Direct Microspectrophotometric Studies of Single Cones. In 1964 two remarkable papers, which will have the most important implications for all theories involving cone absorption spectra, were published. As noted, one of the main issues of controversy in color vision theories up to recent years has been the specification of the fundamental triad of primary colors, which were assumed to exist in the retina. All of this controversy, however, could be laid to rest if an extremely difficult technical tour de force could be executed, namely, the determination of the absorption spectra of individual
cones in the human retina. This ideal experiment was actually carried out nearly simultaneously by two groups (Marks, Dobelle, and MacNichol, 1964, and Brown and Wald, 1964), using newly developed and nearly identical techniques of microspectrophotometric measurement. Briefly, the procedure involves the passage of a tiny beam of light, demagnified by being passed through an inverted microscope, through a single cone. The light absorbed by the cone could be determined by comparing this tiny beam of light with another light beam that did not pass through any equivalent absorbing material. This arrangement is shown in Figure 9.15. A measurement of the difference in the energy of the two beams was made at many monochromatic wavelengths across the visual spectrum. It should be noted, however (and this point has been especially prominent in a criticism of these data by Sheppard, 1968), that only a very few cells had been sampled in this most difficult technical experiment. In fact, Brown and Wald's entire paper is based on one rod and only four cones from the extracted retina of a human cadaver, and Marks, Dobelle, and MacNichol report only the absorption curves of seven monkey cones and two human cones. Nevertheless, the agreement in their data is remarkable. Figure 9.16 shows the absorption difference spectra of the four cones measured by Brown and Wald. One peaks at 450 nm, two at 525 nm, and the fourth at 555 nm. Marks, Dobelle, and MacNichol's data, as replotted in Figure 9.17, show three groups peaking, they state, at 445 nm, 535 nm, and 570 nm, respectively. Hopefully, if these data are substantiated and do not turn out to be sampling errors produced either accidentally or by virtue of the fact that cells with absorption spectra that fall into these particular categories are, for some reason, more likely to be sampled than others, then much of the classical controversy will be laid to rest.
The major significance of this work lies in the fact that it directly and definitely specifies the width and peak sensitivity of the absorption spectra of the primate color primaries. Of special interest to theories of quality coding is the fact that there are points on the spectrum at which all three curves overlap, and, thus, a stimulating light at certain wavelengths will stimulate all three receptor systems to some degree. Unfortunately, although nearly a decade has passed since the original reports, there have been no replications or extensions of these findings reported in the literature. The presumption is that they do correctly reflect the grouping of three visual photopigments for primates, but rigor demands that at least a note of caution be mentioned at this point. More extensive data for the cones of the goldfish have, however, been reported by Marks (1965). One hundred thirteen separate runs were made on single cones, using the same sort of microspectrophotometric measuring procedure described above. Figure 9.18 is a histogram, showing the peak sensitivities measured in this group of spectral response determinations. Clearly, in this case, there is also a clustering of the peaks into just three main groups centered at 455 nm, 530 nm, and 625 nm. These findings are quite comparable to the primate data, although the right group seems to peak at a longer wavelength in the goldfish than in the primate.
FIGURE 9.15 Diagram of the components of a recording microspectrophotometer. The output beam of a constant energy monochromator is split by a half-silvered mirror. Both half beams are demagnified by inverted microscopes. Beam 1 then passes through the specimen, while Beam 2, the reference beam, does not. The difference in the balanced output of a pair of matched photodetectors is, then, a direct measure of the absorbance of the specimen as the wavelength of the illuminating light is scanned. [Diagram labels: equal energy monochromator; 50%-50% beam splitter; Beam 1; Beam 2; inverted microscopes; specimen; photodetectors; difference voltage; recorder.]
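The two-beam comparison can be illustrated with a short Python sketch. It assumes the standard definition of absorbance as the base-10 logarithm of the ratio of reference to transmitted intensity; the instrument described here recorded a difference voltage between matched detectors, so this conversion, the function names, and the intensity readings below are all assumptions made for the example rather than details taken from the original papers.

```python
import math


def absorbance(reference_intensity, sample_intensity):
    """Standard absorbance: log10 of reference beam over specimen beam."""
    if reference_intensity <= 0 or sample_intensity <= 0:
        raise ValueError("intensities must be positive")
    return math.log10(reference_intensity / sample_intensity)


def scan_spectrum(reference, sample):
    """Compute absorbance at each wavelength of a monochromator scan.
    Both arguments map wavelength (nm) -> detector reading."""
    return {wl: absorbance(reference[wl], sample[wl]) for wl in sorted(reference)}


# Hypothetical detector readings (arbitrary units), for illustration only.
beam_2_reference = {450: 1.00, 500: 1.00, 550: 1.00, 600: 1.00}
beam_1_specimen = {450: 0.98, 500: 0.96, 550: 0.95, 600: 0.97}

spectrum = scan_spectrum(beam_2_reference, beam_1_specimen)
peak = max(spectrum, key=spectrum.get)
print(peak, round(spectrum[peak], 4))
```

Because a single cone absorbs only a few percent of the beam in this made-up scan, the absorbance values are small, which is one way to see why the measurement is so sensitive to noise.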
3. Electroretinographic Studies. The electroretinogram, the long-lasting graded potential produced by the retina when it is stimulated with light, can also provide a means of indirectly estimating the absorption characteristics of visual cones. This technique has been applied both to human and infrahuman preparations in a number of different ways. Witkovsky (1968) and Burkhardt (1968), in a pair of articles published simultaneously, both examined the effects of chromatic adaptation on the electroretinogram of the carp and goldfish, respectively. While different components of the electroretinogram responded in somewhat different manners, all of their data were consistent with the notion that adaptation with different colored lights and measurement of the b wave of the ERG produced a very few residual response curves with different peak sensitivities. Both of the authors considered this to be indicative of the fact that the photoreceptors were of a small number of different types and that each type was filled with chemicals that had their peak absorption at relatively widely spaced points. Although, as we noted in our chapter on transduction, the spectral absorption curves of different animals will differ slightly because of the species-specific details of the opsin, it is interesting to compare the peak absorptions suggested by the data from these animals with those of the previous section on primates. Witkovsky concluded that the residual ERG action spectra following chromatic adaptation obtained from the carp reflected the effects of four pigments with absorptions peaking at 482 nm, 517 nm, 580 nm, and 660 nm, respectively (see Table 9.2). Burkhardt, however, working with the goldfish, was only able to specify the peak sensitivity of the blue and red receptor systems, which appeared to peak at 450 nm and 620 nm, respectively. The electroretinogram has also been used with human subjects by Riggs, Johnson, and Schick (1966) in an ingenious manner. A stimulus pattern consisting of alternating bars of different color lights was presented to the observer.
The intensities of the two lights were adjusted to produce equal electroretinographic responses. However, when the pattern was displaced, the chromatic change induced an electroretinographic response, which was presumably independent of the luminosity of the stimuli. The
FIGURE 9.16 The difference spectra, following bleaching with yellow light, of four human cones, measured by a direct microspectrophotometric procedure. One cone peaks in the blue, two in the yellow-green, and one in the red. The color names "blue," "red," etc. are used here and elsewhere in our discussion of the biological data as a shorthand for the wavelength of the photic stimulus to which the receptor is most sensitive (from Brown and Wald, 1964). [Plot title: Difference spectra of single human cones; horizontal axis, wavelength (nm), 400 to 600.]
major factor determining the amplitude of this evoked response was the absolute difference in wavelength between the two colored lights. Riggs and his colleagues interpreted their data to be consistent with a trichromatic receptor system in the retina, but were scrupulously careful in noting that it was not possible to calculate specific spectral absorption curves on the basis of these data without making certain assumptions which, they felt, could not be justified on the basis of their results.
4. Electrophysiological Recordings from Single Cones. Another type of experiment, which can be used to determine the nature of the cone absorption spectra, is one in which the neural responses of individual photoreceptive elements are investigated. In this manner, the problem of analyzing the
FIGURE 9.17 Absorption spectra for 10 primate cones, showing the clustering of the responses into three different groups peaking in the blue, yellow-green, and red portions of the spectrum (from Marks, Dobelle, and MacNichol, 1964). [Plot: relative absorbance against wavelength (nm), 450 to 650; group peaks labeled 445, 535, and 570.]
complex integrated responses of a number of intermixed photoreceptor types with overlapping sensitivities is avoided. Because of the very small size of the vertebrate cone, such an experiment is technically extremely difficult, and its successful execution would be a tour de force of the highest order. Yet the task of inserting a microelectrode into a single cone has been accomplished on a number of different species. The first group to do so used the carp as their experimental animal (Oikawa, Ogawa, and Motokawa, 1959). However, we shall discuss a more recent study by Tomita, Kaneko, Murakami, and Pautler (1967), who also used the carp, but whose advances in technique allowed the accumulation of a much higher quality data base. The success of a complex experiment, such as the one to be described here, often depends upon rather curious, and at first glance trivial, changes
FIGURE 9.18 A histogram showing the peaks of the absorbance spectra of 113 goldfish cones, displaying the striking clustering into three different groups with sensitivities in the blue, yellow-green, and red portions of the spectrum (from Marks, 1965). [Plot: horizontal axis, wavelength (nm), 400 to 650.]
in procedure. Two important, though small, procedural developments were of special consequence in this study. Because the cone in the carp retina is so small, an ultramicroscopic capillary electrode was required to avoid massive destruction as the electrode penetrated the cell wall. Tomita and his group (Tomita, Murakami, Hashimoto, and Sasaki, 1961) had developed techniques for drawing out glass microelectrodes with points as fine as 0.1 μ. These electrodes, however, were still not sufficiently small to easily penetrate the cone cell membrane when they were simply advanced by a micromanipulator. The cell wall was sufficiently elastic so that the slowly moving microelectrode simply stretched the membrane under the low level of accelerative forces so generated. Tomita and his colleagues, therefore, invented a device that "jolted" the stage on which the retina was mounted as the electrode was advanced. The higher accelerations produced by this delicate jolting were sufficient to overcome the elasticity of the cell wall, and the electrode then penetrated into the interior of the cone. With this instrumentation, they were able to record a graded dc hyperpolarizing potential when the cone was stimulated by light. This potential was presumably the receptor potential itself and varied in amplitude depending upon the wavelength of the stimulating light. Incidentally, as noted in Chapter 4, it is now believed that all vertebrate visual receptor potentials are hyperpolarizations, suggesting that the effect of the breakdown products of the photochemical may be to reduce the membrane permeability rather than to increase it. This is confirmed by other experiments in which constant electrical currents are passed through the photoreceptor. Voltage measurements made during the period the stimulating light is on show a relatively large increase over those obtained during the dark period.
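The constant-current measurement can be read as a resistance measurement, since Ohm's law gives V = IR. The following minimal Python sketch shows only that arithmetic; the current and voltage values, and the function name `membrane_resistance`, are hypothetical and are not taken from the experiments described.

```python
def membrane_resistance(voltage_volts, current_amps):
    """Ohm's law, R = V / I, for a constant injected current."""
    if current_amps == 0:
        raise ValueError("a nonzero test current is required")
    return voltage_volts / current_amps


# Hypothetical values, for illustration only.
test_current = 1e-10   # 0.1 nA constant current (assumed)
v_dark = 0.002         # voltage deflection in the dark, 2 mV (assumed)
v_light = 0.003        # voltage deflection during light, 3 mV (assumed)

r_dark = membrane_resistance(v_dark, test_current)
r_light = membrane_resistance(v_light, test_current)

# With the current held constant, the voltage ratio equals the resistance ratio.
resistance_ratio = r_light / r_dark
print(round(resistance_ratio, 3))
```

Under these assumed numbers a larger voltage during illumination corresponds to a proportionally larger membrane resistance.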
A simple application of Ohm's law dictates that this must be the result of increased membrane resistance, which is the same thing as decreased permeability. In invertebrates, most analogous receptor potentials appear
FIGURE 9.19 Graded potentials recorded from three different classes of carp cones exemplifying peak sensitivities in the blue, yellow-green, and red portions of the visual spectrum. The responses in this figure, it must be clearly noted, are not spikes, but are brief graded potentials produced by stimulating the cones with brief, though equal energy, spectral pulses. Also note that the response to light is a hyperpolarization in all of these cases, downward being negative (from Tomita, Kaneko, Murakami, and Pautler, 1967). [Panels (a), (b), and (c); horizontal axis, wavelength (nm), 400 to 700.]
to be depolarizing and probably result from increases in membrane permeability. The exact site of entry of the microelectrode would, of course, be of some interest. The very small cross section of the outer segment of the cone makes it unlikely that Tomita's group had actually entered that part of the cell. Indeed, using an electrophoretic dye injection technique, Kaneko and Hashimoto (1967) subsequently showed that the electrode in these experiments was usually located in the inner segment of the carp's cone. An automatic monochromator was used by Tomita and his colleagues to produce visual stimuli at wavelength intervals of 20 nm. The amplitude of the evoked receptor potential was recorded photographically for each wavelength interval from the face of a CRT, which had its x axis coupled to the wavelength control of the monochromator. The automaticity of the device was very important because even with these fine electrodes, a cone could be held active only for a few minutes. In 40 sec, an entire spectral response curve could be determined for an impaled cell ranging over the spectrum from 400 to 740 nm. A sample output of this scanning procedure is shown in Figure 9.19 for three cells typical of the population, which was more or less randomly sampled every time an electrode was advanced into
FIGURE 9.20 Histograms of the peak sensitivities of a larger sample of the three classes of cone responses shown in Figure 9.19. Once again it is clear that the cones in this animal have absorption spectra, which cluster into three widely spaced groups (from Tomita, Kaneko, Murakami, and Pautler, 1967). [Panels: (a) B, mean = 462 ± 15 mμ, N = 23; (b) G, mean = 529 ± 14 mμ, N = 14; (c) R, mean = 611 ± 23 mμ, N = 105. Horizontal axis, wavelength (nm), 400 to 700.]
the retina. The maximum amplitude of any of these negative-going receptor potentials is only 5 mV, yet the response curves are relatively smooth and noise free, a signal-to-noise ratio characteristic of the fine technique of this group. Tomita and his colleagues were able to record the spectral response curves of 142 cells sufficiently well to include them in their analyses. The analyses are summarized in the histograms of Figure 9.20(a), (b), and (c). Three groups of response curves appear to emerge from this analysis. The spectral absorption curve does vary slightly from one cell to another in the retina of this fish, but whether this is true variation in the absorption characteristics or merely an artifact of these low-amplitude signals cannot be determined. Nevertheless, it is clear that there is a group of receptors whose peak sensitivity is in the short wavelengths at about 462 nm, a medium wavelength sensitive group with a peak sensitivity of 529 nm, and a long wavelength sensitive group with a peak sensitivity of 611 nm. Tomita and his colleagues also pooled all of their data for each group and plotted three average spectral absorption curves they believed to be representative of the total population. This is presented in Figure 9.21, and it can be seen that once again the familiar picture obtains: three relatively broadly tuned classes of receptors with peak sensitivities as indicated above. It should also be noticed that the long wavelength-sensitive pigment seems to have a
FIGURE 9.21 The average absorption spectra for the data of Figure 9.20. This figure further emphasizes the fact that the response curves are broadly tuned in relation to the full visual spectrum. Both mean values and variance limits are shown in this figure (from Tomita, Kaneko, Murakami, and Pautler, 1967). [Plot: horizontal axis, wavelength (nm), 400 to 700; vertical axis scaled from 0 to 100.]
sensitive tail, which extends well down into the shorter regions of the spectrum, and that this is fairly typical of most of the spectral functions we have discussed. The "red" pigment, therefore, is sensitive across almost the entire spectrum, a very broad band tuning indeed.
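The consequence of broad tuning, that a single wavelength excites all three receptor classes to different degrees, can be sketched in Python. The peak wavelengths are the carp means reported by Tomita and his colleagues (462, 529, and 611 nm); everything else is an assumption made for illustration. In particular, the Gaussian curve shape, the single 60 nm width, and the function names are hypothetical and do not reproduce the measured curves, which are asymmetric.

```python
import math

# Peak sensitivities (nm) from the carp data discussed in the text.
PEAKS_NM = {"B": 462.0, "G": 529.0, "R": 611.0}

# Assumed tuning width (standard deviation, nm); not a measured value.
ASSUMED_WIDTH_NM = 60.0


def relative_response(wavelength_nm, peak_nm, width_nm=ASSUMED_WIDTH_NM):
    """Relative response of one receptor class, modeled as a Gaussian
    centered on its peak wavelength (an illustrative simplification)."""
    return math.exp(-0.5 * ((wavelength_nm - peak_nm) / width_nm) ** 2)


def receptor_triplet(wavelength_nm):
    """Relative responses of the three receptor classes to one wavelength."""
    return {name: relative_response(wavelength_nm, peak)
            for name, peak in PEAKS_NM.items()}


for wavelength in (450, 500, 550, 600):
    triplet = receptor_triplet(wavelength)
    print(wavelength, {name: round(value, 2) for name, value in triplet.items()})
```

In this sketch no receptor class is silent at any of the test wavelengths, and each wavelength yields a different pattern of relative activity across the three classes.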
5. A Brief Summary of Retinal Color Coding. All of the data presented so far in this section support the conclusion that Young was essentially correct 170 years ago when he hypothesized three more or less broadly tuned receptors with peak sensitivities in the red, yellow-green, and blue portions of the spectrum. Collectively, these data put to rest a number of persistent controversies. With the exceptions of those practical applications of color mixture which may, for some procedural reason, require some other triad of primaries, there is no justification, theoretical or otherwise, for assuming any other triads of primaries than the ones identified by these studies. The narrowly spaced primaries of Hecht (1934) and Hurvich and Jameson (1957), though interesting, are simply nonphysiological and must be considered to be only mathematical fictions, which can exist only because of the fact that there is no psychophysically unique set in trichromatic color matching. Similarly, there is no evidence of any opponent process at the receptor level, all response functions being monophasic. Acknowledging the fact that the data described above have been collected on several different species and that species differences do often lead to shifts in peak sensitivities, it still may be of some use to summarize the information contained in this section. This has been done in Table 9.2, which lists the peaks of the triad as determined in each of the physiological studies we have discussed in addition to the psychophysical studies (Stiles, 1949, 1959; Wald, 1964, 1966) which bear on this problem. In spite of the species differences and the differences in technique in each laboratory, the agreement among these measurements is quite high. There is little evidence of any fourth receptor in the yellow region, a fact that also puts to rest some notions of "primary yellow," or for that matter any sort of opponent process at the retinal level.
But therein, of course, lies the rub, for as we ascend to the very next accessible level of neural coding, the bipolar layer of the retina, the picture changes in a surprising way.
TABLE 9.2. THE PEAK SPECTRAL SENSITIVITIES OF TRICHROMATIC RETINAS DERIVED FROM SEVERAL SOURCES*

                                                       Peak sensitivity (nm)
Study                                  Species         "B"    "G"    "R"
Brown and Wald, 1964                   Man             450    525    555
Marks, Dobelle, and MacNichol, 1964    Man, Monkey     445    535    570
Marks, 1965                            Goldfish        455    530    625
Witkovsky, 1968                        Carp            482    517    580, 660
Burkhardt, 1968                        Goldfish        450           620
Tomita et al. (1967)                   Carp            462    529    611
Stiles (Psychophysics) (1949, 1959)    Man             440    540    575
Wald (Psychophysics) (1964, 1966)      Man             430    540    575

* Note particularly that there are only two different groups of animals involved, primates and teleost fish, and that two of the studies are psychophysical ones. This table clearly shows that widely spaced visual primaries with peak sensitivities as indicated must now be accepted. Trichromatic retinas have not, however, yet been demonstrated in other animals.
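The degree of agreement in Table 9.2 can be checked with a small Python sketch that groups the entries into primates and teleost fish and reports the range of peaks for each receptor class. The numbers are those of the table; the grouping labels, the function name `peak_range`, and the decision to leave Witkovsky's two long wavelength peaks (580 and 660 nm) out of the "R" range are choices made here for illustration.

```python
# (study, animal group, "B" peak, "G" peak, "R" peak), in nm, from Table 9.2.
# None marks a value that is missing or, for Witkovsky's "R" entry
# (580 and 660 nm), deliberately left out of this simple summary.
TABLE_9_2 = [
    ("Brown and Wald, 1964", "primate", 450, 525, 555),
    ("Marks, Dobelle, and MacNichol, 1964", "primate", 445, 535, 570),
    ("Marks, 1965", "fish", 455, 530, 625),
    ("Witkovsky, 1968", "fish", 482, 517, None),
    ("Burkhardt, 1968", "fish", 450, None, 620),
    ("Tomita et al., 1967", "fish", 462, 529, 611),
    ("Stiles, 1949, 1959", "primate", 440, 540, 575),
    ("Wald, 1964, 1966", "primate", 430, 540, 575),
]

CLASS_COLUMN = {"B": 2, "G": 3, "R": 4}


def peak_range(group, receptor_class):
    """Return (lowest, highest) reported peak in nm for one animal group
    and one receptor class, ignoring missing entries."""
    column = CLASS_COLUMN[receptor_class]
    values = [row[column] for row in TABLE_9_2
              if row[1] == group and row[column] is not None]
    return min(values), max(values)


for group in ("primate", "fish"):
    for receptor_class in ("B", "G", "R"):
        print(group, receptor_class, peak_range(group, receptor_class))
```

Summarized this way, the long wavelength peaks reported for fish lie above those reported for primates, while the short and medium wavelength ranges overlap.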
B. Color Coding Beyond the Photoreceptors
1. The Retinal Horizontal and Bipolar Layers. As thoroughly compelling as the findings are that the photoreceptors are organized into a broadly tuned trichromatic system, there is no a priori reason to believe either that the coding schema remains the same or, for that matter, that it changes drastically as the signal ascends to higher portions of the visual pathway. The basic premise of coding theory is that the representation scheme of patterns of stimulus information may vary at different levels. As if nature wanted to emphasize this point, we shall discover when we consider the data for those cells that synapse with the receptors in the outer plexiform layer (the bipolars and horizontal cells) that they encode color information in a completely different way than do the receptors. The beginning of the story is actually based upon a misinterpretation of some data that had originally been presented by Svaetichin (1956) as representative of cone electropotentials. Later, MacNichol and Svaetichin (1958) and Svaetichin and MacNichol (1958) modified this conclusion and stated that the potentials they recorded in their experiments actually originated in the inner nuclear layer. The general procedure in these experiments was to insert a very fine glass microelectrode into the retina of a fish. The potentials recorded, like those recorded at the level of the photoreceptor, were purely graded signals
FIGURE 9.22 One type of graded potential obtained from the bipolar layer. In this example, the cell produces a hyperpolarization when stimulated with almost any wavelength light. This response, like that of the cone, is not a spike response, but only a series of brief graded potentials produced by equal energy stimulating pulses (from Tomita, 1965). [Plot: 10 mV calibration; horizontal axis, wavelength (nm), 400 to 700.]
of varying amplitude with no evidence of spiking. The response functions recorded in their experiments, as the wavelength of the incident light was varied, were of two kinds. First, there were cells that had a very broad spectral sensitivity and responded solely by producing a graded hyperpolarization of varying amplitude. Figure 9.22 shows the response pattern of one of these cells. However, another major class of cells, which had a completely different pattern of response, was also recorded. With these cells, as the spectrum was scanned, the response of the slow potentials changed from a depolarizing to a hyperpolarizing response. Figure 9.23 shows the form of the response function produced by a cell that hyperpolarized in the blue region of the spectrum, but depolarized when the stimulating light was in the red end of the spectrum. Svaetichin also originally reported the presence of cells that depolarized when the stimulating light was in the blue end of the spectrum, but hyperpolarized when the light was in the longer wavelengths, but in the later papers these findings could not be replicated. MacNichol and Svaetichin (1958) associate the monophasic hyperpolarizing response with certain giant horizontal cells, while the opponent responses were assumed to be the product of bipolar cell activity. However, the picture may be even more complicated than as described by Svaetichin and his co-workers. Tomita (1965) has found that in addition
FIGURE 9.23 An opponent type of bipolar layer response. This response is also a graded potential and not a spike response. The cell hyperpolarizes in the blue-green end of the spectrum and depolarizes in the yellow-red end (from Tomita, 1965). [Plot: 10 mV calibration; horizontal axis, wavelength (nm), 400 to 700.]
to the broadly tuned monophasic hyperpolarizing cells, and the cells that had adjacent spectral regions of hyperpolarization and depolarization, some cells (probably bipolars) in the inner nuclear layer responded with a triphasic response, which was hyperpolarizing at both ends of the spectrum, but depolarized in the middle range of spectral stimuli. The response of such a cell is shown in Figure 9.24. The important general point that is made by the data is that the coding schema for colors has already begun to change at the second stage of neural processing. Considering the possible complexity of the interconnections between the photoreceptors and the inner nuclear layer, this transformation can be understood, but it is still a surprising turn of events, for in one synaptic leap we are beginning to find elements of an opponent color system! Perhaps some of Hering's intuitions concerning color coding were not too farfetched, regardless of the fact that his theory of the photoreceptive process is so clearly wrong.
2. Amacrine Cells and the Ganglion Cells of the Optic Nerve. Third level neurons in the retina, the amacrine and ganglion cells, introduce a new factor into the coding scheme. Werblin and Dowling (1969) have shown in the mudpuppy, at least, that, unlike the receptors, these cells are able to propagate regenerative spike action potentials. Thus, the symbols available in the coding language are now different. Although not very much is known of color coding in the amacrine cells, since the axons of the ganglion cells make up the optic nerve they are relatively accessible. Surprisingly, when one considers the relative ease with which ganglion cell potentials can be recorded (compared to bipolar responses, for example), there is a relatively contradictory set of data concerning color coding at this level. The first studies of their neural processing were carried out by Granit, who has summarized the main ideas of his work in an important book (Granit, 1955). Unfortunately, while the theoretical influence of Granit's book is still continuing, the details of his experimental work on ganglion cell color coding do not seem to have had the same persistence. Let us consider one of Granit's papers in detail to understand the sort of findings he obtained and the techniques he used. Granit (1945) inserted glass insulated metal microelectrodes through the cornea into the retina of a cat. The signals that were recorded in this manner were trains of spike activity. The relative spectral sensitivity was measured by scanning with an equal energy spectrum after selective adaptation with colored lights. The number of responses that occurred in a fixed period of time was used as the response measure, but the counts were normalized so that the strongest response was indicated as 100 percent. In this 1945 paper, Granit reported finding two different general classes of receptors.
One group displayed relatively narrow (about 50 nm wide at the half bandwidth level with moderate stimulus intensities) spectral sensitivity curves, with center wavelengths at about 460, 540, and 600 nm. Figure 9.25 shows a set typical of this first class. In line with his previous work, he referred to those cells as the chromatic modulators and associated them specifically with the color receptors of the retina, although it seems almost certain now that a normal cat is really a dichromat and has only two kinds of cones. The second class of ganglion cell fiber observed in the cat's eye was characterized by a very broad spectral range, almost 125 nm wide at the half bandwidth level in some cases. Two types of these broadly tuned cells were observed. One had a peak sensitivity near that of the scotopic luminosity curve, and it was, therefore, referred to as a scotopic dominator, while the other had a peak near that of the photopic luminosity curve and was referred to as the photopic dominator. In all, the five different types of cells, three "modulators" and two "dominators," provided the basis for Granit's "modulator-dominator" theory of neural coding. It must be clearly noted that this theory assumes a trichromatic basis for color coding at the level of the ganglion cells. Granit is quite specific about this point (see pages 134-135 of Granit, 1955). Thus, modulator-dominator theory is just another version of the now-familiar Young-Helmholtz theory, but expressed in neurophysiological terms at the ganglion cell level.

FIGURE 9.24 Another kind of bipolar layer graded response. In this case, the cell is not simply opponent, but is actually triphasic in response form. This particular cell hyperpolarizes in the short wavelengths, depolarizes in a middle band of stimuli, and then hyperpolarizes once again for even longer stimulus wavelengths (from Tomita, 1965). [Plot: graded potential (10 mV scale bar) versus wavelength, 400 to 700 nm.]

However, the results of another important study of ganglion cell responses carried out by Wagner, MacNichol, and Wolbarsht (1960) on goldfish retinas led these investigators to conclude that the neural coding mechanisms in the ganglion cells were opponent rather than trichromatic in nature. In this case, however, the term "opponent" has a different meaning. In the more peripheral levels of the retina, opponent mechanisms were reflected as decreases or increases in the level of polarization of a cell membrane and were thus graded potentials. In the ganglion cell, we are now talking about rates of spike firing, and the opponent mechanism is reflected by a decrease or increase away from the spontaneous spiking frequency.
The amplitude of the specific neural response involved, the spike, of course remains constant. Wagner and his colleagues used microelectrodes made of a platinum and iridium alloy to record ganglion cell potentials from the excised retina of goldfish. Their data indicate a much more complicated pattern of action than the one suggested by Granit. For example, in addition to the differential opponent spike rate response, the color of the incident light also
FIGURE 9.25 Three spectral response curves for ganglion cells of the class that Granit referred to as modulators. This sort of finding was the basis of his assertion that there was a trichromatic system at the ganglion cell level of the retina. This assertion is not now believed to be correct (from Granit, 1945). [Plot: response as a percentage of the maximum, 0 to 100, versus wavelength, 400 to 700 nm.]
strongly affected the temporal "on" and "off" responses of the cells. Some cells were stimulated to respond with a burst of activity at the beginning of the stimulus when the stimulus was of a short wavelength, while at longer wavelengths the response produced was a purely "off" response. Other cells behaved in just the opposite fashion. The spontaneous activity of some cells was inhibited by a stimulus. Some cells fired in a sustained manner rather than in an "on-off" fashion. To further complicate the picture, some cells changed their pattern of response as the intensity of the stimulating light varied. Figure 9.26 shows the pattern of response of a typical opponent cell that produced "on" responses for wavelengths about 500 nm, "off" responses for wavelengths greater than 600 nm, and combined "on-off" responses in the middle regions of the wavelength spectrum. The two separate "on" and "off" curves were interpreted by Wagner and his colleagues as being representative of opponent color mechanisms, but in this case the main coding mechanism appeared to be the relative amount of "on" and "off" activity rather than the opposite polarity of response observed in the bipolar cells. Motokawa, Yamashita, and Ogawa (1960), in a study aimed at the general problem of receptive fields of the carp's ganglion cell, report data that essentially support the Wagner, MacNichol, and Wolbarsht findings.
FIGURE 9.26 Another form of ganglion cell spectral sensitivity, in which the "on" and "off" spike responses of the same cell have quite distinct spectral sensitivities. This is another unusual form of opponent process. The vertical coordinate indicates the intensity of the stimulus required to produce a criterion response (from Wagner, MacNichol, and Wolbarsht, 1960). [Plot: curves labeled On, Off, and Inhibition versus wavelength, 500 to 700 nm.]
Thus, the data that have been obtained by Granit (see p. 503), on the one hand, and by Wagner, and by Motokawa and their respective colleagues, on the other, are quite contradictory. Whether this is a species difference (which is unlikely considering the variety of species Granit used) or is due to some other procedural differences cannot be determined at this time, and future experiments will have to definitively unravel the apparent contradiction. Considering the coding mechanisms in prior and successive levels, it appears that Granit's work is misleading in some way, but of this we cannot be certain, for in addition to the temporal complexities we have just mentioned, some new work suggests there are also some chromatic-spatial interactions involved in the receptive field organization of these cells that further complicate the subject of ganglion cell color coding. Daw (1968) and Gouras (1968) have both been interested in the receptive field structure of the color-sensitive ganglion cells of the goldfish and monkey, respectively. Both experimenters find that each animal displays a center-surround antagonistic receptive field arrangement similar to that we have already discussed for monochromatic stimuli in Chapter 9. But in this case not only did the temporal and spatial dimensions interact, but the chromatic dimensions of the stimulus also became involved in a very intricate way. Perhaps the complexity of the interaction can best be emphasized by simply quoting one of Daw's summary statements, which holds, in general if not in detail, also for Gouras' findings in the monkey.

Most cells (49%, Type 0) also gave a peripheral response with an "on" response to green light, and an "off" response to red light in the periphery, as well as an "on" response to red light and an "off" response to green light in the centre (or vice versa). (DAW, 1968, p. 567.)
This is an extreme example of a case in which one neural code, the spike frequency pattern, is overlappingly sensitive to many aspects (space, time, and quality in this case) of the stimulus and probably contributes to the definition of equally many dimensions of the perceived experience. The main point of these findings from the ganglion cell is to emphasize the variety of functions contributing to the response of this third-order cell. In addition to the ones we have already measured, the overall spatial and temporal pattern of the stimulus is certainly also involved. For example, the spatial distribution of the stimulus can also produce differential effects. One reason that Granit's findings may have diverged so much from these other studies is that he often used diffuse light stimuli. Many of these effects occur only with more localized stimuli. On the other hand, Daw also reports that his effects are maximized when the spots are not very tiny, but of some intermediate size. Another point made by Gouras is also important to note. One of the two types of ganglion cells observed by him seemed to be sensitive to the output of two types of cones in a way that is not too dissimilar to the opponent mechanisms we have so far discussed. The other type of ganglion cell was, however, more like a simple trichromatic mechanism, although it, too, could be both inhibited and excited, depending upon the portion of the receptive field in which a particular color stimulus fell. Thus, even at the ganglion level, there appears to be some redundancy in the coding as well as the overlapping we have just described. Clearly, we have here a most complicated example of overlapping and redundant coding, of which the whole story has not as yet been told.

3. The Lateral Geniculate Body. The next stage of neural processing for
which data are available is the thalamus. The ganglion cell axons, which make up the optic nerve and optic tract, ascend without any further synaptic interaction until they arrive at the thalamic lateral geniculate bodies, and, therefore, little integration or recoding of any consequence can occur between the retina and this central level. Most of our knowledge of chromatic coding in the lateral geniculate body is based on the work of DeValois and his colleagues (DeValois, Smith, Kitai, and Karoly, 1958; DeValois, 1960; DeValois, Jacobs, and Jones, 1963; DeValois, 1965; DeValois, Abramov, and Jacobs, 1966; DeValois and Jacobs, 1968). DeValois, Abramov, and Jacobs' paper (1966) is the best summary source, and it is upon this paper that we shall concentrate our attention. The general technique used by DeValois and his colleagues was to drive KCl-filled glass microelectrodes into the various layers of the monkey's lateral geniculate body and thus more or less randomly sample the pattern of activity of single cells during periods of stimulation with spectral colors. Because of the relative size of the cells and the electrode and of the response magnitude, they felt that all of their recordings were probably extracellular. The potentials recorded in their experiments were not graded potentials, but bursts of spike activity similar to those recorded in the ganglion cell axons of the optic nerve. DeValois had earlier shown that the activity induced in the various
layers of a cat's lateral geniculate differed with regard to their time course rather than representing the chromaticity of the stimulus. [A separate representation of red, green, and blue responses from each eye in each of the six layers had been hypothesized by Le Gros Clark (1949) to explain the multilayered anatomical organization of this thalamic center. But this idea was definitively dispatched by these new findings.] Some layers contained cells that predominantly produced "off" responses, some "on" responses, and some both "on" and "off" responses, but all color information was represented in all six layers by one or another of these coding mechanisms. All of the cells that were encountered displayed moderate amounts of spontaneous activity, usually varying between 5 and 10 spikes/sec, and as we shall see, this spontaneous activity was absolutely critical in characterizing the response pattern of each cell. After a cell was located, brief flash stimulation with 12 different wavelengths of light was used to determine the relative response of the cell as a function of the chromaticity of the stimulus. Over the course of the experiment, DeValois and his co-workers were able to study 147 cells that displayed some sort of chromatic response function. These cells divided into two groups. One group produced a response function which was such that in one part of the spectrum the steady firing was increased by the stimulus, while in another part of the spectrum the activity was decreased. These cells were termed spectrally opponent cells. One example of such a cell's response is given in Figure 9.27 as a function of stimulus wavelength. On the other hand, a second group of cells, which either uniformly increased or uniformly decreased its activity as the stimulus flashes scanned across the spectrum, was also found. These cells were termed nonopponent excitors or nonopponent inhibitors, respectively.
The opponent cells were further classifiable into four different groups, depending upon the spectral locus of the excitatory and inhibitory regions. Two long wavelength excitatory types exhibited peaks in the red and yellow regions, while two short wavelength excitators exhibited peak increases in their activity either in a green or blue region. Typical spectral sensitivity curves for the six different types of cell are shown in Figure 9.28(a) through (f). DeValois, Abramov, and Jacobs conclude by suggesting that the spectrally opponent cells are the ones primarily responsible for the representation of color, whereas the overall brightness information is represented by both the inhibitory and excitatory nonopponent cells. Once again, at this level it is clear that simple notions of trichromaticity do not well describe the behavior of the sampled cells. The data obtained in these experiments emphasize a number of other important facts. First, whatever is the spectral sensitivity and the nature of the coding of the individual neurons that represent chromaticity throughout the ascending visual pathways, there is virtually no evidence of any very narrow or narrowly spaced spectral sensitivity curves. Rather, it seems always to be the case that relatively broad and broadly spaced response curves are found. DeValois' work also suggests one other important point, namely, that the relative amount of activity among the three or four systems is probably more important than the absolute amount of neural activity in a single system in the representation of chromatic information.
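The sorting rule described above (firing compared with the spontaneous rate across the spectrum) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the tolerance band around the spontaneous rate, and the sample firing rates are hypothetical and are not taken from DeValois, Abramov, and Jacobs.

```python
from typing import Sequence


def classify_cell(rates: Sequence[float], spontaneous: float,
                  tolerance: float = 1.0) -> str:
    """Label a cell from its firing rates (spikes/sec) at each test wavelength.

    `tolerance` is an assumed dead band around the spontaneous rate; the
    source does not state how large a change counted as a response.
    """
    excited = any(r > spontaneous + tolerance for r in rates)
    inhibited = any(r < spontaneous - tolerance for r in rates)
    if excited and inhibited:
        return "spectrally opponent"
    if excited:
        return "nonopponent excitor"
    if inhibited:
        return "nonopponent inhibitor"
    return "unresponsive"


def excitatory_end(wavelengths: Sequence[float], rates: Sequence[float],
                   spontaneous: float, tolerance: float = 1.0) -> str:
    """For an opponent cell, report which end of the spectrum excites it."""
    up = [w for w, r in zip(wavelengths, rates) if r > spontaneous + tolerance]
    down = [w for w, r in zip(wavelengths, rates) if r < spontaneous - tolerance]
    if not up or not down:
        raise ValueError("cell is not spectrally opponent")
    mean_up = sum(up) / len(up)
    mean_down = sum(down) / len(down)
    return "long wavelength" if mean_up > mean_down else "short wavelength"


# Hypothetical data: 12 test wavelengths (nm) and one cell's firing rates.
WAVELENGTHS = [420, 440, 460, 480, 500, 530, 560, 590, 620, 650, 670, 700]
OPPONENT_RATES = [2, 2, 3, 3, 4, 6, 8, 14, 18, 20, 16, 12]

print(classify_cell(OPPONENT_RATES, spontaneous=8.0))
print(excitatory_end(WAVELENGTHS, OPPONENT_RATES, spontaneous=8.0))
```

Note that the classification is impossible without a nonzero spontaneous rate, since a decrease in firing can only be seen against ongoing activity; this is why the spontaneous activity is described as critical in characterizing each cell.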
FIGURE 9.27 A sample of the type of opponent response observed in a lateral geniculate neuron as a function of the frequency of the stimulating light. In the short wavelength end of the spectrum, this cell produced a substantial "on" response. In the long wavelength end, it produced a substantial "off" response. This is another form of opponent coding (from DeValois, 1965). [Plot: spike records over time for stimulus wavelengths from 420 to 670 nm, with stimulus On and Off marked.]
A significant caveat, worth mentioning concerning this and related work that deals with the response of cells with varying levels of spontaneous activity, is that there always remains the possibility of a bias concerning the sample of cells from which the experimenter records. We have already seen how the level of spontaneous activity is associated with the response type of a given cell. Since there would be some tendency to miss cells that do not have any appreciable spontaneous activity, this might be reflected in a somewhat distorted picture of what is going on at any particular neural level. The apparent predominance of the opponent mechanisms may be more a result of this sampling bias than the true biology of the situation. Wiesel and Hubel (1966) have also studied the responses of color-sensitive cells at the level of the lateral geniculate body of the thalamus. They, however, were concerned with another aspect of the problem. They noted that the antagonistic center-surround receptive field arrangement explored by Kuffler and others and the color mechanisms explored by DeValois and his colleagues both were examples of opponent mechanisms. In the one case, the opposing stimuli fell on different spatial locations, while in the other, the opponent stimuli lay in different portions of the visual spectrum. The question they asked concerned the behavior of single cells in this situation: Are individual geniculate cells capable of responding
to both kinds of opponent mechanisms or are the spatial and chromatic codes conveyed separately through separate systems? It turned out, as they pursued their experiments, that most cells differentially sensitive to color were also differentially sensitive to some aspect of the spatial pattern and vice versa. Hubel and Wiesel observed four different types of cells in the various layers of the lateral geniculate body. Type I included antagonistic center-surround arrangements with the center being maximally sensitive to one color and the surround being maximally sensitive to another. Type II cells showed no center-surround arrangement, but did show opponent color mechanisms similar to those described by DeValois over a limited spatially homogeneous receptive field. Type III cells had the same spectral sensitivity in both the center and surround of an antagonistically organized field, while Type IV had a central excitatory field surrounded by a large inhibitory field in which the peak spectral sensitivity was always shifted more or less to the longer wavelengths with regard to that most vigorously exciting the central field. Thus, Hubel and Wiesel conclude that most cells were affected more or less both by the spatial and chromatic dimensions of the stimulus. This is clearly a situation in which the neural coding mechanism is overlapping, just as intensity and color interact in other situations. It is an effective warning to us to be sure that, whatever our coding schema turn out to be for a given dimension, we always make it quite clear that the statement "all other things being equal" is conspicuously appended either explicitly or implicitly.
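The overlap of spatial and chromatic coding in a Type I cell can be made concrete with a toy model. The sketch below is an illustration under stated assumptions only: the Gaussian sensitivity curves, the peak wavelengths, the 60 nm width, and the equal weighting of center and surround are all hypothetical choices and are not values reported by Wiesel and Hubel.

```python
import math


def sensitivity(wavelength_nm: float, peak_nm: float,
                width_nm: float = 60.0) -> float:
    """Assumed Gaussian spectral sensitivity, equal to 1.0 at the peak."""
    return math.exp(-0.5 * ((wavelength_nm - peak_nm) / width_nm) ** 2)


def type_i_response(wavelength_nm: float, covers_center: bool,
                    covers_surround: bool,
                    center_peak_nm: float = 620.0,
                    surround_peak_nm: float = 530.0) -> float:
    """Net drive of a hypothetical Type I cell.

    The center is excitatory and tuned to one wavelength band; the surround
    is inhibitory and tuned to another. Positive values stand for firing
    above the spontaneous rate, negative values for firing below it.
    """
    drive = 0.0
    if covers_center:
        drive += sensitivity(wavelength_nm, center_peak_nm)
    if covers_surround:
        drive -= sensitivity(wavelength_nm, surround_peak_nm)
    return drive


# The same wavelength gives different responses for different spatial
# patterns, and the same spatial pattern gives different responses for
# different wavelengths.
for label, wl, center, surround in [
    ("small long-wavelength spot", 620.0, True, False),
    ("medium-wavelength annulus", 530.0, False, True),
    ("large long-wavelength field", 620.0, True, True),
    ("large medium-wavelength field", 530.0, True, True),
]:
    print(f"{label}: {type_i_response(wl, center, surround):+.3f}")
```

In this toy cell neither the wavelength nor the spatial pattern alone predicts the response, which is the sense in which the two codes overlap.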
4. Single Cell Responses in the Visual Cortex. In spite of the enormous
complexity of the cellular response patterns of the visual cortex to spatio-temporal patterns (see, for example, in Chapter 8, the spatio-temporal complexity observed by Hubel and Wiesel and by Spinelli), it is still possible to find evidence of differential spectral sensitivity among single cortical cells. Motokawa, Taira, and Okuda (1962), for example, were able to insert tungsten microelectrodes through holes in the skull of monkeys directly into the visual cortex, so that extracellular responses could be recorded from single cells. Stimulus lights of 15 different wavelengths were attenuated with neutral density filters until all had equal energy. Each was then flashed in succession for 1/2 sec with 8-sec interflash intervals. In this manner, the spectral response function of these cells could be determined. The cells that were surveyed typically showed the "on" and "off" behavior patterns found at so many other levels of the visual pathway. As should be expected, there seemed to be very many different factors influencing the behavior of these cortical cells. The general arousal state of the animal influenced the cellular response substantially, and it was necessary to continuously monitor the EEG to be sure that the preexperimental surgical anesthesia effects were completely overcome. During the experiment, the monkeys were immobilized with Flaxedil, a curarelike substance, and rigidly fixed in a holding apparatus.

FIGURE 9.28 The various types of spectral response curves observed in the lateral geniculate body. (a) A cell which was inhibited in the short and excited in the long wavelengths. The crossover point defines this as a +red-green opponent cell. (b) A cell which was inhibited in the short and excited in the long wavelengths. The crossover point defines this as a +yellow-blue opponent cell. (c) A cell which was excited in the short and inhibited in the long wavelengths. The crossover point defines this as a +green-red opponent cell. (d) A cell which was excited in the short and inhibited in the long wavelengths. The crossover point defines this as a +blue-yellow opponent cell. (e) A cell which is inhibited over the entire visual spectrum. This is defined as a nonopponent inhibitory cell. (f) A cell which is excited over the entire visual spectrum. This is defined as a nonopponent excitatory cell. The horizontal dotted lines in each case define the spontaneous level of activity, and the small numbers indicate the log attenuation of the stimulus with respect to the maximum stimulation level. Each curve in these figures is, thus, for a different stimulus intensity. The color names are defined by what human observers report. We do not know what this animal sees (all figures are from DeValois, Abramov, and Jacobs, 1966). [Plots (a) through (f): spike response versus wavelength, 400 to 700 nm, with the spontaneous rate marked in each panel.]

The results were quite surprising. In our discussion of the coding of color in the ascending visual pathway, with the exception of Granit's somewhat atypical findings and Gouras' second type of ganglion cell, there was little evidence of simple monophasic trichromatic representation from the very earliest synaptic contact in the retina. Opponent type operations of one sort or another seemed to be the main coding mechanism all the way up through the lateral geniculate body. However, although Motokawa and his colleagues found several different kinds of response patterns, they found some cells that were at least partially trichromatic in their coding scheme. Figure 9.29 is a plot of their data for a particular group of 22 cells for which they were able to successfully track the entire response spectrum using an "on" response as the criterion. The data have been pooled for both cortical cells and cells that Motokawa and his colleagues felt were probably extensions of the optic radiations, but the overall pattern is clear. The familiar picture of three broadly tuned receptor groups with peak sensitivities at 460 nm, 530 nm, and 620 nm¹ is found once again. Thus, information originally encoded in terms of trichromatic mechanisms and subsequently re-encoded in terms of opponent mechanisms has re-emerged at the cortical level once again in the language of a trichromatic code.

¹ This, of course, does not agree with the spectral sensitivity of the monkey's red cone, which, on limited evidence, appears to peak at 570 nm.

FIGURE 9.29 In the primate cerebrum, some evidence of trichromatic coding is seen once again. The figure shows the spectral sensitivities of some trichromatic cell types in the monkey's visual cortex (open and half-filled circles) and the optic radiation (filled circles). All responses in this case were from cells that gave strong "off" responses (from Motokawa, Taira, and Okuda, 1962). [Plot: response as a percentage of the maximum, 0 to 100, versus wavelength, 400 to 700 nm.]

Some of the cells, however, exhibited a sort of behavior which Motokawa, Taira, and Okuda classified as opponent type responses. These cortical cells seemed to have different behavior patterns for the "on" portion of their responses and for the "off" portion. The sensitivity of the "on" responses for some cells peaked in a very narrow range at the long wavelength end of the spectrum. Surprisingly, the "off" responses of these same cells displayed a peak of evoked activity at the short wavelength end of the spectrum. Both the "on" and "off" portions of these opponent processes, therefore, were excitatory in nature. In this case, however, they were distinguished in terms of whether the short or long wavelength ends of the spectrum maximally excited the "on" or "off" portions of the response. Figure 9.30 shows this type of response pattern for a typical cell with a peak "on" activity centered at about 620 nm and a peak "off" activity centered at about 480 nm. In addition to these cells that showed a single peak of sensitivity for either the "on" response or the "off" response, some cells that displayed double peaks of sensitivity in both the "on" and "off" responses were also observed by Motokawa and his colleagues. Figure 9.31 displays this kind of response pattern.

Andersen, Buchmann, and Lennox-Buchthal (1962) have also explored the behavior of single cells in the monkey cortex, using a flash technique in which the visual stimuli were only 8 msec long. In this case, there would be no way to separately compare the "on" and "off" effects. In general, their findings can be characterized as showing no evidence of either trichromatic or opponent codes of any kind. Each cell seems to respond to a characteristic band of spectral wavelengths, but there was no evidence of any clustering into three or more groups. Perhaps more interesting is their finding that the specific sensitivities of the individual cells were very labile: a cell with a narrow response band might change suddenly to become responsive to white light and all monochromatic frequencies. This shift, which was not reversible, seemed to be associated with the type of anesthesia used in this experiment, a clear warning of the possible contamination of what otherwise might seem to be a relatively straightforward experiment by ostensibly insignificant technical details.

Finally, we may call attention to Hubel and Wiesel's (1968) work on the monkey striate cortex. They found that cortical cells with any kind of color sensitivity were relatively rare, and even the few from which they could record seemed to be associated with all possible variants of the different kinds of temporal-spatial sensitivities (see Table 8.1) that were present in the monkey's brain. (Higher-order complex cells were not observed in this animal.) As they put it,
... a satisfactory study [of cortical color coding] will probably mean recording from thousands, rather than hundreds of cortical cells. (HUBEL and WIESEL, 1968, p. 225.)
FIGURE 9.30 An opponent type process in a cortical unit in the monkey. This cell produces excitatory "on" responses and "off" responses, but in different regions of the spectrum. But as we saw in Figure 9.29, the "on" and "off" responses themselves might preserve some trichromatic codes (from Motokawa, Taira, and Okuda, 1962). [Plot: number of spikes in the "on" and "off" responses versus wavelength, 400 to 700 nm.]
It is clear that this sort of data is only chipping at the peak of a horrendously complex system of cortical color coding and that much further work will have to be done to extend and rationalize these few observations. But what is clear is that the possible kinds of mechanisms for the encoding of color at the cortical level are several and diverse as well as being intermingled with mechanisms for the representation of time, space, and intensity.
FIGURE 9.31 A curious cortical cell response, which was triphasic in its opponent mechanism. Both the "on" and "off" responses were excitatory at both ends of the spectrum, but to a lesser degree in the middle (from Motokawa, Taira, and Okuda, 1962). [Plot: number of spikes in the "on" and "off" responses versus wavelength, 400 to 700 nm.]

V. A CONTEMPORARY MODEL

As one examines the history of color theories, it seems that the only real biological controversy that existed between the trichromatic theory and the opponent color theory concerned the nature of the photoreceptive process itself. That controversy is clearly resolved at this point in the history of the problem. The receptor processes are trichromatic and it is entirely probable that three different chemicals reside in each of three different kinds of cone. Though we have no idea what the nature of the anatomical or biochemical difference among the three cones is that allows each to acquire or manufacture a different type of photochemical with a characteristic absorption curve, it is quite clear that this is exactly what they do. Furthermore, the difference in the absorption spectra of the three pigments is specifically attributable to the difference among three kinds of opsin, the large macromolecular moiety; a single form of retinal is presumably, according to Wald's theory, common to all three. The data concerning retinal photopigment absorption also resolve another one of the major issues that distinguishes the various trichromatic mathematico-descriptive theories from one another. The nature of the triad of trichromatic fundamentals is now established beyond any reasonable controversy. The three photochemicals have absorption spectra that are about 100 to 150 nm wide and peak at 450 nm, 530 nm, and either 575 nm for primates or 620 nm for the fish, respectively. Any further attempt to use primaries with significantly different characteristics is inappropriate and can only give rise to what is essentially a mathematical fiction based on the psychophysical fact that any triad can be used for color matching, together with neglect of the biological data.
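The consequence of three broad, overlapping absorption spectra can be illustrated with a small numerical sketch. The peak wavelengths below are the primate values quoted above; everything else is an assumption made for illustration: the Gaussian shape of the curves, the single 125 nm half-bandwidth applied to all three pigments, and the function names.

```python
import math

# Peak wavelengths (nm) quoted in the text for primates.
PEAKS_NM = {"short": 450.0, "medium": 530.0, "long": 575.0}

# Assumed full width at half maximum, within the 100 to 150 nm range quoted.
FWHM_NM = 125.0
SIGMA_NM = FWHM_NM / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def absorption(wavelength_nm: float, peak_nm: float) -> float:
    """Assumed Gaussian absorption curve, normalized to 1.0 at its peak."""
    return math.exp(-0.5 * ((wavelength_nm - peak_nm) / SIGMA_NM) ** 2)


def quantum_catch(wavelength_nm: float, intensity: float = 1.0) -> dict:
    """Absorption by each pigment for a monochromatic light."""
    return {name: intensity * absorption(wavelength_nm, peak)
            for name, peak in PEAKS_NM.items()}


def relative_catch(wavelength_nm: float, intensity: float = 1.0) -> dict:
    """Each pigment's share of the total absorption."""
    catch = quantum_catch(wavelength_nm, intensity)
    total = sum(catch.values())
    return {name: value / total for name, value in catch.items()}


for wavelength in (450.0, 500.0, 550.0, 600.0):
    shares = relative_catch(wavelength)
    print(wavelength, {name: round(share, 3) for name, share in shares.items()})
```

Because the curves overlap so broadly, almost every wavelength is absorbed by all three pigments to some degree, and in this sketch it is the pattern of relative absorption, which does not change when the intensity is scaled, that distinguishes one wavelength from another.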
The success of a trichromatic theory in describing the nature of the absorption pigments and in putting to rest that specific part of the Hering theory that postulated opponent photochemical properties does not, however, mean that all notions of opponent mechanisms in the visual system must also be discarded. It is quite clear that trichromatic coding mechanisms are only infrequently found once one is past the photoreceptor layer. From the bipolar cell layer onward, the neurophysiological data apparently do support the idea that a single cell can operate to encode color information in a manner that comes closer to what we call opponent mechanisms than it does to the trichromatic ideas.² However, the re-emergence of trichromatic mechanisms, possibly in the optic nerve and in the cortex, should also be a warning that information, whatever the language it is encoded in at any particular level, is essentially preserved as one ascends the visual pathway. Trichromatic recoding by subsequent mechanisms is not only possible, but has been observed in at least a few instances. The major premise, which we are compelled to accept once again, is the notion of equivalent but different encodings of the same information at different levels in the ascending pathways. The rapprochement of the trichromatic theory and opponent color theory is based on the fact that mechanisms associated with one or the other of the two views can exist sequentially at different levels of the nervous system without any inherent contradiction. Theories of color based on different neural coding schemes at different levels are generally called state or zone theories and have been present in the literature, though relatively unnoticed, for the past 70 years. Most of these zone theories, unlike the classical theories of Young-Helmholtz or of Hering, are patently neurophysiological in their conceptualization.
In this regard, they go quite far beyond the matter of the absorption spectra of the retinal receptors. Over the years, the names of Von Kries, Schrodinger, Adams, Müller and Judd have been associated with the zone, state, or in the nomenclature of this book, level theory. Perhaps the most fully worked out zone theory is actually that of Hurvich and Jameson, whose championing of the opponent aspects is somewhat better known. Although their aim was to provide support for an opponent color theory, their papers clearly affirm the notion that the code may differ at each level. Their work is of special interest because they, more specifically than any of the others, speculate about the nature of the transformations that might account for the re-encoding of the output from a trichromatic retina to a central opponent mechanism. The Hurvich and Jameson theory (1957) assumes a particular trichromatic set of photoreceptors in the retina and subsequent opponent color mechanisms. However, it is primarily a theory based on psychophysical
² The words "color," "red," "green," and "blue" are used here and earlier in this chapter in a way that may distress or confuse some readers. They are meant to be used by the author as shorthand terms for long, medium, and short wavelength photic stimuli as defined by the subjective experiences that would be produced by these wavelengths if observed by a normal human subject. We are not suggesting that there is any color sensation in any of the neurophysiological preparations being discussed. A similar statement must be made concerning the use of such words as sweet, salty, fragrant and high or low pitch in later chapters.
findings and, as such, is by itself incapable of confirming anything about the details of the central opponent processes. As we have seen, there are several different ways in which opponent processes may be coded. Furthermore, the major specific photochemical assumption of their original theory, that of narrowly spaced pigment absorption curves, has been rejected and replaced with a more modern set (Jameson and Hurvich, 1968). Much to their credit, Hurvich and Jameson (1957), anticipating future developments in neurophysiology, specifically noted that some plausible method for converting from a trichromatic coding in the retina to a central opponent mechanism must be hypothesized if their model was to have any possible substance. In 1957 they were faced, in addition, with the equally plausible alternatives of assuming that the three photoreceptor substances were each isolated in separate cones or that combinations of any two photoabsorptive pigments could exist in a single cone. In the light of the evidence which has been forthcoming since then, it is feasible to ignore the latter alternative and consider only the case in which a single pigment is present in each cone. Figure 9.32 (adapted from Hurvich and Jameson, 1957) shows a hypothetical set of connections, which might account for the conversion of trichromatically coded retinal information to a central opponent color system in one synaptic jump. The three circles represent the three types of cones. In the original drawing, Hurvich and Jameson labeled these circles α, β, γ because the primaries in their hypothetical system were all assumed to be in the yellow region of the spectrum. However, based on the directly measured absorption spectra of the three pigments, it is not inappropriate for us to return to the older nomenclature (R, G, and B), indicating long wavelength, medium wavelength, and short wavelength sensitive substances, respectively.
The three circles representing the photoreceptors are connected by single stage communication links to the three opponent cells represented by the attached pairs of rectangles. Each of the three types of retinal receptor has three outputs, two of which may be excitatory or inhibitory as required, while the third is solely excitatory. Two of the outputs go to opponent type "neural units." These units are cells or cell combinations which, while not specifically identified by Hurvich and Jameson, can now be assumed to be either second-order neurons in the inner nuclear layer or combinations of these second-order retinal units. The important fact is that whatever these neural units are, they must, in some way, respond differentially as a function of the relative balance between the inputs from the receptor cells. Hurvich and Jameson believe that the two opponent neural units indicated in Figure 9.32 are specifically involved in the chromatic encoding, while the third cell or cell system is associated with the overall luminosity, a function of the cumulative output of the three cells. This third mechanism was originally considered by Hering to be also an opponent system, but in the light of the data then current, Hurvich and Jameson considered it to be a nonopponent overall brightness system. A necessary premise for the Hurvich and Jameson re-encoding idea is that the effect of the output of any of the three photoreceptors may be either inhibitory or excitatory on the two hypothetical chromatic opponent
FIGURE 9.32 A schematic sketch of a plausible neural mechanism, which could convert a trichromatic (R, G, and B) photoreceptor coding scheme to a more central opponent mechanism (b-y, g-r, and bk-w) (adapted and modified from Hurvich and Jameson, 1957). [Figure labels: to central nervous system; opponent units b-y, g-r, and bk-w; dark adaptation; receptors R, G, and B.]
neural units. A red receptor, for example, as indicated in Figure 9.32, tends to excite both the blue-yellow and the green-red opponent units. The output of the green receptor, on the other hand, tends to excite the blue-yellow opponent unit, but inhibit the green-red opponent unit. The blue receptor, finally, has outputs that tend to excite the green-red unit, but inhibit the blue-yellow unit. All three units tend to contribute to the excitation of the black-white unit. (Incidentally, this schema is slightly different from the original Hurvich and Jameson model, but in what is believed to be a simplifying and clarifying manner.) How, then, are colors encoded by the opponent system? Table 9.3 illustrates at least one coding scheme, which might be used to represent differences in both hue and saturation. If a light from the red end of the spectrum is used as a stimulus, then the blue-yellow and the green-red opponent units both increase in the amplitude of their activity or in some other way selectively respond in a positive sense. This is indicated by a + in both columns in the table. If the red light is a very saturated red, then the
TABLE 9.3 A POSSIBLE CODING CHART FOR AN OPPONENT COLOR CODING SYSTEM (SEE TEXT FOR FULL DETAILS)
Columns: b/y, g/r, bk/wh
Rows: Unsaturated Red, Saturated Red, Unsaturated Green, Saturated Green, Unsaturated Blue, Saturated Blue
total amount of activity will be limited mainly to the output of the red receptor (++), and the activity of the black-white and the blue-yellow units will be relatively low (+). On the other hand, if the light is relatively unsaturated and contains a relatively large amount of other colors, this state can be reflected by large amounts of activity (++) in each of the chromatic opponent units and in the output of the black-white receptor. Thus, for this red light, hue and saturation are encoded by a relative amount of activity among the two chromatic opponent units and the black-white receptor. Similarly, an unsaturated green light would lead to increased activity in the black-white and blue-yellow systems (++), as well as a substantial decrease in the relative amount of activity (--) in the green-red system. A very pure green, however, would lead to only a slight increase (+) in the black-white and blue-yellow cell outputs, but still produce the same (--) decrement in the green-red system. In the same way, a stimulus light from the blue end of the spectrum would result in a substantial decrease (--) in the amount of activity in the blue-yellow opponent unit, but a greater or lesser amount of activity in the opponent green-red and nonopponent black-white units, depending upon whether or not the light is saturated. The enormous strength of this sort of coding scheme is that a zone theory allows us to consider a system that is at once both trichromatic and, at different levels of the ascending pathway, opponent in operation. Another important advantage of this hypothetical system is that it once again directs our thinking toward a system in which the relative amounts of activity in the several subunits are critical to the specification of hue rather than the absolute amount of activity in any system. Thus, the same red, with any given degree of saturation, can occur at many different levels of brightness.
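The sign pattern described in the text for Figure 9.32 can be written out as a small numerical sketch. The unit weights (+1 and -1) and the function names below are illustrative assumptions; the text specifies only which connections are excitatory and which are inhibitory, not their strengths. The sketch also illustrates the point that hue depends on the relative rather than the absolute activity: scaling all three receptor outputs by a common factor leaves the pattern unchanged once it is expressed relative to the black-white output.

```python
# Hypothetical sign-only wiring, read off the text's description of Figure 9.32.
# Each tuple gives the assumed weights applied to the (R, G, B) receptor outputs.
WIRING = {
    "b-y":  (+1, +1, -1),   # excited by R and G, inhibited by B
    "g-r":  (+1, -1, +1),   # excited by R and B, inhibited by G
    "bk-w": (+1, +1, +1),   # nonopponent: all three receptors excite
}

def opponent_response(r, g, b):
    """Map receptor outputs to the three second-stage units (assumed linear)."""
    receptors = (r, g, b)
    return {unit: sum(w * x for w, x in zip(weights, receptors))
            for unit, weights in WIRING.items()}

def relative_pattern(response):
    """Express each unit's activity relative to the black-white output."""
    total = response["bk-w"]
    return {unit: value / total for unit, value in response.items()}

# The same hypothetical reddish light at two brightness levels.
dim_red = opponent_response(1.0, 0.2, 0.1)
bright_red = opponent_response(10.0, 2.0, 1.0)
```

Under these assumptions the absolute outputs for the dim and bright lights differ tenfold, while `relative_pattern` returns the same ratios for both.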
Although there are some changes of hue with brightness level (the Bezold-Brucke effect), this idea is in general agreement with the overall stability of hue at various brightness levels. The discrepancies, which do exist, are probably due to the fact that the nonopponent black-white
system is probably considerably more complicated than the one postulated in this obviously oversimplified model. In Hurvich and Jameson's (1957) discussion, they were able to show that a wide variety of other visual phenomena could be explained in a relatively straightforward manner by their zone model. If we ignore the very obvious discrepancy of the spectral absorption curves of the triad of primaries, their model did a relatively good job in handling such diverse phenomena as wavelength discrimination, chromatic adaptation, and much of the color blindness data. With regard to this latter problem, the notion of different coding mechanisms at different levels of the nervous system is particularly valuable in helping to understand how some color blindness may be due to shifts or absences in photoreceptor absorption spectra, while others may be due to neural processes at more central levels. Hurvich and Jameson felt, for example, that protanomaly and deuteranomaly were probably due to shifts in the absorption spectra of the receptor photopigments toward shorter wavelengths and longer wavelengths, respectively. On the other hand, deficiencies in some more central green-red or yellow-blue opponent neural systems, they feel, are more closely linked to the dichromatic protanopia, deuteranopia, and tritanopia. In sum, then, color coding throughout the visual pathway seems to be characterized by several kinds of recoding at different levels. At the most peripheral level, a trichromatic system based on three different kinds of photopigment underlies the initial transduction. However, at the very first synapse, this trichromatic mechanism seems to be re-encoded into opponent type systems, which are maintained in one form or another with few exceptions all the way to the cortex.
It should be noted that the specific nature of the opponent processes, that is, the particular code employed, varies in ways that make the opponent processes differ as much among themselves as they do from the trichromatic mechanism. Opponent processes in the inner nuclear layer, at least in part, seem to be reflected in the opposite amplitudes and directions of polarization of graded potentials. In the lateral geniculate body, on the other hand, the opponent mechanism seems to be encoded by an increase or diminution in the rate of spontaneous spike action potential activity. Finally, there is some evidence in the cortex that the opponent mechanisms are encoded by a different peak spectral sensitivity for the "on" and "off" responses of single cells in addition to the suggestion that trichromatic patterns of response re-emerge at this level. The overall impression that one acquires from the neurophysiology of chromatic responses is that there are probably a large number of different neural codes for visual quality operating at different levels (and even occasionally at the same level) of the visual pathway. There is, in this sense, no single code for quality in the visual system, but a number of different manners of representing stimulus wavelength information, information that remains, regardless of the neural language, more or less constant, until it activates that state of the nervous system at which it becomes equivalent to the experience of color. Another theme that permeates all of the data we have discussed is the interplay of color-coded information and temporal and intensive information. The "on-off" system is obviously evolved to encode temporal discontinuities in the visual stimulus; yet in our discussion we often see instances in which the "on-off," "on," or "off" activity is also modulated by the spectral characteristics of the stimulus. Rates of firing of a given neuron are also affected by both the color and the intensity of the stimulus. In the ascending visual pathways, it is clear, therefore, that a single cell does not operate uniquely as a color encoder or as a time encoder or, for that matter, as a spatial or intensity encoder either. Any cell anywhere in the visual pathway that is sensitive to light would probably show some modulation of its activity when any of these stimulus dimensions are varied. The specific meaning of the different attributes of a visual stimulus, therefore, must in some way be functionally related to a balance and comparison of the relative amounts of activity in a large number of different places. This is a sort of mechanism that is quite different from the labeled-line hypothesis, which asserts that the activation of a single key neuron is uniquely associated with some single attribute of the stimulus. Thus, in some broader sense, the entire visual system, and not only the quality code, must operate as a sort of opponent system, and Table 9.3 might well be expandable into some much more elaborate comparison scheme to represent the encoding of the total dimensionality of the visual stimulus.
CHAPTER 10: THE NEURAL CODING OF SENSORY QUALITY-AUDITION
I. THE KEY PSYCHOPHYSICAL DATA
A. Frequency Analysis and Pitch Discrimination
If the eye can be best characterized by the data of color synthesis or mixture, the ear is best described as an analyzer. The fact basic to audition is that the auditory system is capable of separating to some degree a complex mixture of different sounds into its components. The ability, for example, to hear a particular instrument in a performing orchestra or a particular voice in the noisy hubbub of a cocktail party makes our analysis of the phenomena of auditory pitch perception fundamentally different from those of visual hue perception. Although this ability to analyze complex tones is present to a first approximation in human psychophysical processes, it should be understood that it is not as comprehensive a capability as many would have us believe. Ward (1970) has been particularly vigorous in his rejection of the generality of this "quarter-truth" (of frequency analysis) and has noted that vibratos are not frequency analyzed, nor are pulses, nor are many of the musical sounds encountered in our everyday life. Nevertheless, the idea of the general ability to analyze some signals is so ingrained in the scientific literature of audition and is a possible explanation of a sufficiently large number of situations to make it an important factor in any discussion of the basic theory of auditory coding. If one looks at the pressure pattern of a complex acoustic waveform, it is clear that the capability of the auditory system to analyze sounds might, in some way, be analogous to the Fourier analysis process we discussed in
Chapter 2. Indeed, shortly after Fourier presented his mathematical theory as a means of analyzing and specifying the characteristics of mechanical oscillations, Ohm postulated a set of auditory laws that was based on the idea of frequency analysis. Ohm's assertion, however, was not simply a statement that complex acoustic stimuli could be analyzed into a unique set of sinusoidal components, a notion already implicit in Fourier's work and the nature of the physical acoustic stimuli, but rather that the auditory system did exactly that analysis, and that the results were observable in the behavior of the organism. The ability, or perhaps better, the partial ability, to analyze an acoustic stimulus into separate components is closely intermingled with the issue of the specific psychophysical effect of different acoustic frequencies. To a first approximation, different frequencies of stimulation give rise to different experiences of pitch, the term analogous to hue in the visual sense and representing the quality dimension for this modality. However, just as the color of a light may vary in attributes other than simple hue (it may, for example, vary in saturation as well), the single dimension of pitch (such as that produced by a pure stimulus frequency) does not completely describe the qualitative dimensions of a more complex tone. All musical instruments, for example, produce not single pure tones, but, rather, a complex mixture of the fundamental note and the overtones produced by the mechanics of the instrument. The quality of this mixture of tones, a concept which is separate from the pitch of the individual components, is known as the timbre of the instrument. Other temporal properties of the tonal pattern beyond the frequency of the stimulating tone can also contribute to the quality of the sensation. Repeated oscillation of the temporal intensity pattern (amplitude modulation) varies a quality of the sensation known as tremolo.
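The kind of analysis that Ohm attributed to the ear, the decomposition of a complex tone into its sinusoidal components, can be illustrated with a minimal sketch. The fundamental of 220 Hz, the overtone amplitudes, the sampling rate, and the function name are all hypothetical values chosen for the illustration; they do not describe any particular instrument. The sketch measures the amplitude of the waveform at each test frequency by correlating it with a complex exponential at that frequency.

```python
import cmath
import math

SAMPLE_RATE = 8000   # samples per second (assumed)
N_SAMPLES = 800      # 0.1 s, so every frequency used below completes whole cycles

def component_amplitude(signal, freq_hz):
    """Amplitude of the sinusoidal component of `signal` at `freq_hz`."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * n / SAMPLE_RATE)
              for n, x in enumerate(signal))
    return 2 * abs(acc) / len(signal)

# Hypothetical complex tone: a fundamental plus two overtones (Hz -> amplitude).
PARTIALS = {220: 1.0, 440: 0.5, 660: 0.25}

tone = [sum(a * math.sin(2 * math.pi * f * n / SAMPLE_RATE)
            for f, a in PARTIALS.items())
        for n in range(N_SAMPLES)]

# Amplitudes recovered at the three partials and at one frequency (330 Hz)
# that is absent from the mixture.
recovered = {f: component_amplitude(tone, f) for f in (220, 330, 440, 660)}
```

In this idealized case the analysis returns the amplitude of each partial and essentially nothing at the absent frequency; the relative strengths of the recovered partials are what the text calls the timbre of the mixture.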
Oscillation of the frequency pattern (frequency modulation) produces an experience of vibrato. These temporal properties of the stimulus, however, are more comparable to those that produce flicker in vision or vibration in somatosensation than to color or touch and will not be further considered here. The major qualitative dimension in audition is pitch, and just as it is difficult to describe what it is that we mean by a red or a green, it is also difficult to describe what is meant by a high or low pitch. The best manner of definition is an operational one. High pitches are associated with high-frequency stimulation and low pitches with low-frequency stimulation. Thresholds and subjective magnitudes also vary with frequency, however, the greatest sensitivity occurring between 1000 and 3000 Hz. It should also be noted in passing that the pitch of a pure tone does change to some degree as the intensity of the signal varies. Generally, the alteration in pitch with intensity shifts is greater at either end of the auditory frequency spectrum than in the middle region around 1000 Hz, where the threshold is the lowest. The effect has, over the years, been studied by numerous investigators and is considered to be important in some theories of auditory encoding, but at best it is a second-order effect, the perturbing effects of which can best be considered after our theories of auditory encoding have been developed to the point that the first-order effects have
been satisfactorily accounted for. We shall, therefore, not deal with the issue further in the present discussion. The discrimination of frequency differences is a related and exceedingly important matter, which we have already introduced when we discussed spatial interactions. This was done because of the well-known fact that much in the way of frequency discrimination is accounted for by the preneural spatial mechanics of the fluid-filled cochlea. Frequency discrimination was assumed to be, at least in some regards, closely analogous to visual spatial interactions. As a reminder only at this point, we should also note that frequency discrimination as it is often measured, with a warbling technique in which two alternately presented tones are adjusted for equality of pitch, is very fine, of the order of 1 to 4 Hz within the range from 50 to 3000 Hz, as was shown in Figure 7.6. This is a remarkably fine sensitivity, corresponding to 1/10 of 1 percent of the frequency over much of this region. This ability to discriminate between nearby frequencies is one of the basic capabilities that must be explained by any theory of auditory encoding. In light of the broadly tuned nature of the peripheral auditory receptive mechanism, this has proved to be one of the most challenging points of contention among the competing theories. Just as in vision, the apparent disagreement between the fine psychophysical frequency discrimination and the broadly tuned mechanical and neural mechanisms in audition has often been resolved by attributing successive sharpening mechanisms to the various levels of neural processing. (See, for example, von Bekesy's 1967 book on sensory inhibition.) We have already detailed some of the objections to such an approach.
It should also be noted that there is no a priori reason from the point of view of coding theory why the psychophysical discriminations should necessarily be isomorphic (to the extent that each is equally finely tuned) to the responses at any level of the nervous system. An equally good argument might be made for the representation of psychophysical fine tuning by discriminations of the differences between overlapping broad neural distributions (in the vein of signal detection theory) without resorting to any spatial isomorphism. This is a topic we shall deal with further in the later parts of this chapter and in the final summary of Chapter 11.
B. The Pitch of Combined Stimulus Frequencies
If one continuously increases the output frequency of a single sine wave acoustic stimulus, there is a corresponding continuous and monotonic increase in the pitch of the sound. Thus, to a first approximation at least, pitch is directly related to the energy of the particular frequency component of the stimulating tone. However, there are a large number of other auditory phenomena, most of which are associated with the perception of tones produced by combining acoustic frequencies, which seem to show that under many conditions, pitches are perceived that may be totally unrelated to the energy spectrum of the stimulus. Before we discuss the details of these phenomena, it would be well to note that there are two possible general approaches toward an explanation of such a fact. First, it may be that frequency and pitch are simply not
as directly related as suggested by the simple monofrequency experiment. On the other hand, it may be that this association does hold, but that in some manner distortions produced by nonlinear mechanical and neural auditory processes introduce the signal frequencies that are missing in the stimulus pattern itself. An analysis of this issue, therefore, is of the greatest significance in establishing the strengths and weaknesses of several of the alternative theories of auditory coding.
1. Beats and Difference Tones. When two frequencies of physical oscillation occur simultaneously in some medium, there is a waxing and waning of the amplitude of the combined oscillation at a frequency equal to the frequency difference between the two. Figure 10.1 shows the effect of adding two sinusoids on the amplitude of oscillation of the combined acoustic wave. The waxing and waning of the intensity of the combined wave is a purely physical phenomenon and is perfectly predicted simply by the linear addition of the two signals. In an auditory environment, if two sound waves are so mixed, an observer will report that he hears the waxing and waning or beating, as it is usually known, of intensity (if the difference in the two frequencies is quite small) in addition to the two individual components. As the frequency difference between the two sound waves increases, the intensity of beating passes through stages that have been variously reported as vibratos and burrs until at a specific frequency difference (which is determined by the frequencies of the individual components), a new continuous smooth tone is heard. This new tone corresponds in pitch to a sensation induced by a single oscillation equal to the difference between the two real sound waves. At this stage, the percept is no longer one of varying intensity, but of a pure and continuous additional tone. The important and disquieting fact about this phenomenon is that if a perfect frequency analyzing system examined the signal, then it would be shown that there was no energy of any kind at the difference frequency. Thus, it seems that the association between stimulus frequency (and the power of the signal at that frequency), on the one hand, and pitch, on the other, breaks down. The auditory system seems to be in some way responding not to the energy content at a specific frequency, but to the envelope of the composite wave, an envelope that has exactly zero energy content.
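The claim that a linear sum of two tones carries no energy at the difference frequency, while a nonlinear distortion of that sum does, can be checked numerically. The following is a minimal sketch under stated assumptions: the two frequencies (1000 and 1100 Hz), the sampling rate, and the quadratic distortion term with coefficient 0.5 are arbitrary illustrative choices and are not intended as a model of the ear.

```python
import cmath
import math

SAMPLE_RATE = 8000   # samples per second (assumed)
N_SAMPLES = 800      # 0.1 s: 100, 1000, and 1100 Hz all complete whole cycles
F1, F2 = 1000, 1100  # hypothetical component frequencies in Hz

def component_amplitude(signal, freq_hz):
    """Amplitude of the sinusoidal component of `signal` at `freq_hz`."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * n / SAMPLE_RATE)
              for n, x in enumerate(signal))
    return 2 * abs(acc) / len(signal)

# Purely linear addition of the two sinusoids (the physical stimulus).
linear_sum = [math.sin(2 * math.pi * F1 * n / SAMPLE_RATE)
              + math.sin(2 * math.pi * F2 * n / SAMPLE_RATE)
              for n in range(N_SAMPLES)]

# The same signal after an assumed quadratic (nonlinear) distortion.
distorted = [x + 0.5 * x * x for x in linear_sum]

difference = F2 - F1
linear_at_difference = component_amplitude(linear_sum, difference)
distorted_at_difference = component_amplitude(distorted, difference)
```

With these values the linear sum has essentially zero amplitude at 100 Hz even though its envelope waxes and wanes at that rate, whereas the distorted signal contains a 100-Hz component of amplitude about 0.5, because the product term in the squared sum contains a cosine at the difference frequency.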
It is only in a nonlinear system, which can introduce distortions, that frequencies corresponding to the envelope frequency with nonzero power become physically present. The question then arises: are the beat frequency energies actually introduced by some nonlinear distortion in the auditory system, thus salvaging the notion that a given frequency signal is the main correlate of the auditory pitch? Or alternatively, is the auditory system in some way sensitive to the periodicity of the signal, even though the energy content of the periodicity be zero? As we shall see, this is the crux of much of the controversy among the various theories of auditory pitch perception. In addition to the particular difference pitch, the beating we have described here (which is functionally related to the difference between two interacting frequencies), there are also many other kinds of summation and
FIGURE 10.1 When two oscillations of nearly identical frequency [(a) and (b)] are combined, the sum is a beat frequency equal to the frequency difference between the two.
difference combination pitches. Summation or difference pitches, corresponding to frequencies equal to integral multiples of the sum of two stimuli, can also be heard in many other situations. We shall discuss some of the other major combination pitches in the following sections.
2. The Seebeck Siren. Seebeck (1841) many years ago performed some experiments with a pneumatic siren that are closely related to this question of combination tones. His sirens (whistles might be a better term) were made from rotating disks containing perforations, which alternately closed and opened a source of compressed air. In this manner, he was able to produce patterns of acoustic pulses that were composed of known mixed frequency components long before electronic equipment was developed. Seebeck used Fourier's mathematical analysis techniques to show that combinations of two high frequencies produced physical stimuli with very little energy at frequencies corresponding to pitches that were very strongly heard. For example, if a frequency pattern such as that shown in Figure 10.2(a) was presented, a smooth high-frequency tone was heard. However, if every second puff of air was displaced very slightly as shown in Figure 10.2(b), then a tone one octave lower [equivalent to the experience produced by a stimulus pattern like that in Figure 10.2(c)] was heard. However, upon Fourier analysis of the stimulus pattern of Figure 10.2(b), it could be shown that there was virtually no energy present at the frequency represented by Figure 10.2(c). Thus, once again there appeared to be a perceived
FIGURE 10.2 A simple sketch illustrating the Seebeck siren effect. An equal interval series of pulsed compressed air blasts (a) sounds like a pure frequency. If every other blast is slightly displaced in phase as in (b), a second tone is heard one octave below the original signal. It has a tone equivalent to that produced by the pattern in (c). However, there is no physical energy at the frequency of (c), and the problem is thus raised of how a place coded system could possibly respond with the pitch of (c) to the stimulus frequency pattern of (b).
pitch associated with a virtually zero energy stimulus component. The Seebeck experiments, like the beat phenomenon, therefore, represent an apparent deviation from the basic assumption of association of a given pitch with the energy content of a given frequency component that is implicit in Ohm's acoustic law.
3. The Missing Fundamental or the Residue. If several waves with frequencies of, for example, 500, 600, 700, and 800 Hz are mixed together, the combination produces an additional tonal experience comparable to the effect produced by a 100-Hz tone. Fletcher had originally reported this phenomenon in 1935 and had named it the missing fundamental. However, as Licklider (1951a) points out, it is not at all clear whether this is the correct term, for clearly the difference tone and the common fundamental of all of these component frequencies are both equal to 100 Hz. The question thus arises: which of these two possible options, the common fundamental or the common difference, is responsible for the perception of the 100-Hz tone? Schouten (1940) repeated Fletcher's experiments with stimulus frequencies equal to 300, 500, 700, and 900 Hz and found that the perceived tone was still equivalent to that produced by a 100-Hz tone. Thus, the notion of a "missing fundamental" is the correct terminology, and this phenomenon is not a difference tone in the true sense of the word. Schouten (1938, 1940) has been among the most active investigators in pursuing the question of the missing fundamental over the years. He has coined the term "the residue" to describe the residual tonal experience of
a frequency fundamental whose physical energy content is zero in the stimulus, and which can exist even though supposedly canceled by a tone of equal frequency but opposite polarity. Schouten, therefore, concluded that even if the tone had been introduced by nonlinear distortion in the ear, this cancellation procedure was not able to remove its effect. This finding further suggests that the tonal experience is not linked to the energy of the stimulus frequency.
4. Tones in Random Noise. Huggins and Licklider (1951) were able to demonstrate that even random noise, which presumably has energy distributed at all frequencies, could take on specific tonal qualities under certain conditions. The experimental conditions they used involved a frequency dependent phase shifting mechanism. A 360-deg phase shift, a delay of one full cycle, was introduced for the high-frequency components of a noise signal. For low-frequency components, no phase shift was introduced, but at some narrow intermediate range the phase shift gradually changed from 0 to 360 deg. The original signal was amplified and fed into one side of the pair of earphones, while the signal with this specific pattern of phase shifts was fed into the other ear. In this situation, Huggins and Licklider report that a very faint tone is heard at about the center frequency band at which the phase shifter shifts over from no low-frequency shift to the high-frequency phase shift. The Huggins and Licklider induced tone phenomenon occurs only below a crossover frequency of 1000 Hz and is best between 400 and 800 Hz. Fascinatingly, this same interesting phenomenon has been used by Licklider (1962) and von Bekesy (1963a) to support the opposing frequency and place theories of hearing, respectively. Miller and Taylor (1948) have also shown that if a white noise signal is interrupted at about 200 or 300 Hz, observers would also report a tonal quality at about the frequency of the interruption. The interesting and significant part of all of these data is that the spectrum of white noise is continuous, with energy at all frequencies with or without the interruptions. Yet in some manner, the interruptions and phase shifts apparently are introducing some new characteristic information, which could be deciphered as tonal experiences in much the same manner as zero energy beats or missing fundamentals are heard.
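The arithmetic that separates the missing fundamental from a simple difference tone, in the two sets of component frequencies cited earlier in this section, can be made explicit. The sketch below is illustrative only: the function names are hypothetical, the common fundamental is taken to be the greatest common divisor of the component frequencies, and the common difference is the spacing between adjacent components.

```python
from functools import reduce
from math import gcd

def common_fundamental(freqs_hz):
    """Largest frequency of which every component is an integral multiple."""
    return reduce(gcd, freqs_hz)

def common_difference(freqs_hz):
    """Spacing between adjacent components, or None if it is not uniform."""
    ordered = sorted(freqs_hz)
    spacings = {b - a for a, b in zip(ordered, ordered[1:])}
    return spacings.pop() if len(spacings) == 1 else None

fletcher = [500, 600, 700, 800]   # components of the first example in the text
schouten = [300, 500, 700, 900]   # components of Schouten's (1940) stimulus

fletcher_result = (common_fundamental(fletcher), common_difference(fletcher))
schouten_result = (common_fundamental(schouten), common_difference(schouten))
```

For the first set the fundamental and the difference coincide at 100 Hz, so the stimulus cannot distinguish the two explanations. For Schouten's set the common difference is 200 Hz while the common fundamental remains 100 Hz, and it is the 100-Hz pitch that the text reports as heard.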
In sum, there are a number of combination-tone phenomena that seem to suggest that the simple notion of a unique correlation between stimulus frequency energy and pitch experience does not hold. The data suggest that there might be some way in which the purely temporal characteristics or periodicity of the stimulus, independent of the Fourier energy components, can produce the experience of tone. As we have noted, there is also another possible explanation, namely, that the energy at the perceived frequency is introduced by nonlinear distortions of the input signal. Although this question will ultimately have to be answered by empirical mechanical and neurophysiological inquiries, Small (1970) has summarized a considerable amount of psychophysical evidence, which suggests that the arguments for
the introduction of energy by nonlinear distortions cannot be supported as an explanation of the pitch of combination signals. Briefly, his reasons, as paraphrased by the author, include:
1. Subjects are able to detect combination pitches at very low sound pressure levels, which presumably would be within the range of linearity of the mechanical parts of the auditory mechanism (Thurlow and Small, 1955).
2. If a signal at the frequency of the combination pitch, equal to it in intensity but opposite in phase, is added to the mixed signal, it would be expected that the combination pitch, if a real physical entity with nonzero energy content, would be canceled. In fact, Schouten's experiments show that the difference tone is not canceled. Thus, Small feels that the residue is probably not due to the presence of real energy (Schouten, 1938).
3. If a second signal is introduced with a frequency quite close to a combination pitch with zero energy, such as a beat tone or a residue, it might be expected that there would be beating between the two tones. In fact, in this situation there is no perceived beating, again suggesting that no real energy has been introduced (Schouten, 1938).
4. If a second tone is introduced which would otherwise mask the combination pitch, it does not, in fact, do so. The presumption, therefore, is that the normal lateral interaction mechanisms do not come into play, because the cochlear places so stimulated are not adjacent. It is suggested, therefore, that the information about the combination pitch is carried by neurons other than the usual ones associated with that frequency (Licklider, 1954, and Small and Campbell, 1961).
5. A stimulus with a frequency corresponding to that which would produce a combination pitch can be used to selectively fatigue the region of the cochlea that would be expected to be associated with the combination pitch. In this case, however, the combination pitch is still heard, and its continued presence suggests, once again, that the information associated with the induced combination pitch is not carried by that region of the cochlea normal for that frequency. The generation of the combination pitch, Small therefore concludes, is not due to the presence of energy at that difference frequency, but is dependent purely on the periodicity of the signal (Small and Yelen, 1962).
However, as we shall see later when we consider the physiological data, there is accumulating evidence that combination frequencies are actually observable in the cochlear microphonic and the auditory nervous response. The arguments cited by Small and these physiological findings, therefore, are apparently incompatible and cannot be reconciled at this particular stage in the history of the problem.
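The distinction drawn in this section between the Fourier energy content of a stimulus and its periodicity can be made concrete with a short numerical sketch. The following Python fragment is an illustration only; it describes a property of the stimulus, not a model of the ear, and every specific value in it is an assumption chosen for convenience: a sampling rate of 8000 samples per second, three equal-amplitude components at 800, 1000, and 1200 Hz (harmonics of an absent 200-Hz fundamental), and a simple autocorrelation as the measure of periodicity.

```python
import math

SAMPLE_RATE_HZ = 8000               # hypothetical sampling rate
HARMONICS_HZ = (800, 1000, 1200)    # harmonics 4, 5, 6 of a missing 200-Hz fundamental
PERIOD_WINDOW = 720                 # 18 full 5-msec periods at 8000 samples/sec


def harmonic_complex(freqs_hz, n_samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Sum of equal-amplitude sine waves, one per listed frequency."""
    return [
        sum(math.sin(2 * math.pi * f * i / sample_rate_hz) for f in freqs_hz)
        for i in range(n_samples)
    ]


def component_amplitude(signal, freq_hz, sample_rate_hz=SAMPLE_RATE_HZ):
    """Amplitude of the Fourier component of `signal` at freq_hz."""
    n = len(signal)
    re = sum(x * math.cos(2 * math.pi * freq_hz * i / sample_rate_hz)
             for i, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq_hz * i / sample_rate_hz)
             for i, x in enumerate(signal))
    return 2 * math.hypot(re, im) / n


def best_period_lag(signal, min_lag, max_lag, window=PERIOD_WINDOW):
    """Lag (in samples) at which the waveform best matches a shifted copy of itself."""
    def autocorrelation(lag):
        return sum(signal[i] * signal[i + lag] for i in range(window))
    return max(range(min_lag, max_lag + 1), key=autocorrelation)


signal = harmonic_complex(HARMONICS_HZ, 800)          # 0.1 sec of signal
amplitude_at_200 = component_amplitude(signal, 200)   # the missing fundamental
amplitude_at_1000 = component_amplitude(signal, 1000) # a component actually present
lag = best_period_lag(signal, 10, 70)
periodicity_hz = SAMPLE_RATE_HZ / lag

print(f"amplitude at 200 Hz:  {amplitude_at_200:.2e}")
print(f"amplitude at 1000 Hz: {amplitude_at_1000:.3f}")
print(f"waveform repeats every {lag} samples, i.e., at {periodicity_hz:.0f} Hz")
```

Under these assumed values the stimulus contains essentially no energy at 200 Hz, yet its waveform repeats every 40 samples (5 msec), which corresponds to 200 Hz. Whether the auditory system extracts this periodicity, or instead regenerates the energy by nonlinear distortion, is exactly the question the psychophysical and physiological evidence above disputes.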
C. Masking and the Notion of Critical Bands
In the previous chapter, when we discussed spatial interactions, we considered much of the data concerning masking of one tone by another. There is no need to repeat that discussion at this point; however, there is a set of phenomena closely related to the notion of masking to which we shall now attend. These findings deal with the concept of critical bands, or bandwidths within which certain auditory phenomena remain essentially constant. Because the notion of critical bands has been used in so many different ways throughout the auditory literature, it is difficult to define their meaning and significance precisely, other than operationally. The best way to introduce the notion, therefore, is to describe an experiment in which the critical bands are measured. The classic determination of critical bandwidth has usually been carried out using a masking technique, as summarized by Scharf (1970). Fletcher (1940), for example, used three monofrequency stimuli in his studies of the critical bandwidth, two of which were used to mask a third, the frequency of which was halfway between the two masking tones. Zwicker's (1956) more modern study of the problem substituted for the masked monotone a very narrow band of noise with its center frequency halfway between the two monotone maskers, but the findings of the experiments are, for all practical purposes, identical. The frequencies of the two masking tones are initially set very close to the center frequency of the noise, but in successive stages are moved further and further apart, one masking tone increasing in frequency while the other decreases. As the frequency separation of the two masking tones increases, there is at first little change in the threshold of either the noise or the monotone, but at a given separation the masking effect suddenly begins to decrease, indicating that the interaction is no longer as strong.
The separation of the two tones at that point of sudden change is defined as the critical bandwidth for that particular center frequency. Critical bands can also be measured with a number of nonmasking techniques. For example, complex sounds made up of several frequencies do not vary in loudness as the constituent frequencies vary over the critical bandwidth. However, once the mixed tones begin to fall outside the critical bandwidth, the loudness of the combination begins to increase. Furthermore, thresholds of complex signals are not affected by signals outside the critical bandwidth. Many of these data have been summarized in Scharf's chart, which is reproduced in Figure 10.3. The critical band is seen to be relatively constant in its width from about 100 Hz up to about 500 Hz, after which it increases up to a maximum width of about 2000 Hz when measured at 10,000 Hz. In general, the critical band data have been used as support for a place theory of auditory frequency coding. Critical bands are assumed to be explained by spatial interactions comparable to those explaining, for example, frequency discrimination (see Figure 7.6). Obviously, a critical band cannot be simply identified with the differential frequency measure, for it is 20 to 50 times as wide, but this ratio remains relatively constant as one scans the auditory frequency spectrum, and some less direct relationship may exist.
[Figure 10.3 plot: bandwidth (axis ticks from 50 to 5000) plotted against frequency (Hz) (axis ticks from 50 to 10,000). Legend: Zwicker et al. (2-tone masking, phase sensitivity, loudness summation, threshold); Greenwood (masking); Scharf (2-tone masking, loudness summation); Hawkins and Stevens (2.5 x critical ratio).]
FIGURE 10.3 A summary of the data prepared by Scharf from several experiments on the critical bandwidth. The source of the data is indicated in the legend on the figure (from Scharf, 1970, from data by Greenwood, 1961; Hawkins and Stevens, 1950; and Zwicker, as well as from Scharf's own data).
These, then, represent a brief sampling of some of the important psychophysical phenomena that must be explained by any theory of auditory pitch. Obviously, the sampling is incomplete and cannot begin to do justice to all of the work done by so many psychophysicists in the last 50 or even 100 years. However, we are fortunate that an extraordinarily good book has recently become available. Tobias (1970) has edited a volume entitled Foundations of Modern Auditory Theory, in which a large number of the leaders in the field have presented what is undoubtedly one of the most readable and authoritative sets of statements on these problems to have appeared in many years. The reader who would like more complete details of these and related materials is directed to this very important treatise, and especially to the work of Small (1970) and Scharf (1970), for more detailed discussion of the present material.
II. THE COMPETITIVE THEORIES
No discussion of contemporary auditory pitch theory can be responsibly carried out without extensive acknowledgment of the remarkable analysis of the problem by Wever (1949) in his book Theory of Hearing. Wever's book, scholarly, critical, and erudite, effectively summarizes both the history and the interrelationships of the various theories of pitch perception that have evolved over the past century. Most of the discussion in the following section will, therefore, be based on his analysis. Pitch theory has in general, over the years, been characterized by a controversy over the anatomic level at which the tonal analysis (implied by Ohm's acoustic law) is carried out. Analysis is thought to be peripheral (cochlear) in some theories, but central in other descriptions. Peripheral analysis theories usually assume a loss of temporal fidelity early in the pathway. The central analysis theories necessarily require that a reproduction of the original temporal pattern of the acoustic signal be carried in some manner to the central nervous system, where a frequency analysis can presumably be carried out. On the basis of these premises, the two main general classes of auditory pitch theory have come to be called the place theories (all of which assume peripheral spatial analysis in the cochlea) and the frequency or periodicity theories (all of which assume central analysis after transmission of signals with high temporal fidelity to the central nervous system). In the following portion of this section, we shall consider the place and frequency theories separately and trace the historical development of each. We shall distinguish among the varieties of each theory, noting the key differences among them.
A. Place Theories
Although all place theories agree, in principle, that the main mechanism for auditory pitch encoding is a peripheral analysis of stimulus frequencies in the cochlea, they do not all agree on how this spatial analysis comes about. Place theories began to emerge almost as soon as some of the earliest notions of the cochlear anatomy were uncovered. Characteristically, each theory was quite closely tied to some physical theory that was then of contemporary interest. Whatever the dominant current technology, someone, it seems, was always able to rephrase the auditory coding problem in terms of an analytical mechanism based on that technology. Organ pipes, harps, elastic tubes, hydraulic waves, and so on, all became models of the acoustic place-analyzing mechanism. Table 10.1 adapts and extends a part of a table Wever prepared to describe the various auditory place theories. The emphasis in this table is on the specific mechanism that has been used to account for cochlear place localization. Two major classes of place theories can be used to encompass the entire variety of historical theories: those dependent upon resonant properties of the cochlear tissues and those dependent mainly upon the dynamics of waves in the cochlea. We shall now consider the details that distinguish between the resonance and wave versions of auditory place theory.
TABLE 10.1 AUDITORY PLACE THEORIES (ADAPTED FROM WEVER, 1949)
(Each entry: Date - Theorist: Critical feature)

I. Resonance Place Theories
  Dependent on resonance of tuned elements
    a. 1605 - Bauhin: Differentially tuned air-filled cavities of various shapes and sizes
    b. 1672 - Willis: Two air-filled cavities, differentially tuned
    c. 1683 - Duverney: Differentially tuned elements of bony lamina
    d. 1707 - Valsalva: Differentially tuned elements of organ of Corti
    e. 1760 - Cotugno: Differentially tuned elements of organ of Corti
    f. 1857 - Helmholtz: Differentially tuned arches of Corti
  Dependent on membrane resonance
    g. 1870 - Helmholtz, as influenced by Hensen and Hasse: Differentially tuned transverse fibers of basilar membrane
    h. 1867 - Hasse: Membrane resonance properties of tectorial membrane
    i. 1898 - Ewald: Standing waves on the basilar membrane

II. Wave Theories of Place
  Independent of cochlear mechanics
    a. 1894 - Hurst: Traveling bulge produced by interaction of original and reflecting wave
    b. 1900 - Ter Kuile: Traveling bulge produced by curtailed monodirectional waves
  Dependent on basilar membrane mechanics
    c. 1928 - von Bekesy: Position of maximum amplitude of a traveling wave as a function of basilar membrane properties
    d. 1946 - Zwislocki: Position of maximum amplitude of a traveling wave as a function of basilar membrane properties
  Dependent on tube mechanics of the cochlea
    e. 1931 - Ranke: Pulsations in elastic tubes
    f. 1937 - Reboul: Pulsations in elastic tubes
1. Resonance Place Theories. The notion of resonance is a simple one. Each physical object has a natural frequency of vibration and will tend to vibrate at that frequency more easily than at any other. Thus, energy can be transferred from a medium to a tuned object with maximum efficiency at its resonant frequency. In many instances, this transfer of energy can occur with no serious consequences. A violin string will tend to hum along at its resonant frequency when stroked or even when it is simply present in a noisy environment. A strong tone can also, however, cause a finely tuned object to destroy itself if the energy absorbed at the tuned frequency strains the object beyond its physical limits. In a trivial case, a crystal goblet can break after absorbing energy at its resonant frequency. With more serious consequences, bridges, particularly suspension bridges, can be sent into such violent oscillations that they collapse unless they are carefully detuned with appropriate damping devices. Perhaps one of the most striking examples of this was the Tacoma Narrows Bridge in Washington, which collapsed in 1940 shortly after it was built. The bridge, a suspension type, hung in a windy valley, and the blowing of the wind, like the bowing of a violin string, was sufficient to make the bridge oscillate violently and finally collapse. Similarly, stimulation with energies of mixed spectral distribution is often capable of exciting a set of tuned physical resonators to vibrate individually at amplitudes proportional to the power present at each of their individual resonant frequencies. When notions of tuned resonators were formalized in the seventeenth century, it almost immediately became obvious to some of the early students of audition that resonance phenomena might provide an explanation for some of the mysteries surrounding pitch perception in the human auditory system.
Bauhin (1605) was the first to suggest that the inner ear was composed of a collection of air-filled cavities (analogous to organ pipes) of various sizes and shapes, each of which resonantly vibrated in tune with a particular frequency of airborne oscillation. About seventy years later Willis, astutely noting the absence of such a hypothetical system of cavities, suggested a modification of this hypothesis. The main change in his suggestion was that the relative amount of oscillation of only two particularly tuned resonant cavities was the key differentiating process. Unfortunately, his theory, like Bauhin's, had a serious defect. The necessary cavities simply did not exist. The first really modern resonance place theory of audition was offered by DuVerney in 1683. His idea was based on the then newly discovered fact that the cochlea had a bony shelf, the osseous lamina, extending completely along its length. This bony lamina was discovered to vary in width, decreasing with distance from the oval window. Since long strings were known to resonate at lower frequencies than short strings, he assumed (incorrectly, as it turned out) that the resonators in the cochlea were arranged so that low tones were localized near the oval window. High tones were thought to be localized at the extreme apical end of the cochlea. In the eighteenth century, Valsalva and Cotugno added a most important notion. They both agreed that the resonating structure was not
the bony lamina, but was in some way associated with the soft tissues, which were becoming ever more frequently observed as microscopic and dissection techniques became increasingly efficient. Cotugno specifically assumed that the soft tissues of the organ of Corti were arranged in strands like the strings of a musical instrument and that the resonant vibrations of these strings were directly associated with specific tonal sensations. Among other important points of Cotugno's theory was the fact that he, for the first time, properly assigned the basal portion of the cochlea to high tones and the extreme apical end to low tones. Although we now know this to be, in fact, true, his idea was based on an incorrect assumption. Cotugno, still assuming that a stringlike resonator mechanism was operative and knowing that the basilar membrane is wider at the apex than at the base, fortuitously made the right decision. As we shall see, however, it is clear that the width of the basilar membrane has nothing to do with place localization, and thus, while Cotugno was right about the basilar membrane locus of low and high tones, he was completely wrong in the reasons he used to support his argument. The most famous of all of the resonant place theories was that presented by Helmholtz in two versions in 1857 and 1870, respectively. Wever's discussion of the details of the Helmholtz theory and of the manner in which it was presented is a classic example of the supremacy of style over content, for Wever makes it quite clear that the Helmholtz theory was little more than a rehashing of Cotugno's ideas of resonant strands in the organ of Corti. In the first version of the theory, Helmholtz, much impressed by the discovery of the arches of Corti, assumed that these microstructures were the resonant elements. However, shortly thereafter, Hensen (1863) showed that these units did not vary sufficiently in size to serve this role.
Helmholtz, in the 1870 version of his theory, therefore, found it necessary to assume that transverse bands of the basilar membrane, operating independently of each other, were the tuned resonators. Unfortunately for his theory, subsequent studies showed even this modification to provide an inadequate basis of resonant discrimination for simple physical reasons. This can be elucidated by considering the tuning formula for a resonant string, which is given by the following equation:
$$ f = \frac{1}{2L}\sqrt{\frac{T}{M}} \tag{10.1} $$
where f is the resonant frequency of the vibrating string; L is its length; T is the tension on the string; and M is its mass per unit length. This means that the tuned frequency of a vibrating string can be increased either by decreasing its length or its mass, or by increasing its tension. There is little reason to assume that there is much in the way of a mass difference as one scans various portions of the basilar membrane, and even at that time it was realized that there is little variation in the tension across the basilar membrane from one end to the other. Thus, the major portion of the frequency difference must be attributable to string length. Unfortunately, the width of the basilar membrane varies only from about 0.1 to 0.5 mm; thus, string length could only vary by a factor of about 5. Considering that the auditory frequency range varies over a ratio of almost 1000 to 1, it is clear that simple vibrating string resonance is not adequate to account for the full range of tonal experiences. The notion of resonantly tuned portions of the basilar membrane as the primary analysis mechanism, therefore, did not seem to be acceptable. Subsequent workers turned to the possibility that the properties of a continuous elastic membrane, rather than a vibrating string, might be invoked to account for the localization differences of the auditory spectrum. In 1867, Hasse called attention to the fact that the tectorial membrane was possibly resonant, but perhaps the most interesting resonant membrane theory was Ewald's 1898 extension of the notions inherent in the vibration of a cymbal or of what had been more generally called the Chladni plate. Metal disks, when struck, vibrate in elaborate patterns of standing waves. These waves can be made visible by dusting sand on the plate. The patterns will differ for plates of different size, mass, or thickness and for plates that are damped at different points on their surface. Ewald noted that, since the vibratory pattern of the plates also varied as a function of the driving frequency, auditory place localization might be accounted for in terms of this sort of membrane or platelike resonance of the basilar membrane. Unfortunately, most of the considerations that lead to the rejection of the tuned basilar membrane "string" also lead to the rejection of this hypothesis, even though the mathematical description of a two-dimensional vibrating plate is not so simple and the reasons for rejecting this idea are not so apparent.
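The arithmetic behind the rejection of the tuned-string hypothesis can be checked with a brief Python sketch based on the string formula of Equation 10.1. The membrane widths (0.1 and 0.5 mm) and the 1000-to-1 frequency range come from the discussion above; the tension and mass values are arbitrary placeholders (assumptions, not measurements), included only to show that they cancel when the ratio of the two frequencies is formed.

```python
import math


def string_frequency_hz(length_m, tension_n, mass_per_length_kg_m):
    """Resonant frequency of an ideal string: f = (1 / 2L) * sqrt(T / M)."""
    return math.sqrt(tension_n / mass_per_length_kg_m) / (2.0 * length_m)


def required_tension_to_mass_factor(target_frequency_ratio, length_ratio):
    """Factor by which T/M must also vary to reach the target frequency range.

    Frequency is proportional to (1 / L) * sqrt(T / M), so whatever the
    length ratio does not supply must come from the square root of T / M.
    """
    return (target_frequency_ratio / length_ratio) ** 2


SHORTEST_M = 0.1e-3   # narrowest part of the basilar membrane (from the text)
LONGEST_M = 0.5e-3    # widest part of the basilar membrane (from the text)
TENSION_N = 1.0       # placeholder value; cancels in the ratio
MASS_KG_M = 1.0e-3    # placeholder value; cancels in the ratio

highest = string_frequency_hz(SHORTEST_M, TENSION_N, MASS_KG_M)
lowest = string_frequency_hz(LONGEST_M, TENSION_N, MASS_KG_M)
length_only_ratio = highest / lowest
required_factor = required_tension_to_mass_factor(1000.0, length_only_ratio)

print(f"frequency range from length alone: {length_only_ratio:.1f} to 1")
print(f"T/M would have to vary by a factor of about {required_factor:,.0f}")
```

Length alone yields a frequency range of only about 5 to 1. To span a range of 1000 to 1, the ratio of tension to mass would additionally have to vary by a factor of roughly 40,000 along the membrane, which is the kind of variation that the text notes the tissue does not show.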
In sum, then, resonant place theories demand too much variance of the physical parameters of the delicate tissues of the basilar membrane to adequately explain auditory place localization. The mechanical properties of the tissue simply are not appropriate to permit the generation of highly localized place coding on the same basis that the operation of a harp string or a Chladni plate may be explained. The historical development of the resonance place theory, however, was very important, for there was a convergence on a very important premise on the part of all versions, namely, that it is the location along the basilar membrane that is the key correlate of stimulus frequency and, thus ultimately, pitch.
2. Wave Place Theories. If the physics of the cochlea made it impossible to accept any resonant place theory, was there any way in which localization of component frequencies could be accomplished on some other mechanical basis? In the years following Helmholtz's presentations, this question was repeatedly asked, and a number of answers were proposed by many investigators. The general theme that ties all of their alternative answers together is the assumption that the behavior of the fluids in the inner ear, the medium through which waves are propagated, is critical. In other fluid dynamic situations, it could be shown that different wave patterns are produced by different driving frequencies. Thus, the wave or hydrodynamic properties of the fluids and the cavities of the inner ear
offer a possible mechanism for establishing place representation, even though more primitive ideas of local resonance have had to be discarded. All of the theories we shall discuss in this section hypothesized mechanisms for the production of different wave patterns by virtue of the dynamics of fluid-filled tubes. The theories differ, however, in several respects. The difference in wave pattern has been attributed by some to the elastic properties of the basilar membrane and by some to the tubelike characteristics of the entire cochlear cavity. Other theorists have suggested, however, that the wave patterns can be explained without any reference to the mechanical properties of the cochlea, but rather by simply assuming certain temporal relationships of time-varying sinusoidal signals in any medium. Shortly after the difficulties with the Helmholtz resonance theories were noted, Hurst (1894) and Ter Kuile (1900a and b) suggested their versions of traveling wave theories. Both of their theories assumed a traveling bulge whose dynamics were independent of any of the mechanical features of the cochlea, but depended only upon the time course of the acoustic wave. Hurst suggested that when a wave of compression occurred in the air, it initiated a traveling wave moving from the oval window up the basilar membrane toward the helicotrema. This wave was reflected from the apical end of the cochlea and, on its return, collided with one of the next cycles of the stimulus compression wave. The collision of the initial and reflected waves resulted in a bulge, which moved on the basilar membrane with its location at any given time dependent on the frequency of the stimulating wave. In fact, of course, the theory would not involve only a single bulge, but an intricate pattern of traveling bulges, depending upon the complexity of the incoming signal.
Ter Kuile's idea was very similar, but it did not even involve the collision of an outgoing and a reflected wave, merely the fact that the distance between a wave of compression and a wave of rarefaction for a given sound was dependent upon its frequency. Thus, a wave or bulge, once started, would be inhibited mechanically by its own opposite phase at times that would correspond to places on the basilar membrane. The attractiveness of these theories is that they do not require any major differential mechanical properties of the cochlea beyond uniform elasticity. Ranke (1931) and Reboul (1937) have also suggested place theories that depend only on the elastic properties of the tubelike cochlea. Their theories grew out of the hydrodynamic models that were developed in the twentieth century to describe such phenomena as pulsations in veins and arteries. Unfortunately, the cochlea is a cavity in bone, a relatively inelastic material, and displays few of the properties of a vein. For this reason, their hypotheses have never had a great deal of popularity in modern theory. The nonresonant place theories that are most prominent now are wave theories in which localization is dependent mainly upon the combined physics of the basilar membrane and the cochlear fluids. Specifically, von Bekesy has proposed that limited mechanical elasticity and rigidity of the basilar membrane is sufficient to account for the formation
of traveling waves whose amplitudes vary as they pass down the cochlea. The point of maximum amplitude has been shown by von Bekesy to vary as a function of the frequency of the stimulus in a way that explains many auditory phenomena. Because so much of von Bekesy's theory is intimately involved in direct measurements, we shall consider his ideas in detail in the next section on the biological data. As a brief prelude, it is perhaps appropriate to say that his ideas currently enjoy the widest popularity. This, then, is the general history and a categorization of place theories, both of the resonance and wave types. We shall now turn to the other major category of auditory theories, in which the analysis is considered to be primarily central and which assume little, if any, analysis in the periphery.
B. Pure Frequency, Periodicity, or Telephone Theories
It is interesting to note that the group of theories that we shall now discuss arose primarily as a reaction to the deficiencies of place theory in explaining all aspects of auditory pitch perception. The initial dates of the early presentations of the frequency theories fall in the second half of the nineteenth century, about the same time as the technology of telephonic communication was emerging and about the time that pulsatile nervous activity with varying interpulse frequency had first been observed. There was, for all practical purposes (according to Wever's history), no historic antecedent of frequency theories as there had been of the place theories. The reasons for this are clearly understandable in the context of the history of the problem. The anatomical discoveries of the preceding two centuries had repeatedly shown that the cochlea was a spatially extended organ, and the musical technology of the seventeenth, eighteenth, and nineteenth centuries was filled with instances of stringed instruments and resonant tubes.
When the criticisms of the resonance theories began to be appreciated, however, some of the early workers apparently decided to bypass the issue of analysis. Many of the early frequency theorists simply ignored the problem of peripheral analysis or at least deferred it by assuming that information about the frequency pattern was transmitted relatively intact to the central nervous system. It should be obvious at this point in our discussion that there is no reason why frequency reproduction would not be a perfectly feasible means of pitch encoding. Coding theory makes no a priori requirement that pitch or any other quality be ultimately encoded by some spatial localization dimension. Various states of periodic neural activity are possible coding schemes for acoustic frequencies, even if the frequencies of neural representation have shifted several octaves from the frequencies of the acoustic stimulation. This is a valid notion as long as the transforming relations are in some way interpretable by the central nervous system. The use of the word periodicity as an alternative for the word frequency emphasizes that there is, in fact, something fundamental about the repetitive interval pattern as a candidate coding dimension for pitch that goes beyond average interval. Because no recoding is assumed in the frequency theories beyond reproduction of the temporal
TABLE 10.2 AUDITORY FREQUENCY THEORIES (ADAPTED FROM WEVER, 1949)
(Each entry: Date - Theorist: Critical feature)
  1. 1865 - Rinne: A critique of place theory
  2. 1885 - Voltolini: All hair cells are capable of encoding all frequencies like a telephone
  3. 1886 - Rutherford: Direct neural time representation up to 15,000 Hz; complex tone decoded centrally
  4. 1892 - Ayers: Field of grain; the hair cells are freely floating
  5. 1895 - Bonnier: Basilar membrane is the responding tissue
  6. 1908 - Hardesty: Tectorial membrane is the responding tissue
pattern, there is probably no psychophysical means to definitively prove that a frequency theory is true in the same way that a place theory may fall or be sustained on the basis of the detectability of envelope frequencies with no energy content. Frequency theories, therefore, are particularly dependent upon the direct neurophysiological evidence for support or disconfirmation.

Von Bekesy (1963a) also notes another important point. Classic frequency theories, influenced by the then emerging technology of the telephone, had two separate and independent postulates. First, the general idea that the frequency pattern of the acoustic stimulus was preserved in the neural signal was axiomatic. The second postulate, modeled on the diaphragm in the mouthpiece of the telephone, asserted that all portions of the basilar membrane oscillated in unison and that all portions, therefore, have the capability of encoding all frequencies of acoustic stimulation. Direct observations, mostly carried out by von Bekesy himself, refute this second postulate, but do not necessarily force us to reject the first one, and it is that notion rather than the second that shall be the major theme of the brief history of frequency theories that follows.

Once again, our guide in this discussion must necessarily be Wever's (1949) extraordinarily well-written and comprehensive book. Table 10.2, adapted from Wever (1949), lists the more significant of the pure frequency theories that have been proposed over the years. Most frequency theories differ only in certain minor anatomical and functional assumptions. Rutherford, the best known of the frequency theorists, expressed the common key idea when he postulated nerve action potential frequencies of up to 14,000 Hz. Ayers, Bonnier, and Hardesty, three of the less well-known frequency theorists, differed from one another mainly in emphasizing the role of the hairs, the basilar membrane, and the tectorial membrane, respectively, as the sensitive medium, but all agreed that the nervous activity essentially reproduced the temporal pattern of the stimulus and that this pattern was transmitted to the central nervous system, where it was analyzed by undefined mechanisms. It is, in fact, this basic premise concerning the temporal properties of individual neurons in the frequency theories, an axiom on which all of the theorists agreed, upon which all of them collectively fall. Early in the twentieth century, with the development of the new electronic measuring techniques, it was clearly and definitively established that the transmitted spiking rate of individual neurons never exceeded 1000 Hz and that direct reproduction of any frequency greater than that by a single neuron was, therefore, impossible.

C. Combined Place and Frequency Theories

The individual difficulties with some of the early place theories and the early frequency theories led some workers to the notion that a combined place and frequency theory might be better able to serve as a model of auditory pitch encoding. Wever (1949) also relates the history of these combined theories. We have summarized his discussion in Table 10.3.

TABLE 10.3 COMBINED AUDITORY PLACE AND FREQUENCY THEORIES (ADAPTED FROM WEVER, 1949)
    Date   Theorist    Critical feature
1   1896   Meyer       Nonlinear elasticity of basilar membrane
2   1918   Wrightson   Summation of input signals
3   1930   Fletcher    Duplex theory
4   1949   Wever       Volley theory

Wrightson and Keith (1918), for example, had proposed a theory, which was based essentially on the wave mechanics of a traveling bulge, but which assumed that acoustic frequencies were represented at specific frequency-dependent places on the cochlea by nerve impulse firing rates.
The spatial distribution of frequency-dependent information was not, however, in his theory a result of any mechanical analytical process, for according to Wrightson, the fluids of the inner ear were all incompressible, and all locations were capable of responding equally to each input
frequency. The spatial localization was thought to be due, rather, to a superimposition at any point of the activity produced by a given input frequency with the activity produced by any other frequency. Thus, complex incident waves produced patterns of neuronal activity that were rapid at some critical places and less rapid at others. Place localization in Wrightson's theory was, therefore, irregular in that there was no regular sequence of cochlear locations associated with the auditory spectrum. The specific locations of maximum activity, however, were the key correlates of input frequency. Unfortunately, his theory assumed not only that there was a one-for-one correspondence between stimulus and neural frequencies, but even worse, that each cycle of the stimulus produced four nerve action potentials. Thus, the problem posed by the inability of the neuronal firing rate to keep up with the stimulus frequencies was multiplied rather than simplified and the incompatibility with the neural data accentuated.

Meyer (1896) also used a hydraulic wave model to describe how different places on the cochlea responded to different frequencies of acoustic stimulation. Meyer was also particularly interested in showing how these different places of activation interacted mechanically on the surface of the basilar membrane to produce combination tones. Nevertheless, his theory also specifically states that there is a one-to-one correspondence between the neural response frequencies and the stimulus frequencies; it is thus, in major part, a frequency theory and is, therefore, subject to the same difficulties as all of the rest.

Perhaps the first of the truly modern combined theories of auditory quality was that proposed by Fletcher (1930). Fletcher was among the first, if not the first, to explicitly suggest that there is both frequency and place coding in the cochlea for different ranges of the stimulus frequencies.
He stated, primarily on the basis of physical analogies and masking data, that high tones were more likely to be represented by place localization. Fletcher's conception of the mechanism for a place localization was resonance in a manner very similar to the original Helmholtz harp string, but slightly different in formulation. The main difference was that the length of the resonant elements was supposed to occur in the tuning equation as a square rather than in the linear form of Equation (10.1). The reasoning for this is somewhat obscure, but to quote Fletcher directly:

    Most authors who have tried to make a simple analysis of this problem have assumed that the frequency of resonance is inversely proportional to the length, w, of the cross fibers, but this analysis makes it inversely proportional to w². This is because it was assumed that the effective mass due to the vibrating liquid, which is associated with a small portion of the basilar membrane, decreases as w decreased. In other words, the effective vibrating volume is just sufficiently wide to cover the basilar membrane. (FLETCHER, 1930, p. 319.)
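A small numerical sketch may make the consequence of the squared term concrete. It assumes only the two proportionalities named in the passage (resonant frequency varying as 1/w versus 1/w²), the 5-to-1 variation in basilar membrane width on which Fletcher's argument relies, and the 86-to-16,000-Hz range he aimed to cover. The function name `tuning_range` is illustrative, and the way membrane tension enters the tuning equation is deliberately left out because it is not specified here.

```python
# Illustrative comparison of two tuning laws for a resonant cross fiber of length w.
# Assumption: only the dependence on w is modeled; tension and fluid loading are ignored.

def tuning_range(width_ratio: float, exponent: int) -> float:
    """Ratio of highest to lowest resonant frequency when f is proportional to 1 / w**exponent."""
    return width_ratio ** exponent

WIDTH_RATIO = 5.0                 # 5-to-1 variation in basilar membrane width
TARGET_RANGE = 16000.0 / 86.0     # frequency range Fletcher wished to account for

linear_range = tuning_range(WIDTH_RATIO, 1)   # classical assumption: f ~ 1/w
square_range = tuning_range(WIDTH_RATIO, 2)   # Fletcher's form:      f ~ 1/w**2

# Factor that must still be supplied by other properties (e.g., tension) under each law.
residual_linear = TARGET_RANGE / linear_range
residual_square = TARGET_RANGE / square_range

print(f"target range          : {TARGET_RANGE:.1f}-fold")
print(f"f ~ 1/w   from width  : {linear_range:.0f}-fold, residual {residual_linear:.1f}-fold")
print(f"f ~ 1/w^2 from width  : {square_range:.0f}-fold, residual {residual_square:.1f}-fold")
```

Under these assumptions the width variation alone spans a 5-fold range with the linear law and a 25-fold range with the squared law, so the squared form leaves much less of the total range to be made up by the assumed tension gradient.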
The main advantage of a system dependent upon the square of the length of the harp string, according to Fletcher, was that it allowed one to take advantage of the known 5 to 1 variation in the width of the basilar membrane to give a variation in tuning sufficient (in conjunction with an assumed 10 to 1 variation in basilar membrane tension from the basal end to the apex) to give a differential range of resonant oscillation varying from 86 to 16,000 Hz. Fletcher's theory assumed very narrow bands of activities produced by these resonant mechanisms as shown in Figure 10.4, although his theory did also allow a considerable broadening of this area of activity as the intensity of the stimulus was increased. He also assumed a sort of primitive phase locking in which the nerve impulse frequency followed the input signal oscillations, but only up to some maximum frequency. It is explicit in Fletcher's writing that this frequency following was also to be considered a code for low-frequency sound signals, and his theory, therefore, represents a historical precedent in the formulation of a duplexity theory of pitch encoding.

[Figure: "Space patterns for pure tones"; horizontal axis: distance from oval window.]
FIGURE 10.4 The narrow (and now known to be incorrect) frequency tuning curves of mechanical displacement along the basilar membrane according to Fletcher, 1930.

The most significant attempt to salvage some sort of temporal coding for stimulus frequency is the combined place and frequency theory known as the volley theory, which was presented by Wever himself in his 1949 book. Wever's ideas had a number of features that have been persistent in modern thinking, but also several that are difficult to maintain in the light of recent evidence. Briefly, Wever's theory was that both resonance place and frequency types of coding were to be found in auditory mechanisms. He, like Fletcher, supported the very important notion that the frequency theories were best able to handle some of the low-frequency signals, while place principles seemed best able to account for higher-frequency discriminations. This is an idea that has become the backbone of modern theories, as we shall see later in this chapter. On the other hand, to extend the range of application of the frequency principle, Wever suggested an idea, which he called the volley principle.
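The arithmetic behind the volley principle can be sketched in a few lines. The sketch below is only an idealization: it assumes perfectly regular fibers that take turns firing on successive stimulus cycles, a ceiling of 1000 impulses per second on each fiber (the single-neuron limit cited earlier in this chapter), and a 5-kHz tone. The function names and the strict rotation scheme are illustrative assumptions, not Wever's own formulation.

```python
import math

def volley_spike_times(stimulus_hz, n_fibers, duration_s):
    """Assign successive stimulus cycles to fibers in strict rotation.

    Returns one list of spike times (in seconds) per fiber.
    """
    period = 1.0 / stimulus_hz
    n_cycles = int(round(duration_s * stimulus_hz))
    trains = [[] for _ in range(n_fibers)]
    for cycle in range(n_cycles):
        trains[cycle % n_fibers].append(cycle * period)
    return trains

def max_rate(train):
    """Highest instantaneous rate in a spike train (1 / shortest interspike interval)."""
    intervals = [later - earlier for earlier, later in zip(train, train[1:])]
    return 1.0 / min(intervals)

STIMULUS_HZ = 5000.0      # tone to be represented
FIBER_LIMIT_HZ = 1000.0   # assumed ceiling on any single fiber

# Smallest number of fibers that can share the cycles without exceeding the ceiling.
n_fibers = math.ceil(STIMULUS_HZ / FIBER_LIMIT_HZ)

trains = volley_spike_times(STIMULUS_HZ, n_fibers, duration_s=0.01)
combined = sorted(t for train in trains for t in train)

for label, train in zip("abcde", trains):
    print(f"fiber {label}: {len(train)} spikes, peak rate about {max_rate(train):.0f} per second")
print(f"combined: {len(combined)} spikes, peak rate about {max_rate(combined):.0f} per second")
```

No single fiber in this sketch exceeds its assumed ceiling, yet the pooled train carries one impulse per stimulus cycle. Whether real fibers synchronize well enough to do this is an empirical question that the sketch does not address.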
The basic idea of the volley principle is shown in Figure 10.5. This hypothetical synchronization of nervous activity in parallel fibers is specifically invoked to deal with the problem of how the limited frequency response range of a single auditory neuron could possibly represent the very high frequencies at the middle and upper portion of the range of acoustic perception. Wever suggested that if one neuron could not do the job, perhaps several working together might be able to do so. Thus, in this figure, we see how a group of five neurons, each capable of firing only once every millisecond, collectively might be able to fire at a 5-kHz rate. Originally some critics thought that this level of synchronization was simply asking too much of the nervous system, but recent work (Rose, Brugge, Anderson, and Hind, 1967) suggests that these early critics were too modest in their estimate of the synchronization capabilities of the acoustic nerve and that, in fact, such volleying may actually be occurring. It now seems as if the major issue is to determine the range in which volleying can work and the upper limits of this type of frequency encoding, rather than to determine if it does work at all.

[Figure: spike trains for fibers a through e and for fibers a-e combined.]
FIGURE 10.5 A diagram demonstrating the basic premise of the volley theory. The notion is that a group of nerves, none of which is individually capable of firing at the frequency of a sound stimulus, can collectively convey the necessary frequency information (from Wever, 1949).

This, then, concludes our brief survey of the classic theories of auditory quality coding. The theories, historically, can be seen to be characterized by either the place or the frequency representation of different acoustic frequencies, and most recently by a trend which seems to rationalize the two approaches. At this point, it is necessary to turn our attention from theory to neurophysiological data, for in the past two decades a particularly high quality of experimental evidence, which is directly relevant to the neurophysiological postulates of the classic theories of audition, has been accumulating.

III. THE BIOLOGICAL DATA

Classical auditory theories differ somewhat from classic theories of visual quality in that they are, in general, much more specific about the neurophysiological mechanisms of nervous transmission. Specific assumptions have been made in almost all instances concerning the details, not only of the receptor mechanisms, but also of the patterns of neural encoding of the messages ascending toward the central nervous system. The net result of this orientation is that many features of the auditory theories are directly subject to empirical testing. We shall follow the same sequence used in the discussion of the visual modality, namely, to look at the encoding processes at various stages of the ascending auditory pathway. In this way, it is hoped that we shall be able to resolve some of the controversial issues raised by the psychophysical data and either reject or confirm some of the specific neural postulates of each type of theory. In the following section, we shall first examine what is known of the mechanical action of the cochlea in order to evaluate the various notions of cochlear function and coding and then the neural response patterns. We shall then consider the various coding schema used at higher levels of the pathway in order to evaluate what spatial and temporal parameters seem to be called into action at these levels to represent auditory quality.
A. The Cochlea

1. The Wave Mechanics of the Cochlea as Elucidated by Direct Visual Observation.

Although it is quite certain that models and theories and indirect, but suggestive, findings are important in the development of our understanding of the various mechanisms of the auditory system, it is also quite clear that direct observations of the mechanical action of the cochlea would be of the greatest importance in selecting among hypothetical alternatives. It is also quite certain that the scientist almost single-handedly responsible for the development of the direct observation methods and the elucidation of the actual physical response in the cochlea is Georg von Bekesy, who in 1961 received the Nobel Prize, primarily for his contributions in this area. Although others have contributed to this field in recent years, von Bekesy's papers on this topic, which began in 1928, are still the best contemporary general statements on the mechanical and hydrodynamic aspects of cochlear function. Fortunately for the interested student, all of von Bekesy's papers up to 1960 have been collected and translated and have been published as a single volume (von Bekesy, 1960). In the early 1920's, von Bekesy had turned his attention to the problem of the cochlear localization of sounds. His interest in this problem and the
publication of his theories and findings continued for the next three decades. The primary anatomical technique he used involved the decalcification of a portion of the temporal bone removed from a fresh cadaver. A dental-type grinding tool was then used to remove a small portion of the bony wall, exposing the apical end of the basilar membrane. Because of the entrance angle in his early studies, only about one-third of the basilar membrane was usually brought into direct view by the technique. Tiny carbon or alumina particles suspended in a saline solution were then flushed into the cochlea and allowed to deposit themselves on the basilar membrane. These deposited particles made clearly visible these otherwise nearly transparent tissues. The action of the basilar membrane could then be directly observed with a microscope as a function of the frequency or amplitude of the stimulating tone produced by either an earphone or a tuning fork.

One of von Bekesy's first observations with this preparation was that there were no standing wave patterns as had been suggested by Ewald. There were, instead, patterns of traveling waves initiated, which moved from the oval window toward the helicotrema at the apical end of the cochlea. By using stroboscopic illumination, von Bekesy was able to observe these traveling waves in what was effectively slow motion and observe their velocity, direction, and amplitude as the stimulus characteristics varied. Another early observation was that there were observable displacements at the apical third of the cochlea only when lower-frequency stimuli were used. Frequencies higher than 300 or 400 Hz apparently were not capable of producing deformations anywhere within the region of the basilar membrane that von Bekesy could see.
Thus, the notion that the apical end is the region of localization of the low-frequency tones was supported and at least tentative support given for the suggestion that high-frequency tones might be localized near the oval window.

Another early and exceedingly important observation in this germinal study was that the traveling wave did not have the same amplitude throughout its visible course. Rather, there appeared to be a gradual increase in the amplitude of the wave of distortion on the basilar membrane as it came into view, and then a decrease as it moved past some point of maximum amplitude. Figure 10.6 shows the amplitude of the wave of distortion at two different positions of the wave produced by a 200-Hz signal as it moves along the basilar membrane. Note particularly the outer dashed line, which is the envelope of the maximum amplitude of the wave. This particular frequency peaks at a distance of 28 mm from the stapes. Other frequencies were seen to peak at other locations.

[Figure: horizontal axis: distance from stapes (mm), 20 to 32.]
FIGURE 10.6 A drawing showing the observed amplitude and position of a traveling wave produced by a 200-Hz signal at two different instants as indicated by the solid and inner dashed lines respectively. The outer dashed line indicates the envelope of maximum amplitudes for all positions. The maximum amplitude of this outer envelope occurs at different locations for the different frequencies (from von Bekesy, 1947).

The substance of this very important contribution by von Bekesy is that the maximum amplitude of the traveling wave occurs at a different locus as a function of the frequency of the acoustic stimulus. In later years, he was able to extend his direct observations of the cochlear vibrations by cutting out other small portions of the bony wall in different preparations and thus determine the point of maximum vibration up to about 1600 Hz. Figure 7.7 presented the results of this series of experiments. These data indicate the point on the cochlea at which the maximum amplitude of
vibration occurred for a given set of test frequencies. Figure 10.7 is a more detailed presentation of the overall amplitude pattern as a function of cochlear locus of the traveling wave for seven different frequencies. In this modified way, then, exceedingly compelling support is given to notions of place encoding of all stimulus frequencies above 50 Hz.

Another important finding from these direct observations of cochlear vibrations was von Bekesy's demonstration that the pattern, wherever maximized and for whatever frequency, is extremely broad. There was never any sharply defined region activated by even the purest monofrequency. This is a critical observation, because it stands in sharp contrast to the fine frequency discriminative ability displayed in psychophysical tests. Furthermore, it also contrasts with many of the earlier theories of cochlear localization that proposed sharp tuning due to resonance. At first glance, the psychophysical and neurophysiological sets of data seem almost antagonistic, for von Bekesy reports that stimuli differing in frequency by as much as 20 to 40 Hz did not exhibit noticeably different patterns of basilar membrane response. Yet subjects can discriminate signals that differ by as little as 2 or 3 Hz in psychophysical experiments. Over the years this discrepancy has led a number of workers, including von Bekesy, to speculate about mechanisms that sharpen the response of the system to allow for better discrimination. Usually these discussions revolve around the possibility of neural interaction of a spatial sort like those found in the visual system that we have already discussed in Chapter 7. Von Bekesy (1967) has published an entire volume dealing
with the problem of neural interaction and its general relevance to many different perceptual phenomena. But the reader is also reminded of the absence of evidence supporting central sharpening. Tonndorf (1970), however, points out that there is some reason to believe that there may be some sharpening in the mechanical system itself, which would explain why even first-order neurons (see below) appear to have narrower tuning curves than does the cochlear mechanical apparatus.

[Figure: amplitude envelopes for 25, 50, 100, 200, 400, 800, and 1600 Hz; horizontal axis: distance from stapes (mm).]
FIGURE 10.7 A set of seven amplitude envelopes similar to the one shown in Figure 10.6, showing the difference in the localization of the maximum as a function of the stimulus frequency. Note that the peak amplitudes produced by the higher frequencies tend to be located more toward the oval window and those produced by the lower frequencies near the helicotrema (from von Bekesy, 1949b).

It should be noted, at least in passing, that von Bekesy's experiments have been criticized for a number of different reasons. In a separate section of his 1960 book, he has considered these issues and replied to them. First, some critics had noted that the amplitude of the stimuli he used had been very high and suggested that perhaps the responses at these stimulus amplitudes gave a distorted picture of what was actually going on. Von Bekesy's rebuttal centered around the fact that over the range of stimulus intensities that produced any observable response, the amplitude of the
response was linear. He felt that it was, therefore, highly unlikely that at lower intensities the response would become nonlinear and thus, in any fundamental way, be different from that observed at his experimental levels. A second criticism was that there might be some change in the elasticity of the basilar membrane after the death of the experimental subject. Von Bekesy noted that there was a 100 to 1 difference in the stiffness of the basilar membrane from one end of the cochlea to the other and that, therefore, any small postmortem change in this critical property could have little effect on the general outcome. Finally, the criticism was made that an opening in the cochlear wall might change the tube properties of the cochlea. However, von Bekesy was able to show that this was not the case; only the amplitude of the response seemed to be affected by the hole and not the shape of the response.

In discussing these comments, von Bekesy also reported another important discovery. The specific physical parameter of the basilar membrane, which is responsible for the production of traveling waves and the specific shape of these waves, is the stiffness of the basilar membrane. It is upon this mechanical property that place localization on the cochlea is dependent, and it is the wave action on the more or less stiff sheet of the basilar membrane that must take the place of tuned resonance in any modern place theory of auditory quality coding. We shall discuss this point further later in this chapter.

Other workers have recently extended von Bekesy's direct observation technique to related problems. Among the foremost of these scientists to apply the technique has been Tonndorf (1962), who considered the problem of the auditory analysis of mixed acoustic frequencies. He introduced a complex wave consisting of a combination of a 50- and a 100-Hz tone after first stimulating the cochlea with the two tones separately.
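What it means to analyze such a two-tone mixture into its component frequencies can be illustrated numerically. The sketch below is not a model of cochlear mechanics; it simply applies a discrete Fourier transform to a sampled mixture of a 50-Hz and a 100-Hz sine wave and reports which frequencies carry appreciable amplitude. The sampling rate, duration, equal tone amplitudes, and the 0.5 threshold are all arbitrary assumptions made for the illustration.

```python
import cmath
import math

SAMPLE_RATE = 1000          # samples per second (assumed)
N = 1000                    # one second of signal, giving 1-Hz frequency resolution

def tone(freq_hz, amplitude=1.0):
    """Sampled sine wave of the given frequency and amplitude."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE) for n in range(N)]

# Complex stimulus: equal-amplitude mixture of a 50-Hz and a 100-Hz tone.
mixture = [a + b for a, b in zip(tone(50), tone(100))]

def dft_magnitude(signal, freq_hz):
    """Amplitude of the component at freq_hz, from a direct discrete Fourier sum."""
    acc = sum(
        x * cmath.exp(-2j * math.pi * freq_hz * n / SAMPLE_RATE)
        for n, x in enumerate(signal)
    )
    return 2 * abs(acc) / len(signal)

# Evaluate the spectrum at every whole-number frequency from 0 to 200 Hz.
spectrum = {f: dft_magnitude(mixture, f) for f in range(0, 201)}

# Frequencies whose amplitude clearly stands out from the numerical noise floor.
peaks = sorted(f for f, magnitude in spectrum.items() if magnitude > 0.5)
print("component frequencies found (Hz):", peaks)
```

The single summed waveform yields two separate spectral peaks, one per component, which is the sense in which a bimodal displacement pattern would count as a physical correlate of Ohm's acoustic law.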
Figure 10.8 (a) shows the displacement pattern along the basilar membrane for the two tones individually. Each has a maximum amplitude in approximately the locus observed by von Bekesy. When a mixture of the two tones was introduced into the ear, however, the resulting displacement function showed a bimodal "humping" consistent with a theory of simple Fourier analysis into component frequencies by the mechanical action of the cochlea. This is shown in Figure 10.8 (b). Thus, there appears to be a physical correlate of Ohm's acoustic law visually observable in the shape of traveling waves on the basilar membrane.

[Figure: (a) single events; (b) complex event; horizontal axis: distance from stapes.]
FIGURE 10.8 Envelopes of traveling waves. (A) shows the envelopes for a 50- and a 100-Hz tone, respectively, while (B) shows the envelope of the combined stimulus. This form of response is considered to be evidence for acoustic frequency analysis at the level of the basilar membrane, since two peaks corresponding to the components of the combined wave appear in the latter's amplitude envelope (from Tonndorf, 1962).

2. The Wave Mechanics of the Cochlea as Elucidated with the Mössbauer Effect: A Promising Technique.

When a radioactive source moves about in space, there is a shift in the frequency of any radiation emitted by that source. This Doppler shift of frequency can be used as a remarkably sensitive means of recording very small velocities. The sensitivity is such that velocities as small as a few feet a year, for example, can be measured with appropriate gamma ray detection and measuring equipment. The shift in the gamma radiation frequency as a function of the velocity of the source is known as the "Mössbauer effect" and has been applied, because of its exquisite sensitivity, to the measurement of the motion of the basilar
membrane. Johnstone and Boyle (1967) seem to have been the first to apply the technique in a preliminary study, but a much more complete experiment resulting in a much more comprehensive set of data has been reported by Rhode (1971). The general technique described by Rhode involves the surgical opening of the cochlear cavity and the placement of a small cobalt-57 radioactive source on the basilar membrane not far from the round window. The site of the observation in this case was, therefore, quite different from that used by von Bekesy, most of whose observations were made near the apical end of the cochlea. All of the measurements made by Rhode on about 20 squirrel monkeys unfortunately were recorded from almost exactly the same single point on the basilar membrane. The velocity of the movement at this point was directly measured with the Mössbauer apparatus and, from this basic measure, the amplitude and phase distortions of the motion calculated.

The fact that the measurements were made at only one place on the basilar membrane (in addition to comparison measurements on the malleus of the middle ear) restricts the general importance of Rhode's application of the Mössbauer technique to the problem under discussion. The data obtained are clearly limited to the tuning of that single point; it remains for some future investigator to perform the tedious and difficult task of sampling the spatial distribution of the cochlear response at a very large number of positions to give the full and complete story of cochlear localization. However, it is clear that some of the more important features of the von Bekesy theory of traveling wave localization are confirmed in Rhode's experiment. Rhode determined the amplitude of the response to frequencies varying from 1 to 10 kHz for both the malleus and the basilar membrane, and then calculated the ratio of the amplitudes to give a more accurate description of the tuning of the basilar membrane.
The ossicular chain, of course, has its own fundamental mechanical properties, and it is the transfer of energy between ossicles and the basilar membrane that critically defines the response pattern of the basilar membrane, not the relation between the amplitude of oscillation of the basilar membrane and the external acoustic stimulus, that latter relation being confounded by the dynamics of the ossicles.

Figure 10.9, for example, shows the absolute amplitude measurements for both the malleus and the basilar membrane. When stimuli of varying frequency are presented, the response functions of both are clearly broadly tuned and quite unlike the sort of curve one would expect from either the known psychophysical or neurophysiological data.

[Figure: amplitude curves for the malleus and the basilar membrane at 80 db SPL; horizontal axis: frequency (kHz).]
FIGURE 10.9 Amplitude of vibration as a function of stimulus frequency measured with the Mössbauer effect both at the malleus and at a single point on the basilar membrane near the oval window (from Rhode, 1971).

When the response function for the amplitude of the basilar membrane is divided by that for the amplitude of the malleus, however, a transfer ratio, which more clearly depicts the specific differential sensitivity of the basilar membrane, is obtained. Figure 10.10 is an example of such a ratio tuning curve. This curve clearly reflects the peaking of the tuning curve at about 8 kHz. Thus, this portion of the cochlea seems to be a region that is primarily associated with the place representation of an 8-kHz tone, a finding which once again is consistent with the notion that high tones are spatially localized at the basal end of the cochlea.

An additional important observation contained in the graphs shown in Figure 10.10 is that the ratio curves do not perfectly overlap for the three different sound pressure levels used. This is an important observation, for these nonconstant ratios suggest that there is, indeed, a nonlinearity of the transfer function of the basilar membrane. This is extremely important, for it points to the anatomic level at which some of the combination tones might be introduced into the auditory message, namely, the initial mechanical cochlear analysis.
Another interesting point noted by Rhode is that the threshold for auditory signals has an amplitude that is only a small fraction of an angstrom unit, thus confirming indirect calculations made many years earlier. Clearly, the Mössbauer technique is going to be extremely important in helping to definitely resolve the problems of cochlear mechanics.
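The logic by which non-overlapping ratio curves imply a nonlinearity can be sketched with two toy systems. The sketch assumes nothing about Rhode's actual measurements: the "linear" and "compressive" input-output rules, the gain of 0.5, and the exponent of 0.8 are hypothetical choices made only to show that a linear system yields the same output-to-input ratio at every level, whereas a nonlinear one does not. The three levels of 70, 80, and 90 db are taken from the sound pressure levels mentioned for Figure 10.10.

```python
import math

def amplitude_from_db(level_db):
    """Relative input amplitude corresponding to a level in decibels."""
    return 10 ** (level_db / 20)

def linear_system(amplitude):
    """Hypothetical linear stage: output is a fixed fraction of the input."""
    return 0.5 * amplitude

def compressive_system(amplitude):
    """Hypothetical nonlinear stage: output grows more slowly than the input."""
    return amplitude ** 0.8

def transfer_ratio_db(system, level_db):
    """Output-to-input amplitude ratio, in decibels, at one input level."""
    input_amplitude = amplitude_from_db(level_db)
    return 20 * math.log10(system(input_amplitude) / input_amplitude)

def spread(values):
    """Largest difference among the values; zero means the curves would overlap."""
    return max(values) - min(values)

LEVELS_DB = [70, 80, 90]

linear_ratios = [transfer_ratio_db(linear_system, level) for level in LEVELS_DB]
compressive_ratios = [transfer_ratio_db(compressive_system, level) for level in LEVELS_DB]

print("linear ratios (dB):     ", [round(r, 2) for r in linear_ratios])
print("compressive ratios (dB):", [round(r, 2) for r in compressive_ratios])
print("spread, linear:     ", round(spread(linear_ratios), 6))
print("spread, compressive:", round(spread(compressive_ratios), 6))
```

In the toy linear case the ratio is identical at all three levels, so ratio curves taken at different levels would coincide; in the toy compressive case they separate. Only the direction of this inference is carried over to the basilar membrane data, not the particular form of the nonlinearity.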
[Figure: curves for sound pressure levels of 70, 80, and 90 db.]

[Figure 10.11, panel (d): left ear; exposure to 4000 Hz for 4 min at 110, 120, and 130 db; average losses 2.3, 23.0, and 41.0 db. Horizontal axis: frequency (Hz).]
FIGURE 10.11 Hearing losses that are produced by sustained exposure to pure tones. (a) Shows the normal audiogram before and after the test series. (b), (c), and (d) show the effects of 500-, 2000-, and 4000-Hz stimulation at the levels and for the durations indicated. Low-frequency stimuli tend to impair hearing all across the spectrum. High-frequency stimuli tend only to impair higher-frequency hearing (from Davis, Morgan, Hawkins, Galambos, and Smith, 1950).
[...] frequency half an octave higher. It may be hypothesized that these effects are correlated with the direction of movement of the traveling wave, which moves from the high tone encoding region at the basal end of the cochlea to the low tone encoding region at the apical end. Presumably, a very intense sound frequency would deactivate all more basal portions of the basilar membrane, in addition to the portion at which it produces its peak amplitude of deformation.

A similar finding obtains when selective cochlear destruction is carried out with drugs, which act selectively on the cochlea. Stebbins, Miller, Johnsson, and Hawkins (1969) used the antibiotics kanamycin and neomycin, which are known to produce specific destruction of the hair cells, to achieve selective cochlear destruction. The destructive action of these drugs begins at the basal end of the cochlea and gradually, with continued administration over a period of months, works its way progressively toward the apex. Stebbins and his colleagues first measured the normal audiograms of five rhesus monkeys with some ingenious animal psychophysical procedures (see Stebbins, 1970), and then administered drug doses for periods varying from one to six months. After a given period of time, during which progressive cochlear degeneration occurred, the animal was retested and then immediately sacrificed and microscopic examinations made of the then existing cochlear structure. Missing axons in the cochlear nerve, degeneration of hair cells, and even massive destruction of the organ of Corti were observed by Stebbins and his colleagues after one-to-six-month periods of antibiotic administration. Associated with this neural deterioration were deficiencies in the audiogram. These deficiencies were characterized by sharp cutoffs of the high-frequency components, but little loss in the low-frequency regions.
The amount of high-frequency hearing 1055 increased with the passage of time until there was a broad range of tonal deafness after ab out six months of antibiotie administration. Figure 10.12 shows the general trend of increasing hearing 1055 as a function of frequency for increasing drug administration periods. Associated with these relatively sharply cutoff hearing losses were equivalent sharp cut offs of the limits of hair-cell destruction as determined in the postmortem. Figure 10.13 shows the anatomie destruction as measured by the number of hair cells remaining at various positions along the basilar membrane after various periods of drug administration. Obviously, the one animal to whom neomycin had been administered for a short time had been far more grossly affected by that treatment than had those animals who had received kanamycin, a less toxie antibiotie. The general results of this experiment, therefore, are consistent with those of the adaptation experiments described above. Small lesions at the basal end of the cochlea seem to be associated with the 1055 of only veryhigh-frequency tonal coding. As the lesion increases in size, including more and more of the basal end of the basilar membrane, there is a gradual increase in the amount of 1055 of lower-frequencies hearing 1055 with ever 10wer frequencies dropping out in sequence. Another way of experimentally producing controlled lesions along portions of the basi1ar membrane is "simply" to insert a probe and
[Figure 10.12. Monkey M-14, left ear; I.M. kanamycin, 100 mg/kg/day for 180 days. Curves labeled 2, 4, 8, 11, and 15 kHz; horizontal axis: days.]
FIGURE 10.12 Hearing losses as a function of duration of antibiotic treatment for four different frequencies. Obviously, high-frequency sensitivities are lost earlier than low-frequency ones, and this loss is correlated with destruction of the basal end of the organ of Corti in the earlier stages of drug administration (from Stebbins, Miller, Johnsson, and Hawkins, 1969).
mechanically destroy these delicate tissues. This sort of experiment is, in reality, quite technically difficult to execute, but in some instances it can be done. For example, auditory effects of mechanical destruction of portions of the cochlea have been reported by Gross (1952) and Schuknecht and Neff (1952). Each of these experiments involved the loss of increasing amounts of the cochlea in a series of guinea pigs starting at the apex, the most surgically accessible region. Both found that increasing the area of destruction increased the bandwidth of low-frequency tones that were no longer perceived, and involved more and more high-frequency tones according to behavioral tests.
Guinea pigs are not people, however, and human data are always of prime interest. Unfortunately, it is not ethically possible to surgically manipulate the living human cochlea just for the purposes of an experiment of this sort. Therefore, what data are available have always depended upon accidental cochlear injuries, as measured with audiograms taken during life and subsequent postmortem examination of the tissue loss. Bredberg (1968) has recently published a monograph, in which a series of such studies was carried out. He discovered that, in general, degeneration at the basal end of the cochlea was associated with severe hearing loss for speech sounds, while degeneration at the apical end only modestly affected speech perception. This is probably due to the special importance of middle and high frequencies to speech perception and also supports the notion of high-frequency localization at the basal end of the cochlea.
A final type of selective deactivation experiment exaggerates the temporary hearing loss produced by a moderately intense adaptation tone. In this case, the ear is flooded with very-high-intensity sound waves of a single frequency for an extended period of time, and the experimenter then
[Figure 10.13. Curves for monkeys M-13 R, M-14 R, M-16 R, and M-21 R; horizontal axis: length of basilar membrane, mm from basal end (base to apex).]
FIGURE 10.13 Curves showing the degree of damage to the inner (IHC) and outer (OHC) hair cells as a function of the type of antibiotic and the duration of the period over which it is administered. Monkeys 16, 13, and 14 were given kanamycin sulfate for 5, 28, and 180 days, respectively, resulting in increasing amounts of damage. Monkey 21 received neomycin sulfate for only 15 days, but in this brief time had much more severe damage to the organ of Corti than that caused by even longer periods of administration of kanamycin sulfate. This was particularly true for the inner hair cells, which were almost totally destroyed in this brief time (from Stebbins, Miller, Johnsson, and Hawkins, 1969).
examines the anatomical destruction produced by the various frequencies. This sort of experiment was initially performed by Smith (1947), who found that low-frequency tones produced broad bands of cochlear destruction near the apex, while high-frequency tones produced narrower bands of destruction closer to the oval window. Unfortunately, for the simplicity of the story we want to tell, when the cochlear microphonic correlates of such destruction were measured (Smith and Wever, 1949), the effect of any destruction stimulus frequency was found to be very widely spread across the auditory spectrum. But as we know, the microphonic is picked up simultaneously from widely spaced portions of the cochlea. Low tones, however, did produce a generally greater deficit than did
[Figure 10.14. Panels: 300 Hz, 1000 Hz, 5000 Hz, and 10,000 Hz; horizontal axis: distance from oval window (mm).]
FIGURE 10.14 Charts showing the degree of damage to the organ of Corti from sustained high-intensity sounds at the frequencies shown. Note that low frequencies damage the apical end of the cochlea in broad regions, while high-frequency stimuli selectively damage narrow regions closer to the basal end (from Smith, 1947).
high-frequency tones. Figures 10.14 and 10.15 show the range of anatomical destruction and the sort of impairment of the cochlear microphonic produced by various stimuli.
All of these data are characterized by a single theme: low tones appear to be represented by activity predominantly broadly localized at the apical end of the cochlea, while high-frequency tones seem to activate
[Figure 10.15. Curves for exposure tones of 300, 1000, 5000, and 10,000 Hz; horizontal axis: frequency (Hz).]
FIGURE 10.15 The effect of sustained high-intensity tones on the production of the cochlear microphonic. Though there is no solely local decrease in the sensitivity, low tones do produce a greater impairment than do the high tones, a fact which is in agreement with the wider spread of low-frequency responses and the widely dispersed source of the microphonic (from Smith and Wever, 1949).
narrower portions of the cochlea near the basal portion. The data do not, on the other hand, provide a strong argument for a unique localization of specific frequencies at specific places. Rather, all of it generally supports the notion that wide regions of cochlear activity are produced by even pure stimulus frequencies.
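As a rough illustration of the place interpretation that runs through these lesion studies, the sketch below assumes a hypothetical logarithmic map from position along the basilar membrane to best frequency, and asks which test tones would fall inside a lesion spreading from the basal end. The 25-mm length, the 20-kHz and 100-Hz endpoints, and the function names are illustrative assumptions, not values taken from the studies cited. The sharp cutoff it produces also ignores the broad spread of activity emphasized above.

```python
import math

# Hypothetical parameters, chosen only for illustration (not measured values).
MEMBRANE_LENGTH_MM = 25.0   # assumed length of the basilar membrane
BASE_FREQ_HZ = 20000.0      # assumed best frequency at the basal end
APEX_FREQ_HZ = 100.0        # assumed best frequency at the apical end


def best_frequency(distance_from_base_mm):
    """Assumed log-linear place map: best frequency falls exponentially toward the apex."""
    fraction = distance_from_base_mm / MEMBRANE_LENGTH_MM
    log_f = math.log(BASE_FREQ_HZ) + fraction * (
        math.log(APEX_FREQ_HZ) - math.log(BASE_FREQ_HZ)
    )
    return math.exp(log_f)


def lost_tones(lesion_extent_mm, test_tones_hz):
    """Return the test tones whose assumed place lies inside a lesion growing from the base."""
    cutoff_hz = best_frequency(lesion_extent_mm)
    return [f for f in test_tones_hz if f > cutoff_hz]


if __name__ == "__main__":
    tones = [2000, 4000, 8000, 11000, 15000]
    for extent_mm in (1.0, 3.0, 6.0, 10.0):
        # Each row: lesion extent, cutoff frequency under the assumed map, tones lost.
        print(extent_mm, round(best_frequency(extent_mm)), lost_tones(extent_mm, tones))
```

Under these assumptions, the highest tones drop out first and progressively lower ones follow as the lesion extends toward the apex, which is the qualitative pattern reported for the antibiotic experiments.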
The Cochlear Microphonic and Localization. In our discussion of the auditory transduction mechanism, we have already introduced the cochlear microphonic as a possible candidate for the role of the receptor potential. Over the years, a number of investigators have suggested that it might also play another important experimental role, even if ultimately it is shown not to be the actual receptor potential. The general idea was that a small, but not microscopic, electrode, if placed on the basilar membrane, might pick up potentials only from that immediate region. Thus, it should be possible to measure tuning curves and other response functions of local points by using the cochlear microphonic as an indicator (a useful sign even if it is not a code). Such data would contribute greatly to our knowledge of the localization of acoustic frequencies on the basilar membrane. One of the first studies that interpreted the cochlear microphonic in this way was reported by Tasaki, Davis, and Legouix (1952). They used a differential electrode configuration to show that electrodes placed in the first basal turn of the cochlea picked up microphonics associated with all frequencies, while at the third turn only low frequencies produced recordable responses. But there is some controversy surrounding this approach. For example, Dallos, Schoeny, and Cheatham (1971) have shown that microphonics recorded in the cochlea contain all frequencies at all electrode positions. The current view is that the cochlear microphonic response recorded with a gross electrode is not the restricted output of a very small region of the cochlea, but rather is the cumulative response of a rather broad region. In fact, Dallos and his co-workers believe that some portions of the cochlear potentials that are picked up at one end of the cochlea may actually come from the other end.
They have, however, suggested a means of discriminating local from distant cochlear microphonics, based upon a differential recording system in which two electrodes are placed in the scala vestibuli and scala tympani, respectively. Since it has been definitively established that the cochlear microphonic is generated in the basilar membrane, recordings from the two electrodes on either side of the membrane should be 180 deg out of phase. Only those signals that have a perfect 180-deg phase difference can be assumed to be associated with local activity, while signals with other degrees of phase shift would be considered to be generated at some distant locus. Figure 10.16 (a), (b), and (c) shows the phase angle response data as a function of frequency for electrodes positioned at various positions (one at each of the first three turns) of a guinea pig's cochlea. All three of the response curves show some evoked microphonic potentials across the entire stimulus spectrum utilized. Not all responses, however, have the 180-deg phase angle characteristic of locally generated signals. These curves may be interpreted as indicating that all frequencies including the very highest used (19 kHz) produce cochlear microphonics in the first turn. In the second turn, only microphonic potentials up to about 3000 Hz are produced locally, while at the third turn only very-low-frequency tones produce local cochlear microphonics. These data, therefore, also speak to the same point made in several different ways in this section, namely, that in general the cochlear responses produced by low tones are localized at the apical portion of the cochlea, while the responses produced by high tones are localized mainly at the basal end and central region of the cochlea.
In sum, then, some form of cochlear spatial localization of stimulus frequencies appears in the findings obtained with almost all of the different experimental methods we have discussed. Implicit in all these findings, therefore, is a very strong support of place theory at the cochlear level for the frequencies above 40 or 50 Hz. Below this limit, however, it seems that direct observations and indirect electrical measures all fail to show any differential localization.
[Figure 10.16. Panels (a), (b), and (c); horizontal axis: frequency (kHz).]
FIGURE 10.16 Phase angle measurements of the cochlear microphonic between the scala tympani (ST) and the scala vestibuli (SV) as a function of stimulus frequency. (a) As recorded in the first turn of the cochlea of five guinea pigs. (b) As recorded in the second turn of the cochlea of three guinea pigs. (c) As recorded in the third turn of the cochlea of three guinea pigs. See text for full details (from Dallos, Schoeny, and Cheatham, 1971).
B. The Response Area at Various Levels of the Ascending Auditory Pathway
In our previous discussion of spatial interactions, it was necessary to briefly introduce the notion of the response area of neurons in the auditory system. In the present section, we shall complete our discussion of this important set of data, emphasizing their current role in the support of place theories of auditory quality coding. The notion of response area was introduced by Galambos and Davis (1943), and their work has engendered a continuing series of follow-up studies. Galambos and Davis were interested in the response patterns of single auditory nerve fibers and explored the combined effect of stimulus intensity and frequency on the evoked spike response frequency.
Glass microelectrodes, filled with KCl, were inserted through a surgically prepared opening in the skull of a cat into the auditory nerve so that potentials were recorded from single cells. Their initial observations suggested that each probed cell had a characteristic tuning curve such that its threshold was lowest at one particular frequency, but that frequencies on either side of their best frequency were also able to excite the nerve if their intensities were high enough. As the frequency difference between a stimulating frequency and the best frequency increased, the amplitude of the stimulating frequency had to be progressively higher to elicit a neural response. The function representing the response area thus defined is V-shaped on a graph, in which frequency is plotted along the horizontal coordinate, and the necessary stimulus amplitude for a criterion response is plotted along the vertical coordinate. Figure 7.14 showed the general pattern of response as observed by Galambos and Davis for four auditory nerve neurons. This figure, now considered to be a classic, displays many of the important features that have been substantiated by more recent studies. One of the most significant of the features of this response curve is that it is not symmetrically V-shaped. The high-frequency cutoff region is steeper than the low-frequency cutoff. Furthermore, the width of the tuning curve is quite large. At the top of each of the curves, the width may be anywhere from 1500 to 3000 Hz, and this basic width increases as one goes from cells with low center frequencies to cells with higher center frequencies. This absolute increase in width with increase in frequency is often not appreciated, because these curves are always plotted on logarithmic horizontal axes, which obscure the increase in absolute bandwidth. On the other hand, it is also important to consider that the absolute width of the tuning curve may be less important than the ratio of the bandwidth to the central frequency. Kiang (1965) has suggested the use of the traditional engineering "Q" ratio to define the sharpness of the tuning curve:
$$Q = \frac{\text{center frequency}}{\text{width of response area 10 dB above threshold}} \tag{10.2}$$
and has plotted the Q's for a large number of auditory nerve fibers. This plot is reproduced in Figure 10.17 and shows that the relative sharpness of tuning, at least as measured with the Q index, gradually increases with center frequency above 2 kHz, a conclusion that is the complete opposite of that drawn from the absolute width data. Nevertheless, whichever view of the trend in width one accepts, the general conclusion that must be drawn from these data is that neurons of the auditory nerve do not exhibit anywhere near the same degree of narrow responsiveness at normal listening levels that is reflected in psychophysical tests of frequency discrimination.
Over the years, a large number of workers have investigated the response curves of auditory neurons at other levels of the ascending pathways. Galambos (1952) himself and Gross and Thurlow (1951) have worked at the level of the medial geniculate body; Tasaki and Davis (1955) and Rose, Galambos, and Hughes (1959) at the level of the cochlear nucleus; and Hind (1960) and others at the level of the cortex. Moushegian, Rupert, and Galambos (1962) were the first to show that there were inhibitory areas within the excitatory response areas at some levels, in addition to the adjacent inhibitory response area originally shown by Galambos (1944) and further developed by Greenwood and Maruyama (1965). (See Chapter 7 for a complete discussion of this material.) Because small differences in the details of procedure often make the results from different labs incomparable, Katsuki (1961) made a very important contribution to the problem when he studied the response areas at all of these levels within the confines of a single laboratory, with a single technique, and on a single species. Figure 10.18 reproduces his figure, showing the response curves recorded from the auditory nerve, the inferior colliculus, the trapezoid body, and the medial geniculate levels, respectively.
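The Q ratio of Equation 10.2 can be made concrete with a small sketch. The asymmetric V-shaped tuning curve below is a hypothetical stand-in for a measured response area: threshold is assumed to rise linearly in decibels per octave on either side of the center frequency, more steeply on the high-frequency side. The slopes, center frequencies, and function names are assumptions for illustration, not Kiang's data.

```python
def edges_above_threshold(center_hz, low_slope_db_per_oct, high_slope_db_per_oct,
                          level_db=10.0):
    """Frequencies at which an assumed V-shaped curve sits level_db above its tip.

    The curve is assumed to rise linearly (in dB per octave) away from the
    center frequency, with separate slopes on the low and high sides.
    """
    lower_hz = center_hz * 2.0 ** (-level_db / low_slope_db_per_oct)
    upper_hz = center_hz * 2.0 ** (level_db / high_slope_db_per_oct)
    return lower_hz, upper_hz


def bandwidth_hz(center_hz, low_slope, high_slope, level_db=10.0):
    """Absolute width of the response area level_db above threshold."""
    lower_hz, upper_hz = edges_above_threshold(center_hz, low_slope, high_slope, level_db)
    return upper_hz - lower_hz


def q_ratio(center_hz, low_slope, high_slope, level_db=10.0):
    """Q = center frequency / width of response area level_db above threshold."""
    return center_hz / bandwidth_hz(center_hz, low_slope, high_slope, level_db)


if __name__ == "__main__":
    # Two hypothetical fibers: (center frequency in Hz, low-side slope, high-side slope).
    for center, low, high in ((1000.0, 20.0, 40.0), (10000.0, 60.0, 120.0)):
        print(center, round(bandwidth_hz(center, low, high)), round(q_ratio(center, low, high), 2))
```

With these assumed slopes, the high-frequency fiber has the larger absolute bandwidth and also the larger Q, which is how the two apparently opposite trends described above can both hold.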
We have already discussed in Chapter 7 the importance of this finding in raising a question of the existence of central sharpening. Another distinction among cells at the various levels is also included in these data, to which our attention has been called by Simmons (1970). He points out that, at all levels, investigators have found both moderately wide response areas, like the ones we have been discussing, and also,
FIG. 5-18 The arrangement of the motor (A) and sensory (B) homunculi on the precentral and post-central sulci of the human brain, respectively. The figures emphasize the regular somatotopic order preserved in these regions but do not support any theory of discrete, nonoverlapping localization. (From Penfield & Jasper, ©1954, with the permission of Little, Brown, and Company.)
[Figure 5-19. Labels: N = Neck; T = Thorax; S = Shoulder; A = Arm; F = Forearm; H = Hand.]
FIG. 5-19 The map of the body on the somatosensory cerebral cortex has an analog in the external cuneate nucleus of the rat. This figure shows the somatotopic localization of various portions of the body in this small nucleus as well as its location in the brain stem. (From Campbell, Parker, & Welker, ©1974, with the permission of Elsevier/North-Holland Biomedical Press.)
area. A small vestibular cortical region has been identified by Andersson and Gernandt (1954).
With this brief background review of the other sensory and motor areas in hand, I can now turn to the visual system. The visual system, as noted, has been chosen because it is the best known and because the aspects of visual perception with which we shall be most concerned are well-grounded in the dimensions of the physical stimuli. Vision, therefore, presents the best opportunity to make explicit the general concepts of perceptual localization that are implicit in modern sensory theories.
1. Anatomy of the Visual Brain
Perhaps the first point to be stressed about the organization of the visual system is that it is multiple, not only in terms of cortical projection areas but also in
terms of the number of pathways that course from the eye to the brain. It is now well-established that, in addition to the "classical" and well-known pathway from the retina through the lateral geniculate body of the thalamus to the striate regions of the cerebral cortex (the retino-geniculo-striate pathway), there is also, in mammals, one other well-validated major visual pathway that projects from the retinae to the superior colliculus, or tectum (roof), of the brain stem. This extrageniculate or collicular pathway seems to perform functions associated with general visual orientation, in at least some animals, rather than precise pattern vision (Schneider, 1969; Trevarthen, 1968). A third visual pathway has also recently been suggested by Graybiel (1974) that passes from the retina to the cortex via the pretectal region of the brain stem. At this point, little is known of the possible function of this third visual pathway. Figure 5-20 diagrammatically plots one theory about the relationship among these three pathways.
For readers who are interested in a comparative analysis of the interconnections of the various brain nuclei in the visual systems of the various vertebrate classes, by far the best analysis is that presented by Ebbesson (1970). His diagrams, in particular, are rich fonts of understanding of the organization of this complex of multiple nuclei that mediates vision in vertebrates. Recent discussion with colleagues such as Glenn Northcutt and Steven Easter of the University of Michigan, who are far more expert than I with the anatomy of the vertebrate visual system, suggests that Ebbesson's diagrams only show the major visual pathways. In such animals as the frog, as many as a dozen distinguishable pathways may actually exist.
I have already noted the existence of the multiple visual projection areas on the surface of the mammalian cortex.
It seems fairly certain that the occipital pole of the cortex is the primary projection area of the visual cortex for the geniculo-striate pathway in primates and many other mammals. This occipital Visual Area 1, which shall be referred to as V1, is, in turn, surrounded by a peristriate region known as Visual Area 2 or V2. The inferotemporal (lower portion of the temporal lobe) cortex has also been known to be an important area for the processing of visual information. In recent years, many additional regions previously assumed to be association area cortex have either been demonstrated to serve important visual functions through evoked potential and single unit procedures or to be structurally linked to the visual pathways by means of degeneration techniques. It is now thought that there are possibly as many as a dozen visual areas on the surface of the primate cerebral cortex. For example, Allman and Kaas have extensively studied visual projections to the cortex in the owl monkey and have discovered, in addition to the three "classic visual areas" (V1, V2, and inferotemporal cortex), an additional visual area in the posterior portion of the temporal lobe (Allman and Kaas, 1971) and a crescent-shaped "middle temporal" visual area located between the peristriate region and the root of the temporal lobe (Allman, Kaas, Lane, &
FIG. 5-20 One theory of the organization of the visual system emphasizing three different visual pathways. The right side shows the classic retino-geniculo-striate pathway; the left side depicts the tectal and pretectal channels. Abbreviations indicated are: SC = superior colliculus; PT = pretectal region; LGd, v = dorsal and ventral nuclei of lateral geniculate body; LP = nucleus lateralis posterior; Pul = pulvinar; EW = nucleus of Edinger-Westphal; Da = nucleus of Darkschewitsch; Is = nucleus interstitialis of Cajal; Teg-RF = tegmentum, reticular formation; NR = nucleus ruber, perirubral fields; NPp = nucleus papilioformis; Sp Cd = spinal cord; and I. olive = inferior olive. (Figure and abbreviations from Graybiel, ©1974, with the permission of The M.I.T. Press.)
Miezin, 1973). In addition to (or perhaps instead of) these areas, by employing degeneration techniques, Zeki (1971) has shown that the peristriate area (V2) of old world monkeys projects to other regions, which, although they do not maintain the retinal topographic organization, also appear to serve comparable visual functions. These supplemental visual areas are now referred to as V4, V4a, V5, and PSTS (the posterior superior temporal sulcus) by Zeki. In addition, V2 itself, he believes, may actually be two separate regions designated as V2 and V3.
The interconnections of these cortical and subcortical centers are at least partially summarized for old world monkeys in Fig. 5-21. This figure indicates that some of the signals initiated in the retina project directly to the striate cortex and then pass to the peristriate (or circumstriate) cortex. The next step in this pathway conducts to the multiple visual cortical regions, described by Zeki, and from there to the inferotemporal cortex. This diagram also indicates one of the possible extrageniculate pathways from the retina to the superior colliculus and then to the pulvinar, from where paths have been tracked to the cortex. Although this diagram differs slightly from the one shown in Fig. 5-20, both are included to indicate two different modern theories of visual neuroanatomy. Another similar "theory" of mammalian visual system organization is embodied in Figure 6 of Ebbesson (1970).
According to Charles Gross and his colleagues (Gross, Bender, & Rocha-Miranda, 1974), visual pathways from the inferotemporal cortex then pass to both the frontal lobes of the brain and to subcortical centers of the brain stem, including those especially interesting ones of the limbic system. Obviously, in most mammals, a very large portion of the brain is potentially involved in visual function according to current anatomical evidence. A much more realistic diagram of the organization of the visual system of a typical primate is shown in Fig. 5-22. In the remaining portions of this section, I review the available data that indicate the particular roles played by the more important of these brain loci in visual behavior.
2. The Role of Visual Areas 1, 2, and 3: The Striate Cortex and the Peristriate Cortex
It would be most satisfying if a simple answer could be given to the question: What happens to visual behavior when a particular visual area of the cortex is ablated? However, the history of this question is enormously complicated both by the anatomical and surgical complexities, on the one hand, and the difficulties in assessing behavioral changes with more or less arbitrary measures, on the other. Among the earliest modern studies of the effects of ablation of the striate cortex (V1) were the reports from the laboratory of Kluver (1941). Kluver removed what he believed to be V1 and concluded, on the basis of tests of visual form discrimination, that the operated animals seemed to have lost all pattern vision. His experimental animals were unable to detect the difference between triangles and squares, for example, and exhibited a marked deficiency in moving about in an obstruction-filled field. In other words, the animals appeared to be blind to everything except the overall brightness of the visual environment. However, in recent years, a number of workers, most notably Lawrence Weiskrantz (as summarized in Weiskrantz, 1974), have suggested that the conclusions
FIG. 5-21 Another possible theory of the organization of the primate visual system. (From Gross, Bender, & Rocha-Miranda, ©1974, with the permission of The M.I.T. Press.)
FIG. 5-22 A more realistic depiction of the visual pathways in the primate brain. (From Kuypers et al., ©1965, with the permission of Academic Press.)
drawn by Kluver from his ablation experiments may be in need of considerable modification. Weiskrantz believes that animals that have total destruction of V1 (including the inevitable retrograde degeneration of the lateral geniculate nucleus) can still discriminate patterns (Pasik & Pasik, 1971; Weiskrantz, 1972), although less well than normal animals. Weiskrantz also believes that his results indicate the striate cortex is actually a region in which detailed pattern discrimination is mediated, but only to the extent that it is particularly dependent upon fine spatial discriminations. He thus suggests that the animal without V1 is behaving more or less as if it were only suffering from a gross diminution of its spatial acuity, but that many other visual tasks involving form, localization, and brightness are still adequately handled. Weiskrantz (1974) goes on to report that when the peristriate cortex (V2 and V3) is ablated, in addition to V1, only then does the animal behave as if it were totally incapable of processing any information about visual form. However, when only the peristriate cortex is ablated, the results are far more subtle. The effects seem to be restricted to very complex sorts of visual information processing such as spatial relations among objects; on most simple visual tasks, the animals seem to do fairly well.
A specific explanation of how this residual form perception remaining after ablation of V1 may be mediated has been suggested by Dalby, Meyer, and Meyer (1970), who carried out V1 ablations on cats and also observed that some primitive pattern vision is maintained. In their experiments they used stimuli such as visual cliffs and checkerboards that varied in the length of their constituent visual contours. On the basis of the results, they suggested that the residual visual form perception was actually functionally related to cumulative differences in the length of the constituent visual contours.
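The contour-length idea can be illustrated with a small calculation. The sketch below builds binary checkerboards of a fixed overall size but different check sizes and counts the total length of their light-dark edges. The grid sizes and function names are illustrative assumptions and are not the stimuli used by Dalby, Meyer, and Meyer.

```python
def checkerboard(size, check):
    """Square binary pattern of side `size` with square checks of side `check`."""
    return [[((r // check) + (c // check)) % 2 for c in range(size)]
            for r in range(size)]


def contour_length(pattern):
    """Count unit-length edges between horizontally or vertically adjacent unequal cells."""
    rows, cols = len(pattern), len(pattern[0])
    edges = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols and pattern[r][c] != pattern[r][c + 1]:
                edges += 1
            if r + 1 < rows and pattern[r][c] != pattern[r + 1][c]:
                edges += 1
    return edges


def mean_luminance(pattern):
    """Fraction of light cells, a stand-in for total luminous flux."""
    cells = [value for row in pattern for value in row]
    return sum(cells) / len(cells)


if __name__ == "__main__":
    for check in (8, 4, 2):
        board = checkerboard(16, check)
        # Each row: check size, mean luminance, total contour length.
        print(check, mean_luminance(board), contour_length(board))
```

In this toy case the mean luminance is identical for all three patterns while the total contour length grows as the checks shrink, so a signal that merely summed contour-related activity could tell the patterns apart without representing their geometry.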
This explanation would thus relegate "residual form perception" to the cumulative amount of optic nerve activity rather than any stimulus structural sensitivity per se. This makes sense because activity is known to be associated with contour length as a result of lateral interactions within the retina. Because of a dimensional transformation, form would therefore have been coded as amplitude! Unfortunately for this simple explanation of what had been a rather perplexing observation, Dalby and the Meyers also observed that even when flux differences and contours were carefully balanced in displays composed of squares or circles, there was still some residual discriminability on the basis of form alone. Whether another form-to-intensity transformation was occurring or not could not be ascertained. Nevertheless, their work does suggest that it is possible for what appears to be a residual form discrimination ability to be explained in terms of a quasi-intensive code rather than one directly reflecting the geometry of the situation.
It is obvious that the complexity of the problem of what functions the visual areas of the cortex perform is great, and the technical difficulties are profound. There still is no general agreement as to even the basic questions. Ablating two
areas does not lead to a simple summation of the effects of the two independent surgical extirpations. Nevertheless, it is clear that ablation of these portions of the cortex (V1, V2, and V3) is more likely to produce a visual deficit involving form perception than any other behavioral deficit yet observed. However, discrepancies are far easier to find than generalities, because, as laboratory research has accumulated on animals other than primates, it has become clear that the regions that correspond in other mammals to V1, V2, and V3 in primates can be ablated with a remarkably small amount of pattern discrimination deficit being generated. If any generality is forthcoming, it is that in primates the finest form discriminations based on best acuity seem to be more closely associated with these visual projection areas than they are with other parts of the visual brain.
3. The Inferotemporal Cortex
Another region of the cerebral cortical mantle that has been shown to be intimately involved in visual learning is the inferotemporal cortex, the lower or caudal portion of the temporal lobe. The anatomy of the inferotemporal lobe, as observed both electrophysiologically and with degeneration techniques, reveals that its main inputs come from the striate cortex (V1) via the peristriate areas (V2 and V3), possibly by way of V4, V4a, and V5. However, it has become increasingly clear in recent years that this region must also receive inputs from the collicular and pulvinar pathways, as shown in Fig. 5-22. Originally all of the temporal lobe was thought to be generally involved in producing visual deficits (Klüver & Bucy, 1937, 1938). However, in later years, Mishkin (1954, 1966) demonstrated that it was only the lower or inferior portion of the temporal lobe that was responsible for the visual defects that had been observed by Klüver and Bucy. Microelectrode recordings show that the neurons of the inferotemporal area seem to be exclusively responsive to visual inputs (Gross, Schiller, Wells, & Gerstein, 1967; Gross, Bender, & Rocha-Miranda, 1969), adding further credence to the concept that it is almost uniquely a part of the visual brain.
The visual effects observed after ablations of the inferotemporal cortex are extremely subtle. They are exhibited most often in experiments in which the animal under study is required to perform some kind of visual learning task. However, not all visual learning tasks are equally affected, and ablations of different portions of the inferotemporal lobe produce different types of learning deficits. For example, consider Fig. 5-23, which depicts the effects of ablations of several different regions on different types of learning. The effects obtained when only a single discrimination is required in each trial can be inconsequential.
However, if the experimental task is only slightly complicated, for example, by having several discriminations concurrently present in each trial, massive performance deficits can be produced. According to Weiskrantz (1974), the posterior region of the inferotemporal cortex is involved in the selection of, and attention to, visual cues, but the anterior portion may possibly be more closely associated with visual memorization. Clearly, though, these terms are mere shorthand for mental processes that are complex enough to evade adequate explication at present.
FIG. 5-23 Graph showing the effects of different lesions on three different learning tasks (pattern relationship, concurrent learning, and object learning). The horizontal axis indicates the region of the lesion in the experimental and control animals. The extent of the lesion is indicated both in the drawing and by the arbitrary Roman numerals. The three vertical coordinate scales are necessary because of the different rates of learning of the three learning tasks. Abbreviations indicated are: N = number of animals in each group; and FH = a sham operated control group in which the hippocampus lobe was lesioned. (From Mishkin, © 1972, with the permission of Springer-Verlag, Inc.)
Two general facts emerge from this type of analysis. First, it seems clear that the inferotemporal cortex performs functions that are subsequent to or of a higher order than those performed by the earlier portions of the visual system. Thus, while the striate cortex seems to be necessary for fine-grain form perception, the inferotemporal cortex is much more intimately related to visual learning processes that are harder to define precisely. Lesions of inferotemporal cortex may have profound effects on even simple visual learning tasks. Yet, according to Gross (1973), previously learned information is not forgotten after inferotemporal lesions, and it seems almost certain that none of the more basic visual functions such as form or brightness discrimination are affected at all by inferotemporal lesions. Thus the inferotemporal lobe is certainly neither the storage unit (if there is any) for the acquired memories, nor is it required for the discrimination of form. Rather, its role seems to be more involved in controlling the acquisition, rather than in the actual storage, of visual information.
Meyer (1972) believes that the function of the inferotemporal cortex is even more specific. Noting that animals tend to learn how to learn, that is, to develop what Harlow (1949) called learning sets, Meyer has suggested, on the basis of extensive experimentation in which especially careful control was made of the tasks required of the experimental monkeys, that the deficits produced by inferotemporal lobe lesions were entirely explained in terms of a reduction in the ability to develop the learning set. Tasks that involved only short-term retention or single stimuli appeared to be totally unaffected even by very extensive inferotemporal lesions.
This then brings us back to the other main observation concerning the role of the inferotemporal cortex: it is responsive only to visual stimuli. Because this region receives inputs from at least two, and possibly three, independent visual pathways and seems to be heavily involved in learning, it might be regarded as a region specialized for the integration of information from multiple inputs. The integrative role of the inferotemporal cortex may, thus, be one in which links are established between converging inputs from the collicular and the geniculostriate visual pathways. As we have seen, fine pattern discrimination seems to be mediated in the latter, and, as we shall see in the next section, the collicular pathways seem to be more concerned with orientation and localization. The function that may best describe the role of the inferotemporal cortex is the merging and integrating of these two sources of visual input information prior to the selection of appropriate responses. However, there is an argument against even this role. Recent microelectrode studies (e.g., Rocha-Miranda, Bender, Gross, & Mishkin, 1975) have suggested that the visual responses of the inferotemporal cortex are totally abolished if the striate cortex is ablated.
The visual input to the inferotemporal region, therefore, may be exclusively through connectives that come from the geniculostriate pathway either directly or through the forebrain commissures and may not directly involve the collicular inputs. This issue of the role of the inferotemporal cortex is yet to be resolved. A useful and comprehensive review of the problem of inferotemporal lesions and their behavioral effects has been published by Dean (1976). He concludes that all of the inferotemporal lesion experiments are explicable in terms of either deficits in the ability "(a) to categorize visual stimuli or (b) to form associations with them [p. 41]." Clearly, at this point the difficulty inherent in such vague definitions of the relevant psychological constructs becomes dominant.
4. The Superior Colliculus
In the late 1960s, a considerable amount of interest in the visual functions of the region known as the superior colliculus (or tectum) of the brain stem was generated by the work of a number of psychobiologists. We have already seen
how the tectum and, perhaps, also the pretectal areas are now considered to be independent pathways of visual information flow³ from the peripheral retinae to the central nervous system. However, another role has been suggested for the superior colliculi that stresses an independent contribution that they may make to visual perception. Among others, Schneider (1969) has hypothesized that the collicular system, rather than being just another means of getting visual information to the cortex, may actually represent an independent and sufficient visual interpretive system. For example, in lower vertebrates, such as amphibia, the tectum is believed to be the main visual projection; and frogs, for example, which do not possess a cerebral visual cortex (see Ebbesson, 1970), perform quite well in their visual environments. Anatomical and ecological clues, like this one from lower vertebrates, led Schneider to study mammals to determine the respective roles of each of their constituent visual systems.
Working originally with hamsters, Schneider noted that specimens in which bilateral undercutting of the colliculi was performed seemed to have a considerable amount of difficulty in orienting toward stimuli. Upon initial examination, the animals appeared to be functionally blind; they stumbled about and were unable to direct their gaze at food objects. But more detailed testing showed that these animals could indeed "see" fairly well. They were, for example, able to discriminate among patterns. On the basis of such evidence, Schneider assumed that the two visual systems (the geniculostriate and the collicular pathways) are respectively responsible for what may be broadly considered to be two different kinds of vision.
He supported the concept that the geniculostriate system is responsible mainly for discriminating among forms and shapes, a function largely mediated by the high acuity region of the retina, the fovea, and added the idea that the collicular system is more involved in localization and orientation over the entire visual field but seems to be incapable of independent form recognition. The collicular system, Schneider proposed, acts to bring biologically significant stimuli into the foveal field of view for detailed examination and should, therefore, be closely associated with the musculature that controls eye movement and head position. It would thus be expected to have large receptive fields for its visually sensitive neurons.
³ Although I stress the visual function of the superior colliculus in this section, it is important to note that it also has other sensory functions. Dräger and Hubel (1975) have shown that single neuron responses can be elicited from this area in the mouse by acoustic and tactual stimuli as well as by the visual ones. The more general function of the superior colliculus, they suggest, is to represent the geographical environment of the animal whether the information is forthcoming from his eyes, ears, or whiskers. It is also possible, considering that the neural responses lead the ocular muscle contraction, that it also plays a motor role in eye movements.
A great deal of subsequent research has tended to reinforce the concept that the colliculus or tectum is involved mainly in aspects of visual perception that involve spatial localization and orientation. Some workers, such as Schiller and Koerner (1971) and Wurtz and Goldberg (1972), have convincingly shown that the collicular visual system is implicated in the control of eye movements. Schiller and Koerner have specifically shown that activation of the collicular system does bring a visual stimulus that may fall initially on the peripheral portion of the retina into the foveal field of view where the pattern perception mechanisms of the geniculostriate system may contribute to the recognition of the stimulus.
Just how collicular neurons may be involved in eye movements is demonstrated in Fig. 5-24. This graph, from Schiller and Koerner (1971), shows the direction and magnitude of a large number of saccades (eye movements). The open circles, randomly distributed across the field of this figure, are those in which there was an eye movement but no response from an extracellular microelectrode located close to a certain group of neurons in the deeper layers of the superior colliculus. However, the black dots indicate a set of saccades of a particular size and in a particular direction that was always preceded by activity in the neurons near the microelectrode. The important fact in this record is that the neurons from which the signals came seemed to be associated with a particular direction and magnitude of movement. The implication of this result is that the neurons in the deeper layers of the colliculus are very deeply involved in the control of eye movements. A further insight into the nature of this control lies in the fact that the saccades indicated with black dots were not all initiated from the same point. Rather, the neurons seemed to emit control signals that specified a size and direction of the saccade that was independent of its initial point. The superior colliculus, therefore, is presumed to send out signals that participate in the control of what must essentially be considered ballistic⁴ eye movements that tend to bring stimuli into the foveal view, based on the low acuity but general orientation information of the visual inputs it receives from the peripheral field of view.
FIG. 5-24 In this figure, the results of an experiment are displayed in which a microelectrode was inserted into a single collicular neuron. Each time a saccadic eye movement occurred, a dot or circle was plotted on the graph to indicate direction and size of the saccade. If the cell emitted a burst of activity prior to the saccade, a dot was plotted; if the cell was quiet, an open circle was plotted. The results of this experiment indicate that the cell impaled was associated with a very sharply defined range of saccadic movements. (From Schiller & Koerner, © 1971, with the permission of The American Physiological Society.)
There is one warning that should be introduced into this discussion, however, in the spirit of the critical analysis that I have been carrying out for this material. There is still some residual doubt whether these recorded neuronal responses are, in fact, the output of motor units. It is possible, quite to the contrary, that these are not motor neurons directly controlling eye movements but sensory neurons reflecting feedback signals from muscle receptors. This could be the case, even if the electrophysiological response led the observed motor response in time because of the inertial characteristics of the mechanical aspects of the oculomotor system.
Whatever the role of these deeply situated neurons, they do seem to be involved in some aspect of motor activity. When the electrodes were not thrust as deeply into the colliculus, Schiller and Koerner found a different pattern of results. In the superficial collicular layers, neurons seemed to display a purely visual sensory function. However, neurons located there failed to exhibit any of the sensitivities to the pattern, shape, or direction of movement that are so characteristic of the neurons in the striate and peristriate cortices. Rather, the neurons of these superficial layers of the superior colliculus are much more likely to have large receptive fields and to be highly sensitive to such general factors as the size of the stimulus than to its movement or shape.
⁴ A ballistic motion is one in which the trajectory is mainly determined by the forces exerted at the time of projection. There is no "course correction" or guidance subsequent to release of the missile.
In sum, the superior colliculus in mammals appears to act as an initial early-warning system that acquires visual stimuli and then contributes to the orientation of the animal, or its eyes, to bring the stimulus into the foveal retinal areas that the cerebral pattern perception mechanisms depend upon for their function. The proximity of the superficial sensory layers and the deeper motor layers suggests an integrative role for the colliculus in which sensory inputs are linked to fairly specific motor outputs.
5. Subcortical Mechanisms in Visual Perception
It is obvious that other subcortical centers as well as the superior colliculus, including the basal ganglia and the thalamus, may be involved in perceptual processing. Some human clinical observations summarized by Riklan and Levita (1969, chap. 5) lend substantial support to the notion that basal ganglia pathologies that produce symptoms of Parkinson's disease also produce associated perceptual difficulties, as evidenced in responses to drawing-type tests like the Bender Gestalt, or unstructured interpretive tests like the Rorschach. Riklan and Levita suggest that one of the more important roles played by the basal ganglia is in the integration of postural and visual afferent signals to determine the perceived vertical in a tilted-chair experiment. Proctor, Riklan, Cooper, and Teuber (1964) discovered that after therapeutic surgery of the basal ganglia for Parkinsonism, patients made many more errors in determining vertical alignment than did normal controls. However, as I have already noted, most human clinical data are very variable and difficult to interpret, and, unfortunately, there has been relatively little work done on perceptual deficits produced by experimental lesions of the basal ganglia in experimental animals with which to compare these human data.
However, a considerable amount of work has been done on rats, cats, and, in at least one important study, monkeys with regard to brain stem involvement in visual information processing and learning. Thompson and Myers (1971), in an important experimental study of the monkey's brain stem, have also reviewed a considerable amount of the material on perceptual effects produced by similar lesions in rats and cats. They note that research on rats and cats has pointed to three different areas of the brain stem in which lesions seem to impair visual learning and discrimination.
Specifically they state:
There are at least three circumscribed areas of the brain stem which may play an important role in visually guided behavior. One of these lies in the vicinity of the pretectum. In the rat, the critical pretectal focus appears to occupy the anterior half of the nucleus posterior thalami (Breen, 1965; McNew & Thompson, 1966; Thompson & Rich, 1961). In the cat, the results of one study (Thompson, Lesse, & Rich, 1963) suggest that the critical focus lies within the posterolateral portion of the pretectal area and the subjacent nucleus posterior thalami.
A second area of the brain stem critical to visual discrimination performance in lower forms is located within the ventromedial midbrain, particularly in the region of the red nucleus. Bilateral destruction of this area has repeatedly been found to impair the execution of a visual discriminative response in rats (McNew, 1968; Thompson, 1969; Thompson, Lukaszewska, Schweigerdt, & McNew, 1967) and markedly diminishes responsiveness to visual cues in cats (Myers, 1964; Sprague, Levitt, Robson, Liu, Stellar, & Chambers, 1963).
The third area of the brain stem supporting normal visual discrimination performance is located in a zone between the pretectal region and the red nucleus. In the rat, this area lies immediately lateral to the rostral extension of the central gray substance and descends caudally in close conjunction with the habenulopeduncular tract (Thompson, 1969; Thorne, 1970). Although less intensively investigated in the cat, this area seems to include the deep subcollicular region, the lateral portion of the central gray substance, and the subjacent tegmentum (Myers, 1964) [Thompson and Myers, 1971, p. 480].
FIG. 5-25 Maps of eight different levels of the brain stem. The animal was trained in a visual discrimination task, and then the lesions (indicated by circles) were made. Those on the left-hand side of each figure were effective in reducing the performance following recovery; those on the right-hand side were not. Obviously visual memory is influenced by almost all levels of the brain. Abbreviations indicated are: AT = area tegmentalis; CG = central gray; CM = centre medial; H = habenula; HYP = hypothalamus; IP = interpeduncular nucleus; LG = lateral geniculate nucleus; LP = lateral posterior nucleus; MB = mammillary bodies; MD = medial dorsal nucleus; MG = medial geniculate nucleus; NP = nucleus posterior; ON = oculomotor nerve; OT = optic tract; P = pons; PC = posterior commissure; PT = pretectal region; PUL = pulvinar; Rm = magnocellular division of red nucleus; Rp = parvocellular division of red nucleus; S = subthalamic nucleus; SC = superior colliculus; SN = substantia nigra; and V = ventral nucleus. (Figure and abbreviations from Thompson & Myers, © 1971, with the permission of The American Psychological Association.)
Thompson and Myers (1971) then report the results of their experimental studies of visual effects when lesions were produced at various levels of the monkey's brain stem. At each level, they assayed the visual effects of the lesion by using a simple discrimination task in which the monkey had to choose between different objects that might cover a piece of banana. Figure 5-25 summarizes the results of their study. This figure shows the extent of the lesions that were created at eight different levels of the brain stem, as determined with post-mortem histology. The lesions drawn on the left side of the figures indicate those areas that did produce a deficit; the ones on the right indicate areas that when damaged did not interfere with the kind of visual performance tested. In general, Thompson and Myers found that only two brain stem regions consistently produced a deficit: the pulvinar nucleus and a region composed of the posterior thalamic and pretectal nuclei. A number of other nuclei produced inconsistent interference with this type of visual discriminative task.
An important finding from this work was Thompson and Myers' discovery that the deficits produced by these brain stem lesions were the same as those produced by inferotemporal lesions. Thus they were led to suggest that the role in visual learning that had been ascribed solely to the inferotemporal cortex might, in fact, be mediated by a vertically organized system of interacting structures rather than that cortical region alone. Much of what has been said so far about the inferotemporal cortex may, therefore, be applicable more appropriately to this vertically organized system in general.
There are, however, some caveats concerning this work that should be mentioned. Because of the complexity of the behavioral processes that were assayed in Thompson and Myers' study, it is entirely possible that the interference was produced by means other than a direct effect on visual discrimination per se.
Lesions in the thalamus, for example, may simply have interfered with the flow of visual information to the brain. Furthermore, some of the lesions led to "a severe and relatively permanent paralysis of downward gaze" (Thompson & Myers, 1971, p. 504). Inability of the animal to normally scan the visual scene may, therefore, also have played a role in decrementing the performance in a way that was unrelated to discriminative or memorial functions. Nevertheless, although there may be some question about the exact details of Thompson and Myers' work, it is particularly significant in stressing that no one "center" exclusively controls visual discrimination and learning. Rather, the same general picture emerges here as in most other studies of central nervous system localization: Each mental process is the result of the interaction of a system of constituent nuclei. The idea of a unique center will die hard, but it will die as this type of data becomes more widely appreciated.
6. Some Comments on Microlocalization
So far in our discussion of localization in the visual system, we have only considered localization with regard to the brain macrostructure, that is, the lobes and grossly demarcatable regions. There is another level at which the problem of localization can also be attacked, however, that is of much more minute dimensions. The various regions, themselves, are not homogeneous. If one switches research techniques, dropping the paradigmatic approach in which the behavioral effects of ablations are examined and adopting the recording microelectrode as one's main research tool, a whole new set of data and dimensions concerning the localization problem becomes available. The extra- or intracellular microelectrode (often less than one micron in diameter) serves as the exploring tip of an electrophysiological recording system that is capable of detecting the responses of individual neurons.
Almost all conceivable dimensions of the visual stimulus, including brightness, color, and spatial and temporal pattern, have been shown to elicit and/or modulate activity in one or another part of the visual nervous system. Variations in these dimensions are transformed or encoded into patterns of neurophysiological response in different regions that may be selectively sensitive to one or more of the trigger dimensions I have mentioned. Which particular neuron in which particular place will respond to which particular stimulus dimension is a question that is the microscopic place coding analog of the macroscopic localization issue. Unfortunately, there is also no general answer to the question of microscopic place coding. Rather, there are a number of issues that are involved in the search for the coding parameters of sensory neurons that are not usually considered from the point of view of localization theory. Complicating the matter is the fact that a single neuron may often respond differentially to a number of different parameters of the visual stimulus.
Thus a single neuron may be found to be varying its frequency of firing as the color, place, and intensity of the visual
stimulus vary. There is no unique place code, or microscopic localization, therefore, at this level; rather, other dimensions (than where a neuron is located) are responsible for conveying information about these aspects of the stimulus to particular neurons in particular places. The reader interested in the problem of sensory coding is referred to Chapters 6 and 7 in this book or, for a more complete discussion, to my earlier work (Uttal, 1973).
What does occur within some of the visual areas, however, is a sort of localization or, more properly, place encoding of the spatial arrangement of the visual field. It is well established that in the early portions of the visual pathway, up to and including V1, V2, V3, and probably V4, a retinotopic (i.e., spatially isomorphic or map-like) representation of the external world is maintained. However, as one proceeds further to the higher-level regions, such as the frontal or inferotemporal cortices, there is a breakdown in this form of localization; no evidence of an isomorphic retinotopic localization can be observed. Thus a kind of microlocalizational, topological mapping of the environment is present, but only within some of the more peripheral brain areas assigned to visual processing.
We may conclude that there does appear to be a certain degree of differentiation of function observable among different areas at both the macro- and micro-levels of the visual nervous system. Localization, or place encoding, as some would call it, is an important aspect within the visual process just as place is an important code for sensory modality. But it is a form of localization that involves more than a single visual center; it is a system of interacting component centers, each of which depends for its function upon the integrity of others, and each of which may be heavily interconnected with other portions of the brain and brain stem.
This then completes our brief survey of the ways in which visual perception is localized in the central nervous system. I now turn to a consideration of systems of components of the brain that seem to be heavily concerned with some more complex cognitive processes.
C. ON THE LOCALIZATION OF THOUGHT PROCESSES IN THE BRAIN: THE PSYCHOBIOLOGY OF THINKING AND SPEECH
1. On the Anatomy of the Intrinsic Areas
The psychological processes that I consider in this section are among the most elusive to define in an exact fashion. I use the rubric of "thinking" to include such diverse mental functions as those behind language behavior, on the one hand, and highly structured and abstracted laboratory tests of decision making, on the other, to emphasize the most extreme examples. The breadth of this topic
should become clear as readers progress through the later parts of this section. For the moment, however, let us consider the anatomy of the neural structures within which modern research suggests that these complex cognitive processes may, to a certain extent, be localized. For the most part, I deal in this section with the parts of the brain that have classically been called the association areas of the cerebral cortex. There are a number of experiments that suggest that some of the subcortical centers of the brain and brain stem may also be involved to some extent in the processes considered, but most of the relevant research has been carried out on the cortical association areas (the nonsensory or motor regions, the "uncommitted" regions) that were shown in Fig. 5-16.
The regions with which I am now concerned have been referred to as association areas for some compelling historical reasons. The early psychobiologists thought that these nonsensory and nonmotor regions were literally responsible for the "association" of sensory input signals with the appropriate motor outputs and that the brain operated as a giant switchboard. It was assumed that sensory signals flowed first into the primary projection areas and then were routed to the association areas where they were linked together with appropriate responses. The molar process of learning was the external behavior that many of the nineteenth-century associationists thought reflected the formation of these neural links. After suitable experience, specific stimuli came to be associated with particular responses. The stimulus-association-response concept of human behavior was the predominant psychological tradition during this period.
We now have a somewhat different view of the role and nature of the so-called association areas of the cortex. It is clear that almost all of these areas also receive direct signals from the sensory systems.
Indeed, it is often possible to cut the connectives from the sensory projection areas to the association areas without producing major deficits in behavior. It now seems that the many connections that run between the different association areas themselves may be more important than the pathways between the primary sensory projection regions and the association regions. Because of these and related considerations, Pribram (1960) has urged us to call these areas the "intrinsic" areas rather than use the archaic term "association" areas. From this point on and for the reasons he suggests, I follow this revised nomenclature, which stresses an entirely different role for these important areas. The detailed anatomy of the intrinsic areas and their interconnections is extremely complicated. Very little is known of their cellular architectonics. We simply do not have adequate conceptual anchors to use in an analysis of intrinsic area function comparable to the well-ordered stimulus dimensions, the simple input pathways, and topographic layout of the primary sensory projection areas. Furthermore, there seems to be an abundance of interconnections between and among the various intrinsic areas. It must be remembered that, except for the
C.
PSYCHOBIOLOGY OF THINKING AND SPEECH
thin shell of the cerebral mantle and the other subcortical nuclei, most of the cerebral hemispheres are made up of myelinated nerve fibers interconnecting the various nuclei of the cortex. It is patently impossible to adequately review the many anatomic studies of the interconnecting tracts in this important part of the brain, but the reader may wish to look again at one of the very good neuroanatomic texts such as Crosby, Humphrey, and Lauer (1962) or Carpenter (1976) to review this material.
2. The Frontal Lobes and Time Binding
It has been known for almost a century that the effects of frontal lobe injuries or ablations produced subtle behavioral effects, but the appreciation of exactly what these effects were proved to be extremely elusive, in spite of a considerable amount of research on animals and on humans. The classic case of Phineas Gage, often described in elementary textbooks, still serves as an excellent illustration of the subtle but important psychological processes that seemed to be associated with these regions of the brain. Mr. Gage, after recovering from the initial trauma produced by having a crowbar thrust through his eye and out the top of his head in a way that essentially amputated his frontal lobes, led a relatively normal life. The exceptions to his complete recovery were behaviorally delicate and difficult to precisely define. He seemed to have lost ambition, judgment, and his "organizational" abilities, all of which were important personality attributes for the job he held as a railroad construction foreman prior to his injury. Unfortunately, in this oft-told tale, we are left with only these extremely vague measures of Mr. Gage's personality changes. It is not exactly clear from the critical point of view of the empirically oriented laboratory psychobiologist just what such vague dimensions of behavior as "judgment" or "organizational ability" are. They sound all too much like phrenological faculties. The search for some more specific behavior deficit has thus led to the use of some highly abstract laboratory tests of performance that can be quantified in a way that the term "ambition" cannot be. Stemming from many other more modern studies, reported by such distinguished neuropsychologists as Luria (1966a), is the discovery that the frontal lobe seems to be intimately involved in complex behaviors that have to do with organization of sequences or complexes of responses. 
According to Luria, lesions in this region typically produce a deficit in the ability of patients to evaluate the subsequent effect of their present behavior. As suggestive as this notion is, these deficits are still relatively poorly defined. In animals, much more carefully controlled research procedures, of course, can be used; and similar lesions produce symptoms that seem to be very much in this same category. Behavioral processes that require the organization of responses
into sequences or that might be involved in the evaluation of the future effect of those responses are also degraded in experimental animals. The typical effect of amputation of the frontal lobes in a dog or monkey is the generation of a difficulty in locating food that is placed in the field of view contralateral to the amputated frontal lobe. The animal seems not to be blind in any sense; discrimination tests make it clear that the animal is capable of discriminating between forms. Rather, the animal seems to simply neglect the visual stimulus, to not appreciate the fact that this stimulus is a potential satisfier of hunger. Furthermore, the deficit exists only for a period of a few weeks following the surgery. After that, the animal recovers its attentiveness to food, and the effect of the ablation disappears. If the other frontal lobe is then ablated, the process simply repeats-the animal at first disregards or neglects a visual food stimulus in the contralateral field of view and then recovers what appears to be near normal behavior. By far the most unusual laboratory test used in the study of behavioral destruction following frontal lobe ablation is the delayed response test in which an animal is prevented from responding immediately following the presentation of a stimulus (Jacobsen, 1935). In its most familiar form, the delayed response task is carried out in the following way. The stimulus object is exposed to the animal for a period of time. It is then removed (or obscured), and a second period of time passes prior to the exposure of the manipulandum that the animal must use to make his response. The duration of this second interval between the stimulus and the presentation of the manipulandum is usually the independent variable in experiments of this sort. A host of experiments using the delayed response task have shown it to be particularly sensitive to frontal lobe ablations (see, for example, Butter, 1964; and Chapter 8). 
The dorsolateral regions of the frontal lobe seem to be particularly effective in producing this delayed response deficit according to Butter's and related work. In humans, the effects of frontal lobe injury are more complicated. Luria (1966a) discusses disturbances in voluntary motor patterns, including a disintegration of serial sequences of motor responses, a tendency toward perseveration (such as might be evidenced by an inability to reverse a motor or verbal response in accord with a sequence of alternating verbal instructions), as well as some more complex speech and visual replication difficulties. Luria also characterizes the general change in personality often observed as a loss of goal directedness in the afflicted humans. The general picture that emerges from this brief discussion of the effects of frontal lobe ablations and injuries is that these effects are, in large part, associated with what some have called time binding. All of the deficits, whether they are delayed responses in monkeys or a loss of sequence control or even "ambition" (whatever that is) in humans, are linked by this common thread. Each exhibits a behavioral change that is in some way associated with events that are
spread over an extended period of time. The frontal lobe deficit seems most likely to involve an inability to string sequential events together or to appreciate the subsequent effects of some currently emitted response. Pribram (1973), in summing up the proceedings of an important congress, suggested that the weight of the data concerning frontal lobe injury led him to the idea that the frontal lobes act to inhibit mutual interference that would otherwise occur among a series of near simultaneous brain events. This concept is nearly synonymous with the notion of a time sequencer expressed in the preceding paragraph, because once interference is reduced, the opportunities for the serial ordering of sequential events and the determination of the subsequent biological value of earlier events would necessarily be enhanced. In Pribram's terms, the "ability to resist novelty" (Pribram, 1973, p. 306) is also a function of the frontal lobes. One of the other effects often noticed with frontal lesions, he notes, is that the operated animals do not normally habituate to repeated stimuli. Rather, they continue to respond as if the stimulus were fresh and new far longer than do normal animals. In sum, the notion of time binding, the linkage of successive mental states into a smoothly flowing stream of consciousness and behavior, permeates almost all of the literature that describes behavioral changes due to frontal lobe lesions.
3. The Left Hemisphere and Speech
Even though language production has often been considered one of the most clear-cut examples of a relatively discretely localized psychological function, a close examination of the recent literature suggests that speech is no better localized than any other function we have discussed so far. Two of the preeminent psychobiologists who have studied the brain localization of speech mechanisms (Penfield & Roberts, 1959) have emphatically stated: "No discrete localization of lesions producing various types of agnosia and apraxia has been found. It seems as Jackson (1931) stated, that any acute lesion to any gross part of the left hemisphere will produce some disturbance in speech [p. 78]." The late Eric Lenneberg (1974) also put the problem of localization of speech functions into perspective when he noted:
In many aphasiologists' opinion, the exact anatomical substrate for language remains elusive, especially for the cognitive side of language. If one compares the various aggregate maps of cortical lesions prepared, for example, by Conrad (1954), Penfield and Roberts (1959), Russell and Espir (1961), or Luria (1970, 1972), there do not seem to be sharply delimited or structurally well defined areas that are alone responsible for the appearance of specific clinical language deficits. In other words, specific aphasic symptoms are not pathognomonic for destruction of one, and no other, cortical area. Instead there are gradients of probability for the occurrence of a symptom complex that may appear in connection with a lesion in a given area [p. 524].
These points of view are stressed at the outset of our discussion because it is all too easy to infer from a first reading of the literature that the opposite
conclusion, namely that a high degree of localization of speech function does occur, is actually the true state of affairs. However, it is also important to note as we begin this discussion that some of the most active workers in the field of brain localization of speech functions still adhere to the general theme of the theory of radical localization originally proposed by Carl Wernicke one hundred years ago. Geschwind (1970), for example, still feels that Wernicke's (1874) classic hypothesis of precise cerebral localization of the various speech functions is the most useful contemporary theory and that the functions assigned by Wernicke and his predecessor Broca to particular centers of the brain still hold. Masland (1971) is another contemporary worker who feels that there is a great deal to be said for rather specific localization of speech function on the surface of the cerebral mantle. We must recognize, therefore, that there still is a great deal of controversy concerning the localization of speech function. I begin the discussion by considering the value of speech itself. It is a truism that speech is an especially important, and perhaps unique, human function, binding individuals and people together in time and providing the necessary basis for cultural progress. Until recently, it was thought that symbolic language, such as speech and writing, was exclusively a human process. In recent years, however, several workers (Gardner & Gardner, 1969; Premack, 1971) have shown that it is possible to train chimpanzees to use signs in symbolic ways that are difficult to distinguish from the vocalizations of human speech. This extraordinary and exciting act of animal training would be just a trick if it were not for the fact that it emphasizes an extremely important theoretical point: Language is not a single process or act but is a complex of a number of different sensory, response, and integrative functions. 
Language, according to this point of view, may best be thought of as including three different sets of subprocesses. One set includes the actual motor processes that control the production of the speech sounds or phonemes. A second set corresponds to the processes that form the sequence of appropriate speech terms in accord with the rules of grammar. Linguists would refer to this aspect of speech as syntax generation. A third set concerns the processes that regulate the semantic context or meaning of the spoken or written sentences. Could these three aspects of language behavior reflect three different aspects of brain anatomy? Certainly some of the early workers in the field thought so. There was already an elaborate taxonomy of the various kinds of aphasias that could result from specifically placed brain lesions by the middle of the nineteenth century. The generally accepted theory was that different forms of these aphasias reflected inabilities to form speech sounds, to sequence words in accord with the rules of grammar, or to process the symbolic meaning of the respective speech sounds. Table 5-1 tabulates some of the better-known speech difficulties in accordance with these three classifications.
TABLE 5-1 Disorders of Symbolic Communication
A. Disorders of articulation and hearing
   1. Aphonia: Inability to speak due to vocal organ damage
   2. Apraxia: Inability to speak due to loss of motor control
   3. Anarthria: Inability to form or articulate speech sounds
   4. Deafness: Inability to hear for mechanical or neural reasons
B. Disorders of syntax
   1. Conduction aphasia: Aphasia in which words are skipped or repeated
C. Disorders of semantics
   1. Semantic aphasia: Inability to understand meaning of words
   2. Agnosia: Inability to recognize meaning or spatial and temporal relations of objects
   3. Alexia: Inability to understand meaning of written words
   4. Anomia: Inability to name objects
   5. Apractognosia: Inability to handle spatial relations
   6. Asymbolia or asemia: Inability to understand significance of symbols
   7. Finger agnosia: Inability to recognize one's own finger
   8. Acalculia: Inability to do simple arithmetic
   9. Agraphia: Inability to write
   10. Aphemia: Inability to express meaningful words
What can presently be said with assurance about the localization of these speech disorders in the brain? One of the points on which all contemporary workers currently agree is the fact that the two cerebral hemispheres are not exactly symmetrical. Depending to some degree but not entirely upon the handedness of the person, one or the other hemisphere is found to be dominant for processing speech and writing. It is only in unusual circumstances (such as congenital or early damage to the corpus callosum) that the dominant hemisphere will not be the one contralateral to the dominant hand. Given that most people are right-handed (the cited figures vary from 70 to 90% from different sources), the left hemisphere is the one in which the dominant speech centers are said to be most often located. Damage to the right hemisphere can be remarkably extensive without any deficit being produced in verbal behavior. (But, note also the relevance of and the caveats within the discussion on the split brain preparation later in this chapter.) Since the time of Broca, there have been assumed to be well-defined regions within the confines of the dominant (usually left) cerebral hemisphere that, when damaged, will produce particular types of aphasic deficits. I have already alluded to the early studies in our historical survey, but to be more specific, I should note that Wernicke himself assumed that the area known as Broca's Area (indicated on Fig. 5-26) was primarily involved in the organization of the syntactical relations between items in a sequence of speech sounds, and that portion
[Figure 5-26. Panel title: "Speech Areas: Evidence from Excision"]
FIG. 5-26 Three major areas of the brain believed by many to be associated with speech function. The posterior area stippled with many small dots is known as Wernicke's area, the anterior area stippled with irregularly placed larger dots is Broca's area, and the regularly spaced dots denote an area (on the inside of the hemispheric fissure) known as the supplementary motor area. The extent of each area indicated on this diagram is only approximate. (From Penfield & Roberts, ©1959, with the permission of the literary executors of W. Penfield and The Princeton University Press.)
known as Wernicke's Area (also seen in Fig. 5-26) seemed to be more closely related to the symbolic, representational, or semantic processing of the meaning of speech. The other aspect of speech, motor control, has typically been assumed to be controlled by the classical motor areas that are located on the most caudal portion of the motor cortex quite near Broca's Area. The proximity of these two regions has often produced difficulty in distinguishing between the motor and syntactical aspects of speech. Other workers have shown that lesions of the supplementary motor regions on the top of the cerebral hemisphere also produce speech disturbances very much like those produced by lesions in Broca's Area. This area is also shown on Fig. 5-26.
Geschwind (1970) has also suggested that there is a special role played by the band of axons (the Arcuate Fasciculus) connecting Wernicke's and Broca's Areas. When transmission of information along this band is disrupted, he asserts that there is a characteristic kind of speech difficulty produced that is referred to as conduction aphasia. Conduction aphasia is most typically characterized by the patient's inability to repeat words spoken to him, even though other aspects of his speech might be nearly normal. At the risk of misleading the reader into accepting the concept of the localization of speech function with greater credulity than is justified, Fig. 5-27, from Penfield and Roberts (1959), is presented as a summary of several of the historic theoretical positions. This chart depicts the regions of the left hemisphere that were assumed for many years to be associated with a number of the different kinds of specific aphasic conditions. As one studies this figure, it is particularly important to keep in mind the fact that modern theories postulate a much less specific form of localization of function than is suggested there.
[Figure 5-27. Panel title: "Disorders of Speech from the Literature"]
FIG. 5-27 Penfield and Roberts's summary of the areas of the brain that seem to be associated with particular forms of speech disorder. (From Penfield & Roberts, ©1959, with the permission of the literary executors of W. Penfield and The Princeton University Press.)
One source of confusion that led to the formulation of this somewhat misleading map lies in the fact that true syntactic or semantic aphasia is a condition that can be closely imitated by much simpler sensory or motor difficulties that have little to do directly with the interpretive and symbolic aspects of speech. Another difficulty with any theory of sharp localization of speech function is the fact that there is a great deal of recovery of function in almost all cases of traumatic aphasia. Although almost any damage to the left hemisphere will produce some sort of speech deficit, there is practically no area that will produce a permanent speech deficit when injured. Even the extreme speech deficits produced by lesions of Broca's or Wernicke's Areas are often transitory; speech function usually recovers except in cases in which a progressive neural degeneration is induced. Furthermore, the younger the patient is, the better the chances of recovery of full speech capabilities. Clinical reports of the recovery of perfectly normal speech function after total destruction of the classic Wernicke and Broca Areas continue to be forthcoming (see Lenneberg, 1974, p. 525 for a summary of recent studies). Because clinical data are the only source of information concerning brain localization of the uniquely human ability to speak, it is of the utmost importance that even exceptional and anecdotal evidence be carefully considered, even though it is often "noisy" by conventional laboratory standards. If strict localization of speech function, in the sense originally proposed by Wernicke, is no longer tenable, what organizing theorem can we use to guide future research and therapy? What we must turn to instead is another expression of the general theory that asserts that a number of brain centers must interact to produce any complex mental function. Indeed, the notion that it is possible to separate the articulatory, syntactical, and semantic aspects of speech from each other may be just as fallacious as the attempt to assign these functions to separate parts of the brain.
Once again, it is Eric Lenneberg (1974) who best summed up the contemporary view:
Very many parts of the brain must contribute to the proper function of a behavior that is as inseparable from perception, memory, concept formation, and every other cognitive process as is language. The anatomy of gross and microscopic connections of the brain may as easily be cited in support of the notion that language is the product of many different physiologically interacting parts of the brain as in support of a speech-centers-with-connections model, perhaps more reasonably so. Locke stressed that cortical areas that have been called "language centers" are probably merely regions with certain physiological functions such that, when impaired, they disturb the smooth interaction and harmonious interplay among various suborgans of the brain. Some such disturbances are evidently more injurious to certain types of capacities (say language) than others. In short, my point of view denies that types of behavior have their own specialized and autonomous centers; rather, it proposes that every differentiated part of the brain makes its own physiological contribution to widespread activity patterns, resulting from interactions of different parts of the brain, and that these activity patterns are the proper correlates of such behavior as language.
While it is likely that no connection in the brain is random and that there is an orderly relation between fields of cells in the cortex, for example, and correlated areas in the retinae, the skin, the skeletal muscles, or in subcortical nuclei, this does not lend credence to the notion of centers with principal control over any particular kind of circumscribed behavior. The brain is not a loose aggregate of autonomous organs, but a single organ. Its anatomical subdivisions undoubtedly have their own specific physiological functions, contributing to various types of behavior in different ways. But, so far, we know of no behavioral entity that is the exclusive product of just one brain region alone [pp. 627-629].
4. Asymmetry of the Cerebral Hemispheres
It now appears that there is a good argument to support the hypothesis that the two cerebral hemispheres do not serve identical roles with regard to their sensory, integrative, and motor functions. The primary sensory projection regions of the cerebral mantle are typically arranged with at least some degree of crossover, and in some cases a complete crossover, of the ascending and descending neural pathways. It is well-known, for example, that the postcentral somatosensory regions and the precentral motor regions are mainly crossed so that the left hemisphere predominantly receives signals from the somatosensory receptors on the contralateral side of the body and, with some exceptions, sends motor signals back to the contralateral side of the body. Similarly, the visual system, because of the crossover of the fibers from the nasal hemiretinae at the optic chiasm, is organized so that the left cerebral hemisphere receives signals from the right nasal and the left temporal hemiretinae, and the right hemisphere receives signals from the left nasal and right temporal hemiretinae (see Figs. 5-9 and 5-12). Many auditory fibers also cross over. It is clear, therefore, that the two hemispheres of the brain are not functionally or anatomically identical even at the level of these relatively low-level sensory input and motor output regions. The question we face now, however, does not concern these communication areas; rather it is concerned with the nature of the functional dissimilarities that may occur between the intrinsic areas of the brain, which are more likely than the other regions to be involved in higher cognitive functions. The intrinsic areas of the left (usually) hemisphere appear to be more involved in the mediation of several different kinds of speech processes in the normal human than are those of the right hemisphere. The question now arises: What role is played by the extensive intrinsic regions of the right hemisphere, which have previously gone unmentioned?
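The routing of the retinal signals at the optic chiasm can be written out as a small lookup rule. The following is only an illustrative sketch, under the standard anatomical assumption that fibers from the nasal half of each retina cross at the chiasm while fibers from the temporal half remain uncrossed; the function name and argument labels are hypothetical and do not come from the text.

```python
# Illustrative sketch only: hemisphere_for() and its labels are hypothetical.
# Assumption: nasal hemiretina fibers cross at the optic chiasm;
# temporal hemiretina fibers stay on the same side as the eye.

def hemisphere_for(eye: str, hemiretina: str) -> str:
    """Return the cerebral hemisphere ('left' or 'right') that receives
    signals from the given hemiretina of the given eye."""
    if eye not in ("left", "right"):
        raise ValueError("eye must be 'left' or 'right'")
    if hemiretina not in ("nasal", "temporal"):
        raise ValueError("hemiretina must be 'nasal' or 'temporal'")
    if hemiretina == "temporal":
        return eye  # uncrossed: projects to the hemisphere on the same side
    # nasal fibers cross to the opposite hemisphere
    return "right" if eye == "left" else "left"


# Collect the hemiretinae that feed each hemisphere.
inputs = {"left": [], "right": []}
for eye in ("left", "right"):
    for half in ("nasal", "temporal"):
        inputs[hemisphere_for(eye, half)].append((eye, half))
```

Under this rule each hemisphere receives one nasal and one temporal hemiretina, one from each eye, which is why a stimulus confined to one side of the fixation point initially reaches only one hemisphere.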
During the last twenty years, considerable progress has been made in the study of the lateralization of the functions of the intrinsic cortical regions using a remarkable surgical technique based upon an unusual anatomical fact. The two cerebral hemispheres are structurally separate from each other and can communicate at the cerebral level only through two bundles of heavily myelinated fibers. The main cross-connective is the great corpus callosum shown in Fig. 5-28.
FIG. 5-28 A ventral dorsal section of the brain showing the important midline connecting links between the hemispheres of the brain. (From Sperry, ©1967, with the permission of The Rockefeller University Press.)
This band of neurons constitutes by far the largest number of interconnecting neural pathways, although there are a few other, considerably smaller commissures directly caudal to the corpus callosum. The posterior and the habenular commissures are, in fact, cross-connectives at the level of the diencephalon rather than the telencephalic cerebrum. The anterior commissure, on the other hand, which is truly a connective between the two cerebral hemispheres, is so intimately linked with the corpus callosum that surgical transection of the latter almost always includes the former. The main function of the corpus callosum is to interconnect all of the regions of the two hemispheres, with the exception of the inferotemporal lobes. The anterior commissure is responsible for the transmission of signals between the two inferotemporal lobes. The details of this arrangement are shown in the diagrammatic representation of Fig. 5-29. In sum, surgical transection of just the anterior commissure and the corpus callosum abolishes all direct interhemispheric communication, sparing only those indirect links that flow through the lower centers of the brain.
FIG. 5-29 Cross section of the brain showing the general pathways of the cerebral commissures and the regions interconnected by each. (From Crosby, Humphrey, & Lauer, ©1962, with the permission of Macmillan Publishing Company.)
If the corpus callosum of an experimental animal is transected (such a relatively simple operation is called a commissurectomy), there is practically no difficulty in keeping the animal alive. None of the life-sustaining homeostatic processes, generally controlled by the lower brain stem centers, are affected to any substantial degree. Rather, the commissurectomized animal is generally normal in all except the most subtle ways. Indeed, even humans who have had their corpus callosum transected, either through unfortunate accident or in some therapeutic surgical procedure, appear to be almost entirely normal except under the most highly controlled test conditions. In such highly structured test conditions, great care is taken to prevent indirect communication between hemispheres (for example, eye movement might allow both hemispheres to receive the same stimuli). Both animal and human studies have resulted in the demonstration of some extraordinary effects of commissurectomies. These effects may vary from an inability on the part of an operated animal to transfer a task learned with one hemisphere to the other (Myers & Sperry, 1953) to what appears to be the emergence of two relatively independent conscious personalities (Sperry, 1966) within the skull of a single human. The determination of just what functions can be performed by each of the separated hemispheres has become one of the most important targets of contemporary psychobiological research. Sperry and his colleagues (for summaries, see Gazzaniga, 1970; 1975) have shown that in commissurectomized humans the two hemispheres can operate
independently to perceive and learn in those situations in which the inputs are carefully restricted so that only those from one side of the body, or one visual field, reach each hemisphere. This can be accomplished most easily for visual inputs by tachistoscopic exposures and proper placement of stimuli in the field of view of each hemisphere. When stimuli are carefully restricted in this manner, the two hemispheres each function as if there were no communication of the learned or perceived information between them. In the course of these early experiments, which have demonstrated a not-too-surprising absence of communication (given that all known interhemispheric connections have been transected), it was also observed that the mental capabilities of the two hemispheres were not symmetrical. Each hemisphere appeared to display demonstrably different capabilities. For example, the left speech hemisphere (for a right-handed subject) was better able to recognize an object and verbally respond by naming the object, either orally or in writing (an act of symbolic language processing). However, when the same object was placed in the field of view restricted to the other hemisphere, the patient was unable to name the object. Nevertheless, even though the object could not be named, the patient was able to recognize it and process information about the object in appropriate and often complex ways. For example, if an object, like a screwdriver or a cup, was put into the left hand of a normally right-handed person, the signals transmitted to the right hemisphere produced appropriate manipulation of the object; the right hemisphere, therefore, was said to be capable of stereognosis. This manipulative skill indicated that the patient was able to process some sensory information in a meaningful, cognitive fashion, in spite of the fact that he was unable to name the object either verbally or in writing.
The right hemisphere in this case was, therefore, assumed not to be unconscious, imperceptive, inattentive, or incapable of cognitive function. It was simply asserted to be less capable with specific regard to its ability to assign language symbols to perceived objects! Sperry and his collaborators showed in these early experiments that the right hemisphere of the brain was just as capable as the left in the processing of two- and three-dimensional spatial information, but lacked the ability to verbalize the ongoing mental processes. As shown later in this section, however, the hypothesis of a mute right hemisphere is no longer so strongly held.
The extraordinary phenomenon of the emergence of two almost independent personalities in humans with split brains can lead to some curious situations. Sperry (1966) discusses situations in which one hand of a patient actively tried to prevent the other hand from performing routine tasks, like putting on the patient's pants! Anecdotes such as this should be read, of course, with a great deal of caution. It is almost impossible to interpret exactly what is implied by these conflicts between two "separate" personalities. A great deal of research is
yet to be done on split brain preparations to exploit this very important discovery in order to interpret its full impact on theories of brain localization and psychological function.
To emphasize the preliminary nature of theories of lateral hemispheric specialization, however, it should also be noted that very recently one of Sperry's colleagues (Zaidel, 1973; 1974) has demonstrated that substantial verbal ability does appear to exist in the right hemisphere of commissurectomized patients when the appropriate experiments are carried out using a special optical system (Zaidel, 1975). Although this verbal ability seems to be limited to short phrases or single words, it is now clear that speech information can be received and interpreted to at least some degree by the right hemisphere. Thus it is not quite as profoundly deaf and mute symbolically as had first been thought.
As a final caveat, however, it is extremely important for the reader to remember that the split brain preparation is critical in carrying out any analysis of differential functioning of the two hemispheres. A large number of studies that purport to study the function of the separate hemispheres are carried out on normal subjects without commissurectomies. These experiments supposedly take advantage of the fact that portions of the visual pathway go initially to a single side of the brain or that right-eared and left-eared auditory performance differs. Unfortunately, the cross-connections between the two hemispheres in the normal subject make any such analysis spurious. It is not possible to state where the mental processes have occurred.
It is totally inappropriate, therefore, to suggest that without a commissurectomy, the effects of a single hemisphere are assayed by a visual task, even if the stimulus had been positioned so that it was sent originally to only one hemisphere (as was done by Patterson & Bradshaw, 1975), or that a difference in left-ear and right-ear performance indicates a difference in auditory hemispheric capability (as was claimed by Bever & Chiarello, 1974). In addition to the possible cross-connections through the corpus callosum and the other commissures, it should also be noted that the auditory system sends signals to both hemispheres from either ear with crossover occurring as low as the medulla. Not only should an auditory experiment not be used in the normal patient to study this problem, but it would be an inappropriate stimulus to use in dealing with all except the most deeply split brain preparations!
In sum, it is now clear that there is some asymmetry in the function of the two cerebral hemispheres, particularly with regard to symbolic language processes. But there is considerable flux in the interpretation of the available data. One way to sum up the hypothetical details of this asymmetry is to present Fig. 5-30, prepared by Sperry (1970b), to graphically portray his findings as they were understood at that time. In the last few years, however, there has been considerable change in the theory of localization of the processes indicated in this figure, as well as in its
FIG. 5-30 Sperry's theoretical attribution of the functions mediated by each of the cerebral hemispheres. (From Sperry, ©1970b, with the permission of The Association for Research in Nervous and Mental Disease.)
general concept. This figure, therefore, is somewhat misleading in stressing a more complete differentiation of cerebral function than is now thought to be the actual case. Needless to say, the technique of split brains is exceedingly important but has not yet been thoroughly exploited in the analysis of the localization of cerebral function.
D. AROUSAL AND ATTENTION
In 1975, two papers on the psychobiology of arousal and attention were published by two authors from somewhat divergent portions of the psychological
community. One of the authors, Michael Posner (Posner, 1975), has been primarily trained in the field of human performance; the other, Karl Pribram (Pribram & McGuinness, 1975), came to psychobiology from medical neurology. Both of these highly respected contemporary scientists have converged on a research problem of extreme complexity, that of arousal and attention. Both have become interested in elucidating the brain mechanisms that are involved in the determination of that state of an organism that has to do with the responsiveness to and selection of particular stimuli.
Although it seems that neither Posner nor Pribram was aware of the work of the other (there are no citations of the other's work in either of the two papers), there is an important overlap in the two papers that may be extremely instructive in helping us to understand exactly what is meant by the terms arousal and attention. It is an extraordinary example of convergent scientific thought that both Posner and Pribram attempt to define attention in terms of a tripartite classification. By comparing their two attempts at definition, we should be able to determine if there is a common base of agreement in their approaches from which we can draw a single, unified view that is prototypical of the term attention among contemporary researchers in the field.
Posner's (1975) tripartite classification of attentional acts is based mainly upon psychological considerations; he indicates three "senses" of the word attention:
1. One sense of the word attention is called alertness and concerns the study of an organismic state which affects general receptivity to input information.
2. The second sense of attention involves selection of some information from the available signals for special treatment.
3. Finally there is a sense of attention related to the degree of conscious effort which a person invests [pp. 443-444].
Pribram's analogous approach to the problem is based upon his long experience with the neurological and physiological aspects of arousal and attention research. Nevertheless, Pribram, in the Pribram and McGuinness (1975) paper, also comes up with three kinds of attentional processes:
1. One regulates arousal resulting from input .... arousal, which is defined in terms of phasic physiological responses to input.
2. ... a second controls the preparatory activation of response mechanisms. ... activation, which is defined in terms of tonic physiological readiness to response.
3. A third operates to coordinate arousal and activation .... This coordinating effort is defined as demanding effort [p. 116].
Or, in a more succinct fashion, later in their paper:
Three neurally distinct and separate attentional systems-arousal, activation, and effort-operate upon the information processing mechanism. The presumed operation of these control systems is perhaps best illustrated as follows: The orienting reaction involves arousal but no activation; vigilant readiness involves activation but no arousal; the defense reaction involves both arousal and activation; when neither arousal nor activation is present, behavior is automatic, that is, stimulus-response contingencies are direct without the intervention of any of the control mechanisms of attention [p. 133].
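The four cases named in this passage amount to a two-by-two table over the presence or absence of arousal and activation. A minimal sketch in Python may make the mapping explicit. The names `ControlState` and `classify_reaction` are hypothetical and do not come from Pribram and McGuinness. Treating each process as simply present or absent is a simplifying assumption, since both are presumably graded, and the third system, effort, is not represented.

```python
from typing import NamedTuple


class ControlState(NamedTuple):
    """Hypothetical container: which of the two control processes are engaged."""
    arousal: bool      # phasic physiological response to input
    activation: bool   # tonic physiological readiness to respond


def classify_reaction(state: ControlState) -> str:
    """Return the label the quoted passage attaches to each combination."""
    if state.arousal and state.activation:
        return "defense reaction"
    if state.arousal:
        return "orienting reaction"
    if state.activation:
        return "vigilant readiness"
    # Neither process engaged: direct stimulus-response contingencies.
    return "automatic behavior"


if __name__ == "__main__":
    # Enumerate all four combinations of the two binary processes.
    for arousal in (True, False):
        for activation in (True, False):
            state = ControlState(arousal, activation)
            print(state, "->", classify_reaction(state))
```

The sketch only restates the quoted classification; it says nothing about which neural structures mediate each process.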
Are the two definitions of attentional processes similar? In fact, there appears to be a remarkable correspondence between these two approaches to the definition of the term. Posner and Pribram, along with the latter's colleague McGuinness, all agree that their first process is concerned with the "general receptivity" or "regulation of arousal" to incoming stimuli. Likewise, there is a near synonymity of the notion of stimulus selection, on the one hand (Posner), and the notion of activation of a particular response mechanism, on the other (Pribram & McGuinness). With regard to the third point, the words are virtually the same: both note the importance of effort in regulating the balance between the two processes or the stimuli to which attention shall be directed. Thus, despite their separate intellectual origins, the definitions developed by both Posner and Pribram are in essential agreement as to just what psychological processes are being dealt with when one studies attention. Whether the approach is psychophysical or physiological, the notions of receptivity, selectivity, and effort permeate the study of what is today known as attention and arousal.
One of the most important of the brain structures involved in arousing an animal and readying it to attend to and select among stimuli appears to be the brain stem reticular activating system shown in Fig. 5-31. The reticular activating system consists of a very complex network or reticulum of neurons in the central core of the upper spinal cord and the brain stem. It is anatomically characterized by containing some of the largest single neurons of the central nervous system. Figure 5-32, for example, depicts only a part of the axonal arborization of one reticular cell. According to Scheibel and Scheibel (1958), this extensive collateral arborization is displayed along the entire course of all neurons of the ascending reticular system.
Even if the detailed cytological structure is very difficult to describe because of cytological size and complexity, the general functional arrangement of the ascending reticular activating system has been relatively simply conceptualized. The most important fact of reticular function is that it receives collateral inputs from all of the primary sensory pathways as they pass from the periphery to the sensory projection regions of the cerebral cortex. The ascending reticular system has multiple inputs and is thus universally polysensory. After a considerable amount of integrative interaction among these individual sensory inputs, a generalized, nonspecific output, elicited by any one or all of the inputs, is sent from the reticular system to many portions of the cerebral cortex: sensory projection, motor, and intrinsic areas alike. Thus visual inputs will actually produce nonspecific reticular output signals to most of the areas of the cerebrum in addition to the specific signals to the visual areas.
This general distribution of sensory information from the reticular system is thought to be necessary for the general arousal of cortical response. Indeed, we
FIG. 5-31 The organization of the ascending reticular system. In this case, somatosensory afferents are seen to enter the spinal cord. Signals from these afferents are thought to be distributed to almost all portions of the cerebral mantle through the reticular system. (From Magoun, ©1954, with the permission of Blackwell Scientific Publications.)
FIG. 5-32 A fairly typical multibranched neuron of the ascending reticular system of the brain. (From Scheibel & Scheibel, ©1958, with the permission of Henry Ford Hospital.)
know that (Libet, Alberts, Wright, & Feinstein, 1967; Uttal & Cook, 1964) both the specific sensory signals and the nonspecific signal propagated along the reticular activating system seem to be jointly necessary for conscious awareness of a stimulus. This important fact has been determined because of a convenient difference between the conduction velocities of the specific and the nonspecific signals to the cortex. The evoked potential produced in the sensory specific projection regions appears at about 50 msec. following a stimulus; the nonspecific reticular signal, recordable over the entire head, occurs much later-about 200 or 300 msec. after the stimulus. A series of evoked brain potentials of this sort is shown in Fig. 5-33. Note that when a subject goes to sleep, there is virtually no change in the early signal from the primary projection area, but there is a nearly complete diminution in the late nonspecific signals. The conclusion to be drawn from this observation is that both the specific sensory signal to the primary projection region and the nonspecific generalized activation that passes through the reticular activating system are necessary for perception, awareness, and attention, although neither alone appears to be sufficient. The ascending reticular activating pathway was originally observed 70 years ago during some of the early anatomical studies of the brain stem (for a fairly up-to-date discussion of the cytoarchitecture of the reticular system see Olszewski, 1954). About 40 years later, Bremer (1935, 1937) observed that transection of the reticular system produced animals that seemed to be chronically asleep. A little more than a decade later, the results of some extraordinary experiments by Moruzzi and Magoun (1949) demonstrated that electrical stimulation of the reticular system produced EEG responses that were identical to those observed in animals that were normally aroused, alerted, or awake. 
The notion that the reticular system was an activating system, therefore, became immediately apparent, and, within a few years, a considerable amount of anatomical and physiological evidence was filling in the details of the function of this portion of the brain stem. There now seems to be a fairly strong body of evidence that supports this concept that the reticular system is particularly involved in arousing and maintaining readiness for stimuli. Without the reticular system, an experimental animal sleeps, and electrical stimulation of the reticular system produces both behavioral and physiological states (such as EEG patterns and pupil diameter changes) characteristic of an alert, aroused, and attentive animal.
But is the reticular activating system really necessary and sufficient for the maintenance of arousal? A considerable difference of opinion has emerged with regard to this question in recent years. Unfortunately, as in so many other situations in brain research, it has turned out that there was considerable recovery of the arousal function following some cases of reticular transection. Use of the Bremer technique (total reticular transection in small steps) resulted in animals that often spontaneously awakened after a few weeks. As usual, when one discusses the problem of localization, the story is not as simple as it may have seemed in the early 60s. The reticular system probably is not alone in controlling
FIG. 5-33 The effects of sleep on the averaged evoked potential, showing the change in the trailing edge of the N wave and the decrease in amplitude of the O wave when the subject is asleep. The N and O wave nomenclature is now obsolete. (From Uttal & Cook, 1964.) [Traces are labeled by recording side and by awake or asleep state; horizontal axis: time in milliseconds, 0 to 320.]
arousal. Nevertheless, it seems certain that the reticular system is involved in arousal and is at least part of the necessary neural system.
What are the other contributing nuclei? Thompson (1975) extensively reviews the role of such other brain loci as the medial thalamus, which appears to produce an inhibitory effect on arousal, an effect originally described by Magoun (1963); the frontal lobe, which, when ablated, produces an increase in the general level of activity in primates (French, 1959); and the hippocampus, which will also produce hyperactivity when damaged (Kim et al., 1971). Furthermore, a number of brain stem nuclei other than the reticular formation have also been shown to be involved in the control of sleeping and waking. Examples of these other brain stem sleep centers, Thompson notes, are Jouvet's (1967) raphe nucleus and the nucleus known as the locus caeruleus.
By far the most elaborate and specific theory of the role of the brain loci (other than the reticular activating system) involved in arousal and attention, however, is to be found in the paper by Pribram and McGuinness (1975) that I have already discussed. Figure 5-34 depicts their view of the multiple brain loci that they feel are involved in the control of arousal and attentional processes. Pribram and McGuinness go far beyond a simplistic theory based only on reticular system function to spell out a comprehensive theory in which specific neural centers are associated with the arousal, activation, and effort aspects of attention, respectively.
They speculate (on the basis of a considerable amount of cited experimental and anatomical evidence not necessary to repeat here) that the amygdala may be the main center for accepting the neural inputs from the ascending reticular system, and furthermore, that the amygdala is specifically an intermediary in the production of the general, overall level of arousal that they refer to as phasic and which is associated with heightened autonomic responses. However, the amygdala in their theory is thought to be under the control of two neocortical regions that exert mutual inhibitory and excitatory influences on it. On the one hand, the orbitofrontal portion of the cortex acts to inhibit amygdala arousal; damage to this region will produce a hyperaroused and extremely active animal. On the other hand, the frontolateral portion of the cortex, in their theory, acts to excite the amygdala's response, and damage to it will reduce autonomic responses such as the generalized electrodermal responses that would have otherwise been produced by stimulation of the amygdala.
Pribram and McGuinness go on to pinpoint the basal ganglia as the especially relevant loci for the mediation of the part of the arousal process they call activation. Furthermore, they assert that the hippocampus and those centers that interact with it are associated, in particular, with the attentional aspect they call effort. This region, they believe, serves to coordinate the arousal and activation processes mediated by the amygdala and the basal ganglia, respectively.
It would be well to emphasize at this point that, although many of the individual interactive links shown in Fig. 5-34 are fairly well known, the scheme proposed by Pribram and McGuinness is, in its entirety, quite speculative. The
FIG. 5-34 Pribram and McGuinness's model of the organization of the "attention" system. See text for details. Dotted lines indicate inhibition; solid lines indicate excitation. (From Pribram & McGuinness, ©1975, with the permission of The American Psychological Association.) [Labels in the figure: arousal, activation, effort; intrinsic cortex, extrinsic cortex, basal ganglia, dorsal thalamic, hypothalamic, mesencephalic, spinal.]
associations they make between structure and function are often based on indirect clues and, of course, are always biased by the relatively artificial definitions of the psychological processes that may be involved. Nevertheless, Pribram and McGuinness have made a major contribution by synthesizing a large body of information into a well-structured model. Whether or not this particular model subsequently turns out to be correct in every detail, it undoubtedly could play a major role in coordinating research efforts in this field and in serving as a foundation on which some future, more accurately descriptive theory will ultimately be built.
Pribram and McGuinness' model, unlike its predecessor theory of a unique reticular role, emphasizes the very important fact of the variety of neurostructures involved and the complexity of interactions that must certainly occur among them to mediate arousal.
E. THE LIMBIC SYSTEM AND EMOTION
Developing an adequate definition of emotion is particularly difficult due to a gross failure of a satisfactory behavioral taxonomic analysis. We still do not have good definitions of emotions. Furthermore, as with any other mental process, the experience of emotion remains intrapersonal and can be, at best, only inadequately suggested by verbal or other behavioral means or measured by physiological indicators that are assumed, on the basis of a very mixed bag of data, to be associated with the emotional experiences. The very abundance of indicators of emotion itself can create a problem. It is usually very difficult to distinguish between changes in the evoked motor response and the underlying motivating states, and this can lead to major misinterpretations of the role of some lesioned nuclei.
Furthermore, the use of autonomic or other physiological indicators as measures of emotion also raises an important additional conceptual problem. Although the premise that certain of these physiological indicators are equivalent to emotional states seems implicit in the writings of some contemporary psychobiologists (for example, many psychophysiologists imply that the electrodermal response is equivalent to emotional "arousal"), it must be remembered that there is not necessarily any direct association (in the sense of coding or psychoneural identity relationships) between such electrophysiological responses as the EKG or changes in the skin resistance, on the one hand, and the psychological states we call attention or arousal, on the other hand. Or, for that matter, is there any direct correspondence between these electrophysiological signals and any other mental process?
Although there is considerable evidence that the two domains of response, the electrophysiological and the mental, are often correlated, it is absolutely essential for the reader to keep in mind that it is possible to disassociate autonomic responses and emotional mental states. This is one of the striking and important implications that is to be gleaned from the extensive current efforts on the conditioning of autonomic responses, which is usually understood to speak to other issues. In other words, autonomic response can be conditioned without conscious awareness.
This repeatedly demonstrated but usually overlooked disassociation is a compelling argument against any idea of a direct correspondence between autonomic response and the intrapersonally private emotional mental states. Clearly, this is a point of considerable import in today's world of pop psychobiology, commercially available (to the public) biofeedback training devices, and the quasilegal status of commercial lie detection services. It is also important in those formal laboratory studies of "emotion" in which localized electrophysiological activation and some autonomic responses have been used as stimuli and responses, respectively, and in which the autonomic response is dealt with as if it were tantamount to emotion. The persistent possibility of this conceptual confusion should be kept in mind in the following discussion. Emotion, though it may not be any more precisely definable than a perceived color and is far less well anchored to specific physical referents, may be best understood in relation to the affective states that are popularly described by such words as "love" and "pleasure," at one end of the continuum, and "rage," "hatred," "pain," and "displeasure," at the other. The intrapersonal experiences that fall along the emotional continuum are very often associated with either approach or avoidance behavior, depending upon the valence of the affect or the pleasure-displeasure balance in an ambivalent stimulus. It is clear then that even
these intrapersonally private dimensions of emotional response are necessarily going to be very closely associated with other brain and mental processes that might, at first glance, seem to be quite independent. Fear, rage, or anger and love or pleasure are all referenced to some particular stimulus object. Only in pathological states is rage not object-directed. Thus emotional responses must necessarily be closely interrelated to sensory processes. Similarly, the state of arousal of the organism usually will also play a very important role in its emotional response as will any previous cognitive experiences that it has had with regard to the referenced object. Therefore, many of the anatomical structures that are believed to be more involved in sensory, attentional, or learning acts must also be involved to at least some degree in the action of the emotional mechanisms.
We can fairly ask, however, what brain mechanisms seem to be most intimately involved in controlling emotional behavior, even if it is conceptually inappropriate to regard them as the unique locus of the emotional process. As we look back over the history of theories of emotion, we see that there has been a well-ordered sequence in the development of psychobiological theories of the localization of emotion.
One of the earliest of the modern neurologically oriented theories of emotion was the one proposed by William James (1842-1910) that has become known as the James-Lange Theory of Emotion. The James-Lange Theory was based upon a fact well-known since antiquity, namely that the autonomically driven structures of the circulatory system and the viscera are actively involved in many emotional reactions. James proposed that the experiences that we refer to as emotions are, in fact, sensory experiences produced by afferent signals fed back from the viscera.
He assumed that the viscera were themselves excited by efferent signals from the central nervous system, but that interoreceptors within the viscera lay at the "heart" (if the reader will excuse the pun) of emotional experience. To James, therefore, emotion was simply sensory awareness of visceral states. The key neural structure in the James-Lange Theory, therefore, is the cerebral cortex itself, which receives and interprets these visceral sensory signals.
The next stage in the development of psychobiological theories of emotion emerged when it became apparent that there was usually a time discrepancy between the physiological visceral response and the psychological emotional response, and that the psychological response occurred even if the viscera were disconnected from the rest of the body. In light of this sort of discrepancy, the James-Lange Theory was no longer tenable, and subsequent workers like Walter B. Cannon (1871-1945) turned to subcortical centers in their search for the locus of the emotional experience. Cannon (1927), for example, proposed that emotional experiences occurred when the dorsal regions of the thalamus were activated and, furthermore, that the hypothalamus controlled the emission of appropriate motor, visceral, and secretory responses.
However, both the James-Lange Theory of visceral sensation and the Cannon thalamic-hypothalamic theory subsequently fell victim to newer developments
in neuroanatomy and clinical medicine. In 1937, an exceedingly influential paper (Papez, 1937) was published by James V. Papez in which he described the anatomical and functional organization of a coordinated set of neural structures of the diencephalon and the basal portions of the telencephalon. He believed that these portions of the brain were specifically involved in the control of emotional experience and responses. The system of centers that Papez described contained structures that had previously been collectively referred to as the rhinencephalon, due to the fact that they had long been thought to be part of the olfactory portion of the brain. This system is now referred to as the limbic⁵ system and has already been depicted in Figs. 3-38 and 3-39. Papez suggested, in his original paper, that a particular subset of the limbic structures formed a discrete neural circuit within this complex of interacting centers, and that the functioning of this circuit was essential for the elicitation of emotional experience and response.
It is of some historical interest to consider the specific details of "Papez' emotional circuit" at this point. Papez proposed that afferent sensory signals, in addition to being projected to the primary sensory cortex, were also directed to the hypothalamus. Signals from the hypothalamus were sent back along the mammillothalamic tract to certain anterior nuclei of the thalamus. Signals from the thalamus were then projected to the cingulate lobe of the cerebral cortex. It was this cortical locus that Papez presumed to be the seat of the emotional experience itself. From this locus, signals were sent along the cingulate tract to the hippocampus and from there, via the fornix, another large conduction tract, to return to the hypothalamus where visceral responses were controlled. This pathway can be picked out in the diagrams of Fig. 3-38 or 3-39.
⁵Limbus means edge, and these structures are found along the inner edge of the cerebral hemisphere.
It now appears that this famous "Papez Circuit" (as well as the theory of emotion that it embodies) is only part of a much more extensive complex of interacting loci and pathways within the limbic system and other parts of the brain stem that have been found to be involved in the regulation and control of those psychological processes we call emotion. I shall now briefly consider the role of the various components of the brain in the control of emotional experience and behavior, according to the best contemporary judgments. It is fortunate that we have, as a guide for the limbic aspects of this complex story, an important and insightful new book (Isaacson, 1974) devoted entirely to the anatomy and function of the limbic system. Another useful review (Clemente & Chase, 1973) of limbic system function in aggressive or antagonistic behavior will also be of special value to the reader who prefers a more detailed discussion than is possible here. Some of the most significant early studies of the roles of the various brain structures in emotion were carried out by Bard and his collaborators (see, for
example, Bard, 1934; Bard & Mountcastle, 1947). In an extensive series of experiments, Bard and his co-workers showed that there was a progressive change in emotional behavior as the surgeon descended deeper into the nervous system with a progression of transecting cuts. The general effect of the removal of the cerebral neocortex was to produce an animal that had a very low threshold for extreme emotional responses-the classic rage reaction. Typically, however, such animals were not able to direct the evoked rage toward a particular stimulus object. On the basis of this kind of data, Bard suggested that the cortex mainly served as an inhibitor of emotional centers deeper in the brain and as a director of the rage response. When progressive surgical transections were made below the level of the thalamus and hypothalamus, however, Bard observed that there was a gradual decrease in the variety of the components of the emotional response that could be obtained. The deeper the cut, the fewer the number of constituent emotional responses that could be emitted by the animal. Bard's major contribution-the suggestion that the hypothalamus was the unique nervous center for emotion-though no longer acceptable today, was based on this pattern of results. However, as we now know and as was pointed out by Papez in his classic 1937 paper, the hypothalamus is only one part of the limbic system, and it is to this complex of structures rather than the hypothalamus alone that more recent workers have looked for a more complete explanation of the neural basis of emotional behavior. The exploration of the functions of the nuclei of the limbic system has generally revolved around the effects of stimulation or ablation of one or more of the various centers on some particular emotional response, such as a rageful reaction. 
If I were to summarize, in a few words, the conventionally hypothesized role of the involved limbic nuclei, I would have to say that the hypothalamus is thought to play a rather special and central role in its ability to elicit emotional responses, and the other nuclei seem to be better described in terms that characterize their ability to modulate hypothalamic activities. A detailed review of work in the last twenty years on elicitation of aggressive behavior by hypothalamic stimulation can be found in Berntson (1972). An example of the central role played by the hypothalamus in emotion lies in the fact that it seems to contain centers of affect that are so compelling as to deserve the designation of "pleasure" or "pain" centers. Electrical stimulation of the lateral hypothalamic nuclei produces the powerful form of the self-stimulation response made famous by the fortuitous observation of Olds and Milner (1954). In this germinal report, these workers originally showed that animals in instrumental conditioning paradigms would repetitively stimulate themselves (occasionally to the point of exhaustion and even death) by depressing a lever that applied electrical currents to those portions of the limbic system. It is now thought by James Olds (Olds, 1962) that the most powerful of these "pleasure" centers are located in the region of the lateral hypothalamus. Furthermore, the
hypothalamus is also thought to contain analogous regions that are equally powerful elicitors of avoidance behavior, presumably because activation of these centers produces such a painful or unpleasant affect. As important as the Olds and Milner discoveries have been to physiological theories of behavior in the last two decades, I should not omit another important caveat at this point. As we see in the next section on motivation, it now appears that many of the hypothalamic effects on feeding and drinking are mediated not by hypothalamic nuclei but rather by interruption of the sensorimotor signals conveyed by fiber tracts that pass in close proximity to these nuclei. A similar situation might also exist in the case of those "pleasure centers." There has never been an adequate proof given that the self-stimulation responses were produced exclusively by the neurons of the hypothalamic nuclei and not by stimulation of these nearby sensory or motor fiber tracts. The specific idea of a localized pleasure center, although appealing to classical ideas of the localization of mental phenomena, still requires a considerable amount of further research for its validation. The hypothalamus is also thought to be deeply involved in the direct control of aggressive and defensive behavior. In another comprehensive review of the role of the hypothalamus in antagonistic behavior, Kaada (1967) concludes that a considerable amount of response-specific differentiation occurs among the constituent nuclei. An animal that is stimulated in the lateral hypothalamus will tend to attack some emotion-producing object, but an animal that is stimulated in the ventromedial portion will exhibit defensive behavior, and an animal that is stimulated in the dorsal hypothalamic nuclei will flee from the same threatening object. Berntson and Beattie (1975) have shown that these hypothalamic nuclei are not totally independent of each other, however.
Rather, there is, quite to the contrary, a considerable amount of overlap between regions that produce attack and/or simply threatening behavior. Again the organization that is suggested is one in which certain nuclei are more likely to produce one or another behavior but not one in which they are divided into mutually exclusive subfunctions. It cannot be overemphasized, however, that even though the hypothalamus may be a dominant central command unit for emotional behavior, it is subject to strong inhibitory and excitatory influences from other of the structures that make up the limbic system, as well as from other neocortical centers. The hippocampus, for example, seems to exert a general inhibitory sort of function on hypothalamic nuclei. Removal of the hippocampus, rather than decortication in general, probably was the specific cause of Bard's original observations that decorticated animals tended to be hyperemotional and to display a very low threshold for rage reactions. The amygdala, quite to the contrary, seems to produce either inhibition or excitation of hypothalamically driven emotional behavior, depending upon
which of two of its subregions are stimulated. The lateral and basal regions of the amygdala, when activated, tend to increase fear and avoidance behavior, but activation of the medial region tends to damp out such emotional behavior. It should not be forgotten, however, that the amygdala is actually a constellation of about a dozen nuclei and is no more a homogeneous unit than is the thalamus. Yet most lesion research on this area produces very large regions of damage. It is quite understandable why the results from such experiments are sometimes inconsistent and self-contradictory. The septal regions of the limbic system also seem to be deeply involved in the control of emotional behavior. In the first days immediately following septal damage, animals display a classic hyperemotional, antagonistic behavior pattern known as septal rage. Such septal animals can be extremely dangerous in the first days following the operation; they display vicious attack and aggressive behavior and generally make very bad house pets. It is a curious paradox, however, that the animal will exhibit an increased tendency to flee from fear-producing objects over time and appear much more "cowardly" than would a normal animal. In general, animals with septal lesions tend to become eventually much tamer than normal animals. However, the septum, too, is not homogeneous, and a number of studies [for a recent example, see Golda, Novakova, & Sterc (1975), which also reviews the problems of septal differentiation in detail] have shown that lesions in different portions of the septum may produce distinguishable behavioral effects. Golda and his colleagues, for example, have shown that laterobasal septal lesions produce a reduced tendency to aggressive reactions, but dorsal and mediobasal lesions do not seem to interfere with these reactions to the same degree. 
Electrical stimulation of the septum, on the other hand, does not produce a stable pattern of responses but produces results that are highly dependent upon the species of the animal and its current emotional state. The reader who feels the urge for greater detail of this important series of experiments concerning the limbic system and emotion should refer to Isaacson (1974) or Clemente and Chase (1973), or, for a particularly clear discussion, to Valenstein (1973). It should also be appreciated that psychobiologists now generally accept the notion that other nonlimbic systems are also deeply involved in the control of aggressive behaviors. In an important study that clarifies the role of sublimbic systems in attack and threat behavior, Berntson has shown (Berntson, 1973) that stimulation of the pons can produce attack and threat behavior of complex kinds as well as a more docile grooming response. In another paper (Berntson, 1972) he has also shown that lesions of midbrain tegmentum can drastically alter the responses that are produced by hypothalamic stimulation. Medial tegmentum lesions inhibited the release of hypothalamically stimulated aggressive behavior, suggesting that this medial region normally exerted an excitatory influence on the hypothalamus and that lesions of the lateral tegmentum allowed
attack behavior to spontaneously erupt, suggesting that this lateral region normally exerted an inhibitory influence on it. Berntson's studies, therefore, are important in providing evidence that the control of emotional behavior is not limited to the hypothalamic regions or even to the limbic system. His findings once again stress the important generalization that almost all complex behaviors can be affected by almost any part of the brain and that the sort of localization that occurs must be interpreted more in the form of a system of interconnecting nuclei than in terms of any theory of sharply demarcated functions of single "centers." His findings specifically suggest the presence of a vertically organized system of nuclei that is involved in the control of emotions. Coupled with what we know of the involvement of other sensory and limbic centers in emotional behavior, it is clear that a major portion of the brain is actually involved in the regulation of emotional behavior. As this section comes to an end, it should be pointed out that the data obtained from experiments that explore the effects of either stimulation or ablations in the limbic system and related nuclei are intrinsically difficult to conceptualize. The responses that are used as behavioral indicators of limbic activity are very often complex patterns of behavior themselves (e.g., a generalized attack behavior pattern) rather than discrete events occurring at single instants in time. Subtle changes in the behavior of an operated animal could occur and be invisible to almost any particular behavioral assay technique that might be used. There is, in addition, a particular difficulty in controlling all of the relevant variables when one is studying emotional behavior. The effect of stimulation to or ablation of almost any portion of the vertically organized system to which I have alluded can be modulated by activity in almost any other region. 
Both the present environment and the past history of the animal can also dramatically alter the results produced by the manipulation of any independent variable. Furthermore, results of stimulation or ablation change over time; ablation of a specific region may initially lead to enhancement of emotional activity but then later to a reduction, or vice versa. The point being made by raising these complications in the interpretation of lesion- or stimulation-produced emotional behavior is that the current description of the role of the various centers of the system that controls emotional behavior is, at best, incomplete and inadequate and, in some cases, must certainly be misleading. Perhaps this difficulty has been best summed up by Alberto Zanchetti (Zanchetti, 1967), who, speaking only of the limbic system, said:
A host of ablation and stimulation experiments have shown that emotional behavior can be altered, in either direction, by manipulation of various portions of the limbic system. Unfortunately, there is still no definite agreement on the part played by each structure. Although I do not intend to enter into details of this highly controversial subject, I shall cite the opposite effects of amygdalectomy described by Bard and Mountcastle (1947), who obtained increased susceptibility to rage behavior, and by Kluver and Bucy (1938), who observed placidity as a typical feature of their classical temporal lobe syndrome [p. 607].
About the most meaningful comment I can make, therefore, is that we are at the very beginning of an understanding of the complex interactions that occur within the emotional controlling system and how they relate to emotion. At present, the main conclusion with which I most comfortably end this section is that there does seem to be a vertically organized system in the brain, which includes the limbic system and particularly the hypothalamus, that more directly controls emotional behavior than do most other portions of the brain. This does not mean that the other portions of the brain are entirely excluded from a role in emotional response. Many of the nuclei I have mentioned are also heavily interconnected with other regions of the brain. On the other side of the coin, it should also be remembered that neither the limbic system nor any of the other nuclei I have mentioned are exclusively involved in the control of emotions. For example, the limbic system, as one would expect considering the close relationship between emotional and appetitive drive states (motivation), is also involved in eating, drinking, mating, and maternal behavior, and, indeed, in some problem-solving behavior (see Thomas, 1968). The associations described in this section can only be my best guess based on some information that is variable, incomplete, and inconsistent. We simply do not know everything that should be known concerning the function of this complicated emotion-controlling system. Indeed, in light of the totally inadequate taxonomy of mental processes and behaviors, which is, perhaps, more blatant in the field of emotion than in any other area of psychology, it may be that many of the questions concerning this mental process that we are asking in the psychobiological laboratory are not only unanswered but also essentially meaningless.
F. MOTIVATIONAL PROCESSES
It is quite clear that the emotional or affective responses that I described in the previous section must necessarily be closely linked to the motivational state of the organism. Animals tend to strive for those things that are pleasant and to avoid those things that are painful. Thus the drive to approach some particular object or to avoid some other is inextricably associated with the affective or emotional experience produced when the goal object is encountered. Because of this obvious amalgamation of mental process and, as we shall see, at least a partial overlap in involvement of the particular structures of the brain, emotional and motivational states can be distinguished only with difficulty. Motivation is particularly elusive of definition because it is not so much reflected in a cluster of observable and specific responses, as for example rage and fear, as it is merely a state of the organism-a set of tendencies or weighting factors toward producing a variety of overt responses, any of which can achieve the goal of satisfying some specific need. Motives or drives may thus be better considered as propensities toward action rather than actions themselves. Localizing
a "tendency" of this sort thus becomes a most elusive task, perhaps a nonsensical one. Another problem is that motivation, as evidenced by the overall behavior of the animal, is very strongly affected by the environment. The specific responses that are elicited by stimulation of particular nuclei of the hypothalamus are highly dependent on the external sensory stimuli the animal is momentarily experiencing. A typical example of this sensory impact is to be observed in the fact that, although certain hypothalamic nuclei were classically thought to be involved in the control of lactation, the secretory process is highly dependent upon the stimulation provided by a suckling infant. Sexual and food appetites are also directed at specific objects and require specific motor responses. Clearly, the electrophysiological states of the so-called emotional brain centers are not entirely definitive of the motivational states of the organism. Sensory and motor systems also contribute to their control. What, then, are the motivational processes for which we should search for a neural correlate? Generally, the appetitive drives, as they are often called, are those tendencies, or propensities, of the organism that impel it toward the execution of certain responses that tend to satisfy some of the physiological requirements for the maintenance or perpetuation of life processes within a relatively stable or homeostatic range. I only mention in passing the more complex drive states acquired through experience for the acquisition of certain objects that are not directly relevant to physiological maintenance. For example, such drives as the urge for "power" or "glory" or "wealth" are totally beyond the scope of this book. 
Thus the ingestion of food or water, the consummation of sexual activity, breathing, temperature regulation, sleep, lactation, and excretion, are all responses toward which the organism is impelled by drive states created within the body by some centrally detected imbalance in the internal chemical, mechanical, or thermal condition of the organism. Motivational drives, therefore, in addition to their necessarily close linkage to external sensory stimuli (the target or the goal object) and to effector mechanisms (through which the drives are consummated), must also be closely linked to brain sensory mechanisms that are sensitive to the internal state of the organism. Internal receptors must be able to determine the deficiency or superabundance of some substance, such as oxygen, glucose, water, or hormones in the blood stream, a shift from optimal body temperature, or a pressure in the lower intestine. There is a considerable amount of evidence which indicates that sensory receptors exist in the hypothalamus that are specifically sensitive to the chemistry of the blood and are able to detect the concentration of nutrients, salinity, CO2, and oxygen, as well as blood temperature. The findings of the last five years, in particular, have not only argued strongly against the conventional behavioral taxonomy of motivation as a family of specific behaviors, but have also stressed the fact that no single center or group of nuclei has exclusive control over the generalized behavioral tendencies that we
call motivations. Prior to this last five-year period, a classic hypothalamic theory of motivation was almost universally accepted. It was generally assumed that motivated behaviors like drinking, eating, and sex were controlled by specific hypothalamic nuclei. For example, destruction of certain hypothalamic regions was thought to specifically enhance drinking, and other ablations to specifically inhibit it. Other nearby nuclei regulated eating or sex in a similar antagonistic manner. It appears now that this classic theory is a gross oversimplification of the true state of affairs. Not only are many other nuclei of the central nervous system now known to produce similar motivational effects, but there is at least some suggestion that transection of nearby bundles of axons, interfering with sensorimotor processes, may have been a more important influence on the behavioral change than the actual destruction of hypothalamic nuclei. The purpose of this section is both to review the classic point of view and to consider some of the more recent studies that argue against the notion that hypothalamic nuclei are specific centers for the control of motivated behavior. The classic view was that a key role was played by hypothalamic centers in motivated behaviors. The general role of these hypothalamic nuclei was thought to be to maintain a homeostatic balance. As an example of this classic theoretical viewpoint, consider the proposed role of hypothalamic nuclei in the control of drinking behavior. Drinking behaviors are necessary for the replenishment of fluids within the body. Serious cellular damage can occur when the extracellular water level is depleted. As the relative salt concentration of the extracellular fluids increases, a discrepancy between the elevated external salt concentration and the normal intracellular salt concentration is produced. There is, therefore, a strong osmotic tendency for fluids to be transported from the inside of the cells to the outside.
This can quickly lead to the disruption of the life-sustaining electrophysiological processes that are dependent upon ionic concentration differences being held between certain narrow limits. How does an animal's nervous system know that body fluids are depleted? What causes a motivational state to be created that can lead to behavior that will correct this deficiency? In addition to any indirect sensory channels for the communication of information-the dry feeling in the mouth that must be mediated by somatosensory receptors, for example-there are probably receptors in the hypothalamus itself that detect the water balance within the body. Andersson and McCann (1955) have shown that high concentrations of salt solutions applied directly to nuclei located in the lateral regions of the hypothalamus can serve as an effective stimulation for excessive drinking. Bilateral removal of these lateral nuclei of the hypothalamus, it has been shown, produced adipsia (a total absence of drinking behavior). However, it is now thought that other centers of the limbic system are also specifically and primarily involved in controlling drinking behavior. When either the major portions or posterior regions of the septum are ablated, the animal becomes hyperdipsic and enormously overdrinks (Anand & Brobeck, 1951; Epstein & Teitelbaum, 1960; Harvey & Hunt, 1965; Lubar &
Wolf, 1964). Such results suggest that there is a system of excitatory and inhibitory, or start-drinking and stop-drinking, centers that operate as antagonists to produce drinking or nondrinking behavior, depending upon which center is momentarily dominant. Another important characteristic of the classic theory of the function of many of the hypothalamic nuclei was that the nuclei seemed to be especially sensitive to the most minute amounts of certain hormones. Some of these supereffective hormones, such as angiotensin, are produced in other portions of the body as a function of reduced water levels. Therefore, it was realized early that there are other means of communication, in this case the circulatory system, present in addition to the direct neural regulation of the state of the various antagonistic hypothalamic nuclei. The classic view also proposed that the hypothalamic nuclei that control eating were either the same as or located near those involved in drinking. There also appeared to be two distinct eating centers that operated in an antagonistic fashion just as there were in the control of drinking. One group, the lateral hypothalamic nuclei, if destroyed, seemed to selectively produce an animal that does not eat at all (Anand & Brobeck, 1951); it is said to be aphagic. On the other hand, if the ventromedial regions of the hypothalamus are damaged, the animal will become hyperphagic; it will continue to eat indiscriminately, far beyond its physiological needs, and may become grossly obese (Hetherington & Ranson, 1942). The beginning of apostasy from this classic theory might be found in a line of evidence which indicates that the adipsic and aphagic behaviors that are produced by lesions of the hypothalamus are generally only transitory. Teitelbaum and Epstein (1962) found that there is a remarkably orderly sequence of recovery of normal eating and drinking functions following lateral hypothalamic lesions, as shown in Fig. 5-35.
Although an animal might not eat or drink at all immediately after the lesioning operation, it will progress from this first stage into a second stage in which it will eat small amounts of particularly palatable, moist foods. The recovering animal then progresses into a third stage in which it will eat almost everything that is wet. As time goes by, the animal will then pass to a fourth stage in which it will eat and drink normally. As one psychobiologist put it, it is almost as if the animal with lateral hypothalamic lesions progressed through the normal developmental stages of the maturing animal, learning or relearning this behavior anew. This progressive recovery of ingestive function began to suggest the very important possibility that the role of each of the hypothalamic nuclei may be acquired and differentiated from that of its neighbors on the basis of certain maturational and experiential factors that transcend any predetermined neural interconnections. If this is the case, then the proposed role of any of these nuclei as unique loci for motivational control becomes highly equivocal.
[Figure 5.35 panel labels: Stage I, adipsia and aphagia; Stage II, adipsia and anorexia; Stage III, adipsia and dehydration-aphagia; Stage IV, recovery.]
FIG. 5.35 Diagram showing the stages in the recovery from the adipsic, aphagic effects of surgical ablation of the lateral hypothalamus. (From Teitelbaum & Epstein, ©1962, with the permission of The American Psychological Association.)
Other regions that produce eating and drinking deficits have been repeatedly demonstrated. Specific evidence of the involvement of the basal ganglia of the cerebrum, for example, has been presented by Neill and Linn (1975). They found that surgical insults to the corpus striatum and the globus pallidus produced feeding deficits in rats that were comparable to those produced by lateral hypothalamic lesions. Marshall, Richardson, and Teitelbaum (1974) have also found similar effects when other points of the pathway between the substantia nigra in the brain stem and the basal ganglia were interrupted by injections of 6-hydroxydopamine. These findings led both groups of investigators to suggest that the feeding deficits produced by the lateral hypothalamic lesion were in no way unique to the hypothalamus but actually represented only one of many possible lesions that could interrupt this elaborate system of nuclei and tracts that run from the mesencephalon to the telencephalon.
To complicate matters further, it also has been demonstrated that the cerebral cortex itself and particularly the frontal lobes are also involved in the control of eating behavior. Rice and Campbell (1973) chronically implanted stimulating electrodes in the lateral hypothalamus of rats. Electrodes placed in these nuclei will normally produce eating behavior when electricity is passed through them. However, this evoked-eating could be markedly reduced and the thresholds for electrically elicited-eating could be markedly increased if major portions of the frontal lobes of the rat were removed. This suggests an interaction between the frontal cortex and the lateral hypothalamus such that the frontal cortex facilitates the hypothalamically driven hyperphagia.
Braun (1975) has also shown similar disruptions in normal eating behavior following various kinds of neocortical lesions. Rats with complete decortication showed a shorter but qualitatively similar course of the deficit in eating and drinking behavior as did the hypothalamic animals. This sort of data makes it quite clear that, though the hypothalamus may play some role in initiating or terminating eating and drinking behavior, it is by no means sufficient to control the entire process. This idea is further supported when viewed in the light of other observations that indicate that driven hypothalamic eating behavior is always directed at reasonable food objects. The idea of hypothalamic triggering (GO) or terminating (STOP) mechanisms that themselves are heavily influenced by nonhypothalamic centers may be a much more realistic model of their role than is the theory that a single omnipotent hypothalamic "eating" or "drinking" center exists. An alternative view of the role of the hypothalamus is that it determines a "set point" for body weight. The set point hypothesis assumes that the ingestive and metabolic resources of the organism will be mobilized to achieve a prescribed weight. Some factor other than food intake will be used by the body to regulate the motivated level of hunger. This other factor may be the amount of fatty tissue or the blood glucose level, or for that matter almost any one of a number of more or less indirect stimuli to continue eating long past the time the nutritional needs of the organism have been fulfilled. Overeating (hyperphagia) results, according to this view, from a missetting of the hypothalamically controlled set point rather than from an induced hyperphagia per se. If this view is correct, the role of the hypothalamus as a sensory center of the internal milieu may be more important than the alternative role as an "eating" or "GO-STOP" triggering center.
The concept of the set point has been eloquently summarized by Keesey and Powley (1975). A sobering further reminder of the fragility of any model that assumes that the control of specific eating-drinking behaviors is localized in certain areas of the brain has been presented by Gazzaniga, Szer, and Crane (1974). They carried out the classic lateral hypothalamic lesion in rats and found, as expected, that these animals were greatly deficient in their drinking behavior. However, Gazzaniga and his colleagues also noted that the rats used in their experiments displayed an increased propensity to run as a result of the lesion. Greatly increased drinking rates could be elicited from these animals, however, if the opportunities for running were made contingent upon prerequisite drinking behavior. Gazzaniga and his colleagues proposed that these results should make us very careful about localizing drinking behavior in a specific hypothalamic center. They suggest, on the other hand, that the ability to regulate this complex motivation is dependent upon "the entire cerebrum" as well as subcortical centers. Clearly the interactions suggested in this section and the result of such contingent experiments should make us wary of localizing any functions too sharply
and further prepare us for thinking about the nervous system as a closely coupled set of interacting structures rather than as a system of independent functions. The hypothalamus does seem to play an important, if equally nonunique, role in several other forms of motivational behavior. Sexual behavior, the powerful and compelling means of satisfying the urge for somatosensory delights, has been assumed to be both directly and indirectly controlled by a number of hypothalamic components. The indirect route, a chemical one, is supported by findings that the hypothalamus regulates the production of the hormones estrogen and androgen and also displays a high level of chemical sensitivity to these hormones within several of its constituent nuclei. The nuclei of the anterior hypothalamus may also be involved in direct neural control of sexual development and mating behavior. If the anterior nuclei of a female rat are ablated, she will not go through the normal estrous cycle; however, she will ovulate. If the female's ventromedial nuclei are destroyed, there is a degeneration of the ovaries and a resulting reduction in ovulation. In the male, erections and ejaculations can be produced by direct electrical or chemical stimulation of various portions of the hypothalamus as well as other portions of the limbic system. However, these responses can also be produced by spinal stimulation in animals with transected spinal cords, and thus the hypothalamus may only play a triggering or modulating role in these male sexual behaviors. A considerable amount of direct and indirect evidence, however, now indicates that many different portions of the nervous system are involved in the regulation of sexual activity. Almost every new journal brings a new demonstration of the role of some other center. For example, Clark, Caggiula, McConnell, and Antelman (1975) have just demonstrated that an area in the mesencephalon of a rat increases sexual activity after it is lesioned.
This excitatory function is probably mediated by releasing other nuclei from inhibitory controls but once again illustrates the complex interactions between a number of centers that may influence such motivational processes. But obviously, here too, we are only at the beginning of an understanding of how particular brain nuclei exert their influence on sexual behavior. In conclusion we can see that the classic model of the hypothalamus as an essential and unique center for the control of motivated behavior is under strong attack these days. Many brain regions other than the hypothalamus have been shown to have effects on eating, drinking, and sex, and there are strong interactions between the various nuclei. There is, therefore, an emerging consensus that the concept of the hypothalamus as a unique motivation "center" is totally incorrect. At best, it is only one of a number of important regions in a much more complex system than had hitherto been appreciated. At worst, some recent psychobiological theories have suggested that the role of the hypothalamus in motivation has been completely misrepresented and that it is more likely that "hypothalamic" lesions produce their effects because sensory and motor fibers
passing near this region are damaged during the classic operations. Grossman (1975) in particular has proposed a strong argument for a model in which the hypothalamus plays only a minor role in motivation, and the alterations in eating behavior usually attributable to it are, in fact, due to sensory-motor dysfunction produced by the interruption of nearby transmission systems. Zeigler and Karten (1973) and Zeigler (1974) have specifically implicated the trigeminal pathways as the possible sensory tracts that may produce the apparently misnamed "lateral hypothalamic syndrome" in pigeons. Obviously, we have a highly fluid theoretical situation at the present time. Even though the classic notion of the hypothalamus as a region uniquely specialized for motivational control is being discarded, the new theories are in a state of flux and no new general perspective prevails. If any single concept seems to characterize the essence of these new views, however, it is that the motivational systems are more likely to be controlled by widely diffuse systems of nuclei with heavy sensory-motor influence than by any single master "center." This emerging point of view is entirely consistent with the interacting systems viewpoint championed here to explain most other mental functions.
G.
MOTOR CONTROL
This section contains a very brief mention of the neural mechanisms involved in the control of the musculature of the body. I have already mentioned the cortical motor areas on which topographic maps of excitability may be drawn (as shown in Fig. 5-18a). I have also already mentioned in Chapter 3 the two systems of descending fibers from the motor cortex to the spinal motor neurons. The first efferent motor system, the pyramidal system, sends fibers to the spinal cord without synaptic interruption. Most of these fibers are arranged so that they cross over and control muscles on the opposite side of the body; however, a few pyramidal fibers descend without crossing. The second descending motor system, the extrapyramidal system, is much more complex. Efferent fibers from the cortex synaptically terminate in many centers of the basal ganglia and the brain stem. From their relay points, they send secondary motor efferents to such structures as the pons (and thence to the cerebellum), the substantia nigra, the caudate nucleus, and others. From these nuclei, motor fibers may descend directly to the spinal or cranial motor roots where they are communicated to the effectors. A partial diagram of the descending motor systems, including both direct pyramidal pathways and indirect extrapyramidal fibers, is shown in Fig. 5-36. The cerebellum is mainly associated with the coordination of integrated patterns of motor movements, although it also mediates some reflex activity, is important in maintaining a general level of muscular tonus, and may be involved in other more complex behaviors (see Chapter 3). The spinal cord, in addition to its role as the major communication pathway between the central
[Figure 5.36: diagram not reproduced. Recoverable labels: Various areas of the cerebral cortex; Pons; Red nuc.; Putamen; Globus pall.; Caudate; Thalamus; Tectum; Central nuc.; Tegmentum; Sub. nigra (To thal.); Subthalamus; Nuc. n. VIII; Spinal and bulbar reflex connections, homolateral and contralateral, segmental and intersegmental; To motor unit of striated muscle.]
FIG. 5.36 The multiple pathways from the motor regions of the cortex to the typical motor neuron. (From Thompson, ©1967, after Ranson & Clark, 1959, with the permission of W. B. Saunders Company.)
nervous system and the periphery of the body, is also a major integrative center for relatively simple and direct reflex activities. Defects in the function of the substantia nigra have been particularly implicated in such neuromuscular disorders as Parkinson's disease. Not indicated on Fig. 5-36 are additional interconnections with sensory systems. Obviously, the cerebellar inputs that maintain general posture do have interconnections with the visual system and the somatosensory receptors, as well as inputs from the vestibular system. The cerebral hemispheres, according to the prevailing views, exert a generalized inhibitory control over the brain stem portions of the motor system to prevent a massive tonic contraction of the musculature in addition to the general regulatory control with both inhibitory and excitatory efferent signals. This is most dramatically exhibited by the fact that a condition known as decerebrate rigidity occurs if the cerebral hemispheres are surgically or traumatically disconnected from the rest of the lower centers. Less severe forms of motor disorders,
such as cerebral palsy, occur as a result of defects in the interaction of the cortical and brain stem components of the motor system. In conclusion, it is clear that, as we examine the motor system, we once again observe the action of an integrated system of centers, each of which contributes in many different ways to the overall control of muscular responses. In no sense of the word is there a single locus for motor control that operates exclusively and in isolation to regulate the individual muscular contractions, despite the dramatic and highly regular somatotopic mapping on the precentral cortical surface.
H.
AN INTERIM SUMMARY
In this chapter, I have considered several problems concerning the localization of various psychological functions in the nervous system. The problem has been a central one throughout the classic philosophical and modern scientific periods of the mind-body problem. The idea that parts of the mind may be localized in particular parts of the brain is a natural evolutionary conceptual development because both mind and brain have been traditionally analyzed into subcomponents. However, the localization question is also thought by some contemporary psychobiologists to be conceptually confusing and inadequate. For the past 200 years, students of the mind-body problem have believed the brain to be the unique part of the body for mediating mental processes. There have been two divergent schools of thought that have repeatedly emerged with regard to the question of the way in which the brain performs this function. The first, radical localization theory, asserted that specific mental processes were sharply localized in specific neural centers in the brain. The other, radical homogeneity theory, asserted that the brain acted en masse as a unified and equipotential organ to mediate mental processes. In recent years, another view, somewhat intermediary between these two radical positions, has emerged that is based on the concept that the brain is organized as a system of subcomponents that have differentiated, but interacting, functions in each psychological process. It is only with the greatest difficulty that the components of a system can be separated from each other and the details of the circuitry interconnecting the components unraveled without losing the essential nature of the brain's functioning.
It has been repeatedly emphasized in this chapter that the available data that speak directly to the question of localization are exceedingly complex and "noisy" and that there are a number of conceptual, logical, and biological difficulties that make a simple and precise answer to the question of localization elusive. Thus, the general points of view of brain localization or homogeneity must be considered to be theories rather than statements of "empirical fact." This is true, to an even greater degree, when one considers the function of specific
nuclei within the brain. There is even a considerable amount of disagreement among various workers as to the exact nature of the data in what, at first glance, might seem to be well-defined and carefully designed experimental inquiries. In this chapter in every instance that a particular nucleus or group of nuclei was associated in a particular way with a given psychological function, a statement of principle was made that was highly interpretive, hypothetical, and theoretical, as well as being subject to repeated review and reinterpretation as new data and new perspectives are brought to bear on the problem. Psychobiology simply does not yet have solid answers to many of the subtle questions concerning brain localization with which it deals. In this chapter, I have even mentioned a number of instances in which two investigators have obtained diametrically opposed answers to the same question. Furthermore, the dynamic and variable effects of brain surgery have also made the temporal course of ablation-caused deficits a source of considerable confusion in the search for a solution to the problem of localization of function. In this chapter I have tried to emphasize a modern theoretical view that stresses the concept that no nucleus acts in isolation to mediate any single psychological process. Multiple nuclei are always found to be involved in any single psychological function, and each individual nucleus seems to be involved in many different psychological functions. The reader should especially recall how many different roles were mentioned for the system of nuclei that we call the hypothalamus, for example, in the course of the several sections of this chapter, as well as how many other nuclei have already been shown to produce effects similar to hypothalamically induced ones. 
It must be further remembered that the actual testing of the potential role of any particular structure in any psychological function requires that some sort of an initial hypothesis be proposed concerning its involvement. This hypothesis may be based on some chance observation of an accidentally produced behavioral deficit or on a clue provided by some related anatomical or psychological observation. Once a given structure is implicated as having a potential functional role, it is necessary to apply specific and complex tests to assay its contribution to that function. No matter how careful or ingenious the experimenter may be, however, it is impossible for him to claim that a particular structure is not involved in the psychological process under study. This is so for two reasons. First, the assessment techniques employed simply may not have been sensitive to some subtle deficit that was produced by a given lesion. Second, the candidacy of a particular center simply may not have been tested in a way that excludes redundancy of function. There is, therefore, a strong paradigmatic and even social (i.e., in the society of scientists) influence on even the empirical data in this esoteric area of psychobiology. The figures that have been selected to illustrate the organization of the neural systems involved in each of the psychological processes described in the several subsections of this chapter have often overlapped considerably. In that
regard, they also tend to divert support away from a theory of radical localization by showing that individual nuclei are involved in several functions. The general nature of these systems charts also strongly emphasizes the emerging modern perspective that any psychological function, no matter how simple, is regulated by an interacting system of nuclei in which there are complex lines of inhibitory and excitatory feedforward and feedback rather than by singular centers of control. No pretense is being made that we have as yet unraveled the details of any of these systems nor that any of these diagrams is complete or accurate. Rather, they have been presented to emphasize the concept of interacting systems and the premise of nonunique localization of each psychological function. All that we can say, beyond expressing the general view that we are more likely to observe complex system interactions than exclusivity of regional representation, is that some regions are more involved in some processes than others. It also appears that there is a considerable degree of differentiation of function of the various areas of the brain and the brain stem; that is, they are not equipotential. The fundamental conceptual problem of defining exactly what we are trying to localize is another significant source of difficulty obstructing simplistic solutions to the localization problem. In many cases, the psychological process for which a representational locus is sought is conceptually very nebulous. In some cases, the process may only be a specific motor pattern. In other cases, it may be a generalized tendency, and in still other cases, it may be a task based on criteria of convenience in the laboratory or of experimental method that have little or no ecological validity in the natural life of the organism.
In concluding this summary, it is most important to stress again the point that however strong the notion of radical localization may be in contemporary research and in currently popular textbooks, the actual facts of the matter appear on close scrutiny to be otherwise. Perhaps the current situation can best be summed up by quoting the words of two distinguished contemporary psychobiologists, Elliott Valenstein and Alexander Luria.
It has yet to be demonstrated that electrical or chemical stimulation of any region of the brain can modify one and only one specific behavior tendency [Valenstein, 1973, p. 32].
Mental functions, as complex functional systems, cannot be localized in narrow zones of the cortex or in isolated cell groups, but must be organized in systems of concertedly working zones, each of which performs its role in [a] complex functional system, and which may be located in completely different and often far distant areas of the brain [Luria, 1973, p. 31].
This theoretical perspective is substantially different from either of the classic antagonistic positions taken by the radical homogeneists or the radical localizationists respectively.
6 Representation and Coding: the Background
A.
THE ISSUES
1.
What is Meant by Representation and Coding?
In the previous chapter I dealt mainly with the brain in terms of its gross structures when I asked which regions of the brain mediate which psychological functions. In the present chapter¹ the discussion descends to a considerably more microscopic level. The comparisons now to be drawn between the neural structures and behavior will deal with the characteristics of and interactions among individual neurons rather than with larger portions of the brain. The basic axiom of this book should be restated explicitly at this point; it is that the nervous system provides the immediate, necessary and sufficient mechanisms for the embodiment of all mental processes. The corollary of this axiom germane to our present discussion is that the key to the particular ways in which mental processes are encoded and represented lies in the function, arrangement, and interaction of neurons, the constituent building blocks of the nervous system. If I were to propose a prototypical question that would best characterize the problem posed in this chapter, it would probably be similar to the following: How do the actions of individual or relatively small groups of neurons contribute to the ensemble networks that serve as the physiological equivalents of psychological processes?
¹The material discussed in Chapters 6 and 7 has been the author's main research interest for the last two decades. Therefore, certain portions of this chapter have been adapted from previously written materials. It is hoped that continuity and currency have been achieved by substantial editing, cutting, and linking with new material.
Admittedly, even this restricted form of the prototypical question is complicated and can be asked in a number of different ways depending upon the viewpoint of the individual psychobiologist. He or she might ask, for example: Do the characteristic responses of single neurons in the nervous system map in some direct way onto the domain of psychological responses? Alternatively, he or she might ask: What is the relationship between single cell responses and psychological functions? Some contemporary psychobiologists have phrased the question in slightly different ways that are limited to their own field of interest. Because of their relative simplicity, initial isomorphism of stimulus and receptor response, and primarily unidirectional flow of information, questions of the representation of psychological responses have been asked most often in the domain of sensory or perceptual systems. Thus, for example, Naomi Weisstein (1969) asks what is essentially the basic and prototypical question, although restricted to the processes of visual perception, when she queries: "Are there 'property analyzers' in the human visual system, and, if so, what properties do they analyze for, and how [p. 157]?" So does Stuart Anstis (1975), who asks: "What does visual perception tell us about visual coding [p. 269]?" Nevertheless, all of these questions, regardless of the domain or specific formulation, are virtually synonymous in basic concept. Each asks the same fundamental question. Each is based on the single premise that the proper level of neurophysiology at which to study the problems of the representation of psychological function is that of the action and interaction of individual neuronal responses. Regardless of the particular viewpoint or the specific research topic or the technological approach, all psychobiologists are tackling the same problem when they consider this aspect of the mind-brain question.
That problem is the one that we have brought together in this chapter under the rubric of representation or transactional coding. In this chapter either of these two terms (or the abbreviation, coding) can be used interchangeably to classify the processes by which neural responses become equivalent to or identifiable with (in Feigl's terms) mental processes. I elaborate later a more formal definition of what is meant by these terms. Briefly, however, there is a subtle distinction that should be mentioned. The terms representation, or transactional coding, represent the general process by which neural signals come to be the physical equivalent of mental processes. The word code, however, is more specific and refers to the symbols and rules used by that process. The essence of the study of representation or coding is the comparison of the functions of individual neurons, encoded as they are in their own particular symbols of graded or all-or-none electrochemical activity, and psychological processes. This is the main issue developed in Chapters 6 and 7. To begin the search for a solution to this problem of representation, we must start by being explicit about exactly what the terms "representation" and "coding" mean. The science of language and meaning, or semiotic, as it has been called by such workers as Charles Morris (1955), is an excellent model to
help us understand what is meant by these terms. The problem of the representation of thought processes by language (one particular set of symbols and the syntactical rules for their use) is identical to the problem of representation of mental processes by neural activity patterns or states (another set of symbols subject to its own set of syntactical rules). Both neural representational codes and written or spoken languages deal with the assigned associations between the constituent symbols and certain concepts. In each case, the essential problem is how different coding systems can be converted into one another. Perhaps the issue of neural representation can be clarified by a few simple analogies taken from semiotic. Imagine a line of speakers, each of whom is only able to translate the information presented to him in one language into a different language. Each speaker then passes the information on to the next person in the chain. The communicated idea can usually be regenerated, barring linguistic anomalies, into the original language at the end of the chain by an appropriate final translator. At the end of the chain, however, the exact sentence structure probably will not have been maintained. Furthermore, there is always the possibility that mistranslation or misinterpretations, either intentional or accidental, may have occurred. (This sort of "noise" is a constant byproduct of all but the most perfect of communication systems.) But, with a bit of good luck, the meaning of the communicated idea will be more or less reproduced, even though it has been represented by several different languages at several different stages of the communication process. Languages are the analogs of codes in the lexicon of the sensory-coding theorist. Words are the analogs of neurophysiological signals. Grammatical rules are the analogs of the rules of transformation and encoding.
On the other hand, it is often not so easy to discern the analog of the message in the case of neural coding and representation. We may assume either that the physical geometry and timing pattern of the stimulus initially define a message, or perhaps more properly, that the dimensions of sensory experience are the real components of the message. As is well-known, the two do not always directly correspond. Outright hallucinations and illusions produced by conflicting cues are both common experiences in which there is a patent lack of correspondence between the percept and the stimulus. An important implication of the allegorical chain of translators is the virtual impossibility of determining which languages have been used at intermediary stages of the chain, if the translators have done a good job. It is only in the most unusual instances that the final translation will contain some clues as to the intermediary languages in which the message may have been encoded. This fact raises an important question concerning whether or not the neural codes studied in psychobiology may be decoded by behavioral research alone. I assert that they cannot. For example, neither the subjective experience of variable brightness nor differing behavioral responses to a fluctuating visual stimulus need necessarily contain any information regarding the internal neural encoding. A
simple numerosity code reflecting neuronal recruitment, a frequency code, or even a spatial code are all equally good candidates to represent stimulus intensity. In the same way, the person who listens to a telephone or radio cannot tell whether or not the message had been communicated by a frequency or amplitude modulation system or even by one of the new, more exotic, pulse-coded communication systems used for economizing channel width. What, then, is a formal definition of a representational or transactional code? I formally define a code as a set of symbols that can be used to represent message concepts and meanings (patterns or organization) and the set of rules that governs the selection and use of these symbols. For example, the representation of the amplitude of a stimulus or of a sensory magnitude (the message) by a train of nerve impulses (the symbols), whose frequency is related to the stimulus magnitude by a logarithmic compression law (the transformation rule), is a fairly complete statement of at least one coding situation. Armed with this definition I can now consider some of the implications and the general background of the problem of representation and coding. Although language provides a useful allegory, because its terms are more familiar than those of neural representation, it would be inappropriate to recapitulate much linguistic theory here to elaborate the notion of coding and representation. It is germane, however, to note simply that much of the analysis of how language could represent meaning is directly relevant to the problem of how neural activities can do so. Ernst Cassirer (1874-1945) eloquently stated the case for language as a system for the representation of ideas in a way that may be directly analogized to the case of neural representation and coding. 
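The formal definition just given (a message, a set of symbols, and a transformation rule) can be made concrete with a small sketch. What follows is only an illustration under stated assumptions: the function names, the threshold, and the two gain constants are hypothetical and are not taken from any experiment discussed in the text. It encodes the same stimulus amplitude in two ways, as a frequency code and as a numerosity code, both obeying a logarithmic compression rule, and then decodes each.

```python
import math

# All names and constants are illustrative assumptions, not values from the text.
THRESHOLD = 1.0         # smallest amplitude that evokes any response
GAIN = 20.0             # impulses per second per tenfold increase (frequency code)
UNITS_PER_DECADE = 5.0  # neurons recruited per tenfold increase (numerosity code)


def log_compress(amplitude):
    """Transformation rule: logarithmic compression of stimulus amplitude."""
    return math.log10(max(amplitude, THRESHOLD) / THRESHOLD)


def encode_frequency(amplitude, duration=1.0):
    """Symbols: evenly spaced impulse times whose rate follows the log rule."""
    n = round(GAIN * log_compress(amplitude) * duration)
    return [(i + 0.5) * duration / n for i in range(n)]


def decode_frequency(spike_times, duration=1.0):
    """Recover the amplitude from the impulse rate alone."""
    rate = len(spike_times) / duration
    return THRESHOLD * 10 ** (rate / GAIN)


def encode_numerosity(amplitude):
    """Symbols: the number of active neurons, again following the log rule."""
    return round(UNITS_PER_DECADE * log_compress(amplitude))


def decode_numerosity(active_count):
    """Recover the amplitude from the count of recruited neurons."""
    return THRESHOLD * 10 ** (active_count / UNITS_PER_DECADE)


if __name__ == "__main__":
    stimulus = 100.0
    via_frequency = decode_frequency(encode_frequency(stimulus))
    via_numerosity = decode_numerosity(encode_numerosity(stimulus))
    print(via_frequency, via_numerosity)
```

For a stimulus amplitude of 100, both routes return the same estimate, so an observer who sees only the decoded value has no way of telling which intermediate code carried the message. Because impulses and neurons come in whole numbers, other amplitudes are recovered only approximately in this sketch.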
Cassirer (1953) said:
In analyzing language, art, myth, our first problem is: how can a finite and particular sensory content be made into the vehicle of a general spiritual "meaning"? If we content ourselves with considering the material aspect of the cultural forms, with describing the physical properties of the signs they employ, then their ultimate, basic elements seem to consist in an aggregate of particular sensations, in simple qualities of sight, hearing, or touch. But then a miracle occurs. Through the manner in which it is contemplated, this simple sensory material takes on a new and varied life. When the physical sound, distinguished as such only by pitch and intensity and quality, is formed into a word, it becomes an expression of the finest intellectual and emotional distinctions. What it immediately is, is thrust into the background by what it accomplishes with its mediation, by what it "means." The concrete particular elements in a work of art also disclose this basic relation. No work of art can be understood as the simple sum of these elements, for in it a definite law, a specific principle of aesthetic formation are at work. The synthesis by which the consciousness combines a series of tones into the unity of a melody, would seem to be totally different from the synthesis by which a number of syllables is articulated into the unity of a "sentence." But they have one thing in common, that in both cases the sensory particulars do not stand by themselves; they are articulated into a conscious whole, from which they take their qualitative meaning [pp. 93-94].
This elegant passage of insightful prose is full of relevance to the problems of the neural representation of mental process discussed in this chapter. Strongly
emphasized in this passage is the notion that different sets of symbols are equally suitable for the representation of a single idea or concept. Indeed this is one of the basic notions of representation theory, i.e., there is no one unique code, but rather there are many codes, perhaps strung end-to-end like the line of previously described translators converting a simple message from one language to another. As mentioned, this concept of multiple equivalent representations speaks directly to an important aspect of the problem of neural representation of psychological processes. It asserts that many different codes may be used at intermediate stages to represent a communicated message without leaving any residual clue that the message had at one time or another been in any particular code. Thus, psychophysical data cannot tell us what kind of neural code may have been used internally, and there is no best code. Another important concept that is implicit in Cassirer's paragraph is his idea of wholes, or conceptual "sentences," or more generally in modern psychobiological terms, of "ensembles." As Cassirer says for language, the individual sound (and by this I believe he means both phoneme and word) means little in isolation. Only when it is in the context of a "sentence" does it acquire meaning from the configuration of the entire string of elements of which it is but a part. In an analogous sense, neural coding is also virtually meaningless to the student of representation when he looks only at the individual neuron (even though few neurophysiologists would admit it). The global, molar, or holistic behavioral state must be defined not by the behavior of any single "pontifical" (decision-making) neuron but rather by the aggregate state of a large group of interacting neurons within some organized network. In particular, the relative aspects of their individual responses must be evaluated to determine the message carried by each neuron. 
This perspective, of course, is in strong conflict with the philosophy underlying much of the single-cell representational theory that has such wide currency in contemporary psychobiology. It is interesting to note, however, the defensive position to which this single-cell perspective leads. For example, Naomi Weisstein (1970), an active modern proponent of single-cell correlates of perception, is forced to fall back on hypothetical neurons that encode "symbolic" properties such as "in back of." Stuart Anstis (1975), also a strong believer in single neuron models, after reviewing many of the associations that have been drawn between psychophysical and neurophysiological research, is also forced to conclude that for many classes of perceptual phenomena: "It is clear that the higher perceptual processes exemplified here cannot be explained by any kind of edge extraction of spatial filtering, or any known type of physiological mechanisms. They are of a different logical type [p. 316]." There is, thus, a break appearing in the relatively strong support for contemporary theories of neural representation that assert that single neurons are themselves capable of representing psychological processes. The newly emerging opinion is one that asserts that networks of neurons and their interactions are closer correspondents to behavior and thought than are single neurons or classes of
neurons. I believe that there is a powerful and compelling analogy between Cassirer's concept of "wholes" and the idea of a network. Obviously, many psychobiologists are now becoming increasingly ready to accept this relationship, even if they have not yet fully subscribed to it.
2.
Isomorphism-A False Clue; An Unnecessary Code!
In raising the possibility of different "logical types" of relationships between neurophysiological mechanisms and psychophysical processes, Anstis clearly opens a Pandora's box of further conceptual problems. Assuming there are at least two conceivable classes of representational relationship, one dependent upon the direct representation of mental processes by patterns of single neuronal responses that are spatially or temporally isomorphic to the stimulus, and the other dependent upon symbolic or nonisomorphic representations, where should the boundary between the two be placed? Given the validity of symbolic representation in at least some cases, could not the concept of isomorphic representation and equivalence so frequently championed in contemporary literature merely be an artifact of the analogy existing between neurophysiological and psychophysical data? Have we in some way overextended our theories from what are essentially peripheral communication processes to much more complex integrative processes? Is it possible, instead, that the processing carried out by individual neurons is actually only symbolically rather than isomorphically related to the psychophysical process? Have we been seduced, because of some superficial sort of similarity that exists between responses in the psychophysical and neural domains, to see identity where only analogy actually exists? In Chapter 7 I discuss some evidence that suggests that this is exactly what has happened in at least some cases. The same criticisms implicit in these questions are also explicit in Cassirer's paragraph. He also notes a condition in which the symbols of the language of the code need bear no isomorphic equivalence to the symbols of the external or internal worlds. 
When Cassirer suggests (somewhat incorrectly²) that pitch and intensity can be encoded into "intellectual and emotional distinctions" he is also obviously aware of the assertion that there is no need for dimensional isomorphism between the stimulus and the percept. Yet the propensity to glorify apparently isomorphic data (even though it may be illusory) is ubiquitous in modern psychobiology. It is assumed implicitly by too many psychobiologists that any dimensionally similar responses and codes are likely to be representations of each other. More often, and perhaps more fundamentally erroneous, it is wrongly assumed that dimensionally nonequivalent stimuli and representations are, a priori, not equivalent from a coding point of view. Both of these assumptions are demonstrably incorrect!
² Cassirer's error is not germane to the discussion of this section but is important. Physical stimuli have no pitch, subjective amplitude, or quality. These latter psychological dimensions are attributed to the neural representations of the stimulus by the observer. The acoustic physical stimulus has only amplitude of pneumatic pressure and frequency of oscillation. Sound, a subjective experience, has pitch and loudness.
The basic axiom of coding theory is that it is not necessary to have dimensional equivalence or isomorphism for one set of symbols in one language to represent a concept in another no matter how complex the concept may be. This holds true for both peripheral sensory and central cognitive representations. Logically it is no more necessary for isomorphic coding to occur in the sensory mechanisms than centrally where the symbolic relationship is more blatant. That peripheral mechanisms, in fact, are often isomorphically coded does not detract from this statement. Isomorphism is easily engendered early in an information processing system and is progressively reduced as information passes onward in that system. This logical assertion is a direct corollary of one of the major axioms of modern mathematical and computer theories. Any concept, no matter how complicated, can be represented by a sufficiently long sequence of binary digits. A further corollary of this idea is that any concept can be represented in any language, although it is likely to be encoded less efficiently in some than in others. These mathematical and linguistic theorems may be extended to the psychobiological sphere and assert what is clearly the basic premise of this chapter.
Any psychological concept or percept, no matter how complicated, can be represented by a wide variety of different neural response patterns. No one pattern, isomorphic or not, is, a priori, any better than any other. The neural representation may be totally nonisomorphic, for that matter, and still faithfully encode the mental process, whatever it may be.
3.
The Concept of the Neural State
This brings us to another basic point more obliquely referred to in Cassirer's paragraph-the idea of neural state. It was fashionable in the past to ask: "Who reads neural codes?" (see Bullock, 1961; and Perkel & Bullock, 1968, p. 285). The implication of this question is that the message communicated from receptor organs to the central nervous system had to be decoded or interpreted by some central mechanism. The four main possible answers proposed by Perkel and Bullock to this question as late as 1968 are:
1. Single "pontifical decision making" neurons
2. The effector mechanisms such as muscles and glands
3. A large "pool of neurons acting as a unit"
4. A system of parallel pathways that converge, but not to a single unit, with decisions made at the "narrowest part of the funnel" [pp. 285-286].
I believe, however, that the question of who or what "reads neural codes" is itself fallacious and invalid. It is simply the modern myth of the homunculus
restated in the terminology of coding theory. A more correct resolution of the issue would be to assert that nothing and/or no one reads the neural code. Rather, after a series of neural processing stages, a certain state of the involved neurons is established. This state itself is, in fact, the equivalent of the psychological experience. The state is not "interpreted" or "read" in any sense of the term; it simply is. The homunculus, like Louis XIV, must say, "L'Etat c'est moi." In this regard it is important to note that the representation approach itself, in general, and in its currently most highly developed aspect-sensory coding theory-in particular, represents an important escape hatch for many of the paradoxes and perplexities of mind-brain philosophical controversies of the past. Relationships that had been sought by classical mind-brain theorists are explicit and quantifiable in coding theory. Many of the more nebulous aspects of the problem, by definition, cease to be issues. The psychobiologist no longer must search exclusively for isomorphic mappings, because he can now accept the idea of encoded and symbolic representations by the state of a neural net. This is a vitally important difference between classic and contemporary research in this area that simplifies and restates the mind-brain problem in a very significant manner.
4.
Are Compound- or Single-Cell Action Potentials the Key Codes?
Finally, it should be noted that there are two main ways to measure the action of individual neurons and the ensembles they form. One way is to use a microelectrode technology to observe and record the response patterns of individual neurons. The functions of a neural ensemble could then be determined by repeated observations of many individual neurons and thence by inductive synthesis of these observations into theories of ensemble action. An alternative approach would be to use a statistical measuring instrument that itself performs the synthesis of the aggregate response profile into a single molar measure. Macroelectrodes tuned to detect low-frequency compound responses are examples of this overall response approach. Just as cumulative gas pressure measurements indicate the collective response of an ensemble of individual gas molecules, compound potentials give us insights into the collective or statistical behavior of the ensemble of neurons. However, the compound action potential obscures, as does the gas pressure meter, the details of individual unit activity. Although the choice of macro- or microelectrodes is fundamentally a technical issue pertinent to the goals of each experiment, an important conceptual perplexity arises from the presence of these two alternative means for studying neuronal processes. Which one is the more fundamental in the representation of psychological processes? Is mind more a reflection of the action of individual neurons or is mind a statistical process more akin to the compound potential that reflects the mixed activity of many neurons?
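The gas-pressure analogy can be made concrete with a small illustrative sketch. It is not a model of any real recording: the two ensembles, the function name compound_potential, and the moving-average window are all hypothetical choices made only to show that a summed, smoothed measure can be identical for ensembles whose unit-by-unit activity differs.

```python
from typing import List

SpikeTrain = List[int]  # one entry per time bin: 1 = impulse, 0 = silence


def compound_potential(ensemble: List[SpikeTrain], window: int = 3) -> List[float]:
    """Sum activity across all units in each time bin, then smooth the sum
    with a trailing moving average (a crude stand-in for a slow macroelectrode)."""
    n_bins = len(ensemble[0])
    summed = [sum(train[t] for train in ensemble) for t in range(n_bins)]
    smoothed = []
    for t in range(n_bins):
        start = max(0, t - window + 1)
        segment = summed[start:t + 1]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


# Hypothetical ensemble A: three units, six time bins.
ensemble_a = [
    [1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 1],
    [0, 1, 1, 0, 0, 0],
]

# Hypothetical ensemble B: the same number of impulses in every time bin,
# but the impulses are carried by different units.
ensemble_b = [
    [0, 1, 1, 0, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0],
]

same_compound = compound_potential(ensemble_a) == compound_potential(ensemble_b)
same_units = ensemble_a == ensemble_b
print(same_compound, same_units)
```

In this toy case the molar measure cannot distinguish the two ensembles even though no single unit behaves the same way in both. The sketch shows only what each kind of measurement can and cannot register; which of the two levels is the more fundamental in the representation of psychological processes remains the open question.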
A vigorous controversy surrounds this question in contemporary psychobiology. Single-cell theorists often assert that compound potentials are as meaningless as the voltages that might be measured with a similar technology on the outside of a computer. The adherents of compound potential theories point out that the activity of an individual neuron would be equally meaningless in the sea of twinkling neurons in which it is embedded. What system, they ask, could possibly allow the state of any one neuron, or even any one kind of neuron, to predominate or "pontificate" in the specification of the dimensions of the psychological process? They note also that the temporal dimensions of psychological processes are more similar to those of the compound action potential than to those of single neuronal responses. Mental processes typically occur in tenths of seconds, like compound-cell action potentials, not in milliseconds like single-cell action potentials. In this chapter I consider the conceptual basis of both of these technical means of examining the action of neuronal ensembles-the approach based upon the study of the response of single neurons and the approach based upon the study of compound potentials-because this perplexity has not been resolved. The discussion of neural compound-cell action potentials to be presented should not be misunderstood, however, as an argument for some type of "field" theory of mental action. The compound-cell potential is simply a convenient way to determine what groups of neurons are doing, in which the statistical processing is done by the electrode rather than the experimenter. These thoughts conclude my introduction to the conceptual foundations of the representation problem. In the remainder of this chapter, I have strived to provide the historical, methodological, and conceptual basis of fields of psychobiology that seek to relate electrophysiological signals to behavioral events. 
The discussion of single neuron and compound action potentials is followed by a specific analysis of the problem of sensory coding as an example of the representation problem. I have chosen this area on which to concentrate simply because it is by far the best developed, but it should not be misunderstood that this is the only area in which the problems of representation and coding exist. Ultimately psychobiology will have to come to grips with the more complicated codes for problem solving, emotions, and all of the other psychological processes. In Chapter 7 I consider the specific empirical data that have been obtained relevant to the problem of representation.
B.
A BRIEF HISTORY OF ELECTROPHYSIOLOGY³
Unlike the problem of localization of psychological processes within the nuclei of the brain, the problem of representation of psychological processes at the neuronal level is of very recent vintage, and its history is, therefore, brief. The reasons for this historical difference between the two problems are obvious. Whereas localization theory dealt with processes and material entities that were of the same macroscopic scale as that of the human observer and his intuition, the processes and entities of representation take place at a microscopic level whose existence was only guessed at until the invention of the high-power microscope in the sixteenth and seventeenth centuries. Even then the full significance of the roles played by the tiny neurons and their ultrafine parts did not become evident until the application in the past few years of high power electron micrographs and intracellular microelectrodes. Indeed, the very manner in which neurons carried out their biological function was not understood until the fact that they were essentially electrochemical machines was appreciated. That detailed understanding of membrane electrochemistry did not even begin to emerge until about the beginning of the twentieth century. The availability of the electronic and microscopic technology that was necessary for the study of individual neurons actually occurred within the lifetime of most current neurophysiologists. How very recent this is compared to the speculations of Alcmeon, the Greek philosopher, concerning which part of the body was associated with which mental processes! The understanding of the fact that neurons were electrochemical in function and the development of the microelectrode recording technique were two major breakthroughs in cellular neurophysiology. The discovery of compound electrical signals from the brain and the development of the percutaneous evoked potential technique were other important technical developments. The major events in the development of these ideas are the topic of the remainder of this section.
³ Some of this section has been excerpted from Uttal, 1975b (Cellular Neurophysiology and Integration), with the kind permission of the publisher, Lawrence Erlbaum Associates, Inc.
1.
The Origin of the Idea of the Electrical Basis of Nervous Action
Neurophysiology and physics are extraordinarily intertwined from the late eighteenth century on. Discoveries in physiology stimulated findings in physics, which in turn provided impetus for new areas of physiological research. The apex of these exciting times occurred in 1791, when Luigi Galvani (1737-1798) published his observations on a series of experiments carried out on a neuromuscular preparation from the frog. Galvani showed that preparations, consisting only of amputated frog legs and the attendant stumps of the severed nerves, could be made to respond to electrical stimuli that were produced in any number of different ways. Electricity was manipulated (if not completely understood) well enough by this time that it could be produced from electrostatic generators, stored in Leyden jars (the early capacitor), or acquired from lightning rods. Figure 6-1 shows one of Galvani's experiments. Clearly the electrical stimulus produced by lightning was the direct antecedent of the observed action in the frog legs; although this did not necessarily establish electrical action in the
FIG. 6.1 Etching of one of Galvani's experiments in which the nerves of a frog's leg were stimulated by atmospheric lightning. (From Green's 1953 translation of Galvani's original work.)
nerves (that discovery would have to wait until the tiny neuronal responses could be electronically amplified and displayed a century later), it was a minuscule logical step to the assumption that the physical energetics of the neuron were the same as the stimulus. In fact we now know that this logical leap is not necessarily valid. The modern view of sensory transduction is based upon the premise that the physical energy of the stimulus is converted to other forms of physical energy in the nervous system. Indeed, the variety of stimulus energies to which the receptors are normally sensitive should have warned against such a deduction. Nevertheless, the twitch of Galvani's frog legs signaled an end to hydraulic theories of nervous action and a beginning of the electrical theories, versions of which have persisted until our time. It is now understood that the electricity of the nervous system does not function in the same way as the electricity in a metallic conductor. Although both are electrical and can be described by the same mathematical functions, in each case the carriers are different. Electrons and "holes" convey charge in metals, and ions convey charge in the neuron. In addition, the properties of the conductors are different. Metals have low resistances, and neurons exhibit mixed resistive and capacitive impedances of appreciable magnitude. The capacitive and resistive aspects of neural membranes and the nature of the charge carrier specify that conduction velocities will be far
slower in biological tissues than in a wire. Furthermore, the neuron is a metabolic system that, in many instances, produces its own power from energy sources located within the cell itself. This is quite unlike the metallic conductor, which must be powered by an external power source. As noted earlier, Galvani's observations were also extremely influential in the development of the physical science of electricity. One experiment carried out by Galvani involved the stimulation of a frog's nerve-leg preparation by hanging it on a metal railing. The source of the electricity in this case was initially obscure, but a contemporary of Galvani, Alessandro Volta (1745-1827), correctly suggested that the source of the electrical stimulus was the junction between the two different metals of the hook and the railing. Stimulated by Galvani's biological data, Volta went on to invent the bimetallic battery, a device that provided a steady electrochemical source of electricity. This invention led to the many discoveries and technological advances in electricity of the next two centuries, eventually culminating in, among other innovations, a modern neurophysiology that is based on the use of electronic instruments. Thus the circle of interaction between electrophysiology and electrotechnology was completed. By the middle of the nineteenth century there was little remaining controversy regarding the electrical concomitant of nervous action. Most researchers had accepted the notion without argument and went on to special problems of their own interest. However, we now know that electricity is a byproduct of electrochemical reactions. Even then, some workers suggested roles for electricity in nerve action that sound surprisingly modern. Among the most notable of these was Johannes Muller (1801-1858), the formulator of the law of specific energies of nerves. 
According to Brazier (1959), Muller was also one of the first to suggest specifically that electricity "was an artificial excitant that had no part in natural excitation." Nevertheless, the notion of electricity as an easily recordable and universal correlate of nervous action was firmly established in the science of neurophysiology by this time. As it turned out, the view of electricity as fundamental was to be modified by a continuing series of developments in biochemistry in the twentieth century. But as an indicant of biochemical processes, which themselves are difficult to measure, the electrical changes associated with neural activity still reign supreme. Much of the future development of neurophysiology depended upon the development of electrical instrumentation.
2.
Stimulators and Recorders
Once the electrical activity of the neuron was discovered, investigators began a search for the perfect stimulator with which to activate neural tissues. This search continues to this day. Electricity is so easily controlled compared to other forms of physical energy, and it is so effective in giving the experimenter control
over the stimulus pattern whether or not it is an "artificial" stimulant, that the technology of electrical stimulation is still under active development today. Indeed, the history of electrical stimulators is highly correlated with the history of neurophysiology even when it was not so intended. It is amusing to note that one of the earliest recorded nerve stimulators was thought by its inventor in the seventeenth century to be purely mechanical but may very well have been electrical. This early work was carried out by a young student at the University of Leyden by the name of Jan Swammerdam (1637-1680). Although the results of his experiments were not published for over 100 years, it seems fairly certain that in the year 1660, Swammerdam was unwittingly stimulating nerves with what was most probably an electrical stimulus. His "mechanical" stimulator is best described by his own words: If, instead of the heart, we should chuse [sic] to make use of some other muscle, we may proceed in the manner represented in the eighth figure [see Fig. 6-2], where the glass siphon, a, contains within its hollow the muscle, b, and the nerve hanging from the muscle is fastened, without being cut or bruised to a slender twisted silver wire, c, that runs at the other end, an eye made in a piece of brass wire, soldered to the embolus or piston of the siphon, d. Things being thus made ready, a drop of water, e, must be let into the slender tube of the siphon by a very fine funnel. Now, if after this, the silver wire be cautiously drawn with a leisurely hand f through the ring or eye of the brass wire, till the nerve is irritated by the compression, it must by this means undergo, the muscle will contract itself in the same manner with the inflated heart, whose alterations, upon a similar occasion, I have already described, even the drop of water will in some measure sink, though afterwards it never rises again [Swammerdam, cited in Fulton & Wilson, 1966, p. 212].
There is a catch, however, in this description. The astute reader will have noticed that the muscle hung on a silver wire, which was in turn passed through a brass eye. The necessary conditions for electrical stimulation with current from a bimetallic battery were therefore present, and it certainly seems that Swammerdam may have inadvertently stumbled upon an electrical stimulus 100 years before Galvani. Nevertheless, because the concept of the existence of electricity had not yet been established in the seventeenth century, there was no way in which this could have made sense to him. Another electrical stimulator used over 100 years later was simply another animal. Galvani himself used the spinal cord or the crushed end of a frog's leg nerve to produce muscular contractions in a second frog's leg. These experiments were especially important because they showed that the signals involved in the electrical action of nerves had nothing intrinsically to do with the metals of some bimetallic battery or other parts of the stimulating mechanism. Rather, they suggested that the mechanism of the electrical stimulus was also intrinsic to the nerve and occurred even though no wire, bimetal junction, or lightning was present. The conceptual link between electricity and nerve action having been made, the design of electrical stimulators became more specifically directed. After
FIG. 6.2 Etching of one of Swammerdam's experiments in which he may have inadvertently stimulated the nerve with an electrical voltage produced by a bimetallic contact. Abbreviations indicated are: c = silver wire; d = brass support. (From Swammerdam's book, Biblia Naturae, published in 1737, as reproduced by Fulton & Wilson, 1966.)
Volta's description of the bimetallic battery effect, Claude Bernard (1813-1878) developed some small units especially suitable for stimulating nerves. A drawing of one of his battery stimulators is shown in Fig. 6-3. The next important development in stimulator technology occurred as a result of the rapidly evolving physical science of electromagnetism. The transformer, a voltage magnifier consisting simply of two coils of wire wrapped around the same core, came into general neurophysiological use in the form of the induction coil introduced by Emil du Bois-Reymond (1818-1896) in the nineteenth century. The idea of the induction coil is simple. A circuit containing a relatively low-voltage source, typically a battery and a coil of wire wound with relatively few turns, could be interrupted with a simple switch. During the turn-off and turn-on times (the periods in which the flow of current through the first coil is varying), a fluctuating magnetic field is produced. That magnetic field is capable of inducing a much larger voltage (though a smaller current) in a second coil with a larger number of turns wound concentrically with the first. Because the switched interruptions could be controlled by mechanical rotators, the time sequence of a pattern of electrical stimuli could also be varied. Figure 6-4 is a sketch of one of du Bois-Reymond's induction coils. The induction coil was used as the main electrophysiological stimulator well into the twentieth century, when it was finally replaced by all-electronic stimulators that allowed much better control over the amplitude, wave form, and timing of the stimulus. The earliest units were constructed from vacuum tubes, but
FIG. 6.3 Etching of one of Claude Bernard's voltaic pile electric stimulators, showing the switch (I = interrupter), the voltaic pile (P), and the frog's leg preparation (G). (From Bernard, 1858.)
FIG. 6.4 Drawing of one of du Bois-Reymond's early induction coil stimulators, c. 1848. (Courtesy of Grass Instrument Company, Quincy, Mass.)
this technology ultimately gave way to one based on transistor technology that became available in the 1960s. In recent years such luxuries as constant-current and voltage-isolated features have come to be routinely included in even the least expensive stimulators. Although stimulating instrumentation has changed greatly in appearance, and its convenience has increased enormously, there is little fundamental difference between the action of the earliest induction-coil stimulators and the most modern integrated circuit designs, and none as far as the biological preparation is concerned. Both types of stimulators are means of exciting nervous action by the production of appropriate electrical driving forces. On the output end of the experiment-the amplification, display, and recording of faint signals from neurons-the situation is entirely different. Recording and display equipment has progressed to a level of such complexity and utility that a truly qualitative change has occurred in the neurophysiologist's ability to see and conceptualize what is happening. The earliest recording devices were, once again, other organic units. The muscle twitch induced in the frog's leg was the very first indicator of neural activity. Credit for the initial application of this organic "scope" must be attributed either to Swammerdam or to his contemporary, Thomas Willis (1621-1675). The next major step in recording technology was the subsequent invention of the electrometer, a simple device composed of two leaves of thin metal foil suspended in a glass jar from a metallic bulb, as shown in Fig. 6-5. Electrometers act as detectors of electricity because of the repulsive forces generated by like electrostatic charges. When the bulb at the top of the jar is charged, the two foil leaves become identically charged to the same potential because they are electrically interconnected to the bulb. 
The electrostatic repulsion between equally charged objects, even if slight in amplitude, is sufficient to force the delicate leaves apart. Similarly, a charged electrometer (and thus one in which the leaves were spread apart) would be discharged in the presence of any appropriate conductive pathway that allows the electrometer's stored charge to leak away. Unfortunately, neither the frog's leg nor the electrometer could respond rapidly enough to keep up with the speed of a neural response, nor could they give a quantitative measure for the magnitude of the signal. Neurophysiologists thus eagerly capitalized on the development of a new instrument, the moving coil galvanometer in 1882 by Arsene d'Arsonval (1851-1940). A sketch of one of his instruments is shown in Fig. 6-6. The moving coil galvanometer is essentially a small motor consisting of a smaller coil of wire mounted between the poles of a permanent magnet. When the current from a voltage source, such as a neurophysiological preparation, passes through the smaller coil, it turns it into a small electromagnet. The interaction of the two electromagnetic fields is sufficient to produce a mechanical force, and if the small magnet is free to rotate (restrained only by a light spring)
FIG. 6.5 Drawing of a simple electrometer. This device primarily measures electric charge but can be used to detect the presence of ionization or radiation due to their effect on charge accumulations. (Labels in the drawing: conducting ball and post, conducting cork, conducting foil, bottle.)
FIG. 6.6 An early D'Arsonval galvanometer used to measure weak electrical currents. M is a small mirror, attached to the rotor, that is used to detect the small rotations created by the interaction of the induced electromagnetic field with the permanent magnetic field. (Courtesy of Grass Instrument Company, Quincy, Mass.)
and has a pointer attached to it, the rotation of the coil produced by the tiny neuropotential can be measured. Moving coil galvanometers were used until the electronic instrumentation revolution began in the 1930s. Other forms of galvanometers and electrometers have also been developed. The general effort in each case was to reduce the mass of the moving coil to the minimum so that the speed of response of the system would be improved. In 1897, W.D.B. Duddell made a step in this direction by reducing the coil to a single loop of wire. The ultimate in moving coil galvanometers, however, was the development in 1901 by Willem Einthoven (1860-1927) of the string galvanometer in which the coil was replaced by an ultrafine metalized quartz fiber that had to be observed through a microscope. The moving coil galvanometer also can be modified so that it operates in a "ballistic" fashion. In the ballistic galvanometer, the maximum deflection of the coil is proportional to the total amount of current that has flowed through the coil. The peak of the deflection is, therefore, a time integral of all the current produced by the driving voltage. Another innovative device, invented in 1873, was the capillary electrometer, a sketch of which is shown in Fig. 6-7. This device is essentially a modern analog of the old leaf electrometers. In the case of the capillary electrometer, however, the moving component was a tiny bubble of sulfuric acid in a column of mercury. The position of the bubble in the capillary tube was also proportional to the applied voltage. Capillary electrometers rivaled in sensitivity some of the string galvanometers and were particularly useful in neurophysiological research because of the small mass of the bubble. Nevertheless, the major problem inherent in any electromechanical device is the considerable mass of the moving element no matter how lightly it is constructed. 
Because there is always inertia associated with bodies of any mass, or viscosity of the medium through which even a bubble moves, a substantial amount of energy was required to accelerate the pointers of the early measuring instruments. This is so even if the mass of the pointer is only as slight as a small piece of metal or the viscosity as low as that of air. There was a great need, therefore, to develop instruments of little mass and low inertia, that moved in vacuums and that, therefore, could move fast enough to keep up with the low-powered electrochemical processes of neurons. This problem was not solved until the twentieth century, with the development of electronic measuring and display instruments. A. Forbes and C. Thatcher (1920) first used a vacuum tube amplifier to extend the range of a string galvanometer. The major innovation, however, was the cathode ray tube, an engineering achievement that resulted in the development of recording instruments (oscilloscopes) with nearly instantaneous response time because the moving element, a beam of electrons, had negligible mass. The first application of cathode ray oscilloscopes to neurophysiological recording is usually attributed to Joseph Erlanger (1874-1965) and Herbert Gasser (1888-1963), who in 1924 studied
FIG. 6.7 A drawing of one of Lippmann's capillary electrometers for the measurement of weak electrical voltages. The observer views the position of the meniscus of the column of mercury that varies in height as a function of the voltage. (Courtesy of Grass Instrument Company, Quincy, Mass.)
compound (multifiber) responses in the phrenic and sciatic nerves of the cat. Erlanger and Gasser took advantage of a newly developed electronic technology to make some extraordinary neurophysiological discoveries. In the years that have followed, the general recording and display techniques they pioneered have become universally used in this kind of research. Today there is a substantial industry providing not only these recording instruments but also the electrodes and other elements needed to run a complete laboratory. Where once the recording of the simplest neural potential was a technical tour de force, now even undergraduate students make use of equipment and procedures that would have astounded senior investigators only a decade or two ago. Certainly there are new developments on the horizon, and not all that is to be done has yet been accomplished.
Perhaps no single technological development is needed so badly, or is so imminent in the neurophysiological laboratory, as the use of the small digital computer to process information acquired by amplifiers, thus producing more convenient and meaningful displays of data summaries. Only minimal beginnings in the use of this important tool have yet been reported.
The histories of the computer and of neurophysiological research in the past 20 or 30 years are also curiously intermingled, continuing the trend of parallel development in electrical technology and neurobiology. In the 1940s and 1950s there was a tendency for computer scientists and neurophysiologists to compare what appeared to be many common features of computers and nervous tissue. Today, however, it is clear that the digital computer, as it currently exists, is a poor model of even the simplest brain. Nevertheless, there is no question that the computer will have a profound impact on neurophysiology as it becomes more widely used as a data processing and display device and as a generator of stimulus patterns. The many powerful ways in which the digital computer can analyze and display summaries of large amounts of data promise to be particularly important in future studies of the interactions of large numbers of neurons.
3. Intracellular Techniques
Antoni van Leeuwenhoek's (1632-1723) development of the first high-power simple microscope in 1682 opened a window into the microscopic world that was to revolutionize all of biology. Although his best models consisted of only a single lens (see Fig. 6-8) and had magnifying power of only about 275, the world they opened to scientists was amazing in both its beauty and complexity. By the early 1800s, higher-power compound microscopes consisting of multiple lens elements⁴ had evolved to the point where they provided a level of magnification sufficient for observation of the microstructure of animal tissues. It became obvious to nineteenth century investigators that nerves, which previously were believed to be relatively homogeneous tissues, were actually multiple cables of smaller fibers. In fact, the microstructure of the axonal filament of a single neuron was beginning to be understood by that time. Theodore Schwann (1810-1882), among others, had observed the outer fatty covering, the myelin sheath, of the axon formed from those specialized nonneural cells that now bear his name. As the nineteenth century passed, the concept that the really important processes occur at the cellular level grew stronger, along with what must have been enormous frustration. Although researchers knew that they should be looking at single cell responses, these single cells were simply too small to be investigated individually with the techniques then available. In the twentieth century, with the development of oscilloscopes and high-gain and noise-free amplifiers, the task of recording neural activity from within single cells became a major but continually elusive goal of many neurophysiologists.
⁴Surprisingly, low-power compound microscopes were originally developed by Janssen in 1590, and were in common use by such scientists as Scheiner in 1628, and Hooke in 1660, prior to the Leeuwenhoek development of a high-power simple microscope.
FIG. 6.8 A drawing of one of Leeuwenhoek's 2-inch long, simple (single spherical lens) microscopes showing the screw arrangement that was the antecedent of the modern mechanical stage. This type of microscope had a maximum magnification of about 275X. (From Bradbury, ©1967, with the permission of Pergamon Press.)
A few investigators in the 1930s were able to dissect a small bundle containing only a single active fiber, free from a compound nerve, and by laying it across some large electrode (e.g., a saline-soaked cotton wick), were able to pick up extracellular action potentials from single cells. But extracellular potentials are only weak and distorted (by the external media) reflections of what is actually going on across the cell membrane. Most workers in the field realized that neurochemical theory would have to remain speculative until it became possible to record potentials across cell membranes, and that meant placing an electrode inside the neuron. The task was finally accomplished because of an anatomical freak. J. Z. Young (1936) described a most unusual neuron, a giant cell in the squid nervous system, which had an axon as large as 1 mm in diameter. This axon was sufficiently macroscopic that even a relatively large glass tube could be inserted into the intracellular space. If this tube contained a salt solution, it could act as the necessary internal electrode. Tubular electrodes were used to record transmembrane, intracellular potentials by two groups almost simultaneously (Curtis &
Cole, 1942; Hodgkin & Huxley, 1939). The experimental procedure is sufficiently important to warrant quoting their description (Hodgkin & Huxley, 1939) in its entirety:
A 500 μ axon was partially dissected from the first stellar nerve and cut half through with sharp scissors. A fine cannula was pushed through the cut and tied into the axon with a thread of silk. The cannula was mounted with the axon hanging from it in sea water. The upper part of the axon was illuminated from behind and could be observed from the front and side by means of a system of mirrors and a microscope; the lower part was insulated by oil and could be stimulated electrically. Action potentials were recorded by connecting one amplifier lead to the sea water outside the axon and the other to a microelectrode which was lowered through the cannula into the intact nerve beneath it. The microelectrode consisted of a glass tube about 100 μm. in diameter and 10-20 mm. in length; the end of the tube was filled with sea water, and electrical contact with this was made by a 20 μm. silver wire which was coated with silver chloride at the tip .... A small action potential was recorded from the upper end of the axon and this gradually increased as the electrode was lowered, until it reached a constant amplitude of 80-95 mv. at a distance of about 10 mm. from the cannula. In this region the axon appeared to be in a completely normal condition, for it survived and transmitted impulses for several hours [p. 710].
This technique was astonishingly influential in its impact on theory, for virtually the first observation (that the electrical potential across the membrane went positive rather than simply retreating to zero during the production of a spike action potential) necessitated a drastic reformulation of the contemporary theory of membrane action. As productive as the procedure was, this sort of intracellular recording required a freak neuron; the study of neurophysiology could advance only if a procedure for the intracellular recording of neuronal action potentials of broader applicability could be developed. The important step of nondestructive penetration of a cell by an electrode was accomplished by Ralph W. Gerard (1900-1974) and his colleagues (Graham & Gerard, 1946; Ling & Gerard, 1949) on muscle fibers, but the technique is identical to that currently used on neurons. Their procedure involved the use of a tiny fluid-filled glass tube. These tiny tubes were made by a process in which tubes of soft glass were heated and then pulled evenly from both ends until they thinned and broke in the middle. The fluid characteristics of molten glass act as a "demagnifier" to produce a replica of the original tube with the same shaped cross section but at a greatly reduced size. These microelectrodes, or micropipettes as they are often called, are so tiny that they may be literally pushed through the membrane of even a very small neuron without destroying it. The membrane forms a seal around the micropipette, like a self-sealing tire; this seal is sufficiently robust to allow the neuron to continue operating for hours, or even days, after being impaled. Modern micropipettes may be smaller than 1/10 of a μm (one ten-millionth of a meter).
Again, the technique is of sufficient importance that excerpts from an early Ling and Gerard (1949) publication are still, after a quarter century, extremely interesting:
We have pushed in the direction of drawing and filling microelectrodes of well under 1 μ tip diameter and properly tapered and were rewarded by obtaining highly constant membrane potentials. The making of the microelectrodes was critical. Extensive trial led to a tip tapering over a terminal millimeter from a diameter of 25 μ to a few tenths at the opened end. This is stiff enough to handle and to obtain easy penetration and slender enough so that deep insertion does no further damage and that the electrode is still usable after the extreme tip is lost. Electrodes are drawn from capillary tubing of .5 to .7 mm. o.d. (Schaar and Co.). Instead of a microburner, the flame of an air-gas blow torch is used. The capillary is gradually moved in from the side toward the flame and pulled abruptly when sufficiently soft. With some practice, two good needles, separated and with open ends, are obtained fairly regularly. The needle is bent at 90°, with the aid of forceps, and about 30 at a time are mounted for filling on a glass plate with cotton thread wound on it; other materials produce sediment which clogs the needles. Filling is easily achieved by boiling vigorously in KCl solution under slight intermittent back pressure. The holder is placed in half isotonic KCl in a porcelain evaporating dish of 12" diameter to which another, of 10" diameter, serves as cover. The dish is heated with the blow torch for about half an hour, by which time the volume is reduced to half and the air in the needles replaced by vapor. Cool isotonic KCl is then added and the needles promptly fill. This treatment, associated with building up and sudden release (as the lid lifts at intervals) of some pressure, leaves all the needles completely filled with approximately isotonic KCl. Each tip is then examined for shape and for absence of bubbles under the microscope and is tested for resistance (normally 100 megohms, usually falling with use but still satisfactory at 20 megohms) on the amplifier in a convenient holder [Ling & Gerard, 1949, pp. 383-385].
This description is sufficiently detailed to be used as a manual for contemporary microelectrode production. A more detailed discussion of microelectrodes of several modern kinds is given later in this chapter.
4. The Membrane Theory of Neuronal Action
From a purely historical perspective, it is astonishing how quickly new knowledge accumulated and new theories developed in the twentieth century regarding the cellular and biochemical basis of nervous action. The general framework of contemporary theory was spelled out by Bernstein in 1902 when he suggested that the basis of neural action was the selective passage of various ions through the semipermeable plasma membrane of the neuron. However insightful this notion, it was based on what were then some very flimsy ideas. Even the existence of an intracellular fluid filled with mobile ions was not firmly established
until R. Höber's work (1910; 1912) on the red blood cell. Although there was some suggestive evidence for the existence of the membrane itself, its physical properties were not measured until the mid-1920s by Fricke. He was the first to suggest that the membrane is only a molecule or two thick and to estimate (correctly, as it turned out) its electrical capacitance. Direct viewing of the cell membrane with an electron microscope has been possible only since the mid-1940s. Once routine entry of the cell with micropipettes was possible, many speculations of the past were opened to direct experimental inquiry. Hodgkin and Huxley (1939), as we have noted, quickly demonstrated the voltage overshoot; on the basis of impedance measurements, they were able to show that the whole course of the spike action potential could be explained solely on the basis of variations in membrane permeability and the resulting flow of particular ions. This series of ideas still forms the basis for most contemporary theory, although a number of details are still topics of active investigation. For example, the nature of the channels through which the ions flow, and the nature of the active pumping processes that produce the electrochemical driving forces, are not entirely understood yet.
5. The Discovery of the Compound Action Potentials
Although the measurement of the transmembrane potential did not become commonplace until the second half of the twentieth century, the discovery of the brain's gross electrical activity occurred much earlier. In 1875, Richard Caton (1842-1926), a young physiologist at the Liverpool Royal Infirmary School of Medicine, discovered that electrical signals could be picked up from the exposed surface of the brain of rabbits and monkeys. The signals he detected were of two kinds. The first, now known as the electroencephalogram (EEG), is characterized by a more or less spontaneous and continuous oscillation. When these signals were also discovered to be present in man by Hans Berger (1873-1941) in 1925, it was determined that the EEG was at maximum amplitude when the subject was mentally inactive. In this state the neural activity underlying the EEG is thought by some to be more or less idling, allowing a high degree of synchronization of the individual cellular responses. The synchrony of response of many neurons presumably produces the large differences between the high peaks and low valleys of the recording. When the subject becomes mentally active (e.g., by doing mental arithmetic or looking at a picture), the characteristic result is a diminution in the EEG. This reduction in the amplitude of the waves is thought to be associated with the desynchronization of the individual cellular responses and not a reduction in the overall level of neural activity.
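The argument that desynchronization alone can shrink the recorded wave, with no change in the activity of the individual cells, can be illustrated with a minimal numerical sketch. Everything in it is an assumption made for illustration and not a model taken from the EEG literature: each "cell" is a unit-amplitude 10-Hz sinusoid, the function name `summed_peak_to_peak` is hypothetical, and synchrony is varied only by spreading the cells' phases.

```python
import math
import random


def summed_peak_to_peak(n_cells, phase_spread, freq_hz=10.0,
                        duration_s=1.0, sample_rate_hz=500.0, seed=0):
    """Peak-to-peak size of the sum of n_cells unit-amplitude sinusoids.

    phase_spread is the width in radians of the uniform range from which each
    cell's phase is drawn: 0 means perfect synchrony, 2*pi means none.
    (Illustrative assumption: every cell oscillates at the same frequency.)
    """
    rng = random.Random(seed)
    phases = [rng.uniform(0.0, phase_spread) for _ in range(n_cells)]
    n_samples = int(duration_s * sample_rate_hz)
    summed = []
    for i in range(n_samples):
        t = i / sample_rate_hz
        # Each cell contributes the same amount of activity; only timing differs.
        summed.append(sum(math.sin(2.0 * math.pi * freq_hz * t + p) for p in phases))
    return max(summed) - min(summed)


synchronized = summed_peak_to_peak(n_cells=1000, phase_spread=0.0)
desynchronized = summed_peak_to_peak(n_cells=1000, phase_spread=2.0 * math.pi)
print(f"synchronized:   {synchronized:.1f}")
print(f"desynchronized: {desynchronized:.1f}")
```

In this toy case the synchronized sum grows in proportion to the number of cells, whereas the desynchronized sum grows only roughly as its square root, so the same total activity yields a much smaller gross record.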
The EEG has traditionally been frequency analyzed into sinusoidal components (Fourier analysis) to quantify the characteristics of any given record. The strongest (most energetic) sinusoidal component of these human brain potentials typically has a nominal frequency of 10 Hz, although both faster and slower signals are regularly observed. A sample of a typical EEG and its frequency analysis are shown in Fig. 6-9.⁵ For a more complete story of the intriguing history of the discovery of the EEG, the reader is referred to Brazier (1961). The second class of signals that Caton observed were transients produced in a particular region of the brain when a specific stimulus was applied to one of the normal sensory receptors. These signals are now commonly referred to as evoked brain potentials. The evoked brain potential differs from the EEG in that it is neither spontaneous nor continuous. It is, rather, a brief transient that follows a specific time course after the application of a particular stimulus. The typical evoked potential consists of two parts. The first is a rapid component that appears to reflect the arrival of the signal at the specific cortical projection area following its passage up the primary sensory pathway. The second component is much longer and probably reflects the nonspecific response produced by passage of the signal along the ascending reticular system. The somatosensory evoked brain potential, for example, consists of a rapid signal from the postcentral cortex and a later more diffuse response recordable virtually any place on the surface of the skull. A major development in evoked brain potentials occurred when techniques were developed for recording them through the intact skull in humans. Previously this sort of mapping of the brain had been restricted to situations in which the brain was surgically exposed by reflecting a portion of the skull.
Through the use of computer averaging techniques, however, even the tiny component of the evoked potential present on the skull's surface can be detected. This development was based on World War II developments in radar signal analysis that depended on the photographic superimposition of multiple images of noisy echoes. In 1947, G. D. Dawson suggested that similar superimposition techniques could be used to detect the small evoked potentials from the human brain. (He also pointed out that the technique, like the radar solution, was functionally identical to the idea used by Francis Galton (1822-1911) in 1883 to enhance the similarities among a collection of facial portraits.) Dawson's contribution to the sciences of brain and behavioral physiology has been most influential. His techniques created an almost unprecedented opportunity for electrophysiological studies of the awake and intact human. Although Dawson's first photographic superimposition techniques are primitive compared to the sophisticated real-time computer averaging techniques used today, the general nature of the concept is identical. Dawson (1954) also developed what was perhaps the earliest electrical device actually to perform a pseudo-averaging function, a commutator-driven bank of capacitors, each of which charged to a level corresponding to a weighted algebraic sum of the signals at a given time after the stimulus. Figures 6-10a and b are sample records from Dawson's original photographic superimposition technique and from the electromechanical device he subsequently invented. For comparison, Fig. 6-10c is a plot of an averaged brain potential generated by one of today's general-purpose digital computers.
This, then, is a brief glimpse of the past. In the following section I summarize the current status of cellular neurophysiological data and theory to which these historical precedents have contributed so much.
⁵It should be remembered that the fact that a complex signal is analyzable by Fourier analysis into a family of sinusoidal signals does not mean that the brain is actually generating sinusoids. The mathematical processes inherent in Fourier analysis are very general and will produce the sinusoidal equivalents of any signal no matter what form the original generating functions may have taken.
FIG. 6.9 A frequency analysis of an electroencephalographic trace. The spectral plot (frequency analysis) is drawn on top of the more usual representation of the EEG. The alpha rhythm (10-12 cps. component) is suppressed in this set of records at A by the subject's having opened his eyes. (From Hughes, ©1961, with the permission of Wright & Sons Medical Publishers.)
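The caution about Fourier analysis of the EEG (that the analysis yields sinusoidal components for any signal, whatever actually generated it) can be made concrete with a minimal sketch. The signal below is a 10-Hz square wave, which is plainly not built from sinusoids, yet a direct discrete Fourier transform still reports a dominant 10-Hz component plus higher harmonics. The 200-Hz sampling rate, the one-second record, and the function name `amplitude_spectrum` are assumptions chosen for illustration; no real EEG data are involved.

```python
import cmath
import math


def amplitude_spectrum(samples, sample_rate_hz):
    """Return (frequency_hz, amplitude) pairs from a direct discrete Fourier transform."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        # The zero-frequency and Nyquist terms appear once; all others appear twice.
        scale = 1.0 if (k == 0 or 2 * k == n) else 2.0
        spectrum.append((k * sample_rate_hz / n, scale * abs(coeff) / n))
    return spectrum


SAMPLE_RATE_HZ = 200.0
# One second of a 10-Hz square wave: 10 samples high, then 10 samples low.
square_wave = [1.0 if (i % 20) < 10 else -1.0 for i in range(200)]

spectrum = amplitude_spectrum(square_wave, SAMPLE_RATE_HZ)
strongest_freq, strongest_amp = max(spectrum[1:], key=lambda pair: pair[1])
print(f"strongest component: {strongest_freq:.0f} Hz")
for freq, amp in spectrum:
    if amp > 0.1:
        print(f"{freq:5.0f} Hz  amplitude {amp:.3f}")
```

The listing shows energy only at the odd multiples of 10 Hz. A "10-Hz component" in such a spectrum is therefore a description of the record and not evidence about the form of the underlying generators.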
[Figure 6.10, panels (a), (b), and (c): records plotted against time in milliseconds; an M wave is labeled in the drawing.]
FIG. 6.10 The development of the averaged evoked potential technique has been considerable since it was first suggested by Dawson. (A) An evoked potential produced by repetitive tracing on a cathode ray tube. This is a 75-msec long sample from two electrodes on either side of the central fissure of the brain. (From Dawson, ©1950, with the permission of the Medical Department, The British Council, London.) (B) A 100-msec plot of an evoked brain potential produced on the electromechanical mechanism developed by Dawson. (From Dawson, ©1954, with the permission of Elsevier/North-Holland Biomedical Press.) (C) An averaged evoked potential produced by a general-purpose digital computer for a longer period of time following the stimulus (320 msec). (From Uttal & Cook, 1964.)
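The averaging principle behind these records can be sketched numerically: a response that is time-locked to the stimulus adds coherently across repetitions, whereas unrelated background activity tends to cancel. The following is only a minimal illustration under stated assumptions and is not a reconstruction of Dawson's apparatus or of any published procedure. The evoked waveform is a hypothetical smooth bump, the background is Gaussian noise five times larger than the response, and all function names are invented for the example.

```python
import math
import random


def evoked_template(n_samples):
    """Hypothetical evoked waveform: one smooth bump of unit height at mid-epoch."""
    center, width = n_samples / 2.0, n_samples / 10.0
    return [math.exp(-((i - center) / width) ** 2) for i in range(n_samples)]


def average_trials(n_trials, n_samples=100, noise_sd=5.0, seed=1):
    """Average n_trials noisy epochs, each aligned to the (simulated) stimulus."""
    rng = random.Random(seed)
    template = evoked_template(n_samples)
    totals = [0.0] * n_samples
    for _ in range(n_trials):
        for i in range(n_samples):
            # Same response on every trial, independent noise on every sample.
            totals[i] += template[i] + rng.gauss(0.0, noise_sd)
    return [total / n_trials for total in totals], template


def rms_error(estimate, truth):
    """Root-mean-square difference between the average and the true waveform."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth))


errors = {n: rms_error(*average_trials(n)) for n in (1, 100, 2500)}
for n_trials, err in errors.items():
    print(f"{n_trials:5d} trials: rms error {err:.3f}")
```

Under these assumptions the residual noise falls roughly as the square root of the number of trials, which is why a response too small to see in any single record can emerge after many repetitions.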
C. THE NEUROPHYSIOLOGY AND TECHNIQUES OF SINGLE NEURONS⁶
1. Principles of Cellular Neurophysiology and Integration
The study of the physiology of individual neurons and of their interaction at the specialized communication junctions called synapses has become a major endeavor in the biological sciences in the last few decades. Unfortunately, a full discussion of cellular neurophysiology cannot be included in this chapter without making it cumbersomely long. There are a number of introductory texts, however, that concentrate specialized attention on this body of knowledge. The reader is referred to any of the following for a more complete discussion: Aidley (1971), Hodgkin (1964), Katz (1966), Ochs (1965), Schmidt (1975), Stevens (1966), and Uttal (1975b). In place of a full discussion of cellular neurophysiology, some of the more important principles are simply highlighted here to provide a partial foundation for understanding the material to follow. These summary statements have been condensed and adapted from the concluding chapter of my earlier work on the subject (Uttal, 1975b). A full discussion of the background behind each of these summary statements can be found in that text. It is hoped this list will provide a brief recapitulation and review for those readers who are acquainted with this material and that, coupled with the neuroanatomy described in Chapter 3, it will be sufficient to make the detailed discussions of neural representation theory that follow in later chapters understandable.
1. Modern neuron theory assumes a system of discrete cells that are not protoplasmically interconnected but that interact through specific points of contact known as synapses.
2. Neurons are specialized cells characterized by an exceptionally high degree of electrochemical reactivity. Their function in the nervous system is primarily communication and integration of information patterns, but it is becoming increasingly evident that many neurons also play a secretory role.
3. Neurons carry out their communication and integration functions by means of electrochemical reactions. The flow and concentration of a few ions (sodium, potassium, and chloride) mainly define the direction and magnitude of the voltages measured during any neural response.
4. Neurons respond in a number of different ways. Both regenerative spike action potentials and graded potentials are to be found at different points on the cell. Some of these responses are activated by mechanisms intrinsic to the cell (e.g., spontaneous or pacemaker action potentials), and others are produced by outside stimulation (e.g., receptor or post-synaptic action potentials).
⁶Some of this section is adapted from Uttal, 1975b, and is used with the permission of the publisher, Lawrence Erlbaum Associates, Inc.
5. The category of "action potential" (contrary to popular usage) includes responses other than the spike action potential. In fact, any deviation from the potential level of the resting membrane that is not related to the metabolic or homeostatic processes of the membrane may be considered to be an information-bearing action potential. Specifically included in the category of action potentials, in addition to the spike, are nonregenerative graded potentials. 6. The electrical signals picked up from neurons are correlates of the dynamic and steady-state distributions of the involved ions. The electrical signal, however, should not be considered the primary aspect of neural activity. We are learning now that the electrochemical action of the neuron is more chemical than electrical. Our fascination with the electrical concomitants is largely due to the procedural fact that our recording technology is more highly developed to measure volts than ion concentrations. 7. The electrochemical events underlying nervous action are currently assumed to be most directly mediated by the mechanisms and processes of the neuronal or plasma membrane. This general assumption is the key premise of what has come to be called membrane theory. In particular, variations in the impedance of the membrane to the passive flow of sodium, potassium, and chloride ions seem to explain much of what we know of neural processes. Active transport or pumping actions also play critically important roles. 8. Neurons exhibit a wide variety of anatomic forms, depending upon their particular function. Neurons specialized to transmit signals over long distances possess elongated axons, whereas neurons specialized to integrate information from general inputs more often display extensive arborization of the dendritic trees. Receptor neurons mayor may not have elongated axons. 
However, generalizations about the anatomic form of neurons usually turn out to have many exceptions, particularly when particular associations are made between form and function. 9. Transmission neurons can actually conduct in both orthodromic (normal) and antidromic (opposite to normal) directions. Synapses are not, however, in general, bidirectional'? Synaptic rectification, therefore, is the main basis of directional information flow in the nervous system. The refractory period following a spike response also helps to make the nervous system directional by preventing retrofiring. 10. Transmission neurons often are covered with a myelin sheath composed of multiple layers of Schwann cell membranes. This myelin sheath acts to produce a rapid form of neural transmission by allowing the ionic currents to skip between the nodes of Ranvier-the periodic discontinuities along the length of the sheath. 11. The plasma or cell membrane of a neuron is a region of demarcation between the inside and the outside of the cell. Because the membrane can just be resolved at the limits of electron microscopy, there is no consensus concerning 7We see in Chapter 9, however, that some electrical synapses do seem to be bidirectional.
its structure. A number of competing theories of membrane structure currently exist. They all involve a combination of lipid and protein molecules, most likely arranged as a bilayer of lipid molecules penetrated by protein molecules. 12. The most generally accepted theory of membrane action asserts that an ionic concentration equilibrium is established across the membrane by a metabolically driven pumping action operating in a direction opposite to the passive forces of diffusion, osmosis, and potential driving. The establishment of ion concentration difference by the "pump" is hypothesized to produce a resting potential characteristic of the unactivated neuron. 13. The exact nature of the "pump" itself is not yet fully understood, but it does appear to pump out of the neuron at least two ions of sodium (creating a potassium ion surplus inside the cell). Chloride ion concentrations seem to follow more or less passively and are determined by the final resting potential, although the existence of a separate chloride pump has also been suggested. 14. The transmembrane channels, through which sodium and potassium ions flow, appear to be different, because it is possible to selectively block one or the other with various pharmacological agents. 15. All deviations from the resting potential are the direct result of ion concentration shifts caused by changes in the permeability of the membrane or by active pumping action. 16. Neural transmission may occur by either of two processes. The first is a passive, decremental spread of electrical signals by ionic currents. This process is suitable only for transmission over short distances. The second is the propagation of regenerative spike action potentials. This is a nondecremental process, suitable for communication over long distances. 17. Spike action potentials are produced by a sequential but independent breakdown in the membrane's permeability, first to sodium, and then a millisecond or so later to potassium. 
These breakdowns occur in successive patches along the length of an axon. The increase in sodium flow, according to the generally accepted Hodgkin-Huxley (1952) theory, accounts for the leading edge of the spike, and the increase in potassium flow accounts for the rapid recovery of the spike back to almost the resting potential level. 18. Long-term recovery of ion concentrations to resting potential levels, however, appears to be dependent upon an extended period of sodium and potassium pumping. 19. The triggering of the breakdown in membrane permeability that leads to a spike is a threshold phenomenon. Once a local depolarization has achieved that threshold, the remainder of the process is spontaneous and requires no further introduction of extracellular energy. Because the process produces larger electropotentials than those needed to trigger it, the spike is able to propagate to distant locations without decrement. 20. The amplitude of a spike is independent of the stimulus; it is a function only of the state of the neuron. This fact is known as the all-or-none law.
21. The speed of spike propagation is a function of the size and degree of myelinization of the neuron as well as of its general metabolic state. 22. Both electrical and chemical means of synaptic transmission are known. The two types of synapse differ in the spacing between the involved neurons, the rapidity of synaptic transmission, the degree of electrical coupling, and the relative sizes of the pre- and postsynaptic regions. 23. Chemically mediated synapses work by diffusion of transmitter molecules secreted by the presynaptic region to receptor sites on the postsynaptic neuron. 24. A single neuron does not produce more than a single kind of transmitter substance. This fact is known as Dale's principle. 25. A single transmitter substance may produce either inhibitory or excitatory responses, depending upon the nature of the postsynaptic receptor sites. 26. The transmitter substance appears to be bundled into packets of several hundred molecules each and produces quantal miniature postsynaptic responses associated with the arrival of the contents of individual packets at the receptor sites. These miniature quantal responses, however, may sum with each other to produce postsynaptic potentials of virtually continuously graded amplitude. 27. Excitatory postsynaptic responses always appear to be depolarizations, but inhibitory potentials may be either hyper- or depolarizing, depending on the level of the resting potential. In the vertebrate retina, however, some excitatory potentials are hyperpolarizing; in invertebrates, photoreceptor potentials are depolarizing. Therefore, no simple relationship exists between excitatory and inhibitory responses, on the one hand, and hyperpolarizing or depolarizing membrane potentials, on the other. 28. Inhibitory postsynaptic responses generally seem to involve an increase in the membrane's permeability to potassium and chloride ions.
Excitatory postsynaptic responses seem to involve changes in the permeability of the membrane to sodium and potassium ions, changes which are similar to those occurring in other membrane actions. However, there are notable differences between the neurochemical and neuroelectric actions of synapses and axons. Synapses, for example, cannot sustain spike action potentials and can communicate only by chemical transmitter substances and graded electropotentials. 29. Neural integration, occurring primarily at synaptic junctions, represents what is perhaps the most significant neurobiological process. It is the basis of all network functions, thus of all selective interconnections, adaptive behavior, and ultimately, of all cognitive functions. Neural integration is defined as the cumulative effect of multiple inputs such that the conducted information pattern is modified in an adaptive manner. 30. Integration is a property of most neurons, including receptors and motor units that receive multiple inputs, as well as interneurons, the latter being defined as "neurons that connect neurons to neurons"; i.e., all neurons except receptors and the final effector units.
31. A number of processes that tend to modulate the flow of information occur within the neuron. The major integrative factors modulating information flow, however, occur at the synapses.

2. The Techniques of Cellular Neurophysiology⁸
Earlier in this chapter, I briefly discussed the origins of the techniques that have been used to explore the function of individual microscopic neurons. The abundance of information that has been forthcoming using these techniques has been utterly staggering. A plethora of journals publish innumerable articles on a wide variety of neurophysiological topics. This flood of information would challenge and strain anyone's ability to integrate these data into a reasonably comprehensive theory of mind-brain relationships. Yet virtually every recent psychobiological theory of perception, learning, or general cognitive function is derived from these cellular physiological findings, and thus virtually all psychobiological theory is a direct outcome of the recording, analyzing, calculating, integrating, and display capabilities of the available instrumentation. In the long run, therefore, our instruments dictate both our theoretical constructs and general views as much as they determine which measurements will be made. In light of this fundamental role of instrumentation in theory construction, the purpose of this section is to provide a more complete discussion of the techniques of Single-cell neurophysiology than has been presented so far. Because so much of the material presented in the rest of this book is derived from experiments in which recordings were made of the responses of single neurons, it seems appropriate at this point to make a purely methodological and technical digression. I review the techniques that are used to detect, record, and analyze single-cell electropotentials. Figure 6-11 is a block diagram of a typical electrophysiological research system that could be used to examine neuronal function. The most important, indeed the essential, component of this system is the tiny microelectrode that acts as the interface between the bioelectric and electronic components of the combined system. 
This electrode must be mounted on a mechanical electrode manipulator that, in turn, is supported by a rigid stand also holding a preamplifier. The preamplifier is designed to detect the bioelectric signals picked up by the electrode while draining only microscopic amounts of current from the neural elements. Even slight amounts of current drain could greatly distort the displayed signal. The main, or power, amplifier is responsible for production of the high-power signals necessary to drive the display units, the devices that convert the detected temporal voltage patterns into readable formats. In any neurophysiological recording system, the functions of two or more of the devices shown in Fig. 6-11 may be combined into a single physical unit.
⁸Some of this section is adapted from Uttal, 1975b, and is used with the permission of the publisher, Lawrence Erlbaum Associates, Inc.
FIG. 6.11 Diagram of the components of a typical electrophysiological recording system. Labeled components include the microelectrode, power amplifier, oscilloscope, auditory monitor, chart recorder, analog magnetic tape, and on-line digital computer and display. (From Uttal, 1975b.)
Although this combination may result in a reduced number of separate physical boxes, each of the devices indicated in Fig. 6-11 must be accounted for at one place or another in the system. A further stage of instrumentation that is becoming increasingly important, as noted earlier, in electrophysiological experiments is also shown in Fig. 6-11: an on-line computer that may be used either to produce simple or complex stimulus patterns or to analyze and display the results of an experiment. I now
consider each of the devices shown in this typical neurophysiological research system in detail.

a. Electrophysiological Recording Electrodes
The problem of interconnection between electrical measuring equipment and organic tissue became a major one as soon as it was understood that there is something electrical about nervous activity. The earliest electrodes were simple pieces of metal. For many years, however, the signal distortion from the nerves caused by these primitive junctions between metals and biological tissues was not apparent because all of the available display devices were inadequate. With the advent of good electronic amplifiers and cathode ray tube oscillographic displays, the distortion became obvious. Modern electrophysiological electrodes reduce the distortion of the signal in a number of ways that are enumerated in the following discussion. The present section can best be organized by dividing electrodes into categories on the basis of their size. Macroelectrodes include the relatively large contacts best suited for detecting neural signals that are generated by many different cells, are distributed over a wide region, or come from neurons that can be isolated from their neighbors. Microelectrodes, on the other hand, are small enough (only a fraction of a micron in diameter, at their smallest) that the responses of individual neurons can be selectively detected.
Macroelectrodes. The signals generated by the brain (electroencephalogram) or the heart (electrocardiogram) are typical of signals that are best recorded by macroelectrodes. Depending upon the size of the macroelectrode, these macrosignals summate the activities of a relatively large number of individual voltage sources distributed over a relatively wide area. However, some workers have been able to routinely record single-cell responses by simply inserting the cut-off end of a relatively thick, insulated wire into brain tissue (e.g., Ito and Olds, 1971) and using it as a macroelectrode. In other instances, when cells can be physically isolated from one another, as in a teased nerve preparation, the separated neuron may simply be laid on top of a macroelectrode (see Hartline, Wagner, & Ratliff, 1956; Uttal & Kasprzak, 1962) to record single cell potentials. A macroelectrode can be as simple as a wad of cotton soaked in an appropriate solution, a metal hook, or a metallic button pasted to the skin or immersed in a conducting solution. In the latter case, the electrolytic solution has the function of reducing the resistance of the interface between the tissue and the electrode; thus most of the biologically generated voltage will be developed across the electrode rather than lost across the interface resistance. The electrolytic solution also performs another important function in electrodes exposed to moderately high-current densities or prolonged immersion. Most metallic electrodes (especially metals like silver and zinc) polarize under
conditions in which electrical currents are sustained in a single direction. Polarization is a phenomenon similar to the normal action of a battery in that a constant electrochemical voltage is maintained by the electrode itself on the basis of changes occurring on its surface. Polarized electrodes selectively exhibit relatively high impedance to low frequencies and also have an intrinsic reverse potential opposing the biologically generated current. This results in a differentiated representation of the actual form of the Signal along with a strong D.C. bias. These distorting actions are thus produced when the electrode takes on significant electrical properties of its own rather than remaining an inert detector of neural action potentials. Polarization is due in part to changes in the surface of the metal electrode and in part to the electrolysis of the fluids in which the electrode is immersed. Gas bubbles formed from the electrolytic breakdown of the solution can gather on the electrode and contribute to the battery action, as well as increase the interface resistance. The depletion of ions from the active region of the electrodes may also be a factor in creating electrode polarization. When carriers of electrical charge are removed in any manner, the effect is to increase the internal resistance of the interface between the electrode and the specimen. For those applications in which electrodes are used for stimulation rather than recording, polarization is not a significant problem, because the stimulating currents are very large compared to the polarization potentials. However, when tiny potentials are recorded from a nerve cell, polarization potentials can be equal to, or greater than, the signals themselves. 
The increased resistance, coupled with even the small capacitance of the electrode-preamplifier wiring, can reduce the overall frequency response of the electronic system by substantial amounts (i.e., the range of electrical frequencies that will pass through the amplifier is reduced). Polarization is prevented by coating the electrode with one of its own salts or by isolating the metal from the fluid by a salt-solution bridge. The reader is referred to Silver's (1958) sterling article on electrodes in Donaldson's (1958) book on bioelectric recording technique for a more detailed review of the chemistry and prevention of polarization on electrodes.
Microelectrodes. The second type of electrophysiological electrode, the microelectrode, mentioned earlier in the historical section of this chapter, is designed to be small enough to penetrate the interior of the neuron without affecting its function. A microelectrode can record the potential difference, therefore, between the inside and the outside of a single cell if a reference electrode is placed in the extracellular environment. Eventually cells punctured with these tiny electrodes become less responsive, but a skilled electrophysiologist can often keep a microelectrode-penetrated cell alive for up to 48 hours. Apparently the initial puncture does less harm than the subsequent tissue movements around the electrode tip. These movements may enlarge the puncture wound to a size preventing the cell's self-healing properties from working.
The earliest microelectrodes (see the historical discussion of Ling and Gerard's work earlier in this chapter) were fine glass tubes drawn by hand to submicron diameters after being heated to the temperature at which glass becomes viscous. The fine glass tube, or micropipette, is not actually the electrode itself, as glass is a good insulator. The micropipette must be filled with an electrolytic salt solution, which serves as the true electrical conductor making contact with the intraneuronal juices through the tiny hole at the end of the micropipette. The electrolyte-filled glass microelectrode is still the best choice in many intracellular recording situations. However, for extracellular recording, metallic microelectrodes are used increasingly because of their rugged durability. One way to construct a metallic electrode is to fill a glass micropipette with a low melting-point metal such as indium. An alternative method is to sharpen electrolytically a relatively coarse wire by passing an electric current through it while it is suspended in an etching solution. By regulating the amount of current, it is possible to control the sharpness and gradient of taper of the tip. Hubel (1957), Parker, Strachan, and Welker (1973), and Spinelli, Bridgeman, and Owens (1970) are among the many who have discussed techniques for sharpening tungsten. Electrolytically sharpened electrodes of platinum and its alloys have also been successfully used. These kinds of electrodes are then usually insulated with a plastic or varnish except at the recording tip to restrict the region from which signals will be picked up. New electrodes have been designed that are specific to the action of a single ion species. For example, Neild and Thomas (1973) describe a chloride-sensitive microelectrode constructed from a glass-insulated, chlorided silver wire, which can be used intracellularly. Thomas (1970) describes a similar, sodium-sensitive glass electrode specific to that ion.
Figure 6-12 shows several types of microelectrodes, including one of an ingenious multiple-barrel design.

b. Preamplifiers
The purpose of the electrode-amplifier-display chain is to reproduce and record the amplitude and time course of the neuroelectric signal under investigation, with as high fidelity as possible. But connecting electronic equipment to the delicate neuronal electrical source without distorting the waveform is not simple. In this section several possible sources of distortion are discussed that can be produced by the preamplifier and the electrode-preamplifier interface. We also consider ways in which this sort of distortion can be minimized. Microelectrodes have considerable internal impedances. A microelectrode with a tip diameter of 1 micron may have an internal resistance on the order of 10 to 20 million ohms, and smaller electrodes may have internal resistances as high as 10^8 ohms. The internal resistance of the electrode and the input resistance of the preamplifier to which it is connected, along with the unavoidable
FIG. 6.12 Drawing of four different kinds of microelectrodes: (A) a simple glass micropipette that might be filled with a conducting saltwater solution; (B) an insulated metal microelectrode that has been electrolytically sharpened; (C) an insulated metal microelectrode that has been formed by electroplating a metal on a glass substrate; and (D) a complex multibarrel microelectrode drawn simultaneously from two glass tubes. (From Uttal, 1975b.)
stray capacitance of the wiring, form a resistance-capacitance voltage divider as shown in Fig. 6-13. Two difficulties are created as a result of this implicit voltage division. The first difficulty is due to the internal resistive component of the electrode itself. The entire neuronal potential under study is effectively applied across the two ends (marked A and B) of the voltage divider shown in Fig. 6-13. But only a small portion of this voltage may be detected. This occurs because the part v of the total applied voltage V that is sensed across a resistor in such a series circuit is proportional to the ratio of the individual resistance value r and the total resistance R, a relation that may be expressed as

$$ v = \frac{r}{R}\,V \qquad \text{(Equation 6-1)} $$
FIG. 6.13 Drawing of the electric circuit of a neuron, the recording electrode, and the recording preamplifier. Labeled elements: input impedance of the preamplifier (Rp and Cp) and input impedance of the microelectrode. (From Uttal, 1975b.)
where (in Fig. 6-13) $r = R_p$ and $R = R_p + R_m$.
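As a rough numerical illustration of Equation 6-1, the following Python sketch computes the fraction v/V of the neural voltage that appears across the preamplifier input for several input resistances. The function name `fraction_seen`, the 20-megohm electrode value (the upper figure quoted earlier for a 1-micron tip), and the list of preamplifier resistances are assumptions chosen for illustration, not values taken from any particular instrument.

```python
def fraction_seen(r_preamp_ohms: float, r_electrode_ohms: float) -> float:
    """Equation 6-1 with r = Rp and R = Rp + Rm; returns the ratio v / V."""
    return r_preamp_ohms / (r_preamp_ohms + r_electrode_ohms)


# Assumed electrode resistance Rm: 20 megohms (illustrative value).
R_M = 20e6

# Hypothetical preamplifier input resistances Rp, from poor to very good.
for r_p in (1e6, 20e6, 1e9, 1e11):
    percent = 100.0 * fraction_seen(r_p, R_M)
    print(f"Rp = {r_p:.0e} ohms -> {percent:.1f}% of V reaches the amplifier")
```

Under these assumed values, equal electrode and preamplifier resistances pass only half of the signal, whereas a 1-gigohm input loses about 2 percent of it.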
Therefore, if the input resistance of the preamplifier is smaller or even comparable in magnitude to that of the electrode, only a portion of the total neurally generated voltage will be "seen" by the amplifier. Because neural potentials are so small compared to the ambient electronic noise levels, this can be a serious handicap. In order to sense a large portion of the neural signal, the input impedance of the preamplifier should be very high compared to that of the electrode. Preamplifiers with input impedances as large as thousands of millions of ohms (gigohms) are now commercially available. Such units develop most of the signal voltage across their inputs rather than across the electrode even when they are attached to relatively high-resistance microelectrodes. High-input impedance preamplifiers were originally constructed with vacuum tube cathode-follower inputs or special high-input impedance electrometer tubes; now that high impedance
semiconductor devices have become available, all-solid-state, high-input impedance preamplifiers are commercially available from a large number of manufacturers. These solid-state preamplifiers often use field effect transistors (FET), a form of transistor that is wired much like one of these special vacuum tubes and that has the same unusually high input resistance. Another advantage of high preamplifier input impedances is the small current drawn by the electrode-preamplifier circuit. If it were larger, it could interfere with the normal physiologic processes of the cell and thus artifactually distort recorded data. The second difficulty associated with input impedance arises from the combination of the resistive and capacitive components of the electrode and preamplifier impedance, as also shown in Fig. 6-13. Even a small input capacitance, when combined with the typically huge input resistance of a good preamplifier, can produce an electrical circuit with a long resistance-capacitance (RC) time constant. This long time constant results in the reduction of the high frequency sensitivity of the amplifying system. Therefore, the recorded biopotential will be seriously distorted by the inability of the preamplifier to follow the rapidly changing components of the signal. Rise times will be artifactually elongated, and rapidly changing signals will not be passed at all by the electrode-preamplifier circuitry. Thus, many preamplifiers, in addition to being designed for high-input impedance, also contain special circuits to reduce the time constant by minimizing input capacitance. Older amplifiers accomplished the reduction in RC time constant by laboriously minimizing the actual input capacitance, either by placing the first stage of the amplifier very close to the tip, or by elaborate shielding. Amatniek (1958) has described an ingenious method, however, for reducing the effective input capacitance of a preamplifier in an entirely different manner.
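The effect of the RC time constant on high-frequency sensitivity can be estimated with a first-order model. The Python sketch below is only an illustration under stated assumptions: it treats the electrode resistance Rm and the preamplifier input resistance Rp as acting in parallel on a single lumped input capacitance, and the 10-picofarad capacitance, the resistance values, and all function names are hypothetical.

```python
import math


def parallel(r1_ohms: float, r2_ohms: float) -> float:
    """Resistance of two resistors connected in parallel."""
    return r1_ohms * r2_ohms / (r1_ohms + r2_ohms)


def time_constant_s(r_ohms: float, c_farads: float) -> float:
    """RC time constant in seconds."""
    return r_ohms * c_farads


def upper_cutoff_hz(r_ohms: float, c_farads: float) -> float:
    """Frequency at which a first-order RC low-pass is down by 3 dB."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)


def rise_time_s(r_ohms: float, c_farads: float) -> float:
    """Approximate 10-to-90 percent rise time of a first-order RC circuit."""
    return 2.2 * r_ohms * c_farads


# Assumed values: 20-megohm electrode, 1-gigohm preamplifier, 10 pF stray capacitance.
R_M, R_P, C_IN = 20e6, 1e9, 10e-12
r_eff = parallel(R_M, R_P)

print(f"time constant: {1e3 * time_constant_s(r_eff, C_IN):.3f} ms")
print(f"upper cutoff:  {upper_cutoff_hz(r_eff, C_IN):.0f} Hz")
print(f"rise time:     {1e3 * rise_time_s(r_eff, C_IN):.3f} ms")
```

With these assumed values the time constant is about 0.2 millisecond and the cutoff falls near 800 Hz, enough to round off a 1-millisecond spike noticeably. Halving the capacitance doubles the cutoff frequency, which is why so much effort goes into reducing or neutralizing input capacitance.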
He suggested that the input capacitance be neutralized by balancing it with an equal but opposite capacitive effect. In such a device, the input signal is simply inverted by an amplifier with a gain of 1. This inverted signal is applied to a second capacitor, which is adjusted by the experimenter to be equal to the input capacitance of the preamplifier. Thus there are two capacitors, equal in size and connected in parallel, but with opposite polarities of applied voltage. The net result is that the effect of the input capacitance is reduced to a vanishingly small value even though the capacitance is still present. The advantages of input capacitance neutralization are offset, in part, by an increase in the noise level of the output signal. However, this is not a serious impediment, considering the relatively large signals produced by high-impedance intracellular electrodes of the sort requiring input capacity neutralization. Extracellular electrodes may generate only 100 microvolts; intracellular signals may be as large as 100 millivolts. In certain cases it is useful to design preamplifiers to have differential or bipolar inputs rather than a single-ended input. The need for the differential input
arrangement is dictated by the fact that some biopotentials cannot be referenced to any external ground or common potential level but exist only as voltage differences between two different parts of the tissue. A good differential input amplifier also possesses good common mode rejection, that is, the ability to reject signals that are common to both of the input electrodes. Satisfactory common mode rejection, however, requires balanced inputs and thus is not generally available with the idiosyncratic microelectrodes. If common mode rejection is good, a differential preamplifier will not pass any significant amount of the 60-Hz voltages, for example, that might be picked up from overhead lighting and other electrical equipment. If this "pick-up" is too great, metal screens or sheets can be used to shield specimen materials from most sources of interference. Preamplifiers usually do not have substantial voltage gain. They are primarily impedance matching devices coupling the high impedance of the electrode to the low-input impedances of the main power amplifiers, in order to maximize power transfer from one to the other. They may thus produce an output signal only slightly larger in voltage magnitude than the original signal but many times more powerful in current driving capacity. This low-level voltage capability means that many preamplifiers can be entirely battery operated, another feature that can contribute to the reduction of noise injected into the neurophysiological record by the electronic system. Amplifiers in general can be separated into two types on the basis of the kind of coupling that exists between successive stages. The first type of amplifier is said to be direct current (DC) coupled from one amplifier stage to the next. This simply means that no series capacitors are used in the interstage circuitry, and constant voltages will thus pass as easily throughout the system as changing ones.
These slow signals may include such biopotentials as the resting potential of a cell membrane, potentials defined by the position of the eye, slow surface changes on the brain, and similar signals of long or continuous duration. These signals require the use of DC amplifiers, but spurious voltages with similar low-frequency characteristics may also be present that can distort or obscure the desired signal. Thus a number of problems are associated with the design of DC amplifiers simply because they allow constant and slowly changing signals to pass. Interfering voltages to which DC amplifiers are sensitive include amplifier base-line drift and electrode polarization effects, in addition to spurious electrophysiological signals. Base-line drift results from the amplification of tiny, gradual, long-term fluctuations in the values of component parts of the amplifier. In a DC-coupled amplifier, a slight change in a resistor's value as a result of temperature can produce a minute voltage shift early in the train of amplifying stages that can be amplified at the output to a substantial voltage fluctuation. Such interference may completely obscure a low-frequency neural signal. Similarly, slight fluctuations in the characteristics of the electrode, such as the smallest amounts of polarization, can be amplified and result in large constant output signals.
The second type, the alternating current (AC) coupled amplifier, is usually used for the study of signals that are relatively brief transients, such as spike action potentials, electrocardiograms, and evoked potentials, rather than for examining constant or long-duration voltage shifts. AC-coupled amplifiers have capacitive circuits inserted between the several amplifying stages. These capacitors, by virtue of their high impedance to low-frequency signals, tend to block steady voltages and pass only signals that vary to some degree. By placing a bank of capacitors and resistors of various sizes on a ganged multiple position switch, it is possible to select the band-pass characteristics of the amplifier, because the resistance-capacitance combination determines which frequencies will be passed. AC-coupled amplifiers thus have the advantage of allowing the investigator to select the optimum frequency range for the particular signals under investigation. It is possible to adjust the lower bound of the band-pass range to reject low-frequency shifts due to electrode shifts, for example, or the upper bound to reject signals due to high-frequency noise in the preamplifier. For 1-millisecond-long spike action potentials, a band pass from 300 to 2000 Hz would be desirable. For evoked brain potentials, a range of 3 to 300 Hz would be preferred.
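The way a resistance-capacitance combination sets the pass band can be sketched with a simple first-order model. The Python example below is an illustration only: it assumes one first-order high-pass section and one first-order low-pass section (real multistage amplifiers roll off more steeply), and the function names and the 1-megohm interstage resistance are hypothetical. The 300-to-2000-Hz band is the one suggested in the text for spike action potentials.

```python
import math


def band_pass_gain(f_hz: float, f_low_hz: float, f_high_hz: float) -> float:
    """Relative gain of a first-order high-pass cascaded with a first-order low-pass."""
    ratio_low = f_hz / f_low_hz
    high_pass = ratio_low / math.sqrt(1.0 + ratio_low ** 2)
    low_pass = 1.0 / math.sqrt(1.0 + (f_hz / f_high_hz) ** 2)
    return high_pass * low_pass


def coupling_capacitance_f(r_ohms: float, f_cutoff_hz: float) -> float:
    """Capacitance giving an RC corner frequency f = 1 / (2 * pi * R * C)."""
    return 1.0 / (2.0 * math.pi * r_ohms * f_cutoff_hz)


# Band suggested in the text for 1-millisecond spikes.
F_LOW, F_HIGH = 300.0, 2000.0

for f in (0.0, 60.0, 300.0, 1000.0, 2000.0, 10000.0):
    print(f"{f:7.0f} Hz -> relative gain {band_pass_gain(f, F_LOW, F_HIGH):.3f}")

# Series capacitor for a 300-Hz lower bound with an assumed 1-megohm resistance.
c_farads = coupling_capacitance_f(1e6, F_LOW)
print(f"coupling capacitor: {1e12 * c_farads:.0f} pF")
```

In this model a steady (0-Hz) voltage is blocked completely and 60-Hz interference is attenuated to roughly a fifth of its amplitude, while a 1000-Hz component passes at about 86 percent. Each position of the ganged switch described above would correspond to a different pair of corner frequencies.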
c. Stimulators
All the equipment discussed so far is designed to detect, amplify, and display the tiny electrophysiological voltages generated by neurons. However, many experimental situations also require some sort of stimulating apparatus to elicit a neural response under controlled conditions. Over the years a substantial change has occurred in the sort of stimulators used in neurophysiological experiments. In the early days almost all stimulators were impulsive types; i.e., they were designed to produce the briefest possible stimulus. Each stimulus transient, click, flash, or tap could vary only in amplitude. The impulse was considered to be instantaneous, and little consideration was given to the stimulus pattern in either space or time. One distinguished scientist, H. L. Teuber of M.I.T., is reputed to have referred to this as the "flash of lightning and burst of thunder" school of stimulation. The stimuli used to elicit neural responses were flashes of lights produced by stroboscopes or shutters, clicks of sound, mechanical taps to the skin, and impulsive electrical stimuli-that almost universal stimulus-applied to any place in the nervous system. Over the years, substantial changes in the philosophy of neurophysiological research have affected the design of stimulators for the various modalities. Perhaps the most important single milestone occurred in the late 1950s when Hubel and Wiesel (1959) and Lettvin, Maturana, McCulloch, and Pitts (1959) discovered that stimuli with various features in time and/or space (e.g., dots, lines, corners, and edges moving in particular directions at particular speeds) were much more effective than simple impulses in eliciting neural responses in the visual
system. This was so even when the patterned stimuli were less energetic than the impulsive ones. As this trend toward more elaborately patterned stimuli continued, stimulator technology changed in some unusual ways. At first there seemed to be a tendency toward simplification. The flashers and clickers of the past were not replaced by more elaborate stimulators but by less complex ones. In their many distinguished studies (e.g., Hubel & Wiesel, 1962, 1965), Hubel and Wiesel have used a simple light source projecting an appropriately shaped image that was moved about manually within the visual field of the animal. The work of Lettvin's group was done with an ingenious though equally simple stimulator using magnets to move a similar visual stimulus about on an aluminum sphere placed in a frog's field of view. In recent years, however, a number of investigators have been forced to turn to more elaborate apparatus by the increasing complexity of their experiments. Some have used computers (e.g., Spinelli, 1967; Uttal, 1969; and Gourlay, Uttal, & Powers, 1974) to generate elaborate patterns of visual stimuli. A full discussion of the variety of photic, mechanical, acoustical, and chemical stimulation procedures used in electrophysiological research is beyond the scope of this book. The interested reader may find a more complete discussion of various means of stimulating neural activity in another book of mine (Uttal, 1973, chap. 2). In this section only the electrical pulse stimulator is discussed for two reasons. First, electricity is the most general stimulus used to activate the nervous system. It is the one stimulus that allows the neurophysiologist to bypass the receptor, thus generating a known pattern of neural responses rather than one defined by the receptor's transductive mechanisms. This is possible because electrical stimuli can produce one spike action potential for each electrical stimulus pulse when applied to axons. 
Second, the use of electrical stimuli complicates the use of the sensitive amplifying and recording equipment already discussed in this chapter. It is necessary, therefore, to understand how some of these complications can be overcome. The major difficulty in simultaneously using electrical stimulation and high-gain preamplifiers in the same system is a result of the discrepancy between the amplitudes of the electrical stimulus and the neuroelectric potentials. Stimulator voltage outputs, typically applied through macroelectrodes, may need to be as high as 100 volts before they are effective in eliciting neural responses. However, the preamplifier is tuned to pick up intracellular signals that are, at best, measured in units of millivolts and often (with extracellular electrodes) only in microvolts. Thus the preamplifier is exposed during each stimulus pulse to a very much larger voltage than that level to which it is tuned to respond. Under these conditions most amplifiers overload, and their ability to pass any low-level input signals is momentarily blocked. Blockage of this sort is enhanced when the stimulus voltage and the neuroelectric signal are closely "coupled" to each
other electrically; i.e., when they are referenced to the same common or ground potential. This problem can be overcome if the two potentials are decoupled, or isolated from each other. Potential isolation can be achieved in a number of ways, all of which involve an electrical transmission from a grounded to a "floating" (ungrounded) circuit. Transformers are the simplest expedient, but they do not have good isolation or waveform preservation properties. Two more recent developments have improved the degree of isolation that can be obtained. One innovation is a radio frequency system, developed by Schmitt and Dubbert (1949), which makes use of a small radio transmitter that sends a signal across a small air gap to an equally small, tuned receiver. The net effect of the signal transmission across the gap is to provide a voltage on the receiver side that is free of any reference to ground except for leakage through the parasitic (stray) capacitances that occur when two wires simply run near each other. The more recent use of optical isolators achieves an even greater degree of isolation. These systems substitute a voltage-controlled light source and a light-sensitive variable resistor for the transformer or radio frequency transmitter-receiver. When an input signal activates the unit, the light source (usually a light-emitting diode) glows. The light activates a photodetector, changing its resistance in a way that controls the magnitude of the current passing through the output circuit. The net effect of this optical transmission, once again, is to isolate the stimulus from ground reference. The second major difficulty encountered with electrical stimuli revolves around the fact that the resistive and capacitive properties of the stimulated tissue can so distort the stimulus that a constant voltage will produce a far from constant current.
This effect is mainly due to the charging of the capacitive components of the tissue in such a way that the applied voltage waveform is differentiated. Only the rise and fall (high-frequency components) of the waveform are passed when a waveform is so differentiated. The distortion can result in an actual stimulus current waveform like that shown in Fig. 6-14, even though the applied voltage stimulus is rectangularly shaped. Special electronic circuitry is used to overcome this difficulty. Circuits can be designed that produce constant current stimuli rather than constant voltage outputs. In a constant current stimulator, the voltage applied to the tissue is not constant but varies to maintain a constant current through the tissue. Because most investigators believe the applied current is a more satisfactory measure of stimulus amplitude than the voltage, this is a desirable feature. In any event, a constant current can be quantified more precisely than a varying voltage. I have now considered a number of the technical procedures and instruments important to cellular neurophysiological research. This equipment, as noted at the outset of this chapter, not only constrains and limits the phenomena investigated, but also directly influences the concepts and theories used to describe findings. Nevertheless, the fruitfulness of these instruments has been
FIG. 6.14 Diagram showing the effect of applying a constant voltage to a load (e.g., the skin) with substantial capacitive impedance. The generated current is essentially the derivative of the applied voltage. (From Uttal, 1973.)
so great in the last 20 years that at least tentative answers can now be given to some of the very important questions concerning the actions of individual and microscopic neurons.
D. THE NEUROPHYSIOLOGY AND TECHNIQUES OF COMPOUND ACTION POTENTIALS9
As the number of neuronal spike or graded action potentials increases, the individual spikes may not be discernible on an oscilloscope trace. Rather, the voltages generated by each individual neuron may combine to produce a cumulative signal, which is the arithmetic sum of the single neuron responses, in accord with the usual laws of voltage addition. As the number of individual spike action potentials increases, pooled potentials show increasing numbers of complex spike forms resulting from the superposition of individual spikes. These voltage mixtures can usually be identified by extra bumps and notches on the otherwise smooth rise and fall of the spike; but occasionally deceptively simple, perfectly synchronized responses occur. In the central nervous system, almost any electrode not positioned intracellularly, except ones that are fortuitously highly selective, will pick up such multiple responses because of the dense neuronal packing.
9Some of the material in this section has been adapted from Uttal, 1975b, and is used with the permission of the publisher, Lawrence Erlbaum Associates, Inc.
If the electrode is unselective and has a relatively large field of action, the response will reflect the cumulative response of thousands or even millions of both graded and spike action potentials. Under such conditions, no vestige of an individual response is retained, and the potential picked up by the electrode exhibits properties very different from those of individual spikes. For example, it may last for very long periods of time, up to seconds or even longer in some free-running signals such as the electroencephalogram. Furthermore, compound action potentials of this sort always exhibit graded properties as the quantal nature of the contributing signals is obscured by the very large numbers involved. Compound action potentials represent another important candidate for the representation of mental processes. They are particularly interesting because they have time dimensions quite close to those of psychological processes and also because they may be able to represent aggregate neural coding patterns where the individual spikes cannot. They thus represent an important possibility as a medium for encoding the collective action or statistical properties of large numbers of neurons. If all the neurons are artificially synchronized by an electrical stimulus, pseudocompound action potentials can be recorded from preparations in which the normal coding process is actually more fairly represented by a different type of record. Figure 6-15 shows sample compound action potentials recorded from the human ulnar nerve by placing a rather large (3/16-inch diameter) electrode on the skin and recording the signal through it. Although this response looks similar to a single spike action potential, the dependence of its amplitude on stimulus intensity (a relationship that is not shown in Fig. 6-15) indicates that it is not. A single-cell response follows an all-or-none law.
This compound signal is clearly the sum of a large number of spikes all occurring at nearly the same time, and its amplitude may vary over wide ranges. Yet the duration of the response is almost as brief as an individual spike, indicating a high degree of synchrony indeed. Very simple, brief compound action potentials of this kind can be recorded with a technique originally developed by Dawson and Scott (1949), which allows such signals to be picked up through the skin without surgery. As stimulus strength decreases, the signal gradually fades away until it can no longer be detected. Individual cellular responses, therefore, are not detectable with this technique, which can only be used with the relatively massive compound responses produced by the electrical stimulus.
FIG. 6.15 A sample record of a compound electrical potential evoked at the superficial portion of the ulnar nerve at the elbow by a brief (0.5 msec) electrical stimulus applied to the superficial portion of the nerve at the wrist. The horizontal axis is time in msec (0 to 10). (From Uttal, 1959.)
The general technique of electrical stimulation and compound action potential recording with large electrodes was first developed by Gasser (see, e.g., Gasser, 1938) almost 40 years ago for use on animals on whom experimental surgery could be performed. When the saphenous nerve of a cat was dissected free and cleared of its outer sheath, simple hook electrodes were able to record a more elaborate response pattern than that observed with the Dawson technique used percutaneously on humans. Erlanger and Gasser (1937) showed that the compound action potential recorded in this situation is composed of several sequential components that conduct at different velocities. A major result of this difference in conduction velocity is observed when the action potential is recorded at increasingly greater distances. The farther the recording electrodes are from the stimulating electrodes, the more dispersed the various components of the signal become. Figure 6-16 shows the difference in shape of a typical response recorded at different distances from the site of stimulation. It has been possible to reconstruct the shape of a compound action potential of this sort by algebraic summation of the individual spike action potentials that are assumed to be produced by axons having diameters distributed in accord with anatomical measurements. The compound action potential in this case seems to be produced by a simple voltage addition of the individual spike action potentials from the family of neurons of varying diameter found in peripheral nerves.
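The reconstruction by summation described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the Gaussian spike shape, the two hypothetical fiber groups and their velocities, and the recording distances are not taken from Erlanger and Gasser's measurements.

```python
import numpy as np

def compound_potential(t_ms, distance_mm, velocities_m_per_s, spike_width_ms=0.3):
    """Sum one idealized spike per fiber at a given recording distance."""
    # A velocity in m/s equals mm/ms, so arrival time (ms) = distance (mm) / velocity.
    arrival_ms = distance_mm / np.asarray(velocities_m_per_s)
    # Each fiber contributes a Gaussian-shaped spike centered on its arrival time
    # (the Gaussian shape is an illustrative assumption, not a measured waveform).
    spikes = np.exp(-0.5 * ((t_ms[:, None] - arrival_ms[None, :]) / spike_width_ms) ** 2)
    return spikes.sum(axis=1)

def temporal_spread(t_ms, wave):
    """Amplitude-weighted standard deviation of the waveform in time (ms)."""
    weights = wave / wave.sum()
    mean_time = (t_ms * weights).sum()
    return np.sqrt((weights * (t_ms - mean_time) ** 2).sum())

rng = np.random.default_rng(0)
# Hypothetical nerve: a fast and a slow fiber group (velocities in m/s).
velocities = np.concatenate([rng.normal(60.0, 5.0, 200), rng.normal(20.0, 3.0, 200)])
velocities = np.clip(velocities, 5.0, None)

t_ms = np.linspace(0.0, 20.0, 4001)
near = compound_potential(t_ms, 20.0, velocities)   # electrodes 20 mm apart
far = compound_potential(t_ms, 120.0, velocities)   # electrodes 120 mm apart

print("spread near (ms):", temporal_spread(t_ms, near))
print("spread far  (ms):", temporal_spread(t_ms, far))
```

Under these assumptions the summed waveform recorded farther from the stimulus is lower and more spread out in time, because the fast and slow groups separate as conduction time grows.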
Although these peripheral nerve action potentials are relatively large (approximating 100 or 200 millivolts), evoked brain potentials are relatively small (usually less than 200 microvolts or so). To be seen at all, an evoked brain potential must be processed by a computer to extract it from the severe electrical noise conditions produced by interfering electroencephalographic and electromyographic signals, as well as simple electrical "shot noise" (i.e., random fluctuations) in the recording electronics. Even so, this tiny evoked response is the cumulation of the potentials of a large number of individual membrane responses. Its properties are not the same as the peripheral compound action potential previously described, because it is obviously much longer in duration and exhibits a much more complex form than that peripheral nerve mimic of the classic single neuron spike waveform. In the next sections I consider the evoked brain potential in detail. Its use also raises certain important conceptual issues, the most important being the need to distinguish between what I call signs and codes.
1. Detection of Evoked Brain Potentials
Investigations of the evoked brain potential are beset with a classic signal-to-noise problem. As I just noted, the signal is typically so small, particularly when
FIG. 6.16 Records showing the dispersion of a single compound action potential into multipeaked waves at various distances as a function of conduction time. The effect is due to the different conduction velocities of the fibers contributing to each peak. S indicates the occurrence of the stimulus. (From Patton, 1965, after Erlanger & Gasser, with the permission of The American Physiological Society.)
it is being recorded through the intact skull and scalp of the animal, that it is hidden in the ongoing noise. One might find, for example, that many important components of the evoked potential from the human somatosensory system are only 15 microvolts in amplitude, whereas the amplitude of the usual electroencephalographic activity may be as great as 1/2 millivolt. The problem facing the investigator then is to extract the tiny evoked potentials from the other interfering signals. Such a procedure is possible because of the evoked potentials' synchronization with the impulsive stimulus that generates them. In contrast, the noise is, by definition, activity unrelated to the stimulus. Therefore, at any given delay following each of a series of stimuli, the amplitude of the unstimulated activity or noise tends to be randomly distributed. When averaged, the randomly fluctuating signal will tend to converge on the value zero. On the other hand, when one averages a signal that is synchronized or time-locked to the stimulus, the average value at each delay following the stimulus will converge toward some representative nonzero value. A mixture of the two types of signal, synchronized and random, will tend upon repetitive averaging to reproduce the waveform of the smaller time-locked activity.
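The averaging argument above can be checked numerically with a minimal sketch. The evoked waveform, the noise level, the sampling interval, and the number of stimuli below are all hypothetical, and the background activity is modeled as white Gaussian noise, which real electroencephalographic activity is not.

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials, n_samples = 4000, 200
t_ms = np.arange(n_samples)  # one sample per millisecond after each stimulus

# Hypothetical time-locked evoked response with a peak of roughly 15 microvolts.
evoked_uv = 15.0 * np.exp(-t_ms / 60.0) * np.sin(2 * np.pi * t_ms / 50.0)

# Each record is the same evoked response plus unrelated background activity
# (assumed here to be Gaussian with a 150 microvolt standard deviation).
trials_uv = evoked_uv + rng.normal(0.0, 150.0, size=(n_trials, n_samples))

# Average across records at each delay t following the stimulus.
average_uv = trials_uv.mean(axis=0)

single_rms = np.sqrt(np.mean((trials_uv[0] - evoked_uv) ** 2))
residual_rms = np.sqrt(np.mean((average_uv - evoked_uv) ** 2))
print("noise in a single record (uV rms):", single_rms)
print("noise left after averaging (uV rms):", residual_rms)
```

For noise that is independent from record to record, the residual falls roughly as the square root of the number of records averaged, which is why several thousand stimuli are assumed here for a signal ten times smaller than the background.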
Averaging may be formally defined by an equation of the following form:
$$\bar{E}_t = \frac{1}{N} \sum_{n=1}^{N} E_t^n$$
(Equation 6-2)
In this equation, $\bar{E}_t$ is the average value of the signal at a given time, t, following the stimulus; N is the number of sequential responses averaged; and $E_t^n$ is the voltage at time t for the nth record. The equation represents but one point on the averaged curve for a particular time t after the stimulus. The variable t must be varied systematically to plot out the entire time function. In practice, the preparation of an average response requires the presentation of a number (N) of repeated stimuli. For digital computer averaging, each evoked potential is then digitized by means of an analog-to-digital converter at a rate sufficient to provide an adequate density of sample points. A large number of averages, equivalent to the family of t's, is then computed with the use of Eq. (6-2), and the values of the family of averages are plotted or displayed on some appropriate output device to represent the average time course of the response. In the most sophisticated digital computer averagers, a true division by N is made with each successive stimulus presentation. Performing a division, however, is a complicated logical operation that may be avoided by a simple expedient. Thus, many commercial special-purpose averagers do not actually average but are serial accumulators. Each time a stimulus is presented, an analog-to-digital converter samples the data, as described above. These coded values may be summed according to Eq. (6-2), but the division by N is never accomplished. Instead, the cumulative value in the register corresponding to
$$\sum_{n=1}^{N} E_t^n$$
(Equation 6-3)
that is, corresponding to a single t, increases throughout the entire experiment. In this manner, the values accumulated in each of the registers gradually increase in magnitude up to the maximum capacity of the storage registers. Because there is as much likelihood that an unsynchronized signal will be above the mean value as below, after any given number of stimuli, once again the contents of each register tend toward some mean value. In this case, however, an increasingly nonzero mean value will be produced as the number of stimulus presentations increases. The division by N, therefore, simply acts as a scaling factor to normalize the amplitude of the signal for any N. It is here that the price is paid for eliminating the arithmetic circuitry necessary to accomplish the division. Because the scale of the oscilloscope display is usually set to display the final reading properly, it is not possible to visualize the unaveraged results of a single sweep during the early stages of the averaging process because
the summed values are just too small. Only after a considerable number of stimuli have been presented do the characteristics of the signal begin to enlarge sufficiently to become visible to the observer.
2. The Origin of the Compound Action Potentials of the Brain
An important general question that arises in dealing with compound action potentials concerns the relation of this waveform to the individual neuronal action potentials. For each compound response, we would like to know as much as possible about the contributing neuronal population and how their individual responses were summated. The problem of the peripheral nerve compound action potential is relatively straightforward. The calculated sums of the individual spike action potentials have proven to be adequate in predicting the shape of the compound nerve response. However, for such compound action potentials as the evoked brain potential and the electroencephalogram, there is a great deal of controversy among different authorities as to the origin of the potential. The major controversial issue concerning compound evoked brain potentials is in regard to their cellular origins. Purpura (1959) comprehensively reviewed theories of the genesis of cortical potentials and stated that the early models stressing the contribution of spike action potentials, which were invoked to explain slow brain potentials, could no longer be accepted, principally because spike activity could not be associated experimentally with slow potentials. These models, he further stated, have been replaced by newer ideas in which slower dendritic and synaptic potentials are hypothesized as the basic subunits that are summated to produce the slow electrocortical potentials. Amassian, Waller, and Macy (1964) expressed this same point of view in one of the first symposia on sensory evoked response in man. For a number of reasons, including the apparent ability of experiments to dissociate spike and slow activity and mathematical considerations of the volume-recording technique, they also concluded that spike action potentials do not contribute directly to the compound evoked brain potential recorded from the surface of the head.
They also stated that the slow potentials already known to exist, such as the postsynaptic and the various postspike after-potentials, are more likely added together to produce the waveform. The problem of the origin of the compound potentials and their relations to cellular responses was also reviewed in detail by John (1967). He also reported the prevailing view that compound evoked potentials were almost universally considered to be the summation of postsynaptic potentials, slow transmembrane potentials, and long duration after-potentials following spikes. But, he did also suggest that it was only in some statistical sense that any correspondence could be detected between the spike action potential pattern and the compound potentials. This idea was based on an important experiment that had been carried out by Fox and O'Brien a couple of years earlier. Fox and O'Brien (1965)
used a combination of computer analysis techniques to demonstrate a very high correlation between the latency histograms of spikes from a single cortical cell, as recorded with a high-frequency band-pass preamplifier through an extracellular microelectrode, and the averaged compound evoked brain potential, recorded through the same electrode but with an amplifier passing lower-frequency signals. The correlation between these two measures is great enough when the sample size is large enough (see Fig. 6-17) to suggest that spike action potentials may have been prematurely rejected as major contributors to the production of compound potentials. Fox and O'Brien's work suggests that the compound evoked brain potential is a direct correlate of the probability of spike firing, and thus they may have demonstrated the missing relation between spike activity and the compound evoked brain potential. For the time being, it is sufficient to state that the genesis of the compound evoked brain potential is still being vigorously debated; whereas some feel it is a
FIG. 6.17 Two records showing the high degree of correspondence between the probability of firing of a single cell (A) and the locally generated evoked brain potentials (B) recorded from the visual cortex of a cat. (From Fox & O'Brien, © 1965, with the permission of The American Association for the Advancement of Science.)
direct reflection of synaptic and dendritic graded activity, others argue strongly for direct spike action potential contributions to the compound evoked brain potential. Fox and Norman (1968) expanded the earlier notions of the relationship between the spike and the compound evoked brain potential by proposing a new measure they call congruence. Congruence is a measure that describes in a quantitative fashion how typical an individual cell's pattern of response is of the total sample of cells contributing to a particular compound evoked potential. The activity of the single cell and of a microelectroencephalogram were recorded in the manner previously described by means of one electrode and two amplifiers with differing band-pass characteristics. Fox and Norman formed three functions from these recorded data. The first function was the distribution of amplitudes of the microelectroencephalogram recorded with the low band-pass amplifier. An analog-to-digital converter sampled the amplitude of this continuously varying slow potential every millisecond and made an entry into one of 256 amplitude categories each time the amplitude of the signal was within its range. In this way, an amplitude histogram was formed that showed the number of occurrences of each amplitude value. An example of this type of histogram is seen in Fig. 6-18a. The second function was based upon the occurrences of the spike action potential from a single cell, as detected with the high band-pass amplifier. Every time the cell fired a spike, it would trigger the computer to determine in which of the 256 amplitude levels the slow potential currently resided. Another computer analysis program then developed a second amplitude histogram based upon the number of times a spike occurred for each of the 256 amplitude levels. An example of this type of histogram is shown in Fig. 6-18b. So far, this procedure is a relatively straightforward process in which two simple amplitude histograms are developed.
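The two histograms just described, together with their bin-by-bin ratio, can be sketched as follows. This is only an illustration on simulated data: the smoothed-noise slow potential, the linear dependence of firing probability on amplitude, and all variable names are assumptions and do not reproduce Fox and Norman's recordings or programs.

```python
import numpy as np

rng = np.random.default_rng(2)
n_ms = 200_000  # one slow-potential sample per millisecond

# Hypothetical slow potential: Gaussian noise smoothed with a 20 ms moving average.
slow = np.convolve(rng.normal(size=n_ms), np.ones(20) / 20.0, mode="same")

# Hypothetical cell whose firing probability per millisecond rises linearly
# with the standardized slow-potential amplitude (an assumption for illustration).
z = (slow - slow.mean()) / slow.std()
p_fire = np.clip(0.05 + 0.02 * z, 0.0, 1.0)
spikes = rng.random(n_ms) < p_fire

# First function: how often each of 256 amplitude categories occurred.
edges = np.linspace(slow.min(), slow.max(), 257)
amp_counts, _ = np.histogram(slow, bins=edges)

# Second function: how many spikes occurred at each amplitude category.
spike_counts, _ = np.histogram(slow[spikes], bins=edges)

# Bin-by-bin ratio of the second histogram to the first (empty bins left as NaN).
ratio = np.full(256, np.nan)
occupied = amp_counts > 0
ratio[occupied] = spike_counts[occupied] / amp_counts[occupied]

centers = 0.5 * (edges[:-1] + edges[1:])
well_sampled = amp_counts >= 500
r = np.corrcoef(centers[well_sampled], ratio[well_sampled])[0, 1]
print("correlation of spike proportion with amplitude:", r)
```

In this simulated case both raw histograms are bell-shaped, yet the ratio rises almost linearly with amplitude, because the ratio removes the effect of some amplitudes simply occurring more often than others.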
Now consider that some of the amplitudes of the slow potential occur more frequently than others. The extreme values, of course, occur relatively infrequently, and the mid-values occur relatively frequently. The frequency of the spikes shown in the second histogram is strongly biased because there are many more mid-value amplitudes of the slow potential than there are extreme-value amplitudes. It is, therefore, necessary to normalize these data by dividing the distribution of spike occurrence by the distribution of amplitude occurrence. By carrying out this simple calculation, Fox and Norman controlled for unequal numbers of the slow potential amplitudes. The results of this normalization form a third histogram, shown in Fig. 6-18c, which plots a function that is essentially equivalent to the number of times a spike occurred at each amplitude value of the slow potential. This characteristic histogram of spike occurrence is, therefore, normalized for the number of times each slow potential amplitude value occurred. If we carefully study the three histograms, a most important fact emerges. Even though there is an almost normal, i.e., an almost random distribution of the first two functions, there turns out to be a very high degree of correlation
FIG. 6.18 Three distributions that show the close relation between the probability of a spike action potential and the local electroencephalogram (EEG). (A) Histogram of the amplitudes of the micro EEG at various times (vertical axis: counts of micro EEG amplitudes). (B) A histogram of the number of spikes produced at each EEG amplitude. (C) A new distribution produced by dividing (B) by (A) (vertical axis: proportion, b/a), displaying a high statistical correlation. The horizontal axis of each panel is voltage. (From Fox & Norman, © 1968, with the permission of The American Association for the Advancement of Science.)
between the microelectroencephalogram amplitudes and spike occurrence. This is indicated by the nearly straight line that is obtained when the data are normalized and plotted. The specific numerical value of the correlation between the activity of a given single cell and the microelectroencephalogram can be computed and was found by Fox and Norman to vary from experiment to experiment. This fact in itself is a very significant result, because it confirms the notion that the microelectroencephalogram recorded locally is not a measure of activity of any single cell but rather represents the action of a large population of cells. Thus, if the single-cell activity varies in its congruence with respect to the microelectroencephalogram, it indicates that other cells in the neighborhood are also contributing to
the overall potential. If the single-cell recording had always correlated with the microelectroencephalogram, it would have raised the ever-present specter that the microelectroencephalogram is something quite different from that recorded in a more conventional manner with very large electrodes. If the notion of congruence is a reasonable one, it should also be expected that when the sensory system into which the electrodes have been placed is driven by an appropriate stimulus, the correlation between spike count and the amplitude of the microelectroencephalogram would increase. This is exactly the result obtained for those cells in the visual cortex of the cat that respond to light flashes. (The visual cortex does, of course, include other cells that are not directly and simply activated by a light stimulus alone, and for these cells the congruence was seen to decrease.) In summary, Fox and Norman presented a valuable new measure for the estimation of the "coordination" or congruence of a group of cortical cells. This measure has a number of important implications. First, the high degree of correlation with single-cell activity tells us that the microelectroencephalogram, if it is not a direct product of the spikes themselves, is certainly derived from processes that also underlie spike generation. The two are essentially analogs of each other and may be considered to be at least informationally equivalent if not representationally identical. Second, the microelectroencephalogram gives us a measure that is a cumulative index of the activity of many cells in the region surrounding the recording electrode rather than a measure of the activity of a single cell. Such a measure may be the only plausible way of investigating the action of ensembles of neurons-and ensembles may be the "name of the game." 
Third, congruence is a direct estimate of the degree to which the activity of a single cell represents the mean biological activity of all the cells in the region contributing to the slow potential. Congruence should, therefore, prove in future years to be an important measure of brain activity and contribute greatly to an understanding of how it is that the activity of single cells is pooled into compound responses. Another way in which the relationship between cellular and compound action potentials can be made is through the medium of a conceptual drawing based on experiments carried out by Verzeano (1963). His drawing has taken on the status of a classic as a provocative model of the process of compound potential production. Verzeano believes that the shape of the evoked potential or of the electroencephalogram is due directly to the summation of spike action potentials. His drawing also emphasizes the point that the shape of the compound potential is determined by the statistics of very large numbers of individual spike action potentials. These cellular responses may be either highly synchronized or completely desynchronized. When the cellular spikes occur at the same time, there is a tendency to produce high-amplitude compound potentials even though the same average amount of spike activity would produce no compound potential if it were completely desynchronized.
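The dependence of compound amplitude on synchrony, rather than on the amount of spike activity, can be illustrated with a minimal sketch. It is a toy summation under stated assumptions (Gaussian unit waveforms, 200 hypothetical cells each firing once per 100 msec, timing jitter as the only difference between conditions) and is not a model taken from Verzeano.

```python
import numpy as np

def summed_wave(spike_times_ms, t_ms, width_ms=5.0):
    """Add one Gaussian-shaped unit potential per spike (shape is an assumption)."""
    delays = t_ms[:, None] - np.asarray(spike_times_ms)[None, :]
    return np.exp(-0.5 * (delays / width_ms) ** 2).sum(axis=1)

rng = np.random.default_rng(3)
t_ms = np.arange(0.0, 1000.0, 1.0)
n_cells = 200
beats_ms = np.arange(50.0, 1000.0, 100.0)  # one nominal firing time per 100 ms

def population_wave(jitter_ms):
    # Every cell fires once per period; only the timing jitter differs,
    # so the total number of spikes is identical in both conditions.
    times = beats_ms[None, :] + rng.uniform(-jitter_ms, jitter_ms,
                                            size=(n_cells, beats_ms.size))
    return summed_wave(times.ravel(), t_ms)

synchronized = population_wave(jitter_ms=2.0)
desynchronized = population_wave(jitter_ms=50.0)  # spread evenly over each period

interior = slice(100, 900)  # ignore the edges of the simulated record
sync_p2t = synchronized[interior].max() - synchronized[interior].min()
desync_p2t = desynchronized[interior].max() - desynchronized[interior].min()
print("peak-to-trough, synchronized:", sync_p2t)
print("peak-to-trough, desynchronized:", desync_p2t)
```

With the same number of spikes in both cases, the desynchronized population sums to a nearly flat level with small fluctuations, whereas the synchronized population produces large periodic waves.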
Figure 6-19 illustrates Verzeano's reasoning. In this thought experiment, four electrodes are inserted into closely adjacent regions of some neural tissue. Figure 6-19a shows a considerable amount of neural activity, but that activity is totally unsynchronized among the four electrodes. The resultant compound potential is very small. In Figs. 6-19b, c, and d, there is a progressively higher degree of synchronization of the activity at each electrode and a progressively larger amplitude of the compound potential. It is interesting to recall in this context that the unsynchronized activity is associated with a high degree of mental activity, whereas the synchronized activity is generally associated with an "idling" mental state. This, indeed, is the one classic and unequivocal electroencephalographic conclusion. One final point should be made with regard to the possible origins of the compound potentials that can be recorded from the brain. As we learn more about glial cells, the supporting units found between neurons, it is becoming clear that they, too, produce bioelectric potentials. Some of these voltages may be produced under the indirect influence of the neurons that are embedded among the glial cells. In some cases (e.g., the Müller cells, the retinal glial cells), the time course of the glial and neuronal cells' response may be nearly the same. Thus, it is possible that even though they are not involved in representative information processing as discussed previously, glial responses may be correlated with responses to stimuli or psychological processes. It has been further suggested that the direct cause of glial depolarization is the increased potassium concentration in the extracellular space as a result of neuronal activity. This possibility has been considered in detail by Orkand, Nicholls, and Kuffler (1966) and in an extensive review by Kuffler and Nicholls (1966).
The simple fact that such glial action potentials exist raises the possibility that they may contribute to the production of compound action potentials. In any event, there is as yet no complete agreement as to an explanation of the origins of electroencephalographic or evoked brain potentials; any or all of the voltage sources mentioned so far may contribute to those compound action potentials. Hopefully, the problem can be resolved in the not-too-distant future, but frankly, there is considerable reason to believe that the problem will be very refractory. What we do know, without doubt, is that these potentials can be detected, and because they do come from the brain, it is obvious that many will inquire into their relationship to mental processes. This is the topic of the next section.
FIG. 6.19 Verzeano's conceptualization of how spike activity may lead to compound waves like the EEG (GW). If the spikes are all firing randomly, the amplitude of the EEG is low. As the synchronization increases, however, the peak-to-trough amplitude of the EEG increases even though no more spikes may be firing than previously. In this figure, the model has been drawn in two different ways. On the left-hand side, the signals from four electrodes are shown as serial temporal records. On the right-hand side, a snapshot in time presents the spatial pattern at different stages of synchronization. (From Verzeano, © 1963, with the permission of Acta Neurologica Latinoamericana.)
[Figure 6-19 graphic not reproduced. Panels A through D each show records from electrodes E1 to E4 together with the compound wave (G.W.); the panel labels run from desynchronized and slightly synchronized to fully synchronized and hyper-synchronized (convulsive state).]
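Verzeano's argument, that the compound wave grows with synchronization even when no additional spikes are fired, can be checked numerically. The following is only a minimal sketch under stated assumptions: each spike is stood in for by a unit Gaussian pulse, the four electrodes are staggered deterministically rather than firing at random, and the function names and all parameter values (`spread`, `width_ms`, `period_ms`, and so on) are hypothetical choices made for illustration, not values taken from Verzeano.

```python
import math

def compound_wave(spread, n_electrodes=4, n_cycles=10,
                  period_ms=100.0, width_ms=5.0, dt_ms=1.0):
    """Sum identical unit pulses from several electrodes into one wave.

    spread = 0.0 makes all electrodes fire together; spread = 1.0 staggers
    them evenly across each cycle. The number of pulses never changes.
    """
    # Firing-time offset of each electrode within a cycle (assumed stagger).
    offsets = [spread * period_ms * ((k + 0.5) / n_electrodes - 0.5)
               for k in range(n_electrodes)]
    pulse_times = [period_ms * (c + 0.5) + off
                   for c in range(n_cycles) for off in offsets]
    n_samples = int(n_cycles * period_ms / dt_ms)
    wave = []
    for i in range(n_samples):
        t = i * dt_ms
        # Each pulse is a unit-height Gaussian; the wave is their plain sum.
        wave.append(sum(math.exp(-0.5 * ((t - s) / width_ms) ** 2)
                        for s in pulse_times))
    return wave

def peak_to_trough(wave):
    """Peak-to-trough amplitude of the summed wave."""
    return max(wave) - min(wave)

# From fully staggered (1.0) to fully synchronized (0.0).
for spread in (1.0, 0.5, 0.25, 0.0):
    amplitude = peak_to_trough(compound_wave(spread))
    print(f"spread={spread:4.2f}  peak-to-trough={amplitude:5.2f}")
```

Every run contains the same 40 pulses, yet the printed amplitude rises as the stagger shrinks toward zero. Only the relative timing of the pulses has changed, which is the point of the thought experiment.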
6. REPRESENTATION AND CODING-THE BACKGROUND
Are Compound Action Potentials Codes of Psychological Processes?10
Because the recording technology of the compound evoked brain potentials is so straightforward, it is sometimes assumed without adequate justification that these electrical signals from the brain directly reflect the essential brain activities that may be identifiable with psychological processes. However, the issue is far more complicated, and such an inference of identification is not justifiable without considerable further attention being directed at the logical and empirical aspects of the putative association. To summarize a widely current view, many psychobiologists explicitly and implicitly suggest that evoked brain potentials, obviously coming from the brain and often apparently correlated with psychological and stimulus dimensions, are indeed the transactional equivalents of mental processes measured under similar circumstances. In this section I have tried to show that the determination that such a potential, even though it may come from the brain, is a true transactional or representational equivalent requires far more stringent tests of necessity and sufficiency than are usually applied. I assert that a far more correct view is that no matter how high the correlation between some parameter of the evoked brain potential and a psychophysical function, the assumption of equivalence is not, a priori, acceptable. This simple, logical fact is thoroughly obscured by the large number of studies purporting to demonstrate an association between certain components of the evoked brain potentials and various aspects of behavior. Although it would be impossible to completely review all of this material, the reader is referred to Cobb and Morocutti (1967), Donchin and Lindsley (1969), MacKay (1969), and Regan (1972), as well as the last 20 years of the journal EEG and Clinical Neurophysiology to satisfy his or her urge for a detailed data base.
These sources provide abundant evidence of the effort that has been expended on the search for compound evoked potential correlates of psychological processes. Whether this search has actually been fruitful in contributing to the solution of the mind-brain problem is another matter.
In general, a typical sensory evoked potential such as that depicted in Fig. 5-33 can be considered to be made up of several major components. The first component is a short latency (20-50 msec) signal that is most probably a manifestation of the activity in the primary sensory cortical projection area. This early component is mainly restricted to the portion of the skull directly over the appropriate primary projection area, and its properties directly depend upon the physical properties of the stimulus. Later components (250-500 msec), however, are more general, can be detected over wide portions of the scalp, and seem much more closely related to the "mental" state of the subject and/or the symbolic or meaningful aspects of the signal. It is these later signals that are most affected by the state of consciousness of the subject and that also seem to depend upon such factors as the significance or ecological relevance of the stimulus. Experimenters have observed that these later components of the evoked potential are sensitive to such factors as uncertainty (Sutton, Braren, Zubin, & John, 1965), decision making (Squires, Squires, & Hillyard, 1975), degree of learning (Jenness, 1972), attention (Picton & Hillyard, 1974), and even to the different meanings of two alternative interpretations of an ambiguous stimulus (e.g., Johnston & Chesney, 1974).
In spite of these many correlative results, the question of whether or not these evoked potentials are truly the transactional equivalent (in the sense of a psychoneural identity relationship) of a psychological process remains unclear. Much of the data reported in experiments using evoked potentials is based upon small differences produced by alternative conditions of the independent variable and only one particular portion of the total waveform. Not all portions of the signal perform in the same way, and not all differences are reliable from one laboratory to another. There is, therefore, a great deal of controversy over both interpretations and the empirical facts themselves. For example, Galbraith and Gliddon (1975) strongly criticize Johnston and Chesney's (1974) paper on the basis of a vocalization artifact. Johnston and Chesney's (1975) rebuttal is only partially satisfactory in meeting this criticism. Another example is a highly interpretive and thoroughly opaque controversy between Donchin (1975) and Stowell (1975) on the one hand, and Begleiter and Porjesz (1975b) on the other, over interpretation of data in an earlier paper by Begleiter and Porjesz (1975a) concerning decision-making effects on the evoked brain potential. Again, both sides have strong arguments and no clear resolution is apparent. As one looks over the history of this field, it seems in general that the level of disagreement is very high.
10 Some of this material is adapted from Uttal, 1967, and is used with the permission of the University of Chicago Press.
There seem to be more of these critiques and rebuttals in the evoked brain potential field than is usual in science, and this controversy may reflect a lack of crispness in the data base.
A more fundamental difficulty, however, lies in the conceptualization of the relationship between the evoked brain potentials (if they do ultimately turn out to correlate satisfactorily) and the psychological responses. This difficulty lies in the fact that it is not always easy to determine whether a neural signal is a code or a sign; that is, whether the neural signal is the true underlying equivalent of the psychological process or simply a correlated concomitant. This important distinction between these two categories of neural response is now elaborated.
Earlier in this chapter I offered a general definition of a code as a set of symbols and transformation rules that allows an economical representation of a body of information. For the special problem I am discussing here, it is also necessary that the code actually become the equivalent, at some stage, of a mental process. A representation that is not so utilized but is lost at a more central level of information processing is, in this context, a sign. A sign may be useful to an external observer who may decode it or measure its properties as an indication of the current state of the stimulus environment or of the neural communication system. However, within the behavioral framework with which we are most concerned, such fluctuations represent little more than concomitant variations of lesser interest.
For example, in some central nervous tissue, there are cells in which a high degree of correlation exists between the variance of the intervals between successive action potentials and stimulus amplitude. Under conditions of minimal external stimulation, the intervals are widely spaced and considerably varied. On the other hand, when high amplitude stimulus levels are imposed, the intervals shorten and decrease in variability about their new mean value as demonstrated by Poggio and Viernstein (1964). The change in variability has been shown to exist, but its effect on behavior is what must ultimately identify such variability as either a code or a sign. If fluctuations of interval variability affect subsequent levels in such a way that they differentially influence some behavioral or mental function, then the fluctuation itself is a true code for that process. If, on the other hand, the influence of the neural fluctuations is lost and cannot be shown to affect any behavior or thought, such fluctuation is only a sign of the external environment.
In addition, it is useful to distinguish between two kinds of signs. The first category includes those signs that are completely stimulus-determined but which lose their influence because of the insensitivity of some subsequent portion of the nervous system to their particular kind of fluctuation. I refer to these as stimulus signs. On the other hand, systemic signs include those fluctuations that are introduced into neural signals by factors other than external stimuli but which still do not affect all subsequent levels of the nervous system. An example of such a systemic sign is a change in the amplitude of an evoked potential caused by a metabolic deficit which, though altering the signal, does not affect the behavior. It should be noted that there can also be systemic codes.
Systemic codes include stimulus-unrelated fluctuations in the neural state that affect behavior. The neural mechanism underlying memory is a good example of a systemic code, affecting perception as it does in a way that is unrelated to the current stimulus impact. In a discussion of evoked brain potentials, a careful distinction between the two kinds of signs is as important as the distinction between a sign and a code.
In peripheral sensory nerves, the distinction between a sign and a code can be more easily conceptualized. In the context of peripheral nerves and their action potentials, an adequate demonstration of the psychophysical discrimination of a given signal fluctuation is operationally sufficient to define it as some sort of a code, because discrimination indicates a sensitivity equivalent to interpretation. But, whereas the family of potential signals in peripheral nerves can be easily categorized, the complex interactions of the net of neurons constituting the central nervous system are much less well understood. In each case, however, the sufficiency test for a code is the same. It requires that we use the entire efferent action, a complex and multi-influenced process that has been generally called behavior, as the measuring instrument.
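The four categories distinguished above (code, systemic code, stimulus sign, systemic sign) reduce to two yes-or-no questions about a neural fluctuation. The sketch below is a hypothetical illustration only; the names `SignalRole` and `classify` are not the author's, and the two Boolean inputs are assumed to have been settled already by experiment.

```python
from enum import Enum

class SignalRole(Enum):
    CODE = "code"
    SYSTEMIC_CODE = "systemic code"
    STIMULUS_SIGN = "stimulus sign"
    SYSTEMIC_SIGN = "systemic sign"

def classify(stimulus_determined: bool, affects_behavior: bool) -> SignalRole:
    """Place a neural fluctuation in the text's two-by-two scheme.

    stimulus_determined: the fluctuation is produced by the external stimulus.
    affects_behavior: the fluctuation influences some behavior or thought.
    """
    if affects_behavior:
        return SignalRole.CODE if stimulus_determined else SignalRole.SYSTEMIC_CODE
    return SignalRole.STIMULUS_SIGN if stimulus_determined else SignalRole.SYSTEMIC_SIGN

# The two examples given in the text.
print(classify(False, False).value)  # metabolic deficit altering evoked potential amplitude
print(classify(False, True).value)   # neural mechanism underlying memory
```

The value of `affects_behavior` cannot be read off the neural record itself. In this scheme it has to be supplied by a behavioral test.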
This point is emphasized to make clear that no neurophysiological measurement, no matter how sophisticated, no matter at what high level of the nervous system it is carried out, and no matter how highly correlated, can substitute for a behavioral test to distinguish a sign from a code. Therefore, the psychobiologist must consider the behavioral response as the ultimate test of whether or not some complex neural process actually is the basis of some mental awareness or experience.
With the evoked brain response, we are dealing with an integrated electrical potential in which many of the microscopic details of the states of the individual neurons in the system have been lost. It is a mathematical truism, as I have noted previously, that such composite waveforms can be formed by an almost infinite number of combinations of subcomponents. On the other hand, does the central nervous system process these components of the evoked brain potential in a statistical way which is so similar to that exploited by the electrophysiological recording techniques that, in fact, they can be considered codes? Of course, this is another way of stating the same essential question.
There is an argument that the use of the terminology and the approach of the compound evoked potential may be the most realistic way of answering the fundamental question of representation asked at the beginning of this chapter. It is possible that an analysis of mind and/or behavior to the level of the individual neurons and synapses, on a strictly deterministic basis, may be just as difficult as describing the behavior of a box of gas on the basis of the dynamics of the individual molecules constituting the gas.
This is not only due to the practical difficulties of the quality of individual measurements (as limited by the Heisenberg uncertainty principle) but also to the mathematical-conceptual problem defined by the fact that an infinite number of combinations of individual-gas-molecule dynamics could lead to the same gross behavior. Thus, the issue of the compound evoked potential may not only be a useful adjunct but, because it is intrinsically a statistical process, may represent a set of techniques that are the only conceivable means of asking the important questions about the representation of mind's action by brain's structure. This may be so in spite of the fact that the brain is microscopically orderly and not a "mess of porridge" or a homogeneous gel. I repeat, this approach is a possibility and a plausibility; but it is also important to note that there is as yet inadequate justification for acceptance of this concept of the evoked potential as a coded statistical representation of mind. Much progress has to be made in the laboratory and the study before such an idea can gain consensual acceptance.
The discussion of the distinction between signs and codes in the context of the compound evoked potential also holds true within the domain of individual neurons, the other possible major level at which we could expect to find codes for mental processes. In turning to the problem of sensory coding in the next section, our attention is, thus, redirected to this world of the individual neuron. It is important to remember, however, that even though it is in the realm of sensory processes that so many of our data have become available concerning the specifics of neural representation, this is only a portion of the problem of coding and representation. Ultimately, all other mental processes must be subject to a similar analysis.
E.
SENSORY CODING11
Because of the anchor provided by the metrics of the physical stimulus and the relative simplicity and directionality of the sensory pathways, the study of representation at the cellular level has reached its highest state in modern research on the problem of sensory coding. The general question of representation is constrained in this case to ask only: How are the percepts that are the results of impinging patterns of stimuli and the information that is communicated along the afferent pathways encoded? Although psychobiology can take pride in the substantial progress that has been made in studies of sensory coding, it must also acknowledge that the profession is only at the beginning of any systematic knowledge about the coding of cognitive processes. Indeed, most current experiments, even though carried out on nonsensory or nonmotor portions of the cerebral cortex (the intrinsic regions), are in the sensory paradigm. It is true, therefore, that although simple sensory communication mechanisms do serve as a partial model for more complex central integrative functions, there is only an imperfect analogy between the two domains of research. In at least one regard, they are fundamentally different. As I have noted previously, the criteria of excellence of the nervous system's communication aspects are based upon the fidelity of information transmission. It is at this level that a closer approximation to isomorphic representation of the communicated information may be found and be useful, even though it is no more necessary here, in some logical sense, than more centrally. It is clear, on the other hand, that the representation of central cognitive processes maintains little semblance of such isomorphic mapping. The central cognitive or symbolic coding of a concept or a percept may be in dimensions bearing no geometrical relationship to the original stimulus pattern.
What research there is on the topic suggests a gradual progression away from geometrically congruent representations in the periphery to increasingly less "mappable" representations more centrally. The central symbolic processes, therefore, allow considerable "patching" to be done to the relatively poor sensory information that does make its way to the intrinsic areas. Holes may be filled in, clues interpreted, past experience introduced, and judgments made about relationships that differ considerably from the simple geometric or temporal characteristics of the original stimulus. The criteria of excellence of integrative mechanisms are, therefore, based on the richness of the mixture of information and the deviation from, rather than fidelity to, simple reproduction of the input stimulus pattern. It will probably turn out, therefore, that sensory coding theory is a fairly poor model for cognitive coding theory.
11 Some of this section of the chapter is adapted from Uttal, 1973, and is used with the permission of the publisher, Harper & Row, Inc.
Another important point should be made here with regard to sensory coding in particular. The general problem of sensory coding is a much more complex and multifaceted one than the singular determination of the functional relationship between stimulus intensity on the one hand, and nerve impulse frequency on the other hand. In the past, this single task had often been presented as "the" sensory-coding problem. Although such an oversimplification is not as frequent as it was, there still is a tendency to emphasize this single aspect of sensory coding in the current literature. It is easy to understand the reasons for this misemphasis. The dramatic size and explosiveness of the propagated spike potential make it a far more visible and interesting signal than the more delicate graded synaptic and receptor potentials. But these latter neural signals are also candidate codes. In addition, the usual technology in a single-cell electrophysiological experiment involves the use of a micropipette and an oscilloscope. For all their virtues, these instruments are limited in scope. The micropipette responds to events at but a single place, and the oscilloscope (at least until some of the recent developments of some highly ingenious spatial display techniques) usually was used as a simple plotter of time functions. Experiments based on this particular methodology, therefore, tend to overemphasize the importance of time functions at a particular point in space. It was not realized until recently that many other possible neural codes of equal importance even existed. This is certainly an example of the tremendous constraining influence on one's perspective that can be produced by one's technology. Similarly, psychophysicists have concentrated on the psychological problem of judged magnitudes or intensities.
The search for "the" psychophysical law has, in large part, been a search for relationships between perceived magnitude and stimulus dimensions. But here, too, magnitude is only a portion of the total span of dimensions that are involved in the sensory coding problem. Quality, temporal, and spatial parameters also must be encoded by neural signals, and the elucidation of their codes is also a part of the sensory coding problem. In the following pages I spell out a view of the coding problem that is based upon the correlation of two groups of dimensions rather than only magnitude and action potential frequency. The first group of dimensions describes messages. The second describes the neural responses. We must separately ask: What is the pattern of the information (the message) that is carried along the neural communication pathways, and what are the parameters of the neural response in which that information is encoded?
1.
The Common Sensory Dimensions; the Message
There is a certain amount of ambiguity in defining the conveyed information, because there are two perspectives from which the characteristics of the message may be discussed. One way is to assume that the key reference is, in fact, the physical stimulus and that it is sufficient to merely talk about physical stimulus dimensions. After all, that is the original source of the message. On the other hand, our knowledge of the transductive processes should have forewarned us that physical dimensions may be very quickly reencoded into entirely different neural dimensions even within the receptor itself. A temporal dimension might be converted into one of magnitude, and a spatial one might become a temporal one under the appropriate conditions. An alternative scheme in which such conversions could be accounted for is to be much desired.
An eminently feasible alternative is to base our discussion on the aspects of perception rather than the dimensions of the physical stimulus. This alternative set of dimensions might well be called the set of discriminable dimensions of the physical stimulus, or alternatively, the set of common sensory dimensions. This latter nomenclature emphasizes the fact that this schema is modality independent. The same set of common sensory dimensions can be used to describe patterns as well in one sense as in any other. Thus perceived magnitude, or subjective intensity, is a term that can be used for visual brightness, auditory loudness, or even olfactory smelliness. Such an approach has the additional advantage of biasing the experimenter toward searches for those coding factors that are common to all of the senses and biasing the theorist toward the development of a general, rather than a modality-specific, theory of sensory coding.
The relationship between the stimulus, the neural code, and the common sensory dimension can be made somewhat more concrete by a specific example. Consider the auditory sense. A stimulus with a certain physical intensity produces, at a given level of the ascending pathway, a neural fluctuation along some dimension.
This neural fluctuation, or code, is interpreted by the nervous system in such a way that in a psychophysical experiment, subjects may report that the stimulus was associated with a certain loudness or perceived magnitude. It is also important to realize that the relationship between the subjective experience and the neural fluctuation is the key problem that must be unraveled in coding theory. It is not the relationship between either the stimulus and the neural fluctuation or the stimulus and perceived magnitude. This fact is usually hard to grasp, because all three domains (the physical, external environment and the stimuli it generates; the neurally encoded pattern of signals; and the psychophysical responses) are involved in the process. Perhaps the reader will find some balm for this perplexity by looking once again at Fig. 2-10, which provides some order for these various interacting variables.
This discussion of a schema of sensory dimensions also reminds us of another fact. If a stimulus dimension can be varied without producing any mental or behavioral effect in either the short run or the long run, for all practical purposes, it is not a dimension of interest to students of the psychobiology of sensory coding. An example of such an indistinguishable dimension change would be a shift in wavelength of the emitted radiation from an infrared or ultraviolet source. Although there is a great change in the characteristics of the physical energy, this change simply would be undetected by the sensory mechanism. Fluctuations within the differential threshold or beyond the absolute thresholds would also presumably fall outside of our category of discriminable physical stimuli. From this psychobiological point of view, two stimulus patterns that are not discriminated by the organism as being different are identical even if they produce dissimilar neural responses.
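The rule that two stimuli count as identical unless the organism can tell them apart can be put as a small predicate. This is a deliberately crude sketch, not a model of any real sensory system: the function name `is_discriminable`, the use of a single fixed difference threshold, and every numerical value in the examples are assumptions chosen only to make the logic visible.

```python
def is_discriminable(a, b, lower_limit, upper_limit, difference_threshold):
    """Decide whether stimulus values a and b count as different stimuli.

    lower_limit, upper_limit: assumed absolute limits of the sensed range.
    difference_threshold: assumed fixed differential threshold inside it.
    """
    a_sensed = lower_limit <= a <= upper_limit
    b_sensed = lower_limit <= b <= upper_limit
    if not a_sensed and not b_sensed:
        # Neither value is sensed at all, so the change goes undetected.
        return False
    if a_sensed != b_sensed:
        # One value is sensed and the other is not.
        return True
    # Both are sensed: they differ only if the gap exceeds the threshold.
    return abs(a - b) > difference_threshold

# Hypothetical wavelengths in nm, with assumed limits and threshold.
print(is_discriminable(900, 1000, 400, 800, 2))  # both outside the sensed range
print(is_discriminable(500, 501, 400, 800, 2))   # within the differential threshold
print(is_discriminable(500, 510, 400, 800, 2))   # a discriminable shift
```

Only the last pair would belong to the set of discriminable physical stimuli in this sketch, however different the physical energies of the first pair may be.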
2.
The Candidate Codes-The Neural Language
Once having defined the common sensory dimensions, the second part of this task is to identify the relevant neural coding dimensions. In an earlier presentation of this conceptual model (Uttal & Krissoff, 1968), the words "possible dimensions of the neural code" were used to catch the intended flavor of this plan. A preferable phrase has come into use since the publication of that paper. Perkel and Bullock (1968) refer to the possible neural dimensions as "candidate neural codes," which may be defined as any neural signals that have been observed to vary concomitantly with some variation in a physical stimulus dimension. As I noted earlier in the discussion of compound evoked potentials, the neural signal need not be demonstrably associated with any mental or behavioral dimension to be a candidate. Only after it has passed certain tests of necessity and sufficiency (not usually carried out in neurophysiological experiments) can a candidate code be accepted as a true code. Nevertheless, additions to the list of candidate codes lie purely in the domain of the neurophysiologist. Interestingly, the concept of testing to confirm a candidate as a true code has now been expanded to the field of neurochemistry. Putative neurotransmitters must be similarly tested to convincingly enter lists of "true" neurotransmitters.
It is important to remember that there is a dual air of uncertainty concerning the dimensions that are to be entered on the list of candidate codes. First, there can never be any assurance that such a list is complete. New dimensions of variation are constantly being discovered and added to the list of candidate codes. For example, only in the last decade have we become aware that not only the conventional mean frequency dimension of nerve impulse trains is a candidate code, but that the independent higher-order statistics of interpulse interval might also be candidate codes.
Other possible additions to the list might also be lurking just offstage, awaiting the development of some new measuring instrument or an insightful investigation by some ingenious experimenter. The second element of uncertainty concerning dimensions entered on the list of candidate codes lies in the fact that certain dimensions, which have single metrics in some external measuring system, may have alternative and independent ways in which they might be decoded. For example, frequency is an ambiguous dimension: it may be decoded by either measuring the average interval between two impulses or by counting the number of intervals in some integrating period. In this case, also, we must wait for some further experimenter to resolve how the influence of a particular dimensional variation is exerted.
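The ambiguity of "frequency" noted above can be seen with two short impulse trains. The sketch below is illustrative only: the spike times are invented, the function names are hypothetical, the counting decoder counts impulses (rather than intervals) that fall in an assumed 100-msec integrating period, and the interval standard deviation stands in for the higher-order interval statistics mentioned earlier.

```python
import statistics

def intervals_ms(spike_times_ms):
    """Intervals between successive impulses."""
    return [b - a for a, b in zip(spike_times_ms, spike_times_ms[1:])]

def rate_from_mean_interval(spike_times_ms):
    """Impulses per second, decoded as the reciprocal of the mean interval."""
    return 1000.0 / statistics.mean(intervals_ms(spike_times_ms))

def rate_from_count(spike_times_ms, start_ms, window_ms):
    """Impulses per second, decoded by counting impulses in one window."""
    n = sum(start_ms <= t < start_ms + window_ms for t in spike_times_ms)
    return n * 1000.0 / window_ms

def interval_sd_ms(spike_times_ms):
    """Spread of the intervals, one possible higher-order statistic."""
    return statistics.pstdev(intervals_ms(spike_times_ms))

# Two invented trains of ten impulses each (times in msec).
regular = [0, 100, 200, 300, 400, 500, 600, 700, 800, 900]
bursty = [0, 10, 20, 30, 40, 500, 600, 700, 800, 900]

for name, train in (("regular", regular), ("bursty", bursty)):
    print(name,
          rate_from_mean_interval(train),
          rate_from_count(train, 0, 100),
          round(interval_sd_ms(train), 1))
```

Both trains give the same rate when decoded from the mean interval over the whole train, but the counting decoder with a short window reports very different rates, and the interval variability separates the two trains as well. Which of these quantities, if any, the nervous system actually uses is exactly the open question raised in the text.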
3.
Discriminable Dimensions of the Physical Stimulus (The Common Sensory Dimensions)
a.
Perceived Quantity
The sensed intensity or magnitude of a physical stimulus has long been an area of concern to the psychophysical cryptographer. Although psychobiologists might have initially considered referring subjective magnitude solely to stimulus intensity, it is clear that this is not the thing to do. Whereas all subjective magnitudes vary with stimulus intensity to a major degree, there are also any number of other secondary means by which they can be influenced. For example, consider that the brightness of a photic stimulus varies not only with the number of photons but also with the wavelength of the spectrum of the stimulus. Similarly, it is equally well-known that the loudness of a sound is dependent on the acoustic frequency as well as the sound pressure level. Pairs of electrical pulse stimuli applied to the peripheral nerves (Uttal, 1959) produce sensations of varying amplitudes as the interpulse interval varies even though the total amount of current remains constant. Thus, in general, stimulus intensity is not the only correlate of sensory magnitude. Conversely, stimulus intensity variations are not always sensed purely as sensory magnitude shifts. Increasing stimulus intensity beyond certain limits can produce substantial quality changes: in hue, in saturation, in pitch, or even in the production of pain. Sensory magnitude, therefore, seems to be a more appropriate dimension for the coding problem than stimulus intensity.
b.
Perceived Quality
The kind or quality of a sensory experience is, of course, the other classic area of sensory research. Yet, there is also an ambiguity in the interpretation of this term or of its closely related biological equivalent, sensory modality. At its most gross level, the problem of modality is trivial. It is clear that the receptor organs make an initial analysis of incident physical energies by virtue of a lowered threshold to one kind of physical stimulus. There is no question that the human eye is best able to detect radiant energy between 400 and 800 nm, or that, among the sense organs, the ear responds maximally to pneumatic pressure fluctuations within the range of 30 to 15,000 Hz. This is the "adequate stimulus" basis of gross "place" coding of the senses.
The second part of this dual interpretation is not trivial and has been the major point of attack of sensory theoreticians for the past century. The problem in this case concerns those different kinds of qualities discriminable within a given modality. The nature of color vision and pitch perception, each representing families of micromodalities within vision and hearing, respectively, are problems at this level.
Another problem encountered in the study of quality coding is that in some of the sense modalities neither the stimulus properties nor the sensory experiences are adequately defined. In vision and hearing research, for example, the specifications of the physical stimulus are very highly developed. We have a single physical dimension in each case that can be systematically varied to alter the microquality of the visual or auditory sensation. However, when one deals with senses such as somesthesis, olfaction, and gustation, another complication arises. The separation of the various modalities into their families of micromodalities is based upon popular, historic, and nonscientific traditions. The complexities of subjective quality in the cutaneous senses are probably not adequately described by a statement mentioning only the classic categories of touch, pain, pressure, warmth, cold, and an open-ended group of "derived" sensations. The electrical stimulation of the skin, for example, gives rise to sensations, some of which mimic some of these qualities and some for which these older classification schemes have no descriptive term.
It is surprisingly difficult to give a good definition of quality. It might well be defined as the discriminable changes that are left over after one has accounted for the magnitude, temporal, and spatial differences between stimuli. The best I can do here is a statement that hopefully leaves the reader with the notion of discriminable differences between kinds of stimuli of the same modality.
c.
Temporal Discriminations
There appear to be many different kinds of temporal discrimination, and these may be interpreted in manners common to several modalities. For very long durations and intervals, the temporal judgments made by an observer may be considered to be modality independent, because the stimulus events merely serve to delimit some other internal timing process. A click, a flash, or a tap can be used equally well to beat out a rhythm. On the other hand, specific timing considerations for very short times within a modality are critical. For example, the ability of the nervous system to use a frequency code for intensity is bounded at the high-frequency end of the spectrum by the refractory period of neurons. At the low-frequency end, theoretically there is no bound, but practically, as interpulse intervals increase, there arrives an interval beyond which the neuronal circuitry can no longer wait to do its counting. Fraisse (1966) has recently published an important volume dealing with the perception of time. It remains the best survey of temporal psychological processing available. Beyond the scholarly presentation of a wealth of studies of temporal discrimination, Fraisse's main contribution is an elucidation of the complexity of the family of time senses. A summary of his categorization of temporal skills includes the following discriminative abilities.
6. REPRESENTATION AND CODING-THE BACKGROUND
Relative temporal order. The ability to determine which of two different stimuli arrived first is of great biological significance. This parameter of the stimulus may be dealt with in purely temporal terms by the observer, but, surprisingly, it is more often interpreted spatially, particularly when the interstimulus interval is small. One of the most familiar of this latter class of discriminative abilities is the auditory system's use of differential time of arrival (and, for higher-frequency tones, relative intensity) to localize a sound source in space. The timing precision of binaural localization is astonishingly high, corresponding to only a few microseconds of asynchrony in the arrival times. Relative temporal order of two spatially disparate stimuli can also be a major determinant of the spatial localization of the resultant combined thermal, tactile, and gustatory sensation. Von Békésy (e.g., 1963) has demonstrated many instances of such effects and has shown that very slight differences in the relative temporal order of the two stimuli can substantially change the apparent position of the fused sensation in audition, somatosensation, and even gustation.
Temporal acuity. Temporal acuity is defined as the ability to distinguish two identical stimulus events, sequential in time, as being separate rather than a single event. Temporal acuity is, of course, directly related to the relative temporal order sense, because to specify one stimulus as having preceded another, they must have been distinguished as separate events temporally. Yet it seems certain that these two capabilities are distinguishable from each other, because the former capability requires an ordinal judgment in addition to the more primitive resolution capability of temporal acuity.
Hirsh and Sherrick (1961) demonstrated just such a distinction between simultaneity and order in their classic study of the time senses by describing situations in which it was possible to state precisely that two events had occurred even though the subject was confused about relative temporal order.
Duration or interval. Another temporal sense involves the ability to replicate the sustained duration of an event or the interval between two events. This sense requires the organism to be capable of clocking time. How this is accomplished is a problem of much current speculation, because so many biological rhythms, which could serve as bases for the clocking operations, have been discovered. There is probably no reason to distinguish between a marked interval and a continuous event, because the true stimulus information is only that included in the initiation and termination of the interval or event.
All three of these temporal discriminative abilities probably play an important role in what might be called complex temporal pattern recognition. Whether there are other temporal abilities that must be added to fully describe all aspects of temporal discrimination is yet to be determined.
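The scale of the timing differences involved in binaural localization can be illustrated with a rough calculation. The sketch below is not taken from the studies cited; it assumes the simple path-difference approximation (ear separation times the sine of the source azimuth, divided by the speed of sound), and the ear separation, the speed of sound, and the function name are all hypothetical choices made only for illustration.

```python
import math

SPEED_OF_SOUND_M_PER_S = 343.0  # assumed value for air at room temperature
EAR_SEPARATION_M = 0.18         # hypothetical interaural distance


def interaural_time_difference_us(azimuth_deg,
                                  ear_separation_m=EAR_SEPARATION_M,
                                  speed_m_per_s=SPEED_OF_SOUND_M_PER_S):
    """Arrival-time difference between the two ears, in microseconds.

    Uses the path-difference approximation d * sin(azimuth) / c, which
    ignores diffraction of the sound wave around the head.
    """
    path_difference_m = ear_separation_m * math.sin(math.radians(azimuth_deg))
    return 1e6 * path_difference_m / speed_m_per_s


# A source straight ahead (0 degrees) gives no asynchrony; small angular
# displacements give asynchronies of only a few microseconds.
for azimuth in (0, 1, 5, 90):
    itd = interaural_time_difference_us(azimuth)
    print(f"{azimuth:3d} deg -> {itd:7.1f} microseconds")
```

Under these assumptions a displacement of a single degree from the midline corresponds to an asynchrony on the order of ten microseconds, which gives some feeling for the temporal precision that the relative-order sense must achieve when it is used spatially.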
E. SENSORY CODING
d. Spatial Discriminations
The seminal visual research of Hubel and Wiesel (1959), Lettvin, Maturana, McCulloch, and Pitts (1959), and of Barlow and Hill (1963), among others, has emphasized the importance of special codes for spatial stimulus parameters. In each case, stimuli were shown to produce different nerve messages when the stimulus pattern differed geometrically, even though all other stimulus dimensions were held constant. On a simpler level, it is clear that spatial localization of stimuli applied to different points of the receptor fields must be accounted for, and although we have a good deal of evidence to suggest that this is carried out by a corresponding place code (topographic representation), at least in the more peripheral portions of a sensory channel, there still remain two other major problems. First, how does one explain the pseudolocations made of interacting patterns, such as those summarized in von Békésy's (1958) paper on funneling on the skin? Second, how does the mapping of spatial localization by a topographic code overlap with those theories of quality coding or of other stimulus dimensions that also require spatially distributed codes?
In the study of interactions between different spatial areas, a great deal of progress has been made. Ratliff (1965) has reviewed the work of the last century, not only describing the spatial codes for contours, but also giving a detailed electrophysiological analysis of the transformation processes that lead from the original spatial stimulus pattern to the evoked pattern of neural signals. We must, furthermore, determine how it is that different stimulus locations are discriminated from each other. We must determine what the neural codes are that allow us to discriminate size differences and enhance contours or even see textures.
These, then, are among the most prominent of the common sensory dimensions that we know to be discriminable and that must be accounted for in any attempt to define the complete neural code.
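The kind of transformation from spatial stimulus pattern to neural pattern that enhances contours can be illustrated with a minimal sketch. It is not Ratliff's model; it assumes a single row of receptors and a simple nonrecurrent scheme in which each unit's output is its own excitation minus a fixed fraction of the excitation of its two immediate neighbors. The function name, the inhibitory coefficient, and the step stimulus are hypothetical.

```python
def lateral_inhibition(stimulus, inhibition=0.2):
    """Nonrecurrent lateral inhibition along a one-dimensional receptor row.

    Each output is the unit's own excitation minus `inhibition` times the
    excitation of its left and right neighbors. Units at the ends of the
    row treat the missing neighbor as equal to themselves.
    """
    last = len(stimulus) - 1
    response = []
    for i, excitation in enumerate(stimulus):
        left = stimulus[i - 1] if i > 0 else excitation
        right = stimulus[i + 1] if i < last else excitation
        response.append(excitation - inhibition * (left + right))
    return response


# A luminance step: five dim receptors followed by five bright ones.
step = [1.0] * 5 + [2.0] * 5
response = lateral_inhibition(step)

for position, (s, r) in enumerate(zip(step, response)):
    print(position, s, round(r, 2))
```

In this sketch the unit just on the dim side of the step responds less than its dim neighbors, and the unit just on the bright side responds more than its bright neighbors, so the edge is exaggerated in the neural pattern even though the stimulus itself contains only a plain step.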
I next consider the possible dimensions of nervous activity, the candidate codes, which might provide symbols for the representation of these sensory variables.
4. Possible Dimensions of the Neural Code (The Candidate Codes)
To associate common sensory dimensions accurately with neural response dimensions, one must be cognizant of as many of the likely neural dimensions as possible. The purpose of this section is to list and describe the more important of these dimensions without resorting either to a meaningless class inclusive of all classes, such as "the spatio-temporal pattern," or to biologically unlikely possibilities. There are, however, two important cautions, which should be noted before we consider the items in this list. First of all, the list has its origins primarily
in the observed dimensions of neuroelectrical signals. It should be remembered, as noted earlier, that many of these electrical signals may not themselves be the essential agents; rather, they may only be indicators of chemical processes, for example, which are more directly involved in the information flow. At the synapse, for instance, the information flow is signaled by the amplitude of both the graded presynaptic and postsynaptic potentials. The actual synaptic transmission, on the other hand, is mediated by the number of packets of transmitter substance that migrate across the synaptic cleft. Both of these types of representation, the amount of the transmitter substance and the amplitude of the synaptic potentials, are presumably related, and each is a continuous function of input signal magnitude. Nevertheless, the code actually read by the postsynaptic tissue is not usually an electrical one, but a chemical one. That electrical signals generally are only indicators of more basic chemical processes elsewhere is rapidly becoming a prime tenet of modern neurophysiology. However, from the point of view of the network hypothesis, which has gradually been evolving in this book, neither of these means of communication of information, electrical or chemical, is really the essence of the relationship between the nervous system and the mind. Rather, the essential aspect of the nervous system is the arrangement of the neurons in the net, regardless of what "technology," chemical, electrical, or what have you, is used to implement the interconnections.
The second important caveat, already alluded to, is that this list of potential information-carrying neural signals is necessarily incomplete, and new items are being added to it almost every year as new candidate codes are uncovered by neurophysiologists. New instruments, new experiments, and, perhaps most important, new insights all suggest new candidate codes, which must be fitted into this scheme.
By accentuating the dimensionality of the signals rather than their specific physics, however, a considerable amount of generality and flexibility is achieved, and the list remains open-ended. It should also be reiterated here that the compilation of the list of candidate codes at this stage has nothing whatsoever to do with the psychophysical dimensions. It is, rather, a task that must be carried out by the neurophysiologist and, as such, is independent of any perceptual significance that the candidate code might later be shown to possess. Once a code has been identified as a candidate, its relationship to perceptual dimensions must be separately assayed by persons working in the field of science I have referred to as psychobiology.
a. Place
The particular location or place activated by an incoming signal is one very important means of representing some attribute of the input signal. There are several different kinds of spatial codes so far suggested. One of the most common is referred to as the labeled line. In neuronal communication systems, the
mere fact that a given nerve is activated by virtue of the specific characteristics of the transducer is tantamount to a candidate code. There are many conceivable ways in which one particular neuron or group of neurons might be selected for the transmission of information by an incident stimulus. Lowered thresholds to particular types of physical energy or specific temporal pattern sensitivities are among the most interesting possibilities.
Johannes Müller's theory (Müller, 1840) of the specific energy of nerves is a formal statement of place coding, which seems to hold true in a gross way for the representation of sensory quality. Activation of the optic nerve, for example, by any stimulus, no matter of what form, always produces a visual sensation. In a more microscopic sense, however, the coding of microquality (hue, pitch, and so on) by place codes seems to break down and depend more on some relative temporal code (see subsequent discussion). Individual fibers are not uniquely associated with a particular microquality, and therefore Müller's law does not hold at this level.
b. Number of Activated Units
Another possible dimension of neural coding is, simply, the number of activated fibers or cells in a given nerve tract or ganglion. Magnitude is the sensory dimension most often considered to be mapped, at least in part, by the number of responding neural elements. Because an increase in the number of responding units also means that more places must have been activated, strong interactions might be expected between stimulus dimensions coded by place and those coded by number. But this is not necessarily so. If there is an increase in activity in some regions and a decrease in others as a result of an increase in stimulus intensity, the resulting change in total number of activated units may, in fact, be zero! In addition to the number of fibers that fire, we may also consider that the number of times that a given fiber fires or the duration of a burst may be codes in some instances. Thus the number of activated units may be really a subclass of a more general candidate code: the number of impulsive responses occurring in a given volume in a given period of time.
c. Neural Event Amplitude
The discovery of the all-or-none law effectively removed the amplitude of the response of a single axonal spike potential from the list of candidate coding dimensions. It now seems certain that the all-or-none law is valid for axonal spikes and that the amplitude of the individual nerve impulse is related only to the metabolic state of the axon and not to any characteristic of the stimulus once the spike threshold is exceeded. However, it should be remembered that in other parts of the neuron, it has been equally clearly established that slow potentials of graded amplitude and prolonged duration are the significant information symbols.
The amplitude of the neural signal appears as an important information-carrying code in many different contexts. Two of the most important examples are the receptor-generator potential and the potentials that are recorded from postsynaptic tissues. In another context, we must also consider that some of the amplitude measures recorded from peripheral nerve compound action potentials may be closely associated with one or another discriminable stimulus dimension. Very often, however, differences in such compound neuroelectric amplitudes merely reflect more fundamental processes. For example, the compound action potential recorded from peripheral nerves is most probably a cumulative index of the number of constituent axons that are responding in an all-or-nothing fashion.
d. Temporal Pattern
Naming "temporal pattern" as one of the candidates for neural coding is almost as weak a statement as falling back on a coding category of "spatio-temporal patterns"; each notion is so vague as to be almost meaningless. In the following paragraphs, I strive for more precision by specifying exactly what temporal dimensions are under consideration.
Graded potential time functions: the shape of the response. Graded potentials of all kinds (receptor-generator potentials, compound action potentials from peripheral nerves, evoked brain potentials, and even free-running electroencephalographic recordings) can all be described as having certain shapes. Shape is a vague term, of course, and what is usually meant by shape is a function or set of measures that describes the amplitude fluctuation of the graded potential as a function of time. Some of these parameters are relatively straightforward. For example, the simple latency pattern of the various amplitude deviations following the presentation of a stimulus can be used as a first approximation to shape. But superimposed on these simple latency measures can also be descriptions of the characteristics of the rise of the waveform. Does it abruptly appear, or does it rise gradually in either a linear or exponential fashion? We have to answer precisely this kind of question in any attempt to use shape as a descriptor of the initial portion of a graded response.
Another class of time function dimensions, which has often been of importance in the representation of sensory phenomena, deals with the steady-state or quasi-steady-state portion of the response. For example, one might want to know: Does the stimulus maintain a constant amount of activity, or is there a significant amount of adaptation or neural accommodation over the time course of a constant stimulus?
For those signals, such as the free-running electroencephalogram, which are varying spontaneously (the word "spontaneously" must, of course, be read in this case as "under the influence of unknown stimuli"), such shape parameters as the rate of change, the frequency spectrum, and the rise and fall pattern of specific waveforms, all must be considered as candidate codes.
The following temporal candidate codes are descriptors of the pattern of regenerative spike action potentials. Here, in accord with the all-or-none law, there is no suggestion that the amplitude of an individual spike can carry any useful information. Spike amplitude merely reflects local metabolic conditions perhaps modulated by conditions of previous response. The temporal parameters of importance are, therefore, only those that describe the pulse frequency modulation characteristics of groups of spike potentials. Frequency of firing. Although place and the number of activated neural elements are relatively unambiguous measures, which can be evaluated without confusion (even though the technical details may be cumbersome), frequency is an ambiguous dimension of neural activity. Frequency, or the number of responses per unit time, may be evaluated in one of two different ways by a subsequent decoding mechanism. The first way is one in which time measurements are made of the intervals between each pair of sequential responses. The alternative form of decoding possible for frequency is one in which a count is made of the number of neural events occurring within some basic integrating unit of time. As Anatol Rapoport (1962) pointed out, the interval-sensitive procedure would be essentially an analog process, because the range over which the interval varied could be continuous. On the other hand, the counting procedure is essentially a digital process dealing only with integral values of the number of events. Macrofluctuations in frequency pattern. Wall and Cronly-Dillon (1960) suggested that a specific code for somatosensory quality, at certain levels, might be the macropattern of the frequency of neural discharge in afferent pathways. 
Thus a frequency pattern in which the nerve impulse rate goes from a minimum frequency to a higher frequency very rapidly and then slowly diminishes would be perceived differently than a signal in the same pathway and with the same average frequency but whose frequency pattern slowly increases and then rapidly diminishes. These macrofluctuations are regularly observed in many types of neurophysiological recordings from single cells and might be of significance in the encoding of stimulus dimensions other than quality.
Microfluctuations in frequency pattern. An important related question is whether or not the nervous system is able to detect microfluctuations in frequency and whether such an arrhythmia is a candidate neural code. A microfluctuation would be defined as a transient change in the frequency pattern. A missed pulse, an extra pulse, or a momentary gap in the train of spikes might each be a code for one or another of the common sensory dimensions.
Temporal comparisons between two or more places. Another important class of candidate codes includes those situations in which comparisons are made in some neural center between temporal patterns that arrive on spatially separate channels. The auditory system, for example, certainly operates in some fashion that takes into account the phase and amplitude differences of neural responses to the synchronized stimuli applied to the two cochleas. Mountcastle, Poggio, and Werner (1963) have reported a similar temporal comparison process in position indicators in the cat's somatosensory thalamus. Furthermore, Pfaffmann (1959) has suggested that a similar kind of relative activity detection process might underlie gustatory quality coding. A related idea is the volley principle (Wever, 1949), which has been invoked to explain high-frequency following by the auditory system. According to this principle, spatially separate neural structures are capable of cooperatively conveying a frequency that exceeds the capacity of any individual neural structure. Such a process would require a high degree of synchronization detection ability on the part of the neurons involved and a precise comparison of their firing rates, an ability that somewhat surprisingly does seem to occur. In any of these cases, the important fact is that the critical information is not absolutely contained in a single channel of information; rather it depends upon comparison of relative amounts of activity in parallel channels. This is the basic idea of what are generally called pattern, cross-neuron pattern, or ratio theories of sensory quality.
Derived statistical measures. When I spoke of macrofluctuations, I was referring to relatively continuous changes in the frequency pattern. When I spoke of microfluctuations, I was referring to transient changes in the frequency pattern. There is, however, another possibility. There may be long-term fluctuations in the statistics of the pulse pattern that depend upon an evaluation of the microtemporal fluctuations but in a summarized fashion over long periods of time. Thus, the standard deviation and the range of the interval histogram of individual units in the cochlear nucleus of the cat have been shown to exhibit a specific signature by Rodieck, Kiang, and Gerstein (1962).
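The two readings of "frequency" distinguished earlier (timing the intervals versus counting events in a fixed period) and the interval statistics just mentioned can be made concrete with a small sketch. The spike times below are invented for illustration, and the function names are hypothetical; nothing here reproduces the analyses of the studies cited. The sketch computes an interval-based rate, a count-based rate, and the mean, standard deviation, and range of the interspike intervals for a regular and an irregular train.

```python
from statistics import mean, pstdev


def interspike_intervals(spike_times_ms):
    """Intervals between successive spikes, in milliseconds."""
    return [later - earlier
            for earlier, later in zip(spike_times_ms, spike_times_ms[1:])]


def rate_from_intervals(spike_times_ms):
    """Interval-based (analog) reading: reciprocal of the mean interval, spikes/s."""
    return 1000.0 / mean(interspike_intervals(spike_times_ms))


def rate_from_count(spike_times_ms, window_ms):
    """Count-based (digital) reading: whole spikes in [0, window_ms), spikes/s."""
    count = sum(1 for t in spike_times_ms if 0 <= t < window_ms)
    return 1000.0 * count / window_ms


def interval_statistics(spike_times_ms):
    """Summary measures of the interspike interval distribution."""
    intervals = interspike_intervals(spike_times_ms)
    return {
        "mean_ms": mean(intervals),
        "sd_ms": pstdev(intervals),
        "range_ms": max(intervals) - min(intervals),
    }


# Two invented trains with the same number of spikes over the same span.
regular_train = [0, 100, 200, 300, 400, 500]
irregular_train = [0, 50, 200, 250, 400, 500]

for name, train in (("regular", regular_train), ("irregular", irregular_train)):
    print(name,
          rate_from_intervals(train),
          rate_from_count(train, window_ms=500),
          interval_statistics(train))
```

With these invented trains both readings of frequency give the same value for the two trains, whereas the standard deviation and range of the intervals differ. That is the sense in which a derived statistical measure could carry information that mean frequency alone does not.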
Mountcastle, Poggio, and Werner (1963) have also shown similar effects in thalamic cells representing joint position and have given an interesting analysis of how this information could be used as a code. Furthermore, other derived statistical measures are common descriptors in statistics. In addition to the mean frequency and the variance (or standard deviation) of the interval pattern, higher-order moments can also be calculated, which can be used to compute such characteristics as the skewness or kurtosis of an interval histogram. These derived statistical measures may also conceivably play a role in neural coding. Unfortunately, there have been no attempts to determine if these derivatives of the higher moments actually vary systematically with stimuli. Thus no progress has been made in testing their role as candidate codes, much less in determining how they might be associated with common sensory dimensions.
5. Cautions in the Association of Sensory Dimensions and Candidate Codes
From the vantage point provided by the preceding discussion, it should be apparent that the general solution of the sensory-coding problem contains a number of substeps. First, the dimensions of sensory experiences must be elucidated by psychophysical experiments on man and animals. Second, the neurophysiologist must identify dimensions of neural activity for inclusion in the list of candidate codes by determining which dimensions are functions of stimulus variations. Third, preliminary associations can be made between common sensory dimensions and candidate codes at various levels of the ascending pathway. These associations, however, are at best only tentative. There are a number of conceptual constraints that make final confirmation of the association (the fourth step) very elusive indeed. These constraints place limits on the assurance with which we can accept any sensory-coding association. In the following paragraphs I point out some of these potential conceptual and technical pitfalls, which impede final confirmation of tentative associations between candidate codes and common sensory dimensions.
a. The Sign-Code Distinction Revisited
Every point made in the discussion of the relevance of the distinction between signs and codes for evoked brain potentials earlier in this chapter is also germane at the level of the single-cell responses more typically encountered in the sensory-coding problem. There simply is no way of guaranteeing that any observed fluctuation in any neural response, no matter how closely it correlates with stimulus properties, is truly a code unless certain tests of necessity and sufficiency are carried out. A full discussion of this important problem in the domain of cellular responses and an example of an analytical test of a code are presented in Uttal (1973) in my discussion of spike action potential interval irregularity.
b. Dimension Alterations
It must be expected from what we already know of the coding of sensory information that there will be very drastic changes in the coded form of a stimulus pattern at various levels of the afferent nervous system. For example, the most current theory of auditory encoding assumes that there is a transduction from temporal (frequency) stimulus patterns to a spatial code by a hydraulically mediated cochlear place localization of different frequencies. It is not too surprising, therefore, to learn of other specific neural structures that respond spatially to a specific temporal pattern of stimulus input. The existence of temporal "keys" (particular temporal patterns capable of activating specific loci, thus converting time to space) has been suggested by recent results (see, for example, Segundo & Perkel, 1969). On the psychophysical side of the ledger, MacKay (1961) has also reported several instances of spatial patterns that give rise to flickering changes in the visual field, suggesting the conversion of spatial to temporal patterns. Presumably, these effects are related to eye movements. Thus we should expect spatial-to-temporal, as well as temporal-to-spatial, transformations.
The caution inherent in these results is that we should not demand dimensional constancy throughout the afferent pathways and that codes at one level need not be identical nor even of the same dimensional category as codes at another level. All that is required is that the information pattern be represented in one way or another at all levels. Thus a stimulus pattern might be represented at one level by a temporal code, at another by a spatial code, and at another by an amplitude code. As I have repeatedly said, no topographic or isomorphic consistency is really necessary, nor is there any need for linear (or nonlinear) representation of functions that appear to the perceiver to be linear (or nonlinear). All that is required is some representational scheme that transmits the critical information of the input pattern in some available language. (The metaphorical line of speech translators discussed early in this chapter is the best way to concretize this important concept.) For these reasons, it is important to avoid the narrowing of perspective that would arise from the false requirement that temporal stimulus patterns be represented by temporal candidate codes and from other similar but equally incorrect isomorphisms. A most important corollary is that there is no one coding scheme that can be identified for each dimension of each sense but only local definitions of codes at specifically defined levels.
c. Boundary Condition Results
Another caution relates to the fact that many results are significant only in the sense that they represent limiting cases or boundary conditions. The determination of a threshold in a psychophysical or a neurophysiological experiment is a case in point. The threshold may impose a limit on the availability of a certain dimension to serve in some particular coding operation, but it does not necessarily completely define the functional variability of such a dimension as the corresponding stimulus dimension is varied. Different coding mechanisms may come into play at different stimulus amplitudes, for example.
d. Multiple and Overlapping Coding in Two or More Dimensions
The old phrase "spatio-temporal pattern," naive and virtually meaningless as it was, did reflect a certain problem. Many complex stimulus patterns are not unidimensional, and it is sometimes misleading to expect a given stimulus dimension to be associated with only one candidate code. In fact, such a separation may not be possible without considering interdimensional interaction, because some dimensions may act to modify some other dimensions. It is probably misleading to presume a one-to-one relationship between all stimulus dimensions and a single candidate code. As we see later, there appear to be many multiple codes used in the nervous system. There are two quite different ways in which multiple coding may be exhibited. The first way might be best called redundant coding. In this situation, the variation of a single stimulus dimension may lead to the simultaneous variation of two or more candidate codes. An example of such a phenomenon is the now
well-known simultaneous variation of mean frequency and variance of a spike action potential pattern as stimulus amplitude is varied. The second way in which multiple coding may be exhibited might best be called overlapping coding. In this situation, two or more stimulus dimensions may be capable of altering a single dimension of neural coding. For example, both the intensity and wavelength of a photic stimulus are known to affect the rate of firing of a ganglion cell axon in the optic nerve. Of course, in this case of a single neuron, such overlapping coding would lead to an ambiguous situation, because the change in wavelength could always be compensated for by a change in intensity. This ambiguity can be resolved only on the basis of other parallel neural communication lines that convey similarly coded information but with slightly different coding characteristics. In fact, this is probably the basis of color coding and perhaps of quality coding in general.
A related matter of concern is produced by the very high degree of convergence within the nervous system. Signals at higher levels may not reflect the influence of a single input alone. Rather, such a higher-level signal may be the result of the integration and processing of patterns of inputs from several sources. The matter is further complicated by the fact that feedback signals from more central portions of the nervous system can also alter the pattern of activity at peripheral levels. This and related kinds of centrifugal effects (information flow from the central to the peripheral nervous system) often lead to responses of great complexity, because input-output relations now become subject to both regenerative and degenerative effects of positive and negative feedback. The difficulties and surprises of signal tracing encountered in systems with elaborate feedback loops are well known to electrical engineers.
e. Species and Intraindividual Variability
Whereas ideally we would like to be able to generalize as much as possible to keep sensory theories as simple as possible, it is probably also important to keep in mind the fact that not all organisms, either within or among species, operate in exactly the same fashion. Furthermore, there is also the possibility of what Perkel and Bullock (1968) refer to as "labile coding." At different times in the organism's development, it is conceivable that different coding mechanisms underlie a single function. Analogous differences in coding mechanisms may be expected to exist at different stages of the evolutionary scale.
f. Attentional Limits on Our Perspective
I have, several times in the course of this book, referred to the fact that the attention of the scientific community is directed to small portions of the total problem by accidents of technology and of paradigm. The overemphasis on the frequency factors of spike action potentials as codes for intensity has been the classic example. The development of the more general notion of the coding
problem was inhibited until new instruments and new experimental paradigms widened our perspective. A very complete study of the general problem of paradigms of consensus and the ways in which revolutions in scientific perspectives occur has been made by Kuhn (1970) and may be of interest to the more philosophically oriented reader.
F. AN INTERIM SUMMARY
It should now be clear what we mean by a theory of sensory coding in particular and neural representation in general. Although there are a number of cautions that have to be observed, the general task involved in the unraveling of the codes of the nervous system is the precise definition of the association between percepts, experiences, and thoughts and the candidate neural dimensions. For the sensory domain, we might consider this task to be one in which we are required to fill in the entries on a correlation matrix such as that shown in Table 6-1. A table such as this will have to be developed for each level of afferent neural coding. This means that there may be several different levels of coding even within a single cell and many within the entire course of the ascending pathway.
This need for a multiplication of coding matrices would be somewhat discouraging if it were not for the fact that we may also expect to find some common features at equivalent levels as comparisons are made across the senses. This places the severe requirement on the sensory-coding theory to be sure that all of the comparisons being made are between truly comparable levels of encoding. It is not too difficult to be led astray and to make false comparisons between levels that are actually nonequivalent in this search for a general theory. A classic example has been the oft-repeated statement that the frequency discrimination of the skin is far poorer than that of the ear.
In fact, however, the frequency discrimination capability of the ear, from a neural coding point of view, is more comparable to spatial discrimination capability on the skin than to its frequency sensitivity. The analysis of acoustic frequencies into spatial patterns on the cochlea confuses the issue and illustrates some of the problems that can develop when one depends too much on the potential physical stimulus as a referent. On the other hand, when the acoustic nerve is driven electrically (Simmons et al., 1965), comparisons of the frequency discrimination capabilities of the ear and the skin turn out to be very similar. This brings me to my final point. The concept of representation or coding has intrinsic within it a major theoretical perspective that could reorganize much of our thinking in experimental psychology. If represented information, regardless of the dimensions of the specific neural code, may be transformed into percepts with other, and perhaps different, dimensions, then it follows that the interpretation of any coded afferent signal may be quite variable depending upon the circumstances of its surround or the previous experience of the perceiver. This variability in percept, it is important to note, is possible without a corresponding distortion in the particular afferent pattern of neural signals. Thus
FIG. 9.6 Schematic sketch of the various stages of neuroelectric and chemical coding at a chemical synapse. (Adapted from Grundfest, 1957.) Labels recoverable from the figure: presynaptic axon (spike activity); secretory terminal (electrogenesis of transmitter substance secretion; secretory products); passive diffusion of transmitter substance; receptor site; postsynaptic graded potential (depolarizing or hyperpolarizing); postsynaptic axon (spike activity).
Castillo and Katz (1954) observed in motor end plates, and Eccles, Eccles, Iggo, and Lundberg (1961) and Nishi and Koketsu (1960) observed in central neurons, that when a microelectrode was inserted into the postsynaptic tissue, spontaneous depolarizations occurred with rather irregular periodicity. These spontaneous depolarizations were very small, with an amplitude of only about 0.5 mV. Because of their small size, they have come to be called miniature postsynaptic potentials. It has been determined that the miniature postsynaptic potential results from the action of a relatively large and constant number of molecules (typically several thousand), because iontophoretic experiments in which only a few hundred ions are injected by a driving electrical field do not produce equivalent depolarizations. Because the miniature potentials regularly occur spontaneously, are of relatively constant amplitude, and seem to result from the action of a substantial number of molecules, the idea that large numbers of molecules of the transmitter substance form constant-size aggregates or packets has gained wide favor. Further direct support for this concept has come from electron micrographs of the sort shown in Fig. 9-7. The first observation of the vesicles, now believed to be the packets of transmitter substance, is usually attributed to De Robertis and Bennett (1954). Clusters of these structured vesicles can be seen in the presynaptic terminal in close proximity to the synaptic cleft. It is currently thought that the molecules separate either upon, or shortly after, the release and then travel independently to the postsynaptic receptor sites. Correspondence between the cleft size, the calculated diffusion time, and synaptic delays observed electrophysiologically supports the notion that the molecules of the transmitter substance diffuse passively across the synaptic cleft.
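The packet idea can be made concrete with a small numerical sketch. The following Python fragment is only an illustration under stated assumptions: the 0.5 mV packet size comes from the text, but the linear summation of packets, the spontaneous rate, and the function names are hypothetical choices made for the example, not findings reported by the authors cited above.

```python
import random

# Size of one miniature postsynaptic potential, as quoted in the text.
MINI_AMPLITUDE_MV = 0.5


def evoked_potential_mv(n_packets, mini_mv=MINI_AMPLITUDE_MV):
    """Depolarization produced when n_packets are released together.

    ASSUMPTION (not from the text): packets sum linearly.
    """
    if n_packets < 0:
        raise ValueError("n_packets must be non-negative")
    return n_packets * mini_mv


def simulate_spontaneous_minis(rate_hz, duration_s, seed=0):
    """Return the times (in seconds) of spontaneous miniature potentials.

    ASSUMPTION: irregular timing is modeled with exponential intervals;
    rate_hz is a hypothetical parameter chosen by the caller.
    """
    rng = random.Random(seed)
    t = 0.0
    times = []
    while True:
        t += rng.expovariate(rate_hz)
        if t >= duration_s:
            return times
        times.append(t)


if __name__ == "__main__":
    print(evoked_potential_mv(4))                      # four packets at once
    print(len(simulate_spontaneous_minis(5.0, 10.0)))  # count of minis in 10 s
```

The point of the sketch is only that a constant packet size makes every response an integer multiple of the miniature potential, while the spontaneous events remain irregular in time.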
Recently some important new information concerning the production and recycling of the synaptic vesicles that originally contain the molecules of transmitter substance has been developed by a number of laboratories (Douglas, Nagasawa, & Schultz, 1970; Heuser & Reese, 1973). Heuser and Reese, in particular, have proposed a model of vesicle recycling that is extremely interesting. The general outline of their theory, based upon an experimental procedure that involved both electrical recordings of postsynaptic potentials and an ultrarapid fixation method for electron microscopic examination, is diagrammed in Fig. 9-8. The cycle may be examined starting at the point (1) at which existing vesicles loaded with transmitter molecules migrate toward the plasma membrane of the neuron, presumably as a result of some stimulus applied to an earlier portion of the neuron. The vesicles merge with the plasma membrane, releasing their contents into the synaptic cleft, a process that is called exocytosis. The remaining membranes of the vesicles gradually coalesce indistinguishably with the neuronal plasma membrane as shown at point (2). The increasing membrane area produced by this coalescence, however, is compensated for by a gradual reproduction of synaptic vesicles (pinocytosis) from a region of membrane near point (3), the boundary of the Schwann cell sheath of the cell. The vesicles so formed,
FIG. 9.7 Electron micrograph of a synaptic region of a Muller cell from a lamprey. Note the synaptic vesicles clustered on the presynaptic side of the cell. P1 and P2 = two postsynaptic cells; t and f = microtubules and microfilaments, respectively. Arrows point to regular arrangement of vesicles around microtubules. Magnification is about 130,000×. (From Smith, Jarlfors, & Beranek, ©1970, with the permission of The Rockefeller University Press.)
FIG. 9.8 Diagram depicting the metamorphosis of the synaptic vesicles. Numbered stages are described in the text. (From Heuser & Reese, ©1973, with the permission of The Rockefeller University Press.)
however, are coated with certain molecules that seem to prevent them from participating in the normal transmitter release process. It is likely, according to this research, that they do not, at this point, contain any of the transmitter substance. Heuser and Reese propose that the newly formed and coated vesicles next migrate to the interior of the cell, where they form specialized structures known as cisternae. At this point (4), the molecules making up the vesicle coating material are disassociated from the newly regenerated vesicles and migrate back to the membrane where they are again used to coat the next generations of recycled vesicles. The cisternae, however, undergo a process akin to budding, at point (5), during which fully loaded (with transmitter substance) and active vesicles are produced that can fully participate in the synaptic transmission process, at point (1), which I have already described. Like so many other theories of neuronal and membrane function, models of chemical synaptic function are in a rapid state of flux. In a very recent report, Marchbanks (1976) has suggested that the vesicles are, in fact, not receptacles for the transmitter substance actually used in synaptic conduction in some cases.
He feels that the acetylcholine that migrates across certain synaptic clefts does not appear to come from the vesicles. Rather, his observations suggest that it almost entirely comes from the pool of free acetylcholine in the presynaptic intracellular space. The acetylcholine in the vesicles, Marchbanks further hypothesizes, is actually only a buffer stored against possible depletion of the extravesicular (but intracellular) supply by heavy use. Furthermore, his work suggests that the stored acetylcholine is not actually contained within the vesicle but is more probably only attached to its surface, much as the coating material is thought to be in Heuser and Reese's model.
A variety of biochemicals are now believed to act as synaptic transmitter substances in the nervous systems of various animals. For example, as I have already mentioned, acetylcholine has been shown to be an effective transmitter substance at several synapses. Although this substance was long thought to be solely involved in neuroeffector junctions and parasympathetic action in vertebrates, "cholinergic" synapses are now known to be more generally distributed throughout all animal phyla. Similarly, the catecholamine norepinephrine has classically been suggested as the main transmitter substance in the sympathetic portion of the autonomic nervous system, where the related catecholamine epinephrine also acts; the parasympathetic portion, as just noted, is cholinergic. Other possible excitatory transmitter substances in mammalian brains include 5-hydroxytryptamine (serotonin), glutamic and aspartic acids, and dopamine, another catecholamine. Glutamates also seem to be excitatory transmitters in the squid. Recent work indicates that some of the substances that had been thought to be exclusively excitatory may in fact be either excitatory or inhibitory, depending upon the nature of the receptor site. Acetylcholine, for example, is now known to serve either function in invertebrates like Aplysia (Wachtel & Kandel, 1967).
It seems to be a general conclusion now that some transmitter substances may be both inhibitory and excitatory in different situations. The actual action of any transmitter seems to depend upon the nature of the receptor site and possibly even on the rate at which the substance is delivered (although variations in rate may simply serve to select different receptor sites). Nevertheless, even though a single substance may act to either inhibit or excite, a fundamental law of neurophysiology, Dale's principle, asserting that a single neuron is capable of producing only a single kind of transmitter substance, is still valid. Substances that have classically been considered as solely inhibitory in mammalian brains include the amino acids glycine and GABA (gamma-aminobutyric acid). On the other hand, a substance like strychnine, although not strictly a natural transmitter substance, is known to inhibit the action of inhibitory transmitters and thus to produce a pseudoexcitatory effect. Strychnine's general effect is, therefore, to elicit neural activity, and in large doses this activation may be so strong as to result in severe convulsions. Strychnine appears to decrease the sensitivity of postsynaptic regions to the inhibitory substances that normally stabilize the neural net.
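The two claims just made, that the sign of a transmitter's action is fixed by the receptor site and that a neuron releases only one kind of transmitter, can be expressed as a small lookup. This is a hedged sketch: the transmitter names come from the text, but the receptor-site labels (`site_A` and so on), the table contents, and the function names are hypothetical placeholders invented for the illustration.

```python
from enum import Enum


class Effect(Enum):
    EXCITATORY = +1
    INHIBITORY = -1


# HYPOTHETICAL receptor-site labels; only the transmitter names are from the text.
RECEPTOR_EFFECT = {
    ("acetylcholine", "site_A"): Effect.EXCITATORY,
    ("acetylcholine", "site_B"): Effect.INHIBITORY,
    ("glycine", "site_C"): Effect.INHIBITORY,
    ("GABA", "site_D"): Effect.INHIBITORY,
}


def postsynaptic_effect(transmitter, receptor_site):
    """The sign of the effect is looked up from the (transmitter, site) pair,
    so the same transmitter can excite at one site and inhibit at another."""
    return RECEPTOR_EFFECT[(transmitter, receptor_site)]


def neuron_obeys_dale(released_transmitters):
    """Dale's principle as stated in the text: one neuron, one transmitter."""
    return len(set(released_transmitters)) == 1


if __name__ == "__main__":
    print(postsynaptic_effect("acetylcholine", "site_A"))
    print(postsynaptic_effect("acetylcholine", "site_B"))
    print(neuron_obeys_dale(["acetylcholine", "acetylcholine"]))
```

Keying the table on the pair rather than on the transmitter alone is the design choice that lets a single-transmitter neuron still have opposite effects on different postsynaptic cells.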
Another important synaptic chemical effect that should be kept distinct from the deactivation of inhibitory processes just described is the deactivation of the excitatory transmitter substances after they have migrated across the cleft by highly specific enzymes. For the system to maintain a high speed of response, it is obviously necessary that some active chemical means of deactivating excitatory transmitter substances must normally exist in the postsynaptic tissue. Otherwise the residual portion of the transmitter substance may exert a persistent influence and prolong the postsynaptic response. In cholinergic systems, for example, postsynaptic tissues seem to be rich in AChE (acetylcholine esterase), an enzyme that quickly breaks down acetylcholine. The synaptic effects of acetylcholine, therefore, last for only a few milliseconds before the residual transmitter molecules are destroyed. This is particularly important in speeding up the action of neurons and motor units, because perseveration of a response would reduce the speed at which changes could be made in the firing rate of the neuron. It has also been suggested that some general anesthetic drugs work in a similar way by interfering with chemical synaptic transmission in the central nervous system. The postsynaptic sites, chemoreceptors that they are, are especially likely to bind a number of chemicals, but not all that are bound necessarily excite the neuron. Specifically, the suggestion has been made that blocking occurs when the anesthetic drug occupies many of the postsynaptic receptor sites without producing the effects required for nervous transmission. Because the sites are already occupied, any of the usual transmitter substance that might arrive would be ineffective. Other anesthetics, particularly local ones, must directly act to reduce axonal activity. Although several major stages of the synaptic transmission process have been distinguished, not all are yet fully understood. 
It is not definitely known, for example, if spike action potentials produce an intermediate graded potential in presynaptic tissue or if the potentials detected there are merely degraded spikes. A question of equal importance concerns the action of the specific release mechanism. How does a graded potential release, or trigger the release of, transmitter molecules? In other words, what ionic mechanism suddenly stimulates extensive expulsion of transmitter chemicals when previously only infrequent spontaneous passages leading to miniature postsynaptic potentials occurred? About all that seems certain now is that calcium ions are intimately related somehow to the release of transmitter substances. The graded potentials produced in the presynaptic region by the spike action potentials seem to pull Ca++ ions into the cell. As these ions move into the neuron, the synaptic vesicles may tend to fuse more often with the membrane and thus more frequently release the molecules of the transmitter substance into the interneural cleft (if this is the correct model). These and related issues represent the frontier of current research in the field. A full and excellent discussion of synaptic biochemistry can be found in Albers, Siegel, Katzman, and Agranoff (1972) or in Dunn and Bondy (1974).
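The qualitative statement that more calcium entry leads to more frequent vesicle fusion can be sketched numerically. The fragment below is an illustration only: the saturating functional form, the `half_saturation` parameter, and the arbitrary calcium units are assumptions introduced for the example, since the text establishes no more than that release becomes more likely as calcium enters.

```python
def release_probability(calcium, half_saturation=1.0):
    """HYPOTHETICAL saturating relation between presynaptic calcium
    (arbitrary units) and the probability that a vesicle fuses.

    The probability is 0 with no calcium, 0.5 when calcium equals
    half_saturation, and approaches 1 as calcium grows.
    """
    if calcium < 0 or half_saturation <= 0:
        raise ValueError("calcium must be >= 0 and half_saturation > 0")
    return calcium / (calcium + half_saturation)


def expected_packets(n_available, calcium, half_saturation=1.0):
    """Expected number of packets released from n_available vesicles,
    assuming each vesicle fuses independently with the same probability."""
    return n_available * release_probability(calcium, half_saturation)


if __name__ == "__main__":
    for ca in (0.0, 0.5, 1.0, 4.0):
        print(ca, expected_packets(10, ca))
```

Any monotonically increasing function would serve equally well here; the saturating form was chosen only because a probability cannot exceed one.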
Bennett's (1974) collection of papers on synaptic transmission is an excellent source of general information about contemporary synaptic data and theory.
A comparison of electrical and chemical synapses. It is important for the reader to understand that the distinction between an electrical and a chemical synapse does not mean that one is exclusively chemical or the other exclusively electrical. The voltages and currents that are the primary excitants in electrical synapses do result from the same sort of ionic flow processes that generate the electrical resting and action potentials. Although the chemical synapse seems to be relatively insensitive to electricity as a primary excitant (some neurophysiologists feel that it is completely inexcitable electrically), it does exhibit the same sort of electropotentials as any ion-membrane system. Because our technology for observing the magnitude and time course of these chemical events is almost exclusively an electrical one, chemical events can be only indirectly observed with the usual sort of electrophysiological electrodes and amplifiers. The key difference between chemical and electrical synapses is the nature of the primary excitant that is best able to alter the resting potential of the postsynaptic region. In the epilog to his distinguished book, Eccles (1964) lists some more-specific distinguishing features of the chemical and electrical synapse, which have been paraphrased as follows:
1. A much smaller synaptic delay occurs across an electrical synapse than across a chemically mediated synapse.
2. The passive (electrotonic) spread of the generated postsynaptic potentials extends much further in an electrical than in a chemical synapse. An electrode placed in a presynaptic neuron will pick up much more of the activity in the postsynaptic neuron if the synapse is electrically mediated. In other words, electrical synapses are more closely electrically coupled than chemical ones.
3. The size of the synaptic cleft between two neurons is much larger in a chemically mediated synapse than in an electrical synapse, where the plasma membranes of the cells in some cases may actually be fused. This reduced spacing in an electrical synapse may partly explain the high level of electrical coupling described in (2).
4. Two neurons that electrically interact at a synapse are usually of roughly the same size. If the synaptic element of the first cell is very much smaller than that of the second cell, chemical transmission is almost obligatory, because direct electrical action between small presynaptic and large postsynaptic neurons would require some sort of an electrical energy amplification process. No such process is known to exist in electrical synapses.
In the past decade, considerable additional distinctions have been noted between these two forms of synapses. Bennett (1972) lists a number of others that can be paraphrased and added to the previous four criteria.
5. Individual chemical synapses are intrinsically unidirectional (rectifying) because of the asymmetrical physical arrangement of the source of the transmitter substance in the presynaptic neuron and the locations of the receptor sites in the postsynaptic neurons. Electrical synapses may possibly rectify in some instances, but the evidence that this actually happens appears to be conflicting, and, in general, information seems to flow equally well in both directions.
6. Chemical inhibitory synapses are common, but direct inhibition is not observed in electrical synapses.
7. Electrical synapses are basically linear systems (following Ohm's Law), whereas chemical synapses are nonlinear (this is another way of saying that they amplify).
8. Temporal summation is common in chemical synapses but difficult to effect in electrical synapses because of the rapid decay of electrotonic spread.
b. Possible Synaptic Plastic Mechanisms
In this section I direct attention to the possible ways in which a synapse could conceivably change its properties such that its transmission conductivity would be altered as a result of previous activation. As we see later, there are many physiological possibilities that could account for such plasticity in the efficacy of a given synaptic junction. It is important to reiterate that the changes I am about to describe must ultimately exert their influence on learning by means of the changes they produce in the state of the neuronal network. Thus any theories of learning based upon synaptic change are not, in any way, inconsistent with the more general concept of the network state. The important generalization to be kept in mind, regardless of the particular membrane mechanism invoked to account for the synaptic change, is that the overall effect of plasticity is to modulate the ease with which information can pass from a presynaptic to a postsynaptic cell and thus to reconfigure the momentary state of the overall neural network. The rest of the details are but the "technological" means by which this informational reconfiguration occurs.
John (1967), in his distinguished review of the mechanisms of memory, points out that there are really only three fundamentally different ways in which network reorganization can take place. Two of these are explicitly synaptic effects, and the third is a temporal process not directly involving synaptic plasticity. He refers to these three categories as the "Growth," "Shunt," and "Mode" hypotheses, respectively. By the first of the two synaptic possibilities, the growth hypothesis, John means that the nervous system may create new pathways in the neural network
as a result of the birth of new synaptic junctions. John's second possible synaptic means of altering network organization, which he refers to as the shunt hypothesis, asserts that there is actually very little growth in the interconnectivity pattern among the neurons in the network. Rather, he suggests that only a few of the multitude of preexisting and available synapses are selected from among the others by a process of differential facilitation. John's third category, the mode hypothesis, is nonsynaptic but invokes transitory processes in which circulating patterns of neural impulses account for information storage and the details of synaptic transmissions at any place at any time. John (1967) defines a mode as a "temporal sequence of states in the network [p. 64]" with an increase in the probability of particular patterns of activity in the network. In brief, the idea is that circulating nerve action potentials are capable of modulating synaptic conductivity by providing, at the proper time, "gating signals" that permit an otherwise ineffective signal to pass through a junction. It is important to note again, as this point in the discussion is passed, that much of John's thinking is based upon the concept of probabilities of groups of neural responses rather than upon specific deterministic mechanisms associated with individual neurons. His approach has a considerable a priori validity that can easily be lost sight of at the level of ultramicroscopic examination to which I direct the following discussion. It is vitally important to remember that each individual synapse is probably irrelevant to the molar behavior of the organism. Each synapse is but one of a myriad of similar structures contributing to the global response. In the realm of learning, as in the realm of perception, it is extremely difficult to understand how any individual synapse could be essential to any molar mental act.
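The statistical view taken here can be illustrated with a few lines of arithmetic. The sketch below is an assumption-laden toy, not a model from John or from this text: it treats the global response as the plain mean of many synaptic efficacies (a hypothetical simplification) and asks how much that mean moves when one synapse changes.

```python
import statistics


def population_response(efficacies):
    """Global response taken, by ASSUMPTION, as the mean synaptic efficacy."""
    return statistics.fmean(efficacies)


def effect_of_single_change(efficacies, index, new_value):
    """Change in the population response when one synapse is altered."""
    changed = list(efficacies)
    changed[index] = new_value
    return population_response(changed) - population_response(efficacies)


if __name__ == "__main__":
    baseline = [1.0] * 1000  # hypothetical population of identical synapses
    # Doubling the efficacy of one synapse out of a thousand:
    print(effect_of_single_change(baseline, 0, 2.0))
    # Raising every synapse by 10 percent:
    print(population_response([1.1] * 1000) - population_response(baseline))
```

Under this simplification a change at one synapse is diluted by the size of the population, whereas a modest shift shared by the whole population moves the response directly.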
It seems far more likely that the relevance of these synapses is meaningful only in the statistical or probabilistic sense. It is the "central tendency" of the responses of a relatively large population of synapses that is important, not the individual synaptic response itself. With this caveat in mind, I can now profitably turn to a consideration of the specific microscopic and individual synaptic mechanisms that might account for the reconfiguration of neural nets that, without a doubt, occurs during learning. I consider, in turn, one possible mechanism of plasticity in electrical synapses and then several of the possible mechanisms of plasticity in chemical synapses. Finally, observed changes involving the actual physical growth of old synapses, or even the birth of new synapses that might serve as explanations of variable synaptic conductivity, are discussed. Plastic mechanisms in electrical synapses. The possible mechanisms underlying plastic functional change are numerous in chemical synapses. For many reasons, however, it has not been fashionable to consider the electrical synapse as a likely site of plasticity, particularly in discussions of the mammalian brain. Recently, however, Llinas (1974) has proposed a possible mechanism by which the action of an electrical synapse might be modified through the intervention of
a nearby chemical synapse. Figure 9-9a shows Llinas' hypothesis of a possible arrangement of some of the neurons of the mammalian inferior olive. Note that two dendrites (IOD) are interconnected by electrical gap junctions (indicated by arrows) and that chemical synaptic terminals (ST) are also located nearby. Figure 9-9b shows how the electrical synapse could operate in the normal state. There is close electrical coupling between the two IODs, and one is able, therefore, to excite the other. This simple arrangement allows electrotonic currents to flow between the two IODs much in the way a closed switch operates. When the chemical synapse is activated and its transmitter substance is released, however, Llinas suggests that there may be a reduction in the effective membrane resistance of the presynaptic portion of the electrical junction that short circuits
FIG. 9.9 A hypothetical model of a combined electrical and chemical system in the glomerulus of the inferior olive that might demonstrate plasticity in an electrical synapse. Panels: (A) inferior olive glomerulus; (B) electrotonic coupling; (C) electrotonic uncoupling. (From Llinas, ©1974, with the permission of The American Physiological Society.)
(or shunts) the ionic currents that had previously been effective in activating the postsynaptic electrical portion. The electrical synaptic circuit thus is rendered ineffective (corresponding to an opening of the switch, as shown in Fig. 9-9c), and the network of which this system is a part is thereby changed in state. This single hypothesis of a chemically mediated electrical modulation is the only mechanism yet suggested that might allow an electrical synapse to exhibit the kind of plastic behavior required for the mediation of molar learning.
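The switch analogy can be put in circuit terms with a steady-state resistive divider. The sketch that follows is a deliberate simplification and not Llinas' own formulation: it assumes purely ohmic elements (in keeping with the linearity of electrical synapses noted earlier), lumps the chemical synapse's action into a single hypothetical shunt resistance in parallel with the membrane, and uses arbitrary resistance units.

```python
def parallel(r1, r2):
    """Equivalent resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)


def coupling_coefficient(r_gap, r_membrane, r_shunt=None):
    """Fraction of the driving voltage that appears across the membrane
    beyond the gap junction (a simple voltage divider).

    ASSUMPTION: activating the nearby chemical synapse is represented by
    r_shunt, a conductance path in parallel with r_membrane. With
    r_shunt=None the chemical synapse is silent.
    """
    r_eff = r_membrane if r_shunt is None else parallel(r_membrane, r_shunt)
    return r_eff / (r_gap + r_eff)


if __name__ == "__main__":
    # Hypothetical values in arbitrary units.
    print(coupling_coefficient(1.0, 1.0))                # chemical synapse silent
    print(coupling_coefficient(1.0, 1.0, r_shunt=0.01))  # chemical synapse active
```

A low shunt resistance diverts most of the junctional current, so the coupled voltage falls toward zero, which corresponds to the opened switch of Fig. 9-9c.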
Nonstructural plastic mechanisms in chemical synapses. Quite unlike the situation with electrical synapses, the chemical synapse provides a plethora of candidate mechanisms that might account for the neural response variability observed to result from prior experience. In this section I consider some of the more plausible of these synaptic changes. The processes to be discussed occur on a molecular scale and, therefore, are observable only to the extent that they modify biochemical or physiological measures and not as growth or other anatomic changes. In the section to follow I consider changes that are better described in terms of the physical growth of old, or the birth of entirely new, synapses.
Presynaptic mechanisms. Some mechanisms of synaptic plasticity may be related to the presynaptic processes that control the release of transmitter substance. It is conceivable, for example, that the process by which an incoming spike action potential is converted to the transmitter-releasing processes may be inhibited or enhanced by previous activation within the presynaptic neuron. It is also possible that a persistent hyperpolarization of the presynaptic membrane, by repeated activation, could result in a reduced ability to trigger transmitter release. There are, therefore, several points in the presynaptic portions of the synapse at which a plastic effect might be mediated. Surprisingly, however, one of the most obvious presynaptic candidates to explain conditioned decrements in synaptic conductivity, depletion of the transmitter substance itself, turns out to be a relatively unlikely possibility. Transmitter substance is available, either directly or in quickly available backup stores, in apparently large amounts relative to the quantity released by each synaptic activation (see discussion of Marchbanks' work on p. 629). Thus transmitter depletion at any conceivable realistic level of physiological activity seems to be unlikely.
A transmitter substance depletion argument is also severely weakened by the fact that the general effects of repetitive stimulation are very often just the opposite of the prediction of a depletion hypothesis: a potentiation or increase rather than a decrease in synaptic conductivity. The converse of a depletion hypothesis, a mobilization of increased amounts of available transmitter substance, is, however, a possible basis of potentiated or enhanced synaptic plasticity. In general, therefore, it now seems more likely that fluctuation in the probability of the release of transmitter molecules, rather than change in the amount
of those molecules, is the best explanation of presynaptic plasticity. One suggestion is that the probability of transmitter release is closely related to the amount of ionic calcium present in the presynaptic terminal. For a summary of recent research on calcium ion effects, see Kupfermann (1975). For a discussion of the presynaptic blockade of transmitter release by botulinum toxin, see Kao, Drachman, and Price (1976).
Post-tetanic potentiation. Another important mechanism of synaptic plasticity is known as post-tetanic potentiation (PTP). Although it is not certain whether PTP is a presynaptic or postsynaptic effect, it is of sufficient interest to be considered in detail at this point. Assume the following experimental preparation: A stimulating electrode is inserted into a presynaptic axon, and a recording electrode is placed somewhere in the postsynaptic neuron. The recording electrode may be recording either postsynaptic potentials in the region of the synaptic interface between the two cells, or spike activity in the axon. In either case, its purpose is to measure the change in activity of the postsynaptic cell as a function of presynaptic stimulation. Now assume that the presynaptic cell is actually stimulated with a train of electrical pulses. Such a pattern of stimulation is said to be tetanic because a similar stream of nerve impulses is thought to be responsible for the persistent contraction of a muscle, a phenomenon known as a tetanus. The effect of a single tetanic stimulus burst on many types of synapses is the production of a prolonged hyper-responsiveness, or potentiation, of the preparation to a single test stimulus pulse. Post-tetanic (or postactivation) potentiation may be evidenced either by a greatly enlarged postsynaptic graded potential or by an increase in the number of postsynaptic spike action potentials produced by the single presynaptic test stimulus.
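The experimental logic just described, a tetanic burst followed by single test pulses whose responses are compared with the pre-tetanic baseline, can be mimicked with a toy time course. Everything quantitative in this sketch is an assumption made for illustration: the exponential form of the decay, the peak of 140 percent, and the two-minute time constant are hypothetical and are not measurements reported in the text.

```python
import math


def relative_response_percent(minutes_after_tetanus,
                              peak_percent=140.0,
                              tau_min=2.0):
    """Response to a single test pulse, as a percentage of baseline.

    ASSUMPTIONS: potentiation jumps to peak_percent at the end of the
    tetanic burst (t = 0) and decays exponentially with time constant
    tau_min (minutes). Before the burst the response is the 100 percent
    baseline.
    """
    if minutes_after_tetanus < 0:
        return 100.0
    gain = peak_percent - 100.0
    return 100.0 + gain * math.exp(-minutes_after_tetanus / tau_min)


if __name__ == "__main__":
    for t in (-1, 0, 1, 2, 4, 8):
        print(t, round(relative_response_percent(t), 1))
```

Because the potentiated response in such a sketch always relaxes back to baseline, it captures the transient character of the effect but says nothing about whether its origin is presynaptic or postsynaptic.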
Figure 9-10 shows a typical set of data illustrating the very long duration of potentiation obtainable. Although not all post-tetanic potentiations last as long or produce as large a change in relative response magnitude, this record is not grossly atypical and illustrates the very elongated periods of hyper-responsiveness obtained with this procedure. Post-tetanic potentiation is of particular interest because it is one of the few single-cell plastic changes in vertebrates with durations comparable to those of behavioral phenomena. Eccles (1953) specifically has speculated that this type of increase in long-term cellular responsiveness may be one of the underlying mechanisms of human short-term memory. Because post-tetanic potentiation is transient and gradually does decrease, however, it could not possibly explain long-term memorial storage. Post-tetanic potentiation has been found frequently in vertebrate cerebral neurons, as well as in peripheral cells; thus it is apparently a ubiquitous phenomenon that may well have an important bearing on neural theories of short-term learning. Bliss and Lomo (1970) have shown by a number of independent lines of research that post-tetanic potentiation occurs with great magnitude and with especially long persistence in the mammalian hippocampus, a structure already
FIG. 9.10 Plot of the time course of post-tetanic potentiation as reflected in the amplitude of a postsynaptic graded potential. The pulse on line A indicates the occurrence of the tetanic burst. Horizontal axis: minutes; vertical axis scale runs from 100 to 140. (From Eccles, © 1953, after Liley & North, with the permission of Oxford University Press/Clarendon.)
noted to be closely related to the control and regulation of learning, if not mediating the actual storage of information. Their demonstration showed that hippocampal potentiation may last for several hours.
Postsynaptic mechanisms. In introducing this discussion of plastic effects in chemical synapses, I noted that the changes occurring at the presynaptic source of the transmitter substance were not the only plausible candidates to explain the changes in synaptic conductivity; comparable changes at the receptor areas of the postsynaptic neuronal membrane might also be responsible. The transduction process that converts the signal implicit in the diffusing transmitter chemical to a pattern of postsynaptic spike action potentials is also made up of several steps. Any one of these steps might be altered to produce a change in the efficiency of conduction. Some of the most plausible postsynaptic locations of plasticity are the receptor sites themselves, chemically tuned as they are to particular transmitter substances. Deutsch (1973), for example, has specifically proposed that alterations in neuronal plasticity might be explained by a diminished or heightened sensitivity to the specific transmitter substance on the part of receptor sites. Furthermore, reductions in the postsynaptic neurosecretion of enzymes like acetylcholinesterase, which are primarily responsible for the deactivation of the
transmitter substances, also might lead to abnormally prolonged effectiveness of transmitter chemicals.
The nature of the receptor sites for synaptic transmitter substances is just beginning to be understood. The "sites" themselves are probably protein molecules with the ability to combine selectively with the transmitter molecules in ways leading to changes in the properties of the postsynaptic plasma membrane. The induced membrane change may take the form of a variation in the membrane permeability, resulting in transmembrane ion concentration shifts that are subsequently measured as postsynaptic potentials.
A very specific hypothesis of postsynaptic plasticity has been made by Mark (1974). He suggests that activity in a presynaptic neuron tends to build up high concentrations of calcium and sodium ions. The increase in intracellular sodium ions, in particular, produces an increase in the sodium pumping action that in turn produces an increase in the production of intracellular proteins whose structure is specific to that particular neuron. Mark assumes that the postsynaptic neuron then becomes sensitized to the "tailored" protein much as if it were an antibody. A postsynaptic neuron is thus tuned to fire more easily to a particular presynaptic neuron and to repress inputs from other presynaptic neurons. As a result of the selective sensitization of groups of neurons in this manner, the overall arrangement of the neural network assumes a new state, and the animal displays an altered form of molar behavior. This hypothetical process is diagrammed in Fig. 9-11.
Another possibility, of course, is an increase in the number of ultramicroscopic receptor sites rather than an increase in the efficiency of a constant number of sites. Such a change, however, would occur at a molecular level that cannot be tested with the techniques currently available for measuring synaptic structure.
Growth of synapses: structural changes. The concept of an increase in the number of receptor sites bridges the gap from the mechanisms previously discussed (which are essentially changes in the molecular status of existing synapses) to consideration of another major possibility: the actual physical growth of new synaptic connections. Although it is well established that regeneration of damaged tissue in the central nervous system does not usually occur, the reasons for this are not entirely clear. Central neurons do grow when incubated in vitro. Perhaps functional regeneration of central nervous tissue is poor simply because of the very dense packing of neurons in their normal state. There is, however, a considerable amount of evidence that what are called dendritic spines, which appear to contain synaptic connections, sprout as a result of use and degenerate as a result of disuse. One of the first suggestions that physical growth of dendritic processes plays a role in learning was made by Ariens-Kappers (1917), who proposed a directed growth of dendrites from one neuron to another as a result of neural activity. The key idea was that
FIG. 9.11 Mark's theory of how synaptic use could lead to changes in conductivity. According to his model, activation of a presynaptic neuron is accompanied by an increase in the extrusion of Ca++ and Na+. This changes the internal environment of the neuron so that it later has a tendency to take up a different kind of amino acid than usual. This leads to further changes in the specific proteins produced. The postsynaptic neuron responds specifically to the protein, in a manner similar to an antibody-antigen reaction, thus changing the state of the neural network in a semipermanent way. [Diagram labels: active presynaptic neuron; amino acids for marker synthesis; repression of inactive synapses.] (From Mark, © 1974, with the permission of The Oxford University Press.)
when these fibrils came into proximity with each other, membrane growth processes occurred that led to the creation of new synapses. Although the exact mechanism of such directed growth is not known (some workers have suggested that it is a function of the biochemicals secreted by the neurons themselves), the general process in which neural tissue grows toward or specializes around the regions from which it receives the highest level of stimulation is referred to as neurobiotaxis. Chemical biotaxis, in general, has been suggested as the means by which particular portions of the total genetic code (contained in all cells) are selected for expression by each cell during ontogeny. Much later, Eccles (1953) contributed the next important step to the theory of synaptic growth. He proposed a more specific hypothesis, the swelling or ballooning of synapses with use, that has been extremely enduring in spite of the fact that it has only rarely been substantiated by direct observation.
Axons had long been suspected to swell as a function of usage (Hill, 1950), and Eccles suggested that a similar mechanism, although in a somewhat more delicate form, could account for the increased conductivity of synapses during learning. He further suggested that the swelling was a direct result of increased water uptake and that the increased synaptic size led to an increased synaptic conductivity simply because of a reduction in the membrane resistance and the changing distribution of presynaptic transmitter release sites. Other workers (e.g., Ranck, 1964), noting that selective destruction of synapses can also be used to modify the state of the network in the same way that punching out holes in a paper card can enrich its information content, suggested that a process of selective synaptic degeneration might be a relevant structural change mediating learning at the molar level.
The notion that synapses may actually be either created or enlarged as a function of learning is one of the predominant themes of research in this field today. Cells of the mammalian brain, such as the great pyramidal neurons, have an extremely rich arborization consisting of spines projecting from all portions of the neuron (the cell body, the axon, or the dendrites). Figure 9-12 is a drawing of a typical pyramidal cell (from layer 5 of the cerebral cortex of a rat) and indicates some of the nomenclature used in these investigations. A few of the spines on this neuron, which look like simple granules in low-power magnification, are enlarged in the inset to show both their detailed structure and the way in which the synaptic terminations are typically located at the most distal portion of the spines. An important feature of these pyramidal neuron spines is their great variance in shape. Chang (1952) has proposed that the electrical resistance of a spine may vary as a function of its shape.
Thus spines with long thin stalks, for simple physical reasons, would be expected to have a higher electrical resistance than short stumpy spines. The variation in synaptic resistance could, just as synaptic swelling could, directly affect the ability of a synapse to pass presynaptic graded currents and thus the general ability of the synaptic junction to convey information between the two neurons it interconnects. The hypothesis of a relationship between shape and conductivity is strongly supported by the work of Peters and Kaiserman-Abramof (1970), who have shown in a series of elegant microanatomical studies the extreme diversity of shape of dendritic spines found on pyramidal cells. Figure 9-13 shows the microanatomy of a sample of spines based on the observations of Peters and Kaiserman-Abramof and some estimates made by Rall (1974) of the membrane resistance these variously shaped spines should exhibit. Rall has carried this logic one step further by suggesting that systematic alterations in the geometry of the family of spines on a given neuron may be a function of the neuron's past experience. Thus changes in the microanatomy of synaptic spines must be added to our list of possible mechanisms of synaptic plasticity and thus of learning.
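The "simple physical reasons" behind Chang's proposal can be illustrated with a rough calculation. The sketch below treats a spine stem as a uniform cylinder of conducting cytoplasm, so that its axial resistance is R = ρL/(πr²). This is only an illustration under stated assumptions and not Rall's actual procedure: the resistivity of 100 ohm-cm is an assumed order-of-magnitude value for cytoplasm, the function name is hypothetical, and the stem length (1.1 μ) and stem diameters (0.05 and 0.3 μ) are taken from the dimensions listed in Fig. 9-13.

```python
import math


def stem_resistance_ohm(length_um, diameter_um, resistivity_ohm_cm=100.0):
    """Axial resistance of a cylindrical spine stem: R = rho * L / (pi * r^2).

    resistivity_ohm_cm is an ASSUMED cytoplasmic resistivity; it is not a
    value given in the text.
    """
    rho_ohm_m = resistivity_ohm_cm * 1e-2        # ohm-cm -> ohm-m
    length_m = length_um * 1e-6                  # microns -> meters
    radius_m = (diameter_um / 2.0) * 1e-6        # microns -> meters
    return rho_ohm_m * length_m / (math.pi * radius_m ** 2)


# Same stem length, narrowest versus widest stem diameter.
thin_stem = stem_resistance_ohm(length_um=1.1, diameter_um=0.05)
thick_stem = stem_resistance_ohm(length_um=1.1, diameter_um=0.3)

print(f"thin stem:  {thin_stem:.2e} ohm")
print(f"thick stem: {thick_stem:.2e} ohm")
print(f"ratio:      {thin_stem / thick_stem:.0f}")
```

Because resistance varies inversely with the square of the stem diameter, a sixfold change in diameter alone changes the resistance by a factor of 36. Under the assumed resistivity, both values fall between 10^7 and 10^9 ohm.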
FIG. 9.12 Drawing of a cortical pyramidal neuron stained with a Golgi silver stain showing the major features of this type of cell and the details (inset) of the spiny synaptic interconnections. (From Greenough, © 1975, with the permission of American Scientist, Journal of Sigma Xi, The Scientific Research Society of North America.)
FIG. 9.13 Comparison of the shapes and stem resistances of pyramidal spines. This figure is an adaptation of one originally produced by Rall (1974), who provided the estimates of spine stem resistance and who utilized the spine drawings and dimensions from Peters and Kaiserman-Abramof (1970). (Used with the permission of Rall and Peters & Kaiserman-Abramof and Brain Information Service/Brain Research Institute and The Wistar Press.)
Shapes and lengths of dendritic spines, with estimates of spine stem resistance (Rss):
1. Stubby. Average length 1.0 μ; range 0.5-1.5 μ. Rss ≈ 10^5 to 10^6 ohm.
2. Mushroom-shaped. Average length 1.5 μ; range 0.5-2.5 μ; average stem length 0.8 μ; average bulb dimensions 1.4 x 0.6 μ. Rss ≈ 10^6 to 10^7 ohm.
3. Thin. Average length 1.7 μ; range 0.5-4.0 μ; average stem length 1.1 μ; average bulb dimensions 0.6 μ. Rss ≈ 10^7 to 10^9 ohm.
Stem diameter: 0.05 to 0.3 μ.
Extensive changes have indeed been observed in the number and shapes of pyramidal neuron spines as a result of neuronal activity. In fact, the spine count seems to go up and down as a result of use and disuse (and as we see later, also possibly as a result of reinforcement) in a regular fashion on several different portions of the neuron. The technique used to make such measurements typically requires the manipulation of some aspect of the experimental animal's environment (either internal or external) and subsequent microscopic examination of postmortem specimens of brain tissue usually prepared with the Golgi silver stain. The Golgi stain, as I have noted, has the highly desirable property of staining most portions of a single neuron but only relatively few of the all-too-numerous neurons present in prepared tissue. Thus the anatomy of a single cell can be examined in detail in a field "cleared" of most of the surrounding neuronal tissue. The technique for examining these structural effects of learning was developed to its current high levels by Valverde (1968). Valverde's now nearly classic study showed that, when a mouse had one eye enucleated at birth, the proliferation of dendritic spines, in the occipital cortex in particular, differed considerably from that in control areas of the brain. Figure 9-14, for example, shows two photographs of the apical dendrites from a pyramidal cell in the mouse's visual cortex. The experimental picture (Fig. 9-14b) is of a neuron from the side of the cortex contralateral to the enucleated eye, and the control picture (Fig. 9-14a) is of a portion of a pyramidal neuron from the ipsilateral occipital cortex. Because the visual fibers are mostly crossed over in the mouse, this is a valuable built-in control. Obviously there is a substantial difference between the two photographs.
Spines are much less frequent on the contralateral side of the brain to which the enucleated eye would have projected and which, therefore, had received a much lower level of activation. There are, of course, a large number of potential artifacts in this kind of research. It is possible that the effects are not visual but have something quite separate to do with mechanical or degenerative damage done by the enucleation itself. One control in this case would be examination of the effects on other regions of the brain that are not primary visual projection areas. Valverde has carried out just this control and graphically displays the summarized spine counts for neurons in both the occipital and temporal lobes of the mouse's brain shown in Fig. 9-15. Obviously, the effect on spine growth of visual stimulation is restricted to the visual regions of the brain and is not general to nonvisual regions. A simple interpretation of this deceptively complex experiment, however, still eludes us. The data are further confounded by the fact that enucleation is a total and drastic reduction not only of the details and geometry of the visual stimulus but of all photic energy. Some more recent studies have added to our understanding of the problem by showing that not only the total absence of light per se but also very much more subtle aspects of visual experience can produce similar spine count changes in particular regions of the visual cortex.
FIG. 9.14 Photographs of two dendrites from the visual cortex of a 48-day-old mouse. The photo on the left (A) shows the normal rich growth of dendritic spines and associated fibers (1, 2, and 3) on the side of the brain ipsilateral to an enucleated eye. The photo on the right (B) is from a pyramidal cell contralateral to the enucleated eye. (From Valverde, © 1968, with the permission of Springer-Verlag, Inc.)
Globus, Rosenzweig, Bennett, and Diamond (1973), for example, have shown that both the spine count and the width of individual spines of pyramidal cells in the occipital cortex vary substantially according to whether rats lived in an "enriched" environment with many toys and much maze-running experience or in an "impoverished" environment with little stimulation. Interestingly, the spine counts and shapes varied only on dendrites near the base of the cell body in this experiment. Spine counts and shapes on other portions of the dendritic tree of these cells changed only slightly if at all. Greenough, Volkmar, and Juraska (1973) and Greenough and Volkmar (1973) have found essentially the same result in other portions of the brain. In their experiments, however, the most significant changes occurred in the inferotemporal cortex, and very few spine counts changed when comparisons were made in the frontal lobes. Because the temporal lobe is also tightly linked to
FIG. 9.15 Charts showing the number of spines on 50-micra segments of pyramidal cell dendrites as a function of location and rearing conditions (mice raised under normal conditions and mice raised in darkness, plotted by case and age in days). The upper figure compares the spine counts in the visual cortex for normal and dark-reared animals. There is obviously a difference as a result of the rearing conditions. The lower chart shows the counts for the temporal lobe. Little change is observed as a result of the rearing conditions in this region. Obviously the effect is specific to the visual projection regions. (From Valverde, © 1967, with the permission of Springer-Verlag, Inc.)
visual inputs, and the effect is not universally found throughout the brain, further support is provided for the hypothesis that the microscopic anatomical changes in the shape of the synaptic spines are direct effects of very subtle aspects of the visual experience.
An interesting variation on this experimental theme has been carried out by Rutledge, Wright, and Duncan (1974), who used a more specifically designed learning paradigm on cats. These workers combined long-term (several weeks in duration) electrical brain stimulation with a classical conditioning paradigm in which the electrical brain stimulation served as the conditioned stimulus (CS) and a shock to the foot served as the unconditioned stimulus (UCS). The usual Golgi-type staining procedure was then used on postmortem brain samples to determine the effects of this conditioning procedure on neural and synaptic growth. These effects were compared to tissue samples taken from cats that had also received the brain stimulus and the foot shock but without pairing. Rutledge and his colleagues found, on the ipsilateral side of the brain, a considerably greater increase in spine count when the CS and the UCS had been paired than when they had not been paired. In addition, there seemed to be a considerably greater degree of sprouting in the brain regions contralateral to the side on which the electrical stimuli had been applied. Figure 9-16, for example, shows the difference between contralateral and ipsilateral brain tissue from a trained animal. The ipsilateral-contralateral difference seemed to be independent of whether the animal was trained, however. It occurred in all animals on the side opposite to the portion of the cortex to which the electrical stimulus had been applied even without pairing.
FIG. 9.16 Two photographs of dendritic growth in the cat's brain produced by electrical stimulation of the suprasylvian cortex. The picture on the left is from the side of the brain to which the electrical stimulation was applied; the picture on the right shows the much more extensive growth stimulated in the contralateral suprasylvian sulcus. The effect of stimulation therefore seems to be enhanced by some mediation by an intermediate synaptic connection. (From Rutledge, Wright, & Duncan, © 1974, with the permission of Academic Press.)
Why should the growth have been greater on the contralateral side even when the pairing had not been carried out? Rutledge and his colleagues suggest that this might be due to the fact that some sort of trans-synaptic activation is required to elicit neuron sprouting and growth. The fibers on the stimulated side of the brain were devoid of any trans-synaptic influence, for the most part; they were directly stimulated. The transcallosal connection to the other side of the cortex, however, is interrupted by synaptic junctions, and thus this criterion was met for the contralateral hemisphere. Ipsilaterally the spine counts on the oblique and vertical portions of the dendritic tree were especially affected by the pairing aspect of the conditioning paradigm. Although the differences in spine counts were small, Rutledge believed that these results provided specific support for the hypothesis that synaptic spine growth was stimulated by the actual conditions of training that led to the acquisition of the skill and that they could not be attributed to simple stimulation effects. Thus he concluded that synaptic spine growth is a very likely correlate of learning per se. The fact that trans-synaptic activation is necessary for stimulation-elicited neural sprouting is also a significant and related result. It suggests that the kind of neural growth observed here is in some way closely associated with synaptic activation. The fact that the synapses are miniature neurosecretory organs may also be a part of this story. It is entirely possible that the contralateral sprouting as a result of electrical stimulation alone, and ipsilateral growth as a result of pairing, are both in some way "fertilized" by the chemicals secreted by the involved synapses. In spite of the somewhat modest statistical differences obtained for spine counts in this study, it is a particularly important contribution.
Rutledge and his colleagues provided one of the few studies in the literature that examines in detail the relative contribution of two experimental variables (simple use and CS-UCS pairing) and concludes that it is not neuronal use alone, in general, but the synaptic activation associated with learning, in particular, that accounts for synaptic neuronal growth.
An unusual opportunity to conclude this discussion of spine growth is provided by an exceedingly relevant study by Purpura (1974). Although all of the studies discussed so far have been carried out on infrahuman species of one sort or another, Purpura's study used a quite unusual experimental subject for this type of work: the human being. He compared the dendritic spine characteristics observed in retarded and normal youngsters. Figure 9-17 is a sample of his results from four different subjects. Plate A1 in this figure shows dendritic spines from a normal 6-month-old infant. This drawing has been marked with three of the standard spine shapes commonly found in microphotographs of this tissue, which are classified as thin (TH), stubby (ST), or mushroom-shaped (MS). Plate B1 is a dendrite from a normal 7-year-old child who had been killed in an accident. The spines are thick and more or less similar to those shown in A1. The other
two plates, A2 and B2, are from severely retarded human beings. A2 is from a living 10-month-old infant taken by biopsy, and B2 is a postmortem sample from a severely retarded 12-year-old child. The difference between the spine shapes in the normal and retarded children is quite clear. The retarded children's dendritic spines have characteristically thin stalks and bulbous heads. The normal children have much shorter and stubbier spines. The spines are also obviously less abundant in the retarded children than they are in the normal children. Purpura, of course, is not able to relate these microscopic anatomical differences to the specific behavioral deficiencies in other than a general way. However, these data, as well as those of others who have found similar results (e.g.,
FIG. 9.17 Four cortical pyramidal cell neurons examined with the Golgi stain. (A1) is from a normal 6-month-old infant; (A2) is from a retarded 10-month-old infant; (B1) is from a normal 7-year-old child; and (B2) is from a 12-year-old retarded child. Note that the two normal children show stubbier and denser spines (classified as TH = thin; ST = stubby; or MS = mushroom-shaped); the two retarded children show a much sparser sample of exceptionally long and thin spine stems. (From Purpura, © 1974, with the permission of The American Association for the Advancement of Science.)
Marin-Padilla, 1974), constitute a strong argument that the overall integrative ability of the brain's neural network can be modified by synaptic complexity in ways apparently closely associated with aspects of molar intellectual function. An obvious corollary of this hypothesis is that variations in the richness and complexity of this synaptic interconnectedness may be linked to variations in the ability to learn.
This then concludes our survey of some of the hypothetical or observed changes occurring at the synapse as a result of the organism's experience that could conceivably affect the organization of the neural network. This discussion of synaptic spines is also a transitional one; it emphatically recalls our attention to the probabilistic nature of the synaptic net. In the light of the microanatomy so far discussed, we may conclude that it is the average spine density and/or quality that matters rather than the presence or absence of any particular spine. A related inference is that the presence or absence of any single neuron in the net is equally insignificant. Rather, the organizational pattern of many neurons into an interacting net, as dictated by the synaptic interconnections, is the essence of the brain's representation of mind, and changes in that pattern are the essential equivalents of learning. The next question, therefore, concerns how this possible, plausible, and, in some cases, empirically observed synaptic plasticity produces the corresponding changes in the neural networks that are felt to be more directly linked to the molar aspects of the problem. Network organization and reorganization is, therefore, the topic of the next section of this chapter.
c. Possible Network Changes Associated With Learning
The following discussion deals with ideas that lie at the outermost boundary of speculation and credibility. The reason for this is simple. With the exception of a few instances in invertebrates (see, for example, the work on Aplysia discussed in Chapter 8 and later in this section), there have been virtually no cases reported in the literature in which the individual actions of more than a few of the constituent neurons of an ensemble or network have been studied simultaneously. It is an oft-repeated axiom of this book that input-output relationships alone cannot possibly provide a unique solution to the problem of internal structure. Thus, almost without exception, each of the plausible network processes described is a conceptual invention based upon considerations of plausibility or of the activity of some model preparation far down the phylogenetic tree. Nevertheless, the exercise is useful because it reflects insights gleaned from laboratory experimentation and suggests what might be, even if it does not confirm what actually is. It should also be appreciated that, from a certain point of view, almost all of the preceding discussion of the possible mechanisms of synaptic plasticity is
irrelevant to the study of the representation of learning by neural networks. Although these mechanisms of synaptic plasticity are amenable to more direct investigation than the pattern of network organization, and they are exceedingly relevant to other conceptual levels, clearly they represent only the "technology" (i.e., the mechanisms) by which varying synaptic conductivity is effected. The mechanisms of altered synaptic conductivity themselves, however, are not the essence of learning. Rather, molar learning is much more likely encoded by changes in the state (i.e., the momentary pattern of organization) of the neural network than by the particular biochemical mechanisms that provide the means for the changes at each synapse. This is true in spite of the fact that study of network organization in vertebrates is beyond the scope of present research technology. It is possible that several different network states can represent or encode the same molar processes. Given this logical possibility and the empirical fact that few studies have ever been carried out on neural ensembles in a way that could differentiate among the many equally plausible mechanisms of learning, little can be said with assurance concerning the particular network changes involved in vertebrate learning or any other cognitive process. What can be done is to take advantage of the speculations of such neuroscientists as Kupfermann and Pinsker (1970) concerning the kinds of neural network changes that might underlie one particular type of learning, classical conditioning. In addition to the models shown in Fig. 9-2 that are based on variations in pacemaker response, Kupfermann and Pinsker have also invented hypothetical, simple neural networks that are reasonable models of the behavioral phenomena of classical conditioning based on synaptic plasticity.
They emphasized, in the construction of these networks with varying synaptic conductivity, a very important feature of learning to which I have previously alluded only in passing; namely, that learning in general seems not to result simply from use. Rather, effective behavioral adaptation requires, in addition, some validation, success, or reinforcement of the response. Thus any proposed model of learning based on simple stimulus-response repetition or use alone is a less likely candidate than one requiring some kind of feedback of the utility of the response to the animal. A process like post-tetanic potentiation is not, therefore, of itself, likely to have all of the necessary features of a satisfactory model of learning. Kupfermann and Pinsker, therefore, add the important concept of specificity to their network model, thus neurophysiologizing Neal Miller's definition of the behavioral aspects of learning presented on p. 517 in Chapter 8. Drawing upon their laboratory studies of Aplysia and some of the theories of plausible synaptic plastic processes proposed in the previous section, Kupfermann and Pinsker suggest four neural networks that might equally well serve as models for classical conditioning. Figure 9-18 shows the four different models. The first (Model A) is based upon the notion that prior to conditioning a CS and
a UCS are both required to activate an interneuron (INT). After repetitive pairing of the CS and UCS, the CS acquires the ability to activate the interneuron. The interneuron, therefore, is assumed to exhibit postsynaptic plastic behavior at the point signified by the blackened synapse. The interneuron in Model A serves the important function of making the action of the CS highly specific to one particular conditioning situation. In Model B, this interneuron is not present, but specificity can also be achieved by a different stratagem. All that is required is that the responding neuron (R) be activated during the paired presentation of the CS and the UCS. Thus the two criteria for successful conditioning, repetitiveness and reinforcement, are met. The repetitive activation of the plastic synapse occurs, at least some of the time, in synchrony with the unconditioned response. The "successful use" of R thus becomes a factor in the modulation of the EPSP at the plastic synapse and in the increasing ability on the part of the CS to fire R.
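The logic of Model B can be made concrete with a toy simulation. The following is only a minimal sketch under stated assumptions and not Kupfermann and Pinsker's own formulation: the threshold unit, the weights, the step size, and the function names are all hypothetical. The single rule it implements is the one described above, namely that the plastic CS synapse is strengthened only on trials in which the CS is active and R actually fires.

```python
def r_fires(cs, ucs, w_cs, w_ucs=1.0, threshold=1.0):
    """Responding neuron R fires when its summed synaptic input reaches threshold."""
    return cs * w_cs + ucs * w_ucs >= threshold


def train(trials, w_cs=0.2, step=0.1):
    """Return the CS synaptic weight after a sequence of (cs, ucs) trials.

    The weight grows only when the CS is active AND R fires on that trial
    ("successful use"); all numerical values are illustrative assumptions.
    """
    for cs, ucs in trials:
        if cs and r_fires(cs, ucs, w_cs):
            w_cs += step
    return w_cs


paired_trials = [(1, 1)] * 10            # CS and UCS presented together
unpaired_trials = [(1, 0), (0, 1)] * 10  # same stimuli, never together

w_paired = train(paired_trials)
w_unpaired = train(unpaired_trials)

print("CS alone fires R after pairing:    ", r_fires(1, 0, w_paired))
print("CS alone fires R without pairing:  ", r_fires(1, 0, w_unpaired))
```

In this sketch the unpaired control receives exactly as much stimulation as the paired group, yet its plastic synapse never changes, which is the sense in which use alone is insufficient. The sketch has no mechanism for weakening the synapse, so it says nothing about extinction.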
FIG. 9.18 Four different neural circuits, each of which is capable of modeling the type of learning called classical conditioning. In each case, the essential plastic effect occurs at the synapse indicated in black. See text for details. (From Kupfermann & Pinsker, © 1970, with the permission of Academic Press.)
Both Models A and B produce response plasticity to the extent that they are capable of modulating the activity of the postsynaptic neuron (R). In each case the change in the synaptic efficacy is dependent upon the successful use of the R neuron. The synaptic changes produced, therefore, must be describable in terms of the postsynaptic receptor structure. In Models C and D, however, the critical (blackened) plastic synapse is altered presynaptically through the intervention of the UCS. In Model C, as in Model A, an interneuron is present that plays a modulating role on the critical synapse, whereas in Model D the plasticity is a direct result of the UCS acting on the neuron through which the CS is conducted. The important point to be made by this discussion of these four alternative mechanisms is that they are all equally plausible network models. There is no present way to favor one model over any of the others as being more or less physiologically reasonable. Yet the situation modeled in this case is exceedingly simple. It involves only two or three "neurons." The number of equally plausible models, however, increases drastically with the number of neurons involved in the net. Networks that model realistic forms of vertebrate learning involve not three or four but thousands, millions, and even billions of neurons interacting through complex patterns of both inhibitory and excitatory connections. Is the situation, therefore, hopeless? Possibly not. One mitigating factor may lie in the fact that the units in Kupfermann and Pinsker's model can be thought of as representing ensembles of neurons collectively performing a single function rather than individual neurons. This brief digression into network theory may be concluded by noting again the enormous gap between any physiological observations of synaptic plasticity on the one hand, and specific biochemical mechanisms at a more microscopic level and network effects at a more macroscopic level, on the other.
The links between these levels, whereby molecular changes lead to synaptic plasticity, synaptic plasticity produces network changes, and network changes produce behavioral learning, are not yet understood.

d. A Note on Psychochemistry
The field of psychochemistry, which is concerned with behaviorally effective chemicals (drugs), is large enough to constitute another entire book. In this section, I mention only briefly some of the facts relevant to the effects of drugs on behavior. The reason for inserting these sections at this point, where they may seem to some readers to be irrelevant digressions, is that both drugs and hormones (the topic of the next section) are known to exert their influence at the level of the synapse. These chemicals thus modify behavior by modifying the connectivity of neural nets, in much the same way that learning must. Because the independent variables in a learning experiment are behavioral, however, we can never know what chemical processes are initially involved. When chemicals are used as independent variables, on the other hand, both macroscopic and microscopic analysis procedures can be used to describe the exact changes that
occur at the membrane level. The analogy that may exist between these chemical effects and learning, therefore, is too close to be ignored. Models of synaptic effects produced by drugs and hormones are highly suggestive evidence of what may be going on in learning at the molecular and membrane levels.

Thompson (1975) classified psychoactive drugs into five categories: (a) sedatives and hypnotics, (b) stimulants, (c) anesthetics, analgesics, and paralytics, (d) psychotogenics (which induce psychotic symptoms), and (e) psychotherapeutics (which relieve psychotic symptoms). Table 9-1 lists various examples of these drugs and, as an aside, indicates which of them are addictive. The reader is referred to Thompson's book for a more detailed discussion of each of these categories. This list, though only partial, is a fair indication of the wide variety of chemicals that can affect mental states.

It is axiomatic within the general monistic and reductionistic philosophy of modern psychobiology that any drug affecting behavior does so because of its effects on some aspect of neural activity. And, indeed, much is known about the biochemical action of several of these psychoactive drugs. Local anesthetics, for example, are known to act directly on the plasma membrane of peripheral neurons to reduce ionic transport. They thereby can directly interfere with neural conduction. Some psychoactive substances are also known to act on particular regions of the central nervous system. It seems likely, however, that most drugs that affect mental states act at the level of the most chemically sensitive parts of the central nervous system, the synapses themselves. Table 9-2, which has been abstracted from Dunn and Bondy (1974), lists a large number of drugs that are known to modify synaptic action by interfering with the release, diffusion, reception, or activation of transmitter substances.
This table is organized in terms of the transmitter substances and indicates the drugs, the effect, and the presumed biochemical action of each where known. The important point is that the state of the neural network can be altered by the action of specific chemicals on synapses. Certain regions of the brain are selectively affected by certain drugs because they are particularly rich in one or another kind of synaptic transmitter substance. On the basis of which transmitter is predominantly used, certain subsystems of the brain can be defined. One well-defined system, for example, includes those regions that mainly use serotonin as their transmitter substance and thus are sensitive to any drug that affects serotonin metabolism. These regions are concentrated in the raphe nuclei located near the interface between the lower midbrain and the upper pons (Snyder, 1976). Acetylcholine use probably also defines a separate system, but it is widely distributed throughout the brain and has not yet been satisfactorily demarcated. On the other hand, a dopamine system, including the substantia nigra and basal ganglia, among other nuclei, is sharply defined by the predominant use of this transmitter substance in its constituent synapses. A norepinephrine system has also been defined. Figures 9-19, 9-20, and 9-21 show lateral and dorsal views of the norepinephrine and dopamine systems.

TABLE 9.1 Classes of behaviorally effective drugs. (From Thompson, ©1967, after McIlwain, with the permission of Harper & Row, Publishers.)

Group | Example | Evidence of addiction?

Sedatives and hypnotics
General | Alcohol | yes
Barbiturates | Phenobarbital | yes
Bromides | Potassium bromide | no
Chloral derivatives | Chloral hydrate | yes

Stimulants
Analeptics | Pentylenetetrazol | no
Nicotinics | Nicotine | yes
Psychotogenics | Lysergic acid diethylamide |
Sympathomimetics | Amphetamine | yes
Xanthines | Caffeine | yes

Anesthetics, analgesics, and paralytics
Analgesics | Opium derivatives | yes
Local anesthetics | Cocaine | yes
Local anesthetics | Procaine | no
General anesthetics | Nitrous oxide | no
General anesthetics | Diethyl ether | no
General anesthetics | Chloroform | no
Paralytics | d-Tubocurarine | no

Psychotogenics
Cannabis sativa | Marijuana | no
Ergot derivative | Lysergic acid diethylamide | no
Lophophora williamsii | Mescaline | no
Psilocybe mexicana | Psilocybin | no

Psychotherapeutics
Anti-anxiety: Propanediols | Meprobamate | yes
Anti-anxiety: Benzodiazepines | Chlordiazepoxide | yes
Anti-anxiety: Barbiturates | Phenobarbital |
Antidepressant: MAO inhibitors | Tranylcypromine | no
Antidepressant: Dibenzazepines | Imipramine | no
Antipsychotic: Rauwolfia alkaloids | Reserpine | no
Antipsychotic: Phenothiazines | Chlorpromazine | no
Stimulant | Amphetamine |

TABLE 9.2 Drugs that affect synaptic conductivity classified in terms of the neurotransmitter on which they exert their effects. (Adapted from Dunn & Bondy, 1974, with the permission of Spectrum Publishers.)

Drug | Effect | Mechanism

(a) Acetylcholine
hemicholinium | lowers tissue ACh | blocks choline uptake
botulinus toxin | antagonist | blocks ACh release
d-tubocurarine (active principle of curare), gallamine (Flaxedil) | antagonists | reversibly block ACh receptors
Naja naja toxin, α-bungarotoxin | antagonists | bind ACh receptor
atropine, scopolamine | antagonist | blocks muscarinic receptors
nicotine | nicotinic agonist | mimics ACh at nicotinic receptors
carbachol (carbamylcholine) | agonist | mimics ACh

(b) Catecholamines
α-methyl-p-tyrosine (αMpT) | depletes catecholamines | inhibits tyrosine hydroxylase
reserpine | depletes catecholamines | inhibits vesicular storage (see also serotonin)
diethyldithiocarbamate | depletes NE | inhibits DBH by chelating Cu2+
amphetamine | noradrenergic agonist | probably multiple: stimulates release and mimics NE
phenoxybenzamine, ergot alkaloids, phentolamine | α-antagonist | blocks α-receptors
dichloroisoproterenol, propranolol | β-antagonist | blocks β-receptors
haloperidol, spiroperidol, phenothiazines (chlorpromazine, fluphenazine) | | block dopamine receptors
lithium salts | depress NE | unknown
cocaine, amitriptyline, imipramine, desmethylimipramine (DMI) | deplete NE | inhibit NE reuptake
6-hydroxydopamine | destroys catecholamine-containing neurons | unknown; probably taken up into catecholaminergic cells by the selective reuptake systems

(c) Serotonin
p-chlorophenylalanine (pCPA) | depletes 5HT | inhibits tryptophan hydroxylase (and tyrosine hydroxylase)
probenecid | elevates 5HT | blocks 5HIAA efflux
reserpine (see also catecholamines) | depletes 5HT | inhibits vesicular storage
amitriptyline | agonist | inhibits reuptake
methysergide, cinanserin, cyproheptadine | antagonist | block postsynaptic receptor
lysergic acid diethylamide (LSD) | complex effects | may both block and mimic 5HT action (LSD also inhibits the degradation of substance P)
5,6-dihydroxytryptamine | destroys 5HT-containing neurones | unknown but probably analogous to 6-hydroxydopamine

(d) Gamma-aminobutyric acid
hydroxylamine, amino-oxyacetic acid | increase GABA | inhibit GAD and GABA-T, but GAD less than GABA-T
tetanus toxin | cerebral excitant | inhibits GABA (and gly) release
bicuculline | cerebral excitant | blocks GABA receptors
picrotoxin | cerebral excitant | may block GABA receptors

(e) Glycine
tetanus toxin | cerebral excitant | inhibits glycine (and GABA) release
strychnine | cerebral excitant (convulsant at higher doses) | blocks glycine receptor activity

(f) Glutamate
glutamic acid diethyl ester | antagonist | blocks glutamate receptor
glutamic acid dimethyl ester | agonist | competitively blocks reuptake

[Figure 9.19. Panel title: NORADRENALINE.]
FIG. 9.19 A lateral diagram of the rat's brain showing the major ascending norepinephrine system. (From Ungerstedt, ©1971, with the permission of Acta Physiologica Scandinavica.)

[Figure 9.20. Panel title: DOPAMINE.]
FIG. 9.20 A lateral diagram of the rat's brain showing the dopamine system. (From Ungerstedt, ©1971, with the permission of Acta Physiologica Scandinavica.)

[Figure 9.21. Panel title: NORADRENALINE.]
FIG. 9.21 A dorsal view of the rat's brain showing the norepinephrine system (on the left-hand side) and the dopamine system (on the right-hand side). (From Ungerstedt, ©1971, with the permission of Acta Physiologica Scandinavica.)

[Figure 9.22. Panel labels: NEOSTRIATUM (and globus pallidus); SUBSTANTIA NIGRA pars compacta.]
FIG. 9.22 A "supersystem" within the brain composed of subsystems that use dopamine, acetylcholine, and GABA as the transmitter substances, respectively. This group of interlocking subsystems forms a "self-inhibiting" feedback loop that may be the basis of the action of some of the antipsychotic and psychotropic drugs. (From Groves, Wilson, Young, & Rebec, ©1975, with the permission of The American Association for the Advancement of Science.)

Because of the localized sensitivities implied by the existence of these systems, it is certain that the application of some of the psychologically active drugs selectively affects both specific nuclei and specific kinds of synapses. Drugs, therefore, can target particular regions of the brain in ways we did not previously have the power to accomplish. The various systems are not independent, however, and must interact among themselves. Groves, Wilson, Young, and Rebec (1975) have pointed out the existence of a supersystem, as shown in Fig. 9-22. Within this complex are found interactions among the inhibitory dopamine, the excitatory acetylcholine, and the inhibitory GABA synaptic systems. This supersystem may be collectively involved in both the behavioral and the motor deterioration typical of Parkinson's disease.

To illustrate the current state of our knowledge concerning the details of the biochemical action of psychologically active drugs, let us consider two important papers dealing with synaptic opiate receptors (Snyder, 1975) and with the action of antipsychotic drugs (Iversen, 1975). These two papers were published together after their authors shared the distinguished F. O. Schmitt prize for neuroscience research in 1975. Snyder's work deals with the action of poppy extracts, generically known as the opiates, on the nervous system. Specifically, he sought to determine where in the cell opiates work. It had been assumed that opium must exert its effects by being chemically bound to specific receptor sites on the neuron, but exactly where had not been known. Chemical analyses had shown that the regions of the brain in which the highest level of opiate receptor binding occurs are the amygdala, the medial thalamus, and a number of the hypothalamic
nuclei. Therefore, opiates seem to work selectively on the regions usually considered to be part of the limbic system. Within these regions, Snyder was further able to determine the portions of the constituent neurons that seemed to possess the highest level of opiate binding activity. This was done by analyzing the binding capability of various fractions of neuron preparations that had been broken up and separated by a special centrifuging procedure. Snyder reported that little receptor activity (i.e., opiate binding) was found in any fraction of the centrifuged neuron other than in the region of the synapses. Indeed, very little receptor activity is found in the portion containing the synaptic vesicles; only the centrifuged fraction containing synaptic membranes is rich in the chemicals that must be the receptor materials. Although he could not say with certainty whether the receptors are in the presynaptic or postsynaptic regions, Snyder felt it was likely that opiates exert their influence (just as transmitter substances do) on receptor sites on the postsynaptic membranes.

Some progress has been made recently in identifying the molecular structure of the opiate receptor. Simon, Hiller, and Edelman (1975), using a chromatographic technique, have shown that the opiate receptor is most likely a macromolecule with a molecular weight of about 370,000. This may also be a fairly good general description of the heretofore mysterious postsynaptic receptor site.

Iversen (1975), in considering the action of the antipsychotic drugs, has gathered similar data. Drugs effective in treating schizophrenia, for example, are typically found to be most effective in changing neural activity in synapses using dopamine as their main transmitter substance. Iversen states that recent evidence suggests that antipsychotic drugs may work presynaptically by increasing the rate at which dopamine is secreted.
This suggests that some psychosis may, conversely, be the result of a deficiency in the release of this particular transmitter. Perhaps the fairest thing to say with regard to the action of drugs on mental states, however, is that although a little is known about the manner in which they exert their effects at the synaptic membrane and biochemical levels, absolutely nothing is known of the neural network reorganizations they produce. Although chemicals have been successfully used in therapy, and although theories abound concerning the relationship of particular kinds of brain chemistry and specific behavioral disorders (most notably the relationship between the catecholamines, like dopamine, and schizophrenia, and between the manic-depressive syndrome and the biogenic amines), the details of the relationships are not yet known to the degree that would allow us to say "we understand" what happens when these drugs are administered. For all practical purposes, it must be admitted that both psychotherapeutics and psychotogenics are used without understanding and without a theoretical basis of their behavioral effects. They are usually discovered accidentally and applied according to hit-and-miss procedures.
Even the best research in the field of drug therapy for psychoses, therefore, is characterized by a "barefoot empiricism," and the most effective chemotherapies operate with little knowledge of the significant changes made in neural organization. Experiments reputed to have found specific chemical correlates of psychotic states [e.g., the low level of dopamine activity in schizophrenics reported by Wise & Stein (1973)] often turn out to be the results of unanticipated artifacts [e.g., the reduced levels of dopamine activity have been attributed to prolonged storage of the cadavers by Wyatt, Schwartz, Erdelyi, & Barchas (1975)].

Some aspects of the chemistry of the synaptic effects produced by antipsychotic drugs, on the other hand, are relatively well known. Figure 9-23 is a chart showing the relationship between the average clinical doses of a large variety of these substances and the concentrations that produce a 50% decrement in dopamine release by synapses. This figure is based on a study by Seeman and Lee (1975). Clearly, the clinical tradition, almost without realizing it, has arrived at dosage levels for a wide variety of drugs that seem to be highly correlated with a constant alteration of synaptic function. Nevertheless, there is no way to bridge the gap between this kind of knowledge of synaptic chemistry and knowledge of the network modifications that ultimately must correspond to the changes in behavior. When neurochemists examine the chemistry of the synapses, they are studying the technology of the neural elements. But this is a secondary aspect of the problem, and little progress has been made in understanding the essence of the main problem: the network state. Sadly, this particular field of clinical psychobiology, so important in its personal and social implications, has contributed little to a fundamental understanding of the relationship between the brain and mind.

e. A Note on Hormones
[Figure 9.23. Scatter plot of individual antipsychotic drugs. Vertical axis: blockade of dopamine release. Horizontal axis: average clinical dose (mg/day), logarithmic scale.]
FIG. 9.23 A natural experiment in psychochemotherapeutics. This graph displays the relationship between the clinically effective dosages of various antischizophrenic drugs and their measured chemical ability to reduce the amount of dopamine released from the neostriatal region of the rat's brain. Amazingly, the chemical strength is linearly correlated with the clinically effective dose. (From Seeman & Lee, ©1975, with the permission of The American Association for the Advancement of Science.)

Another powerful class of chemicals affecting behavior includes the substances known as hormones. Hormones are chemical substances secreted by the endocrine (ductless) glands of the body and transported about the body in the blood. They are powerfully effective in regulating a wide variety of behaviors, as well as metabolic processes, and conversely are also produced in amounts and kinds that covary with mental states. For example, injection of the gonadal hormones (androgens or estrogens) can induce reproductive behavior specific to each of the sexes, and sexual activity seems to regulate the amount of hormones generated. Removal of the source of these gonadal hormones by castration can lead to serious deficiencies in the organism's sexual behavior, as well as in its growth and maturation. The effects of individual hormones are not simple; there are complicated interactions, for example, between the androgens and estrogens, and androgenic effects in males seem to depend upon an adequate supply of estrogen [see Roy & Wade (1975) for a good, brief, and up-to-date summary of this problem].

Another hormonal influence on behavior is becoming clear. Pituitary hormones such as adrenocorticotrophic hormone (ACTH) and melanocyte-stimulating hormone (MSH) are thought to be intimately related to the acquisition of skills in instrumental conditioning experiments. This is particularly true in cases in which the reinforcement is the consummation of some basic drive. Although it cannot be asserted with certainty that the effect of these hormones is not an indirect one (perhaps they work by altering motivational levels or by varying the amount of protein available for synaptic growth), these hormonal effects may be quite direct and specifically mediated by their influence on the transmission chemistry of specific types of synapses.

As in the case of the psychotherapeutic drugs themselves, there seem to be direct effects of some hormones on the brain, and it is most likely once again that these effects are mediated by interactions between the hormones and the synaptic transmitter substance. The brain is considered by many neuroscientists, therefore, to be a major hormonal receptor organ. For example, Fig. 9-24 shows the regions of the female rat brain in which estrogen-concentrating neurons may be found. On the other hand, the brain is also a neuroendocrine secretory organ. The hypothalamus, for example, secretes a hormone, vasopressin, that can affect the secretion rates of other endocrine glands or affect the blood pressure directly. An especially interesting related possibility is that behavioral states may be associated with the production of hormones in a way that would allow assays of those hormones to be used as diagnostic tools in psychotherapy [see Carroll, Curtis, & Mendels (1976a; 1976b), for example].

Once again, the situation is opaque at the organizational and informational level at which mental processes must be represented. Although relationships have been shown between hormones and behavior and between hormones and synaptic effects, the exact relationship between the specific synaptic effects and behavior eludes us.
B. GLOBAL THEORIES OF THE NEURAL BASIS OF LEARNING
In the introduction to this chapter I pointed out that there was little disagreement among the various theories of the neural basis of learning with regard to their basic premises. I asserted that the various theories differed mainly in terms of the attention given to one or another level or aspect of the problem and not in terms of any particular controversy among alternative explanations within any level. In this section, this point is expanded upon, and two modern theories of learning, one proposed by Hyden and the other by Pribram, that stand at the extremes in terms of their different emphasis on molecular biochemistry and field interactions, respectively, are discussed. The key point of this discussion that the reader should keep in mind is that upon close examination, there is more agreement among those theories than there is disagreement. Each accepts the central biochemical role of RNA metabolism; each hypothesizes a synaptic locus of the plastic neuronal effects; each believes that neurons must be concatenated into large networks to achieve their meaningful molar function; and each accepts the basic concept that the global activities of these networks can only be measured by examination of the compound action potentials that are the equivalent of the statistical sum of the actions of the individual units.
[Figure 9.24. Map of the female rat brain with the nuclei abbreviated in the caption marked.]
FIG. 9.24 This map of the brain shows the points at which estrogen binding sites have been located in the brain of the female rat. Abbreviations of the various nuclei indicated are: a = nucleus accumbens; ac = anterior commissure; aha = anterior hypothalamic area; arc = arcuate nucleus; cbllm = cerebellum; cc = corpus callosum; cg = central grey; db = diagonal band of Broca; dm = dorsomedial nucleus of the hypothalamus; f = fornix; fr = fasciculus retroflexus; h = hippocampus; ic = inferior colliculus; lh = lateral habenula; lsep = lateral septum; mamm = mammillary bodies; mpoa = medial preoptic area; mt = mammillothalamic tract; nst = bed nucleus of the stria terminalis; ob = olfactory bulb; oc = optic chiasm; pf = nucleus parafascicularis; pvm = paraventricular nucleus (magnocellular); sc = superior colliculus; scp = superior cerebellar peduncle; tub = olfactory tubercle; vm = ventromedial nucleus; and vpm = ventral premammillary nucleus. (From Pfaff & Keiner, ©1973, with the permission of The Wistar Press.)
The two points of view to be discussed are not microtheories in the same way as those presented earlier in this chapter. Rather, although each has its own emphasis, the two examples now considered share the common characteristic that both are more global attempts to describe the interactions among the several levels of the problem. The most highly developed, clearly explained, and broadly conceived theory emphasizing the biochemical level is that proposed by Holger Hyden (1970). Most germane to the present discussion, however, is the fact that his theory transcends several levels from the most microscopic to the most macroscopic. Another example of a theory that crosses several conceptual levels is to be found in the work of Karl Pribram (e.g., Pribram, 1969; Pribram, Nuwer, & Baron, 1974). Pribram's model is based on a proposed analogy between learning and the optical hologram. This theory, which stresses the aggregate action of neurons, speculates about learning mechanisms that might use, as the basis of behavioral change, changing interference processes among waves of neural activity similar to the interference patterns produced by optical systems.
Of the many possible neural models of learning and memory that could be presented, these two are the most useful, because the emphasis in each is among the most extreme in terms of the biological level emphasized. Hyden's emphasis is ultramicroscopic; Pribram's is just this side of the molar mental process itself. Discussing the details of these two alternative theoretical viewpoints reveals the contemporary consensus in sharpest contrast.

1. Hyden's Theory
As mentioned in Chapter 8, since the early 1960s there has been considerable interest and research effort directed toward elucidating what seems to be an important role of certain macromolecules in the memory process. The basic idea behind this work is that some of the large molecules also known to be involved in genetic information storage and reproduction, and in particular tRNA (transfer ribonucleic acid), may play important roles in mediating individual as well as evolutionary memories. Although it is not certain, it seems likely that the idea was originally based only on the analogy between the two different meanings of the word memory, one defined in terms of the individual and one in terms of the species. Another more compelling influence, however, was the discovery that tRNA and certain associated enzymes did play important biological roles in protein synthesis and, thus, ultimately in all growth processes. When early theories suggested that synaptic plasticity may also be framed in terms of growth, another logical link in an inferential chain implicating these macromolecules in memory was formed. The fact that the brain seems to be especially rich in the variety of types of RNA molecules compared to other organs of the body (Brown & Church, 1971) is another suggestive argument that this particular biochemical is deeply involved in brain function, in general, and memory, in particular.

It is beyond the scope of this chapter to consider exhaustively the very large body of published reports in which biochemical changes have been shown to be associated with learning. The interested reader who wishes to look further at this problem should read two especially readable and thoughtful reviews by Agranoff (1974) and Horn, Rose, and Bateson (1973).
Although both reviews strongly support the contributory role of the genetic macromolecules in learning, both also emphatically make the point, in Agranoff's words, "that macromolecular synthesis is involved in memory formation remains a hypothesis" (Agranoff, 1974, p. 618). This caveat is probably the key statement in this exceedingly complex field in which the biochemical and behavioral data have met and been compared. One excellent example of the meaningful ways in which biochemical studies of learning can be carried out at the molar level is the work of Bernard Agranoff and Roger Davis of the University of Michigan (Agranoff, 1967;
Davis & Agranoff, 1966; Agranoff, Davis, & Brink, 1965). They have shown that in the goldfish, formation of long-term memory seems to be selectively blocked by the injection of puromycin. (Puromycin is an antibiotic that is also known to act selectively to block the formation of new protein.) A fish that had successfully learned to avoid an electrical shock will act as if it were naive on subsequent retesting if this protein formation-blocking antibiotic is injected immediately following the original training. In all other regards, the fish appears to be perfectly normal. There is no depression of the general metabolic functions (or activity), nor any reduction of the fish's ability to learn the task initially. By selectively injecting puromycin at different times following the training, a critical time period, presumably closely related to the time at which the consolidation of short-term into long-term memory occurs, can be identified. The consolidation was found to be complete in about 2 to 3 hours. The point of these experiments is that they suggest that the consolidation process probably does involve the creation of new proteins in some manner and thus tRNA. This conclusion, it must be repeated, does not mean that the memory itself is stored in the atomic configuration of a macromolecule, but rather that some aspect of a protein synthesis process (possibly including the growth of new synaptic membranes or enlargement of old ones) is involved in the consolidation of long-term memories.

On the other hand, macromolecules have been incorporated into learning theory in some ways that are now almost universally appreciated to be erroneous. Those psychobiologists who suggest that the macromolecules formed during learning are constructed with specific amino acid sequences in accord with the experience, and that these amino acid sequences represent a readable and specific code for experience, have a difficult row to hoe!
I have already mentioned this work in Chapter 8. In general, any hypothesis that purports that memories can be chemically extracted and transferred with a high degree of specificity to other animals is not held in high regard by contemporary psychobiologists. Although the idea is seductive and has drawn a large amount of popular attention, there is even less scientific proof for this extreme macromolecular coding version of the biochemical hypothesis than for the more general one that assumes merely that some sort of biochemical synthesis is in some undefined way involved in learning. For the reader who wishes to pursue this topic further, two useful sources that present more positive arguments for the controversial idea of memory transfer are Adam (1971) and Ungar (1970).

Having briefly reviewed the relevant background, I can now discuss the kind of biochemical approach that is acceptable to contemporary psychobiology. I consider as my example of an acceptable model what is certainly the most specific and highly developed theory of biochemical involvement yet proposed. Holger Hyden (Hyden, 1970) has outlined a theory of memory that mainly emphasizes the RNA changes occurring within the brain as a result of learning.
The particular training paradigm he used to carry out his related empirical studies was an ingenious one. It took advantage of the anatomical organization of the rat brain to provide intrinsic controls. Rats, Hyden observed, like people, tend to be either dominantly right- or left-pawed. In rats particularly, the sensory-motor regions that control each paw are found almost entirely in the contralateral portion of the brain. Thus if an animal is trained to use the left paw on some task, Hyden suggested that any induced biochemical changes would be found mainly in the right hemisphere. The left hemisphere, he felt, should be largely unaffected by the training trials with regard to its chemical constituents. Although this premise of the study is open to criticism, as is the assumption that the effects of learning should be localized, it did turn out in this case that, upon analysis of brain tissue from the contralateral experimental and ipsilateral control areas, substantial differences in the RNA content were observed. Not only did the average amount of RNA increase from about 22 μμg to 31 μμg within each neuron from the contralateral side of the rat's brain, but there was also a substantial and significant change in the relative composition of the RNA produced. The ratio of the four bases, adenine, guanine, cytosine, and uracil, which carry the code for the amino acids that are the building blocks in protein synthesis, changed drastically as a result of the training, but only on the contralateral side of the brain. The changes in the ratios of the various bases are shown in Table 9-3 for the several conditions of Hyden's experiments. Clearly, Hyden's data indicate that learning is associated with changes in the biochemistry of related brain areas, even if it is still impossible to say exactly what the role of the changes is in mediating synaptic plasticity.
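The arithmetic behind these reported changes can be rechecked directly from the published means. The short Python sketch below is an illustration added for that purpose, not part of Hyden's analysis: the function and variable names are hypothetical, and it assumes that the entries of Table 9-3 are mean percentages of each base in the control and learning conditions. It recomputes the percent change for each base, the two base ratios, and the relative increase in RNA per neuron.

```python
# Mean base composition (percent) as tabulated in Table 9-3.
controls = {"A": 18.4, "G": 26.5, "C": 36.8, "U": 18.3}
learning = {"A": 20.1, "G": 28.7, "C": 31.5, "U": 19.6}


def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


def purine_pyrimidine_ratio(bases: dict) -> float:
    """(A + G) / (C + U), computed from the mean percentages."""
    return (bases["A"] + bases["G"]) / (bases["C"] + bases["U"])


def gc_au_ratio(bases: dict) -> float:
    """(G + C) / (A + U), computed from the mean percentages."""
    return (bases["G"] + bases["C"]) / (bases["A"] + bases["U"])


if __name__ == "__main__":
    for base in "AGCU":
        change = percent_change(controls[base], learning[base])
        print(f"{base}: {change:+.1f}%")
    print("(A+G)/(C+U):", purine_pyrimidine_ratio(controls),
          "->", purine_pyrimidine_ratio(learning))
    print("(G+C)/(A+U):", gc_au_ratio(controls), "->", gc_au_ratio(learning))
    # RNA per neuron, 22 -> 31 micromicrograms as quoted in the text.
    print(f"RNA per neuron: {percent_change(22.0, 31.0):+.1f}%")
```

The per-base percent changes agree with the tabulated values. Ratios formed from the mean percentages match the tabulated ratios only to within about 0.01, which would be expected if the published ratios were averaged over individual samples; that explanation is an assumption, not something stated in the source.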
Hyden's major contribution is the specific hypothesis he generated to explain the nature of these RNA changes and the possible link they may provide between the biochemical level and the individual neuroelectrical level. His theory also includes possible links with the ensemble network level and, in a more speculative manner, even between the ensemble and molar behavioral levels. Before discussing the details of Hyden's theory, it is important to point out explicitly that this model, although involving structural specificity at the molecular level, is completely different in orientation from the molecular storage model embodied in the transfer experiments. The transfer theory specifically postulates that patterns of neural excitation produce macromolecular configurations that are themselves the repository of the engram. The learned behavior can be transferred between animals, it is asserted, because the necessary and sufficient information is encoded within the specific structure of the macromolecule. Hyden's approach is quite different, and his disdain for the other kind of molecular theorizing is quite explicit. He says (Hyden, 1970): "At this point, I would like to stress that no data support the view that brain cells contain mechanistically taping 'memory molecules' that store information in a linear way. This is biological nonsense [p. 106]."

Hyden's concept of the role of RNA, therefore, is quite different in essentials from that of the transfer theorists. He accepts the fact that biochemical changes are induced in RNA molecules but asserts that these changes represent the realization of preexisting and genetically determined structural patterns that are intrinsic and are not created as a specific result of the experience. But how is this selective molecular production guided and controlled? The answer to this question is the essence of Hyden's theoretical contribution. He bases his answer upon quantitative estimates that only a small portion of the genetic information (the genotype) contained within a neuron is actually embodied in its molecular structure (the phenotype). Hyden suggests that portions of this extra encoded information, genetically available to produce proteins, are activated by the pattern of neural activity of the neuron. Thus specific new RNA sequences are not created by the learning experience as a result of training; rather, existing RNA (i.e., within the genotype) is triggered to produce a specific phenotype protein. The specific nature of the produced proteins governing synaptic conductivity determines how and when the neuron will respond when subsequent input stimulus patterns are received. Ultimately, a richer than normal variety of RNA will be reflected in a richer than normal (and perhaps different) supply of protein molecules. These proteins determine the functional relationships between neurons and thus the interconnectivity characteristics of the neural network.

B. THE NEURAL BASIS OF LEARNING

TABLE 9.3 Shifts in ratio of RNA bases produced as a result of learning. (From Hyden, © 1970, with the permission of Academic Press.)

                   Controls        Learning        Change in
                   Mean            Mean            percent       p
Adenine            18.4 ± .48      20.1 ± .11       +9.2        .02
Guanine            26.5 ± .64      28.7 ± .90       +8.3        .01
Cytosine           36.8 ± .97      31.5 ± .75      -14.4        .01
Uracil             18.3 ± .48      19.6 ± .56       +7.1        .05
(A + G)/(C + U)    0.81 ± .27       .95 ± .35      +17.3        .01
(G + C)/(A + U)    1.72 ± .054     1.51 ± 0.26     -12.2        .02
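The derived columns of Table 9-3 can be checked directly from the four base percentages. The following is a minimal sketch, assuming the control and learning means are as read from the table; the function names are illustrative and are not taken from Hyden's report.

```python
# Base composition (percent of total RNA bases), as read from Table 9-3.
CONTROL = {"A": 18.4, "G": 26.5, "C": 36.8, "U": 18.3}
LEARNING = {"A": 20.1, "G": 28.7, "C": 31.5, "U": 19.6}


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


def ratio(bases, numerator, denominator):
    """Ratio of summed base percentages, e.g. (A + G)/(C + U)."""
    top = sum(bases[b] for b in numerator)
    bottom = sum(bases[b] for b in denominator)
    return top / bottom


if __name__ == "__main__":
    for base in "AGCU":
        change = percent_change(CONTROL[base], LEARNING[base])
        print(f"{base}: {change:+.1f}%")
    for label, num, den in (("(A+G)/(C+U)", "AG", "CU"), ("(G+C)/(A+U)", "GC", "AU")):
        before = ratio(CONTROL, num, den)
        after = ratio(LEARNING, num, den)
        print(f"{label}: {before:.2f} -> {after:.2f} ({percent_change(before, after):+.1f}%)")
```

Recomputing the two ratios from the individual base means is a useful consistency check on the tabled values, because the ratio rows are fully determined by the four base rows.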
An important corollary of the premise that the genotype is only partially expressed in the phenotypic RNA synthesis, Hyden asserts, is that the particular phenotypes that are produced can be determined by the electrical and chemical environment of the gene. The particular environmental mechanism that Hyden suggests is capable of regulating which of the possible RNA codes will actually be expressed is a change in the ionic equilibrium across the synaptic membrane produced by perturbations in the local electrical fields. These fields, in turn, are produced by the prevailing patterns of neural activity. Thus, his chain of logic is: When a neuron is stimulated to respond with action potentials, these action potentials produce electrical fields that produce ionic equilibrium changes that select the specific phenotypic form of the RNA that will be produced from among all of the genotypic possibilities. The RNA, in turn, is able to direct the production of particular proteins, whose structure is directly associated with determining the subsequent patterns of neural activity by virtue of their involvement in synaptic conductivity and stimulus selectivity. In particular, the produced proteins condition synapses to respond only when certain input conditions are met. Therefore, whether the neuron will respond in a retrial depends upon the key (the pattern of incoming neural activity) fitting the lock (the protein-regulated synaptic sensitivities). An important advantage of this model is that each neuron is not limited to responding to a single pattern of activity. Rather, it is capable of participating in the action of a large number of neural nets to the extent that the appropriate proteins for each net have been manufactured within the cell and have conditioned the appropriate synapses.

Glia play an unusually important role in Hyden's theory. Glia have conventionally been thought to play merely a supportive role (both mechanically and metabolically) in the brain. Hyden, however, noting that glia are particularly efficient producers of RNA, suggested that there may actually be some transfer of appropriate RNA from glial cells to neurons after neural activity has stimulated the manufacture of the RNA in the glia. Finally, Hyden suggested that three criteria must be met before a neuron will fire in a subsequent retrial situation that recalls information from the engram. First, the electrical pattern produced by the incoming activity must be appropriate to the original activity-produced specific proteins; second, the appropriate proteins must be present; and third, and most speculatively, there may actually be a requirement that some additional RNA be transferred from the glia to the neuron at the moment of retrieval. Figure 9-25 graphically depicts the various elements of Hyden's theory.

In summary, Hyden has proposed a model that emphasizes molecular changes to account for the plastic changes in individual neurons that produce selective functioning of a neural network. It is not a theory of molecular information storage per se (he does not propose that information is stored in molecules),
[Fig. 9-25: sketches of the elements of Hyden's theory. Legible labels: amount of substance; glia RNA transfer; nerve cell RNA. The caption did not survive extraction.]

[Fig. 5.31 plot: refractive index (ordinate, ticks at 1.45 and 1.50) against wavelength in mμ (abscissa, 400 to 700). Legible curve labels: telescope flint, light silicate crown, fused quartz.]
FIG. 5.31. The refractive index of various glasses as a function of the wavelength of the incident light. (From Hardy and Perrin, © 1932, with the permission of McGraw-Hill, Inc.)
C. PERCEPTUALLY SIGNIFICANT OPTICAL PROPERTIES OF THE EYE
color effects observed in a water droplet or a prism due to the different indices of refraction that are exhibited when various wavelengths of light pass through those materials are also observed when light passes through the media of the human eye. Chromatic aberrations are thus anomalous optical processes in which different wavelengths of light are brought to focus at different distances from the lens. This process is illustrated in Fig. 5-32.

The eye's chromatic aberration can lead to some peculiar but little-known perceptual phenomena. Colored objects may be differentially magnified by the differing indices of refraction exhibited by the lens, and colored fringes may be created around a point source of light. Perhaps most dramatic, however, is the fact that chromatic aberrations can lead to differential depth effects as the images of differently colored objects are diverted to fall on different corresponding points in the two eyes. The most amusing of these chromatostereoscopic effects is the oft-told and probably apocryphal story of the little old lady who thought that a miracle had occurred: all the red letters were standing out above the pages of her Bible! The miracle was, of course, the result of chromatic aberration, a phenomenon that most scientists would consider to be more of a nuisance than a miracle.

Chromatostereoscopic phenomena go under other names. Helmholtz (1856) referred to this same effect as the "fluttering hearts phenomenon." Red stimuli on a blue background appear to float in front of the background. However romantic, the effect is due to exactly the same transformation (chromatic aberration) as the somewhat more theological version just described. Research on chromatostereopsis has been reviewed by Oyama and Yamamura (1960), who have also shown that the effect is independent of stimulus luminance but quite dependent on the purity of the chromatic stimuli.
They also showed a decline in the separation (in depth) of different colors when color-blind observers were used.¹¹

Chromatic aberrations in the eye may be measured in several ways. Psychophysical tests have traditionally been used (Bedford & Wyszecki, 1957). In recent years, however, increasing use has been made of objective measurement procedures using ophthalmoscope-like devices (Bobier & Sivak, 1978; Charman & Jennings, 1976). In these objective procedures, direct measurements are made of the light reflected back from the retina as a function of the stimulus wavelength. Bobier and Sivak, for example, found that there could be a difference as large as 0.8 diopter in the refractive power of the eye for red and green lights, and my colleague, Daniel Green of the University of Michigan, suggests that the difference may be as great as 2.8 diopters for red and blue lights under certain conditions. Obviously, chromatic aberration is not a negligible process even if it is evidenced only in obscure instances such as those just described.

¹¹ I should also note that Oyama and Yamamura (1960), at least at the time their paper was published, were among the few who did not believe that the stereochromatic phenomenon was due to chromatic aberration. They attributed it to a putative variation in "color sensation," basing this conclusion on the decline of the effect in color-blind observers.

5. PRENEURAL PROCESSES

FIG. 5.32. Ray paths of a lens displaying chromatic aberration.
3. The Effects of Pupil Size

The diameter of the pupil of the human eye is controlled by a pupillomotor reflex that depends in part on the level of the incident light, the level of mental activity of the subject, and other psychological factors. Some factors influencing pupil size are stimulus-related, and some are controlled by higher-order cognitive interpretive processes. Alpern and Campbell (1962), for example, have studied the influence of stimulus luminance on pupil size; they showed that both the rods and the cones contribute signals that lead to the control of pupil diameter by demonstrating that the peak of the spectral sensitivity of the pupillary reflex occurs at about 535 nm, a value that is intermediate between the peaks of the scotopic and photopic sensitivity curves. The relation between stimulus intensity and pupillary diameter is shown in Fig. 5-33. On the other hand, it has been shown that the pupillary diameter is also affected by mental arithmetic tasks (Kahneman & Beatty, 1966) and general intelligence (Ahern & Beatty, 1979). Hess (1965) has demonstrated that the diameter of an observer's pupils is also affected by such intangible factors as the attractiveness of a member of the opposite sex or, even more surprisingly, simply by the diameter of the pupils of the person who is being viewed. Hakerem and Sutton (1966) have shown that the level of vigilance demanded of a subject also affects the pupillary diameter in a way that is totally independent of whether or not a visual stimulus is present. Painful stimuli will also evoke pupillary responses (Bender, 1933). Thus it is clear that the diameter of the pupil is an indicator of a complex response to many exogenous and endogenous factors at the several different processing levels of the taxonomic theory that I present in this book.

Over the last 20 years, Lawrence Stark of the University of California and his colleagues have presented a compelling case that the pupil can be modeled as a nonlinear dynamic servomechanism (Stark, 1964, 1968; Stark, Campbell, & Atwood, 1958; Terdiman, Smith, & Stark, 1969; Usui & Stark, 1978). Thus, though the influences on pupil size are manifold, the sum total is a smoothly operating system. We now appreciate some of the complexity of this system even if we do not fully understand all of the neural pathways involved.

FIG. 5.33. Variation of the pupil diameter with light intensity (pupil diameter plotted against log luminance). (This figure is a rough approximation, as great individual differences are always present.)

Regardless of the origin of the forces and the neuromotor mechanisms that regulate the diameter of the pupil, it is obvious that the pupil is an important link in the optical chain within the eye and that its diameter can significantly contribute to changes in the quality of the retinal image. The size of the pupil is particularly important in accentuating any intrinsic refractive errors of the various components of the eye. It is for this reason that an artificial pupil the size of the most tightly constricted natural pupil, or a Maxwellian optical system in which all light rays pass through the center of the pupil, is a desirable adjunct in any experiment in which measurements are made of visual acuity as well as in those in which it is necessary to control retinal illuminance.

For any eye that is not perfectly refracted, the size of the blur circle due to optical aberrations¹² produced by a theoretically perfect point source of light will depend on the pupil size. Optical scientists have derived the following relation for the effect of the pupil size on the blur circle size:

\[
\text{Blur circle diameter} = \frac{\text{pupil diameter} \times \text{diopters of refractive error}}{\text{total dioptric power of eye}} \tag{5-4}
\]

¹² Of course, there will also be diffraction-induced blur, growing as the aberration-induced blur diminishes with reduced pupil size.
Thus if the eye is perfectly corrected (i.e., the refractive error = 0 diopters), there will be no blur, other than that due to diffraction, no matter what size the pupil is. On the other hand, if there is any refractive error, then the size of the pupil will be a major factor in determining the size of the blur circle. The nature of this equation also suggests that even a badly refracted eye can produce relatively sharp vision if the pupil is small. It is for this reason that highly myopic or hypermetropic viewers can obtain much clearer vision by looking through a pinhole. It is for this reason also that as the light dims at the end of the day and the pupillary reflex expands the pupil diameter, visual acuity typically diminishes. This effect is known as night myopia. However, low luminance as such is not a necessary cause in this situation. If, for example, the pupil is dilated with drugs or if the contrast of a sinusoidal stimulus is reduced (as done by Green & Campbell, 1965), the same sort of reduction in visual capability occurs. Low luminance is not, in other words, the immediate antecedent of night myopia. Factors that increase blur or reduce contrast are the immediate causes, and "night" myopia can occur even in relatively bright lights.

Pupil size also affects the range of depths over which clear vision can be obtained. Images are not in focus at all distances from the eye when it is accommodated for clear focus at any one depth. In general, the wider the pupil, the smaller the depth of focus. Specifically, this relationship is described by the following equation:

\[
\text{Depth of focus} = \frac{\text{acceptable blur circle diameter} \times \text{focal distance}}{\text{pupil diameter}} \tag{5-5}
\]
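Equations 5-4 and 5-5 are simple enough to evaluate numerically. The sketch below is illustrative only: the total dioptric power of 60 diopters and the 17-mm focal distance in the usage example are assumed round values, not figures given in the text, and the function names are hypothetical.

```python
def blur_circle_diameter(pupil_mm, refractive_error_d, eye_power_d=60.0):
    """Eq. 5-4: blur circle diameter, in the same length unit as the pupil.

    eye_power_d is the total dioptric power of the eye; 60 D is an
    assumed round value used only for illustration.
    """
    return pupil_mm * abs(refractive_error_d) / eye_power_d


def depth_of_focus(acceptable_blur_mm, focal_distance_mm, pupil_mm):
    """Eq. 5-5: depth of focus for a given acceptable blur circle."""
    return acceptable_blur_mm * focal_distance_mm / pupil_mm


if __name__ == "__main__":
    # One diopter of refractive error viewed through wide and narrow pupils.
    for pupil in (8.0, 4.0, 2.0):
        blur = blur_circle_diameter(pupil, 1.0)
        depth = depth_of_focus(0.01, 17.0, pupil)  # 17 mm: assumed focal distance
        print(f"pupil {pupil} mm: blur {blur:.4f} mm, depth of focus {depth:.4f} mm")
```

Both relations are linear in pupil diameter, which is why halving the pupil halves the blur circle and doubles the depth of focus; this is the pinhole effect described above.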
4. The Point Spread Function and the Modulation Transfer Function

All the geometrical optical aberrations discussed so far in section C result in the degradation of an image (of point, line, or object) projected on the retina, by spreading the region onto which the incident light falls. The particular form of this degradation may be measured in several ways. The two most often used metrics of less than perfect retinal imaging are the point spread function¹³ and the modulation transfer function. The point spread function is a measure of the spread of the image of a point source of light. The modulation transfer function is the measured reduction in contrast of a spatial sine-wave stimulus pattern; that is, the decrease in the relative intensity of illumination of the peaks and troughs of the spatial sinusoid projected on the retina, due to the aberrant spread in the light.

Before I discuss the details of these two means of measuring the effect of optical aberration on image quality, it is important to make two preliminary points. The first is that the point spread function and the modulation transfer function can both be objectively measured in a way that does not require any psychophysical judgment on the part of the subject. The devices used to measure the objective optical properties of the images on the retina can be an ophthalmoscope and a photometer and do not necessarily involve any psychophysical procedure. Indeed, it is not even necessary that the subject in such an experiment be alive! One early study (DeMott, 1959) of the point spread function was actually carried out on the enucleated eyes of deceased cattle.

A different kind of point spread function and modulation transfer function can, however, be measured psychophysically. It is possible to ask subjects to rate or discriminate the degree of blur of the image by specifying tasks for them to perform that require acuity judgments of patterns of small dots. It is also possible to present subjects with sinusoidal light patterns and to determine, for different spatial frequencies, the degree of contrast that must be present for the grating to be perceived as a grating. In either case the entire perceptual system is being used as a measuring instrument. This procedure, however, does not determine the degree of retinal image degradation as does the ophthalmic photometer used in the objective procedure. The psychophysical procedures measure the overall effect of the entire visual system. And indeed the two curves are very different; the objective test produces a monotonic function, whereas the subjective one peaks at a central value (see Fig. 5-37).

The important principle here is that the objective and psychophysical procedures do not measure the same thing! The objectively measured point spread function and modulation transfer function are exclusively measures of the Level 0 transformations that have been imposed upon the image by the optics of the eye. They say nothing about the neural or psychological processes that may be involved in establishing the related perceptual phenomenon; they are simply excellent descriptions of the properties of the proximal stimulus. The psychophysical measurements, on the other hand, measure these optical effects in combination with the effects of other levels of processing. These higher levels of processing can further degrade the ability of the eye in tasks that involve visual acuity and the detection of spatial sinusoidal patterns. Surprisingly, however, they can also reduce the impact of quality reduction by compensating contour intensification processes. In a later chapter I discuss the modulation transfer function and related spatial interaction effects as measured psychophysically.

¹³ In the discussion that follows I use both the terms point spread function and line spread function. For our purposes these two terms may be considered to be identical. The cross section of the line spread is a good approximation to a point spread. Formally, the line spread function is simply the integral of the point spread function in the direction of the line. Practically, one obtains a point spread function if one's stimulus is a point, and a line spread function if one's stimulus is a line.
Any differences between those data and the findings described here must be interpreted as the additional effects of other higher levels of psychoneural processing.

The second preliminary point to be kept in mind is that the point spread function and the modulation transfer function are totally equivalent in terms of the information that they represent. The modulation transfer function can be directly obtained from the point or line spread function by performing a Fourier transform upon it. Thus, neither method will produce any "better" description of image degradation than the other. One of the methods of measuring image degradation may be more convenient than the other in a particular instance, but neither is inherently superior to the other.

Because the point spread function is somewhat more intuitively direct, let us consider it first. The image of a point source, as noted previously, will not be perfectly imaged on the retina as a point if the eye exhibits any optical aberrations; that is, the image will be blurred (spread out) on the retina if any of the von Seidel lens equations do not balance or if the retina is not exactly at the focal distance of the ocular duplex focusing system. The task faced when one attempts to measure the point spread function is, then, how to evaluate the image produced on the retina by a theoretically perfect point light source. Note that I have used the word theoretically here. In fact, one need not have a perfect point source to carry out this experiment. A mathematically or theoretically perfect point source can be approximated by a real (less than perfect) physical point source placed at a distance such that it subtends a visual angle of less than the one-half minute that is the average cross-sectional diameter of a foveal cone. For all practical purposes, even if the optics of the eye were perfect, no smaller point source could produce an image that would have any greater physiological significance. It now seems certain that a given number of quanta crammed into even as small an area as 0.0001 of a minute of visual angle would produce exactly the same neural effect as the same number of quanta distributed over a full half-minute of visual angle. A theoretical or mathematical point, therefore, is established by any stimulus that subtends a visual angle less than the width of a photoreceptor.

The first objective measurements of the spread function in human eyes were made only recently by Flamant (1954). She used a graphical means of reconstructing the reflected image on the retina obtained with an ophthalmoscope. Her measurements led to the discovery that the spread function, although not negligible in the human eye, was not as bad as might have been expected, given the previously assumed sloppy optics of the cornea and lens. This good report card for the optics of the eye has since been replicated a number of times (Campbell & Gubisch, 1966; Krauskopf, 1962; Westheimer, 1963; Westheimer & Campbell, 1962, among others). In each case the eye was shown to have impressive optical properties and to display a spread function that was quite modest considering the assumed lack of homogeneity of the optical media of the eye.
The actual shape of the blurred image produced by a real eye will depend on the dioptric details and the angular subtense of any object greater than one-half minute of visual angle. For an optimally refracted (emmetropic) eye and a close approximation to a theoretical point source, the retinal image will be distributed over approximately 2-3 min of visual angle by residual optical aberrations. For a somewhat larger stimulus (for example, a 1.6-min-wide line), the optical image will be spread over approximately 6 min of visual angle. In both cases, the width of the point spread function will depend on the size of the pupil in the manner described in the previous section.

To measure the spread function, Krauskopf (1962), for example, set up an elaborate ophthalmoscope that projected a stimulus line (1.6 min of visual angle in width) onto the retina. He then photometrically measured the reflected image. Because the reflected image consisted of light rays that had passed through the optical system of the eye twice, the actual retinal light distribution had to be reconstructed by taking the inverse Fourier transform of the product of the individual Fourier-analyzed components of the original stimulus and the square root of the calculated sine-wave responses for each frequency.¹⁴ This technique is based on several assumptions. For example, the success of such a calculation depends upon the validity of the assumption that the optical pathway is symmetrical and reversible (i.e., that the effects of the aberrations will be much the same for the light entering the eye as for the light leaving the eye). Using this assumption, Krauskopf computed what the distribution of the light must have been on the retina from the measurements that were made of the light emerging from the eye. These reconstructed spread functions (which are cross sections of the light distribution from a line rather than exactly of a point in this particular study) are shown in Fig. 5-34.
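The logic of the double-pass correction can be illustrated with a small numerical sketch. It is not Krauskopf's actual procedure: it assumes a symmetric, reversible optical path whose single-pass sine-wave response is real and non-negative, uses a hypothetical exponential line spread function on a short circular array, and relies on a plain discrete Fourier transform written out by hand.

```python
import cmath
import math


def dft(signal):
    """Discrete Fourier transform (direct O(n^2) form, fine for short profiles)."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse transform; returns the real part because the profiles are real."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def circular_convolve(a, b):
    """Circular convolution, standing in for one pass through the optics."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]


def single_pass_image(stimulus, double_pass_image):
    """Estimate the retinal (single-pass) image from a double-pass measurement."""
    retinal_spectrum = []
    for s, m in zip(dft(stimulus), dft(double_pass_image)):
        double_response = (m / s).real           # sine-wave response after two passes
        single_response = math.sqrt(max(double_response, 0.0))
        retinal_spectrum.append(s * single_response)
    return idft(retinal_spectrum)


def demo():
    """Simulate two passes through a known spread function, then undo one of them."""
    n = 32
    raw = [math.exp(-0.8 * min(i, n - i)) for i in range(n)]  # hypothetical LSF
    lsf = [v / sum(raw) for v in raw]
    stimulus = [0.0] * n
    stimulus[0] = stimulus[1] = stimulus[n - 1] = 1.0 / 3.0   # narrow line
    true_retinal = circular_convolve(stimulus, lsf)
    measured = circular_convolve(true_retinal, lsf)
    recovered = single_pass_image(stimulus, measured)
    return max(abs(r - t) for r, t in zip(recovered, true_retinal))


if __name__ == "__main__":
    print("largest reconstruction error:", demo())
```

Taking the positive square root is exactly where the symmetry assumption enters: if the outgoing pass degraded the image differently from the incoming pass, the square root of the double-pass response would no longer equal the single-pass response.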
The several curves in Fig. 5-34 are parametric with the diameter of the pupil. The 1.6-min-wide physical stimulus has been spread by the optical properties of the eye into a wider blur region that varies from about 6 min of visual angle for a 3-mm pupil to about 12 min for an 8-mm pupil. Of course, the decrease in the point spread function with decreasing pupil size shown in Fig. 5-34 is only part of the story. At the same time that the optical aberration effects are decreasing, the effects of diffraction are increasing, and the width of the point or line spread function is actually at a minimum at a pupil diameter of about 2.4 mm, where the two curves cross over. Not surprisingly, it is at about this diameter also that visual acuity is greatest (S. Shlaer, 1937).

A powerful property of the calculated spread function, as Krauskopf points out, is that it makes it possible to compute the retinal image produced by any stimulus. To do so, one must carry out the mathematical transform known as convolution, which is represented by the following equation:

\[
I(x, y) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} \mathrm{PSF}(x - x', y - y') \, O(x', y') \, dx' \, dy' \tag{5-6}
\]

where I(x, y) is the luminance distribution on the retina produced by O(x, y), the luminance distribution of the stimulus object; PSF(x, y) is an analytic expression for the point spread function;¹⁵ and x' and y' are shifted coordinates. Though this sounds complicated, the convolution can be thought of as nothing more difficult than a point-by-point multiplication of the point spread function and a shifted, inverted version of the stimulus, with the products summed at each shift. Having carried out such a calculation, one approaches much closer to an exact specification of the proximal retinal stimulus against which to anchor the perceptual process than is possible knowing only the physical properties of the stimulus.

As Flamant had originally discovered, it also turned out in Krauskopf's experiment that the optics of the eye were rather good. The latter's measurements suggested that the spread of the image due to optical aberrations was probably less than the spread due to neural convergence in many portions of the retina. Furthermore, Bonds, Enroth-Cugell, and Pinto (1972) have neurophysiologically confirmed that the convergence of receptors onto the ganglion cells, as measured by the size of their receptive fields, is in most parts of the eye much larger than the point spread function in the cat's eye. A similar outcome has been obtained in a psychophysical experiment reported by Campbell and Green (1965b). Using an image projected on the retina with a laser, they determined that the optical properties of the eye were far better than suggested by the conventional ophthalmoscopic methods using noncoherent light. Obviously there are also retinal effects contributing to the reduction in image quality. These two contributions to the spread function (the optics of the eye and those of the retina) have also been teased apart by Gorrand (1979). He used a new method of measuring the modulation transfer function (to be discussed later) to measure separately the scattering in the retina and the aberration due to the optics of the eye. In this way he was able to attribute to each its proper role in image degradation. The improved estimates of the quality of the optics of the eye do not mean that we now think they approach perfection.

¹⁴ The reader should consult Krauskopf (1962) for details of how this procedure was actually carried out.

¹⁵ For those interested, PSF(x, y) for the normal human eye has been approximated by Flamant (1954) and by Westheimer and Campbell (1962) as being represented by the expression e^(-kd). For the cat the point spread function was found by Bonds (1974) to be of the form 1/(d/k)^(5/2), and by Robson and Enroth-Cugell (1978) as 1/[1 + (d/k)^(5/2)], where d is the distance from the center of the image and k is a constant in all cases (and e is the base of the natural logarithm).

FIG. 5.34. Effect of pupil size on the retinal line spread function (abscissa: visual angle in minutes of arc). The set of parametric figures presented here depicts the results for (from bottom to top) pupil sizes of 3, 4, 5, 6, 7, and 8 mm. (From Krauskopf, © 1962, with the permission of the American Institute of Physics.)
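A one-dimensional, discrete version of the convolution in Eq. 5-6 makes the computation concrete. This is a minimal sketch, assuming a line spread function of the exponential form e^(-kd) mentioned in the footnote; the value of k, the sample spacing, and the function names are hypothetical choices made only for illustration.

```python
import math


def line_spread(k, half_width):
    """Sampled exp(-k|d|) spread function, normalized so that it sums to 1."""
    weights = [math.exp(-k * abs(d)) for d in range(-half_width, half_width + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def convolve(stimulus, kernel):
    """Retinal profile I(x) = sum over x' of PSF(x - x') * O(x')."""
    half = len(kernel) // 2
    image = []
    for x in range(len(stimulus)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            src = x - (j - half)          # stimulus sample that lands on x
            if 0 <= src < len(stimulus):  # outside the array the stimulus is dark
                acc += weight * stimulus[src]
        image.append(acc)
    return image


if __name__ == "__main__":
    stimulus = [0.0] * 21
    stimulus[10] = 1.0                    # a single bright point
    for x, value in enumerate(convolve(stimulus, line_spread(0.7, 5))):
        print(x, round(value, 4))
```

With a single bright point as the stimulus, the output simply reproduces the spread function centered on that point, which is the sense in which the spread function characterizes the optics; any other stimulus is a weighted sum of such shifted copies.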
In recent years more direct means of measuring the point spread function have been developed that require neither the assumptions of the indirect mathematical transformation methods based on data obtained during a double traverse of the light nor the additional complexities of a psychophysical judgment. For example, Robson and Enroth-Cugell (1978) developed a technique for inserting a tiny optical fiber into the eye so that the light distributions could be measured directly at the retina. Although they did confirm that the light intensity produced by the aberrations of the ocular optics diminished quite rapidly (most of the light was contained within a few minutes of arc, as previous workers had found), they were also able to establish that some lesser amount of light could be measured as far out as 20 deg from the center of the image.

The alternative means of specifying the optical properties of the eye is to plot its Spatial Modulation Transfer Function (SMTF). The SMTF is a graph that relates the spatial frequencies (into which any spatial pattern can be analyzed by a two-dimensional Fourier analysis; see Weisstein, 1980, for a good introductory tutorial on this topic) to the relative ability of the optics of the eye to maintain the contrast between the peaks and the troughs of the constituent spatial sine waves of differing frequency. The contrast or contrast ratio (CR) is conventionally defined as

\[
\mathrm{CR} = \frac{\text{peak height} - \text{trough height}}{\text{peak height} + \text{trough height}} \tag{5-7}
\]
To make this concept clear, consider three different spatial sine waves that, for example, have spatial frequencies of 1, 10, and 20 cycles/degree of visual angle, respectively. The lowest frequency spatial sinusoid (1 cycle/degree) will pass through the optics of the eye with relatively little degradation. The peaks and troughs are so far apart that they are affected very little by the optical aberrations of the eye. However, the high-frequency (20 cycles/degree) spatial sine waves tend to lose their contrast as they pass through the eye. The dark troughs tend to have some light refracted, diffracted, and scattered into them (and thus to be less dark), and the light peaks tend to have some light refracted, diffracted, and scattered out of them (and thus to be less light) by the optical aberrations of the eye. Intermediate frequencies will be affected to some intermediate degree. Thus, to a first approximation, the reduction in contrast depends directly on the spatial frequency of the stimulus; the higher the frequency, the greater the reduction in contrast. The SMTF is the function representing the specific relationship between the contrast ratio and the frequency of these spatial sine waves. Ideally the SMTF would be measured by applying a family of spatial sine waves to the eye and measuring the contrast for each presented spatial frequency at the retina itself. In actuality one usually has to reason backwards in exactly the same way as Flamant and Krauskopf did in their pioneering studies of the point spread function because it is difficult to place measuring instruments on the retina itself (despite the highly unusual accomplishment of Robson and Enroth-Cugell, 1978). Thus the usual way to determine the objective SMTF from the details of the known stimulus pattern and the measurements of the doubly transited reflected image requires virtually the same kind of mathematical manipulations just described for the point spread function.
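The dependence of contrast on spatial frequency can be illustrated numerically. The following minimal sketch assumes, purely for illustration, that the point spread function is a Gaussian with a standard deviation of 0.02 deg; neither the Gaussian shape nor that value comes from the measurements discussed here, and the function names are hypothetical. It blurs unit-contrast gratings of 1, 10, and 20 cycles/degree and computes the contrast ratio of Eq. 5-7 for each.

```python
import numpy as np


def contrast_ratio(profile):
    """Contrast ratio of Eq. 5-7: (peak - trough) / (peak + trough)."""
    peak, trough = float(np.max(profile)), float(np.min(profile))
    return (peak - trough) / (peak + trough)


def blurred_contrast(freq_cpd, sigma_deg=0.02, span_deg=4.0, n=8192):
    """Contrast of a unit-contrast sine grating after Gaussian blur.

    ASSUMPTION: the point spread function is modeled as a Gaussian with
    standard deviation sigma_deg (degrees of visual angle). This is an
    illustrative stand-in, not a measured ocular spread function.
    freq_cpd * span_deg should be an integer so the grating is periodic
    over the sampled span.
    """
    x = np.linspace(-span_deg / 2, span_deg / 2, n, endpoint=False)
    grating = 1.0 + np.sin(2 * np.pi * freq_cpd * x)  # luminance, never negative
    psf = np.exp(-x**2 / (2 * sigma_deg**2))
    psf /= psf.sum()  # unit area: blur redistributes light but adds none
    # Circular convolution via the FFT; ifftshift moves the PSF peak to index 0.
    spectrum = np.fft.fft(grating) * np.fft.fft(np.fft.ifftshift(psf))
    blurred = np.fft.ifft(spectrum).real
    return contrast_ratio(blurred)


if __name__ == "__main__":
    for f in (1, 10, 20):
        print(f"{f:2d} cycles/degree: contrast ratio = {blurred_contrast(f):.3f}")
```

Under this assumed blur the 1 cycle/degree grating keeps nearly all of its contrast, whereas the 20 cycles/degree grating keeps very little. Plotting the result against frequency gives a monotonically declining curve of the general kind that an objective SMTF describes.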
Figure 5-35, based upon research by Campbell and Gubisch (1966) and Westheimer (1963), shows the objective SMTF produced with both a 3-mm and a 6-mm pupil. The kind of research ophthalmoscope used in this kind of experiment is shown in Fig. 5-36.

FIG. 5.35. The effect on contrast of the spatial frequency of the stimulus pattern. This is the objective spatial modulation transfer function. (From Westheimer, © 1972, with the permission of Springer-Verlag, Inc.)

FIG. 5.36. The optical equipment used to measure the objective spatial modulation transfer function. X, Xenon arc lamp; C, collecting lens; A1, aperture; S1, source slit; A2, aperture; P, beam-splitting pellicle; L1, lens in outgoing beam; M, front-surface mirror on rotating mount; S2, analyzing slit; T, photomultiplier. (Figure and caption codes from Campbell and Gubisch, © 1966, with the permission of The Journal of Physiology.)

The degree to which the optics of the eye blur the optical image of an external object will be reflected in the retinal image degradation measured with these objective techniques. Retinal blur will also be reflected in the limits on visual acuity and contrast sensitivity obtained in purely psychophysical experiments. But it should not be forgotten that there are other factors (the size of the receptor; the density of the receptor matrix; the neural interactions in the various networks, including such high-level processes as those described by the words criterion and subjective) that also will limit the observer's ability to utilize the pattern of the retinal information. Indeed, as we later see and as shown in Fig. 5-37, these higher-level processes change not only the magnitude but also the shape of the function. The retinally measured SMTF is a monotonically declining function, whereas the psychophysically measured SMTF has a peak at an intermediate spatial frequency. Nevertheless, the optical aberrations do establish a maximum limit that even an ideal nervous system cannot exceed. No information-processing system will be able to perform better than the input signal allows, and one that does as well as that maximum must be considered as an ideal observer.

FIG. 5.37. The objective and subjective modulation transfer functions (MTF) of the human visual system. (A) The MTF of a retina with the image projected directly upon it (i.e., all other components have been removed). (B) The MTF of a retina as modulated by the eye's optics. (C) The psychophysical (subjective) MTF. (D) C divided by B to give an estimate of the MTF produced by the neural portions of the visual system. (From Ohzu and Enoch, © 1972, with the permission of Pergamon Press, Inc.)

The SMTF of the various components of the eye can be determined independently. We should like to know what portion of the blur can be attributed to the optical properties of the lens, cornea, and ocular media. We should also like to know what part of the blur can be attributed to scattering in the retina, and finally, what part can be attributed to the neural processing that occurs subsequent to transduction. The procedure for determining each of these contributions is different. For example, to determine the blur due to retinal scattering, Ohzu and Enoch (1972) took a freshly enucleated human eye and removed the retina. They then measured the blur of the light image emerging from the back of the retina when a sharply focused image was projected on the front of the retina. This procedure gives the SMTF of an isolated retina. In other experiments, Campbell and Green (1965b) have determined both the objective SMTF (as measured with the double transit ophthalmoscopic procedure previously described), and the psychophysical SMTF (as determined by a contrast threshold experimental procedure). Ohzu and Enoch have proposed that by normalizing the psychophysically obtained SMTF by dividing it by the objective whole eye SMTF, an estimate can be obtained of the contribution of the neural system alone. All four of these curves are shown in Fig. 5-37. Clearly, the nervous system, and not the optics of the eye, is totally responsible for the nonmonotonic aspect of the psychophysically obtained function.
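The normalization proposed by Ohzu and Enoch amounts to a pointwise division of two curves sampled at the same spatial frequencies. The sketch below shows that arithmetic only. The function name, the optional rescaling to a peak of 1, and the sample values are all hypothetical and invented for illustration; they are not data from Fig. 5-37.

```python
import numpy as np


def neural_mtf(psychophysical, optical, normalize=True):
    """Estimate the neural contribution as psychophysical MTF / optical MTF.

    Both inputs must be sampled at the same spatial frequencies.
    ASSUMPTION: rescaling the quotient so that its peak equals 1 is a
    convenience added here, not part of the procedure described in the text.
    """
    psychophysical = np.asarray(psychophysical, dtype=float)
    optical = np.asarray(optical, dtype=float)
    if psychophysical.shape != optical.shape:
        raise ValueError("both curves must be sampled at the same frequencies")
    if np.any(optical <= 0):
        raise ValueError("the optical MTF must be positive to divide by it")
    ratio = psychophysical / optical
    return ratio / ratio.max() if normalize else ratio


if __name__ == "__main__":
    # Invented illustrative values: a monotonically declining optical curve
    # and a band-pass psychophysical curve.
    freqs = np.array([1, 2, 5, 10, 20, 30])  # cycles/degree
    optical = np.array([0.98, 0.95, 0.85, 0.65, 0.35, 0.15])
    subjective = np.array([0.30, 0.60, 1.00, 0.70, 0.20, 0.05])
    for f, v in zip(freqs, neural_mtf(subjective, optical)):
        print(f"{f:2d} c/deg: estimated neural MTF = {v:.2f}")
```

Because the optical curve in this example declines monotonically, dividing by it cannot create or remove the intermediate peak; the peak in the quotient comes entirely from the psychophysical curve, which is the point made in the text about the neural origin of the nonmonotonic shape.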
5. Binocular Optical Effects

In an earlier part of this chapter I discussed some of the perceptual consequences of the geometrical fact that the two eyes are placed at different locations on a horizontal rather than a vertical line. In this section, I pursue this discussion of the binocular processes further by considering how differences in the optical properties of the two eyes may affect visual perception. I have already alluded to two very obvious facts of vision. First, structurally the normal human observer has two eyes; and, second, his visual perception is functionally singular. These two facts compellingly state that our perception must result from the construction of a single perceptual interpretation out of the signals from the two eyes by some central integrative mechanism. However this intermixture of binocular signals occurs, it is a subtle and complex process that goes far beyond simplistic concepts of "suppression" of the information from one eye or a simple additive "fusion" of the information from both eyes. Rather, the two-dimensional information coming from each eye must be intermingled and interpreted such that a new, previously nonexistent three-dimensional percept can be created. Differences between the two monocular images, therefore, may be very important in defining the generated percept. In fact, as we see in Chapter 11, they are critical to stereopsis, and any distortion of the retinal image is likely to have a profound effect. It is the purpose of this section to consider specifically those optical conditions that can lead to image size differences between the two eyes, a condition originally referred to as aniseikonia by Lancaster (1938). I have already noted the simple fact that the two eyes do not have the same viewpoint and that most real-life scenes are filled with three-dimensional objects.
This means that the simple geometry of the binocular viewing mode will directly lead to slightly disparate (different) images on the two retinae. Such binocular disparities are the basis of stereoscopic depth perception, the mental reconstruction of the real three-dimensional scene. In the normal viewer, there are broad regions of single three-dimensional vision in those portions of visual space in which there is appropriate convergence of the lines of sight. This single vision appears to be that of a cyclopean eye (Julesz, 1971) as I have already noted. There are, however, some abnormal situations in which the lines of sight of the eyes may not be adequately under control, and proper convergence and registration on corresponding points may not occur. In this situation, diplopia (double vision) is likely to occur. In some advanced cases, one of the diplopic images may be powerfully suppressed to overcome this distracting effect of double vision leading to a functional monocular blindness referred to as strabismic amblyopia. The differing viewpoints of even the normal eyes can also lead to another optical condition that can also have significant perceptual consequences. The fact
that the two eyes view the world from two different viewpoints means that, with the exception of those two-dimensional objects lying exactly equidistant from the two eyes, all objects will produce different-sized images on the two retinae. Different-sized retinal images can also be produced by an entirely unrelated process, unequal magnifying powers in the two eyes, even when the object is equidistant from the two eyes. Optical magnification imbalances may be due to transient factors such as momentary accommodation differences or to semipermanent magnification differences existing between the two eyes. A further source of what may be functionally equivalent to different retinal image sizes could also arise from differences in the anatomical packing of the photoreceptors themselves. If the receptor density is greater in one eye than in the other, the effect of even perfectly equal-sized images might not be the same because the mapping onto the retinal mosaic could activate the equivalent of a larger projection area at higher levels of neural encoding. Whatever the source of the asymmetrical magnification on the two retinae, whether it be the geometry of the viewing situation, unequal magnification by the optics of the two eyes, or receptor density differences, the result is aniseikonia. The outstanding student of such image size differences was Kenneth N. Ogle, who presented what is still generally accepted to be the master discussion of the topic in his magnum opus, Researches in Binocular Vision (1950). Aniseikonia can lead to major visual disturbances if not corrected. If the retinal images are very different in size, fusion is not possible; and one eye or the other may fall into disuse. Aniseikonia is especially severe when one of the lenses is removed to treat cataracts; in that case the difference in image size may be 10% or more. The main effect of more moderate aniseikonia, however, is on the perception of depth.
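The purely geometrical source of unequal image sizes can be sketched with a small calculation. The example below computes the angle subtended at each eye by a small sphere and takes the ratio of the two angles as a rough proxy for the ratio of retinal image sizes. The function names, the coordinate convention, and the assumed 3.2 cm half-separation of the eyes are illustrative choices introduced here, not values from the text, and the optics of the eye itself are ignored.

```python
import math


def angular_diameter_deg(diameter, distance):
    """Angle (deg) subtended by a sphere seen from `distance` to its center."""
    radius = diameter / 2.0
    if distance <= radius:
        raise ValueError("the viewpoint must lie outside the sphere")
    return math.degrees(2.0 * math.asin(radius / distance))


def image_size_ratio(x, z, diameter=1.0, half_separation=3.2):
    """Right-eye / left-eye angular size of a small sphere centered at (x, z).

    Units are centimeters. The eyes sit at (-half_separation, 0) and
    (+half_separation, 0); z is the distance straight ahead.
    ASSUMPTION: half_separation = 3.2 cm is an illustrative round value.
    """
    d_left = math.hypot(x + half_separation, z)
    d_right = math.hypot(x - half_separation, z)
    return angular_diameter_deg(diameter, d_right) / angular_diameter_deg(diameter, d_left)


if __name__ == "__main__":
    for x in (0.0, 10.0, 20.0):
        print(f"object {x:4.1f} cm to the right, 40 cm ahead: "
              f"ratio = {image_size_ratio(x, 40.0):.3f}")
```

For an object on the midline the ratio is exactly 1; as the object moves to the right it lies closer to the right eye and the ratio rises above 1, by a few percent for the near, eccentric positions used in this example.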
Although stereoscopic depth is a topic dealt with more fully in Chapter 11, aniseikonic interference with stereopsis is properly considered as a Level 0 process and can, therefore, be considered at this point. Understanding the perceptual effects of aniseikonia is made difficult by the fact that only the horizontal components of the magnification error should produce a stereoscopic distortion. Yet, surprisingly, it turns out that both the vertical and horizontal components of aniseikonia are perceptually effective. If the magnification differences of an aniseikonic eye are limited to the horizontal plane (as would occur if the retinal image were distorted by a cylindrical lens with the long axis oriented vertically), then substantial distortions of the stereoscopic space would be expected. Figure 5-38 illustrates a cylindrical lens and its effects. An example of a stereoscopic distortion induced by the predominantly horizontal aniseikonia produced by such a lens is shown in Fig. 5-39. This type of spatial distortion results directly from the horizontal disparities produced as a result of unequal image size and can be predicted on a purely geometrical basis from the magnitude of the aniseikonia. On the other hand, if the magnification is equal in all orientations (as would be produced by a spherically magnifying lens), then a very surprising result occurs.
FIG. 5.38. The effect on an image of a square object when the light rays are passed through a cylindrical lens.
FIG. 5.39. The effect on the image of a room with ill-defined contours (a "leaf room") when it is viewed with a cylindrical lens in front of the right eye, only magnifying the image in the horizontal direction. (From Ogle, © 1950, with the permission of W. B. Saunders Co.)

Very little distortion of the stereoscopic scene is produced. What seems to be happening is that the visual system has developed a powerful compensatory process that can adapt to wide ranges of interocular differences in spherical magnification. This is a highly useful and adaptive process, because the differences in magnification between the two eyes would always be expected to be very much greater than the very small differences in retinal position associated with proper registration on corresponding points. This adaptive process, and the insensitivity to what may be substantial differences in spherical magnification, is a strong argument that the corresponding points may not have any "hard-wired" or fixed relationship at all, but rather may represent only flexible, and perhaps even transient, functional relationships among the retinal loci. Thus even though a purely vertical magnification difference between the two eyes should not produce any distortion in a stereoscopic space from a purely theoretical point of view (binocular stereopsis should be insensitive to vertical disparities), it does turn out that there is, in fact, a strong effect of vertical magnification asymmetries. Ogle described the strong distortion effects produced by vertical aniseikonia as the "induced effect." He characterized the induced effect of a vertical magnifying cylindrical lens placed before one eye as a "simulation" of the effect of an equivalent horizontal magnification placed before the other eye. Thus, spherically magnifying lenses produce surprisingly small effects even though separate horizontal and vertical magnifications produce substantial perceptual effects. It is interesting to note that the processes underlying the counterbalanced effects of equal horizontal and vertical (i.e., spherical) magnifications before one eye must be assigned to different levels of the taxonomic schema that I have proposed as an organizing framework for this book. The distortion produced by horizontal aniseikonia is totally explicable in terms of the geometry and optics of the situation, and thus properly belongs in this chapter dealing with Level 0 processes. Ogle, however, could find no equivalent geometrical or optical explanation of the induced effect produced by unequal magnification along a vertical axis, and had, therefore, invoked a "psychic" or "psychological" explanation that I believe is quite comparable to the interpretive, symbolic processes I have incorporated under the Level 4 rubric.
Specifically Ogle (1950) says: Since the evidence as presented in the previous chapter generally denies the existence of any physiologic compensatory change of vertical magnification, we are forced to fall back upon a hypothesis of a psychical change which results in a reorientation of the frame of reference for stereoscopic localization, partially in the sense of a rotation of the entire binocular perceptual space about the fixation point. This phenomenon, which basically must arise from the vertical disparities of images in the two eyes, which point of fixation from the two eyes or unequal magnification by the ocular optics, provides a basis for an explanation for the induced effect. When one increases the magnification of the image of the right eye in the vertical meridian by a suitable lens, an apparent distortion of space occurs in the sense that objects at the right of the fixation point appear nearer, those on the left farther-a clockwise rotation of the field. Thus the stereoscopic reference surface must have been rotated counterclockwise. This is the direction in which the reference surface must also be rotated when the eyes are asymmetrically converged to the right, if egocentric stereoscopic localization is to be maintained. This rotation of the subjective binocular visual field is not a pure rotation alone, but is the principal aspect consistent with an increase in the magnification of the image of the left eye in the horizontal meridian [p. 223].
Thus Ogle invokes what in the present context would be called an integrative, symbolic, interpretive Level 4 process. He believes that this process has evolved to overcome what necessarily must be a rather substantial difference in the magnification of the images on the two retinae by even normal eyes. The response to the naturally occurring aniseikonia thus is, in a very true sense, nonveridical with the actual image distortion on the retina. This compensatory process-the induced effect-helps us to overcome the disparities produced by spherical magnification differences between the two eyes. Yet without the horizontal component of magnification the induced effect of a purely vertical magnification is still so compelling that it can actually overcompensate and produce a perceptual distortion for which no compensation was actually required. Unfortunately, the nature of the induced effect remains obscure; and though the effect probably exists as described, it does little to help us understand what are the underlying mechanisms.
6. The Spherical Shape of the Eye

The retina is not a plane. The fact that it is distributed over the interior surface of the globe of the eye means that the retina must necessarily deviate from the idealized planar projection screen required by Clerk Maxwell's criterion of a perfect nonaberrant lens. No matter how good the optics of the eye, therefore, there is no image plane on which to project an ideal object plane. There are, however, both advantages and disadvantages to the nearly spherical retinal shape and, in general, the curved retina has rather inconspicuous influences on our perceptions of the external world. One of the main advantages of a strongly curved retina on which to project the ocular image is that the field of view of the eye can be very much wider than it would be without such a construction. Figure 5-40, for example, shows a perimetric field measurement for a subject with normal vision. Such perimetric charts have become standardized tests of visual function. They are usually plotted in the following way: A small spot of light is used to explore the visual field, and the subject reports when it disappears from view. Obviously a flat retina would have to be unrealistically wide to have the same field of view. Perimetric measurements of this kind often indicate that the field of view of the human eye is actually somewhat greater than it would be expected to be, and in some instances the field of view can actually reach around more than 90 deg from the line of sight. This result is initially surprising but totally explicable in terms of the relevant optics. Pirenne reminds us of the reasons that such an apparent violation of line-of-sight optics is possible. In his elegant book, Optics, Painting, and Photography (1970), Pirenne reproduces a drawing (see Fig. 5-41) originally presented by Hartridge (1919) showing the refractive effects of the cornea and the lens when hit at a near-grazing angle by a ray of light from an object that is objectively behind the eye. Our ability to see an object so far "out of the corner of the eye" is totally explained in terms of the refractive bending of rays of light by the cornea. The dioptric power of the cornea is sufficiently great that light coming from an angle greater than 90 deg to the line of sight can be bent back to pass into the eye and impinge upon the edge of the retina.

FIG. 5.40. (A) Raw data points defining the visual field for two different intensities of an exploring test spot. (B) Completed perimetric charts of the visual field for the two different intensities. (From Tate and Lynn, © 1977, with the permission of Grune and Stratton, Inc.)

FIG. 5.41. Diagram showing how the visual field can in some extraordinary circumstances actually be larger than 180° as a result of the strong refractive effects on light hitting the cornea from a point greater than 90° from the line of sight. (From Pirenne, © 1970, after Hartridge, with the permission of Cambridge University Press.)

Somewhat surprisingly, it turns out that it is relatively difficult to identify any perceptual penalty paid by the observer for the curvature of the retina. There is no question that there is considerable optical distortion of the optical image produced on peripheral portions of the retina, but like the optical inversion of the image by the optics of the eyes, this distortion, in general, is of little consequence with respect to how the world is perceived. Compensatory neural coding and cognitive interpretive processes apparently overcome these distortions in the reconstruction of the perceptual experience. It is only in the most unusual circumstances, as Pirenne points out, that the perceptual effects of the curved retina can be detected. The association between the cognitive "image" and the retinal "image," therefore, is not likely to be a linear one but at best only a topological one.

One rather obscure demonstration of the effects of the spherical shape of the eye is shown in Fig. 5-42. This figure, also cited by Pirenne (1970) after an earlier demonstration by Helmholtz (1856), shows a distorted checkerboard. When the eye is placed at a position approximately 2 inches from the center of the figure, the checkerboard assumes a more linear and ordered appearance. The perceived correction of the pattern is largely due to the optical distortion of the portions of the retinal image that are farthest from the fovea.

FIG. 5.42. A stimulus that appears to be a rectangular checkerboard as a result of the optical distortion in the eye. To observe this phenomenon this stimulus should be viewed with one eye at a distance equal to the length of line A. (From Pirenne, © 1970, with the permission of Cambridge University Press.)

In general, however, it is clear that the degree of perceptual distortion produced is nowhere near the degree of image distortion on the retina. The perceptual nervous system is a highly encoded system and there are many other higher-level processes that can compensate for many kinds of peripheral distortions including those introduced by such factors as the spherical retina. A fundamental explanation of compensatory processes leading to the linear and upright perception of a topologically distorted and inverted image can be found in the language of sensory coding theory (the idea that a stimulus message may be represented by any dimension, no matter how distorted or scrambled, as long as decoding operations are available to unravel the representations), and in terms of a constructionistic perceptual psychology in principle, if not in detail. These compensatory processes are so effective that the perceptual effects of what can sometimes be very profound topological distortions in the retinal mapping of an object scene are often very difficult to detect.
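As a rough check on the field-of-view argument made earlier in this section, the following sketch treats the eye as a pinhole camera with a flat screen and asks how wide that screen must be to receive a given field. It is a simplification that ignores refraction entirely. The function name and the 17 mm pinhole-to-screen distance (a round figure of the order used in schematic eyes, not a value taken from this chapter) are assumptions.

```python
import math


def flat_screen_half_width_mm(half_field_deg, nodal_distance_mm=17.0):
    """Half-width of a flat screen needed to catch rays up to half_field_deg off axis.

    ASSUMPTION: pinhole geometry with the screen nodal_distance_mm behind
    the pinhole; refraction by the cornea and lens is ignored.
    """
    if not 0.0 <= half_field_deg < 90.0:
        raise ValueError("a flat screen cannot receive rays 90 deg or more off axis")
    return nodal_distance_mm * math.tan(math.radians(half_field_deg))


if __name__ == "__main__":
    for angle in (30, 60, 80, 85, 89):
        print(f"half-field {angle:2d} deg -> half-width "
              f"{flat_screen_half_width_mm(angle):8.1f} mm")
```

Under these assumptions the required half-width grows without bound as the half-field approaches 90 deg, whereas the globe of the eye is only about 2.5 cm in diameter. A curved receiving surface avoids that divergence, which is the sense in which a flat retina would have to be unrealistically wide.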
7. The Stiles-Crawford Effect

Even though the perceptual system can compensate for many kinds of distortions, there are other optical effects within the eye (in fact, within the receptor) that can lead to significant uncompensable perceptual phenomena. One example is the Stiles-Crawford effect (Stiles & Crawford, 1933), a substantial variation in absolute visual threshold as a function of the portion of the pupil through which the ray enters the eye. This effect is generally attributed to the angle of incidence of the light ray upon the outer segment of the photoreceptor rather than pupil entrance point per se, but the former follows directly from the latter. The optical situation giving rise to this anomaly is shown in Fig. 5-43. The effect, to put it simply, is that the visual effectiveness of a stimulus is progressively less the greater the deviation from the axis of the photoreceptor outer segment with which that stimulus impinges. Typical results from one of Stiles and Crawford's early experiments are shown in Fig. 5-44. The effect was first reported as a unique characteristic of cones and was thought to be very weak in rods if it existed at all. However, there is now little question that it also does occur in rods: Daw and Enoch (1973), using a blue-cone monochromat, and Van Loo and Enoch (1975), as well as Webb (1972), and Flamant and Stiles (1948) have all irrefutably demonstrated its presence in rods. They also demonstrated that rods have a less constrained directional sensitivity than do cones. Figure 5-45 shows a comparison of the scotopic and photopic functions to illustrate this difference and to indicate why the lesser rod effect was missed in the early experiments. The Stiles-Crawford effect is definitely not due to any additional absorption by the ocular media in the longer pathways traversed by light rays entering at the perimeter of the pupil compared to those entering through its center.
FIG. 5.43. Diagram depicting the variation in the angle of incidence of light on a receptor as a function of the point of entry into the pupil.

FIG. 5.44. The Stiles-Crawford effect shown as a function of wavelength in the dark-adapted fovea. Nasal or temporal displacement of the stimulus from the center of the pupil degrades the detectability of the stimulus. (From Stiles, © 1939, with the permission of The Royal Society of London.)

Such additional absorption could, in any case, amount to only a few percentage points in the available amount of incident light; the effect itself, on the other hand, may lead to a reduction in visual sensitivity of as much as 80% in some situations. In fact the decrement in visual effectiveness that is measured in the classic Stiles-Crawford experiment may actually be much stronger than that indicated in Fig. 5-44. There is another intraocular counterbalancing process that at least partially compensates for the effect. The crystalline lens, belying its name, is not completely colorless even in a young normal eye. There is a considerable amount of pigment present in it; and that pigment has a decided yellow tint, indicating that it absorbs some portion of the shorter wavelengths of light passing through it. Weale (1961), noting that the lens is an oblate object thicker at the center than at the edges, suggested that there should be a differential absorption by the pigmented materials of the lens such that more of a ray of light passing through the center of the pupil should be absorbed than of one passing through the periphery. This differential absorption should produce a decrement in visual performance at the center of the lens, an effect contrary to the Stiles-Crawford effect; and therefore, if this differential absorption existed, it would tend to diminish the measured strength of the associated phenomenon. Thus it is likely that the measured magnitude of the effect is, in fact, an underestimate of the actual influence of the angle of incidence on photoreceptor absorption in vision.

The Stiles-Crawford effect is also dependent on the wavelength of the incident light. The absorption of short-wavelength light is most greatly influenced by the angle of incidence and long and medium wavelengths are, respectively, less strongly affected. Van Loo and Enoch (1975) have also shown this to be the case for rods. Because such a differential effect of the wavelength of the incident light exists, it is not surprising that there are also chromatic phenomena associated with the effect. Indeed, in some of the early experiments on this phenomenon, Stiles (1937; 1939) reported that the color of a beam of light consisting of a
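The falloff in visual effectiveness with pupil entry point is often summarized by an empirical curve of the form 10^(-rho r^2), where r is the distance in millimeters from the entry point of maximum effectiveness. That functional form, the parameter name rho, and the default value of 0.05 per square millimeter are assumptions introduced here for illustration; they are not taken from the data in Fig. 5-44, and the function name is hypothetical.

```python
def relative_effectiveness(r_mm, rho=0.05):
    """Relative visual effectiveness of a ray entering r_mm from the peak.

    ASSUMPTION: effectiveness = 10 ** (-rho * r_mm**2), an empirical
    summary curve; rho (per mm^2) sets how sharply directional the
    receptors are. The default is illustrative, not fitted to the
    data discussed in the text.
    """
    if rho < 0:
        raise ValueError("rho must be non-negative")
    return 10.0 ** (-rho * r_mm ** 2)


if __name__ == "__main__":
    for r in (0.0, 1.0, 2.0, 3.0, 4.0):
        cone_like = relative_effectiveness(r)            # assumed default
        broader = relative_effectiveness(r, rho=0.01)    # hypothetical flatter curve
        print(f"r = {r:.0f} mm: rho=0.05 -> {cone_like:.2f}, rho=0.01 -> {broader:.2f}")
```

With the assumed default, a ray entering 4 mm from the peak is roughly one-sixth as effective as a central ray, which is of the same order as the reductions of up to 80% mentioned above. A smaller rho gives a broader, less directional curve of the kind the text attributes to rods.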
8. LEVEL 2: NEURAL INTERACTION PROCESSES
FIG. 8.38. A comparison of neurophysiological and psychophysical spatial modulation transfer functions for different size fields. (From Ratliff, Knight, Toyoda, and Hartline, © 1967, with the permission of the American Association for the Advancement of Science.)
D. THE PERCEPTUAL IMPACT OF MORE COMPLEX NETWORK INTERACTIONS
1. A Comment

Although there are clearly phenomenal traces of the peripheral inhibitory and convergent processes in the perceptual responses of man, it is obvious that these phenomena represent only a partial picture of the rich variety of neural interactions possible in the afferent nervous system. The variety of man's thoughts does not result only from such simple processes, but more likely, from a hierarchy of many levels of concatenated interactions: some inhibitory, some excitatory, and certainly some that perform such complex forms of neurophysiological logic that they will forever be beyond the limits of human analysis. It is not too surprising, therefore, to realize that even in the peripheral nervous systems there exist complex network interactions that are not simply summatory or inhibitory but constitute higher-level combinations of these primitive logical elements into more elaborate information-processing mechanisms. In this section I consider some phenomena that probably result from these more complex peripheral neural interactions. These kinds of neural processing, though somewhat more intricate and involved than those I have already discussed, are still well within the Level 2 rubric. There is, however, a difficulty concerning these more complex interactions that should be explicitly reiterated at this point. Though we have a substantial amount of information about the neurophysiology of inhibition and convergence, there is a paucity of direct neurophysiological data concerning the concatenated interactions in vertebrates that constitute the topic of this section. Because of the difficulties of studying such interactions in vertebrates, and the limitations of the available analytic technology, neurophysiological evidence for the complex interactions I am about to discuss is for the most part indirect or, even worse, nonexistent. Thus there is a considerable amount of controversy concerning what must, at best, be considered to be the speculative theory that is invoked to explain the phenomena I consider here. In Chapter 9 I deal more specifically with the problems involved in such conceptual constructs as "channels" and "detectors" and the role they play in current theory. For the moment the reader is forewarned not to expect too much precision in the way these words are used in this section.
2. Interactions Between Receptor Systems
A long-standing debate in visual science concerns the possibility of reciprocal interactions among the signals generated by the four types of receptors of the retina. In previous sections of this chapter, I emphasized those aspects of the problem of lateral inhibition and summation that arise as a function of the spatial distribution of the stimuli on a relatively macroscopic scale; two spots of light either interacted or did not as a function of the distance between them. However, neither the spectrum of the light used in those experiments (which was, in almost all instances, white) nor the specific receptors upon which that light acted was treated as a significant variable in the experiments I have considered so far.
This problem must also be studied at another, more microscopic level at which stimulus spectrum and receptor type become salient. The retina is not, as we have repeatedly seen, composed of a homogeneous distribution of identical receptors. It is, rather, composed of four different kinds of receptors filled with four different kinds of photochemicals and exhibiting four different absorption spectra. Most germane to the present discussion, given the existence of the four interdigitated systems, is the question of whether the elements of these four systems spatially interact in a way that is comparable to the spatial interactions of different retinal regions. In other words, this question concerns the nature of
interactions between the rods and cones as well as between the different types of cones. At least two answers to this question might seem equally plausible a priori. One could imagine a system in which only like receptors interacted, as well as one in which any receptor type could interact with any other. It is the purpose of this section to consider the empirical evidence related to this question.
The visual system provides ample anatomical opportunities for interactions between any of the four receptor types. There seem to be synaptic connections between the rods and all three kinds of cones (Sjöstrand, 1974); furthermore, electrophysiological recordings from many types of ganglion cells indicate that they are often activated by signals that come from both rods and cones. A very complete review of the manifold variety of retinal neural interconnections has also been presented by Stell (1972). One cannot read his remarkable essay without coming to the conclusion that the machinery is certainly present for the interaction of just about any retinal neuron with any other. A fine, though brief, review of anatomical and physiological data supporting interreceptor interactions has also been presented by Makous and Boothe (1974).
However, the reader must appreciate by now that the mechanical possibility is not tantamount to the operational fact. And indeed, until recently, the psychophysical side of the question of receptor interaction has been quite equivocal. There are well-thought-out and well-executed experiments that demonstrate interaction and equally well-designed studies that demonstrate a lack of interaction between the various receptor types; some investigators are convinced that interaction occurs and others that it does not. In preview, it is probably fairest to say only that there are conditions in which interaction can be unequivocally demonstrated and others in which it is unequivocally absent.
Perhaps the unusually rancorous tone of this controversy in the visual literature can be alleviated in part by reminding all concerned that there really is no intrinsic inconsistency of results but rather an emerging diversity of opinion as different experimenters use different conditions. If one assumes that all are correct at least within their own sets of conditions, one quickly arrives at a reasonable consensus and begins to appreciate the richness of the processes probed by the question: do receptors interact?
The major technical problem in such an experiment, of course, is to isolate the rods and the different kinds of cones from each other so that any measured effects can be definitively attributed to neural interactions rather than to the overlapping spectral sensitivities of the four receptors. A number of different approaches have been used to isolate receptors. One simple method is to present a stimulus (to a dark-adapted eye) whose luminance is below the cones' absolute threshold but above that of the rods. Such a stimulus has been referred to as a "pure scotopic stimulus." If such a scotopic stimulus is used as the "inducer" to test for an effect on cone vision, then any alteration in the response produced by a photopic stimulus must be attributable to an interaction between rods and cones.
To study interactions between the three different kinds of cones is more difficult. The usual way this problem has been attacked is to use Stiles' preadaptation-increment threshold technique to emphasize the response of one or another of the triad of receptor types. The reader should keep in mind, however, the many conceptual and empirical difficulties involved in interpreting the results of this type of experiment discussed in Chapter 6. Conversely, it should not be overlooked at this point that the meaning of the results obtained with Stiles' technique is itself very much dependent upon the assumed independence of the three systems! If, as seems to be the case, there is a great deal of interaction between the three cone systems, then much of the traditional theory underlying those results becomes equivocal. As I noted earlier, the π mechanisms were originally considered to reflect the properties of single cone systems. Now most workers in this field appreciate that many of the π mechanisms represent more complex composites of the properties of all three receptor types. Nevertheless, this procedure is still the most effective method that can be applied to the solution of the problem of photopic receptor interactions.
The basic issues of the controversy surrounding the possibility of interactions between rods and the three types of cones can be illustrated by considering some of the major early studies of the problem. Two studies, the results of which were interpreted to mean that there was no interaction between the various receptors, were reported by Alpern (1965) and Alpern and Rushton (1965). Another, which did report complex interactions between the four types of retinal receptors, was published by Boynton, Ikeda, and Stiles (1964). The totally opposite conclusions to which the respective authors came, although later rationalized, represent an interesting example of the flow of ideas in science.
Alpern (1965) attacked the problem by using an experimental paradigm in which a later, larger, surrounding stimulus was used to mask the effects of an earlier and smaller one. A constantly illuminated background stimulus of 625 nm (a light which appeared reddish-orange) was presented continuously to the observer. The test stimulus was a small (2.5 deg by 2.5 deg) square placed approximately 2 deg eccentric to the fovea. It had a dominant wavelength of about 527 nm and appeared to be green. The inhibiting, or masking, afterflash was a large circular spot, identical in shape to the contrast background except that it was blacked out in the region in which the test stimulus occurred. It occurred, and this is most important, 50 msec after the test flash. The wavelength spectrum and luminance of the afterflash were the major independent variables in the experiment. Figure 8-39 shows the various components of the stimulus used in Alpern's experiment.
FIG. 8.39. The stimulus patterns used in Alpern's study of rod-cone interaction. (From Alpern, © 1965, with the permission of the Journal of Physiology.)
Selecting the wavelength and luminance of the afterflash so that it was a pure scotopic stimulus, Alpern reported that the incremental energy required for the subject to detect the green-appearing (photopic) test stimulus on the reddish-orange background did not change as a function of the presence or absence of the afterflash. However, when a photopic afterflash, which was effective in activating the cone systems, was used, there was a measurable additional effect on the threshold beyond the normal increase in the increment threshold produced by the background luminance per se. Similarly, if the experiment was carried out on an appropriately light-adapted observer who was responding mainly to cone signals, an inhibitory interaction of cones on cones was reported. Alpern's conclusion was that "There was no interaction between rods and cones," but he did feel that at least some kind of intercone or intracone interaction existed.
In a follow-up study, Alpern and Rushton (1965), using essentially the same test flash-afterflash experimental design (with the exception that a somewhat smaller test flash placed within the fovea was used), studied the interaction of the three "isolated" cone systems. Once again, they used the Stiles preadaptation procedure to enhance at least partially the activity of the three cone systems relative to the normal balance. The results of this study led Alpern and Rushton to reverse Alpern's original conclusion on intercone interaction and to assert that not only were the rods independent of the cones, but that the three cone systems were independent of each other as well! The interactions between cones observed in the earlier study (Alpern, 1965) were attributable, they speculated, only to interactions between the same kind of cones. That is, they asserted that the elevation in the incremental threshold for any of the π1, π4, or π5 mechanisms depended only on the degree to which that system was activated and was independent of the degree to which the other two systems were activated. Diametrically opposed results, however, had been obtained by Boynton, Ikeda, and Stiles (1964) only a year previously.
Their experiments, which also used a Stiles preadaptation procedure, led to the conclusion that there were, in fact, very complicated patterns of interaction between the three cone mechanisms and that they were not, as Alpern and Rushton were subsequently to suggest, by any means independent. It is important to note that in spite of the similarity of the general approach of Alpern and Rushton on the one hand, and of Boynton and his colleagues on the other, there were still some fundamental differences in procedure that suggest that the two studies were not directly comparable. The major difference was that the Boynton, Ikeda, and Stiles study did not use an afterflash or masking
paradigm. They used an experimental design in which the stimuli for each of the three cone systems were presented at the same time and the same place: within a small spot of 10 min of visual angle in the middle of a 10 deg adapting field. The critical factor was the degree to which two simultaneous stimuli added together or inhibited each other as measured in terms of the respective independent and combined increment thresholds. As a result of the outcomes of a comprehensive series of experiments that tested many conditions, Boynton, Ikeda, and Stiles concluded, unlike Alpern and Rushton, that a wide variety of different interactions existed among the three different cone systems. The types of interactions they observed were varied and included both inhibitory and summatory interactions reflected by respective elevations or depressions of thresholds. Although some evidence for probability summation was obtained, they also found psychophysical evidence of what they believed were genuine interreceptor interactions as well as other interneural interactions within the retina.
In recent years the controversy has continued without complete resolution, although a consensus does seem to be forming for the position that all types of receptors can interact with all others. A few investigators such as Westheimer (1970) have continued to report independence of rod and cone responses under some conditions. However, the preponderance of new data suggests that, in general, Boynton and his colleagues were correct and that there are many conditions in which interreceptor interactions can be demonstrated. Indeed, it is the unusual and offbeat situation in which they are not so demonstrable. The afterflash or masking paradigm, which is one of them, seems to have been an unfortunate choice with which to seek an answer to the question of receptor interaction.
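The distinction drawn above between probability summation and genuine neural interaction can be made concrete with a small sketch. It is only an illustration: it works with detection probabilities rather than the increment thresholds that Boynton, Ikeda, and Stiles actually measured, and the function names and the `tolerance` value are hypothetical choices, not anything taken from their paper.

```python
def probability_summation(p_a, p_b):
    """Detection probability expected if two mechanisms detect independently.

    The stimulus is missed only when both mechanisms miss it.
    """
    return 1.0 - (1.0 - p_a) * (1.0 - p_b)


def classify_interaction(p_a, p_b, p_combined, tolerance=0.05):
    """Compare an observed combined detection rate with the independence
    prediction. `tolerance` is an arbitrary illustrative margin."""
    expected = probability_summation(p_a, p_b)
    if p_combined > expected + tolerance:
        return "summatory"
    if p_combined < expected - tolerance:
        return "inhibitory"
    return "consistent with independence"


# Two stimuli that are each detected half the time when presented alone.
print(probability_summation(0.5, 0.5))
print(classify_interaction(0.5, 0.5, 0.95))
print(classify_interaction(0.5, 0.5, 0.40))
```

On this simplified reading, only departures from the independence prediction count as evidence for interaction between the mechanisms; an improvement that merely matches the prediction does not.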
The preponderance of recent research, as I noted, supports the idea that there are many conditions in which the four receptor systems interact with each other. For example, Makous and Boothe (1974), once again using the Stiles preadaptation technique, have shown that even when the scotopic luminances of backgrounds of different wavelengths are made equal, there is still a detectable difference in the incremental threshold of a superimposed test spot as the spectral properties of the background are varied. When coupled with other data showing that such spectrally different backgrounds produce different rod thresholds, this evidence makes a compelling case for the presence of cone effects on rod responses. Ingling, Lewis, Loose, and Myers (1977) have also shown strong effects of cones on the rod threshold, as have Frumkes and Temme (1977), Barris and Frumkes (1978), and Blick and MacLeod (1978). Finally, Sternheim, Gorinson, and Markovits (1977) have also provided additional evidence that there is an effect of the various cones on each other.
Thus it seems certain now that there is a high degree of psychophysically observable interaction that corresponds to the well-established anatomical and neurophysiological opportunities for interactions between the four kinds of receptors. Why, then, in the light of the substantial mass of data that is now available demonstrating the presence of such interaction, was it missed in some of the earlier studies? The answer to this question is also now becoming clear. The newer studies make it clear that the interaction between the different kinds of receptors is strongly dependent upon two stimulus parameters, time and space, that were not adequately sampled in Alpern's (1965) and Alpern and Rushton's (1965) studies. Frumkes and Temme, for example, have shown that the interaction effect is substantially affected by the size of the stimuli. Any interreceptor effects, these investigators report, are washed out when the adapting field subtends an area greater than 6 or 8 deg of visual angle. It is only for smaller stimuli that any dependence on wavelength, and thus of cones on each other or on rods, can be measured.
Similarly, Ingling, Lewis, Loose, and Myers (1977) have shown that the interaction effect is severely limited in time. These workers replicated Alpern's experiments, but instead of using a constant interval of 50 msec between the target and the afterflash, they varied the interval within a range of 0 to 60 msec. The results of this experiment showed that there was very little interaction between the receptors when the interval was equal to or greater than 50 msec, the very interval that Alpern had unfortunately chosen for his study, but that interaction did appear at shorter intervals.
In a similar way Makous and Peeples (1979) have resolved another discrepancy between two earlier studies that used virtually the same experimental paradigm but found opposite results. Makous and Boothe (1974) had found rod-cone interaction whereas Flamant and Stiles (1948) had not. Makous and Peeples have now shown that Flamant and Stiles had unfortunately chosen to work within a range of luminances and wavelengths in which the interaction was minimal.
However, when the range was expanded (in the Makous and Peeples study), the interaction appeared.
Though these studies of the receptor interactions may seem like a relatively esoteric exercise, it turns out that the results of these studies have some very profound implications for many other areas of perceptual research. First, these findings speak directly to the problem of the nature of the four receptor systems. Because, as now seems clear, the degree of interaction between the various receptor systems is high, further support is provided for reconsidering the rationale behind Stiles' concept of the so-called π mechanisms themselves. Stiles' initial idea of selective adaptation, as I noted previously, depended upon the assumption of a relatively independent set of receptor channels. If they are not independent, exactly what the relationship is between the π mechanisms and the receptor absorption spectra becomes more and more uncertain and equivocal. Perhaps the π mechanisms are not neural mechanisms after all but interactive processes.
Another important implication of this now well-demonstrated interaction between the four receptor systems lies in the fact that the interreceptor interactions are lost at great distances and at substantial time differences. Yet these are
exactly the conditions in which metacontrast and simultaneous brightness and color contrast, so often attributed to these mechanisms, are most powerfully seen. These data provide another compelling argument for the suggestion that these global phenomena are not, in fact, well explained by the type of peripheral network interaction I have been discussing here.
3. Chromatic Recoding⁷
a. The Mechanism
Another form of complex neural interaction that is now appreciated to exert a considerable impact on visual perception is the recoding of the afferent neural message that occurs as a result of interneuronal interactions in the retina. By recoding, I am referring to the translation that occurs in the neural signals and codes that represent such stimulus parameters as wavelength or intensity. In this section, I am concerned exclusively with the recoding of wavelength information and the psychophysics of the color-vision phenomena that seem to result from this recoding.
It is now well established (Baylor & Hodgkin, 1974; Marks, W., 1965; Tomita, Kaneko, Murakami, & Pautler, 1967) that the response of the vertebrate photoreceptor is a transient monophasic increase in the polarization of the plasma membrane. It is an equally secure conclusion that the responses of many, if not most, higher-order neurons, including the second-order bipolars (Tomita, 1965), the ganglion cells (Wagner, MacNichol, & Wolbarsht, 1960), the neurons of the lateral geniculate body (De Valois, R., Abramov, & Jacobs, 1966; De Valois, R., & De Valois, K., 1975), and at least some cortical neurons, are opponent-type cells that respond with either an increase or a decrease (in frequency or amplitude) in whatever electrophysiological response they are generating, depending upon the stimulus conditions. Because the coding "language" used at the receptor level differs from that at subsequent levels, there must be a transformation of codes inherent in the way in which the receptor feeds signals into the bipolar layer. It is the thesis of this section that these neural interactions also affect the molar properties of color perception and, thus, leave traces that can be detected in appropriately designed psychophysical experiments.
It has been hypothesized by Hurvich and Jameson (1957)⁸ that the recoding of chromatic information as it passes from the receptors to the bipolar layer occurs as a result of interactions between the signals generated by the receptors. According to this model, there is a selective convergence of the signals from more than one kind of receptor onto a set of several opponent mechanisms. The basic idea is that although the receptors represent color by the relative amount of activity in three monophasic chromatic mechanisms, the opponent system represents chromatic information by means of the relative amount of activity in two chromatic and one luminance multiphasic mechanisms.
⁷ Some of the material discussed in this section has been adapted and updated from Uttal (1973).
⁸ The classic zone or stage theories of color vision also implied such a recoding. However, Hurvich and Jameson's (1957) paper can be considered to be the first modern and specific model of this kind. Most recent interpretations take this paper as the jumping-off point into the modern era of opponent-color theory.
It is appropriate that I digress here for a moment to consider some important nomenclature. The word trichromatic is used in the present context to refer to a coding language in which three neural systems selectively and differentially sensitive to wavelength represent chromatic stimulus information. The ensemble of the activity in all three systems represents luminance, and all three of the systems respond monophasically. In other words, the amount of induced neural activity is directly proportional to the stimulus intensity. The word opponent, on the other hand, refers to a neural coding language in which the activity in each of two systems may increase or decrease depending upon the wavelength of the light; in other words, the response is biphasic or even triphasic. In such a system a separate third black-white luminance system must be invoked. The word trichromatic can also be used to describe the psychophysical fact that three degrees of freedom are necessary to match any color. In this sense, both the trichromatic and the opponent neural systems are trichromatic. This is not the way I use the term here.
These two encoding schemata (the opponent and the trichromatic systems) and the transformation logic that converts one code to the other are diagramed in Fig. 8-40. The details of this figure should not be taken too literally. There are many different variations on the same theme that would produce informationally identical mechanisms.
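One such variation can be written out as a minimal sketch. The weights below are hypothetical, chosen only so that the arithmetic is easy to follow; they are not read off Fig. 8-40 and are not those of Hurvich and Jameson (1957). The function names are likewise illustrative. The point is that three monophasic receptor activities (R, G, B) can be recombined into two biphasic chromatic signals (b-y, g-r) and one luminance signal (bk-w), and that the recombination can be undone.

```python
def to_opponent(r, g, b):
    """Recode three monophasic receptor activities into opponent signals.

    Weights are illustrative assumptions, not measured values.
    """
    b_y = b - (r + g) / 2.0      # positive = bluish, negative = yellowish
    g_r = g - r                  # positive = greenish, negative = reddish
    bk_w = (r + g + b) / 3.0     # mean activity, standing in for luminance
    return b_y, g_r, bk_w


def to_trichromatic(b_y, g_r, bk_w):
    """Algebraic inverse of to_opponent for the same hypothetical weights."""
    b = bk_w + (2.0 / 3.0) * b_y
    g = bk_w - b_y / 3.0 + g_r / 2.0
    r = bk_w - b_y / 3.0 - g_r / 2.0
    return r, g, b


# Equal activity in all three receptors gives no chromatic signal at all.
print(to_opponent(1.0, 1.0, 1.0))
# Activity in the R receptor alone drives both chromatic signals negative.
print(to_opponent(1.0, 0.0, 0.0))
# The receptor activities can be recovered from the opponent code.
print(to_trichromatic(*to_opponent(0.2, 0.7, 0.4)))
```

Because the mapping is invertible, any other invertible set of weights would carry the same information, which is one way of reading the remark that many variations are informationally identical.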
The particular version that I present here differs from the one originally proposed by Hurvich and Jameson (1957) as well as from some newer models of these chromatic recoding circuits proposed by Guth and Lodge (1973), Massof (1977), and Ingling (1977). Figure 8-41, for example, presents two other alternative, but equally plausible, models that also explain how a trichromatic receptor layer could feed into an opponent-color secondary layer. Another even more comprehensive and detailed model has been proposed by Ratliff (1976). This model (depicted in Fig. 8-42) suggests six levels of processing from the receptors to the opponent cells, some of which involve such neurally complex processes as differentiation and compression. Clearly, we are not talking about simple lateral inhibition and convergent summation, but about neural interaction of a much more complicated kind, when we consider the translation from a trichromatic receptor code to an opponent bipolar code.
However, other models involving more than three types of opponent mechanism at other levels of the visual pathway have also been suggested. A good summary of the current status of the problem is presented by De Valois and De Valois (1975). They assume six different kinds of opponent cells in the fourth layer (the lateral geniculate body) on the basis of Russell De Valois' classic studies
FIG. 8.40. A schematic sketch of a plausible neural mechanism, which could convert a trichromatic (R, G, and B) photoreceptor coding scheme to a more central opponent mechanism (b-y, g-r, and bk-w). (From Uttal, 1973, as adapted and modified from Hurvich and Jameson, 1957.)
(De Valois, Abramov, & Jacobs, 1966) of the responsivity of neurons at that level. Of course the existence of six different kinds of opponent cells does not mean that more than three degrees of freedom are required for a psychophysical color match, only that some combinations of these neural mechanisms are redundant with other combinations. To add to the elegance of the system, it has even been suggested that the recoding accomplished within such a system is not stable but can vary as determined by the momentary state of dark adaptation (Ingling, 1977). Such a finding also raises the possibility that this recoding process involves interactive inputs, not only from the three cones but perhaps also from the rods. It should not be overlooked, however, that the various boxes in Figs. 8-41 and 8-42 need not necessarily indicate individual neurons. The flow of information in this case may be between larger units of organization; each block in these hypothetical diagrams may represent a functional system of neurons rather than a
FIG. 8.41. Two alternative models of the neural network converting trichromatic to opponent codes in the retina. (From Ingling, © 1977, with the permission of Pergamon Press.)
single cell. Nevertheless there is strong evidence, as I have noted, that the recoding is carried out at least once between the receptors and the bipolars even though further transformations, it seems likely, must occur at higher anatomic levels.
In spite of the transformation from a trichromatic code to an opponent one at the receptor-bipolar interface, a trichromatic representation can be regenerated at higher levels of the nervous system by appropriate inverse transformations. It is possible, for example, to identify some trichromatically encoded neurons in the visual cortex (Anderson, Buchmann, & Lennox-Buchthal, 1962; Lennox-Buchthal, 1962; Motokawa, Taira, & Okuda, 1962), even though most cortical neurons appear to be opponent types if they are differentially sensitive to color at all. It is, therefore, important to appreciate that the trichromatic information initially introduced as a result of the characteristic absorption spectra of the receptors is not lost when this information is recoded into an opponent set of symbols by the kind of neural network interactions exemplified by Figs. 8-41 or 8-42. The existence of the psychophysical "trichromatic fact" (any color experience can be matched by a mixture of any other three fundamentals) clearly shows that some properties of the message created by the receptors are maintained at the highest neural levels even if the neural language in which it is encoded is no longer trichromatic in the other sense of this word. Thus, despite the recodings, the trichromatic receptors have left this trace in the visual message in a way that has subsequent psychophysical implications.
In exactly the same sense, it is not unreasonable to assume that whenever the neural signals are recoded from one neural coding language (e.g., trichromatic) to another (e.g., opponent), the new coding schema will also insert some new
properties into the message that can also have an observable psychophysical result. Indeed, present theory and empirical fact imply that this is exactly what happens. One has only to look back over the history of research in color vision to appreciate that the controversies that existed between the two major theoretical positions (the Young-Helmholtz trichromatic theory and the Hering opponent-color theory) could have been maintained only because both were at least partially valid. There are some psychophysical experiments that reflect the impact of the trichromatic receptors as I have previously noted; there are, in addition, some psychophysical experiments that reflect neural interactions and recoding of the signals from trichromatic to opponent-color mechanisms. There was, in fact, no fundamental inconsistency between the two theories (other than their detailed hypotheses concerning the nature of the receptors) but, quite to the contrary, a substantial complementarity. The key idea behind the contemporary rationalization of the two theories is that each describes a different level of coding and analysis. There were real differences between the two theories, of course, but only in terms of the physiological implementation that each assumed. Hering was
FIG. 8.42. An even more complicated neural model explaining the conversion of trichromatic to opponent codes in the retina. This model consists of six stages: 1. photoreception; 2. transduction; 3. lateral summation; 4. adaptation; 5. differentiation; and finally, 6. compression. (From Ratliff, © 1976, with the permission of MIT Press.)
wrong (we now know that there are no opponent receptors), but this was an anatomical detail that could not be ascertained until the advent of microelectrophysiology in the 1960s. He was entirely correct in assuming, however, that at some level in the nervous system opponent mechanisms existed. But so were the proponents of the Young-Helmholtz theory correct in assuming that at some level trichromatic mechanisms also exist. The levels and sites at which the various mechanisms were actually operating were anatomical and electrophysiological details to be worked out later.
Our concern here, though, is with the perceptual phenomena that arise from the neural interactions underlying this recoding. In this category I would place all of the substantial body of data that pertains to opponent-color psychophysics as well as those anecdotal reports that suggest that there may be a fourth "primary" color, yellow, beyond the usual triad suggested by trichromatic theory. In the remainder of this section I consider the phenomenal traces of the opponent mechanism that occur as a result of this Level 2 recoding process.
b. The Phenomena
As one scans the historic literature of color vision, the idea constantly recurs that there is something special about yellow, so that it is not "perceived" in psychological experiments as a mixture of other colors as are such colors as greenish-blue or yellowish-red (orange). Rather, some visual scientists in the field have traditionally considered yellow to be "psychologically" just as primary as red, green, and blue. This statement is difficult to interpret, for at first glance it is not exactly clear why other color-naming situations might not lead to a wide variety of other so-called fundamental colors.
In spite of the vagueness and confusion regarding the definition and existence of a "fundamental yellow," there is a substantial amount of hard psychophysical data that supports the idea that there are links between certain color pairs that either operate together or in opposition. These kinds of data have been far more compelling than the possible existence of a vaguely defined "fundamental yellow." One of the most important of these pieces of evidence, the existence of complementary colors, is represented on the chromaticity diagrams shown in Figs. 6-8 and 6-9. Complementary colors are defined as those pairs of colors which, when mixed in appropriate amounts, produce a completely colorless or unsaturated white light. Complementary colors are represented on the diagram as the colors at the ends of any straight line whose colorimetric center of gravity lies in the central white region. Thus there are many colors that tend to cancel out the chromaticity of a complementary partner and that, therefore, presumably may be linked at some physiological or anatomical level. A number of other ways in which pairs of colors seemed to behave in opposition have been summarized by Hurvich and Jameson (1957) as they argue for an opponent-color theory:
how can a system of three independent processes be made to account, for example, for the apparent linkages that seem to occur between specific pairs of colors as either the stimulus conditions or the conditions of the human observer are varied? Why should the red and green hues in the spectrum predominate at low stimulus levels, and the yellow and blue hue components increase concomitantly as the spectrum is increased in luminance (von Bezold, 1873)? Why, as a stimulus size is greatly decreased, should discrimination between yellow and blue hues become progressively worse than that between red and green (Farnsworth, 1955; Hartridge, 1949)? Why should the hues drop out in pairs in instances of congenital color defect or when the visual system is impaired by disease (Judd, 1949; Kollner, 1912)? [pp. 384-385]
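The geometric definition of complementary colors given before the quotation (two colors at the ends of a straight line whose center of gravity falls on white) can be made concrete with a small sketch. It assumes a flat, CIE-style chromaticity diagram in which mixtures lie on the straight line joining their components; the function name, the coordinates, and the tolerance are hypothetical illustrations, not values taken from Figs. 6-8 or 6-9. The returned weights are center-of-gravity weights in the diagram's own mixing units, not luminances.

```python
def complement_weights(c1, c2, white, tol=1e-9):
    """Return center-of-gravity weights (w1, w2), with w1 + w2 == 1, such that
    w1*c1 + w2*c2 falls on `white`, or None if the pair is not complementary.

    c1, c2, white are (x, y) chromaticity coordinates (hypothetical inputs).
    """
    (x1, y1), (x2, y2), (xw, yw) = c1, c2, white
    dx, dy = x2 - x1, y2 - y1
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return None  # identical chromaticities cannot bracket white

    # The cross product is zero only when white lies on the line through c1, c2.
    cross = dx * (yw - y1) - dy * (xw - x1)
    if abs(cross) > tol:
        return None

    # Position of white along the segment: 0 at c1, 1 at c2.
    t = (dx * (xw - x1) + dy * (yw - y1)) / length_sq
    if not 0 < t < 1:
        return None  # white is on the line but not between the two colors
    return (1 - t, t)


if __name__ == "__main__":
    # Invented coordinates chosen so that white sits on the connecting line.
    print(complement_weights((0.2, 0.2), (0.6, 0.6), (1 / 3, 1 / 3)))
```

The weight attached to each color grows as white moves toward it along the line, which is the sense in which the pair must be "mixed in appropriate amounts."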
It has also been noted (Hurvich & Jameson, 1957; Linksz, 1964) that it is impossible to conceive of or to find words to describe a reddish-green hue or a yellowish-blue hue. The difficulty in finding a color for which we would use such color names, they believe, reflects the biological fact that the relationship between the percepts of blue and yellow, on the one hand, is different from that between yellow and red, on the other. But such data also suffer from the same difficulties as does the distinction of yellow as a primary color: color names are based on word usage and subjective judgments that are peculiarly elusive when one attempts to precisely define the operations involved in their elicitation. All we can say with some assurance now is that all of these questions and difficulties that I have just mentioned are probably explained in terms of opponent-type mechanisms. A more compelling and quantitative body of supporting data has been developed by Jameson and Hurvich (1955) in their attempt to develop a quantitative opponent-color theory. Noting that a relatively wide range of spectral hues produces a partial experience of yellowishness, blueness, greenness, and redness, they attempted to determine the relative strength of each of these qualities by mixing varying amounts of a postulated opponent color with each hue-inducing wavelength until all traces of the original color disappeared. Thus, for example, a band of stimulus wavelengths varying from about 500-700 nm would produce color responses that were reported by the subject to have at least some yellowish tone. Various amounts of blue light would then be mixed with each of a series of wavelengths within this band, and the amount of blue required to cancel completely any "yellowishness" was measured as an indicator of the strength or chromatic valence of the yellow response at each wavelength. Figure 8-43 shows a sample set of cancellation data for the visible spectrum.
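The logic of the cancellation measurement just described can be sketched in a few lines. The sketch assumes, purely for illustration, that the yellow-blue response is a signed quantity (positive for yellow, negative for blue) and that responses to mixed lights add linearly. The function names and all numerical values are hypothetical and are not Jameson and Hurvich's data.

```python
def cancellation_amount(test_response, canceller_per_unit):
    """Amount of cancelling light needed to null a signed opponent response.

    test_response: signed response to the test wavelength (assumed model).
    canceller_per_unit: signed response per unit of the cancelling light;
        it must have the opposite sign to the test response.
    """
    if test_response == 0:
        return 0.0  # nothing to cancel
    if test_response * canceller_per_unit >= 0:
        raise ValueError("the cancelling light must evoke the opposite response")
    return -test_response / canceller_per_unit


def residual_response(test_response, canceller_per_unit, amount):
    """Response left after adding `amount` of the canceller (linear additivity)."""
    return test_response + amount * canceller_per_unit


if __name__ == "__main__":
    # Invented yellow responses at a few wavelengths (nm) and an invented blue
    # canceller that contributes -0.5 response units per unit of light.
    yellow_response = {520: 0.2, 550: 0.6, 580: 0.9, 620: 0.5}
    blue_per_unit = -0.5
    for wavelength, response in yellow_response.items():
        amount = cancellation_amount(response, blue_per_unit)
        left = residual_response(response, blue_per_unit, amount)
        print(wavelength, amount, left)
```

Under these assumptions the amount of cancelling light is directly proportional to the strength of the response being cancelled, which is why the required amount can serve as an index of chromatic valence at each wavelength.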
These data are plotted in terms of the amount of the opponent color that had to be added at each wavelength to eliminate any residual "redness, yellowness, greenness, or blueness." On this graph, it can be seen that the maximum amount of blue required to cancel yellowishness from one band (varying from 490-650 nm)
[Figure: chromatic response curves (blue, yellow, red, green) and the white (photopic) response, plotted from -1.00 to +1.00 against wavelength (nm), 400-700.]
FIG. 8.43. Chromatic response functions for a single subject, indicating how much opponent color must be added to eliminate the residual chromatic effect of the colors indicated. The photopic luminosity curve is also shown for this subject. (From Jameson and Hurvich, 1955.)
was required at about 530 nm and that the function dropped off on both sides of this wavelength. A wavelength band varying from 480-580 nm, on the other hand, elicited some green experience, which had to be neutralized with a red, the largest amount of which was required at about 520 nm. The fact that a single wavelength should produce some green and some yellow should not be too surprising. There is a range of wavelengths whose color names include greenish-yellow and yellowish-green, for example. When adequate amounts of red had been introduced to completely neutralize the green, the residual color would be a yellow.
When adequate amounts of blue had been introduced to completely neutralize the yellow, the residual color would be a green. The curve also shows that the band of wavelengths that induces blue color experiences runs from 430-480 nm. These colors had to be neutralized with yellows, and maximal amounts of yellow were required at about 450 nm. The red curve, on the other hand, is somewhat peculiar because reddishness is an experience that is introduced by both long and short wavelengths. At the shorter wavelengths, the experience of violet includes chromatic experiences that most people describe as including some reddishness. At the longer wavelengths, the sensations are color-named as red or orange. To remove all of the reddishness from a short wavelength of about 400-470 nm, green light had to be added, peaking in the amount required for neutralization at about 440 nm. At the longer wavelengths, the range of red-inducing stimuli was about 580-700 nm, and the peak amount of green required for neutralization occurred at about 620 nm. It should be noted that there are a number of difficulties with the Jameson and Hurvich neutralization procedure that preclude a very simple explanation of these data. First, it was both necessary and practically difficult to define, for each of these basic colors, exactly what the bandwidth of spectral wavelengths is that evokes the particular sensation. How can you be sure there is no yellowishness at a particular wavelength? Second, it is difficult to determine exactly when all of that yellowishness might be gone. The neutralized color often became an unsaturated rather than a saturated version of one of the other opponent pair. Thus, blue and a yellowish tone could be mixed together, and the observer might be faced with deciding whether there was any yellow in a resulting green or red field. 
Third, a more direct problem is that the data obtained with Jameson and Hurvich's hue cancellation do not agree with the data obtained in direct matching techniques (Ingling, Russell, Rea, & Tsou, 1978). Short-wavelength sensitivity is drastically overestimated by the former technique. It is possible, therefore, that the basic empirical facts as well as their interpretation may be open to question. Nevertheless, these experimental results are considered to be classic demonstrations of phenomena in color vision that are more likely to be attributable to the opponent coding mechanisms that exist at higher levels of the visual pathways than to the receptors. It is important to note, furthermore, that there are really two quite distinct ways in which the idea of opponent mechanisms plays an important role in color vision research. First, there are the well-documented electrophysiological observations that indicate the existence of neurons whose wavelength codes are clearly opponent in nature. Second, there are the opponent-color models that are used to explain and describe a wide variety of different data ranging from the Bezold-Brücke effect through color blindness to adaptation to color cancellation. The richness of the controversy in color theory these days is at least partially due to the fact that both a trichromatic and an opponent model can do a fairly good job of representing much of the data that has been obtained. In this section I have
concentrated on the phenomena that seem to reflect most directly opponent mechanisms and thus neural interactions in the visual pathway. It is likely, however, depending upon their personal orientations, that many of my colleagues would not necessarily agree with the attributions that I have made here. For reasons that are more or less convincing to each individual theoretician, these phenomena may be classified differently by others. The theoretical explanations based on opponent mechanisms (which are contrary to the receptor level hypothesis I have championed) of such phenomena as the threshold for color (Massof, 1977) or the Bezold-Brücke phenomenon (Krantz, 1975b) are two examples of the kind of difference in attribution to which I allude. In any discussion of opponent theory, the special role of Dorothea Jameson and Leo Hurvich in promoting opponent-type theories should not be overlooked. Their leadership as latter-day proponents of opponent models of many visual phenomena is surely evidenced by their fine analyses in Volume VII/4 of the monumental Handbook of Sensory Physiology (1972; Chapter 14 by Jameson; Chapter 22 by Jameson and Hurvich; Chapter 23 by Hurvich). Others have also helped to extend the opponent hypothesis far beyond Hering's original intent. In an exciting series of articles, Carl Ingling and his colleagues have expanded upon the Hurvich and Jameson (1957) transformation model to show how the recoding of the information from the three chromatic receptor channels, combined with the center-surround properties of higher-order neurons, leads to opponent mechanisms that seem to explain such diverse phenomena as the specific spectral sensitivities of the opponent mechanisms, Stiles' π mechanisms (processes?), color saturation, and hue discrimination (e.g., Ingling and Tsou, 1977).
Figure 8-44, for example, shows the theoretically predicted spectral sensitivities of the two opponent mechanisms and the luminance mechanism based upon one model of recoding for both light and dark adapted states. This figure should be compared to the psychophysical data previously obtained by Jameson and Hurvich (1955, Fig. 8-43) to show the power of Ingling's model. Note also the great difference in spectral sensitivity between the light- and dark-adapted states in Fig. 8-44, another finding suggestive of a rod input to the opponent mechanism. The specific details of Ingling's theory of the convergent interaction leading to these phenomena are spelled out in a companion article (Ingling, 1977). Another important theoretical development is the work of David Krantz of the University of Michigan. In two very important papers (Krantz, 1975a, 1975b), he proposed a formal mathematical model of opponent-color processes. Krantz' model represents every color experience as a multidimensional vector. These vectors represent the degree of correlation occurring between the different opponent processes (red-green, blue-yellow) as described in Jameson and Hurvich's classic study. Although Krantz' model is not phrased in anatomical or neurophysiological terms, it is explicit in its assertion that color experiences arise as a result of opponent mechanisms somewhere in the nervous system.
FIG. 10.27. The ratio of the time that a figure was grouped in the variable direction (Tv) compared to the time it was grouped in the constant spacing direction. (Reprinted with permission of Author and Publisher from: Oyama, T. Perceptual grouping as a function of proximity. Perceptual and Motor Skills, 1961, 13, 305-306, Figure 1.)
persuasion) were dealing with phenomena that were as inherently regular and quantifiable as any other in the repertoire of perceptual science. It is especially disappointing from this point of view that there should have been no continuing systematic effort to establish programmatically either the magnitude or the priority of the relationships between the other stimulus factors in this important subarea of perceptual research. In recent years, the emphasis in the study of these rules of perceptual organization has changed dramatically. The current stress is on how to use them as independent variables rather than to measure them as dependent variables. The goal in the most modern research is usually to determine the effect of grouping on a task such as target detection or memory rather than to study the factors that influence grouping. A general outcome of this emphasis has been to highlight further the fact that simple feature-detection models are not, in general, adequate as models of target-detection behavior and to show that the influence of configurational or organizational factors on memory and detection is even more powerful than the influence of local features. The theory tested in many such studies has been the specific feature-detection model proposed by such workers as Estes (1972, 1974) and Bjork and Murray (1977). The general philosophy expressed by these psychologists is that there are independent channels (conveying feature-specific information) that interact to account for the masking effects of letterlike stimuli⁶ on each other. Samples of the type of stimuli typically used are shown in Fig. 10-28. I refer to this type of masking paradigm as masking of the fifth kind, that is, a reduction in visibility produced by the entanglement of an element of a target pattern in a Gestalt or grouped pattern in a way that reduces the probability of its being detected or recognized as an independent element.
The major counterargument to the Estes, Bjork, and Murray feature-sensitive channel theory, which clearly is of a genre similar to those that I criticized in the previous chapter, has been the demonstration that the Gestalt configurational effects, which are not well accounted for in such theories, are in fact dominant in modulating the detection of letters in this kind of experiment. Such an argument has been presented by Banks, Bodinger, and Illige (1974); Banks and Prinzmetal (1976); and Prinzmetal and Banks (1977). In the first of these papers, Banks and his colleagues (1974) reported that increasing the separation between a target T letter and a set of masking F letters increased the detectability of the target by perceptually extracting it from a group in which it had been embedded. This
⁶ There is good evidence to support the idea that letter masking of this type and dot masking are not due to the same underlying processes. I (Uttal, 1975a) have shown minimal effect of figural goodness on random dot pattern masking, and Banks and Prinzmetal (1976) and Prinzmetal and Banks (1977) have shown strong effects of goodness on letter-masking situations of the type referred to here. On this basis I consider these two phenomena to be masking of different kinds and to represent an even further subdivision of the masking taxonomy.
[Figure panels: first display (premask); second display (stimulus); third display (postmask and cue).]
FIG. 10.28. The sequence of stimuli presented in a single trial in Bjork and Murray's experiment. (From Bjork and Murray, © 1977, with the permission of the American Psychological Association.)
experiment was well controlled for lateral interaction effects; the authors thus felt confident in attributing the release-from-masking effect purely to the reduction of a Gestaltlike proximity factor rather than to a reduced spatial interaction between any putative feature-sensitive channels. In the second of these studies, Banks and Prinzmetal showed that the grouping of the elements in the target and the mask strongly affects both the speed and accuracy with which a target letter could be detected and recognized. If the stimulus pattern was organized so that the target letter was a part of a Gestalt or perceived grouping that included the noise characters, then the target letter was detected less well than when it was separated from the group. This reduction in visibility of the target occurred even when the number of masking letters was larger in the well-grouped stimulus than in the poorly grouped condition, as shown in Fig. 10-29A and B, respectively. In the third study, Prinzmetal and Banks (1977) discovered that a target character could be hidden by being made a perceptual part of a pattern that exhibited good continuity, as shown in Fig. 10-30A. The target letter F in this case is far less easily detected when it is at the end of a line of characters (of which it becomes a perceptual part) than when it is placed alongside the line of
FIG. 10.29. (A) An example of a stimulus in a "gestalt" masking trial. (B) An example of a stimulus with a larger number of masking characters. (From Banks and Prinzmetal, © 1976, with permission of The Psychonomics Journals, Inc.)
masking characters. This result is so robust that it holds even when the latter discontinuous positioning would bring the target into closer proximity to a larger number of masking characters (as shown in Fig. 10-30B) than when it was at the end of the line. The important point made by these three experiments is that the global configuration of the stimulus exerts a powerful, if not dominating, effect even on such processes as target detection, which are often considered only in terms of the details of their local geometry. From my perspective, there is no way that such phenomena can be satisfactorily attributed to the function of hypothetical "channels" or "feature detectors" in the way that the Estes, Bjork, and Murray theories attempt. The masking of characters by characters, as described in the experiments I have just discussed, is not the only way in which powerful configurational effects, akin to the Gestalt factors, on target detection can be demonstrated. In another kind of experiment, Weisstein and Harris (1974), Williams and Weisstein (1978), and McClelland and Miller (1979) have all demonstrated a similar configurational impact on the detection and recognition of straight-line segments. The observer's task in Weisstein and Harris' study, for example, was to report which one of four straight-line segments (as shown in Fig. 10-31A) had been tachistoscopically presented. The lines could be flashed separately or as parts of patterns that had "good" organizational properties (as shown in Fig. 10-31B). Control stimuli, made up of the same number of adjacent straight-line segments but in less well-ordered patterns, made it clear that this was not just an effect of the additional lines, but truly one of "object" (or Gestalt) superiority of the organized pattern over the disorganized ones. As powerful and well substantiated as are the organizational influences on perception, it is disappointing to note that, beyond the Gestaltist's "rules," there are few modern metatheories of perceptual grouping. Perhaps the most interesting approach to what is actually such a theory of grouping is presented, incorrectly I believe, in an entirely different context. I am referring here to the work on the "interpolation" of dotted contours reported by T. M. Caelli and his colleagues at the University of Melbourne. This work (Caelli, Preston, & Howell, 1978; Caelli & Umansky, 1976) also provides an interesting set of psychophysical data concerning completion and closure processes.
FIG. 10.30. An example of a situation in which a target letter might become less detectable if it is part of a "group" (A) than when it is isolated (B), even though the spatial relations of the latter case bring the target into greater proximity with the elements of the mask. (From Prinzmetal and Banks, © 1977, with the permission of The Psychonomics Journals, Inc.)
FIG. 10.31. (A) Sample of the line segments (a-d) used as targets in the Weisstein and Harris experiment. (B) The same line embedded in a spatial context. (From Weisstein and Harris, © 1974, with the permission of the American Association for the Advancement of Science.)
FIG. 10.32. (A) A sample of the stimuli used in the Caelli, Preston, and Howell experiment. (B) The tangent vectors purported to be computed by the visual system. (From Caelli, Preston, and Howell, © 1978, with the permission of Pergamon Press.)
Before I begin this discussion, I should note that Caelli may well feel that I am misplacing his experiments and theory in my taxonomy. As I understand his work, he suggests that his data describe the emergence of percepts of continuous contours from discontinuous stimuli as a result of interpolative processes. However, I maintain that the work of Caelli and his colleagues does not, in fact, involve the perception of continuous contours but rather the perceptual grouping of discrete points better described as a constellation of those points. To make this criticism clearer, consider the experimental design. In Caelli, Preston, and Howell's experiments (1978, p. 728), the observer is typically asked to: "Join up the dots (with a pencil on a paper test sheet) which correspond to the contours or shapes which spontaneously appeared when looking at each display." One of their typical displays is shown in Fig. 10-32A. The point of my criticism is that both the response and the instructions in this experiment are actually ambiguous. The observer probably does not actually see continuous contours, nor does any part of the task or the response delve into the issue of continuity or discontinuity. Rather, the observer is more likely to be seeing groupings of dots akin to a stellar constellation without the awareness of continuous contours suggested in Caelli's interpretation of the findings. In other words, the observer is not interpolating between the dots. Instead, he is organizing and grouping them in much the same way as in any of the other Gestalt-type phenomena I have just described. The perceived patterns are still discontinuous, however, and not completed in the same way as in the closure phenomena discussed later in this section.
Caelli's mathematical theory, on the other hand, which is based upon the development of tangent vector fields determined by the spatial positions of the dots, does provide one of the few good descriptions of how dots might interact in visual (as opposed to physical or neural) space to produce organized patterns. The tangent vector in Caelli's theory is a mathematical construct that describes possible perceptual interactions among the components of a dotted stimulus. Figure 10-32B, for example, shows the tangent vector field associated with the stimulus that I previously presented in Fig. 10-32A. These vectors are formed, according to Caelli's theory, primarily as a result of a summation of the interdot interactions that are most heavily, but not exclusively, influenced by the distances between pairs of the dots. Thus the vectors associated with widely separated dots will be very small, whereas those associated with closely spaced dots
will tend to be both longer and, most important, aligned along the major organizational axis defined by the stimulus. Caelli thus is asserting that these vectors are not solely dependent on the local distance between pairs of dots but, because the distance between many pairs of dots is taken into account in defining the vector, also on the global organization of the pattern. Caelli's "vectors" effectively describe "figural forces" directly comparable to Prägnanz or proximity that are known to affect grouping, but there is little basis for his assertion that these vectors themselves come into awareness in the form of continuous contours. What he has done is to sharpen our language and to provide a more formal description of the Gestalt ideas. We also must not forget, however, that Caelli's vectors in no sense should be considered to be actual properties of the stimulus. Instead they describe the outcome of interactive processes in the brain. Whether these processes reflect the function of local neural interactions in the visual cortex or more subtle symbolic interactions of higher perceptual levels is moot at the present time. But my intuition tells me that the latter language is probably more appropriate.
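The qualitative behavior attributed to these vectors (short for widely separated dots, longer and aligned with the main axis for closely spaced dots) can be illustrated with a toy computation. The sketch below is not Caelli's formulation: the exponential distance weighting, the angle-doubling step, and the function and parameter names are all assumptions made only to show how a summation of pairwise interdot interactions could yield such a field.

```python
import math


def orientation_field(dots, scale=1.0):
    """Return one (length, angle) pair per dot.

    Each pair summarizes the distance-weighted directions from that dot to all
    the others. `scale` (hypothetical) sets how quickly influence falls off.
    """
    field = []
    for i, (xi, yi) in enumerate(dots):
        c = s = 0.0
        for j, (xj, yj) in enumerate(dots):
            if i == j:
                continue
            dx, dy = xj - xi, yj - yi
            weight = math.exp(-math.hypot(dx, dy) / scale)  # assumed falloff
            theta = math.atan2(dy, dx)
            # Doubling the angle makes neighbors on opposite sides of a dot
            # reinforce a common axis instead of cancelling each other.
            c += weight * math.cos(2 * theta)
            s += weight * math.sin(2 * theta)
        field.append((math.hypot(c, s), 0.5 * math.atan2(s, c)))
    return field


if __name__ == "__main__":
    close_row = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    wide_row = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
    print(orientation_field(close_row)[1])  # middle dot, closely spaced row
    print(orientation_field(wide_row)[1])   # middle dot, widely spaced row
```

Because every other dot contributes to each sum, the result at a given dot depends on the whole pattern and not only on its nearest neighbor, which is the sense in which such a field reflects global organization as well as local distance.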
3. Interpretive Processes
In the preceding sections I have dealt with the various ways by which the perceptual system responds to stimuli that are relatively well defined and unambiguous. The perceptual responses to these stimuli are analogous to those that could be generated by some ideal automaton. In that discussion it was often implicit that it was against such an automaton that a human observer was compared and evaluated. There are, however, many other situations in visual perception in which the stimuli are incomplete or ambiguous rather than definitive of a unique perceptual response. In those cases the percept depends to a much more substantial degree on the interpretations that the observer, as an active, inferential agent, may place on the incomplete or ambiguous stimuli. In some cases, the perceptual process may actually fill in lacunae and gaps in an incomplete stimulus so that the observer perceives a physically incomplete stimulus to be complete. In some cases, the resulting interpretation may not be stable but may periodically alternate between equally plausible perceptual constructions. Demonstrations of such figural reversals and completions have been a popular part of any discussion of visual perception for many years. These demonstrations, however, play a much more important role in the theory of perceptual science than simply as parlor games or examples of the virtuosity of the human mind in elementary textbooks. The major theoretical impact of these illusions of completion, reversibility, extrapolation, or alternative perceptual constructs is that they collectively reflect the ubiquity of the constructionistic role played by the higher levels of the visual nervous system on what may seem to be relatively primitive and automatic phenomena. These interpretive illusions, perhaps better than any other artifact of perceptual science, illustrate the need for the
neorationalistic or constructionistic component in any metatheory of perception. In the most graphic and direct way, illusions of the class I consider here indicate that many perceptual responses are actively constructed from the symbolic cues and clues provided by the stimulus rather than passively determined by any aspect of the geometry of the stimulus acting strictly deterministically on the nervous system. The fact that there can be alternative and reversible perceptual constructions while the stimulus and, presumably, the state of the afferently transmitted neural signal remain nearly constant, is another strong argument for considering the nervous system as an active participant in the perceptual process rather than some kind of automaton that is merely passively responsive to the stimulus-borne information. The phenomena to be considered in this section are well known to anyone who has had an introductory course in psychology, but in spite of that familiarity there has been little effort to organize them in a way that highlights the similarities and differences of the processes that account for them. In order to provide some order to the discussion, I propose the following microtaxonomy of what I call here interpretive processes. Two main subclasses of these phenomena are identified. The first consists of processes that lead to the phenomena of closure, extrapolation, and completion. In other words, these are the processes that lead to percepts that are more complete than the stimuli that elicited them. The second has to do with processing of stimulus conditions that are intrinsically ambiguous and that, therefore, are often interpreted as alternative perceptual experiences. This second subclass therefore includes the wide variety of processes leading to what have come to be called reversal phenomena.
a. Closure and Completion.
One of the most impressive perceptual processes displayed by the human observer is the ability to transform an incomplete stimulus into a full and complete experience. This completion, closure, or filling in (any of these words will do), can be amazingly thorough; great gaps in the physical stimulus are perceptually closed up by mechanisms of which we have very little knowledge. A universally observed instance of such a completion process is evidenced by the absence of any awareness of the substantial blind spot (approximately 3 deg of visual angle) on the retina. This large lacuna results from the absence of receptors in the retinal region where the retinal blood vessels and the optic nerve enter and leave the eye. Under most normal conditions of viewing, however, we are oblivious to this broad hiatus in the temporal portion of the external visual field (corresponding to the nasal location of the blind spot) in each eye. Even structured fields, such as checkerboards, appear to be complete when viewed under other than the most abnormal conditions. Retinal lesions can also produce pathological lacunae that have functional properties very similar to the normal blind spot. Both normal and pathological "blind spots" have been well known and the object of psychological research for over a century. Hermann von Helmholtz had written about it in his magnum opus on physiological optics
(Helmholtz, 1867) and Poppelreuter (1917) had studied completion in patients with visual field defects due to neurological damage. One explanation that has often been proposed to explain the continuity of vision across the blind spot is characterized by a radical isomorphic premise. This hypothesis assumes that because there are no brain regions corresponding to the blind spot, the regions geometrically adjacent to where it would have been are actually in anatomical contact with each other; the brain space and the perceptual space are congruent. In other words, because there is no missing cortical tissue, there is no discontinuity of neural representation to be perceived. This hypothesis, however, is contraindicated by the strong completion process that can occur equally well elsewhere in the visual scene. Completion is not restricted to the blind spot; there are many other examples of closure that occur even when the retinal "blind spot" is not involved. For example, we tend to "see" grounds continue behind figures and figures continue behind interpositioned objects. In those cases there is no "missing" neural tissue to be used as a conceptual crutch to explain this phenomenon. Completion, therefore, is not just an aberration of one localized region on the cerebral cortex but rather is a general property of visual processing throughout the entire visual field. The remaining questions are: How are the missing components, without regard to retinal locus, completed and filled in; and how can we be so insensitive to missing parts of the visual scene? Once again, it must be acknowledged that the underlying mechanisms of these powerful perceptual processes elude us. There is certainly no network hypothesis sufficiently complex to handle these phenomena and thus they can only be discussed phenomenologically. Even the best available analyses often turn out to be little more than recitation of the many instances in which the phenomena occur.
Thus what we can best do is simply to tabulate the wide variety of exemplars of completion and the conditions under which they occur. Indeed, there are many instances in both art and the technical literature to reinforce our appreciation of the ubiquity of the processes underlying perceptual completion, whatever they may be. As one example, the fractured figure shown in Fig. 9-11, though initially difficult to perceive as a complete and unitary form, becomes whole once the perceptual problem it represents is solved. The Belgian artist René Magritte (1898-1967) has also used the powerful perceptual tendency to fill in figures as the basis for his curiously disturbing paintings. One of these, shown in Fig. 10-33, is at first glance totally reasonable, yet the reasons for our discomfort with this picture become evident when it is examined in detail. The reader interested in more of these elegant examples of the completion process should refer to the work of Gombrich (1960); Carraher and Thurston (1969); and Parola (1969). Another striking illusion of completion, the subjective contour, has been brought to current attention by the work of Kanizsa (1955, 1974), although the original description of this phenomenon can be dated back to Schumann (1904)
FIG. 10.33. A painting by René Magritte showing the extraordinary amount of perceptual filling and closure that can occur in a very incomplete stimulus display.
and Rubin (1915). Figure 10-34 illustrates one example of this intriguing phenomenon. It should be noted that the phrase "subjective contour" is something of a misnomer. In fact, the illusion is often not just of contours but, as is clearly evident in this figure, of extended surfaces. The central rectangle in each of the drawings shown in Fig. 10-34 is not simply outlined by apparent edges; rather, each rectangular surface possesses illusory properties that are clearly different from those of its surround in spite of the fact that the physical properties of the two regions are identical. In particular:
1. There is a marked brightness difference between the illusory area and the surrounding area. The subjective rectangle in Figure 10-34A, for example, is brighter than the background. This brightness difference occurs even though both regions are of exactly the same luminance.
2. The depth of the illusory areas appears to be different from that of the backgrounds.7 This difference in apparent depth occurs even though all of the usual monocular and stereoscopic cues are identical in the two regions.
3. The illusory surface appears to be opaque and visually to obstruct the background. The background appears to continue behind the subjective surface.
4. Two or more subjective contours may exist simultaneously in the same region of space, as shown in Fig. 10-35.
5. The illusory surface tends to be the figure in the figure-ground relationship established for the scene and, like the figure in an ordinary scene, to be sharper in contour, more dense in texture, and more saturated in color than the ground that surrounds it.
Because subjective contours are not an essential part of our existence, what rationale is there for the development of such a mechanism? Kanizsa (1974) notes that the only feature that is common to all of the many subjective contour demonstrations is that there must be an incomplete figure in the original stimulus.
It seems possible, therefore, that the most logical way for the observer to construct a rational model of an incomplete object is to infer the existence of another intervening object. Thus, while the current enthusiastic interest in subjective
7 This difference in apparent depth between the illusory foreground and the background has led some perceptual psychologists (e.g., Coren, 1972) to link the mechanisms underlying subjective contours with those of stereoscopic vision. There seems little justification for this association, however, because the illusory or subjective contours are, if anything, enhanced in monocular viewing while stereoscopic perception is abolished. Spatial disparities play no role in the depth differential observed in the Kanizsa "subjective" triangle. The depth differences observed in the illusory contour seem to be more closely associated with some of the monocular cues to depth that I consider in Chapter 11. It seems likely that all of these mechanisms simply represent alternative means of providing information to the interpretive mechanisms constructing the depth experience. The error here, once again, appears to be the fallacy of arguing from analogy of phenomena rather than homology of process.
FIG. 10.34. (A and B) Two subjective surfaces generated by patterns of opposite contrast. (Courtesy of Dr. Gaetano Kanizsa of the University of Trieste.)
FIG. 10.35. Simultaneous generation of two subjective surfaces in an overlapping manner. (From Kennedy and Lee, © 1976, with the permission of Pion, Ltd.)
contours may be directed at the illusory surface, the perception of this surface may actually be epiphenomenal: a curious, but secondary, byproduct of the very strong tendency to fill in incomplete, but real, stimulus objects. It is this completion process that may be the adaptive and useful basis of the otherwise insignificant subjective surface phenomenon. Regardless of what the primary focus of research interest should be, and epiphenomenal or not, it is clear that the subjective contours and surfaces generated by this process are powerful and compelling percepts. It is possible to produce such standard geometrical illusions as the Poggendorff and Ponzo illusions with illusory contours (see Fig. 10-36A & B), and I have already noted Weisstein's (1970) demonstration of the ability of an illusory grating in an inferred background to mask a real grating in a subsequent test. Clearly the powerful completion processes giving rise to the subjective contours and surfaces are the outcome of neural computational mechanisms that are so complex that there is yet no satisfactory neuroreductionistic explanation nor, in my opinion, is one possible at the present time. In spite of this, neuroreductionistic models abound. Some theoreticians have suggested that these illusions may arise as a result of the "partial activation" of "line segment detectors." However, Kanizsa has convincingly shown this hypothesis to be inadequate by demonstrating the existence of curved subjective contours occurring in the absence of even partial activation of any putative contour detectors. Furthermore, even dot patterns, presumably incapable of stimulating line-sensitive detectors, can lead to the illusion. Demonstrations that make these two points are shown in Figs. 10-36C and D.
Further evidence that narrowly defined line detectors are not involved in the elicitation of subjective contours is to be found in an illusory surface that does not produce a contour but a diffusely terminating surface that approximates a glowing object. This illusion, invented by Kennedy (1976), is shown in Fig. 10-37.
Another important aspect of this demonstration is the additional support it provides for Kanizsa's conjecture that textured dot patterns, as well as lines, are capable of producing illusory surfaces. Kennedy and Ware (1978) have developed a wide variety of other illusory contours that can also be produced with dot patterns. The ability of observers to construct such elaborate subjective experiences from dot patterns argues strongly against a simplistic extrapolation
FIG. 10.36. (A and B) Demonstrations that geometric illusions can be produced by subjective surfaces. (C) Demonstrations that subjective contours can be evoked without "partial activation" of "line detectors." (D) Demonstrations that even as incomplete a stimulus as three dots can produce complex subjective surfaces. (Courtesy of Dr. Gaetano Kanizsa of the University of Trieste.)
FIG. 10.37. A stimulus pattern producing a subjective surface in the absence of sharp contours. (From Kennedy, © 1976, with the permission of Pion, Ltd.)
from Hubel and Wiesel's data on single-cell feature sensitivity to "line detector" explanations of subjective contours. Coren and Theodor (1977), another pair of proponents of neuroreductionistic theory for this aspect of perception, have suggested that lateral interactive processes may play a role in the brightness difference between the brighter illusory surface and the background, but they carefully qualify this hypothesis by asserting only that the current state of empirical knowledge "does not rule out this type of explanation." Another example of a different kind of neuroreductionism is Ratliff's (1965) and Ginsburg's (1975) proposal that the illusory figures in the Kanizsa patterns are a natural byproduct of a Fourier transform carried out by the visual nervous system as a result of its limited band-pass characteristics. Ginsburg, in particular, argues that the "subjective figure" is actually present in the physical stimulus and is extracted by the Fourier transformation. This hypothesis has been severely criticized on the following basis by Tyler (1977b), and also strongly cautioned against by Becker and Knopp (1978). Tyler's arguments, in abstract, are:
1. The Fourier processed simulation of the Kanizsa triangle produces a roughly textural surface, not the homogeneous surface reported by observers.
Why should we disregard this aspect of the transformation while at the same time accepting some others?
2. Ginsburg asks us to compare the filtered image visually with our perceptual response to the original image. This is, in fact, a double filtering that represents a "serious philosophical error."
3. Contours appearing in the Fourier-filtered image appear to be an artifact of Ginsburg's process, apparently involving a filter with an abrupt cutoff. Such a filter produces lines but is physiologically implausible.
4. Most of Ginsburg's simulations seem to be due to high-frequency cutoff, but when this is simulated by viewing a stimulus at a distance, the illusion does not occur.
5. Finally, even if it is true that the illusion is mainly associated with a particular band of frequencies, this does not mean that the illusory figure is "physically present in the stimulus."
I believe that Tyler's criticism is correct. It seems to me that what Ginsburg has done is to simulate an illusion, not to explain it. Indeed, Becker and Knopp (1978) demonstrate another similar model based on a convolution process that simulates the illusion equally well. What Ginsburg, Tyler, and Becker and Knopp all agree upon is that the high-frequency characteristics of the stimulus are necessary for seeing the edges and the low-frequency characteristics are necessary for its uniform surface brightness. In general, however, other more molar theories invoked to explain the properties of subjective contours are equally unsatisfying. A good summary review of theories of subjective contours is provided by Bachmann (1978). He lists the following molar approaches:
1. Gregory's (1975) and/or Kennedy and Lee's (1976) cognitive theories.
2. Kanizsa's (1974) and/or Pastore's (1971) Gestalt theories.
3. Coren's (1972) and/or Kaufman's (1974) depth-perception theories.
4. Frisby and Clatworthy's (1975) contrast theory.
5. Bachmann's (1978) own proposed resonance-by-feedback theory.
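Tyler's third objection, that an abruptly truncated frequency filter manufactures edge-like structure of its own, can be illustrated with a minimal one-dimensional sketch. It is not a reconstruction of Ginsburg's simulation or of Tyler's analysis. The 64-sample luminance step, the cutoff at the eighth harmonic, the Gaussian width, and all function names are assumptions chosen only for illustration. The sketch low-pass filters the same step profile twice, once with an abrupt cutoff and once with a smooth Gaussian roll-off, and measures how far each output strays outside the luminance range of the input.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (adequate for short profiles)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse transform; the real part is returned because the input is real."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * m / n)
                for k in range(n)).real / n
            for m in range(n)]


def low_pass(x, gain):
    """Scale each frequency component by gain(|frequency index|)."""
    n = len(x)
    spectrum = dft(x)
    # Indices above n/2 stand for negative frequencies.
    signed = [k if k <= n // 2 else k - n for k in range(n)]
    return idft([spectrum[k] * gain(abs(signed[k])) for k in range(n)])


def excursion(y, low=0.0, high=1.0):
    """Largest distance by which y leaves the original luminance range."""
    return max(max(y) - high, low - min(y), 0.0)


N = 64  # hypothetical number of samples across the display
# A bright bar (luminance 1) on a dark ground (luminance 0): two sharp edges.
step = [1.0 if N // 4 <= i < 3 * N // 4 else 0.0 for i in range(N)]

# Abrupt cutoff: pass harmonics 0 to 8 unchanged and discard the rest.
abrupt = low_pass(step, lambda f: 1.0 if f <= 8 else 0.0)
# Smooth roll-off: Gaussian gain with an assumed width of 4 harmonics.
smooth = low_pass(step, lambda f: math.exp(-f * f / (2 * 4.0 ** 2)))

print("excursion with abrupt cutoff:  ", excursion(abrupt))
print("excursion with smooth roll-off:", excursion(smooth))
```

Under these assumptions the abruptly truncated profile overshoots and undershoots near each edge (the familiar ringing of a truncated Fourier series), whereas the Gaussian-filtered profile stays within the original luminance range. Structure of this kind therefore says more about the shape of the filter than about the stimulus, which is the sense in which such contours can be called an artifact.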
None of these, from my point of view, really help very much in understanding the mechanisms underlying this compelling phenomenon. For example, there is considerable experimental evidence that the Gestalt model fails to provide a satisfactory explanation for subjective contours or surfaces. In particular, Elizabeth Warrington (1965) has studied the completion of figures across defective regions of the visual fields in patients with some sort of retinal or cerebral damage. The independent variable in Warrington's experiments was the type of stimulus that had to be completed across the field defect. The major finding of her study was that completion of figures that were not good (in the Gestalt sense) and that contained no internal cues to what the absent portion of the figure might
be occurred almost as well as figural completion processes in which the cues to completion were more obvious. Examples of these two kinds of stimuli are shown in Fig. 10-38. Nonsymmetrical figures were completed at a rate very close to that observed for the simple geometrical objects used as controls. The major factor determining the completion of any figure was the degree of the observer's previous experience with it. In other words, the Gestalt hypothesis proposed by Kanizsa was not substantiated. Warrington's results are important because they help to shift the locus of the completion phenomenon, of which subjective contours are only one important component, from the geometry of the stimulus to the interpretive and constructionistic capabilities of the observer. In this regard they argue also against both the simplistic single-cell theories invoking partial triggering and the Gestalt "stimulus Prägnanz" explanation. So far I have restricted this discussion to the completion of static spatial patterns. However, perceptual filling need not be considered only in a static and spatial context. It is also possible to observe perceptual filling and completion in dynamic temporal stimulus situations. Our ability to fuse the sequence of slightly differing image frames in a cinema is an example of the visual system's powerful ability to interpret temporally incomplete stimulus information in a way that produces a continuous perceptual experience. When considered in this context, the phenomenon of saccadic suppression, so often alluded to in the literature as an explanation of the "cinema phenomenon," can be interpreted from a different point of view. This alternative asserts that we ignore the stops between each frame of a movie or the blur in retinal images during eye movements, not because afferently conducted signals are suppressed but, at least in part, because these irregularities are smoothed out by a perceptual filling process.
(Do not forget, however, that Matin, 1974, has identified a wide variety of other influences beyond "filling in" as contributing to saccadic suppression; see Chapter 5.) There are other more precisely measured effects involving subjective contours that reflect their temporal properties. For example, von Grünau (1979) has shown that apparent movement can exist between subjective contours, and Smith and Over (1979) have shown the existence of a motion aftereffect induced by moving subjective contours. Along with the facts that gratinglike subjective contours can mask gratings and that subjective contours can produce geometric illusions, these
FIG. 10.38. 1, 2, and 3 are figures that are good in a Gestalt sense. 4 and 5 are not. Yet both are completed over retinal lacunae equally well. (From Warrington, © 1965, with the permission of the British Psychological Society.)
findings all attest to the fact that these perceptual filling-in processes can produce "virtual" stimuli that themselves can lead to responses comparable to those produced by real stimuli. In conclusion, the visual system exhibits a powerful ability to complete and fill in, in both time and space, missing parts of a visual stimulus. The processes underlying these phenomena, in spite of suggestions to the contrary, clearly seem to depend on exceedingly complex neural mechanisms, probably beyond current understanding in any truly reductionistic sense. It seems that they are best considered to be the result of interpretive and symbolic processes described with a neorationalistic theoretical vocabulary.
b. Reversible Figures. In the preceding section I considered a number of visual processes in which the failure of stimulus-percept veridicality was so great that the resulting phenomena had to be considered to be the outcome of interpretive or constructive processes that transcended the informationally incomplete cues provided by the stimulus. That is, the information contained in the stimulus was so incomplete that only by invoking logical constructions based on interpretations of object continuity could we explain how we perceive. My discussion of these processes and phenomena was intended to make one major point, namely, that the properties of the stimulus and simple passive network transformations alone cannot fully explain the nature of the resulting percepts. The conclusion seems inescapable: they must also be influenced by symbolic and interpretive processes, no less neural but so complex as to embody a kind of logic of which we have virtually no understanding. Another category of perceptual phenomena incorporated within Level 3 makes this same point but from a somewhat different perspective. This category involves those instances in which the stimulus may be complete, but in which it is so ambiguous that two or more alternative perceptual constructions are probable outcomes. Examples of reversible perceptual processes, capable of flipping back and forth between alternative interpretations, are legion in the popular literature on visual perception and ubiquitous in any introductory discussion of visual perception. Most of these intriguing demonstrations can be classified within the following five categories:
1. Reversal of figure-ground organization.
2. Reversal of depth.
3. Reversal of figural meaning.
4. Reversal of grouping.
5. Reversal resulting from binocular rivalry.
The first four types of reversible phenomena are essentially monocular; only one eye is necessary to observe the reversal of the ambiguous figure. The fifth type, however, although phenomenally quite similar, is a binocular process that
occurs when two images are so disparate that no meaningful combination of them, comparable to that occurring during normal binocular fusion or stereopsis, is possible. All five categories, however, represent situations in which ambiguous or paradoxical stimuli are subsequently disambiguated by the visual system by alternation between alternative perceptual interpretations. To a greater or lesser degree, and depending on many factors including the nature of the ambiguity, the proportion of time allocated to each alternative percept may vary. Some alternative states are, therefore, said to be dominant over others. Reversal rate or the proportion of time spent perceiving each alternative state are often used as dependent variables in research on these phenomena. The first class of reversible figure, involving alternations in figure-ground relationships, has already been discussed. For example, Fig. 10-23 and Fig. 10-24 show two examples of figures in which the information leading to the establishment of the figure-ground relations is sufficiently ambiguous to allow alternative organizations to be constructed by the observer. Figure 10-24, the etching by the extraordinary Maurits Escher, demonstrates a profound reversibility that extends over an entire scene. It is now firmly established that Escher was well acquainted with the Gestalt demonstrations and that this scientific knowledge as well as his early contact with the highly geometric Moorish art in Spain were major influences on the development of his graphic artform. Those interested in this topic should read the delightful article by Marianne Teuber (1974), "Sources of Ambiguity in the Prints of Maurits C. Escher," to learn of this extraordinary interaction between art and science. The second type of alternation, depth reversals, is illustrated by the three stimuli shown in Fig. 10-39. All three of these line drawings are strongly suggestive of, but not definitive of, depth. The first (Fig. 10-39A) is the famous Necker cube, first described in 1832 as an artifact in the work of the crystallographer, Louis Necker. This object may be interpreted so that the upper right-hand corner (in the two-dimensional projection) may lie either on the front or the back plane of the perceived three-dimensional cube. The second (Fig. 10-39B), the equally famous Schröder staircase, exhibits this same property of depth reversal. Figure 10-39C shows a third classic reversible figure known as the Thiéry illusion. This ambiguous stimulus may be organized in either of two ways, which can best be appreciated by inspection of this figure. A few recent empirical studies of Necker cube reversal have been published. One of special interest has been reported by Kawabata, Yamagami, and Noaki (1978). They showed that the perceived three-dimensional organization of the
8 Although I emphasize the reversal aspects of the illusions now to be discussed, it is clear that all of these stimuli must also be influenced by the kind of Level 4 spatial and temporal interaction cues that are considered in the next chapter. In this regard, others may find the inclusion of this material at this point somewhat inconsistent with the rest of my taxonomy. However, the emphasis in this case is on a part of the perceptual processing of these figures that I believe is organizational, rather than relational, and it is for this reason that these phenomena are placed at this point in the discussion.
FIG. 10.39. Three reversible figures: (A) The Necker cube. (B) The Schröder staircase. (C) The Thiéry blocks.
Necker cube depended on the point of fixation taken by the observer. If, as shown in Fig. 10-40, the observer fixated the corner marked A, then virtually all of the time the surface ABCD appeared in front of surface A'B'C'D'. On the other hand, if the observer fixated the corner marked A', then ABCD appeared to be in front of A'B'C'D' only a small proportion of the time. Furthermore, it is now certain (Ellis, Wong, & Stark, 1979) that accommodation plays no role at all in the Necker cube response. Dynamic properties can also influence the interpretation of depth of objects like the Necker cube. If a rotating wireframe object is back-lighted so that a two-dimensional projection falls on a screen, the sequence of the projected images also helps to produce the perception of a solid object. In some instances, however, these dynamic two-dimensional projections can also be ambiguous,
FIG. 11.3. Typical results from a simultaneous contrast experiment. The observer's adjustment of the luminance of a comparison field is plotted as a function of the luminance of the inducing field (abscissa: log inducing field luminance; data for subject EGH). The parameter is the luminance of the test field. (From Heinemann, © 1955, with the permission of the American Psychological Association.)
contrasted stimulus (usually a concentric annulus and a disc-shaped spot respectively) are placed at some distance from a matching stimulus. To measure the subjective magnitude of a contrasted test stimulus, the observer adjusts the intensity of a matching stimulus so that the matching brightness appears equal to that of the enclosed test stimulus. Thus, a key premise in this and related experiments is that although the effect of the contrasting stimulus on the brightness of the contrasted stimulus is substantial, its influence on the brightness of the matching stimulus is minimal. The change in the setting of the matching stimulus produced by the surround is used as a measure of the surround's contrast effect. The general results of the Heinemann-type experiment are shown in Fig. 11-3 for various levels of the test spot's luminance. In general, we see from this figure that the luminance of the surrounding annulus does not appreciably affect the brightness of the test spot until the luminance of the annulus is roughly half that of the test spot.3 However, after that threshold has been exceeded, there is a remarkable reduction in the brightness of the test spot for increasing annulus luminance. The slope of this decline is quite steep; it is possible to reduce the apparent brightness of the test spot by contrast until the luminance of the matching light has to be less than one-thousandth of that of the test light to achieve a good match. An even more useful representation of the effect of contrasting luminance in simultaneous contrast paradigms has been suggested by Horeman (1965). As shown in Fig. 11-4, he constructed a three-dimensional graph showing the measured apparent brightness that resulted when various values of both test luminance and contrasting luminance were used. The resulting surface depicts, better than any single two-dimensional curve, the interaction effects of varying both contrasting and contrasted luminance.
Furthermore, Oyama (1967) has noted that this is only one of a class of surfaces that can be constructed using such other dependent variables as the luminance of a matching stimulus or any single multiplicative transform of subjective brightness. He says that although the choice of dependent variable will affect the shape of the surface, all such surfaces will be linear transforms of each other. Despite the fact that workers such as Fry (1948), Diamond (1960), Ratliff (1965) and Cornsweet (1970) have associated this kind of simultaneous brightness-contrast with lateral inhibitory interactions, the neural inhibition model is woefully inadequate to handle this contrast phenomenon, as I argued in Chapter 9. Another dramatic example counterindicating such a simplistic Level 2
3 Although I concentrate here on the diminishing effects of a brighter contrasting stimulus on a dimmer test one, it should at least be noted in passing that it is also possible to produce a reverse effect. If the contrasting stimulus is much dimmer than the test stimulus, the latter may appear brighter than it would in isolation. This reversed contrast, enhancement, or assimilation phenomenon has been known for many years (von Bezold, 1874) and has been discussed in detail by J. O. Robinson (1972) and by R. DeValois (1973). Reversed contrast can also result when the size of the contrasting stimulus is reduced (Helson, 1943).
FIG. 11.4. A three-dimensional plot of the relationship between perceived brightness (B), the luminance of the inducing field (L ,) and the luminance of the test field (L). (From Horeman, © 1965, with the permission of Pergamon Press.)
theory of contrast was shown in Fig. 10-16. If this unusual contrast display is shown to an observer, despite the difference in the bipartite surround, the central spot will appear to be of constant brightness across its entire extent. However, if even the slenderest vertical thread is placed to divide the test spot into equal halves, then the two sides change their apparent brightness substantially. It would be hard to develop a spatial "inhibitory" interaction model to explain this startling overall change. Obviously, in this case, the thread plays a role quite different from that of a modifier of "lateral inhibitory interactions." To the contrary, it has acted to alter substantially the "context" and, perhaps even more importantly, the "meaning" of that context. Much higher-level processes must thus be invoked to explain the nature of this kind of perceptual experience. A closely related phenomenon is the so-called Craik-O'Brien-Cornsweet illusion. Here too, the nature of the discontinuity between two regions can produce different perceived brightness even when the luminance of the two regions is identical. If a stimulus characterized by the cross-sectional luminance distribution shown in Fig. 11-5A is presented to an observer, the stimulus is transformed in such a way that it is perceived as being made up of two areas of different,
though uniform, brightness, as shown in Fig. 11-5B. The entire area to the left of the region of discontinuity appears dimmer than the entire area on the right. This illusion was first reported by Craik (1940a) and then by O'Brien (1958). It has subsequently been developed in several other forms by Cornsweet (1970) among others. Although repeated allusions to a lateral interaction explanation of this phenomenon have been made by many perceptual psychobiologists, the important thing to note about this illusion is that the global change in brightness over such broad areas is not well predicted by any plausible lateral inhibitory interaction theory. The extreme breadth of the regions over which the phenomenon occurs seems to place it in another domain, one in which the overall "context" is responsible for the phenomenon rather than any local spatial interactions. Further evidence against considering this illusion as the outcome of a simple lateral spatial interaction is the fact that this same illusion can be produced along many other stimulus dimensions that do not involve brightness. MacKay (1973), for example, has shown that an analog of the Craik-O'Brien-Cornsweet illusion exists for line spacings (in which case the perceived distortion is on apparent line spacing rather than on brightness). Furthermore, Crovitz (1976) has shown that such an interaction exists for line lengths, and Anstis, Howard, and Rogers (1978) have even shown that an analog of the illusion exists in stereoscopic space in spite of the fact that the depth dimension does not even exist in the two-dimensional retinooptic representation space in which the inhibitory interactions are conventionally supposed to be located. An analogous phenomenon can also
FIG. 11.5. The Craik-O'Brien-Cornsweet illusion. (A) A plot of the luminance of the physical stimulus. (B) A plot of the perceived brightness (the perceptual response).
be produced in an auditory loudness context (Jesteadt, Green, & Wier, 1978; Rawdon-Smith & Grindley, 1935). The point of all of these examples is to stress that contrast effects are neither simple nor local. They are produced by relational factors that extend over very broad regions, and they are compelling in situations in which simple concepts of lateral inhibitory interaction cannot possibly apply. These illusions, quite to the contrary, must be the outcome of very broad global processes that depend on spatial "symbols" and "interpretations" of the global organization of stimuli more than on local inhibitory interactions. The fact that higher-order (i.e., Level 4) processes are seemingly better explanations of the simultaneous contrast phenomenon than Level 2 processes does not mean, of course, that analogous systematic laws of spatial interaction do not hold for these higher-level processes as well as for the lower-level ones. Indeed, it has been repeatedly shown that the distance between the test stimulus and a simultaneously contrasting stimulus is also a critical factor in determining the magnitude of the contrast effect. For example, in experiments similar in design to Heinemann's classic experiment, but with the added independent variable of distance between a circular test target and a circular (i.e., nonannular) contrasting target, Leibowitz, Mote, and Thurlow (1953), as well as Mackavey, Bartley, and Casella (1961), have shown that the farther the two stimuli are separated from each other the greater the decrease in the contrast effect. A fascinating and thoroughly illuminating corollary of this phenomenon, to which I alluded in Chapter 9, is that the decline in the brightness-contrast effect with spatial separation also occurs when the test and contrasting stimuli are increasingly separated in stereoscopic depth (Gilchrist, 1977; Gogel & Mershon, 1969; among others).
Because depth is itself a constructed perceptual experience and because the two-dimensional projections on the retina are actually physically adjacent, this important finding also argues strongly against any kind of a peripheral neural interaction explanation of contrast. This finding-reduction in contrast efficacy with increase in depth separation-is also closely related to the continuing controversy concerning the possible binocularity of the simultaneous contrast phenomena. As noted in Chapter 9, there is some evidence (Uttal, 1973, p. 454) that brightness contrast occurs even when the test and contrasting portions of the stimulus are presented dichoptically. However, I must also acknowledge that this is not universally accepted. Many experimenters have interpreted their experimental results to indicate that dichoptic brightness contrast does not in fact exist. This point, of course, is critical, for it would require a severe stretch of a peripheral lateral inhibition model to explain any simultaneous contrast effects if the phenomena were definitively shown in this manner to be present with dichoptic stimuli. However, as we see later in this chapter, there are many other instances of binocular interactions that do support a centralist explanation for this general class of phenomenon.
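The Craik-O'Brien-Cornsweet stimulus described earlier in this section can be made concrete with a short numerical sketch. The construction below is one common way of building such a profile and is assumed here for illustration only; it is not taken from Craik, O'Brien, or Cornsweet, and the function name, the base luminance, the cusp amplitude, and the decay constant are all hypothetical. Luminance dips just to the left of a central edge, rises just to the right of it, and relaxes exponentially back to a common base level on both sides, so that the two broad flanks are physically matched even though, as reported in the text, the left one looks dimmer.

```python
import math


def cornsweet_profile(n=200, base=0.5, amplitude=0.1, decay=10.0):
    """Luminance along a horizontal cut through an assumed Cornsweet-type edge.

    n         number of samples across the display (hypothetical)
    base      luminance shared by both flanks far from the edge
    amplitude size of the luminance cusp at the edge
    decay     distance, in samples, over which the cusp fades
    """
    center = n / 2
    profile = []
    for i in range(n):
        d = i + 0.5 - center  # signed distance of this sample from the edge
        cusp = amplitude * math.exp(-abs(d) / decay)
        # Darker than base just left of the edge, brighter just right of it.
        profile.append(base - cusp if d < 0 else base + cusp)
    return profile


profile = cornsweet_profile()
print("far left luminance: ", profile[0])
print("far right luminance:", profile[-1])
print("just left of edge:  ", profile[99])
print("just right of edge: ", profile[100])
```

The printed values show that the physical difference between the two sides is confined to a narrow band around the edge. Any account of the illusion therefore has to explain how so local a luminance change comes to govern the apparent brightness of regions many times wider than the cusp itself.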
B. SPATIAL CONTEXT
Geometrical factors other than spatial separation also play an influential role in defining the magnitude of the simultaneous contrast phenomenon. Increases in either the number or size of the contrasting stimuli increase the brightness-contrast effect in the same way as does an increase in luminance (Torii & Uemura, 1965). The reader is also directed to Kitterle (1972) for both new data and a good minireview of the effects of size and number of the contrasting stimuli on the simultaneous contrast phenomenon. Thus both spatial and areal factors play an important role in defining the extent of the brightness-contrast effect. However, they too are not totally definitive of the resulting percept. Organizational and configurational properties of the stimulus also influence simultaneous contrast effects even when luminance, distance, and area are kept constant. It is, for example, even possible to reverse contrast to assimilation while holding the luminance and areal relations constant simply by altering the spatial configuration. An example of the influence of configuration on simultaneous contrast can be found in the Benary cross illusion shown in Fig. 11-6. In this figure, it can be seen that the triangular area external to the body of the cross is apparently brighter than the one lying within the perimeter of the cross. However, both are surrounded by exactly the same amount of black and white and have exactly the same black-white contour length. Kanizsa (1975) explains this paradoxical illusion by noting that the triangle inside the cross appears to "belong" to the cross and the one outside the perimeter is not a "part" of the figure. Thus, according to Kanizsa, the difference in the degree of "belongingness" (a symbolic and certainly not a geometric concept) is the immediate precursor of the difference in contrast effects.
FIG. 11.6. The Benary cross illusion showing paradoxical contrast. Though both gray triangles should have the same inhibitory forces operating on them, they tend to be seen with different brightness because of their configurational relationship (inside versus outside) to the large cross. (Courtesy of Dr. Gaetano Kanizsa of the University of Trieste.)
11.
LEVEL 4: PERCEPTUAL RELATIVISM
Just how complex the contrast interactions can be is perhaps best illustrated by an important study reported by Jameson and Hurvich (1961). In their experiment the stimulus used was a complex pattern involving five different square regions arranged as indicated in Fig. 11-7. The reflectance of each of the five regions was kept constant; therefore, the relative luminance of all of the areas remained constant, however much the overall illumination varied. The pattern was presented at three different levels of illumination, and observers were asked to make brightness judgments (as measured by the setting of a matching light) about each region at each of these three levels of illumination. Jameson and Hurvich obtained a rather startling result in this experiment. As would have been expected from what we already know about contrast, there was no simple relationship between the luminance of each region and its respective apparent brightness. For example, no power law was able to define the relationship between the luminance of each area and the perceived brightness of that area. The brightness function of each area depended on its relations with its neighbors much more than on the absolute level of illumination and its reflectance. Contrast effects were, therefore, complex among the different areas, and no single function could account for the brightness changes of all five regions at the three levels of illumination. Indeed, the perceptual responses to the various stimulus regions were not even all qualitatively alike; i.e., they were not in the same direction. The stimulus square with the lowest reflectance decreased in brightness as the overall illumination of this complex scene increased. On the other hand, the three highest reflectance squares substantially increased in perceived brightness as the overall illumination increased. Not surprisingly in such a
FIG. 11.7. The stimulus pattern used in the Jameson and Hurvich experiment. (From Jameson and Hurvich, © 1961, with the permission of the American Association for the Advancement of Science.)
situation, with some increases and some decreases there must be a neutral point and, indeed, the remaining stimulus, the second least reflective, remained relatively constant in brightness over wide ranges of illumination. All of these results are shown graphically in Figs. 11-8A-E. Such diverse results led Jameson and Hurvich to reject both a "ratio" and an "adaptation level" explanation of contrast. Although seeing some promise for neural interaction models, they too concluded that "interpretive" processes must be invoked to explain contrast. Simultaneous brightness contrast, therefore, seems to be a highly complex, and probably a very high-level, central effect in which the dominant processes are best incorporated within the Level 4 rubric. There certainly are significant perturbations inserted into the spatial interaction process by lower-level processes (Mach-type lateral neural interactions may modify the effect near contours, and blur may affect the brightness-contrast phenomenon to some degree; Coren & Girgus, 1978). Nevertheless, there is little doubt that by far the major portion of the phenomena I have discussed here must be attributable to perceptual computations that work according to the rules of a logic of which we know very little. Given the contextual and paradoxical properties of wide area contrast, one can only conclude that the substantial confusion found in the literature of these Level 2 and Level 4 processes is based on little more than a superficial analogy. Walter Gogel (1978) has put it much better. Recalling and generalizing Kirschmann's (1891) and Titchener's (1915) earlier laws of proximity, he notes that many perceptual phenomena display a sensitivity to variations in the spatial separation between the interacting objects; Gogel refers to this very general sensitivity as the "adjacency principle."
The important point about the adjacency principle is that Gogel has conceptualized it as a very general concept denoting a wide variety of perceptual processes that, although all dependent on distance, are not necessarily identical in terms of underlying mechanisms. Some of these processes certainly are mediated by peripheral and local neural interactions; others, however, are mediated by perceptual processes of a much greater level of complexity. Some are relatively automatic in the sense that Helmholtz referred to as "unconscious inference" (and I believe that simultaneous contrast falls well within the limits of such a subtle and invisible computational process); others may indeed involve attentive cognitive processing of an even more complicated nature.
b. Color Contrast. A generalization of the concept of simultaneous spatial contrast beyond the dimension of brightness is illustrated by the effects on perceived color that occur when a colored object is surrounded by objects or surrounds of other colors. The realization that, like brightness, the chromatic response to a stimulus is not determined solely by the wavelengths of light emitted or reflected from it, but also by its context, was appreciated very early in the history of experimental psychology. Meyer (1855), Helmholtz (1866), and Hering (1887) are among the great names of 19th century visual science that
[Figure 11.8, panels A through E. Legible labels: horizontal axis, log test luminance; reference line in each panel, law of brightness constancy.]
FIG. 11.8. The results of the Jameson and Hurvich experiment showing the luminance to which the matching stimulus had to be set as a function of the overall luminance for (A) the center square in Fig. 11.7; (B) the right square; (C) the upper square; (D) the left square; and finally, (E) the lower square. (From Jameson and Hurvich, © 1961, with the permission of the American Association for the Advancement of Science.)
considered the problem of color contrast to be of substantial import. Subsequently, Pretori and Sachs (1895) showed that color assimilation (i.e., situations in which the test stimulus acquires the color of the contrasting stimulus) could also be obtained even if the test area were black. The spatial interactions that mediate color contrast, therefore, share one of the main characteristics of brightness contrast: when the test stimulus is dim or dark, it tends to adopt the color of the inducing surround rather than to contrast with it. As early as 1891 Kirschmann felt that knowledge of color contrast was sufficiently complete to formulate five general rules governing the phenomenon. They have been summarized by Graham and Brown (1965, p. 460, abstracted) as follows:
1. Color contrast increases when the area of the contrasting region increases.
2. Color contrast increases when the area of the contrasted region decreases.
3. Color contrast declines when the contrasting regions are increasingly separated.
4. Color contrast is at a maximum when brightness contrast is minimal.
5. Color contrast increases with increasing saturation of the contrasting region.
These general rules not only describe the dynamics of the spatial interactions involved in color contrast but also represent a fairly complete expression of the factors (distance, purity, area, luminance difference) that are effective in defining the magnitude of these particular contrast phenomena. Most recent research has been an attempt to determine the quantitative details of each of Kirschmann's rules. For example, Helson and Rohles (1959) have made detailed measurements of the effect of the size of the contrasting region and have generally confirmed Kirschmann's first law. As another example, Oyama and Hsia (1966) specifically attacked the problem of the effect on color contrast of the distance separating the contrasted and the contrasting stimulus.
Using a procedure in which the observer had to adjust the color of a stimulus to the best perceived blue, green, yellow, or red appearance, they measured the changing magnitude of the contrast effect when a contrasting annulus was separated by various spacings from a concentric disk. The results of this experiment are shown in Fig. 11-9. Depending on the spectral properties of the contrasting stimulus and the separation, the adjustment of the test stimulus to produce the best color varied considerably. Oyama and Hsia found that the color-contrast effect, as suggested by Kirschmann, decreased with greater separation according to the functions shown in this figure. Kirschmann's law dealing with the impact of the spectral purity of the contrasting stimulus was studied quantitatively by Schjelderup-Ebbe (1926) in a study that remains definitive. He reported a strong positive and linear relationship between the magnitude of the contrast effect and the purity of the contrasting light. Another major finding of Schjelderup-Ebbe's experiment was that the
[Figure residue from the scan. Legible labels: "Instructed hue-names"; "Subject: YH"; "Green"; "Blue".]
[Figure 4.7 residue from the scan. Legible labels: decision process; kernels; lines; edges; Gabor signals; crossings; terminators; etc.]
FIG. 4.7. Caelli's theory of texture segregation (from Caelli, 1985).
analysis of the discrimination process in particular and perception in general. The highly constrained and specific stimulus, our ability to quantify it precisely, and the richness of the phenomena associated with texture perception all have made this a particularly coherent research topic. On the other hand, it is my personal opinion that far too much has been made of some rather superficial analogies between the perceptual units (the textons) and the neurophysiological units (the neurons). I believe that the "textons" reflect the action of vast networks of interacting neurons (whose collective function is expressed in these special sensitivities) rather than the special sensitivities of individual neurons. Furthermore, it seems to me that the advantage that is gained by these special units can be mainly attributed to their unusual ability to pull local features together into global structures for which the computational requirements are vastly reduced. Local microstructures that do not contribute to such a global organization are not observed as textons, not because the neurons of the peripheral communication system are less well tuned to them, but rather because the algorithms built into more central portions of the nervous system are not programmed to respond to them. The texton hypothesis, therefore, is closely related to the global-local controversy. In short, I applaud the technique and the findings of texture research, but I am not comfortable with the neuroreductionistic interpretations made of these findings.
E. THE MATCHING OF VISUAL FORMS
Discrimination as we have defined it and as pursued as a research topic also includes tasks other than the similarity studies of Tversky and Krumhansl, on the one hand, and the preattentive texture discrimination paradigm that has been developed and researched by Beck, Julesz, Treisman, and others, on the other. Another popular research paradigm that approaches the problem of discrimination requires observers simply to specify whether two stimuli are the same or different, not in their texture particularly, but more often in their global form. Although this task is, at first glance, simple and direct, it is fascinating to wander through the perceptual literature and read of the many applications of this technique. One distinguished researcher, Michael Posner (1978), has used this simple research paradigm to study a wide variety of perceptual and other cognitive processes. Same-different-type tasks have been applied by him in searches for understanding about preattentive visual processes, the coding of visual forms in short-term memory, and studies of active manipulation of visual images, as well as studies of memory and reading. Obviously there is a strong conviction that this superficially simple research tool embodies a particularly sensitive assay technique that may be applicable in a diverse way as a probe into a very wide range of cognitive processes. Indeed, there is just as strong a belief that a discrimination process of this kind is as central to perception as are the similarity judgments described earlier in this chapter. For example, Nickerson (1978) says: "The act of comparing one thing with another and determining whether they are instances of the same thing must be a very fundamental type of perceptual activity. Without the ability to make such comparisons there could be no perception as we know it" (p. 77).
The two experimental paradigms, similarity judgments and "same-different" or "matching" discriminations, must, as I noted earlier, be considered to be the end points of a methodological continuum. At one extreme is a dichotomous, binary judgment of identity or nonidentity; at the other is a graded evaluation of the degree to which two items are alike. It is very likely that the underlying perceptual mechanisms in each case are very similar, if not identical. Nevertheless, the two approaches have grown up separately and the theoretical and cultural (in a scientific sense) differences between the two traditions are substantial. It is entirely possible to read a paper on same-different judgments that does not refer at all to the work on similarity by Tversky, Krumhansl, or any of the other leaders involved in that aspect of the problem. It is as rare in the similarity literature to find references to Posner's or Nickerson's distinguished work using the same-different methodology. This is another example of the unfortunate compartmentalization of modern scientific psychology, even in fields that are so closely related. It also is an indication of the unfortunate narrowness of most theoretical enterprises in this field.
4.
DISCRIMINATION OF VISUAL FORMS
This segregation of discrimination theories and findings into two separate camps is very likely a serious conceptual error, an overly rigid compartmentalization of the discrimination literature that has no justification in a biological sense. Some steps have been taken to create a synthesis: I shall discuss Proctor's work, an effort toward unifying some of the matching and similarity data at least, but there may be other opportunities yet forthcoming for an intellectual unification that goes even further to demonstrate how similarity and matching data are manifestations of the same underlying process. Perhaps the main reason for the special attention given to and the broad applicability of the same-different or matching technique is that a number of different perceptual mechanisms have been or can be probed by the same-different process by judicious design of the experiment. At its most primitive level (primitive at least from the point of view of some theories of perceptual organization), matching involves only a simple templatelike comparison of the perceptual representation of two stimuli or even of their constituent features. At a somewhat more complex level of analysis, and this is where the technique becomes useful to psychologists interested in the representation process or other higher level cognitive mechanisms, the hypothesis that the image is encoded in some way that may not be simply a physical map can be tested. In this context, the invisible symbolic representations of geometrical objects (such as letters, words, or even patterns of activity in neural nets) often become more salient than the simple geometrical properties of the objects themselves. Obviously, there is much more going on here than simply a template match. At an even higher level of analysis, the active attentional process involved in making same-different judgments can also be explored.
A very interesting application in this context is the case in which the comparison requires that some kind of manipulation or transformation of the visual image or representation be carried out. Experiments have been designed in which the observer is asked to place the two objects to be compared in some kind of canonical spatial configuration, even though the stimuli were not originally presented in that configuration, before the matching process can be undertaken; the objects must be scaled in size or rotated (by what is clearly a very active and effortful attentional process) before the match is made. As we shall see in the discussion of the work of Shepard and Cooper, this approach provides a strategy for studying mental imagery that has been one of the most interesting developments of recent times in cognitive psychology. Thus the processes of matching or of making what a priori may seem to be a very simple judgment of the identity or nonidentity of a pair of stimulus objects can be a powerful probe of a wide variety of perceptual mechanisms. As with similarity judgments, this experimental procedure has become one of the main tools in the analysis of perceptual mechanisms that may be far more complicated and that go far beyond the most simple notion of discrimination itself. One way to understand why this simple technique is so broadly applicable is
to appreciate that there are usually many different criteria by means of which a pair of forms may be judged to be the same or different. One of the most powerful and influential experiments of the last two decades in this regard was reported by Posner's group. A discussion of his work constitutes one of the central themes of the remainder of this chapter. For the moment, I point out that the work of Posner and his colleagues suggested that observers may utilize at least three different kinds of comparisons, each of which may assay a totally different kind of judgmental process. Their observers could evaluate identity on the basis of a purely physical match, on the basis of a name match, or even on the basis of a match based on a highly abstract rule. For example, in the latter case, the rule guiding the decision might be based on the observer's implicitly asking whether or not two stimulus letters are consonants. It is a remarkable monument to the ingenuity of Posner and co-workers how deeply this simple same-different paradigm probed the nature of multiple levels of discriminative cognition. However, the very fundamental question must be asked at this point: Is there only one or are there several distinct processes involved in the same-different type of experiment? In other words, does the fact that a single experimental procedure is being used necessarily mean that only one process exists, or are there several different same-different processes potentially at work in this type of experiment? A completely satisfactory answer to any one of the forms in which this question can be posed is not easy to give. We do know that the simplest changes in an experimental task are capable of drastically altering the behavior and, presumably, which of several possible underlying mechanisms is invoked. One could imagine, for example, that a physical match might require only a preattentive, automatic template comparison, whereas a rule-based (e.g., Are the two letters consonants?)
criterion might activate attentive mechanisms executing syntactic-rule evaluating processes that actually have nothing in common with the templatelike comparisons. As we shall see, the suggestion that multiple processes are tapped by a single experimental paradigm (same-different judgments) is exactly the solution to the problem that Posner evolved in his search for a conceptual schema for the data obtained in his experiments. In other cases, however, there has been a concerted effort to unify the diverse results and the hierarchy of reaction times. This is the direction that Robert Proctor has taken in his theoretical development, and we also consider this trend in thinking about discrimination. In this section some of the classic experiments and modern data collection efforts relevant to the same-different paradigm are discussed. I highlight, in particular, the work of Nickerson, Posner, and Shepard. We then look at some of the contemporary theories that deal specifically with the same-different paradigm at all of its many possible levels and return to consider the issue of whether one or many mechanisms are involved when one asks an observer to make as simple a choice as "Are these two objects the same or different?"
1. Nickerson on Physical Matches
Since 1965 Raymond S. Nickerson has been studying one of the most curious paradoxes in discrimination research. In a series of papers (Nickerson, 1965, 1968, 1969, 1973, 1975, 1976, 1978) that spans two decades he has concerned himself with the curious fact that an observer in a same-different discrimination task who is confronted with two stimulus objects that are physically the same takes a shorter period of time to assert that they are, in fact, the same than he would have taken to assert that a pair of nonidentical objects are different. Why is it that this seemingly straightforward datum is so startling and considered by so many scholars in this field to be a major empirical paradox? The answer to this question is, in my opinion, that this finding is inconsistent with the predisposition among contemporary psychological theorists toward a feature analytical explanation of the comparison process. In accord with what must certainly be considered to be the "mainstream," if not the most logically correct, of theoretical thinking in recent years, the nearly consensual model of the shape-matching process is founded on the idea that a feature-by-feature comparison must be made in a same-different discrimination task in which the criterion for identity is a physical match. Thus, a theoretician operating from the feature-theoretical point of view would argue that an observer should be willing to assert that any two stimulus objects are different immediately upon establishing that any single pair of comparable constituent features are not the same. On the other hand, this feature-centered logic goes on, the fact that two objects are the same can only be asserted after all of the pairs of features have been compared and shown to be the same. Thus, it is argued, it should take longer on the average to assert "same" than it should take to assert "different."
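The feature-by-feature argument just outlined can be made concrete with a small sketch. It is only an illustration under stated assumptions, not a model proposed by Nickerson or by any of the theorists cited: stimuli are represented as tuples of binary features, features are compared serially in a fixed order, each comparison is assumed to cost one unit of time, and the function names are hypothetical.

```python
from itertools import product


def comparisons_until_decision(a, b):
    """Serial, self-terminating comparison of two equal-length feature tuples.

    Assumption: the comparison stops at the first mismatching feature pair
    ("different") and must run through every pair to conclude "same".
    Returns (response, number_of_feature_comparisons).
    """
    if len(a) != len(b):
        raise ValueError("feature tuples must have equal length")
    for n, (fa, fb) in enumerate(zip(a, b), start=1):
        if fa != fb:
            return "different", n
    return "same", len(a)


def mean_comparisons(n_features=4):
    """Average comparison count per response over all pairs of binary tuples."""
    totals = {"same": 0, "different": 0}
    counts = {"same": 0, "different": 0}
    vectors = list(product((0, 1), repeat=n_features))
    for a in vectors:
        for b in vectors:
            response, n = comparisons_until_decision(a, b)
            totals[response] += n
            counts[response] += 1
    return {r: totals[r] / counts[r] for r in totals}


result = mean_comparisons(4)
print(result)
```

Under these assumptions every "same" decision needs all of the feature comparisons, whereas a "different" decision needs, on average, fewer. This is the expectation of slower "same" responses that the feature-analytic account generates.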
Logical as this argument may seem, Raymond Nickerson has repeatedly shown (and many other workers have confirmed) that this is not the way the visual system works. Rather, "same" judgments take significantly less time on the average than do "different" judgments. Herein lies the paradoxical nature of the result and the basis of its special interest to the perceptual research community. Nevertheless, it must be reiterated that this result is paradoxical only in the context of a feature-by-feature theoretical analysis. Indeed, this line of explanation may be an exceptionally clear-cut case of begging the question, in logical parlance. It is entirely possible that there are equally plausible and reasonable alternative solutions, other than the purported feature-by-feature analysis based upon the assumption of an elementalistic solution, to the whole-part controversy. Indeed, this paradoxical result may itself be considered a strong argument that a feature comparison process is not taking place, but rather that our comparison is made on the basis of more global attributes of the stimuli. That this point of view is plausible has been suggested earlier by others, most notably Nickerson (1978) himself who, while speaking of the advantage of "same" over "different" trials in reaction times, noted that: "what it seems to suggest is that the same judgment
in these experiments is not the outcome of an analytic process that compares stimuli in a feature-by-feature fashion and terminates either upon finding a mismatch or after determining that there are no mismatches to be found" (p. 78). Nickerson then goes on in this same report to discuss some of the support for a nonfeature analytic model of the comparison process. Even more helpful, however, is his discussion of the variety of theories that have been proposed to explain the "paradox" of the advantage exhibited by "same" judgments over "different" responses. He lists the following possible explanations that do not assume a feature-by-feature analysis:
1. An artifact injected by the experimenter in sampling the stimuli such that specific "same" pairs occur too frequently.
2. The priming effect of the first stimulus in the pair to be compared on the second stimulus when the two are identical.
3. The improvement in the memory trace of the second, thus producing a higher "perceptual clarity" of the second when the two are identical compared to the case when the two stimuli differ.
4. Biases on the part of the observer to respond "same."
5. Rechecking in the sense suggested by Krueger (1978) when different stimuli are presented in a pair and the appropriate response is different.
6. Different decision processes for "same" and "different" pairs.
Quite a few theoreticians in this field have not picked up on what is an especially deep insight by Nickerson. If they are wrong, and if Nickerson is correct in his assumption that the decisions in such a comparison may not, in fact, involve a feature-by-feature analysis, the "paradoxical" result is not quite so "surprising" or "paradoxical" as it may have seemed at first glance. One major alternative theoretical theme, for example, can be based on the premise that the two figures are dealt with holistically rather than elementalistically.
If only a single global organizational property of one form need be compared with the same holistic attribute of another form, then quite a different expectation (namely, that "same" judgments should be made quicker than "different" ones) would be easily justified. Egeth and Blecker (1971) have proposed that the "paradoxical" empirical superiority of "same" over "different" judgments can be explained on the basis that the two decisions are the end product of totally different mechanisms. That is, though the task in both cases at first glance seems to be identical, this is but an example of another superficial analogy. They argue that the nature of the stimuli directs the decision process, when they are the same, to what is essentially a different computational process than when they are different. In other words, the existence of two quite different processes can be invoked to explain the unexpected and pseudoparadoxical result of more rapid "same" judgments. The two
mechanisms simply have different properties ("same" judgments are mediated by a process that is just intrinsically faster than a "different" process) and the paradox is resolved simply by altering the premises under which the hypothetical explanation is generated. The other general theoretical direction in explaining the discrepancy between "same" and "different" judgments is exemplified by Krueger's (1978) "Noisy Operator" theory, a point of view that will be spelled out in more detail in a few pages. To anticipate his model briefly, however, both the identical and nonidentical stimuli are processed by the same mechanism in Krueger's model. But Krueger (1978) and others propose that the delay in processing different stimuli (compared to the more rapid response to stimuli that actually are the same) is due to the necessity of continuously rechecking the "different" stimuli.
2. Posner on Matches Based on Different Criteria
Another extremely influential line of research on matching has been reported over an extended period by Posner. In a major series of reports that was initiated by a joint publication with R. F. Mitchell (Posner & Mitchell, 1967) and that was followed up by several other important papers, including those of Posner (1969) and Posner, Boies, Eichelman, and Taylor (1969), Posner defined an experimental paradigm that has now come to be considered a classic means of studying matching behavior at several different cognitive levels. The key result of this work was that the way matches were made differed substantially depending on the nature of the criterion that was used to justify the match. That is, as mentioned earlier, the experimenter may ask the observer to compare two letters on the basis of a purely physical match (e.g., two upper case B's must be reported to be the "same" according to this criterion, but an upper case B and a lower case b must be reported to be different); on the basis of a name match (e.g., an upper case B would be reported to be the "same" as a lower case b according to this criterion); or, finally, on the basis of some more elaborate rule such as whether or not the two letters were both vowels (e.g., an upper case E and a lower case u must be reported to be the "same" if the observer followed this rule). Other intermediate influences were also examined in the Posner studies. For example, there were differences that depended on the particular letters that were chosen and the type font that was used to produce the stimuli. A more subtle comparison was made between upper and lower case letters that were simply size transformations of each other (e.g., C and c), on the one hand, and letters that were actually different forms in the upper and lower cases (e.g., G and g), on the other.
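The three criteria described above can be written out as explicit decision rules. What follows is a minimal sketch, not Posner's procedure or analysis: the function names are hypothetical, a letter's "physical" form is approximated by its character code (so type font and size are ignored), and the rule-based criterion is taken to be the "both vowels" rule mentioned in the text.

```python
VOWELS = frozenset("aeiou")


def physical_match(x: str, y: str) -> bool:
    """Same only when the two characters are identical, case included."""
    return x == y


def name_match(x: str, y: str) -> bool:
    """Same when both characters name the same letter, whatever the case."""
    return x.lower() == y.lower()


def both_vowels(x: str, y: str) -> bool:
    """Rule-based criterion (assumed here): same when both letters are vowels."""
    return x.lower() in VOWELS and y.lower() in VOWELS


def lowest_matching_level(x: str, y: str) -> str:
    """Name the first criterion, in the order physical, name, rule,
    under which the pair would be reported as "same"."""
    if physical_match(x, y):
        return "physical"
    if name_match(x, y):
        return "name"
    if both_vowels(x, y):
        return "rule"
    return "different"


for pair in [("B", "B"), ("B", "b"), ("E", "u"), ("B", "u")]:
    print(pair, lowest_matching_level(*pair))
```

The same pair of letters can therefore count as "same" or "different" depending solely on the criterion the observer is instructed to apply: B and b fail the physical test but pass the name test, and E and u pass only the rule-based test.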
These elegant experiments, simple, direct, and relevant to a broad range of cognitive problems, produced results that were especially clear cut. Posner and his colleagues, using reaction time as their dependent variable throughout this entire series of studies, discovered what appeared to be a hierarchy of processing levels that depended on the specific decision criterion that was imposed on the observer. Identical case letters were reported to be, in fact, identical with the shortest reaction times; letters that were alike in name but were simply different-size transformations of each other were the next fastest; letters that were alike in name but differed in shape were next; and finally, letters that were not alike in name, but were the same only in accord with some semantic or syntactic rule, had the longest reaction times. Figure 4.8 displays some of these results in an unusual form: a reaction-time hierarchy in which each step in a series of increasingly difficult decisions (as ordered by their progressively prolonged reaction times) is tagged with a typical reaction time for the particular kind of criterion the observer is instructed to use.
[Figure: decision hierarchy with branches labeled Visual, Physical (AA, 549), Name (Aa, 623), Semantic, Vowels (Ae, 699), Consonants (Bc, 904), and Different (AB).]
FIG. 4.8. The temporal hierarchy of a series of reaction time experiments (from Posner, 1969).
Posner (1978) interprets this empirical chart, which displays the length of time it takes to make a decision for each type of decision criterion, as reflecting the action of a perceptual system that is organized into a hierarchy of relatively independent processing stages. His analysis is based upon reaction times but, in a fundamental way, his scheme is quite different from the classic notion of temporally additive processes suggested by Donders (1868). Posner's "chronometric" method has been used as a source of data to support what is essentially a theory of many parallel channels, some preserving the geometry of a stimulus, some preserving the sound of its name, and some supporting a decision based on some much more subtle semantic aspect of the message it conveys. Posner argues that the different processes may actually go on in parallel rather than in serial order. Posner thus uses reaction-time measurements to support his theoretical conviction that, at the very least, there are separate and distinct encoded representations being constructed and processed in the perceptual system. Both the visual geometry and the verbal names for these alphabetic characters are represented and processed in different cognitively active systems. Furthermore, it is possible for these two systems to communicate in some manner with each other. Thus, name codes and geometrical codes can interact, and can interfere with each other quite effectively, in ways that are analogous to the Stroop phenomenon (Stroop, 1935; also see my discussion of Navon's work, 1977).
In other experiments Posner has shown that the semantic or verbal level of processing seems to be totally insensitive to the raw physical characteristics of the stimulus letter (its contrast, luminosity, color, or even its size seem not to affect reaction times when the criterion of choice is the identity of a name or rule match). On the other hand, modifications of these physical parameters can drastically slow the processing in a matching task in which the observer is asked to use a criterion of physical identity. A closely related issue that has been investigated by Posner concerns matches between auditory and visual stimuli. He observed that whenever one is asked to make such a cross-modality match (e.g., does the spoken word "Ay" presented to an observer through a set of earphones represent the same thing as the printed letter a?) the reaction times obtained always seem to be comparable to those that are typical of verbal matches. The conclusion, therefore, is that cross-modality matches of this sort are mediated by the symbolic names or representations they generate, rather than by any comparison of the primary physical or sensory representations. Data of this sort raise serious questions about some of the dubious subfields of psychological research, specifically the elusive phenomenon called synesthesia, in which, some psychologists assert, auditory stimuli give rise directly to visual experiences. Marks (1987) provides a useful service in highlighting a possible alternative to direct sensory interaction as an explanation of synesthesia. He correctly, in my opinion, points out that the conversion of auditory stimuli to visual responses (or vice versa) could equally well be understood in terms of semantic or symbolic processes and that the issue is still very much open. Cross-modal transformations such as synesthesia have always seemed to me to pose a peculiar problem for experimental psychology.
Perhaps comparison of reaction-time measures of this process with those from Posner's studies will help to resolve their origin. Another interesting aspect of Posner's experimental design is that it offers an opportunity for studying the process of the transition from visual to symbolic name representations. The work that I have described so far deals mainly with the effects obtained when the two letters to be compared are presented simultaneously. If, on the other hand, one modifies the experiment slightly, as Posner and his colleagues did, and presents the two stimuli sequentially with varying temporal intervals between them, it is possible to study the time at which the visual representation begins to fail and the verbal one takes over. As the interval between two stimuli approaches 2 sec, there is a decreasing difference between the reaction times to two physically identical stimuli and two stimuli that are only alike in having the same name. The obvious explanation of this phenomenon is that the visual representation that is stored is available only for a relatively short time (i.e., less than 2 sec), and that after that brief period has passed the information must be reencoded in a verbal or name form to be retained. Even though this duration is not identical with the measurements made of other forms of visual short-term storage, it is of the same order of magnitude and speaks to
the transitoriness of whatever storage mechanism it is that underlies the initial visual representation. The implications for short-term memory studies are, of course, both obvious and profound. Posner's data and his conceptual model of a system of parallel and nearly independent representations provide far more knowledge about human information processes than just an analysis of the kind of discriminative behavior with which I am concerned in this chapter. Indeed, it goes far beyond a description of the initial visual processing of forms that is the main focus of this present work. In extending his analysis to the verbal and semantic codes and processes, Posner takes the discussion into a domain in which not only the form of the stimulus itself but also the modality along which it is presented plays only a secondary role. This is the enormous strength of Posner's analysis: it tends to provide a unifying theme tying together topics that are essentially geometrical and perceptual with topics that are essentially symbolic and cognitive. It is quite clear that the hierarchy of processes that are being assayed by Posner's work ranges from the lowest level preattentive mechanisms (e.g., those mainly affected by the physical properties of the visual stimulus) to high-level processes involving effortful attention (e.g., those requiring the application of semantic rules for their execution). In this regard, this work plays the same kind of cross-level linking role that Sperling's (1960) invention of the partial report technique did in coordinating our theories of perceptual matters, on the one hand, and memorial topics, on the other. Even the most rapid skimming of Posner's (1978) book displays the remarkable relevance of the same-different paradigm to a far more diverse set of psychological topics than form perception alone.
Let us concentrate now on the other kinds of theoretical arguments that have been proposed to explain the kinds of matching data that have been obtained in Nickerson's and Posner's studies.
3. Krueger's Noisy Operator Theory
Another theoretical view of the same-different discrimination process has been presented by Krueger (1978). Krueger's main goal was to analyze the basis for the "paradoxical" peculiarity in discrimination tasks using same-different judgments that was identified by Nickerson: more rapid "same" than "different" response times. Krueger confronted this result by developing a theory based upon the supposition that the perceptual responses to the stimuli that are being compared (and, perhaps, also the underlying processing of those stimuli) are intrinsically noisy. That is, stimuli that are being compared cannot be considered to be perceptually identical even when they are physically identical because of the injection of random fluctuations by the perceptual systems into their representations. Thus the observer should not, if he or she is to optimize performance, immediately decide that stimuli are different on the basis of the mismatch of a single feature or attribute; it might just be a discrepancy due to random fluctuations. Rather, the observer must make a decision on the basis of a process somewhat akin to the decision made in a Theory of Signal Detection (TSD)-type paradigm. Figure 4.9 illustrates the analogy between the enumeration of the number of mismatches that is central to Krueger's noisy operator theory and the strength of the noise and noise-plus-signal distributions, respectively, in the TSD paradigm. The difficulty faced by the observer, according to Krueger, in the context of a perceived mismatch is that it is far more difficult to exclude a spurious mismatch (due to noise) than it is to accept a true mismatch in his or her comparison. The observer must, therefore, continuously recheck the mismatching features, especially when the degree of mismatch is near that which would be generated by the noise introduced into what were originally identical stimuli. It is at this point that the "paradoxical" efficiency of same judgments begins to be understood, even if one adheres to a feature-oriented conceptualization of the problem. Truly different stimuli have more mismatching features than truly identical ones (both suffer from mismatches due to noise), and since all mismatches must be rechecked, it takes longer to process nonidentical forms than identical ones. Therein lies the resolution of the "paradox" and the advantage of Krueger's noisy operator model for modeling same-different judgments.
[Figure: curves labeled "same" and "different" plotted against Number of Perceived Differences, with regions marked "respond same" and "respond different."]
FIG. 4.9. The similarity between the "Noisy Operator" and signal detection theory is clearly illustrated in this model of the former (from Krueger, 1978).
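The rechecking argument can be made concrete with a small simulation. What follows is a minimal sketch under stated assumptions, not Krueger's actual model: it assumes that a truly different feature always reads as a mismatch, that a truly matching feature reads as a spurious mismatch with a fixed probability `p_noise`, and that every perceived mismatch is rechecked until it either reads as a match (and is dismissed as noise) or survives `confirm` rechecks. The function names, the parameter values, and the use of the recheck count as a stand-in for reaction time are all hypothetical.

```python
import random

# Hypothetical sketch of the rechecking idea; all names and parameter values
# are illustrative assumptions, not taken from Krueger (1978).


def recheck_feature(truly_different, p_noise, confirm, rng):
    """Recheck one perceived mismatch; return (confirmed, n_rechecks)."""
    for n in range(1, confirm + 1):
        # A true difference always reads as a mismatch; a true match reads
        # as a spurious mismatch only with probability p_noise.
        reads_mismatch = truly_different or rng.random() < p_noise
        if not reads_mismatch:
            return False, n  # dismissed as noise after n rechecks
    return True, confirm  # the mismatch survived every recheck


def compare(n_features, n_true_diffs, p_noise=0.1, confirm=3, rng=None):
    """Simulate one trial; return (response, total number of rechecks)."""
    if rng is None:
        rng = random.Random()
    rechecks = 0
    confirmed_any = False
    for i in range(n_features):
        truly_different = i < n_true_diffs
        perceived_mismatch = truly_different or rng.random() < p_noise
        if not perceived_mismatch:
            continue
        confirmed, n = recheck_feature(truly_different, p_noise, confirm, rng)
        rechecks += n
        confirmed_any = confirmed_any or confirmed
    return ("different" if confirmed_any else "same"), rechecks


def mean_rechecks(n_true_diffs, trials=2000, n_features=20, seed=0):
    """Average recheck count over many trials (a crude proxy for time)."""
    rng = random.Random(seed)
    total = sum(compare(n_features, n_true_diffs, rng=rng)[1]
                for _ in range(trials))
    return total / trials
```

Under these assumed parameters, `mean_rechecks(0)` (identical stimuli) comes out smaller than `mean_rechecks(2)` (stimuli differing in two features), which is the qualitative pattern the theory is meant to capture: both kinds of pair incur noise-driven rechecks, but the different pair carries additional true mismatches that must also be rechecked.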
One problem with Krueger's noisy operator theory is that it involves a "rechecking" procedure that is essentially sequential and dependent on earlier evaluations. It assumes, therefore, that a series of examinations is made by the visual system, a series that takes place over time, and that is based upon the presence or absence of mismatches that virtually have to be enumerated in each of many possible earlier steps in the sequence. Unfortunately, this is not in agreement with common experience; we have no sense of such a series of decision and reprocessing steps. There is, therefore, very much of a post hoc flavor to this hypothesized process that leaves much to be desired of Krueger's model. Krueger and Chignell (1985) have gone on to prepare another updated generalization, the missing feature principle, that, in my opinion, removes them from the feature theorists and places them squarely among those who champion the precedence of holistic perception. They note that early in the perception of forms, local features have not emerged, and therefore some features are simply missing from the perceptual representation of, say, a letter. They go on to hypothesize that in those cases in which a feature may be missing from one of the pair of to-be-discriminated letters, a feature mismatch is simply ignored by the observer. Therefore, the observer is much more likely to falsely report that the letters are the same than to report that they are different. Krueger and Chignell suggested that this phenomenon (which is essentially a reversal of the predictions for reaction times of the noisy operator theory described above) should occur when the observer is stressed by having to respond very quickly (or if a trailing mask is used). In these situations, the judgment would be made on the basis of the early global, but only partial, representation of the stimulus. If the observer is given sufficient time, then the availability of all of the features allows the expected preponderance of errors in which more false-different than false-same responses are made.
4. Proctor's Unified Theory of Matching Behavior
One recent effort that has been made to simplify the increasing number of phenomena and theoretical models in the field of discrimination research is to be found in the work of Robert Proctor (1981). Proctor notes that both in the case of the "paradoxical" advantage in reaction times of "same" stimuli (seen in the work of Nickerson and others) and in the case of the advantage of physical over verbal matches (seen in the work of Posner and others), explanatory models, whatever their details, have tended to concentrate on the early matching stages as the locus of the effects. In the case of the same-different advantage, Proctor points out that the theories generally attribute the phenomenon to either of two kinds of matching. In one kind of theory (e.g., Bamber, 1969), it is proposed that there are two different processes and the differences in reaction times are due simply to the time-constant characteristics intrinsic to the two processes. The other type of theory, exemplified by the work of Krueger (1978) that I have already
discussed, suggests that there is only one process, but this process operates differently on the "same" and "different" stimuli. The observed differences in performance are due to noise effects or some other factor that would necessarily demand that the comparison be rechecked more often for different than same stimuli because there are more discrepant features in nonmatching stimuli than in matching ones. Proctor's summary of the basic hypothesis underlying this latter theoretical orientation is that it argues that noise "is more likely to result in spurious mismatches of features than in spurious matches" (p. 292). Despite the differences in detail, both theories localize the critical decision at which the match occurs at a relatively peripheral level. In developing his alternative "unified" model, Proctor argues that because the two phenomena (the time advantage of "same" over "different" judgments and the abbreviated reaction times for physical matches) are so similar in their phenomenology, it is essential that some effort be made to explain them in a common context and within a common theoretical rubric. Indeed, he also links both of these phenomena to the data obtained when certain other experimental paradigms, namely priming and repetition,⁶ are carried out, following up on a suggestion that seems to have been first made by Nickerson (1978). Proctor suggests that there is a universal principle that brings together both of the matching phenomena, as well as the priming and repetition effects, under a common explanatory umbrella. That principle, he suggests (quite to the contrary of most of the other theoreticians who have worked in this arena), is not to be found in the matching process, but rather in the conditions that determine the rate of encoding of the stimuli. Proctor reports additional empirical research to justify this hypothesis, but because our concern here is mainly with theory, let us concentrate on the details of his model.
Proctor's version of an explanatory model is based upon three premises:
1. Several levels of processing occur. He refers to both passive physical or perceptual matching, on the one hand, and cognitive matching, which is linked to some aspect of memory or previous experience, on the other. This is very much in the vein of the preattentive-attentive dichotomy made by many contemporary investigators.
2. Repetition facilitates the process of stimulus encoding (i.e., when stimuli are transferred from some preattentive, physical code to some symbolic, cognitive representation the process will go faster if it has occurred previously).
3. When competing cognitive codes are activated they inhibit each other.
Thus, Proctor has three degrees of freedom to work with in his attempt to unify the explanation of all four perceptual processes he proposes to bring within the same theoretical structure. His argument goes on to assert that there is a major difference in how the visual system handles simultaneous and sequential physical matches. In the case of simultaneous (side-by-side) matches, his assertion is in agreement with most other theories that the advantage of the physical match occurs because a physical match is processed at a lower, simpler, and therefore faster level of processing. However, Proctor's approach does differ from some of the other theories in at least some ways; he deviates from the mainstream when he goes on to assert that in all cases in which sequential comparisons must be made, both the diminishing advantage of the physical match over a name match and the progressive increase in the reaction times to "same" responses as the temporal interval between the stimuli increases are due to a "facilitation through repetition" process. Proctor's analysis thus incorporates many of the concepts of predecessor theories but brings them together by calling attention to the similarities among a number of different perceptual phenomena. This is a useful process, but by calling upon a triad of different processes (levels, facilitation, and inhibition) Proctor actually has not simplified the situation very much. Rather he has shown the existence of similarities and overlap, but from a formal point of view his model does not reduce the number of premises that must be invoked to explain a multitude of different phenomena. There are, however, a few differences from the conventional perspective in his model that are novel contributions. Proctor does, for example, propose a totally different explanation of the sequential physical matching process than is typical, and his analysis does add to the arguments for a clear-cut distinction between the preattentive and attentive that is often blurred in other discussions.
⁶ Priming (e.g., see Beller, 1970) refers to the fact that a response to a particular stimulus may be facilitated or enhanced if another stimulus, to which the observer need not respond, but which is in some way similar to the test stimulus, precedes the test stimulus. Repetition (e.g., see Kornblum, 1969) refers to the fact that in a sequence of reaction-time trials the reaction time will be shortened if the trial preceding the test one used the same stimulus. The differences between the two experimental designs are, obviously, quite small. One, repetition, uses whatever is the previous trial in the run of trials as the influential one; the other, priming, introduces a special stimulus as a precursor of the test stimulus as the influential one. It is not at all clear that two distinct processes are at work in this situation.
F. FORM MATCHING WITH TRANSFORMATIONS
In the previous section, I considered many theories that have to do with form matching. This topic was characterized by sets of stimuli that were presented in pairs, typically in some standard orientation and with some common size. Because of the relatively simple way in which these stimuli are arranged, there was a persistent theoretical focus on what appeared to be a rapid, automatic, preattentive processing relatively uninfluenced by attentive cognitive processes. In this section we consider a paradigm that is significantly different: before the stimuli can
be compared, they must be transformed by what appears to be a very active cognitive process to a canonical orientation, position, or size. As we shall see, what seems at first to be only a slight design difference has major ramifications both empirically and theoretically. One effect, in particular, is that the topics we discuss cannot by any stretch of the theoretical imagination be considered independently of some relatively high-level attentive and cognitive mechanisms. In all cases of this kind of experiment the observer must make a "conscious" and active effort to retransform the stimuli from their altered form into a standard form so that different items can be compared with each other. Whatever the nature of the comparison or matching process itself, it must be preceded by an attentive effort that takes this kind of matching process into what is now commonly referred to as the cognitive domain. There was, at least, a modicum of doubt about the issue of perceptual-versus-cognitive mechanisms for matching processes that occur without requiring such transformations. In this case, there is none. The experimental paradigm in this new case, matching following a necessary transformation, is much the same as that simpler task described earlier. The observer is asked only to specify if a pair of objects is the "same" or "different." However, before this is done, it is assumed that the observer must scale, rotate, or fold the stimuli in his or her "mind" to get the to-be-compared objects into the appropriate and comparable configuration. Obviously, such a mental image transformation makes very substantial demands on cognitive abilities that are not activated when an observer is asked to make a texture discrimination or to compare two untransformed stimulus objects. Some memory or representation of the two stimuli must be established and processed (thus giving an entree into the nature of that memory itself).
Some active mental manipulation or transformation of at least one of the images is required (thus making it possible to examine the characteristics of the process of mental manipulation itself). These new processes must be superimposed upon what are, at the very least, analogs of the matching processes invoked in the standard form-matching paradigm that are also still required to carry out the task. I use the word analog in this context because there is no way at this point to tell whether the process of matching two simultaneously presented, untransformed images is identical to the process of matching memorized images in this new situation. The idea of comparing pairs of stimuli when one of them was transformed away from the canonical configuration of the other is an idea that, like so many others in perceptual research, has its origins well back in the classic literature. Ernst Mach (as cited in Shepard & Cooper, 1982) as early as 1886 had shown that pairs of random geometrical shapes were perceived as being identical more easily when they were in the same orientation than when one of them was rotated to some other orientation. The idea languished for many years, but it became an object of renewed interest to a number of investigators in the 1960s. One of the first modern studies on the mental rotation of geometrical forms was reported by Boynton, Elworth, Onley, and Klingberg (1960). However, the names
most often associated with this line of research today are those of Shepard, Metzler, and Cooper, as well as a number of their other colleagues who have worked with them over the years. Indeed, the technique of mental transformation has become the centerpiece of an extensive theoretical development in which this group has attempted to describe and explain the process underlying the mental representation of geometrical stimuli. It has also laid the basis for a renaissance of interest in mental imagery, a field that lay fallow for many years in want of a satisfactory experimental tool with which to probe and analyze what previously had been only introspective and anecdotal reports of mental imagery. Fortunately, for anyone seriously interested in this topic, the evolution of this very important perceptual study has been reported in an anthology (Shepard & Cooper, 1982) of the key papers that have emerged in the previous 17 years from Shepard's group. This anthology of previously published papers has been annotated and commented upon to give us a very complete review of the development of these ideas and a deep insight into their theoretical approach. Shepard recounts, in an anecdote characterized by extraordinary temporal precision, that he conceived the idea of combining mental rotation and paired comparison techniques "in a state of 'hyponopomic' (sic) suspension between sleep and wakening, in the early morning of November 16, 1968. Just before 6:00 A.M." (p. 7). Although this sudden-insight phenomenon would itself be worthy of considerable study [What had he been reading or talking or thinking about in the hours before sleep the previous night?] it was only 2 days after the "hyponopomic" event that Shepard set down on paper the details of the experimental procedure that had leapt so curiously into his early morning consciousness. This brief note constituted the program of research that was to occupy him and his colleagues for over 2 decades.
In this document (Shepard, 1968) Shepard spells out the general framework of the technique that was to be so powerfully utilized in the years to come. The original stimuli proposed were pairs of block constructions, as shown in Fig. 4.10, which may or may not be the same object simply rotated to a new orientation. The observer was to do exactly what the title of the work suggested, that is, rotate one of the two block objects in his or her imagination and as quickly as possible assert whether the two were identical or not. Frequently the objects were designed to appear to be three-dimensional (by the judicious choice of monocular cues), but in other cases they were two-dimensional forms that simply had to be rotated on the plane of the paper (i.e., about an invisible depth axis). The main independent variable was the difference in the initial degree of rotation between the two objects. The main dependent variable was the time it took to make this rotational transformation and respond that they were the same or different. This is, therefore, a chronometric experiment in the tradition of Donders, Sternberg, and Posner.
FIG. 4.10. The stimuli used by Shepard, Metzler, and Cooper in their experiments on mental rotation (from Shepard & Cooper, 1982).
The first paper to grow from this fertile (and remarkably complete) statement of the problem was published by Shepard and Metzler (1971) and proved to be one of the most influential documents of its kind in perceptual psychology in this century. This brief Science article was then expanded with other experimental data and published by Metzler and Shepard (1974). In the expanded report, the tone for the program of research was set. The series of papers contained in Shepard and Cooper's anthology includes those by Shepard, Cooper, Metzler, Feng, Judd, Farrell, and others on mental rotations, paper folding, imagined construction, and apparent motion, each representing a creative and insightful experimental attack on a rich and diverse set of hitherto nonmanipulable perceptual phenomena. Among the many findings emanating from these studies were these:
1. The reaction time to tell if two objects are the same or different increases linearly as a function of the initial difference in angular orientation between the two objects.
2. Mental rotation was typically carried out at less than 60 deg/sec.
3. Performance is only slightly degraded if the observers do not know in advance what the orientation of the presented objects is to be.
4. It does not matter to any significant degree whether the rotation is in the plane of the picture or, as indicated by monocular cues, in depth.
5. If a single character is presented at other than a vertical orientation, the time it takes to mentally rotate the character for recognition is not linear.
6. Imagined and real rotations are dealt with in virtually the same reaction times.
To what theoretical issues does this work speak? The prime problem to which the authors direct their theoretical analysis concerns our understanding of the nature of the mental representation of two- and three-dimensional forms. As I have noted earlier, a major debate has raged for years in the imagery field concerning whether or not the image is stored in a symbolic (or verbal or propositional) form as opposed to a pictorial (or analog) manner. We have already seen in the work of Posner that both answers may be correct in different situations; pictorial representations seem to be short-lived and transient, rapidly (in a second or two) being transformed into some kind of a verbal or symbolic code. The work of Shepard, Metzler, and Cooper presents a vigorous case for the pictorial or analog representation theory. Their main point is that because the major result, the observed linearity of mental rotation speeds, corresponds to the physics of a true physical rotation, the underlying process should also be occurring in the same continuous and analog manner. In their words, "... intermediate stages in the (mental) process have a one-to-one correspondence with the intermediate stages in the external rotation of an object" (p. 185 in Shepard & Cooper, 1982). Furthermore, they argue, in the holistic Gestalt tradition, that this means that the overall organization of the stimulus is maintained, and that it is, therefore, likely that the representation is also dealt with as an integrated pictorial whole rather than a series of propositions or verbal codes. It must be made explicit, however, that the analog representation to which Shepard and his colleagues allude is not a naive kind of "toy-in-the-head" model. They specifically eschew that sort of simplistic interpretation. Shepard goes out of his way to dismiss that notion and to substitute for it a notion of analog representation as a kind of mathematical isomorphism in which the representations of objects are related to, but not the same as, the attributes of the originally depicted object. In his model, there are one-to-one correspondences between objects and symbols, not dimensionally isomorphic physical maps. As distinguished and influential as this work has been, it must also be appreciated that the model of analog representation presented by Shepard and his colleagues is not without its critics; their arguments are not considered to be unequivocal by all contemporary theoreticians in the field.
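The linear relation between reaction time and angular disparity discussed above can be written as reaction time = base time + angle / rotation rate, so that the slope of the reaction-time function is the reciprocal of the rotation rate. The sketch below illustrates that arithmetic only. The function names, the base time, and the default rate are hypothetical values chosen for illustration (the default rate simply borrows the 60 deg/sec upper bound mentioned in the list of findings); none of them are data from the Shepard studies.

```python
# Hypothetical sketch of the linear reaction-time relation; parameter values
# are illustrative assumptions, not measurements from Shepard and colleagues.


def predicted_rt(angle_deg, rate_deg_per_sec=60.0, base_sec=1.0):
    """Predicted reaction time (sec) for a given angular disparity (deg)."""
    return base_sec + angle_deg / rate_deg_per_sec


def fit_rotation_rate(angles_deg, rts_sec):
    """Least-squares line through (angle, RT); return (rate, base time)."""
    n = len(angles_deg)
    mean_x = sum(angles_deg) / n
    mean_y = sum(rts_sec) / n
    sxx = sum((x - mean_x) ** 2 for x in angles_deg)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(angles_deg, rts_sec))
    slope = sxy / sxx                     # seconds per degree
    intercept = mean_y - slope * mean_x   # time not spent rotating
    return 1.0 / slope, intercept         # rate in deg/sec, base in sec
```

Given reaction times measured at several angular disparities, `fit_rotation_rate` recovers the implied rotation rate from the slope and treats the intercept as the time taken by everything other than the rotation itself, an interpretation that is itself an assumption of the additive form used here.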
A number of criticisms of the analog (as opposed to the propositional or symbolic) approach to representation of visual images have been made. Among the most influential of the critics is Pylyshyn (1979, 1981). Some of his arguments are based on philosophical criteria of logic and parsimony. For example, Pylyshyn has asked: Because there are such good data that a great deal of pictorial information is encoded propositionally, why should we have to make a special system to handle data of this kind? Other arguments forwarded by Pylyshyn are based upon the fact that the results can be influenced by the kinds of stimulus, the nature of instructions, and practice. Thus what seems to be an analog system is nothing more than another propositional system clothed in what must be considered to be a phenomenological disguise. Another kind of counterargument to Shepard's notion of analog representation has already been mentioned. If it is true, as suggested by Moore (1956), that we cannot distinguish between alternative models on the basis of input-output relations alone, then there can be no resolution of the issue of what kind of internal representation is actually being used in any case, no matter how high the quality of the empirical data. This argument has also been made by Anderson (1978), among others. I have more to say about this important point in the concluding chapter of this book.
5 THE RECOGNITION OF VISUAL FORMS
A. INTRODUCTION

In a previous book (Uttal, 1978) I wrote about the neurophysiological bases of learning and memory. I spoke of a major conceptual peculiarity concerning the most commonly accepted theoretical explanations of these memorial processes. That peculiarity lay in the fact that the nearly universally accepted theory of the most probable neurophysiological mechanism of learning and memory (synaptic changes as a result of use) was not based on any substantial amount of direct empirical evidence, but rather upon the logical and plausible argument that synaptic plasticity is, in fact, virtually the only possible neuroreductionistic mechanism of behavioral change as a result of experience. Although many experiments have been carried out in search of what Lashley (1950) referred to as the engram (the biological correlate of memory), it is a truism that no psychobiologist has ever shown changes in any synaptic junctions that can be directly related to any particular behavioral changes in mammals.1

1 There will probably be some argument about this assertion from biologists using simple invertebrate populations even if such an argument can easily be justified for vertebrates. Even in the case of those animals without spinal cords, I feel that the link between synaptic changes and learning (in the sense that we commonly think of learning) is yet to be established. Whereas there is no question that simple behavioral changes analogous to vertebrate conditioning and habituation can be linked to certain neurophysiological events in invertebrates, the more elaborate kinds of learning in higher organisms still remain opaque to neurophysiological analysis. The reader interested in pursuing the problem of the neurophysiological basis of memory would profit from a reading of Lynch's (1986) essay. Lynch presents an up-to-date review of the field, but also acknowledges how preliminary is our wisdom in this field; even his own model is "long on speculation and short on data" (p. 65).
Nevertheless, the hypothesis of synaptic conductivity changes, perhaps associated with actual physical growth and perhaps associated with more subtle biochemical changes in membrane that are also putatively driven by use, is virtually universal in its acceptance among psychobiological theorists as the explanation of most persistent kinds of learning. There are, of course, many unknown details: we do not know how the specific biochemical mechanisms of synaptic plasticity are altered, whether use leads to a reduction or an increase in the synaptic count, or even where in the brain the critical synaptic events underlying any particular engram may be located. Nevertheless, a consensus has emerged that seems to agree that it is at the junctions between nerves (the synapses) that the necessary macromolecular events must occur to account for the retention of the effects of previous experience. Some insights are emerging into the biochemistry that accompanies changes from short-term to long-term storage and of the growth and swelling of junctions; there are even some hints being uncovered as to some of the particular regions of the brain in which the salient plastic events either actually take place or from where they may be controlled. There has, however,
never been any experiment that directly establishes the necessity and sufficiency of these neuronal changes for some closely linked behavior. For learning, as for all other cognitive processes, the leap from the discrete cellular mechanisms to the molar psychological phenomena has not yet been made. Anyone who disputes this point is living in a scientific fantasy world of connections unmade and of leaps unleapt. In spite of this unbridged conceptual gap, indirect arguments and simple logical necessity have made the synaptic hypothesis of learning among the most widely accepted of psychobiological theories. In this chapter the reader observes a similar situation unfolding concerning contemporary theories of the process underlying form recognition. The contemporary consensus among theoreticians, if the literature is to be accepted at face value, is that recognition occurs as a result of the analysis of a form into local features. As I review the theoretical literature, it is shown that the problem of form recognition is nearly universally approached from a point of view that argues that the analysis of complex forms into their "parts," "features," or "components" is the initial information-processing step in classification and recognition. I believe, however, that a considerable body of evidence argues that there are global strategies at work in human form perception that have little to do with the feature-by-feature algorithms that currently are the most popular theories of form recognition. The analogy I have drawn here between synapses and learning on the one hand and features and form recognition on the other is far from complete, however. In the former case there is little inconsistency between theory and empirical data; in the latter case data clash with theory, even if this is not generally appreciated to be the case.
There are many arguments why feature analysis cannot be considered to be the main means by which the human being actually recognizes forms, regardless of the ubiquitousness of this feature or elementalist approach in contemporary
theory building. Indeed, most of the demonstrations (as opposed to parametric experiments) that represent the first-order evidence of how we recognize forms are largely ignored by researchers in this field when it comes to generating theory. This kind of phenomenology, as so eloquently pointed out by Pomerantz and Kubovy (1981) in the summary to their very important book (a work that may be a harbinger of the arousal of interest in the currently dormant global or Gestalt perspective), must not be underestimated in terms of its impact on psychological thinking in the last half century. Demonstrations reflect the fundamental nature of perception in a direct and immediate fashion and should set the stage for research activities. Perceptual phenomena, therefore, must be, from one point of view, not only the initial guides to valid theories of perceptual reality, but the final arbiter in any dispute between a hypothetical mechanism and that perceptual reality. Simple demonstrations cannot be ignored in the search for reductionistic explanations solely because they urge a kind of theory for which we are not particularly well prepared by technique, tradition, or current approach. The psychophysical fact, the final result of a long series of processes and transformations, must be the ultimate authority in disputes of this kind. Phenomenology incorporates the influences of all of the previous stages. Therefore, whatever neural- and feature-oriented explanations or heuristics we may develop must be altered if they are in conflict with these phenomenological first approximations. The point is that it is the explanations and models that must be modified, not the implications of the phenomenal demonstrations.
Indeed, controlled psychophysical experimentation, one must remember, is nothing more than another way of achieving a more quantitative or more precise definition of phenomena; demonstrations differ from the graphic or statistical results of an experiment only in the degree of quantification and not in quality. To reject a demonstration because "it is but an illusion" or "it is not sufficiently quantified" runs counter to the very concept of theory testing that has graced modern science. It is only in the rarest instances that a good mix of demonstration and more formal experimental data is produced. (One good recent example is Todd's, 1985, elegantly designed study of structure from motion.) And yet, we seem so very frequently to ignore the message being sent by some first-approximation demonstrations in the field of form perception. Whereas phenomenology cries out a distinctly global, Gestalt, and holistic message, our theories are predominantly elementalistic and feature oriented. Why should this be the case? I believe that the main arguments for the predominance of today's elementalistic theories are expediency and availability. These two reasons for the utter domination of feature analytic thinking in contemporary theory emerge from the successes of two powerful and influential bodies of knowledge: (a) modern studies of the discrete nature of the neuron and its responses and (b) the extraordinary engineering developments underlying the architecture and programming logic of the modern digital computer. These enormously exciting and fertile scientific traditions have been among the most effective intellectual forces constraining and guiding our theoretical structures in perception (as well as many other areas of science) in the past 40 years. The period of emergence of the new elementalistic form-recognition theories overlaps with the period in which both computer science and neurophysiological science have flourished. From an even broader perspective, and with the possible exception of macromolecular biochemistry, these two sciences have been the ones driving the current stage of human intellectual development in many of its different guises just as nuclear physics did in the era surrounding World War II. Nevertheless this influence has not been without its costs. Just as computer science has been a powerful heuristic for theory building in perception, it has also circumscribed a very specific kind of theoretical thinking because of its own intrinsic limits. I am referring to a much more esoteric reason than the fact that computers are made of discrete parts for this pervasive elementalism in psychobiological thinking: the theory-driving force exerted by the practical limits of currently available computational algorithms and programming procedures. We currently have available a broad array of programming algorithms and other analytic methods that can be used to dissect forms into their parts and features and an equally large armamentarium of techniques for looking at the resulting components or parts. Our mathematics is quite competent, in other words, as a means of analyzing complex structures into their components. This programming limitation arises out of a contemporary computer hardware technology that is mainly one of serial "von Neumann" machines. Devices of this sort are well suited to dealing with the sequential analysis of local features, but are not well designed for the parallel processing of spatially distributed arrays.
It has only been in the last few years that attention has been directed towards the development of machines that are inherently global and distributed in their organization. It is only recently that parallel organized computers like the "Connection" machine produced by Thinking Machines, Inc. and the "Butterfly" machine produced by Bolt Beranek and Newman, Inc. (both stimulated by their close proximity to the covey of parallel-processing researchers at MIT in Cambridge, Massachusetts) and the N-Cube machine have become commercially available. It is yet to be seen if these new machines with their distinctly different philosophy of logical organization will successfully force our thinking and lead to holistic, rather than feature-oriented, theories as proposed by "connectionist" and "PDP" theorists. It is not difficult to establish how deficient we currently are, on the other hand, in dealing with global form and the ensemble properties of complex structures. At the most fundamental level, there is yet no satisfactory means of even classifying and categorizing broad classes of geometrical forms. A few mathematical expressions can be used to define narrowly specified classes of forms (e.g., the polynomial that defines quadric and cubic surfaces in space), but as I have shown in chapter 1, a consensus does not even exist concerning what we verbally mean by geometrical "form." Much less is there a formal means of specifying form in a wider context so that it (form) could become grist for some kind of theoretical engine.
Similarly, however much we know about the physiology and information encoding and transmission properties of the individual neural components of the brain, there are few who can yet assert anything about the logical processes that are executed by arrays of neurons to instantiate even the simplest perceptual process. Beyond such stabilizing necessities as "lateral inhibitory interaction" and the logical necessity we can both deduce and observe for some kind of convergent and divergent mechanisms, there is a paucity of knowledge and even speculation about the kind of logical processes that neurons collectively carry out. The artifices of Artificial Intelligence such as list processing and the so-called expert systems (which seem to be nothing more than elaborate look-up tables based on elaborate sets of canned rules) are hardly likely to provide much insight into the mechanisms used by the brain to associate concepts that are similar only in meaning, but not in coded representation. Indeed, although it is understandable how these subtle brain processes may be obscured by their complexity even if we could localize them in the brain, it is also true that we do not know where to look: Where, indeed, is it in the brain that neural activity becomes perception or thought? What is the locus of psychoneural equivalence? In general? In particular? We have not the inkling of a clue to answers to these questions in spite of almost a century of attempts to answer them. What the extirpation-behavioral analysis of classical physiological psychology has offered is still equivocal. We know of some regions that seem to be associated with sensory or motor processing or the control of memory. But there is yet little insight into the locus of higher cognitive processes which, for that matter, may not be localized, or localizable, but, rather, distributed throughout the brain.
Classic physiological psychology has thus generally failed to achieve its goals, based as it was on a premise of precise localizability. The extraordinary proliferation of visual areas (see especially the work of Kaas, 1978 and Van Essen, 1985) accentuates the problem rather than ameliorating it, by raising the necessity for interaction over broad regions of the brain to accomplish something like vision. The recent shift of attention from localization to the biochemistry of psychoactive drugs reflects this failure. This paradigm shift has brought us to the consideration of matters that are even less theoretical, less explanatory, and less enlightening vis-a-vis the questions of mentation than were the outcomes of decades of tissue extirpation studies. Indeed, most neurochemical studies of behavior seem to be primarily concerned with the metabolic, rather than the information-processing, aspects of the brain. It must also be acknowledged that both contemporary perceptual psychology and computational science have only recently begun to develop algorithms and theories that would permit, as well as stimulate, us to examine the global attributes or Gestalt properties of a stimulus-form. Yet, if one considers the many prima facie pieces of evidence that characterize how recognition occurs, the a priori argument seems compelling that people recognize forms not because of the nature of the parts but, rather, because of some attribute of arrangement of the parts.
That is, I argue that the human perceptual system is, at its most fundamental level, holistic rather than elementalistic in the strategies it uses to carry out visual information-processing tasks. Other circumstantial and demonstrable evidence lies in the generalized ability of the human observer (a) to categorize objects under the same rubric even when constructed from many different kinds of features or in grossly degraded or reduced forms and (b) to virtually ignore the particular type font of printed words. There are many other similar demonstrations and arguments that can be made that our nervous system operates by holistic rules of logic in which global arrangement is more important than local features. Yet, when most perceptual theorists go into the laboratory, or, worse, when they attempt to develop computational models or psychological theories of form perception, the contemporary zeitgeist is to look at features, to analyze features, to manipulate features, and generally to emphasize features as the putative means by which humans process forms. However satisfactory this approach may be for computer vision engineering development projects that are not deeply concerned with modeling the human perceptual mind, such a hyperelementalistic philosophy and predilection has been a major distraction leading us away from the development of what first-approximation demonstrations suggest is a more truly valid theory of human recognition processes. Contemporary theory and experimentation in neurophysiology, on the one hand, and computer vision technology, on the other, in their joint emphasis on local features, seem therefore, in terms of their most fundamental premises, to be divergent and digressive from what seems a priori to be a more plausible approach to understanding how people recognize forms. Of course, there are many other contributing factors (e.g., both the difficulty in generating an arbitrary form and the difficulty in formally representing families of unrelated forms), but the main problem is that we do not yet have either a satisfactory holistic theory or an empirical methodology to provide the bases for a truly modern holistic theory of recognition based on the global rather than the local attributes. Any critic must acknowledge that even primitive holistic explanations are few and far between these days. But there are glimmerings, such as Hoffman's Lie Algebra approach, my own work with autocorrelation transforms, and some of the others emphasizing spatially distributed interactions. It must also be acknowledged that the fact that current thinking is dominated by feature-oriented theories may be unavoidable. It may not be so much a criticism of current thinking to describe it as over- or hyperelementalistic as it is a historical description of a necessary evolutionary step in the natural development of the science toward a more valid theory of form perception. We may simply have to accept the fact that what we have today is a prerequisite step toward full understanding. But whether it be a misdirected conceptual deficiency or a primitive evolutionary step, the point I make here is that current theories probably are nothing but the most preliminary of expedients and crutches. Just as with the study of the neurophysiological basis of learning in which synaptic plasticity is accepted
without substantiating evidence, logic and first-order demonstrations argue that form perception is largely processed in terms of the global attributes of the organized stimuli. The argument that most form-recognition paradigms involve feature-oriented, and not holistically oriented, methods may not seem, at first glance, to be supportable when one leaves the field of perceptual science and looks at what has happened in recent attempts to develop mathematical theories of "pattern" recognition. There is at least the initial suggestion that some of these models are looking at the global, rather than local, attributes of forms. However, as we see when we examine the various taxonomies of recognition algorithms later in this chapter, this is not the case. The purpose of this chapter is to review recognition theory. I present a review and analysis of a set of theories that, I believe, is based upon premises that, unfortunately, must be considered to be inaccurate, incomplete, and nonrepresentative of perceptual reality, as well as being patently misleading in that they are not even pointing in the right direction, that is, towards parallel and holistic mechanisms. It is an unfortunate state of affairs, but a necessary one in the absence of viable holistic theories and approaches to the problem of form recognition. My hope is that the reader, forewarned by the preceding paragraphs, will not assume (as so many of my colleagues seem to have) that the dominant elementalistic approach is anything other than a preliminary step toward understanding the psychology of human form perception. The repetitive recitation of a long series of what are essentially elementalistic theories, it is hoped, does not detract attention from the fact that, in fundamental principle, people seem to see by virtue of the arrangement of the parts rather than by the nature of the parts.
With this caveat in place (and by force of repetition, in the forefront of our thinking about this problem) we now turn to some of the other issues that concern the process of form recognition. It might be well to start out by reasserting what it is that is meant by recognition. Phenomenologically, it is the act of classifying, categorizing, or conceptualizing a stimulus object as a member of a class to which it belongs. It is, of course, possible for an object to be a member of many different classes and have many different names: It is either the natural context or the design of the recognition experiment that determines within which of these many possible alternative classes an object will be classified. A "woman walking down a street" may be Mrs. Smith, my wife, a living organism, or a potential threat, depending upon the situation and the observer; a wide variety of contextual cues will determine the response of the observer to the stimulus object so inadequately described in the few words "a woman walking down a street." In each case, there is much about the recognition process that transcends the requirements of the detection or discrimination processes. The question asked is always of the form: What, specifically, is the name of the stimulus object? or: To what group does this stimulus object belong? In the laboratory the experimenter is not simply structuring the procedure so that the observer can respond: "Is there
anything there?" or "Are these two things the same?" as characterized the processes described in the previous two chapters, but is asking "What is it?" Recognition, therefore, is a process that requires more of the observer than simply answering questions of existence or similarity. It is also an act of conceptualizing or classifying that, without doubt, entails far more than just a straightforward tabulation of the specific features of the object. Just as discrimination subsumed detection, but added requirements of its own, so, too, does recognition require more information and more extensive information processing than did either detection or discrimination to be completely executed. Just as discrimination added a memorial requirement (the first stimulus, once having been attended to, must be remembered to be compared with the second one), recognition adds the further requirement that the characteristics of not just another single stimulus, but rather of a class (or several classes) of stimuli be recalled. At the most a priori and primitive theoretical level, the information-processing requirements are increasing as we pass through this hierarchy of visual processing stages. Whereas it is certainly conceivable to think of form detection and discrimination as the outcome of a relatively simple and passive algorithmic process, the process of recognition conjures up a theoretical environment in which terms like induction and logic are more likely to be encountered. And, as we see, recognition also seems to reflect a far greater influence of cognitive processes than do the earlier and lower stages: Empirical evidence suggests that recognition is, to a substantial degree, much more cognitively penetrable than is either detection or discrimination.
Despite this a priori consideration of the increased complexities of the recognition process, it is the case that there is an enormous variety of recognition theories about, many of which assume some kind of a serial comparison or correlation process and most of which (if I may belabor a point) assume that some kind of a feature-by-feature sequential template-matching process occurs. Both logically and empirically, the sequential matching of an input stimulus with an array of comparison templates suggested by many of these theories seems implausible: our recognition processes do not seem to require more or less time to recognize a caricature than a photograph. Nor does it seem likely that an exhaustive series of templates is actually stored in our memory, each of which must be compared in turn with the stimulus object. The memory requirements of such a system would be enormous and the information processing inefficient. Preliminary examination of the recognition process thus suggests that something quite different from template matching is going on in the visual system, something that occurs in parallel and that does not depend upon a serial search (unless the experiment is specifically designed to make this demand on the observer by asking him to search a list of many stimuli in sequence) or a sequential series of simple cross-correlations. The nature of the associations that are made between percepts and classes of percepts seems to be incompatible with such serial processes. Another way to look at this problem is from the point of view of the logician.
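To make concrete the serial template-matching scheme criticized in the preceding paragraph, the following is a minimal sketch under stated assumptions. The 3 x 3 binary grids, the agreement score, and all function and variable names are hypothetical illustrations, not a reconstruction of any particular published model. The sketch shows the two costs named in the text: every template must be stored, and the number of comparisons grows with the number of stored templates.

```python
def match_score(stimulus, template):
    """Fraction of cells on which two equal-sized binary grids agree."""
    pairs = [
        (s, t)
        for stimulus_row, template_row in zip(stimulus, template)
        for s, t in zip(stimulus_row, template_row)
    ]
    return sum(s == t for s, t in pairs) / len(pairs)


def serial_template_match(stimulus, templates):
    """Compare the stimulus with each stored template in turn.

    Returns the best label, its score, and the number of comparisons made.
    """
    best_label, best_score, comparisons = None, -1.0, 0
    for label, template in templates.items():
        comparisons += 1  # one full comparison per stored template
        score = match_score(stimulus, template)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score, comparisons


# Hypothetical stored templates for three forms.
TEMPLATES = {
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
    "O": [[1, 1, 1], [1, 0, 1], [1, 1, 1]],
}

# A degraded "T" with one cell of the stem missing.
NOISY_T = [[1, 1, 1], [0, 1, 0], [0, 0, 0]]

label, score, comparisons = serial_template_match(NOISY_T, TEMPLATES)
```

In a scheme of this kind the work done scales with the size of the stored set, and a form is matched only to the extent that its cells coincide with a stored copy. Both properties sit uneasily with the observations about caricatures and type fonts made in this chapter.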
Watanabe (1985), in the most recent of his distinguished series of works on pattern (i.e., form) recognition (to which we turn for wisdom and organization in the next section of this chapter), describes the process of recognition as one of induction; that is, of the creation of concepts and categories by generalization based on the examination of many different exemplars. He distinguishes induction from deduction by noting that deduction is the production of specific exemplars from a general rule. The categorization or inductive process, from this point of view, can be seen (and has been seen by Watanabe and others) as more descriptive of form recognition. This hypothesis, however, is not entirely satisfying. Whereas induction may be described as the generation of a class, category, or concept from the specific examples, that is not exactly what a recognizer is doing. Rather, this kind of visual process requires that the observer place a newly presented exemplar into one of a set of preexisting classes or categories. Even though it does not seem, therefore, that we can confidently propose a pure identity of recognition and induction, the former process (recognition) clearly has much more of the flavor of the latter (induction) than it does of the passive, automatonlike, deductive mechanisms that characterize so many of the theories of form recognition currently in the theoretical literature. It may be that the "deduction-induction" dichotomy is just not adequate to describe these processes, and that we should more properly consider a trichotomy of deduction-induction-recognition to be more descriptive of the logical reality of the form-perception context. Their processes are separable enough to justify making this distinction among them. There is another unique attribute of recognition that transcends those of detection and discrimination.
That attribute is that the act of recognition requires, to a far greater degree than did either detection or discrimination, the association of meaning to the stimulus object; that is, dealing with the stimulus as a symbol for something else. Recognition is intrinsically more cognitive and semantically "loaded" than the previous two stages of visual processing, a fact that the empirical data seem to substantiate. The detection process does not require any semantic processing of the kind. The prototypical question: Is there anything there? requires only a binary response that ignores the nature of the detected form or any meaning that it may convey. Similarly, form discrimination requires only that the observer distinguish between two forms, and in the typical visual-perception experiment it is possible to do so on the basis of partial mismatches of the geometry of the two stimuli while totally ignoring any semantic content they may contain. One can, to reiterate a previously made point, easily imagine an automaton performing either detection or discrimination perfectly well on the basis of the purely physical attributes of the stimuli. Detection and discrimination, therefore, may be considered to be purely syntactic, deductive (in a primitive sense), or even geometrical. It is these kinds of processes that computers simulate quite well with algorithms that imitate feature-sensitive and preattentive, automatic processes. It is likely that these are also the kinds of processes being carried out
within the human brain when it performs detection or discrimination. It seems quite unlikely, however, that the same argument can be made for recognition. I argue here that recognition is not simply syntactic in the same sense as detection or discrimination. Rather, this argument goes on, it is prototypically semantic; the act of classifying, or conceptualizing, or categorizing can be made dependent more on the meaning of the stimulus object than on its physical properties. This is true even when the forms are strictly geometrical in nature. A letter a can be readily recognized and categorized regardless of the font, exactly because the parts of the object letter possess certain relationships that transcend their local nature. Although the geometry of the font may perturb or modulate the recognition process,2 it is not sufficient to define the process as it is in the detection and discrimination processes. In short, the boundary between the passive, automatic, preattentive processes and the active, attentive, cognitive processes to which I alluded in chapter 1 may now be more specifically designated as the boundary between geometry-sensitive discrimination and meaning-loaded recognition. It is for this reason, among others, that I consider there to be an even smaller likelihood of help from neurophysiology in solving the problem of recognition mechanisms than in solving those of detection, for example. This is also the reason for the strong conjecture that the feature-sensitive and triggered neurons that have so often served as heuristic models (line detectors, etc.) for form-recognition processes are probably totally irrelevant in actual fact to this kind of visual process in particular. Rather, the complexities of a recognition process, with its heavy superstructure of meaning and classification, involve neural mechanisms that are as far from the simple and complex cells of the identified visual areas of the cerebral cortex as they are from the retina.
I believe that it will ultimately prove necessary to look elsewhere to find the psychoneural locus of recognition (if such a linkage can ever be established).
B. THEORIES OF RECOGNITION

The emphasis throughout this book has been to consider theories of form perception as opposed to an exhaustive listing and discussion of the empirical data base. It is remarkable, in this context, that there are so many more theories of form recognition than of any other process within the form-perception domain. In fact, there may be more approaches to the problem of recognition than to all of the other processes put together. It is fortunate, in a situation in which there is such a plethora of models, theories, and approaches, that others have attempted to develop taxonomic systems that organize and classify these theories of recognition. In the following section I have drawn upon the thoughts of several of the leaders in the field of form-recognition theory concerning the different approaches that are available and the ways in which the different theories can be categorized. Each provides a specific classification system of theories. Townsend and his collaborators Landon and Ashby are mathematical psychologists and approach their taxonomies from the point of view of the psychologically oriented theories. Watanabe is a distinguished physicist who has moved from the physics of matter to the physics of information over the last four decades. He treats form-recognition theories from a completely different point of view, that of the physical and engineering sciences. Most of the models he treats are not explicitly theories of perception, but are techniques for accomplishing the categorization process by analytical means. Nevertheless, Watanabe's list includes many approaches that have interesting things to say about the human visual processes. Pinker represents a new group of computational vision theorists and discusses the interrelationships among these theories. The reader is likely to be astonished, as I was, at just how nonoverlapping the three taxonomies are.

2 Later in this chapter we see how the geometric form of a stimulus can produce modulations of results that appear to be the outcome of cognitive (categorical) influences when, in fact, they are not. These simulated cognitive influences must be distinguished from the true ones of the categorization process itself.
1. Townsend, Landon, and Ashby's Taxonomy
The first part of our discussion introducing three taxonomies of recognition theories is based entirely on the astute analysis of models of form recognition published in a preliminary form by Townsend and Ashby (1982) and subsequently in a more complete form by Townsend and Landon (1983). All members of the four classes of models that these authors describe can be said to fall into the category of descriptive mathematics and statistics. There is virtually no allusion to possible physiological or mechanical mechanisms, nor, in my opinion, should there be; the emphasis is placed upon formal descriptions of the psychological processes that may be taking place in the observer when he recognizes a form. These processes are described in terms of their functions, not their physical instantiations. Figure 5.1 presents Townsend and Landon's taxonomy of these mathematical form-recognition models. Two major subdivisions are indicated.
[Figure 5.1: tree diagram. The General Recognition Model divides into (a) Models Based on an Internal Observation, comprising General Discriminant Models and Feature-Confusion Models, and (b) Descriptive Models, comprising Sophisticated Guessing Models and Choice Models.]
FIG. 5.1. Townsend and Landon's taxonomy of theories of recognition (from Townsend & Landon, 1983).
The first includes those that are "based on an internal observation." Exemplars of this class of theories all suggest that each stimulus event is dealt with separately by the perceptual-processing system. The probability of a correct recognition (i.e., a response with the correct name of the stimulus or the name of the appropriate category), therefore, depends on the evaluation of that stimulus item by a set of internal rules and criteria in the immediate terms of that particular event. In this subdivision of their taxonomy Townsend and his colleagues mainly pay attention to the specific processes, such as feature detection, that are presumed to exist within the cognitive structure of the observer. The important generality is that theories of this class, whatever their details, are characterized by the fact that they deal with the properties of the individual stimulus event without recourse to other previous events or the current state of the observer.
The second major subdivision includes those theories that Townsend and Landon (1983) call "descriptive." In this case the role of the individual event is minimized. Instead, the process of recognition is modeled as a kind of guessing or choosing an item from a set of possible responses on the basis of probabilistic (or weighted probabilistic) rules involving context properties that go far beyond the characteristics of the immediate event. Rather than processing the attributes of a single stimulus, as did the internal-observation theories, models falling into this category merely use the stimulus as one of many influences leading to an appropriate guess or choice of the proper response by the observer.
The first major division, the "internal observation" category, is further broken down into two subdivisions by Townsend and Landon (1983): the general discriminant models and the feature-confusion models.
General discriminant models are characterized by decision rules and procedures that evaluate the attributes of a particular stimulus and calculate a numeric value or discriminant for all possible responses that could conceivably be associated with that stimulus. This process thus produces, according to the premises of this kind of model, a set of numeric values associated with the possible responses or categories into which an item may be placed. Which response is chosen is determined in a very straightforward manner: The largest numeric value associated with any possible response becomes the selection criterion leading to the emission of that response. Examples cited by Townsend and Landon of discriminant-type theories include linear-discriminant models (see my discussion of the similarity models and the debate between Tversky, 1977, and Krumhansl, 1978, for a discussion of what are essentially alternative versions of linear-discriminant models), nonlinear-discriminant models, statistical decision theories (including the influential signal-detection theory; see my discussion in Uttal, 1973b), template-matching models (as exemplified by Neisser, 1967), correlational models (including autocorrelation models), and feature-discriminant models. Other particular examples of each of these different theories are cited in Townsend and Landon's (1983) article. A related discussion of discriminant-type theories (which he refers to as "decision-theoretic" approaches to pattern recognition) has been presented by Fu (1982b). Fu considered the problem from the point of view of an engineer interested in computer-form recognition, but the methodology is virtually the same. In addition to the decision-theoretic category of recognizers, Fu also classifies another group of recognizers as "syntactic." These recognizers work on the basis of a kind of geometric structural analysis, analogous to sentence parsing, to divide a complex form into simpler components or features but in a way that preserves their spatial relationships. As I have noted and reiterate later, there is no necessary suggestion that these computer algorithms represent the way in which humans recognize patterns: Our internal logic may be, in fact probably is, quite different.
It is important to reiterate, if only in passing, an important point made by Townsend and Landon, Fu, and myself, among others: virtually all of these theories that dote on the properties of the individual stimulus are feature theories. The word feature is not a sharply defined word, and there has been a great deal of debate concerning what is actually meant by a "feature." Nevertheless, there is a commonsense consensus concerning the meaning of the word in the computer form-recognition field as well as in perceptual theory; it connotes a local attribute, region, or part; something that is less than the whole form. A ubiquitous theme in feature-analysis theory, therefore, is that a feature is some separable localized attribute of a form rather than an organizational property of the form. With one exception, all of the theories that fall into the "internal-observation" category of Townsend and Landon's taxonomy and all of Fu's decision-theoretic and syntactic algorithms operate on the parts rather than the whole. Unfortunately, the exception is the one kind of theory that has been most criticized and is least acceptable, for combinatorial, intuitive, and logical reasons, to the psychological community: the template-matching theory.
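The selection rule of the general discriminant models described above (compute one numeric value per possible response, then emit the response with the largest value) can be illustrated with a minimal sketch. The linear form, the three binary features, and the weights are hypothetical choices made only for illustration; they are not taken from Townsend and Landon (1983) or from Fu (1982b), and nothing here implies that the visual system performs such a computation.

```python
from typing import Dict, List


def linear_discriminants(
    features: List[float],
    weights: Dict[str, List[float]],
    biases: Dict[str, float],
) -> Dict[str, float]:
    """Compute one numeric discriminant for every candidate response."""
    return {
        label: sum(w * x for w, x in zip(weights[label], features)) + biases[label]
        for label in weights
    }


def choose_response(discriminants: Dict[str, float]) -> str:
    """Emit the response whose discriminant is the largest."""
    return max(discriminants, key=discriminants.get)


# Hypothetical features: [vertical stroke, closed loop, horizontal bar].
WEIGHTS = {
    "A": [0.2, 0.9, 1.0],
    "L": [1.0, -1.0, 0.8],
    "O": [-0.5, 1.5, -0.5],
}
BIASES = {"A": 0.0, "L": 0.0, "O": 0.0}

stimulus = [1.0, 0.0, 1.0]  # a vertical stroke and a horizontal bar, no loop
scores = linear_discriminants(stimulus, WEIGHTS, BIASES)
response = choose_response(scores)
print(scores, response)
```

Note that the rule operates only on the attributes of the single stimulus event, which is exactly the property that places such models in the internal-observation category.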
Template matching refers to a physical "superimposition" of the stimulus and the various items of the potential response set. According to the template-matching theory, the response selected is the one that fits (or correlates with) the stimulus the best. The implausibility of the existence of a huge set of templates, some of which may never have been experienced previously, and the equally implausible nature of an exhaustive search through the library of templates have always provided strong conceptual arguments against the template hypothesis. The second category of internal-observation-type theories postulated by Townsend and Landon, the feature-confusion models, is characterized by a different criterion. Models falling into this category are purported to actually compute matrices of specific confusions among the set of possible responses. Theories of this genre use the confusion data as a means of determining the rules for generating a reduced set of possible responses from among which the response is finally selected (rather than from the full range of possible responses). The experimental paradigm generating appropriate data for the feature-confusion models must be designed so as to maintain a substantial error rate. An example of this approach is Appleman and Mayzner's (1982) model.
Descriptive theories, the second major heading of Townsend and Landon's taxonomy, are also divided into two subcategories. The first gathers under a single rubric, the "sophisticated guessing-type model," a number of different types of models including the sophisticated guessing models themselves, all-or-none models, overlap models, and confusion-choice models. The second category, which they designate the "choice category," includes but a single exemplar, the "similarity-choice" model. For our present purposes it is important to note that the formal mathematical approach of the descriptive models represents a strictly macroscopic theoretical orientation. The mathematical formulae describe, in a formal way, processes that are, at best, only analogous to those presumably carried out in the perceptual nervous system. The variety of different assumptions about the nature of those processes is indicative of the fact that these models cannot, in fact, definitively resolve the matter of the specific nature of the internal mechanisms instantiating the form-recognition process any more than can psychophysics. Indeed, these models are typically statistical in nature and can be thought of as describing (rather than analyzing) the recognition processes. The degree to which each model fits the psychophysical data is only suggestive and can never unequivocally exclude, or include, any particular set of assumptions from a long list of possible and plausible statements about internal-processing mechanisms. It is also the case that many of these models, although classified in different parts of Townsend and Landon's taxonomy, may be very similar in fundamental principle, not only to each other, but also to several of the other theoretical approaches that have been and are discussed. It is often surprising to discover that models based on slightly different experimental domains and nomenclatures are, in fact, formally equivalent to each other.
Descriptive theories, thus, tend not to be reductionistic; rather they describe processes but make no definitive statement concerning the mechanisms by means of which they may be implemented. The processes are all described at a molar level and must, at best, be considered to be analogs of the actual underlying processes. Because, in general, they are all statistical rather than deterministic, they do not deal with individual cases of recognition or individual stimuli. Indeed, it must be remembered that all of these descriptive models are only descriptions of molar processes that are the behavioral analogs of human perceptual mechanisms and processes. There can be no conclusive proof, however, even when the fit between the model's prediction and psychophysical data is excellent, that, for example, such "processes" as "calculating discriminants" are actually going on inside the perceptual nervous system. Indeed, it seems unlikely that this kind of numerical computation would be occurring. At best, the models relate to something that is comparable, analogous, and functionally similar, but probably not homologous or identical. The two subcategories into which the descriptive category of Townsend and Landon's taxonomy is broken down, the "sophisticated guessing" models and the "choice" models, are closely related and, indeed, given a few restrictions, turn out to make much the same predictions. The choice models are characterized by the work of Luce (1963). The sophisticated guessing models are much more varied and include an all-or-none model, an overlap model (as exemplified by the work of Townsend, 1971a,b), an informed-guessing model, as exemplified by Pachella, Smith, and Stanovich (1978), and Smith's (1980) symmetric sophisticated guessing model.
Looking back over the two major classes of theories of recognition that have been suggested by Townsend and Landon, the internal-observation and descriptive categories, it is clear that they vary in one classic way that these authors did not emphasize. The internal-observation models are typically algorithmic interpretations of the characteristics of a single stimulus. They bring to mind mechanisms that could be constructed of relatively automatic computational engines that ignore both the immediate environment and the history of the recognizer previous to the stimulus event. They deal with the parts (with the exception of the unpalatable template-matching model) in a manner that is best characterized as empiricist in terms of the classification system offered in chapter 2 of this work. On the other hand, the descriptive models all involve a kind of active decision making or choice behavior in which the observer's characteristics, history, and set can be thought of as specifying the nature of the response. In the classic taxonomy of chapter 2, this would be considered to be characteristic of a rationalist approach. The existence of these two themes in such a modern context (they become the bottom-up-vs.-top-down distinction in computational vision, for example) is evidence of the persistence of this controversy over the centuries.
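The flavor of the choice models mentioned above can be conveyed by a short sketch of a similarity-choice rule in the form commonly associated with Luce's work: the probability of each response is its similarity to the stimulus multiplied by a response bias, normalized over all responses. The text does not reproduce the formula, so this particular parameterization, together with the similarity and bias values, is an assumption made for illustration only.

```python
from typing import Dict


def choice_probabilities(
    similarity: Dict[str, float],
    bias: Dict[str, float],
) -> Dict[str, float]:
    """Return P(response | stimulus) for every candidate response.

    Each response strength is similarity * bias; the strengths are then
    normalized so that the probabilities sum to one.
    """
    strengths = {r: similarity[r] * bias[r] for r in similarity}
    total = sum(strengths.values())
    return {r: s / total for r, s in strengths.items()}


# Hypothetical similarities of a presented letter "O" to each response.
similarity_to_O = {"O": 1.0, "Q": 0.6, "X": 0.1}
# Hypothetical response biases (the observer's prior tendencies).
response_bias = {"O": 0.5, "Q": 0.25, "X": 0.25}

probs = choice_probabilities(similarity_to_O, response_bias)
print(probs)
```

The bias term is where the observer's history and set enter the model; the stimulus contributes only through the similarity values, which is why such models describe the pattern of responses statistically rather than any single act of recognition.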
2. Watanabe's Taxonomy
Another outstanding source in which a carefully constructed taxonomy of mathematical theories of form recognition has been provided was authored by Watanabe (1985). Watanabe does briefly consider human recognition in his extensive discussion of the field, but the main theme of his book is the recognition process as it is executed by the different kinds of mathematical methods used by computers to classify and categorize forms. The relevant portion of his taxonomy looks at recognition3 as it may alternatively be considered to be a form of:
• Entropy minimization
• Covariance diagonalization
• Statistical decision making
• Mathematical discrimination
• Structure analysis
3 One may argue that the use of the word recognition here is certainly not appropriate from a philosophical or psychological point of view. Machines, however good they may be at classifying and categorizing, are no more cognitive entities in the sense that the human is than is an amoeba. Machine classification is not a form of re-cognition or any other kind of cognition. It is, as I hope has been made clear in this chapter, the outcome of the evaluation of very precise rule-defined algorithms. The differences in logic that underlie machine "recognition" and human recognition are unknown. At the very least they differ vastly in their complexity: At the very most they are totally different entities. Therefore, the use of the word recognition is a shorthand that, though useful, is loaded with theoretical connotations that can lead us astray at other points in this discussion. Be warned!
In addition to using his taxonomy to explicate the variety of mathematical techniques that are used by modern artificial intelligence and computer-vision researchers, I also want to use Watanabe's analysis to emphasize the conceptual theme that dominates this chapter: the predominance of local-feature approaches in the field of form recognition. At the outset, I reiterate a persistent difficulty that many of us who study form recognition (artificial or human) have alluded to in one way or another. That difficulty is based upon the fact that the word feature is so poorly defined. Feature is the equivalent in the world of form recognition, it seems, of the word mind in psychology, and indeed, is closely related to the difficulty in defining the word form itself. Let us consider in each case why each of the underlying mathematical procedures described in Watanabe's taxonomy is actually an example of a feature-analysis approach and predominantly elementalistic, rather than holistic, in terms of its fundamental premises. The last of the five methods (structure analysis) in Watanabe's taxonomy is explicitly a local-feature-analysis procedure and can be dealt with briefly in making this point. Structure analysis explicitly deals with two-dimensional forms as combinations of local component features of prespecified types. That is, each stimulus object is defined as nothing more than a construction of, for example, vertical and horizontal straight lines, arches, and hooks. The task of the structural-analysis program is to take any stimulus form and specify from which of these component parts it has been constructed. In some algorithms, the order in which these parts have been assembled to produce the original figure can be determined, and this sequence information may also provide useful clues to the recognition of the form. A particular example in which assembly sequence may be critical is the recognition of handwriting.
Nowadays, with the powerful image-processing algorithms that computer science has made available to us, determining where various prespecified parts are in a complex figure is relatively easy. Each of the prespecified local features, for example, can be convolved with the entire object in turn. Convolution is a well-established mathematical routine that is simply implemented on even the relatively small personal computers currently available. The convolution of a set of prototypical features (again for clarity, these exemplars might be straight or curved lines) with a stimulus form produces a set of derivative maps in which the location of each of the prototypical features is highlighted. Simple superimposition techniques may then permit even the order in which each of the parts has been combined to be determined.
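The procedure just described, sliding each prototypical feature over the stimulus form to obtain a map in which the feature's locations are highlighted, can be sketched as follows. The 5 x 5 binary image of a letter L and the two three-cell line prototypes are hypothetical. The function computes a cross-correlation (a convolution without flipping the kernel), which gives the same result as convolution for these symmetric prototypes.

```python
from typing import List, Tuple

Grid = List[List[int]]


def feature_map(image: Grid, kernel: Grid) -> Grid:
    """Slide the kernel over the image and sum the overlapping products."""
    kh, kw = len(kernel), len(kernel[0])
    rows = len(image) - kh + 1
    cols = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def peak_locations(fmap: Grid) -> List[Tuple[int, int]]:
    """Return the (row, column) positions where the map is maximal."""
    peak = max(max(row) for row in fmap)
    return [
        (r, c)
        for r, row in enumerate(fmap)
        for c, value in enumerate(row)
        if value == peak
    ]


# Hypothetical stimulus: the letter "L" on a 5 x 5 grid.
IMAGE = [
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0],
]
VERTICAL = [[1], [1], [1]]  # prototype: short vertical line
HORIZONTAL = [[1, 1, 1]]    # prototype: short horizontal line

vertical_hits = peak_locations(feature_map(IMAGE, VERTICAL))
horizontal_hits = peak_locations(feature_map(IMAGE, HORIZONTAL))
print(vertical_hits, horizontal_hits)
```

Each map reports where a part occurs but nothing about how the parts are arranged relative to one another; that information has to be recovered by a later step.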
As I noted, in the case of the various kinds of mathematical treatments that fall within the structure-analysis rubric in Watanabe's taxonomy, the feature-analytic orientation of the method is explicit, and the meaning of the term feature is clear; they are the local geometrical components or parts of the overall form. They are also the prototypes used in the convolution procedure. Two other categories of his taxonomy, the mathematical discrimination and statistical-decision-making methods, on the other hand, invoke feature analyses in a slightly more subtle manner. The objects dealt with by these two methods are typically defined, according to Watanabe, as a multidimensional set of "vectors." But what are these vectors? Watanabe indicates that they are either "observations" (i.e., the result of different measurements) or the outcome of some kind of distillation procedure in which the number of measurements, dimensions, or descriptors of the original object is reduced. In any case, in an ideal mathematical world, each of these vectors would be orthogonal or independent of the others so that no information would be redundant and, therefore, no processing time would be wasted. Of course, the vectors may have a common causal basis and, therefore, be correlated to some degree (e.g., the height and weight of the members of a population are related), but at the least, each can be measured independently of the other. Described in this way, it becomes clear that these decision-making techniques are also, at their foundation, feature analyzers and, for all practical purposes, are equivalent to Fu's (1982a) decision-theoretic classification. In this case, the features may be numeric rather than geometric as they were in the methods classified within the structural-analysis category, but they are, nevertheless, features in exactly the sense I have been using the term previously.
To the mathematical engine, this difference is irrelevant: A vector, a feature, or a measurement representing a nongeometrical attribute (such as weight) would be treated identically, as a feature of the pattern that must be processed. For that matter, the "patterns" need not even be geometric forms but may be organized bodies of information of other kinds. The important fact is that these methods are just as much feature analyzers in terms of their fundamental approach as are the explicitly structural-analysis methods described earlier. The general category of covariance diagonalization suggested by Watanabe includes a very large number of different methods that are very similar to each other in basic principle. Some of these methods are familiar to psychologists (who were among the leaders in developing them), whereas others are quite alien. Two that are familiar include diagonalization of the covariance matrix and factor analysis: Two that are not so familiar to psychologists are the Karhunen-Loeve expansion and self-featuring information compression (SELFIC). All of these methods, however, are characterized by attempts to draw out of a mass of data the major dimensions, vectors, or (one hopes without begging the question by a too-narrow definition) features that minimize the number of measurements that have to be made. In other words, all of these methods try to find some common measurement that is closely enough correlated with sets of features that it can be used in their place. The extraction of such new features or factors from the many measurements that may make up the raw data of a survey or experiment depends upon correlations among the various measurements and, like the multidimensional scaling techniques discussed in chapter 4, is a means of reducing information to a set of essential, nonredundant measurements. The point to be made in the current context is that the inputs to these "diagonalization" procedures are nothing other than the kind of simple, measurable, and, when geometrical, usually local attributes we have called features. The outputs are not so simply categorized, but reflect the outcome of information-reducing processes that were fed by these features. To the extent that we feed local features into this mill, these methods too must be considered exemplars of the feature approach.
Finally, let us consider the remaining method alluded to by Watanabe as being prototypical of the pattern-recognition process: entropy minimization. Entropy is defined as the degree of disorder or randomness in a system, and Watanabe (1985) has analogized the process of recognition or categorization as being equivalent to removing this disorder. Therefore, categorization is akin, in his thinking, to discovering the intrinsic order of a system. From another point of view, Watanabe also notes, recognition is analogous to organic growth wherein disordered components are organized in ordered forms or entities. In other words, recognition minimizes disorder (entropy) by organizing the system into ordered classes or categories. Central to the analogy drawn between form recognition and entropy reduction, as it was to the covariance-diagonalization methods, is the concept of the reduction of dimensions so that the unimportant redundant dimensions can be ignored. This is the process by which order arises out of disorder.
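The covariance-diagonalization idea discussed above, replacing correlated measurements with a smaller number of uncorrelated composite dimensions, can be illustrated for the simplest case of two measurements, where the eigenvalues of the covariance matrix have a closed form. The height and weight values are invented for illustration (echoing the height and weight example mentioned earlier), and the sketch is not a reconstruction of any specific procedure in Watanabe (1985).

```python
import math
from typing import List, Tuple


def covariance_2d(xs: List[float], ys: List[float]) -> Tuple[float, float, float]:
    """Return (var_x, var_y, cov_xy) using the unbiased n - 1 divisor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    return var_x, var_y, cov_xy


def diagonalize(var_x: float, var_y: float, cov_xy: float) -> Tuple[float, float, float]:
    """Diagonalize the symmetric matrix [[var_x, cov_xy], [cov_xy, var_y]].

    Returns (larger eigenvalue, smaller eigenvalue, angle of the first
    principal axis in radians).
    """
    mean = (var_x + var_y) / 2
    spread = math.hypot((var_x - var_y) / 2, cov_xy)
    angle = 0.5 * math.atan2(2 * cov_xy, var_x - var_y)
    return mean + spread, mean - spread, angle


# Hypothetical, strongly correlated measurements.
heights = [160.0, 165.0, 170.0, 175.0, 180.0]  # cm
weights = [55.0, 62.0, 66.0, 73.0, 79.0]       # kg

var_h, var_w, cov_hw = covariance_2d(heights, weights)
first, second, axis_angle = diagonalize(var_h, var_w, cov_hw)
share = first / (first + second)  # variance carried by the first new dimension
print(first, second, share)
```

With these invented numbers nearly all of the variance falls along the first new dimension, so a single composite measurement could stand in for the two original ones. The inputs, however, remain the same simple measured attributes, which is the point made in the text.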
It is at exactly this point, however, that entropy reduction also can be seen to be currently nothing more than another feature-oriented approach, for the dimensions to be emphasized in reducing the entropy are the same measurable features or measurements that we have already considered. This state of affairs is, of course, due to today's nearly total deficiency of global descriptors of form. It must be acknowledged that the entropy-reduction technique does have a potential for a more global interpretation. If there are new developments in identifying holistic attributes, then this model or procedure may be transformable into a more holistic, arrangement-centered methodology. In summary, we can see that virtually all of the pattern-recognition models that are proposed in the field of computer information processing are feature analytic in their fundamental approach. They prejudge the issue (of globality versus features) by their mechanics; they do not deal with overall structure, even though in principle it is probably more a matter of the paucity of global dimensions and descriptors that prevents them from doing so. As we see shortly, the generalization that these mathematical methods for categorizing forms are predominantly oriented toward manipulating the elements of a form rather than its global structure is also true of the majority of psychological theories of form perception.
In the case of the engineering goal, such artifices are acceptable. In the case of psychological theory, they are not acceptable because they mislead theory, drawing experiment and interpretation away from a valid model of human form perception. The fact that these methods are all predominantly oriented toward feature analysis leaves in doubt the issue of whether or not they could ever be applicable to the problem of recognition and classification in humans should our object descriptions themselves begin to take on more global qualities. Some might argue that any new global properties might themselves become "features," however different in kind, and these mathematical methods might still be useful. It seems plausible, on the other hand, that the explicitly feature-oriented logic of these methods may, in fact, make them inappropriate as processors of truly global properties. Other methods, more in tune with the global and holistic attributes of a form, are likely to be required to validly model the human form-perception process. This is a problem for the future to test; I raise it here as a caveat, not in any attempt to resolve it one way or another at the present time. The essential point I wish to make is that models of form recognition that are based on local features should, at best, only be considered to be superficial analogies of the logical methods that are being executed in the human brain. They are first approximations or analogies, but are probably not good homological explanations of the actual psychobiological mechanisms used to process forms. Despite the elaborate mathematical superstructure that surrounds these methods, the underlying mechanisms, logical processes, and implementations are most likely quite different from those utilized by organic form-recognition processes, particularly those of human vision.
We can be sure of at least one thing: Modern computer and analytical methods, designed as they are to run on serial computers and to dote on local features, do not homologize the mechanisms of perceptual reality. That reality is much more likely to incorporate parallel processes that can treat global form holistically and act upon spatial relationships rather than local features. Most of what we know of perceptual phenomenology gently pushes our theories in that direction just as contemporary mathematical and computer models tend to push us toward thinking about features. Whatever the quality of the analogy between the feature-oriented models and human form perception, it must be appreciated that whatever is actually going on inside the brain remains as opaque nowadays as it did in a more philosophical and speculative age. To sum it all up in a single phrase, the imitation of behavior is not tantamount to the understanding of inner mechanisms. This is another way to assert the black-box limitation originally alluded to in chapter 1. It is a fundamental limitation of mathematical models of any kind and a shadow lurking over the future of cognitive psychology as we know it today.
3. Pinker's Summary
Pinker (1985), reprinting a special issue of the journal Cognition: International Journal of Cognitive Science, has also summarized a group of other form-recognition theories that, though generally less formal and mathematical, are among the most popular and well-known approaches to explaining this process. His outline also serves us well in furthering our appreciation of the wide variety of different form-recognition theories that have been proposed. Pinker points first to template matching. The basic idea behind this simple and oft-discussed model is essentially one of cross-correlation, as we have seen. That is, it is assumed that recognition occurs when a fit between the stimulus object and one of a large set of different patterns or templates stored somewhere in the memory system is maximized. The "best fit" may be measured by one of the statistical or mathematical procedures already considered, but whatever measure of fit is used, the match is strictly determined on the basis of the congruence of the stimulus and members of the set of templates. As I have already stressed, from its inception the template model has been beset with a number of obvious difficulties that have always made it unpalatable to theoreticians in this field. The number of templates that would be necessary to fit all possible retinal images of even a single three-dimensional object on a purely geometrical fit basis would be enormous. And, for the recognition of all possible stimulus objects in all possible orientations, projections, and magnifications, it seems that the set of necessary templates should approach a size that would overload even the 10^3 possible connections of any neuron to any one of the other 10^13 neurons in the brain. The processing time required to carry out a search for a match between all possible templates and a single stimulus form also strains credibility, even in the context of the parallel-processing milieu that now pervades computer theory.
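The template-matching scheme described above, and the reason it demands so many templates, can be illustrated with a minimal sketch. The 3 x 3 binary templates are hypothetical, and the fit measure (the fraction of cells on which stimulus and template agree) is a simple stand-in for the cross-correlation mentioned in the text rather than a formula from Pinker (1985).

```python
from typing import Dict, List

Grid = List[List[int]]


def match_score(stimulus: Grid, template: Grid) -> float:
    """Fraction of cells on which the stimulus and the template agree."""
    pairs = [
        (s, t)
        for s_row, t_row in zip(stimulus, template)
        for s, t in zip(s_row, t_row)
    ]
    return sum(1 for s, t in pairs if s == t) / len(pairs)


def recognize(stimulus: Grid, templates: Dict[str, Grid]) -> str:
    """Exhaustively search the template library for the best fit."""
    return max(templates, key=lambda name: match_score(stimulus, templates[name]))


# Hypothetical template library.
TEMPLATES = {
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
}

noisy_T = [[1, 1, 1], [0, 1, 0], [0, 0, 0]]    # a T with one cell missing
shifted_T = [[0, 0, 0], [1, 1, 1], [0, 1, 0]]  # a T moved down one row

print(recognize(noisy_T, TEMPLATES), recognize(shifted_T, TEMPLATES))
```

The slightly degraded T is still matched to its template, but the same T displaced by a single row fits the I template best. Handling every position, orientation, and magnification in this way would require a separate template for each, which is the combinatorial objection raised in the text.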
Pinker also points out that the template model is really only applicable to the recognition of single objects, and a far more complex kind of pattern recognition is typically at work in human visual perception in which complex scenes must not only be recognized but also analyzed into their component parts and the relationships among those parts established. Pinker goes on to discuss a variety of different "feature" -analysis models, raising many of the same objections I have already discussed in previous sections. The main problem with models of this sort, he asserts, is the absence in most feature models of parameters that describe the relationship of the features that compose the stimulus object, that aspect that I have already called "arrangement. " Next, he considers Fourier-analysis models and his criticisms in that case are also in agreement with those made elsewhere. Specifically, the transformation from the space domain into a frequency domain really does not solve the problem-it merely shifts it from how one processes patterns of lines to how one processes spatial frequencies. And, most distressing of all, from a philosophical point of view, there is just no similarity between the stimulus object and its Fourier representation in the two domains even though there is a formal/transformation from one to the other from a mathematical point of view. All three of the theoretical approaches to which Pinker has pointed up to this point are dependent on the geometry either in its raw form or as transformed
5.
THE RECOGNITION OF VISUAL FORMS
by some fixed algorithm like the Fourier-analysis procedures. The next category he identifies, however, is one in which structural descriptions are made of the form in a way that is not geometrical, but rather symbolic or propositional. Thus, for example, a theory of this sort might invoke a set of rules that describe the nature of the parts and the relationships between them either in words or in terms of some kind of graphical structure. Pinker is somewhat more positive about this, noting that the putative advantages of the structural-description approach include the fact that it does not lose information (a point with which I do not agree) and that a complex recognition process can be broken up into smaller parts. However, he also indicates that, like Fourier analysis, this method really is not a theory of recognition but another means of describing and representing stimulus objects. Once described by a structural model, the stimulus object is no closer to being recognized or classified than before its description. Subsequent processing steps akin to cross-correlation (if one is a template theorist) must now be invoked to carry out the identification itself. Of course, one step forward has been achieved after any transformation: the stimulus object is now represented in a format that may be simpler than the original spatial representation. It may be more precisely quantified, or it may be symbolically denoted by structural relationships that bear a simpler kind of similarity to the set of object-classes within which it may be classified than it did when represented as a purely geometrical form.
Pinker (1985) also describes the Marr-Nishihara (Marr, 1982; Marr & Nishihara, 1978) approach, which I discuss later in this chapter. His evaluation of their theory is that it is a far better way to proceed than the traditional approaches he previously summarized.
It is important to point out in preview, however, that the work of Marr and Nishihara is very specific: Their theoretical treatment, and much of the other work in this field, is aimed at problems of precise definition (How do we reconstruct the third dimension from projective drawings, from disparity, from motion, etc.?). They are not, in any sense of the word, general theories of form recognition. This is the source of both their great strength and their tightly constrained limits.
4. Other Taxonomies
Another older, much more formal, and very deep analysis of the pattern-recognition process has been presented by Duda and Hart (1973) in their classic textbook on the topic. This book represents what was perhaps the epitome of the pattern-recognition field in the mid-1970s. It is interesting to note that Duda and Hart separate their book into two parts: One deals exclusively with scene analysis and the representation of scenes. (Included in this section, incidentally, is a discussion of the spatial frequency or Fourier methods, which Duda and Hart correctly identify as a means of representing objects and scenes and not, as is so often done, as a form-recognition process itself.)
C.
SPECIAL TOPICS
The other section of their book, on pattern classification, is a survey of a set of techniques for classifying stimulus objects into one or another category. It is fascinating to note that all of the methods they describe are essentially statistical and presume that one or more features or attributes have been measured. The task of the pattern recognizer is (sometimes not so simply) to determine into which category the set of features or attributes specifies that an object should be placed. There is, once again, a presumption here of local-feature analysis, of a set of separable measures of a form that is sufficient to classify the object. All of these techniques, a number of which we have encountered before in this section along with a few new ones (Bayes' decision theory, parameter estimation, nonparametric techniques, linear-discriminant functions, and clustering, all of which are also discussed in Fu, 1982a), are to be applied after the feature attributes of an object are measured and defined. The problem becomes akin to signal-detection theory, which itself can be considered to be a primitive form of a (usually) two-alternative classification or recognition system. The feature measures are assumed to be available, but the matter of which essential attributes of an object should be fed into this statistical engine is finessed with only a few words in the introductory chapter of Duda and Hart's book.
The point I make is that if a theory of human form recognition is to be developed it must include far more than just the statistical-decision rules. The entire analysis presented by Duda and Hart (1973) ignores the issue of what attributes of global form and arrangement might be used as inputs to the statistical-decision engine.
This distinguished book illustrates, once again, the utter domination of the field of form recognition by methods that are mainly dependent on localized features with little attention given to the global, molar, or organizational properties that seem to be used by humans in their visual processing of scenes and objects.
C. SPECIAL TOPICS
In the following sections I consider a number of special topics that characterize the contemporary work that is done in the field of human form recognition. In line with the emphasis on theory, as opposed to the empirical data, I stress in each of these special topics the nature of the ideas that have been invoked as explanations of form-recognition phenomena and the controversies that have erupted among these ideas.
1. Visual Search
As I mentioned in an earlier chapter, the language used by many perceptual psychologists in specifying exactly which process is under examination in an experiment is sometimes not as clear as it should be. One of the most confusing examples of such a misnomer occurs in the experiments that are designed to test
an observer's ability to identify (i.e., recognize) a particular object when it is embedded in an ensemble of other objects. Although this paradigm is very often described as being an example of visual discrimination (between the target item and the nontarget items in the ensemble), a closer examination suggests that in many, if not most, cases the process being assayed in such "discrimination" experiments is, in reality, better considered to be an example of recognition performance. The critical test to distinguish which process is actually being assayed, I believe, is quite simple: Which of the three basic perceptual questions ("Is there anything there?" "Are these two things the same or different?" "What is [the name of] this thing?") is being asked? It seems to me that, in fact, the visual-search problem is associated most closely with the last of these three queries. When an observer is asked to go through a list and, for example, pick out a noun or an alphabetic character from a list of verbs or numerals, the observer is actually categorizing (i.e., recognizing) each item in sequence.
A major controversy revolves around whether several categorization processes of this sort can be handled simultaneously (in parallel) or whether they must be done in sequence (in serial order). Another issue concerns the nature of the search. Is it exhaustive (i.e., are all of the items in a list examined?) or is it automatically terminated when the sought-for item is discovered?
Most search experiments of this kind do show a characteristic response pattern when the material is more than just physically dissimilar. Phenomenologically, the observer reports attentive scrutinization of each item in the list, categorizing each according to the rules of the game defined by the experimenter, until one of the items fits the properties of the search item.
This item-by-item serial search is terminated when the fit to some criterion (e.g., find the name of a mammal) occurs (see footnote 4), and the length of time that is spent scanning the list is typically linearly related to the number of items that have been scanned. The process, if the material is sufficiently rich, is, therefore, intrinsically attentive: The observer's attention must be focused in sequence on each item in the ensemble until the search is completed.
The properties of the serial search process that I have just described characterize experiments in which the items differ according to some meaningful criterion. The evaluation of each item necessitates the effortful and attentive scrutiny of all of the items in the ensemble. The situation is quite different when the items in the ensemble differ in some manner that is not semantic, but physical: The search process does not work as described when there are geometric, chromatic, or temporal properties distinguishing between the searched-for item and the other nontarget items in the list. For example, if the irrelevant items in the list are all smaller than the target item, if they have a different color, if they have certain specific geometrical properties (akin to those defining Julesz's statistical moments, or textons), or if they display different temporal properties than the target, the target will be more or less immediately discriminated from its neighbors by virtue of that physical difference.
I purposely emphasize the word discriminate in the previous sentence, because it seems likely that, in cases involving the kind of "physical" differences I have just described, the process is distinctly different from that in which semantic or meaningful properties distinguish between targets and nontargets. In the physical-difference case, the experiments probably are assaying a preattentive discrimination process rather than an attentive recognition process. This conclusion flies in the face of the fact that the procedure of the experimental paradigms may be virtually identical in each case.
It should be noted how very easy it is to completely change the context of an experiment and the mechanisms that are being assayed by a simple alteration of the nature of the stimulus material. It should also be noted how very difficult it often is to determine exactly which process one is actually examining in a given experiment. This is a pitfall that is evidenced throughout the literature of cognitive psychology; similar experimental designs often assay totally different underlying processes, much to the confusion of emerging theoretical explanations. And, vice versa, completely different experimental paradigms often turn out to be assaying the same underlying process.
The visual search processes with which we are concerned in this chapter, therefore, are assumed to exist because of the outcome of a contrived and restricted subset of all possible search paradigms. In general, it seems most appropriate to designate them as recognition processes by virtue of what seems to be their semantic content. (We see, however, how this assumption does not always hold and that some presumably semantic influences have turned out to be dominated by the physical aspects of the stimulus.)
Footnote 4: In some serial search experiments of this kind observers may be asked to find two such fits or even to find all such members of the set that fit the criterion. These represent perturbations on the basic theme; we consider for the moment only the simple prototypical experiment.
The desire to emphasize the semantic content has often dictated that word and alphabetic information be used. However, even when this was done, as we see, the "semantic" nature of the stimulus categories could not be guaranteed. For example, the shape of a word can be significant in its recognition. Work that deals with such "physical" stimulus parameters as color, size, or local texture shall be considered only in passing: Such physical differences produce results that are more comparable to the texture-discrimination experiments dealt with in the previous chapter. The influence of the physical attributes of a stimulus is suggested when the target form just "pops out" in the preattentive manner that was characterized by Treisman and by Julesz.
Thus constrained, what are the theoretical questions and problems that guide research in the field of visual search? One of the most important, to which I briefly alluded earlier, is: Under what conditions is recognition intrinsically a serial process, and under what conditions does it display parallel processing properties? This question is, in large part, one that can be attacked empirically and, indeed, there is a consensus supporting the conjecture that for the nonphysical, semantic type of search material, a serial sequence of selectively and individually attentive judgments must be made.
A corollary of the serial-parallel issue concerns whether a serial search is exhaustive (i.e., even though the search may be serial, are all items in a searched list evaluated before a decision is made?) or self-terminating (i.e., does the search process stop when the target item is located?). Although it might seem very inefficient to continue a search once the target is located, there is, in fact, some evidence that exhaustive searching does occur in some situations. Sample evidence speaking to this point would take the form of response times that do not vary as a function of the position of the target item in lists of constant length.
Another set of questions revolves around the matter of whether or not the items in the ensemble interact with each other. It seems clear that the interaction of an item with its neighbors does occur, not in the simple sense of a spatial inhibition or the interaction among neural receptors that is so popular in the preattentive, passive models of lower level visual processes, but in the sense of interactions among semantic categories and classes. This is an issue of enormous complexity, for it involves interactions at high cognitive levels, not through passive processes sensitive to geometric shape or form, but through those sensitive to something as intangible as meaning. How, for example, would the outcome of a visual search for a word that was the name of a member of the cat family be affected if the nontarget words surrounding the target word were the names of other small mammals as opposed to the names of invertebrates? Experiments of this kind explore the impact of a kind of conceptual clustering, as opposed to simple geometrical propinquity, on recognition in the search paradigm.
Another important issue in visual search experiments concerns the impact of the degree of familiarity of a word on its recognizability. It is presumed that more familiar words are more likely to be recognized than less familiar words.
Indeed, targets made up of strings of letters that are not words do seem to be harder to locate than those that do make up words. Similarly, the nature of the categories or conceptual groups of the nontarget items may be important in determining the search time. Similarities and differences in semantic content, category, or some other meaningful attribute of the nontarget items in the search set have repeatedly been shown to interact with the target item in a way that modulates search times.
Another closely related question concerns whether or not the information about the category to which the items in the search array belong can be determined prior to their recognition. This would imply that some kind of cognitive evaluation of the items occurs prior to full-blown recognition. Such an outcome would be strong evidence that the search-and-recognition process for items differing only in semantic content is actually carried out in a series of stages, a possibly finer taxonomy than has been proposed here or elsewhere.
Let us consider the current status of the answers to some of these issues. We begin with a discussion of the serial-parallel controversy. The issue of whether visual search takes place in parallel (i.e., all of the items in an ensemble are examined simultaneously, a process that may a priori seem unlikely but in the light
of the work on visual textures seems more credible) or in serial order (each item in the ensemble is examined in sequence) was probably first raised to theoretical consciousness by another one of those seminal and classic experiments that historically have had such impact in psychological research. This classic experiment was carried out by Sternberg (1966, 1967) and reported in two relatively brief but extremely influential articles in the journals Science and Perception & Psychophysics.
Sternberg reasoned that if a search process of the kind we have described took place in serial order, then the larger the number of items in a limited set of alphabetic characters, the longer would be the reaction time to search through that set. If, on the other hand, the search process occurred in parallel, then the length of the list should not materially affect the reaction time. To carry out this experiment in a way that was free of many of the difficulties that exceptionally long lists would entail, Sternberg exposed relatively short lists of alphabetic characters to the observer in a tachistoscope. After the list was presented and, presumably, stored by the observer in some kind of short-term memory (the specific nature of which is not material at this point), a single probe character was presented. The observer's task was to specify whether or not the particular alphabetic character was in the list originally presented.
The results of Sternberg's experiment seemed very clear-cut. The reaction times measured were strong linear functions of the number of items in the list that had been presented to the observer and committed to short-term memory. The search process seemed, therefore, to be best characterized as being serial; it could also be characterized as being self-terminating because the process was completed when the target item was identified: reaction times were strong functions of where the item was positioned in the memorized list. That is, if the item occurred early in the list, then the reaction times were short; if later, they were longer. There have been many others who followed in Sternberg's footsteps and have provided similar kinds of data supporting the contention that the search for a single-target alphabetic character is a self-terminating, serial process in which each character takes about 30 to 35 msec to process.
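The reasoning behind these predictions can be laid out as a short sketch that compares idealized reaction-time functions for three hypothetical search mechanisms. The 35-msec comparison time is taken from the upper end of the range quoted above; the 400-msec base time for encoding and responding, the function names, and the assumption of noise-free, perfectly linear timing are illustrative assumptions only, not values from Sternberg's reports.

```python
MS_PER_ITEM = 35.0  # per-item comparison time (upper end of the 30-35 msec range)
BASE_MS = 400.0     # hypothetical constant for encoding the probe and responding

def serial_self_terminating(list_size, target_position):
    """Items are compared one at a time; the scan stops at the target (1-based)."""
    return BASE_MS + MS_PER_ITEM * target_position

def serial_exhaustive(list_size, target_position):
    """Items are compared one at a time; the whole list is always scanned."""
    return BASE_MS + MS_PER_ITEM * list_size

def parallel(list_size, target_position):
    """All items are compared at once, so list length does not matter."""
    return BASE_MS + MS_PER_ITEM

def mean_rt(model, list_size):
    """Mean reaction time when the target is equally likely at every position."""
    positions = range(1, list_size + 1)
    return sum(model(list_size, p) for p in positions) / list_size

if __name__ == "__main__":
    for n in (1, 2, 4, 6):
        print(n,
              mean_rt(serial_self_terminating, n),
              mean_rt(serial_exhaustive, n),
              mean_rt(parallel, n))
```

Under these assumptions both serial variants give reaction times that grow linearly with list length, and only the self-terminating variant makes reaction time depend on where the target sits in the list, which is the diagnostic pattern described in the text. The parallel variant is flat across list lengths.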
However, others have argued that there is evidence that similar search processes are, quite to the contrary, carried out in parallel. The arguments for parallel processing are, in my opinion, somewhat less direct and compelling. One modestly strong argument for the existence of some kind of parallel processing is to be found in the observation that there are some kinds of categorical and contextual effects exhibited by the results of search experiments. If any interaction (as evidenced by the modulation of reaction times as a result of the relations between the semantic content of the items in the list) occurs between the items in a search, then it must be inferred that the examination of each item in the list is not entirely independent of the nature of the other items. This argument goes on to assert that such an interaction is tantamount to some kind of parallel interaction, if not a patently simultaneous solution of all parts of the problem.
Another set of somewhat stronger arguments for parallel-search processes has come from the work of Shiffrin and Schneider (1977) and Schneider and Shiffrin (1977). The experimental design they used differed from the Sternberg paradigm in several ways, but the critical difference between the two experimental procedures lay in the fact that the observer was required to search for any one of several different targets at a given time rather than for only one single target letter. The argument is that if the observer has to compare each item in the nontarget list with only one target, it should take less time than if he has to make the comparison for all of the possibilities of the alternative target list serially. And, indeed, this is what seemed to happen initially to their untrained observers; the response time to specify the presence of a target varied linearly, as in the Sternberg experiment, but in this case as a function of the number of possible target items. Serial processing was thus initially supported for this kind of experiment.
Shiffrin and Schneider's subsequent results, however, indicated that something quite different was occurring later when the observers were well trained: As long as the multiple target items and the nontarget items in the ensemble were always used in the same way (i.e., the target items were either all letters or all numerals and the nontarget items were always chosen from the other category, a condition referred to as "homogeneous"), there was little difference in the search times required for the observer to report whether or not any of the target items were present as a function of the number of possible alternative targets. There was only a slight degradation in the performance (i.e., elongated reaction times) of their well-trained observers when they were asked to look for six as opposed to one target letter. This result suggested the possibility of the conversion of a serial to a parallel process with experience.
In the context of our present discussion, these results were presented by Shiffrin and Schneider as a proof of the existence of parallel processing under at least certain conditions of stimuli and experience. Whether it be called "automatic" processing (as some researchers more interested in memory than perception are likely to do) or "parallel" processing (as students of perception are likely to do), the absence of a prolongation of response times with increasing difficulty of the task suggests that some mental processes may be occurring simultaneously. On the other hand, if items of the same category could be used as either target or distractor (nontarget) items (a condition referred to as "heterogeneous"), then there was always a nearly linear increase in search times.
There is one strong argument that may refute the line of logic suggesting that observers in this experiment were converting from a serial to a parallel process with experience, or vice versa when the experiment went from a homogeneous to a heterogeneous type. It is alternatively possible that the observers are not learning how to carry out a fixed set of underlying processes in parallel but have actually changed the nature of the task in some fundamental way. That is, as observers become well practiced, there is no assurance that they have changed
the task from a letter-by-letter comparison to a multiple simultaneous comparison (this is a basic premise essential to the argument that parallel processing takes place) late in training. Perhaps all of the letters have become stimuli for a higher level of information encoding that now requires only a single comparison to be made where many were required previously. It is difficulties like this, arising from the enormous adaptability of the human observer, that make such external arguments or cognitive theories difficult, if not impossible, to test. The premises of the serial-parallel argument simply may no longer be valid: In such an inconsistent world (see chapter 6 for an extended discussion of this issue of individual variability, strategy shifts, and elusive laws in psychophysics), it becomes difficult to provide a compelling argument for one side or the other of the controversy.
Another more immediate and direct difficulty with the argument that Shiffrin and Schneider's experimental design makes for parallel processing in at least some situations is that the two procedures, which use homogeneous and heterogeneous target and nontarget sets, respectively, are very different in terms of their respective levels of difficulty. In the first case, in which the targets are always from the same category and the nontargets are always from the other, the level of redundancy is very much higher than in the case in which the nontargets may be either alphabetic characters or numerals. In the latter case, the heterogeneous set of nontargets must be sorted in a more complex way than in the first, and the probability of the observer encountering a target on any given item is also much smaller. If the target is, for example, an alphabetic character, the observer is cued by the appearance of any alphabetic character that this item must be examined for a match, whereas any numeral can be ignored.
This difference in difficulty is further evidenced by the fact that observers made very few errors in the homogeneous case, in which the target category items were never used as nontargets, but made many in the case in which heterogeneous items from the target category could be used as nontargets.
Shiffrin and Schneider's (1977) experiment is, therefore, likely to be tapping much more complicated perceptual processes than it may at first seem. From one point of view, the sometimes-parallel results (no change in response time as a function of the number of target items) and the sometimes-serial results (a linear increase in response times with an increase in the number of target items) may be considered to be only the end points of a continuum describing the behavior of observers when they are presented with tasks of various and varying levels of difficulty. When the task is relatively simple, then faster and less computationally demanding processes can be executed in what seems to be a parallel manner. The simplest condition of all exists when the stimuli are so distinctly different in a physical manner (color, shape, size, etc.) that the processing is so fast that the percept "pops out" in a virtually automatic and preattentive way, reflecting a patently parallel discrimination rather than serial search or recognition processes. There are other options that can simulate parallel processing, such as the substitution of one process for another that I mentioned earlier.
More difficult tasks may not be so easily restructured as alternative tasks and may require more extensive, attentive effort: As the computational and analytic complexities increase, the observer must concentrate a limited supply of "attentive energy" on a more narrowly defined portion of the task at hand. As the observer concentrates and slows down, the underlying processes may simply be becoming more difficult and must increasingly be carried out in sequence; they thus appear to the experimenter in the guise that we have come to call "serial."
This line of thought suggests that the serial and parallel controversy may be a false dichotomy. Rather it may be that some kind of psychobiological reality exists in which there is a more or less constant amount of "mental processing capacity" (if I may use this phrase without being challenged to define it too precisely) that can be allocated either to single difficult tasks or to multiple easy tasks. This is exactly the theoretical theme proposed by the late Marilyn Shaw (whose tragic and premature death diminished the profession of psychology in many different ways) and her colleagues. Shaw (1978) was among the first to propose a formal model that emphasized that, in addition to some limit on the amount of information that could be stored in the various memories of the observer, there was also a limit on the amount of processing capacity available. Depending on the nature and difficulty of the tasks with which the observer was confronted, various numbers of processes could be simultaneously executed.
More recently, Shaw joined with other colleagues (Harris, Shaw, & Altom, 1985; Harris, Shaw, & Bates, 1979) to expand this allocation-of-limited-store-of-attentive-energy theory in ways that are extremely germane to the present discussion.
The original model (Shaw, 1978) was expanded in these two more recent papers from one purely dependent on limited capacity to one that stressed the overlapping of various subprocesses in the visual-search tasks with which we are now concerned. The term "overlapping," the essential idea in the Shaw et al. theory, is presented by them as an alternative hypothesis to an extreme serial-versus-parallel dichotomy. This new version of their theory proposes the existence of a "scanning" mechanism that, under certain conditions, is able to process more than one item at a time. The process is sequential in terms of its information acquisition, but can operate on several items simultaneously. Thus, there is an overlap of the processing of items that entered previously, but that have not yet been completely processed, and those that enter later.
Accepting the idea from the work of Sperling (1963) that it takes about 10 msec to process a single character, Shaw and her colleagues proposed that the scanning mechanism stepped along from alphabetic character to alphabetic character, sequentially entering a new item into the processing mechanism at this rate. However, the essentially new aspect of their model is the proposal that the processing need not be completed for any character in the 10-msec acquisition period. If the processing takes longer than that quantal period of time, several items may simultaneously be processed. The processing of several items thus
overlaps in time: by one definition this is a partially parallel system. It is also, of course, from another point of view, a serial system, in that the items are accessed in serial order. In point of fact, it is neither a serial nor a parallel system, but an overlapping one that can simulate either at its extremes and also behave quite differently from either in intermediate conditions.
The amount of service that an item receives from the processing mechanism is a function of several factors, according to Shaw and her colleagues. First, it is a function of the number of items in the processor at any given time. Second, it depends to a different degree upon the number of items that entered the processor before the item being processed and the number of items that entered the processor after it. There is also a minimum amount of time within which any character can be processed, a value that for quite separate reasons the authors assumed was about 40 msec. But, for lists with multiple items, the factors mentioned above may prolong this processing time. Obviously, and this is the essence of the Shaw et al. theory, if the minimum processing time for each item is 40 msec and the access time for each item is 10 msec, the processing of items must overlap.
The overlap theory is empirically quite effective in modeling a wide variety of data. In the last paper in this series (Harris, Shaw, & Altom, 1985), the authors note that most of the previous experimental attempts to distinguish between serial and parallel processes produced inconclusive results, results that completely reflected neither serial nor parallel mechanisms. The very inconclusiveness of the data forthcoming from so many of these experiments is, itself, compelling evidence that something intermediate between the two must actually be occurring and that the serial-parallel dichotomy of the extremes is probably an inappropriate model of the actual mechanisms.
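The arithmetic behind the overlap argument can be illustrated with a minimal timing sketch. It uses only the two values quoted above (a 10-msec access interval and a 40-msec minimum processing time) and deliberately ignores the load-dependent prolongation of processing time that the Shaw et al. theory also includes; the function names and the fixed processing time are simplifying assumptions and are not the authors' formal model.

```python
ACCESS_MS = 10        # interval between successive items entering the processor
MIN_PROCESS_MS = 40   # assumed minimum processing time per item

def schedule(n_items, process_ms=MIN_PROCESS_MS, access_ms=ACCESS_MS):
    """Return (start, end) times for each item, assuming a fixed processing time."""
    return [(i * access_ms, i * access_ms + process_ms) for i in range(n_items)]

def items_in_process(times, t):
    """Count the items whose processing interval [start, end) contains time t."""
    return sum(1 for start, end in times if start <= t < end)

def peak_overlap(times):
    """Largest number of items processed at once, checked at each entry time."""
    return max(items_in_process(times, start) for start, _ in times)

if __name__ == "__main__":
    print("overlapping:", peak_overlap(schedule(6)))
    print("strictly serial:", peak_overlap(schedule(6, process_ms=10)))
    print("fully parallel:", peak_overlap(schedule(6, access_ms=0)))
```

In this simplified scheme the same mechanism behaves serially when processing finishes within one access interval, behaves in a fully parallel way when all items enter at once, and otherwise sits somewhere in between, which is the sense in which an overlapping system can simulate either extreme.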
It must be appreciated, however, that the overlap model, like any other mathematical function, cannot establish beyond doubt anything specific about the nature of the underlying mechanisms. What it does do is establish an existence proof that something intermediate between the two extreme alternatives can and probably does exist. As such it transforms the serial-parallel question from one of great and enthusiastic absoluteness into a much less well-structured query about what was originally formulated as an unrealistic dichotomy. The eventual resolution of such exclusive and extremist controversies as the parallel-serial argument in the form of some sort of an intermediate compromise represents a step forward in the annals of this science. It is also typical of the eventual outcome when dichotomous arguments of this kind occur in any science. Perhaps the best known analogy is from physics: the wave-and-quantum issue that divided physicists for so many years. But there are other significant and important historical examples of the power of such theoretical compromise, not the least of which is the heredity-environment debate that ravaged psychology and biology until it came to be appreciated that neither could fully explain the richness and variety of adult behavior. In the same way, the serial-parallel controversy seems to have been resolved by an outstanding act of theoretical insight: the overlap theory proposed by Shaw and her colleagues.

There is also other evidence that the extreme serial and parallel alternatives may not be quite as distinct as they may at first seem. Jordan (1986), for example, has developed a model of certain forms of serial behavior (specifically of coarticulation, the utterance of a series of overlapping sounds, and dual-task execution, simultaneously carrying out two tasks) based upon a parallel-processing network. He shows that the apparent temporal sequencing of superficially serial behaviors can easily be generated from simple parallel networks if one introduces some kind of a feedback process either from the output of a unit in a network to earlier stages of that network or even from the unit emitting the output to itself. In the latter case, the feedback is referred to as being "recurrent." The specific formal details of Jordan's model are not germane to the present context, but it is important to consider the possibility that mechanisms that seem to be behaving in one manner can actually be implemented by the mechanical structure of the other. Thus, mechanisms behaving according to principles that must certainly be considered to be parallel at the microscopic level can, if properly interconnected, produce macroscopic behavior that is patently serial, and vice versa. It is necessary, therefore, to make a clear distinction between behavior and mechanism. This is often not done in the literature of this field. Psychologists typically use intermediate terms (e.g., process) in a way that is intrinsically ambiguous.
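The general point, that units updated in parallel can emit a strictly ordered sequence once their output is fed back as input, can be illustrated with a toy sketch. This is not Jordan's model: the three-unit ring, the threshold rule, and the names `WEIGHTS`, `step`, and `run` are assumptions chosen only to keep the example small.

```python
# Hypothetical feedback weights: unit i is excited only by unit i - 1 (a ring).
WEIGHTS = [
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]


def step(state, weights):
    """Update every unit simultaneously from the fed-back state."""
    n = len(state)
    return [
        1 if sum(weights[i][j] * state[j] for j in range(n)) >= 1 else 0
        for i in range(n)
    ]


def run(steps, state=(1, 0, 0)):
    """Return the index of the active unit at each time step."""
    state = list(state)
    sequence = []
    for _ in range(steps):
        sequence.append(state.index(1))
        state = step(state, WEIGHTS)
    return sequence


if __name__ == "__main__":
    print(run(6))
```

Every unit applies the same rule at the same instant, so the mechanism is parallel, yet an observer who sees only which unit is active records the serial order 0, 1, 2, 0, 1, 2.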
The strong evidence that some kind of overlapping must occur in the visual-search task that has been provided by Shaw and her colleagues, and the formal demonstration of the fact that parallel mechanisms can give rise to serial behavior (and vice versa; if the serial process is fast enough, apparently parallel system behavior emerges from the operation of a single, but very fast, serial mechanism), raise serious additional questions about the validity of the parallel-serial dichotomy. Compounding the difficulty of distinguishing between a serial and a parallel mechanism is the "black-box problem" to which I have repeatedly alluded earlier in this book. The theoretical limit to our ability to infer inner mechanisms of the kind denoted by such words as serial and parallel from external behavior is another clue that the serial-parallel controversy invoked by cognitive psychologists may be a false one.

The next major theoretical issue in the visual-search domain concerns the nature of the interactions between different categories of stimulus materials. From the earliest period in which this problem has been studied, there have been repeated suggestions that the semantic, cognitive, or categorical nature of the material used as target and nontarget items affects search behavior. This specific issue has been rephrased in a number of different ways (e.g., Do we extract general categorical information before we classify the specific item?), but the fundamental issue remains the same. Over the last decade or so in which interest in this problem has been at a peak, theory has turned to shambles as some of the early ideas
have been shown to be artifactual or incorrect. A brief history places the recent chaos in this field in an appropriate context. The experimental paradigm in which the category task is assayed is a fairly straightforward modification of the basic search task utilized by Sternberg (1967) and, indeed, was prototyped in his experimental work. The significant modification from a pure and simple search task is that the target and nontarget items are chosen to come from different "categories"; for example, the target character may be a letter whereas the nontarget items might be numerals. One main control condition is the one in which the target and nontarget items are chosen from the same category: either all numerals or all alphabetic characters. When the target and nontargets come from different categories, the experiment is said to use a between-category design. When the target items and the nontarget items come from the same category, the experiment is referred to as one displaying a within-category design. (Be careful to note the differences between these two terms and the closely related use of heterogeneous and homogeneous defined earlier. They are not the same.) As we have seen in my earlier discussion of the Sternberg (1967) experiment, within-category experiments produce a linearly increasing reaction-time function; each character in the list took about 35 msec (as opposed to 25 msec in a subsequent study by Jonides & Gleitman, 1972). The original results of experiments purporting to show a category effect, however, indicated that in the between-category condition, the slope of the reaction-time function was much flatter, indeed, often approaching a perfectly flat function.
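The contrast between the two conditions is a contrast between slopes of the reaction-time function, which a short sketch can make explicit. The data below are synthetic and exist only to show the computation: the 35 msec slope is the figure quoted above, whereas the 400 msec intercept, the set sizes, and the name `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


SET_SIZES = [1, 2, 3, 4, 5, 6]

# Synthetic within-category data: 35 msec per item on an assumed 400 msec base.
WITHIN_RT = [400 + 35 * n for n in SET_SIZES]

# Synthetic between-category data: a perfectly flat function (the limiting case).
BETWEEN_RT = [400 for _ in SET_SIZES]

if __name__ == "__main__":
    print(fit_line(SET_SIZES, WITHIN_RT))
    print(fit_line(SET_SIZES, BETWEEN_RT))
```

A slope near 35 msec per item is the signature of the within-category condition, and a slope near zero is what the early between-category reports claimed. The later reexaminations discussed below question whether the flat function reflects category at all.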
In addition to Sternberg's work, this result was also forthcoming in studies by Brand (1971), Ingling (1972), Egeth, Jonides, and Wall (1972), and in the most extensive series of studies of the category effect by Jonides and Gleitman (1972, 1976) and Gleitman and Jonides (1976, 1978). Jonides and Gleitman (1972), for example, reported, among many other strong effects of category, that an ambiguous character such as 0, which could be interpreted either as the letter "oh" or as the numeral "zero," would be dealt with quite differently, depending on whether observers were instructed to deal with it as a between- or a within-category item. That is, if in a list of alphabetic characters the observers were told to search for a numerical "zero," search times would be much less than if they were asked to search for the same physical stimulus, but to deal with it as if it were an alphabetic "oh." The implication of this result was that the semantic category in which the item was placed determined how it would be processed and that, more generally, semantic category could be a powerful determinant of visual-search reaction times. This special and very dramatic case of the category effect captured an enormous amount of attention among cognitive psychologists at the time because it seemed to suggest a level of cognitive penetration of what hitherto had seemed to be very simple search tasks that was quite unexpected and, therefore, quite exciting. However, several investigators reexamined the evidence for the visual-search categorization effects in the 1980s. It now appears, in the light of these
new experiments, that much of the evidence for these pseudoexciting results is beginning to evaporate. Specifically, Duncan (1983) replicated as exactly as was possible the original Jonides and Gleitman (1972) "zero" and "oh" experiment, but did not find the reported results. In both the within- and between-category conditions, observers produced the same positive linear slopes for this ambiguous target. The different reaction times for the between- and within-category conditions that had been reported by Jonides and Gleitman were replicated in Duncan's experiments only when the targets were not ambiguous. But even in this latter case, Duncan points out, the results are equivocal, in that even in situations in which the putative "category" effects seem to be present, they can be eliminated if "the physical resemblance between targets and nontargets is controlled" (p. 231). More specific refutation of the existence of the category effects has been forthcoming from Krueger (1984). To appreciate the impact of this important study, it must be remembered that if the target and nontarget stimuli varied in some salient physical dimension such as color or size, there was likely to be a preattentive "popout" of the target stimulus from what was essentially a different physical background. All of the characteristics of this kind of task suggest that performance in this case is preattentive and independent of the cognitive or semantic context of the stimuli. Upon close examination, this kind of task seems much closer to the texture-discrimination experiments and the experimental results that were discussed in the previous chapter on what might more appropriately be designated as a discrimination process. Krueger specifically proposed and tested the hypothesis that the "semantic category effect," reported so frequently in the 1970s, was, in fact, a manifestation of just this kind of physical-stimulus difference.
In other words, he suggested that there is some physical difference between the alphabetic characters and the numeric stimuli that would allow them to be distinguished in the same manner as would a group of red lights within a field of green lights, or as a group of curved lines would be discriminated from among a field of straight lines. In his experiments Krueger equated the letters and numerals in terms of their physical appearance: He controlled the local curves and lines from which the stimuli were constructed. Under these conditions of physically equivalent stimuli, the so-called cognitive-category effect disappeared. In other control experiments, Krueger found that if the local detailed microstructure of the alphabetic characters and numerals was not controlled, then a substantial "category" effect did appear. The implication of these findings is that the category effect is not real; rather, it is due to an artifact of the simplest kind: the alphabetic characters and numerals are distinguished from each other by their physical properties. In a similar vein, Krueger went on to note that differential familiarity between alphabetic characters and numerals, itself, might play a very important role in simulating an influence of category with such stimuli.
2. Object and Word Superiority

Closely related to the category effect is another phenomenon of form recognition
that has been referred to as the word- and/or object-superiority effect. Briefly, this effect can be observed in the context either of words constructed from alphabetic characters or of diagrammatic objects. The effect, which was brought back into prominence by Reicher (1969) in another one of those fruitful and seminal experiments, was first reported by Cattell (1886), a century ago. Reicher showed again that the recognizability of individual alphabetic characters increased drastically when they were presented in the context of a meaningful word as opposed to when they were presented alone or in the context of a nonsense word. Reicher's experiment was carefully designed to overcome a number of difficulties with earlier work that had been contaminated by guessing behavior or sequential dependencies. The main control he used was to require the observer to choose between only two alternative alphabetic responses, either one of which might form a word or might be a part of a nonword. Thus there is one critical experimental condition (the character in a real word) and two control conditions (the character alone and the character in a nonword). The general result of Reicher's (1969) experiment was that the letters that were embedded in real words were recognized much better than letters presented alone or in nonwords. The fact that the observers were required to choose between two letters (a two-alternative, forced-choice psychophysical procedure) guaranteed that the guessing behavior that had contaminated many previous experiments concerning word recognition would not be a problem. The stimuli were also masked by a pattern of visual "noise" presented a short time after their presentation. This kept the observer from repeatedly reexamining the visual icon (a form of very short-term storage) by "obliterating the stored image after a short period of observation time."
Reicher's experiment was particularly interesting to psychologists because it was another one of those linking experiments that tied together what had previously been considered to be two more or less independent mental activities: the recognition of visual form and the cognitive aspects of information acquisition and interpretation. For the meaning of a word or even the sequential dependencies between the letters of the alphabet to affect the recognition of a form was a result that cried out that the recognition process was not simply the result of some simplistic feature analysis or template fitting. Rather, it should be classified as a much higher level process involving meaning and information about the environment in which that form was embedded. This was an extraordinary result and immediately changed the way (or should have changed the way) in which letter-recognition experiments were interpreted. A full review of the literature of word-superiority effects has been given by Baron (1978), and I have more to say about theoretical analyses of this topic later, but for the moment, let us turn to a closely related topic: the object-superiority phenomena. The object-superiority effects are of the same genre as the word-superiority effects but do not depend on the semantic content of words. Rather, they depend on the geometrical order of organized structures. The effects (in the paradigm I discuss here) were first demonstrated by Weisstein and Harris (1974). Once
again, their work was reported in a brief article in the journal Science and further developed as a paradigm by Williams and Weisstein (1978) and Weisstein, Williams, and Harris (1982). Figure 5.2 shows the type of stimuli that were used in the experiment. The observer is asked to "identify" by naming (with a code such as an alphabetic character or by depressing one of several alternative keys) which one of several familiar test lines was present in each case. In some experiments a pre- or postmask might be used; in others no masking was utilized. The general result obtained in these experiments was analogous to the general effects observed by Reicher (1969) in the word-superiority experiment. The stimulus lines were identified better when they had a "meaningful" context surrounding them (i.e., in this case, an organized geometrical structure) than when they were presented alone. The effect has been attributed to a variety of different causes (as comprehensively reviewed by Enns & Prinzmetal, 1984) including:
1. The apparent three-dimensionality (Lanze, Weisstein, & Harris, 1982) of the complex stimuli (reflecting the assumption that three-dimensional stimuli were processed better than two-dimensional ones).
2. The production of emergent features such as arrows and triangles in the more organized and complex figures (Lanze, Maguire, & Weisstein, 1985; Pomerantz, Sager, & Stoever, 1977).
3. The activation of additional global organization-processing mechanisms when organized figures are presented (Earhard, 1980).
4. The possibility that separate mechanisms are activated by straight lines and complex figures (therefore, placing a line in a structured context made it immune to the masking by the set of disordered straight lines that is typically used as a mask in this type of experiment), an explanation proposed by McClelland (1978).

FIG. 5.2. Stimuli used in Weisstein and Harris's work on the object-superiority effect (from Weisstein & Harris, 1974).

Enns and Prinzmetal (1984) have carried out several experiments concerning the object-superiority effect and have analyzed the problem from a somewhat different perspective. They point out that most of the previous research and explanatory models have failed to find any one or any group of the several variables that are known to influence the object-superiority effect to be sufficient to account for all of the various phenomena. One variable after another has been shown to influence the results of an experiment of this kind, but no unified principle emerges that can account for their collective influence. They propose, instead, that the results of the object-superiority effect can be totally accounted for by a process that I believe is also likely to explain the word-superiority effect discovered by Reicher. Enns and Prinzmetal suggest that the object-superiority effect occurs only when the "object" in which the target line is placed conveys redundant or additional information that is not conveyed by the line alone. That is, the context is not simply making the target line more detectable but is able, by virtue of its own structure, to convey information about which line should have been there even if it (the target line) was not present. The assertion they are making is that object superiority is not a perceptual phenomenon, but one of inference and higher level cognitive processing. The structures in which the target line is embedded are not simply noninformational "contexts" but are themselves clues to the solution to the problem posed to the observer. Enns and Prinzmetal, therefore, make the point that all of the discussion concerning holistic processing mechanisms, emergent features, or depth is largely irrelevant.
These perceptual attributes are correlated with the critical variable: the amount of redundant nongeometrical information available to supplement the geometrical information. The object-superiority experiment is really an experiment on the meaning and interpretation of geometric forms and does not really assay the perceptual visibility of the stimuli: It is, in other words, an experiment of inference and not of feature recognition. The word-superiority effect is also, I believe, of this same genre. Even though some attempt has traditionally been made to control for the sequential redundancy of the letters in the words, the observed effects are not so much associated with the increased visibility of the letters as they are with the structure of the English language. To talk of such processes as being inhibitory or excitatory may allow one to simulate the effects or act as an analogy, but the interactions between the letters in the word-superiority effect are, in my opinion, more likely to be associated with the sequential dependencies of letters in a language than with spatial interactions among the geometrical features of the letters. The most extensive theoretical statement of the word-superiority effect was published by two groups at approximately the same time and with essentially the
same point of view. McClelland and Rumelhart (1981) and Rumelhart and McClelland (1982), on the one hand, and Paap, Newsome, McDonald, and Schvaneveldt (1982), on the other, have proposed what essentially is a feature-oriented model of the word-superiority phenomenon. Their theories are based upon a set of concepts that clearly seem to have emerged by analogy from the neurophysiological laboratory. Although cloaked in psychological terminology and not associated with specific neurons per se or even anatomic loci by any of these authors, both groups have presented a model that asserts that there are "inhibitory" and "excitatory" interactions between "detectors" for features, letters, and words. The three kinds of detectors are presumed to lie at increasingly higher hierarchical levels within the perceptual nervous system, but no attempt is made to precisely define the absolute anatomy of this system other than to note that they become selectively more complex in their ability to deal with the units of the stimuli (local geometrical features, letters, and words). Each of these levels (and presumably others even higher in the hierarchy that we would otherwise refer to as cognitive processes) is able to influence those at lower levels. The process is interactive in the sense that high-level processes are influenced by lower level ones, as well as vice versa, and lateral inhibitions between comparable detectors are also invoked. Indeed there can even be intersensory interactions with acoustic responses, and vice versa. The basic idea in both of these models (as applied to the letter-identification situation) is that the perception of incompletely detected letters can be enhanced by the activation of possible words that might contain the letters. It is at this point that the metaphor introduced in this type of theory between perception and the putative "detectors" begins to strain plausibility.
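The kind of arithmetic such models rely on can be shown with a deliberately small sketch. It is not the published model of either group: the two-word lexicon, the update rule, the parameter values, and the names `LEXICON`, `EVIDENCE`, and `settle` are all assumptions made for illustration. Letter "detectors" receive bottom-up evidence, are excited by word "detectors" that contain them, and are inhibited by rival letters in the same position.

```python
LEXICON = ["WORK", "WORD"]  # hypothetical two-word vocabulary


def clip(x):
    """Keep an activation inside the range 0..1."""
    return max(0.0, min(1.0, x))


def settle(evidence, use_words, steps=10, excite=0.2, inhibit=0.1):
    """Iterate letter activations; evidence is one dict per letter position."""
    letters = [dict(position) for position in evidence]
    for _ in range(steps):
        words = {}
        if use_words:
            # a word detector takes the mean activation of its four letters
            for w in LEXICON:
                words[w] = sum(letters[i].get(ch, 0.0) for i, ch in enumerate(w)) / len(w)
        updated = []
        for i, position in enumerate(letters):
            new_position = {}
            for ch in position:
                top_down = sum(a for w, a in words.items() if w[i] == ch)
                rivals = sum(a for other, a in position.items() if other != ch)
                new_position[ch] = clip(evidence[i][ch] + excite * top_down - inhibit * rivals)
            updated.append(new_position)
        letters = updated
    return letters


# Three clearly seen letters and a degraded fourth that weakly favors K over D.
EVIDENCE = [{"W": 0.9}, {"O": 0.9}, {"R": 0.9}, {"K": 0.4, "D": 0.3}]

if __name__ == "__main__":
    print(settle(EVIDENCE, use_words=False)[3])
    print(settle(EVIDENCE, use_words=True)[3])
```

With word-level feedback the activation of the degraded final letter ends up higher than without it. Nothing in the numbers says whether `excite` stands for a neural excitation or for the statistical support that English spelling lends to a letter in that position, which is exactly the interpretive question at issue here.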
The kind of activation and interaction being discussed, which is claimed by theoreticians of this school of thought to be due to the interactions (inhibition and excitation) of the several levels of their hypothetical "detectors," becomes an untestable metaphor for the mechanisms actually underlying these processes. Indeed, the model at this level comes very close to being more parsimoniously explained in terms of the sequential dependencies of words in the English language than in terms of, for example, any inhibitory interactions between the putative feature or letter detectors. The metaphorical interaction between "detectors" that is proposed by these workers seems to be an oversimplification par excellence. The point is that the simulations used by these authors are based upon mathematical equations that could as easily represent linguistic dependencies in the English language as they could hypothetical neural inhibitions and excitations. The analogy drawn between the detectors and sequential dependencies is useful in a mathematical sense, but it could as easily be supplanted by the language of the higher, linguistic level, which although less physiologically analyzable is equally well modeled by the chosen mathematics. The goodness of fit of the model to whatever data are available is virtually irrelevant in this case. It is the meaning of the factors in the formal equations that is the issue at hand and, as with so
many other mathematical models, even a perfect fit of the model is not tantamount to the validation of the proposed mechanisms. The mathematical model is a description, one that can be interpreted in terms of a number of different underlying implementations. Another important feature of this kind of model is that all of these authors are very specifically suggesting an advantageous response accuracy when a letter is perceived embedded in a word. This is, to them, a perception model, and not one in which higher level interpretations allow the observer to infer the correct response. The observer, it is assumed, actually sees the letters in words even though the same letter may not be seen when it is presented in isolation. I, for one, do not believe this general approach to be correct. Word superiority, like object superiority, is an inferential, not a perceptual phenomenon.

3. Computational Reconstruction
One of the most influential of the new approaches to form recognition is the computational vision approach that was originally championed by the late David Marr and his colleagues and that has now become a major driving force of a substantial portion of contemporary research and theory in vision. The computational approach is based upon the idea that although we cannot currently (and may not ever) be able to understand the specific neurophysiology or the specific logical "language" that underlies any visual process, it is possible to look at the transformations that are carried out and to determine from them what functional processes must be executed to pass from the input form to the output form. Understanding, from the point of view of Marr and his followers, would then be achieved when a computational model carries out these processes in a way that simulates the transformations even though it may do so by logical rules and computational algorithms that are quite distinct from those actually used by the human visual system. To make this point clear, for it is the essence of this school of thought, the investigator may wish to determine what has to be done to reconstruct a three-dimensional visual form from some attribute of the stimulus such as its motion or the disparities between the images in the two eyes. The tone of the research that is carried out by computational vision theorists is not, therefore, characterized by questions like: How do we recognize forms? but rather by very special queries such as: How do we imitate the processes that transform the invariances implicit in the slightly different images on the retina to information that is interpretable as three-dimensional structure? In the sense that the observer is reconstructing form from something that is less than (that kind of) form, it is an act of classification and recognition that I think fits better in this context than in any other.
As an aside I should note that the perceptual tasks of reconstruction and recognition are psychophysically indistinguishable. In my laboratory the two tasks are procedurally identical; both require that the observer name the stimulus object.
The difference between the experiments turns out to lie in the nature of the stimulus itself. If one is asked to reconstruct something, fewer parts (or fewer parts in the appropriate order) are presented to the observer than when he or she is asked to recognize it. I believe that this would be the case with virtually all comparable tasks. To return to the main point, the computational vision tradition that has emerged today is one in which the steps in going from one information base (the input form) to another (the output form, which is presumed to be implicit in the first) are extracted, identified, and then simulated. There is seemingly little interest in carrying out the steps, so identified, by necessarily using the same logical rules the observer does. In fact, there seems to be an implicit acceptance of the idea that we cannot, in principle, know the exact way in which these steps are carried out by the different logical mechanisms of the brain even if lip service is often given to an opposite point of view. The emphasis, to the contrary, is on the steps that "must" be taken to make the transformation from the form of the information in the stimulus to the form of the information in the response. We "must," for example, be able to specify which points in one eye's view correspond to which points in the other eye's view (the correspondence problem) if we are to determine (by other steps of similar complexity) what the three-dimensional shape is of the dichoptically presented object. The main intellectual goal of this kind of theory, therefore, becomes the determination of the necessary processing steps: The development of analogous and efficient, if not biologically homologous, mathematical algorithms to implement the identified steps becomes a mere technical tour de force, which, though impressive, is not at the heart of the theoretical problem stressed by students of human perception: What do people do when they recognize forms?
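What it means to "specify which points in one eye's view correspond to which points in the other eye's view" can be illustrated with a toy one-dimensional sketch. It assumes a single horizontal row of intensities per eye and one global shift between them, and it uses a sum-of-absolute-differences match, a standard engineering device chosen here for brevity. It is neither Marr's stereo algorithm nor a claim about what the visual system does, and the names `best_disparity`, `LEFT`, and `RIGHT` are hypothetical.

```python
def best_disparity(left, right, max_shift):
    """Return the shift d (in samples) at which left[x] best matches right[x - d]."""
    best, best_cost = 0, float("inf")
    for d in range(max_shift + 1):
        # compare only the positions where both rows have a sample
        pairs = [(left[x], right[x - d]) for x in range(d, len(left))]
        cost = sum(abs(a - b) for a, b in pairs) / len(pairs)
        if cost < best_cost:
            best, best_cost = d, cost
    return best


# One intensity row per eye; the bright feature sits two samples further
# to the left in the right eye's row (values are made up).
LEFT = [0, 0, 0, 5, 9, 5, 0, 0, 0, 0]
RIGHT = [0, 5, 9, 5, 0, 0, 0, 0, 0, 0]

if __name__ == "__main__":
    print(best_disparity(LEFT, RIGHT, max_shift=4))
```

The function solves the step (find the correspondence) for this small case, but many other matching rules would return the same answer, so its success says nothing about which rule an observer uses.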
It cannot be overstressed that although the steps may be intended to have biological, as well as logical, significance, there is really nothing about the specific mathematical algorithms that need necessarily be linked to the underlying biology. I doubt if any of the leaders in this field would argue that a Laplacian operator, for example, is actually executed in the brain in the same way as the algorithm is executed in the computer. Rather, there is a processing step, or a function, that has to be carried out (edge enhancement), and one way to do so is by use of this particular mathematical tool. But that same processing step may be executed by totally different algorithms within the brain. The summary point of this brief introduction to computational vision is that the practical success of a particular algorithm in carrying out some information-processing step is not a proof that the particular algorithm is being executed in the head. There may be literally an infinite number of analogous procedures that would do as well. Thus, Marr's computational approach, and that of all of those whom he has influenced, is fundamentally intended to concentrate on the information-processing aspects of the transformations that are presumed to underlie visual perception
independently of the neurophysiological and logical processes of the brain that are (for reasons of complexity) largely invisible to the experimenter or theoretician. It is my opinion that, in large part, the emergence of this brand of computational theory was a reaction to two failures: the failure of the single-cell neurophysiological approach to carry on beyond the enormous progress in understanding individual neurons to the level at which the essence of the neural problem must lie (the interactive relationships between neurons) and the failure of psychological theory to be validated. Coupled with the combinatorial barrier (too many neurons interacting to be computed) is the fundamental difficulty with psychophysics, the "black-box" limitation, that is, its inability to say anything definitive about the underlying mechanisms. It must also be appreciated that Marr's ideas, as expressed in his important 1982 book, also transcended the specifics of the computer program itself by concentrating on the information-processing aspects of the problem. In the case of the kinds of problems he and his followers have worked on, the term information is very precisely defined. Marr was trying to determine what logical transformations "must" be carried out to enable the perceiver to transform the properties of the stimulus pattern into those of the psychological experience. To the extent that these information transformations could be identified, then, the visual process was "understood" in Marr's view, even though it was not possible to know much about the specific neural mechanisms or the computational algorithms actually carried out in the nervous system (which are quite possibly very different from those used in the computer program). In Marr's (1982) words: The message was plain.
There must exist an additional level of understanding at which the character of the information-processing tasks carried out during perception are analyzed and understood in a way that is independent of the particular mechanisms and structures that implement them in our heads. This was what was missing-the analysis of the problem as an information-processing task. Such analysis does not usurp an understanding at the other levels-of neurons or of computer programs-but it is a necessary complement to them, since without it there can be no real understanding of the function of all those neurons. (p. 19)
Marr then notes that there are three levels of this task with which the computational theorist must be concerned. First, the theorist must be concerned with the logic and goals of the transformation from input to output. It must be appreciated what it is that the program must do and what strategies are suitable, if not biologically relevant, for achieving these intended goals. This level of analysis is a conceptual one and involves none of the technical details of the procedures or algorithms of the computer or the neural logic of the brain. Detailing the program is the essence of the next level. The theorist must then, according to Marr, commit himself to specifying the exact information processes to be simulated and the ways in which the images will be encoded or represented in order to be so processed. The specific algorithms necessary to carry out the transformation from the raw input to the processed output must be established.
Finally, at the third level, the theorist must be concerned about the ways in which these algorithms can be implemented physically. (I should note some disagreement on my part with this assertion.) In fact, this third level has to be divided into two separate sublevels. First, one should be concerned with the physical implementation of the information-processing algorithms as theoretical tools. What kind of computational mechanisms can execute these processes and make the necessary transformations? This is a sublevel that is certainly achievable. However, the second sublevel of this third level of Marr's analysis implies that one must also attempt to define the actual physiological implementation in the nervous system of these processes, and it is this kind of analysis that, as I have already noted, may not be achievable. What Marr was seeking, I believe, is a statement of the information processes that must be executed to transform stimuli to percepts on the one hand and some speculation concerning plausible mechanisms on the other. I do not, however, believe that it is possible, or that it will ever be possible, to definitively assert which of the many plausible mechanisms that might be conjured up are indeed the physiologically real ones because of the essential indeterminacy of the mathematical and input-output approaches with regard to internal structures. This is the essence of Moore's automaton theories, to which I have referred several times earlier. It is in the specifications of the information processes that "must" be executed, as opposed to either elegant computational algorithms or wilder speculations concerning the specific neural mechanisms, that Marr's (1982) work makes its most significant contribution. In particular, he and his colleagues have made great strides in understanding the nature of the computations that "must" be carried out to derive three-dimensional shape information from the two-dimensional projective images formed on the retina.
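The indeterminacy of input-output descriptions with regard to internal structure can be illustrated with the edge-enhancement step mentioned earlier in connection with the Laplacian operator. The sketch below is a one-dimensional simplification with made-up intensity values and hypothetical function names. It computes the same second-derivative output by two procedurally different routes, one a single weighted sum over each neighborhood and the other two successive differencing passes.

```python
def laplacian_by_kernel(signal):
    """One pass: weight each three-sample neighborhood by (1, -2, 1)."""
    kernel = (1, -2, 1)
    return [
        sum(k * signal[i + j - 1] for j, k in enumerate(kernel))
        for i in range(1, len(signal) - 1)
    ]


def laplacian_by_differences(signal):
    """Two passes: take first differences, then difference those again."""
    first = [b - a for a, b in zip(signal, signal[1:])]
    return [b - a for a, b in zip(first, first[1:])]


# A step edge in intensity (hypothetical values).
STEP_EDGE = [0, 0, 0, 10, 10, 10]

if __name__ == "__main__":
    print(laplacian_by_kernel(STEP_EDGE))
    print(laplacian_by_differences(STEP_EDGE))
```

Both functions return the same values, a positive lobe followed by a negative lobe with the sign change marking the edge. An experimenter given only the input and the output could not tell which of the two procedures had been run.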
Marr suggests that there are four sequential processing steps required to make this transformation. The four levels of processing are:
1. The image representation
2. The primal sketch
3. The 2½-D sketch
4. The 3-D model representation
Table 5.1 (from Marr, 1982) describes the purpose of each of these steps and the primitives (or parameters of the representation) at each stage that must be manipulated to accomplish the task at each level. Marr's work obviously ranges over a very wide prospect. But in terms of general approach, it is mainly concerned with how a system might implement the transformations that it must go through to arrive at a stage at which recognition or classification of the solid objects derived from the 2-D representation can be made.

C. SPECIAL TOPICS

TABLE 5.1
The Marr Theory of the Transformation Steps Involved in Stereoscopic Depth Perception (from Marr, 1982)

Name: Image(s)
Purpose: Represents intensity.
Primitives: Intensity value at each point in the image.

Name: Primal sketch
Purpose: Makes explicit important information about the two-dimensional image, primarily the intensity changes there and their geometrical distribution and organization.
Primitives: Zero-crossings; blobs; termination and discontinuities; edge segments; virtual lines; groups; curvilinear organization; boundaries.

Name: 2½-D sketch
Purpose: Makes explicit the orientation and rough depth of the visible surfaces, and contours of discontinuities in these quantities in a viewer-centered coordinate frame.
Primitives: Local surface orientation (the "needles" primitives); distance from viewer; discontinuities in depth; discontinuities in surface orientation.

Name: 3-D model representation
Purpose: Describes shapes and their spatial organization in an object-centered coordinate frame, using a modular hierarchical representation that includes volumetric primitives (i.e., primitives that represent the volume of space that a shape occupies) as well as surface primitives.
Primitives: 3-D models arranged hierarchically, each one based on a spatial configuration of a few sticks or axes, to which volumetric or surface shape primitives are attached.

Grimson's (1981) computational approach is in the same vein and concentrates on the disparity aspects of the stereopsis problem. Grimson's version of a computational model is illustrated in Fig. 5.3. From the same group at MIT has come the closely related computational theory of Ullman (1979), dealing with the manner in which the three-dimensional structure of an object can be reconstructed from knowledge about the relative motions of its parts in a two-dimensional projection. Ullman concentrates on how we can use such cues as relative motion as a cue to the three-dimensional organization of a form (structure from motion) just as Marr and Grimson emphasized disparity-type computations. Work in this field has progressed rapidly in the last few years mainly because of the very well-defined nature of the problem of how structure can be retrieved from motion. How precisely formulated this issue has become is evidenced by the very specific question motivating current research in this field: Given certain constraints, how many points and how many views does it take to define a sampled three-dimensional form when the relative two-dimensional motion of the dots of which
[Figure 5.3. Diagram labels: left and right images; edge detection; primal sketch; stereo, motion, other; raw 2½-D sketch; interpolation; full 2½-D sketch.]
FIG. 5.3. A model of stereoscopic visual perception (from Grimson, 1981).
it is composed is the only cue? Investigators have presented stimuli to observers as a means of psychophysically answering this question, but mathematicians have also attacked it analytically. A comparison of the two approaches is most informative because it allows us to determine how close the human observer comes to being an ideal observer, the latter state being defined by the mathematics simulating the situation. It is fascinating to observe that the mathematics answering the question "How many views of how many points does one need to recover the three-dimensional shape of an object from the relative positions of the points in sequential views?" is very well developed. Indeed, the question has been answered for several of the constraints that can limit the possible family of three-dimensional shapes. The work has been done mainly by Ullman (1979) for the case in which structural rigidity of the three-dimensional object is assumed and by Hoffman and his colleagues and others for a number of other situations in which planar motion, constant velocity, or fixed-axis constraints apply. Table 5.2 summarizes the various answers to the question "How many views of how many points are necessary to reconstruct?" for a number of these constraining conditions. In general this table indicates that as additional constraints are imposed a smaller number of views and points is required to reconstruct the visual form. Similarly, increasing the number of dots will reduce the number of views, and increasing the number of views will reduce the number of dots theoretically necessary to reconstruct a form under any of these conditions. Intuitive predictions for the extreme conditions in which a very large number
TABLE 5.2
Sufficient Conditions (How Many Views and How Many Dots Are Required) for Recovery of Three-Dimensional Structure from Moving Sets of Points for Various Constraints (from Braunstein, Hoffman, Shapiro, Andersen, and Bennett, 1986)

Columns: number of points (2, 3, 4). Rows: number of views (2, 3, 4, 5).
Constraints entered in the table, in order of appearance:
Rigid planar motion (a)
Pairwise-rigid and planar motion (a)
Rigid fixed axis motion parallel to image plane, constant angular velocity (b)
Rigid motion (c)
Rigid fixed axis motion (b)
Nonrigid fixed axis motion (d), in two cells
Rigid fixed axis motion, constant angular velocity (e)
Rigid fixed axis motion (e)

(a) Hoffman & Flinchbaugh, 1982; (b) Hoffman & Bennett, 1986; (c) Ullman, 1979; (d) Bennett & Hoffman, 1985; (e) Hoffman & Bennett, 1986.
of dots or a very large number of views are available have been confirmed psychophysically by Lappin, Doner, and Kottas (1980) and Lappin and Fuqua (1983). The former paper showed that very large numbers of dots (i.e., 512) allowed the observer to perceive a sphere when only two different views were presented. Conversely, if the observer could see only three dots, it was still possible to reconstruct depth when a large number of views was allowed even if each view was as far apart as 120 deg of rotation of the represented three-dimensional sphere. The work of Lappin and his colleagues, however, did not look at the minimum conditions that were formally specified in Table 5.2. These conditions are at the opposite end of the continuum: the smallest number of combined dots and views that are necessary to reconstruct the form. Braunstein, Hoffman, Shapiro, Andersen, and Bennett (1986) have attacked this problem psychophysically by presenting dot stimuli that simultaneously contained small numbers of dots and small numbers of views. The objects used as stimuli were also constructed from samples of the dots on a sphere, but with few dots, they actually could represent
simpler structures. Observers were asked to make a discriminative judgment of "same" or "different" of two presentations that differed in the position of only one dot. To make the task reasonably difficult, the stimuli were rotated with respect to each other so that the two stimulus presentations were not identical even when the forms were the same. The main variable in the experiment was the degree of constraint placed on the figures. The stimuli could be constrained to be either (a) a rigid body, (b) a rigid body rotating around a fixed axis, or (c) a rigid body rotating around a fixed axis at a constant velocity. In additional trials, motion was eliminated and only two points used. The results of this experiment were extraordinary: Braunstein and his colleagues determined that their observers seemed actually to be performing better than the predictions of a theoretical model suggested that they should. Subservient to the assumption of rigidity, the model predicted, for example, that three views of four points should be necessary to recover three-dimensional structure. However, for some of the tested conditions they discovered that their observers were able to discriminate shape at better than chance levels and thus beyond the predictions of the models when only two points were presented. At first glance this suggests that the human visual system is able to perform better than does the theoretically ideal observer, but this dreadful impossibility can be exorcised if, as the authors of this study correctly noted, other constraints, redundancies, or information are available to the observer that were not incorporated in the mathematical model. Three additional pieces of information that Braunstein and his colleagues suggest may have been operating in this situation to allow the human to surpass the model were:
1. Observers may have assumed that distances in the two-dimensional projection and the three-dimensional object were related by a constant. This constraint, they believe, is the principal one in explaining the high level of performance vis-a-vis theory.
2. Observers may have noted the lack of movement of the center of rotation of the stimuli.
3. Observers may have noted the limited range of the velocity and axis movements in those cases in which these parameters did vary.
Each of these constraints,⁵ possibly available to the observer but not to the theoretical model, would have made it understandable how the observer's performance could exceed that predicted by the model without any assumption that the human observer was performing "better" than some full-blown ideal would allow. The problem lies, of course, with the model of the ideal. It, quite simply, was not specific, complete, or constrained enough. The psychophysical data, therefore, showed that the model must be refined to truly represent the visual process.

⁵ It is, of course, the essence of the psychological problem how and which of these constraints are imposed by the observer. By applying constraints (or, as some might call them, prejudices), the human observer surpasses the currently available "ideal" mathematical observer by simplifying the task. Thus the definition of the ideal-observer model usually follows the psychophysical data because the ideal observer is not likely to be as fully constrained as the human observer. Simply put, we do not have ideal modelers supplying ideal constraints to ideal observers.

Surprisingly, the general prediction that increasing the number of dots should decrease the number of views and generally increase performance did not hold up in these experiments either. In the low-dot-numerosity range in which they worked, Braunstein and colleagues discovered that adding dots actually reduced performance on the task in certain conditions. They attributed this unpredictable outcome to the increasing complexity of the spatial interactions between the dots in the sequential views. Obviously, the trend of this function must reverse when the number of dots becomes so large that they are dealt with more holistically, as shown in the Lappin, Doner, and Kottas (1980) study.

An important point to be extracted from this discussion is that any model contains but a subset of the total information contained in the entity or process being modeled. Thus, predictions made by the model may not incorporate all of the aspects and constraints of the real entity. Therefore, when a conflict occurs between a model and a psychophysical experiment, barring some rare artifact, it is virtually always necessary to modify the model to meet the results of the experiment rather than to doubt the phenomena. This holds true if the model is either mathematical or neural. The final authority in any conflict between theory and experiment, as occurred here, must ultimately be the perceptual phenomenon.

Another instance in which performance exceeded theory is to be found in my own work. In my studies (Uttal, 1975, 1983, 1985, 1987) I have been studying the perception of dotted forms when they are embedded in random noise (as described in chapter 3).
In its most recent incarnation (Uttal, Davis, Welke, & Kakarala, in press) this work has evolved to include the processes of discrimination and recognition. In particular, my colleagues, Ramakrishna Kakarala, Nancy Davis, and Cynthia Welke, and I have recently been working on the problem of the reconstruction of single-valued, geometrical surfaces, such as those shown in Fig. 3.1 when only very sparse samples (i.e., dots) are presented. The question asked in this case was: How many dots (samples) of a surface must be presented to an observer to permit recognition of which one of a set of surfaces had been sampled? The question is analogous to the one asked by Ullman, Braunstein, and the others who were interested in how many dots and how many views of a moving object were necessary to reconstruct its shape. In our case, only a single view of a certain number of dots was presented; the critical information necessary to accomplish the task being presented in the form of binocular disparity rather than motion, the cue used in the previously mentioned "structure from motion" studies. The answer to the question of how many dots are necessary for reconstruction of a stationary object was quite straightforward mathematically. It was determined,
using a two-dimensional polynomial curve-fitting procedure, that if the sampled surfaces were generated from a polynomial expression that was of the second order (a quadric), it required at least five dots for mathematical reconstruction. If, on the other hand, the surface was generated from a third-order equation (a cubic), then it took at least seven sample dots for the mathematical model to correctly reconstruct the form from which the dots were sampled. In corresponding psychophysical experiments, we were also astonished to discover that our observers' performance, as indicated by their accuracy at recognizing and naming the forms from those very sparsely sampled stimuli, also exceeded the predictions of the curve-fitting model! There was a better than 50% accuracy of recognition with as few as six dots. Furthermore, even surfaces sampled with as few as two or three dots were recognized at better than chance levels, far better than the mathematical model predicted. Almost certainly this level of performance occurred because our observers used information not available to the model in the simple form in which it was initially programmed. For example, the observer could use the information that a pair of dots can, in some cases, distinguish between two groups of the forms: the flatter and the more convex ones. Furthermore, our observers were only required to select one from among eight forms, whereas the mathematical model must select one from among virtually an infinitude of possible cubic or quadric forms. Nevertheless, and in spite of this discrepancy between data and theory, it is important to note that both the model and our observers require very few sampled dots to reconstruct these shapes. As an answer to the twin question (How many dots for reconstruction? Mathematically? Perceptually?), the number is considerably lower than initially projected. Reconstruction is a more general problem than I have indicated here so far.
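The closed-set advantage noted above (choosing one of a few known forms rather than fitting an arbitrary polynomial) can be illustrated with a minimal sketch. It is only an illustration under stated assumptions: the four candidate surfaces, the function names, and the nearest-candidate (least-squared-error) rule are all hypothetical stand-ins, not the stimulus set, the curve-fitting model, or the observers' strategy from the experiments.

```python
# Hypothetical candidate surfaces z = f(x, y). They stand in for a small,
# known stimulus set; they are NOT the surfaces used in the experiments.
CANDIDATES = {
    "plane": lambda x, y: 0.0,
    "ramp": lambda x, y: 0.5 * x,
    "paraboloid": lambda x, y: x * x + y * y,
    "saddle": lambda x, y: x * x - y * y,
}


def sample(surface, xy_points):
    """Return dots (x, y, z) sampled from a surface at the given (x, y) sites."""
    return [(x, y, surface(x, y)) for x, y in xy_points]


def squared_error(surface, dots):
    """Sum of squared depth differences between a surface and the dots."""
    return sum((surface(x, y) - z) ** 2 for x, y, z in dots)


def recognize(dots, candidates=CANDIDATES):
    """Pick the candidate whose depths lie closest to the dots (assumed rule)."""
    return min(candidates, key=lambda name: squared_error(candidates[name], dots))


if __name__ == "__main__":
    sites = [(1.0, 0.0), (0.0, 1.0)]  # only two dots per stimulus
    for name, surface in CANDIDATES.items():
        print(name, "->", recognize(sample(surface, sites)))
```

With these two sites each hypothetical surface is identified from two dots alone, because the choice is among four known alternatives. A single dot at (1, 0) would leave the paraboloid and the saddle tied, which is the kind of ambiguity that an additional dot resolves.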
Not only can motion and disparity be used to specify the shape of an object, but the human has a powerful ability to recognize a complex scene consisting of many objects. Each demonstration of the power of the human visual system to reconstruct form from one or another partial cue is another illustration of how far we have to go in understanding how the observer carries out these functions. There are certainly many other constraints and cues to reconstruction yet to be discovered. With the continuing evolution of computerized robotics, a vigorous cottage industry of computer-vision techniques has grown up, all of which attempt to simulate the reconstructive power of the visual system and all of which provide hints about potential reconstruction cues to the perceptual theoretician. Stevens (1983, 1984, 1986), for example, has studied the reconstruction of forms from the slant of surface contour maps and gradients of texture. We have already spoken of the reconstruction of three-dimensional forms from the cue that is implicit in the invariances of disparity in a stereoscopic picture, and Horn (1975) discusses the way in which the shading of an object can be used to specify the shape of a solid but differentially reflecting object. Horn (1983) has also proposed using unit surface perpendiculars (the Needle Map) to generate a function known as
an extended Gaussian image of a surface. A wonderful property of the Gaussian image is that no two convex objects can have the same form of this expression, and it, therefore, is a formidable mathematical representation of functionally discriminable forms. I am sure that many other cues that I have not mentioned have been suggested as a means to recognize or reconstruct surfaces and shapes. The main point is that the visual system of organisms like the human seems to be able to take advantage of a wide variety of different cues as it recognizes visual scenes. Grimson (1984) makes some especially interesting points in an insightful discussion of the way brightness cues are used in the reconstruction of surfaces. Among his most important points are the ideas that a comparison is often needed between two or more views (either the object moves, the head moves, or the two eyes receive not-quite-duplicate images). (See the comparable discussion of relational Level IV processes in my earlier work [Uttal, 1981] for another expression of the importance of relative comparisons in vision.) In addition Grimson also notes that in most visual situations there are far more reconstructions possible than the one that finally is perceived by the observer. To reduce the possible number, the observer must "assume" certain constraints, a process that vastly exceeds current computer capabilities. Both mathematics and perception become plausible in light of the inferred constrained possibilities. Some we have already discussed, for example, the rigidity constraint that Ullman, Braunstein, and other workers in the field of structure through motion impose upon their mathematical models. Apparently, similar constraints are imposed by the human observer as the prompt awareness of three-dimensional form is created from a stimulus as impoverished as a few briefly exposed, slightly translated dots.
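The extended Gaussian image mentioned above can be made concrete for the simplest case, a polyhedron, where each face contributes its area at the direction of its unit normal. The following is a minimal sketch under that assumption; the function names and the box used as a test object are hypothetical and are not taken from Horn's work.

```python
from math import sqrt


def extended_gaussian_image(triangles, digits=6):
    """Map each unit outward normal to the total surface area facing that way.

    `triangles` is a list of (a, b, c) vertex triples, wound counter-clockwise
    when viewed from outside the object.
    """
    egi = {}
    for a, b, c in triangles:
        u = [b[i] - a[i] for i in range(3)]
        v = [c[i] - a[i] for i in range(3)]
        # The cross product is normal to the face; its length is twice the area.
        n = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
        length = sqrt(sum(k * k for k in n))
        if length == 0.0:
            continue  # skip degenerate triangles
        # Round so that coplanar faces share one key; "+ 0.0" turns -0.0 into 0.0.
        normal = tuple(round(k / length, digits) + 0.0 for k in n)
        egi[normal] = egi.get(normal, 0.0) + 0.5 * length
    return egi


def box_triangles(w, h, d):
    """Triangulated surface of a w x h x d box (a hypothetical test object)."""
    def p(x, y, z):
        return (x * w, y * h, z * d)

    quads = [
        (p(0, 0, 0), p(0, 1, 0), p(1, 1, 0), p(1, 0, 0)),  # normal -z
        (p(0, 0, 1), p(1, 0, 1), p(1, 1, 1), p(0, 1, 1)),  # normal +z
        (p(0, 0, 0), p(1, 0, 0), p(1, 0, 1), p(0, 0, 1)),  # normal -y
        (p(0, 1, 0), p(0, 1, 1), p(1, 1, 1), p(1, 1, 0)),  # normal +y
        (p(0, 0, 0), p(0, 0, 1), p(0, 1, 1), p(0, 1, 0)),  # normal -x
        (p(1, 0, 0), p(1, 1, 0), p(1, 1, 1), p(1, 0, 1)),  # normal +x
    ]
    triangles = []
    for a, b, c, e in quads:
        triangles += [(a, b, c), (a, c, e)]  # split each quad into two triangles
    return triangles


if __name__ == "__main__":
    print(extended_gaussian_image(box_triangles(1, 1, 1)))
    print(extended_gaussian_image(box_triangles(2, 1, 1)))
```

Two boxes of different proportions yield different images, which is the discriminating property at issue. The representation records only orientation and area, so a translated copy of an object has the same image.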
In summary, the ability of human observers to reconstruct visual images from grossly degraded images is outstandingly good. In several cases we have seen that the human equals or even exceeds the performance of what must frankly be considered to be relatively primitive mathematical models designed to simulate the visual process. We do not know exactly what logical mechanisms are used inside the human perceptual system to accomplish these computational feats, but it is clear that the models that have been developed by computational theorists are, at best, only remote approximations to completely reductive and biologically valid explanations of the wonders of human vision. It seems quite unlikely, considering the combinatorics of the situation, that we will be able to unravel the intricate network details of the neurophysiological basis of form perception. Without that information, it becomes problematic exactly how far we can go in understanding the logical mechanisms that underlie the extraordinarily powerful ability of the human to recognize forms. The newly emerging fact of the cognitive penetrability of recognition processes, in particular, further argues that the neural processes involved are probably far more complicated than those underlying detection and discrimination. Theories to explain human form recognition may be among the most refractory ever attempted by scholars in any field
of psychology, in spite of considerable progress in comparable fields of computer vision. Ullman, as did Marr and the others of this emerging new school of computational vision research, also notes his negative reaction to the neuroreductionistic theoretical approach and expresses his view that we must turn to more abstract computational theories to understand form recognition. I cannot overemphasize the importance of this point of view. Marr, Ullman, Braunstein, Hoffman, Grimson, and the others are to a significant degree also asserting (as I do) in one context or another that the neuroreductionistic approach is not going to succeed for reasons of deep principle, not just because of current technological limitations. We all agree that the best possible progress toward understanding the complex mental process called form recognition requires extensive computational simulations of the informational processes and interactions more than it requires further study of individual neurons. What computational-vision theorists have done is to provide an organized and powerful research paradigm. In such elementary ways as their insistence on precise definition of what it is that is intended to be modeled, they have made an important contribution. So much of the psychological literature of the last few decades has described, analyzed, or modeled processes that are very poorly defined, and fuzziness of explanations has followed fuzziness of definition. Recognition is a much coarser concept than is edge detection or three-dimensional reconstruction from sampled motion. Very often in the earlier psychological literature, the visual process under investigation was defined by nothing more than the task presented to the observer. This led to enormous difficulty in relating the results of different experiments or the outcomes of different theories to each other.
Computational theory provides a precise statement of the specific processes and, perhaps even more fundamentally, represents an easy step to a taxonomy of the processing steps involved in complex acts of visual perception. There is much yet to be learned about the actual homologous processing steps, as opposed to the analogies that have been put forward, but without question the requirement that computer programs make for precision of definition is a superb contribution by itself. Our science is much better off than it was previously now that we are asking such questions as: How are contours intensified? How do we segment a complex scene into its constituent surfaces? How do we reconstruct a three-dimensional object from disparity, motion, texture, lighting, or some other of the many cues to solidity? Even if we cannot currently answer these questions fully, we are moving forward.

There is another critical interpretive point concerning computational models of vision that should be made at this point in our discussions. Many computational vision projects are specifically intended to be models of human vision. This, however, is not always the case in the more applied realms of the AI field where computer techniques are purposely being developed to perform practical functions such as image classification and recognition, but without any specific intent
to simulate the actual processes and logic by which humans perceive form. There is a host of methods and algorithms being developed today that are all but free of psychological theoretical overtones. These methods are intended only to allow engineers to carry out certain useful and necessary tasks. Even though the mathematics in these cases may develop its own theoretical superstructure and even be useful and germane as heuristics leading to the kinds of theories that might be developed to study human form perception, understanding human psychological properties is not the main goal of this field of endeavor.

4. The Recognition of Individual Forms
This chapter would be incomplete if we did not deal with the very specific problem of how we recognize single geometric forms. The problem is the classic one in form perception: How do we know what a face, or an alphabetic character, or, for that matter, any other circumscribed stimulus-object is when, at best, we are presented with a two-dimensional, projective retinal mapping of the object? In terms of alphabetic characters (which have enormous advantages as stimuli because of the substantial degree to which they have been overlearned and the fact that they represent such a well-defined and limited set of stimulus-objects): How can we distinguish that an A is what it is and that it is not a B, or, at a more subtle level, that a C is not a U (more subtle because both share the same topology)? Or in terms of a more complex, but more natural, set of stimulus objects, human faces: How does one recognize a particular face as that of his wife and not that of the gas station attendant or some other individual? As simple as it is to pose these questions, perceptual psychologists must admit that they are among the most refractory faced by science. After many decades, if not centuries, of research on the problem of alphabetic character and face recognition, we still know precious little about the nature of the information processes that allow us to effortlessly and preattentively carry out these deceptively simple tasks. The most modern research gives us only the slightest amount of information concerning the extraordinary mechanisms that must underlie our powerful recognition ability. For example, it was only recently established that low frequencies seem to convey less information in a face-recognition task than do the high frequencies. In other words, we now have confirmed a fact that should have been a priori obvious, namely that outlines or contours seem to be more important than the slowly changing and, therefore, redundant regions of a face in determining recognition.
Nevertheless, and in spite of a rather meager psychophysical data base, innumerable pattern recognition theories and algorithms have been suggested. Most, however, seem to be based upon what is arguably an erroneous assumption: that the visual system performs a detailed analysis of the objects from its global form into the local features of which it is constructed. I have already presented arguments that should have made my opposition to this point of view clear, but another "argument" that is also effective is the collection of "random" objects shown in Fig. 5.4. It takes no sophisticated psychophysical test to determine that all of these objects fall into the same perceptual category (chair) even though they share virtually no local feature communalities. The same point is made by Fig. 5.5, in which a group of alphabetic characters of widely differing fonts can easily be recognized or classified into the same name category in spite of the fact that they, similarly, share virtually no common local features. The point to be driven home once again by these demonstrations of first-order phenomena is that it is probably not the local features of a stimulus-object that dominate the recognition process, but rather the arrangement of the parts. The parts from which a picture is constructed can be totally irrelevant or have their own symbolic load, and yet we still see the intended global form if they are properly arranged.

FIG. 5.4. Kolers's "random forms" (from Kolers, 1970).

There is also a number of more formal empirical studies that argue against the feature analytical approach. In general these experiments either show the precedence of the global form over the local feature or illustrate the bizarre nature of features. Let us consider the latter data first. One of the most interesting
FIG. 5.5. A compendium of alphabetic characters from different fonts.
phenomena of recent years has been the newly discovered subjective contour illusion (e.g., Kanizsa, 1974; see also Petry & Meyer, 1986, for a summary of much recent work). As illustrated in Fig. 5.6, the subjective contours are suggested by the arrangement of the parts of the stimuli. The germane aspect of these phenomena in the present context is that these contours are powerfully perceived even though they do not physically exist! They represent forms that are perceived because of the arrangement of the elements of the stimulus. The point is that perception is a process in which different rules apply from those in physics. Percepts, unlike energy or matter, can be created and destroyed by suggestion, without the necessity of their antecedents being present in an energetic or material sense. What we are discussing in this case is information and its corollary arrangement and not matter or potential, a distinction often overlooked by theoreticians in this field. Theories of illusion and perception that attempt to fill in the blanks by showing how the energy or matter is actually created or moved from one part of the image to another (e.g., Ginsburg's, 1983, argument that the subjective contours were produced by band-pass filtering of the image or that they are produced by partial activation of neural line detectors) typically do not generalize beyond the specific situations for which the putative energy-related analogy was suggested. It is far more likely that the percept, be
FIG. 5.6. A Kanizsa triangle (courtesy of Dr. Gaetano Kanizsa of the University of Trieste).
it subjective contour or otherwise, is created from a processing of the information, or arrangement, or significance, or invariances that exist in the stimulus-form rather than by any manipulation of the distribution of physical energy incident on the retina or passing along the visual pathway. An even more immediately compelling argument against energy-dependent perception can be seen in some data obtained by Townsend and Ashby (1982) in a study in which their main goal was, in fact, to compare various alternative feature-analysis models. One unusual aspect of the Townsend and Ashby experiment was that they asked their observers to specifically report what features they "saw," an analytical step beyond simply naming the alphabetic characters that were presented. The astonishing and disconcerting result was that their observers reported the presence of features (i.e., component lines of a limited set of alphabetic characters) even when the features were not present in the original stimulus! Thus, the feature seems in this case to be an attribute created by the observer from interpretation of the global shape. This is in sharp opposition to its putative role as the driving force that creates the perception of the global form. Townsend and Ashby refer to the visible, but nonexistent, parts as "ghost features." The analogy between their result and the letter-in-word and object-superiority phenomena is immediate and direct. It seems, perhaps, that one could reasonably extrapolate this finding to call all local features "ghosts" and feature theories of form recognition "ghost" theories. It seems more likely that they are processed well after the global attributes and arrangements have been processed and only in those situations in which very specific focal attention is directed at the details of the stimulus form.
This assertion is essentially identical to the proposal of global precedence made by Navon (1977) and others discussed previously and formulated from another point of view. Other support for global precedence has been provided by Lupker (1979) in a study in which he examined the confusions among a small set of alphabetic characters. Acknowledging that the contemporary theoretical scene emphasizes the features of a form (specifically, the temporal precedence of features rather than global wholes), Lupker determined that the error matrices emerging from his experiment reflected confusions among a particular set of alphabetic and quasialphabetic characters that depended more upon their global form than on their shared constituent features. Similar results had been obtained by Bouma (1971) earlier but explained in a way that was criticized by Lupker as being inadequately justified, although in the latter's opinion quite correct.

In spite of an array of empirical arguments for global influences in form perception, virtually every psychological theory of visual form recognition that I now consider conceptualizes the task of form recognition to be one of analysis into parts, of feature detection, or of local attribute processing. I have already alluded to the reasons for this misdirection: The limits on our mathematical skills in dealing with global arrangement, the influence of the findings from the single-cell neurophysiological laboratory, and the discrete nature of computer hardware and software technology.
Clearly, the redirection of our theoretical energies toward a more holistic point of view is not going to be easy. We need an increased mass of psychophysical data, like those of the studies of Bouma (1971), Navon (1977), and Lupker (1979), that strengthen the argument for the perceptual influence of the global arrangement of the features of stimulus-objects. We need new and more powerful mathematical and computational algorithms that measure and are sensitive to the arrangement of these parts and features. Most of all, however, we need a new perspective on the problem that will guide us to emphasize the holistic attributes of stimulus-forms rather than localized ones, a perspective that can come only from an appreciation that a misdirected trend in the current theoretical zeitgeist has led us to ignore the basic demonstrations and the empirical data supporting global influences. It is a remarkable fact how effective the zeitgeist has been in guiding our thinking. Feature-oriented theories are so elegant, so efficient, and so contextually consistent with the findings of other sciences that workers in this field have virtually ignored the data that many Gestalt psychologists argued decades ago to be the empirical facts of the matter.
With this preamble in place, let us consider some of the theories of form perception that have been specifically offered as explanations of how we see specific forms. We will not consider in this section any further details of the wide range of computational vision "engines" that have been designed to mimic, without deep explanation or theoretical validity, human visual recognition processes. Though these practical engineering efforts have their own charm and utility and often can feed back heuristics to the theoretical effort, they are, in the main, unconcerned with the problem that is our main focus here: the recognition of forms by the brains of humans and other organisms.
One of the earliest (post-World War II) approaches to form recognition was a relatively naive and implausible template theory. Template theories, although primitively holistic, can be quickly disposed of logically if not empirically. I have already introduced the basic idea that a cross-correlation is carried out between the input stimulus-form and an enormous set of templates or prerecorded images that were supposed to have been previously stored in some kind of visual memory. Carrying out this set of cross-correlations between the input stimulus and the set of "templates," according to the tenets of this hypothetical model, would eventually produce one maximum correlation between the input and a particular one of the very large number of templates. That maximum correlation acts as the decision criterion for the choice of a response. The arguments against template theory should be familiar now. Considering how good humans are at recognizing objects that are magnified, translated, or rotated, and the specifically spatial-correlation process implied by this theoretical approach, it is obvious that an implausibly large number of templates would be required to permit this system to work. The vast number of templates is perhaps the most damning criticism of this approach, but explaining how the enormous number of cross-correlations could be carried out in the brief period of time required for recognition to occur is certainly another.
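The template scheme just described can be made concrete with a minimal sketch. It assumes tiny binary images, a two-item template memory, and a normalized correlation as the matching score; the names `normalized_correlation`, `recognize`, `T`, and `L` are hypothetical illustrations, not part of any published template model.

```python
import numpy as np

def normalized_correlation(a, b):
    """Correlation between two equal-sized images after removing their means."""
    a = a.astype(float).ravel()
    b = b.astype(float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def recognize(stimulus, templates):
    """Correlate the stimulus with every stored template; the maximum wins."""
    scores = {name: normalized_correlation(stimulus, t)
              for name, t in templates.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical 5x5 templates standing in for the stored "visual memory".
T = np.array([[1, 1, 1, 1, 1],
              [0, 0, 1, 0, 0],
              [0, 0, 1, 0, 0],
              [0, 0, 1, 0, 0],
              [0, 0, 1, 0, 0]])
L = np.array([[1, 0, 0, 0, 0],
              [1, 0, 0, 0, 0],
              [1, 0, 0, 0, 0],
              [1, 0, 0, 0, 0],
              [1, 1, 1, 1, 1]])
templates = {"T": T, "L": L}

upright_choice, _ = recognize(T, templates)
rotated_choice, rotated_scores = recognize(np.rot90(T), templates)
print(upright_choice, rotated_choice, rotated_scores)
```

In this toy case the upright T is matched to its own template, but the same T rotated by 90 degrees correlates better with the stored L than with the stored T. A pure spatial-correlation scheme would therefore need a separate template for each orientation, position, and size of every form, which is the combinatorial objection raised in the text.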
An interesting sidelight of the template-matching theory is that it probably cannot be invalidated empirically. That is, if one accepts the premise that there is a set of templates recorded in the brain, the theory obviously could work. All one has to do to account for recognition is to propose more and more templates. Because we cannot neurophysiologically identify a template (because it is likely to be too complex for neurophysiological analysis), we cannot reject its existence a priori, and the system could, in principle, operate in the way suggested. Similarly, some kind of a parallel correlational system, implausible but possible, cannot be rejected out of hand. The arguments against the template theory, therefore, mainly have to be made in terms of the logical and logistical demands made by the necessary processing and memory requirements, that is, by the combinatorics of the situation. Simply put, it seems that the template idea is just too much of a hammer-and-tongs approach, too gross, and too unsophisticated to be the process of choice of organic evolution. Admittedly, this is a matter of taste and not a rigorous argument, but it is compelling, at least to this reader of the theoretical scene.
Another set of contemporary theories of form recognition, as noted, typically concentrates on the identification of the local features and assumes that at some higher level these elements interact with each other to produce a classificatory response. Although there were many early steps in the sequence of ideas that led to this general approach to form perception, the most influential prototype of this general class of theories was the work done in the 1950s and 1960s on neural-network models. Most of these early network models had a common architecture regardless of the details of the information processing proposed by their formulators.
At the lowest level a map of the stimulus is projected on a series of receptors; the output of these receptors is then fed into some kind of local feature-processing network that analyzes the image into a collection of narrowly defined local features such as angles or straight lines. The outputs of this stage of processing are then fed into another level of processing where they influence the choice of an output by virtue of some interactive concatenation of their outputs. These theories have the advantage of not requiring an extensive set of templates; the processing acts algorithmically upon the attributes of the stimulus to make a decision from among a prestored set of responses. Responses, we are assured, can be stored far more economically than are the putative templates of that even more primitive type of theory.
One of the most influential early theories (Selfridge, 1959) of this genre can be considered to be prototypical in that its operation was dependent upon features, but not upon templates. Selfridge proposed a multilayer system in which the raw sensory input was fed to a system of feature detectors. These specialized elements were activated by the presence of stimuli that matched their specific preset sensitivities. Thus a vertical line in the stimulus would activate a vertical-line detector that would then proceed to signal its activation by increasing its level of activity: a simple geometrical-straight-line-to-frequency-of-response (i.e., a spatial
to temporal) conversion. All of the other feature-detecting elements in the system would do the same when selectively activated by their respective trigger feature. The result was that there was a pattern of greater or lesser amounts of activity spread across the simulated array of feature detectors. At this point there was no special order to the outputs, only a variable amount of activity being differentially signaled by the detectors. Selfridge referred to this state of affairs as a pandemonium, and the name has stuck to this kind of model. The second stage of this type of system typically consisted of a set of integrators, each of which was tuned to a particular pattern of activity in the pandemonium of signals being sent up from the feature detectors. If a particular pattern of detector outputs appeared in the barrage of signals, certain elements of this second level of integrators were selectively and differentially activated. Thus a second level of pandemonium (in which the features triggering these units were patterns of activity from the receptors rather than the geometric features of the stimulus form) was established with each of these second-tier detectors activated to the extent that its specific input trigger pattern was present in the first level. A final decision level was then proposed by Selfridge. At this level a single mechanism capable of isolating the most active of the second-tier detectors selected the particular output response of the system. Recognition had been achieved because the appropriate response was made to an input three levels away. Selfridge's system depended on built-in feature detectors at several levels responding to different kinds of features. It was unspecific enough to have had a monumental influence in establishing the idea that feature detection by an interconnected network of receptors lay at the heart of the recognition problem. 
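The three levels of Selfridge's scheme (feature detectors, integrators tuned to patterns of detector activity, and a final decision mechanism) can be sketched as follows. This is a minimal illustration only: the feature names, the letter patterns in `LETTER_PATTERNS`, and the mismatch-based activation rule are assumptions introduced here, not details taken from Selfridge (1959).

```python
FEATURES = ["vertical", "horizontal", "oblique", "curve"]

# Hypothetical trigger patterns for the second-tier integrators:
# how many of each feature the integrator for a letter expects.
LETTER_PATTERNS = {
    "H": {"vertical": 2, "horizontal": 1},
    "A": {"oblique": 2, "horizontal": 1},
    "D": {"vertical": 1, "curve": 1},
    "O": {"curve": 2},
}

def feature_detectors(stimulus_counts):
    """Level 1: each detector's activity grows with its trigger feature."""
    return {f: float(stimulus_counts.get(f, 0)) for f in FEATURES}

def integrators(activity):
    """Level 2: each integrator is active to the extent that the pattern
    of detector activity matches its own trigger pattern."""
    outputs = {}
    for letter, pattern in LETTER_PATTERNS.items():
        mismatch = sum((activity[f] - pattern.get(f, 0)) ** 2 for f in FEATURES)
        outputs[letter] = 1.0 / (1.0 + mismatch)
    return outputs

def decision(outputs):
    """Level 3: a single mechanism selects the most active integrator."""
    return max(outputs, key=outputs.get)

def pandemonium(stimulus_counts):
    return decision(integrators(feature_detectors(stimulus_counts)))

print(pandemonium({"vertical": 2, "horizontal": 1}))
print(pandemonium({"oblique": 2, "horizontal": 1}))
```

Note that the stimulus enters this sketch only as a list of feature counts. Nothing about the spatial arrangement of the features survives the first level, which is exactly the property of feature models that the surrounding discussion questions.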
The idea of network-based feature detection has been persistent throughout the history of form-perception research, as we shall see here, even though it was not possible to practically implement these ideas until the recent arrival of parallel computers.
Another important influence on theoretical thinking in the form-recognition field was Rosenblatt's (1958) perceptron model, an attempt to build a network-type recognition machine that learned by appropriate "reinforcements" of the sequential stages of information processing. We saw in chapter 2 that Rosenblatt's perceptron consisted of a mesh of receptors randomly connected through a network of interconnections to a group of output devices. Rosenblatt had proposed that a random network of this kind would "learn" to recognize particular input stimuli if appropriately "reinforced" by strengthening signals sent to particular interconnections in the network when an appropriate connection between one of the inputs and one of the outputs was established. Thus, the system would increasingly tend to respond with a correct response (i.e., the correct "recognition" or classification of the input pattern) as a result of experience. Rosenblatt's ideas had recently been challenged in the sense that there were thought to be severe constraints on the learned behavior of such a random network. Only certain types of network arrangements are capable of adapting in the sense he proposed, according to Minsky and Papert (1969), and, therefore, only certain ones would in principle be capable of becoming recognition-type engines.
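Both the "reinforcement" idea and the kind of constraint at issue can be illustrated with a minimal sketch. It assumes a single threshold unit trained with the textbook error-correction rule, which is a drastic simplification of Rosenblatt's randomly connected machine; the function name `train_perceptron` and the two toy tasks are hypothetical. The exclusive-or task is used here only as a standard textbook example of a classification that no single threshold unit can represent.

```python
def train_perceptron(samples, epochs=50, lr=1.0):
    """Train one threshold unit; return (weights, bias, converged)."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in samples:
            out = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            delta = target - out
            if delta != 0:
                # "Reinforce" the connections that contributed to the error.
                w[0] += lr * delta * x1
                w[1] += lr * delta * x2
                b += lr * delta
                errors += 1
        if errors == 0:
            return w, b, True   # every pattern classified correctly
    return w, b, False          # no error-free pass within the epoch limit

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print("AND learned:", train_perceptron(AND)[2])
print("XOR learned:", train_perceptron(XOR)[2])
```

The unit settles on a correct classification of the linearly separable task within a few passes, but no amount of training yields an error-free pass on the exclusive-or task, because no single line separates its two classes.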
It is now thought, however, that some of Minsky and Papert's criticism may have been too severe and there is a resurgence in interest in this approach. Newly emerging parallel processor technology, in particular, has allowed actual testing of some of these older ideas. A whole new field of "neural computers" purportedly modeled on them has suddenly exploded onto the scene. Neural is probably a misnomer in this case (no neuron-like elements are used in most of the neural computers); rather, the term has been rather loosely employed to describe the new generation of massively parallel computers. Some of these new machines are currently beginning to be used, with considerable success, to simulate the kind of connected networks that Rosenblatt and Selfridge had so insightfully conceived years ago. However, the field of parallel processing is so new and so little is known about the programming of these incredible machines (some are capable of 2 to 3 billion instructions per second) that it will probably be a prolonged time before we understand what their impact will be on form recognition theory in the future.
Even though all of the details of Rosenblatt's perceptron theory with its randomly connected and unspecialized elements do not stand unqualified, the notion of a network of interacting neuronlike elements coupled with the feature detectors of the Selfridge system to produce recognitionlike behavior became a main theme of contemporary recognition theory. We have already seen how this approach influenced the work of McClelland and Rumelhart (1986) and Paap and his colleagues (1982). Though there were many intervening steps, a review of the most recent form-recognition models reflects the fact that this basic idea of analysis-into-features is still the predominant premise of form-recognition theory.
For example, one of the most interesting recognition theories of recent years (Bennett & Hoffman, 1985b; Hoffman & Richards, 1984) explicitly bases its "decomposition-into-parts" approach on the premises that we: (a) "must recognize an object from only partial information"; (b) that "decomposing such objects into appropriate parts, thereby decoupling configuration from other aspects of their shapes, can make easier their recognition"; and (c) that "a classification or description of parts ... is likely to be simpler than a classification of arbitrary shapes" (Bennett & Hoffman, 1985b). Arguments such as these, however, do not necessarily reflect the biological reality of the human visual system; they read very much like a recipe for the simplest possible computer-vision system. The work of Hoffman, Richards, and Bennett is extraordinarily sophisticated in a mathematical and a philosophical sense. One wonders, however, how relevant it is to psychophysical reality.
There are many other examples of feature-oriented perceptual theories, of course. Many are straightforward applications of the pattern-recognition theories that were summarized from Watanabe's taxonomy discussed earlier in this chapter. However, there are also many others that are intended to be specific models of psychological processes. We have already mentioned some of the most notable
of these, including that of McClelland and Rumelhart (1981) and Rumelhart and McClelland (1982). Others include the hypotheses governing the experiments of Geyer and DeWald (1973) and Biederman's (1987) work.
Some recent developments, however, do seem to be steps in the direction I am advocating; that is, toward an increased emphasis on the global aspects of the form in the recognition process. Kahn and Foster (1981) and Foster and Kahn (1985) have proposed a scheme in which the spatial relations among a constellation of dots are used to explain discrimination performance in experiments in which observers are asked to assert whether two dot patterns are the same or different. Individual dots are categorized in terms of their spatial relations to all other dots of the constellation. Patterns are, according to this theory, represented in the visual system not by specific features but, rather, by neuromathematical-like expressions describing the spatial relationships among the systems of dots. Although this model is certainly incomplete and was developed in the context of a discrimination-type model, it is a step forward from the naive feature-oriented model so prevalent in today's literature. It is also, if I may note, very similar in principle to my own autocorrelation theory (Uttal, 1975), which also deals with multiple and global interdot relationships. Foster (1984) has gone on in an excellent review of the form-perception problem to note some of the other higher order global relationships that may be important other than the simple spatial relationships of the individual dots. For example, symmetry and the orientation between an object-based frame of reference and some independent frame of reference also may be salient aspects of global form perception. As an example, Foster points out, the orientation of a letter or a face is extremely important in its recognition by the human observer.⁶
Modern theories that are specifically oriented towards the global organization of the stimulus-form are less frequently published these days than are feature theories for the reasons I have already mentioned. Others that tend in that direction, however, include Lockhead's (1970) work on blob perception, discussed in chapter 4, and Hoffman's (1966, 1980) theory, discussed in chapter 2. Others certainly will appear in the future and are eagerly awaited.
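The general notion of representing a dot pattern by the relations among all of its dots, rather than by local features, can be illustrated with a minimal sketch. It assumes that the only relation recorded is the distance between each pair of dots; this is a simplification introduced here for illustration and is neither the Foster and Kahn scheme nor the autocorrelation theory itself. The names `interdot_signature`, `transform`, and `same_pattern` are hypothetical.

```python
import math

def interdot_signature(dots):
    """Sorted list of distances between every pair of dots."""
    return sorted(math.dist(p, q)
                  for i, p in enumerate(dots)
                  for q in dots[i + 1:])

def transform(dots, angle, dx, dy):
    """Rotate the pattern by `angle` radians, then translate by (dx, dy)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + dx, s * x + c * y + dy) for x, y in dots]

def same_pattern(a, b, tol=1e-9):
    """Judge two patterns 'same' if their interdot distances agree."""
    sa, sb = interdot_signature(a), interdot_signature(b)
    return len(sa) == len(sb) and all(abs(x - y) <= tol for x, y in zip(sa, sb))

pattern = [(0, 0), (1, 0), (0, 2), (3, 1)]
moved = transform(pattern, angle=0.7, dx=5.0, dy=-2.0)
changed = [(0, 0), (1, 0), (0, 2), (3, 3)]

print(same_pattern(pattern, moved))    # rotated and shifted copy
print(same_pattern(pattern, changed))  # one dot displaced
```

Because every entry in the signature depends on two dots, displacing a single dot changes several entries at once, while rotating or translating the whole constellation changes none. The sketch is deliberately incomplete: a distance list also ignores mirror reversal and orientation, which the text notes can matter a great deal to a human observer.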
6 Another aspect of this very interesting paper by Foster is his powerful critique of the spatial-frequency "channel" model of form perception. It is a must-read item for everyone interested in visual perception and the rise and fall in recent years of this particular theoretical orientation. It should also not go unmentioned that although the Fourier model of visual perception is essentially a global one, it does not serve the need for a universal global model, for the reasons identified by Foster.
6 A CRITIQUE OF CONTEMPORARY APPROACHES TO FORM-PERCEPTION THEORY
Mathematics is too powerful to provide constraints on information; it models truth and drivel with equal felicity. -Cutting, 1986, p. xi
A. INTRODUCTION
This book has been written to review and evaluate the state of the psychological sciences in the 1980s with regard to our understanding of how people see forms. I have sought to develop a minitaxonomy of the various stages of visual information processing (detection, discrimination, and recognition) that I believe represents the core of the visual-perception problem. These processes can generally be considered to be somewhat "higher level" than most of the visual processes discussed in my earlier works (Uttal, 1973b, 1978, 1981), in that they more likely than not involve neural networks for their execution that are more central and more complex than the primarily transductive, communicative, or initial encoding mechanisms previously emphasized. Somewhere within the sequence of processes that I have discussed here there is a transition from what many of us now believe is a kind of more or less preattentive perception to a kind that is generally agreed to require effortful attention. It is my judgment based on a somewhat modest amount of evidence that this occurs at the interface between discrimination and recognition as I have specifically defined them in chapter 1.
This book, as a conscious effort toward the development of some order in what otherwise often appears to be a disorganized collection of raw empirical data and isolated microtheories, should be considered to be a continuation and culmination of the train of thought that has developed during the course of preparing this tetralogy. In this series of books I can see how my personal reductionistic approach has evolved from one stressing explanations that were patently and exclusively neurophysiological to one reflecting an equal commitment to psychological, descriptive, phenomenological, mathematical, and even philosophical interpretations. My perspective is now largely based upon the premise that perceptual processes may, in principle, be immune to a purely neuroreductionistic explanation. Furthermore, I now suspect that they may, indeed, even be beyond any reduction to either process or mechanical implementation.
The outcomes of this evolution of personal theory, admittedly, may place me perilously far from the current consensus with regard to how we believe we see forms (a consensus that is dominated by direct, unmediated, algorithmic, elementalistic, empiricistic, and both mathematical and neuroreductionistic thinking), but I do see the dim outline of an emerging change in that consensus as reflected in the contemporary literature of perceptual theory. In an increasing number of instances it seems to me that some of my colleagues are beginning to be concerned about today's zeitgeist and are expressing an awareness that most interesting perceptual processes are not instantiated in simple and automatic mechanisms. There is a new realization of the enormous complexity of human perceptual processes and of the holistic, approximate, and indirect ways in which perceptual information is processed. The story has not yet fully unfolded; there are many conceptual and technical barriers yet to be overcome, but there are a few theorists (e.g., Teller, 1984) who are also currently expressing these new points of view.
One of the disappointing developments associated with this new enlightenment, however valid it may be, is that we are now beginning to see some signs of limits or constraints on what we may hope to accomplish even in the ultimate future. These limits may be fundamental and may inhibit us from achieving some of the more or less naive goals set by cognitive, computational, and physiological psychologies in recent years. Upon reflection, the hubris that characterizes some of the goals set by students of perception for themselves is astonishing. That hubris is often compounded by a lack of awareness of the constraints imposed on theory building in this field by mathematical and logical principles that are much more directly applicable to perceptual research than many perceptual psychologists and other cognitive psychologists yet acknowledge.
In pursuing this project I have sought to step outside the system sufficiently to at least permit myself to maintain the pretense that I am not championing a vested theoretical interest or grinding a personal ax. Not having been a top-down, cognitively oriented psychologist in the past has helped me to maintain that personal illusion. From this somewhat detached position I have assumed the roles of gadfly and critic. My hope is that this book, which has turned out to be much more of a critique of the field of perceptual theory than the synoptic review I had originally intended, is neither unfair nor abrasive, but constructive and positive. I remain a true believer in this science and am convinced that even those
approaches of explanation I have criticized can be either beautiful or useful, or both, in their own right. Whatever I say in this chapter about the validity of neurophysiology, Artificial Intelligence, or mathematics as models of form perception should not be misinterpreted to mean that I think these fields of endeavor should be abandoned. They have interest, utility, elegance, and beauty sufficient to stand on their own and do not require perceptual theory to be sustained. I am only concerned here with the linkages that are made between them and the truth about perception.
Having assumed the role of a critic, it would be unfair for me not to further acknowledge at the outset that the current state of theoretical affairs in perceptual psychology may be as good as it can possibly be given the intellectual forces acting on it, forces that come both from the nature of the material and the quality of the tools available from other closely related sciences. Some of the powerful intellectuals who have contributed to this field may be incorrect in their interpretations, from my point of view, but neither I nor anyone else could have reacted to these invalid (in my judgment) interpretations without the pioneering contributions of those colleagues and predecessors. It is their thoughts, right or wrong, that represent the foundation on which future understanding will be built. Sometimes progress is enhanced because of a mistake or misinterpretation that highlights some conceptual confusion.
My goal has been to concentrate on form perception theory in this book. I was correct that it would have been both impossible and wasteful to try to review all of the empirical evidence for all related perceptual phenomena. Theory is, first of all, why we are at work: theory in the most positive sense of integrative understanding and explanation. Theory is an important digester and summarizer of what sometimes may seem to be a tangle of trivial and irrelevant experiments.
It is a truism, but a necessary one to assert, that it is only to the extent that an experiment contributes to integrative theory that it is worthwhile.
B. SOME SOURCES OF DIFFICULTY
When I began this book it was with a deep fascination with the wondrous accomplishments of perceptual theory. At the conclusion of this project I find myself much less sanguine about the state of integrative thought in this field, for several reasons, each of which can be traced to a particular source.
1. First, much of what passes for explanatory theory these days has been misdirected by the accidents of what statistical, mathematical, or computational tools are fortuitously available. In particular, there is an enormous gap between the kind of mathematical processes that are necessary to describe the logic of the human visual system and currently available formal systems. Some tools (e.g., Fourier transformations) have quite correctly gained wide acceptance as convenient nomenclatural or descriptive systems, but quite incorrectly have been extended to incorporate superfluous and interpretive pseudoreductionistic baggage that could easily be and should better have been kept separate.
2. Second, it seems to me as a result of my studies that there has been a paucity of thinking about fundamentals. What is it that we are trying to do? What do other sciences and methodologies, such as combinatorial mathematics, chaos theory, and automata theory (to highlight only three particularly salient examples), have to say about the strategies with which such goals can be approached? To assert that such matters are just "philosophical" (a word read by many as being synonymous with "frivolous") excursions misses the point that formal logic, mathematics, and epistemology have always acted and must continue to act as guides and constraints to speculation concerning the study of mental processes, just as the laws of physics have constrained what might be ill-posed efforts in applied engineering. Most practitioners of the physical arts and sciences know that it would be a wasted effort to attempt to do something (e.g., develop a perpetual-motion machine or build a faster-than-light space ship) that is formally excluded by those laws. Sometimes, however, it seems that many of my perceptual colleagues attempt to achieve goals that should have been similarly excluded by analogous constraints and fundamental principles. Sometimes this is due to the fact that they are not familiar with the caveats of mathematics or physics. Sometimes it seems that it is because they ignore these principles even though they are familiar with them. Frankly, I am at a loss to explain these logical lacunae. There are important limits and constraints that must be attended to by perceptual theoreticians.
One of the most important is the mathematical fact expressed by Stockmeyer and Chandra (1979) that "some kinds of combinatorial problems require for their solution a computer as large as the universe running for at least as long as the age of the universe" (p. 140) in spite of the fact that they are not infinite and are "solvable in principle." The germane fact for perceptual and other kinds of reductionistically oriented cognitive scientists is that such "intractable problems" include systems as simple as a checkerboard (and the rules of checkers or chess). It seems that given the intractability of this simple problem it would be well for us to consider the discrepancy between the magnitude of the connectivity of the checkerboard and that of the brain before we somewhat overambitiously attempt to analyze the cognitive functions of the brain from the same elemental level.
There are several misperceptions at work here. One is that if a computer program simulates to some degree some perceptual process, it is akin to an explanation of that process. In fact, such a simulation is virtually always at the level of the molar process or function of the simulated system; it is an extrapolation of the most fundamental tenet of analog theory that the instantiated mechanism by which a given algorithm works can often be completely different from the mechanism by which the brain works, and yet both may be explained by the same formulae. For another, consider the relationship between the measurement of pressure in a container and the properties of the ensemble of the gaseous molecules
within that container. The overall measure, pressure, although directly related to the concatenation of the microscopic details of molecular position and dynamics, tells us nothing about the behavior of the molecules. There is no way to go backward from the global to the locally discrete. Yet, reductionistically oriented psychologists quite commonly assert that we can adduce something about the behavior of individual neurons from the molar phenomenology of the perceiver. On the other side of the coin, although it is possible in principle, it is in practice very difficult to go from the discrete to the global even when the number of elements is not very large. A bit of thinking about fundamentals seems to be in order here.
Considering the number of neurons, the combinatorics of the human brain become formidable indeed. Although there is no universally agreed measure of complexity, one can get a feeling for the magnitude of the problem by appreciating that there are something like 10^10 (or, as some would assert, 10^13 if all of the very small cells of the cerebellum are counted) neurons and that each neuron may be connected to as many as 10^3 other neurons. Considering that comparably huge numbers of cerebral neurons may be involved in even the most simple percept, thought, or cognition, but without the regularity that permitted some early stabs at network simulations (e.g., in the retina or cerebellum), this is all too likely to be an intractable problem in the way identified by Stockmeyer and Chandra (1979). This is some checkerboard, indeed! The perceptual theorizer is often misled by an uncritical acceptance of the degree of accomplishment of simple network simulations of some perceptual process.
In fact, no network model has ever produced behavior of the complexity of the simplest organism with, perhaps, only one area of exception: those situations in which the outcome is due to the function of highly repetitive, almost quasicrystalline kinds of structures such as has occurred in the modeling of the Mach band by lateral inhibitory interactions in the retina and some interesting analyses of the equally highly regular cerebellar network. But the irregular network of even a small portion of the central nervous system of a vertebrate has never been analyzed, much less simulated in a computer program. Models that purport to do so are often very simple networks of a few cells that analogize simple forms of molar behavior. Thus, simple networks of a few neurons (such as those found in invertebrate nervous systems) can be used to produce behavior that is comparable to that produced by a much more complicated vertebrate organism. Here, too, there is ample opportunity for a fallacious leap from the minimal model to the full-blown network of the more complicated brain because of analogous behavior, when, in fact, the actual processes and mechanisms in model and organism are probably totally different.
3. A third cause of my concern about the state of integrative thought in the field of perceptual theory is that perceptual psychologists seem to be all too easily attracted to distinguished accomplishments in other sciences as putative models even when those data are demonstrably invalid in the context of cognition and
mental processes. It is an oft-told aphorism that experimental psychologists really want to be physicists. To the extent that this means that they should introduce quantitative procedures and precise measurements into well-controlled experiments, this is all to the good. To the extent that they assume that the same kind of simple, monodimensional, functional relationships that characterize the study of physical entities are characteristic of what is intrinsically a multidimensional perceptual universe, this is probably a mistake.
Many other psychologists seem to have an inexorable desire to be neurophysiologists and to study neurons rather than mental phenomena. Although the study of individual neurons is exciting in its own right and has proven to be a seductive heuristic for perceptual theory, it is important to appreciate that the outstanding intellectual achievements that have resulted from the invention of the microelectrode by Ling and Gerard (1949) may be totally irrelevant to perception or cognition. Most of what we have learned from this powerful tool deals with the metabolism and function of individual neurons. We also have a good idea of the function of isolated synaptic junctions between pairs of neurons, but we have very limited knowledge of the nature of the realistically complicated networks into which they are connected. Nevertheless, this missing information is the only kind that is really relevant to developing a sound neuroreductionistic theory of perceptual processes. It seems far more likely that perception is the outcome of the interaction of myriad neurons and not the action of a single cell. In spite of this clash between levels of discourse and a substantial amount of data to the contrary, single-cell hypotheses purporting to explain perceptual processes still hold wide sway among perceptual theorists.
Similarly, many psychologists seem to want to be computer experts in contemporary science.
Although the computer is the closest analog to the brain our technology has yet produced, these two distinctly different kinds of information processors are almost certainly organized internally in accord with fundamentally different principles and operate according to totally different rules and logics. Nevertheless, the computer metaphor of perceptual processes is also widely accepted, particularly among cognitive psychologists, with what seems to be a minimum of critical examination of the fundamental issues raised and without careful attention to the formal rules that constrain achievement of their often ill-defined goals. In retrospect, it seems that we have been all too willing to let the conveniences of the computer model distract us from the principles of perceptual organization that are clearly expressed in observed phenomenology.
4. Another concern I have is that perceptual psychologists are all too prone to inadequately define the entities and concepts that they study. We all acknowledge how intrinsically difficult it is to define mental terms; even such basic operational terms as detection or discrimination are used in a variety of different and often conflicting ways in various experimental reports. For example, we have seen in chapter 5 how the word discrimination (in the context of visual search) is often used to describe what is clearly a recognition process. In other instances
6.
CRITIQUE OF CONTEMPORARY APPROACHES
discrimination has been used to define the task in what clearly seem to be detection experiments. The lack of precise operational definitions of many of the most fundamental ideas of perceptual psychology often leads to a situation in which contradictory findings are ignored or false controversies engendered. A further definitional difficulty is often produced when the same word is used to denote two quite different processes that share some superficial functional similarities. Is, for example, the dishabituation of the response to a touch to the mantle observed in the Aplysia a reflection of the same kind of neural mechanism as that underlying human adaptation to seeing the same movie three or four times? Are we using words that are merely metaphors or analogs of each other to incorrectly identify distinctly different structural entities as homologs of each other?
5. Perceptual theorists also tend to both ignore the first-order demonstrations that should set the context and limit the data base to which they apply their models. In a previous chapter I discussed the role of demonstrations and how important they are to establishing the basic premises (e.g., recognition is locally or globally precedent) of the intellectual domain of perceptual theory even when they are not sufficient by themselves to authenticate a perceptual theory (see following). All too often the message being broadcast by these important pieces of first-order data is ignored. It seems to me, for example, that many of these demonstrations, if not most, argue that we should pursue a global, interpretive theoretical approach. To the contrary, however, much of contemporary perceptual theory seems to be more influenced by the available computational technology than by these phenomena in its initial choice of the strategy by means of which a theoretical attack will be made.
Not unexpectedly, artificial controversies sometimes occur between theories that are based on what appear to be totally different premises concerning the nature of psychobiological reality. But even worse, theoretical traditions sometimes tend to perpetuate themselves and to be reinforced by analogies and metaphors even though they are not speaking to the true mechanisms by means of which perceptual processes are carried out.
6. Further, psychologists, particularly those involved in the development of theories of vision, have often not heeded the empirical facts of modern cognitive psychology. They still tend to assume that passive and automatic processes account for most perceptual phenomena when, in fact, there is an emerging feeling that perception is far more a matter of interpretation and active reconstruction than of "hard-wired," automatic, and algorithmic calculations. Bruce Bridgeman (1987) summed it up well in the introduction to a recent book review when he asserted: "A specter is haunting neuroscience. It is the specter of cognition, of higher level influences that can no longer be ignored" (p. 373). Bridgeman asserts that this is "motivating systematic attempts at interaction between neurobiologists and cognitive psychologists." But there is another possibility, namely that active, cognitive processes cannot be modeled by simulations of neural networks for the reasons I have considered earlier and will summarize later in this chapter. Perhaps the message of modern cognitive psychology's empirical research
program (that perception is far more complicated than it previously seemed, and that higher level effects penetrate down far lower than we had thought) is also being ignored.
7. Another cause for concern is that very grand perceptual theories are often based on astonishingly slight amounts of data: A single type of experimental result or even the outcome of a single experiment or demonstration has often been used to justify the most sweeping generalizations concerning how people see forms. Given the very low level of validity testing, virtually any microtheory can be supported simply because there is always some microscopic phenomenon that can be found and made to appear relevant. However, because of the enormous variability of human perceptual and cognitive skills, it is often the case that vastly different conclusions can emerge from what are thought to be nearly identical experiments. Fit with a single experimental datum should not be allowed to validate any theory. It is easy to find a formula to describe individual phenomena, but theories of worth should describe broad classes of data. I have criticized the Artificial Intelligence movement for developing programs that are not generalizable and thus cannot be considered to be intelligent in the sense applied to organic reasoning. But psychologists should be aware that many of our theories also do not generalize beyond the immediate empirical environment in which they were created. Those that do not are terribly vulnerable to repudiation.
8. Further, there is a desperate shortage of mathematical techniques and algorithms that are appropriate for the study of complex, interacting systems in which the global structure or arrangement of parts is more important than the nature of the local features. Perceptual and other kinds of cognitive psychology are very much in need of something that can play a role analogous to that played by the non-Euclidean mathematics of Lobachevsky and Riemann in relativistic physics.
Those mathematical developments allowed physics to take its giant leap forward early in this century. We might not recognize it when it comes along, but one hopes someone will perform the essential Einsteinian task of linking a truly applicable and relevant mathematics to a truly valid theory of perceptual and other cognitive phenomena. Bennett, Hoffman, and Prakesh's (1988) Observer Theory is a recent attempt to provide such a mathematical foundation, but the mathematics is extremely difficult, the conceptual structure subtle and cryptic, and it may be some time before this work becomes accessible to and understandable by other perceptual theorists. One of the most glaring lacunae in this regard must be filled by some novel mathematics that will help us to understand how the molar phenomenological properties of perception can emerge from the concatenated action of a host of discrete neurons. Though there are very few insights into the way in which this may occur, one possible line of thought has been stimulated by Hoffman's (1966, 1980) application of the Lie algebra to perception (see chapter 2). But there are also some other obvious leads that must be followed up. It is, of course, possible that there
will never be a completely satisfactory mathematics (mathematics as we know it may not be adequate to model mind), but until we can establish such a dismal fact beyond a reasonable doubt it would be foolhardy to abort our efforts in that direction. Neurobiological networks, of course, do produce intelligent behavior (any intellectual accomplishment and any cognitive process stands as an existence proof of that fundamental truth), but there is a great need for research and development of more appropriate kinds of mathematics to represent the particular kind of computational engine, the brain, with which we are concerned. The necessity for establishing this foundation of understanding of how neural networks compute and the logic they use is very great. We must not deceive ourselves, however; this task will certainly be very difficult and may be impossible. Along with this search must go a substantial effort to determine what are the limits of analysis of such problems.
9. The final source of my concern is that the very empirical data upon which we perceptionists base our theories are often embarrassingly transient. We have seen throughout this book many instances in which some observation has drifted away like a smoke ring when attempts are made at replication. I argue that this paucity of reliable data is, at the most fundamental level, a result of the multivariate complexity of most psychological functions and the adaptive power of the cognitive system rather than any deficiency of control in a given experimental design. Nevertheless, the fragility of even the data base upon which we base our theories is a signal of the extreme care that must be taken in such complex situations to guarantee data validity. In short, because of the plasticity and adaptability of the organic mind-brain, perceptual research is a far more treacherous universe for the theoretician or empirical researcher than the much simpler worlds of basic particle or cosmological physics.
Later in this concluding chapter I shall consider the widespread problem of the enormous changes in findings that can result from the slightest changes in experimental design, as well as the surprising absence of general rules able to describe reasonably broad ranges of psychological phenomena.
C. SOME THEORETICAL MISJUDGMENTS
What are the misjudgments that we have collectively made with regard to current perceptual theory as a result of these forces?
1. My discussion of major misjudgments begins with the fact that there has been much too great an emphasis on local-feature-oriented models of perception rather than theories accentuating the global, Gestalt, holistic attributes of stimulus-forms. In my judgment, all too many perceptual theorists have currently fallen victim to the elementalist technological zeitgeist established by neurophysiology and computer science. Although disappointing, this is not surprising, because
it is exactly comparable to the way in which theoreticians in this field have been influenced by the pneumatic, hydraulic, horological, and telephonic analogies that have sequentially characterized theories of perception and mind over the last two millennia. In spite of this fallacious tendency toward features, a considerable body of evidence, only some of which has been surveyed in this volume, suggests that in fact we see holistically, that is, that there is a primary global precedence in human perception. We can, of course, direct our attention and scrutiny to the details of a picture, but we are as often ready to perceptually create details on the basis of some inference as to be influenced by the presence of the real physical details of the stimulus. First-order demonstrations, usually ignored but always compelling, urge us to consider the global aspects of a form, whereas one has to carefully construct an experimental situation to tease out some semblance of local precedence. In retrospect, we can now appreciate that classic Gestalt psychology set a tone that was fundamentally correct in spite of the fact that its proponents were no better at explaining phenomena than we are today; and many well-controlled contemporary experimental studies also suggest a global precedence in our perception of the world around us. We have not listened to these messages.
2. Further, there has been much too much emphasis on theories that are essentially empiricist rather than rationalist. We tend to develop models, of recognition in particular, that depend upon a passive transformation of the geometry of the stimulus-form by algorithms that are supposed to be directly sensitive to the details of the local geometry of the stimulus. In fact, it seems far more likely that the level of cognitive penetrability of perceptual processes is far greater than is generally accepted today within the empiricist tradition.
What I am suggesting here is that, in psychophysical reality, there is a much more adaptive and interpretive processing of the symbolic content of the coded stimulus information transmitted along the peripheral sensory pathways than is generally appreciated. These codes, the media for the communicated information, are not the essential input for some kind of passive processing, but rather are only cues and hints that are used by more complex and higher level (i.e., more central) systems that very quickly throw out the specifics of the stimulus geometry. The meaningful messages (the relationships and salient relevancies) conveyed by those "media" (the neural signals and codes) are then processed in what can best be called a symbolic manner that has little to do with the physics or geometry of the stimulus or the dimensions of the neural codes by which the information was transmitted to the central nervous system. A corollary of this misinterpretation of perceptual reality, in which a valid rationalism has been supplanted by a superficially easy but spurious empiricism, is evidenced in the many models (of what are almost certainly very complicated, high-level perceptual processes) that are erroneously based on known peripheral neural mechanisms. This tendency to descend to the periphery for explanations is a miscalculation of the same order as the tendency toward deterministic, passive-
process models, in other words, those that I have classified as empiricistic. At some later time, I am convinced that we shall come to agree that more symbolically representative and computationally active processes, those that I have classified as rationalistic, are actually at work here. In short, there is a ubiquitous fuzziness in the selection of the point of demarcation between the strictly deterministic, passive, and preattentive processes of the peripheral nervous system, on the one hand, and the active, symbol-processing, attentive visual processes on the other. It appears that cognitive penetration can be discerned in recognition behavior but much less so in the lower levels of visual processing. My own research on the perception of dotted, stereoscopic forms, as well as many conversations with others who also work with stereoscopic stimuli, suggests to me that the cognitive penetration of which I speak is especially salient in studies involving actual three-dimensional stimuli or equivalent virtual objects inferred from two-dimensional invariances.
3. Another misjudgment has been a profound unwillingness to accept the seemingly obvious fact that perceptual processes are immensely more complex in terms of the numbers of involved variables, neurons, mechanisms, and processes than hitherto believed. A corollary of this misunderstanding is the general lack of appreciation of how sensitive most perceptual data are to relatively trivial changes in experimental design. To put it most directly, we have appreciated neither how complex even the simplest appearing perceptual process actually is nor how adaptive the human is.
4. For another misjudgment, we perceptionists exhibit a bizarre tendency to become involved in controversies between nonexclusive alternatives or dichotomies that actually represent only the end points of a continuum.
As is typical of all of the sciences, usually both extreme points of view have some support and eventually both are reconciled in terms of some compromise theory.
5. And another: The descriptive powers of mathematical models are often not distinguished from the many alternative physiological or physical instantiations that can be modeled by any single one of those formularizations. The possibility of analogous processes, describable with a common mathematics but implemented by totally nonhomologous mechanisms, is all too often ignored in perceptual theory.
6. Further, we tend not to be appreciative of the constraints and limits imposed upon the analysis of internal structure by behavioral methods. Moore's (1956) second theorem, the "black-box" limit from engineering, and simple combinatorics all argue that in principle some of the goals of perceptual theorists, particularly those of the "cognitive" persuasion, in which attempts are made to determine the inner mechanisms or processes underlying some observed or inferred mental states, are spurious and overly ambitious. It is worthwhile to restate Moore's (1956) theorem, which is so central to this argument: "Given any machine S and any multiple experiment performed on S, there exist other machines experimentally distinguishable from S for which the
original experiments would have had the same outcome" (p. 140). In other words, no experimental tests of a completely unknown and closed automaton can by themselves ever distinguish between two alternative hypotheses that both adequately predict or explain the internal workings of the machine. There are, indisputably, many ways in which to challenge the relevancy of Moore's theorem to perceptual science: Is the brain an automaton in the strict sense of the term? Are there other ways to restrict the number of alternatives and thus to converge on a valid understanding of inner mechanisms? Is the theorem even true? However, in the absence of proven alternatives and equally formal disproofs, it seems appropriate to accept this proof and to acknowledge the fact that there may be, in principle, limits on how far we can go in searching for a structural explanation or analysis of the kind of closed and complex system exemplified by the human mind-brain. To make this point clear, let me suggest that although it is very easy to specify that 2 + 2 = 4, it is impossible, given only the outcome of some such calculation (e.g., 4), to establish what the algorithm or process was that led to that particular number. It might have been 5 - 1, 3 + 1, 2², or any of an infinite number of computational pathways leading to 4. Even though it is possible to eliminate certain hypotheses, especially in a simple numeric system such as the one exemplified here, there will always be a very large number, perhaps an infinite number, of possible alternatives remaining. A closely related argument expressing the difficulty of understanding complex systems even when they are open to internal examination has been made by Crutchfield, Farmer, Packard, and Shaw (1986) in their important and clear discussion of chaos theory, a significant new development in mathematics. These authors point out that seemingly random behavior can be generated by the concatenation of very simple deterministic systems.¹ The apparent "chaos" of a plume of smoke, for example, unanalyzable back to its origins, results from the amplification of small uncertainties by even very simple interactions in systems of many constituent components and repeated processing steps. If one interprets random states as being, by definition, irreducible to regular, periodic, or lawful forcing functions, then the implication of this new development is that even though simple rules and processes may be involved at some primitive elemental level, it may be impossible to go backwards from the resulting chaotic or random state (for example, the details of the neural activity of the brain underlying some mental act) to understand the steps by means of which the outcome was achieved. This is a handicap added to the difficulty of going from the answer 4 to the problem that generated that particular number. 4 is not reversible to 2 + 2 because there are so many routes along which one can pass to get to 4.
¹ Though Crutchfield and his colleagues assert that this is a "striking" and novel development, the same point was made by Cox and Smith (1954) over 30 years ago and was well known among some statisticians.
Chaos is not reversible no matter how precise our knowledge of the details
of the processes that led to it, because the paths have been obscured by the "magnification of small uncertainties." If both correct and applicable, this theorem essentially challenges the top-down approach of reductionist-oriented psychologists. Similarly, the goal of realistically synthesizing complex behavior from the bottom up (by means of neural-net models) is also challenged. If concatenations of even modest numbers of real or modeled neurons quickly produce apparently chaotic behavior, we will never be able to distinguish one of those neural chaoses from another: two "random" systems of this kind would look very much alike (in that both are "random") even if they are actually performing very different perceptual functions. Thus, brain (read that word as "complex neural network") states would be indiscriminable, regardless of how well the mental states can be distinguished. Brain-mind associations at the network level would, therefore, be impossible. Crutchfield and his colleagues (1986) go on to explicitly make the same point I wish to make here. According to them, chaos "implies new fundamental limits on the ability to make predictions." I assert, in addition, that chaos theory also implies new and fundamental limits on our ability to disassemble complex behavior into its neural, cellular constituents. Although these authors go on to assert that some classes of random processes may actually be opened to analysis using the mathematics of chaos theory, their argument mainly can be read as implying that there will be fundamental limits emerging from this analysis that will constrain our ability to understand the breakup of smoke, the weather, the erratics of fluid motion, or the operation of a very large number of neurons in some perceptual process. In all of these cases, the strong implication is that it is unlikely that we will ever be able to go backward or down from the concatenated outcome to the rules of the individual elements.
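The behavioral ("black-box") limit invoked earlier through Moore's (1956) theorem can be made concrete with a toy sketch. What follows is only an illustration under stated assumptions and not Moore's formal construction: the two machines, the names `parity_machine`, `limited_machine`, and `experiment_outcomes_match`, and the parameter `limit` are all hypothetical and introduced here solely for the example. The second machine differs internally from the first, yet every experiment no longer than `limit` input symbols yields identical outcomes; only a longer experiment can separate them.

```python
from itertools import product


def parity_machine(inputs):
    """Hypothetical machine S: after each symbol, output the parity of the 1s seen so far."""
    state, outputs = 0, []
    for symbol in inputs:
        state ^= symbol
        outputs.append(state)
    return outputs


def limited_machine(inputs, limit):
    """Hypothetical machine S': behaves like S for the first `limit` symbols, then outputs 0."""
    state, outputs = 0, []
    for step, symbol in enumerate(inputs, start=1):
        state ^= symbol
        outputs.append(state if step <= limit else 0)
    return outputs


def experiment_outcomes_match(max_len, limit):
    """Feed every binary sequence of length 1..max_len to both machines and compare outputs."""
    for length in range(1, max_len + 1):
        for seq in product((0, 1), repeat=length):
            if parity_machine(seq) != limited_machine(seq, limit):
                return False
    return True


if __name__ == "__main__":
    # Experiments confined to 6 symbols cannot tell the two machines apart.
    print(experiment_outcomes_match(max_len=6, limit=6))
    # Allowing 7 symbols exposes the difference in internal structure.
    print(experiment_outcomes_match(max_len=7, limit=6))
```

The design choice worth noting is that `limit` can always be set beyond the length of whatever experiment has actually been run, which is the sense in which a finite set of behavioral observations leaves alternative internal mechanisms standing.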
In each of these cases (smoke, weather, fluid motion, or a large population of neurons), the output is so complicated that it must essentially be considered to be random, thus precluding reductionistic analysis. Although it is possible to demonstrate some global measure that may satisfactorily summarize these random states in some statistical manner (e.g., pressure, the ratio of turbulent to laminar flow, or perceptual phenomenology), reduction to the behavior of individual elements is impossible in principle as well as in practice according to chaos theory! In short, the important conclusion is that where such a constraint had previously been considered to be one of simple complexity and in-practice limits on computability, chaos theory now suggests that there may be in-principle limits. These systems are not just complex and multivariate; they produce random (that is, intrinsically unanalyzable) behavior. Crutchfield, Farmer, Packard, and Shaw (1986) also note that extreme numerosity of interacting components is not necessary for a chaotic situation to occur: chaos can arise out of the uncertainty that is inherent in repetitive processes of interaction among a few components as well as from a large number of components. They allude to such "simple" systems as the collision of a few billiard balls as potentially evidencing chaotic behavior in very short periods of
time. The reason is that at the microscopic level, even the billiard table and balls turn out not to be a simple system: at each collision there is enormous uncertainty about the points of curvature at which the balls will collide, and as each ball travels across the table it is undergoing serial interactions each of which inserts its own microscopic portion of uncertainty into the quickly emerging chaotic behavior of the balls. Crutchfield, Farmer, Packard, and Shaw's (1986) discussion of chaos is not just mathematical esoterica; it is directly relevant and specifically damning to the neuroreductionistic strategies and theories that I have challenged here. This relevance can be made most clear by letting them speak for themselves, as follows:
Chaos brings a new challenge to the reductionist view that a system can be understood by breaking it down and studying each piece. This view has been prevalent in science in part because there are so many systems for which the behavior of the whole is the sum of its parts. Chaos demonstrates, however, that a system can have complicated behavior that emerges as a consequence of simple, nonlinear interaction of only a few components. The problem is becoming acute in a wide range of scientific disciplines, from describing microscopic physics to modeling macroscopic behavior of biological organisms. The ability to obtain detailed knowledge of a system's structure has undergone a tremendous advance in recent years, but the ability to integrate this knowledge has been stymied by the lack of a proper conceptual framework within which to describe qualitative behavior. For example, even with a complete map of the nervous system of a simple organism, such as the nematode studied by Sidney Brenner of the University of Cambridge, the organism's behavior cannot be deduced. Similarly, the hope that physics could be complete with an increasingly detailed understanding of fundamental physical forces and constituents is unfounded.
The interaction of components on one scale can lead to complex global behavior on a larger scale that in general cannot be deduced from knowledge of the individual components. (p. 56; italics added)
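The claim in the passage just quoted, that complicated behavior can emerge from a simple nonlinear rule and that small uncertainties are magnified, can be sketched numerically. The example below uses the logistic map, a standard textbook system that I have chosen for illustration; it is not taken from Crutchfield and his colleagues' article as cited here, and the function names, the starting value 0.2, the perturbation `delta`, and the `threshold` are all assumptions of this sketch. Two orbits that begin a tiny distance apart are iterated until they differ by more than the threshold.

```python
def logistic_orbit(x0, r=4.0, steps=50):
    """Iterate x -> r * x * (1 - x); return the orbit including the starting value."""
    orbit = [x0]
    for _ in range(steps):
        orbit.append(r * orbit[-1] * (1.0 - orbit[-1]))
    return orbit


def first_divergence(x0, delta=1e-10, threshold=0.1, r=4.0, steps=200):
    """Return the first step at which orbits started `delta` apart differ by more
    than `threshold`, or None if that never happens within `steps` iterations."""
    a = logistic_orbit(x0, r, steps)
    b = logistic_orbit(x0 + delta, r, steps)
    for step, (xa, xb) in enumerate(zip(a, b)):
        if abs(xa - xb) > threshold:
            return step
    return None


if __name__ == "__main__":
    # A single-variable, fully deterministic rule; the initial difference is 1e-10.
    print("orbits separate at step:", first_divergence(0.2))
```

Nothing in the rule is random, yet after a modest number of iterations the two orbits are effectively unrelated, so the later values carry little usable information about which starting value produced them. Whether this toy property carries over to real neural networks is, of course, exactly the open question raised in the text.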
The relevance of the comments of Crutchfield and his colleagues also can be discerned with regard to the hierarchical model of perceptual processes (i.e., detection, discrimination, and recognition) that I have proposed in this book. Specifically, is this type of analysis of perception into a set of subprocesses appropriate? Or, are the subprocesses so entangled by the complexity of the even lesser neural mechanisms that the simplicity that seemed to have been achieved by applying the method of detail in this case is only illusory? Did we lose the "baby" when we abstracted out manipulable components from the "bath water" and segmented perception into its microgenetic subprocesses? Is some of the transitoriness and fragility of our data due to this kind of perceptual chaos? These questions are not answerable at this time, but they must be asked and seriously considered if the course of our science is to be a sound one.
7. Another misjudgment is that perceptual theorists display a terrible weakness for deifying the concepts that they originally invoked merely as useful heuristics
or metaphors with which to think about some of these terribly difficult problems or as computational analogs with which to describe them. Just as we have progressed to a point in our scientific knowledge of brain and behavior at which no one should now think that there is a homunculus in our head, neither should any reasonable student of cognition now accept the existence of any "list processor" or anything like an "expert system" between our ears. Furthermore, based on what we know about the limits of the neuroreductionistic strategy, perhaps not even the apparent successes of the simplistic, quasi-crystalline type of nerve-net models (the regularity of which is their sine qua non) should be allowed to influence our thinking about perceptual systems. The tools of the simulation trade are not a priori good theories of how the mind works, and to go backwards from even excellent imitations of real cognition to detailed, unique conclusions concerning internal structure and logic in the perceiving brain is patently absurd.
8. Finally, there is a pervasive tendency throughout our profession not to accept three fundamental facts concerning that which modern science has not yet accomplished.
a. First, we still have not the slightest inkling of how we bridge the gap from the action of discrete neurons to the molar mental processes that are indisputably the outcome of the interaction of vast networks of these same neurons. Chaos theory and simple combinatorics suggest that such a goal may not be ultimately achievable; even given that any conclusion about the state of the science in the distant future may be arguable, certainly we have not yet achieved any such bridging hypothesis or explanation in our current psychobiological science.
b. Second, we still have virtually no information about the kind of logic and logical processes that are executed by the brain in carrying out perceptual processes at more molar levels.
The best nonneural, process-oriented analogies, those proposed by modern computational-vision experts, have no better justification than some of the other more primitive metaphors proposed as verbal or statistical models by earlier cognitive psychologists. The computational models, as David Marr, the late founding father of the field, so correctly asserted, only describe and simulate the transformations that "must" be made to go from an input to an output; but they, like all mathematical models, are essentially indeterminate with regard to the particular neural mechanism by which each step in the transformation is made. To assume that we can take a pair of two-dimensional images and extract invariant information concerning depth from them is a totally reasonable behavioral description of a visual process. To assume that contour enhancement takes place somewhere in this process is a plausible hypothesis. However, to specifically assume that there is a Laplacian operator in the brain is an unwarranted extrapolation from those reasonable assumptions. The point is that even the best fitting mathematical description is not tantamount to a unique definition of internal mechanisms. There are many different sequences of processes that will lead to the same transformation between input and output.
c. Third, in this same vein, my retrospective examination of the discussions in earlier chapters of this book suggests that there is currently precious little data
to vigorously support any "cognitive-type" model of internal process. Controversy, disagreement, and conflict seem to be more characteristic of what we think we know about internal processes than is consensus. This ongoing inability to achieve closure in cognitive studies may itself once again reflect the fact, as I argue, that the exposition of internal mechanisms by behavioral techniques may be in principle an unsolvable problem.
I must stress that it is totally inappropriate to consider any of the major theoretical transgressions in perceptual theory as the product of an "incompetent" group of perceptual scientists. This is not the case. The real source of these difficulties with contemporary theory is the terrible task that psychologists, in general, and perceptual theoreticians, in particular, have set for themselves. We are not dealing with simple, single-valued functions, but with an active, adaptive, interpreting, responding, self-modifying mechanism, the human brain, that often changes the rules of the perceptual game in midstream. How often has an experiment "failed" because the observers "played a different game" than the one the experimenter tried to define by the experimental protocol? How varied is our repertoire of illusions² (the common discrepancies between the message of the physical stimulus and the perceptual responses)? The perceiving brain does not slavishly respond to stimuli but acts upon them. While the experimenter tries hard to devise a clever task to get at an underlying process, the observer in the experiment is working equally hard to devise a clever process to adapt to the needs of the task. Thus, cognitive penetration seems to reach deep into the visual process, into processes that at first glance seem to happen almost automatically, but which, in fact, on close analysis reflect more of Helmholtz's unconscious inference than it is popular to admit these days.
The "cognitive specter," to which Bridgeman alludes, does more than simply haunt the neurosciences; it demonically possesses them!

D. DO LAWS OF PERCEPTION EXIST?

The adaptability of the human visual system is so great that it is almost as if a universal rule is operative that prohibits universal rules. We might call this the Rule of Multiple Rules: Slight changes in procedure, stimulus material, or methodology often produce dramatic changes in the rules of perception.³ One implication of this generalization is that the perceptual system must now be thought of as operating in a highly active way on any stimulus input rather than in a highly passive and automatic manner. Such a property, along with complexity and multivariateness per se, makes prediction of experimental outcomes and generalization to other experimental situations extremely difficult. That this generalization should emerge after a century or more of experimental work based on the hope and premise of unification and simplification is unfortunately perplexing, counterintuitive, somewhat distressing, and certainly surprising. It has been virtually axiomatic in psychophysical research that, if we are diligent in our collection of descriptions of the phenomena of vision, in the long run general principles of perception should emerge that will unify the outcomes of what often seems to be, at best, a random assortment of the results of small-scale and isolated experiments. As successful as the proposition of ultimate generalization has been in other scientific enterprises, the analogous hypothesis that psychophysical phenomena can also be so unified remains unproven and, astonishingly, largely untested. Perceptual psychology currently remains a collection of small and seemingly unrelated empirical thrusts. Certainly, it is only in the rarest cases that psychophysical data obtained from different paradigms and under different conditions have even been compared. Perceptual psychophysics has long been characterized by experiments specific to a microscopically oriented theory and by theories that either deal with a narrowly defined data set at one extreme or, at the other, have a global breadth so great that data are virtually irrelevant to their construction. Theories of this kind are more points of view than analyses.

² From one point of view, illusions may be thought of as making the opposite argument. In spite of repeated demonstrations that a line is straight, or two objects are the same size, or that a spiral is standing still, we still "see" these discrepancies with physical reality. However, I believe that this reflects the prepotency of the implied meaning of the stimulus over these other objective measures. Both meaning and secondary measures are examples of cognitive penetration; one merely dominates the other.

³ Some of the material in the following discussion of the Rule of Multiple Rules has been revised and edited from another of my works (Uttal, 1987) but is considered to be so germane to the current discussion that I have included it here in an edited and updated form.
The question posed now is: Is the lack of unification and the absence of truly comprehensive theoretical simplifications (i.e., generalizations), which is apparent in contemporary psychophysical science, a result of the youth of the science, or, to the contrary, does it reflect in some fundamental way the actual biological reality of perceptual processes? Though the latter alternative is anathema to both experimental and theoretical psychologists and, from some points of view, a depressing prospect, it cannot be rejected out of hand. It is at least conceivable that the perceptual brain-mind operates by means of subprocesses that are more independent and noninteracting than we had anticipated or hoped. It is entirely possible that superficially similar visual processes may be mediated by quite different underlying mechanisms. There have been so few instances in which a sufficiently wide range of experimental conditions has been explored within the context of a single paradigm that there is actually little support for the antithesis, the idea that unification is, in fact, possible (regardless of how much such an outcome would have pleased us or satisfied William of Ockham or Lloyd Morgan). A closely related idea is that all perception is entirely uncodifiable and enormously adaptive inference; a necessary consequence is that because the sensory channels are so heavily coded, the observer can never know the world with certainty. Perception, after all, exists for the survival of the organism, not for the
convenience of the theorist. The concept of rigid laws or rules may be another one of those inapplicable ideas uncritically transposed from physics to psychology. Furthermore, it must not be overlooked that there is also a possibility that the difficulty in identifying universal laws is also caused by differences in the perceptual strategies of individual observers. Evidence that individual differences are larger than we had thought even in carefully controlled and contrived experimental situations can be found in the work of Ward (1985) at even such a relatively early level of processing as that of defining whether stimulus dimensions will be integral or separable in Shepard's (1964) and Garner's (1974) sense. Intraobserver experimental designs (i.e., using the same observer, to the extent possible, for all conditions of an experiment that are to be compared) are absolutely necessary if it turns out that in fact different rules are applied by different observers. Even such designs are not foolproof, however. Shifting relationships among stimuli and alternatively selected responses may result in strategies that vary from task to task even for the same observer. The evidence for independence of processes, however strongly they may interact, rather than generality is prevalent in the findings of perceptual psychophysics once one begins to look for it. As one goes from one laboratory to another, or from one research problem to another, there is rarely any linkage between the various outcomes. Furthermore, as we survey the history of psychophysical research, how often we notice that the classic summary statements are clusters of almost independent rules (e.g., Korte's, 1915, laws of apparent movement; Wertheimer's, 1923, enunciation of the Gestalt Rules of Grouping; Grassmann's, 1853, laws of color mixture; etc.) rather than a single unified conclusion or formula tying together the separate results of experiments carried out in different settings.
Many other psychologists have also noted the absence of universal principles in perception. Hurvich, Jameson, and Krantz (1965) have suggested that this is the case in their insightful comment: "The reader familiar with the visual literature knows that this is an area of many laws and little order" (p. 101). Ramachandran (1985) phrased it neatly when he raised the possibility that vision is characterized more as a perceptual "bag of tricks" (p. 101) than by great universal principles. Of course, in any theoretical endeavor everything looks like a "bag of tricks" early in the game before the unifying principles become evident. Nevertheless, an increasingly large number of observers of this field agree with the conjecture that a widely diverse set of mathematical models may be necessary to describe what are best viewed as a set of nearly independent visual processes. Grossberg (1983) makes the same point by listing the numerous different mathematical models that are now used to describe visual processes, none of which seems to be applicable to another. He also alludes to a comment by Sperling (1981) concerning the necessity of multiple formal models (and thus multiple, and presumably independent, internal mechanisms). A specific instance in which this same sort of idiosyncratic perceptual behavior
is rampant has been noted by Grossberg and Mingolla (1985). Pointing out that the way texture segregation occurs depends more on the "emergent perceptual units" than on the "local features" of the stimulus, they warn that this "raises the possibility of scientific chaos." In their words:

If every scene can define its own context-sensitive units, then perhaps object perception can only be described in terms of an unwieldy taxonomy of scenes and their unique perceptual units. One of the great accomplishments of the Gestaltists was to suggest a short list of rules for perceptual grouping that helped to organize many interesting examples. As is often the case in pioneering work, the rules were neither always obeyed nor exhaustive. No justification for the rules was given other than their evident plausibility. More seriously for practical applications, no effective computational algorithms were given to instantiate the rules. (p. 142)
It should not go unmentioned, however, that Grossberg and Mingolla provide in this article what they believe to be a step forward from the "scientific chaos" that they perceive as such a danger. Their model is based upon a set of analytic expressions that are collectively called the "Boundary Contour System Equations." In their 1985 paper, Grossberg and Mingolla do apply the model to a number of more or less well-known perceptual phenomena with a substantial amount of success. These phenomena include certain textural discriminations (Beck, Prazdny, & Rosenfeld, 1983); the neon spreading illusion (Van Tuijl, 1975); the Glass Moire patterns (Glass & Switkes, 1976); and the Cafe Wall illusion (Gregory & Heard, 1979). In doing so, they have linked several visual phenomena to a common mechanism and may have taken a step forward from the "scientific chaos" they have viewed with such alarm. However, Grossberg and Mingolla do stray from their goal of finding universal mechanisms in a way that suggests that they are still suffering along with the rest of us from the problem of idiosyncratic rules. In analyzing Beck's data, they point to a "remarkable aspect" of perceptual grouping due to colinearity. They ask: "Why do we continue to see a series of short lines if long lines are the emergent feature that control perceptual grouping?" (p. 150). Their response to this question is to invoke at least two separate and distinct perceptual "outputs" from the boundary-contrast system; one of which is terminator sensitive and one of which is not. Both, however, influence the perceptual outcome of the stimulus. Unfortunately, this invocation of multiple mechanisms appears to me to be conceptually identical to the "idiosyncratic rules" solution to the scientific-chaos problems in visual psychology about which they and others have complained.
The invocation of multiple mechanisms is identical, in principle, to permitting additional degrees of freedom in an increasingly flexible model within which a wider variety of functions can be fit. The situation seems the same even when we are dealing with as specific and fundamental a problem as the search for a putative universal metric of visual space. How does the visual system distort or transform physical space as it views it with
its "cyclopean eye," an "eye" influenced by many monocular and dichoptic cues? Wagner (1985), in the very act of presenting a new metric for the transformations assayed by his experimental procedure, came to the conclusion: "In sum, this multiplicity of well-supported theories indicates that no single geometry can adequately describe visual space under all conditions. Instead the geometry of visual space itself appears to be a function of stimulus conditions" (p. 493). And, I might add, of procedure as well. Haig (1985) alludes to the same limitations on the search for generalities with regard to face recognition when he notes: "Individual differences (in recognition strategy) are strong, however, and the variations are such that the uncritical application of generalized-feature-salience lists is neither useful nor appropriate" (p. 601). Haig also explains that different stimulus-faces seem to evoke different recognition strategies, thus further complicating the search for simple rules of face perception in particular and form perception in general. It is possible that we simply do not yet perceive the grand scheme because our experiments have been too spotty and disorganized. Unfortunately, whatever the youth of this science, the current state of theory seems to support the unhappy conclusion that separation and independence of the constituent processes of perception and idiosyncratic behavior may be real and not artifacts of an inadequate experimental technology. It should also be noted, lest one incorrectly conclude that the absence of general rules is unique to vision, that the underlying separateness of function seems also to be typical of many other cognitive processes.
Indeed, in a recent report, Hammond, Hamm, and Grassia (1986) summed up the general problem in the following way:

Doubts about the generality of results produced by psychological research have been expressed with increasing frequency since Koch (1959) observed, after a monumental review of scientific psychology in 1959, that there is 'a stubborn refusal of psychological findings to yield to empirical generalization' (pp. 729-788). Brunswik (1952, 1956), Campbell and Stanley (1966), Cronbach (1975), Epstein (1979, 1980), Einhorn and Hogarth (1981), Greenwald (1975, 1976), Hammond (1966), Meehl (1978), and Simon (1979), among others, have also called attention to this situation. Jenkins (1974) warned that 'a whole theory of an experiment can be elaborated without contributing in an important way to the science because the situation is artificial and nonrepresentative' [italics added] (p. 794). Tulving (1979) makes the startling observation that 'after 100 years of laboratory-based study of memory, we still do not seem to possess any concepts that the majority of workers would consider necessary or important.' (p. 3)
Hammond, Hamm, and Grassia (1986) argue that at least in the fields that they have surveyed this situation is caused not by the nature of human biology, but, rather, by the absence of an appropriate analytic methodology. They propose a technique they suggest would help to alleviate the lack of generality in
studies of cognitive judgment, their field of interest. It is not possible for me to judge if their technique is suitable for the kind of separateness observed in the perceptual domain; the reader will have to refer to their work for details. Here I merely raise the issue for other students of cognitive psychology to ponder. The list of other distinguished psychologists who have made the same point includes Ulrich Neisser (1976). He also noted the absence of generality in the field of cognitive psychology and the restriction of psychological facts to the specific experiments that originally elucidated them.

In summary, a considerable body of theoretical and empirical research does seem to currently support the argument that the perceptual system is a constellation of relatively idiosyncratic and independent information-processing engines. Furthermore, analyses of a variety of higher level cognitive approaches also suggest that narrowness, specificity, and a lack of generality characterize work in that domain. We should make no mistake about this point: However abstract and esoteric it may seem, however remotely "philosophical," the issue raised is fundamental. Have we missed the generalities (assuming they are there in some true biological sense) because of the method of detail that we must use for practical, paradigmatic reasons? Or, to the contrary, has our "hope" that these generalities exist blinded us to a very important, although contradictory, generality in its own right, namely, that because of the enormous adaptability of the human cognitive system (i.e., the mind), there are few perceptual generalities beyond the most global or the most trivial to be discovered concerning visual perception? To conclude this discussion, it may be more positively hypothesized that perhaps the elusive laws and aggravating variability observed in human form perception are but other arguments for the deep cognitive penetration of our visual processes.
Perhaps we will have to accommodate ourselves to the performance of a system that itself is so adaptive that it permits strategy shifts in what are ostensibly the most well-controlled experiments, and that allows wide-ranging individual differences to influence data. That system may operate, in general, more by what has classically been called rationalistic than by empiricistic principles.
E. A SUMMARY AND A PRESCRIPTION FOR THE FUTURE

In summary, my review of the data and theories of visual form perception in the preparation of this volume has left me with two very different views of the nature of psychobiological reality in this science: what the science is at present, and what it should be in the future. Clearly we are in a phase of the study of form perception that is characterized by the terms elementalistic, empiricistic, and reductionistic.
Elementalistic Form Perception. There is a pervasive local-feature orientation in modern theory, perhaps due to the absence of good tools to study overall organization. This is in contrast to the demonstrations and formal data that seem to support a holistic, Gestalt, global kind of thinking emphasizing the arrangement of the parts rather than the nature of the parts.

Empiricistic Form Perception. The vast amount of contemporary theory in form perception assumes that the perceiving organism operates on the basis of passive, automatic, algorithmic interpretations of those local features by what are essentially rigid and mechanical computational engines. This is in contrast to a vast amount of phenomenal evidence that form perception is so adaptive and interpretive, and is so influenced by the meaning of the stimulus, that it would be better to classify it as rationalistic.

Reductionistic Form Perception. There is an enormous contemporary confidence that form recognition can be analyzed into the underlying constituent mechanisms and processes in the not-too-distant future. This philosophy operates at two levels. First, it is assumed that neurophysiological findings can be used as a model of form perception in spite of the arguments from chaos, automata, and combinatorial considerations that such a reductionism is, in principle, not to be realized. Second, it is also assumed that perception can be reduced to units of cognitive process, which themselves may or may not be reducible to neurophysiological terms. Such a conviction, however, also flies in the face of the arguments that "black boxes" cannot have their internal mechanisms uniquely defined by input-output methods alone and that, for combinatorial reasons, neither can the cellularly oriented neurophysiologist-neurosurgeon open the "black box."

What major guidelines should direct perceptual research in the immediate and long-term future? Here are my suggestions.

1. First, it must be recognized that the many applied computer-vision and Artificial Intelligence models of human form perception, however useful they may be and however well they mimic the properties of human vision, are not necessarily valid explanations or theories of human vision. Indeed, it can be argued that some models (such as the "expert system") are a complete surrender of the hope that we can really model human mental processes. These table-lookup operations clearly do not model the way human associative thinking works, but simulate the behavior by logics and mechanisms that are beyond a doubt entirely different from those used in human cognition. To put it bluntly, "expert systems" may be the unacknowledged swan song of a dying belief: that natural intelligence can be realistically modeled on computers. It is essential that the relationship between an imitation by an analogy and the elucidation of a homologous logical mechanism be clarified and understood. Even though we can admire and respect the practical and useful accomplishments of this field of engineering application and development (i.e., AI), we must rid ourselves of any misconception that such
engineering tools are any more likely than any other type of model to be valid theories of human perception. Computational modeling of visual processes represents a special case and is an especially seductive quasi-theory, but it must also fall victim to this same criticism in the final analysis. Computational models are designed to simulate the transformations that must be executed to go from the informational state defined by the stimulus input to the state described by the perceptual phenomenology. As such, they are also process analogs and may plausibly invoke any useful (and available) mathematical or computational process to accomplish the transformation. Although this is an extremely useful approach to understanding some of the transforming steps, it does nothing to tell us which of the many possible alternative mechanisms or logics within the visual system is the one that actually carries out these transformations. Indeed, the steps need not even be computationally defined in the sense demanded by Marr and his collaborators. The human brain may use approximate processes quite unlike the Marrian algorithms. These approximations may depend upon linkages of meaning and global organization rather than upon numerical or algebraic transformations of local attributes. These processes (or combinations of processes) may produce solutions to perceptual tasks that are totally adequate, even though not satisfying the criteria of the computational theorists as a good model. In short, computational modeling is also subject to the limits imposed by the Moore theorem and the fundamental indeterminacy of mathematical descriptions or input-output analysis with regard to internal structure.

2. My second proposed guideline requires acknowledgment that human visual perception is mainly holistic in its operation.
The Gestalt psychologists understood and correctly taught this principle, but their wisdom was not influential because the computational and mathematical technology that was needed to pursue the holistic strategy was not then and is not now, for all practical purposes, available. We have an enormous obligation to convince mathematicians to develop techniques better suited to studying arrangement than parts. A major effort is necessary to develop the appropriate mathematics in order that some future equivalent of a noneuclidean mathematics can be utilized by some future equivalent of an Einstein to make the much-needed breakthrough in visual theory, so that our science can enjoy the same kind of growth in understanding. This breakthrough may not be in the form of general principles, but perhaps a softer kind of mathematics, able to handle different kinds of relationships beyond added to, subtracted from, and multiplied or divided by.

3. Next, we must determine what limits apply to the goals of visual theory building. There is an urgent need for additional efforts to determine what constraints are operating on this science so that we can avoid a naive and enormous waste of energy on impossible theory building. That there should be limits is in no sense a condemnation of perceptual psychologists or psychology, any more than accepting the limits on perpetual motion or the speed of light is a condemnation of physics. It is clear, however, that perceptual science can be correctly and justifiably criticized for not paying sufficient attention to the fundamentals before going off ill-prepared and overconfident into the heady world of neurophysiological or cognitive process reductionism. I am convinced that a few more skeptical combinatorial, automata, or chaos theorists interested in the problems raised by perceptual psychology would do more for the future progress of our science than an army of "true believers" in the ultimate solvability of all our problems.

4. Another guideline stresses that the empirical psychophysical approach, in which the phenomena are sought, discovered, and described, must be the centerpiece of any new development in this science. This empirical effort, however, should be redirected to emphasize the global or holistic properties of stimuli rather than the local ones currently in vogue. This is the most effective means of diverting the zeitgeist from what it is to what it should be.

5. Further, we must develop mathematical models that concentrate on quantifying, formalizing, and describing reported perceptual phenomena. But not just any models: It is mandatory that there be a conscious effort to develop techniques that emphasize the global and organizational attributes of a stimulus-form. It is also mandatory that we understand the intrinsically nonreductive nature of mathematics in this kind of theory building.

6. Reluctantly, given my personal scientific background, I think that we are going to have to abandon the idea that perceptual processes can be reduced to neurophysiological terms. This romantic notion, this will-o'-the-wisp, this dream, is almost certainly unobtainable in principle as well as in practice if combinatorial and chaos theory do turn out to be applicable to this domain of inquiry.
What we know about the metabolism and physiological functioning of individual neurons, though a distinguished intellectual and scientific accomplishment in its own right, can probably never be transformed into knowledge of how they operate collectively in the enormous networks of the brain to produce molar behavior.

7. Equally reluctantly, I believe an appreciation must emerge that the major goal of cognitive psychology (to determine the functional processes that are carried out by the nervous system in form perception or, for that matter, in any other kind of mental activity) will always be equivocal. Not only have the data been inconsistent, but so too have been the conclusions drawn. These outcomes reflect the enormous adaptability of the perceiver on the one hand, and, on the other, the fundamental indeterminateness of any theory of the processes going on within what is, for any conceivable future, a closed system.

8. We will also have to come to appreciate that the study of perception, like that of all the other cognitive processes, is an information-processing science, and not an energy- or matter-processing one. The nature of internal codes and representations, therefore, is far more arbitrary and complex, and the laws describing operations are necessarily going to appear to be far less rigid than those emerging from the study of simple physical systems. Indeed, there is even a question whether or not the general concept that "laws" exist that are operative in the energy/matter dominated fields of science may be transferable to this much more
multivariate domain of perceptual processes. I argue that stimuli do not lead inexorably to responses by simple switching circuitlike behavior. Rather, it seems that there is a rational, meaningful, adaptive, utilitarian, and active construction of percepts and responses by mechanisms that depend more upon the meaning of a message than its temporal or spatial geometry. In the perceptual world, information, unlike matter or energy in the physical world, can actually be created and destroyed.

9. We are going to have to accept the reality of mental processes, and the fact that these processes are the result of ultra-complex neurophysiological mechanisms. The basic principle of psychobiological monism asserts that there is nothing supernatural, extranatural, or even separate at work in mental activity; it is nothing more or less than one of the processes of neural activity. But we do have to appreciate that complexity and numerosity themselves can exert influences that come perilously close to producing exactly the same kind of mysteries that would appear if there were unnatural forces at work. Thus, we must at once reaffirm our commitment to psychobiological monism (without which any scientific study of the mind would certainly perish) and at the same time acknowledge that the gap between the two levels of discourse, neuronal network state and mental phenomenon, may never be crossed. This requires an epistemological or methodological behaviorism in practice and a metaphysical neuroreductionism in principle. The intrapersonal privacy of perception may also require some compromises in that it, too, forces us to simultaneously function as behaviorists and introspective mentalists. Nevertheless, we must accept the facts that mind, in general, and perceptual experience, in particular, exist, that they are unobservable directly, and that they can at best only be inferred from behavioral responses by observers.
The interpersonal privacy of a percept may be as much a barrier to analysis as is the combinatorial limit or the "black-box" constraint.

10. Finally, we are going to have to accept the primacy of the phenomena in any controversy between different points of view or theories in perceptual science. That is, the final arbiter of any explanatory disagreement or controversy must be the reported nature of the perceptual experience. Neurophysiology, mathematics, parsimony, and even some kind of simplistic plausibility are all secondary and incomplete criteria for resolving such disputes. The perceptual experience is the final outcome of a concatenation of processes and is complete in the sense that it reflects all of the relevant previous steps. Anything (idea, theory, formal model, or verbal explanation) that is in conflict with the perceptual phenomenon, in principle, requires modification or rejection. This does not mean that the percept can define everything or even indicate to us what the underlying processing steps were, but rather that in those cases where a conflict between observation and explanation does occur, the former must be definitive. At a qualitative level the perceptual phenomenon is also a good source of heuristics for theory building in this science simply because it is the stuff of this science: Perceptual psychologists are primarily in business to describe and explain the psychobiological reality we
call perceptual experience, not to exercise computers or to speculate about uses for the increasingly large number of anatomically or physiologically specialized neurons that are appearing at the tips of our microelectrodes. In short, what I am proposing is a mathematically descriptive, nonreductionistic, holistic, rationalistic, mentalistic neobehaviorism that is guided more by the relevant phenomena than by available analytic tools: a neobehaviorism ambitious to solve the classic problems of perceptual psychology, but modest in avoiding recourse to strategies that are patently beyond the limits of this or any other science. All too much of our effort has been spent on unattainable goals in the past few decades. I believe such a strategy would be a step toward a realistic and mature scientific approach to understanding how people see forms.
REFERENCES
Abadi, R. V. (1976). Induction masking: A study of some inhibitory interactions during dichoptic viewing. Vision Research, 16, 269-275.
Abelson, R. P. (1973). The structure of belief systems. In R. C. Schank & K. M. Colby (Eds.), Computer models of thought and language. San Francisco: Freeman.
Abu-Mostafa, Y. S., & Psaltis, D. (1987). Optical neural computers. Scientific American, 256, 88-95.
Adelson, E. H., & Bergen, J. R. (1984). Motion channels based on spatiotemporal energy. Investigative Ophthalmological Visual Science Supplement, 25, 14.
Aiba, T. S., & Granger, G. W. (1983). Colour and position processing mechanisms in human vision. Hokkaido Behavioral Science Report, Series P, No. 12.
Alpern, M. (1953). Metacontrast. Journal of the Optical Society of America, 43, 648-657.
Anderson, J. R. (1978). Arguments concerning representations for mental imagery. Psychological Review, 85, 249-277.
Andreassi, J. L., Mayzner, M. S., Beyda, D., & Waxman, J. (1970). Sequential blanking: A U-shaped function. Psychonomic Science, 18, 319-321.
Andrews, H. C. (1970). Computer Techniques in Image Processing. New York: Academic Press.
Andrews, H. C., & Hunt, B. R. (1977). Digital image restoration. Englewood Cliffs, NJ: Prentice-Hall.
Appleman, J. B., & Mayzner, M. S. (1982). Application of geometric models to letter recognition: Distance and density. Journal of Experimental Psychology: General, 111, 60-100.
Arbib, M. A. (1975). Parallelism, slides, schema, and frames. In M. A. Arbib (Ed.), Two papers on schema and frames (COINS Technical Report 75C-9). Amherst, MA: Department of Computer and Information Sciences, University of Massachusetts.
Armstrong, D. M. (1960). Berkeley's theory of vision. Melbourne: Melbourne University Press.
Arnheim, R. (1974). Art and visual perception: A psychology of the creative eye, the new version. Irvine: University of California Press.
Attneave, F. (1950). Dimensions of similarity. American Journal of Psychology, 3, 516-556.
Attneave, F., & Arnoult, M. D. (1956). The quantitative study of shape and pattern perception. Psychological Bulletin, 53, 452-471.
Auslander, L., & Mackenzie, R. E. (1963). Introduction to differentiable manifolds. New York: McGraw-Hill.
Bachmann, T. (1980). Genesis of subjective image. Acta et Commentationes Universitatis Tartuensis #522: Problems of Cognitive Psychology, 102-126. Tartu, Estonia, USSR.
Bachmann, T., & Allik, J. (1976). Integration and interruption in the masking of form by form. Perception, 5, 79-97.
Bamber, D. (1969). Reaction times and error rates for "same"-"different" judgments of multidimensional stimuli. Perception and Psychophysics, 6, 169-174.
Banks, W. P., Bodinger, D., & Illige, M. (1974). Visual detection accuracy and target noise proximity. Bulletin of the Psychonomic Society, 2, 411-414.
Banks, W. P., & Prinzmetal, W. (1976). Configuration of effects in visual information processing. Perception and Psychophysics, 19, 361-367.
Barlow, H. B. (1972). Single units and sensation: A neuron doctrine for perceptual psychology. Perception, 1, 371-394.
Barlow, H. B. (1978). The efficiency of detecting changes of density in random dot patterns. Vision Research, 18, 637-650.
Barlow, H. B. (1982). The past, present and future of feature detectors. In S. Levin (Ed.), Lecture notes in biomathematics, Vol. 44 of Duane G. Albrecht (Ed.), Recognition of pattern and form. Berlin: Springer-Verlag.
Barlow, H. B., & Levick, W. R. (1965). The mechanism of directionally selective units in the rabbit's retina. Journal of Physiology, 178, 477-504.
Barlow, H. B., & Mollon, J. D. (1982). The senses. Cambridge: Cambridge University Press.
Barlow, H. B., & Reeves, B. C. (1979). The versatility and absolute efficiency of detecting mirror symmetry in random dot displays. Vision Research, 19, 783-793.
Baron, J. (1978). The word-superiority effect: Perceptual learning from reading. In W. K. Estes (Ed.), Handbook of learning and cognitive processes (Vol. VI). Hillsdale, NJ: Lawrence Erlbaum Associates.
Beck, J. (1966). Perceptual grouping produced by change in orientation and shape. Science, 154, 538-540.
Beck, J. (1972). Surface color perception. Ithaca, NY: Cornell University Press.
Beck, J. (1982). Textural segmentation. In J. Beck (Ed.), Organization and representation in perception (pp. 285-317). Hillsdale, NJ: Lawrence Erlbaum Associates.
Beck, J. (1983). Textural segmentation, second-order statistics, and textural elements. Biological Cybernetics, 48, 125-130.
Beck, J., Prazdny, K., & Rosenfeld, A. (1983). A theory of textural segmentation. In J. Beck, B. Hope, & A. Rosenfeld (Eds.), Human and machine vision. New York: Academic Press.
Beller, H. K. (1970). Parallel and serial stages in matching. Journal of Experimental Psychology, 84, 213-219.
Bennett, B. M., & Hoffman, D. D. (1985a). The computation of structure from fixed axis motion: Nonrigid structures. Biological Cybernetics, 51, 293-300.
Bennett, B. M., & Hoffman, D. D. (1985b). Shape decompositions for visual shape recognition: The role of transversality. In W. A. Richards (Ed.), Image understanding: 1985. Norwood, NJ: Ablex.
Bennett, B. M., Hoffman, D. D., & Prakash, C. (1988). Observer mechanics.
Bergen, J. R., & Julesz, B. (1983). Parallel versus serial processing in rapid pattern discrimination. Nature, 303, 696-698.
Berkeley, G. (1954). An essay towards a new theory of vision and other writings. New York: Dutton. (Originally published as An Essay Towards a New Theory of Vision, 1709)
Bernstein, I. H., Proctor, R. W., & Schurman, D. L. (1973). Metacontrast and brightness discrimination. Perception and Psychophysics, 14, 293-297.
Biederman, I. (1987). Recognition-by-components: A theory of human image understanding. Psychological Review, 94, 115-147.
Bjork, E. L., & Murray, J. T. (1977). On the nature of input channels in visual processing. Psychological Review, 84, 472-484.
Blakemore, C., & Campbell, F. W. (1969). On the existence of neurons in the human visual system selectively sensitive to the orientation and size of retinal images. Journal of Physiology, 203, 237-260.
Blank, A. A. (1959). The Luneberg theory of binocular space perception. In S. Koch (Ed.), Psychology: A study of a science (Vol. 1). New York: McGraw-Hill.
Bouma, H. (1971). Visual recognition of isolated lower-case letters. Vision Research, 11, 459-474.
Boynton, R. M. (1972). Discrimination of homogeneous double pulses of light. In D. Jameson & L. M. Hurvich (Eds.), Handbook of sensory physiology: Visual psychophysics (Vol. VII/4, pp. 202-232). New York: Springer-Verlag.
Boynton, R. M., Elworth, C. L., Onley, J. W., & Klingberg, C. L. (1960). Form discrimination as predicted by overlap and area (Report RADC-TR: 60-158). Rome, NY: Rome Air Development Center, Air Research and Development Command.
Brand, J. (1971). Classification without identification in visual search. Quarterly Journal of Experimental Psychology, 23, 178-186.
Braunstein, M. L. (1976). Depth perception through motion. New York: Academic Press.
Braunstein, M. L. (1983). Contrasts between human and machine vision: Should technology recapitulate phylogeny? In J. Beck, B. Hope, & A. Rosenfeld (Eds.), Human and machine vision. New York: Academic Press.
Braunstein, M. L., Hoffman, D. D., Shapiro, L. R., Andersen, G. J., & Bennett, B. M. (1986). Minimum points and views for the recovery of three-dimensional structure. In Studies in the cognitive sciences (No. 41). Irvine, CA: School of Social Sciences, University of California.
Breitmeyer, B. G. (1984). Visual masking: An integrative approach. New York: Oxford University Press.
Breitmeyer, B. G., & Ganz, L. (1976). Implications of sustained and transient channels for theories of visual pattern masking, saccadic suppression, and information processing. Psychological Review, 83, 1-36.
Brick, D. B. (1969). Pattern recognition: The challenge, are we meeting it? In N. S. Watanabe (Ed.), Methodologies of pattern recognition. New York: Academic Press.
Bridgeman, B. (1971). Metacontrast and lateral inhibition. Psychological Review, 78, 528-539.
Bridgeman, B. (1987). Psychology and neuroscience. Review of Mind and brain: Dialogs in cognitive neuroscience. Science, 235, 373-374.
Broadbent, D. (1977). The hidden preattentive process. American Psychologist, 32, 109-118.
Brown, D. R., & Owen, D. H. (1967). The metrics of visual form: Methodological dyspepsia. Psychological Bulletin, 68, 243-259.
Bruner, J. S., & Postman, L. (1947). Perception, cognition, and behavior. Journal of Personality, 16, 69-77.
Bruner, J. S., & Postman, L. (1949). Perception, cognition, and behavior. Journal of Personality, 18, 14-31.
Brunswik, E. (1939). The conceptual focus of some psychological systems. Journal of Unified Science, 36-49.
Brunswik, E. (1952). The conceptual framework of psychology. In International encyclopedia of unified science (Vol. 1; No. 10). Chicago: University of Chicago Press.
Brunswik, E. (1955). Representative design and probabilistic theory in a functional psychology. Psychological Review, 62, 193-217.
Brunswik, E. (1956). Perception and the representative design of psychological experiments (2nd ed.). Berkeley: University of California Press.
Bunge, M. (1980). The mind-body problem: A psychobiological approach. Oxford: Pergamon Press.
Burgess, A. E., & Barlow, H. B. (1983). The precision of numerosity discrimination in arrays of random dots. Vision Research, 23, 811-820.
Burgess, A. E., Jennings, R. J., & Wagner, R. F. (1982). Statistical efficiency: A measure of human visual signal-detection performance. Journal of Applied Photographic Engineering, 8, 76-78.
Burgess, A. E., Wagner, R. F., Jennings, R. J., & Barlow, H. B. (1981). Efficiency of human visual signal discrimination. Science, 214, 93-94.
Burks, A. W. (1974, April). Who invented the general-purpose electronic computer? Lecture at The University of Michigan, Ann Arbor, MI.
Burks, A. W., & Burks, A. (1981). The ENIAC: First general-purpose electronic computer. Annals of the History of Computing, 3, 310-399.
Burks, A. W., Goldstine, H. H., & von Neumann, J. (1947). Preliminary discussion of the logical design of an electronic computing instrument (Part I, Vol. 1). Princeton, NJ: Institute for Advanced Study.
Caelli, T. M. (1980). Facilitative and inhibitory factors in visual texture discriminations. Biological Cybernetics, 39, 21-26.
Caelli, T. M. (1981a). Some psychophysical determinants of discrete Moire patterns. Biological Cybernetics, 39, 97-103.
Caelli, T. M. (1981b). Visual perception theory and practice. Oxford: Pergamon Press.
Caelli, T. M. (1982). On discriminating visual textures and images. Perception and Psychophysics, 31, 149-159.
Caelli, T. M. (1985). Three processing characteristics of visual texture segmentation. Spatial Vision, 1, 19-30.
Caelli, T. M. (1987, January). Lecture at Naval Ocean Systems Center-Hawaii Laboratory, Kailua, HI.
Caelli, T. M., & Dodwell, P. C. (1980). On the contours of apparent motions: A new perspective on visual space-time. Biological Cybernetics, 39, 27-35.
Caelli, T. M., & Julesz, B. (1978). On perceptual analyzers underlying visual texture discrimination: Part I. Biological Cybernetics, 28, 167-175.
Caelli, T. M., Julesz, B., & Gilbert, E. (1978). On perceptual analyzers underlying visual texture discrimination: Part II. Biological Cybernetics, 29, 201-214.
Caelli, T. M., & Yuzyk, J. (1985). What is perceived when two images are combined? Perception, 14, 41-48.
Campbell, D. T., & Stanley, J. (1966). Experimental and quasi-experimental designs for research. Chicago: Rand-McNally.
Campbell, F. W., & Robson, J. G. (1968). An application of Fourier analysis to the visibility of gratings. Journal of Physiology, 197, 551-566.
Cannon, T. M., & Hunt, B. R. (1981). Image processing by computer. Scientific American, 245, 214-225.
Cattell, J. M. (1886). The time taken by cerebral operation. Mind, 11, 220-242.
Cheatham, P. G. (1952). Visual perceptual latency as a function of stimulus brightness and contour shape. Journal of Experimental Psychology, 43, 369-380.
Cheeseman, J., & Merikle, P. M. (1984). Priming with and without awareness. Perception & Psychophysics, 36, 387-395.
Cherry, C. (1957). On human communication. New York: The Technology Press of Massachusetts Institute of Technology and Wiley.
Chomsky, N. (1981). A naturalistic approach to language and cognition. Cognition and Brain Theory, 4(1), 3-22.
Coffin, S. (1978). Spatial frequency analysis of block letters does not predict experimental confusions. Perception and Psychophysics, 23, 69-74.
Cohen, P. R., & Feigenbaum, E. A. (1982). The handbook of artificial intelligence (Vol. 3). Los Altos, CA: William Kaufmann.
Cohn, P. M. (1957). Lie groups. London: Cambridge University Press.
Coren, S., & Girgus, J. S. (1978). Seeing is deceiving: The psychology of visual illusions. Hillsdale, NJ: Lawrence Erlbaum Associates.
Cormack, R., & Blake, R. (1980). Do the two eyes constitute separate visual channels? Science, 207, 1100-1101.
Cox, D. R., & Smith, W. L. (1954). On the superposition of renewal processes. Biometrika, 41, 91-99.
Crawford, B. H. (1947). Visual adaptation in relation to brief conditioning stimuli. Proceedings of the Royal Society of London, 134, 283-302.
Cronbach, L. J. (1975). Beyond the two disciplines of scientific psychology. American Psychologist, 30(2), 116-127.
Crutchfield, J. P., Farmer, J. D., Packard, N. H., & Shaw, R. S. (1986). Chaos. Scientific American, 256(6), 46.
Cutting, J. E. (1986). Perception with an eye for motion. Cambridge, MA: MIT Press.
Dalcq, A. M. (1951). Form and modern embryology. In L. L. Whyte (Ed.), Aspects of form. New York: Pellegrini & Cudahy.
Daugman, J. G. (1984). Spatial visual channels in the Fourier plane. Vision Research, 24, 891-910.
Davis, E. T. (1981). Allocation of attention: Uncertainty effects when monitoring one or two visual gratings of noncontinuous spatial frequencies. Perception and Psychophysics, 29, 618-622.
Davis, E. T., Kramer, P., & Graham, N. (1983). Uncertainty about spatial frequency, spatial position, or contrast of visual patterns. Perception and Psychophysics, 33(1), 20-28.
d'Espagnet, B. (1979). The quantum theory and reality. Scientific American, 241(5), 158-181.
DeValois, R. L., Albrecht, P. G., & Thorell, L. G. (1982). Spatial frequency selectivity of cells in macaque visual cortex. Vision Research, 22, 545-559.
Diener, D. (1981). On the relationship between detection and recognition. Perception and Psychophysics, 30(3), 237-246.
Dodwell, P. C. (1970). Visual pattern recognition. New York: Holt, Rinehart & Winston.
Doehrman, S. (1974). The effect of visual orientation uncertainty in a simultaneous detection recognition task. Perception & Psychophysics, 15, 519-523.
Donders, F. C. (1868). On the speed of mental processes (W. G. Koster, Trans.). In W. G. Koster (Ed.), Attention and performance II. Acta Psychologica, 1969, 30, 412-431.
Dreyfus, H. L. (1979). What computers can't do: The limits of artificial intelligence (rev. ed.). New York: Harper & Row.
Duda, R. O., & Hart, P. E. (1973). Pattern classification and scene analysis. New York: Wiley.
Duff, M. J. B. (1969). Pattern computation in pattern recognition. In S. Watanabe (Ed.), Methodologies of pattern recognition (pp. 133-140). New York: Academic Press.
Duncan, J. (1983). Category effects in visual search: A failure to replicate the "oh-zero" phenomenon. Perception & Psychophysics, 34, 221-232.
Earhard, B. (1980). The line in object superiority effect in perception: It depends on where you fix your eyes and what is located at the point of fixation. Perception & Psychophysics, 28, 9-18.
Eccles, J. C. (1979). The human mystery. Berlin: Springer-Verlag.
Egeth, H., & Blecker, D. (1971). Differential effects of familiarity on judgments of sameness and difference. Perception & Psychophysics, 9, 321-326.
Egeth, H. E., Jonides, J., & Wall, S. (1972). Parallel processing of multi-element arrays. Cognitive Psychology, 3, 674-698.
Einhorn, H. J., & Hogarth, R. M. (1981). Behavioral decision theory: Processes of judgment and choice. Annual Review of Psychology, 32, 53-88.
Emerson, P. L. (1979). Necker cube: Duration of preexposure of an unambiguous form. Bulletin of the Psychonomic Society, 14, 397-400.
Encyclopedia of philosophy (1967). New York and London: Macmillan and Collier.
Enns, J. T., & Prinzmetal, W. (1984). The role of redundancy in the object-line effect. Perception and Psychophysics, 35, 22-32.
Enroth-Cugell, C., & Robson, J. G. (1966). The contrast sensitivity of retinal ganglion cells of the cat. Journal of Physiology, 187, 517-552.
Epstein, S. (1979). The stability of behavior: I. On predicting most of the people much of the time. Journal of Personality and Social Psychology, 37, 1097-1126.
Epstein, S. (1980). The stability of behavior: II. Implications for psychological research. American Psychologist, 35, 790-806.
Eriksen, C. W. (1960). Discrimination and learning without awareness: A methodological survey and evaluation. Psychological Review, 67, 279-300.
Eriksen, C. W., Becker, B. B., & Hoffman, J. E. (1970). Safari to masking land: A hunt for the elusive U. Perception and Psychophysics, 8, 245-250.
Eriksen, C. W., & Hoffman, M. (1963). Form recognition as a function of adapting field and interval between stimulation. Journal of Experimental Psychology, 66, 485-499.
Eriksen, C. W., & Lappin, J. S. (1964). Luminance summation-contrast reduction as a basis for certain forward and backward masking effects. Psychonomic Science, 1, 313-314.
Eriksen, C. W., & Lappin, J. S. (1967). Independence in the perception of simultaneously presented forms of brief duration. Journal of Experimental Psychology, 73, 468-472.
Eriksen, C. W., & Marshall, P. H. (1969). Failure to replicate a reported U-shaped visual masking function. Psychonomic Science, 15, 195-196.
Estes, W. K. (1972). Interactions of signal and background variables in visual processing. Perception & Psychophysics, 12, 278-286.
Estes, W. K. (1974). Redundancy of noise elements and signals in the visual detection of letters. Perception and Psychophysics, 16, 53-60.
Falzett, M., & Lappin, J. S. (1983). Detection of visual forms in space and time. Vision Research, 23, 181-189.
Fitts, P. M., & Leonard, J. A. (1957). Stimulus correlates of visual pattern perception: A probability approach. Columbus: Ohio State University, Aviation Psychology Laboratory.
Fodor, J. A. (1978). Tom Swift and his procedural grandmother. Cognition, 6, 229-247.
Foster, D. H. (1979). Discrete internal pattern representations and visual detection of small changes in pattern shape. Perception & Psychophysics, 26, 459-468.
Foster, D. H. (1984). Local and global computational factors in visual pattern recognition. In P. C. Dodwell & T. Caelli (Eds.), Figural synthesis. Hillsdale, NJ: Lawrence Erlbaum Associates.
Foster, D. H., & Kahn, J. I. (1985). Internal representations and operations in the visual comparison of transformed patterns: Effects of pattern point-inversion, positional symmetry, and separation. Biological Cybernetics, 51, 305-312.
Freeman, W. J. (1981). A physiological hypothesis of perception. Perspectives in Biology and Medicine, 24, 561-592.
Freeman, W. J. (1983). The physiological basis of mental images. Biological Psychiatry, 10, 1107-1125.
French, R. S. (1954). Pattern recognition in the presence of visual noise. Journal of Experimental Psychology, 47, 27-31.
Fu, K. S. (1982a). Applications of pattern recognition. Boca Raton, FL: CRC Press.
Fu, K. S. (1982b). Syntactic pattern recognition and applications. Englewood Cliffs, NJ: Prentice-Hall.
Fukushima, K. (1970). A feature extractor for curvilinear patterns: A design suggested by the mammalian visual system. Kybernetik, 7, 153-160.
Fukushima, K. (1975). Cognitron: A self-organizing multilayered neural network. Biological Cybernetics, 20, 121-136.
Fukushima, K., & Miyake, S. (1982). Neocognitron: A new algorithm for pattern recognition tolerant of deformations and shifts in position. Pattern Recognition, 15, 455-469.
Furchner, C. S., Thomas, J. P., & Campbell, F. W. (1977). Detection and discrimination of simple and complex patterns at low spatial frequencies. Vision Research, 17, 827-836.
Ganz, L. (1966). Mechanism of the figural aftereffect. Psychological Review, 73, 128-150.
Gardner, E. P., & Spencer, W. A. (1972a). Sensory funneling. I. Psychophysical observations of human subjects and responses of cutaneous mechanoreceptive afferents in the cat to patterned skin stimuli. Journal of Neurophysiology, 35, 925-953.
Gardner, E. P., & Spencer, W. A. (1972b). Sensory funneling. II. Cortical neuronal representation of patterned cutaneous stimuli. Journal of Neurophysiology, 35, 954-977.
Gardner, G. T. (1973). Evidence for independent parallel channels in tachistoscopic perception. Cognitive Psychology, 4, 130-155.
Garner, W. R. (1974). The processing of information and structure. Hillsdale, NJ: Lawrence Erlbaum Associates.
Garner, W. R. (1978). Aspects of a stimulus: Features, dimensions, and configurations. In E. Rosch & B. B. Lloyd (Eds.), Cognition and categorization. Hillsdale, NJ: Lawrence Erlbaum Associates.
Garner, W. R., & Clement, D. E. (1963). Goodness of pattern and pattern uncertainty. Journal of Verbal Learning and Verbal Behavior, 2, 446-452.
Garner, W. R., & Felfoldy, G. L. (1970). Integrality of stimulus dimensions in various types of information processing. Cognitive Psychology, 1, 225-241.
Geiger, G., & Lettvin, J. Y. (1986). Enhancing the perception of form in peripheral vision. Perception, 15, 119-130.
Geyer, L. H., & DeWald, C. G. (1973). Feature lists and confusion matrices. Perception & Psychophysics, 14, 471-482.
Gibson, J. J. (1950). The perception of the visual world. Boston: Houghton Mifflin.
Gibson, J. J. (1966). The senses considered as perceptual systems. Boston: Houghton Mifflin.
Gibson, J. J. (1979). The ecological approach to visual perception. Boston: Houghton Mifflin.
Ginsburg, A. P. (1983). Visual form perception based on biological filtering. In L. Spillmann & B. R. Wooten (Eds.), Sensory experience, adaptation and perception: Festschrift for Ivo Kohler. Hillsdale, NJ: Lawrence Erlbaum Associates.
Glass, A. L., Holyoak, K. J., & Santa, J. L. (1979). Cognition. Reading, MA: Addison-Wesley.
Glass, L., & Switkes, E. (1976). Pattern recognition in humans: Correlations which cannot be perceived. Perception, 5, 67-72.
Gleitman, H., & Jonides, J. (1976). The cost of categorization in visual search: Incomplete processing of target and field items. Perception and Psychophysics, 20, 281-288.
Gleitman, H., & Jonides, J. (1978). The effect of set on categorization in visual search. Perception and Psychophysics, 24, 361-368.
Goldmeier, E. (1965). Limits of visibility of bronchogenic carcinoma. The American Review of Respiratory Diseases, 91(2), 232-239.
Goldmeier, E. (1972). Similarity in visually perceived forms. Psychological Issues, 8 (Whole #29). (Originally published 1936)
Goldmeier, E. (1982). The memory trace: Its formation and its fate. Hillsdale, NJ: Lawrence Erlbaum Associates.
Graham, N. (1977). Visual detection of aperiodic spatial stimuli by probability summation among narrow-band channels. Vision Research, 17, 637-652.
Graham, N. (1980). Spatial frequency channels in human vision: Detecting edges without edge detectors. In C. S. Harris (Ed.), Visual coding and adaptability. Hillsdale, NJ: Lawrence Erlbaum Associates.
Graham, N., & Nachmias, J. (1971). Detection of grating patterns containing two spatial frequencies: A comparison of single channel and multichannel models. Vision Research, 11, 251-259.
Graham, N., Robson, J. G., & Nachmias, J. (1978). Grating summation in fovea and periphery. Vision Research, 18, 815-826.
Grassman, H. (1854). On the theory of compound colours. Philosophical Magazine, 7, 254-264.
Green, D. M., & Birdsall, T. G. (1978). Detection and recognition. Psychological Review, 85, 192-205.
Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New York: Wiley.
Greenspon, T. S., & Eriksen, C. W. (1968). Interocular nonindependence. Perception and Psychophysics, 3, 93-96.
Greenwald, A. G. (1975). Significance, nonsignificance, and interpretation of an ESP experiment. Journal of Experimental Social Psychology, 11, 180-191.
Greenwald, A. G. (1976). Within-subjects design: To use or not to use? Psychological Bulletin, 83, 314-320.
Gregory, R. L. (1970). The intelligent eye. London and New York: Weidenfeld.
Gregory, R. L. (1974). Choosing a paradigm for perception. In E. C. Carterette & M. P. Friedman (Eds.), Handbook of perception: Historical and philosophical roots of perception (Vol. 1). New York: Academic Press.
Gregory, R. L., & Heard, P. (1979). Border locking and Café Wall illusion. Perception, 8, 365-380.
Grice, G. R., Canham, L., & Boroughs, J. M. (1983). Forest before trees? It depends where you look. Perception and Psychophysics, 33, 121-128.
Grimson, W. E. L. (1981). From images to surfaces: A computational study of the human early visual system. Cambridge, MA: MIT Press.
Grimson, W. E. L. (1984). On the reconstruction of visible surfaces. In S. Ullman & W. Richards (Eds.), Image understanding. Norwood, NJ: Ablex Publishing Corporation.
Grossberg, S. (1983). The quantized geometry of visual space: The coherent computation of depth, form, and lightness. The Behavioral and Brain Sciences, 6, 625-692.
Grossberg, S., & Mingolla, E. (1985). Neural dynamics of perceptual groupings: Textures, boundaries, and emergent segmentations. Perception and Psychophysics, 38(2), 141-171.
Haber, R. N. (1966). Nature of the effect of set on perception. Psychological Review, 73, 335-351.
Haber, R. N. (1969). Repetition, visual persistence, visual noise, and information processing. In K. N. Leibovic (Ed.), Information processing in the nervous system. New York: Springer-Verlag.
Haig, N. D. (1985). How faces differ: A new comparative technique. Perception, 14, 601-615.
Hammond, K. R. (1966). The psychology of Egon Brunswik. New York: Holt, Rinehart, & Winston.
Hammond, K. R., Hamm, R. M., & Grassia, J. (1986). Generalizing over conditions by combining the multitrait multimethod matrix and the representative design of experiments. Psychological Bulletin, 100(27), 257-269.
Harris, C. S. (1980). Visual coding and adaptability. Hillsdale, NJ: Lawrence Erlbaum Associates.
Harris, J. R., Shaw, M. L., & Altom, M. J. (1985). Serial position curves for reaction time and accuracy in visual search: Tests of a model of overlapping processing. Perception and Psychophysics, 38, 178-187.
Harris, J. R., Shaw, M. L., & Bates, M. (1979). Visual search in multicharacter arrays with and without gaps. Perception and Psychophysics, 26, 69-84.
Hartline, H. K., & Ratliff, F. (1957). Inhibitory interaction of receptor units in the eye of Limulus. Journal of General Physiology, 40, 357-376.
Hebb, D. O. (1949). The organization of behavior. New York: Wiley.
Hecht, S., Shlaer, S., & Pirenne, M. H. (1942). Energy, quanta, and vision. Journal of General Physiology, 25, 819-840.
Helmholtz, H. von (1948). On the sensations of tone as a physiological basis for the theory of music (A. J. Ellis, Trans.). New York. (Originally published 1863)
Helmholtz, H. von (1968). An address delivered on Founder's Day at Berlin University, August 3, 1878. In R. M. Warren & R. P. Warren (Eds.), Helmholtz on perception: Its physiology and development. New York: Wiley.
Helmholtz, H. von (1968). Excerpts from Treatise on Physiological Optics (3rd ed.). In R. M. Warren & R. P. Warren (Eds.), Helmholtz on perception: Its physiology and development. New York: Wiley.
Helson, H. H., & Fehrer, E. (1932). The role of form in perception. American Journal of Psychology, 44, 79-102.
Hernandez, L. L., & Lefton, L. A. (1977). Metacontrast as measured under a signal detection method. Perception, 6, 695-702.
Hinton, G. E., & Anderson, J. A. (1981). Parallel models of associative memory. Hillsdale, NJ: Lawrence Erlbaum Associates.
Hobbes, T. (1665). De Corpore. In W. Molesworth (Ed.), English works of Thomas Hobbes. Oxford: Reprinted, 1961.
Hochberg, J. E., & Peterson, M. A. (1985). Perceptual couples as measures of the role of local cues and intention in form perception. Unpublished manuscript.
Hochberg, J. E., & McAlister, E. (1953). A quantitative approach to figural "goodness." Journal of Experimental Psychology, 46, 361-364.
Hoffman, D. D., & Bennett, B. (1985). Inferring the relative 3-D positions of two moving points. Journal of the Optical Society of America, 75, 530-533.
Hoffman, D. D., & Bennett, B. (1986). The computation of structure from fixed axis motion: Rigid structures. Biological Cybernetics, 54, 1-13.
Hoffman, D. D., & Flinchbaugh, B. (1982). The interpretation of biological motion. Biological Cybernetics, 42, 197-204.
Hoffman, D. D., & Richards, W. A. (1984). Parts of recognition. Cognition, 18, 65-96.
Hoffman, J. (1980). Interactions between global and local levels of form. Journal of Experimental Psychology: Human Perception & Performance, 6, 222-234.
Hoffman, W. C. (1966). The Lie algebra of visual perception. Journal of Mathematical Psychology, 3, 65-98.
Hoffman, W. C. (1970). Higher visual perception as prolongation of the basic Lie transformation group. Mathematical Biosciences, 6, 437-471.
Hoffman, W. C. (1971). Visual illusions of angle as an application of Lie transformation groups. Society for Industrial and Applied Mathematics, 13, 169-184.
Hoffman, W. C. (1978). The Lie transformation group approach to visual neuropsychology. In E. L. J. Leeuwenberg & H. Buffart (Eds.), Formal theories of visual perception. Chichester, England: Halsted.
Hoffman, W. C. (1980). Subjective geometry and geometric psychology. Mathematical Modeling, 1, 349-367.
Hoffman, W. C. (1985). Some reasons why algebraic topology is important in neuropsychology: Perceptual and cognitive systems as fibration. International Journal of Man Machine Studies, 22, 613-650.
Hogben, J. H. (1972). Perception of visual pattern with components distributed in time. Unpublished doctoral dissertation, University of Western Australia.
Horn, B. K. P. (1975). Obtaining shape from shading information. In P. H. Winston (Ed.), The psychology of computer vision. New York: McGraw-Hill.
Horn, B. K. P. (1983). Extended Gaussian images. MIT Artificial Intelligence Laboratory Memo No. 740, Cambridge, MA.
Hubel, D. H. (1978). Vision and the brain. Bulletin of the American Academy of Arts and Sciences, 31, 17-28.
Hubel, D. H., & Wiesel, T. N. (1959). Receptive fields of single neurons in the cat's striate cortex. Journal of Physiology, 148, 574-591.
Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction, and functional architecture in the cat's visual cortex. Journal of Physiology, 160, 106-154.
Hubel, D. H., & Wiesel, T. N. (1965). Receptive fields and functional architecture in two nonstriate visual areas (18 and 19) of the cat. Journal of Neurophysiology, 28, 229-289.
Hughes, H. C. (1982). Search for the neural mechanisms essential to basic figural synthesis in the cat. In D. J. Ingle, M. A. Goodale, & R. J. W. Mansfield (Eds.), Analysis of visual behavior. Cambridge, MA: MIT Press.
Hughes, H. C., Layton, W. M., Baird, J. C., & Lester, L. S. (1984). Global precedence in visual pattern recognition. Perception and Psychophysics, 35, 361-371.
Hume, D. (1941). A treatise on human nature (L. Selby-Bigge, Ed.). New York: Oxford University Press. (Originally published 1739)
Hume, D. (1966). Enquiry concerning human understanding (2nd edition; T. J. McCormack & M. W. Calkins, Eds.). LaSalle, IL: Open Court Publishing. (Originally published 1748)
Humphreys-Owens, S. P. F. (1951). Physical principle underlying inorganic form. In L. L. Whyte (Ed.), Aspects of form. New York: Pellegrini & Cudahy.
Hurvich, L. M., Jameson, D., & Krantz, D. H. (1965). Theoretical treatments of selected visual problems. In R. D. Luce, R. R. Bush, & E. Galanter (Eds.), Handbook of mathematical psychology (Vol. III). New York: Wiley.
Ingling, N. W. (1972). Categorization: A mechanism for rapid information processing. Journal of Experimental Psychology, 94, 239-243.
Ittleson, W. H. (1952). The Ames demonstrations in perception. Princeton: Princeton University Press.
James, W. (1950). The principles of psychology. New York: Dover. (Originally published 1890)
Jameson, D., & Hurvich, L. M. (1972). Handbook of sensory physiology: Visual psychophysics (Vol. VII/4). Berlin: Springer-Verlag.
Jenkins, B. (1982). Redundancy in the perception of bilateral symmetry in dot textures. Perception and Psychophysics, 32, 171-177.
Jenkins, B. (1983a). Temporal limits to the detections of correlation in transpositionally symmetric textures. Perception and Psychophysics, 33, 79-84.
Jenkins, B. (1983b). Component processes in the perception of bilaterally symmetric dot textures. Perception and Psychophysics, 34(5), 433-440.
Jenkins, B. (1983c). Spatial limits to the detection of transpositional symmetry in dynamic dot textures. Journal of Experimental Psychology: Human Perception and Performance, 9, 258-269.
Jenkins, J. (1974). Remember that old theory of memory? Well, forget it! American Psychologist, 29, 785-795.
Jonides, J., & Gleitman, H. (1972). A conceptual category search in visual search: O as letter or as digit. Perception & Psychophysics, 12, 457-460.
Jonides, J., & Gleitman, H. (1976). The benefit of categorization in visual search: Target location without identification. Perception & Psychophysics, 20, 289-298.
Jordan, M. I. (1986). Serial order: A parallel distributed processing approach (ICS Report 8604). San Diego, CA: Institute for Cognitive Science, University of California at San Diego.
Julesz, B. (1962). Visual pattern discrimination. Institute of Radio Engineers Transactions on Information Theory, IT-8, 84-92.
Julesz, B. (1971). Foundations of cyclopean perception. Chicago: The University of Chicago Press.
Julesz, B. (1975). Experiments in the visual perception of texture. Scientific American, 232(4), 34-43.
Julesz, B. (1978). Perceptual limits of texture discrimination and their implications to figure-ground separation. In E. L. J. Leeuwenberg & H. F. J. M. Buffart (Eds.), Formal theories of visual perception. New York: Wiley.
Julesz, B. (1981). Textons, the elements of texture perception and their interactions. Nature, 200, 91-97.
Julesz, B. (1983a). Textons, the fundamental elements in preattentive vision and perception of textures. Bell System Technical Journal, 62, 1619-1645.
Julesz, B. (1983b). Adaptation in a peephole: A texton theory of preattentive vision. In L. Spillmann & B. R. Wooten (Eds.), Sensory experience, adaptation, and perception. Hillsdale, NJ: Lawrence Erlbaum Associates.
Julesz, B. (1984). A brief outline of the texton theory of human vision. Trends in Neurosciences, 7, 41-45.
Julesz, B., & Bergen, J. (1983). Textons, the fundamental elements in preattentive vision and perception of texture. Bell System Technical Journal, 62, 1619-1645.
Julesz, B., Gilbert, E. N., Shepp, L. A., & Frisch, H. L. (1973). Inability of humans to discriminate between visual textures that agree in second order statistics-revised. Perception, 2, 391-405.
Julesz, B., & Schumer, R. A. (1981). Early visual perception. Annual Review of Psychology, 32, 575-627.
Kaas, J. H. (1978). The organization of visual cortex in primates. In C. Noback (Ed.), Sensory systems of primates. New York: Plenum Press.
Kahn, J. I., & Foster, D. H. (1981). Visual comparison of rotated and reflected random-dot patterns as a function of their positional uncertainty and separation in the field. Quarterly Journal of Experimental Psychology, 33A, 155-166.
Kahneman, D. (1968). Metacontrast: Method, findings, and theory in studies of visual masking. Psychological Bulletin, 70, 404-425.
Kahneman, D., & Treisman, A. (1984). Changing views of attention and automaticity. In R. A. Parasuraman & D. R. Davies (Eds.), Varieties of attention. Orlando, FL: Academic Press.
Kanizsa, G. (1974). Contours without gradients or cognitive contours. Italian Journal of Psychology, 1, 93-112.
Kanizsa, G. (1976). Subjective contours. Scientific American, 234(4), 48-52.
Kanizsa, G., & Luccio, R. (1987). Pragnanz and its ambiguities. (Personal correspondence).
Kant, I. (1781). Critique of pure reason. Riga.
Kaufman, L. (1974). Sight and mind: An introduction to visual perception. New York: Oxford University Press.
Kawabata, N., Yamagami, K., & Noaki, M. (1978). Visual fixation points and depth perception. Vision Research, 18, 853-854.
Kelly, D. H. (1972). Flicker. In D. Jameson & L. M. Hurvich (Eds.), Handbook of sensory physiology: Visual psychophysics (Vol. VII/4). New York: Springer-Verlag.
Kendrick, K. M., & Baldwin, B. A. (1987). Cells in temporal cortex of conscious sheep can respond preferentially to the sight of faces. Science, 236, 448-450.
Kidder, T. (1982). The soul of a new machine. New York: Avon.
Kinchla, R. A. (1974). Detecting target elements in multi-element arrays: A confusibility model. Perception and Psychophysics, 15, 410-419.
Kinchla, R. A., Solis-Macias, V., & Hoffman, J. (1983). Attending to different levels of structure in a visual image. Perception and Psychophysics, 33, 1-10.
Kinchla, R. A., & Wolfe, J. (1979). The order of visual processing: "Top-down," "bottom-up," or "middle-out." Perception and Psychophysics, 25, 225-230.
Kinsbourne, M., & Warrington, E. K. (1962). The effect of an after-coming random pattern on the perception of brief visual stimuli. The Quarterly Journal of Experimental Psychology, 14, 223-234.
Klein, R. M., & Barresi, J. (1985). Perceptual salience of form versus material as a function of variation in spacing and number of elements. Perception and Psychophysics, 37, 440-446.
Knuth, D. E. (1976). Mathematics and computer science: Coping with finiteness. Science, 194, 1235-1242.
Koch, S. (1959). Epilogue. In S. Koch (Ed.), Psychology: A study of a science (Vol. 3). New York: McGraw-Hill.
Koffka, K. (1935). Principles of Gestalt psychology. New York: Harcourt, Brace.
Köhler, W. (1947). Gestalt psychology: An introduction to the new concepts in modern psychology. New York: Liveright. (Originally published 1929)
Kolers, P. A. (1968). Some psychological aspects of pattern recognition. In P. A. Kolers & M. Eden (Eds.), Recognizing patterns. Cambridge, MA: MIT Press.
Kolers, P. A. (1970). The role of shape and geometry in picture recognition. In B. S. Lipkin & A. Rosenfeld (Eds.), Picture processing and psychopictorics. New York: Academic Press.
Kolers, P. A., & Rosner, B. S. (1960). On visual masking (metacontrast): Dichoptic observation. American Journal of Psychology, 73, 2-21.
Konorski, J. (1967). Integrative activity of the brain. Chicago: University of Chicago Press.
Kornblum, S. (1969). Sequential determinants of information processing in serial and discrete choice reaction time. Psychological Review, 76, 113-131.
Korte, A. (1915). Kinematoskopische Untersuchungen. Zeitschrift für Psychologie, 72, 193-296.
Krueger, L. E. (1978). A theory of perceptual matching. Psychological Review, 85, 278-304.
Krueger, L. E. (1984). The category effect in visual search depends on physical rather than conceptual differences. Perception and Psychophysics, 35, 558-564.
Krueger, L. E., & Chignell, M. H. (1985). Same-different judgements under high speed stress: Missing-feature principle predominates in early processing. Perception and Psychophysics, 38(2), 188-193.
Krumhansl, C. L. (1978). Concerning the applicability of geometric models to similarity data: The interrelationship between similarity and spatial density. Psychological Review, 85, 445-463.
Krumhansl, C. L. (1984). Independent processing of visual forms and motion. Perception, 13, 535-546.
Kubovy, M., & Pomerantz, J. R. (1981). Perceptual organization. Hillsdale, NJ: Lawrence Erlbaum Associates.
Land, E. H. (1977). The retinex theory of color vision. Scientific American, 237, 108-128.
Land, E. H. (1983). Recent advances in retinex theory and some implications for cortical computations: Color vision and the natural image. Proceedings of the National Academy of Science (USA), 80, 5163-5169.
Land, E. H., & McCann, J. J. (1971). Lightness and retinex theory. Journal of the Optical Society of America, 61, 1-11.
Lanze, M., Maguire, W., & Weisstein, N. (1985). Emergent features: A new factor in the object-superiority effect. Perception and Psychophysics, 38, 438-442.
Lanze, M., Weisstein, N., & Harris, C. S. (1982). Perceived depth versus structural relevance in the object-superiority effect. Perception & Psychophysics, 31, 376-382.
Lappin, J. S., Doner, J. F., & Kottas, B. (1980). Minimal conditions for the visual detection of structure and motion in three dimensions. Science, 209, 717-719.
Lappin, J. S., & Fuqua, M. A. (1983). Accurate visual measurement of three-dimensional moving patterns. Science, 221, 480-482.
Lappin, J. S., & Uttal, W. R. (1976). Does prior knowledge facilitate the detection of visual targets in random noise? Perception & Psychophysics, 20, 367-374.
Lashley, K. S. (1942). The problem of cerebral organization in vision. Biological Symposium, 7, 301-322.
Lashley, K. S. (1950). In search of the engram. Society of Experimental Biology Symposium, No. 4. Physiological Mechanisms of Behavior, 454-482.
Lashley, K. S., Chow, K. L., & Semmes, J. (1951). An examination of the electrical field theory of cerebral integration. Psychological Review, 58, 123-136.
Lawden, M. C. (1983). An investigation of the ability of the human visual system to encode spatial phase relationships. Vision Research, 23, 1451-1463.
Lawton, T. B. (1984). The effect of phase structures on spatial phase discrimination. Vision Research, 24, 137-148.
Leeuwenberg, E. (1971). A perceptual coding language for visual and auditory patterns. American Journal of Psychology, 84, 307-349.
Lefton, L. (1973). Metacontrast: A review. Perception and Psychophysics, 13(1B), 161-171.
Liam, A. (1973). Object and perceptual identity: Erroneous presuppositions in psychological studies of colour and space perception. Oslo: University of Oslo.
Lindberg, D. C. (1976). Theories of vision from Al-Kindi to Kepler. Chicago: University of Chicago Press.
Ling, G., & Gerard, R. W. (1949). The normal membrane potential of frog sartorius fibers. Journal of Cellular and Comparative Physiology, 34, 383-385.
Locke, J. (1975). An essay concerning human understanding. In P. H. Nidditch (Ed.), Clarendon edition of the works of John Locke. Oxford: Oxford University Press. (Originally published 1690)
Lockhead, G. R. (1966). Effects of dimensional redundancy on visual discrimination. Journal of Experimental Psychology, 72, 95-104.
Lockhead, G. R. (1970). Identification and the form of multidimensional discrimination space. Journal of Experimental Psychology, 85, 1-10.
Lockhead, G. R. (1972). Processing dimensional stimuli: A note. Psychological Review, 79, 410-419.
Lowry, E. M., & De Palma, J. J. (1961). Sine-wave response of the visual system: I. The Mach phenomenon. Journal of the Optical Society of America, 51, 740-746.
Luce, R. D. (1963). Detection and recognition. In R. D. Luce, R. Bush, & E. Galanter (Eds.), Handbook of mathematical psychology (Vol. 1). New York: Wiley.
Luckiesh, M. (1965). Visual illusions. New York: Dover.
Luneberg, R. K. (1950). The metric of binocular visual space. Journal of the Optical Society of America, 40, 627.
Lupker, S. J. (1979). On the nature of perceptual information during letter perception. Perception & Psychophysics, 25, 303-312.
Lynch, G. (1986). Synapses, circuits, and the beginnings of memory. Cambridge, MA: MIT Press.
Mach, E. (1959). The analysis of sensations (C. M. Williams, Trans.). New York: Dover.
MacLeod, I. D. G., & Rosenfeld, A. (1972a). The visibility of gratings: A space domain model (Tech. Rep. No. 205). College Park, MD: University of Maryland.
MacLeod, I. D. G., & Rosenfeld, A. (1972b). The visibility of periodic bar patterns: Prediction of a space domain model (Tech. Rep. No. 209). College Park, MD: University of Maryland.
Mandelbrot, B. B. (1983). The fractal geometry of nature. New York: Freeman.
Marks, L. E. (1987). On cross-modal similarity: Auditory-visual interactions in speeded discrimination. Journal of Experimental Psychology: Human Perception & Performance, 13, 384-394.
Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. San Francisco: Freeman.
Marr, D., & Nishihara, H. K. (1978). Representation and recognition of the spatial organization of three-dimensional shapes. Proceedings of the Royal Society of London, 200, 269-294.
Marx, M. H., & Hillix, W. A. (1973). Systems and theories in psychology. New York: McGraw-Hill.
McClelland, J. L. (1978). Perception and masking of wholes and parts. Journal of Experimental Psychology: Human Perception & Performance, 4, 210-223.
McClelland, J. L., Rumelhart, D. E., and the PDP research group. (1981). An interactive activation model of context effects in letter perception: Part 1. An account of basic findings. Psychological Review, 88, 375-407.
McClelland, J. L., Rumelhart, D. E., and the PDP research group. (1986). Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 2: Psychological and biological models. Cambridge, MA: MIT Press.
McCulloch, W. S., & Pitts, W. A. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115-133.
McFarland, J. H. (1965). Sequential part presentation: A method of studying visual form perception. British Journal of Psychology, 56, 439-446.
McFarland, J. H., & Prete, M. (1969). The effect of visual context on perception of a form's parts as successive. Vision Research, 9, 923-933.
Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46, 806-834.
Meisel, W. S. (1972). Computer oriented approaches to pattern recognition. In Mathematics in science and engineering. New York: Academic Press.
Metzler, J., & Shepard, R. N. (1971, April). Mental correlates of the rotation of three-dimensional objects. Paper presented at the Annual Meeting of the Western Psychological Association, San Francisco, CA.
Metzler, J., & Shepard, R. N. (1974). Transformational studies of the internal representation of three-dimensional objects. In R. L. Solso (Ed.), Theories of cognitive psychology: The Loyola Symposium. Hillsdale, NJ: Lawrence Erlbaum Associates.
Miller, G. A., Galanter, E., & Pribram, K. H. (1960). Plans and the structure of behavior. New York: Henry Holt.
Miller, J. (1981). Global precedence in attention and decision. Journal of Experimental Psychology, 7, 1161-1174.
Minsky, M., & Papert, S. (1969). Perceptrons: An introduction to computational geometry. Cambridge, MA: MIT Press.
Monahan, J. S., & Lockhead, G. R. (1977). Identification of integral stimuli. Journal of Experimental Psychology: General, 106, 94-110.
Moore, E. F. (1956). Gedanken-experiments on sequential machines. In C. E. Shannon & J. McCarthy (Eds.), Automata studies (pp. 129-153). Princeton, NJ: Princeton University Press.
Morotomi, T. (1981). Selective reduction in visibility of a post target by an identical pre target masked by noise. Perception and Psychophysics, 30, 594-598.
Mostow, G. D. (1950). The extensibility of local Lie groups of transformations and groups on surfaces. Annals of Mathematics, 52, 606-636.
Mountcastle, V. B., Lynch, J. C., Georgopoulos, A., Sakata, H., & Acuna, C. (1975). Posterior parietal association cortex of the monkey: Command functions for operations within extrapersonal space. Journal of Neurophysiology, 38, 871-908.
Nachmias, J., & Rogowitz, B. E. (1983). Masking by spatially-modulated gratings. Vision Research, 23, 1621-1629.
Nachmias, J., & Weber, A. (1975). Discrimination of simple and complex gratings. Vision Research, 15, 217-222.
Nachmias, J., & Webster, A. (1983). Discrimination of simple and complex gratings. Vision Research, 23, 1621-1629.
Nagano, T., & Fujiwara, M. (1979). A neural network model for the development of direction selectivity in the visual cortex. Biological Cybernetics, 32, 1-8.
Nakatani, K. (1980). A model of pattern recognition by binary orthogonal transformation. Behaviormetrika, 2, 47-59.
Navon, D. (1977). Forest before trees: The precedence of global features in visual perception. Cognitive Psychology, 9, 353-383.
Necker, L. A. (1832). On an apparent change of position in a drawing of engraved figure of a crystal. Philosophical Magazine, 1.
Neisser, U. (1967). Cognitive psychology. New York: Appleton-Century-Crofts.
Neisser, U. (1976). Cognition and reality. San Francisco: Freeman.
Nickerson, R. S. (1965). Response times for "same"-"different" judgments. Perceptual and Motor Skills, 20, 15-18.
Nickerson, R. S. (1968). Note on "same"-"different" response times. Perceptual and Motor Skills, 27, 565-566.
Nickerson, R. S. (1969). "Same"-"different" response times: A model and a preliminary test. In W. G. Koster (Ed.), Attention and performance II. New York: Academic Press.
Nickerson, R. S. (1972). Binary classification reaction time: A review of some studies of human information processing capabilities. Psychonomic Monograph Supplements, (Whole No. 65), 275-318.
Nickerson, R. S. (1973). Frequency, recency, and repetition effects on same and different response times. Journal of Experimental Psychology, 101, 330-336.
Nickerson, R. S. (1975). Effects of correlated and uncorrelated noise on visual pattern matching. In P. Rabbit & S. Dornic (Eds.), Attention and Performance V. New York: Academic Press.
Nickerson, R. S. (1976). Short-term retention of visually presented stimuli: Some evidence of visual encoding. Acta Psychologica, 40, 153-162.
Nickerson, R. S. (1978). On the time it takes to tell things apart. In J. Requin (Ed.), Attention and Performance VII. Hillsdale, NJ: Lawrence Erlbaum Associates, 77-88.
Nosofsky, R. (1985). Overall similarity and the identification of separable dimension stimuli: A choice model analysis. Perception and Psychophysics, 38, 415-432.
Ohzu, H., & Enoch, J. M. (1972). Optical modulation by the isolated human fovea. Vision Research, 12, 245-251.
Olson, C. X., & Boynton, R. M. (1984). Dichoptic metacontrast masking reveals a central basis for monoptic chromatic induction. Perception & Psychophysics, 35(4), 295-300.
Olton, D. S., Branch, M., & Best, P. J. (1978). Spatial correlates of hippocampal unit activity. Experimental Neurology, 58, 387-409.
Osgood, C. E. (1957). Motivational dynamics of language behavior. In M. R. Jones (Ed.), Nebraska symposium on motivation. Lincoln: University of Nebraska Press.
Oyama, T., & Watanabe, T. (1983). Effect of test-mask similarity on forward and backward masking of patterns by patterns. Psychological Research, 45, 303-313.
Paap, K. R., Newsome, S. L., McDonald, J. E., & Schvaneveldt, R. W. (1982). An activation-verification model for letter and word recognition: The word-superiority effect. Psychological Review, 89, 573-594.
Pachella, R. G., Smith, J. E. K., & Stanovich, K. E. (1978). Qualitative error analysis and speed classification. In N. J. Castellan & F. Restle (Eds.), Cognitive theory (Vol. 3). Hillsdale, NJ: Lawrence Erlbaum Associates.
Palmer, S. E. (1975). Visual perception and world knowledge: Notes on a model of sensory-cognitive interaction. In D. A. Norman & D. E. Rumelhart (Eds.), Explorations in cognition. San Francisco: Freeman.
Pavio, A. (1978). The relationship between verbal and perceptual codes. In E. C. Carterette & M. P. Friedman (Eds.), Handbook of perception, Vol. III: Perceptual coding (pp. 376-397). New York: Academic Press.
Perrett, D. I., Rolls, E. T., & Caan, W. (1982). Visual neurones responsive to faces in the monkey temporal cortex. Experimental Brain Research, 47, 329-342.
Peterson, M. A. (1986). Illusory concomitant motion in ambiguous stereograms: Evidence for nonstimulus contributions to perceptual organization. Journal of Experimental Psychology: Human Perception and Performance, 12, 50-60.
Petry, S., & Meyer, G. E. (1986). Adelphi international conference on illusory contours: A report on the conference. Perception and Psychophysics, 39(3), 210-221.
Piaget, J. (1969). The mechanisms of perception (G. N. Seagrim, Trans.). New York: Basic Books.
Pieron, H. (1935). Le processus de metacontraste. Journal de Psychologie Normale et Pathologique, 32, 5-24.
Pinker, S. (1985). Visual cognition. Cambridge, MA: MIT Press.
Pippenger, N. (1978). Complexity theory. Scientific American, 238(6), 140-159.
Pitts, W., & McCulloch, W. S. (1947). How we know universals: The perception of auditory and visual forms. Bulletin of Mathematical Biophysics, 9, 127-147.
Pollack, I. (1971). Perception of two-dimensional Markov constraints within visual displays. Perception and Psychophysics, 9, 461-464.
Pollack, I. (1972). Visual discrimination of "unseen" objects: Forced choice testing of Mayzner-Tresselt sequential blanking effects. Perception and Psychophysics, 11, 121-128.
Pollack, I. (1973). Discrimination of third-order Markov constraints within visual displays. Perception and Psychophysics, 13, 276-280.
Pomerantz, J. R., & Kubovy, M. (1981). Perceptual organization. Hillsdale, NJ: Lawrence Erlbaum Associates.
Pomerantz, J. R., Sager, C. S., & Stoever, R. J. (1977). Perception of wholes and of their component parts: Some configural superiority effects. Journal of Experimental Psychology: Human Perception & Performance, 3, 422-435.
Popper, K. R., & Eccles, J. C. (1977). The self and its brain. New York: Springer-Verlag.
Posner, M. I. (1969). Abstraction and the process of recognition. In G. Bower & J. T. Spence (Eds.), Psychology of learning and motivation (Vol. 3). New York: Academic Press.
Posner, M. I. (1978). Chronometric explorations of mind. Hillsdale, NJ: Lawrence Erlbaum Associates.
Posner, M. I., & Mitchell, R. F. (1967). Chronometric analysis of classification. Psychological Review, 74, 392-409.
Posner, M. I., Boies, S. J., Eichelman, W. H., & Taylor, R. L. (1969). Retention of visual and name codes of single letters. Journal of Experimental Psychology, 79, 1-16.
Posner, M. I., Snyder, C. R. R., & Davidson, B. J. (1980). Attention and the detection of signals. Journal of Experimental Psychology: General, 109, 160-174.
Pratt, W. K. (1978). Digital image processing. New York: Wiley.
Pribram, K. H., Nuwer, M., & Baron, R. J. (1974). The holographic hypothesis of memory structure in brain function and perception. In D. H. Krantz (Ed.), Contemporary developments in mathematical psychology (Vol. II). San Francisco: Freeman.
Price, H. H. (1940). Hume's theory of the external world. Oxford: Clarendon Press.
Prinzmetal, W., & Banks, W. P. (1977). Good continuation affects visual detection. Perception and Psychophysics, 21, 389-395.
Proctor, R. W. (1981). A unified theory for matching-task phenomena. Psychological Review, 88, 291-326.
Proctor, R. W., & Rao, K. V. (1981). On the "misguided" use of reaction time differences: A discussion of Ratcliff & Hacker (1981). Perception & Psychophysics, 31, 601-602.
Pylyshyn, Z. W. (1979). Validating computational models: A critique of Anderson's indeterminacy of representation claim. Psychological Review, 86, 383-394.
Pylyshyn, Z. W. (1981). The imagery debate: Analogue media versus tacit knowledge. Psychological Review, 87, 16-45.
Raab, D. (1963). Backward masking. Psychological Bulletin, 60, 118-129.
Ramachandran, V. S. (1985). The neurobiology of perception. Perception, 14, 97-103.
Ranck, J. (1973). Studies on single neurons in dorsal hippocampal formation and septum in unrestrained rats. Part 1. Behavior correlates and firing repertoires. Experimental Neurology, 41, 461-531.
Ratcliff, R. (1981). A theory of order relations in perceptual matching. Psychological Review, 88, 552-572.
Ratcliff, R., & Hacker, M. J. (1981a). Speed and accuracy of same and different responses in perceptual matching. Perception & Psychophysics, 30, 303-307.
Ratcliff, R., & Hacker, M. J. (1981b). On the misguided use of reaction time differences: A reply to Proctor & Rao. Perception & Psychophysics, 31, 603-604.
Ratcliff, R., & Hacker, M. J. (1982). On the misguided use of reaction-time differences: A reply to Proctor and Rao (1982). Perception & Psychophysics, 31, 603-604.
Ratliff, F. (1965). Mach bands: Quantitative studies on neural networks in the retina. San Francisco: Holden-Day.
Ratliff, F., & Hartline, H. K. (1959). The responses of Limulus optic nerve fibers to patterns of illumination on the receptor mosaic. Journal of General Physiology, 42, 1241-1255.
Reicher, G. M. (1969). Perceptual recognition as a function of the meaningfulness of stimulus material. Journal of Experimental Psychology, 81, 274-280.
Reicher, G. M., & Wheeler, D. D. (1970). Processes in word recognition. Cognitive Psychology, 1, 59-85.
Restle, F. A. (1961). Psychology of judgment and choice. New York: Wiley.
Robinson, D. N. (1966). Disinhibition of visually masked stimuli. Science, 154, 157-158.
Robinson, D. N. (1968). Visual disinhibition with binocular and interocular presentation. Journal of the Optical Society of America, 58, 254-257.
Robinson, J. O. (1972). The psychology of visual illusion. London: Hutchinson University Library.
Robson, J. G. (1983). Frequency domain visual processing. In O. J. Braddick & A. C. Sleigh (Eds.), Physical and biological processing of images. Berlin: Springer-Verlag.
Rock, I. (1975). An introduction to perception. New York: MacMillan.
Rock, I. (1983). The logic of perception. Cambridge, MA: MIT Press.
Rogers, T. D., & Trofanenko, S. C. (1979). On the measurement of shape. Bulletin of Mathematical Biology, 41, 238-304.
Rosch, E. (1975). Cognitive representations of semantic categories. Journal of Experimental Psychology: General, 104, 192-233.
Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386-408.
Rosenblatt, F. (1962). Principles of neurodynamics. Washington, DC: Spartan Books.
Rosenfeld, A. (1969). Picture processing by computer. New York: Academic Press.
Rosenfeld, A., & Kak, A. C. (1982). Digital picture processing (Vols. 1 & 2). Orlando, FL: Academic Press.
Royce, J. R. (1974). Cognition and knowledge: Psychological epistemology. In E. C. Carterette & M. P. Friedman (Eds.), Handbook of perception, Vol. I: Historical and philosophical roots of perception (pp. 149-176). New York: Academic Press.
Rumelhart, D. E. (1977). Introduction to human information processing. New York: Wiley.
Rumelhart, D. E., & McClelland, J. L. (1982). An interactive activation model of context effects in letter perception: Part 2. The contextual enhancement effect and some tests and extension of the model. Psychological Review, 89, 60-94.
Rumelhart, D. E., McClelland, J. L., and the PDP research group. (1986). Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 1: Foundations. Cambridge, MA: MIT Press.
Ryle, G. (1949). The concept of mind. New York: Barnes & Noble.
Sachs, M. B., Nachmias, J., & Robson, J. G. (1971). Spatial frequency channels in human vision. Journal of the Optical Society of America, 61, 1176-1186.
Sagi, D., & Hochstein, S. (1983). Discriminability of suprathreshold compound spatial frequency gratings. Vision Research, 23, 1595-1606.
Sakitt, B. (1972). Counting every quantum. Journal of Physiology, 223, 131-150.
Santee, J. L., & Egeth, H. E. (1982). Independence versus interference in the perceptual processing of letters. Perception & Psychophysics, 31, 101-116.
Schade, O. H. (1956). Optical and photoelectric analog of the eye. Journal of the Optical Society of America, 46, 721-739.
Scheerer, E. (1973). Integration, interruption, and processing rate in visual backward masking: I. Review. Psychologische Forschung, 36, 71-93.
Schiller, P. H., & Smith, M. C. (1966). Detection in metacontrast. Journal of Experimental Psychology, 71, 32-39.
Schneider, W. (1987). Session I-Presidential Address. Connectionism: Is it a paradigm shift for psychology? Behavior Research Methods, Instruments, & Computers, 19(2), 73-83.
Schneider, W., & Shiffrin, R. M. (1977). Controlled and automatic human information processing. I. Detection, search, and attention. Psychological Review, 84, 1-66.
Schonemann, P. H., Dorcey, T., & Kienapple, K. (1985). Subadditive concatenation in dissimilarity judgments. Perception & Psychophysics, 38, 1-17.
Schwartz, E. L. (1977). Afferent geometry in the primate visual cortex and the generation of neuronal trigger features. Biological Cybernetics, 28, 1-14.
Schwartz, E. L. (1980). Computational anatomy and functional architecture of striate cortex: A spatial mapping approach to perceptual coding. Vision Research, 20, 645-669.
Selfridge, O. (1959). Pandemonium: A paradigm for learning. In Symposium on the mechanization of thought processes. London: HM Stationery Office.
Selwyn, E. W. H. (1948). Photographic Journal, Sec. B, 88B.
Shaw, M. L. (1978). A capacity allocation model for reaction time. Journal of Experimental Psychology: Human Perception & Performance, 4, 586-598.
Shaw, R., & Bransford, J. (1977). Perceiving, acting, and knowing: Towards an ecological psychology. Hillsdale, NJ: Lawrence Erlbaum Associates.
Shaw, R. E., & Turvey, M. T. (1981). Coalitions as models for ecosystems: A realist perspective on perceptual organization. In M. Kubovy & J. R. Pomerantz (Eds.), Perceptual organization (pp. 343-415). Hillsdale, NJ: Lawrence Erlbaum Associates.
Shepard, R. N. (1957). Stimulus and response generalization: A stochastic model relating generalization to distance in a psychological space. Psychometrika, 22, 325-345.
Shepard, R. N. (1962). The analysis of proximities: Multidimensional scaling with unknown distances. Part I. Psychometrika, 27, 125-140.
Shepard, R. N. (1964). Attention and the metric structure of the stimulus space. Journal of Mathematical Psychology, 1, 54-87.
Shepard, R. N. (1968). On turning something over in one's mind. Unpublished manuscript, Stanford University.
Shepard, R. N. (1978). Externalization of mental images and the act of creation. In B. S. Randhawa & W. E. Coffman (Eds.), Visual learning, thinking, and communication. New York: Academic Press.
Shepard, R. N. (1981). Psychophysical complementarity. In M. Kubovy & J. R. Pomerantz (Eds.), Perceptual organization (pp. 279-341). Hillsdale, NJ: Lawrence Erlbaum Associates.
Shepard, R. N., & Cooper, L. A. (1982). Mental images and their transformations. Cambridge, MA: MIT Press.
Shepard, R. N., & Metzler, J. (1971). Mental rotation of three-dimensional objects. Science, 171, 701-703.
Shiffrin, R. M., & Schneider, W. (1977). Controlled and automatic human information processing. II. Perceptual learning, automatic attending, and a general theory. Psychological Review, 84, 127-190.
Shor, R. (1971). Symbol processing speed differences and symbol interference effects in a variety of concept domains. Journal of General Psychology, 85, 187-205.
Simon, H. A. (1979). Models of thought. New Haven: Yale University Press.
Smith, J. E. K. (1980). Models of identification. In R. S. Nickerson (Ed.), Attention and Performance Vol. VIII. Hillsdale, NJ: Lawrence Erlbaum Associates.
Smith, M. C., & Schiller, P. H. (1966). Forward and backward masking: A comparison. Canadian Journal of Psychology, 20, 191-197.
Sneddon, I. N. (1976). Encyclopaedic dictionary of mathematics for engineers and applied scientists. Oxford: Pergamon.
Snodgrass, J. G., & Townsend, J. T. (1980). Comparing parallel and serial models: Theory and implementation. Journal of Experimental Psychology: Human Perception and Performance, 6, 330-354.
Snyder, C. R. R. (1972). Selection, inspection and naming in visual search. Journal of Experimental Psychology, 92, 428-431.
Sperling, G. (1960). The information available in brief visual presentations. Psychological Monographs: General and Applied, 74, 1-29.
Sperling, G. (1963). A model for visual memory tasks. Human Factors, 5, 19-31.
Sperling, G. (1981). Mathematical models of binocular vision. In S. Grossberg (Ed.), Mathematical psychology and psychophysiology. Providence, RI: American Mathematical Society.
Sperry, R. W., Miner, R., & Meyers, R. E. (1955). Visual pattern perception following subpial slicing and tantalum wire implantations in the visual cortex. Journal of Comparative and Physiological Psychology, 48, 50-58.
Staller, J. D., & Lappin, J. S. (1981). The visual detection of multi-letter patterns. Journal of Experimental Psychology: Human Perception and Performance, 7, 1258-1272.
Sternberg, S. (1966). High speed scanning in human memory. Science, 153, 652-654.
Sternberg, S. (1967). Two operations in character recognition: Some evidence from reaction time experiments. Perception & Psychophysics, 2, 45-53.
Stevens, K. A. (1983). Slant-tilt: The visual encoding of surface orientation. Biological Cybernetics, 46, 183-195.
Stevens, K. A. (1984). On gradients and texture gradients. Journal of Experimental Psychology: General, 113, 217-220.
Stevens, K. A. (1986). Inferring shape from contours across surfaces. In A. P. Pentland (Ed.), From pixels to predicates. Norwood, NJ: Ablex.
Stevens, S. S. (1951). Mathematics, measurement, and psychophysics. In S. S. Stevens (Ed.), Handbook of experimental psychology. New York: Wiley.
Stigler, R. (1910). Chronophotische Studien über den Umgebungskontrast. Pflüger's Archiv für die Gesamte Physiologie des Menschen und der Tiere, 134, 365-435.
Stockmeyer, L. J., & Chandra, A. K. (1979). Intrinsically difficult problems. Scientific American, 240, 140-159.
Stroop, J. R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology, 18, 643-662.
Swenson, R. G., & Judy, P. F. (1981). Detection of noisy visual targets: Models for the effects of spatial uncertainty and signal to noise ratio. Perception & Psychophysics, 29, 521-534.
Teller, D. Y. (1984). Linking propositions. Vision Research, 24, 1233-1246.
Thomas, J. (1978). Binocular rivalry: The effects of orientation and pattern color arrangement. Perception & Psychophysics, 23, 360-362.
Thomas, J. P., Gille, J., & Barker, R. A. (1982). Simultaneous visual detection and identification: Theory and data. Journal of the Optical Society of America, 72(12), 1642-1650.
Thompson, D. W. (1917). On growth and form. Cambridge: Cambridge University Press.
Titchener, E. B. (1896). An outline of psychology. New York: MacMillan.
Todd, J. T. (1985). Perception of structure from motion: Is projective correspondence of moving elements a necessary condition? Journal of Experimental Psychology: Human Perception and Performance, 11, 689-710.
Tolansky, F. R. S. (1964). Optical illusions. Oxford: Pergamon Press.
Torgenson, W. S. (1958). Theory and methods of scaling. New York: Wiley.
Torgenson, W. S. (1965). Multidimensional scaling of similarity. Psychometrika, 30, 379-393.
Townsend, J. T. (1971a). Theoretical analysis of an alphabetic confusion matrix. Perception & Psychophysics, 9, 40-50.
Townsend, J. T. (1971b). Alphabetic confusion: A test of models for individuals. Perception & Psychophysics, 9, 449-454.
Townsend, J. T., & Ashby, F. G. (1982). Experimental test of contemporary mathematical models of visual letter recognition. Journal of Experimental Psychology: Human Perception and Performance, 8, 834-864.
Townsend, J. T., & Landon, D. E. (1983). Mathematical models of recognition and confusion in psychology. Mathematical Social Sciences, 4, 25-71.
Trevarthen, C. B. (1968). Two mechanisms of vision in primates. Psychologische Forschung, 31, 299-337.
Triesman, A. (1986). Features and objects in visual processing. Scientific American, 255, 114B-126.
Triesman, A., & Gelade, G. (1980). A feature integration theory of attention. Cognitive Psychology, 12, 97-136.
Triesman, A., & Patterson, R. (1984). Emergent features, attention, and object perception. Journal of Experimental Psychology: Human Perception and Performance, 10, 12-31.
Triesman, A., & Schmidt, H. (1982). Illusory conjunctions in the perception of objects. Cognitive Psychology, 14, 107-141.
Triesman, A., & Souther, J. (1985). Search asymmetry: A diagnostic for preattentive processing of separable features. Journal of Experimental Psychology: General, 114(3), 285-310.
Triesman, A., Sykes, M., & Gelade, G. (1977). Selective attention and stimulus integration. In S. Dornic (Ed.), Attention and performance, VI. Hillsdale, NJ: Lawrence Erlbaum Associates.
Tulving, E. (1979). Memory research: What kind of progress? In L. G. Nilsson (Ed.), Perspectives in memory research. Hillsdale, NJ: Lawrence Erlbaum Associates.
Turing, A. M. (1936). On computable numbers with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society (Series 2), 42, 230-265.
Turvey, M. T. (1973). On peripheral and central processes in vision: Inferences from an information-processing analysis of masking with patterned stimuli. Psychological Review, 80, 1-52.
Turvey, M. T. (1977). Contrasting orientations to the theory of visual information processing. Psychological Review, 84, 67-88.
Turvey, M. T., & Shaw, R. E. (1979). The primacy of perceiving: An ecological reformulation of perception for understanding memory. In L. G. Nilsson (Ed.), Perspectives on memory research: Essays in honor of Uppsala University's 500th anniversary. Hillsdale, NJ: Lawrence Erlbaum Associates.
Turvey, M. T., Shaw, R. E., & Mace, W. (1978). Issues in the theory of action: Degrees of freedom, coordinative structures, and coalitions. In J. Requin (Ed.), Attention and performance, VII. Hillsdale, NJ: Lawrence Erlbaum Associates.
Tversky, A. (1977). Features of similarity. Psychological Review, 84, 327-352.
Tversky, A., & Gati, I. (1982). Similarity, separability, and the triangle inequality. Psychological Review, 89, 123-154.
Tversky, A., & Hutchinson, J. W. (1986). Nearest neighbor analysis of psychological spaces. Psychological Review, 93, 3-22.
Ullman, S. (1979). The interpretation of visual motion. Cambridge, MA: MIT Press.
Uttal, W. R. (1969a). Masking of alphabetic character recognition by dynamic visual noise (DVN). Perception & Psychophysics, 6, 121-128.
Uttal, W. R. (1969b). The character in the hole experiment: Interaction of forward and backward masking of alphabetic character recognition by dynamic visual noise (DVN). Perception & Psychophysics, 6, 177-181.
Uttal, W. R. (1970a). Masking of alphabetic character recognition by ultrahigh-density dynamic visual noise. Perception & Psychophysics, 7, 19-22.
Uttal, W. R. (1970b). Violations of visual simultaneity. Perception & Psychophysics, 7, 133-136.
Uttal, W. R. (1970c). On the physiological basis of masking with dotted visual noise. Perception & Psychophysics, 7, 321-327.
Uttal, W. R. (1970d). A masking approach to the problem of form perception. Perception & Psychophysics, 9, 296-298.
Uttal, W. R. (1973a). The effect of deviations from linearity on the detection of dotted line patterns. Vision Research, 13, 2155-2163.
Uttal, W. R. (1973b). The psychobiology of sensory coding. New York: Harper & Row.
Uttal, W. R. (1975). An autocorrelation theory of form detection. Hillsdale, NJ: Lawrence Erlbaum Associates.
Uttal, W. R. (1976). Visual spatial interactions between dotted line segments. Vision Research, 16, 581-586.
Uttal, W. R. (1977). Complexity effects in form detection. Vision Research, 17, 359-365.
Uttal, W. R. (1978). The psychobiology of mind. Hillsdale, NJ: Lawrence Erlbaum Associates.
Uttal, W. R. (1981). A taxonomy of visual processes. Hillsdale, NJ: Lawrence Erlbaum Associates.
Uttal, W. R. (1983). Visual form detection in 3-dimensional space. Hillsdale, NJ: Lawrence Erlbaum Associates.
Uttal, W. R. (1985). The detection of nonplanar surfaces in visual space. Hillsdale, NJ: Lawrence Erlbaum Associates.
Uttal, W. R. (1987). The perception of dotted forms. Hillsdale, NJ: Lawrence Erlbaum Associates.
Uttal, W. R., Bunnell, L. M., & Corwin, S. (1970). On the detectability of straight lines in visual noise: An extension of French's paradigm into the millisecond domain. Perception & Psychophysics, 8, 385-388.
Uttal, W. R., Davis, N. S., Welke, C. L., & Kakarala, R. (in press). The reconstruction of static visual forms from sparse dotted samples. Perception & Psychophysics.
Uttal, W. R., & Hieronymous, R. (1970). Spatio-temporal effects in visual gap detection. Perception & Psychophysics, 8, 321-325.
Van Essen, D. C. (1985). Functional organization of primate visual cortex. Cerebral Cortex, 3, 259-329.
van Meeteren, A., & Barlow, H. B. (1981). The statistical efficiency for detecting sinusoidal modulation of average dot density in random figures. Vision Research, 21, 765-777.
von Neumann, J. (1951). The general and logical theory of automata. In L. A. Jeffress (Ed.), Cerebral mechanisms in behavior: The Hixon symposium. New York: Wiley.
Van Tuijl, H. F. J. M. (1975). A new visual illusion: Neonlike color spreading and complementary color induction between subjective contours. Acta Psychologica (Amsterdam), 39, 441-445.
Wade, N. J. (1978). Why do patterned afterimages fluctuate in visibility? Psychological Bulletin, 85, 338-352.
Wagner, M. (1985). The metric of visual space. Perception & Psychophysics, 38(6), 483-495.
Wallach, H. (1939). On constancy of visual speed. Psychological Review, 46, 541-552.
Wallach, H. (1948). Brightness constancy and the nature of achromatic colors. Journal of Experimental Psychology, 38, 310-324.
Ward, I. (1985). Individual differences in processing stimulus dimensions: Relation to selective processing abilities. Perception & Psychophysics, 37, 471-482.
Warren, R. M., & Warren, R. P. (1968). Helmholtz on perception: Its physiology and development. New York: Wiley.
Watanabe, S. (1985). Pattern recognition: Human and mechanical. New York: Wiley.
Watson, A. B. (1982). Summation of grating patches indicate many types of detector at one retinal location. Vision Research, 22, 17-25.
Watson, A. B. (1983). Detection and recognition of simple spatial forms. In O. J. Braddick & A. C. Sleigh (Eds.), Physical and biological processing of images. Berlin: Springer-Verlag.
Watson, A. B., & Ahumada, A. J. (1985). Model of human visual-motion sensing. Journal of the Optical Society of America, 2, 322-341.
Webster's new world dictionary. (1976). (2d college ed. D. B. Guralnik, Editor-in-Chief). Cleveland: William Collins & World Publishing.
Weisstein, N. (1972). Metacontrast. In D. Jameson & L. M. Hurvich (Eds.), Handbook of sensory physiology: Visual psychophysics (Vol. VII/4). New York: Springer-Verlag.
Weisstein, N. (1980). The joy of Fourier analysis. In C. S. Harris (Ed.), Visual coding and adaptability (pp. 365-380). Hillsdale, NJ: Lawrence Erlbaum Associates.
Weisstein, N., & Harris, C. S. (1974). Visual detection of line segments: An object-superiority effect. Science, 186, 752-755.
Weisstein, N., & Harris, C. S. (1980). Masking and the unmasking of distributed representations in the visual system. In C. S. Harris (Ed.), Visual coding and adaptability (pp. 317-364). Hillsdale, NJ: Lawrence Erlbaum Associates.
Weisstein, N., Jurkens, T., & Ondersin, T. (1970). Effect of forced choice vs. magnitude-estimation measures on the waveform of metacontrast type functions. Journal of the Optical Society of America, 60, 978-980.
Weisstein, N., Williams, M. C., & Harris, C. S. (1982). Depth, connectedness, and structural relevance in the object superiority effect: Line segments are harder to see in flatter patterns. Perception, 11, 5-17.
Werner, H. (1935). Studies on contour: I. Qualitative analysis. American Journal of Psychology, 47, 40-64.
Werner, H., & Wapner, S. (1952). Towards a general theory of perception. Psychological Review, 59, 324-338.
Wertheimer, M. (1923). Untersuchungen zur Lehre von der Gestalt, II. Psychologische Forschung, 4, 301-350.
Wheeler, D. D. (1970). Processes in word recognition. Cognitive Psychology, 1, 59-85.
Whorf, B. L. (1956). Language, thought, and reality. Cambridge, MA: Technology Press.
Whyte, L. L. (1951). Aspects of form. New York: Pellegrini & Cudahy.
Williams, A., & Weisstein, N. (1978). Line segments are perceived better in coherent context than alone: An object-line effect. Memory and Cognition, 6, 85-90.
Wolford, G. (1975). Perturbation model for letter identification. Psychological Review, 82, 184-199.
Yang, H. S., & Kac, A. C. (1986). Determination of the identity, position, and orientation of the topmost object in a pile. Computer Vision, Graphics and Image Processing, 36, 229-255.
Young, M. J. (1985). Detection of dotted forms in a structured visual noise environment. Unpublished master's thesis, University of Michigan.
Zusne, L. (1970). Visual perception of form. New York: Academic Press.
Zusne, L. (1975). Form perception bibliography 1968-1973. JSAS Catalog of Selected Documents in Psychology, 5, 272.
Zusne, L. (1981). Form perception bibliography 1974-1978. JSAS Catalog of Selected Documents in Psychology, 11, 48-49.
AUTHOR INDEX
Numbers in italics denote reference citations.
A
Abadi, R. V. 78, 300
Ableson, R. P. 106, 300
Abu-Mostafa, Y. S. 38, 300
Acuna, C. 32, 312
Adelson, E. H. 90, 300
Ahumada, A. J. 90, 319
Aiba, T. S. 43, 300
Albrecht, P. G. 117, 304
Allik, J. 158, 301
Altom, M. J. 244, 245, 307
Alpern, M. 158, 300
Andersen, G. J. 259, 302
Anderson, J. A. 35, 94, 215, 307
Anderson, J. R. 35, 94, 215, 300
Andreassi, J. L. 158, 300
Andrews, H. C. 84, 300
Appleman, J. B. 228, 300
Arbib, M. A. 106, 300
Armstrong, D. M. 67, 300
Arnheim, R. 20, 300
Arnoult, M. D. 21, 300
Ashby, F. G. 226, 268, 318
Attneave, F. 21, 179, 300
Auslander, L. 91, 300
B
Bachmann, T. 46, 158, 301
Baird, J. C. 172, 308
Baldwin, B. A. 79, 310
Bamber, D. 209, 301
Banks, W. P. 145, 155, 156, 301, 314
Barker, R. A. 165, 317
Barlow, H. B. 78, 90, 122, 143, 144, 188, 191, 301, 302, 303, 319
Baron, R. J. 38, 82, 249, 301, 314
Barresi, J. 168, 310
Bates, M. 244, 307
Beck, J. 153, 188, 196, 292, 301
Becker, B. B. 158
Beller, H. K. 210, 301
Bennett, B. M. 57, 68, 259, 272, 281, 301, 302, 307
Bergen, J. R. 45, 90, 196, 197, 300, 301, 309
Berkeley, G. 39, 67, 301
Bernstein, I. H. 158, 301
Best, P. J. 32, 313
Beyda, D. 158, 300
Biederman, I. 273, 301
Birdsall, T. G. 165, 306
Bjork, E. L. 44, 155, 156, 302
Blake, R. 128, 303
Blakemore, C. 118, 302
Blank, A. A. 40, 302
Blecker, D. 203, 304
Bodinger, D. 155, 301
Boies, S. J. 204, 314
Boroughs, J. M. 171, 172, 306
Bouma, H. 268, 269, 302
Boynton, R. M. 130, 161, 213, 302, 313
Branch, M. 32, 313
Brand, 247
Bransford, J. 54, 73, 74, 316
Braunstein, M. L. 4, 5, 59, 60, 106, 259, 302
Breitmeyer, B. G. 78, 143, 147, 150, 302
Brick, D. B. 86, 302
Bridgeman, B. 78, 280, 302
Broadbent, D. 170, 302
Brown, D. R. 23, 302
Bruner, J. S. 106, 302
Brunswik, E. 50, 51, 101, 293, 302
Bunge, M. 302
Bunnell, L. M. 143, 319
Burgess, A. E. 143, 188, 191, 302, 303
Burks, A. W. 62, 303
C
Caan, W. 79, 314
Caelli, T. M. 75, 76, 77, 143, 189, 193, 194, 197, 198, 303
Campbell, D. T. 293, 303
Campbell, F. W. 79, 118, 121, 164, 302, 303, 305
Canham, L. 171, 172, 306
Cannon, T. M. 84, 303
Cattell, J. M. 249, 303
Chandra, A. K. 4, 277, 278, 317
Cheatham, P. G. 158, 303
Cheeseman, J. 165, 303
Cherry, C. 17, 303
Chignell, M. H. 209, 310
Chomsky, N. 8, 35, 303
Chow, K. L. 82, 311
Clement, D. E. 136, 306
Coffin, S. 24, 128, 303
Cohen, P. R. 16, 303
Cohn, P. M. 91, 303
Cooper, L. A. 212, 213, 215, 316
Coren, S. 55, 303
Cormack, R. 128, 303
Corwin, S. 143, 319
Cox, D. R. 158, 285, 304
Crawford, B. H. 149, 304
Cronbach, L. J. 293, 304
Crutchfield, J. P. 285, 286, 287, 304
Cutting, J. E. 274, 304
D
Dalcq, A. M. 20, 304
Daugman, J. G. 128, 129, 130, 304
Davidson, B. J. 127, 314
Davis, E. T. 127, 128, 304
Davis, N. S. 261, 319
Dember, 158
DePalma, J. J. 118, 311
d'Espagnet, B. 27, 304
DeValois, R. L. 117, 304
DeWald, C. G. 273, 306
Diener, D. 14, 165, 304
Dodwell, P. C. 17, 20, 76, 82, 303, 304
Doehrman, S. 14, 165, 304
Donders, F. C. 205, 304
Doner, J. F. 259, 261, 311
Dorcey, T. 186, 316
Dreyfus, H. L. 58, 59, 304
Duda, R. O. 236, 237, 304
Duff, M. J. B. 19, 304
Duncan, J. 248, 304
E
Earhard, B. 250, 304
Eccles, J. C. 66, 304, 314
Egeth, H. E. 44, 203, 247, 304, 316
Eichelman, W. H. 204, 314
Einhorn, H. J. 293, 304
Elworth, C. L. 213, 302
Emerson, P. L. 98, 304
Enns, J. T. 250, 251, 304
Enoch, J. M. 123
Enroth-Cugell, C. 152, 304
Epstein, S. 293, 304
Eriksen, C. W. 44, 152, 158, 160, 165, 305, 306
Estes, W. K. 44, 145, 155, 156, 305
F
Falzett, M. 144, 305
Farmer, J. D. 285, 286, 287, 304
Fehrer, E. 108, 109, 307
Feigenbaum, 16, 303
Felfoldy, G. L. 43, 306
Finchbaugh, B. 259, 308
Fitts, P. M. 21, 305
Fodor, J. A. 59, 305
Foster, D. H. 134, 273, 305, 309
Freeman, W. J. 82, 305
French, R. S. 132, 305
Frisch, H. L. 187, 189, 190, 192, 309
Fu, K. S. 228, 232, 237, 305
Fujiwara, M. 89, 313
Fukushima, K. 88, 133, 305
Fuqua, M. A. 259, 311
Furcher, C. S. 164, 305
G
Galanter, 106
Ganz, L. 78, 302, 305
Gardner, E. P. 153, 305
Gardner, G. T. 44, 305
Garner, W. R. 42, 43, 136, 166, 291, 305, 306
Gati, I. 184, 186, 318
Geiger, G. 159, 160, 306
Gelade, G. 163, 172, 188, 318
Georopoulos, A. 32, 312
Gerard, R. W. 279, 311
Geyer, L. H. 273, 306
Gibson, J. J. 17, 74, 189, 306
Gilbert, E. N. 187, 189, 190, 192, 193, 194, 303, 309
Gille, J. 165, 317
Ginsburg, A. P. 144, 267, 306
Girgus, J. S. 55, 303
Glass, A. L. 12, 306
Glass, L. 12, 292, 306
Gleitman, H. 247, 248, 306, 309
Goldmeier, E. 133, 168, 170, 306
Goldstine, H. H. 62, 303
Graham, N. 119, 125, 127, 128, 129, 304, 306
Granger, G. W. 43, 300
Grassia, J. 293, 307
Grassman, H. 291, 306
Green, D. M. 14, 165, 306
Greenspon, T. S. 152, 306
Greenwald, A. G. 293, 306
Gregory, C. C. L. 20
Gregory, R. L. 106, 292, 306
Grice, G. R. 171, 172, 306
Grimson, W. E. L. 256, 258, 263, 307
Grossberg, S. 87, 89, 163, 196, 291, 292, 307
Growney, 159
H
Haber, R. N. 142, 160, 307
Hacker, M. J. 35, 45, 315
Haig, N. D. 293, 307
Hamm, R. M. 293, 307
Hammond, K. R. 293, 307
Harris, C. S. 80, 170, 250, 307, 311, 320
Harris, J. R. 244, 245, 307
Hart, P. E. 236, 237, 304
Hartline, H. K. 152, 307, 314
Heard, P. 292, 306
Hebb, D. O. 44, 68, 82, 173, 307
Hecht, S. 109, 307
Helmholtz, H. von 39, 99, 100, 307
Helson, H. H. 108, 109, 307
Hernandez, L. L. 158, 307
Hieronymous, R. 143, 319
Hillix, W. A. 51, 312
Hinton, G. E. 94, 307
Hobbes, T. 66, 307
Hochberg, J. E. 23, 142, 307
Hochstein, S. 126, 315
Hoffman, D. D. 57, 68, 259, 272, 281, 301, 302, 307, 308
Hoffman, J. 158, 171, 172, 308, 310
Hoffman, J. E. 158, 171, 172, 305
Hoffman, M. 32, 160, 305
Hoffman, W. C. 63, 91, 273, 308
Hogarth, R. M. 293, 304
Hogben, J. H. 151, 308
Holyoak, K. J. 12, 306
Horn, B. K. P. 262, 308
Hubel, D. H. 32, 52, 79, 90, 92, 117, 308
Hughes, H. C. 143, 172, 308
Hume, D. 68, 308
Humphreys-Owens, S. P. F. 20, 308
Hunt, B. R. 84, 300, 303
Hurvich, L. M. 130, 291, 308
Hutchinson, J. W. 184, 318
I
Illige, M. 155, 301
Ingling, N. W. 247, 308
Ittleson, W. H. 101, 308
J
Jameson, D. 130, 291, 308
Jenkins, B. 144, 309
Jenkins, J. 293, 309
Jennings, R. J. 191, 302, 303
Jonides, J. 247, 248, 304, 306, 309
Jordan, M. I. 246, 309
Judy, P. F. 128, 317
Julesz, B. 45, 90, 125, 129, 143, 144, 145, 187, 189, 190, 192, 193, 194, 195, 196, 197, 301, 303, 309
Jurgens, T. 158, 320
K
Kaas, J. H. 220, 309
Kac, A. C. 84, 315, 320
Kahn, J. I. 273, 305, 309
Kahneman, D. 142, 150, 158, 160, 309
Kakarala, R. 261, 319
Kanizsa, G. 22, 45, 267, 309
Kant, I. 96, 309
Kaufman, L. 14, 105, 310
Kawabata, N. 98, 310
Kelly, D. H. 130, 310
Kendrick, K. M. 79, 310
Kidder, T. 81, 310
Kienapple, K. J. 186, 316
Kinchla, R. A. 169, 170, 171, 310
Kinsbourne, M. 150, 310
Klein, R. M. 168, 310
Klingberg, C. L. 213, 302
Knuth, D. E. 4, 310
Kubovy, M. 20, 310
Koch, S. 310
Koffka, K. 22, 73, 154, 167, 310
Köhler, W. 22, 73, 168, 310
Kolers, P. A. 160, 161, 266, 310
Konorski, J. 79, 310
Kornblum, S. 210, 310
Korte, A. 291, 310
Kottas, B. 259, 261, 311
Kramer, P. 127, 304
Krantz, D. H. 291, 308
Krueger, L. E. 203, 204, 206, 208, 209, 248, 310
Krumhansl, C. L. 43, 179, 185, 227, 310
Kubovy, M. 17, 218, 310, 314
L
Land, E. H. 310
Landon, D. E. 226, 227, 318
Lanze, M. 250, 311
Lappin, J. S. 40, 44, 128, 144, 160, 259, 261, 305, 311, 317
Lashley, K. S. 38, 82, 216, 311
Lawden, M. C. 126, 311
Lawton, T. B. 127, 311
Layton, W. M. 172, 308
Leeuwenberg, E. 22, 311
Lefton, L. A. 158, 307, 311
Leonard, J. A. 21, 305
Lester, L. S. 172, 308
Lettvin, J. Y. 159, 160, 306
Levick, W. R. 90, 301
Liam, A. 21, 311
Lindberg, D. C. 96, 311
Ling, G. 279, 311
Locke, J. 67, 311
Lockhead, G. R. 43, 166, 170, 273, 311, 312
Lowry, E. M. 118, 311
Luccio, R. 22, 309
Luce, R. D. 180, 230, 311
Luckiesh, M. 55, 311
Luneberg, R. K. 40, 311
Lupker, S. J. 268, 269, 311
Lynch, G. 216, 311
Lynch, J. C. 32, 312
M
Mace, W. 74, 318
Mach, E. 69, 81, 152, 311
MacKenzie, R. E. 91, 300
MacLeod, I. D. G. 129, 311
Maguire, W. 250, 311
Mandelbrot, B. B. 18, 311
Marks, L. E. 207, 312
Marr, D. 11, 85, 236, 255, 256, 257, 312
Marshall, P. H. 158, 305
Marx, M. H. 51, 312
Mayzner, M. S. 158, 228, 300
McAlister, 23
McCann, J. J. 310
McClelland, J. L. 94, 251, 252, 272, 273, 312, 315
McCulloch, W. S. 39, 81, 312, 314
McDonald, J. E. 252, 313
McFarland, J. H. 151, 312
Meehl, P. E. 293, 312
Meisel, W. S. 312
Merikle, P. M. 165, 303
Metzler, J. 214, 312, 316
Meyer, G. E. 267, 314
Miller, G. A. 106, 312
Miller, J. 170, 171, 312
Miner, R. 82, 317
Mingolla, E. 163, 196, 292, 307
Minsky, M. 88, 271, 312
Mitchell, R. F. 204, 314
Miyake, S. 88, 305
Mollon, J. D. 122, 301
Monahan, J. S. 166, 170, 312
Moore, E. F. 4, 5, 36, 171, 176, 215, 284, 312
Morotomi, T. 161, 312
Mostow, G. D. 91, 312
Mountcastle, V. B. 32, 312
Murray, J. T. 44, 155, 156, 302
N
Nachmias, J. 119, 125, 126, 128, 306, 312, 313, 315
Nagano, T. 89, 313
Nakatani, K. 313
Navon, D. 168, 205, 268, 269, 313
Necker, L. A. 98, 313
Neisser, U. 74, 102, 227, 294, 313
Newsome, S. L. 252, 313
Nickerson, R. S. 199, 202, 210, 313
Nishihara, H. K. 236, 312
Noaki, M. 98, 310
Nosofsky, R. 166
Nuwer, M. 38, 82, 314
O
Ohzu, H. 123, 313
Olson, C. X. 161
Olton, D. S. 32, 313
Ondersin, T. 158, 320
Onley, J. W. 213, 302
Osgood, C. E. 35, 313
Owen, D. H. 23, 302
Oyama, T. 156, 313
P
Paap, K. R. 252, 272, 313
Pachella, R. G. 230, 313
Packard, N. H. 285, 286, 287, 304
Palmer, S. E. 170, 313
Papert, S. 88, 271, 312
Pavio, A. 34, 35, 313
Patterson, R. 145, 172, 174, 196, 318
Perrett, D. I. 79
Peterson, M. A. 141, 142, 307, 314
Petry, S. 267, 314
Piaget, J. 105, 314
Pieron, H. 158, 314
Pinker, S. 234, 236, 314
Pippenger, N. 314
Pirenne, M. H. 109, 307
Pitts, W. A. 39, 81, 312, 314
Pollack, I. 161, 188, 192, 314
Pomerantz, J. R. 17, 20, 218, 250, 310, 314
Popper, K. R. 66, 314
Posner, M. I. 127, 199, 204, 205, 208, 314
Postman, L. 106, 302
Prakash, C. 57, 68, 301
Pratt, W. K. 84, 116, 118, 314
Prazdny, K. 196, 292, 301
Prete, M. 151, 312
Pribram, K. H. 38, 82, 106, 314
Price, H. H. 68, 314
Prinzmetal, W. 145, 155, 156, 250, 251, 301, 304, 314
Proctor, R. W. 35, 158, 209, 301, 314
Psaltis, D. 38, 300
Pylyshyn, Z. W. 35, 215, 314
R
Raab, D. 158, 314
Ramachandran, V. S. 291, 314
Ranck, J. 32, 315
Rao, K. V. 35, 314
Ratcliff, R. 35, 36, 45, 315
Ratliff, F. 70, 152, 307, 315
Reeves, B. C. 144, 301
Reicher, G. M. 170, 249, 250, 315
Restle, F. A. 179, 182, 315
Richards, W. A. 272, 308
Robinson, D. N. 160, 315
Robinson, J. O. 55, 315
Robson, J. G. 79, 116, 119, 121, 125, 128, 152, 303, 304, 306, 315
Rock, I. 101, 102, 315
Rogers, T. D. 22, 315
Rogowitz, B. E. 126, 312
Rolls, E. T. 79, 314
Rosch, E. 34, 315
Rosenblatt, F. 88, 271, 315
Rosenfeld, A. 84, 129, 196, 292, 301, 311, 315
Rosner, B. S. 161, 310
Royce, J. R. 28, 29, 30, 315
Rumelhart, D. E. 94, 103, 104, 252, 272, 273, 312, 315
Ryle, G. 28, 315
S
Sachs, M. B. 125, 315
Sager, C. S. 250, 314
Sagi, D. 126, 315
Sakata, H. 32, 312
Sakitt, B. 109, 315
Santa, J. L. 12, 306
Santee, J. L. 44, 316
Schade, O. H. 118, 316
Scheerer, E. 158, 316
Schiller, P. H. 150, 152, 161, 316, 317
Schlaer, S. 109, 307
Schmidt, H. 172, 318
Schneider, W. 93, 242, 243, 316
Schönemann, P. H. 186, 316
Schumer, R. A. 125, 129, 309
Schurman, D. L. 158, 301
Schvaneveldt, R. W. 252, 313
Schwartz, E. L. 82, 316
Selfridge, O. 270, 316
Selwyn, E. W. H. 118, 316
Semmes, J. 82
Shapiro, L. R. 259, 302
Shaw, M. L. 244, 245, 307
Shaw, R. E. 54, 73, 74, 316, 318
Shaw, R. S. 285, 286, 287, 304
Shepard, R. N. 33, 35, 43, 166, 179, 181, 212, 213, 214, 215, 291, 312, 316
Shepp, L. A. 187, 189, 190, 192, 309
Shiffrin, R. M. 242, 243, 316
Shlaer, 109
Shor, R. 169, 316
Simon, H. A. 293, 316
Smith, J. E. K. 230, 313, 317
Smith, M. C. 150, 152, 161, 316, 317
Smith, W. L. 285, 304
Sneddon, I. N. 180, 317
Snodgrass, J. G. 45, 317
Snyder, C. R. R. 45, 127, 314, 317
Solis-Macias, V. 171, 310
Souther, J. 172, 318
Spencer, W. A. 153, 305
Sperling, G. 150, 208, 244, 291, 317
Sperry, R. W. 82, 317
Staller, J. D. 128, 317
Stanley, J. 303
Stanovich, K. E. 230, 313
Sternberg, S. 241, 247, 317
Stevens, K. A. 262, 317
Stevens, S. S. 15, 183, 317
Stigler, R. 158, 317
Stockmeyer, L. J. 4, 277, 278, 317
Stoever, R. J. 250, 314
Stroop, J. R. 169, 205, 317
Swenson, R. G. 128, 317
Swets, J. A. 14, 306
Switkes, E. 292, 306
Sykes, M. 172, 318
T
Taylor, R. L. 204, 314
Teller, D. Y. 7, 82, 176, 275, 317
Thomas, J. P. 78, 164, 165, 305, 317
Thompson, D. W. 17, 22, 317
Thorell, L. G. 117, 304
Titchener, E. B. 72, 317
Todd, J. T. 218, 317
Tolansky, F. R. S. 55, 317
Torgenson, W. S. 179, 181, 317, 318
Townsend, J. T. 45, 226, 227, 230, 268, 317, 318
Trevarthen, C. B. 318
Triesman, A. 142, 145, 163, 172, 173, 174, 188, 196, 309, 318
Trofanenko, S. C. 22, 315
Tulving, E. 293, 318
Turing, A. M. 34, 175, 318
Turvey, M. T. 74, 150, 152, 316, 318
Tversky, A. 178, 179, 180, 181, 182, 184, 185, 186, 227, 318
U
Ullman, S. 11, 257, 258, 259, 318
Uttal, W. R. 2, 3, 5, 6, 11, 22, 24, 28, 31, 33, 39, 40, 44, 46, 52, 53, 59, 66, 72, 73, 76, 77, 78, 110, 111, 113, 128, 132, 134, 135, 136, 138, 139, 140, 143, 149, 160, 176, 187, 188, 216, 227, 261, 263, 273, 274, 289, 311, 318, 319
V
Van Essen, D. C. 220, 319
van Meeteren, A. 144, 319
Van Tuijl, H. F. J. M. 292, 319
von Neumann, J. 4, 62, 303, 319
W
Wade, N. J. 78, 319
Wagner, M. 40, 293, 319
Wagner, R. F. 191, 302, 303
Wall, S. 247, 304
Wallach, H. 75, 319
Wapner, S. 46, 320
Ward, I. 167, 291, 319
Warren, R. M. 45, 100, 319
Warren, R. P. 100, 319
Warrington, E. K. 150, 310
Watanabe, S. 59, 224, 230, 233, 319
Watanabe, T. 156, 313
Watson, A. B. 90, 92, 118, 129, 319
Waxman, J. 158, 300
Weber, A. 312, 313
Weisel, T. N. 79, 90, 92
Weisstein, N. 78, 121, 158, 159, 170, 250, 311, 320
Welke, C. L. 261, 319
Werner, H. 46, 158, 320
Wertheimer, M. 22, 73, 153, 167, 291, 320
Wheeler, D. D. 170, 315, 320
Whorf, B. L. 34, 320
Whyte, L. L. 17, 19, 20, 320
Wiesel, T. N. 32, 117, 308
Williams, A. 250, 320
Williams, M. C. 250, 320
Wolfe, J. 170, 310
Wolford, G. 145, 320
Y
Yamagami, K. 98, 310
Yang, H. S. 320
Young, M. J. 156, 320
Yuzyk, J. 143, 303
Z
Zusne, L. 17, 23, 41, 143, 320
SUBJECT INDEX
A
absolute threshold, 108
abstract and amodal, 34
active processing, 9, 11, 70, 284, 298
adaptation by spatial frequencies, 118
adumbrative interpreter, 54
affordance, 17
algebraic equations, 85
algorithmic evaluations, 103
algorithmic interpretations, 230
algorithmic primitives, 58
algorithmic processing, 103
algorithms, 84, 145, 254-255, 265, 269, 292, 296
all-or-none, 229-230
alphabet, 249
alphabetic character, 127-128, 169, 238, 241, 244, 265-266
ambiguous figures, 102
American structuralists, 175
amodal representation, 35
analog computer, 85
analog, 30, 34, 56, 59, 85, 175-176, 212, 215, 234, 288, 296
analog-propositional debate, 178
analogous behavior, 56
analogy, 251
analysis, 75
analytic function theory, 88
analytic mathematics, 75
analytic models, 60, 75, 166
anatomical hypotheses, 117
and gate, 175
animists, 27
anticipatory schemata, 102
Aplysia, 280
apparent motion, 13
apparent simultaneity, 149
apprehend, 16
area patterns, 19
arrangement, 21, 24, 42, 133, 189, 220, 233, 266-269
Artificial Intelligence, 58, 60-61, 63, 94, 220, 264, 276, 281
assemblages of textons, 195
associationism, 64, 67-68, 74
attention, 7, 9, 54, 170, 196, 284
attentive energy, 244
attributes, 26, 41-43, 188
auditory stimuli, 169
autocorrelation, 76, 132, 134, 138, 221
automata, 277, 295, 297
automatic processing, 2, 242
automatic-inferential dichotomy, 9
average metric, 186
average power metric, 186
axioms of computational theory, 4
axis of symmetry, 144
B
backward-masking, 151, 157
bag of tricks, 291
band-pass filtering, 267
bandwidth, 119
Bayes' decision theory, 237
behavioral analogs, 229
behavioral techniques, 35
behaviorism, 68
between-category design, 247-248
binocular disparity, 261
binocular viewing, 136
biochemical mechanisms, 217
biochemistry, 220
black box limitation, 5, 44, 116, 172, 174, 246, 255, 284, 295, 298
blob, 166, 168, 170, 194-196
blooming, buzzing confusion, 71
blurring, 123
Boolean algebra, 81
bottom-up, 11, 54-55, 172, 230
Boundary Contour System Equations, 292
boundary-contour model, 196
brain, 220, 234, 286
brain-mind, 286, 290
bright flash masking, 149
brightness, 263
British empiricism, 67-69, 175
Butterfly machine, 219
C
cafe wall illusion, 292
camouflage, 146
canonical spatial configuration, 200
CAT-scan displays, 128
category effect, 247-248
cell-assembly theory, 82, 173
central masking, 150
central nervous integration system, 9
central nervous system, 2
channel shape, 129
channels, 24, 80, 119, 156
chaos theory, 277, 285, 288, 295, 297
chaotic behavior, 286
character, 244
choice models, 226, 230
chromatic induction, 161
chromaticity, 7
chromodynamicists, 27
chronometric method, 205, 214
city-block metric, 186
classification, 2
closure, 154
clumpiness, 189
clustering, 190, 237
coarticulation, 246
codes, 33
cognition, 109, 278, 288
cognitive aspects, 2
cognitive decision theory, 101
cognitive domain, 212
cognitive factors, 172
cognitive mechanisms, 142, 171
cognitive penetration, 223, 289
cognitive processes, 11, 109, 207, 211, 217, 251, 284, 293, 295
cognitive psychology, 8, 52, 200, 280-281, 297
cognitive spector, 289
cognitive-category effect, 248
cognitive-information processing theories, 104
cognitron, 88
collicular pathway, 43
color differences, 157
color, 169, 195, 239, 243
combination stimuli, 125
combinational explosion, 194
combinatorial barrier, 255
combinatorial considerations, 295
combinatorial limit, 298
combinatorial mathematics, 277
combinatorial problems, 277
combinatorics, 278, 288, 297
combined wave, 126
common fate, 154
common orientation, 154
communication codes, 33
completion, 197
complexity, 44, 142, 254
component properties, 42
components, 217
computation system, 59
computational algorithms, 84
computational analogs, 288
computational approach, 84-85
computational difficulty, 56
computational engines, 295
computational models, 83, 221, 253, 296
computational procedures, 76
computational reconstruction, 253
computational technology, 280
computational theory, 257
computational vision, 11, 230, 254, 264, 288
computer, 32, 43, 57, 60, 81, 83, 142, 148, 218, 234, 263, 268, 278
computer architecture, 62
computer image processing, 84
computer instructions, 84
computer metaphor, 62, 279
computer programming, 56
computer science, 40
computer-simulation research, 59
computer-vision, 262
computerized robotics, 262
concealment, 146
concept, 15
cones, 108
confusion-choice models, 229
connected networks, 272
connection machine, 219
connectionism, 93
connectionist models, 94
connectivity, 277
connotation, 10
conscious, 13
conspicuous features, 194
conspicuousness, 195
constraints, 260
construction, 45, 103
constructionism, 8
constructionistic processes, 157
constructivist concept, 103
continuity, 154
continuous analysis, 85
continuous field model, 77, 82
contour enhancement, 118
contrast gratings, 116
contrast ratio (CR), 120
contrast reduction, 115
contrast sensitivity function (CSF), 122
contrast, 76, 120-121
convergent operations, 171
convolution, 75, 197, 231
cooperative and competing networks, 88
correlational models, 227
correspondence problem, 254
counting-set model, 183
covariance diagonalization, 231, 233
criterion levels, 108
critical empiricist, 69
critical realist, 74
cross-correlations, 269
crossing of line segments, 194
crow's flight, 186
cubic surfaces, 219
cubic, 138, 262
cues, 263
curve-fitting model, 262
curves, 248
curvilinearity detector, 133
cyclopean eye, 293
D
dark adaptation, 109
data collection, 49
data integration, 49
decision-making, 232
decision-theoretic approaches, 227
decomposition, 77
decomposition-into-parts, 272
deduction, 59, 224
deduction-induction dichotomy, 224
definitions, 10
deformation, 135, 138-139
degraded, 221
degrees of freedom, 185
delayed simultaneous contrast, 160
demask, 159
denotation, 10
density hypothesis, 185, 190-191
depth perception, 60
depth, 47, 251, 288
describe, 49
descriptive categories, 230
descriptive mathematics, 226
descriptive models, 226
descriptive statistics, 189
descriptive systems, 276
descriptors, 232
detection of gratings, 115
detection sensitivity, 121
detection task, 128
detection threshold, 187
detection, 2, 13, 108-110, 112, 114, 130, 133, 140, 162, 224, 274, 279, 287
detector units, 197
detectors, 252
determinism, 67, 284
developmental psychology, 40
diagonalization, 233
dichoptic disparity, 137
dichoptic masking, 150
dichoptic metacontrast, 161
different judgements, 202
differential equations, 85
dimensional isomorphism, 7, 33
dimensional model, 183, 185-186
dimensional nonisomorphism, 7
dimensions, 42-43, 179, 184, 232
direct realism, 73-75
direct-mediated issue, 65
discrete digital computation, 85
discrete elements, 63
discriminability, 162
discriminants, 229
discrimination paradigms, 163
discrimination task, 163
discrimination thresholds, 164
discrimination, 2, 7, 14, 47, 76, 110, 112, 114, 140, 162, 187, 192, 198, 200, 224, 238, 261, 274, 279, 287
discriminative behavior, 207
discriminative cognition, 201
disinhibition, 160
disorder, 137
disparity, 264
displacement activity, 143
distal stimulus, 23
distance function, 185
distance, 181, 185-186
distributed networks, 81
dot masking, 132
dot numerosity, 135
dot pattern, 19, 273
dot-masking experiments, 150
dot-masking, 143
dot-spacing irregularity, 135
dots, 131, 133, 136, 143, 151, 257, 259, 261-263
dotted background, 191
dotted noise, 130
dotted patterns, 191
dotted stereoscopic forms, 284
dotted stimulus, 132
dotted stimulus-forms, 130-131, 133-134
dotted-form detection, 115
dotted-line detection, 137
dotted-noise, 132, 134
dotted-noise-masking procedure, 132
dotted-outlines, 137
dotted-signal, 132
dual task, 246
dummy dots, 112
E
ecological function, 74
ecological optics, 74
efficiency, 191
effortful attention, 172
Einstellung, 155
electricity, 1
electroencephalographs, 82
electrotonic interactions, 82
element, 43
elemental sensory impressions, 68
elementalistic form perception, 40, 167-168, 203, 217-218, 221, 294-295
emergent features, 23, 250-251
emotion, 7
empiricism, 9, 30, 50, 52-55, 58, 64, 66, 68, 70, 73, 106, 164, 283, 294-295
encoding, 274
energy detection, 108-109
engram, 216
entropy minimization, 231, 233
entropy, 154, 233
entropy-reduction, 233
epistemology, 69
error matrices, 140
Euclidean geometry, 40
Euclidean space, 23, 180
evoked brain potentials, 82
exhaustive search, 240
expert system, 61, 220, 288, 295
explanation, 49
external world, 26, 29
extirpation studies, 220
extramission, 95
extrasensory, 3
F
face, 265
face-recognition, 265
familiarity, 240
feature analysis, 167, 172, 174, 202-203, 228, 232, 234-235, 252, 266
feature confusion, 227
feature detectors, 270-271
feature extraction, 77
feature hypothesis, 172
feature integration theory, 172-173
feature maps, 173
feature mismatch, 209
feature recognizers, 38
feature specific neurons, 174
feature theoretical analysis, 202
feature vector model, 92
feature, 42, 228, 231-232
feature-by-feature algorithms, 217
feature-confusion models, 226, 228
feature-detecting neuron, 80
feature-detectors, 155-156, 191
feature-discriminant models, 227
feature-integration theory, 174
features, 42, 146, 169-170, 183-184, 217, 233-234, 252, 268-269, 283
figural goodness, 136
figural superimposition masking, 148
Figure of Merit (Fm), 135, 138
filling in, 197
finite automaton, 36
first-order statistics, 189, 191-193
flashing dot, 136
flicker, 129
flicker-fusion, 2
font, 225
forced-choice procedures, 158
forced-choice procedure, 249
forced-choice, 159, 169
form matching, 211
form perception, 2, 7, 26, 41, 43, 46, 55, 85, 133, 221, 276
form recognition, 228, 236, 263-264, 270, 272
form, 1, 17, 20-21, 23-24, 42, 63, 153, 219-220, 231, 253, 265, 299
form-detection experiments, 134
formal models, 60
formal rules, 61
formal systems, 276
four-dimensional, 5
Fourier analysis, 23-24, 38, 75, 87, 92, 116-118, 132, 235-236
Fourier channels, 79-80, 129
Fourier frequency, 116
Fourier mathematics, 116
Fourier methods, 236
Fourier model, 80, 83, 127-128, 145
Fourier representation, 235
Fourier theorem, 116, 119
Fourier transformations, 276
Fourier visual system, 118
fractals, 18
fracturing, 115
frames, 30
free will, 41
frequency of firing, 8
frequency, 121
frequency-analyzing mechanism, 119
function, 28
functional algorithms, 145
functional fixedness, 146-147
funneling, 153
G
Gabor function, 92-93, 117-118
gaps in a train of flashes, 129
Gaussian functions, 92, 117
gedanken experiment, 142
general discriminant models, 226-227
general recognition model, 226
generality, 294
generalization, 58, 290
geometric form, 43, 112, 132, 142, 200
geometric theories, 179, 185
geometrical parts, 38
geometrical surfaces, 261
geometry, 54, 94
Gestalt grouping, 148-149, 153, 291
Gestalt psychology, 14, 69, 73
Gestalt, 17, 20, 22, 40, 42, 45, 73, 132, 145, 153, 170, 215, 218, 220, 269, 282-283, 295
Gestaltlike proximity factor, 156
ghost features, 268
ghost in the machine, 28
global arrangement, 43, 221
global attributes, 22, 170
global descriptors, 233
global form, 131, 133, 160, 168-170, 199
global interdot relationships, 273
global levels, 25
global mental skills, 63
global organization-processing mechanisms, 250
global perception, 196
global precedence, 168, 170-172, 268, 283
global priority, 171
global properties, 193, 218, 234, 268, 278, 295, 297
global qualities, 234
global structural properties, 167
global-local controversy, 198
globality, 193, 233
good continuity, 156
goodness, 22-23, 73
grating detectability, 127
grating-detection experiment, 123
grating-type stimuli, 165
gratings, 165
grid stimulus-forms, 139
group, 15, 153, 197, 292
growth, 40
H
hallucinations, 13
hard-wired, 280
harmonics, 119
heredity-environment debate, 245
heterogeneous, 242-243, 247
heuristics, 59-60, 196-197
hierarchical structuring, 163, 165, 287
high frequencies, 265
high-frequency grating, 126
holism, 11, 40, 43, 133, 167-170, 203, 215, 218, 221-222, 231, 233-234, 261, 269, 295-297, 299
holistic attributes, 282
holistic processes, 33, 251
holistic-elementalistic controversy, 166
hologram, 82
holography, 38
homogeneous, 242-243, 247
homologs, 56-57
horological, 283
hydraulic, 283
hypnopompic suspension, 213
I
ideal observer, 192, 258
idealism, 27, 97
idealistic monist, 67
identity, 14, 199-200
illusions, 8, 16, 64, 76, 218, 289
image blur, 133
image manipulation, 15
image, 66, 86
imagery, 16
imaginations, 13
immaterialists, 27
immediate, 2, 39
independence axiom, 183
independence of processes, 291
indeterminacy, 8, 297
indirect construction, 71
indirect realism, 73, 94
indiscriminable surface properties, 148
individual differences, 141-142, 293
induction, 59, 223-224
inference, 290
information acquisition, 165, 249
information processing, 176, 297
information, 95, 104, 154, 255, 267
information-processing interconnections, 31
information-processing theories, 103
informed-guessing model, 230
inhibitory interactions, 153
innate ideas, 8, 39-40, 65, 97
input-output analysis, 296
integral and separable dimensions, 163, 166
intensity, 8, 20, 76
interactions, 240, 279
interconnections, 78
interdot interval, 136
interdot-interval irregularity, 137
internal codes, 297
internal mechanisms, 5
internal observation, 226-227, 230
internal representation, 35-36
interpersonally, 10
intraorganismic events and dispositions, 50
intrapersonal privacy, 10, 30
intromission, 95
invariance axiom, 183
invariance extractions, 45
inverter, 175
invisibility of internal processes, 5
ionic forces, 31
isomorphism, 7-8, 215
J, K
just detectable contrast, 125
Karhunen-Loeve expansion, 232
knowledge acquisition, 1
knowledge, 61
L
laciness, 189, 192
laminar flow, 286
Laplacian operator, 254
latent periods, 157
lateral inhibitory interaction, 77-78, 133, 152, 155, 220, 252, 278
lawful forcing functions, 285
laws of apparent movement, 291
laws of color mixture, 291
laws, 47, 243, 289, 297
learning machines, 94
learning, 216
letter, 225
letter-identification, 252
letters, 169, 197, 200, 248-249, 252
level of analysis, 31
levels of processing, 210
Lie algebra, 32, 91-92, 221, 281
lighting, 264
limits, 275
line crossing, 195-196
line orientation, 135
line patterns, 19
line terminator, 195
linear-discriminant functions, 237
lines, 136, 231, 248, 250
linguistic dependencies, 252
linguistic representation, 34
linking hypothesis, 83
list processor, 288
list structures, 57
list-processing computer, 57
local attribute, 43
local features, 131, 133, 167, 169, 170, 188, 197, 219, 221, 292
local geometry, 188
local precedence, 171
local texture, 239
local-feature-oriented models, 282
localizability, 220
locally discrete, 278
logic gates, 57
logic, 223
logical mechanisms, 63
logical units, 57
low frequencies, 265
low-frequency grating, 126
luminance distribution, 121
luminance of the peak (Lp), 120
luminance of the trough (Lt), 120
luminance peaks, 124
luminance, 54, 120
luminosity detection, 108-109, 149
lumpiness, 192
M
Mach bands, 152
macromolecular biochemistry, 219
macroscopic behavior, 32, 246
macroscopic rationality, 56
macroscopic theoretical orientation, 229
magnitude-estimation, 159
mask, 44, 136, 148, 150
masking by pattern, 147
masking of the fifth kind, 153, 155
masking of the first kind, 149
masking of the fourth kind, 153
masking of the second kind, 150
masking of the sixth kind, 157
masking of the third kind, 152
masking, 115, 146, 159
masking-dots, 136
matches, 204, 210
matching axiom, 182
matching, 199-200, 209
material world, 66
materialistic monist, 67
materialistic substrate, 3
materialism, 28, 67
mathematical discrimination, 231
maturation cycle, 40
maxima, 139
meaning, 94, 298
measure, 49, 71
measurement, 232
mechanisms, 57
mechanistic, 67
mediate visual stimuli, 39
membrane potentials, 31
membrane, 62, 217
memory, 33, 216
mental activity, 9, 249
mental chemistry, 68
mental event, 68
mental imagery, 213
mental phenomenon, 298
mental processes, 26, 244, 279, 298
mental responses, 27
mental rotation, 212, 214
mental transformation, 213
mental world, 66
mentalistic, 299
metacontrast, 148-149, 157-158, 160
metaphors, 17, 30, 82, 288
metaphysical assumption, 59
metaphysical primacy, 4
metaphysics, 66
metaprinciples, 8, 13
metaquestion, 35
metatheoretical, 3, 50
method of detail, 110
methode, 110
metric and feature models, 189
metric spaces, 180
Metrics for Bounded Response Scales (MBR), 186
metrics, 179
Mexican Hat shape, 118
microcosm, 16
microelectrodes, 299
microgenic, 46
microscopic level, 32, 246, 287
microscopically oriented theory, 290
microstructure, 198
microworld, 58
middle-out, 172
mind, 3, 6-7, 27-29, 31, 62, 64-65, 94, 99, 212, 221, 231
mind-body problem, 25, 31
mind-brain problem, 25, 31
mind-brain, 26, 31, 282, 285
minima, 139
minimality axiom, 180-181, 185
minimum processing time, 245
mismatches, 210
missing dot, 151
missing feature principle, 209
models, 16, 110, 129, 185
modulation transfer function (MTF), 122
Moire patterns, 292
molar fields, 82
molar mental processes, 31, 34, 288
molar minds, 82
molar process, 277
molecular structures, 31
moments, 189
monads, 96
monochromator, 24
monocular mask, 152
monocular viewing, 136
monodimensional, functional relationships, 279
monotonicity axiom, 183
motion, 264
movement detection, 90
movement, 76, 260
multidimensional perceptual universe, 279
multidimensional scaling, 179, 181, 183-184
multidimensional vector, 80
multidimensional, 24, 42
multilayered network, 88
multilevel processing, 5
multiple formal models, 291
multiple-channel mechanisms, 129
mysticisms, 27
N
N-Cube machine, 219
naive realist, 74
name match, 204
name-and-color-interference phenomenon, 169
naming, 15
natural geometry of visual space, 40
natural language processing, 58
Necker cube, 98
Needle Map, 262
neo-Gestaltist, 175
neobehaviorism, 299
neocognitron, 88
neoholists, 170
neon spreading illusion, 292
neorationalism, 55, 64, 101
nerve-net models, 39, 70, 89, 288
network analysis, 75, 288
network simulations, 278
neural activity, 298
neural analysis, 61
neural circuitry, 5
neural codes, 74
neural computers, 272
neural geometry, 82
neural inhibitions, 252
neural interaction, 148
neural mechanisms, 78
neural models, 75
neural networks, 70, 77, 80-83, 89-90, 93, 270, 274, 286
neural substrate, 26
neurobiological networks, 282
neurochemical studies, 220
neuroelectrical field theories, 42
neuronal network state, 298
neurons, 235, 264, 278-279, 288
neurophysiological datum, 52
neurophysiological reductionism, 68
neurophysiology, 4, 62
neuroreductionism, 4, 26, 52, 77-79, 83, 160, 188, 216, 264, 279, 288, 298
neuroscience, 61
neurosciences, 63
noisy operator theory, 208
noise, 44
nonadequate stimulation, 13
nonidentity, 14, 199-200
nonlinear systems, 85
nonneuroreductive, 52
nonparametric techniques, 237
nonplanar surfaces, 138
nonreductionistic, 299
nonverbal, 15, 34
noumena, 97
numerals, 238, 248
numeric values, 181
numerosity, 136
O
object superiority, 249-251, 253
objective, 53, 155
Observer theory, 57, 281
observer, 5, 57
Ockham's Razor, 65
one-dimensional functions, 128
ophthalmoscope, 99
optic array, 74
optical computers, 38
optical MTF, 123
order, 250
orders of statistical complexity, 189
organization, 14, 22, 62, 74, 90, 145, 200, 297
orientation, 137, 195
orientation-sensitive detectors, 92
orthogonal functions, 127
orthogonal wave-forms, 116
overlap theory, 229-230, 245
P
pandemonium, 271
paracontrast, 157
paradoxical advantage, 203, 208-209
parallel channels, 205
parallel computers, 272
parallel mechanisms, 246
parallel processing, 44-45, 84, 93-94, 222, 235, 238-239, 241-243, 245-246, 272
parallel-search, 242
parallel-serial dichotomy, 246
parallelograms, 135
parameter estimation, 237
particle physicists, 27
parts, 42, 168, 217, 220, 266, 272
passive transducer, 69
passive processing, 9, 284
pattern classification, 237
pattern perception, 24
pattern recognition, 222, 233, 236-237
pattern synthesizer, 103
pattern, 19-20, 24, 153, 232
pedestal, 120
perceiving, 30
percept, 13
perception, 7, 10-11, 13, 26, 31, 46, 62, 95
perceptron, 88, 94, 271-272
percepts, 33
perceptual credo, 26
perceptual organization, 141
perceptual-versus-cognitive mechanisms, 212
performance analogs, 57
peripheral communication, 33
peripheral sensory nerves, 2
peripheral-masking processes, 150
perpetual motion, 277, 296
phase relation, 126
phase sensitivity, 127
phase sequences, 68, 173
phase shifts, 127
phenomena, 47, 50, 97, 298
phenomenology, 218, 296
philosophical speculation, 1
phosphenes, 13
photic stimulation, 13
photochemical-receptive processes, 108
photochemistry, 4
photon, 109
phylogeny, 59
physical geometry, 181
physical matches, 202, 204
physical optics, 1
physical realism, 3
physical reality, 27
physical stimuli, 27, 66
physical universe, 27, 55
physiological psychology, 220
physiology, 1
pick-up-sticks, 136
pictographiclike relationship, 66
pixel, 19
planes, 136
plasticity, 217
Platonic forms, 64
pneumatic, 283
point patterns, 19
point-plotting cathode ray tubes, 148
polynomial curve-fitting, 262
popout, 243, 248
positivistic, 67
potential theory, 88
poverty of the stimulus, 8
Pragnanz, 22, 154
preattentive, 2, 9, 170, 196, 265, 284
prerational, 56
preverbal, 15
primal sketch, 256
primary and secondary illusions, 105
primary qualities, 67
priming, 210
prior knowledge, 127
probability summation, 125
probabilistic functionalists, 100-101
process, 7, 68, 246
processing hierarchy, 165
projective geometry, 87
propinquity, 56
propositional representation, 175-177, 215
proto-empiricism, 64-65, 95
proto-rationalism, 65
prototheories, 50
proximal stimulus, 74
pseudo-features, 145
psychoactive drugs, 220
psychobiological monism, 3-4, 56, 298
psychobiological reality, 280
psychobiology, 66
psychological epistemology, 29
psychoneural equivalence, 3, 33, 52, 81, 220
psychophysical CSF, 123
psychophysical data, 83
psychophysical experimentation, 218
psychophysical indeterminacy, 36
psychophysical phenomena, 78, 290
psychophysics of form, 23
psychophysics, 27, 50
pure reason, 96
Q
quadric, 138, 219, 262
quanta, 108
quantification, 15
quantum mechanists, 27
quasi-random dot patterns, 191-192
R
radical direct realism, 74
radical empiricism, 103
radical rationalism, 103
radical realism, 27
radiographs, 133
random dot texture, 144
random forms, 266
random histograms, 21
random network, 271
random patterns, 19
random processes, 286
random state, 285
random stimuli, 140
random systems, 286
random dot stereogram, 143
randomly arrayed dots, 137
range, 260
raster-scan, 148
rational, 74
rationalism, 9, 50, 52-55, 58, 64, 70-71, 73, 94, 96-97, 106, 164, 283, 299
rationalist-empiricist dimension, 54
raw sense data, 30
reaction times, 205, 209, 241-242
reading, 15
realism, 27-28
realistic neural systems, 4
reality of space, 97
reality, 29
reason, 96
receptive fields, 129
receptors, 33, 109
rechecking, 203-204, 208
recognition performance, 165
recognition process, 222
recognition, 2, 7, 15, 38, 58, 76, 88, 110, 112, 114, 140, 217, 224, 230, 238-240, 249, 254, 261, 265, 271, 274, 279, 287
reconstruction, 8, 37, 254, 262
reductionism, 229, 275, 294-295, 297
redundancies, 260
reflection, 67
relationships, 61, 167
relativistic physics, 281
release-from-masking effect, 155
repetition, 210
representation, 33-34, 37-38, 73, 79, 256
resonant peak, 122
retina, 90
retinal maps, 37
retinal projection, 38
retino-geniculate pathway, 43
retinotopic maps, 8
reversible figures, 98
Ricco's law, 108
Riemannian geometry, 87
rigid body, 260
rods, 108
Rule of Linear Periodicity, 133, 139
Rule of Multiple Rules, 289
Rule of Three-Dimensional Noncomputability, 139
rule-based criterion, 201
rules of grouping, 153, 291
rules, 289
S
same-different judgments, 162, 178, 199-202, 208, 210, 260
scaling, 15
scanning mechanism, 244
schema, 30
scientific chaos, 292
scripts, 30
scrutinization, 238
search process, 241
searching, 15, 114
second-order statistics, 189, 193
secondary qualities, 67
seeing, 13
segregation, 197
selective adaptation, 174
selective attention, 9, 174
self-featuring information compression, 232
self-organizing, 88
self-terminating, 240-241
semantic analyses, 57
semantic category effect, 240, 247-248
semantic content, 239
semi-pictographic coding, 34
sensing, 30
sensory coding, 34, 74
sensory communication, 9, 32-33
sensum, 13
separable dimensions, 163, 166
separable-integral dichotomy, 166
sequential dependencies, 252
sequentiality, 163
serial order, 241
serial process, 239
serial processing, 242
serial processor, 45
serial system, 44, 238, 245-246
serial-parallel dichotomy, 245
serial-parallel question, 45, 240, 244
set theory, 179, 182-184
shape, 21, 243
short-term storage, 207, 249
signal-detection theory, 14
signal-to-noise detection, 14, 127, 131, 133, 139, 145
similarity, 154, 156, 162-163, 174, 178, 181, 183-184, 186, 199-200
similarity-choice model, 229
similarity-type experiments, 166
simple and complex cells, 117
simplicity, 65
simultaneous (side-by-side) matches, 211
sinc function, 118
single-cell feature detection model, 77-78, 81, 83
single-cell sensitivities, 167
sinusoidal components, 92, 125, 132
sinusoidal stimulus-field, 121
size constancies, 38
size, 239, 243
smoothing, 137
sociobiological, 8
software, 268
solvability axiom, 183
sophisticated guessing models, 226, 229-230
sorting, 15
soul, 64
space, 8
sparse samples, 261
spatial and temporal interactions, 32
spatial entropy, 22
spatial features, 92
spatial frequencies, 80, 116, 119, 121-122, 125, 128, 139
spatial frequency features, 24
spatial frequency power spectrum, 92
spatial geometry, 2
spatial inhibition, 240
spatial irregularity, 137
spatial proximity, 154
spatial-frequency analysis, 79, 119
spatial-frequency channels, 80, 116-118, 128
spatial-frequency detection model, 77
spatial-frequency hypothesis, 127
spatial-frequency model, 24
spatial-temporal patterns, 7
spatially distributed arrays, 219
special relativity, 87
specific attribute question, 41, 143
specific energies of nerves, 72
speed of conduction, 169
speed of light, 296
stages of visual processing, 113
statistical decision making, 231, 237
statistical dimension, 22
statistical modeling, 93
statistical properties, 90
statistics, 226
stereoscopic depth, 40, 76, 138, 141
stereoscopic dotted stimulus, 131
stimulus cues, 71
stimulus determination, 9
stimulus dimension, 33
stimulus encoding, 210
stimulus equivalence, 37
stimulus object, 53
stimulus scene, 74
stimulus-equivalence, 38
stored program digital computer, 62
straight lines, 134
stream of consciousness, 72
Stroop phenomenon, 205
structural analysis, 228
structural descriptions, 236
structuralism, 68, 72, 175
structure analysis, 231
structure from motion, 257, 261, 263
subitize, 157
subjective contours, 45, 267
subjective explanations, 159
subjective, 53
subliminal perception, 165
subprocesses, 46
successive recognition, 109
superimposed sinusoidal frequencies, 126
superimposition, 228, 231
superstring, 27
surface color, 153
surface cues, 153
symbol-processing, 284
symbolic descriptions, 58
symbolic meaning, 24
symbolic relationships, 61
symbolic representation, 95
symbolic, 34, 177, 215
symbolization, 30
symmetry axiom, 180-181
symmetry, 144, 154
symmetry-detection, 145
synapses, 217
synaptic conductivity, 217
synaptic hypothesis, 217
synaptic junctions, 216, 279
synesthesia, 206-207
synthesists, 11
T
table-lookup operations, 295
tabula rasa, 30
tachistoscope, 147, 241
target detection, 156
taxonomy, 3, 6, 226, 230, 236, 240
technology, 59
telecommunication technology, 176
telephonic, 283
template matching, 38, 223, 227-228, 230, 235, 269
temporal interaction, 158
temporal intervals, 137
temporal precedence, 268
temporal proximity, 154
temporal summation, 148
terminators, 194, 196
textons, 90-91, 175, 188-189, 193-198
texture discrimination, 188, 197
texture segregation, 174
texture works, 144
texture, 15, 19, 47, 76, 187-193, 196, 264
theologies, 27
theories of recognition, 225
theories, 50
theory-driving force, 219
thinking, 30
third dimension, 71
third-order statistics, 190-193
thought, 220
three-dimensional perception, 5, 23, 34, 76, 110, 116, 138-139, 144, 214, 235, 250, 253, 256-258, 260, 263, 284
time, 8
top-down, 11, 53, 55, 172, 230, 275
topologically, 181
toy in the head, 34, 64, 176, 215
transactional functionalists, 100
transactional relationship, 75
transduction, 74, 274
transformations, 211
transient and sustained channels, 157
triangle inequality axiom, 180, 184
triangle similarity axiom, 182
triangles, 135
trigger features, 79
trigger-feature sensitivity, 51
truth, 276
Turing model, 57
Turing's formal proof, 34
two-and-one-half-dimensional image, 76
two-dimensional gratings, 128
two-dimensional perception, 23, 76, 214, 250, 256, 257, 260, 284, 288
two-pulse interaction, 149
U
U-shaped function, 151, 158
ultimum sentiens, 95
uncertainty, 127, 286
unconscious conclusions, 99
unconscious inference, 39, 289
understanding, 16-17, 255
unidimensional, 42
unification, 290
uniform density, 154
unitary, 166
universal forms, 64
universal laws, 291
universals, 37
V
validity testing, 281
vector analysis, 87
vector difference, 170
vector representation, 75
vectors, 92, 232
verbal codes, 215
verbal strings, 34
verbal, 34, 177
verbs, 238
visual cortex, 82
visual forms, 199
visual imagination, 102
visual noise, 115, 249
visual processing, 2
visual search, 237, 239, 247
visual thoughts, 33
voluntarism, 41
voluntary, 41
von Neumann machines, 219
W
wave-and quantum issue, 245
whole, 43
wholes versus parts, 167, 174
within-category design, 247, 248
word-superiority effect, 249, 251-253
words, 200, 240, 252
X, Z
X-ray pictures, 133
zero distance, 180