Current Natural Science
Carlo ALECI
Measuring the Soul: Psychophysics for Non-Psychophysicists
Printed in France

EDP Sciences – ISBN (print): 978-2-7598-2517-2 – ISBN (ebook): 978-2-7598-2518-9
DOI: 10.1051/978-2-7598-2517-2

All rights relative to translation, adaptation and reproduction by any means whatsoever are reserved, worldwide. In accordance with the terms of paragraphs 2 and 3 of Article 41 of the French Act dated March 11, 1957, "copies or reproductions reserved strictly for private use and not intended for collective use" and, on the other hand, analyses and short quotations for example or illustrative purposes, are allowed. Otherwise, "any representation or reproduction – whether in full or in part – without the consent of the author or of his successors or assigns, is unlawful" (Article 40, paragraph 1). Any representation or reproduction, by any means whatsoever, will therefore be deemed an infringement of copyright punishable under Articles 425 and following of the French Penal Code.

© Science Press, EDP Sciences, 2021
To the Midnight Researchers. To those who dream of jumping beyond the fence. To my little Princess Sofia and my young Pirate Mattia.
Contents

Foreword
Introduction

CHAPTER 1 Adequate and Inadequate Stimuli

CHAPTER 2 The Threshold
  2.1 Absolute and Difference Threshold
  2.2 Weber's, Fechner's and Stevens' Laws

CHAPTER 3 The Psychometric Function

CHAPTER 4 Detection and Discrimination Threshold

CHAPTER 5 Response Models
  5.1 Yes/No Response Model [Y/N]
  5.2 Alternative Forced Choice Model [AFC]
    5.2.1 The "Implicit" AFC Version
  5.3 Target Probability in Y/N and AFC Response Models
  5.4 Alternative Unforced Choice Response Model (AUC: Kaernbach, 2001)
  5.5 Modified AUC (Klein, 2001)

CHAPTER 6 False-Negative and False-Positive Errors (Lapse and Guess Rate)

CHAPTER 7 Psychophysical Procedures
  7.1 Nonadaptive Psychophysical Procedures: The Method of Constant Stimuli
  7.2 Nonadaptive Psychophysical Procedures: The Method of Limits and of the Adjustment

CHAPTER 8 Adaptive Psychophysical Procedures

CHAPTER 9 Nonparametric Adaptive Psychophysical Procedures
  9.1 Truncated Staircase Method (Simple Up-Down Method: Dixon and Mood, 1948; von Békésy, 1960; Cornsweet, 1962)
    9.1.1 Two Interleaved Staircases to Preserve the Assumption of Independence
  9.2 Transformed Up-Down Staircase Method (Up-Down Transformed Response, UDTR: Wetherill and Levitt, 1965; Levitt, 1970)
    9.2.1 (UDTR) Staircase with Rating of Confidence
  9.3 Forced-Choice Tracking (Zwislocki et al., 1958)
  9.4 Non-parametric Up-Down Staircase Method (Derman, 1957)
  9.5 Weighted Up-Down Method (Kaernbach, 1991)
  9.6 Stochastic Approximation (Robbins and Monro, 1951)
  9.7 Accelerated Stochastic Approximation (Kesten, 1958)
  9.8 Block Up-Down Temporal Interval Forced Choice (BUDTIF: Campbell and Lasky, 1968)
  9.9 Parameter Estimation by Sequential Testing (PEST: Taylor and Creelman, 1967)
  9.10 A More Virulent PEST (Findlay, 1978)
  9.11 Binary Search and Modified Binary Search (MOBS: Tyrrell and Owens, 1988)

CHAPTER 10 Parametric Adaptive Psychophysical Procedures
  10.1 The Maximum Likelihood Estimation [MLE]
    10.1.1 Psychophysical Procedures Based on MLE: The Best PEST (Pentland, 1980; Lieberman and Pentland, 1982)
    10.1.2 The MLE Simplified Approximation of Emerson (Emerson, 1984)
  10.2 Bayesian Psychophysical Procedures
    10.2.1 Quick Estimation by Sequential Testing (QUEST: Watson and Pelli, 1979, 1983)
    10.2.2 ZEST (Zippy Estimation of Sequential Testing: King-Smith et al., 1994)
    10.2.3 The Minimum Variance Method (King-Smith, 1984; King-Smith et al., 1994)
    10.2.4 The Ideal Procedure: Behemothic Estimation of Sequential Testing (BEST: Pelli, 1987; Sims and Pelli, 1987)
    10.2.5 YAAP (Yet Another Adaptive Procedure: Treutwein, 1997)
    10.2.6 The Step Method (Simpson, 1989)
    10.2.7 MUEST (Snoeren and Puts, 1997)

CHAPTER 11 Adaptive Psychophysical Procedures for the Estimate of the Slope
  11.1 Adaptive Probit Estimation (APE: Watt and Andrews, 1981)
  11.2 Hall's Hybrid Procedure (Hall, 1981)
  11.3 The Simulated Staircase Technique of Leek, Hanna and Marshall (Leek et al., 1992)
  11.4 The Adaptive Threshold and Slope Estimation (ATASE: Kaernbach, 2001b)
  11.5 Modified ZEST (King-Smith and Rose, 1997)
  11.6 The Ψ Method (Kontsevich and Tyler, 1999)
  11.7 The Sweet Point-Based MLE Procedure of Shen and Richards (2012)

CHAPTER 12 Multidimensional Adaptive Testing
  12.1 The Functional Adaptive Sequential Testing (FAST: Vul et al., 2010)
  12.2 The Quick CSF Method (qCSF: Lesmes et al., 2010)
  12.3 The Quick TvC Method (Lesmes et al., 2006)
  12.4 The MEEE Method of Cobo-Lewis for Performance Categorization

CHAPTER 13 What Makes a Psychophysical Technique a Good Psychophysical Technique?
  13.1 Assessing the Stability of the Psychometric Function
    13.1.1 The Method of Hall (1983)
    13.1.2 The Interleaved Tracking Procedure of Leek et al. (1991)
  13.2 Assessing the Goodness-of-Fit

CHAPTER 14 What is the Best Psychophysical Technique?
  14.1 AFC or Y/N: Which is the Best Response Model?
    14.1.1 Subjective Criterion in the Y/N and Bias in n-AFC Response Model
    14.1.2 Drawbacks of the n-AFC Response Models
    14.1.3 Drawbacks of the Y/N Response Model
    14.1.4 Spatial or Temporal AFC? How Many Forced Choices?
  14.2 What is the Best Psychophysical Procedure?
    14.2.1 Nonadaptive vs. Adaptive Procedures
    14.2.2 Adaptive Nonparametric Procedures: Variable vs. Fixed Step Size
    14.2.3 Adaptive Nonparametric vs. Parametric Procedures
    14.2.4 Adaptive Parametric Procedures
    14.2.5 Considerations on the Staircase Procedures with Fixed Step Size: The Optimal Setup
  14.3 Drawbacks of the Maximum Likelihood Estimation
  14.4 Drawbacks of the Bayesian Procedures

CHAPTER 15 Introduction to the Signal Detection Theory
  15.1 SDT and the ROC Curve in Y/N and 2AFC Response Models
  15.2 The Single-Interval Adjustment Matrix Procedure (SIAM: Kaernbach, 1990)

CHAPTER 16 Suprathreshold Psychophysics
  16.1 Procedures Based on Magnitude Scaling: Category Scaling
  16.2 Procedures Based on Magnitude Estimation
  16.3 Procedures Based on Reaction Time

CHAPTER 17 Brief Outline of Comparative Psychophysics

Afterword. The Inverse Problem of Perception

Appendix I – Logistic and Weibull Distribution
Appendix II – The Maximum Likelihood Estimation
Appendix III – Probit Analysis
Appendix IV – About Bayes' Theorem and Bayesian Statistics

References
Index
Foreword
When asked what psychophysics is, the answer can be difficult. Romantically, psychophysics can be defined as the science aimed at measuring the soul: namely the sensitive soul, as the ancient scholastic philosophers argued many centuries ago. This may sound overly romantic, but it is probably the most truthful definition: the goal of psychophysics is to quantify sensations, that is to say, the sensitivity of a sensory system: its capability of perceiving what it is in charge of. If physics measures actions, like the weight of a stone on a scale, psychophysics measures sense-a(c)tions, like the sensation of brightness of a spot of light: in this case, the scale is a sensory system: the visual system.

The challenge of such an intention lies in the variability, in the volatility, of sensations. Rather than being fixed values, like the constant altitude of an airplane flying straight across the sky, sensations are fluctuating quantities, like the twirling of a butterfly over a flowery meadow: for these reasons, the exact (objective) measurement of a (subjective) sensation is difficult per se. To complicate things, unpredictable environmental or subjective variables can affect the flight of the butterfly. All these factors make it hard to label the perceived strength of a sensation by assigning it a number, a quantity. And yet, psychophysics is crucial in sensory research and in ophthalmological and audiological practice.

Here is a treatise written by an ophthalmologist and addressed to other ophthalmologists, vision specialists, researchers, curious ordinary readers, as well as health professionals who deal with psychophysical examinations but have no specific competence in this discipline.

I apologize in case careful reading reveals some imperfections, especially in the mathematical equations and in the sections dealing with probabilistic computation. If that were the case, I am ready to make due corrections. Given the consistent bulk of research on the topic, the sampling of references is unavoidably arbitrary, and I apologize in advance if many excellent papers are missing.

As this is a manual aimed at explaining a difficult matter in the simplest way to a non-specialized audience, math is kept to a minimum. At the risk of sounding naïve in many passages, nothing is taken for granted, and even basic concepts are explained as simply and clearly as possible. Notions are often repeated for the sake of clarity. The book is rich in footnotes: this is on purpose, as it can be read at a double level: the basic concepts are exposed in the main text, with additional information and more specific explanations in the footnotes, available to those who are interested in more detail.

Introducing the principles of psychophysics as simply as possible: this is the intent. Making psychophysics a fascinating subject: this is the challenge and our deepest desire.

Carlo Aleci
Introduction
The undisputed father of psychophysics is Gustav Theodor Fechner, a German scientist and philosopher who directed his efforts to finding a way of measuring sensations mathematically.

Gustav Theodor Fechner (1801–1887).

Fechner distinguished two types of psychophysics: outer and inner psychophysics. Outer psychophysics deals with the relation between a sensation and the characteristics of the stimulus that evokes it. Inner psychophysics, in turn, studies the relationship between a sensation and the characteristics of the underlying neural activation. At the time of Fechner, techniques suitable for investigating inner psychophysics were not available; more recently, the development of electrophysiological and imaging experimental methods has made it possible, allowing a better comprehension of the neurophysiology of perception.
Inner and outer psychophysics, threshold and suprathreshold psychophysics. See text for explanation.
In its common meaning, psychophysics is threshold psychophysics, because it aims at estimating the threshold (i.e., the minimum perceivable intensity of a stimulus, or the just noticeable difference between two stimulations). However, a particular branch of psychophysics measures the effect of suprathreshold stimulations: suprathreshold psychophysics. This treatise deals with the classical form of psychophysics, outer threshold psychophysics. An introduction to suprathreshold psychophysics, however, is provided in chapter 16.
Chapter 1
Adequate and Inadequate Stimuli

In the classical sense of the term, psychophysics aims at measuring the sensitivity1 to external adequate stimuli. Stimuli are external when they are generated outside the visual system – in other terms, when they come from the external world. Stimuli are adequate in that they belong to the physical domain proper of the sensory system under examination: sounds are adequate stimuli for the auditory system, odors for the olfactory system. Light and light differences are the adequate stimuli for the visual system. Even if visual stimulation differs according to the perceptual task (for example, light spots for light sensitivity, sine-wave gratings for contrast sensitivity, high-contrast configurations like letters, digits or drawings for visual acuity, colored patterns for color perception, oriented lines for orientation sensitivity, random dots for motion perception, etc.), light is the lowest common denominator. As a matter of fact, the eye is not sensitive to sound waves but to light waves or photons.

Yet, to some extent, the eye can be responsive even to non-adequate stimulations, like sudden pressure (trauma) on the eyeball or traction exerted on the retina: when a bulbar trauma or retinal traction2 occurs, the subject may experience an ephemeral, entoptic3 perception, made of lights or sparks, defined as photopsia. This means that not only light but also other physical stimuli can activate the visual
1 Attention should be paid to the difference between sensitivity and sensibility. We refer to sensitivity as the ability of an organism or an organ to respond to external stimuli. In turn, according to the Oxford Dictionary, sensibility is the ability to experience and understand deep feelings.
2 A retinal traction takes place when the vitreous body (the transparent gel-like substance that fills up most of the inner space of the eyeball and is in contact with the inner layer of the retinal tissue) sticks to a circumscribed portion of the retina but is detached in the surrounding regions.
3 An entoptic perception is a form of visual perception whose source is not located in the external world but in the eye itself.
TAB. 1.1 – Adequate and inadequate stimuli for the visual system.

Sensitivity of the visual system to:
(1) Adequate stimulus: light.
    The visual sensation depends on a light stimulation coming from the external world.
(2) Inadequate stimuli:
    sudden pressure on the eyeball → photopsia
    traction exerted on the retina → photopsia
    metabolic stimulation of the visual cortex → phosphenes, teichopsia
    The visual sensation depends on an inadequate stimulation (traction, pressure, metabolic) coming from the eyeball or evoked in the cortex.
system: yet, sensitivity to such non-visual (inadequate) events is far lower compared to visual (adequate) stimulations.4 Even chemical changes in the cerebral cortex, as occur in seizures with visual aura, can evoke the illusory perception of visual phenomena, called phosphenes or (when they are more structured) teichopsias. Note that in the first two cases the visual sensation is evoked by inadequate stimulations of the eyeball (pressure/traction), while in the latter it is triggered by internal (cerebral) phenomena that activate the brain directly (table 1.1). In this book, these conditions will not be considered, and visual psychophysics will be treated as a discipline that deals with the effect of external and adequate stimulations.

Since psychophysical measurement requires the active collaboration of the subject (contrary, for example, to electrophysiological testing or imaging diagnosis), we are inclined to define sensitivity as the state of being sensitive referred to the subject and not simply to the organ – that is to say, the awareness of a sensation. The higher the degree of awareness of that sensation, the more sensitive is the subject in that particular sensory domain. As a matter of fact, the stimulation of the visual system may take place when the observer is absent-minded or tired. In this case, even if the stimulus evokes a sensation, the awareness of the sensation (we can call it the "perception") can be absent or reduced, so that the psychophysical response will be null or defective. In sum, psychophysics does not measure the mere responsiveness of a sensory (for example, visual) system to an adequate stimulus, but the conscious responsiveness of the observer. If the observer is inattentive or unmotivated, the psychophysical outcome will be worse than expected despite perfect visual functioning. The dependence of the psychophysical outcome on the response of the subject, and its susceptibility to his/her awareness as well as to his/her will or capacity to collaborate, is on the one hand probably the main drawback of psychophysics; on the other hand, it is the challenge that contributes to making this subject so intriguing.
4 To compare the effect of different physical dimensions on a sensory (e.g., visual) system, sensitivity can be expressed as the reciprocal of the amount of stimulation energy required to evoke a (visual) sensation at a predefined level of intensity.
Chapter 2
The Threshold

The sensitivity of a sensory system is quantified as a threshold. The visual threshold is the lowest intensity able to elicit a visual response; so, depending on the perceptual task, the luminance threshold can be the lowest luminance1 that makes a spot of light perceivable, the motion threshold can be the lowest speed of the target that makes its movement detectable, and the contrast threshold can be the lowest contrast of the bars of a grating needed for their discrimination. The threshold is also defined as the just noticeable difference (jnd), that is, the smallest difference between two levels of stimulation leading to a change in experience (Treutwein, 1995).

Operatively, the threshold is the level of stimulation at which the stimulus is perceived with a given probability. The probability is predetermined by the experimenter or depends on the psychophysical technique employed. According to the decision of the experimenter or the psychophysical technique, the light threshold can be considered the luminance level that makes the target perceivable half (50%) or three quarters (75%) of the times it is presented. These probabilities of perceiving the stimulus refer to the intensity of the stimulus at threshold and are called target probabilities.

The threshold is the inverse function of sensitivity: the higher the sensitivity, the lower the level of stimulation that makes the stimulus barely perceivable. Let us consider a dim spot of light on a black background, so feeble as to be invisible. Now, let us gradually increase its luminance to make the spot barely detectable: the minimum level of luminance that makes the subject aware of the presence of the stimulus is the luminance threshold for that observer in that experimental condition. The lower the luminance threshold, the higher the luminance sensitivity of the subject. If a second observer needs more luminance to perceive the same spot of light, his luminance threshold will be higher and, correspondingly, his sensitivity will be lower.
1 Luminance is the intensity of light emitted, transmitted, or reflected by a surface per unit area. The unit of luminance (International System) is the candela per square meter (cd/m2).
Generally, the threshold is computed by presenting stimuli of different intensities and singling out the level at which perception occurs with a predefined frequency or, even better, with a predefined target probability.
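As a concrete (and entirely illustrative) version of this operational definition, the following Python sketch interpolates the intensity corresponding to a 50% target probability from hypothetical frequency-of-seeing data; the luminance levels and proportions are invented for the example.

```python
import numpy as np

# Hypothetical frequency-of-seeing data: tested luminance levels (cd/m2)
# and the proportion of presentations in which the spot was reported.
luminance = np.array([0.2, 0.4, 0.6, 0.8, 1.0, 1.2])
p_seen = np.array([0.05, 0.20, 0.45, 0.70, 0.90, 1.00])

# Linearly interpolate the luminance at the 50% target probability.
threshold = np.interp(0.5, p_seen, luminance)
print(f"Estimated luminance threshold (target probability 0.5): {threshold:.2f} cd/m2")
```

Fitting a full psychometric function (chapter 3) is the more principled version of this same interpolation step.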
2.1 Absolute and Difference Threshold
Psychophysics deals with two types of threshold: the absolute threshold and the difference threshold.

– Absolute threshold: the lowest stimulus intensity (the target) needed to make the stimulus detectable2 compared to the zero-intensity level: for example, a spot of light briefly presented on a black (i.e., zero-luminance) background.
– Difference threshold: the lowest stimulus intensity needed to make the stimulus detectable compared to a reference (or basal) intensity level > 0. We will refer to the basal intensity level as the pedestal. Consider, for example, a spot of light briefly presented on a (less intense) background. In this case, the difference threshold can be regarded as the just noticeable difference (jnd), that is, the difference between the intensity of the stimulus I_stim (the spot of light) and the intensity of the pedestal I_bas (the luminance of the background). On the contrary, when estimating absolute thresholds, the jnd depends exclusively on the value of the target, because the level of the pedestal I_bas is zero (figure 2.1).

The difference threshold Δθ can be formalized as follows:

Δθ = I_stim − I_bas = ΔI = jnd,    (2.1)

where ΔI is the additional intensity of the stimulus required to make it perceivable: in other terms, ΔI is the jnd.
FIG. 2.1 – Absolute and difference threshold.
2 As explained in section 4, to detect means to perceive the difference between the event and its absence: for example, the onset of a dim flash of light; to discriminate means to perceive the difference between two events (e.g., the difference in luminance between two spots of light, the difference in contrast between two sine wave gratings, the difference in speed between two moving targets, the difference in length between two lines, and so on).
2.2 Weber's, Fechner's and Stevens' Laws
For a wide range of intensities, the jnd is proportional to the pedestal. This statement is intuitive. Let us assume, for example, that we measure the just noticeable difference in weight. If an object weighing 20 g is placed in the palm of a hand, the least weight to be added to sense a difference is 0.4 g. This is the jnd: compared to the initial 20 g, the observer will be aware of an increase in weight if at least 20.4 g are placed in his hand. But if the weight is heavier, for example, 1 kg, adding 0.4 g will not be enough to make the subject aware of this "surplus". In this case, the weight to be added to elicit a difference in weight sensation must be greater: at least 20 g. In substance, the higher the pedestal (I_bas: in our example 20 g and 1 kg), the larger the jnd (in our example 0.4 g and 20 g, respectively), and this proportion is constant:

ΔI / I_bas = k,    (2.2)

where ΔI is the additional intensity of the stimulus needed to sense the difference: it is the jnd. This law is known as Weber's law, and k is the Weber constant, from the name of Ernst Heinrich Weber (1795–1878). The Weber constant k allows, therefore, computing to what extent a stimulus must be increased to experience the jnd: in fact, from the previous equation it can be derived that:

ΔI = k·I_bas.    (2.3)

In sum, according to Weber's law, the jnd increases with the intensity of the stimulus, so that the sensitivity to a difference in signal strength decreases as the intensity of the signal rises (figure 2.2). The Weber constant k differs according to the sensory domain: in the case of light intensity, k is 0.079: for example, the luminance of a target displayed on a 10 cd/m2 background must exceed the background by at least 0.79 cd/m2; otherwise, it cannot be perceived. This is the same as saying that the jnd for a pedestal of intensity 10 cd/m2 is 0.79 cd/m2, and the corresponding intensity of the target at threshold is 10.79 cd/m2. In table 2.1, the values of k for different sensory domains are reported. k varies even within the same sensory domain, depending on the task: in the visual domain, k for light intensity is 0.079, whereas k for the estimate of line length is lower, being 0.029. In figure 2.3, the increase in the jnd as a function of the intensity of the pedestal is depicted for different sensory domains.

Weber's constant assumes that the relationship between I and ΔI is linear. This is not always the case, especially at very high levels of intensity. As a matter of fact, in many sensory domains, as well as for different visual tasks, the jnd tends to increase faster than linearly; for example, to perceive a bright target 5 deg wide, the observer needs a target luminance higher than that predicted by Weber's law. This is the equivalent of saying that, as the signal strength rises, the magnitude of the sensation it generates (the sensory magnitude) tends to increase not linearly but at a lower rate.
FIG. 2.2 – Luminance absolute threshold, difference threshold, and the Weber constant. The absolute threshold (a) is the minimum level of intensity of a stimulus that makes the stimulus detectable starting from the zero-intensity level. The difference threshold (b, c) is the additional intensity ΔI required for a stimulus to be detected when presented on a pedestal level. Lower left panel: according to Weber's law, the additional amount of intensity required to reach the jnd is proportional to the level of the pedestal. The upper line represents the limit of visibility. For comparison, the lower line represents the stimulus values that would be obtained by adding the same intensity (0.1 units) irrespective of the level of the pedestal. At each pedestal, if the stimulus intensity were increased not by a proportional value I_bas·k but by the same constant amount a (I_bas + a), the signal would be too low to be perceived.
TAB. 2.1 – The Weber constant in different sensory domains.

Sensory domain    k
Light intensity   0.079
Sound volume      0.048
Weight            0.020
Salt taste        0.083
Electricity       0.013
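To make equations (2.2) and (2.3) concrete, here is a minimal sketch (ours, not the author's) that uses a few k values from table 2.1 to compute the jnd at several pedestal levels:

```python
# Weber's law, equation (2.3): jnd = k * I_bas.
weber_k = {"light intensity": 0.079, "sound volume": 0.048, "weight": 0.020}

for domain, k in weber_k.items():
    for pedestal in (10.0, 100.0, 1000.0):
        jnd = k * pedestal
        print(f"{domain}: pedestal {pedestal:g} -> jnd {jnd:g}, "
              f"stimulus at threshold {pedestal + jnd:g}")
```

For the 10 cd/m2 background of the example in the text, the loop reproduces the jnd of 0.79 cd/m2 and the threshold intensity of 10.79 cd/m2.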
To account for the lower-than-expected performance (that is, for the nonlinear increase in the jnd, contrary to the prediction of Weber's law) at the highest intensity levels of stimulation, Fechner suggested relating the jnd (therefore, the sensory magnitude) not to the stimulus intensity, but to its logarithm (Fechner's law):

ΔI = k·log I_bas.    (2.4)
FIG. 2.3 – The Weber constant k differs according to the sensory domain and, within a sensory domain, according to the perceptual task (for example, light intensity and line length).
Still, other types of stimulation follow the opposite trend. The sensory magnitude of electricity (current through the fingers), for example, tends to increase (and not decrease) faster than linearly as the intensity of the stimulation grows: the higher the intensity of the pedestal, the less additional current is required to obtain the jnd. To account for these different psychophysical behaviors, Stevens posited that the magnitude of the sensation is related not to the stimulus intensity (or the logarithm of the stimulus intensity) multiplied by a constant factor (as in Weber's and Fechner's laws), but to the stimulus intensity raised to some power and multiplied by a constant factor. This is Stevens' law or Stevens' power law (Stevens, 1957):

ψ(I) = K·I^a,    (2.5)

where ψ(I) is the perceived intensity of the stimulus (the sensory magnitude), I is the intensity of the stimulus, a is an exponent that depends on the type of stimulation, and K is a proportionality constant that depends on the unit used.

For a = 1, the sensory magnitude follows Weber's law: the magnitude of the sensation rises at a constant (directly proportional) rate as the stimulus intensity increases: it is the case, for example, of the perception of the length of a line. For a < 1, the magnitude of the sensation rises as the stimulus intensity increases not at a constant (directly proportional) rate (as stated by Weber's law) but at a lower rate (compressive function): it is the case of the perception of light spots 5 degrees wide. For a > 1, the magnitude of the sensation rises as the stimulus intensity increases not at a constant (directly proportional) rate but at a higher rate (accelerated function): it is the case of an electric shock (figure 2.4).

FIG. 2.4 – Stevens' law. Linear, compressive, and accelerated functions relating sensory magnitude and stimulus intensity. See text for explanation.

Since the jnd is an inverse function of the sensory magnitude, it can be derived from the graph that the jnd for the perception of the length of a line and for the perception of a spot of light 30 deg wide follows Weber's law; on the contrary, the jnd of a spot of light 5 deg wide increases faster than linearly as it is made brighter (compressive function), contrary to what is predicted by Weber's law (Thoss, 1986). Moreover, the jnd for electric stimulation increases more slowly than expected as the electric impulse is made stronger (accelerated function). In fact, as the electrical stimulation increases, the subject is more sensitive to it than expected by Weber's linear model. In table 2.2, the values of a for different sensory domains are reported.

TAB. 2.2 – The values of a in different sensory domains.

Sensory domain         a      Stimulation
Loudness               0.67   Sound pressure of 3000-Hz tone
Vibration              0.95   Amplitude of 60 Hz on finger
Vibration              0.6    Amplitude of 250 Hz on finger
Brightness             0.33   5° target in dark
Brightness             0.5    Point source
Brightness             0.5    Brief flash
Brightness             1      Point source briefly flashed
Lightness              1.2    Reflectance of gray papers
Visual line length     1      Projected line
Visual area            0.7    Projected square
Redness (saturation)   1.7    Red-gray mixture
Taste                  1.4    Salt
Taste                  0.8    Saccharine
Smell                  0.6    Heptane
Cold                   1      Metal contact on arm
Warmth                 1.6    Metal contact on arm
Pressure on palm       1.1    Static force on skin
Muscle force           1.7    Static contractions
Heaviness              1.45   Lifted weights
Electric shock         3.5    Current through fingers
Angular acceleration   1.4    5-s rotation
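The three regimes of Stevens' law are easy to explore numerically. The sketch below (illustrative only; K is arbitrarily set to 1) evaluates equation (2.5) with three exponents taken from table 2.2:

```python
import numpy as np

def stevens(intensity, a, K=1.0):
    """Stevens' power law, equation (2.5): psi(I) = K * I**a."""
    return K * np.power(intensity, a)

I = np.array([1.0, 2.0, 4.0, 8.0])
for label, a in [("brightness, 5 deg target (a = 0.33, compressive)", 0.33),
                 ("visual line length (a = 1, linear)", 1.0),
                 ("electric shock (a = 3.5, accelerated)", 3.5)]:
    print(label, np.round(stevens(I, a), 2))
```

Doubling the intensity multiplies the sensory magnitude by 2^a: barely 1.26-fold for the compressive brightness exponent, but more than 11-fold for electric shock.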
Chapter 3
The Psychometric Function

As explained in chapter 2, to measure a visual threshold means to single out the stimulus level that allows perceiving the target with a given frequency or, even better, probability: for example, 50% of the times. The simplest way to achieve this goal is to present the subject with n groups of n identical stimuli, so that the stimuli belonging to the same group have the same intensity and the stimuli belonging to different groups have different intensities (method of constant stimuli). Presumably, the observer will perceive none of the stimuli belonging to the lowest-intensity sets. As sets of higher intensities are presented,1 at least one of the stimuli will be seen. From this point on, the frequency of seeing the target increases rapidly as the intensity of the sets is made higher. It follows that the proportion of correct responses will rise from 0 to 10%, then 20%, then 30%, etc., until it reaches 50%, which can be selected as the threshold probability level (the target probability φ: in our example, φ = 50%). For higher values of intensity, the frequency of seeing the target increases further (60%, 75%, 90%), until the intensity level of the set is so high that the observer perceives all the targets (frequency of seeing the target: 100%). From this point on, an additional increase in intensity will produce no further effect, and the frequency (i.e., probability) of seeing the next (more intense) set of stimuli remains constant (100%).

The relationship that links the probability of perceiving a stimulus to its intensity is called the psychometric function (figure 3.1). In the visual domain, the psychometric function that describes the probability of seeing the stimulus as a function of its intensity at each point of the visual field2 is also known as the frequency-of-seeing curve (FOSC).
1 The order of the presentations is randomized.
2 The visual field is the extent of space that is perceived when the eye is directed straight ahead. The clinical assessment of the visual field is called perimetry.
FIG. 3.1 – Psychometric function. For low levels of intensity, the proportion of correct responses (i.e., the proportion of perceived stimuli) is virtually 0. As the intensity is increased (beyond the level 4 in this example), the probability to “hit” the target (correct response) rises until it reaches 50% at stimulus level S0.5 = 13. This is the threshold at a target probability φ of 0.5. An additional increase in intensity determines a further rise in the probability of perception until the proportion of correct responses is 100% at a stimulus intensity S1 = 24. From this level on, higher intensities do not lead to additional psychophysically measurable effects.3
3 Linear vs. logarithmic abscissa: note that, in the figure, the abscissa is on a linear scale. Still, the signal strength is generally reported on a logarithmic scale. This, indeed, is preferable, especially if the range of stimulations is wide. Moreover, the intensity level expressed logarithmically makes the increase in the stimulus strength (step size) proportional to the actual level of intensity across the whole range of stimulations. In fact, on a linear scale, increasing by 1 unit the intensity of a stimulus of intensity 1 means doubling its intensity (from 1 to 2 units: a 100% increase), whereas raising by the same amount (1 unit) a stimulus of intensity 10 makes it more intense by just 10% (from 10 to 11 units), and increasing by the same amount (1 unit) a stimulus of intensity 100 makes it more intense by just 1% (from 100 to 101 units). In other words, increasing the intensity by the same amount across the whole spectrum of stimulation does not lead to increments in the signal strength proportional to the actual signal level: on a linear scale, the relative size of the increment becomes smaller and smaller as the level of intensity increases (a rise of 10 units increases the signal strength tenfold from 0 to 10 units, but just twofold from 10 to 20 units, and even less, i.e., just by 10%, from 100 to 110 units, and by 1% from 1000 to 1010 units). On the contrary, on a logarithmic scale, the increment remains proportional to the actual stimulus level. Starting from level 0, increasing the signal level by 10 dB equals raising the intensity by 10 linear units (in fact, 10·log10 10 = 10). Increasing the intensity of the stimulus by the same amount (i.e., by 10 dB) from 10 to 20 dB equals raising the intensity much more, that is, by 90 linear units (from 10 to 100 linear units; in fact, 10·log10 100 = 20). Likewise, increasing the intensity of the stimulus by the same step size (10 dB) from 20 to 30 dB equals raising the intensity by 900 linear units (from 100 to 1000 linear units; in fact, 10·log10 1000 = 30). In sum, the step size expressed on a logarithmic scale widens more and more as the level of the stimulus is made higher, so that its width relative to the level of the signal remains constant.
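The decibel arithmetic of this footnote can be verified in a couple of lines; this is a generic check, not code from the book:

```python
import math

def to_db(linear):
    """Express a linear intensity on the decibel scale: 10 * log10(I)."""
    return 10 * math.log10(linear)

def to_linear(db):
    """Invert the decibel expression back to linear units."""
    return 10 ** (db / 10)

for linear in (10, 100, 1000):
    print(f"{linear:>4} linear units = {to_db(linear):.0f} dB")

# Equal 10 dB steps correspond to tenfold jumps in linear intensity:
print(to_linear(10), to_linear(20), to_linear(30))  # 10.0, 100.0, 1000.0
```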
The psychometric function, therefore, defines the probability of perceiving ("hitting") a stimulus (signal), or a difference between two or more stimuli, as a function of the level of a variable of the signal (brightness, size, contrast, speed, length, etc.). The psychometric function is obtained by plotting the proportion of binary4 outcomes versus the stimulus level. The psychometric function is similar to a cumulative distribution function derived from the binomial5 distribution of the responses. In psychophysics, a cumulative distribution function describes how the probability of perceiving the stimulus cumulates (from the left to the right of the curve, i.e., from the lowest to the highest levels of stimulation), and therefore rises, as the level of stimulation increases (see figure 3.1).

Being similar to a cumulative distribution function,6 the psychometric function has the shape of a sigmoid with a lower and an upper asymptote, and a slope between them. The shape of the sigmoid (e.g., symmetrical or asymmetrical) can be described by different types of distribution: for example, the logistic distribution (symmetrical) differs from the Weibull distribution (asymmetrical)7 (figure 3.2).
4 Binary response: a binary response is a response to a stimulus that considers two mutually exclusive outcomes: hit or miss (correct or incorrect, seen or not seen).
5 Binomial distribution: a binomial distribution is the probability distribution of binary responses in a sequence of statistically independent trials. A sequence is statistically independent when it reflects a stationary process. In a stationary process, the response given at each subsequent trial should not be influenced by subjective or environmental changes during the examination: in other terms, the probability should not be affected by a shift in time. This condition is not strictly met in psychophysics: due to adaptation phenomena, fatigue or attention drops, for example, incorrect (or "miss") responses can occur more frequently at the end of the examination compared to the first stages. In turn, the rate of correct responses may be higher as the test progresses, owing to a learning effect.
6 Difference between the cumulative distribution function and the psychometric function: as reported by Treutwein (1995), in hindsight, the psychometric function differs from a cumulative distribution function inasmuch as in the latter the asymptotic limits correspond to 0 and 100% probability, whereas in the psychometric function the lower asymptote may correspond to values different from zero as a consequence of false-positive errors (hit responses to non-existent stimuli or to stimuli with an intensity so low as to make them unperceivable by the observer: guess rate). In turn, the upper asymptote is often set to a probability level lower than 100% for the possible occurrence of false-negative errors (incorrect or no responses to stimuli that should be easily perceived by the observer: lapse rate). In other terms, for extremely low stimulus intensities (close or equal to zero), the expected probability of perceiving the target should be zero (p = 0), whereas for very high intensities the probability of perceiving the target should be 100% (p = 1). If that were the case, the psychometric function would be a true cumulative distribution function with asymptotes 0 and 1. And yet, the possible occurrence of false-positive responses makes the lower limit of the distribution function a little higher than 0. Likewise, the possibility of false-negative responses makes the upper asymptote of the distribution a little lower than 1.
7 Typical cumulative distributions suitable to fit psychometric functions are the Weibull distribution, the normal cumulative distribution, the logistic distribution, or the Gumbel distribution. See appendix I for the definition of the logistic and the Weibull distribution. From Treutwein, 1995. Reproduced with permission of Elsevier via Copyright Clearance Center.
FIG. 3.2 – Difference between the logistic (left) and Weibull (right) cumulative distribution. Note the upper asymptote.
Which model to adopt is not substantial: the estimate of threshold and slope is roughly the same irrespective of the shape of the psychometric distribution pre-selected by the experimenter8 (Harvey, 1986; Wichmann & Hill, 2001b). In other terms, there is no function (logistic, normal cumulative, Weibull, ...) more appropriate than another: from time to time, the distribution that best fits the data obtained in the experiment can be adopted (Strasburger, 2001, 2001b).

The position on the x-axis and the shape of the psychometric function are described by four parameters:

– A position parameter α across the intensity scale: α is the threshold; it is placed about the midpoint of the slope of the sigmoid and localizes the psychometric function along the x-axis (signal intensity9: figure 3.3).
– A dispersion parameter β of the observations. The dispersion parameter β refers to the slope of the psychometric function, generally estimated below and above the threshold level. As recalled by Leek (2001), the slope describes "how rapidly performance changes with a given change in stimulus value". A steep slope means that below and above the threshold the observations are concentrated within a narrower interval of intensities; a flatter slope means that below and above the threshold the observations are dispersed over a wider interval of intensities (figure 3.4). In sum, the slope of the psychometric function expresses the width of the transitional range from no perception to perception, while the threshold is the absolute position of this transition (Kontsevich & Tyler, 1999).

8 However, sometimes the shape of the curve matters with regard to the computation of the threshold: in the Weibull distribution, which is asymmetrical, the target probability for the threshold must be placed at a higher level compared to the normal or logistic function. See appendix I.
9 Stimulus strength X can be expressed in normalized threshold units: Xnorm = X/α. This way, the stimulus intensity Xnorm is independent of the scale of the intensity (abscissa) (Klein, 2001). So, if α is 1.2 log units, a 2 log units stimulus is Xnorm = 2/1.2 = 1.66 normalized threshold units, whereas for a 1.2 log units stimulus Xnorm = 1.2/1.2 = 1 normalized threshold unit.

FIG. 3.3 – The threshold, in this case corresponding to a target probability φ = 0.5 (a 50% probability of perceiving the stimulus), is set about the midpoint of the slope of the sigmoid and localizes the psychometric function along the axis of the stimulus level (signal strength): higher thresholds shift the function rightward (i.e., toward higher intensities), lower thresholds leftward (i.e., toward lower intensities). From Treutwein, 1995. Reproduced with permission of Elsevier via Copyright Clearance Center.
FIG. 3.4 – The slope β of the psychometric function expresses the amount of dispersion of the observations as a function of the stimulus intensity above and below the threshold. A steep slope means that above and below the threshold the observations are concentrated within a narrow range of stimulus intensities. A flatter slope means that above and below the threshold the observations are scattered across a wider range of stimulus intensities.
As an alternative definition, the slope of a psychometric function is "the extent over which it displays non-asymptotic behavior, measured in whichever units are relevant" (Garcia-Pérez, 1998).

– The position of the lower asymptote along the y-axis (γ). The parameter γ represents the base rate of performance when no signal is presented or when its intensity is too low to make the stimulus perceivable. In other terms, γ is the guess rate (i.e., response by chance). The parameter γ localizes the position of the lower asymptote of the psychometric function on the ordinate scale, i.e., along the axis representing the proportion of correct responses. If a subject guesses 2% of the non-perceivable stimuli presented during the test, the position of the lower asymptote will be at the 0 + 0.02 = 0.02 probability level (2%), whereas if no guesses are made, the position of the lower asymptote will be at the 0% probability level (figure 3.5).10
FIG. 3.5 – Representation of how γ (guess rate, false-positive errors) and λ (lapse rate, false-negative errors) determine the position of the lower and upper asymptote of the psychometric function in a yes/no response model (see section 5.1). As explained, due to the occurrence of guesses and lapses, the psychometric function differs from a real cumulative distribution function. The upper asymptote, in fact, does not represent 100% (p = 1) of correct responses as expected in a cumulative distribution function but, for the occurrence of lapses (in this case 0.02), it is placed at a lower level. The upper asymptote is therefore placed at 1–0.02=0.98, i.e., 98% of correct responses (instead of 100%). In turn, the lower asymptote does not represent 0% of correct responses as expected in a cumulative distribution function, but, for the occurrence of a proportion of guesses (in this case 0.02), it is set at a higher level. The lower asymptote is therefore placed at 2% of correct responses (instead of zero percent).
10 To anticipate what will be explained further on (sections 5.1 and 5.2), in a yes/no response model (where the observer is required to answer "yes" whenever he/she perceives a spot of light), γ (the proportion of correct responses to non-perceivable stimuli) is expected to be close to 0. In a 2-alternative forced choice (2AFC) model, two presentations are given, and the interval containing the variable of interest must be selected. In this case, since the observer is forced to guess (chance level) when he is not able to choose the correct interval, the probability of guessing correctly (γ) is 50%.
– The position of the upper asymptote along the y-axis (1−λ). This is provided by λ, the probability of missing when the signal is expected to be fully visible: in other terms, λ is the lapse rate (false-negative errors). The complement of λ (1−λ) localizes the position of the upper asymptote of the psychometric function on the ordinate scale. If a subject misses 2% of the stimuli expected to be seen for sure, the position of the upper asymptote will be at the 1 − 0.02 = 0.98 probability level (98%), whereas if no lapses are made (λ = 0), the position of the upper asymptote will match the 100% probability level (1 − 0 = 1) (see figure 3.5).

The four parameters describing the psychometric function are reported in figure 3.6. In conclusion, the psychometric function is described by the four parameters α, β, γ, and λ. The threshold is defined by a given level of performance (for example, 50% in a yes/no response model or 75% in a 2AFC response model) or, within an analytic framework, it is a point (α) halfway up the slope of the psychometric function. The projection of this point onto the x-scale (stimulus intensity or signal strength) gives the position of the curve. The slope is the dispersion parameter β that expresses the rate of change in the correct-response probability (or the proportion of perceived stimuli, i.e., the hit rate) as a function of the signal level. A steep slope means that small changes in the signal level lead to great changes in the probability of correct responses. The distribution is bounded by a ceiling given by the lapse rate λ and a floor resulting from the guess rate γ.11
FIG. 3.6 – The four parameters that characterize the psychometric function (see text for explanation).
11 An intriguing aspect is that the response function of single neurons (firing rate vs. signal strength in motion discrimination) resembles a psychometric function. See in this respect: Britten KH, Shadlen MN, Newsome WT, Movshon JA (1992). The analysis of visual motion: A comparison of neuronal and psychophysical performance. Journal of Neuroscience, 12: 4745–4765.
The psychometric function, which expresses the probability of answering correctly as a function of the signal strength x, can be formalized as:

ψ(x) = p(1; x, θ) = (1 − λ)·F(x) + γ·[1 − F(x)],    (3.1)

where λ is the lapse rate, γ is the guess rate, and F(x) is a cumulative distribution function that describes the probability of a psychophysical outcome at stimulus intensity x. The cumulative distribution function can be a normal, logistic, or Weibull function (or another function), with a given slope β (or its inverse, the spread σ). As reported by the first equivalence, the psychometric function expresses the cumulative probability p of a correct response (= 1) to a stimulus of intensity x given threshold θ. More precisely, as recalled by Garcia-Pérez (1998), the equation "expresses the probability of a correct response as the sum of probabilities of detecting the pattern and not lapsing (first summand) and not detecting but guessing correctly (second summand)."12

An alternative way to formalize the psychometric function is:

ψ(x; θ, σ, λ, γ) = γ + (1 − λ − γ)·F(x; θ, σ),    (3.2)

where θ is the threshold (that is, the location of the function on the abscissa), σ is the spread (the inverse slope of the function), and λ and γ are the lapse and guess rate, respectively. Whatever the formula, assigning zero value to λ and γ transforms the psychometric function into a true cumulative distribution function:

ψ(x; θ, σ) = F(x; θ, σ).    (3.3)

12 Garcia-Pérez, 1998, p. 1865.
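Equation (3.2) translates directly into code. The sketch below is an illustrative implementation (with a logistic choice for F and invented parameter values, echoing figure 3.1), not the author's software:

```python
import numpy as np

def logistic_cdf(x, theta, sigma):
    """Logistic distribution function F(x; theta, sigma), with location
    theta (the threshold) and spread sigma (the inverse slope)."""
    return 1.0 / (1.0 + np.exp(-(x - theta) / sigma))

def psychometric(x, theta, sigma, lapse, guess):
    """Equation (3.2): psi(x) = gamma + (1 - lambda - gamma) * F(x; theta, sigma)."""
    return guess + (1.0 - lapse - guess) * logistic_cdf(x, theta, sigma)

x = np.linspace(0, 26, 7)
# Invented parameters: threshold 13, spread 2, 2% lapses, 2% guesses.
print(np.round(psychometric(x, theta=13, sigma=2, lapse=0.02, guess=0.02), 3))
```

Setting lapse = guess = 0 reduces ψ to F itself, i.e., to the true cumulative distribution function of equation (3.3).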
Chapter 4
Detection and Discrimination Threshold

So far, we have defined a visual threshold as the limit between "seeing and not seeing" the stimulus. But what does "to see" mean in a visual psychophysical experiment? Basically, the psychophysics of vision deals with two kinds of task and corresponding thresholds:

– Detection task, if the goal is to measure the ability to detect the transition from a state of no stimulation to a state of stimulation: this takes place, for example, when a subject is asked to report the sudden onset of a spot of light (as in perimetry), the presence of an oriented sine wave grating (as in contrast sensitivity testing), the onset of a minuscule, barely visible stimulus on a uniform background, or, finally, to make out a cluster of coherently moving dots embedded in a field of randomly moving dots (coherent motion perception: figure 4.1). In a detection paradigm, the level of the null stimulus (i.e., the absence of the target, the state of no stimulation) is set at the zero value on the x-axis for both absolute and difference thresholds: when measuring an absolute threshold, the intensity of the null target is zero, whereas when measuring a difference threshold (i.e., the difference between background and stimulus), the intensity of the null target is the same as the background (the pedestal): at this intensity, in fact, the target is undetectable.

– Discrimination task, if the goal is to measure the ability to discriminate between one state of stimulation and another state of stimulation: that is, to recognize the change in a given variable of the stimulus, or to judge whether the target is equal to or different from a reference configuration. A discrimination threshold, for example, is estimated when the observer is asked to judge whether two lines have a different orientation (orientation discrimination), whether two elliptical targets have different eccentricity, whether two sine wave gratings have different contrast, or to judge which of n lines is longer compared to a reference line. In these cases, it is not a question of detecting the onset of a stimulus but of comparing a given variable of a stimulus, or of a set of stimuli, so as to establish whether they are the same or differ (figure 4.2).
FIG. 4.1 – Detection threshold. The task is to detect the onset of the stimulus, i.e., to report the difference between its absence and its presence or, even better, the transition from a state of no stimulation to a state of stimulation.
FIG. 4.2 – Discrimination threshold. In the reported cases, the task is to detect the difference in length, orientation or contrast of the target (left) compared to the reference (right). The reference and the test(s) can be presented sequentially or at the same time. In a different modality, the reference may not be displayed, but simply described before the examination starts: the observer, for example, is asked to judge whether a bar is oriented clockwise or counterclockwise compared to an implicit (not displayed) reference bar (orientation discrimination).

In sum, in a detection task, the goal is to detect the presence of a stimulus; in a discrimination task, the goal is to detect the presence of a difference between two (or more) stimuli. In a detection task, the null stimulus is zero (in the case of absolute thresholds) or its intensity is the same as the background (in the case of difference thresholds). In a discrimination task, the null stimulus matches the value of the reference stimulus, and the signal can have either a negative or a positive value compared to the reference stimulus: for instance, a test line can be longer (positive value) or shorter (negative value) than the reference line, or a sine wave grating (test) can be displayed with a contrast higher (positive value) or lower (negative value) than the reference grating (figure 4.3).

FIG. 4.3 – Upper panel: stimulation range for a detection task (task: estimating difference luminance). The value goes from 0 to positive, where 0 refers to no stimulation (in this case target intensity = background intensity = null stimulus). Lower panel: stimulation range for three discrimination tasks (estimating the difference in line orientation, the contrast of a sine wave grating, or the length of a segment compared to a reference). The reference is the null stimulus (value: zero, the middle target of the row). The test stimulus can be tilted clockwise or anticlockwise (case 1: line orientation), it can have higher or lower contrast (case 2: contrast of a grating), or it can be longer or shorter (case 3: line length) compared to the reference. Note that in these three cases the stimulation varies from negative to positive values.

Both in detection and discrimination tasks, the threshold is computed for a single variable of the stimulus (for example, the luminance of a spot or the contrast of a sine wave grating), whereas the other parameters of the signal (for example, the size of the spot or the spatial frequency of the grating, and in any case the presentation time) are kept constant.
Chapter 5
Response Models

To measure a detection or discrimination threshold, the observer is asked to perform a particular task, responding according to the instructions given by the operator. In essence, two response models can be used in a psychophysical experiment: yes/no (y/n) and alternative forced choice (AFC).
5.1 Yes/No Response Model [Y/N]
In a detection task, the subject is asked to respond "yes" whenever he/she perceives the target. The "yes" response marks the difference between the previous perceptive condition of no target ("null" stimulus) and the actual condition of presence of the target (figure 5.1). In a discrimination task, the observer is asked to respond "yes" if he/she judges that the two stimuli (the target and the reference) differ in the variable under investigation: for example, a "yes" response is expected when the contrast of a sine wave grating (the target) is judged higher or lower than the contrast of a reference grating. The yes response, therefore, marks the difference between the target and the reference (figure 5.2).
5.2 Alternative Forced Choice Model [AFC]
According to ASTM International, an alternative forced choice (AFC) test is a method "in which 2, 3, or more stimuli are presented, and assessors are given a criterion by which they are required to select one stimulus".1

AFC response models are particularly suitable for measuring discrimination thresholds: the observer is forced to judge a particular variable of the target by choosing between two or more alternatives.

1 ASTM International. 2009. Standard Terminology Relating to Sensory Evaluations of Materials and Products, E253-09a. ASTM International, West Conshohocken, PA.
FIG. 5.1 – Y/n response model, detection task. In this case, light spots of variable luminance are presented. At each presentation, the observer is required to respond "yes" (for instance, by pressing a button) if he/she perceives the stimulus. The stimulus level that corresponds to 50% of correct responses (perceived targets) is generally considered the detection threshold. In this example, a difference threshold is measured, because the light spot is displayed on a background of non-zero luminance.
FIG. 5.2 – Y/n response model, discrimination task. In this example, two sine wave gratings are presented: the reference, displayed at a constant contrast level, and a test grating, displayed at a contrast level that changes trial after trial. The observer is asked to respond "yes" whenever a difference in the contrast of the two stimuli is perceived. This is a discrimination task because the aim is to discriminate between two levels of contrast. The discrimination threshold referred to contrast sensitivity is the just noticeable difference (jnd) of contrast between the two stimuli (i.e., the difference of contrast needed by the observer to realize, 50% of the time, that the contrast of the test grating is different from the contrast of the reference). By contrast, a detection task would only require reporting the presence of a grating of variable contrast whenever it is displayed.
Stimuli can be presented during the same interval (simultaneously) or sequentially: we will refer to these as spatial and temporal AFC, respectively. In both cases, the AFC paradigm presents n alternatives, one of which contains the criterion of interest. For example, in a discrimination task the observer is forced to judge which of three presentations is a line tilted clockwise (3AFC) (figure 5.3).
FIG. 5.3 – AFC response model, discrimination task. Upper panel: spatial 3AFC. At each trial, the observer is asked to choose which spatial interval (left, middle or right) contains the clockwise-tilted line. Lower panel: temporal 3AFC. The observer is required to choose which of the three temporal intervals displayed sequentially (first, second, third) contains the clockwise-tilted line. The example is a discrimination task because the task is to compare a variable (in this case, orientation) of the displayed targets (lines) with the reference variable (vertical line).
Here is a second example: the subject is forced to judge which of the presented stimuli is a circle (figure 5.4). A reference stimulus can also be displayed in addition to the n alternatives, and the observer is forced to judge which alternative differs from the reference.
FIG. 5.4 – AFC response model, discrimination task. Upper panel: spatial 4AFC. At each trial, the observer is asked to choose which interval (1, 2, 3, 4) is a circle. Lower panel: temporal 4AFC. The observer is asked to choose which of the four sequential intervals (first, second, third, or fourth) is a circle. The example is a discrimination task because the task is to compare a variable (in this case the eccentricity) of the displayed target with the reference variable (circle = zero eccentricity).
FIG. 5.5 – Temporal 3-AFC response model, discrimination task (contrast sensitivity). A reference grating followed by three alternatives is presented. At each trial, the observer is forced to judge which interval (a, b, c) differs from the reference.
For example, in a 3AFC contrast discrimination task, three options are presented side by side or sequentially, together with the reference, and the subject is forced to choose which interval matches or differs from the reference. An example is depicted in figure 5.5.
5.2.1 The "Implicit" AFC Version
The examples in the previous sections refer to the classical forced choice response models: the alternatives are n stimuli presented according to a spatial or temporal paradigm, and the task is to single out the stimulus that contains the criterion of interest. Another type of AFC, which we will call the implicit AFC version,2 may be assumed. Unlike the standard AFC version, in the implicit AFC version only one stimulus is displayed, alone or together with a reference, and the subject is asked to choose between n possible alternatives.3 In this case, the alternatives are not presented but described verbally by the operator before the examination starts. According to the number of "implicit" alternatives, the response model can be 2AFC, 3AFC, 4AFC or n-AFC.4 Both discrimination and detection thresholds can be measured by adopting implicit AFC response models.

i - Implicit AFC Response Model to Estimate Discrimination Thresholds. AFC in its implicit version can be used to estimate discrimination thresholds between a target stimulus and some (implicit) alternatives. For instance, in an implicit 3AFC, the observer is told that a horizontal ellipse, a vertical ellipse or a circle will be displayed on a PC monitor (three alternatives). Each time the stimulus is presented, the subject has to judge which of the three alternatives matches the target (Aleci et al., 2010) (figure 5.6).

2 Even if our description of the "implicit AFC version" does not exactly satisfy the definition of AFC according to ASTM International, we believe it can be considered a type of AFC, because the only difference is that the n alternatives are not presented, but reported verbally.
3 AFC response models (in their implicit version) were first introduced in 1858 by Bergmann, who used oriented gratings to estimate visual acuity. The task was to judge the orientation of the bars of the stimulus.
4 An example of a 10AFC of this type has been reported by Strasburger (2001).
FIG. 5.6 – Example of a 3AFC, implicit version. Estimate of the discrimination threshold referred to spatial relationship perception (Aleci et al., 2010). At each trial, a circular, horizontal-elliptical or vertical-elliptical target is displayed. Trial after trial, the observer is forced to choose the correct answer among three alternatives: vertical ellipse, horizontal ellipse or circle.
If a reference is presented in addition to the stimulus to be judged, the two configurations can be displayed simultaneously (spatial paradigm) or sequentially (temporal paradigm). In every case, again, the possible alternatives are not displayed. For example, a test and a reference line are presented to measure orientation discrimination (figure 5.7). The observer is forced to judge if the test line is oriented toward the left or the right compared to the reference: “leftward-oriented” or “rightward-oriented” are, therefore, the two implicit (not displayed) alternatives. Another example: A reference and a test line are presented and the observer is forced to choose if the latter is shorter, equal or longer compared to the former (implicit 3AFC with reference: figure 5.8). The cases described so far aim at measuring discrimination thresholds. In theory, implicit AFC response models can measure detection thresholds as well.
FIG. 5.7 – Example of an implicit 2AFC with reference. Estimate of the discrimination threshold referred to line orientation: at every trial two lines, the reference and a test stimulus, are presented, oriented clockwise or counterclockwise. The observer is forced to report the orientation of the test line compared to the reference, choosing between two (implicit) alternatives: clockwise or anticlockwise. (a) Temporal paradigm (first interval: reference; second interval: test stimulus); (b) spatial paradigm. Compare with figure 5.3 (3AFC, standard version).
FIG. 5.8 – Example of an implicit 3AFC with reference. Estimate of the discrimination threshold referred to line length: at every trial two lines of equal or different length, the reference and a test stimulus, are presented. The observer is forced to choose if the test line is longer, shorter or equal (three implicit alternatives) compared to the reference. (a) Temporal paradigm (first interval: reference; second interval: test stimulus); (b) spatial paradigm.
ii - Implicit AFC Response Model to Estimate Detection Thresholds. Let us assume an experiment that aims to measure the contrast detection threshold. The subject is told that at each trial a sine wave grating of variable contrast, made up of bars oriented leftward or rightward, will be presented, and he/she is asked to judge the orientation of the bars of the grating. This is an example of an implicit 2AFC applied to a detection task: to identify the orientation of the bars, the observer has to detect the bars. If the contrast of the bars is too low, the bars will not be detected: the subject will not perceive any grating and will be forced to guess its orientation. In contrast to the detection threshold measured with the y/n response model, reporting the presence of the grating (a "yes" response) is not enough in this case: the subject must demonstrate that he/she has perceived the stimulus by choosing the correct (implicit) alternative referred to its orientation (figure 5.9).
FIG. 5.9 – Example of implicit 2AFC for contrast detection threshold. Temporal paradigm. Trial after trial, a sine wave grating with a given contrast is displayed. The grating can be oriented leftward or rightward. The observer is forced to select the correct orientation.
5.3 Target Probability in Y/N and AFC Response Models
Both y/n and AFC models deal with binary outcomes, so that at each trial the response can be a "miss" (not correct) or a "hit" (correct).

In a y/n model, the proportion of correct responses that characterizes the psychometric function ranges (ideally) from 0% (lower asymptote of the psychometric function: no detection or discrimination at all at the lowest signal level) to virtually 100% (upper asymptote: constant detection or discrimination at the highest signal level): in this case, the threshold is generally set at 50% of correct responses (the midpoint of the slope of the psychometric function). Because the goal of a psychophysical test is to identify the level of intensity that corresponds to a given probability of answering correctly (in this case 50%), this probability is called the target probability (φ). In sum, the threshold in a y/n response model is the signal intensity that (generally) corresponds to the 50% probability of correct responses (φ = 0.5) on the psychometric function.

In the 2AFC response model, the proportion of correct responses that characterizes the psychometric function ranges from 50% to 100%: the 50% level is the probability of guessing which one of the two presentations is correct (chance performance). It follows that in a 2AFC paradigm the threshold is set at a probability of correct responses halfway between chance (50% probability of a correct response, i.e., the guessing rate, corresponding to the lower asymptote of the psychometric function) and constant correct response (100% probability of a correct response, corresponding to the upper asymptote). The threshold, evidently, cannot be set at φ = 0.5 (i.e., 50% of correct responses), but must be placed at a higher level of probability, say 75%: in fact, φ = 0.75 is the midpoint of the slope of a 2AFC psychometric function. In a 3AFC, the chance performance is lower, namely 33.3%: since the psychometric function ranges from 33% to virtually 100% of correct responses, φ is generally selected at 0.66.5

It is clear, now, that the target probability depends on the type of response model. Besides, as explained further on, many adaptive psychophysical procedures converge on a specific target probability; so, when a psychophysical procedure is associated with a response model, attention must be paid to the fact that the φ on which the psychophysical procedure converges cannot be lower than the φ required by the response model: for example, 2AFC response models, which require φ ≥ 0.75, cannot be associated with psychophysical procedures targeting the threshold at φ = 0.5.
5 In AFC response models, chance performance (that is to say, the probability of guessing right) is the reciprocal of the number of possible alternatives n: 2AFC = 1/2 = 0.5 (50%); 3AFC = 1/3 = 0.33 (33%); 4AFC = 1/4 = 0.25 (25%), etc. It follows that the midpoint of the psychometric function in n-AFC response models will be: [(100% − chance performance %)/2] + chance performance %.
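Footnote 5's rule can be checked in a couple of lines; a minimal Python sketch (ours, not the author's):

    # Chance level and midpoint target probability for an n-AFC design,
    # following footnote 5 (values expressed as proportions).

    def chance_level(n_alternatives):
        """Probability of guessing right: the reciprocal of n."""
        return 1.0 / n_alternatives

    def midpoint_target_probability(n_alternatives):
        """[(100% - chance %)/2] + chance %, as a proportion."""
        gamma = chance_level(n_alternatives)
        return (1.0 - gamma) / 2.0 + gamma

    for n in (2, 3, 4):
        print(f"{n}AFC: chance = {chance_level(n):.3f}, "
              f"target = {midpoint_target_probability(n):.3f}")
    # 2AFC: 0.500 / 0.750; 3AFC: 0.333 / 0.667 (rounded to 0.66 in the text);
    # 4AFC: 0.250 / 0.625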
5.4 Alternative Unforced Choice Response Model (AUC: Kaernbach, 2001)
In the adaptive procedures described in the next sections, the next stimulus is presented at a level of intensity that depends on the previous answer: at a higher intensity (i.e., easier to detect or discriminate) in case of a miss response, and at a lower intensity (i.e., more difficult to detect or discriminate) in case of a hit response. The alternative unforced choice response model (AUC: Kaernbach, 2001) has been conceived to address the problem that arises when, during a 2AFC paired with an adaptive procedure, a sequence of correct ("lucky") guesses occurs. After a false-positive sequence, some stimuli will be presented at an intensity too low compared to the threshold of the observer: these presentations are thereby useless. In these cases, recovering the adequate testing level (close to the threshold) requires additional trials, prolonging the examination time. Besides, if the test is made of a small number of trials and the guesses occur at the beginning of the run,6 the final estimate of the threshold will suffer from a low level of confidence.

The AUC differs from the AFC inasmuch as "don't know" responses are admitted when the observer is undecided about what to answer (low-confidence responses). This way, the subject is no longer forced to guess when his/her sensitivity is too low compared to the level of the stimulus, but can answer "I don't know", minimizing lucky guesses. To set the intensity of the next stimuli after a "don't know" response,7 the author suggests using the following formula:

$$\delta_{\text{don't know}} = \frac{1}{a}\,\delta_{\text{hit}} + \frac{a - 1}{a}\,\delta_{\text{miss}}, \qquad (5.1)$$
where δ is the step size and a is the number of alternatives. For example, in the weighted up-down staircase method,8 the signal level is decreased by 1δ after a hit response (δ_hit = 1) and increased by 3δ after a miss response (δ_miss = 3) when convergence on φ = 0.75 is required. According to equation 5.1, if this weighted staircase (targeting φ = 0.75) were associated with the 2AUC response model, δ_don't know would be 2:

$$\delta_{\text{don't know}} = \frac{1}{2}\cdot 1 + \frac{2 - 1}{2}\cdot 3 = 0.5 + 1.5 = 2.$$

In summary, when the weighted up-down staircase targeting φ = 0.75 is associated with the 2AUC response model, the stimulus level is reduced by 1 step size after a hit response, increased by 3 step sizes after a miss response, and increased by 2 step sizes after a "don't know" response.
6 In an adaptive procedure, a run is a sequence of presentations in the same direction (incremental or decremental).
7 The amount of signal to be added or subtracted after a hit, miss, or (in this case) "don't know" response is called the step size.
8 See section 9.5.
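A minimal Python sketch of equation 5.1 (the function name is ours), reproducing the worked example:

    # Step size after a "don't know" response in an a-alternative unforced
    # choice (AUC) model, per equation 5.1 (Kaernbach, 2001).

    def dont_know_step(delta_hit, delta_miss, a):
        """delta_dk = (1/a)*delta_hit + ((a - 1)/a)*delta_miss."""
        return delta_hit / a + (a - 1) / a * delta_miss

    # Worked example: weighted staircase targeting phi = 0.75 paired with
    # a 2AUC model (delta_hit = 1, delta_miss = 3).
    print(dont_know_step(delta_hit=1, delta_miss=3, a=2))  # -> 2.0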
Kaernbach compared simulated sessions of 2AUC and 2AFC weighted staircases, and of a yes/no simple staircase.9 The 2AUC weighted staircase proved to be slightly more efficient than the 2AFC weighted staircase, and both were more efficient than the y/n simple staircase.10 According to the author, the AUC response model is particularly suitable for naïve or inexperienced subjects and for the clinical setting (Kaernbach, 2001): in fact, AUC is not only more efficient but also more comfortable than the 2AFC response design. These advantages are preserved even when using n > 2 alternatives. A potential drawback of the AUC is the introduction of a subjective, internal criterion to decide whether to choose an alternative or to refrain from choosing and admit "don't know". However, according to the author, the bias generated by this internal criterion is lower than the bias generated by the yes–no response model, and overall marginal.11
5.5 Modified AUC (Klein, 2001)
To further improve the AUC, Klein (2001) devised a modified version. It differs in that the answers admitted for uncertain trials are three: (1) "I do not know which is the correct answer (or choice)"; (2) "probably I am right (that is, my choice is probably correct), but I am not so sure"; (3) "probably I am wrong (that is, my choice is probably incorrect), but I am not so sure". The observer can thus choose among four possible answers:

– Surely hit (correct) answer: the observer is sure to have selected the correct alternative: high confidence for a correct response (HC).
– Surely miss (incorrect) answer (i.e., "I do not know which is the correct choice"): the observer is sure to be unable to select the correct alternative: high confidence for an incorrect response (HI).
– Probably hit (correct) answer: the observer believes he/she has selected the correct alternative, but is not 100% sure to be right: low confidence for a correct response (LC).
– Probably miss (incorrect) answer: the observer suspects he/she has not selected the correct alternative, but is not 100% sure to be wrong: low confidence for an incorrect response (LI).

The stimulus level varies according to the following rule:

– For a sure hit (surely correct, HC): move 1 level less intense (i.e., −1).
9 See section 9.1.
10 In this experiment, the task was not visual but auditory.
11 See Kaernbach (2001), pp. 1379–1380, for an exhaustive explanation of this issue.
– For a sure miss (surely incorrect, HI): move 5 levels more intense (i.e., +5).
– For a probable hit (probably correct response, LC): move 1 level more intense (i.e., +1).
– For a probable miss (probably incorrect response, LI): move 3 levels more intense (i.e., +3).

Or, alternatively: HC → −1, HI → +5, LC → 0, LI → +4 (Klein, 2001).

In summary, in its original (Kaernbach, 2001) and modified (Klein, 2001) versions, the AUC response model helps avoid guesses by allowing the observer to answer "undecided" or to admit a low level of confidence. This addresses the problem of sequential guesses at low intensities, which divert the testing level away from the region of the expected threshold: in such cases, in fact, exams made of a small number of trials (like those used in the clinical setting) may be unable to bring the testing level back to the threshold region in time.
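Klein's rule is easy to encode as a lookup table; a minimal Python sketch (the encoding is ours):

    # Signed level changes for Klein's (2001) modified AUC: positive values
    # make the next stimulus more intense.

    LEVEL_CHANGE = {"HC": -1, "HI": +5, "LC": +1, "LI": +3}

    # Klein's alternative scheme:
    ALT_LEVEL_CHANGE = {"HC": -1, "HI": +5, "LC": 0, "LI": +4}

    def next_level(level, response, rule=LEVEL_CHANGE):
        return level + rule[response]

    print(next_level(20, "LI"))  # -> 23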
Chapter 6
False-Negative and False-Positive Errors (Lapse and Guess Rate)

Two types of error characterize a psychophysical test: false-negative errors (FN, lapses) and false-positive errors (FP, false alarms or guess rate1). A lapse takes place when the observer misses ("no" in the y/n, or the wrong alternative in the n-AFC response model) a well perceivable signal. For example, if the observer does not respond to a stimulus of luminance 50 cd/m² after having detected a target of 10 cd/m², the miss response is probably a false-negative error. On the contrary, a false alarm takes place when the observer reports the occurrence of a stimulus that is too weak to be perceived or that has not been presented at all.

Lapses generally depend on attention drops or poor cooperation, and are regarded as a sign of poor test reliability. In fact, frequent false-negative errors decrease the slope of the psychometric function, so that the variability of the threshold estimate increases (Green, 1995). Yet, missing a stimulus can occur even in highly reliable observers as a result of the intrinsic fluctuation of sensitivity. The fluctuation of the threshold within a certain range is physiological: a rhythmic change in sensitivity has been demonstrated by Thoss and colleagues (1998) and by Lotze and associates (2000). When sensitivity is low, the fluctuation of the threshold tends to increase, as occurs in the scotomatous regions of the visual field. In this case, the short-term fluctuation of the threshold is above average, so that the sensitivity of the observer at time T2 can happen to be lower than at time T1: it follows that although a signal of low intensity was perceived at T1, the same signal, or even a signal of higher intensity, may unexpectedly not be perceived at T2. In these cases, false-negative responses depend on the fluctuation of sensitivity and not on the low reliability of the observer.
1 The term guess should be used to denote, in AFC designs, the responses guessed by the observer, who is forced to respond to non-perceived stimuli.
With the y/n response model, false-positive errors are a more indicative sign of unreliability than lapses: while an abnormal fluctuation of sensitivity may account for lapses in a reliable observer, there is no reason to respond to a stimulus that has not been perceived (even more so if it has not been presented).2

Errors, be they false-positive or false-negative, bias the estimate of the threshold. In a yes/no response model, the correction for the guess rate is provided by Abbott's formula:3

$$\psi_{adj}(v) = \frac{\psi(v) - F_p}{1 - F_p}, \qquad (6.1)$$
where:

– $\psi_{adj}(v)$ is the parameter of interest, that is, the adjusted (real) proportion (or probability) of correct (hit) responses for a given stimulus variable v. In other terms, $\psi_{adj}(v)$ is the true hit rate after accounting for the false-positive errors.
– $\psi(v)$ is the proportion of correct responses (hit rate) at the end of the experiment (from 0 to 100%).
– $F_p$ is the proportion of false-positive errors over the total amount of false-positive stimuli presented.

For example, let us suppose we have a hit rate ψ(v) = 80% (i.e., 0.8) in a y/n detection task: 80 stimuli out of 100 of the same intensity have been perceived. The proportion of false-positive errors $F_p$ is 10% (i.e., 0.1 = 1 out of 10 false-positive trials presented). According to equation 6.1:

$$\psi_{adj}(v) = \frac{0.8 - 0.1}{1 - 0.1} = \frac{0.7}{0.9} \approx 0.77.$$

According to Abbott's formula, then, the true hit rate $\psi_{adj}(v)$ for the detection threshold in the presence of 10% of false-positive errors is 77%: due to the guess rate (false-positive errors), the true hit rate is lower than the measured hit rate by 3%. In case no false-positive errors occur, $\psi_{adj}(v)$ coincides with $\psi(v)$:

$$\psi_{adj}(v) = \frac{0.8 - 0}{1 - 0} = \frac{0.8}{1} = \psi(v) = 0.8.$$
2 When lapses and false alarms occur during an exam of the visual field (perimetry), the former generate threshold estimates that are more biased in subjects with high sensitivity, whereas the latter yield measures that are more biased in subjects with low sensitivity (Anderson & Johnson, 2006). False responses tend to remain constant during the perimetric examination (Johnson et al., 1988), and their occurrence does not seem to change after feedback is given to the patient (Johnson & Nelson-Quigg, 1993).
3 Also called the probability-based correction for guessing. As explained by Klein (2001), an even better way to account for the false-positive errors is the z-score-based correction for bias derived from signal detection theory. See section 15.1.
An extension of this formula also includes the false-negative errors:

$$\psi_{adj}(v) = \frac{\psi(v) - F_p}{1 - F_p - F_n}. \qquad (6.2)$$
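A minimal Python sketch of equations 6.1 and 6.2 (the function name is ours):

    # Abbott's correction for the guess rate (eq. 6.1) and its extension
    # including false-negative errors (eq. 6.2).

    def abbott_adjusted_hit_rate(psi, fp, fn=0.0):
        """psi_adj = (psi - Fp) / (1 - Fp - Fn); with fn = 0 this is eq. 6.1."""
        return (psi - fp) / (1.0 - fp - fn)

    # Worked example from the text: psi = 0.8, Fp = 0.1.
    print(abbott_adjusted_hit_rate(0.8, 0.1))  # -> 0.777... (about 77%)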
In AFC response models, the guess rate is a normal occurrence below the threshold level, for the subject is forced to guess when he/she does not recognize the target. The guess rate depends on the number of alternatives n that characterizes the n-AFC paradigm:4 the position of the lower asymptote is thus a function of n.
4 See section 5.3.
Chapter 7
Psychophysical Procedures

The threshold, be it a detection or a discrimination threshold, can be estimated by asking the observer to perform a particular task, responding according to a pre-specified response model (y/n or n-AFC). It remains to state in which way the stimuli must be presented, in other terms which psychophysical procedure to adopt. Psychophysical procedures can be:

– Nonadaptive: the procedure does not comply with the psychophysical behavior of the observer; in other words, trial after trial, the stimuli are presented irrespective of the responses of the subject.
– Adaptive:1 the intensity of the next stimulus depends on the previous response(s), so that the procedure complies with the performance of the observer.

The nonadaptive psychophysical procedures are:

(1) The method of constant stimuli.
(2) The method of limits and its application for continuous or quasi-continuous stimulations in the domain of time: the method of adjustment.
7.1 Nonadaptive Psychophysical Procedures: The Method of Constant Stimuli
The method of constant stimuli is the most straightforward way to derive a psychometric function. The observer is presented with different sets made of an identical number of stimuli: stimuli belonging to a set have the same intensity, and different sets have different intensities. To avoid adaptation phenomena, presentations are randomized,2 and at each presentation the observer is required to detect or discriminate the target. At the end of the examination, the proportion of correct responses is sorted as a function of the intensity of the stimuli, from minimum to maximum. At very low intensities of the target, the observer is expected to miss all the stimuli (hit rate = 0%, or close to 0% in the presence of false-positive errors); the proportion of correct responses within each set increases with the strength of the signal, so that at the highest values all the targets of the set will be perceived (hit rate = 100%, or close to 100% in the presence of false-negative errors). The threshold is computed as the intensity of the set of stimuli showing a pre-defined proportion φ of hit responses.

Consider 6 sets of stimuli, each made of 8 identical spots of light (total: 48 targets), such that the stimuli belonging to set 1 are the dimmest and those belonging to set 6 are the brightest. The 48 stimuli are presented in randomized order, then the proportion of correct responses is plotted as a function of the signal strength. An example is shown in figure 7.1: no stimuli of the first set (the dimmest) are detected, in the second set just one stimulus is detected, in the third set two stimuli, in the fourth set four stimuli, and in the fifth set six stimuli are detected; finally, in the last set (the brightest) all the stimuli are detected. If we set the threshold at φ = 50%, in this example the threshold corresponds to the luminance of the stimuli belonging to the fourth set.

1 Also defined as titration methods (Rose et al., 1970).
2 The randomization refers to the variable under examination: luminance, contrast, orientation, speed, etc. The other parameters (e.g., size or presentation time) are kept constant.
FIG. 7.1 – The method of constant stimuli. 1, 2: six sets of presentations, each made up of 8 identical stimuli, are presented in randomized order; 3: the proportion of correct responses is plotted as a function of the stimulus intensity.
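A minimal Python simulation of this design, assuming the detection counts shown in figure 7.1 (the code and names are ours):

    # Six sets of eight identical stimuli presented in random order, then
    # hit proportions tallied per intensity level (counts as in fig. 7.1).

    import random

    hits_per_set = {1: 0, 2: 1, 3: 2, 4: 4, 5: 6, 6: 8}  # detected per set
    n_per_set = 8

    trials = [(level, i < hits_per_set[level])   # (intensity level, hit?)
              for level in hits_per_set for i in range(n_per_set)]
    random.shuffle(trials)                       # randomized presentation order

    proportions = {level: sum(hit for lvl, hit in trials if lvl == level) / n_per_set
                   for level in hits_per_set}
    print(proportions)  # {1: 0.0, 2: 0.125, 3: 0.25, 4: 0.5, 5: 0.75, 6: 1.0}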
Because the range of stimulation is a discrete (and not continuous) sequence, the target probability φ (e.g., 0.5) may not correspond exactly to one of the stimulus levels tested by the procedure. In a block of stimuli of a given intensity (e.g., intensity level 25), the hit rate may happen to be just below φ (e.g., 46%); in the next block of more intense stimuli (suppose intensity level 26), the hit rate may be above φ (suppose 52%). As explained by Ehrenstein and Ehrenstein (1999), the threshold can be interpolated using the following formula:

$$\theta = a + (b - a)\,\frac{50 - p_a}{p_b - p_a}, \qquad (7.1)$$

where $p_a$ and $p_b$ are the hit rates (in percent) that "bracket" the target probability (lower and upper level, respectively), and a, b are the corresponding intensities of the sets. In the example reported above, the threshold θ will be:

$$\theta = 25 + (26 - 25)\,\frac{50 - 46}{52 - 46} = 25 + \frac{4}{6} = 25 + 0.66 = 25.66.$$

In this example, the estimated threshold (target probability: 50%) lies at intensity level 25.66.
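A minimal Python sketch of equation 7.1 (the function name is ours), reproducing the example:

    # Linear interpolation of the threshold between two tested levels
    # (Ehrenstein and Ehrenstein, 1999), hit rates expressed in percent.

    def interpolate_threshold(a, b, pa, pb, target=50.0):
        """theta = a + (b - a) * (target - pa) / (pb - pa)."""
        return a + (b - a) * (target - pa) / (pb - pa)

    print(interpolate_threshold(a=25, b=26, pa=46, pb=52))  # -> 25.666...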
The method of constant stimuli takes into account all the responses given by the subject, and the level of the whole set of stimuli is fixed in advance: it is, therefore, the most straightforward tool to obtain the proportion of correct responses as a function of the signal level, that is to say, to plot the whole psychometric function (threshold and slope: figure 7.2). The only two parameters to set in advance are:

– The total number of trials (i.e., the number of presentations per set of stimuli times the number of sets). The number of presentations per set, i.e., the number of stimuli displayed at each level of intensity, is a variable that deserves consideration.
FIG. 7.2 – The method of constant stimuli is the most straightforward way to plot the psychometric function.
Treutwein (1995) recalled that the precision with which the percentage of correct responses is measured depends not only on the false-positive and false-negative rates (an undesirable variable that can be minimized by applying Abbott's formula3) but also on the number of trials in each set: the higher the number of trials per set n, the lower the variance4 of the proportion of correct responses $p_{hit}$, as shown by the formula:

$$\mathrm{Var}(p_{hit}) = \frac{p_{hit}\,(1 - p_{hit})}{n}. \qquad (7.2)$$
– The step between the sets, called grain by Watson and Fitzhugh (1990). For grains too small (e.g., differences in intensity of 1 dB), the testing interval may not include the threshold, or the threshold tends to be overestimated (positive bias); for grains too large (e.g., 5–6 dB), some of the levels may be too far from the threshold to be informative, or the threshold tends to be underestimated (negative bias). The best compromise, according to the authors, is a grain of 4 dB.

In the method of constant stimuli, the target probability is established by the operator. The procedure estimates not only the threshold but the entire psychometric function (slope included). Finally, it provides a way to interpret the response bias, and the random or pseudorandom sequence of the presentations helps prevent non-stationary responses (Watt and Andrews, 1981).5
7.2 Nonadaptive Psychophysical Procedures: The Method of Limits and the Method of Adjustment
In the method of limits, the value of the parameter is changed in small fixed incremental steps, starting from a minimum up to a maximum level of intensity; then, the run can be reversed. The average between the intensity of the signal after the last miss response in the incremental sequence and the intensity of the last hit response in the decremental sequence is taken as the threshold (estimated at φ = 0.50: figure 7.3).6

3 See chapter 6.
4 The variance is a measure of the spread between the values in a data set: in other terms, it measures how far each number in the set is from the mean. In this case, it expresses the variability of the proportion of correct responses at each stimulus intensity.
5 See further on for the definition of stationary responses (chapter 8).
6 Leek (2001) recalled that the first psychophysical method used to study the threshold had been developed by Hughson & Westlake during World War II (1944) and referred to the auditory system. The technique made use of ascending sequences of presentations from inaudible to audible stimuli. Such a strategy differed from the classical method of limits, inasmuch as in the latter the estimate of the threshold is based not only on the ascending sequence but also on a descending run.
FIG. 7.3 – The method of limits. The level of the stimulus is increased by a predefined step size. The procedure is then reversed (descending steps). The average between the level following the last miss response in the incremental sequence and the level of the last hit response in the decremental sequence is taken as the threshold.
The main difference between the (nonadaptive) method of limits and the (adaptive) simple up-down staircase (which will be described in section 9.1) is that the former does not reverse the run when the trend of responses changes (from miss to hit or vice versa).

The method of adjustment is similar to the method of limits and is suitable for testing continuous variables like luminance or contrast. The subject is required to "adjust" the value of the variable (by increasing or decreasing its level) so as to make it equivalent to the value of a reference or, in the absence of a reference, to make the signal subjectively barely detectable: for example, the task is to change the contrast of a sine wave grating until it is perceived as identical to a reference (discrimination threshold), or to adjust the contrast so as to make the grating barely visible (detection threshold). In both cases, this level of signal is the point of subjective equivalence (PSE, the difference threshold) of the observer (figure 7.4).
FIG. 7.4 – The method of adjustment. See text for explanation.
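A minimal Python sketch of the method of limits with a toy deterministic observer (all the numbers are assumptions chosen for illustration):

    # One ascending and one descending run with a fixed 2 dB step; the
    # threshold is the average of the two transition levels.

    true_threshold = 31.0
    perceives = lambda level: level >= true_threshold   # toy observer

    ascending = range(20, 41, 2)                        # 20, 22, ..., 40
    descending = range(40, 19, -2)                      # 40, 38, ..., 20

    # Level after the last miss in the ascending run (= the first hit):
    up = next(level for level in ascending if perceives(level))              # 32
    # Last hit in the descending run (= one step above the first miss):
    down = next(level for level in descending if not perceives(level)) + 2   # 32
    print((up + down) / 2)                              # 32.0, estimated threshold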
Chapter 8
Adaptive Psychophysical Procedures

The main requirement for psychophysical tests administered in the clinical setting is the combination of precision and short examination time: the organization of routine work in clinical institutions requires fast diagnostic procedures; moreover, time-consuming examinations may cause fatigue effects in patients, leading to low reliability.

In the nonadaptive procedures (constant stimuli and limits), stimuli are presented according to a pre-established order, irrespective of the responses of the subject: in other terms, the way nonadaptive procedures work is pre-determined, as the sequence of the presentations is decided in advance. It follows that the time taken to carry out the exam depends on the reaction time of the observer, on the delay between the stimuli, on their duration and, last but not least, on their number. About this last, crucial point, indeed, the main drawback of the method of limits and especially of the method of constant stimuli is that a consistent amount of presentations is set at levels far from the threshold: these trials prolong the examination time without providing any substantial information about the threshold.1 To solve this issue, presentations should be concentrated around the expected threshold, avoiding signals so strong or so weak as to be surely hit or missed, and as such non-informative.

In ideal conditions (if the approximate location of the physiological threshold is known), the method of constant stimuli is efficient2 (Simpson, 1988). But in real conditions, where the location of the threshold is not known, and especially in the clinical setting, where high variability of sensitivity is common among pathological subjects, the efficiency of the method of constant stimuli and, in general, of the nonadaptive procedures is not satisfactory. To improve efficiency, the adaptive psychophysical procedures have been developed.
1 And yet, these presentations are informative about the slope of the psychometric function (see section 11).
2 The efficiency of a psychophysical procedure is proportional to its accuracy and inversely proportional to the examination time.
In the adaptive methods, the level of the next stimulus depends on the previous answer(s) of the observer: if the observer "hits" the stimulus, its intensity is reduced; otherwise, it is increased. As a result, the threshold is reached faster, no matter where it is located on the scale of signal intensity. In sum, a procedure is adaptive because it adapts to the answers of the observer, focusing the examination on the threshold region and avoiding the presentation of unnecessary stimuli: as a result, the testing time is consistently reduced. More formally, the adaptive procedures combine the response Zn to the stimulus value Xn at trial n (or at preceding trials) to obtain the stimulus value Xn+1 that will be presented at the next trial n + 1 (Treutwein, 1995).

It is worth recalling at this point that, to avoid biased estimates, psychophysical procedures should behave as stationary stochastic processes. Adaptive psychophysical procedures are considered stochastic processes because it is assumed that the response to the previous stimulus does not determine the response to the next stimulus/stimuli, while the probability that the intensity of the stimulus Xn+1 is the threshold depends on the outcome of the previous presentation(s). This way, the probability that the stimulus intensity Xn+1 is the threshold increases more and more as the procedure progresses.

The psychophysical stochastic process is assumed to be stationary. In a stationary stochastic process, the probability distribution of the responses does not change over time: given some sequences made of identical presentations, time after time the mean and variance3 of the distribution of the responses in each sequence remain the same.4 To be stationary, the responses should be independent of each other (statistically independent: this is the so-called Bernoulli assumption). If in the last sequences the rate of hit responses is higher due to a learning effect, or lower due to adaptation phenomena, fatigue and/or attention drops, consecutive responses cannot be considered statistically independent, but biased by the previous ones: in these cases, indeed, the process is not stationary, and in the first example the final threshold will be biased lower (toward a better performance), whereas in the other examples it will be biased higher (toward a worse performance). In sum, even if psychophysical procedures are assumed to be stationary stochastic processes, for the abovementioned reasons this is not exactly the case.

In substance, the main goal of the adaptive procedures is to single out the threshold as fast as possible and with the highest precision, wherever it is located across the spectrum of intensity. The more the examination progresses, the more Xn approaches, or better, converges on the threshold.
3 The variance σ² of the distribution of the responses is its squared standard deviation: σ² = SD².
4 Stationarity is a property of a process in which mean and variance do not change over time: but how long is this time interval? The definition of stationarity in psychophysics, indeed, is relative, as it is attributed to a relatively short sequence of responses. But, to be truly stationary, the parameters of a process must remain constant over a very long sampling of data. In turn, if stationary data are collected during a relatively short interval of time, local oscillations of the responses (the "wavelength" of the data) can make the process seem non-stationary.
As explained, the convergence may occur at a target probability φ of 50%, 75% or n% of correct responses, depending on the type of procedure and on the response model adopted.

Adaptive methods may differ in the following respects:

(a) The rules that drive the placement of the next stimulus level as a function of the response(s) given at the previous presentation(s), in particular:
– After how many consecutive miss responses should a more intense stimulus be presented (one? two? more?).
– After how many consecutive hit responses should a less intense stimulus be presented (one? two? more?).
– By what amount (or step size δ) must the intensity of the subsequent stimuli be changed?5 Is the step size a fixed value, or does it vary as a function of the actual stimulus level?
(b) The stopping rule, i.e., the criterion chosen to end the test.
(c) The way the threshold is computed.

The shape of the psychometric function referred to the visual task under investigation may be unknown or, based on previous experiments or data reported in the literature, it can be assumed to be known. According to this criterion, Treutwein (1995) recalls that adaptive procedures can be classified as:6

– Nonparametric adaptive procedures, used when no assumptions are made about the shape of the psychometric function.
– Parametric adaptive procedures, used when the shape of the psychometric function referred to the visual task under examination is assumed to be known.

Since the shape of the psychometric function is assumed to be known, the parametric procedures single out, among a family of identical distributions differing only in their position along the x-axis (the axis of the stimulus level), the one that best fits the performance of the observer (maximum likelihood). The point on the slope of this psychometric curve corresponding to a given target probability (for example 0.5 or 0.75) matches the intensity level perceived by the subject with that target probability; therefore, it corresponds to the threshold (figure 8.1).
5 Knowing a priori the shape of the psychometric function, in particular its slope, helps speed up the threshold estimate. Knowing the steepness of the slope, in fact, allows selecting the most suitable step size to approach the threshold. In the presence of a flat slope, the signal intensity has to be changed by a consistent amount at each trial to increase the probability of correct (or miss) responses: thereby, with shallow slopes, wide step sizes are preferable in order to speed up the examination. On the contrary, with steep slopes, small changes of intensity consistently increase the hit or miss rate of the responses: in this case, small step sizes should be used to preserve the precision of the procedure.
6 In both classes, the only requirement is the monotonicity of the psychometric function. A function is monotonic when its trend is unidirectional.
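In code terms, every adaptive procedure described in the next chapters fits a single loop; a minimal Python sketch of the scheme X_{n+1} = f(X_n, Z_n) (the names are ours, not Treutwein's):

    # Generic adaptive session: 'placement_rule' stands for any of the
    # rules described in the next chapters; 'respond' models the observer.

    def adaptive_session(x0, placement_rule, respond, n_trials):
        x = x0
        history = []                  # (stimulus value, response) pairs
        for _ in range(n_trials):
            z = respond(x)            # response Z_n (1 = hit, 0 = miss)
            history.append((x, z))
            x = placement_rule(x, z)  # X_{n+1} = f(X_n, Z_n)
        return history

    # The simple up-down rule of section 9.1, for instance, would be:
    # placement_rule = lambda x, z: x - 2 * (2 * z - 1)   # delta = 2 dB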
FIG. 8.1 – In the parametric adaptive procedures, the only variable to be estimated is the position of the curve along the intensity axis (x). The positional parameter is the threshold (α) that in this case converges on a target probability p = 0.5, corresponding to the midpoint of the slope. Higher thresholds place the function rightward (i.e., toward higher levels of intensity), lower thresholds place the function leftward (i.e., toward lower levels of intensity).
Chapter 9
Nonparametric Adaptive Psychophysical Procedures

Most of the nonparametric adaptive procedures are based on a combination of ascending and descending sequences, with the reversals depending on the trend of the responses of the observer: they are therefore called staircase procedures. Staircase procedures, initially described by Dixon and Mood (1948), base the computation of the threshold on the serial responses of the subject, so that the next stimulus level depends on the response given to the previous stimulus (or stimuli): if the stimulus is detected or discriminated, the algorithm presents the next stimulus at a lower signal level; otherwise, the level is increased. Staircase procedures are simple and flexible and do not require any assumption about the shape of the underlying psychometric function. Their principles were clearly explained by Treutwein in his review dating back to 1995.
9.1 Truncated Staircase Method (Simple Up-down Method: Dixon and Mood, 1948; von Békésy, 1960; Cornsweet, 1962)
In the simple up-down method, the signal value (SV) is increased after each miss response and reduced after each hit response by a fixed step δ. The procedure starts from a suprathreshold (i.e., easy to detect or discriminate) SV, so that a series of hit trials is expected. The SV is reduced by δ after each correct response until a miss occurs at a value SVm. At this point, the sequence (or run) is reversed:1 the SV is increased by the same δ until the next hit at SVh. The threshold θ is usually estimated as the mean between SVm and SVh, θ = (SVm + SVh)/2, and converges on a target probability of 0.5.
1 The switch from a decremental to an incremental run, or vice versa, is called a reversal.
The examination should be no longer than 20 trials (Cornsweet, 1962) and ends after a predetermined number of reversals. Because the procedure converges on a target probability of 0.5, it cannot be associated with the 2AFC response model (which requires a target probability of at least 0.75).2

The simple up-down method can be formalized as follows:

$$SV_{n+1} = SV_n - \delta\,(2R_n - 1), \qquad (9.1)$$
where $SV_{n+1}$ is the next stimulus value, $R_n$ is the response (hit or miss) to the actual stimulus value $SV_n$ (for a hit: R = 1; for a miss: R = 0), and δ is the step size.

For example, consider a sequence starting from 40 dB (log luminance value) with δ = 2 dB, and suppose the following responses:

$SV_n$ = 40: hit, then $SV_{n+1}$ = 40 − 2(2·1 − 1) = 40 − 2 = 38;
$SV_{n+1}$ = 38: hit, then $SV_{n+2}$ = 38 − 2(2·1 − 1) = 38 − 2 = 36;
$SV_{n+2}$ = 36: hit, then $SV_{n+3}$ = 36 − 2(2·1 − 1) = 36 − 2 = 34;
$SV_{n+3}$ = 34: hit, then $SV_{n+4}$ = 34 − 2(2·1 − 1) = 34 − 2 = 32;
$SV_{n+4}$ = 32: hit, then $SV_{n+5}$ = 32 − 2(2·1 − 1) = 32 − 2 = 30;
$SV_{n+5}$ = 30: miss, then $SV_{n+6}$ = 30 − 2(2·0 − 1) = 30 + 2 = 32;
$SV_{n+6}$ = 32: miss, then $SV_{n+7}$ = 32 − 2(2·0 − 1) = 32 + 2 = 34;
$SV_{n+7}$ = 34: hit, then $SV_{n+8}$ = 34 − 2(2·1 − 1) = 34 − 2 = 32.
2 As anticipated in chapter 3 and explained in section 5.3, the target probability is the probability of correct responses generated by a stimulus intensity that is assumed to be at the threshold level. It generally corresponds to the midpoint of the slope of the psychometric function. In a yes–no response model, the correct response probability ranges virtually from 0 to 100%, so the target probability is generally set at p = 0.5. In a 2AFC response model, the correct response probability ranges from 50% (chance level) to 100%, so that the target probability is generally set at p = 0.75. Each adaptive procedure converges on a given target probability: for example, in the simple up-down staircase it is 0.5, and in the transformed 1-up/2-down staircase it is 0.707 (in both cases computed using the y/n response model: see section 9.2).
The estimated threshold SVθ is 32 dB: it is the average between the intensity of the first miss ($SV_{n+5}$ = 30 dB) during the first (decremental) sequence and the intensity of the next hit ($SV_{n+7}$ = 34 dB) during the second (incremental) sequence (figure 9.1). It should be noted from the example that, since the level of 32 dB ($SV_{n+4}$) was perceived during the first (decremental) run, the same intensity level should have been perceived during the second (incremental) run when presented at $SV_{n+6}$. In fact, this is not always the case: the discrepancy can be due to the occurrence of a false-positive error at $SV_{n+4}$ or to the effect of the short-term fluctuation of visual sensitivity.3

In the simple up-down staircase, the step size δ is predetermined. A step size too small will generate a great number of presentations, and thereby low efficiency; in turn, a step size too large will produce a continuous alternation of presentations above and below the threshold. Cornsweet concluded that the step size should be selected so as to make a run no longer than 4 presentations.
FIG. 9.1 – Truncated staircase method or simple up-down method. In this example, the procedure starts from suprathreshold stimuli and ends at the second reversal. The threshold is the average between the intensity of the first miss response during the first (decremental) run and the intensity of the first hit response during the second (incremental) run. In this example, therefore, it is: (30 + 34)/2 = 32. In the figure, the intensity of the signal decreases from bottom to top (compare with the next figures).
3 See chapter 6.
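The same trace can be generated in a few lines; a minimal Python sketch of equation 9.1 reproducing the worked example (variable names are ours):

    # Simple up-down staircase, delta = 2 dB, starting at 40 dB, with the
    # response sequence reported in the text (1 = hit, 0 = miss).

    delta = 2
    responses = [1, 1, 1, 1, 1, 0, 0, 1]

    sv = 40
    levels = [sv]
    for r in responses:
        sv = sv - delta * (2 * r - 1)   # equation 9.1
        levels.append(sv)

    print(levels)            # [40, 38, 36, 34, 32, 30, 32, 34, 32]
    print((30 + 34) / 2)     # 32.0: mean of the first miss and the next hit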
9.1.1 Two Interleaved Staircases to Preserve the Assumption of Independence
In the up-down staircase method, as well as in the other staircase procedures that are mentioned in the next sections, the assumption of independence tends to be violated: due to adaptation and other phenomena, the response to the next stimulus may be biased to a certain extent by the presentation of the previous one(s). To address this issue, Cornsweet (1962) suggested using two interleaved independent staircases. As he reported: “The examiner predetermines two starting points instead of the usual one. The first stimulus is presented at one of these predetermined levels, and the response is recorded. On the next trial, the next predetermined level is presented, and the response is recorded […]”. The third stimulus is made more or less intense according to the response given to the first stimulus (staircase A). In turn, the fourth stimulus is made more or less intense according to the response given to the second stimulus (staircase B). “In this way, two staircase-series are run concurrently, one on odd- and the other on even number of trials, each alternate stimulus depending upon the response to the previous stimulus in its own staircase” (Cornsweet, 1962: p. 490). Rather than alternating regularly the presentations between staircase A and staircase B, the author suggested randomizing the stimuli of the first and second staircase. This double staircase mixed procedure is expected to avoid the bias of the dependence of the next response on the previous one(s).
9.2 Transformed Up-Down Staircase Method (Up-Down Transformed Response, UDTR: Wetherill and Levitt, 1965; Levitt, 1970)
The transformed up-down method differs from the simple up-down method inasmuch as the next decremental stimulus level $SV_{n+1}$ depends on the responses to the last two or more presentations. The step size δ is kept constant, as in the simple up-down method. $SV_{n+1}$ thus increases after a miss (as in the simple up-down method), but decreases only after the occurrence of two consecutive hits: this rule is labeled the 1-up/2-down transformed method (figure 9.2).

Different rules converge on different target probabilities, always higher than 0.50. In the 1-up/2-down method the target probability is 0.707,4 in the 1-up/3-down method it is 0.794, in the 1-up/4-down method it is 0.841, and in the 1-up/6-down method it is 0.89.5 These values of φ, higher than 0.5, make the transformed up-down staircase method suitable for AFC response models.
4 In other terms, the procedure converges on the stimulus intensity detected 71% of the time.
5 These are the four rules provided by Levitt to obtain four different target probabilities. Other rules have been subsequently reported to achieve other convergence points on the psychometric function (see Brown, 1996, table 1). The formula for the computation of φ in the 1-up/n-down transformed staircase methods (paired with y/n response models) is provided by Leek (2001) as φ = 0.5^{1/n} (the n-th root of 0.5). The following table summarizes φ for the most used UDTR rules:

UDTR rule                          φ
1U1D (simple up-down staircase)    0.5
1U2D                               0.707
1U3D                               0.794
1U4D                               0.841
1U5D                               0.870
1U6D                               0.890

It is worth recalling, however, that even if the convergence of UDTR staircases on φ depends on the adopted up/down rules, it is affected to a certain extent by the step size and by the ratio between up- and down-step sizes when adopting a hybrid "weighted-UDTR staircase" (see section 13.4.3). Finally, the convergence of the UDTR differs according to the response model: when paired with the 2AFC, the 1U2D and 1U3D transformed staircases converge on φ = 0.67 (and not φ = 0.707) and on φ = 0.75 (and not φ = 0.794, as in the y/n response model), respectively (Kaernbach, 1990; see section 13.4.3).
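As a quick cross-check, the convergence points given by footnote 5's formula can be computed directly; a minimal Python sketch (ours, not from the book):

    # Target probabilities of 1-up/n-down (UDTR) rules paired with a y/n
    # response model: phi is the n-th root of 0.5 (footnote 5).

    for n in (1, 2, 3, 4, 5, 6):
        phi = 0.5 ** (1.0 / n)
        print(f"1U{n}D: phi = {phi:.3f}")
    # 1U1D 0.500, 1U2D 0.707, 1U3D 0.794, 1U4D 0.841,
    # 1U5D 0.871, 1U6D 0.891 (matching the table up to rounding)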
FIG. 9.2 – The transformed 1-up/2-down method. SV is increased after a miss response but is decreased only after two consecutive hit responses.
9.2.1 (UDTR) Staircase with Rating of Confidence
When performing a staircase procedure, Levitt suggested making use of additional information, namely a rating of confidence in the response provided by the subject at a given trial (Levitt, 1970). The rating of confidence is used to decide the intensity of the next stimulus, that is to say, the step size: if the rating of confidence is high (i.e., the observer reports being highly confident that his/her response is correct), the step size will be large; otherwise, it will be small. This strategy appears useful
especially at the beginning of the experiment, when the prior information on the position of the threshold is uncertain. Subsequently, the strategy can be maintained or reverted to a normal up-down staircase.
9.3 Forced-Choice Tracking (Zwislocki et al., 1958)
UDTR staircases, as explained, may estimate the threshold at different target probabilities, so as to make the procedure suitable for different n-AFC response models. The 1U3DTR, probably the most used UDTR, is compatible with the 2AFC paradigm, which requires φ ≥ 0.75. As a matter of fact, the 1U3DTR does not converge precisely on φ = 0.75 but on a higher level (φ = 0.794). Zwislocki described a modified 1U3DTR that avoids this discrepancy and converges on the expected target probability of 0.75. This forced-choice tracking method is a transformed up-down staircase that differs from the 1U3DTR of Wetherill and Levitt in that the signal level is reduced after three correct responses that are not necessarily consecutive. The author provided a mathematical demonstration of the effective convergence on φ = 0.75, and the theoretical prediction has been confirmed by auditory experiments in real subjects and simulated sessions (Zwislocki et al., 1958; Zwislocki and Relkin, 2001). Forced-choice tracking is more efficient (less time consuming) than the classical 1U3DTR procedure, as "when consecutive correct responses are required, they may be erased by incorrect responses occurring before a required number of correct responses is completed. Then, the correct responses have to be accumulated anew. The time taken up by an incomplete and erased set of correct responses is wasted. When non-consecutive responses are accepted, no responses have to be erased" (Zwislocki and Relkin, 2001, p. 4814).
9.4 Non-Parametric Up-down Staircase Method (Derman, 1957)
In the non-parametric up-down method, the stimulus level is always increased by a fixed δ after a miss response; after a hit response, its reduction is not certain, but happens with a given probability. The non-parametric up-down method has been formalized by Treutwein (1995) as:

$$SV_{n+1} = SV_n - \delta\,(2R_n S_\Phi - 1), \qquad (9.2)$$

where $S_\Phi$ is a binomial random variable that takes the value 1 with a probability that depends on the target probability φ (as specified below), and $R_n$ is the response (hit or miss) to the actual stimulus value $SV_n$: for a hit, R = 1; for a miss, R = 0.
Whenever a miss occurs, SVn+1 is always increased by δ. Whenever a hit occurs, SVn+1 is not always decreased by δ, but it is decreased by δ with a probability p = 1/(2φ), where φ is the target probability set by the experimenter. For a target probability of 50%, this method matches the truncated up-down method. In fact:

p = 1/(2φ) = 1/(2 · 0.5) = 1/1 = 1.
In this case, therefore, the binomial random variable SΦ is set at 1 with a probability of 100% (i.e., its value is always 1) and the equation becomes the same as the truncated up-down method:

SVn+1 = SVn − δ(2Rn·1 − 1).   (9.3)
For a hit response Rn = 1, then: SVn+1 = SVn − δ(2·1·1 − 1) → SVn+1 = SVn − δ·1 → SVn+1 = SVn − δ. For a target probability = 0.5, therefore, SVn+1 will always be reduced by δ after a hit response (i.e., with 100% probability: p = 1). On the contrary, for a target probability of 75%, after a hit SVn+1 will be reduced by δ with a probability p = 0.66 (66%, instead of 100%). In fact:

p = 1/(2φ) = 1/(2 · 0.75) = 1/1.5 = 0.66.
If this occurrence takes place, the binomial random variable SΦ is set at 1, so that after a hit SVn+1 is reduced by δ. However, SΦ can be set to zero with the complementary probability of 34%, so that, after a hit, SVn+1 may be increased (and not decreased, as normally expected) by δ (figure 9.3). The higher the target probability, the less probable is a reduction by δ after a hit. For example, for φ = 85%:

p = 1/(2φ) = 1/(2 · 0.85) = 1/1.7 = 0.59.

FIG. 9.3 – The nonparametric up-down method with target probability = 0.75. The intensity of the stimulus is always increased after a miss response but, unlike in the simple up-down staircase, it is not always reduced after a hit response: the reduction happens with a probability p = 1/(2φ) that depends on the target probability φ. So, it is possible that after a hit the intensity of the stimulus increases instead of decreasing (dashed arrows).
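Equation 9.2 translates into a few lines of Python (a sketch; names are ours):

    import random

    def derman_step(sv, delta, phi, hit):
        """One update of the non-parametric up-down rule (equation 9.2).
        A miss always raises the level by delta; after a hit, the binomial
        variable S_phi equals 1 with probability 1/(2*phi), so the level
        drops by delta with that probability and rises otherwise."""
        if not hit:                                # R_n = 0: always up
            return sv + delta
        s_phi = 1 if random.random() < 1.0 / (2.0 * phi) else 0
        return sv - delta * (2 * s_phi - 1)        # R_n = 1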
9.5 Weighted Up-down Method (Kaernbach, 1991)
In the weighted up-down method, δ is not fixed but differs according to the direction (incremental or decremental) of the run. The amount of difference in the up/down step size depends on the target probability φ:
δdown = δup · (1 − φ)/φ,   (9.4)
where δup is the incremental step after each miss and δdown is the decremental step after each hit. For φ = 0.5, δ is the same irrespective of the ascending or descending direction of the run:

δdown = δup · (1 − 0.5)/0.5 → δdown = δup · (0.5/0.5) → 1δdown = 1δup;
in other terms, after a hit or a miss response, the procedure changes the signal level by the same amount, i.e., 1 step size (1 down/1 up): in this case, the weighted staircase corresponds to the simple up-down method. Instead, for φ = 0.75:6

δdown = δup · (1 − 0.75)/0.75 → δdown = δup · (0.25/0.75) → δdown = 0.33 δup,

so 1δdown → 1/3 δup or, more intuitively, 1δup → 3δdown (see footnote 6). So, the procedure decreases the signal level by 1 step size after a hit response and increases the level by 3 step sizes after a miss response (figure 9.4).
6 More precisely, 1δup corresponds to 0.99 δup (3 × 0.33 δup), rounded to unity.
FIG. 9.4 – Weighted up-down method. The size of δ depends on the direction of the run (incremental or decremental) and is driven by the target probability φ. For a target probability > 0.5, δ is asymmetric. In this example φ = 0.75: after a hit response, the procedure reduces the signal level by 1 step size, whereas after a miss response the level is increased by 3 step sizes. For φ = 0.80, after a hit response, the procedure decreases the signal level by 1 step size, whereas after a miss response, the level is increased by 4 step sizes. In fact:

δdown = δup · (1 − 0.8)/0.8 → δdown = δup · (0.2/0.8) → 1δdown = 0.25 δup,

so 1δdown → 1/4 δup or, more intuitively, 1δup → 4δdown.
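In code, the asymmetry prescribed by equation 9.4 reduces to one line (a sketch; names are ours):

    def weighted_steps(phi, delta_up=1.0):
        """Kaernbach's weighted up-down rule (equation 9.4):
        delta_down = delta_up * (1 - phi) / phi, which makes the expected
        drift zero at the level where P(correct) = phi."""
        return delta_up, delta_up * (1.0 - phi) / phi

    print(weighted_steps(0.75))   # (1.0, 0.333...): 1 up-step ~ 3 down-steps
    print(weighted_steps(0.80))   # (1.0, 0.25):     1 up-step ~ 4 down-steps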
9.6 Stochastic Approximation (Robbins and Monro, 1951)
In the stochastic approximation, the step size δ becomes smaller and smaller as the procedure progresses, both in the decremental and in the incremental direction. The stochastic approximation can be formalized as follows:

SVn+1 = SVn − (δ1/n)(Rn − φ),   (9.5)
where δ1 is the initial step size, Rn is the response (hit: Rn = 1, miss: Rn = 0) to the actual stimulus value SVn and φ is the target probability. The procedure converges on the target probability φ. According to equation 9.5, for φ = 0.5, the size of δ during the decremental sequence after every correct response is:
(δ1/n)(1 − 0.5) → (δ1/n)(0.5).   (9.6)
For SV1 = 20 and δ1 = 4, after each hit response the stimulus value SV decreases as follows:

SV1 = 20: hit; then the SV at trial 2 is: SV2 = 20 − (4/2)(1 − 0.5) → SV2 = 20 − 2(0.5) → SV2 = 20 − 1 → SV2 = 19;
SV2 = 19: hit; then the SV at trial 3 is: SV3 = 19 − (4/3)(1 − 0.5) → SV3 = 19 − 1.33(0.5) → SV3 = 19 − 0.66 → SV3 = 18.33;
SV3 = 18.33: hit; then the SV at trial 4 is: SV4 = 18.33 − (4/4)(1 − 0.5) → SV4 = 18.33 − 1(0.5) → SV4 = 18.33 − 0.5 → SV4 = 17.83;
SV4 = 17.83: hit; then the SV at trial 5 is: SV5 = 17.83 − (4/5)(1 − 0.5) → SV5 = 17.83 − 0.8(0.5) → SV5 = 17.83 − 0.4 → SV5 = 17.43;
SV5 = 17.43: hit; then the SV at trial 6 is: SV6 = 17.43 − (4/6)(1 − 0.5) → SV6 = 17.43 − 0.66(0.5) → SV6 = 17.43 − 0.33 → SV6 = 17.10.
Likewise, for φ = 0.5, the size of δ during the incremental sequence after every miss is:

(δ1/n)(0 − 0.5) → −(δ1/n)(0.5).   (9.7)
In the same example (φ = 0.5), for SV1 = 20 and δ1 = 4, after each miss response the stimulus value SV increases as follows:

SV1 = 20: miss; then the SV at trial 2 is: SV2 = 20 − (4/2)(0 − 0.5) → SV2 = 20 + 2(0.5) → SV2 = 20 + 1 → SV2 = 21;
SV2 = 21: miss; then the SV at trial 3 is: SV3 = 21 − (4/3)(0 − 0.5) → SV3 = 21 + 1.33(0.5) → SV3 = 21 + 0.66 → SV3 = 21.66;
SV3 = 21.66: miss; then the SV at trial 4 is: SV4 = 21.66 − (4/4)(0 − 0.5) → SV4 = 21.66 + 1(0.5) → SV4 = 21.66 + 0.5 → SV4 = 22.16;
SV4 = 22.16: miss; then the SV at trial 5 is: SV5 = 22.16 − (4/5)(0 − 0.5) → SV5 = 22.16 + 0.8(0.5) → SV5 = 22.16 + 0.4 → SV5 = 22.56;
SV5 = 22.56: miss; then the SV at trial 6 is: SV6 = 22.56 − (4/6)(0 − 0.5) → SV6 = 22.56 + 0.66(0.5) → SV6 = 22.56 + 0.33 → SV6 = 22.89.

It is clear that, whatever the direction of the run (incremental or decremental), the step size δ becomes smaller and smaller as the procedure progresses. For φ = 0.5, the decrease in the step size in the incremental and decremental sequences is symmetrical. For φ > 50%, the ascending and descending steps become smaller in a non-symmetrical fashion: in other words, the value of δ decreases to a different extent depending on the direction (incremental or decremental) of the run. The exam ends when δ reaches a predefined value (figure 9.5). The estimated threshold SVθ is the next SV that would have been presented had the procedure not concluded; alternatively, the threshold can be computed as the average of the last stimulus values presented (Faes et al., 2007).

FIG. 9.5 – Stochastic approximation. The step size decreases more and more as the examination progresses. At every presentation the incremental δ (in case of miss responses) or decremental δ (in case of hit responses) is the same if φ = 50%; for φ > 50%, incremental and decremental δ differ according to the response (miss or hit).
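The worked example above can be reproduced with a short Python sketch (names are ours; note that, to match the book's numbers, the divisor is the index of the upcoming trial):

    def stochastic_approximation(responses, sv0=20.0, delta1=4.0, phi=0.5):
        """Robbins-Monro rule (equation 9.5): SV decreases after a hit (R = 1)
        and increases after a miss (R = 0), with a step that shrinks as 1/n."""
        sv, track = sv0, [sv0]
        for n, r in enumerate(responses, start=2):   # divisor 2, 3, 4, ... as in the text
            sv = sv - (delta1 / n) * (r - phi)
            track.append(round(sv, 2))
        return track

    print(stochastic_approximation([1, 1, 1, 1, 1]))
    # -> [20.0, 19.0, 18.33, 17.83, 17.43, 17.1], matching the worked example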
9.7 Accelerated Stochastic Approximation (Kesten, 1958)
The accelerated stochastic approximation is a variant of the stochastic approximation. It differs because, after the second trial, δ is reduced not after each presentation but after each reversal:
SVn+1 = SVn − (δ/(1 + REV))(Rn − φ), n > 2,   (9.8)
where REV is the number of reversals, Rn is the response given at each trial (1 = correct or hit, 0 = incorrect or miss), δ is the step size, and φ is the target probability (figure 9.6). As shown in figure 9.6, for φ = 0.5 the step size is halved after each reversal. The exam ends when δ is so small as to match a predefined value. As in the stochastic approximation, the estimated threshold SVθ is the next SV that would have been presented had the procedure not concluded, or it is the average of the last stimulus values presented (Faes et al., 2007). According to Faes and colleagues, the first rule is preferable because averaging the last stimulus values leads to a higher variance of the threshold estimate. The accelerated stochastic approximation enables a faster convergence on the threshold compared to the stochastic approximation (i.e., fewer trials are required: Kesten, 1958).7 As pinpointed by Leek (2001), the advantage of procedures with variable step sizes, like the stochastic approximation and the accelerated stochastic approximation, is that more signal levels can be tested compared to strategies with a fixed step size. This way, the resolution is improved and the threshold is assessed with a higher degree of precision. When simulated with a limited number of trials (n = 100), the 2AFC stochastic approximation and the accelerated stochastic approximation show a slight tendency to underestimate the threshold. The negative bias decreases with the number of trials, when the initial step size is set large,8 and with higher target probabilities (Faes et al., 2007).
FIG. 9.6 – Accelerated stochastic approximation. The step size decreases with the number of reversals. In this example φ = 0.5.
7 In automated perimetry the strategy 4−2 or 4−2−1 dB can be considered an accelerated stochastic approximation with target probability = 0.5 (Bebie et al., 1976; Spahr, 1975).
8 More specifically, the authors refer to the relative step size, that is, the step size as a function of the spread of the psychometric function (relative step size = δ/σ, where σ is the spread). The optimal …
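A compact sketch of the accelerated rule follows (names, the stopping guard, and the immediate start — the book applies the rule after the second trial — are ours):

    def accelerated_sa(respond, sv0=20.0, delta=4.0, phi=0.5, min_step=0.25, max_trials=500):
        """Kesten's rule (equation 9.8): the step shrinks with the number of
        reversals rather than with the trial count, and the exam stops once
        the step falls below a predefined size."""
        sv, prev_dir, reversals = sv0, 0, 0
        for _ in range(max_trials):
            r = 1.0 if respond(sv) else 0.0
            direction = -1 if r > phi else 1          # sign of the coming move
            if prev_dir != 0 and direction != prev_dir:
                reversals += 1                        # a reversal shrinks the step
            step = delta / (1 + reversals)
            if step < min_step:
                break
            sv -= step * (r - phi)
            prev_dir = direction
        return sv                                     # threshold estimate SV_theta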
9.8 Block Up-down Temporal Interval Forced Choice (BUDTIF: Campbell and Lasky, 1968)
The difference between the BUDTIF and the simple up/down staircase is that in the former a block of n presentations (and not a single stimulus) is displayed at each signal level. In the original paper, BUDTIF was paired with the 2AFC response model. The number of presentations per block depends on the desired target probability φ, being for example 4 for φ = 0.75 or 5 for φ = 0.8. In the first case, when at block n the hit rate is higher than 3 out of 4 presentations (>75%), the signal level is reduced at the next block n + 1; on the contrary, if the hit rate is lower than 3 out of 4 presentations (<75%), the signal level is increased. […]

…if σb > σb−1, then σadj = σb−2. The asymmetry has its justification because at the beginning of the experiment σ is overestimated (the set of the presentations covers a necessarily wide range of stimulation): so, as the procedure progresses, the asymmetric effect of the inertia makes a decrease in the range of the stimuli of the block more likely than an increase. By replacing the previous μb and σb with the new values μb+1 and σb+1, the updated four intensity levels x1…4 of the next block b + 1 of stimuli are finally calculated according to the equation:

x1…4 = μb+1 − c·σb+1; μb+1 − (c/3)·σb+1; μb+1 + (c/3)·σb+1; μb+1 + c·σb+1.   (11.4)

At the end of each block, μ and σ are re-computed, via probit analysis, based on the results obtained in the previous two blocks. The updated estimates of μ and σ are used to place the next block at the four updated intensity levels. In this way, μ and σ (threshold and [inverse] slope, i.e., the parameters at issue) tend to become more and more precise after each set of presentations. To summarize, at the end of the second and every subsequent block of constant stimuli, probit analysis of the last two blocks is carried out to obtain the actual best estimate of μb and σb. From these actual μb and σb estimates, μb+1 and σb+1 are derived by applying equations 11.2 and 11.3. The computed new values μb+1 and σb+1 are used to provide the four intensity levels of the next block of presentations according to equation 11.4. The procedure continues for 64 blocks, each made of 16 presentations (4 different stimulus levels), for a total of 1024 trials, and the final estimate of μ and σ is obtained from a probit analysis of all trials.
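Equation 11.4 amounts to placing four levels symmetrically around the running probit estimates. A sketch (names are ours; the spacing constant c is defined in a part of the chapter not reproduced here, so its value below is a placeholder):

    def ape_levels(mu, sigma, c=2.0):
        """Four stimulus levels of the next APE block (equation 11.4),
        symmetric around the current threshold estimate mu with spacing
        driven by the current spread estimate sigma."""
        return (mu - c * sigma,
                mu - (c / 3.0) * sigma,
                mu + (c / 3.0) * sigma,
                mu + c * sigma)

    print(ape_levels(10.0, 1.5))   # (7.0, 9.0, 11.0, 13.0)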
11.2 Hall's Hybrid Procedure (Hall, 1981)
Hall devised a hybrid procedure (for 4AFC experiments, not suitable for 2AFC) that adopts the PEST rules for the placement of the stimuli but, unlike PEST, all the responses collected during the examination (and not only the final value of the track) are used to fit the psychometric function12 via MLE. This way, not only the threshold but also the slope is derived. In Hall’s hybrid procedure n blocks are presented. Each block is made of a predefined number of presentations (50). After each block, the threshold and inverse slope (or spread) of the psychometric function are computed by MLE on the whole set of data collected at that block. These updated estimates of threshold and spread are used as the starting values for the next block of stimuli. The stopping rule is the same as in PEST. Because the estimates are computed on the whole set of trials and not just on the last response, Hall’s hybrid procedure is robust against bias.
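The core of such a procedure — a maximum-likelihood fit of a logistic psychometric function over every trial collected — can be sketched with a simple grid search (illustrative only, not Hall's original implementation):

    import numpy as np

    def mle_logistic_fit(levels, responses, alphas, betas):
        """Grid-search MLE of p(x) = 1 / (1 + exp(-beta * (x - alpha)))
        over ALL collected trials (1 = correct, 0 = incorrect)."""
        x = np.asarray(levels, dtype=float)
        r = np.asarray(responses, dtype=float)
        best, best_ll = None, -np.inf
        for a in alphas:
            for b in betas:
                p = 1.0 / (1.0 + np.exp(-b * (x - a)))
                p = np.clip(p, 1e-6, 1.0 - 1e-6)      # guard the logs
                ll = np.sum(r * np.log(p) + (1.0 - r) * np.log(1.0 - p))
                if ll > best_ll:
                    best, best_ll = (a, b), ll
        return best                                    # (threshold, slope)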
11.3 The Simulated Staircase Technique of Leek, Hanna and Marshall (Leek et al., 1992)
According to Leek and colleagues (1992), and along the lines of the hybrid procedure devised by Hall, the slope can be computed using a staircase procedure and performing a post hoc analysis of all the responses. The authors simulated an up-down transformed staircase strategy that, for the final computation, considers not only the response of the last trial but all the data obtained during the test. Based on all the answers collected during the examination, the psychometric function is fitted via MLE. From the psychometric function, it is possible to obtain the measure of the slope as well as of the threshold. To verify the assumption, 1U2DTR and 1U3DTR procedures paired with 2AFC, 3AFC, or 4AFC response models were simulated and compared. The experiment showed that the estimate of the slope obtained with these adaptive procedures improves with the number of presentations, being inaccurate for tracks shorter than 50 trials and fairly accurate for tracks longer than 200 trials.13 Of the response models, 4AFC was the best, while 2AFC generated the poorest estimates. Even if the model of Leek and associates demonstrates that it is possible to derive the measure of the slope of the psychometric function with staircase procedures (provided the number of trials is at least 100), the authors reported that these adaptive strategies lead to an overestimation of the dispersion parameter β: such bias is lower with 4AFC and more consistent in case of 2AFC.

12 Contrary to APE, which uses a normal distribution, in Hall's hybrid procedure the logistic distribution is assumed as the psychometric model.
13 For this reason, with adaptive (staircase) procedures Leek and colleagues (1992) recommended administering tracks at least 100 trials long to minimize the bias that affects the estimate of the slope.
TAB. 11.1 – The correction factors a and b as a function of the type of UDTR staircase and of the AFC response model (from Leek et al., 1992, p. 250).

                 1U2D (φ = 0.71)        1U3D (φ = 0.79)
                 a         b            a         b
    2AFC         0.57      1.52         0.56      1.56
    3AFC         0.67      1.20         0.66      1.33
    4AFC         0.73      1.14         0.70      1.27
Kaernbach (2001b) subsequently confirmed this finding (simulating a simple up-down staircase with two different step sizes and a stochastic approximation). The biased estimation of the slope relies on the fact that the collection of responses in a staircase procedure is sequential, therefore not independent as it is with constant stimuli (Kaernbach, 2001b); in this context, the overestimation depends on the asymmetric distribution of the trials above and below the threshold (more presentations above the midpoint and fewer below: Leek et al., 1992; Leek, 2001). To solve this issue (minimizing biased slope estimates), Leek and colleagues introduced two correction factors, a and b, so that:

βtrue = a · βest^b.   (11.5)
The two correction factors a and b depend on the type of UDTR and on the response model. For tracks 50 trials long, a and b are reported in table 11.1. The estimate of the slope (expressed as variability, i.e., percent deviation from the geometric mean) was slightly less reliable for 2AFC and, in general, improved with the number of presentations. The 4AFC had the highest efficiency,14 whereas the 2AFC was the worst. Like bias and reliability, efficiency improved with the number of presentations (showing a ceiling effect after 100–200 trials). In conclusion, staircase procedures can be used to estimate not only the threshold but also the slope of the psychometric function, provided at least 100 trials are administered. In this respect, the 2AFC response model seems the least suitable, as it is less accurate, more biased, and less efficient compared to 3- and especially 4AFC. The correction factors computed by Leek and colleagues help minimize the undesirable biasing effect of the non-randomized presentations proper of the adaptive strategies.

14 Computed as the sweat factor K = nσ² (see chapter 13), where the variance σ² is the squared standard deviation of the slope.
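Assuming the power-law reading of equation 11.5, the correction is a table lookup (a sketch; names are ours):

    def corrected_slope(beta_est, rule, n_afc):
        """Apply Leek et al.'s (1992) correction beta_true = a * beta_est**b,
        with the 50-trial factors of table 11.1."""
        factors = {("1U2D", 2): (0.57, 1.52), ("1U2D", 3): (0.67, 1.20),
                   ("1U2D", 4): (0.73, 1.14), ("1U3D", 2): (0.56, 1.56),
                   ("1U3D", 3): (0.66, 1.33), ("1U3D", 4): (0.70, 1.27)}
        a, b = factors[(rule, n_afc)]
        return a * beta_est ** b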
11.4 The Adaptive Threshold and Slope Estimation (ATASE: Kaernbach, 2001b)
Kaernbach (2001b) devised a staircase procedure in which threshold and slope are measured by presenting couples of stimuli, one at a higher (H) and the other at a lower (L) signal level.
The mean value of intensity Test = (H + L)/2, computed trial after trial, is the estimator for the threshold, whereas the intensity interval Σest = H − L is the estimator for the spread (that is, the inverse of the slope: figure 11.4). At each trial the possible responses given by the observer are:

(1) H → hit, L → hit.
(2) H → miss, L → miss.
(3) H → hit, L → miss.
(4) H → miss, L → hit.
In the first case (H → hit, L → hit), the signal interval Σ remains unchanged because both H and L are moved down (to a lower intensity) by the same amount. In other terms, the value of the estimator for the spread Σest will be unchanged at trial 2: (H − L)t2 = (H − L)t1, while the estimator for the threshold decreases: (Ht2 + Lt2)/2 < (Ht1 + Lt1)/2 (figure 11.5). Also in the second case (H → miss, L → miss) the signal interval Σ remains unchanged, because both H and L are moved up (to a higher intensity) by the same amount: the value of the estimator for the spread Σest will be unchanged at trial 2: (H − L)t2 = (H − L)t1, while the estimator for the threshold increases: (Ht2 + Lt2)/2 > (Ht1 + Lt1)/2 (figure 11.6). In the third case (H → hit, L → miss), H is moved down (to a lower signal level), whereas L is moved up (to a higher intensity): it follows that the signal interval Σ narrows. So, the value of the estimator for the spread Σest will be reduced at trial 2: (H − L)t2 < (H − L)t1, while the value of the estimator for the threshold will be the same: (Ht2 + Lt2)/2 = (Ht1 + Lt1)/2 (figure 11.7). Finally, in the fourth case (H → miss, L → hit), H is moved up (to a higher signal level), whereas L is moved down (to a lower intensity): it follows that the signal interval Σ widens. In other terms, the value of the estimator for the spread Σest will be increased at trial 2: (H − L)t2 > (H − L)t1, while the value of the estimator for the threshold will be the same: (Ht2 + Lt2)/2 = (Ht1 + Lt1)/2 (figure 11.8).
FIG. 11.4 – ATASE. See text for explanation.
FIG. 11.5 – ATASE. H: hit, L: hit. Both H and L are moved down by the same amount (in this case by 2 units of intensity: H: from 18 to 16 intensity units and L: from 14 to 12 intensity units). The estimator of the spread Σest is unchanged (Σest = 4) while the estimator of the threshold Test is reduced (Test from 16 to 14). Compare with figure 11.4.
FIG. 11.6 – ATASE. H: miss, L: miss. Both H and L are moved up by the same amount (in the example by 2 units of intensity: H: from 18 to 20 intensity units and L: from 14 to 16 intensity units). The estimator of the spread Σest is unchanged (Σest = 4) while the estimator of the threshold Test is increased (Test from 16 to 18). Compare with figure 11.4.
Based on the step size (called α by the author) chosen to move the couple of stimuli up or down when stimulus H and stimulus L are both misses or both hits,15 and on the step size (called β by the author) chosen to move stimulus H downward and stimulus L upward (or vice versa) when the responses to H and L differ,16 the procedure converges on a target probability φt = 0.5 for the threshold and on two target probabilities φσ1, φσ2 for the slope, placed respectively above and below the threshold on the psychometric function (φσ1 = 33% and φσ2 = 67%).

15 First and second cases, respectively.
16 Third and fourth cases. Given a step size δslope, in the third case (H hit, L miss) the intensity level of H and L is, respectively, decreased and increased by δslope; in the fourth case (H miss, L hit), the intensity level of H and L is, respectively, increased and decreased by 4δslope.
FIG. 11.7 – ATASE. H: hit, L: miss. H moves down, L moves up by the same amount (for example, by 1 unit of intensity: H: from 18 to 17 intensity units and L: from 14 to 15 intensity units). The estimator of the spread Σest is reduced (Σest from 4 to 2) while the estimator of the threshold Test remains unchanged (Test = 16). Compare with figure 11.4.
FIG. 11.8 – ATASE. H: miss, L: hit. H moves up, L moves down by the same amount (in the example by 1 unit of intensity: H: from 18 to 19 intensity units and L: from 14 to 13 intensity units). The estimator of the spread Σest is increased (Σest from 4 to 6) while the estimator of the threshold Test remains unchanged (Test = 16). Compare with figure 11.4.
At the end of the exam, the stimulus level corresponding to φt will be the threshold, and the interval of intensities between φσ1 and φσ2 will be used to compute the spread, and thereby the slope. ATASE differs from the Bayesian procedures aimed at measuring the slope (like the modified ZEST and the Ψ method17) inasmuch as ATASE performs blocks of two trials at a certain distance, whereas the Bayesian approaches perform single trials with a bimodal distribution of the test levels. It is yet unclear how important this difference is (Kaernbach, 2001b, p. 1396).
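The four cases map directly onto a small update function (a sketch; the step logic follows the text and footnote 16, while the numerical defaults are ours):

    def atase_step(H, L, hit_H, hit_L, alpha=2.0, beta=1.0):
        """One ATASE update. alpha: common shift when the two responses agree;
        beta (and 4*beta, per footnote 16): steps used when they differ."""
        if hit_H and hit_L:            # case 1: interval slides down
            H, L = H - alpha, L - alpha
        elif not hit_H and not hit_L:  # case 2: interval slides up
            H, L = H + alpha, L + alpha
        elif hit_H:                    # case 3 (H hit, L miss): interval narrows
            H, L = H - beta, L + beta
        else:                          # case 4 (H miss, L hit): interval widens
            H, L = H + 4 * beta, L - 4 * beta
        return H, L, (H + L) / 2.0, H - L   # new pair, T_est, Sigma_est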
11.5 Modified ZEST (King-Smith and Rose, 1997)
Background

Among the researchers who addressed the problem of the slope estimation, King-Smith and Rose provided a substantial contribution. In their paper, they discussed the general principles underlying the computation of this parameter and described a method able to estimate slope as well as threshold, based on a two-dimensional probability density function and Bayesian estimation. The authors define the normalized intensity relative to threshold, v, as the difference between log intensity x and log intensity at threshold θ, multiplied by the slope k:

v = k(log x − log θ).   (11.6)
This way, the normalized intensity v when log intensity x matches log threshold intensity θ is zero. For intensities of the stimulus lower than the threshold, the resulting normalized intensity will be negative; for intensities of the stimulus higher than the threshold, the resulting normalized intensity will be positive (figure 11.9). As explained by the authors:

– For each probability of perceiving the target at a normalized intensity (proportion of correct responses), there is a standard error in the normalized intensity that generates that probability. Consider in figure 11.10 the red horizontal bar localized at 10% of correct responses and representing the standard error of the normalized intensity: it shows that a target should be perceived with probability = 10% if its (normalized) intensity is −2, but due to a standard error, the same probability of seeing the target (i.e., the same proportion of correct responses) can occur also for intensities a little higher (say −1.8) or a little lower (say −2.2). At 50% probability of perceiving the target (i.e., when the proportion of correct responses is 50%) the standard error of the normalized intensity of the stimulus (the intensity at threshold = zero normalized intensity) is at its minimum (the red horizontal bar is the shortest).
17 See sections 11.5 and 11.6.
FIG. 11.9 – The psychometric function as proportion of correct responses versus normalized intensity as described by King-Smith and Rose (1997).
FIG. 11.10 – The standard error referred to the (normalized) intensity of a target for a given probability to perceive that stimulus (red horizontal bars) and of the probability to perceive the target (proportion of correct responses) for a given (normalized) intensity (green vertical bars). See text for explanation. From King-Smith and Rose, 1997. Reproduced with permission of Elsevier via Copyright Clearance Center.
– In turn, for each (normalized) level of intensity of the stimulus, there is a standard error in the corresponding probability of perceiving that stimulus (i.e., in the proportion of correct responses). Consider in figure 11.10 the green vertical bar localized at zero normalized intensity (threshold level): it shows that a stimulus should have normalized intensity zero when it is perceived 50% of the times, but due to a standard error, the same intensity can be perceived with a probability a little higher (for example, 55%) or a little lower (for example, 45%). So, at zero normalized intensity the standard error of the probability of perceiving the stimulus (i.e., 50%) is at its maximum (the green vertical bar is the longest).18
18 The standard error Ep of the probability of perceiving a stimulus of a given intensity is calculated as: Ep = [p(1 − p)/n]^1/2, where p is the predicted probability of perceiving the stimulus and n is the number of trials. For an experiment made of 100 trials, for example, Ep for a stimulus intensity at threshold (p = 0.5) is: Ep = [0.5(1 − 0.5)/100]^1/2 → Ep = 0.05; in turn, Ep for a stimulus intensity below threshold, say at 0.2 probability of correct responses, is: Ep = [0.2(1 − 0.2)/100]^1/2 → Ep = 0.04. So, as explained, the lowest standard error referred to the proportion of correct responses as a function of the stimulus intensity is not localized at the threshold.
In sum, the minimum standard error of the measure of the intensity is localized at a probability level of 50% correct responses, i.e., at zero normalized intensity: it is, therefore, at the threshold. As the probability of correct responses increases or decreases, the standard error of the estimate of the corresponding intensity increases. It follows that the most accurate estimate of the threshold intensity (the estimate with the minimum standard error) is obtained by presenting stimuli at zero normalized intensity, that is, about the presumed threshold. In turn, the minimum standard error of the proportion of correct responses is localized at probabilities of correct responses well above and below 50%, i.e., at normalized intensities different from zero: namely, it is localized at probabilities of perceiving the target of 10.4% and 89.6%. Indeed, these are the probabilities of perceiving stimuli of −2.24 and +2.24 normalized intensity. As the intensity of the stimuli converges toward the threshold, the standard error referred to the proportion of correct responses increases. Now, the standard error of the proportion of correct responses reflects the dispersion of the observations: therefore, it reflects the spread (or inverse slope) of the psychometric function. The fact that this parameter is at its maximum at the threshold level (zero normalized intensity) explains why the spread is generally defined about the threshold level.19 In turn, the fact that this parameter is at its minimum at −2.24 and +2.24 normalized intensity suggests that, to obtain the most reliable estimate of the slope, trials should be presented at these two symmetrical values of normalized intensity.20

To further clarify why −2.24 and +2.24 normalized intensities are the two most suitable intensities for the estimate of the slope, it is worth recalling that the slope describes how rapidly performance changes with a given change of the signal strength (Leek, 2001).21 That is the same as saying that it describes the trend of the change in probability (or proportion) of correct responses as a function of stimulus intensity. This trend can be computed as the difference in the probability of correct responses in two points of the psychometric function. Evidently, the estimate of the slope is degraded by the standard error of the probability of correct responses. Thereby, computing the probability of correct responses in the two points of the psychometric function where the standard error of such probability is at a minimum (+2.24 and −2.24 normalized intensity) allows obtaining the most reliable estimate of the slope.

19 See chapter 11.
20 If the psychometric function is symmetrical and the response design is yes/no, an equal number of trials should be presented at the two levels of intensity +2.24 and −2.24; if the psychometric function is symmetrical but the response design is 2AFC, relatively more trials should be presented at the higher intensity (+2.24); finally, if the psychometric function is asymmetrical (e.g., Weibull) and the response design is yes/no, relatively more trials should be presented at the lower intensity (−2.24).
21 See chapter 3.
In sum, the moral is that the most suitable stimulus level for the estimate of the threshold is different from the two most suitable stimulus levels for the estimate of the slope. To make the term "suitable" quantifiable, the authors defined threshold efficiency and slope efficiency, two parameters inversely proportional to the variance of the estimate of threshold and slope, respectively. Because the variance is proportional to the standard error,22 efficiency is inversely related to the standard errors: the lower the standard error, the higher the efficiency. Efficiency ranges from 0 (maximum variance or standard error) to 1 (minimum variance or standard error). As expected, figure 11.11 shows that the efficiency for the estimate of the threshold is at its maximum for normalized intensities about zero, whereas the efficiency for the measurement of the slope peaks at normalized intensities ±2.24. Intensities aimed at obtaining the highest efficiency for the measurement of the slope (efficiency = 1) would lead to an efficiency of no more than 0.32 for the measurement of the threshold. In turn, intensities aimed at obtaining the highest efficiency (= 1) for the measurement of the threshold would fail to measure the slope (slope efficiency = 0). So, what is the optimal compromise of stimulation to obtain at the same time information on the threshold and on the slope? King-Smith and Rose computed a normalized intensity of ±2.07, which corresponds to probabilities of correct responses of 88% and 12%.23 For these values, the reduction in slope efficiency compared to the optimal values ±2.24 is negligible (from 1 to 0.99), while threshold efficiency improves from 0.32 to 0.376 (figure 11.11).
FIG. 11.11 – The efficiency of the estimation of threshold and slope. The green line is the level of intensity that yields the maximum efficiency for the estimate of the threshold (normalized intensity = 0). The red lines refer to the maximum efficiency for the estimate of the slope (normalized intensity = ± 2.24). The blue lines refer to the optimal compromise of stimulation to measure both threshold and slope (normalized intensity = ± 2.07). Modified from King-Smith and Rose, 1997. Reproduced with permission of Elsevier via Copyright Clearance Center.
22 Variance of the sample: Var = Standard Error² × sample size.
Modified ZEST

Based on these premises, the same authors devised a modified version of ZEST to measure not only the threshold but also the slope of the psychometric function. Trial after trial, the stimuli are placed about the two intensity levels ±2.07, defined by the authors as the "high" and "low" intensity levels. The procedure considers a double a priori probability density function, one referred to the threshold, the other to the slope. Time after time, the presentations are randomly placed, via MLE, either about the low or the high intensity level, so as to refine at the same time the pdf of the threshold and the pdf of the slope, with the maximum compromise of efficiency. The peak of the pdf of the threshold shifts toward lower or higher levels of intensity after a correct or incorrect response, as occurs in the Bayesian procedures (like ZEST); in turn, the peak of the pdf of the slope moves to higher values of steepness if a miss response to a low intensity level or a hit response to a high intensity level is obtained. On the contrary, the peak of the pdf of the slope shifts toward lower values of steepness if an "unexpected response" is obtained (i.e., a "hit" at a low intensity level or a "miss" at a high intensity level). The combined estimate of the two parameters is well represented by a posterior bidimensional pdf in which the probability of correct responses (y-axis) is plotted vs. log threshold (z-axis) and log slope (x-axis) (figure 11.12).

FIG. 11.12 – The bidimensional pdf in the modified ZEST. Upper panel: prior bidimensional pdf. Lower panels: posterior pdf after a yes and a no response for a high intensity stimulus. Modified from King-Smith and Rose, 1997. Reproduced with permission of Elsevier via Copyright Clearance Center.

23 According to Levitt (1970), the best stimulus placement to measure at the same time threshold (target probability: 0.5) and slope is at the intensities that generate 84.1% and 15.9% of correct responses.
The posterior pdf is updated at every trial via MLE until the standard error of the log slope estimate drops below a predefined limit, or after a pre-established number of trials. At the end of the examination, mean24 and standard errors of threshold and slope are obtained.
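The heart of the method — updating a joint pdf over (threshold, slope) after each response — can be sketched as follows (a logistic likelihood and all names are our assumptions):

    import numpy as np

    def update_joint_pdf(prior, thresholds, slopes, x, correct, guess=0.0, lapse=0.0):
        """One Bayesian update of the bidimensional pdf(threshold, slope).
        prior: 2-D array indexed [threshold, slope], summing to 1."""
        T, K = np.meshgrid(thresholds, slopes, indexing="ij")
        p = guess + (1.0 - guess - lapse) / (1.0 + np.exp(-K * (x - T)))
        posterior = prior * (p if correct else 1.0 - p)
        return posterior / posterior.sum()

    # The marginal means of the posterior give the running threshold and slope estimates.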
11.6 The Ψ Method (Kontsevich and Tyler, 1999)
The Ψ method is a Bayesian adaptive procedure devised by Kontsevich and Tyler to estimate both threshold and slope of the psychometric function for the 2AFC response model, with the threshold set at the target probability = 0.75. It places the next stimulus at the position that minimizes the expected entropy of a vector representing the threshold and slope of the psychometric function. As explained by the authors, the Ψ method is similar to the minimum variance method, but it computes the pdf in a bidimensional space whose parameters are threshold and slope; given that the minimum variance method cannot be extended from a monodimensional space (threshold) to a bidimensional space (threshold + slope) because the values of threshold and slope are incommensurable, the variance is replaced by entropy. In summary, the stimulus is presented at the level of intensity that minimizes at the next trial the entropy25 of the pdf of the vector threshold + slope. In more detail, in the Ψ method the threshold α and slope β are combined into a vector we will call λ. Each vector λ, therefore, is made up of a pair of threshold and slope values, each pair having a probability of corresponding to the parameters of the psychometric function of the observer: thereby, a bidimensional probability density function of λ can be obtained. The Ψ method aims to find this posterior probability density function for λ: this pdf(λ), in fact, contains the most likely threshold and slope that characterize the psychometric function of the subject. At each trial the pdf(λ) is computed via Bayesian estimation for each stimulus intensity x and a correct or incorrect response r:

pn+1(λ|x, r) = p(r|λ, x)·pn(λ) / Σλ p(r|λ, x)·pn(λ).   (11.7)
This way, n conditional pdfs(λ) are obtained. The mean of each pdf is the most probable λ expected for a given stimulus intensity and a given response (correct/incorrect) of the observer. The width (variance) of each pdf(λ) is inversely proportional to the information about λ. In the minimum variance method, the next stimulus is presented at the intensity level that corresponds to the mean of the pdf that, among those computed, has the minimum variance. Yet, as anticipated, the minimum variance is unsuitable in our case because, as explained by Kontsevich and Tyler: "the variance of the posterior probability distribution […] cannot be readily expanded to two dimensions [threshold and slope, author's note] because the threshold and slope dimensions are incommensurate" (Kontsevich and Tyler, 1999, p. 2730). Since threshold and slope are incommensurable, in the Ψ method variance is replaced by entropy. Like variance, in fact, entropy measures the dispersion in terms of uncertainty of the posterior pdf. So, in the Ψ method, the next stimulus is presented at the intensity level that corresponds to the mean of the pdf which, among all those computed, has the minimum entropy. In its broadest sense, entropy is a measure of the degree of disorder or uncertainty in a system. In information theory, Shannon entropy is a measure of the uncertainty of the information.26 In our case, the measure of uncertainty is applied to λ, so that entropy is the parameter used to assess the certainty that the actual (i.e., estimated up to that trial) λ-vector is the right estimate. The authors define entropy at trial n as:

Hn = −Σλ p(λ)·log p(λ).   (11.8)

24 The mean is preferred to mode and median as it is the optimal parameter to minimize the variance not only of the estimate of the threshold (see ZEST, section 10.2.2) but also of the slope (Kontsevich and Tyler, 1999).
25 Instead of the variance, as it applies to the minimum variance method.
According to the equation, entropy at trial n is the summation of the probability that each λ is the most likely λ for the observer, multiplied by its logarithm. The negative sign makes entropy always greater than or equal to zero. Log is the base-2 logarithm (if the entropy is expressed in bits).27
26 In general, entropy is defined as:

H(X) = −Σi=1…n p(xi)·log2 p(xi),

where p(xi) is the probability of a given outcome xi of the variable x. When the logarithm is base 2, entropy is measured in bits. In psychophysics, the variable x can be the threshold θ:

H(θ) = −Σθ p(θ)·log2 p(θ),

where p(θ) is the probability of the threshold at the stimulus intensity x. In the case considered here, the entropy obtained at each trial n is referred to the vector λ:

Hn(λ) = −Σλ p(λ)·log p(λ).

27 For clarification purposes, here are some examples:
– Given four possible pn(λ) = 0.15, 0.25, 0.4, 0.2, H will be: −(0.15 log 0.15 + 0.25 log 0.25 + 0.4 log 0.4 + 0.2 log 0.2) = −[(−0.41) + (−0.50) + (−0.53) + (−0.46)] = 1.90 bits.
– If pn(λ) = 0.8, 0.05, 0.03, 0.2, H will be: −(0.8 log 0.8 + 0.05 log 0.05 + 0.03 log 0.03 + 0.2 log 0.2) = −[(−0.26) + (−0.22) + (−0.15) + (−0.46)] = 1.09 bits.
– In the extreme case that pn(λ) = 1.0, 0.0, 0.0, 0.0, H will be: −(1 log 1) = 0.
The lower the entropy, the higher the certainty that the λ-vector describes the psychometric function of the subject. The expected entropy Hn(λ|x, r) referred to the conditional pdf(λ) obtained at each intensity x and for a correct or incorrect response is then computed as:

Hn(λ|x, r) = −Σλ pn(λ|x, r)·log(pn(λ|x, r)).   (11.9)

Of the entropies calculated, the smallest one is selected, and the corresponding stimulus value is presented at the next trial. So, like the minimum variance method, the Ψ method "looks one step ahead", predicting the most informative level (i.e., with the lowest entropy) for the placement of the stimulus at the next presentation after a correct and an incorrect answer: this way, the number of trials required to converge on the λ of the subject is minimized. After the response of the observer is recorded, the entropy of the updated pdf(λ) is re-computed, and the difference between the previous and the actual entropy is the gain of information obtained. The procedure is repeated at each presentation. This way the entropy gets smaller and smaller as the procedure progresses. In sum, the Ψ method performs the following steps:

(1) A prior probability density function pdf0(λ)28 is assumed. The variable under investigation is λ, the vector with threshold and slope as coordinates: pdf0(λ), therefore, expresses the prior probability that each λ is the real λ of the subject.
(2) The posterior probability of each psychometric function with probability p(λ) at trial n is computed via Bayesian estimation for each intensity x of the stimulus that can be presented at the next trial n + 1. The computation is made both for the case that the answer r will be correct and for the case that it will be incorrect.
(3) The expected entropies referred to each pdfn(λ) after trial n are then computed.
(4) Of all the expected entropies computed for each pdfn(λ), the one that shows the minimum entropy is singled out. The intensity level xme corresponding to the mean of this pdfn(λ) is then presented at trial n + 1.
(5) Based on the answer r (correct or incorrect) to xme, the posterior pdf(λ) is updated via Bayesian estimation.
(6) From the posterior pdf at trial n + 1, step 2 is repeated, so that an updated pdf is obtained via Bayesian estimation. The procedure continues with steps 3, 4, and finally 5.

The process is iterated until completion of a predefined number of presentations. Monte Carlo simulations (Ψ method paired with the 2AFC response model and threshold set at the target probability = 0.75) revealed that during the initial phases the Ψ method aims to localize the threshold; after the threshold has been measured with acceptable accuracy (this occurs after 30–40 trials in experiments with real observers), the slope becomes the main target. At this phase of the exam, the entropy as a function of the signal strength shows two local minima positioned below the upper asymptote and below the midpoint of the psychometric function.29 Trial after trial, stimuli are alternately presented at the two signal levels corresponding to such minimum entropy values, suitable to estimate the slope (figure 11.13). Meanwhile, the threshold estimate is further refined.

FIG. 11.13 – After the threshold has been measured, the entropy as a function of the signal strength shows two local minima below the upper asymptote and below the midpoint of the psychometric function. See text for explanation. From Kontsevich and Tyler, 1999. Reproduced with permission of Elsevier via Copyright Clearance Center.

28 Kontsevich and Tyler reported that even if in their original simulation and in their experiment with a human observer the prior pdf was uniform, it would be preferable to assume a non-rectangular prior probability density function so as to speed up the examination time.
29 For a fairly large number of trials, the two minima correspond to p = 0.69 and 0.92 on the psychometric function.
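A one-step-ahead entropy minimization in the spirit of the Ψ method can be sketched as follows (logistic 2AFC likelihood assumed; names are ours):

    import numpy as np

    def psi_next_level(lambdas, prior, levels):
        """One look-ahead of the Psi method: return the stimulus level that
        minimizes the expected posterior entropy over the candidate vectors
        lambda = (threshold alpha, slope beta)."""
        def entropy(p):
            p = p[p > 0]
            return -np.sum(p * np.log2(p))

        best_x, best_h = None, np.inf
        for x in levels:
            # P(correct | lambda, x) for a 2AFC logistic function (guess rate 0.5)
            pc = np.array([0.5 + 0.5 / (1.0 + np.exp(-b * (x - a))) for a, b in lambdas])
            p_hit = float(np.clip(np.sum(prior * pc), 1e-9, 1 - 1e-9))
            post_hit = prior * pc / p_hit                    # Bayes update (equation 11.7)
            post_miss = prior * (1.0 - pc) / (1.0 - p_hit)
            h = p_hit * entropy(post_hit) + (1.0 - p_hit) * entropy(post_miss)
            if h < best_h:                                   # expected entropy (equation 11.9)
                best_x, best_h = x, h
        return best_x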
11.7 The Sweet Point-Based MLE Procedure of Shen and Richards (2012)
In an MLE procedure aimed at estimating threshold and slope of the psychometric function, Shen and Richards (2012) recalled that the stimuli should be presented at the levels that minimize at the same time the expected variance of the estimates of the threshold α and of the slope β: these positions are called the sweet points30 (Green, 1990, 1993). A single α-sweet point is sufficient for a reliable estimate of the threshold, while two β-sweet points are necessary to obtain the slope.31 So, stimuli presented at the level of the α-sweet point and of the β-sweet point(s) provide, respectively, the most reliable estimate of the threshold and of the slope.
30 The correct localization of the sweet points on the psychometric function requires knowledge of the guess and lapse rates.
31 Unless the assumed psychometric function is a Weibull distribution: in this case a single β-sweet point is enough (Shen and Richards, 2012).
The sweet point for the estimate of the threshold (the α-sweet point) is, unexpectedly, not exactly at the midpoint of the slope of the psychometric function, but above it (Green, 1990; Brown, 1996). In a 2AFC, for example, it is not localized at 75% but at 80% of correct responses32 (Shen and Richards, 2012). It follows that in 2AFC response models the presentations for the measure of the threshold should be placed at the signal level that yields a proportion of correct responses of 80%, rather than 75%. At this probability level, in fact, the variance of the threshold estimate is at its minimum (figure 11.14, left panel). While the α-sweet point is suitable for the threshold, it is inadequate for deriving the slope: in fact, the variance of the estimate of the slope approaches infinity at 80% of correct responses (figure 11.14, right panel). As recalled, for the slope two β-sweet points are required, one above and the other below the threshold (figure 11.14, right panel). Evidently, the α-sweet point and the two β-sweet points do not match. Assuming a logistic psychometric function, the sweet points for the slope have been computed for a yes–no task (in the hypothetical case of no lapses and guesses) and for 2, 3, and 4AFC: in all cases, the two values are expressed as probabilities of correct responses (SWβlow, SWβhigh). The optimal sampling positions (sweet points) for the estimate of threshold, slope, and threshold + slope have been reported by O'Regan and Humbert (1989) and summarized in table 11.2.
FIG. 11.14 – The expected variance of threshold (left) and slope (right) as a function of the proportion of correct responses for three different values of slope β. The bottom of the curve is the sweet point. To be noted is the presence of one α-sweet point (at the 80% correct level, more evident for β = 0.5) and two β-sweet points. Evidently, the α-sweet point and the β-sweet points do not match. As expected, the variance of the α-sweet point increases as the slope decreases: shallower slopes are related to higher variances of the α-sweet point (left panel). In turn, the variance of the two β-sweet points increases with the steepness of the slope: steeper slopes are related to higher variances of the β-sweet points33 (right panel). From Shen and Richards, 2012. Reproduced with permission of AIP Publishing via Copyright Clearance Center.

32 The authors computed this value assuming a logistic distribution (see appendix I), a guess rate fixed at 0.5, and a lapse rate of zero. According to Green (1990), the α-sweet point is as high as 84–94% of correct responses (2AFC).
33 An additional variable that affects the sweet points is the lapse rate λ: higher lapse rates move the sweet points downward along the psychometric function and increase their variance.
TAB. 11.2 – The α- and β-sweet points computed for different response models. These are the optimum sampling positions for deriving the threshold α, the slope β, or both (α + β) in y/n, 2AFC, 3AFC, and 4AFC response models. The expression "optimum sampling position" refers to the position where the normalized variance is minimized. The most accurate estimate of the threshold is obtained by placing the stimuli about the threshold level (i.e., at the signal level where the target probability φ is expected to be 0.50, 0.75, 0.66, or 0.62 depending on the response model). In turn, the optimum position for the estimation of the slope and for the simultaneous estimation of threshold and slope is above and below the target probability. (Summarized from table 1, p. 435, O'Regan and Humbert, 1989.)

    Response model   SWβlow   SWβhigh   SWα    SWα+βlow   SWα+βhigh
    Y/N              0.083    0.616     0.50   0.176      0.823
    2AFC             0.616    0.973     0.75   0.675      0.959
    3AFC             0.472    0.959     0.66   0.548      0.936
    4AFC             0.394    0.952     0.62   0.479      0.919
The table shows that the optimal loci for the slope in a y/n response model are the levels of intensity that generate 8.3% and 61.6% of correct responses, because these intensities are the ones that minimize the variance of the estimate of the slope. To be noted is that the optimal placement for the threshold in y/n or 2-, 3-, 4AFC response models is the expected threshold level itself (i.e., the level generating the target probability φ = 0.5, 0.75, 0.66, and 0.62, respectively): as a matter of fact, it is well known that the best placement of the stimuli to measure the threshold is about the threshold (or slightly above, as explained). And yet, at the threshold, the variance of the estimate of the slope is infinite. In case the aim is to obtain the smallest variance of the estimate of both threshold and slope, a compromise between the sweet points of the slope and of the threshold must be found. The two sweet points generated by this trade-off are reported in the rightmost two columns. Presenting the stimuli at a suboptimal position leads to increased variance in the estimate of the parameters of interest, and the detrimental effect is more evident for the n-AFC response models: the lower the n, the higher the variance. Of the two parameters, the slope is more affected by the misplacement (O'Regan and Humbert, 1989). Shen and Richards (2012) devised a Bayesian procedure aimed at presenting stimuli at the level of the α- and β-sweet points. In addition, as the lapse rate λ is shown to consistently affect the position of the α- and β- local minima (it increases the variance of the α- and β-sweet points and shifts their position toward a lower proportion of correct responses: figure 11.15), the λ-sweet point is included as well. Unlike the other two parameters, the variance of the lapse rate λ does not have a local minimum, but is a decreasing monotonic function of the stimulus intensity: as the stimulus intensity increases, the variability of λ decreases and tends asymptotically to a minimum. So, the λ-sweet point can be considered as the proportion of correct responses generated by very high signal levels.
FIG. 11.15 – The lapse rate affects the α- and β- sweet points. See text for explanation. From Shen and Richards, 2012. Reproduced with permission of AIP Publishing via Copyright Clearance Center.
The candidate sweet points α, βlow, βhigh, and λ are pre-established (21 possible values for the α-sweet point, 10 for the βlow-sweet point, 10 for the βhigh-sweet point, and 5 for the λ-sweet point).34 A prior (Gaussian) distribution of the pdf referred to the α-, β-, and λ-sweet points is established.35 MLE selects, trial after trial, the most likely set of sweet points. Then, the procedure chooses randomly the α-sweet point, one of the two β-sweet points, or the λ-sweet point, and presents the stimulus at that level. Based on the response of the observer, MLE estimates, among the n levels, the most likely intensity corresponding to the sweet point just tested. The posterior pdf referred to that sweet point is then updated, and a new stimulus is randomly presented at one of the four levels of intensity corresponding to the four sweet points. At the end of the examination (100 trials), the intensity level most likely associated with the α-, βlow-, βhigh-, and λ-sweet points is obtained, allowing, therefore, the psychometric function to be fitted. The final estimates of the four sweet points are the means of the corresponding last posterior probability density functions. Instead of randomly selecting the α-, βlow-, βhigh-sweet points (we will call this the random procedure for the sweet point selection), an up-down staircase strategy like the 1U4D transformed staircase can be adopted (we will call this the hybrid procedure for the sweet point selection). In this version, the stimulus intensity is decreased to the intensity level corresponding to the lower sweet point (for example, from βhigh to α or from α to βlow) after 4 correct responses, while after 1 wrong response it is increased to the intensity level corresponding to the upper sweet point (for example, from βlow to α or from α to βhigh). As for the previous algorithm, a maximum-likelihood procedure estimates the most likely intensity associated with the α-, βlow-, βhigh-, and λ-sweet points among the considered values. Simulations revealed that the two techniques are effective in estimating the threshold and slope of the psychometric function, irrespective of its steepness. The randomized strategy, however, seems slightly better in measuring the threshold.
34 The guess rate γ is assumed to be known.
35 The authors chose a logistic distribution for the psychometric function.
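The hybrid 1U4D selection rule can be sketched as movement along the ordered sweet points (our reading of the rule; the λ-sweet point, probed at very high intensities, is left out for brevity):

    SWEET_ORDER = ["beta_low", "alpha", "beta_high"]   # increasing stimulus intensity

    def hybrid_next_sweet_point(current, run_of_hits, last_correct):
        """1U4D movement among sweet points: one rung up after a single error,
        one rung down after 4 consecutive correct responses."""
        i = SWEET_ORDER.index(current)
        if not last_correct:
            return SWEET_ORDER[min(i + 1, 2)], 0       # up after 1 miss
        if run_of_hits >= 4:
            return SWEET_ORDER[max(i - 1, 0)], 0       # down after 4 hits
        return current, run_of_hits                    # stay, keep counting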
It remains the case, as recalled in section 11.3, that the slope of psychometric functions measured by the current adaptive staircase procedures is biased to some extent (Leek et al., 1992; Treutwein and Strasburger, 1999; Kaernbach, 2001). In simulated runs of a simple up-down staircase with two different step sizes,36 Kaernbach reported a consistent overestimation in the MLE of the slope. The bias was lower when the large step size was used and decreased with the number of presentations, tending to the asymptote for runs 100 trials long. However, the overestimation of the slope was not negligible even in prolonged runs, being about 10% for the large step size and 20% for the smaller one (Kaernbach, 2001). The biasing effect is even worse when simulating a stochastic approximation. With this procedure, indeed, the estimate of the slope is about double its real value, and it does not tend to decrease with the length of the track.37 The explanation for the biased estimate of the slope relies on the serial dependency of data collection: bias relative to the slope is intrinsic to the adaptive procedures because they prevent collecting data independently. In fact, unlike the method of constant stimuli, in the adaptive procedures the next signal level is chosen according to the response given at the previous presentation. If this serial dependency is negligible in the threshold estimate (i.e., where convergence on a single level of signal strength is expected), it hampers the true measures required to assess the value of the slope38 (Kaernbach, 2001). So, Kaernbach stated that "the bias can be overcome only by disregarding that part of the data that is dependent on earlier data".39 As a matter of fact, the slope assessed using independent constant stimuli is not biased.
36 Associated with a y/n response model. The step size was Σ/4 and Σ/8 (corresponding to a change in stimulus intensity of 0.5 and 0.25, respectively), where Σ is the spread of the steepest portion of the psychometric function.
37 10, 20 or 30 trials.
38 A simple explanation for the biased measurement of the slope obtained with adaptive procedures is provided by Kaernbach (2001) and reported by Strasburger: "[…] an adaptive placement strategy will lead to responses having, above threshold, slightly improved chances of being correct by both a selective lack of retesting after a correct response and a selective increase of retesting after an incorrect response (serial dependency). The converse is true below the threshold. Taken together, this leads to a positive bias in slope estimation" (Strasburger, 2001, p. 1370).
39 Kaernbach, 2001, p. 1397.
Chapter 12

Multidimensional Adaptive Testing

In a number of psychophysical experiments the matter of interest is how a dependent variable (generally the threshold α, referred, e.g., to contrast sensitivity, motion perception, luminance…) changes as a function of an independent variable q. The independent variable q can be the spatial frequency of a sine wave grating (in the case of contrast experiments), external noise, retinal eccentricity, etc. The function relating the dependent variable α to the independent variable q has been called the threshold function by Vul and colleagues (2010). The threshold function, evidently, differs from the psychometric function: the psychometric function is monodimensional, as it considers the proportion of correct responses as a function of a single variable (the threshold α), whereas the threshold function is bidimensional, as it deals with the proportion of correct responses as a function of two variables: α and q. The most straightforward way to plot a threshold function is to estimate the threshold α at each discrete value of q by performing independent tests (for example, staircase procedures) and then selecting the function that best fits the measured α values across the q range. This solution generates individual, independent unidimensional distributions, each of which addresses the threshold at a different value of q. However, this "Method of 1000 Staircases" (Breakwell et al., 2000) is time consuming: because each threshold is estimated independently, each α is uninformative about the possible value of the others. A bidimensional function can be derived more efficiently by extending the adaptive paradigm applied to the domain of the psychometric function so as to encompass the domain of the independent variable: in a contrast sensitivity paradigm, for example, the optimization is obtained by informing the adaptive paradigm aimed at estimating the contrast threshold at a given q-value with the knowledge obtained at the other q-values tested up to that phase of the examination. For this purpose, bidimensional testing procedures, like the Functional Adaptive Sequential Testing, the quick CSF and the quick TvC methods, have been devised.
12.1 The Functional Adaptive Sequential Testing (FAST: Vul et al., 2010)
Functional Adaptive Sequential Testing (FAST) is a psychophysical procedure devised for estimating the threshold function. The threshold function, as recalled, describes how the threshold (α) varies as a function of an independent variable q: this is the case, for instance, of contrast threshold (α) as a function of spatial frequency (q). The threshold function related to contrast sensitivity is called the contrast sensitivity function.

FAST assumes a surface of response probability. One dimension is determined by the proportion of correct responses vs. signal strength and generates the psychometric function: it is, therefore, the dimension of the variable α as a function of the signal strength. The other dimension is determined by the position of the variable α along the axis of the independent variable q, and generates the threshold function (figure 12.1). The sigmoid shape of the psychometric function assumed in the FAST method is logistic,1 characterized by the slope β, and determined by the equation:

$$P_r = \frac{1}{1 + e^{-\beta(x-\alpha)}} \qquad (12.1)$$
The position of the lower and upper asymptotes depends on the guess rate γ and the lapse rate λ, and need not be estimated in the FAST procedure. In turn, the shape of the threshold function depends on the type of threshold: in the contrast function (which relates contrast sensitivity to spatial frequency) it is a log parabola with a peak of sensitivity in the middle range of the spatial frequencies and a progressive decay at the higher and lower values. Because sensitivity is the inverse of the threshold, the curve referred to the contrast threshold is an inverse function compared to the curve of contrast sensitivity (figure 12.1, left panel). The shape and position of the log parabola are described by a vector θ⃗ made of three parameters, namely:

– θ1 = log10 contrast threshold at peak sensitivity (in figure 12.1, θ1 is the trough of the threshold function); θ1 shifts the parabola up and down along the scale of signal strength (contrast).
– θ2 = log10 spatial frequency corresponding to the peak of sensitivity; θ2 shifts the parabola left and right along the axis of the spatial frequency.
– θ3 = log10 of the inverse bandwidth of the threshold function under investigation; θ3 stretches or shrinks the parabola horizontally (figure 12.1, right panel).

The threshold function F is described by the equation:

$$\log_{10} \alpha = \theta_1 + 10^{\theta_3}\,\big(\log_{10}(q) - \theta_2\big)^2 \qquad (12.2)$$

1 See appendix I.
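To make equations 12.1 and 12.2 concrete, here is a minimal Python sketch that evaluates the log-parabola threshold function and the logistic psychometric function; the parameter values in the example are illustrative assumptions, not values taken from the text.

```python
import numpy as np

def threshold_function(q, theta1, theta2, theta3):
    """Equation 12.2: log-parabola threshold function.

    q      : spatial frequency (cycles per degree)
    theta1 : log10 contrast threshold at peak sensitivity
    theta2 : log10 spatial frequency of peak sensitivity
    theta3 : log10 inverse bandwidth (stretches/shrinks the parabola)
    """
    log_alpha = theta1 + 10.0 ** theta3 * (np.log10(q) - theta2) ** 2
    return 10.0 ** log_alpha        # contrast threshold alpha at q

def psychometric(x, alpha, beta):
    """Equation 12.1: logistic psychometric function (asymptotes omitted)."""
    return 1.0 / (1.0 + np.exp(-beta * (x - alpha)))

# Illustrative parameters: peak sensitivity placed at 4 cpd.
alpha = threshold_function(4.0, theta1=-2.0, theta2=np.log10(4.0), theta3=-0.7)
print(alpha)  # 0.01, i.e., 10**theta1, since 4 cpd is the assumed peak
```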
FIG. 12.1 – Left: surface of response probability: the first dimension is the psychometric function (in red) that represents the proportion of correct responses as a function of the signal strength. The other dimension is the threshold function (for example the contrast sensitivity function): it is a log parabola (black curve) that describes how the threshold of the psychometric function changes along the axis of the independent variable q (the spatial frequency in the contrast sensitivity function). Right: the log parabola of the threshold function described by the three parameters θ1, θ2, θ3. See text for explanation.

Since knowing the characteristics of the log parabola means knowing α for each value of q, the vector θ⃗ (the shape of the log parabola determined by θ1, θ2, and θ3) is targeted by the FAST procedure. In addition, FAST requires the estimate of the slope2 β of the psychometric function. In summary, FAST derives the bidimensional threshold function from θ⃗ and β.

FAST is a Bayesian technique that estimates the bidimensional posterior probability density function referred to the slope β and the vector θ⃗ in order to obtain the threshold function of contrast sensitivity vs. spatial frequency. The procedure starts from a bidimensional prior distribution over a given set of β and θ⃗ values. Before each presentation, the entropy of each bidimensional posterior probability distribution is computed for the two possible responses (hit/miss) across the stimulation range, as in the Ψ method of Kontsevich and Tyler.3 The next stimulus is presented at the level that minimizes the expected posterior entropy of the bidimensional pdf. After each trial, based on the response of the observer, the bidimensional prior pdf is updated according to Bayes' theorem:

$$P_{post}(\vec{\theta}, \beta \mid q_p, x_p) = P(r \mid q_p, x_p, \vec{\theta}, \beta)\; P_{prior}(\vec{\theta}, \beta) \qquad (12.3)$$
Then, the entropy of each bidimensional pdf is re-computed for the two possible responses, and the level of the stimulus x that minimizes the expected entropy is selected and presented at the next trial. In this way, after each trial, a gain of information is obtained compared to the previous presentation, so that the entropy of β and θ⃗ decreases progressively.

Given that the computation of the expected posterior entropy after a hit or miss response for an indefinite number of combinations of stimulus intensity + β + θ⃗ would be prohibitively slow, the procedure is made faster by simplifying the bidimensional prior probability density function into a bidimensional probability mass function (pmf): the bidimensional pmf can be represented as a lattice with a finite number of loci, each defined by a value of θ⃗ and β. The grain of the lattice is uniform and the range of β and θ⃗ is pre-defined. The lattice over the bidimensional surface is bounded by the marginal probability distributions of the two parameters of interest (θ⃗ and β). The two marginal probability mass functions are approximated to marginal probability density functions (figure 12.2). As explained by the authors, each of the two marginal probability density functions "can be loosely thought of as the profile of the N-dimensional joint probability density as projected onto the axis of interest", and "the peak of the marginal pdf does not generally agree with the peak of the N-dimensional pdf" (Vul et al., 2010, p. 494). At the beginning of the examination, a prior bidimensional distribution is obtained by assigning a prior probability to each locus of the lattice.4

2 More precisely, the spread σ.
3 See section 11.6.
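The following sketch shows, under simplifying assumptions (a small illustrative lattice, no marginal-pdf bookkeeping), the core computation just described: the expected posterior entropy of the pmf for a candidate stimulus (x, q). All numeric ranges are hypothetical.

```python
import itertools
import numpy as np

# A coarse lattice over (theta1, theta2, theta3, beta); every locus is one
# candidate threshold function + slope, with a rectangular (uniform) prior.
lattice = list(itertools.product(np.linspace(-2.5, -1.0, 4),   # theta1
                                 np.linspace(-0.3, 1.2, 4),    # theta2
                                 np.linspace(-1.2, -0.2, 3),   # theta3
                                 np.linspace(0.5, 4.0, 3)))    # beta
prior = np.full(len(lattice), 1.0 / len(lattice))

def p_hit(x, q, locus):
    """P(correct | stimulus, locus): logistic around the locus' threshold at q."""
    t1, t2, t3, beta = locus
    log_alpha = t1 + 10.0 ** t3 * (np.log10(q) - t2) ** 2
    return 1.0 / (1.0 + np.exp(-beta * (x - log_alpha)))   # x in log10 contrast

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def expected_posterior_entropy(x, q, prior):
    """Expected entropy of the posterior pmf after presenting (x, q)."""
    likes = np.array([p_hit(x, q, locus) for locus in lattice])
    ph = np.sum(prior * likes)                    # predictive P(hit)
    post_hit = prior * likes / ph                 # posterior after a hit
    post_miss = prior * (1.0 - likes) / (1.0 - ph)  # posterior after a miss
    return ph * entropy(post_hit) + (1.0 - ph) * entropy(post_miss)
```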
FIG. 12.2 – The lattice (i.e., the multidimensional pmf) related to the joint (bidimensional) pdf and the two corresponding marginal probability mass functions referred to β and θ⃗. Each locus of the pmf is determined by the combination of a value of β and a value of θ⃗. Each marginal pmf can be thought of as the profile of the bidimensional pmf projected onto the surface, but the peak of each marginal pmf does not generally correspond to the peak of the bidimensional pmf. From Vul et al. (2010). Courtesy of BRILL.
4 The prior may be rectangular.
As explained, after the stimulus that minimizes the entropy of the bidimensional pmf is presented and the response is recorded, the MLE is recomputed according to the response given to that stimulus, and the whole bidimensional pmf is updated by assigning to each locus the most likely value of β and θ⃗. The next stimulus is then presented at the level that minimizes the expected entropy, and the process is iterated. The computations are done using the mean of the marginal probability density functions.

Global or local strategy for stimulus placement

To select the most suitable stimulus level, that is, the presentation that minimizes the posterior entropy of the bidimensional pdf, the first step is to select the minimum expected entropy among n alternative bidimensional pdfs. These alternatives include the combinations of signal strength (contrast) and spatial frequency (the independent variable q). The computation may be performed at a global or at a local level. In the global strategy, the expected entropy is computed simultaneously at each stimulus level and at each spatial frequency (q). This way, the global strategy selects the globally most informative parameters x, q for the next presentation (in our specific case, a given contrast at a given spatial frequency): this is the combination of x and q that minimizes the overall entropy at all the spatial frequencies of the threshold function. In turn, the local strategy estimates the level of x with the minimum entropy at a pre-selected spatial frequency q. The pre-selected spatial frequency changes trial after trial. In other terms, the local strategy picks the optimal contrast that minimizes the local entropy at a pre-determined spatial frequency (q) at trial n, and at a different pre-determined spatial frequency at the next trial n + 1 (both placement rules are sketched in code at the end of this subsection).

The drawback of the global strategy is that if the shape of the real threshold function deviates from the shape of the assumed prior threshold function (the log parabola in our case), the global strategy is unable to detect the discrepancy, so it loses efficiency. In addition, it requires a great deal of computation after each trial and, therefore, a considerable amount of time. On the contrary, the local positioning strategy is more robust to deviations of the threshold function from the assumed prior, but is overall less efficient than the global strategy. Simulations demonstrated, however, that the global strategy is more biased than the local strategy when the prior assumption about the shape of the threshold function is wrong. The authors thereby recommend choosing the local strategy.

Dynamic resampling

The simplification of the bidimensional pdf as a discrete lattice grid raises the issue of the best compromise between testing range and resolution (grain) of the threshold. A narrow range allows for greater resolution (a higher number of loci per bidimensional surface), but the posterior pmf may be localized out of the testing range, making the procedure inefficient. In turn, a wide range may reduce the density of the loci, degrading the precision of the estimation to an unacceptable level. An elegant solution is the adoption of a dynamic re-sampling of the loci of the lattice: the mean and standard deviation of the marginal pdfs of θ1, θ2, θ3, and β are
re-sampled at a certain phase of the exam5 so as to shift or shrink the bidimensional pmf of the lattice according to the information collected so far: this way, the procedure converges on the last updated bidimensional distribution (the global maximum), increasing the precision without the risk of letting the parameters θ⃗ and β fall outside the tested range.

The main advantage of FAST over the method of 1000 staircases is the reduced examination time: each point of the bidimensional space informs, and is informed by, the others. This optimizes the placement of the stimulus at the next trial, speeding up the convergence toward the threshold values in the q (spatial frequency) domain. In addition, the estimation of the threshold is not restricted to a discrete range of (predefined) x-loci: its value at any given q-location provides information to the whole x-domain. With this feature, bidimensional adaptive testing provides a second main advantage: it is more sensitive to local deviations from the assumed shape of the threshold function. FAST proved more accurate in estimating the contrast threshold function in the first 100–200 trials compared to the individual thresholds obtained with independent adaptive procedures at 14 spatial frequencies.6 After 200 trials, however, the difference in performance between the two techniques disappears (Vul et al., 2010).
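As promised above, here is a hedged sketch of the two placement rules and of one possible dynamic-resampling step. It reuses an expected-posterior-entropy function like the one sketched earlier (passed in as `epe`); the resampling rule (recentering on the posterior mean, ±3 standard deviations) is an illustrative choice, not the authors' exact recipe.

```python
import numpy as np

def pick_global(epe, prior, x_grid, q_grid):
    """Global strategy: search contrast and frequency jointly.
    `epe` is an expected-posterior-entropy function, epe(x, q, prior)."""
    return min(((x, q) for x in x_grid for q in q_grid),
               key=lambda xq: epe(xq[0], xq[1], prior))

def pick_local(epe, prior, q_fixed, x_grid):
    """Local strategy: q is pre-selected for this trial; only contrast is optimized."""
    return min(x_grid, key=lambda x: epe(x, q_fixed, prior))

def resample_grid(grid, posterior, n_points=21, width=3.0):
    """Dynamic resampling sketch for one parameter: re-center the grid on the
    posterior mean and re-scale it to +/- `width` posterior standard deviations."""
    mean = np.sum(grid * posterior)
    sd = np.sqrt(np.sum(posterior * (grid - mean) ** 2))
    new_grid = np.linspace(mean - width * sd, mean + width * sd, n_points)
    new_post = np.interp(new_grid, grid, posterior)   # crude re-evaluation
    return new_grid, new_post / new_post.sum()
```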
12.2 The Quick CSF Method (qCSF: Lesmes et al., 2010)
The quick CSF method (qCSF) is another Bayesian procedure that aims at estimating the threshold function related to contrast sensitivity (the contrast sensitivity function, CSF). As recalled in the previous section, the contrast sensitivity function describes contrast sensitivity (1/threshold) as a function of spatial frequency. The contrast sensitivity function in the qCSF is parameterized as a left-side truncated log-parabola. As explained by the authors, this curve preserves the symmetry near the peak of the function and the plateau at low spatial frequencies typical of empirically determined CSFs. The parameters characterizing the truncated log-parabola are:

– Smax: the maximum sensitivity (peak gain) in decimal log units.
– Fmax: the spatial frequency corresponding to the maximum sensitivity.
– B: the bandwidth (at half-maximum) in octaves.7
– T: the left-side truncation level in decimal log units (figure 12.3).

5 By computing the mean and standard deviation of the corresponding marginal pdfs.
6 The authors used the Ψ method associated with the 4-AFC response model.
7 A band of frequencies is one octave in width if the upper frequency of the band is double the lower frequency. So, an octave band is a band that spans one octave. For example, a CSF with spatial frequencies spanning from 0.5 to 16 cycles per degree has a bandwidth of 5 octaves (0.5–1, 1–2, 2–4, 4–8, and 8–16).
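A small sketch of a left-truncated log-parabola follows. The text gives only the roles of the four parameters, so the particular algebraic form below (a log parabola clipped by a low-frequency plateau) is an assumption for illustration.

```python
import numpy as np

LOG2 = np.log10(2.0)

def qcsf_log_sensitivity(f, s_max, f_max, b_oct, t_trunc):
    """Left-truncated log-parabola CSF (illustrative form, see lead-in).

    f       : spatial frequency (cpd)
    s_max   : peak sensitivity, decimal log units
    f_max   : frequency of peak sensitivity (cpd)
    b_oct   : full bandwidth at half maximum, in octaves
    t_trunc : left-side truncation, decimal log units
    """
    half_bw = (b_oct * LOG2) / 2.0                 # half bandwidth in log10 units
    parabola = s_max - LOG2 * ((np.log10(f) - np.log10(f_max)) / half_bw) ** 2
    plateau = s_max - t_trunc
    # Below the peak, the curve is not allowed to fall under the plateau.
    return np.where(f < f_max, np.maximum(parabola, plateau), parabola)

freqs = np.array([0.25, 0.5, 1, 2, 4, 8, 16, 32], dtype=float)
log_s = qcsf_log_sensitivity(freqs, s_max=2.0, f_max=3.0, b_oct=3.0, t_trunc=0.5)
```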
FIG. 12.3 – The left-truncated log parabola used in the quick CSF method as the form of the contrast sensitivity function. See text for details.

The qCSF performs a Bayesian estimation of this left-truncated log parabola, parameterized in the four-dimensional space x⃗ = (Smax, Fmax, B, T). This is achieved as follows:

(1) The procedure starts from a prior probability density function p0(x⃗) of the CSF, defined over the four-dimensional vector x⃗. The prior parameterization of x⃗ is obtained by assuming that each of the four parameters is represented by a prior probability density function: p0(Smax), p0(Fmax), p0(B), and p0(T); these pdfs have the shape of a flat hyperbolic secant. The integration of the four p0 generates the prior pdf p0(x⃗), that is, the prior knowledge of the four parameters in the form of a flat, left-truncated log parabola.

(2) After each presentation, p0(x⃗) is turned into a posterior pdf pt(x⃗): the posterior pdf describes the probability that each x⃗ contains the values of Smax, Fmax, B, and T that define the CSF of the observer. The update of pt(x⃗ | s⃗) after each response r (correct or incorrect) to the stimulation s⃗ (here denoting the combination of contrast level and spatial frequency) is obtained via Bayes' rule:

$$p_t(\vec{x} \mid \vec{s}) = p_0(\vec{x})\; p(r \mid \vec{x}, \vec{s}) \qquad (12.4)$$
This way, after each response r to the stimulation s⃗, pt(x⃗) becomes the prior for the next trial. The stimulus placement is a crucial point: the next s⃗ is the vector with the combination of spatial frequency and contrast level that minimizes the expected entropy of p(x⃗) at the next presentation. For this purpose:

(3) The expected entropies Hn referred to the pdf(x⃗), computed for a correct (rc) and an incorrect (ri) response to each stimulus level s⃗ that can be presented at the next trial, are calculated as:
$$H_n(\vec{x} \mid \vec{s}, r_c) = -\sum_{\vec{x}} p(\vec{x} \mid \vec{s}, r_c)\,\log p(\vec{x} \mid \vec{s}, r_c), \qquad (12.5)$$

$$H_n(\vec{x} \mid \vec{s}, r_i) = -\sum_{\vec{x}} p(\vec{x} \mid \vec{s}, r_i)\,\log p(\vec{x} \mid \vec{s}, r_i). \qquad (12.6)$$
(4) The s⃗ that provides the lowest expected entropy is selected and presented at the next trial. This way the procedure is "one step ahead", as it predicts the most suitable s⃗ to present in order to minimize the uncertainty about p(x⃗) at the next trial.8

(5) The posterior pt(x⃗) is updated according to the response of the subject, and the whole process is iterated for a fixed number of trials.

The qCSF method uses a 2AFC response model. Simulated data by Lesmes and colleagues (2010) proved that it requires at least 100 trials to obtain a reliable and precise estimate (1–3 dB) of the contrast sensitivity function. The simulated results were confirmed by real data in an experiment performed on three observers with 11 spatial frequencies ranging from 0.2 to 20 cpd and 46 levels of contrast (from 0.15% to 99%). Test–retest reliability was 82–96%. The authors reported that, especially if the subject's CSF does not fit the truncated log-parabola (e.g., in case of notches in the threshold function along the spatial frequency dimension), the Ψ method of Kontsevich and Tyler9 is more precise but, inevitably, less efficient (longer examination time, as it tests one spatial frequency at a time in separate sessions).
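A minimal sketch of the update in the spirit of equation 12.4 follows; a normalization step is added so the result is a proper distribution (the running procedure can defer it), and all numeric values are illustrative.

```python
import numpy as np

def qcsf_update(prior, p_correct_given_params, correct):
    """One Bayes update: weight the prior over candidate parameter vectors by
    the likelihood of the observed response, then normalize."""
    like = p_correct_given_params if correct else 1.0 - p_correct_given_params
    post = prior * like
    return post / post.sum()

# Four candidate CSF parameter vectors with a flat prior; the likelihoods are
# P(correct | candidate, chosen stimulus) from a 2AFC psychometric function.
prior = np.full(4, 0.25)
likes = np.array([0.55, 0.70, 0.85, 0.95])
posterior = qcsf_update(prior, likes, correct=True)  # mass shifts to high-likelihood candidates
```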
12.3 The Quick TvC Method (Lesmes et al., 2006)
The quick TvC method is a procedure that measures the threshold function defined by Lesmes and colleagues as the threshold versus noise contrast function (TvC). The TvC function describes how contrast threshold (or contrast sensitivity) varies as a function of external noise. Like the contrast threshold function, the TvC function can be represented on a surface formed by the psychometric distributions that describe how the proportion of correct responses changes with contrast across different external noise levels (figure 12.4). Lesmes reported that the estimate of contrast threshold at three well-separated performance levels (φ1, φ2, φ3, that is, below threshold, at threshold [target probability: 79%], and above threshold) is sufficient to obtain a satisfactory description of all the psychometric functions across the surface, and thereby of the TvC function.
8 The strategy is similar to the Ψ method and the qTvC method (see next section).
9 See section 11.6.
FIG. 12.4 – The bidimensional contrast versus noise threshold function (TvC). The bidimensional function is made of a family of psychometric distributions measured across the range of noise levels. N_ext: external noise; c: contrast. See text for explanations. Modified from Lesmes et al. (2006). Reproduced with permission of Elsevier via Copyright Clearance Center.

As shown in figure 12.4, the surface shows an elbow: on the left side of the elbow (low noise levels), noise does not affect contrast sensitivity (the amount of contrast required to obtain φ correct responses does not change as noise increases);10 on the contrary, beyond a critical value Ncrit, noise affects the threshold, so that the higher the noise, the higher is the contrast required to keep the proportion of correct responses at φ. The ratio of the contrasts at the three target probabilities is constant across the noise levels: this suggests that the slope of the psychometric functions is the same irrespective of the level of external noise, which simplifies the identification of the TvC surface: it is therefore made up of n parallel psychometric functions, all with the same slope. The TvC depicted in figure 12.4 can thereby be described by three parameters:

– The optimal threshold αopt, that is, the threshold measured at the left side of the elbow (low noise levels);
– The critical noise level Ncrit, that is, the noise level from which the contrast threshold αopt starts increasing (αopt → α);
– The slope β of the psychometric functions (contrast vs. proportion of correct responses). As explained, the slope is constant across the tested range of external noise.

The three parameters can be integrated into a three-dimensional vector v⃗ = (αopt, Ncrit, β). The vector describes a parameter space encompassing all possible psychometric functions across the range of noise considered. Starting from a prior
10 This confirms what was reported more than 50 years ago by Nagaraja (1964).
probability density function of v⃗, the quick TvC method uses Bayes' rule to estimate, trial after trial, the posterior probability that each possible v⃗ is the v⃗ of the observer. Before the next trial, the entropy of pdf(v⃗) is computed for a correct and an incorrect response to each stimulus level x⃗ (the stimulus level is a vector made of a combination of contrast and noise). Of all the entropies computed for the pdf(v⃗), the minimum expected entropy after a yes (or correct) and a no (or incorrect) response is then selected; the x⃗ corresponding to it is presented at the next trial. This way, the level of uncertainty (entropy) about the pdf(v⃗) that fits the performance of the observer decreases after each presentation.

In summary, like the Ψ method of Kontsevich and Tyler and the quick CSF method, the quick TvC method follows these steps:

(1) The procedure starts from a prior parametrization of the TvC function, i.e., from a prior probability density function p0(v⃗), with each probability p0 defined over v⃗ = (αopt, Ncrit, β).

(2) After each presentation and each correct (rc) or incorrect (ri) response, p0(v⃗) is transformed into a posterior probability density function pt(v⃗): the posterior pdf describes the probability that each v⃗ contains the values of αopt, Ncrit, and β that define the TvC function of the observer. The update of pt(v⃗) after each response (correct or incorrect) to the stimulus x⃗ takes place via Bayes' rule:

$$p_t(\vec{v} \mid \vec{x}, r_c) = p_0(\vec{v})\; p(r_c \mid \vec{v}, \vec{x}), \qquad (12.7)$$
$$p_t(\vec{v} \mid \vec{x}, r_i) = p_0(\vec{v})\; p(r_i \mid \vec{v}, \vec{x}).$$
So, after each response, pt(v⃗) becomes the prior for the next trial. The stimulus placement is a crucial point: which is the next x⃗ (that is, the optimal combination of contrast level and noise level for the next stimulation)? The optimal stimulation is the x⃗ that minimizes the expected entropy of p(v⃗) at the next presentation. For this purpose:

(3) The entropy H of p(v⃗) is computed for a correct (rc) and an incorrect (ri) response to each level of stimulation x⃗ that can be presented at the next trial:

$$H\big(p(\vec{v} \mid \vec{x}, r_c)\big) = -\sum_{\vec{v}} p(\vec{v} \mid \vec{x}, r_c)\,\log p(\vec{v} \mid \vec{x}, r_c), \qquad (12.8)$$
$$H\big(p(\vec{v} \mid \vec{x}, r_i)\big) = -\sum_{\vec{v}} p(\vec{v} \mid \vec{x}, r_i)\,\log p(\vec{v} \mid \vec{x}, r_i).$$
(4) The x⃗ that provides the lowest expected entropy across the computed range of entropies is selected and presented at the next trial. This way, the procedure is "one step ahead", as it predicts to what extent the uncertainty about p(v⃗) will be reduced after the next presentation.

(5) The posterior pt(v⃗) is updated and the whole process is iterated for a fixed number of trials.

According to Monte Carlo simulations and an experiment with real subjects, the quick TvC method is more efficient than the method of constant stimuli and the mono-dimensional adaptive procedures that explore one noise level at a time. It requires less than 300 trials to determine the TvC function with acceptable precision.
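The elbow-shaped TvC described above can be sketched as follows. The exact analytic form used by Lesmes et al. (2006) is not given in the text, so the shape below (flat at αopt up to Ncrit, then rising as a power of noise contrast) is an illustrative assumption that merely reproduces the described behavior.

```python
import numpy as np

def tvc_threshold(noise, alpha_opt, n_crit, rise=1.0):
    """Illustrative elbow-shaped TvC: contrast threshold is flat at alpha_opt
    below the critical noise level and grows with noise above it."""
    noise = np.asarray(noise, dtype=float)
    return alpha_opt * np.maximum(1.0, (noise / n_crit) ** rise)

# Below N_crit = 0.1 the threshold stays at alpha_opt; above, it rises with noise.
thresholds = tvc_threshold([0.01, 0.05, 0.1, 0.2, 0.4], alpha_opt=0.02, n_crit=0.1)
```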
12.4 The MEEE Method of Cobo-Lewis for Performance Categorization
The MEEE (Minimum Estimated Expected Entropy) method is an adaptive procedure devised by Cobo-Lewis (1997) to categorize the performance of a subject.11 Consider the case of contrast sensitivity: different shapes of the threshold function may suggest different functional impairments. The method of Cobo-Lewis aims at singling out the category that best fits the impairment of the observer.

The experimental scenario is peculiar and raises a main issue that does not arise in the common psychophysical experiments for the estimate of the threshold. In the current psychophysical procedures, in fact, the threshold (target parameter) and the strength of the signals presented for its estimate (stimulation parameter, e.g., luminance) are quantities expressed in the same unit of measure, so that staircase or MLE-based procedures can be used to choose, trial after trial, the optimal level of stimulation according to the response of the observer. But when it comes to categorizing the performance of a subject, the target and the stimulation parameters are incommensurate: the latter remains a quantitative dimension, whereas the former is a nominal variable. The minimum variance method, as seen, may solve the problem of incommensurate parameters, but only if the target parameter is continuous and not discrete.12 If it is categorical (nominal, hence discrete), the minimum variance method is not suitable for guiding the stimulus placement at each trial.13 Cobo-Lewis addressed this issue by devising an adaptive procedure based on the so-called Minimum Estimated Expected Entropy (MEEE).
11 The original purpose of the author was audiogram categorization.
12 Consider, for example, threshold and slope: two incommensurate but continuous parameters.
13 As a matter of fact, the variance of non-continuous parameters is undefined.
The MEEE procedure starts from the prior probability distribution that the observer belongs to a category, represented as a prior pdf, and at each trial it selects the stimulus parameters s⃗m "in order to minimize the estimated expected entropy of the a posteriori probability distribution that expresses how likely a subject is to belong to each of a group of mutually exclusive categories" (Cobo-Lewis, 1997, p. 989). Therefore, s⃗m is presented, a response is obtained from the observer, and in light of the acquired datum the posterior pdf is recomputed. Then, a new MEEE (which will be smaller, i.e., more informative than the previous one) is calculated, and the process is reiterated. This way, as the exam proceeds, the MEEE method classifies the performance of the observer with more and more precision into one of a predefined number of categories.

Consider this case: the goal is to classify the contrast sensitivity function (CSF) of a subject into 5 CSF categories (figure 12.5):

(1) Normal CSF.
(2) Globally depressed CSF.
(3) Reduced sensitivity at low spatial frequencies.
(4) Reduced sensitivity at high spatial frequencies.
(5) Reduced sensitivity at middle spatial frequencies.
The likelihood that the data obtained from the observer fit one of the five categories is formalized as follows:

$$L(\text{obtained set of responses} \mid c_{1\ldots5} = C) = \prod_{n=1}^{N} L(r_n = R_n \mid c_n = C), \qquad (12.9)$$
where c is one of the five categories, C is the correct category for the subject, rn is the obtained response at trial n, and Rn is the expected response at that trial if the category under consideration, cn, is the correct category (C) for the subject. The equation can be read as follows: "the likelihood that the set of responses obtained across N trials is that expected in case the category c1…5 is the correct category (C) is the product of the likelihoods that at each trial n the obtained response (rn) is the response that is expected (Rn) if cn is the correct category C". The Bayesian estimation of the correct category for the observer is obtained by adding a prior pdf to this likelihood function. In a simplified form, it can be summarized as:
$$Pdf_{post}(c_{1\ldots5} = C \mid \text{obtained set of responses}) = Pdf_{prior}(c_{1\ldots5}) \prod_{n=1}^{N} L(r_n = R_n \mid c_n = C). \qquad (12.10)$$

FIG. 12.5 – The five categories of contrast sensitivity function assumed in the example.
The pdfpost describes the probability distribution of the performance of the observer across the five categories after each trial. Once the pdfpost has been updated, the next presentation is placed at the signal s⃗ 14 whose response (correct or incorrect) is expected to minimize the entropy about the category of the observer. So, the conditional entropy (the level of uncertainty about each category)15 is calculated in case of a correct and an incorrect response at each stimulation s⃗ according to the usual formula:

$$H^{\wedge}(\vec{s}) = -\sum_{c} P_c^{\wedge}(\vec{s})\,\log P_c^{\wedge}(\vec{s}), \qquad (12.11)$$
$$H^{\vee}(\vec{s}) = -\sum_{c} P_c^{\vee}(\vec{s})\,\log P_c^{\vee}(\vec{s}).$$
In the equations, P^∧ and P^∨ are the posterior conditional probabilities that each category is the category of the observer after a correct (∧) or an incorrect (∨) response to the signal s⃗. Like the Ψ method, the Minimum Variance Method, and the quick methods of Lesmes, the MEEE method is "one step ahead": it predicts the entropy after each trial, and indicates which stimulus to present and to what extent the uncertainty about c will be reduced after the next presentation. The process is iterated until the MEEE is low enough.16
14 In the example reported here, the signal s⃗ is parametrized by contrast and spatial frequency.
15 Cobo-Lewis recalled that entropy can be viewed as a measure of "[…] how much surprised the subject should be to discover that he is actually a member of category c" (Cobo-Lewis, 1997, p. 991). In the hypothetical case P_c were 1 (i.e., 100% probability that the subject belongs to category c), the corresponding surprisal (−log P_c) would be zero: we would not be surprised at all to discover that the subject belongs to category c, as there is no uncertainty about this hypothesis. On the contrary, if P_c were 0 (i.e., 0% probability that the subject belongs to category c), the surprisal would be infinite: in this case we should be infinitely surprised to discover that the subject is a member of category c, as there is not even the slightest chance that the subject belongs to it. This example serves to clarify that the higher the surprise associated with a category, the higher the level of uncertainty about it.
16 "Low enough" means ≤0.469 bits, corresponding to a probability of at least 0.9 that the selected category is the correct category of the observer (Cobo-Lewis, 1997).
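A minimal sketch of the MEEE bookkeeping follows: one Bayesian update of the category posterior (equation 12.10 in miniature) and the entropy computation of equation 12.11, with the 0.469-bit stopping rule from the footnote above. The numeric likelihoods are illustrative values only.

```python
import numpy as np

def update_category_posterior(prior, trial_likelihoods):
    """Multiply the prior over categories by the likelihood of the observed
    response under each category, then normalize (equation 12.10 in miniature)."""
    post = prior * trial_likelihoods
    return post / post.sum()

def category_entropy(pmf):
    """Entropy over categories (equation 12.11), in bits."""
    pmf = pmf[pmf > 0]
    return -np.sum(pmf * np.log2(pmf))

# Five CSF categories with a flat prior; the likelihoods L(r_n = R_n | c_n = C)
# for one observed response at one stimulus are illustrative.
prior = np.full(5, 0.2)
likes = np.array([0.90, 0.60, 0.30, 0.50, 0.40])
posterior = update_category_posterior(prior, likes)
# Stop when uncertainty is low enough: <= 0.469 bits per Cobo-Lewis (1997).
finished = category_entropy(posterior) <= 0.469
```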
Chapter 13

What Makes a Psychophysical Technique a Good Psychophysical Technique?

The reliability and precision of the adaptive procedures can be evaluated using simulations or in real experiments. In the first case, whether the results reflect real human behavior (including the occurrence of false-positive and false-negative errors and the positional or temporal bias documented in the AFC response models) is questionable. In the second case, threshold fluctuation, learning effects, or fatigue effects can bias the judgment on the procedure, especially its variability.1 Despite these unavoidable methodological issues, Treutwein (1995) recalled that the goodness of a psychophysical technique can be improved by optimizing the following parameters:

(1) The examination time required to measure the variable(s) of interest (generally threshold, or threshold + slope). This is crucial because in human observers long examinations predispose to attention drops, and attention drops can affect the reliability of the test. In addition, time-consuming procedures are not practical, especially within the clinical setting.

(2) Precision and efficiency. The precision η is the inverse of the variance σ²θ of a number of threshold estimates θ (Taylor, 1971; Treutwein, 1995):

$$\eta = \frac{1}{\sigma^2_{\theta}} = \frac{t-1}{\sum_{t}(\theta_t - \mathrm{AVG}\,\theta)^2}, \qquad (13.1)$$
1 The principle of stationarity (a crucial assumption of psychophysics) is always violated to a certain extent, as human responses tend to show sequential dependence (Harvey, 1986). Interleaving independent runs for different parameters (e.g., presenting different spatial frequencies in random order during a contrast sensitivity test) could help minimize sequential interactions (Treutwein, 1995).
where t is the number of threshold estimates, θt is each threshold estimate, and AVG θ is the average threshold estimate obtained from all the estimates. The lower the variance of the threshold estimates, the higher is the precision η of the psychophysical technique. The main factor that affects the variance, and thereby the precision of the examination, is the systematic error,2 or bias. Bias is of two types:

– Measurement bias (bm): it quantifies the difference between the estimated (empirical) threshold and the true threshold. When a number of threshold estimates are measured and averaged, measurement bias indicates how this average differs from the real (true) threshold (King-Smith et al., 1994). Measurement bias of a procedure can be assessed only via simulations, because the true threshold θtrue in real observers is unknown. Once θtrue is obtained by simulations, measurement bias bm is computed according to the formula provided by Treutwein (1995):

$$b_m = \frac{1}{t}\sum_{t}(\theta_{true} - \theta_t), \qquad (13.2)$$

where t is the number of threshold estimates and θt is the value of each threshold measure. Besides, measurement bias is quantifiable as the difference between the (log) true threshold and the mean of the (log) threshold estimates:

$$b_m = \theta_{true} - \mathrm{AVG}\,\theta_t. \qquad (13.3)$$
If there is no difference, measurement bias is zero; the larger this difference, the greater the measurement bias. In summary, measurement bias expresses the deviation of the estimate (empirical threshold) from the real value (true threshold). Even if the empirical threshold will never match the true threshold in a real observer, the closer these two values, the more precise is the technique. Measurement bias depends on the way data are collected: for example, on the rule selected for the placement of the stimuli (i.e., mode, median, or mean of the pdf in the Bayesian procedures) or on the number of reversals in staircase methods. Besides, measurement bias depends on the way the final threshold is computed.

– Interpretation bias (bi). Interpretation bias is the probability that a threshold estimate stems from different real thresholds. When a threshold is measured, interpretation bias indicates how this threshold estimate differs from the mean of the possible real thresholds (King-Smith et al., 1994b). In other terms, interpretation bias expresses the probability that different real thresholds give rise to the threshold estimated by the test. The higher this probability, the greater is the interpretation bias. Interpretation bias is, therefore, the converse of measurement bias.

2 The systematic error is quantified as the deviation of the mean values from the convergence point. In turn, the statistical error is the fluctuation of the estimates around their mean value (Kaernbach, 1991). The total error is the sum of the systematic and statistical errors (Kaernbach, 1991).
If examination time and precision η are known, efficiency E can be determined: efficiency, in fact, depends on the best trade-off between examination time and precision: the more precise and less time consuming the procedure, the higher its efficiency. Taylor and Creelman (1967) introduced the sweat factor K as a measure of efficiency. The sweat factor quantifies the amount of work required to measure a threshold with a given precision; in other words, it represents the amount of work needed to achieve a reliable measure of the threshold. K is calculated as the variance σ²θ of the threshold estimate multiplied by the number n of trials needed to obtain that estimate:

$$K_{\theta} = n\,\sigma^2_{\theta}. \qquad (13.4)$$

So, the higher the variance of the threshold estimate (i.e., the lower the precision) and the higher the number of trials required to obtain the threshold, the greater is the sweat factor of the procedure (and the lower its efficiency). By normalizing the sweat factor Kθ by an ideal sweat factor Kideal, the efficiency of different techniques can be compared; Kideal reflects the absolute value of efficiency (100%) of an ideal perceiving machine (a simulation technique) that estimates the threshold at its best (that is, with the lowest variance and as few presentations as possible).3 Relative efficiency can be assessed by comparing the sweat factor of the procedure Kθ with the ideal sweat factor Kideal: the higher the ratio Kθ/Kideal, the lower the efficiency of the procedure:

$$\text{Relative efficiency} = \frac{K_{ideal}}{K_{\theta}}. \qquad (13.5)$$
Since Kideal = n σ²ideal, it must be established what the ideal variance σ²ideal is. As recalled by Treutwein, Taylor (1971) suggested adopting the variance of the stochastic approximation (Robbins–Monro approximation), which converges asymptotically on the threshold for a given target probability φ after a fixed number of trials n:

$$K_{ideal} = n\,\sigma^2_{RM}, \quad \text{where} \quad \sigma^2_{RM} = \frac{\varphi(1-\varphi)}{n\,\beta^2}, \qquad (13.6)$$

where β is the slope of the psychometric function. It follows that:

$$K_{ideal} = n\,\frac{\varphi(1-\varphi)}{n\,\beta^2} \;\rightarrow\; K_{ideal} = \frac{\varphi(1-\varphi)}{\beta^2}. \qquad (13.7)$$

3 In turn, Alcalà-Quintana and Garcia-Pérez suggested computing the ideal variance σ²ideal by assuming that the location of the threshold θ at a given target probability φ is known and that the intensity of the stimulus presented at all trials xn equals θ. In this hypothetical case, indeed, the variance of the estimated threshold θ̂ is the variance of the proportion of correct responses, that is to say, of φ. It follows that Kideal = φ(1−φ). If the target probability is 0.5, Kideal will be 0.25; if the target probability is 0.75, Kideal will be 0.1875 (Alcalà-Quintana and Garcia-Pérez, 2004).
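A minimal sketch of these quantities follows: precision (equation 13.1), the sweat factor (equation 13.4), the ideal sweat factor (equation 13.7), and relative efficiency (equation 13.5). The repeated threshold estimates and trial counts are illustrative numbers.

```python
import numpy as np

def precision(estimates):
    """Equation 13.1: eta = 1 / sample variance of repeated threshold estimates."""
    return 1.0 / np.var(estimates, ddof=1)

def sweat_factor(estimates, n_trials):
    """Equation 13.4: K = n * variance of the threshold estimate."""
    return n_trials * np.var(estimates, ddof=1)

def k_ideal(phi, beta):
    """Equation 13.7: ideal sweat factor from the Robbins-Monro variance."""
    return phi * (1.0 - phi) / beta ** 2

# Illustrative numbers: five repeated estimates, each from a 60-trial run.
est = np.array([1.02, 0.95, 1.10, 0.98, 1.05])
eta = precision(est)
rel_eff = k_ideal(phi=0.75, beta=2.0) / sweat_factor(est, n_trials=60)  # eq. 13.5
```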
(3) Test–retest reliability. The measurement of the threshold obtained for each observer must be confirmed in subsequent examinations performed under identical experimental conditions. If the outcome from the same subject differs considerably in subsequent exams, the test cannot be considered reliable.

(4) Inferred validity. As pointed out by Blackwell (1952) and recalled later by Jäkel and Wichmann (2006), the inferred validity of a psychophysical procedure depends on the extent to which criteria considered negligible affect the threshold estimate: in simpler terms, the inferred validity expresses the susceptibility of the procedure to variables like the number of stimuli or the order in which they are displayed.

(5) Sensory determinacy. Sensory determinacy refers to the absolute magnitude of the threshold. The lower the threshold estimate, the more sensorially determined it is. Methods that yield lower thresholds have higher sensory determinacy because they are less affected by extrasensory influences leading to threshold overestimation (Blackwell, 1952). What reduces sensory determinacy is especially the attentional demand (Jäkel and Wichmann, 2006). This aspect should be taken into consideration because the attentional demand depends on the psychophysical procedure employed.
13.1 Assessing the Stability of the Psychometric Function
The main factor affecting the reliability of psychophysical measurements is the fluctuation of sensitivity. A physiological oscillation of visual sensitivity,4 together with attention drops, changes in the subjective criterion, internal noise, and learning/fatigue effects, makes the performance of the subject variable to a certain extent: as a result, the psychometric function tends to be unstable during the examination, with a shift of its positional parameter (the threshold) and an underestimation (flattening) of its slope.5 In sum, the principle of stationarity is violated. Adaptive methods can inform about these fluctuations. For this purpose, Leek, in a 2001 review, reported two approaches.
4 There is evidence that visual sensitivity fluctuates over time with periodicity. Thoss and colleagues measured sensitivity at threshold for a detection task (detection of a white spot of light presented on a white background). The periodic presentation of the stimulus and subsequent recording of positive (detected) or negative (not detected) responses generated a polarity-autocorrelation function with an oscillation periodicity encompassing two preferential intervals: between 0.5 and 2 min, and between 5 and 15 min (Thoss et al., 1998).
5 In fact, averaging the slope of a psychometric function whose position fluctuates along the intensity axis biases low the final estimate of the dispersion parameter β.
13.1.1 The Method of Hall (1983)
Hall proposed a method to screen for non-stationarity of the psychometric function, i.e., the variability of the subject's performance (ultimately, the variability of the threshold) during AFC experiments. In the method of Hall, the responses at each trial are compared to those expected from the maximum likelihood estimate of the psychometric function (i.e., of threshold and slope). If the difference is constant at each trial, the psychometric function being fitted is stable. If the difference varies across trials, it means that the performance is not stable but fluctuates during the exam. To illustrate his system, Hall utilized the hybrid procedure that bears his name6 (Hall, 1981), associated with 4- or 16-AFC; however, he recalled that the technique can be applied to other adaptive psychometric procedures like the UDTR staircase or PEST.

The method measures the discrepancy Z between the observed proportion of correct responses at different intensities and the corresponding proportion of correct responses expected based on an assumed psychometric function. If the discrepancy Z remains stable as the examination proceeds, the performance of the subject is stable, that is to say, the threshold and, more in general, the psychometric function is stationary across trials. Otherwise, it must be concluded that the threshold has shifted during the test. A straightforward way to establish the shift in performance is to perform a regression analysis taking Z as the dependent variable and the trial number as the independent variable: if the correlation is low enough (the slope of the linear model is close to zero), the function is stationary; otherwise it is unstable.

A more detailed explanation is as follows: let Yx be the probability of a correct response at the stimulus intensity x. The expected probability of a correct response E(Y) at any stimulus level x is determined by the psychometric function F(x), so that E(Y) = F(x). A second variable Z is defined as the difference between the observed probability of a correct response Yx at the stimulus intensity x and the expected probability of a correct response E(Yx) at the same stimulus level. This variable, therefore, expresses the discrepancy between Yx and E(Yx):

$$Z_x = Y_x - E(Y_x). \qquad (13.8)$$
The discrepancy occurs if the proportion of correct responses of the observer does not correspond to the proportion expected according to the assumed psychometric function: Yx ≠ E(Yx) → Yx − E(Yx) ≠ 0 → Zx ≠ 0. On the contrary, if Yx = E(Yx) → Yx − E(Yx) = 0 → Zx = 0. Now, to detect a non-stationary psychometric function, what matters is not the absolute value of Zx (be it zero or non-zero), but the change in Zx during the examination. The procedure follows two steps: first, Z is computed for each tested level of intensity; then, a linear regression of Z vs. trial number is performed. If the slope of
6 See section 11.2.
the linear model is steeper than expected according to the null hypothesis (stationarity of the psychometric function), it is concluded that the position of the psychometric function has changed during the session.

An example inspired by the representative case reported by Hall (a hybrid procedure experiment paired with the 4AFC response model) in his paper is described below and depicted in figure 13.1. In a hypothetical session of PEST, the intensity presented at trials 1–4 is 40 dB and, according to the assumed psychometric function, at this intensity all "hit" responses are expected (proportion of correct responses = 1.0), so that E(Y) = F(x) = 1. Let us assume that, indeed, all "hit" responses have been obtained. So: Z40 = 1.0 − 1.0 = 0. At trials 5–8, the intensity presented is 25 dB and, according to the assumed psychometric function, 65% of "hit" responses are expected at this level of intensity (proportion of correct responses = 0.65), so E(Y) = F(x) = 0.65. Let us assume that at this level of intensity 75% of "hit" responses have been obtained. So: Z25 = 0.75 − 0.65 = 0.1. At trials 9–12, the intensity presented is 15 dB, and 40% of "hit" responses are expected at this level of intensity, so E(Y) = F(x) = 0.4. At this level of intensity, 50% of "hit" responses have been obtained. So: Z15 = 0.50 − 0.40 = 0.1. At trials 13–16, the intensity presented is again 25 dB and 65% of "hit" responses are expected, so that E(Y) = F(x) = 0.65. At this level of intensity, all "hit" responses have been obtained. So: Z25 = 1.0 − 0.65 = 0.35. Finally, at trials 17–20 the intensity presented is 5 dB and 20% of "hit" responses are expected, so that E(Y) = F(x) = 0.2. At this level of intensity, 25% of "hit" responses have been obtained. So: Z5 = 0.25 − 0.20 = 0.05.

The slope of the linear model is 8.7 × 10⁻³. The coefficient of the linear model that rejects the null hypothesis (the hypothesis that the performance is stable) has been estimated by Hall with Monte Carlo simulations at 1.73 × 10⁻⁵. The value is computed assuming a logistic psychometric function and using the 4AFC Hall's hybrid procedure (50 trials), that is, the procedure adopted in the original paper.
FIG. 13.1 – Checking the non-stationarity of the psychometric function according to Hall (1983). Left: a hypothetical session of PEST. In red is represented the assumed psychometric function. Right: the linear regression model (Z vs. trial number). See text for explanation.
In the example provided by the author, the slope was 5.4 × 10⁻³, whereas in our hypothetical example it was 8.7 × 10⁻³: in both cases, the slope is higher than the limit computed under the null hypothesis (1.73 × 10⁻⁵), so in both cases the null hypothesis must be rejected and it must be concluded that the subject's performance has changed during the experiment. The author explained that in n-AFC response models the sensitivity of the procedure decreases as the number of alternatives decreases, so that for n-AFC with n < 4 his method is not applicable.
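The Z-regression above is simple enough to reproduce directly. The sketch below rebuilds the worked example as blocks of four trials and fits the linear model; the exact slope depends on how the blocks are laid out over trial numbers, so the value it prints differs slightly from the one in the text.

```python
import numpy as np

# Hall-style stationarity check on the worked example: Z = observed minus
# expected proportion correct per block, regressed against trial number.
trials   = np.arange(1, 21)
expected = np.repeat([1.00, 0.65, 0.40, 0.65, 0.20], 4)   # E(Y) = F(x), per block
observed = np.repeat([1.00, 0.75, 0.50, 1.00, 0.25], 4)   # obtained proportions
z = observed - expected

slope, intercept = np.polyfit(trials, z, 1)
# About 8.4e-3 with this block layout (the text reports 8.7e-3); either way it
# is far above Hall's 1.73e-5 cutoff, so stationarity is rejected.
print(slope)
```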
13.1.2 The Interleaved Tracking Procedure of Leek et al. (1991)
Leek et al. (1991) described a technique able to monitor unstable psychometric functions based on a double interleaved staircase procedure. The authors paired two transformed 1-up/3-down staircase tracks, both with a step size of 2 dB and 50 trials long, after a few trials performed with a single track (step size: 5 dB). The measure of the local as well as the global variability of the psychometric function is computed by comparing the different values of sensitivity obtained in the two tracks.

Local variability reflects the difference between responses adjacent in time and is the expression of the discrepancy between the two tracks on consecutive trials. The local variability at each trial t is measured by computing, at t, the deviation of the level of intensity of the stimulus from the average intensity of the two tracks. The sum of the deviations calculated at each trial reveals the variability between the two tracks. Therefore, the local variability can be algebraically formalized as:

$$\sum_{t=1}^{N} \frac{(s_{tA} - s_{tB})^2}{2}, \qquad (13.9)$$

where N is the number of trials t, stA is the stimulus level of track A at trial t, and stB is the stimulus level of track B at the same trial t. The exponent 2 means that the local variability is computed in terms of variance. Local variability is little affected by the long-term fluctuation of sensitivity due to learning or changes in attention: so, it can be used to measure the true (unbiased) slope of the psychometric function.7

Global variability (long-term variability during the examination) is computed by summing the (squared) deviations of the average intensities of the two tracks at each trial from the mean intensity level of all the trials in both tracks.8 Global variability allows detecting the non-stationarity of the psychometric function during the test. It is effective in detecting unstable performances, provided the shift of the psychometric function is at least 4 dB and the fluctuation is not too fast (figure 13.2).
7 See Leek et al., 1991, for details.
8 See Leek et al., 1991, p. 1386 for the mathematical formalization.
FIG. 13.2 – Local (left) and global (right) variability as computed according to the interleaved double staircase procedure described by Leek and colleagues. Note that at the beginning of the simulation a single staircase is performed, with step size = 5 dB. After a few trials, two 1U3D staircase procedures with a smaller step size (2 dB) are interleaved. Continuous red line: track A; green dotted line: track B. Since the procedure makes use of the 1U3D staircase, in the figure each decremental step occurs after 3 consecutive hit responses.
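The local-variability computation of equation 13.9 is sketched below on two hypothetical interleaved tracks; the "global variability" line is a simplified stand-in (the exact formalization is in Leek et al., 1991, p. 1386).

```python
import numpy as np

# Illustrative stimulus levels (dB) for two interleaved 1-up/3-down tracks.
track_a = np.array([40.0, 38.0, 38.0, 36.0, 36.0, 34.0, 34.0, 32.0])
track_b = np.array([40.0, 40.0, 38.0, 36.0, 34.0, 34.0, 32.0, 32.0])

# Local variability (equation 13.9): half the summed squared between-track
# differences on the same trial.
local_var = np.sum((track_a - track_b) ** 2) / 2.0

# Simplified take on global variability: squared deviations of the per-trial
# track means from the grand mean of all trials in both tracks.
means = (track_a + track_b) / 2.0
global_var = np.sum((means - means.mean()) ** 2)
```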
13.2 Assessing the Goodness-of-Fit
A sign of reliability is that the observations fit the underlying psychometric function with an acceptable level of confidence. The goodness-of-fit, indeed, is an important parameter that deserves consideration. As highlighted by Wichmann and Hill (2001), bad fits, that is, over-dispersion of the observations, depend on the following reasons:

(1) The assumed psychometric model is not correct, so there is a lack of correspondence between the observed distribution of the data and the distribution expected according to the shape of the curve.

(2) The principle of stationarity is violated. As discussed, during the exam it is possible that the sensitivity of the observer changes because of a drop of attention, fatigue, or adaptation. In these conditions, runs tend to become non-stationary and not statistically independent. If the trend of responses is non-stationary, the observations will be over-dispersed, so that they will not fit the model, whatever the model is (Jäkel and Wichmann, 2006).

Wichmann and Hill (2001) recalled that to fit a psychometric function, a model of distribution is chosen, and the parameters threshold (α), slope (β), lapse rate (λ), and guess rate (γ) that minimize the error metric are selected according to the data obtained in the test. MLE is a common method used for this selection: the likelihood of the parameters α, β, λ, and γ is computed from the data collected during the examination, and the set of values with the maximum likelihood is selected and used to model the psychometric function. In this respect, two issues are raised by the same authors:

– First, the ML estimate of the parameters can be unreliable: this is the case when the likelihood estimation of the lapse rate λ from a given data set generates a negative value (λ nonsensical) or a value that is too large (λ inappropriate: e.g., due to a drop of attention, or in case few stimuli are presented at high levels of intensity). For this reason, a Bayesian constraint of λ within a realistic range is advisable.

– Second, the authors showed that assuming a zero lapse rate, as commonly occurs in n-AFC response models, may generate a consistent bias in the MLE of threshold and slope. To solve this problem, λ should not be kept at a fixed value of zero, but should be allowed to vary within a predefined range during the MLE-based procedure. With Monte Carlo simulations, Wichmann and Hill made it clear that a variable (i.e., not pre-defined) lapse rate is less prone to bias, leading to more accurate estimates of threshold and slope.

The goodness-of-fit can be assessed with different methods. One of the most effective is to consider the residuals in the data of the subject and compare them with the corresponding data predicted by the model. To preserve the goodness-of-fit, Klein highlighted the importance of finding rules that allow detecting bad runs: for example, checking whether the slope of the psychometric function tends to become too steep or too shallow during the presentation of constant stimuli or, in the case of adaptive procedures like staircases, whether the tested level of the signal consistently shifts toward a lower or a higher value. At any rate, if the number of trials is too small, the goodness-of-fit can be degraded (Wichmann and Hill, 2001).
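A minimal sketch of the recommended practice follows: a maximum-likelihood fit of a psychometric function in which the lapse rate λ is left free but bounded within a small realistic range, rather than fixed at zero. The Weibull form, the bound of 0.06, and the data are illustrative assumptions, not values prescribed by the text.

```python
import numpy as np
from scipy.optimize import minimize

def weibull(x, alpha, beta, gamma, lam):
    """Psychometric function with guess rate gamma and lapse rate lam."""
    return gamma + (1.0 - gamma - lam) * (1.0 - np.exp(-(x / alpha) ** beta))

def neg_log_likelihood(params, x, n_correct, n_total, gamma=0.5):
    alpha, beta, lam = params
    p = np.clip(weibull(x, alpha, beta, gamma, lam), 1e-6, 1 - 1e-6)
    return -np.sum(n_correct * np.log(p) + (n_total - n_correct) * np.log(1.0 - p))

# Illustrative 2AFC data: 50 trials at each of five contrast levels.
x = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
n_correct = np.array([26, 30, 38, 46, 48])
n_total = np.full(5, 50)

# Lambda is free but bounded in [0, 0.06] instead of being fixed at zero.
fit = minimize(neg_log_likelihood, x0=[2.0, 2.0, 0.02],
               args=(x, n_correct, n_total),
               bounds=[(0.1, 20.0), (0.2, 10.0), (0.0, 0.06)], method="L-BFGS-B")
alpha_hat, beta_hat, lam_hat = fit.x
```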
Chapter 14

What is the Best Psychophysical Technique?

A great effort has been spent to identify the best psychophysical technique. The results obtained from simulations and from real experiments are conflicting, depending on several variables, including the type of observer: experienced subjects in experimental sessions, or naïve patients in the clinical practice. It should be kept in mind that, due to its nature, psychophysical behavior in humans (not in simulations) is somehow biased by unavoidable misleading factors. So, the psychometric approach cannot but make arbitrary assumptions: in particular, the independence of the responses and the stationarity of the sensory performance; yet, as explained, these two principles tend to be violated. Despite these intrinsic limits, what should be considered the strategy of choice to measure sensations is an important matter of debate.
14.1 AFC or Y/N: Which is a Better Response Model?
Which is the optimal response model, AFC or y/n, is a controversial matter. In this section, the pros and cons of the two paradigms are discussed.
14.1.1 Subjective Criterion in Y/N and Bias in n-AFC Response Model
The main advantage of the AFC compared to the y/n response model is that the former does not suffer from variations of the response criterion of the observer, which are responsible for false-positive and false-negative errors. In fact, in y/n models bias can originate from changes in the internal judgment criterion about when to respond to the stimulus: the criterion adopted by the observer to decide if he has really perceived a stimulus may change at some point of the testing, becoming more conservative (a stimulus previously judged as perceived by the observer is no longer reported as "hit" when re-presented) or less conservative (a stimulus previously not reported is subsequently judged as perceived, and thereby reported). If, during the examination, the judgment criterion adopted by the subject is not stable, the observer is prone to give a "hit" or a "miss" response to the same signal at different moments. If at a given moment the subjective criterion turns less conservative compared to the previous trials, a false-positive error occurs, because the observer responds "hit" to a stimulus that, based on the trend of responses recorded until then, is not expected to be perceived or discriminated. This "hit" response may influence the final estimate of the threshold, leading to underestimation (threshold estimate biased low). The opposite takes place if, after n presentations, the criterion turns more conservative (in this case the observer answers "miss" to a stimulus that, based on the trend of responses recorded until then, is expected to be perceived or discriminated: false-negative error, threshold estimate biased high). In sum, the change in the response criterion, a variable that can be neither predicted nor controlled by the operator, may considerably affect the precision of the threshold estimate when performing adaptive (especially staircase) yes/no procedures.

This drawback does not affect the AFC response models: unlike y/n, where the observer is asked to answer "yes" only when he is confident enough of having perceived the target, in AFC the subject is forced to report which of the n alternatives is the correct one. In case false "hit" (false-positive) responses occur due to a temporary change in the criterion level, their proportion remains around the range of chance performance (50% in 2AFC), so that the possibility that such biased (false-positive) responses affect the final threshold estimate (set at the 75% probability level in 2AFC) is negligible. As a matter of fact, it is highly improbable that the observer guesses the correct alternative as many times as needed to reach the target probability of correct responses, that is, the threshold level.

Yet, AFC response models suffer from other types of bias: temporal and positional bias. In temporal AFC, the judgment can be influenced by the temporal order of the alternatives, so that the observer can be inclined to choose one interval rather than another (temporal bias). For instance, when measuring contrast sensitivity using temporal 2AFC response models, the contrast in the second interval tends to be illusorily perceived as about 5% higher (Klein, 2001). This finding of Klein confirms previous evidence of Green and Swets (1966). A similar temporal-interval effect has been documented by Johnson and colleagues (1984) for a wide variety of auditory tasks when the temporal 3AFC response design is used. The authors showed that correct trials were more frequent when the signal was present in the third interval; on average, the difference in performance between the first and the third intervals was about 8.5%.1 In temporal (auditory) 4AFC, correct detections are more frequent if the stimulus is in the first interval after the reference, and decrease gradually up to
1 To account for their results, the authors excluded a backward masking effect of the second and third presentation on the first presentation, as backward masking takes place within an interstimulus interval far lower (100 ms) than that used in their experiment (500 ms). On the contrary, they postulated that the effect depends on selective attention and/or memory.
FIG. 14.1 – Distribution of the proportion of correct responses in the virtual absence of bias (left panel), in the presence of responses biased toward interval 1 (middle panel), and in the presence of responses biased toward interval 2 (right panel). Simulated data.

In spatial AFC, the judgment can be influenced by the position of the presentations on the screen: the subject may be more prone to choose the upper or the leftward presentation in a 2AFC, or may be biased toward the upper right corner in a 4AFC (positional bias). In a 2AFC, the proportion of correct responses for the first spatial/temporal interval can be plotted against the proportion of correct responses for the second spatial/temporal interval on a Cartesian graph. In the absence of bias, the observations are expected to be distributed symmetrically along the bisecting line. If responses are biased toward one of the two intervals, the observations will be displaced toward the corresponding side of the graph (figure 14.1).2
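This kind of bias check is easy to automate. The following is a minimal sketch in Python with a hypothetical simulated observer; the detection probability and the strength of the interval-2 preference are illustrative assumptions, not values from the text:

    import random

    def interval_proportions(trials):
        """trials: (signal_interval, chosen_interval) pairs.
        Returns the proportion correct when the signal was in interval 1
        vs. when it was in interval 2 (the two axes of figure 14.1)."""
        p = {}
        for interval in (1, 2):
            outcomes = [s == c for s, c in trials if s == interval]
            p[interval] = sum(outcomes) / len(outcomes)
        return p[1], p[2]

    def simulate_trial(p_seen=0.5, p_guess_second=0.7):
        """Hypothetical observer: detects the signal on a fraction p_seen
        of the trials; otherwise guesses, preferring interval 2 (a
        temporal or positional bias)."""
        signal = random.choice((1, 2))
        if random.random() < p_seen:
            return signal, signal            # signal seen: correct choice
        guess = 2 if random.random() < p_guess_second else 1
        return signal, guess

    trials = [simulate_trial() for _ in range(2000)]
    p1, p2 = interval_proportions(trials)
    # an unbiased observer yields p1 close to p2; here p2 > p1 reveals the bias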
14.1.2 Drawbacks of the n-AFC Response Models
The AFC response designs are commonly used in psychophysical experiments and, compared to the y/n paradigm, have the advantage of avoiding the problem of unpredictable changes in the subjective response criterion. Yet, they are not exempt from flaws. The criticism relates to the following aspects:
(a) The threshold estimated with n-AFC (namely 2AFC) is more variable and more biased than with the y/n paradigm according to Kershaw (1985), Rose and colleagues (1970),3 and Madigan and Williams (1987).4 This means that with the 2AFC more trials (an expression of lower efficiency) are needed than with the y/n paradigm to reach the same standard deviation of the threshold estimate. In fact, the standard deviation of the estimate obtained with 2AFC simulations5 is 3.21 times higher than that computed from y/n simulations, and the efficiency is almost 5 times lower (King-Smith et al., 1994). The difference is explained by the fact that 2AFC is inherently more variable due to its 50% guessing probability (McKee et al., 1985).6
(b) AFC is demanding: when performing a temporal 2AFC, the observer needs to remember the first presentation in order to compare it with the second. This can be taxing in the clinical setting, particularly for some categories of subjects, such as children and elderly patients, and for those who are naïve to this kind of examination.
(c) AFC is time consuming (Kaernbach, 1990; Phipps et al., 2001): in the temporal 2AFC the signal occurs in one of the two presentations, so this response design provides 1 bit of information at each trial. In turn, in the y/n response model the signal can be present or absent in each trial, so 2 bits of information will be gained. It follows that AFCs tend to be more time consuming than the yes/no response model. Indeed, best PEST converges on the threshold after 16 trials when adopting the 2AFC vs. 8 trials with the y/n response model (Phipps et al., 2001).
(d) AFC is more affected by inattention than the y/n response model7 (Green, 1995). Attention drops produce increased variability of the threshold estimate because they reduce the slope of the psychometric function.
(e) AFC is prone to positional and temporal bias8 (especially AFC with multiple alternatives: Klein, 2001). In these cases the principle of stationarity of the responses tends to be violated.
(f) Finally, in the AFCs the estimate of the threshold tends to drift from its true value when a sequence of “lucky” guesses occurs. Klein (2001) noticed that with a 2AFC adaptive procedure it may be difficult for the testing sequence to recover the correct level of examination after a series of guesses. It follows that a certain number of useless presentations will take place at a level of intensity that

2 In the context of the signal detection theory (see chapter 15), it is worth considering how the performance of the observer changes with the variation of sensitivity at different constant bias levels. This is provided by the isobias curves. An isobias curve is obtained by keeping constant the variables that may affect the response bias of the observer and modifying only the discriminability of the stimulus (change of the signal-to-noise ratio). In other terms, isobias curves are functions that relate the hit and false alarm rates computed at different levels of sensitivity (different signal-to-noise ratios) in the presence of a constant bias (Macmillan and Creelman, 1991).
3 The authors compared the 2AFC with the yes/no response model simulating runs with the 1-up/2-down transformed up-down method.
4 The authors simulated 2AFC- and y/n-PEST, best PEST, and QUEST.
5 Associated with the minimum variance method.
6 At the threshold level, the variability is higher (about twice as variable) in the 2AFC than in the y/n response design with the same number of presentations. Consider an experiment conducted with the method of constant stimuli: the yes/no response model generates the highest variability of the binomial response at the threshold level (50% of correct responses) and the lowest variability at the two asymptotes; on the contrary, in the 2AFC model the highest variability is at the lowest asymptote (the chance level) and the lowest variability is at the upper asymptote. This makes the confidence limits in 2AFC asymmetrical (the lower limit is farther from the estimated threshold than the upper limit) and larger than in the y/n design. This asymmetric distribution of the binomial variability in the 2AFC model suggests using an asymmetrical distribution of the stimuli, avoiding presentations much below the expected threshold level (75%) (McKee et al., 1985). In the 2AFC model, the confidence limits for the threshold estimate are larger if the slope of the psychometric function is unknown, and they decrease with the number of trials.
7 This is particularly the case for the 2AFC.
8 See section 14.1.1.
is too low: when the examination consists of a small number of trials and the guesses occur at the beginning of the sequence, the final threshold estimate can have a low level of confidence (see the sketch below). A number of solutions have been advanced to address this problem: for example, it has been suggested to increase the number of choices so as to reduce the probability of guessing right (3AFC: 33%, 4AFC: 25%, etc.), or to turn 2AFC models into alternative unforced choice models (AUC: Kaernbach, 2001; Klein, 2001):9 in effect, as discussed, AUC models tend to minimize the guess rate, avoiding sequential false-positive errors (Kaernbach, 2001). As reported in section 5.4, Kaernbach (2001) compared the 2AUC coupled with a weighted up-down staircase with the conventional 2AFC associated with the same weighted up-down staircase; in addition, he compared his procedure with the yes/no simple staircase paradigm. The 2AUC weighted staircase proved to be slightly more efficient than the 2AFC weighted staircase, and both strategies were found to be more efficient than the simple up-down y/n method. According to the author, the AUC response model is particularly suitable for naïve subjects as well as for the clinical setting: in fact, the AUC is not only more efficient but also more comfortable than the 2AFC. This advantage would be preserved even when using n-alternative choice models with n > 2. Finally, it has been stated that using the 2AFC response model with staircase procedures is questionable, as it provides more biased estimates than the y/n response model used with the same staircase (Kershaw, 1985).
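As a toy illustration of the drift described in point (f), the following Python sketch (a deliberate simplification under assumed parameters: a plain 1-up/1-down track and an observer who is always correct above threshold and guesses at chance below it) shows how a run of lucky guesses drags a 2AFC track below threshold, after which recovery is left to chance:

    import random

    def lucky_guess_track(threshold=6.0, n_trials=40, step=1.0, lucky_run=4):
        level = threshold                         # start at the true threshold
        track = [level]
        for i in range(n_trials):
            if level >= threshold:
                correct = True                    # well visible: always correct
            elif i < lucky_run:
                correct = True                    # forced run of lucky guesses
            else:
                correct = random.random() < 0.5   # blind guessing at 50%
            level += -step if correct else step   # toy 1-up/1-down rule
            track.append(level)
        return track

    # the lucky run pushes the level several steps below threshold; below
    # threshold the track is a pure random walk, so many presentations are
    # wasted at uninformative intensities before the track climbs back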
14.1.3 Drawbacks of the Y/N Response Model
As discussed, the possibility of a sudden change in the subjective response criterion during the examination in the y/n paradigm is a variable that can neither be predicted nor controlled, and that tends to bias the outcome considerably: indeed, this lack of control of the response criterion is the major drawback of this response model.10 The bias is particularly evident when testing detection tasks in experiments on contrast sensitivity (Derrington and Henning, 1981). Besides, the y/n response model is particularly susceptible to the lapse rate: even one false-negative response at a very early stage of the exam (before the fourth trial) is enough to reduce efficiency; if two sequential lapses occur at the beginning of the test, the procedure is unable to recover the correct stimulation level before the test stops. If the two false-negative errors happen after the fourth trial, the procedure finally recovers, but only after a considerable number of trials (10–16 presentations), so that half to more than three-fourths of the programmed trials are wasted in recovering the normal (unbiased) examination level (Phipps et al., 2001).
9 See section 5.4.
10 The goal of the SIAM procedure (section 15.2), indeed, was to neutralize the criterion bias when using the yes/no paradigm.
Despite these shortcomings, Marvit and Florentine (2003) reported, in simulated auditory sessions, comparable efficiency11 of the 2AFC and the yes/no response model (ZEST procedure). Likewise, despite the criterion bias that affects the yes/no model, the test-retest reliability of the threshold estimate obtained in 19 subjects with the y/n paradigm was found to be comparable to that of the 2AFC (procedure employed: QUEST; Pierce and King-Smith, 1992). Finally, sensory determinacy in the y/n paradigm seems lower than in AFC, as suggested by the higher estimates of sensitivity found by Sekuler and Blake (1994) using n-alternative forced choice models.
14.1.4 Spatial or Temporal AFC? How Many Forced Choices?
Within the clinical setting, that is to say, for subjects more or less naïve to psychophysical tests, Jäkel and Wichmann (2006) recommend spatial instead of temporal AFC.12 Spatial AFCs, in fact, do not require keeping in mind the impression from the previous interval(s) while waiting for the subsequent presentation(s), and are twice as fast as temporal AFCs. Moreover, spatial 2AFC and 4AFC are less affected by bias than temporal 2AFC. Because, as explained at the beginning of this section, 2AFCs suffer from considerable variability of the threshold estimate (Shelton and Scarrow, 1984: real data; Leek et al., 1992: simulated data),13 more than 2 alternatives, namely 4, are advisable, especially when measuring detection thresholds: with 4AFC, in fact, the variance of correct answers is lower, while test-retest reliability and efficiency are higher. In sum, as a general rule, the more the alternatives, the lower the variance of the estimate. Besides, the goodness-of-fit is better (Jäkel and Wichmann, 2006). This finding is in line with previous observations by Schlauch and Rose (1986, 1990), who found that (for auditory tasks) temporal 3AFC and 4AFC associated with staircase procedures are more efficient, less biased, and provide less variable estimates than the 2AFC. Likewise, Leek and colleagues (1992) showed in their simulation study (n-AFC paired with 1U2DTR and 1U3DTR staircases) that the 4AFC and 3AFC response models are more efficient (lower sweat factor) than the 2AFC. Finally, 2AFC is found to yield less accurate and more variable estimates of the slope, especially when few trials are used. It is worthwhile recalling again the study by Jäkel and Wichmann (2006), who found that n-AFC response models perform differently according to the type of task (detection or discrimination) and to the class of subjects (naïve14 or accustomed to psychophysical experiments). The authors investigated spatial 2AFC, 4AFC, 8AFC,
11 Efficiency was computed as the sweat factor, which is the product of the variance of the threshold estimate and the number of trials (see chapter 13).
12 For detection and discrimination tasks.
13 To reduce the variability related to the 2AFC response model, some practice is recommended before starting the examination (Shelton and Scarrow, 1984).
14 Four adult, non-expert observers (25 years old).
and temporal 2AFC using a block design and providing auditory feedback. They used a contrast sensitivity detection and discrimination task. For the detection task, 4AFC generated the lowest threshold (highest threshold determinacy)15 and the highest test-retest reliability in naïve observers. The goodness-of-fit (estimated as the deviance of the binomial distribution from the fit of the psychometric function) was optimal with the 4AFC in this class of subjects. On the contrary, in highly experienced observers the 2AFC showed the highest threshold determinacy, but 4AFC was less affected by the learning effect. No difference in goodness-of-fit between 2, 4, and 8AFC was found. In the discrimination task, 2AFC had the highest threshold determinacy whereas 4AFC showed the worst performance (irrespective of whether the observer was naïve or expert). The reason for this difference lies in the attentive demand, which is higher when the observer is required to discriminate rather than to detect a variable. In a detection task the spatial attentive demand of 4AFC and 2AFC is roughly the same, so it does not affect the 4AFC judgment. Instead, in a discrimination task increasing the number of spatial alternatives (from 2 to 4) produces a higher attentional demand, leading to higher thresholds in 4AFC than in 2AFC. In other terms, a greater number of spatial locations generates a worse performance when estimating discrimination thresholds. Overall, the shortcomings of 4AFC are counterbalanced by its considerably lower variability so that, according to the authors, 4AFC is the AFC with the highest precision. All in all, Jäkel and Wichmann concluded that, despite its lower sensory determinacy (and its tendency to generate bias), 4AFC should be preferred to 2AFC not only in detection but also in discrimination tasks, because its efficiency is higher.
Despite its drawbacks, the 2AFC response model continues to be widely used, because 3AFC and 4AFC response models are more time consuming and more prone to positional or temporal bias (Johnson et al., 1984). To conclude, seventy years ago Blackwell stated that:16
– 2AFC should be preferred over yes/no response models.
– 2AFC should be preferred over 4AFC response models.
– Temporal intervals should be preferred over spatial intervals.
– Grouped stimuli of the same magnitude should be preferred to randomized stimuli.
– Feedback should be provided.
– Participants should be well trained in psychophysical testing (Blackwell, 1952).
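Returning to the rule stated earlier in this section (the more the alternatives, the lower the variance of the estimate), a back-of-envelope Python sketch makes the mechanism explicit. The assumptions are illustrative: a logistic psychometric function, the threshold read at the halfway point between chance and perfect performance, and binomial variability propagated with the delta method:

    import math

    def threshold_sd(n_alt, n_trials=100):
        """Approximate SD of a threshold estimated at the halfway point of
        p(x) = g + (1 - g) * F(x), with guess rate g = 1/n_alt and a
        logistic F whose derivative at its midpoint is 0.25. A smaller
        guess rate leaves a wider response range, hence a steeper slope
        dp/dx and a better-constrained threshold."""
        g = 1.0 / n_alt
        p = (1.0 + g) / 2.0                  # proportion correct at F = 0.5
        dpdx = (1.0 - g) * 0.25              # slope of p(x) at that point
        sd_p = math.sqrt(p * (1.0 - p) / n_trials)
        return sd_p / dpdx                   # delta method: sd_x = sd_p / slope

    for n in (2, 3, 4, 8):
        print(f"{n}AFC: chance = {1 / n:.3f}, threshold SD = {threshold_sd(n):.3f}")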
15 Chapter 13.
16 For the sake of precision, these recommendations refer to discrimination thresholds.
14.2 What is the Best Psychophysical Procedure?
Not only the best response model but also the best psychophysical procedure is a matter of debate: adaptive or nonadaptive? Adaptive with fixed or with variable step size? Parametric or nonparametric?
14.2.1 Nonadaptive vs. Adaptive Procedures
Even if the method of constant stimuli has nowadays been replaced by the adaptive techniques, especially in clinical practice, there is no complete consensus on whether the adaptive techniques must always be preferred to the constant stimuli. As a matter of fact, the method of constant stimuli presents some advantages: it is simple to implement, it provides the most straightforward way to plot the psychometric function, and it does not require any knowledge of its slope. Besides, in the method of constant stimuli the sequence of the presentations does not depend on the response given after each trial, but is randomized: contrary to the adaptive procedures, such independence makes constant stimuli less liable to non-stationary tendencies. Moreover, the reliability of the constant stimuli is higher than that of the adaptive techniques, especially in experiments involving naïve subjects (Jäkel and Wichmann, 2006). Finally, psychoacoustic studies showed that thresholds estimated with adaptive methods tend to be biased low compared to the nonadaptive techniques (Shelton et al., 1982; Stillman, 1989; Kollmeier et al., 1998). This could apply to visual psychophysics as well. Yet, as already explained, the presentation of a considerable number of stimuli far above and below the threshold makes the method of constant stimuli less efficient than the more sophisticated adaptive techniques. Simpson (1988) compared the method of constant stimuli with the best PEST by performing Monte Carlo simulations.17 He found that with a relatively small number of trials (100 or fewer) the efficiency of the two procedures is comparable, but the method of constant stimuli yields less biased threshold estimates.18 The lower biasing effect of constant stimuli depends on the uniform distribution of the tested levels of intensity. In the same study, Simpson maintained that “adaptive methods may give a quick threshold estimate, which gives an impression of efficiency, but this estimate will have high variability”. He concluded that “for measurements of 100 trials or less, the method of constant stimuli is clearly superior to the adaptive maximum likelihood method”.19 Likewise, the method of constant stimuli is stated to be as efficient
17 Both procedures were associated with 2AFC. Adopting a logistic psychometric function, Simpson calculated the threshold via maximum likelihood estimation (the analysis computed the likelihood that the threshold was at each of the 21 selected stimulus levels). As an additional finding, in both procedures the standard deviation and the bias of the threshold decreased with the number of trials.
18 The best PEST threshold estimate was biased low.
19 Simpson, 1988, pp. 435–436.
as PEST (in nine subjects: Hesse, 1986) and as UDTR staircase procedures (Watson and Fitzhugh, 1990). Yet, other studies came to different conclusions. Performing psychometric experiments on real observers, Taylor and colleagues (1983) found that the method of constant stimuli is more biased and less stable than PEST in forced choice experiments. Using simulations, Watson and Fitzhugh (1990) compared the method of constant stimuli with a conventional UDTR staircase, with a maximum likelihood-based UDTR staircase (ML-UDTR), and with QUEST. They found that the efficiency (the number of trials required to yield a given standard deviation) of the constant stimuli was roughly the same as that of UDTR, but lower than that of ML-UDTR and especially of QUEST: compared to QUEST, in fact, the relative efficiency of the constant stimuli was 20–40%.20 Instead, the difference in bias was not substantial. Examining the difference in entropy after n trials, they concluded that the adaptive procedures (QUEST in particular) gain information more rapidly than the nonadaptive constant stimuli. According to the authors, Simpson's conclusion that the method of constant stimuli is as efficient as parametric adaptive procedures like best PEST depends on the favourable conditions of his simulations, namely on the position of the true threshold, expected to be always within the tested interval. Yet, this cannot be taken for granted in real experiments. In conclusion, “the method of constant stimuli can never be as efficient as a properly designed adaptive method” (Watson and Fitzhugh, 1990, p. 91). Using simulated and real data, Kollmeier and colleagues (1998) confirmed the conclusion of Watson and Fitzhugh, comparing the constant stimuli with 1U2D UDTR, 1U3D UDTR, and PEST (with 2AFC vs. 3AFC): even if the results obtained with the four procedures were similar, the most efficient was the 3AFC-1U3D staircase.
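The maximum likelihood analysis described in footnote 17 can be sketched in a few lines of Python. The data below are hypothetical (seven levels with 40 trials each, not Simpson's actual 21-level design), and a 2AFC logistic psychometric function is assumed:

    import numpy as np
    from scipy.optimize import minimize

    def neg_log_likelihood(params, x, k, n, guess=0.5):
        """Binomial negative log-likelihood of constant-stimuli data under
        a 2AFC logistic psychometric function (threshold thr, slope)."""
        thr, slope = params
        p = guess + (1.0 - guess) / (1.0 + np.exp(-slope * (x - thr)))
        p = np.clip(p, 1e-6, 1.0 - 1e-6)
        return -np.sum(k * np.log(p) + (n - k) * np.log(1.0 - p))

    # hypothetical constant-stimuli run: 7 fixed levels, 40 trials each
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])   # stimulus levels
    k = np.array([21, 23, 26, 30, 35, 38, 39])          # correct responses
    n = np.full(7, 40)

    fit = minimize(neg_log_likelihood, x0=np.array([4.0, 1.0]),
                   args=(x, k, n), method="Nelder-Mead")
    threshold, slope = fit.x   # threshold = level where p = 75% (halfway point)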
14.2.2 Adaptive Nonparametric Procedures: Variable vs. Fixed Step Size
In the nonparametric procedures with fixed step size, the step size remains constant during ascending and descending runs. Of the procedures with fixed step size, the weighted up-down staircase (converging to φ = 0.75) proved to be slightly more efficient and faster (10% less time) than the 2U1DTR staircase (φ = 0.707: Kaernbach, 1991). A similar experiment has been performed on human subjects: the weighted up-down staircase (converging to φ = 0.75) has been compared with the 2U1DTR staircase (φ = 0.707) using a 2AFC response model and an auditory discrimination task (Rammsayer, 1992). This study confirmed in human subjects what Kaernbach had found in his simulation: the weighted up-down staircase is slightly better than the 2U1DTR staircase, as it converges faster on the threshold (about 20 trials vs. 40 trials) and shows smaller between-subject variability. The authors concluded that the weighted up-down staircase should be the preferred method.
20 QUEST revealed to be twice as efficient as the ML-UDTR.
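The weighted up-down rule itself is compact. In the sketch below (Python; the simulated observer, start level, and step size are illustrative assumptions), the track converges where the expected step is zero, which requires step_up/step_down = target/(1 − target), that is, a 3:1 ratio for a target of 0.75:

    import math
    import random

    def p_correct(level, threshold=5.0, slope=1.0, guess=0.5):
        # hypothetical 2AFC observer: logistic rising from the 50% guess rate
        return guess + (1.0 - guess) / (1.0 + math.exp(-slope * (level - threshold)))

    def weighted_up_down(start=10.0, step_down=1.0, target=0.75, n_trials=200):
        step_up = step_down * target / (1.0 - target)   # 3x step_down for 0.75
        level = start
        track = []
        for _ in range(n_trials):
            track.append(level)
            correct = random.random() < p_correct(level)
            level += -step_down if correct else step_up
        return track

    track = weighted_up_down()
    estimate = sum(track[-50:]) / 50.0   # average of the late trials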
As pinpointed by Leek (2001), the nonparametric procedures with a variable step size, like the stochastic approximation, the accelerated stochastic approximation, or MOBS, test a greater number of signal levels than the strategies with fixed step size. In this way, resolution is improved and the threshold is assessed with a higher degree of precision. The accelerated stochastic approximation, in particular, enables faster convergence on the threshold than the stochastic approximation. Faes and colleagues (2007) compared the performance of a staircase procedure with variable step size, namely the accelerated stochastic approximation, with that of a staircase with fixed step size (1U3DTR). They performed repeated simulations, each 100 trials long. To make the results of the two procedures comparable, the target probability of the accelerated stochastic approximation was set to 0.793. The authors found that the 1U3DTR generated more variable threshold estimates. Of the procedures with variable step size, the Modified Binary Search (MOBS) proved to be more efficient, more precise, and less affected by response errors than the accelerated stochastic approximation (Tyrrell and Owens, 1988); moreover, it is more precise21 and faster than a procedure with fixed step size like the 1U2DTR.22 In agreement with Tyrrell and Owens, Johnson and Shapiro found that MOBS yields less variable estimates than the accelerated stochastic approximation,23 but it is more time consuming (Johnson and Shapiro, 1989). After considering the data obtained from their simulation of MOBS and the accelerated stochastic approximation, Anderson and Johnson in 2006 concluded that “[…] we find little evidence to recommend the use of the MOBS threshold technique [that relies on heuristic rules, Author's note] above the more statistically rigorous ASA technique [Accelerated Stochastic Approximation, Author's note]”.24
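The two update rules can be sketched as follows (Python; one common formulation, with the observer function respond assumed to return 1 for a "yes"/correct response at the tested level):

    def robbins_monro(respond, x0, c, phi=0.75, n_trials=60):
        """Stochastic approximation: the step c/n shrinks on every trial;
        x moves down after a positive response (z - phi > 0) and up after
        a negative one, converging on the level where p(yes) = phi."""
        x = x0
        for n in range(1, n_trials + 1):
            z = respond(x)              # 1 = yes/correct, 0 = no/wrong
            x -= (c / n) * (z - phi)
        return x

    def kesten_accelerated(respond, x0, c, phi=0.75, n_trials=60):
        """Kesten's acceleration: after the first two trials the step
        c/(2 + shifts) shrinks only when the response direction reverses,
        so the track approaches threshold faster from a distant start."""
        x, shifts, prev_z = x0, 0, None
        for n in range(1, n_trials + 1):
            z = respond(x)
            if n > 2 and z != prev_z:
                shifts += 1             # count response reversals only
            step = c / n if n <= 2 else c / (2 + shifts)
            x -= step * (z - phi)
            prev_z = z
        return x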
14.2.3 Adaptive Nonparametric vs. Parametric Procedures
Best PEST is found to be more precise and efficient than PEST, the more virulent PEST, and the conventional staircase techniques in simulated yes/no experiments (Pentland, 1980). Also, the simulated variability of the threshold estimate is lower (that is, the accuracy is higher) for best PEST and QUEST than for PEST (irrespective of the response model used: y/n or 2AFC; Madigan and Williams, 1987), and than for the staircases with fixed step size (Alcalá-Quintana and Garcia Pérez, 2007). This finding is in line with that previously reported by Watson and Pelli (1983): the efficiency of QUEST after 128 trials (84%) is higher than that of PEST (45%: Taylor and Creelman, 1967) and of Hall's hybrid procedure (71%: Hall, 1981) when the same number of presentations is administered. Likewise, the efficiency of simulated QUEST is higher than that of simulated UDTR staircases (Watson and Fitzhugh, 1990).
21 Smaller confidence interval for the threshold estimate.
22 Indeed, MOBS requires less than one-third of the trials needed by the 1U2DTR to converge on the threshold.
23 The 4–2 dB stochastic approximation used in clinical perimetry (Bebie et al., 1976).
24 Anderson and Johnson, 2006, p. 2410.
The superiority of the parametric procedures has been confirmed in human observers by Phipps and colleagues, who reported a variability of the threshold estimated with y/n best PEST 3–4 dB lower than with the nonparametric procedures (Phipps et al., 2001). Yet, unlike the findings obtained in the abovementioned studies, in simulated auditory sessions Marvit and Florentine reported comparable efficiency of ZEST and 1U3DTR,25 expressed as the sweat factor (Marvit and Florentine, 2003). Likewise, Shelton and his group found in real observers no substantial differences between best PEST, the simple staircase, and PEST associated with 2AFC (Shelton et al., 1982) and 3AFC (Shelton and Scarrow, 1984).26 Buss and colleagues, too, found that in an auditory detection task an MLE-based procedure and a 3AFC-3U1DTR staircase produced equally stable estimates in children aged 6–11 (Buss et al., 2001). However, the maximum likelihood procedure was faster, requiring about 10 trials to converge on the threshold. For this reason, the authors recommend using MLE procedures to test infants and animals, even if this may be difficult due to the very few suprathreshold trials presented. In line with these findings, some years later, in a 2AFC visual discrimination experiment with inexperienced subjects, no difference in accuracy27 was found between PEST, best PEST, and QUEST. The same results were found when the same procedures were associated with the y/n response model (Madigan and Williams, 1987).28 On this basis, Madigan and Williams defended the validity of PEST against the more modern parametric procedures, stating that “PEST should be the procedure of choice for an investigator who is not in position to worry about stimulus spacing, slope parameters, and a priori distributions” (p. 248).
14.2.4 Adaptive Parametric Procedures
In a simulation study, King-Smith and colleagues (1994) showed that the 2AFC and yes/no Minimum Variance Method is more efficient than ZEST, and that ZEST, in turn, is more efficient than QUEST and best PEST. According to Simpson, the step method is less biased than constant stimuli and than best PEST, which, in fact, is prone to negative bias, especially in experiments lasting less than 40 trials (Simpson, 1989). Experiments on human subjects29 confirmed that the performance of the step method is not worse than that of best PEST and constant stimuli (Simpson, 1989). In another simulation, King-Smith and colleagues (1994b) showed that the minimum variance method associated with 2AFC and, even more, with yes/no
25 Associated with the 2AFC response model.
26 In both cases: auditory task.
27 Or error of threshold estimation.
28 Replacing the human performance with 2AFC and y/n simulated data, the efficiency of best PEST and QUEST was similar, and higher than that of PEST.
29 Task: discrimination of the difference in the number of random dots between two presentations. Response model: 2AFC.
response models, was slightly more precise than ZEST; in turn, ZEST was more accurate (lower standard deviation of the threshold estimate) than QUEST. The superiority of ZEST over QUEST had been reported by the same author ten years before (King-Smith, 1984). Phipps confirmed that the convergence of ZEST is faster than that of best PEST, because in the former the signal level is close to the threshold from the very beginning of the track (Phipps et al., 2001). Although the ψ method is devised to measure slope and threshold within a single session, and is therefore not specifically aimed at the threshold, it seems not inferior to ZEST for the estimate of the threshold (Kontsevich and Tyler, 1999). However, ZEST is considered by Kontsevich and Tyler the best adaptive method.30 Accordingly, Marvit and Florentine (2003) stated that ZEST combined with 2AFC should be the procedure of choice in psychoacoustics. In conclusion, the most appropriate words to summarize what is reported in this chapter are those of Madigan and Williams: “Psychophysical procedures have been actively refined over a century, and the result is a collection of methods that introduce error variances that are small compared with the uncontrolled error variances contributed by the subjects themselves. For this reason, small refinements in psychophysical methods are unlikely to result in measurable improvements in the quality of human 2AFC [as well as y/n: author's note] psychometric data” (Madigan and Williams, 1987, p. 248). It is revealing, indeed, that in human observers the effectiveness of the nonparametric staircase procedures, whose initial theorization dates back to the early sixties, is overall comparable to that of the more modern and sophisticated Bayesian procedures (Alcalá-Quintana and Garcia Pérez, 2007).
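To make this family of Bayesian procedures concrete, here is a minimal ZEST-flavoured sketch in Python (assumptions for illustration: a flat prior over candidate thresholds, a logistic likelihood with fixed slope, and arbitrary guess and lapse rates; QUEST classically places trials at the posterior mode, ZEST at the posterior mean):

    import numpy as np

    def zest_like(respond, candidates, n_trials=40, slope=1.0,
                  guess=0.5, lapse=0.01):
        """Keep a posterior over candidate thresholds, test at the
        posterior mean, and update with a Bernoulli likelihood after
        each response."""
        posterior = np.ones(len(candidates)) / len(candidates)  # flat prior
        for _ in range(n_trials):
            x = float(np.dot(candidates, posterior))            # posterior mean
            r = respond(x)                                      # 1 correct, 0 wrong
            # p(correct at x) for every candidate threshold
            p = guess + (1.0 - guess - lapse) / (
                1.0 + np.exp(-slope * (x - candidates)))
            posterior *= p if r else (1.0 - p)
            posterior /= posterior.sum()
        return float(np.dot(candidates, posterior))             # final estimate

    # usage: zest_like(my_observer, np.linspace(0.0, 10.0, 201))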
14.2.5 Considerations on the Staircase Procedures with Fixed Step Size: The Optimal Setup
In two of his papers, Garcia Pérez discussed the characteristics and efficiency of some of the most common staircase procedures, namely those with fixed step size like the simple up-down, the transformed UDTR, and the weighted staircase (Garcia Pérez, 1998, 2001). The author observed that the relationship reported by Levitt (1970) between target probability and up/down rule in the staircase procedures is questionable: in fact, he considered that the step size may affect the convergence of these procedures on the threshold. As explained in section 9.2, in the transformed (UDTR) staircase the size of the incremental and decremental steps is the same: what changes is the number of sequential correct responses required to reduce the level of the signal. Now, the author stated that in UDTR procedures associated with the 2AFC or y/n response model the threshold probability targeted by the staircase does not depend only on the up/down rule adopted by the experimenter (as described by Levitt): it is affected
30 According to the group of King-Smith, the Minimum Variance Method is even more efficient than ZEST (King-Smith et al., 1994).
also by the step size δ (namely, by the relative step size δ/σ, where σ is the spread of the psychometric function, i.e., the inverse of its slope31). In UDTR staircases, indeed, the convergence takes place at the expected target probability only for small relative step sizes (δ/σ < 0.05). Unlike the UDTR procedure, in the weighted up-down staircase the number of sequential correct responses required to reduce the level of the signal is 1 (i.e., 1-up/1-down), but the size of the decremental (δ↓) and incremental (δ↑) steps is different, and each up/down ratio (δ↓/δ↑) is supposed to converge on a given target probability. Now, this is true only:
– When the relative step size is very small.

[…]

– Respond “yes” if L > β.
– Respond “no” if L < β.
In a 2AFC response model the observer is confronted with two presentations. As in the y/n case, in each presentation he is required to judge how likely it is that he has seen the signal if the signal is present [P(y|s)] and how likely it is that he has seen the signal if the signal is not present [P(y|n)]. If in the first interval the likelihood ratio L1 = P(y|s)/P(y|n) is greater than in the second, the observer will answer “first interval”. Otherwise, he will reply “second interval”. So, for the 2AFC response model the rule is:
– Reply “first interval” if L1 > L2.
– Reply “second interval” if L2 > L1.
Because in 2AFC a response bias c (i.e., the attitude to prefer one interval) can affect the judgment,4 the definitive rule is:
– Reply “first interval” if L1 > c·L2.
– Reply “second interval” if L2 > c·L1.
Plotting the probability of correctly perceiving the stimulus (the hit rate, i.e., the probability of replying “yes” when the stimulus is present) vs. the probability of generating a false alarm (i.e., the probability of replying “yes” when the stimulus is absent, that is, in the presence of noise alone), a Receiver Operating Characteristic curve (ROC curve) for each signal strength (or, even better, for each possible d′) is obtained. The ROC, therefore, describes all the possible behaviors of the subject who is asked to respond to a stimulus of a given intensity. The ROC, in other terms, is traced by the possible positions of the subjective criterion β in a yes/no response
4 See the temporal and positional bias in n-AFC, section 14.1.1.
model, or by the possible positions of the response bias parameter c in a 2AFC response model. A subjective criterion β placed more to the left on the curve means more conservative: the observer prefers to be sure of having perceived the target before answering “yes”; in this case the hit rate will be lower, and the false alarm rate will be lower as well (see β1 in figure 15.7). On the contrary, a β displaced more to the right means more liberal, so that the observer feels more confident in his answer: in this case the hit rate will be higher but, in turn, the false alarm rate will be higher too (see β2 in figure 15.7). The three ROC curves show that the proportion of false-positive responses is not constant along the psychometric function (i.e., along the spectrum of the signal strength),5 but, as pointed out by Harvey (1986), the false alarm rate differs at each point of the distribution, i.e., at each signal level (figure 15.8, left panel). In the right panel of figure 15.8 the neutral criterion β is set at the point where the slope of the ROC curve is 1. This point corresponds to the maximum distance of the ROC curve from the diagonal. At this point the subjective criterion β (as well as the response bias c) is 1, that is to say, neither conservative nor liberal (neutral attitude). In summary, the ROC curve can be seen as an isosensitivity curve that describes, at each stimulus level, the proportion of false alarms as a function of the hit rate. The different points along the curve thus refer to the increase in the false alarm rate with the increase in the hit rate. The more the curve approaches the upper left corner of the plot, the higher the sensitivity of the observer, since in the upper left corner the hit rate is 100% and the false alarm rate 0% (see d′ = 4.65, previous section). The more the curve flattens and approaches the diagonal of the plot, the lower the sensitivity of the observer, since the diagonal indicates that the hit rate equals the false alarm rate (see d′ = 0, previous section): at this level, the noise distribution and the signal distribution coincide.
5 A solution to remove the effect of false-positive responses and obtain unbiased psychometric data is provided by the probability-based correction for guessing and the related Abbott formula (see chapter 6). Yet, Klein (2001) recalled that the psychometric function may remain biased to some extent by the false-positive rate Fp even after the probability-based correction for guessing is applied. For this reason, he recommends using the z-score-based correction for bias, an alternative method based on the signal detection theory. In his review, the author provides a clear explanation of the difference between the two methods. The Abbott formula is:
ψadj(v) = [ψ(v) − Fp]/(1 − Fp).
In the equation, ψadj(v) is the real proportion of correct responses, accounting for the false-positive rate Fp. In the z-score-based correction for bias, the proportion of correct responses is transformed into a z-score, so that d′ (and not the proportion of correct responses) is reported on the ordinate. In yes/no paradigms, the z-score-based correction for bias requires that the number of trials presented at stimulus strength = 0 be equivalent to the number of trials presented at the other stimulus levels. The probability of making false-positive errors, in fact, is not constant but differs across the spectrum of the signal strength, so it should be measured along the whole extent of the psychometric distribution. The signal detection correction for guessing, indeed, estimates the false-positive errors not only at the signal levels close to zero but across the whole spectrum of stimulus intensity.
FIG. 15.7 – ROC curves representing the overall response behavior of the observer for three different signal strengths (d′) in a y/n response model. Abscissa: false alarm rate; ordinate: hit rate. Left panel: each curve represents how the false alarm rate increases with the proportion of correct responses for three different values of d′. The higher d′, the higher the proportion of correct responses at the same proportion of false-positive responses. The sensitivity of the observer to a given signal intensity is given by the area under the corresponding ROC curve (AUROC). Right panel: the criterion β establishes the actual performance of the observer: in the lowest curve, for example, β1 (more conservative) generates a probability of correct responses of 60% and a probability of false-positive errors of 15%; β2 (more liberal) generates a higher probability of correct responses (82% in this example) but determines a higher probability of false-positive errors (35%). Modified from Kaernbach C. (1990). Reproduced with permission of AIP Publishing via Copyright Clearance Center.
FIG. 15.8 – Left panel: the false-positive rate is not constant across an examination but changes as a function of the stimulus strength a, b, or c. Right panel: the point corresponding to the maximum distance of the ROC curve from the diagonal is the point at which the subjective criterion β (or the response bias c) is 1 (neutral attitude). Modified from Kaernbach C. (1990). Reproduced with permission of AIP Publishing via Copyright Clearance Center.
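The equal-variance Gaussian model behind these curves is compact enough to sketch in Python (the d′ value and the criterion grid are illustrative; scipy's normal survival function supplies the tail probabilities):

    import numpy as np
    from scipy.stats import norm

    def roc_curve(d_prime, criteria):
        """Equal-variance Gaussian SDT: noise ~ N(0, 1), signal ~ N(d', 1).
        Sweeping the criterion traces the whole ROC for one d'."""
        false_alarms = norm.sf(criteria)         # P(X > c | noise)
        hits = norm.sf(criteria - d_prime)       # P(X > c | signal)
        return false_alarms, hits

    criteria = np.linspace(-4.0, 6.0, 501)
    fa, hit = roc_curve(1.5, criteria)
    auroc = np.trapz(hit[::-1], fa[::-1])   # area under the ROC curve
    mrhr = np.max(hit - fa)                 # maximum distance from the diagonal
    # for this model hit - fa peaks at c = d'/2, the neutral criterion where
    # the ROC slope equals 1 (see section 15.2)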
The signal detection theory sheds light on the importance of the response criterion and highlights the issue of the y/n response model, which suffers from the unpredictable and uncontrollable variation of this parameter during the examination. A yes/no procedure able to control the internal criterion would therefore be desirable. The Single-Interval Adjustment Matrix procedure addresses this issue.
15.2 The Single-Interval Adjustment Matrix Procedure (SIAM: Kaernbach, 1990)
SIAM (Kaernbach, 1990) is a staircase procedure devised to provide unbiased results when the yes/no response design is used. In fact, it maintains a neutral response criterion, making y/n models as precise as the criterion-free AFC paradigms. According to the signal detection theory, the response to a signal depends on the likelihood ratio LR between the probability of an observed event (for example, the response “yes”) if the signal is present and the probability of the same event (“yes” response) if the signal is absent:
LR = P(y|s)/P(y|n). (15.3)
In the y/n response model, the subject answers “yes” if LR exceeds a subjective response criterion d:
LR > d → response: “yes”.
LR < d → response: “no”.
Evidently, a change in d during the examination will generate a biased estimate. In the 2AFC response model the subject answers “first” or “second” interval depending on which interval has the higher likelihood ratio:
LR1 > LR2 → response: “first interval”.
LR1 < LR2 → response: “second interval”.
In this paradigm there is no need for a response criterion: in the AFC response model, in fact, the observer does not have to set a subjective criterion (which can change during the examination) to decide which alternative to choose during a trial. When LR1 ≈ LR2, he will select one of the two alternatives at random. As made clear by Kaernbach, “This implies a symmetry in the observer decision”, provided “[…] there is no bias in the selection of one interval over the other” (Kaernbach, 1990, p. 2674).
To control the internal criterion in the y/n paradigm, accurate information is needed not only about the hit rate (the occurrence of a “yes” response when a signal is presented) but also about the false alarm rate (the occurrence of a “yes” response when the noise [i.e., no signal] is presented). These variables can be expressed as P(y|s) and P(y|n), respectively. The need for precise information on the false alarm rate is the reason why SIAM, as will be explained, not only presents a quota of intervals with the signal but also a considerable number of intervals with noise.
For a given stimulus strength, a ROC curve (plotting P(y|s) on the ordinate and P(y|n) on the abscissa) describes all the possible criterion levels, from the most conservative (on the left side of the curve) to the most liberal (on the right side of the curve: figure 15.9). The different positions of the criterion d along the curve inform on the probability that the subject answers “yes” when the signal or the noise is presented. In case of a liberal criterion (d on the right side of the ROC curve), the probability of a “yes” response when the signal is presented, P(y|s), is higher than in case of a conservative criterion, but so are the “yes” responses to noise, P(y|n). The internal criterion is neutral if it is located at the maximum distance of the curve from the diagonal, that is, where the curve has a slope = 1. As can be seen in figure 15.9, this distance is given by the difference P(y|s) − P(y|n), and is called the Maximum Reduced Hit Rate (MRHR). Like the area under the ROC curve (AUROC), the MRHR is a reliable measure of sensitivity, and at this point of the ROC curve the (neutral) criterion d provides the best sensitivity. So, d must remain set at this point during the procedure. To achieve this goal, Kaernbach makes use of a payoff-like matrix, “punishing” the subject when a miss or a false-positive response occurs and “rewarding” him when a hit takes place. The amount of the “punishment” and of the “reward” may differ for a miss response, a hit response, or a false-positive response. In SIAM, the payoff matrix is replaced by an adjustment matrix, where an increase in the signal level is the punishment and a decrease is the reward. In this way, the adjustment matrix motivates the subject to stabilize his criterion at the neutral position, thereby maintaining the highest sensitivity for the whole length of the exam.
In summary, SIAM is a staircase procedure paired with a yes/no response model that estimates the hit rate, false alarm rate, miss rate, and correct rejections by presenting stimuli with signal and stimuli with noise. An adjustment matrix based on the hit rate, miss rate, and false alarm rate allows keeping the response criterion of the subject at a neutral level, making the yes/no paradigm unbiased, therefore as precise as the 2AFC response model.
In SIAM, presentations containing the signal and presentations with no signal are randomly displayed in a pre-established proportion. The proportion of intervals containing the signal is called the signal quota. The author judged that the most convenient signal quota is 50%, that is, 50% of presentations with the signal and 50% of presentations without the signal (noise). The stimulus level depends not only on the current response (hit/miss) but also on the response criterion kept by the observer (controlled, as explained, through the hit rate and the false alarm rate). So, the signal level is decreased by n steps after a hit (correct response), remains unchanged after a correct rejection, and is increased by n steps after a wrong response (miss) or after a false-positive response. The rules for the adjustment of the signal level are provided by an adjustment matrix that takes into account:
FIG. 15.9 – ROC curves and different response criteria. The neutral criterion is located at the maximum distance of the curve from the diagonal. See text for explanation.

(i) The desired target performance t. The target performance corresponds to the desired difference between hit rate and false alarm rate (t = HR − FP).6 The target performance can range from 0 to 1.
(ii) The signal quota (as explained, a signal quota of 50% is the most convenient value according to the author).
The matrix for any target performance t is given in table 15.1. The matrix, therefore, allows deriving the rules for changing the stimulus level as a function of the target performance. In table 15.2, the number of incremental or decremental steps after a hit, a miss, a correct rejection, and a false-positive error is derived from the adjustment matrix for 5 different target performances. As shown in the table, for the target performance t = 0.5 the matrix generates the following rules:
– For a hit (“yes” response in the presence of the signal), decrease the stimulus level by 1 step.
– For a correct rejection (“no” response in the presence of noise, i.e., in the absence of the signal), keep the stimulus level unchanged.
– For a miss (“no” response in the presence of the signal), increase the stimulus level by 1 step.
– For a false-positive error (“yes” response in the presence of noise, i.e., in the absence of the signal), increase the stimulus level by 2 steps.

TAB. 15.1 – The adjustment matrix of the SIAM procedure; t is the target performance. Signal quota: 50%.
                    Response YES                 Response NO
Signal present      Hit: −1                      Miss: t/(1−t)
Noise present       False positive: 1/(1−t)      Correct rejection: 0
6 The target performance can be assimilated to the target probability: for t = 0.5 the targeted hit rate is 0.5 and the desired false alarm rate is (always) 0%, so that the desired difference between hit rate and false alarm rate is 50%. The same applies to all the other values of t.
TAB. 15.2 – The adjustment of the signal level according to the adjustment matrix (table 15.1) for the signal quota = 50% and five different values of the target performance t. H = hit response, M = miss response, FP = false positive, CR = correct rejection. (From Kaernbach, 1990.)

Answer               t = 0.25 (7)   t = 0.33 (8)   t = 0.50   t = 0.67   t = 0.75
Signal present, Y    H: −3          H: −2          H: −1      H: −1      H: −1
Signal present, N    M: +1          M: +1          M: +1      M: +2      M: +3
Noise present, Y     FP: +4         FP: +3         FP: +2     FP: +3     FP: +4
Noise present, N     CR: 0          CR: 0          CR: 0      CR: 0      CR: 0
The goal of the matrix is, therefore, to provide an adaptive procedure based on the yes/no response model whose final estimate is not biased by the response criterion of the observer: it must be noted, in fact, that in SIAM bias is not considered a misleading factor; rather, it actively participates in the computation of the threshold. Moreover, unlike in the other staircase procedures, in SIAM false-positive errors tend to increase, not decrease, the threshold estimate.
7 In this case, to obtain a whole number, the final adjustment is a multiple (×3).
8 In this case, to obtain a whole number, the final adjustment is a multiple (×2).
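The whole adjustment logic fits in a few lines of Python. In the sketch below, the whole-number scaling of table 15.2 is omitted for clarity, so fractional adjustments are returned as such; the observer function is an assumed stand-in for the real subject:

    import random

    def siam_step(level, signal_present, said_yes, step=1.0, t=0.5):
        """One SIAM level adjustment (Kaernbach, 1990) with a 50% signal
        quota: hit -1, miss +t/(1-t), false positive +1/(1-t), correct
        rejection 0, all in units of the base step."""
        if signal_present and said_yes:          # hit
            return level - step
        if signal_present and not said_yes:      # miss
            return level + step * t / (1.0 - t)
        if not signal_present and said_yes:      # false positive
            return level + step / (1.0 - t)
        return level                             # correct rejection

    def run_siam(observer, start=10.0, n_trials=50, t=0.5):
        level = start
        for _ in range(n_trials):
            signal = random.random() < 0.5                  # signal quota: 50%
            said_yes = observer(level if signal else None)  # None = noise trial
            level = siam_step(level, signal, said_yes, t=t)
        return level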
Chapter 16
Suprathreshold Psychophysics

Suprathreshold psychophysics aims at judging the sensitivity to stimulations above threshold (Ehrenstein and Ehrenstein, 1999). Indeed, many functions deal with tasks involving well visible targets, and the interest can be focused on measuring the perceived intensity of stimulation (magnitude) of a given variable rather than the threshold. In substance, suprathreshold psychophysics1 measures the perceived sensory magnitude of stimulation rather than the threshold. In a nutshell, it makes use of three different strategies:
– Procedures based on magnitude scaling.
– Procedures based on magnitude estimation.
– Procedures based on reaction time.
16.1 Procedures Based on Magnitude Scaling: Category Scaling
Trial after trial, the observer is asked to assign the stimulus to one of a predefined number of categories according to the perceived magnitude of stimulation for the variable of the signal under investigation. The mean value of the variable in each category allows estimating the subjective intensity (magnitude) of the stimulation: by plotting the mean value of the signal in each category versus the number of the categories, an estimate of the suprathreshold sensitivity for the variable under investigation is obtained for a given range of stimulations.2 For example, consider a moving point: let us suppose the variable under examination is speed, with, say, 24 possible values; six categories are established: 1 = almost stationary, 2 = very slow, 3 = slow, 4 = quite fast, 5 = fast, 6 = very fast.
1 The American psychologist Stanley Smith Stevens gave a material contribution to this topic.
2 The number of stimulus levels must be far greater than the number of categories.
DOI: 10.1051/978-2-7598-2517-2.c016 © Science Press, EDP Sciences, 2021
FIG. 16.1 – Category scaling. Left: table of speed vs. category for 24 different signal strengths (speeds) and 6 categories. Right: (log) mean values of speed (ordinate) plotted versus the categories (abscissa). The resulting graph describes the subjective suprathreshold sensitivity (magnitude) for the speed of the stimulus across the range of velocities considered. See text for explanation.

Stimulus presentations are randomized, and presentation after presentation the observer is asked to “categorize” the speed of the target into one of the 6 predefined levels. The observer, therefore, has to judge which category of speed (1, 2, 3, 4, 5, or 6) each moving stimulus belongs to. At the end of the exam, the speed values assigned to each category are averaged to obtain a single mean value per category (in our example: 6 mean values). After log transformation, each mean value is plotted on the ordinate versus the number of the category on the abscissa. The resulting graph describes the subjective suprathreshold sensitivity (magnitude) for the speed of the stimulus across the range of velocities considered (figure 16.1).
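The bookkeeping of a category-scaling session is straightforward; here is a short Python sketch (the speed/category pairs are invented for illustration):

    import math
    from collections import defaultdict

    def category_log_means(judgments):
        """judgments: (speed, category) pairs collected trial by trial.
        Returns the log10 mean speed per category, i.e., the ordinate
        values of the right panel of figure 16.1."""
        by_category = defaultdict(list)
        for speed, category in judgments:
            by_category[category].append(speed)
        return {cat: math.log10(sum(vals) / len(vals))
                for cat, vals in sorted(by_category.items())}

    # hypothetical session: randomized speeds (deg/s) rated on the 1-6 scale
    judgments = [(0.5, 1), (0.9, 1), (1.5, 2), (2.2, 2), (3.5, 3), (5.0, 3),
                 (8.0, 4), (12.0, 4), (18.0, 5), (27.0, 5), (42.0, 6), (60.0, 6)]
    scale = category_log_means(judgments)   # one log mean value per category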
16.2 Procedures Based on Magnitude Estimation
The main shortcoming of category scaling is that signal intensities that are slightly different, yet potentially able to elicit different sensations in the observer, are grouped into the same category. With procedures based on magnitude estimation these differences are preserved. The task is to assign an arbitrary strength level to the stimulus variable presented at each trial: in this case, therefore, no categories are predefined. Trials are randomized, so that at the end of the exam a number of subjective judgments (magnitude estimations expressed as numerical values) about, for example, the speed of the stimulus are collected. Returning to the example of the previous section, the speed values presented during the test are turned into (log) values and plotted on the ordinate versus the corresponding (log) arbitrary judgments (magnitude estimations) on the abscissa (Ehrenstein and Ehrenstein, 1999). In a different version of magnitude estimation, an arbitrary value (the modulus) is assigned to a reference stimulus (the standard). The observer is asked to judge the strength of the test stimulus by assigning a score that is a multiple or a fraction of the modulus (for example, a test stimulus three times brighter than the modulus, or two times faster or slower than the modulus).
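Since the log-log plot described above is usually summarized by a straight line (Stevens' power law, chapter 2), the fit can be sketched as follows (Python; the magnitude estimates are invented for illustration):

    import numpy as np

    def stevens_fit(intensity, estimates):
        """Fit Psi = k * I**n in log-log coordinates:
        log Psi = n * log I + log k, so a least-squares line yields the
        exponent n and the constant k directly."""
        n, log_k = np.polyfit(np.log10(intensity), np.log10(estimates), 1)
        return n, 10.0 ** log_k

    # hypothetical magnitude estimations of perceived speed
    I = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 32.0])
    psi = np.array([2.1, 3.9, 8.3, 15.8, 33.0, 62.0])
    n, k = stevens_fit(I, psi)   # exponent n summarizes the suprathreshold scale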
16.3 Procedures Based on Reaction Time
A different approach in suprathreshold psychophysics makes use of reaction time (RT). Reaction time is an indicator of the immediacy of the apprehension of a stimulus; therefore, it can be regarded as a measure of the sensitivity of the visual system to suprathreshold stimuli: the shorter the reaction time, the more sensitive the visual system in that particular task. A main property of the paradigms based on RT is that different reaction times can reflect different neural mechanisms: the outcome can thus reveal functional differences between sensory modalities and selective impairments of one of them. For example, the RT for the recognition of low spatial frequencies is shorter than for high spatial frequencies at the same mean luminance (Breitmeyer, 1975). Because the former configuration is processed by the magnocellular system and the latter by the parvocellular system, the difference in RTs is related to the difference in magnocellular/parvocellular functioning (specifically, conduction speed).
Chapter 17
Brief Outline of Comparative Psychophysics

Comparative psychophysics is psychophysics applied to the sensory functions of animals: in other words, comparative psychophysics is aimed at measuring thresholds in animals: mammals, birds, fish, and insects. Ehrenstein and Ehrenstein (1999) provide a clear description of the principles of this discipline. To overcome the impossibility of obtaining a verbal response, in a preliminary phase the animal is conditioned to pair a stimulus with a reflexive response recordable with objective techniques (like increased heart rate) and elicited by an aversive stimulation: this is the so-called stimulus-associated conditioning (or classical conditioning). More in detail, a stimulus, called the “conditioned stimulus”, is presented to the animal in association with a reflex-eliciting stimulation, like a weak electric shock that causes heart-rate acceleration. At the end of the period of conditioning, the animal is expected to respond with increased heart rate every time a stimulus that subjectively matches the reference stimulus is presented and perceived. Heart-rate acceleration, thereby, corresponds to a hit response. “Unconditioned” stimuli are then presented at increasing intensities. If the intensity of the unconditioned stimulus is perceived by the animal as different from the intensity of the conditioned stimulus, no change in heart rate will take place. On the contrary, the intensity of the unconditioned stimulus will elicit heart-rate acceleration if it is perceived by the animal as if it were the conditioned stimulus. So, the minimum difference in intensity between the conditioned stimulus and the unconditioned stimulus that fails to elicit heart-rate acceleration can be regarded as the just noticeable difference. In the preliminary phase, the animal can instead be trained to perform the task of interest not via classical conditioning but by rewarding it (with food or water) after a correct performance. In this case, the response of the animal is coded not as reflexive but as a behavioral outcome. In a detection experiment, the animal can be trained with a well suprathreshold stimulus (the test stimulus) and a null (not detectable) stimulus; if the animal selects the stimulus containing the signal (correct performance), it is rewarded.

DOI: 10.1051/978-2-7598-2517-2.c017 © Science Press, EDP Sciences, 2021
After the training period, the examination can start: the intensity of the test stimulus is changed according to the method of constant stimuli or an adaptive technique, and the minimum intensity of the test stimulus that generates the expected behavioral outcome in the animal is taken as the detection threshold. In a discrimination experiment, the animal can be trained with the presentation of two suprathreshold stimuli that differ in one variable (for example contrast, luminance, or spatial frequency). During the preliminary period, the two stimuli are easy to discriminate, and the animal can be trained to select the one that contains the higher signal strength (we will call it the test stimulus). The animal is rewarded when it selects the correct stimulus. After the training period, the examination can start: the intensity of one of the two stimuli is changed according to the method of constant stimuli or an adaptive technique, and the minimum difference between the two stimuli that still generates the behavioral outcome in the animal (indicating that the animal is still able to discriminate the test stimulus from the reference one) is taken as the discrimination threshold (just noticeable difference).
Afterword
The Inverse Problem of Perception

The ultimate reason why the soul is so difficult to measure, the reason why psychophysics is so problematic, is enclosed in a sentence: perception is an inverse problem. The explanation of this issue has been provided by Prof. Zygmunt Pizlo in his paper titled Perception viewed as an inverse problem (2001). There, it is explained that psychophysics deals with the relationship between a stimulus and its mental measure: the perception, or percept. As theorized by Fechner, the percept is the result of a sequence of operations that starts from a distal stimulus (the object of the perception), continues by turning the distal stimulus into its corresponding retinal projection (the proximal stimulus), and ends after the proximal stimulus, transmitted along the retino-cortical pathway, reaches the visual cortex and is finally apprehended. Such visual apprehension implies a measurement of the characteristics of the stimulus. The degree of fidelity with which the percept reflects the properties of the stimulus depends on the precision of this measurement: that is to say, the more precise the measure, the more faithful the correspondence between percept and stimulus. If a disturbing element (noise) due to ophthalmological diseases (for example cataract, glaucoma, or retinal degeneration) occurs, the apprehension of the signal is hindered: the distal stimulus will not be adequately mapped as the proximal stimulus, or it will not be correctly transmitted to the visual cortex, resulting in degraded visual perception. Still, even in normal conditions the psychophysical measurement of the object is not simple, and the threshold is difficult to quantify. This depends on the fact that the threshold, like the flight of the butterfly, suffers from the adverse events en route. Now, the longer the flight, the more probable it is to encounter a breath of wind or a rain shower: that is to say, some “noise” that affects the perception of the object. For this reason, it is easier to measure the proximal stimulus, less affected by external influences (even if some of its features can be “stolen” or degraded along the way to the visual cortex), than the distal one: the threshold of a distal stimulus can be subject to changes in luminance, color, shape, and other variables, unlike the proximal projection, which remains stable on the retina.
Yet, the reality is out there, and the proximal stimulus is nothing more than its internal (visual) representation. The fact is that the visual system cannot but rely on the proximal stimulus to discover reality: as a matter of fact, the visual system has direct access only to the proximal stimulus. Besides, optimal processing of the proximal stimulus, that is, a perfect mapping of the distal stimulus onto the retina, is not enough to ensure perceptual fidelity. Consider a solid object: evidently its tridimensionality is lost in its flat, bidimensional retinal projection. The bidimensional proximal stimulus is poor compared to the tridimensional richness of the object it derives from. And yet, perception proves to have effective strategies to overcome this limit.

Because outer psychophysics (the subject of this treatise) aims at measuring the distal stimulus, the difficulties in achieving this goal will now be clear. To use the words of Pizlo, the difficulty lies in the fact that perception is an inverse problem, for which no univocal solution exists. Describing the proximal stimulus starting from the object is a direct problem: it has a unique solution and is computationally stable, that is, robust against perceptual noise. In very simple terms, from an object it is easy to derive the characteristics of its projection onto the retina – like its shadow on a wall. The solution to this direct problem is one and only one, because it aims at defining what the proximal stimulus will be, considering its source, the object. Things differ when it comes to measuring the distal stimulus (the object) starting from its shadow on the wall, that is to say, starting from its proximal stimulus. This is, indeed, what outer psychophysics intends to do.

Think of a Chinese shadow-puppet show. By inspecting the shadow (a bidimensional shape) of a bird, it is difficult for a child (especially one who has never watched a Chinese puppet show) to deduce that the bird shadow comes from a hand, and even more so the exact position of the fingers. Now let us assume the wall is the retina, the Chinese shadow of the bird is the retinal projection (proximal stimulus), and the hand is the distal stimulus: identifying the distal stimulus (and its characteristics: the hand and the position of its fingers) based on the proximal one (the shadow of the bird) is not obvious, because other objects could produce the same perceptual result. We are facing an inverse problem, a problem that, unlike the direct problem, lends itself to more than one solution.

Even simpler stimuli suffer from the limitations inherent in the inverse problem: the projection of a spot of light on the retina can be generated by light sources in the visual space with different intensities. Conditions like air clarity, the reflectance of neighboring surfaces, and other environmental interferences (as well as temporary changes in ocular media transmittance) make the correspondence between the retinal projection and the distal light source not univocal.

To top it all off, perception does not end (if anything, it starts) with the proximal stimulus. The proximal stimulus, mapped on the retina, is transmitted and re-mapped onto the occipital cortex: it is as if a second proximal stimulus (even more proximal, being on the cortex) stems from the retinal projection, making the inverse problem of visual perception, so to speak, a double inverse problem.
Now it is evident: psychophysics is far more than what is explained in this book, and its fascination lies also in these ambiguous complexities. It is not easy to measure the soul, to catch the butterfly. And yet, the life of a butterfly lasts one day; psychophysics has already reached two centuries.

Carlo Aleci, Turin, 21st May 2020.
Appendix I

Logistic and Weibull Distribution

The logistic distribution is given by the equation:

$$P(x;\nu) = \gamma + \frac{1-\gamma-\lambda}{1+e^{-\beta(x-\alpha)}} \qquad \mathrm{(I.1)}$$

The Weibull function (Weibull, 1951) is given by the equation:

$$P(x;\nu) = \gamma + (1-\gamma-\lambda)\left[1-e^{-(x/\alpha)^{\beta}}\right] \qquad \mathrm{(I.2)}$$

where:
– P(x, ν) is the probability of a correct response.
– x is the signal strength.
– ν is the parameter vector (α, β, γ, λ).
– γ is the guess rate (e.g., 0.5 in a 2AFC, 0.33 in a 3AFC, and virtually 0 in y/n response designs).
– λ is the lapse rate.
– β is the inverse of the spread, that is, the steepness of the curve.
– α is the threshold level, generally the midpoint of the curve.
– e is the natural logarithm base, that is, Euler's number.¹ Its value is e ≈ 2.71828.
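As an illustration, the two equations can be written down directly in code. The following sketch uses assumed parameter values (a 2AFC design, so γ = 0.5, and λ = 0) and shows that at x = α the logistic function sits midway between the asymptotes, whereas the Weibull function sits above it: the asymmetry discussed in the following paragraph.

```python
import math

def logistic(x, alpha, beta, gamma=0.5, lam=0.0):
    # Equation (I.1)
    return gamma + (1.0 - gamma - lam) / (1.0 + math.exp(-beta * (x - alpha)))

def weibull(x, alpha, beta, gamma=0.5, lam=0.0):
    # Equation (I.2)
    return gamma + (1.0 - gamma - lam) * (1.0 - math.exp(-((x / alpha) ** beta)))

alpha, beta = 1.0, 3.0   # illustrative values
print(f"logistic at alpha: {logistic(alpha, alpha, beta):.3f}")  # 0.750, the midpoint
print(f"weibull  at alpha: {weibull(alpha, alpha, beta):.3f}")   # 0.816, above it
```

With λ = 0 the Weibull value at x = α is 1 − e⁻¹ scaled into the response range, i.e., about 0.82 in a 2AFC design, which is the figure quoted below.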
Inspecting figure I.1, it can be noted that the shapes of the two functions are slightly different: the sigmoid of the Weibull distribution is not symmetrical. This difference implies that in the Weibull distribution the positional parameter α (the threshold) is localized not at the point halfway between the two asymptotes (e.g., at p = 0.75 in a 2AFC response model) but at a position above the midpoint (p = 0.82: Harvey, 1986; p = 0.96: Watson and Pelli, 1983). Likewise, in the y/n response model the target probability for the threshold is set not at φ = 0.5 but at 0.62 (Madigan and Williams, 1987), or even at 0.79 (Watson and Pelli, 1983). In sum, the thresholds estimated on a Weibull function are higher compared to the thresholds derived from logistic or normal distributions (Harvey, 1986).

FIG. I.1 – The logistic and Weibull cumulative distributions. Note that, contrary to the logistic distribution, the Weibull distribution is not symmetric: when a Weibull distribution is adopted, the positional parameter α (corresponding to the threshold) is localized not midway along the slope (75% of correct responses in the 2AFC response model) but at a higher position (82% of correct responses).

The Weibull distribution is plotted over a logarithmic abscissa. If the abscissa is linear, the Weibull distribution becomes a Gumbel distribution (Treutwein & Strasburger, 1999).

Which distribution best represents the psychometric function in psychophysical experiments? As underlined by Harvey (1986), within a theoretical frame the differences are negligible (both the Weibull and the logistic distribution are valid choices, for example²); in practice, the function to adopt is the one that best fits the data collected in the experiment. However, in AFC detection experiments the Weibull function gives better fits than the logistic or normal distribution (Harvey, 1986). Other distributions often adopted in psychophysics are the normal distribution and the Gumbel distribution.

¹ A variant of the Weibull function is the Quick function, in which the natural logarithm base e is replaced by 2. The Quick function is identical to the Weibull, but slightly shifted horizontally.
² And yet, in n-AFC response models the Weibull distribution is preferable to the logistic (or Gaussian integral) function (Harvey, 1986).
Appendix II

The Maximum Likelihood Estimation

For a given threshold θ defined by a target probability φ, the likelihood function can be represented with two crossing hyperbolic tangents: one that operates on the pdf in case of a correct ("yes") response, the other in case of an incorrect ("no") response. After each presentation, the likelihood function estimates the likelihood Lθ that the actual sequence of responses has been generated by that threshold. The computation of the likelihood Lθ is repeated across the range of intensities. Of all the likelihoods Lθ, the highest is selected: this is the MLE of the threshold, and the corresponding intensity is the most likely threshold θMLE at that presentation. The pdf that peaks at θMLE is selected, and the next stimulus is presented at about its peak. According to the response, the MLE of the threshold is then recomputed. The reiteration of the process (also called a successive approximation adaptive algorithm) makes the threshold estimation more and more reliable.

After a "yes" response, the likelihood that the higher intensities contain the threshold decreases: the likelihood function "clips" the pdf on the right side. In turn, after a "no" response, the likelihood that the lower intensities contain the threshold decreases: the likelihood function "clips" the pdf on the left side. In both cases the pdf tapers, and its updated peak localizes at the most probable threshold at presentation n.

In his review, Harvey (1986) provided a clear explanation of how the MLE is computed. He considers the example of a binary sequence of responses (C = correct, I = incorrect) collected for a given stimulus intensity, for example log intensity = −2.5. Let us hypothesize that the recorded sequence of responses for a stimulus with log intensity = −2.5 is:

C C I C I C C C I C.
Each response has a probability of being correct, P(C), and, in turn, a reciprocal probability of being incorrect, P(I) = 1 − P(C). Based on the sequence of responses recorded, the MLE estimates the likelihood that log intensity = −2.5 is the threshold. This means that it estimates how likely it is that log intensity = −2.5 generates 50% of correct responses (probability of a correct response P(C) and of an incorrect response P(I) both equal to 0.5), given the recorded (C, I) sequence.

The likelihood L that a threshold (φ = 50%) at log intensity = −2.5 generates the sequence of responses C C I C I C C C I C reported by the observer can be computed as the product of the individual probabilities:

L = P(C) · P(C) · P(I) · P(C) · P(I) · P(C) · P(C) · P(C) · P(I) · P(C).

Because P(C) = P(I) = 0.5, the result is:

L₅₀% = 0.5 · 0.5 · 0.5 · 0.5 · 0.5 · 0.5 · 0.5 · 0.5 · 0.5 · 0.5 = 0.000976, that is, L = 9.76 × 10⁻⁴.

In sum, the likelihood that the proportion of correct responses is 50% at that level of intensity (i.e., the likelihood that that level of intensity is the threshold) is 9.76 × 10⁻⁴. This is the same as saying that, if the tested log intensity −2.5 were at the threshold, the likelihood of recording the sequence C C I C I C C C I C would be 9.76 × 10⁻⁴.

The next step is to establish the likelihood that the proportion of correct responses is different from 50% at the same level of intensity (i.e., the likelihood that that level of intensity is not the threshold). The procedure is the same. Using the same sequence of responses, the likelihood L that the probability of a correct response to a stimulus with log intensity = −2.5 is, say, 80% (i.e., that the stimulus is above threshold) is computed as the product of the probabilities P(C) = 0.8 and P(I) = 0.2:

L₈₀% = 0.8 · 0.8 · 0.2 · 0.8 · 0.2 · 0.8 · 0.8 · 0.8 · 0.2 · 0.8 = 0.00168, that is, L = 1.68 × 10⁻³.

The same likelihood can be computed for all the other hypotheses (10%…90%). The likelihood that the probability of correct responses at the same intensity (log −2.5 in our example) is 10%…90% is reported in table II.1. As shown in the table, the likelihood of generating the reported sequence of responses at log intensity = −2.5 is at its maximum if it is hypothesized that log intensity = −2.5 generates a probability of correct responses of 70%, not 50%: evidently, log intensity −2.5 is too high to be the threshold. The likelihood function, therefore, "clips" the pdf on the right side, and the next stimulus will be presented at a lower level of intensity, at the mode (or mean) of the updated pdf.
TAB. II.1 – Likelihood that the binary sequence of responses P(C) · P(C) · P(I) · P(C) · P(I) · P(C) · P(C) · P(C) · P(I) · P(C) at log stimulus intensity −2.5 is the expected sequence if log −2.5 corresponds to φ = 0.1, …, 0.5 (the threshold), …, 0.9. In the last column the log likelihood is reported, because the log likelihood is computationally preferable to the likelihood.³ The MLE, corresponding to hypothesis no. 7, is marked with an asterisk.

Hypothesis no.   P(C) = φ                    P(I)   Likelihood     Log likelihood⁴
1                0.1                         0.9    7.29 × 10⁻⁸    −7.13
2                0.2                         0.8    6.55 × 10⁻⁶    −5.18
3                0.3                         0.7    7.50 × 10⁻⁵    −4.12
4                0.4                         0.6    3.54 × 10⁻⁴    −3.45
5                0.5 (target probability)    0.5    9.77 × 10⁻⁴    −3.01
6                0.6                         0.4    1.79 × 10⁻³    −2.74
7 *              0.7                         0.3    2.22 × 10⁻³    −2.65
8                0.8                         0.2    1.68 × 10⁻³    −2.77
9                0.9                         0.1    4.78 × 10⁻⁴    −3.32
The resulting answer C or I is recorded, a new MLE is performed, the signal intensity with the maximum likelihood of being the threshold is selected, and the pdf is "clipped" on the right or the left side, depending on the correct/incorrect response. The process is reiterated for a fixed number of trials or until the variance of the pdf falls below a given interval.

The likelihood function is formalized as:

$$L(\theta \mid X) = L(\theta \mid x_1 \ldots x_n) = \prod_{i=1}^{n} L(\theta \mid x_i) \qquad \mathrm{(II.1)}$$

where L(θ|xᵢ) is the likelihood computed after each trial i for all the intensities of the range. So, the product of the likelihoods at each trial i produces the likelihood function L(θ|X) up to that trial.
³ YAAP and ZEST use the direct likelihood; the other parametric procedures use the log likelihood.
⁴ According to Harvey (1986), the log likelihood is computationally safer than the likelihood.
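The whole computation behind table II.1 fits in a few lines of code. Below is a minimal sketch (not from the book) that reproduces Harvey's worked example as described above; the response sequence is the one given in the text, and the printed rounding is my own (the table truncates rather than rounds, so the last digit may differ).

```python
import math

sequence = "CCICICCCIC"   # responses recorded at log intensity = -2.5

def likelihood(pc):
    # Product of P(C) for each correct and P(I) = 1 - P(C) for each incorrect response.
    L = 1.0
    for r in sequence:
        L *= pc if r == "C" else 1.0 - pc
    return L

for k in range(1, 10):
    pc = k / 10.0
    L = likelihood(pc)
    print(f"hypothesis {k}: P(C) = {pc:.1f}  L = {L:.2e}  log10(L) = {math.log10(L):.2f}")

# The maximum (2.22e-03) falls at P(C) = 0.7: under MLE, log intensity -2.5 most
# likely yields 70% correct responses, i.e., it lies above the 50% threshold.
```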
Appendix III

Probit Analysis

Probit analysis is a regression technique introduced by Bliss in the thirties (Bliss, 1934) to analyze binomial response variables. It was originally conceived in the toxicological field to study the dose–response relationship of pesticides (dose of pesticide vs. percentage killed). Bliss (1934) proposed transforming the percentage killed into a "probability unit" (or "probit") and expressing the independent variable (in this case the dose of pesticide) in log units. This way, the dose–response sigmoid is turned into a linear function, so that the asymptotic parts of the curve (and especially the upper asymptote, which is crucial to assess the optimal dose of a pesticide) can be better analyzed.

In psychophysics, probit analysis is a regression model that converts the binomial outcome (miss = 0, hit = 1), i.e., the proportion of correct responses as a function of the signal level, into units of probability (probits). In other terms, probit analysis assesses the relationship between the probits of detection/discrimination and the log signal level. To this purpose, it transforms the sigmoid psychometric function⁵ obtained from the binomial distribution of the responses into a linear model. In the model, the positional parameter α (the threshold) and the dispersion parameter β (the slope) can be analyzed via least squares or maximum likelihood.⁶ Probit analysis has been applied to psychophysical experiments that make use of the method of constant stimuli, like the Adaptive Probit Estimation (APE).⁷

Bliss provided a table to convert the kill percentages (or proportions of correct responses in psychophysics) to probits. In the conversion, 1% kill equals 2.67 probits, 50% equals 5.00 probits, and 99.90% equals 8.09 probits. Data points are inversely weighted by their binomial variability (McKee et al., 1985).
⁵ The only requirement of probit analysis is that the psychometric function fits a cumulative normal distribution. If this is not the case, logit analysis should be preferred.
⁶ As highlighted by Lieberman (1983), probit analysis differs from standard linear regression techniques since it heavily weights the data points about the threshold and does not assign equal weight to all the observations.
⁷ Section 11.1.
The conversion table from percent kill to probits (in its final version) was published by Finney (1952) and is reported in table III.1.

TAB. III.1 – Conversion from percent kill (or proportion of correct responses) to probits (from Finney, 1952). For example, 2% equals 2.95 probits, 50% equals 5.00 probits, 72% equals 5.58 probits, and 99.90% equals 8.09 probits.

%     0     1     2     3     4     5     6     7     8     9
 0    –    2.67  2.95  3.12  3.25  3.36  3.45  3.52  3.59  3.66
10   3.72  3.77  3.82  3.87  3.92  3.96  4.01  4.05  4.08  4.12
20   4.16  4.19  4.23  4.26  4.29  4.33  4.36  4.39  4.42  4.45
30   4.48  4.50  4.53  4.56  4.59  4.61  4.64  4.67  4.69  4.72
40   4.75  4.77  4.80  4.82  4.85  4.87  4.90  4.92  4.95  4.97
50   5.00  5.03  5.05  5.08  5.10  5.13  5.15  5.18  5.20  5.23
60   5.25  5.28  5.31  5.33  5.36  5.39  5.41  5.44  5.47  5.50
70   5.52  5.55  5.58  5.61  5.64  5.67  5.71  5.74  5.77  5.81
80   5.84  5.88  5.92  5.95  5.99  6.04  6.08  6.13  6.18  6.23
90   6.28  6.34  6.41  6.48  6.55  6.64  6.75  6.88  7.05  7.33

99   0.0   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9
     7.33  7.37  7.41  7.46  7.51  7.58  7.65  7.75  7.88  8.09

(In the last row, the columns correspond to the decimals of 99%: 99.0, 99.1, …, 99.9.)

Probits are then plotted against the logarithm of the dose (or stimulus strength). This way a straight line is obtained via least squares⁸ or MLE (figure III.1).

FIG. III.1 – The sigmoid dose–response curve and its linearization through the conversion of the dependent variable (proportion of correct responses or lethal events) into probit units and the log transformation of the independent variable (signal strength or poison concentration, on the x-coordinate). See text for explanation. From Bliss, 1934. Reproduced with permission of the American Association for the Advancement of Science via Copyright Clearance Center.

The main advantages of the probit transformation in psychophysics are:
– It allows analyzing the relationship between sensitivity (proportion of correct responses) and stimulus strength at the asymptotic levels of the psychometric function. The stimulus intensity at an asymptotic level, for example at 99.5% of correct responses, is obtained by reading on the linear model the log value of the stimulus intensity (on the abscissa) that corresponds to the probit value of 99.5% (i.e., 7.58 probits) on the ordinate. The log stimulus intensity is then turned back into stimulus intensity.
– It allows detecting a change in the effect of the stimulus strength on sensitivity (proportion of correct responses) over different ranges of stimulus intensity (see the abrupt change in the slope of the linear model in figure III.1: dotted vs. continuous line).

⁸ The method of least squares minimizes the distance between the line and the data.
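To illustrate the transformation, the following sketch converts hypothetical proportions of correct responses to probits (the inverse normal CDF plus 5) and fits the resulting straight line by unweighted least squares. All data values are invented for the example, and probit analysis proper weights the points by their binomial variability, as noted above.

```python
from statistics import NormalDist

log_intensity = [-2.0, -1.5, -1.0, -0.5, 0.0]   # hypothetical log signal levels
p_correct     = [0.02, 0.16, 0.50, 0.84, 0.98]  # hypothetical proportions correct

# 50% -> 5.00 probits; 2% -> 2.95 probits, matching table III.1.
probits = [NormalDist().inv_cdf(p) + 5.0 for p in p_correct]

# Unweighted least-squares line through (log intensity, probit) points.
n = len(log_intensity)
mx = sum(log_intensity) / n
my = sum(probits) / n
num = sum((x - mx) * (y - my) for x, y in zip(log_intensity, probits))
den = sum((x - mx) ** 2 for x in log_intensity)
slope = num / den
intercept = my - slope * mx

threshold = (5.0 - intercept) / slope   # log intensity at 5.00 probits (50% correct)
print(f"slope = {slope:.2f}, log threshold = {threshold:.2f}")
```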
Appendix IV

About Bayes' Theorem and Bayesian Statistics

As recalled by Spiegelhalter and Rice (2009), "Bayesian statistics is a system for describing epistemological uncertainty using the mathematical language of probability. In the Bayesian paradigm,⁹ degrees of belief in states of nature are specified; these are non-negative, and the total belief in all states of nature is fixed to be one. Bayesian statistical methods start with existing 'prior' beliefs, and update these using data to give 'posterior' beliefs, which may be used as the basis for inferential decisions".

The Bayesian approach assumes a probability distribution that describes the current (prior) knowledge about the parameter of interest: for example, the threshold. This prior knowledge is then combined (updated) with the likelihood that a given level of the signal is the threshold, according to the responses of the subject. In this way a posterior probability distribution is obtained. This posterior probability distribution takes into account what was previously known about the possible value of the parameter and uses the data collected as the examination progresses to define the parameter more and more accurately. In the words of Alcalá-Quintana and García-Pérez: "the likelihood clips the less likely side of the prior. When the [last] trial has been completed, the resultant product is the final posterior […]".¹⁰ So:

Posterior P = Likelihood × Prior P.

Bayes' theorem, therefore, can be formalized as follows:

$$p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{p(x)} \qquad \mathrm{(IV.1)}$$

where:
– x is the signal strength;
– θ is the parameter under investigation (for example the threshold);
– p(θ|x) is the posterior conditional probability distribution that the threshold θ corresponds to the signal strength x;
– p(x|θ) is the likelihood that the signal strength x is the threshold θ;
– p(θ) is the prior probability distribution of x being the threshold θ;
– p(x) is a normalizing constant.

So, "the Bayesian approach combines the prior distribution p(θ) (epistemological uncertainty) with the likelihood p(x|θ) to provide a posterior distribution p(θ|x) (updated epistemological uncertainty)" (Spiegelhalter & Rice, 2009; figure IV.1).

FIG. IV.1 – Prior, likelihood, and posterior distribution for the threshold. The posterior probability distribution of the threshold is shifted to the right compared to the likelihood distribution due to the effect of the prior probability distribution of the threshold. The posterior distribution can be seen as the trade-off between the likelihood estimate of the threshold provided by the data gathered in the experiment and the previous knowledge about the parameter of interest.

⁹ Theorized in 1763 by the British mathematician Thomas Bayes (Bayes T., 1763, An Essay towards solving a Problem in the Doctrine of Chances, Philosophical Transactions of the Royal Society of London).
¹⁰ Alcalá-Quintana & García-Pérez (2004), p. 253.
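Equation (IV.1) translates directly into a discrete update loop over a grid of candidate thresholds. The sketch below is a generic illustration in the spirit of the Bayesian procedures discussed in this book (QUEST, ZEST); the grid, the Gaussian prior, the assumed psychometric function, and the three example trials are arbitrary choices for the example, not a specific published method.

```python
import math

grid = [i / 10 for i in range(-20, 21)]                   # candidate thresholds (log units)
prior = [math.exp(-0.5 * (t / 1.0) ** 2) for t in grid]   # Gaussian prior, SD = 1 (assumed)

def p_yes(x, theta):
    # Assumed y/n psychometric function: probability of "yes" at intensity x
    # given that the true threshold is theta.
    return 1.0 / (1.0 + math.exp(-3.0 * (x - theta)))

def update(posterior, x, said_yes):
    # posterior ∝ likelihood × prior (eq. IV.1); the sum plays the role of p(x).
    post = [p * (p_yes(x, t) if said_yes else 1.0 - p_yes(x, t))
            for p, t in zip(posterior, grid)]
    z = sum(post)
    return [p / z for p in post]

posterior = [p / sum(prior) for p in prior]
for x, answer in [(0.0, True), (-0.5, True), (-1.0, False)]:  # three illustrative trials
    posterior = update(posterior, x, answer)

# Posterior mean, a common placement rule and final estimator (see below).
mean_threshold = sum(p * t for p, t in zip(posterior, grid))
print(f"posterior mean threshold: {mean_threshold:.2f}")
```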
What is the model of distribution that best represents the prior?

A main issue concerns the most suitable function to represent the prior distribution. In figure IV.2, different potential prior distributions are depicted.

FIG. IV.2 – Different prior functions that can be adopted in Bayesian procedures. From left to right, upper panels: uniform distribution, trapezoidal distribution, Gaussian distribution; lower panels: beta distribution (inverse), hyperbolic secant distribution.

King-Smith and associates (1994) stated that the best prior is a modified hyperbolic secant, more suitable than a Gaussian function when the threshold is far from the peak of the prior distribution.¹¹ However, the same authors point out that the shape of the prior pdf does not substantially affect the threshold estimate of the procedure (ZEST in their case). According to Treutwein and Strasburger (1999), the most appropriate function is the beta distribution B(x; p, q).¹²

In their simulation study, Alcalá-Quintana and García-Pérez (2004) investigated which prior provides the most accurate measure of the threshold in yes/no Bayesian experiments. They measured the difference between true and averaged estimated thresholds using as a prior a modified hyperbolic secant, a Gaussian distribution, a beta distribution, a trapezoidal distribution, and a uniform (rectangular) distribution. The authors used a fixed number of trials (10, 20, and 30) and the mode of the pdf as the parameter for the placement of the stimuli and for the threshold estimate. The assumed shape of the psychometric function was logistic. From the modified hyperbolic secant to the uniform distribution, the prior function is progressively broader and flatter. The authors found that the flatter the distribution, the broader the range of true thresholds accurately estimated (i.e., the smaller the difference between true and estimated threshold); this range corresponds to the region of the prior distribution with the highest probability density. The hyperbolic secant was the prior distribution that gave the strongest bias and the largest standard errors with n = 10 trials. Bias and standard errors increased as a function of the displacement of the true threshold from the peak of the prior distribution, generating underestimation (threshold biased low) of high true thresholds and overestimation (threshold biased high) of low true thresholds. The range of unbiased averaged estimates broadened as the prior flattened out, so that the uniform prior yielded the broadest range of unbiased estimates. Besides, bias decreased as the number of trials increased (from 10 up to 30).

¹¹ The hyperbolic secant satisfactorily reflects the prior pdf of differential light sensitivity across the visual field (perimetric data) in normal subjects. A bimodal pdf with a sharp peak at a high level of sensitivity and a flatter peak at zero sensitivity represents the prior pdf in a composite population made of normal and glaucomatous subjects (Vingrys & Pianta, 1999, p. 592). Vingrys and Pianta (1999) suggested using this composite prior pdf, which takes into account the normal and abnormal distribution of the threshold in the population. This is, indeed, the shape of the prior pdf recommended by the authors for ZEST applied to clinical perimetry.

¹² The beta distribution B(x; p, q) is a family of probability distributions whose form is parametrized by two positive variables, p and q. It is defined by:

$$F(x; p, q) = \frac{x^{p-1}(1-x)^{q-1}}{B(p, q)} \qquad \mathrm{(IV.2)}$$

where x is a realization (an observed value) of a probability and B is a normalizing constant that ensures that the total probability integrates to 1. Different values of the parameters p and q let the distribution assume a wide variety of shapes: rectangular (p = q = 1), symmetrical U-shaped (p = q < 1), symmetrically inverse U-shaped (i.e., Gaussian-like: p = q > 1), asymmetrically inverse U-shaped (p ≠ q). The figure accompanying this footnote in the original shows different beta distributions, parameterized by different p and q; from Treutwein and Strasburger, 1999, reproduced with permission of Springer Nature via Copyright Clearance Center.
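For reference, the prior shapes compared in the simulations above can be written down compactly. The definitions below are a sketch: the beta density follows equation (IV.2), while the others are unnormalized, and the "modified" hyperbolic secant of King-Smith and colleagues involves additional scaling not reproduced here.

```python
import math

def hyperbolic_secant(t, center=0.0, spread=1.0):
    # sech prior: peaked like a Gaussian but with heavier tails.
    z = (t - center) / spread
    return 2.0 / (math.exp(z) + math.exp(-z))

def beta_density(x, p=2.0, q=3.0):
    # Beta distribution of equation (IV.2), defined on 0 < x < 1.
    B = math.gamma(p) * math.gamma(q) / math.gamma(p + q)
    return x ** (p - 1) * (1 - x) ** (q - 1) / B

def uniform(t, lo=0.0, hi=1.0):
    # Non-informative (rectangular) prior over the tested range.
    return 1.0 / (hi - lo) if lo <= t <= hi else 0.0

def trapezoid(t, lo=0.0, hi=1.0, ramp=0.2):
    # Flat top with linear ramps at both ends.
    if t < lo or t > hi:
        return 0.0
    edge = min(t - lo, hi - t)
    return min(1.0, edge / ramp)

print(f"sech(0) = {hyperbolic_secant(0.0):.2f}, beta(0.5) = {beta_density(0.5):.2f}")
```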
How wide should the prior be?

The correct choice of the a priori probability distribution function in a Bayesian procedure involves not only its shape but also its width, i.e., its standard deviation (SD). Informative priors are useful to accelerate the convergence of the posterior pdf toward the threshold, avoiding placing the stimuli at the extremes of the intensity spectrum during the initial phase of the examination. In this sense, the more precise (i.e., refined) the prior, the more informative it will be about the location of the threshold, and the more efficient the Bayesian procedure: the precision of the prior pdf is reflected in its reduced standard deviation.

Yet, reducing the standard deviation of the prior improves the estimate of the threshold (and of the slope) of the psychometric function only if the reduction
remains below a limit: if the SD of the prior pdf becomes too small, the updated posterior pdf tends to be biased, so that the estimate worsens. This is most evident if the number of trials is not large. As an explanation, a too narrow prior distribution biases the sequential tracking of the MLE, misleading the positioning of the stimulus during the first trials and thereby generating a biased final judgment: even more so if the prior value is included in the threshold computation at the end of the test. As a rule of thumb, the SD of the prior distribution should be between one-half and one-quarter of the range of intensities tested by the Bayesian procedure (Shen & Richards, 2012).¹³

It follows from the above that in the Bayesian approach the shape and width of the prior pdf must be chosen cautiously. In addition to the appropriate prior, three other parameters require consideration in Bayesian psychophysics:
(1) The placement rule. Trial after trial, the stimulus can be presented at the presumed threshold intensity corresponding to the mean (Emerson, 1986), median (King-Smith et al., 1994), or mode (Watson & Pelli, 1983) of the updated pdf.
(2) The final estimator of the threshold. As for the placement rule, the final estimate of the threshold can rely on the mean (King-Smith et al., 1994), the median, or the mode (Watson & Pelli, 1983) of the last updated pdf. Compared to the median and mode, the mean is superior both as a placement rule and as an estimator of the threshold (Emerson, 1986; King-Smith et al., 1994; Alcalá-Quintana & García-Pérez, 2004). In simulations, King-Smith and colleagues (1994b) and Emerson (1986) showed that the mean of the posterior pdf provides more reliable measures of the threshold compared to the median or the mode. In addition, the mean minimizes the variance of the estimate of the slope (Kontsevich & Tyler, 1999). The prior can be considered (Watson & Pelli, 1983) or not considered (King-Smith et al., 1994) in the final calculation of the threshold. It is argued that the inclusion of the prior in the final computation of the threshold biases the estimate, especially when the number of trials is low (Alcalá-Quintana & García-Pérez, 2004). Evidently, with the use of a uniform (i.e., non-informative) prior this problem is avoided, but the advantage in terms of testing time is lost.¹⁴
(3) The termination rule. The stopping rule can be fixed-trial (the exam ends after a specified number of trials) or dynamic, when the confidence interval for the threshold falls below a pre-selected limit, as in YAAP.¹⁵ According to a simulation, the dynamic rule is not more efficient than the fixed-trial stopping criterion (Anderson, 2003).

Bayesian statistical methods can be used when the actual evidence seems inadequate or imperfect due to potential bias. Reanalyzing the problem in the light of prior assumptions makes the conclusions more truthful and accurate. In psychophysics, the Bayesian model aims at increasing the efficiency of the computation of the parameters of interest (threshold, slope, or both), reducing the examination time without loss of precision in the final estimate. However, the Bayesian approach is not free of drawbacks. As reported by Treutwein and Strasburger (1999), especially when the number of trials is low, the prior knowledge can "poison" the objective procedure of ML threshold estimation with an external (and subjective, as it is arbitrarily selected by the operator) assumption. In these cases, the posterior pdf may be substantially affected by the prior function, which dominates, causing biased judgments.

In conclusion, and criticism apart, "Bayes theorem can be thought of as a way of coherently updating our uncertainty in the light of new evidence" (Spiegelhalter and Rice, 2009). In psychophysics it is used to increase the efficiency of the modern adaptive procedures, a key issue when testing patients within the clinical setting.

¹³ The standard deviation of the prior affects the measure of the slope parameter to a lesser extent compared to the threshold.
¹⁴ When a non-informative (or "rectangular") a priori probability distribution is adopted, the Bayesian approach seems to coincide with the maximum likelihood estimation. Yet, as highlighted by Spiegelhalter and Rice, there is a main "philosophical difference between Bayesian and frequentist inference; Bayesians make statements about the relative evidence for parameter values given a dataset, while frequentists compare the relative chance of datasets given a parameter value" (Spiegelhalter & Rice, 2009).
¹⁵ Section 10.2.5.
References
Aleci C., Piana G., Anselmino F. (2010) Evaluation of spatial anisotropy by curvature analysis of elliptical targets, Open Ophthalmology J. 4, 20.
Alcalá-Quintana R., García-Pérez M.A. (2004) The role of parametric assumptions in adaptive Bayesian estimation, Psychol. Methods 9, 250.
Alcalá-Quintana R., García-Pérez M.A. (2007) A comparison of fixed-step-size and Bayesian staircases for sensory threshold estimation, Spatial Vision 20, 197.
Anderson A.J. (2003) Utility of a dynamic termination criterion in the ZEST adaptive threshold method, Vision Res. 43, 165.
Anderson A.J., Johnson C.A. (2006) Comparison of the ASA, MOBS, and ZEST threshold methods, Vision Res. 46, 2403.
Bebie H., Fankhauser F., Spahr J. (1976) Static perimetry: strategies, Acta Ophthalmologica 54, 325.
Bergmann C. (1858) Anatomisches und Physiologisches über die Netzhaut des Auges, Zeitschrift für rationelle Medicin (J. Henle, C. von Pfeufer, Eds). Dritte Reihe, II. Band, Winter, Leipzig & Heidelberg, p. 83.
Berkson J. (1955) Maximum likelihood and minimum chi-square estimates of the logistic function, J. Amer. Stat. Assoc. 50, 130.
Blackwell H.R. (1952) Studies of psychophysical methods for measuring visual thresholds, J. Opt. Soc. Amer. 42, 606.
Bliss C.I. (1934) The method of probits, Science 79, 38.
Bliss C.I. (1934) The method of probits – a correction, Science 79, 409.
Breakwell G.M., Hammond S., Fife-Schaw C.F. (2000) Research Methods in Psychology. Sage Publications, London.
Breitmeyer B.G. (1975) Simple reaction time as a measure of the temporal response properties of transient and sustained channels, Vision Res. 15, 1411.
Brown L.G. (1996) Additional rules for the transformed up-down method in psychophysics, Percept. Psychophys. 58, 959.
Burt O.R. (1964) Curve fitting to step functions, J. Farm Econ. 46, 662.
Buss E., Hall J.W., Grose J.H., Dev M.B. (2001) A comparison of threshold estimation methods in children 6–11 years of age, J. Acoustical Soc. Amer. 109, 727.
Campbell R.A., Lasky E.Z. (1968) Adaptive threshold procedure: BUDTIF, J. Acoustical Soc. Amer. 44, 537.
Chauhan B.C., House P.H. (1991) Intratest variability in conventional and high-pass resolution perimetry, Ophthalmology 98, 79.
Chauhan B.C., Johnson C.A. (1999) Test-retest variability of frequency-doubling perimetry and conventional perimetry in glaucoma patients and normal subjects, Invest. Ophthalmology Visual Sci. 40, 648.
Chauhan B.C., Tompkins J.D., LeBlanc R.P., McCormick T.A. (1993) Characteristics of frequency-of-seeing curves in normal subjects, patients with suspected glaucoma, and patients with glaucoma, Invest. Ophthalmology Visual Sci. 34, 3534.
Cobo-Lewis A.B. (1997) An adaptive psychophysical method for subject classification, Percept. Psychophys. 59, 989.
Cornsweet T.N. (1962) The staircase method in psychophysics, Amer. J. Psychology 75, 485.
Derman C. (1957) Non-parametric up-and-down experimentation, Ann. Math. Stat. 11, 186.
Derrington A.M., Hennings G.B. (1981) Pattern discrimination with flickering stimuli, Vision Res. 21, 597.
Dixon W.J., Mood A.M. (1948) A method for obtaining and analyzing sensitivity data, J. Amer. Stat. Assoc. 43, 109.
Ehrenstein W.H., Ehrenstein A. (1999) Psychophysical methods, Modern techniques in neuroscience research (U. Windhorst, H. Johansson, Eds). Springer, Berlin, New York, pp. 1211–1241.
Emerson P.L. (1984) Observations on a maximum likelihood method of sequential threshold estimation and a simplified approximation, Percept. Psychophys. 36, 199.
Emerson P.L. (1986) Observations on maximum-likelihood and Bayesian methods of forced-choice sequential threshold estimation, Percept. Psychophys. 39, 151.
Faes L., Nollo G., Ravelli F., Ricci L., Vescoci M., Turatto M., Pavani F., Antolini R. (2007) Small-sample characterization of stochastic approximation staircases in forced-choice adaptive threshold estimation, Percept. Psychophys. 69, 254.
Findlay J.M. (1978) Estimates on probability functions: a more virulent PEST, Percept. Psychophys. 23, 181.
Finney D.J. (1952) Probit analysis. Cambridge University Press, Cambridge, England.
Finney D.J. (1971) Probit analysis. Cambridge University Press, Cambridge.
García-Pérez M.A. (1998) Forced-choice staircases with fixed step sizes: asymptotic and small-sample properties, Vision Res. 38, 1861.
García-Pérez M.A. (2001) Yes-no staircases with fixed step sizes: psychometric properties and optimal setup, Optometry Visual Sci. 78, 56.
Green D.M. (1990) Stimulus selection in adaptive psychophysical procedures, J. Acoust. Soc. Amer. 87, 2662.
Green D.M. (1993) A maximum-likelihood method for estimating thresholds in a yes-no task, J. Acoust. Soc. Amer. 93, 2096.
Green D.M. (1995) Maximum-likelihood procedures and the inattentive observer, J. Acoust. Soc. Amer. 97, 3749.
Green D.M., Swets J.A. (1966) Signal detection theory and psychophysics. Wiley, New York.
Green D.M., Luce R.D. (1975) Parallel psychometric functions from a set of independent detectors, Psychol. Rev. 82, 483.
Hall J.L. (1968) Maximum-likelihood sequential procedure for estimation of psychometric functions, J. Acoust. Soc. Amer. 44, 370.
Hall J.L. (1981) Hybrid adaptive procedure for estimation of psychometric functions, J. Acoust. Soc. Amer. 69, 1763.
Hall J.L. (1983) A procedure for detecting variability of psychophysical thresholds, J. Acoust. Soc. Amer. 73, 663.
Harvey L.O. (1986) Efficient estimation of sensory thresholds, Behavior Res. Methods, Instrum. Comput. 18, 623.
Hesse A. (1986) Comparison of several psychophysical procedures with respect to threshold estimates, reproducibility, and efficiency, Acta Acustica united with Acustica 59, 263.
Hughson W., Westlake H. (1944) Manual for program outline for rehabilitation of aural casualties both military and civilians, Trans. Amer. Acad. Ophthalmology Otolaryngology 48, 1.
Jäkel F., Wichmann F.A. (2006) Spatial four-alternative forced-choice method is the preferred psychophysical method for naïve observers, J. Vision 6, 1307.
Johnson C.A., Adams C.W., Lewis R.A. (1988) Fatigue effects in automated perimetry, Appl. Opt. 27, 1030.
Johnson C.A., Nelson-Quigg J.M. (1993) A prospective three-year study of response properties of normal subjects and patients during automated perimetry, Ophthalmology 100, 269.
Johnson C.A., Shapiro L.R. (1989) A comparison of MOBS (Modified Binary Search) and staircase test procedures in automated perimetry, Technical Digest Series, pp. 84–87.
Johnson D.M., Watson C.S., Kelly W.J. (1984) Performance differences among the intervals in forced-choice tasks, Percept. Psychophys. 35, 553.
Kaernbach C. (1990) A single-interval adjustment-matrix (SIAM) procedure for unbiased adaptive testing, J. Acoust. Soc. Amer. 88, 2645.
Kaernbach C. (1991) Simple adaptive testing with the weighted up-down method, Percept. Psychophys. 49, 227.
Kaernbach C. (2001) Adaptive threshold estimation with unforced-choice tasks, Percept. Psychophys. 63, 1377.
Kaernbach C. (2001b) Slope bias of psychometric functions derived from adaptive data, Percept. Psychophys. 63, 1389.
Kaplan H.L. (1975) The five distractors experiment: exploring the critical band with contaminated white noise, J. Acoust. Soc. Amer. 58, 404.
Kershaw C.D. (1985) Statistical properties of staircase estimates from two interval forced choice experiments, British J. Math. Stat. Psychol. 38, 35.
Kesten H. (1958) Accelerated stochastic approximation, Ann. Math. Stat. 29, 41.
King-Smith P.E. (1984) Efficient threshold estimates from yes-no procedures using few (about 10) trials, Amer. J. Optometry Physiological Opt. 61, 119.
King-Smith P.E., Grigsby S.S., Vingrys A.J., Benes S.C., Supowit A. (1994) Efficient and unbiased modifications of the QUEST threshold method: theory, simulations, experimental evaluation and practical implementation, Vision Res. 34, 885.
King-Smith P.E., Grigsby S.S., Vingrys A.J., Benes S.C., Supowit A. (1994b) Comparison of the QUEST and related methods for measuring thresholds: efficiency, bias, and practical considerations, Vision Res. 34, 885.
King-Smith P.E., Rose D. (1997) Principles of an adaptive method for measuring the slope of the psychometric function, Vision Res. 37, 1595.
Klein S.A. (2001) Measuring, estimating, and understanding the psychometric function: a commentary, Percept. Psychophys. 63, 1421.
Kollmeier B., Gilkey R.H., Sieben U.K. (1988) Adaptive staircase techniques in psychoacoustics: a comparison of human data and a mathematical model, J. Acoust. Soc. Amer. 83, 1852.
Kontsevich L.L., Tyler C.W. (1999) Bayesian adaptive estimation of psychometric slope and threshold, Vision Res. 39, 2729.
Leek M.R., Hanna T.E., Marshall L. (1991) An interleaved tracking procedure to monitor unstable psychometric functions, J. Acoust. Soc. Amer. 90, 1385.
Leek M.R., Hanna T.E., Marshall L. (1992) Estimation of psychometric functions from adaptive tracking procedures, Percept. Psychophys. 51, 247.
Leek M.R. (2001) Adaptive procedures in psychophysical research, Percept. Psychophys. 63, 1279.
Lelkens A.M., Opponeer P.M. (1983) The frequency of seeing square wave density modulations in random dot patterns, Biol. Cybern. 48, 165.
Lesmes L., Jeon S., Lu Z., Dosher B. (2006) Bayesian adaptive estimation of threshold versus contrast external noise functions: the quick TvC method, Vision Res. 46, 3160.
Lesmes L., Lu Z., Baek J., Albright T.D. (2010) Bayesian adaptive estimation of the contrast sensitivity function: the quick CSF method, J. Vision 10, 1.
Levitt H. (1970) Transformed up-down methods in psychoacoustics, J. Acoust. Soc. Amer. 33, 467.
Lieberman H.R. (1983) Computation of psychophysical thresholds using the probit technique, Res. Meth. Instr. 15, 446.
Lieberman H.R., Pentland A.P. (1982) Microcomputer-based estimation of psychophysical thresholds: the best PEST, Behav. Res. Methods Instru. 14, 21.
Lotze M., Treutwein B., Roenneberg T. (2000) Daily rhythm of vigilance assessed by temporal resolution of the visual system, Vision Res. 40, 3467.
Macmillan N.A., Creelman C.D. (1991) Detection theory: A user's guide. Cambridge University Press, Cambridge.
Madigan R., Williams D. (1987) Maximum-likelihood psychometric procedures in two-alternative forced-choice: evaluation and recommendations, Percept. Psychophys. 42, 240.
Marvit P., Florentine M. (2003) A comparison of psychophysical procedures for level-discrimination thresholds, J. Acoust. Soc. Amer. 113, 3348.
McKee S.P., Klein S.A., Teller D.Y. (1985) Statistical properties of forced-choice psychometric functions: implications of probit analysis, Percept. Psychophys. 37, 286.
Nachmias J. (1981) On the psychometric function for contrast detection, Vision Res. 21, 995.
Nagaraja N.S. (1964) Effect of luminance noise on contrast thresholds, J. Opt. Soc. Amer. 54, 950.
O'Regan J.K., Humbert R. (1989) Estimating psychometric functions in forced-choice situations: significant biases found in threshold and slope estimations when small samples are used, Percept. Psychophys. 46, 432.
Patterson V.H., Foster D.H., Heron J.R. (1980) Variability of visual threshold in multiple sclerosis, Brain 103, 139.
Pelli D.G. (1987) The ideal psychometric procedure, Invest. Ophthalmology Visual Sci. 28, 366.
Pentland A. (1980) Maximum-likelihood estimation: the best PEST, Percept. Psychophys. 28, 377.
Phipps J.A., Zele A.J., Dang T., Vingrys A.J. (2001) Fast psychophysical procedures for clinical testing, Clin. Exper. Optometry 84, 264.
Pierce G.E., King-Smith P.E. (1992) Yes-no or two alternative forced choice. Which is the better clinical threshold technique? Invest. Ophthalmology Visual Sci. 33, 964.
Pizlo Z. (2001) Perception viewed as an inverse problem, Vision Res. 41, 3145.
Rammsayer T.H. (1992) An experimental comparison of the weighted up-down method and the transformed up-down method, Bull. Psychonomic Soc. 30, 425.
Robbins H., Monro S. (1951) A stochastic approximation method, Ann. Math. Stat. 22, 400.
Rose R.M., Teller D.Y., Rendleman P. (1970) Statistical properties of staircase estimates, Percept. Psychophys. 8, 199.
Roufs J.A.J. (1974) Dynamic properties of vision – VI. Stochastic threshold fluctuations and their effect on flash-to-flicker sensitivity ratio, Vision Res. 14, 871.
Schlauch R.S., Rose R.M. (1986) A comparison of two-, three-, and four-alternative forced-choice staircase procedures, J. Acoust. Soc. Amer. 80, S123.
Schlauch R.S., Rose R.M. (1990) Two-, three-, and four-interval forced-choice staircase procedures: estimator bias and efficiency, J. Acoust. Soc. Amer. 88, 732.
Sekuler R., Blake R. (1994) Perception, 3rd Ed. McGraw-Hill, New York.
Shelton B.R., Scarrow I. (1984) Two-alternative versus three-alternative procedures for threshold estimation, Percept. Psychophys. 35, 385.
Shelton B.R., Picardi M.C., Green D.M. (1982) Comparison of three adaptive psychophysical procedures, J. Acoust. Soc. Amer. 71, 1527.
Shen Y., Richards V.M. (2012) A maximum-likelihood procedure for estimating psychometric functions: thresholds, slopes, and lapses of attention, J. Acoust. Soc. Amer. 132, 957.
Simpson W.A. (1988) The method of constant stimuli is efficient, Percept. Psychophys. 44, 433.
Simpson W.A. (1989) The step method: a new adaptive psychophysical procedure, Percept. Psychophys. 45, 572.
Sims A., Pelli D.G. (1987) The ideal psychometric procedure, Invest. Ophthalmology Visual Sci. 28, 366.
Snoeren P.R., Puts M.J.H. (1997) Multiple parameter estimation in an adaptive psychometric method: MUEST, an extension of the QUEST method, J. Math. Psychology 41, 431.
Spahr J. (1975) Optimization of the presentation pattern in automated static perimetry, Vision Res. 15, 1275.
Spiegelhalter D., Rice K. (2009) Bayesian statistics, Scholarpedia 4, 5230.
Stevens S.S. (1957) On the psychophysical law, Psychol. Rev. 64, 153.
Stillman J.A. (1989) A comparison of three adaptive psychophysical procedures using inexperienced listeners, Percept. Psychophys. 46, 345.
Strasburger H. (2001) Invariance of the psychometric function for character recognition across the visual field, Percept. Psychophys. 63, 1356.
Strasburger H. (2001b) Converting between measures of slope of the psychometric function, Percept. Psychophys. 63, 1348.
Tanner W.P., Swets J.A. (1954) A decision-making theory of visual detection, Psychol. Rev. 61, 401.
Taylor M.M. (1971) On the efficiency of psychophysical measurement, J. Acoust. Soc. Amer. 49, 505.
Taylor M.M., Creelman C.D. (1967) PEST: efficient estimates on probability functions, J. Acoust. Soc. Amer. 41, 782.
Taylor M.M., Forbes S.M., Creelman C.D. (1983) PEST reduces bias in forced choice psychophysics, J. Acoust. Soc. Amer. 74, 1367.
Thoss F. (1986) Visual threshold estimation and its relation to the question: Fechner law or Stevens power function, Acta Neurobiologiae Experimentalis 66, 303.
Thoss F., Bartsch B., Stebel J. (1998) Analysis of oscillations of the visual sensitivity, Vision Res. 38, 139.
Treutwein B. (1995) Adaptive psychophysical procedures, Vision Res. 35, 2503.
Treutwein B. (1997) YAAP: yet another adaptive procedure, Spatial Vision 11, 129.
Treutwein B., Rentschler I. (1992) Double-pulse resolution in the visual field: the influence of temporal stimulus characteristics, Clin. Vision Sci. 7, 421.
Treutwein B., Strasburger H. (1999) Fitting the psychometric function, Percept. Psychophys. 61, 87.
Tyrrell R.A., Owens A. (1988) A rapid technique to assess the resting states of the eyes and other threshold phenomena: the modified binary search, Behav. Res. Methods, Instr., Comput. 20, 137.
Vingrys A., Pianta M. (1999) A new look at threshold estimation algorithms for automated static perimetry, Optometry Vision Sci. 76, 588.
Von Békésy G. (1960) Experiments in hearing. McGraw-Hill Book Company Inc., New York.
Vul E., Bergsma J., MacLeod D.I.A. (2010) Functional adaptive sequential testing, Seeing and Perceiving 23, 483.
Wald A. (1947) Sequential analysis. John Wiley & Sons, New York.
Watkins V.H., Clark H.J. (1968) Forced-choice technique: inter-alternative comparisons, Science 11, 351.
Watson A.B. (1979) Probability summation over time, Vision Res. 19, 515.
Watson A.B., Pelli D.G. (1979) The QUEST staircase procedure, Appl. Vision Assoc. Newslett. 14, 6.
Watson A.B., Pelli D.G. (1983) QUEST: a Bayesian adaptive psychometric method, Percept. Psychophys. 33, 113.
Watson A.B., Fitzhugh A. (1990) The method of constant stimuli is inefficient, Percept. Psychophys. 47, 87.
Watt R.J., Andrews D.P. (1981) APE: adaptive probit estimation of psychometric functions, Current Psychol. Rev. 1, 205.
Weibull W.A. (1951) A statistical distribution function of wide applicability, J. Appl. Mech. 18, 292.
Wetherill G.B. (1963) Sequential estimation of quantal response curves, J. Royal Stat. Soc. B25, 1.
Wetherill G.B., Levitt H. (1965) Sequential estimation of points on a psychometric function, British J. Math. Stat. Psychology 18, 1.
Wichmann F.A., Hill N.J. (2001) The psychometric function I: fitting, sampling, and goodness of fit, Percept. Psychophys. 63, 1293.
Wichmann F.A., Hill N.J. (2001b) The psychometric function II: bootstrap-based confidence intervals and sampling, Percept. Psychophys. 63, 1314.
Zwislocki J.J., Maire F., Feldman A.S., Rubin A. (1958) On the effect of practice and motivation on the threshold of audibility, J. Acoust. Soc. Amer. 30, 254.
Zwislocki J.J., Relkin E.M. (2001) On a psychophysical transformed-rule up and down method converging on a 75% level of correct responses, Proc. Nat. Acad. Sci. 98, 4811.