Entrenchment in Usage-Based Theories: What Corpus Data Do and Do Not Reveal About The Mind
ISBN 9783110294002, 9783110293852

This book explores the usage-based claim that high usage frequency leads to the entrenchment of complex words in the mind.


English Pages 306 [308] Year 2012


Table of contents :
Acknowledgements
Conventions
Tables
Figures
1 Introduction and overview
2 Entrenchment in usage-based theories
2.1 Entrenchment in usage-based constructionist approaches
2.2 Entrenchment and usage-based assumptions about language representation and processing
2.3 Entrenchment in usage-based theories of language acquisition
2.4 Entrenchment in emergentist theories of language phylogeny
2.5 Entrenchment in usage-based theories of language change
2.6 Entrenchment in related frameworks
2.7 Summary and outlook: The epistemological status of entrenchment in usage-based frameworks
3 The cognitive realism of usage-based generalizations, with a special focus on the relationship between token frequencies and entrenchment
3.1 The assumed link between collective usage and individual representation
3.2 Against the psychological realism of corpus-derived claims
3.3 In support of the potential psychological realism of corpus-derived claims
3.3.1 Neuroplasticity
3.3.2 Experimental research on token frequency effects in multi-word sequences
3.3.3 Psycholinguistic research on the entrenchment of multiword sequences
3.3.4 Patholinguistic data showing selective dissociations between holistic versus novel sequences of morphemes
3.3.5 Neuroimaging studies supporting the idea of a neurocognitive split between holistic and compositional modules
3.4 Conclusion: Can we reasonably expect corpora to predict entrenchment in the mind?
4 Operationalizing entrenchment
4.1 Defining entrenchment
4.2 Assessing the psychological realism of a statement in experimental terms
4.3 Sources of inspiration
4.3.1 Gestalt psychology
4.3.1.1 Why seek inspiration from non-linguistic lines of research?
4.3.1.2 What makes a chunk in Gestalt psychology?
4.3.2 The masked priming paradigm
4.3.3 Frequency effects in English derivatives
4.3.4 Parametric studies on word frequency effects
4.4 Operationalizing Entrenchment
5 Experimental design
5.1 Stimuli
5.1.1 Stimuli for the masked priming fMRI study
5.1.1.1 Stimuli for the main conditions
5.1.1.2 Stimuli for the ‘no’-conditions
5.1.1.3 Stimuli for the control conditions
5.1.2 Stimuli for the memory experiment
5.2 Experimental procedures
5.2.1 Experimental procedure for the masked priming fMRI studies
5.2.2 Experimental procedure for the memory task
6 Behavioural data analysis
6.1 Main conditions
6.1.1 General method
6.1.2 Simple linear mixed-effects regression analyses
6.1.3 Multiple mixed-effects regression analyses
6.1.3.1 Introductory remarks on methodology
6.1.3.2 Multiple regression results for part-to-whole priming
6.1.3.3 Multiple regression results for whole-to-part priming
6.2 Behavioural analyses for the supplementary experiments
6.2.1 Mixed-effects regression analyses for jumbled target priming
6.2.2 Behavioural analyses for the comparison with monomorphemic controls
6.2.3 Behavioural analyses for the memory task
6.3 Conclusions
7 Neuroimaging Data Analysis
7.1 Introductory remarks
7.2 Statistical Parametric Imaging analysis
7.2.1 Imaging Parameters
7.2.2 fMRI data analysis
7.2.3 Preprocessing
7.2.4 First-level analysis
7.3 Results
7.3.1 Second-level analyses
7.3.2 Results for the part-to-whole priming task
7.3.3 Results for the whole-to-part priming task
7.3.4 Results for the jumbled target priming task
7.3.5 Conjunction analysis for the main conditions
7.4 Conclusion
8 Summary and conclusion
8.1 Summary: Entrenchment in usage-based theories
8.1.1 Research rationale and questions
8.1.2 Operationalizing entrenchment
8.1.3 Behavioural results
8.1.4 fMRI Results
8.2 Some further theoretical implications
8.3 The corpus-to-cognition principle: Towards more fine-grained correlations between corpus and cognitive data
8.4 Outlook
References
Appendix
Index


Entrenchment in Usage-Based Theories

Topics in English Linguistics 83

Editors

Elizabeth Closs Traugott
Bernd Kortmann

De Gruyter Mouton

Entrenchment in Usage-Based Theories
What Corpus Data Do and Do Not Reveal About the Mind

by

Alice Blumenthal-Dramé

De Gruyter Mouton

ISBN 978-3-11-029385-2
e-ISBN 978-3-11-029400-2
ISSN 1434-3452

Library of Congress Cataloging-in-Publication Data
A CIP catalog record for this book has been applied for at the Library of Congress.

Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available in the Internet at http://dnb.dnb.de.

© 2012 Walter de Gruyter GmbH, Berlin/Boston
Printing: Hubert & Co. GmbH & Co. KG, Göttingen
Printed on acid-free paper
Printed in Germany
www.degruyter.com

To Almamy, Alexandre, Annie, Peter, Caroline and Christian

Acknowledgements

It is a pleasure to thank those who made this thesis possible. First and foremost, my thanks go to my supervisor Bernd Kortmann, whose insight, encouragement and support from the beginning of my studies as an undergraduate all the way through the completion of this manuscript have been fundamental in shaping my understanding of language and my research. Without him, this book would never have been written. I would also like to express my gratitude to the Freiburg Brain Imaging Lab (FBI) for generous support in the production of this work. Thanks in particular to Cornelius Weiller for being so open-minded about the challenges of interdisciplinary research and for allowing me to use the FBI lab facilities. I am very grateful to my FBI colleagues, from whom I learned so very much about doing fMRI research. I am greatly indebted to Maria-Cristina Musso, who introduced me to the field of neurolinguistics and shared her neurological expertise and thoughts on language-brain relationships with me. Many thanks go to Verena Haser, Christian Langstrof, Christian Mair, Richard Matthews, Wolfgang Raible, Verena Schröter, and two anonymous reviewers for very helpful comments on earlier drafts of this book. I would also like to express my gratitude to Rebecca Sautter and Hans-Jörg Mast, who lent invaluable personal and technical support when I needed it most. The statistical part of this work benefitted greatly from statistical advice provided by Harald Binder from the Institute of Medical Biometry and Medical Informatics of Freiburg University. I am grateful to the staff of the International Office of Freiburg University, who helped me to recruit participants for my experiments. I also wish to thank all research participants for their time and feedback. I also owe heartfelt thanks to Fouzia Eulmi and her family for looking after Alexandre so well and with so much dedication over the last two years. Finally, my deepest gratitude and affection go to my family, who have given me unconditional support in every stage of this project. Very special thanks go to my husband, Almamy Sékou Dramé, and to our son Alexandre Ismaël, who have been a constant source of strength, comfort, and joy. Alice Blumenthal-Dramé Strasbourg, September 2012


Conventions

List of Predictor Variables and their Abbreviations

logTokFreqDerCELEX: log-transformed token frequency of the derivative in the CELEX database
SylDer: number of syllables in the derivative (or: length in letters)
LetDer: number of letters in the derivative
PhonDer: number of phonemes in the derivative
logTokFreqDerHAL: log-transformed token frequency of the derivative in the HAL database
OrthNeighDer: number of orthographic neighbours of the derivative in the HAL database
PhonNeighDer: number of phonological neighbours of the derivative in the HAL database
PhonGraNeiDer: number of phonographic neighbours of the derivative in the HAL database
logMeanBigFreqDer: log-transformed average bigram frequency of the derivative in the HAL database
logSumBigFreqByPosDer: log-transformed summed bigram frequency by position of the derivative in the HAL database
logTokFreqBaseCELEX: log-transformed token frequency of the base in the CELEX database
LetBas: number of letters in the base
PhonBas: number of phonemes in the base
logTokFreqBasHAL: log-transformed token frequency of the base in the HAL database
DerBasLetRat: ratio between the number of letters in the derivative and the base
OrtNeiBa: number of orthographic neighbours of the base in the HAL database
PhonNeighBas: number of phonological neighbours of the base in the HAL database
PhonGraNeiBas: number of phonographic neighbours of the base in the HAL database
logMeanBigFreqBas: log-transformed average bigram frequency of the base in the HAL database
logSumBigFreqByPosBas: log-transformed summed bigram frequency by position of the base in the HAL database
logFamFreq: log-transformed morphological family frequency of the base in the CELEX database
logFamSize: log-transformed morphological family size of the base in the CELEX database
LetSuf: number of letters in the suffix
logRelFreqCELEX: log-transformed relative frequency calculated on the basis of CELEX frequencies
logRelFreqHAL: log-transformed relative frequency calculated on the basis of HAL frequencies
Productivity: category-conditioned degree of productivity (or: potential productivity)
NumberOfHapaxes: number of hapax legomena with a given suffix
DistinctWordsWithAffix: realized productivity (or: type frequency) of an affix
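To make these operationalizations concrete, the following sketch shows how a few of the simpler predictors could be computed for a single derivative. It is purely illustrative and not taken from the book: the frequency counts are invented placeholder values, and it assumes that relative frequency is the ratio of the derivative's token frequency to that of its base, an operationalization the reader should check against the definition given in chapter 4.

```r
# Illustrative sketch only: invented counts, not actual CELEX or HAL values.
# Assumption: relative frequency = token frequency of the derivative divided
# by the token frequency of its base.
derivative <- "government"
base       <- "govern"
suffix     <- "ment"

tokFreqDer <- 66000   # placeholder token frequency of the derivative
tokFreqBas <- 4000    # placeholder token frequency of the base

logTokFreqDer <- log(tokFreqDer)              # cf. logTokFreqDerCELEX / logTokFreqDerHAL
logRelFreq    <- log(tokFreqDer / tokFreqBas) # cf. logRelFreqCELEX / logRelFreqHAL

LetDer       <- nchar(derivative)   # number of letters in the derivative
LetBas       <- nchar(base)         # number of letters in the base
LetSuf       <- nchar(suffix)       # number of letters in the suffix
DerBasLetRat <- LetDer / LetBas     # derivative-base letter ratio

round(c(logTokFreqDer = logTokFreqDer, logRelFreq = logRelFreq,
        LetDer = LetDer, DerBasLetRat = DerBasLetRat), 2)
```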

Tables

Table 1. Coefficient for the single fixed-effect factor LOG RELATIVE FREQUENCY in a mixed-effects model fitted to log RTs in part-to-whole priming.
Table 2. Coefficient for the single fixed-effect factor LOG RELATIVE FREQUENCY in a mixed-effects model fitted to log RTs in whole-to-part priming.
Table 3. Coefficient for the single fixed-effect factor DERIVATIVE-BASE LETTER RATIO in a mixed-effects model fitted to log RTs in part-to-whole priming.
Table 4. Coefficient for the single fixed-effect factor DERIVATIVE-BASE LETTER RATIO in a mixed-effects model fitted to log RTs in whole-to-part priming.
Table 5. Coefficient for the single fixed-effect factor PHONOGRAPHIC NEIGHBOURHOOD SIZE OF BASE in a mixed-effects model fitted to log RTs in part-to-whole priming.
Table 6. Coefficient for the single fixed-effect factor PHONOGRAPHIC NEIGHBOURHOOD SIZE OF BASE in a mixed-effects model fitted to log RTs in whole-to-part priming.
Table 7. Coefficients for the fixed-effect factors in the mixed-effects model fitted to log RTs in part-to-whole priming.
Table 8. Coefficients for the fixed-effect factors in the mixed-effects model fitted to log RTs in whole-to-part priming.
Table 9. Coefficients for the fixed-effect factors in the mixed-effects model fitted to log RTs in part-to-whole priming.
Table 10. Coefficients for the fixed-effect factors in the mixed-effects model fitted to log RTs in whole-to-part priming.
Table 11. Coefficient for the single fixed-effect factor LOG RELATIVE FREQUENCY in the mixed-effects model fitted to log RTs in jumbled target priming.
Table 12. Coefficient for the single fixed-effect factor DERIVATIVE-BASE LETTER RATIO in the mixed-effects model fitted to log RTs in jumbled target priming.
Table 13. Coefficient for the single fixed-effect factor PHONOGRAPHIC NEIGHBOURHOOD OF BASE in the mixed-effects model fitted to log RTs in jumbled target priming.
Table 14. Coefficients for the fixed-effect factors in the mixed-effects model fitted to log RTs in jumbled target priming.
Table 15. Result of an ANOVA, conducted with lmer, comparing mean log RTs to bimorphemic stimuli from three frequency bins and mean log RTs to tightly matched monomorphemic words in the part-to-whole priming task.
Table 16. Result of an ANOVA, conducted with lmer, comparing the mean log RTs for bimorphemic stimuli from three frequency bins to log RTs for tightly matched monomorphemic words in the whole-to-part priming task.
Table 17. Coefficient for the single fixed-effect factor LOG RELATIVE FREQUENCY as gleaned from the HAL database in a mixed-effects model fitted to log RTs in part-to-whole priming.
Table 18. Coefficient for the single fixed-effect factor LOG RELATIVE FREQUENCY as gleaned from the HAL database in a mixed-effects model fitted to log RTs in whole-to-part priming.
Table 19. Coefficient for the single fixed-effect factor LOG SURFACE FREQUENCY in a mixed-effects model fitted to log RTs in part-to-whole priming.
Table 20. Coefficient for the single fixed-effect factor LOG SURFACE FREQUENCY in a mixed-effects model fitted to log RTs in whole-to-part priming.
Table 21. Regions showing a negative correlation between BOLD signal and LOG RELATIVE FREQUENCY in whole-brain analysis for the part-to-whole priming task.
Table 22. Region showing a positive correlation between BOLD signal and LOG RELATIVE FREQUENCY in whole-brain analysis for the whole-to-part priming task.
Table 23. Conjunction between the parametric effects of LOG RELATIVE FREQUENCY in the main conditions in whole-brain analysis.
Table 24. Schematic representation of different potential (and not mutually exclusive) ways of processing transparent multimorphemic sequences in recent versions of the dual-mechanism model.
Table 25. Result of an ANCOVA comparing how log RTs vary between males and females as a function of LOG RELATIVE FREQUENCY in part-to-whole priming.
Table 26. Result of an ANCOVA comparing how log RTs vary between males and females as a function of LOG RELATIVE FREQUENCY in whole-to-part priming.
Table 27. Result of an ANCOVA comparing how log RTs vary between males and females as a function of LOG RELATIVE FREQUENCY in jumbled target priming.
Table 28. Region where women exhibit significantly stronger BOLD activation than men in a two-sample t-test comparing the parametric effects of LOG RELATIVE FREQUENCY between sexes for the whole-to-part priming task in whole-brain analysis.
Table 29. Stimuli in the high-frequency bin for the whole-to-part priming task.
Table 30. Stimuli in the middle-frequency bin for the whole-to-part priming task.
Table 31. Stimuli in the low-frequency bin for the whole-to-part priming task.
Table 32. Stimuli in the high-frequency bin for the part-to-whole priming task.
Table 33. Stimuli in the middle-frequency bin for the part-to-whole priming task.
Table 34. Stimuli in the low-frequency bin for the part-to-whole priming task.
Table 35. Control stimuli for the whole-to-part priming task.
Table 36. Control stimuli for the part-to-whole priming task.
Table 37. Stimuli for the jumbled target priming task (or: ‘no’-stimuli for the whole-to-part priming task).
Table 38. ‘No’-stimuli for the part-to-whole priming task.
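Most of the tables above report coefficients from mixed-effects models with a single fixed-effect predictor fitted to log reaction times, estimated with lmer (cf. Tables 15 and 16). The sketch below shows the general shape of such a model in R; the column names (logRT, logRelFreq, subject, item) and the crossed random-intercept structure are illustrative assumptions rather than the exact specification used in the study.

```r
library(lme4)

# Hypothetical data frame 'd', one row per priming trial:
#   logRT      log-transformed reaction time
#   logRelFreq log relative frequency of the target derivative
#   subject    participant identifier
#   item       stimulus identifier

# Single fixed-effect factor (here LOG RELATIVE FREQUENCY) with random
# intercepts for subjects and items -- assumed, not the book's exact model.
m <- lmer(logRT ~ logRelFreq + (1 | subject) + (1 | item), data = d)
summary(m)  # the fixed-effect coefficient is the kind of value tabled above
```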

Figures

Figure 1. An example of pattern completion: The perceiver unconsciously completes the missing lines to match the stimulus with the holistic mental concept of a circle. From: Gestalt psychology. (2011, May 16). In Wikipedia, The Free Encyclopedia. Retrieved 08:40, May 20, 2011, from http://en.wikipedia.org/w/index.php?title=Gestalt_psychology&oldid=429425456 (caption mine).
Figure 2. Picture of a Dalmatian illustrating the principle of emergence. From: Gestalt psychology. (2010, April 10). In Wikipedia, The Free Encyclopedia. Retrieved 11:13, April 10, 2010, from http://en.wikipedia.org/w/index.php?title=Gestalt_psychology&oldid=428146601 (caption mine).
Figure 3. Cartoon illustrating the principle of downward causation in language. Retrieved 14:40, September 28, 2012, from http://bluebuddies.com/ubb/ultimatebb.php/topic/1/2071.html, with permission from BlueBuddies.com (caption mine).
Figure 4. An example of top-down coercion from words onto their constituent letters. From: Top-down and bottom-up design. (2011, April 21). In Wikipedia, The Free Encyclopedia. Retrieved 08:57, May 20, 2011, from http://en.wikipedia.org/w/index.php?title=Top-down_and_bottom-up_design&oldid=425225865 (caption mine).
Figure 5. Schematic representation of masked priming using the ‘sandwich technique’.
Figure 6. Regions showing significant repetition suppression for opaque and transparent pairs compared to form and meaning pairs. Reprinted from Journal of Cognitive Neuroscience, 19(9), Bozic, Mirjana, William D. Marslen-Wilson, Emmanuel A. Stamatakis, Matthew H. Davis, and Lorraine K. Tyler, Differentiating morphology, form, and meaning: Neural correlates of morphological complexity, 1464–1475, 2007. Reprinted by permission of MIT Press Journals (caption modified).
Figure 7. Cluster showing enhanced activation for first (unprimed) presentations of complex and pseudocomplex words relative to length-matched simple words from the form and meaning conditions in whole-brain analysis after SVC. Reprinted from Journal of Cognitive Neuroscience, 19(9), Bozic, Mirjana, William D. Marslen-Wilson, Emmanuel A. Stamatakis, Matthew H. Davis, and Lorraine K. Tyler, Differentiating morphology, form, and meaning: Neural correlates of morphological complexity, 1464–1475, 2007. Reprinted by permission of MIT Press Journals (caption modified).
Figure 8. Region of interest displaying a frequency by word-type interaction after removal of the effect of response time. Reprinted from Brain Research, 1373, Vannest, J., E. L. Newport, A. J. Newman, and D. Bavelier, Interplay between morphology and frequency in lexical access: The case of the base frequency effect, 144–159, 2011, with permission from Elsevier (caption modified).
Figure 9. Regions of interest that display a main effect of word-type (complex>simple): left inferior frontal gyrus and left superior temporal gyrus. Reprinted from Brain Research, 1373, Vannest, J., E. L. Newport, A. J. Newman, and D. Bavelier, Interplay between morphology and frequency in lexical access: The case of the base frequency effect, 144–159, 2011, with permission from Elsevier (caption modified).
Figure 10. Parametric effects of SURFACE FREQUENCY on single-word reading. Reprinted from NeuroImage, 42/3, Hauk, O., M. H. Davis, and F. Pulvermüller, Modulation of brain activity by multiple lexical and word form variables in visual word recognition: A parametric fMRI study, 1185–1195, 2008, with permission from Elsevier (caption modified).
Figure 11. LOG SURFACE FREQUENCIES as a function of LOG RELATIVE FREQUENCIES for 216 English suffixed derivatives in three surface frequency bins.
Figure 12. Schematic representation of a priming trial.
Figure 13. Effect size for the single independent variable LOG RELATIVE FREQUENCY in a linear mixed-effects model fitted to log RTs in part-to-whole priming (left panel) and whole-to-part priming (right panel).
Figure 14. Effect size for the single independent variable DERIVATIVE-BASE LETTER RATIO in a linear mixed-effects model fitted to log RTs in part-to-whole priming (left panel) and whole-to-part priming (right panel).
Figure 15. Effect size for the single independent variable NUMBER OF PHONOGRAPHIC NEIGHBOURS OF BASE in a linear mixed-effects model fitted to log RTs in part-to-whole priming (left panel) and whole-to-part priming (right panel).
Figure 16. Partial effects for the predictors LOG RELATIVE FREQUENCY (left) and LOG FAMILY FREQUENCY (right) in a linear mixed-effects model fitted to log RTs in part-to-whole priming.
Figure 17. Partial effects for the predictor LOG RELATIVE FREQUENCY in a linear mixed-effects model fitted to log RTs in whole-to-part priming (cf. Table 10).
Figure 18. Partial effects for the predictor LOG FAMILY FREQUENCY in the same model (cf. Table 10).
Figure 19. Partial effects for the predictors NUMBER OF SYLLABLES IN DERIVATIVE (left panel) and ORTHOGRAPHIC NEIGHBOURS OF BASE (right panel) in the same model (cf. Table 10).
Figure 20. Effect sizes for different single independent variables in jumbled target priming, in comparison to the relevant effect sizes in whole-to-part priming.
Figure 21. The relationship between token frequencies (expressed by different font widths) at different levels of language representation and part-whole connections (symbolized by arrows).
Figure 22. Activation for regions showing a negative correlation between BOLD signal and LOG RELATIVE FREQUENCY in the part-to-whole priming task projected onto the surface of the standard MNI brain.
Figure 23. Cluster 1 around the triangular part of the left inferior frontal gyrus projected on sagittal (top left), coronal (top right) and axial (bottom) sections of the canonical MNI single-subject template.
Figure 24. Cluster 2 around the left supplementary motor area rendered on three axial slices (separated by 3 mm) of the MNI individual template brain.
Figure 25. Cluster 3 around the right insula superimposed on three axial slices (separated by 3 mm) of the MNI individual template brain (left = right hemisphere).
Figure 26. Cluster 4 around the right cerebellum projected on sagittal (top left), coronal (top right) and axial (bottom) sections of the canonical MNI single-subject template.
Figure 27. Activation for regions showing a positive correlation between BOLD signal and LOG RELATIVE FREQUENCY in the whole-to-part priming task projected onto the surface of the standard MNI brain.
Figure 28. Cluster 1 around the left precentral gyrus projected on sagittal (top left), coronal (top right) and axial (bottom) sections of the canonical MNI single-subject template.
Figure 29. Cluster 2 around the calcarine fissure rendered on sagittal (top left), coronal (top right) and axial (bottom) sections of the canonical MNI single-subject template.
Figure 30. Overlap between the parametric effects of LOG RELATIVE FREQUENCY in the main conditions. Activation is projected onto the surface of the standard MNI brain.
Figure 31. Cluster 1 around the triangular part of the left inferior frontal gyrus projected on axial (left) and coronal (right) sections of the canonical MNI single-subject template.
Figure 32. Cluster 2 around the right supplementary motor area projected on sagittal (top left), coronal (top right) and axial (bottom) sections of the canonical MNI single-subject template.
Figure 33. Schematic representation of lexically specific constructions stored at different emergence layers in the associative memory network (revision of Figure 22).
Figure 34. Cluster around the opercular part of the right inferior frontal gyrus projected on sagittal (left) and coronal (right) sections of the canonical MNI single-subject template.
Figure 35. Human anatomical planes. The transverse plane corresponds to the axial plane. From: Human anatomical terms. (2012, June 13). In Wikipedia, The Free Encyclopedia. Retrieved 10:29, August 16, 2012, from http://en.wikipedia.org/wiki/File:Human_anatomy_planes.svg (caption mine).
Figure 36. Lateral view on the four lobes of the cerebral cortex. Adapted from: Frontal lobe. (2011, April 11). In Wikipedia, The Free Encyclopedia. Retrieved 20:21, April 15, 2011, from http://en.wikipedia.org/w/index.php?title=Frontal_lobe&oldid=423570113 (caption mine).
Figure 37. Two coronal slices of the human brain with the basal ganglia highlighted. From: Basal ganglia. (2011, June 1). In Wikipedia, The Free Encyclopedia. Retrieved 15:10, June 18, 2011, from http://en.wikipedia.org/w/index.php?title=Basal_ganglia&oldid=43208127 (caption modified).
Figure 38. Lateral (top) and medial (bottom) surface of the brain with Brodmann’s areas numbered. From: Brodmann area. (2011, February 21). In Wikipedia, The Free Encyclopedia. Retrieved 20:24, April 15, 2011, from http://en.wikipedia.org/w/index.php?title=Brodmann_area&oldid=415121185 (caption modified).
Figure 39. Gyri and sulci of the left cerebral hemisphere, lateral (top) and medial (bottom) view. From: Sulcus (neuroanatomy). (2011, March 19). In Wikipedia, The Free Encyclopedia. Retrieved 20:25, April 15, 2011, from http://en.wikipedia.org/w/index.php?title=Sulcus_(neuroanatomy)&oldid=419700519 (caption mine).
Figure 40. Some important structures of the human brain. From: Human brain. (2011, May 10). In Wikipedia, The Free Encyclopedia. Retrieved 09:22, May 11, 2011, from http://en.wikipedia.org/w/index.php?title=Human_brain&oldid=428494427 (caption modified).
Figure 41. Coronal section of the human brain. From: Dentate gyrus. (2011, April 3). In Wikipedia, The Free Encyclopedia. Retrieved 09:57, May 11, 2011, from http://en.wikipedia.org/w/index.php?title=Dentate_gyrus&oldid=422131753 (caption modified).
Figure 42. Diffusion tensor imaging image of a human brain featuring, among other neural pathways, the left and the right arcuate fasciculi. Image provided by Aaron G. Filler, MD, PhD. From: Arcuate fasciculus. (2011, May 5). In Wikipedia, The Free Encyclopedia. Retrieved 17:56, June 18, 2011, from http://en.wikipedia.org/w/index.php?title=Arcuate_fasciculus&oldid=42749714 (caption modified).

Despite intensive research, the author has been unable to trace the copyright holders of some figures. Legitimate claims will be honoured in accordance with the usual terms.

Chapter 1 Introduction and overview

This work deals with the relationship between frequency in natural language use and the entrenchment of complex linguistic strings in the minds of language users. This broad issue can be broken down into two more specific overarching questions: First, is it possible to gain cognitively realistic insights into speakers’ linguistic knowledge from quantitative generalizations over corpus data? Second, and maybe more importantly, what exactly should be understood by entrenchment? In the present work, these two questions, which are crucial for assessing the status and scope of usage-based theories, will be subject to detailed critical evaluation and empirical investigation using a multi-method research design.

Chapter 2 will show that the concept of entrenchment, which forms one of the backbones of usage-based approaches to language, is as powerful as it is problematic. At least part of the intuitive appeal of this concept rests on its explanatory breadth. Thus, in usage-based emergentism, highly entrenched multi-word chunks are the stepping stones to the formation of abstractions at different levels of analysis (language ontogeny, phylogeny, and diachrony). On the other hand, the distinctly usage-based claim that high token frequency results in the entrenchment of fully compositional form-meaning pairings is extremely controversial. One major set of problems relates to the vagueness of usage-based statements. To give an example, although usage-based linguists commonly assume that mental entrenchment affects several cognitive dimensions (for example, mental autonomy, representational strength, chunk status, and ease of processing), they have been silent about how these defining features are supposed to be related. An even more serious concern relates to the unspecific epistemological scope of usage-based statements, which tend to collapse various domains of generalization into a single representation, the constructional taxonomy. It will be argued that the underlying (and largely implicit) assumption of an isomorphism between different levels of generalization gives rise to considerable theoretical confusion and, at worst, unsubstantiated statements on the mental representation of language.

Chapter 3 will discuss whether corpus-extracted token frequencies can reasonably be expected to correlate with mental entrenchment, as posited by the so-called ‘corpus-to-cognition principle’. On the one hand, it will be argued that a strong version of this principle rests on a number of tacit assumptions which are highly questionable and give rise to serious theory-internal inconsistencies. On the other hand, a review of different kinds of findings from the literature will provide reason for cautious optimism with regard to the potential of corpus data to inform us about entrenchment in the mind. Thus, research in the field of neuroplasticity has shown that the brain gets modified as a function of the frequency of experiences throughout life. Likewise, different kinds of neuro- and psycholinguistic findings suggest, first, that high- and low-frequency items have a different status in the mind and, second, that the phenomenon of linguistic chunking exists in the first place. However, an essential missing link in the literature is empirical studies on the relationship between usage frequency and mental entrenchment. The chapter will conclude by highlighting the research questions to be explored in the rest of this work.

Chapter 4 will set the stage for the experimental part of this study. One major aim will be to argue that the phenomena under consideration require a multi-method approach, and to provide the unfamiliar reader with the relevant background knowledge on neuroimaging and psycholinguistic experimental paradigms. Another purpose will be to show that semantically transparent bimorphemic derivatives (e.g., speakable, government) provide an ideal testing ground for frequency-related entrenchment. The most important goal, however, will be to develop a comprehensive entrenchment operationalization, which is notoriously difficult. It will be suggested that insights and methods from different lines of research (notably Gestalt psychology, emergentist philosophy, psycholinguistic priming research, and parametric fMRI analysis) can be fruitfully combined to develop a series of promising experimental paradigms.

Chapter 5 will introduce the design for each of the five experimental conditions. The first part of this chapter will be devoted to presenting the stimuli for each condition, while the second part will provide a detailed description of the relevant experimental procedures.

Chapter 6 will subject the behavioural performance data from each experiment to statistical analysis, before drawing some initial conclusions. Overall, it will be suggested that usage-based linguists would be well-advised to adopt a more nuanced conception of mental entrenchment and its relationship to usage data. The chapter will conclude by sketching the basic contours of a revised model of entrenchment, before outlining outstanding questions that can only be answered by looking at the neural substrates of entrenchment-related behavioural variation.

Chapter 7 will seek to answer these questions through fMRI data analysis and interpretation. It will be argued that entrenchment is best accounted for in terms of an associative memory network which is supplemented by a highly unspecific parsing mechanism. The associative network component must be seen as the path of least effort which is used as a default. It can be described as a structured inventory of interconnected exemplars of language use and abstractions over them. The computationally more costly parsing component only intervenes in cases of creative language use, when language users are required to break down or ‘glue together’ cognitive units.

The closing chapter will offer a summary of this work, before exploring more far-reaching theoretical implications of my findings for a model of language and the corpus-to-cognition principle. It is hoped that this discussion will also make a positive contribution to other ongoing debates in the field concerning, among other things, questions of cognitive gradience or discreteness, the nature of entries in the mental lexicon, the cognitive reality of morphemes, and the relationship between representation and processing. Finally, some promising paths towards a more sophisticated understanding of the intricate relationships between corpus data and cognitive entrenchment will be sketched.

Chapter 2 Entrenchment in usage-based theories

Entrenchment of lexically specific strings, as a result of high frequency, is of considerable theoretical importance in linguistic theory. (Lieven and Tomasello 2008: 176)

Over the last few years, entrenchment has become a buzzword in usage-based cognitive and constructionist accounts of language alike. To provide a frame of reference to the first chapters of this book, let us start by adopting a relatively uncontroversial – and therefore necessarily somewhat vague – working definition of entrenchment. According to this definition, entrenchment denotes the strength or autonomy of representation of a form-meaning pairing at a given level of abstraction in the cognitive system. For example, it is probably a fair guess that in the brains of nonlinguists, the sentence colorless green ideas sleep furiously will not be entrenched in its concrete, lexical form, for the obvious reason that people will never have encountered this expression. However, it seems plausible to assume that native speakers of English will have a relatively strong mental representation for the component words of this expression, as well as for the abstract grammatical structure that was used to put them together. Conversely, the hedge it is probably a fair guess is likely to be separately represented as a ready-made chunk in the brains of most writers of academic papers. This example also illustrates another crucial defining feature of entrenchment – the idea that autonomous representations will be holistic rather than (merely) componential. Thus, according to De Smet and Cuyckens (2007: 188), a highly entrenched unit “represents an automated, routinized chunk of language that is stored and activated by the language user as a whole, rather than ‘creatively’ assembled on the spot.” This working definition will be refined, operationalized, and elaborated upon in later chapters. However, to avoid any misunderstanding, it must be emphasized at the outset that according to usage-based models, the existence of a holistic representation for a given complex expression does not necessarily preclude the simultaneous existence of a representation for its component parts. As will become clear from the next section, in many cases, the usage-based argumentation even explicitly requires the co-existence of multiple levels of representation, for example, when the representation of the whole expression determines how the individual parts are handled in compositional processing.


Although usage-based approaches come in a wide variety of shades and have focused on a number of different fields (cf. sections 2.1 through 2.6), they all share the fundamental constructivist tenet that language structure is not fixed and a priori, but rather emerges from and is continuously shaped by language use (Barlow and Kemmer 2000; Evans and Green 2006: ch. 4; Fried and Östman 2004: 23-24). This broad principle encompasses the more specific idea that abstract, hierarchically structured language representations are not there from the outset, but rather arise bottom-up from sequential experience with concrete utterances – a proposal commonly referred to as ‘emergentism’ (cf. Bates and Goodman 1997, 1998; Bybee and Hopper 2001; Croft 2001; Hopper 1988a, 1988b; Langacker 1988, 2000, 2010; MacWhinney 1999; O’Grady 2005, 2008a, 2008b; Tomasello 2003).

The present chapter aims to demonstrate that highly entrenched compositional sequences form one of the backbones of emergentist accounts at different levels of theoretical description and explanation. Special attention will be devoted to the assumed relationship between frequency in language use and entrenchment, which is the focus of this work for several reasons. While the notion of frequency-related entrenchment is both central to usage-based model-building and intuitively appealing for its explanatory breadth, it is not without problems. One set of problems is related to its inherent vagueness. Thus, although usage-based linguists generally subscribe to the idea that entrenchment is somehow related to the autonomy, strength and unity of representation of a linguistic string, accounts of the precise interrelationships between these defining features of entrenchment are still lacking. A much more serious and general issue, which will be argued to be symptomatic of usage-based generalizations per se, concerns the epistemological status of usage-based statements: What is being represented in the first place? Or, to put it in other words, who or what is considered to be the locus of usage-based representations? The mind or brain of each individual language user? The scholarly works of usage-based linguists? A subject-external, objective principle guiding the emergence and dynamics of language in ontogeny, phylogeny and diachrony? Or maybe even all of this at the same time? This chapter will demonstrate that the issue is anything but clear, resulting in considerable theoretical confusion, and, at worst, unsubstantiated statements on the mental representation of language. These and related challenges, which make the notion of frequency-related entrenchment an exciting test case for a number of contentious linguistic issues, will also be further explored in chapter 3.

2.1. Entrenchment in usage-based constructionist approaches

Both specific expressions and abstracted schemas are capable of being entrenched psychologically and conventionalized in a speech community, in which case they constitute established linguistic units. (Langacker 2009: 2)

According to usage-based cognitive and constructionist approaches (e.g., Bybee 2006, 2007; Croft 2001; Goldberg 1995, 2006; Langacker 1987, 2008; Tomasello 2003), constructions are all-pervasive in language. Constructions can be defined as learned units of storage and processing which pair (lexical, grammatical or phonological) form and (semantic, pragmatic and discourse) function (cf. Croft 2001: 18; Goldberg 2006: 5, 68; Langacker 1991: 16). This section will give a brief overview of constructions, highlight the pivotal role that they play in usage-based grammars and show that they are inherently entrenched (for overviews of usage-based construction grammars, see Croft and Cruse 2004; Fried and Östman 2004; Goldberg 2006).

Across different frameworks of linguistics, it is uncontroversial that any sequence of language which is not entirely derivable from its constituent morphemes and highly general rules must be acquired inductively on the basis of input and remembered in its specific form. Moreover, the sequence as a whole will be associated with a meaning – in other words, it will be entrenched as a holistic chunk (Croft and Cruse 2004: 183-184, 252; Goldberg 1995: 13, 153, 179, 199; Marslen-Wilson 2007: 177). Thus, the oft-cited and highly peculiar idiomatic expression kick the bucket must constitute a distinct chunk because its overall meaning is not predictable from the meanings of its individual constituents and general compositional rules. The same goes for kith and kin, which is lexically odd as it contains words which are not familiar in other combinations. Other constructions are unique with respect to their syntax, like the idioms all of a sudden and by and large, which contain constituents that do not fulfil their usual grammatical function. All these sequences are non-compositional in that they can neither be understood nor produced correctly unless learned from the input and entrenched in their specific, unitary form (Croft 2001: 15-17; 25; cf. also Goldberg 1995: 4-6, 220). By virtue of being units pairing form and function, non-compositional expressions are prime examples of constructions.

Crucially, usage-based construction grammars claim that idiosyncrasy and, by extension, unit and construction status, may also apply to compositional sentences. Thus, Croft (2001) argues that many highly idiosyncratic units pairing form and function are actually fully compatible with the most general compositional rules of a language – it is just that they are associated with some extra specifications which require entrenchment in a concrete format (for relevant psycholinguistic work, see Cacciari and Tabossi 1988 and Gibbs, Nayak, and Cutting 1989). An example of a construction which can be considered as a chunk that is both quirky and compositional would be the sequence pop the question. Croft maintains that whenever the specific constituents that make up this construction occur together, they are conventionally and homomorphically associated with a particular metaphorical meaning (Croft 2001: ch. 5.2.; Croft and Cruse 2004: ch. 9.4.; Nunberg, Sag, and Wasow 1994: 497-505). Thus, in the specific context of the construction pop the question, pop metaphorically refers to ‘suddenly ask’ and the question refers to ‘a marriage proposal’. The meaning of the overall construction can then be described as a function of the individual (metaphoric) meanings of the constituents and general rules of composition (Croft 2001: 180). Likewise, on this account, the construction spill the beans is not opaque since it can be decomposed into spill (meaning ‘divulge’ or ‘tell’) and beans (referring to ‘secret’ or ‘information’). Strong evidence for the fact that the constituents of such ‘idiomatically combining expressions’ have individual, identifiable meanings comes from the finding that modifying individual pieces of such constructions also modifies part of their overall meaning (Croft 2001: 182). This is illustrated by the following quotation from a New York Times article: “Where there’s smoke there’s fire, and where there is a lot of smoke, like the destruction of documents, there is a lot of fire. This is really beginning to look like a fraud scenario” (Van Lancker Sidtis 2004a: 14; citing Norris 2002). The fact that idiomatically combining expressions are compositional also explains why they are usually understood even when they are unknown (Nunberg, Sag, and Wasow 1994: 495).

Importantly, usage-based grammarians claim that idiosyncrasy may also exist at schematic levels of language representation. A schema is defined as a more or less abstract characterization which captures the commonalities between the concrete instances that it subsumes and thereby imposes constraints on new utterances (Langacker 1987: 492). For example, the schema have a V (e.g., have a drink) must be stored holistically for several reasons: First, it is governed by idiom-internal constraints which restrict the range of fillers that can be slotted into V (e.g., *have an eat). Second, it is associated with a unique meaning – more precisely, it refers to a repeatable action which is limited in time, lacks an external goal and is of benefit to the agent. At the same time, it can be modified to some degree (have a drink/ run/ swim/ jog/ lie-down …) without losing its characteristic semantics. As a consequence, it must be entrenched at a less specific representation level than the above-mentioned lexically explicit idiomatically combining expressions (Croft and Cruse 2004: 243; Wierzbicka 1982).

1. In the following, elements which allow for variation will be indicated by self-explanatory abstract symbols.

In cognitive construction grammars, the degree of schematicity at which a given construction is represented is thought to depend on the number and scope of ‘open slots’ that it contains. An open slot can be defined as a construction-internal paradigmatic category. The scope of a slot is determined by its type frequency, which refers to the number of distinct fillers that it covers (cf. section 2.2). Another critical factor is the degree of similarity of attested slot-fillers: The less similar the relevant items are, the more likely it is that a highly general and productive category will be formed (Croft 2001: 28; Behrens 2009; Bybee 2010: 9). There are many examples of entrenched chunks which are even more general than have a V. For example, there is a construction which consists of open slots for all content words besides the connective let alone (Fillmore, Kay, and O’Connor 1988):

(1) She gave me more money than I could carry, let alone eat. (Croft and Cruse 2004: 248)
(2) Only a linguist would buy that book, let alone read it. (Croft and Cruse 2004: 248)

The most general constructions are those in which all elements are lexically open. A case in point is the resultative construction, which can be schematically represented as NP Verb NP XP:

(3) The river froze solid. (Goldberg 1995: 181)
(4) The tools were wiped clean. (Goldberg 1995: 181)
(5) *He watched the TV broken. (Goldberg 1995: 181)
(6) *The hammer pounded the metal flat. (Goldberg 1995: 193)

As becomes evident from examples (5) and (6), even the most schematic constructions are subject to pattern-internal constraints which depend on the relevant construction as a whole (for an in-depth study, see Goldberg 1995: ch. 8).

Like idiomatically combining expressions, fully general constructions are symbolic units in that they possess a coherent meaning which they transfer to their individual lexical constituents. In sentences like Chris baked Caroline a cake, for example, we see a verb integrated in a construction which is not typical of it (since bake normally does not provide a slot for a recipient role nor entail ‘intended transfer of possession’). It is implausible, however, to claim that the verb bake is polysemous, with ‘intended transfer’ as one of its meanings. Construction grammarians therefore claim that the ditransitive construction Sbj V Obj1 Obj2 itself carries a meaning of intended transfer, which it imparts onto bake (Goldberg 1995).

All in all, usage-based construction grammarians argue that fully schematic linguistic patterns are not qualitatively different from idiomatically combining expressions, with the only difference residing in the range of applications of their extra specifications. Both extremes are compositional but must be holistically represented, since they are – in their entirety – associated with a meaning or function which cannot be fully derived from their individual component parts. This is particularly evident in cases where the construction as a whole exerts top-down pressure on the potential fillers of its open slots or where the understanding of the meaning of the parts is contingent upon prior understanding of the meaning of the whole (a phenomenon often called ‘coercion’, cf. Lauwers and Willems 2011).

This leads to the usage-based proposal that the totality of our knowledge of language is captured by an inheritance hierarchy whose nodes represent constructions that vary along the parameter of schematicity. Broad generalizations (for instance, those concerning word order facts) are represented by constructions at the top of the hierarchy and are inherited by many other constructions. Subregularities (e.g., partially lexically filled schemas) are captured by constructions at various midpoints of the continuum. Single morphemes and fully idiosyncratic chunks of language are represented at the bottom (Croft 2001: 183-184, 501; Goldberg 1995: 73, 119; Langacker 1987: 63-66).

The fact that idiomatically combining expressions are regarded as occupying the most specific pole on a cline of generality along which all compositional-holistic constructions of a language vary has generally been considered as tantamount to dispensing with the traditionally assumed sharp cut-off point between the idiosyncratic and the regular (Croft 2001: 183-184; Bybee 2007: 279). I would rather argue that the constructionist idea that compositionality and idiosyncrasy are not mutually exclusive merely shifts the dividing line between the regular (i.e., the compositional-holistic, e.g., spill the beans) and the irregular (the non-compositional-holistic, e.g., kick the bucket). This book will be concerned with the compositional-holistic kind of expressions, and one of the questions to be addressed is whether there are other factors – besides idiosyncrasy – which motivate their representation at a lexically specific level.

This section has shown that according to usage-based grammarians, the totality of our knowledge of language is made up of constructions which – by virtue of their idiosyncrasy – are inherently entrenched, since they are associated with a mental representation which is both strong and unitary. Importantly, however, this does not preclude their being compositionally analyzable. Our knowledge of language can thus be said to consist of nodes of entrenched chunks at varying degrees of schematicity and complexity. Another point worth mentioning is that idiomatically combining expressions have a different status across linguistic theories. Thus, while constructionists underscore their importance and actually use them as a springboard from which they derive their whole theory (including more abstract constructions and the construction hierarchy), most traditional approaches deal with them under the heading of ‘idioms’ and relegate them to the ‘peripheral’ fields of sociolinguistics and pragmatics (Croft 2001: 15; Stefanowitsch and Gries 2003: 110). A further distinctly usage-based claim which is even more contentious is the idea that besides idiosyncrasy, multiword units may also be entrenched for reasons of usage, and in particular frequency, an assumption that will be explored in the next section.

2.2. Entrenchment and usage-based assumptions about language representation and processing

Frequency of occurrence leads to entrenchment and the independent representation of even “regular” constructional patterns. (Ellis 2002b: 320)

The last section argued that a central defining criterion for constructions is unitary and independent representation, with higher degrees of idiosyncrasy leading to more concrete levels of entrenchment. This section will demonstrate that under a usage-based view, the picture is actually somewhat more complex, as entrenchment in the mind is also assumed to be conditioned by usage factors – an idea which follows naturally from the tenet that the “mental representation of language is shaped by language use” (Bybee 2007: 331). This section will serve as a brief introduction to this central assumption, leaving necessary elaboration to subsequent chapters.

In usage-based models, it is generally acknowledged that high token frequencies in language use lead to entrenchment in the minds of speakers (e.g., Bybee 2003a: 617; Croft 2008: 52; Goldberg 2006: 93, 317; Lieven and Tomasello 2008: 174). The notion of token frequency refers to the frequency of occurrence of a specific constructional instantiation. For example, in a hypothetical newspaper article, the token frequency of the derivative government, seen as an instance of the English derivational X-ment construction, could amount to five. This would mean that the specific string government occurs five times in this article, either on its own or with further inflectional material added (e.g., government, governments). Usage-based linguists now crucially assume that word forms or phrases which are theoretically derivable from higher-level schemas and morphemic items (such as talked, the problem is or I don’t think so) can be represented as lexically specific units at the bottom of the constructional hierarchy, provided that they are of sufficient token frequency (Bybee and Schreibmann 1999; Bybee 2007: 283; Croft and Cruse 2004: 292). Bybee (2007: 279) even goes so far as to claim that the exploitation of ready-made item-based chunks represents the dominant mode of using language – an assumption which ties in with recent claims from corpus-linguistic research (cf. section 2.6). It is interesting to note that although the term ‘entrenchment’ arguably goes back to Langacker (1987), the idea of frequency-driven chunking is much older. Thus, Ferdinand de Saussure (1959: 177) claimed that “when a compound concept is expressed by a succession of very common significant units, the mind gives up analysis – it takes a short-cut – and applies the concept to the whole cluster of signs, which then become a simple unit” (for an overview of the history of the idea of linguistic chunks, cf. Wray 2002: 8).

At first sight, the assumption that frequently occurring sequences should be holistically represented seems questionable, as, of course, the storage of concrete exemplars rather than productive schemas plus single morphemes presumably minimizes storing parsimony. But this disadvantage is supposed to be outweighed by gains in computing parsimony, fluency and rapidity: When you do not have to put together every utterance from scratch, you need a minimum of on-line processing. As a consequence, you can reserve your main energies for other kinds of concomitant activity, for example the larger structure of discourse or idea generation and interpretation (Croft 2001: 28; Langacker 1987: 57-60; Tomasello 2003: 98, 306; Paul [1880] 1995: 53).

2. In the following, the open slots of derivational constructions will be indicated by arbitrarily selected capital letters. Each of these construction-internal paradigmatic variables is, of course, governed by more or less restrictive constraints. As the description and discussion of these constraints clearly goes beyond the scope of the present work, it was decided to use maximally noncommittal symbols.


In the usage-based framework, token frequency is thought to interact with type frequency, which refers to the number of distinct kinds of tokens instantiating a given construction. To continue our above example, if the newspaper article contained the items government, settlement, and amazement, it would have to be analyzed as exhibiting three types of the English derivational X-ment construction, regardless of the actual number of tokens subsumed by each type. Type frequency is assumed to determine the scope of constructional abstractions, in interaction with other factors such as the functional similarity of attested slot-fillers (cf. section 2.1). For the sake of completeness, let us mention that it is, in principle, perfectly possible to consider type and token frequencies at different levels of abstraction in the constructional hierarchy. At a more schematic level than above, one could, for instance, count distinct types of derivational constructions (e.g., the X-ment construction, the un-Y construction, and the Z-hood construction). The notion of token frequency would then refer to the sum of all items containing a given suffix (e.g., the number of occurrences of the morpheme -ment across all kinds of derivatives). Conversely, at a more concrete level, one could classify different ways of phonetically realizing government into types, with each type featuring a relevant number of tokens. Note that in the following, the notion of token frequency will be used to refer to an intermediate level of abstraction. More specifically, we will count lexically specific sequences of language, but ignore features characterizing concrete instances of language use such as context and phonetic realization. The rationale for this decision is that this is the most concrete level which is readily accessible through large-scale corpora. At this point, let us take stock of what we know about entrenchment so far. According to the definition given in the introduction to this chapter, high levels of entrenchment are characterized by a storage format which is strong, autonomous and holistic. This immediately prompts the question of how these defining features of entrenchment are related. Section 2.1 suggested that the holistic storage format which results from constructional idiosyncrasy entails strong memory representation (as weakly represented idiosyncrasies would arguably not survive in the system, cf. section 2.5). However, the present section would rather seem to support the opposite view that strong levels of representation at some point give rise to holistic storage. The exact relationship between gradual differences in representational strength and chunking will be further examined in subsequent chapters. In this discussion, special focus will be placed on concrete constructions, which form the backbone of emergentist accounts at different levels of generalization, as will be shown in the next three sections.
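To make the counting procedure just described concrete, the following minimal sketch (my own illustration, not part of the original study) shows how token and type frequencies of the X-ment construction could be extracted at the intermediate level of abstraction adopted here, i.e. counting lexically specific strings while collapsing inflectional variants. The toy text, the crude regular expression and the simplistic plural-stripping are assumptions made purely for demonstration.

```python
# Illustrative sketch: token and type frequencies of -ment derivatives,
# with inflectional variants (here only plural -s) collapsed.
# Toy text and regex are hypothetical and for demonstration only.

import re
from collections import Counter

text = """The government announced a settlement. The government denied
that the settlement caused amazement, but both governments were involved."""

# find word forms ending in -ment, optionally followed by plural -s
ment_tokens = re.findall(r"\b\w+ment(?:s)?\b", text.lower())

# collapse inflectional variants to a single lexically specific string
lemmas = [tok[:-1] if tok.endswith("ments") else tok for tok in ment_tokens]

token_freq = Counter(lemmas)   # token frequency per specific derivative
type_freq = len(token_freq)    # number of distinct -ment types attested

print(token_freq)  # Counter({'government': 3, 'settlement': 2, 'amazement': 1})
print(type_freq)   # 3
```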

2.3. Entrenchment in usage-based theories of language acquisition

That early child language is (partially) formulaic and item-based is one of the cornerstones of usage-based acquisition theories. (Behrens 2009: 393)

Usage-based theories of language acquisition deny the existence of innate abstract representations like rules and categories, and claim that knowledge of language is built up inductively from concrete instances of language use experienced in the input (Croft and Cruse 2004: 323). Abstractions are thought to arise gradually from stored pieces of language on the basis of a variety of domain-general cognitive abilities (memory, attention, Gestalt perception), learning mechanisms (distributional analysis, categorization, schematization, analogical generalization, to mention only the most relevant ones) and social interactive skills (most prominently intention-reading, perspective-taking and imitative learning) (cf. Arnon and Snider 2010; Dąbrowska 2004; Lieven and Tomasello 2008; Tomasello 2003). In an in-depth account based on empirical data, Tomasello (2003) demonstrates that a bottom-up reading of the construction hierarchy provides us with an adequate schema of the inductive, incremental fashion in which children acquire their first language. More specifically, he shows that children’s linguistic output starts from concrete, unparsed units (so-called ‘holophrastic expressions’) which are associated with a functionally coherent, yet undifferentiated communicative intention (e.g., here-ya-go, scared-of-that, Iwannadoit) – that is, with entrenched expressions located at the very bottom of the hierarchy. Then, at about 18-24 months of age, children begin producing multi-word patterns that may best be characterized as consisting of a mixture of specific words and open slots which allow for a restricted range of variation. In the schema throw X, for example, the slot X can be thought of as something like the class of ‘throwable things’, i.e., as a low-scope construction-internal paradigmatic category. Obviously, at this stage, the first low-level abstractions have emerged, and children have moved up in the constructional hierarchy. As the relevant patterns usually feature a verb as their fixed core (e.g., throw X or give X) and children presumably do not see any structural relationship between them, they are often referred to as ‘verb island constructions’. With increasing linguistic experience, children create new open slots and extend the scope of existing ones. They also realize that substrings of chunks are constructions in their own right and may be inserted into other structures to generate novel sequences. Moreover, they come to build up connections between complex constructions which are lexically different,
but functionally and formally similar, thus constructing their first higher-level schemas. Finally, during the preschool years, analogies based on relatively abstract relational similarities allow children to produce constructions with more global categories like ‘transitive subject’ and ‘ditransitive recipient’. The most general categories such as ‘subject’, ‘object’ or ‘noun’ and ‘verb’ are simply emergent phenomena in this analogy-making process (Brooks and Tomasello 1999; Goldberg, Casenhiser, and Sethuraman 2004). Importantly, the process of language acquisition is assumed to depend “crucially on the type and token frequency with which certain structures appear in the input” (Tomasello 2003: 327; cf. also Tomasello 2003: 161; Ambridge, Pine, and Rowland 2012; Ambridge et al. 2012). Thus, certain transparent sequences will remain in a child’s language system as holistic chunks, if they are of sufficient token frequency. On the other hand, the higher the type frequency of a construction, the more abstract the generalizations that will be made. A similar point has been made by Eskildsen and Cadierno (2007) with regard to second language acquisition. In a longitudinal case study focusing on the emergence of negation patterns in a Mexican learner of English, they find that ready-made multi-word expressions like I don’t know give rise to increasingly abstract and flexible constructional schemas as a function of type frequency. Interestingly, the string I don’t know itself seems to remain in the subject’s language system as a concrete chunk, which Eskildsen and Cadierno attribute to its high token frequency (for similar findings in L2 learners of Japanese, cf. Sugaya and Shirai 2009). This section has shown that in the usage-based approach, lexically specific chunks are the starting point for language acquisition in multiple ways. First, they represent a store of rote-learned linguistic data from which children gradually extract parts and construct abstract representations via different cognitive mechanisms. Second, even if compositionally analyzable, some of these rote-learned fragments will remain in the cognitive system as chunks, if they are of high token frequency. In other words, all of the children’s knowledge of language, be it abstract or concrete, is grounded in and emerges from lexically explicit, holistically represented language sequences in the input (cf. Bannard and Lieven 2009; Peters 2009). The next section will show that usage-based grammarians do not only assume that usage shapes language ontogeny – entrenched chunks are also supposed to provide the raw material for phylogenetic language development.

2.4. Entrenchment in emergentist theories of language phylogeny

A good case can be made that formulaicity is responsible for the very existence of grammar, for the emergence of grammatical constructions. (Pawley 2007: 31)

Recently, evolutionary linguists have started exploring the idea that language could have emerged over the past two million years from a holistic protolanguage entirely made up of concrete, unanalyzed chunks (Arbib 2005; Fitch 2005; Wray 1998, 2005; cf. Tallerman 2007, 2008 for a
thought-provoking critique of this approach, and Smith 2008 for a very balanced reply). Under this view, early hominid populations communicated relatively complex meanings by means of totally agrammatical and non-compositional utterances which performed different types of interpersonal and social functions (notably grooming and requests). Early hominid communication would thus have been similar to that of most other animals (Reiss 1989). Under the holistic scenario, over time, smaller form-meaning pairings such as morphemes as well as structural templates constraining their combination came to be extracted from larger wholes through a process variously referred to as ‘fractionation’, ‘segmentation’ or ‘analysis’. More specifically, Wray (1998: 55-57), who draws on usage-based theories of first language acquisition to gain insights into language evolution, suggests that the first words emerged when early hominids noticed that phonetic overlaps between subparts of chunks coincided with functional similarities between chunks. She goes on to illustrate how the segmentation process might first have occurred on the basis of the following hypothetical phonetic example sequences and their associated imaginary meanings in English:

(7) /mɛbita/ ‘give her the food’ (Wray 1998: 55)
(8) /ikatubɛ/ ‘give me the food’ (Wray 1998: 55)
(9) /kamɛti/ ‘give her the stone’ (Wray 1998: 55)

According to Wray, analyzers will at some point notice that (7) and (9) share both a form (the substring /mɛ/) and a function (a singular female recipient). This will lead them to segment out a morpheme that captures this regularity, with the remainder of the sequences continuing to carry the remainder of their meanings. Subsequently, analyzers might compare (8) and (9) and realize that they share both the substring /ka/ and the meaning component ‘give’, thereby isolating a new morpheme.
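The kind of comparison Wray envisages can be made concrete with a small, purely illustrative script (my own sketch, not part of Wray's proposal). It pairs up holistic form-meaning pairs modelled loosely on (7)-(9) (the ASCII spellings and meaning sets are hypothetical) and lists every substring shared by two forms whose meanings also share a component, precisely the coincidences an analyzer could seize on. Note that it also returns spurious pairings (e.g. /me/ with 'give'), which anticipates the problem of counterexamples discussed below.

```python
# Illustrative sketch of 'fractionation': look for substrings shared by two
# holistic forms whose meanings also share a component. Data are hypothetical,
# loosely modelled on examples (7)-(9); ASCII 'e' stands in for /ɛ/.

from itertools import combinations

protolanguage = {
    "mebita":  {"give", "her", "food"},   # cf. (7)
    "ikatube": {"give", "me", "food"},    # cf. (8)
    "kameti":  {"give", "her", "stone"},  # cf. (9)
}

def substrings(form, min_len=2):
    """All substrings of a form with length >= min_len."""
    return {form[i:j] for i in range(len(form))
            for j in range(i + min_len, len(form) + 1)}

# A substring shared by both forms of a pair, together with a meaning
# component shared by both meanings, is a candidate proto-morpheme.
candidates = set()
for (f1, m1), (f2, m2) in combinations(protolanguage.items(), 2):
    shared_form = substrings(f1) & substrings(f2)
    shared_meaning = m1 & m2
    for s in shared_form:
        for m in shared_meaning:
            candidates.add((s, m))

for form, meaning in sorted(candidates):
    print(f"/{form}/ ~ '{meaning}'")   # includes /ka/~'give', /me/~'her', /me/~'give'
```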


The problem is that analyzers will also encounter lots of counterexamples to their emerging generalizations, since in Wray’s framework, the meanings of holistic sequences are totally arbitrary and recurring form-meaning coincidences occur only by chance. For example, although sentence (7) has ‘give’ as a meaning component, /ka/ does not surface as a formal element. Wray hypothesizes that in such cases, hypercorrection will take place, for example by altering /mɛbita/ to /mɛbika/, in order to make the system more consistent (since /ka/ will then mean ‘give’ across the board). Wray suggests that the emergence of compositional languages from a holistic protolanguage was gradual, with subsequent generations using an increasing share of analytical as opposed to holistic utterances and transmitting this knowledge to their children.

This section has shown that emergentist theories of phylogenetic language evolution maintain that grammar emerged gradually and dynamically from holistic chunks that were broken down and assigned compositional structure over generations of increasingly creative speakers. It is interesting to note that the transition from holistic protolanguage to compositionally structured language has been successfully simulated in diverse mathematical and computational models under a range of different key assumptions concerning language learning, transmission conditions (vertical or horizontal), and the grounding of meaning in language use (Batali 2002; Vogt 2005; for an overview, cf. Kirby, Smith, and Brighton 2004). At first sight, holistic theories of the genesis of language stand in marked opposition to received theories, which hold that complex languages arose from single words which came to be combined through syntax (e.g., Bickerton 1990, 2003; Jackendoff 1999b; for an overview of different phylogenetic models of language evolution, cf. Fitch 2010). As an aside, it would be interesting to explore the extent to which the major point of contention between holistic and word-based scenarios really lies in diverging assumptions about the initial stage. At least from a usage-based point of view, the areas of disagreement are much more restricted than initial appearances suggest, since, after all, single words also represent opaque holistic chunks at the bottom of the structural hierarchy. So why not assume that some of these chunks were reanalyzed as being built up from smaller components, while others became available for insertion into emerging grammar structures without prior segmentation? Although this aspect would certainly deserve further consideration, we will not dwell on it here. Instead, the next section will be devoted to a topic which is of much more immediate relevance to the present study: the crucial role that highly entrenched multi-word sequences are assumed to play in language change.

2.5. Entrenchment in usage-based theories of language change

[The] processes of grammaticalization and syntacticization can actually create grammatical structures out of concrete utterances. (Tomasello 2003: 13)

Recent usage-based theories of language diachrony have increasingly conceived of language change as movements along the hierarchical network of constructions (see, for example, Fried 2008; Noël 2007; Bisang, Himmelmann, and Wiemer 2004). Thus, constructions have been described as the context in which grammaticalization takes place (e.g., Booij 2008; Breban 2008; Heine 2003; Himmelmann 2004). Likewise, it has been suggested that more schematic constructions can function as analogical attractors for more item-based constructions, and that new constructions can emerge through language use (e.g., Traugott 2008a, b). Crucially, usage-based historical linguists seem to be unanimous in acknowledging that the locus of innovation is lexically filled constructions at the bottom of the constructional hierarchy (Traugott 2008c), and that token frequencies play a critical role as a motivating factor in language change. Traugott (2008a, b), who explicitly attempts to integrate grammaticalization and construction grammar, proposes a simplified version of the constructional network. Her taxonomy involves four levels of abstraction. The lowest level, which corresponds to phonetically explicit utterances like a lot in She’s a lot happier, is referred to as ‘constructs’ (all examples in this section are hers). Slightly more schematic are ‘micro-constructions’, which represent abstractions across attested constructs (e.g., the a lot-construction with an open slot in the position of happier). At the next higher level are ‘meso-constructions’, which subsume sets of similarly behaving constructions such as a bit of and a lot of, both of which can drop of to function as adverbs preceding adjectives. At the top of the hierarchy are so-called ‘macro-constructions’, which represent the highest degree of abstraction at which the construct at hand can be discussed. As an example, Traugott (2008a) cites the grammaticalization of partitive a lot into a degree modifier. In its partitive use, a lot functions as a nominal head and refers to ‘a subset of items for sale’, as in the expression a lot of books when taken in its most concrete meaning. By contrast, in its degree modifier use, a lot modifies a nominal head and means something like ‘much’ or ‘many’ (as in a lot of pride). More specifically, Traugott (2008a) shows how a lot, which derives from Old English hlot ‘a share of’, gradually develops into a micro-construction which is governed by low-scope constraints on co-occurring nouns. Over time, a lot gets reanalyzed
as an adverb and is partially aligned to an existing degree modifier meso-construction (e.g., a lot wiser, I like him a lot). This leads to the expansion and further grammaticalization of the sanctioning macro-level construction, which becomes even more productive and extends to other degree modifiers such as very or pretty (as in It’s very fun/ pretty cowboy). This diachronic, bottom-up emergence of schematic constructions from concrete usage events through small-scale generalizations and progressive realignments is nicely summarized in Traugott (2008b: 26):

When new sub-schemas develop, they may extend the boundaries of the relevant constructional macro-schema itself. This macro-schema can be emergent in the sense that it is the logical consequence of changes at the level of sub-schemas, and that it is less constrained than before the changes in the sub-schemas appeared. From a construction grammar perspective, grammaticalisation can be seen as the conventionalisation of more and more constructs into constructions, i.e. types or schemas with similarities which allow alignment with even higher-level constructions, as well as the gradual increase in generality, salience and non-compositionality. [translation mine]3

3. This is the original quote in German: “Indem sich neue Subschemata herausbilden, können sich die Grenzen des konstruktionalen Schemas auf der Makroebene selbst verschieben. Dieses Makroschema kann emergent sein in dem Sinne, dass es die logische Folge von Veränderungen auf der Ebene der Subschemata darstellt, und indem es den Status eines weniger beschränkten Schemas erreicht als zu dem Zeitpunkt, bevor sich die Veränderungen in den Untertypen einstellten. Aus konstruktionsgrammatischer Perspektive kann Grammatikalisierung als die Konventionalisierung von mehr und mehr Konstrukten als Konstruktionen betrachtet werden, d. h. als Typen oder Muster mit Gemeinsamkeiten, die eine Zuordnung zu höherstufigen Konstruktionen erlauben, sowie einen graduellen Zuwachs an Allgemeinheit, Salienz und Nicht-Kompositionalität erfahren.”

Trousdale (2008: 55) comes to a remarkably similar conclusion:

In the process of grammaticalization, as new constructs emerge in language use through analogy with other constructs, new micro-constructions emerge; what were constructs at time t1 become micro-constructions at time t2; what were micro-constructions at t1 become meso-constructions at t2, and so on, resulting in yet further schematicity of the macro-construction.

Importantly, it has variously been suggested that in the process of grammaticalization, frequency plays a particularly prominent role. This
becomes obvious from the emphasis on host-class expansion as a central defining feature of grammaticalization (Himmelmann 2004). Host-class expansion refers to the notion that a grammaticalizing construction will increase in type frequency, as can be seen from a lot, which has expanded its original host class from count nouns to mass nouns and comparative adjectives. Even more relevant for the present purposes is the fact that the lexically explicit parts of grammaticalizing constructions exhibit increases in token frequency, which are believed to go hand in hand with increased levels of entrenchment at a concrete level (cf. Bybee 2003a). These frequency-related phenomena, which have been ascribed to the fact that grammaticalizing constructions develop new polysemies (cf. Bybee 2007: 338; Kortmann and König 1992: 680), are thought to engender a loss of compositionality. Thus, in grammaticalized a lot of, a is no longer construed as a determiner and therefore no longer lends itself to substitution by other determiners. Two other kinds of diachronic change which have been attributed to high token frequencies at a lexically explicit level are phonetic reduction and the loss of internal structure. Both phenomena are exemplified by the development of the multi-word blessing God be with you into the monomorphemic farewell greeting goodbye (Bybee 2007: 11; see also Croft 2008; Krug 1998). Likewise, Kortmann (1997: ch. 6.2) shows that in English, French, German and Spanish, the most frequently used adverbial subordinators also tend to be the shortest and morphologically least complex ones. This general tendency is evident in French, where monomorphemic quand ‘when’ is more frequent than bi-morphemic quoique ‘although’, which is in turn more frequent than multi-word jusqu’à ce que ‘until’.

Another effect which seems to be conditioned by high token frequencies is the so-called ‘conserving effect’ (Bybee 2007: 10; Bybee and Thompson 2007; Diessel 2007). This term refers to the well-attested fact that high-frequency strings tend to survive in their concrete phonological form, whereas less frequent instances of the same construction undergo analogical extension or levelling (Ellis 2002a: 166; Greenberg 1966: 68-69; Paul 1995: 227). Thus, Pinker (1999: 69) shows that lower-frequency irregular verbs tend to get regularized before higher-frequency items (e.g., go-went versus climb-climbed [< clomb]). In a similar vein, Paul (1995: 191-194) argues that a few German expressions (such as Gut Ding will Weile haben ‘haste makes waste’ or ein ander Mal ‘some other time’) exhibit an otherwise obsolete Middle High German pattern of attributive adjective inflection because their high token frequency prevented analogical attraction to the more recent inflection pattern (for a similar point, see Wray 2009: 33).
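The conserving effect lends itself to a simple quantitative illustration. The toy simulation below (my own sketch; the verbs, frequencies and the regularization probability are arbitrary assumptions, not data from the studies cited above) merely encodes the claim that the likelihood of analogical levelling decreases with token frequency, so that low-frequency irregulars tend to be regularized first while high-frequency ones survive.

```python
# Toy simulation of the 'conserving effect': the rarer an irregular form,
# the more likely analogical levelling (regularization) is per generation.
# Frequencies and the probability function are arbitrary illustrations.

import random

random.seed(1)

# hypothetical irregular verbs with made-up per-million token frequencies
verbs = {"go": 5000, "say": 4000, "keep": 600, "weep": 20, "cleave": 2}
irregular = set(verbs)

GENERATIONS = 50
for gen in range(1, GENERATIONS + 1):
    for verb, freq in verbs.items():
        if verb not in irregular:
            continue
        # regularization probability falls with token frequency
        p_regularize = 0.5 / (freq ** 0.5)
        if random.random() < p_regularize:
            irregular.discard(verb)
            print(f"generation {gen:2d}: {verb} regularized (freq {freq})")

print("still irregular after", GENERATIONS, "generations:", sorted(irregular))
```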


The foregoing examples illustrate that high token frequency promotes two kinds of phenomena which might seem contradictory at first sight. On the one hand, it is claimed to render strings of language impervious to ongoing language change. At the same time, paradoxical as it might sound, it is also held responsible for changes in the phonological substance and morphological structure of word strings. The solution to this apparent puzzle is cognitive in nature. As mentioned above, usage-based linguists claim that high token frequencies result in chunking or holistic memory storage. Things are actually somewhat more complicated, as the chunk status of an item-based string is seen as resulting from three interrelated cognitive sub-processes in the construction network. First, high frequency leads to a tightening of syntagmatic bonds, thereby conditioning phonological reduction and the loss of identity of formerly separate component parts (Bybee 1985). Second, it results in a loosening of paradigmatic bonds (cf. Bybee 2007: 301), which means that individual component parts become dissociated from previously related forms, be they paradigmatic alternatives or semantically related items (cf. Peters 2009; Bybee and Cacoullos 2009). Third, high token frequency conditions an emancipation of the string from more abstract constructions, thus making it resistant to analogical alignment with higher-level schemas, which eventually brings about conservation. The precise causal and temporal interrelationships between these processes would be an interesting topic for further research. This section has shown that the usage-based approach to diachrony holds that constructions emerge and change through the use of concrete lexical strings at the bottom of the construction network, with high-frequency sequences playing a pivotal role in this process.

2.6. Entrenchment in related frameworks

The whole edifice that we build in this book rests on a single supposition – that chunking is a natural and unavoidable way of perceiving language text as it is encountered. (Sinclair and Mauranen 2006: 6)

A central defining criterion for usage-based linguistics is the attempt to induce descriptive generalizations from authentic, observable instances of language use (Evans and Green 2006: ch. 2; Croft 2001: ch. 1). In a relatively broad sense, the proponents of Pattern grammar, who construct their model of language from quantitative and distributional analyses of corpus data in a resolutely empirical and non-a priori fashion, can therefore be said
to be usage-based (Hunston and Francis 2000: 240). Pattern grammar, which has its roots in Sinclair’s (1991) work on corpus linguistics, can be located within the Firthian tradition of British text linguistics. The most prominent present-day adherents of this approach are Halliday, Stubbs, Francis and Hunston (Laffut and Davidse 2002: 170; Francis 1993: 137). Crucially – and this is a fact that has not been duly acknowledged in the literature so far – the similarities between Pattern grammar and usage-based approaches extend far beyond methodological convergences. This is all the more striking as the major points of convergence between these schools run counter to received, and, in particular, generative conceptions of language. The present section will go through these points without much elaboration, since the relevant details have been presented in the preceding sections. It can thus be seen as a wrap-up of major usage-based ideas which simultaneously introduces a new dimension to the discussion in an attempt to locate the framework under discussion within the wider theoretical context. First, like usage-based linguists, proponents of Pattern grammar reject introspective data as untrustworthy, biased and artificial (Croft 2001: 7; Francis 1993; Leech 1992; Sinclair 1991: 5). Thus, Stubbs, who offers an outline of the central tenets of corpus linguistics, asserts that “[l]anguage should be studied in actual, attested, authentic instances of use, not as intuitive, invented, isolated sentences” (Stubbs 1993: 8). Second, both families of approaches use idiosyncratic phrasal collocations as a springboard from which they derive their whole theory (Hunston and Francis 2000: 270f) – an idea which also resonates with many other non-Chomskyan linguists. For example, as early as 1987, the taxonomic structuralist Hockett maintained that we should dispense with the morpheme notion and acknowledge that idioms are available in a variety of different sizes: “Some idioms, true enough, are tiny and compact and don’t seem to be divisible into smaller pieces that are also idioms. But that is a matter of degree …” (Hockett 1987: 87; cf. also Lee 2007). By contrast, in generative grammar, ‘rule-driven’ phonology and grammar usually hold the centre stage, whereas the lexicon and idioms are relegated to the ‘peripheral’ fields of sociolinguistics and pragmatics. Third, pattern grammarians advocate a lexicon-grammar continuum and subscribe to the emergentist idea that syntax arises from lexis (Hunston and Francis 2000: 251). This involves the more specific postulate of a gradient of more or less complex and schematic chunks which constitute single choices, even if they can be transparently analyzed (Francis 1993: 143; Hunston and Francis 2000: 112, 230f; Sinclair 1987; Sinclair 1991: 110f). Strongly related to this is the emergentist notion that even the most abstract chunks are learned inductively on the basis of input (Francis 1993: 144f; cf.
also Hunston and Francis 2000: 234). This idea is diametrically opposed to generative assumptions about language acquisition – witness Chomsky, who draws an analogy between the development of physical organs and the acquisition of language and goes on to claim that

in certain fundamental respects we do not really learn language; rather, grammar grows in the mind. … In both, it seems, the final structure attained and its integration into a complex system of organs is largely predetermined by our genetic program, which provides a highly restrictive schematism that is fleshed out and articulated through interaction with the environment (embryological or post-natal). There are certain processes that one thinks of in connection with learning: association, induction, conditioning, hypothesis-formation and confirmation, abstraction and generalization, and so on. It is not clear that these processes play a significant role in the acquisition of language. Therefore, if learning is characterized in terms of its distinctive processes, it may well be that language is not learned. (Chomsky 1980: 134f)

Fourth, pattern grammarians argue that chunks exert top-down coercion. This general concept covers three more specific claims: the view of categories as something chunk-specific (Croft 2001: 46-48, 54, 85; Hunston and Francis 2000: 129-142; Hunston and Sinclair 2000; Barnbrook and Sinclair 1995), the idea that chunks have meanings and constraints of their own that they impose on their parts (Hunston and Francis 2000: 83, 87, 100; Sinclair 1991: 65; Hunston and Francis 2000: 250), and the assumption that individual lexemes are preferentially associated with certain complex patterns and vice versa (Hunston and Francis 2000: 43, 86, 208, 247). By contrast, generative models of language typically maintain that language is split up into several neatly distinct, autonomous systems, such as phonology, syntax and lexicon, with the lexicon consisting of meaningful items, and the grammar consisting of abstract, meaningless templates (Croft 2001: 14f; Chomsky 1957: 17). They also assume the existence of a small inventory of autonomous categories (noun, verb, etc.) from which all languages of the world draw. These categories are supposed to be instantiated by different morphemes across languages and to be combinable by means of highly general grammatical rules that allow speakers to form all licensed sentences of a language (Radford 1997: ch. 2). Fifth, pattern grammarians claim that fully compositional sequences of language that are of high usage frequency come to be separately stored and holistically handled (Hunston and Francis 2000: 37; cf. also Hunston and Francis 2000: 49, 67).


Sixth, pattern grammarians maintain that the predominant mode of using language consists in retrieving holistic chunks at the concrete pole of the lexicon-grammar continuum, an idea which has received independent support from research on second language acquisition: “[F]or a great deal of the time anyway, language production consists of piecing together the ready-made units appropriate for a particular situation and comprehension relies on knowing which of these patterns to predict in these situations” (Nattinger 1980: 341). Likewise, Pawley and Syder maintain that “native speakers do not exercise the creative potential of syntactic rules to anything like their full extent, and … if they did so they would not be accepted as exhibiting nativelike control of the language” (1983: 193). Given these striking similarities, it is surprising that both frameworks have always led a peaceful, or rather unnoticed, coexistence without ever referring to each other.

2.7. Summary and outlook: The epistemological status of entrenchment in usage-based frameworks

This chapter has shown that the notion of entrenchment is central to usage-based approaches because constructions, which constitute the primitive building blocks of usage-based accounts, are inherently entrenched. This is illustrated by Croft’s (2012: 19) definition of constructions as “(possibly complex, i.e. syntactic) pairings of form and meaning that are autonomous entrenched units in a speaker’s knowledge about her language.” An interesting question in this context is whether the notions of ‘construction’ and ‘entrenched unit’ should be considered coextensive. I would suggest keeping these notions apart by reserving the term ‘construction’ for entrenched form-function pairings in the language, and by construing entrenchment as a phenomenon which also exists in other cognitive domains (cf. the detailed argument in chapter 4). The present chapter has also demonstrated that transparent multi-word sequences which are entrenched at the very bottom of the structural hierarchy as a result of natural language use are of considerable theoretical interest for several reasons. First, the distinctly usage-based claim that high token frequency results in the entrenchment of fully compositional form-meaning pairings is extremely controversial, whereas the assumption that idiosyncrasy yields holistic storage in some form seems to be commonly accepted by linguists of all persuasions. Second, according to usage-based emergentism, stored but decomposable sequences are the stepping stones to the formation of abstractions at different levels of analysis (language
ontogeny, phylogeny, and diachrony). Third, usage-based linguists unanimously assume that the dominant mode of language use consists in piecing together units from our mental store of complex ready-made chunks (Sinclair and Mauranen 2006; Tomasello 2007).

Before moving on to the next chapter, which will be concerned with the psychological reality of entrenched formulae, I would like to take a step back and consider the epistemological status of the construction taxonomy in some more detail. As suggested above, the hierarchical network of constructions is ambiguous between different levels of generalization (cf. Croft 2001: 28). More specifically, it was pointed out that a dynamic bottom-up reading of the network provides us with a model of how abstract hierarchical structures gradually emerge from concrete linear instances of language use on at least three levels of language description: diachrony, phylogeny, and ontogeny. But the situation is even more complex than that, since the network is also used to describe the relatively stable and consolidated products of these different processes. More concretely, it is sometimes read as a model of levels of representation in the minds of individual language users (i.e., as a model of the product of language acquisition). At the same time, it has also been interpreted as a model of something which exists outside the subjects that carry it – the structure of language ‘out there’ in society. Although this assumption is not usually made explicit, it is clear that a construction network which arises phylogenetically or diachronically must transcend the individual language user. In a nutshell, the hierarchical network represents both dynamic processes of language shaping and relative stasis at both social super-individual and mental individual levels. This is a fact which seems to have been positively acknowledged in the past, presumably because it confers on emergentist theories an extraordinarily broad descriptive and explanatory scope. Goldberg (2006: 204), for example, maintains that it is the goal of usage-based constructionist approaches to “represent grammatical knowledge in such a way that it can interface transparently with theories of processing, acquisition and historical change”. Likewise, Croft (2001) explicitly conceives of Radical Construction Grammar as a theory which accounts for both change and the synchronic manifestations of dynamic processes (Croft 2001: 7-8, 146). In a similar vein, Wiechmann (2008: 254) underlines the pivotal role that frequency plays at all levels of language shaping: “The ample body of available empirical evidence surely gives us some confidence in asserting that frequency information influences not only the acquisition and processing but also the shaping of grammar over historical time ...”


Now I am of course not calling into question the relatively trivial idea that all these levels must be interacting in some way. It is obvious that the social dimension of a language “cannot exist without a population of individuals who use and speak the language (e.g., who have mental structures in their brains which somehow correspond to English, Norwegian etc.)” (Jenset 2007: 20). Likewise, frequency effects on the phonological substance of language and language change in general are, of course, inconceivable without subjects performing certain inherently cognitive operations like analogy-formation and chunking (see also Croft 1996: 103, 111). It is also a truism that processes must somehow be related to their results (be it only temporally). However, the assumption of a straightforward isomorphism between mental and social, dynamic and static dimensions of language seems rather bold and may give rise to considerable confusion in that it tempts linguists to surf between different levels of description without being explicit about the precise scope of their claims. A case in point is the assumption (shared by many diachronic usage-based linguists) that language change consists in innovations that occur at the level of constructs in individual speakers and then spread within the language community, giving rise to higher-level constructions. For example, Traugott (2008c: 150) argues that when “innovations are conventionalized by some set of speakers, a micro-construction emerges, and this is a change”. Likewise, she claims that “in the terminology of construction grammar, the distinction between innovation and change can be seen as analogous to that between constructs (individual tokens) and micro-constructions (conventionalized schemas) [translation mine]” (Traugott 2008b: 22-23).4 As I understand it, this explanation exclusively associates ‘constructs’ with the individual, mental level and ‘micro-constructions’ with the collective level. This, however, does not seem to fit in with the constructionist idea that the mental inventory of constructions consists of structures at all levels of abstraction. A simple equation between theoretical representations pertaining to different levels would also entail that usage-based linguists’ descriptive generalizations across the chronological variation consistently observable in children’s linguistic output are tantamount to a model of the relatively stable mental representation that children arrive at on the basis of the language input encountered. This indeed seems to be what Croft (2001: 58) is saying
when he asserts that “as the child becomes able to process all aspects of the input, ... s/he gradually builds up a taxonomic network of constructions and their categories that comes to equal that possessed by adult speakers of the language”. This leads on to the main topic of the next chapter, which will explore the degree to which usage-based generalizations can be considered psychologically realistic.

4. This is the original German quote: “In konstruktionsgrammatischer Begrifflichkeit kann die Gegenüberstellung von Innovation und Wandel … als Analog der Unterscheidung zwischen Konstrukten (individuellen Token) und Mikrokonstruktionen (konventionalisierten Mustern) gesehen werden.”

Chapter 3
The cognitive realism of usage-based generalizations, with a special focus on the relationship between token frequencies and entrenchment

The last chapter showed that in the usage-based approach, the entrenchment of compositional high-frequency strings plays a pivotal role at various levels of model-building, which tend to be collapsed into one single representation. This chapter will demonstrate that many usage-based linguists go even further, taking for granted some kind of isomorphism between their wide-scope model of language on the one hand, and the individual language user’s language processing and representation on the other. This view is illustrated by Croft (2001: 3), who states that “Radical Construction Grammar is a theory of syntax, that is, a theory characterizing the grammatical structures that are assumed to be represented in the mind of a speaker” (cf. also Croft 2001: 8, 25; Goldberg 1995: 192, 220). Although this claim might be seen as a natural extension of the model presented in the last chapter, it is important to emphasize that it brings about an ontologically new dimension, since we are no longer talking about the level of supraindividual theoretical generalizations. Section 3.1 will present this presumed mind-model-isomorphism in more detail and explore some general concerns that an overly simplistic equation raises. Sections 3.2 and 3.3 will offer a critical evaluation of the so-called ‘corpus-to-cognition principle’, which states that statistical generalizations across corpus data can yield psychologically realistic insights into the cognition of actual speakers. The aim of section 3.2 is twofold. First, it will demonstrate that the corpus-to-cognition principle in its strong version rests on a number of tacit (or, for that matter, rarely stated) assumptions that should be questioned in the first place. Second, it will argue that these assumptions give rise to serious theory-internal inconsistencies. Section 3.3, by contrast, will present different kinds of data providing reason for cautious optimism with regard to the potential of corpus data to inform us about entrenchment in the mind. For example, it is by now commonly accepted that the brain is highly plastic, malleable and adaptive throughout life and that it gets modified as a function of the frequency of
experiences. Likewise, different kinds of neuro- and psycholinguistic findings indicate, first, that high- and low-frequency items have a different status in the mind and, second, that the phenomenon of linguistic chunking exists in the first place. However, an essential missing link in the literature is empirical research on the relationship between usage frequency and mental entrenchment. So far, these phenomena have never been examined for potential correlations, leading to important questions of model-building. Section 3.4 will wrap up the discussion by suggesting that while a weak version of the corpus-to-cognition principle seems plausible, many crucial questions still remain to be addressed. The chapter will therefore conclude by highlighting the guiding questions to be examined in the rest of this work.

3.1. The assumed link between collective usage and individual representation

[W]hat is the grammatical knowledge of a speaker? The usage-based model attempts to answer that question. (Croft 2001: 28)

The foregoing quote, which is by no means exceptional for its strength and explicitness, nicely illustrates the usage-based assumption that the theoretical knowledge of the linguist reflects the mental representation of the individual language user, which will be explored in this section. The first question that immediately comes to mind when reading such a strong statement is how the underlying blurring of ontologically different levels of representation can come about. Although I do not pretend to have the final answer to this question, I would like to suggest a factor which might at least constitute the beginning of a plausible explanation. More specifically, I would like to argue that the confusion may hinge on a presumed methodological parallelism between usage-based theorists on the one hand and natural language users on the other. In section 2.6, it was shown that usage-based linguists are officially committed to constructing their theory in a non-a priori, empirical and maximally inductive fashion from authentic, observable usage data. Interestingly, they expect language learners to proceed in exactly the same fashion (cf. section 2.3), which suggests that the theorists’ ‘epistemological emergence’ might be taken to mimic the children’s language acquisition process. This assumed methodological parallelism may result in an analogy along the following lines: If children proceed in the same way as linguists to construct their linguistic representations, then the results which they obtain must be the same.


In this connection, it should be mentioned that the blurring between process and result mentioned above (cf. section 2.7) extends to the cognitive dimension of usage-based models as well. However, as far as this specific dimension is concerned, the equation between process and result can be argued to be theoretically consistent: It follows quite naturally from the fact that under a usage-based account of language, emerging abstractions do not necessarily replace the instances from which they emerge. Thus, Tomasello states that in usage-based approaches

a given linguistic structure may exist psychologically for the speaker both as a concrete expression of its own … and at the same time, as an exemplar of some more abstract construction ... The main point from an acquisition point of view is that when a higher abstraction is made the lower level concrete constructions and expressions do not necessarily go away but remain available for use – especially if they are used frequently. (Tomasello 2003: 106; cf. also Tomasello 2003: 327)

In other words, usage-based linguists assume that especially in the case of regular high-frequency chunks, memory traces from the language acquisition process remain in the language system, leading to redundant representation (Bybee and Scheibman 1999; Bybee 2007: 302; Langacker 1987: 42; Lieven and Tomasello 2008: 175). Does this now mean that we should accept usage-based constructionist representations of language as psychologically realistic with regard to the minds of actual speakers without further stipulation? I would like to argue that we should not, and I will explain why in the following. One problem that deserves consideration is the fact that the most schematic constructions in the constructional hierarchy only represent potential (rather than actual) abstractions in the mental representation of speakers – witness Lieven and Tomasello (2008: 186), who claim that “higher-level schemas may only be weakly represented and, indeed, they may sometimes only exist in the formalized grammars of linguists!” This fact is problematic in that it undermines the network’s being entirely psychologically realistic from the outset. Rather, it suggests that the network should be taken to represent the knowledge of an idealized speaker-hearer – in other words, a generalization which might be only loosely related to actual speakers. The reader may now object that theories and models in the social sciences inherently generalize and schematize, and I agree in principle. The problem, however, is that usage-based linguists seem to believe that their models are rather close to the knowledge of actual language users (cf. Baayen 2009: 900-901). Even more problematic is the fact that such an
assumption – which would imply a great deal of convergence between the mental representation of different speakers – would yield theory-internal inconsistencies, as will be shown in section 3.2. The major challenge for claims pertaining to the psychological realism of usage-based representations, however, is the fact that the field has only recently begun to be empirically explored, leaving many key assumptions still unconfirmed. Especially in the face of the usage-based commitment to empirically founded work, the situation is rather unfortunate and has been variously identified as such within the usage-based literature. Thus, Dąbrowska (2004: 227-228) deplores that

although cognitive linguists are officially committed to developing an account of language that is usage-based and firmly grounded in human cognition, in practice, only a few have begun to go beyond the traditional introspective methods. It is to be hoped that more researchers will move in this direction in the future. A firm empirical basis is indispensable for work that purports to be psychologically realistic. (cf. also Croft 2001: 28)

A similar concern is expressed by Bybee (2007: 6-7), who states that “[d]espite the empirical bent of the functionalist movement and the acceptance of the notion that conventionalization through repetition creates grammar, there was still very little investigation into the nature of the effects of repetition or frequency on the cognitive representation of language.” This applies with even more force to the purported entrenchment of high-frequency regulars, which, in spite of their central theoretical importance (cf. chapter 2), have not been subject to much serious empirical scrutiny (an overview of the relevant work will be presented in section 3.3). Thus, Dąbrowska (2004: 20) bemoans that “there has been relatively little work devoted to the storage of complex expressions which are fully compositional.” Despite this rather unsatisfactory state of affairs, an even more specific claim about the relationship between generalizations across usage data and language representation in individual minds has been put forward in the literature (Schmid 2000: 39). According to the so-called ‘corpus-to-cognition principle’, statistical generalizations over the collective performance of a linguistic community as attested in large corpora reflect the linguistic competence of individual speakers (cf. Esser 2002; Mukherjee 2004, 2005; see also Gries and Stefanowitsch 2006; Bybee 2007: 301). The corpus-to-cognition principle has recently found practical application in a range of corpus-driven studies, with most of them pertaining to the mental representation of syntactic, semantic, pragmatic and discourse
meaning. More concretely, under the assumption that distributional similarity in corpora reflects functional similarity, cognitive corpus linguists have developed the so-called ‘behavioural profile approach’. Among other things, this sophisticated statistical method has been exploited to disentangle the different functions of polysemous constructions, to determine which of several senses of a verb should be seen as the most central one, and to find out how sets of near-synonyms are cognitively structured (Arppe and Järvikivi 2007; Gries and Divjak 2009). Other cognitively oriented corpus studies have examined the degree of attraction or repulsion between words and higher-level constructions (for example, between give and the prepositional indirect object construction) or have investigated alternations (such as the English dative alternation between the indirect object construction and the double-object construction) (cf. Gries and Stefanowitsch 2004; Stefanowitsch and Gries 2003, 2008). At first sight, the assumption that the gap between the individual-mental and the collective-behavioural levels can be bridged via statistical generalizations over corpus data seems highly appealing and to some degree plausible. It is appealing because if true, corpora could afford direct insights into the mental representation of language without the need to conduct experiments. Moreover, it would allow linguists to study different dimensions of language in an integrated fashion (cf. chapter 2). Also note that this assumption simply must be true in some sense, since, as Schmitt, Grandage, and Adolphs (2004: 147) put it, “the language in corpora has been produced by people using language and so must reflect language competence to some extent.” What is more, at least certain corpus-derived cognitive claims have recently begun to be submitted to empirical scrutiny, with many studies reporting convergence between experimental and corpus data (see, for example, Bresnan 2007; Divjak and Gries 2008; Gries, Hampe, and Schönefeld 2005, 2011; for an overview and evaluation of studies combining corpus linguistic and experimental approaches, cf. Gilquin and Gries 2009; Arppe et al. 2010; for a study disconfirming corpus-derived cognitive claims, see Gilquin 2008). On a much more negative note, however, it must be emphasized that the corpus-to-cognition principle in its strongest version hinges on several critical assumptions that are highly questionable in the first place, namely

(i) that averages are informative about individual representation;
(ii) that usage impacts mental representation in a highly deterministic and predictable way;
(iii) that the brain is usage-based in exactly the same way as usage-based linguists are in their theory; and, finally,
(iv) that linguistic knowledge is itself knowledge of the statistical structure of language.

These assumptions are debatable in themselves, but they also bring about several theoretical problems, as will be shown in the next section. Before moving on, it is worth noting that a debate on a related topic has been going on in the functional-typological community. This debate has focused on whether empirically established generalizations over distributional patterns in the world’s languages should be taken to match the mental representations of individual speakers, as assumed by Dryer (1997: 134) among others. Cristofaro (2009, 2011), who takes exception to this view, argues that it is essential to keep apart three levels of analysis which have been confounded in the literature: descriptive generalizations over grammatical patterns (i.e., the linguists’ classification device), the factors that give rise to these patterns, and the speakers’ mental representations. She maintains that the question of mental reality cannot be decided on the basis of the patterns as such, and suggests that language universals should be regarded as resulting from the strength of distinct functional motivations interacting at a diachronic level, rather than as a function of speakers’ mental representations. While I fully agree with the view that descriptive generalizations over cross-linguistically attested patterns should not be considered to provide a direct shortcut to the mental representations of individuals, I also feel that a neat conceptual separation between the three analytical levels proposed by Cristofaro would be overly simplistic. However indirectly, functional motivations simply must interact with and be relayed by individual minds. As I see it, the only way of getting a more reliable picture of the relationship between usage-based cross-linguistic generalizations, human cognition, and functional factors would be to control or manipulate these dimensions in the setting of large-scale neuro- and psycholinguistic experiments. It is to be hoped that in the long term, the relatively new field of neurotypology, which explores the range and limits of cross-linguistic variation in linguistic cognition, will be instrumental in developing a more differentiated understanding of this issue (Bornkessel and Schlesewsky 2006; Bornkessel-Schlesewsky and Schlesewsky 2009; Kemmerer 2006; Kemmerer and Eggleston 2010). To be sure, my scepticism should not be taken as an endorsement of the claim that introspective evidence and intuitive analyses of ‘armchair data’ are superior to quantitative analyses of corpus data to gain insights into the
mental representation of language (for the advantages of corpora over introspective methods, see Croft 1998; Gries and Divjak 2010; Gilquin and Gries 2009). On the contrary, I fully side with linguists who believe that “[c]orpus studies often reveal quantitative patterns that are not available to introspection but that are likely to be important to the understanding of how speakers store and access units of language” (Bybee 2007: 7). However, I wish to argue for a weaker, more nuanced, less naive and positivist interpretation of the corpus-to-cognition principle. In connection with this, it is interesting to note that even the very linguist who coined the term ‘corpus-to-cognition principle’ has by now come to adopt a much more balanced perspective on the topic. Thus, Schmid (2010) presents a critical evaluation of the claim that frequency in texts instantiates entrenchment in the cognitive system on the grounds that we still fail to have a clear understanding of the statistical relationship between corpus data and entrenchment. While I share his conclusion that many researchers “have had a great deal too much confidence in the potential of quantitative methods for the study of aspects of the linguistic and cognitive system” (Schmid 2010: 125), I would also like to extend the scope of the discussion by bringing it to a more basic level, as will be shown in the next section.

3.2 Against the psychological realism of corpus-derived claims

But corpus data alone only give limited evidence about the underlying mental representation: it may well be that a certain collocation is highly frequent and shows little variation or linkage with other construction (sic!), but may nevertheless be fully analyzed. (Behrens 2009: 395)

This section will demonstrate that even if we wanted to endorse the view that language users are usage-based in the sense that all of their linguistic knowledge is grounded in usage events and induced from lexically specific instances of language use (Langacker 1999: 91-92), the corpus-to-cognition principle involves several more or less obvious levels of idealization which make it rather unlikely that the resulting model will still be informative about the actual representation of language in individual minds. In the following, I will present each of these idealizations before turning to more general theoretical problems that a rigid version of the corpus-to-cognition principle entails. Note that individual usage-based linguists may not feel equally committed to all of the assumptions discussed in this section – some may indeed be rather controversial even within the usage-based framework.

The most obvious idealization resides in the largely implicit presupposition that corpora of a large and heterogeneous language community can give a reasonably faithful picture of the input that subjects actually encounter. There is no denying, however, that people are actually exposed to different kinds of data. As a result, it seems reasonable to assume that "[n]o two speakers have the same language, because no two speakers have the same experience of language" (Hudson 1996: 11). More specifically, with regard to compositional chunks, this entails that every person will "have their own unique store of formulaic sequences based on their own experience and language exposure" (Kuiper 2004: 38). Accordingly, it has often been hypothesized that the storage of specific expressions may vary depending on variables like social class, region and age (Croft 2001: 57; Sinclair 1987: 324; Wray 2002: 33-34). This is borne out in a study by Kuiper (1996), who shows that people from different professional fields (for example, sportscasters and auctioneers) tend to store different psycholinguistically coherent phrases. The functional advantages of such an adaptive differentiation between subjects are obvious: highly conventionalized clusters help speakers to produce, quickly and fluently, the proper thing to say in any recurring situation in their lives. In a similar vein, Dąbrowska (2004) reports that the level of schematicity at which people are able to operate in English and German varies considerably between subjects, with more educated speakers exhibiting higher degrees of abstraction than less educated ones, presumably as a result of greater literacy.

Another obvious problem is that in inducing generalizations, language users will draw on fine-grained qualitative information that even the most richly annotated corpora do not provide in such detail – most notably with regard to gestures, intonation and other interactional features (note that even if we had such corpora, we would still have to integrate them with current corpus-to-cognition principle practice, which would not be a straightforward thing to do). However, there is by now some agreement that there are major phonological and prosodic differences between ready-made chunks and (merely) compositionally assembled sequences. Thus, Bybee and Scheibman (1999), who compare the characteristics of different realizations of I don't know in American English conversation, show that when used as an interactive expression, this string tends to be phonologically reduced and fused to a much greater extent than when it is used in its much rarer literal, compositional meaning. Under a usage-based view, it seems natural to assume that systematic qualitative differences of this kind will feed into the mental representation of language users.

A further challenge for corpus-based idealizations is the usage-based assumption that each individual’s language system is fluid, ever-changing and continuously updated as a function of ongoing experience with language. Thus, according to Ellis (2002a: 162), the “knowledge of a speakerhearer cannot be understood as a fixed grammar but rather as a statistical ensemble of language experiences that changes slightly every time a new utterance is processed.” A corpus-derived snapshot of the performance of the language community at a given point in time is of course notoriously inappropriate to capture fine-grained intra-subjective changes of accumulating language experience over time. From all these considerations, it becomes clear that corpus-based data provide at best a considerably simplified and possibly even distorted image of the language that speakers actually encounter. Crucially, however, under the usage-based view, individual language experience is the factor that determines cognitive representation. This leads on to the next level of idealization to be addressed, namely the idea that speakers will converge on the same set of generalizations if faced with the same data. Paradoxically, the very assumption that usage determines mental representation in a fairly predictable way seems to contradict empirical findings from usage-based research. Thus, Pine and Lieven (1993) observe that in early stages of language acquisition, later-born children tend to rely more on frozen phrases than their elder siblings. While this suggests that at least some inter-individual differences may ultimately be due to the quality of input that children receive in their specific family constellation (see also Barton and Tomasello 1994, who suggest that exposure to third-party, child-directed speech to siblings may play a role), we cannot exclude the existence of fundamentally different cognitive styles. Thus, Nelson (1973) introduced a distinction between two types of language learners. According to her classification, ‘expressive’ (or: ‘holistic’) children start the acquisition process from unitary chunks, whereas ‘referential’ (or: ‘analytic’) learners tend to proceed from single words (see also Peters 1983 and Dittmann 2010: 34-35; for overviews on inter-individual differences in the style and rate of first language acquisition, cf. Bates, Bretherton, and Snyder 1988 as well as Lieven 1997). In his outline of the current state of the art, Tomasello (2007) claims that the factors that are responsible for inter-subject differences are as yet elusive. Interestingly, he refers to researchers who report individual differences in human visual processing that closely parallel the analytic-holistic distinction in language, which reinforces the notion that subjects may be naturally more inclined towards one or the other cognitive style (Tomasello 2007: 291).

Other studies suggest that one factor which might significantly affect subjects’ responsiveness to usage frequency is sex. On the basis of a series of experiments in different languages, Ullman and colleagues demonstrate that women exhibit stronger effects of usage frequency and tend to memorize complex strings (e.g., walked) which men will rather compute compositionally (e.g., walk + ed) (Ullman 2004; Ullman, Miranda, and Travers 2007; cf. Tabak, Schreuder, and Baayen 2005 for results pointing to the same conclusion). They also argue that converging results from patient, electrophysiological, psycholinguistic, pharmacological and developmental studies suggest that the increased reliance on ready-made chunks in females is due to enhanced declarative memory abilities, a fact which has in turn been attributed to higher levels of oestrogen in women. In Ullman’s model, the declarative memory system underlies the mental lexicon, which contains all stored chunks of language. Hartshorne and Ullman (2006) compare the past-tense overregularizations produced by boys and girls and find that girls overregularize far more than boys (e.g., *holded instead of held) – a result which, at first sight, seems to contradict the purported female advantage in lexical memory. However, Hartshorne and Ullman also show that in girls but not boys, over-regularization rates correlate with the number of regular phonological neighbours (i.e., similar-sounding words) of a given word (such as folded and molded for *holded). From this, they conclude that girls produce over-regularizations by generalizing over stored neighbouring regulars in an associative, superpositional lexical memory. By contrast, boys are more likely to depend upon the procedural memory system, which subserves the rule-governed combination of stored units into complex, hierarchically structured units (for a similar claim, see Pinker 1999, Pinker and Ullman 2002). While it thus seems that at least certain inter-subjective differences might be classifiable into a taxonomy, other subject-specific factors that are known to affect perception and memory (e.g., motivation, attention and expectation) may be more arbitrary (cf. Schmidt 1990, 1995). Accidental factors accounting for non-systematic variation are expected in statistical studies and usually modelled as noise, but the failure to model systematic differences will result in a loss of power and of important generalizations. Yet another level of idealization resides in the hypothesis that language users are empirical, inductive and bottom-up in exactly the same way as usage-based linguists are in their theory (cf. sections 2.3 and 2.6), more specifically that they track usage frequencies and compute statistics just like cognitive corpus linguists do. This is indeed the assumption that seems to be made when language users are described as unconscious ‘intuitive
statisticians’ onto whom usage frequencies map fairly directly, as illustrated by the following representative quotes: In this view, language learners are intuitive statisticians, weighing the likelihoods of interpretations and predicting which constructions are likely in the current context, and language acquisition is contingency learning, that is the gathering of information about the relative frequencies of form–function mappings. (Ellis 2006: 1LWDOLFVKLV Input-driven models of learning assume that the individual keeps track of the frequency of occurrence of specific distributional properties of the input. These properties include where a particular linguistic form occurs, the forms with which it co-occurs, and the frequency with which it occurs, both alone and in combination with other forms. The approach is based on a probabilistic model of human inferencing that assumes that decisions are made and actions undertaken based on a computation of probabilities and utilities. (Harrington and Simon 2002: 263)

These claims, however, seem to hinge on the positivist idea that the data itself expresses certain regularities, which you – i.e., both the language user and the cognitive corpus linguist – only need to identify. Although I would like to remain agnostic on the philosophical question of whether there are (frequency and other) patterns which are inherent to language and which exist independently of a theoretical projection, as a matter of fact, there are different ways of statistically describing and analyzing language. What is more, different statistical analyses sometimes yield contradictory results (Kennison 2001; Wiechmann 2008), and there is as of yet no consensus as to which one is the most objective. As a result, even the most empirical and data-driven researcher will necessarily have to combine inductive and deductive reasoning (Geeraerts 2006: 24). This means that even if we accept the idea that language users somehow draw on statistical information to construct their representation of language, it seems rather daring to assume that we know which algorithm they use (leaving aside the problem of potential inter- and intra-subjective differences). But the problem is even more intractable, since it is often supposed that different kinds of statistical computation are tracked at different levels of language description, as the following quotes show: [L]anguage performance is tuned to input frequency at all sizes of grain: phonology and phonotactics, reading, spelling, lexis, syntax and morphosyntax, grammaticality, formulaic language, language comprehension, and sentence production. (Ellis 2006: 7, cf. also Shaoul and Westbury 2011: 191)

Not only do we know the constructions that are most likely to be of overall relevance (i.e. first-order probabilities of occurrence), but we also predict the ones that are going to pertain in any particular context (sequential dependencies), and the particular interpretations of cues that are most likely to be correct (contingency statistics). These predictions are usually rational and normative in that they accurately represent the statistical covariation between events. In these ways, language learners are intuitive statisticians; they acquire knowledge of the contingency relationships of one-way dependencies and they combine information from multiple cues. (Ellis 2006: 8; for psycholinguistic claims to this effect see McRae, SpiveyKnowlton and Tanenhaus 1998; Mitchell et al. 1995; Tabor and Tanenhaus 1999)

This complex picture of statistical analysis seems hard to reconcile with the above-mentioned idea that token frequencies are the key determinant of entrenchment, and the obvious question to ask is how token frequency interacts with other statistical measures. The inconsistency becomes even more apparent when we take into account the fact that from a statistical point of view, the absolute frequency of co-occurrence of two events is a poor indicator of their strength of association (statistically more sophisticated corpus-linguistic methods usually employ information-theoretic measures like pointwise mutual information or the log-likelihood ratio test in combination with significance tests, cf. Church and Hanks 1989; Dunning 1993; Evert 2008; Evert and Krenn 2001). What is more, we do not know whether and where to set frequency thresholds that would be indicative of qualitative differences in representation and processing. The following quotes by Bybee demonstrate that such thresholds are explicitly expected and that it is as yet unclear how to get a statistical handle on them:

There are various degrees of effect, depending upon the extent of the frequency.
1. low levels of repetition lead to conventionalization only (as in prefabs and idioms)
2. higher levels of repetition can lead to the establishment of a new construction with its own categories
3. extreme high frequency (sic!) leads to the grammaticization of the new construction and the creation of grammatical morphemes and changes in constituency. (Bybee 2006: 719)

When one is studying token frequency, there is an inherent problem in determining the point at which high should be distinguished from low. (Bybee 2007: 16)
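To make the contrast between raw co-occurrence counts and the association measures mentioned above (pointwise mutual information, the log-likelihood ratio) concrete, the following sketch derives both scores from a two-by-two contingency table of corpus counts. It is a minimal illustration of the general logic of such measures, assuming invented counts loosely modelled on a rare but strongly associated pair such as bated breath; the function name and the figures are not taken from any of the studies cited above.

```python
import math

def association_scores(f_xy, f_x, f_y, n):
    """Contrast raw co-occurrence with two association measures for a word
    pair (x, y): f_xy = co-occurrence count, f_x and f_y = marginal counts
    of x and y, n = corpus size in tokens."""
    # Pointwise mutual information: observed vs. expected rate of co-occurrence.
    expected = f_x * f_y / n
    pmi = math.log2(f_xy / expected)

    # Log-likelihood ratio over the 2x2 contingency table (Dunning-style G2).
    def loglik(k, m, p):
        # Binomial log-likelihood of k 'successes' in m trials with rate p.
        return k * math.log(p) + (m - k) * math.log(1 - p)

    k1, n1 = f_xy, f_x            # occurrences of y alongside x
    k2, n2 = f_y - f_xy, n - f_x  # occurrences of y in the rest of the corpus
    p0, p1, p2 = f_y / n, k1 / n1, k2 / n2
    g2 = 2 * (loglik(k1, n1, p1) + loglik(k2, n2, p2)
              - loglik(k1, n1, p0) - loglik(k2, n2, p0))
    return f_xy, pmi, g2

# Invented counts: the pair co-occurs 95 times in a 10-million-word corpus,
# and each member is rare on its own.
print(association_scores(f_xy=95, f_x=100, f_y=400, n=10_000_000))
```

On such toy counts the raw co-occurrence frequency looks unimpressive, while both association scores are extremely high; this divergence between sheer token frequency and strength of association is precisely what is at stake in the passage above.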

Neither do we know whether the commonly held assumption that every single usage event has the same impact on language representation is true. Without further stipulation, the claim that every token of use affects entrenchment in a continuous fashion (Harris 1998: 1; Langacker 1987: 59) suggests (although, to be fair, it does not strictly speaking entail) that the effects of token frequency are linear, a view which seems overly simplistic for two reasons. First, it has been variously suggested that the function that relates entrenchment and frequency “is not linear but instead follows the power law of learning with the effects of practice being greatest at early stages of learning but eventually reaching asymptote” (Ellis 2006: 10; cf. also Lacroix and Cousineau 2006; Ritter and Schooler 2002). In other words, we should expect effects of frequency to be much more evident in low- than in high-frequency strings, and we should also be prepared to find a threshold beyond which further tokens do not affect entrenchment. Second – and this is a crucial issue that we will come back to in section 4.1 – many linguists seem to assume that entrenchment involves a point at which incremental processing differences (i.e., increasing ease of composition) turn into a qualitatively new state of representation (i.e., chunking), as witnessed by the following quote by Langacker (1987: 59): “With repeated use, a novel structure becomes progressively entrenched, to the point of becoming a unit; moreover, units are variably entrenched depending on the frequency of their occurrence.” It would appear reasonable to assume that the point at which chunking occurs in cognition should somehow be reflected by a natural gap in the frequency data. Another simplifying assumption is that frequency is the only factor to determine entrenchment. Note that while some usage-based linguists actually concede that long periods of disuse may lead to the progressive decay or disentrenchment of a construction (Dąbrowska 2004: 213; Langacker 1987: 59, Ellis 2002b: 305; Ellis 2006: 5), nobody seems to have gone beyond this temporal dimension of recency (for relevant studies, cf. Bard et al. 2000; Fowler 1988; Fowler and Housum 1987; Pluymaekers, Ernestus, and Baayen 2005a, b). However, many other factors suggest themselves as plausible predictors of entrenchment. One likely determinant of entrenchment is age of acquisition, which has been shown to affect the processing of words in a number of psycholinguistic studies. However, a moot point is the extent to which age of acquisition is ultimately reducible to corpus frequency – an issue which has recently received considerable attention in the literature (Brysbaert, Lange, and Van Wijnendaele 2000; Carroll and White 1973; Ellis and Lambon Ralph 2000; Gerhand and Barry 1999; Ghyselinck, Lewis, and Brysbaert 2004; Morrison and Ellis 1995, 2000; Stadthagen-Gonzales, Bowers, and Damian 2004;
Zevin and Seidenberg 2002). Although it is commonly accepted that both predictors are highly correlated such that higher frequency words are learned earlier, it seems fair to say that the question is as yet unresolved. It has been suggested that age of acquisition and corpus frequency might actually represent two dimensions of a single underlying variable, cumulative frequency, which refers to the frequency of exposition to a given construction over lifetime (cf. Lewis, Gerhand, and Ellis 2001). This in turn indicates that the power law of practice may be put down to the fact that the impact of tokens of experience decreases with age. Needless to say, cumulative frequencies are notoriously difficult to come by as we still lack the relevant corpora. Likewise, it seems plausible to assume that the length of a string (be it in letters, morphemes or lexemes) also has a role to play in predicting linguistic entrenchment. Although I do not know of cognitive linguistic studies looking into this specific question, the psychological literature suggests that in general cognition, chunks may not consist of more than five to seven subchunks, and there is no reason to assume that things should be different in the domain of language (cf. section 4.3.1). As a result, it seems reasonable to assume, for example, that long poems will usually be broken down into several holistically represented subchunks for memorization. Another idea that deserves consideration is that the very rarity of a stimulus may also promote its entrenchment, since it will be unexpected and therefore highly salient. This is corroborated by Bley-Vroman (2002: 213), who states that “[m]any things that are encountered only once or very rarely may strike the learner as salient, be noticed and processed deeply, and be incorporated into linguistic knowledge.” Salience (which might in turn be statistically operationalizable as a low overall probability of occurrence, possibly in combination with a high probability of occurrence in certain contexts) may be one way of explaining why certain low token-frequency expressions like with bated breath, by dint of, and hale and hearty, which contain lexemes not otherwise attested in English (bated, dint and hale) are still part of the knowledge of native speakers of English (Bybee 2007: 16). More functional dimensions are also likely to play a role. Ullman (2007: 267), for instance, claims that unitary storage is more likely in stimuli associated with high mental imagery, and the psychological literature demonstrates enhanced memory retention for emotionally arousing versus neutral words (e.g., cock, horror or pain versus logic, layer and beard) (for an overview, cf. Sharot and Phelps 2004). Likewise, it has been observed that the automatic, highly repetitive and presumably holistic speech output characterizing certain kinds of aphasia tends to be high in emotional load (e.g., bloody hell or I told you, cf. Blanken and Marini 1997).
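To give the nonlinearity issue raised at the beginning of this passage a concrete shape: under one common formulation of the power law of practice, a processing measure such as reaction time decreases as a power function of the number of encounters, so that early tokens contribute far more than later ones. The parameter values in the sketch below are invented for illustration only and are not estimates from any of the studies cited in this chapter.

```python
def power_law_rt(n, a=400.0, b=300.0, c=0.4):
    """Illustrative power law of practice: predicted reaction time in ms
    after n encounters with a string. a = asymptote, b = initial cost,
    c = learning rate; all parameter values are invented."""
    return a + b * n ** -c

# Each additional token buys less and less facilitation: the gain from
# 1 to 10 encounters far exceeds the gain from 1,000 to 10,000.
for n in (1, 10, 100, 1000, 10000):
    print(n, round(power_law_rt(n), 1))
```

On these toy parameters, the drop between one and ten encounters is several times larger than the drop between one hundred and ten thousand encounters, which is why frequency effects are expected to be most visible among low-frequency strings and why a strictly linear, one-token-one-increment view of entrenchment is an idealization.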

Moving away from statistical issues, another critical point that deserves consideration is the fact that many linguists take for granted that knowledge of language is itself knowledge of the statistical structure of language. This becomes evident, for example, from Baayen’s (2007: 84) claim that “it must be advantageous for the brain to keep track of detailed combinatorial probabilities.” This is also reflected in the following statement by Ellis: Language learning too is an intuitive statistical learning problem, one that involves the associative learning of representations that reflect the probabilities of occurrence of form–function mappings. Learners have to FIGURE language out: their task is, in essence, to learn the probability distribution P(interpretation|cue, context), the probability of an interpretation given a formal cue in a particular context, a mapping from form to meaning conditioned by context (Manning 2003). (Ellis 2006: 8; italics his)

At first sight, this claim seems to be corroborated by studies like those by McDonald and Shillcock (2003a, 2003b). On the basis of a series of eye-tracking studies, these authors demonstrated that transitional probabilities between lexical items (i.e., the likelihood of seeing a word given the word that precedes or follows it) significantly correlate with fixation times and total gaze durations. From this they conclude that the brain exploits statistical information about word-to-word contingencies to predict upcoming words during reading (for studies suggesting similar conclusions with regard to preferential associations between lexical items and grammatical patterns, cf. Gahl and Garnsey 2004). However, conclusions like this make it blatantly clear that there is an unfortunate blurring between the linguists’ theoretical metaknowledge and the speaker’s internal language system. As Perruchet and Peereman (2004) rightly point out, the observation that human language is consistent with certain statistical rules does not necessarily mean that subjects extract and apply those rules. Perruchet and Peereman (2004: 116) go on to propose the following thought-provoking analogy: To draw an analogy, predicting the depth at which a hammer-stroke will drive a nail in a piece of wood may involve complex mathematical operations; However, nobody, we guess, would claim that the actual effect of the hammer is the result of a computational process.
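For readers who wish to see what the statistic at issue amounts to, the forward transitional probabilities examined by McDonald and Shillcock can be estimated from simple bigram counts, as in the sketch below; the toy word sequence and the function name are mine, not theirs. In keeping with the hammer analogy just quoted, the fact that such probabilities are easy to compute over a corpus does not by itself show that readers' brains carry out the same computation.

```python
from collections import Counter

def forward_tp(tokens):
    """Estimate forward transitional probabilities P(w2 | w1) from a token list."""
    unigrams = Counter(tokens[:-1])
    bigrams = Counter(zip(tokens[:-1], tokens[1:]))
    return {pair: count / unigrams[pair[0]] for pair, count in bigrams.items()}

# Toy data; real estimates would of course be drawn from a large corpus.
tokens = "i do not know but i do know that you know".split()
tp = forward_tp(tokens)
print(tp[("i", "do")])     # 1.0 -> 'i' is always followed by 'do' in this sample
print(tp[("do", "know")])  # 0.5 -> 'do' is followed by 'know' half the time
```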

Likewise, Kronenfeld (2006) convincingly argues that the formal description of some cognitively driven human behaviour will not necessarily be isomorphous with the cognitive processes underlying this very behaviour. In a similar vein, Newmeyer (2003: 695) claims that it is a long way from
the finding that statistical generalizations over corpora reveal interesting linguistic patterns “to the conclusion that corpus-derived statistical information is relevant to the nature of the grammar of any individual speaker.” What can then be taken to drive the statistical structure of language and frequency effects in language users? Bley-Vroman (2002) proposes that these phenomena are secondary by-products of the semantic and pragmatic dimensions of language. As I see it, this view has the obvious benefit of being consistent with the usage-based claim that a construction’s distribution is conditioned by its functional role (e.g., Goldberg 2006: ch. 8; for further discussion cf. Argaman and Pearlmutter 2002). Note that the usagebased paradigm, which explicitly understands itself as an offshoot of American functionalism (Bybee 2010: 195), is officially committed to explaining “aspects of grammar through reference to meaning and discourse function” (Bybee 2010: 11). Moreover, if our linguistic knowledge was exclusively driven by statistical preferences, how could we flexibly adapt to new situations and get our specific communicative intentions across? One way of sustaining this affirmation would be to posit that our subjective communicative needs are illusionary and that we process contextual cues in a stimulus-response manner on the basis of our personal statistical experience with language. This situation may indeed hold in certain highly stereotypical situations (e.g., paying at the cash register), in experimental settings investigating the comprehension of ambiguous sentences in the absence of a meaningful context (Seidenberg 1997), and maybe even in other kinds of sterile experiments lacking interlocutors whose attitudes and intentions could be taken into account. But too rigid a behaviourist view seems unsatisfactory when applied to language production in natural, non-stereotypical contexts, where we may want to convey meanings that are actually statistically dispreferred. Another solution would consist in positing an extremely large gap between a statistically forged language representation and a much more individual, function-driven language output. This would, however, face us with the thorny problem of relating both levels. Although for all these reasons I am more inclined towards a functional solution to this puzzle (with a possible interaction with frequency-driven processing in certain contexts), it is strictly speaking impossible to exclude the alternative proposal that language knowledge is statistical in nature, with the functional side of language being epiphenomenal. To make things worse, I must admit that I cannot even think of a straightforward way of getting an empirical handle on this question. To devise experiments that orthogonalize frequency and function, one would first have to operationalize function, which would be a rather hazardous undertaking, as function
covers a lot of heterogeneous phenomena. The only straightforward way of getting at function would be via statistical computations of distributions (cf. the behavioural profile approach mentioned in section 3.1), at which point the whole thing would become circular. It may thus very well be that this question is philosophical rather than empirical, and that statistics and function represent two separate and ontologically different ways of getting a grip on one and the same phenomenon. My aim here was not to settle the issue, but to point out that the blurring between metalinguistic knowledge and speaker-internal representation that follows from a rigid version of the corpus-to-cognition principle may engender theoretical inconsistencies. A strict one-to-one mapping between different levels would also yield circularities at other levels of explanation. With regard to the relationship between language diachrony and collective language use, a simple equation would result in stasis, leaving no room for functional adaptation, heterogeneity, and change, although these concepts are central to the usage-based approach (cf. chapter 2). Dąbrowska (2004: 67), for example, maintains that “the story of the emergence of language is one of co-evolution: humans adapted to language, but language (or rather languages) also adapted to humans.” Applied to the mental level, a strong version of the corpus-to-cognition principle would yield an argumentation along the following lines:

- Why is a given string highly entrenched in the mind? Because it is frequent in natural language use!
- But why is it frequent in language use? Because it is highly entrenched!

The underlying circularity is, for example, strongly brought out in Langacker’s (1991: 48) statement that “[e]ach structure has some degree of entrenchment, which reflects the frequency of its previous activation and determines the likelihood of its subsequent activation.” Under such an approach, how could changes in the frequencies of occurrence (which are assumed to accompany or motivate language change) occur in the first place? This section has shown that a strong version of the corpus-to-cognition principle would imply that language users record input frequencies and extract regularities from usage data in a largely uniform, statistical fashion, that they converge on the same set of generalizations (in spite of differences in exposure to usage data), and that knowledge of language is itself statistical in nature. Moreover, it highlighted the theoretical inconsistencies that an overly naive and positivist interpretation of the corpus-to-cognition principle brings about and identified questions of statistical modelling that need to be addressed if we are to develop a better understanding of the relationship between mental entrenchment and usage data.

3.3 In support of the potential psychological realism of corpus-derived claims

The present section aims to highlight that despite the provisos outlined above (cf. sections 3.1 and 3.2), there is reason for cautious optimism with regard to the potential of corpus data to inform us about entrenchment in the mind. Section 3.3.1 will introduce the notion of neuroplasticity, which refers to the ability of the brain to change throughout life as a function of experience. It will also show that the concept of neuroplasticity has been used to explain recent experimental findings on frequency effects in single word processing. Section 3.3.2 will suggest that this account can be readily extended to cover attested token frequency effects in the acquisition and use of multiword sequences. Sections 3.3.3 through 3.3.5 will present results from different strands of neuro- and psycholinguistic research which support the view that certain item-based constructions have a unitary status in the mind. All in all, this section will show, first, that there undeniably exists a correlation between corpus-extracted token frequencies and language processing in the mind and, second, that the phenomenon of chunking is both psychologically real and amenable to empirical testing. It is, however, a long way from acknowledging this to claiming that token frequencies in corpora should be seen as mirroring entrenchment in the mind, since so far there has been no attempt to integrate empirical research on frequency effects and chunking. 3.3.1.

3.3.1 Neuroplasticity

In the 1980s, the long-standing dogma that the brain is rigid after a critical period in early infancy began to crumble. More and more studies demonstrated that the adult brain is not ‘hard-wired’ with immutable neuronal circuits, but rather changes constantly as a result of experience. Since then, experimental and neuropathological research has shown that the malleability and adaptability of the brain rests on several factors. First, it is by now widely acknowledged that even though mature neurons do not divide, new neurons can arise from certain kinds of stem
cells (notably in the dentate gyrus of the hippocampus and the olfactory bulb, and possibly also in other brain regions). Importantly, these new neurons have been shown to be able to incorporate themselves into the functional architecture of the brain (for relevant research on mammalian brains, cf. Gould et al. 1999; Rakic 2002; Reynolds and Weiss 1992). Second, and this is an older finding, connections between existing neurons get constantly modified (i.e., added, removed, weakened or strengthened) (Schratt et al. 2006). Third, the functions of structures of the neurosystem are not permanently fixed and can be reassigned in response to brain injury or training. Thus, Musso et al. (1999) show that after aphasic stroke, language training induces cortical reorganization such that right hemisphere regions take on functions originally located in the left hemisphere, leading to improvements in comprehension. We also know that intensive training can result in significant enhancement of cognitive functions in normal adults (Mahncke et al. 2006). A gene that might play a role in modulating the plasticity of language-related neural circuits is FOXP2 on chromosome 7 (Fisher and Scharff 2009; Haesler et al. 2004; Lai et al. 2001; Newbury, Fisher, and Monaco 2010). Whatever the specific role of this gene, it is important to emphasize that its impact on language is far less direct than the catchphrase ‘FOXP2 – the language gene’ suggests, since it only regulates the expression of other genes.

While findings on adult neurogenesis and rewiring tie in well with the usage-based hypothesis that language experience determines mental language representation, some caution is warranted as well, since there are obvious limits to neuroplasticity. Thus, we know that the brain’s capacity for adaptive reorganization gets more limited after childhood, that the effects of training decrease as a function of time, and that different parts of the brain may be more or less plastic (e.g., the cerebral cortex cannot compensate for the loss of the hippocampus, cf. Bindschaedler et al. 2011). Moreover, it has been shown that exposure alone is not sufficient – stimuli must be associated with reward, and cortical reorganization crucially hinges on a subject’s motivation and attention (Beitel et al. 2003; Blake et al. 2006). In view of these limits to neuroplasticity as well as the relatively serious concerns about the corpus-to-cognition principle raised in section 3.2, it may come as a surprise that recent experimental research overall attests to the brain’s sensitivity to word frequencies in linguistic experience as gauged by large-scale corpora (for a more nuanced account of frequency effects than the standard view reported here, cf. Baayen 2010b).⁵ Note, however, that since frequency effects on single word processing are not the main focus of the present work, I shall refrain from giving a detailed overview of the relevant studies in the following. Instead, I will content myself with presenting the bottom line, referring the interested reader to the most relevant studies as well as some of the numerous overviews which have been published over the last years.

As far as psycholinguistic studies of single word processing are concerned, token frequencies have emerged as an important predictor in a wide variety of experimental conditions, for different modalities, populations and languages. More concretely, it has long been known that high-frequency words elicit quicker and more accurate responses than matched low-frequency words, a finding which has been shown to be extremely replicable and robust (for recent studies, cf. Balota et al. 2004; Damian 2003; Munson 2007; Rayner, Liversedge, and White 2006; for overviews of older studies, cf. Ellis 2002a; Meunier and Segui 1999a; Monsell 1991; Murray and Forster 2004; Rastle 2007; Reichle, Rayner, and Pollatsek 2003). Crucially, electrophysiological and neuroimaging research has recently produced converging evidence to this effect (for a recent overview, see Vannest et al. 2011). Thus, word frequency has been widely reported to correlate with ERP peak latencies and amplitudes as well as BOLD fMRI amplitudes in different experimental conditions⁶, with the great majority of studies finding weaker and earlier brain responses for words of higher frequency (for electrophysiological studies on word frequency effects, cf. Assadollahi and Pulvermüller 2003; Carreiras et al. 2009; Dambacher et al. 2006; Hauk et al. 2006; Hauk and Pulvermüller 2004; Münte et al. 2001; Proverbio, Vecchi, and Zani 2004; Sereno, Rayner, and Posner 1998; Sereno and Rayner 2003; for fMRI studies on word frequency effects, cf. Carreiras, Mechelli, and Price 2006; Chee et al. 2002; Fiebach et al. 2002; Fiez et al. 1999; Hauk, Davis, and Pulvermüller 2008; Joubert et al. 2004; Kronbichler et al. 2004; Nakic et al. 2006). Hauk and Pulvermüller (2004), who conduct an electrophysiological study that yields negative effects of printed word frequency on ERP amplitudes in a lexical decision task, conclude that there is neuroplasticity in the cerebral network underlying visual word recognition and go on to elaborate: “The more often a word is encountered, the more efficient the synaptic connections representing this word in the network become, such that less activation is necessary to retrieve the corresponding word”. This means that Hauk and Pulvermüller (2004) interpret their findings in terms of strength of representation and ease of processing, which is of course fully in line with usage-based predictions about entrenchment.

However, studies on monomorphemic words cannot shed full light on entrenchment, since single words are inherently holistically represented by virtue of their being arbitrary form-meaning associations – just like the non-compositional multi-morphemic sequences discussed in section 2.1. As a result, they are not appropriate to examine the second defining criterion of entrenchment singled out in chapter 2, namely the existence of a holistic rather than a (merely) morphemic cognitive representation. This phenomenon has to be investigated on the basis of transparent multimorphemic sequences, for which there is no logical necessity of resorting to a chunked representation. While most research into the cognitive correlates of token frequencies has indisputably focused on single words, there has recently been some research into multimorphemic strings, which will be briefly reviewed in the next section.

5. Note that so far, the great majority of psycho- and neurolinguistic studies investigating frequency effects in the English mental lexicon have been based on frequency data extracted from the CELEX or HAL databases. These databases, and the corpora they are based on, will be presented in section 5.2.1.1.

6. Event-related potential recordings (ERP) and functional magnetic resonance imaging (fMRI) are non-invasive neuroimaging techniques used for monitoring cerebral activity. fMRI localises metabolic activity in cerebral tissues, exploiting the fact that neural activity induces local changes in blood flow and blood oxygenation, leading to minute changes in magnetic blood properties. This local increase in metabolic brain activity is captured by means of the so-called blood-oxygen-level dependence (BOLD) contrast. By contrast, ERP uses scalp electrodes to record event-related fluctuations in the electrical activity of cortical neurons which reflect the action potentials generated by large populations of neurons (cf. Ingram 2007: 59-61). More details on BOLD fMRI will be provided in section 4.2.

3.3.2 Experimental research on token frequency effects in multi-word sequences

The studies to be presented in the present section can be subdivided into two broad subgroups: Studies on token frequency effects in first language acquisition, and psycholinguistic research on token frequency effects in adult processing. It will be shown that while all studies unanimously attest to the impact of token frequencies on the processing of multi-word expressions, most of them are problematic in that – despite claims to the contrary – they do not really test for holistic storage, but merely for ease of pro-
cessing. I defer discussion of why this is problematic to section 3.4; suffice it to say here that although such findings are both suggestive and interesting in their own right, they do not unequivocally support usage-based over competing theories, since they do not conclusively show that the items under investigation are mentally represented as units. In the field of first language acquisition, different studies of child speech have shown that token frequencies in the language input of caregivers are predictive of different phenomena in language acquisition (for overviews of frequency effects on language acquisition, cf. Gülzow and Gagarina 2007 and Lieven and Tomasello 2008). For example, Cameron-Faulkner, Lieven, and Tomasello (2003) report significant correlations between the frequency of occurrence of certain copula frames (like There’s the X or It’s a Y) in mothers’ and their children’s speech, which strongly suggests that early child language is highly conservative and usage-based. Likewise, naturalistic studies of child speech show that the order of emergence of multiword constructions in language acquisition critically depends on their frequency of use (De Villiers 1985; Naigles and Hoff-Ginsberg 1998; Theakston et al. 2005). Bannard and Matthews (2008) compare sentence repetition for familiar (e.g., sit in your chair) versus matched infrequent (e.g., sit in your truck) compositional sequences and find that both two- and three-year-olds are significantly more likely to correctly repeat high-frequency sentences. Interestingly, three-year-olds are also significantly quicker to repeat the first three words of a sequence if they form part of a chunk (for example, they are faster to say sit in your when this sequence is followed by chair than when it is followed by truck). Although all aforementioned studies explicitly try to make a case for the idea that children memorize and retrieve high-frequency utterances as in a holistic fashion, it is important to emphasize that they actually give no reason to reject the null hypothesis that high-frequency strings are simply assembled with greater ease than their low-frequency counterparts. Much more interesting in this respect are studies demonstrating that high token frequencies of lexically specific strings protect children from overgeneralization. Note that overgeneralization implies productivity or composition, while the absence thereof arguably indicates holistic storage. Thus, Rowland and Pine (2000) and Rowland (2007) demonstrate that children who frequently hear inverted questions (e.g., Do you X?, What can Y?) in their mother’s speech are significantly less likely to make subjectauxiliary inversion errors on these sequences in their own speech, which strongly suggests that they are retrieved rather than computed. Lieven and Tomasello (2008) emphasize that at first sight, the use of such highly entrenched sequences tends to deceive people into thinking that children have
mastered the relevant ‘adult construction’, even though these sequences are not actually compositionally computed. Let us now go through the relevant psycholinguistic research on adult language processing. Bod (2000, 2001) confronts native speakers of English with compositional three-word (subject-verb-object) sequences of different frequencies and has them decide as quickly as possible whether they are English sentences or not. He finds that low-frequency strings (such as I keep it) are reacted to more slowly than high-frequency strings (such as I like it). As the sequences are tightly matched for plausibility, lexical frequency, complexity, subcategorization, and aspect, he concludes that the processing advantage for high-frequency strings must be due to holistic memory storage. Likewise, Jiang and Nekrasova (2007) conduct two online grammaticality judgment tasks and show that both native and nonnative speakers of English respond to high-frequency multiword expressions (e.g., to begin with) significantly faster and more accurately than to matched low-frequency phrases (e.g., to dance with). Like Bod (2000, 2001), they interpret their findings in terms of holistic storage for highfrequency formula. Similarly, two self-paced reading experiments by Bannard (2006) reveal that high-frequency phrases are read significantly quicker than infrequent counterparts (for a similar finding, cf. Tremblay, Derwing, and Libben 2009). This holds both for counterparts that are identical except for the terminal word (e.g., a state of pregnancy versus a state of emergency) and for those that merely have an identical syntactic form (but are matched in terms of length in letters, component word frequencies and sequenceinternal transitional probabilities, e.g., from the point of view versus about the role of taste). As a result, Bannard (2006) claims that frequency counts taken from the written component of the British National Corpus are informative about speakers’ processing of language. A similar point is made by Arnon and Snider (2010), who, in a series of phrasal decision experiments similar to that of Bod (2000), show that the processing time for compositional four-word phrases (such as don’t have to worry) correlates negatively with their frequencies, an effect which they show not to be reducible to the frequencies of individual component words or substrings. Arnon (2009) presents evidence supporting the importance of stored multi-word expressions in the language of children and adults. For example, preschool children are more proficient at producing irregular plurals (e.g., teeth) as parts of frequent chunks (e.g., Brush your ...) than in response to questions (e.g., What are those?). Arnon (2009) also demonstrates that multi-word phrases are part of the adult language repertoire. In a series of experiments, she shows that high usage frequency enhances the
processing of compositional four-word phrases in adults (for example, don’t have to worry will be processed more quickly than lower-frequency don’t have to wait). An artificial language learning experiment reveals that adults are better at learning grammatical gender when acquisition proceeds from big form-meaning pairings rather than atomic units, which strongly suggests that the normal route for language acquisition is via segmentation of holistic sequences (rather than via composition of individual morphemes). Arnon concludes that speakers must have knowledge of the frequencies of occurrence of whole chunks. All the psycholinguistic studies reviewed in the present section show a clear processing advantage for higher-frequency sequences. As stated above, this result is interesting, but it does not conclusively demonstrate that the relevant sequences are represented in a holistic format, since the participants may simply have assembled them with greater ease and efficiency as a result of greater association strengths between morphemes. Fortunately, there has recently been some psycholinguistic research explicitly devoted to entrenchment. The bad news, however, is that although most of these studies do test for unit status, only one of them has begun to explore potential correlations between chunking and usage frequencies, as will become clear from the next section.

3.3.3 Psycholinguistic research on the entrenchment of multi-word sequences⁷

7. The present overview deliberately excludes research focusing on sequences of the idiosyncratic type (like beat around the bush or beauty is in the eye of the beholder), most notably the studies presented in the volumes edited by Schmitt (2004) and Wood (2010), unless they are explicitly concerned with entrenchment.

The first series of entrenchment studies goes back to Harris (1998). In an initial experiment, Harris (1998) demonstrates that the recognition of the last word of four-word idioms (like great minds think alike) is speeded when preceded either by the first two words or the middle two words of the relevant idioms. The fact that idiom completion is found in non-adjacent sequences (i.e., great minds facilitates the recognition of alike in the absence of think) is taken to suggest that parts of an idiom activate a holistic chunk-level representation, which then top-down activates missing component parts.

In a second experiment, Harris (1998) shows that when pairs of words are briefly flashed on a screen, letter detection is better in familiar collocations (e.g., focal point) than in collocation neighbours (i.e., word pairs identical to a collocation except for one letter, e.g., vocal point) and noncollocations (pairs of randomly conjoined words, e.g., cargo point). A third experiment involving the same task demonstrates that collocation neighbours actually tend to be misread as the relevant collocation (e.g., free would instead of free world, eight club instead of fight club or tag bill instead of tax bill), which corroborates the idea that unitized representations at the chunk level exist. In a more recent experiment, Caldwell-Harris and Morris (2008) report a correlation between the purported degree of entrenchment of collocations (in terms of their Google frequencies) and the probability of participants perceiving them in their familiar order when actually displayed in sequentially reversed order (e.g., seeing fan club when presented with club followed by fan). These ‘reversal errors’ are highest for high frequency collocations (zip code), next highest for lower frequency word combinations (e.g., machine gun), next for merely licit adjective-noun combinations (e.g., huge church), and lowest for random word pairs (puppy hill). Berant, Caldwell-Harris, and Edelman (2008) compare the entrenchment of liturgical Hebrew sequences in two groups of Israelis with different prayer recitation habits, observant and secular Jews. Both groups are asked to identify briefly displayed religious expressions selected from daily, weekly and annual prayers. Compared to the secular group, religious participants perform more accurately and show much stronger frequency effects such that phrases from daily prayers have greater accuracy than those from weekly and annual prayers. This experiment confirms that entrenchment varies as a function of a subject’s frequency of exposure to stimuli (cf. also Caldwell-Harris, Berant, and Edelman 2012). The first three entrenchment publications mentioned here clearly attest to the holistic storage of the sequences under investigation, since they show that chunk-level representations exist and take precedence over component parts in tasks that are not reducible to association strengths between adjacent morphemes. Unfortunately, only one of the above studies establishes a correlation between chunk status and usage frequencies. Even more regrettable is the fact that none of these studies attempts to investigate the neurophysiological underpinnings of entrenchment. Although there have been neurobehavioural studies on linguistic chunking, they only provide indirect information on the relationship between corpus frequencies and mental entrenchment for two reasons. First, most of the relevant studies focus on language pathology and therefore cannot be
taken to provide insights into the functioning of ‘normal’ brains (remember that, as shown in section 3.3.1, the brain rewires itself in response to injury). Moreover, these studies do not define chunks in terms of frequencies and, indeed, seem to be dealing with a rather disparate set of expressions, with many of them being in some form emotionally charged (e.g., curses or prayers). The problem with this is that emotionally arousing expressions are known to have a special mental status (cf. section 3.2), which makes it difficult to draw conclusions about holistic language in its entirety. These studies will still be reviewed in the next section to show how the distinction between chunks and non-chunks has been interpreted in the neurolinguistic literature. 3.3.4.

3.3.4 Patholinguistic data showing selective dissociations between holistic versus novel sequences of morphemes

As early as the mid-nineteenth century, the neurologist and aphasiologist John Hughlings Jackson [1874] (1915) hypothesized that the management of prefabricated expressions and the generation of novel sentences are represented in different regions of the brain and processed according to different mental mechanisms. Since then, his claim has been confirmed in numerous clinical studies (cf. Van Lancker 1987; cf. also Alajouanine 1956; Blanken and Marini 1997). Although many details remain unclear, there is by now widespread agreement in the patholinguistic literature that in most right-handed people, the primary site of processing of holistic expressions is the right hemisphere, whereas the left hemisphere is dominant for newly generated expressions. The main reason for assuming such a dichotomy is clinical evidence, which reveals that novel and holistic language are differentially disturbed in neurological disorders. Thus, it has long been known that in many types of aphasia, prefabricated expressions are the only residual output to exhibit normal grammar, fluency and articulation (Lum and Ellis 1994; Myers and Linebaugh 1981). For instance, Van Lancker Sidtis (2004a) cites the example of a righthanded adult patient who became aphasic after a left frontoparietal stroke. This patient, though virtually unable to generate any novel sentences, was observed to use formulaic expressions almost exclusively in fluent speech, as illustrated by the following conversation excerpt: Patient: I came, I saw, I conquered. Clinician: What else did you use to do? ... Were you an engineer?
Patient: Yes, I was an engineer. It's more important. It's that I ... I said good morning. I said good morning. And ... or …I didn't even say good morning. I just said Hi, how are you? How are you? And we ... we ... Hi, good morning. How are you. It was 9, 8:30, 9:00. I decided to ... I did very, very well, and then, all of sudden. It's a long story. But I think I know what I’m talking about. I hope so. I hope so, too. (Van Lancker Sidtis 2004a: 21)

The earliest published account of the selective preservation of formulaic chunks seems to go back to Peter Rommel, who in 1683 described the linguistic output of a woman with severe motor aphasia in the following way: She could say no other word, not even a syllable, with these exceptions: the Lord’s Prayer, the Apostle’s Creed, some Biblical verses and other prayers, which she could recite verbatim and without hesitation, but somewhat precipitously. But it is to be noted that they were said in the order in which she was accustomed to saying them for many years, and, if this regular sequence were interrupted and she were asked to recite a prayer or Biblical verse not in the accustomed place, she could not do it at all, or only after a long interval and with great difficulty … Then we tried to determine whether she could repeat very short sentences consisting of the same words found in her prayers. However she was unsuccessful in this. (quoted in Benton and Joynt 1960: 113-114 and 209-210 as well as Wray 2002: 217-218)

Interestingly, studies on left hemispherectomized adults report similar speech output (Van Lancker and Cummings 1999; Van Lancker Sidtis 2004a: 23, Van Lancker 2004b). The same goes for patients whose left cerebral hemisphere is anaesthetized for a few minutes, leaving the other hemisphere relatively operational (Czopf 1972). It has been demonstrated that right hemisphere-damaged patients conversely perform more poorly on formulaic and idiomatic language than on novel expressions (Kempler et al. 1999). On the basis of these neurobehavioural observations, a dual-process model has been proposed, which states that there are two clearly distinct modes of using language: a holistic mode associated with subcortical8 and/or right-hemispheric structures for formulaic expressions, and a compositional mode relying on left-hemispheric networks for creative language (Van Lancker Sidtis 2010). Van Lancker Sidtis (2006), who presents an overview of neurological diseases that are associated with impaired 8. Subcortical structures are brain regions which are located below the cortex, a term which refers to the outermost cellular layers of the human brain (see Figure 40 and Figure 41 in the appendix).


automatic speech, claims that although these two modes of using language draw on disparate neurological structures, they are in continuous interplay in normal subjects. Incidentally, a proposal which is strikingly similar in general outline has been put forward by neurolinguists from the generativist tradition. According to so-called 'dual-mechanism theories', we have a holistic module which is clearly distinct from, but interacts with, a compositional module. More specifically, both Pinker's Words-and-Rules theory and Ullman's Declarative-Procedural model (Pinker 1999; Pinker and Ullman 2002; Ullman 2001, 2004) hold that language in general and inflectional morphology in particular rely on two distinct kinds of capacities which are subserved by separable neurocognitive substrates: a mental grammar, which underlies rule-based symbolic computations, and an associative memory, which involves the holistic storage and retrieval of unanalyzed chunks. This claim has received support from several experimental sources. Ullman and colleagues interpret the language deficits exhibited by individuals with so-called Specific Language Impairment as a selective impairment of the rule system (which they localize in Broca's area, i.e., inferior parts of the left frontal lobe, typically defined in terms of pars triangularis and pars opercularis [cf. Figure 36 and Figure 39, top, in the appendix], and the basal ganglia [cf. Figure 37 in the appendix]), and show that they can, to some degree, compensate for this deficit by memorizing complex forms (Walenski and Ullman 2005; Ullman and Gopnik 1999; Ullman and Pierpont 2005). By contrast, children with Williams syndrome, a rare neuro-developmental disorder of genetic origin, have been reported to produce more morphological overgeneralization errors than normal children while at the same time exhibiting spared processing of regular forms, which has been interpreted as a selective deficit of the storage system (Eisenbeiß 2009: 296; Clahsen, Ring, and Temple 2004; Semel and Rosner 2003: 44). Likewise, Shallice (1988) demonstrates that injury to the brain can lead to behavioural dissociations between English regular and irregular inflected forms. More recently, a host of behavioural studies involving patients with left frontal lobe damage have found selective disruptions for regularly inflected forms (Marslen-Wilson and Tyler 1997; Tyler, Randall, and Marslen-Wilson 2002; Tyler et al. 2002; Longworth et al. 2005). By contrast, patients with temporal lobe lesions have been shown to have greater


difficulties with irregular forms, a pattern which also affects patients with degenerative diseases like semantic dementia or Alzheimer's disease (Ullman 2007). Marslen-Wilson (2007: 175) argues that selective deficits in regular inflectional morphology are related to brain lesions in the left-hemispheric perisylvian language system, more precisely inferior frontal and superior temporal regions. By contrast, deficits for irregulars have been related to lesions in medial and inferior temporal areas (gyri are ridges on the surface of the brain, usually demarcated by several furrows, most of which are referred to as sulci; the largest sulci, which divide the brain into four major lobes, are often called fissures; cf. Figure 36 and Figure 39 in the appendix). Thus, Tyler, Marslen-Wilson, and Stamatakis (2005) conducted a neuroimaging experiment with 22 right-handed brain-damaged subjects, who were exposed to morphologically related pairs of auditory stimuli, with the first element of each pair representing a past tense verb, and the second element being its present tense form (e.g., slept – sleep, jumped – jump). Participants were asked to decide as quickly as possible whether the stimuli they heard were real English words. Tyler, Marslen-Wilson, and Stamatakis (2005) established correlations between behavioural performance on different kinds of morphological relationships and the degree of integrity of tissues across the brains of these patients. The result was that pairs involving regular past tense forms rely on tissues in the left inferior frontal and superior temporal cortices, whereas irregular pairs are subserved by much more posterior temporoparietal brain regions. This section has shown that different strands of patholinguistic research converge on the broad claim that holistic memory and composition represent two clearly separate, but interacting modules. Intriguingly, different research traditions have associated holistic memory with different cerebral areas. A tentative explanation of this finding might stress the difference between the stimuli considered in the 'formulaic' and the 'dual-mechanism' traditions. Indeed, the only common denominator between notoriously idiosyncratic and emotionally charged formulae and high-frequency regular past tense forms seems to be their supposed holism. Be this as it may, the data on selective dissociations between computation and storage have been widely interpreted as falsifying the "emergentist view of language where all linguistic experience (be it atomic or complex) is processed by the same cognitive mechanism" (Arnon 2009: vi). However, it is important to emphasize that this interpretation is clearly not warranted by the data, since we might just as well assume that the relevant


dissociations merely reflect impairments affecting extreme points of a continuum. To be sure, the proposed clear-cut dichotomy between composition and retrieval could not possibly be falsified by psycholinguistic experiments attesting to gradient linguistic behaviour in participants. Such a finding could easily be accommodated by positing that both modules are clearly separate, but interact to different degrees. This view is highlighted by Ullman’s (2004: 247) statement that “the same or similar types of knowledge can in at least some cases be acquired by both systems” (see also Pinker and Jackendoff 2005). Walenski and Ullman (2005: 336) go even further by pointing to the possibility of redundancies between the two modules: In addition, transparent and predictably structured complex forms (e.g., walked, the cat) that might be expected to be a function solely of morphological or syntactic computation (at least in certain linguistic approaches), can also be memorized in the lexicon (potentially with specification of their internal structure), resulting in the possibility for redundant (stored versus computed) representations for certain forms.

How could such a kind of redundancy, which involves clearly separate, but dynamically interacting cognitive modules connected via direct anatomical projections be told apart from a usage-based kind of redundancy, which implies redundancy between qualitatively identical and intricately interwoven representations along a single cognitive gradient? The real test case for the issue at hand would be a study which looks at the underlying neural representations. With regard to my specific research question, this would involve correlating token frequencies of transparent multimorphemic sequences with neural responses. To the best of my knowledge, so far, there has not been any neuroimaging study of this kind. There have, however, been neuroimaging studies comparing neural activation for irregular (and thus inherently holistic) and regular (and thus potentially compositionally computed) sequences of morphemes. These studies, which represent a useful frame of reference for my own neuroimaging studies, will be presented in the next section.
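To make the kind of analysis envisaged here more concrete, the following sketch shows how per-item neural responses (e.g., beta estimates extracted from a region of interest) might be correlated with log-transformed corpus token frequencies. All item values and variable names are invented placeholders, not data from any of the studies discussed in this chapter; in an actual fMRI analysis, the same logic would typically be implemented voxel-wise or as a parametric modulator in the first-level model.

```python
import numpy as np
from scipy import stats

# Hypothetical per-item data: corpus token frequencies for transparent
# multimorphemic strings and a neural response measure for each item
# (e.g., a beta estimate averaged over a region of interest).
token_freq = np.array([12, 150, 47, 3, 890, 260, 31, 72, 410, 9])            # per million words (made up)
neural_resp = np.array([1.9, 1.1, 1.6, 2.2, 0.7, 0.9, 1.7, 1.4, 0.8, 2.0])   # arbitrary units (made up)

# Token frequencies are heavily right-skewed, so they are usually
# log-transformed before being related to behavioural or neural measures.
log_freq = np.log(token_freq)

# Simple linear correlation between log frequency and neural response.
r, p = stats.pearsonr(log_freq, neural_resp)
print(f"Pearson r = {r:.2f}, p = {p:.3f}")
```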

3.3.5. Neuroimaging studies supporting the idea of a neurocognitive split between holistic and compositional modules

It is interesting to note that recent research on non-impaired populations has been taken to corroborate the idea of a neurocognitive split between holistic and compositional aspects of language. Tyler et al. (2005) conducted an event-related fMRI study in a group of 18 young adults, who were exposed to pairs of spoken words and asked to decide whether these words were the same (e.g., played – played or bought – bought) or different (stayed – stay or taught – teach). The first word in each pair was spoken by a male speaker, while the second word was spoken by a female speaker to ensure some depth of processing by avoiding decisions being made on the sole basis of surface similarities. The most significant peaks of activation for regulars against irregulars were found in the left and right superior temporal gyri (L and RSTG) and the left anterior cingulate gyrus (LAC; i.e., the frontal part of the cingulate gyrus shown in Figure 39, bottom, of the appendix) (p < 0.05, corrected for multiple comparisons in whole-brain analysis). A more sensitive statistical method (i.e., a region-of-interest analysis using small volume correction) revealed that certain regions in the left inferior frontal cortex (LIFC), especially pars opercularis (BA 44) and, at a lower threshold of statistical significance, pars triangularis (BA 45), were more strongly engaged in the processing of regular than irregular morphology. (BA is used as an acronym for 'Brodmann area'; Brodmann areas are regions of the human cortex defined in terms of cytoarchitectonic structure, i.e., by dividing the cortical tissue into regions on the basis of similarities and differences in cell densities or cell types; cf. Figure 38 in the appendix.) According to Tyler et al. (2005), the LIFC regions of this frontotemporal network handle the parsing of complex forms into stems and affixes as well as the processing of extracted grammatical morphemes. Activation in bilateral (mainly posterior superior and middle) temporal regions is interpreted in terms of access to lexical form and meaning. Unfortunately, it is not made clear why this should be associated with additional processing costs for bare stems relative to irregular forms, which are assumed to be mapped onto stored lexical representations in a holistic fashion without prior segmentation. Tyler et al. (2005) go on to suggest that the cluster around LAC (which also extends into the right hemisphere) may subserve
the integration of information between superior temporal and inferior frontal regions. Marslen-Wilson (2007) suggests that the 'decompositional network' relies on a dorsal (or: superior) processing stream which connects left inferior frontal to superior temporal lobe regions via the arcuate fasciculus (cf. Figure 42 in the appendix). This proposal is founded on analyses of anatomical connectivity in human and macaque brains (Catani, Jones, and ffytche 2005). According to Marslen-Wilson, this major connection might be supplemented by two secondary processing pathways: a ventral (or: inferior) route connecting orbito-frontal and anterior temporal regions, and a route related to the basal ganglia (cf. Figure 37 in the appendix; see also Saur et al. 2008). All in all, Tyler, Marslen-Wilson and their colleagues claim that their findings militate against the notion of a single undifferentiated neurocognitive system underlying the processing of both regular and irregular forms. An fMRI study by Vannest, Polk, and Lewis (2005) shows that this finding extends to the distinction between two classes of overtly bimorphemic words. More specifically, they compare activation for derivatives involving the suffixes -ness, -less, and -able ('decomposable' derivatives in their parlance) to that for derivatives ending in -ity and -ation ('whole-word' derivatives). Drawing on behavioural findings and certain theoretical linguistic assumptions, they hypothesize that 'decomposable' derivatives, whose affixes do not trigger phonological changes in the base (e.g., agree – agreeable) and are relatively productive, will be processed in a morphemic fashion. (Although this is not mentioned in any of the papers discussed in the present work, Vannest and her colleagues seem to draw on the idea of morphological strata, which is advocated by most models of Lexical Phonology, e.g., Giegerich 1999; Mohanan 1986. According to stratal theories, English derivational affixes can be subdivided into two relatively homogeneous groups, or strata, on the basis of criteria like, among other things, origin, ability to trigger morphonological alternations, productivity and transparency. In these theories, the suffixes of 'decomposable' derivatives, e.g., -ness or -ful, are often referred to as 'level-II affixes', while those of 'whole-word' derivatives, e.g., -al or -y, are called 'level-I affixes'.) By contrast, members of the 'whole-word' group, which is less consistent in meaning, less productive and tends to change the pronunciation of the base (e.g., serene – serenity), are expected to be retrieved as holistic units. On the basis of Ullman's Declarative-Procedural
model (cf. section 3.3.4), Vannest, Polk, and Lewis (2005) predict that the 'decomposable' group should evoke more activation in Broca's area and the basal ganglia than the 'whole-word' group. They conduct a memory encoding task with a blocked design, which involves participants (15 young adults) viewing sequences of 50 words with a certain suffix type and remembering these words as well as possible. After each block, participants are presented with 10 words and asked to decide via button press whether they remember these words from the previously shown list. Although Vannest, Polk, and Lewis (2005) do not find significant statistical differences in memory performance between blocks with 'decomposable' and 'whole-word' items, they do find differences in neural activation. In their neuroimaging data analysis, they focus on the above-mentioned neuroanatomical areas as regions of interest, and compare activation for different word-reading conditions to monomorphemic items. A subject-by-subject analysis reveals that 'decomposed' words on average lead to stronger increases in activity than matched 'whole-word' derivatives, an effect which survives random-effects analysis at the group level in the basal ganglia. No further peaks of activation are found at the whole-brain level after correcting for multiple comparisons. Vannest, Polk, and Lewis (2005) conclude that their results support the neuroanatomical specifications of Ullman's Declarative-Procedural model. According to these specifications, the mental grammar system, which combines stored units "into more complex sequentially and hierarchically structured units" (Walenski and Ullman 2005: 338), depends on procedural memory (cf. section 3.2). This memory component is assumed to be located in left frontal regions (more specifically, Broca's area, especially BA 44, the precentral gyrus, and premotor areas, especially supplementary motor area [SMA] and pre-SMA), strongly interconnected basal ganglia circuits (particularly the caudate nucleus), parts of the temporal and inferior parietal cortex as well as parts of the cerebellum. By contrast, the declarative memory system, which is held responsible for holistic, associative storage in the mental lexicon, is functionally and anatomically subdivided into three sub-regions. According to the Declarative-Procedural model, the learning, consolidation and retrieval of memory information are rooted in the hippocampus and adjacent medial temporal lobe regions. Over time, memories 'leave' these subcortical regions and 'move' to neocortical structures. In the temporal lobes, middle and inferior regions are responsible for the storage of lexical meanings, while superior and temporo-parietal regions subserve the storage of phonological word forms and complex grammatical patterns and act as an interface to the procedural memory system.


The selection or retrieval of memories stored in temporal areas is supported by parts of the ventrolateral prefrontal cortex largely corresponding to BAs 45 and 47, while the search for these memories depends on portions of the right cerebellum. The declarative memory system is thought to contain every chunk which is idiosyncratic and therefore has to be known by rote, such as word-level information (sounds and meanings) and 'non-transparent idioms'. Although models that dichotomize compositional and holistic processing thus seem to have received some empirical support from neuroimaging research, their central claim has to be taken with some caution. One major reason for my reservation is that prominent advocates of the dual-mechanism paradigm themselves have recently come to acknowledge that "some high-frequency regular forms might move into the same whole-word category" as that of irregular past tenses (Marslen-Wilson 2007: 180; for similar statements, cf. Hartshorne and Ullman 2006; Pinker and Ullman 2002; Pinker 1999; Ullman 2007: 276). It is hard to see how entities should percolate from one module into the other if these modules are entirely separated. Tyler et al. (2005) even acknowledge that irregular forms activate left inferior frontal regions to some extent, which they ascribe to the fact that an irregular past tense form will as a whole be associated with grammatical properties (presumably in terms of clausal syntax). This is an interesting hypothesis which would certainly deserve further investigation, but it does not rule out the possibility that regular and irregular forms inherently rely on the same neural substrates. This leads on to another important caveat which has to be kept in mind when conducting or reviewing neuroimaging studies. Most fMRI research is still based on the so-called 'subtraction logic', which involves comparing activation for two experimental conditions which are assumed to differ in a single cognitive feature, the so-called 'component of interest'. To illustrate, let's call the first experimental condition A and the second experimental condition, which involves a single additional cognitive feature, B. Under the subtraction logic, brain activation for the component of interest will be isolated by subtracting A from B (i.e., B – A = I, where I stands for the activation attributable to the component of interest). It is important to note that a significant difference in activation does not imply that condition A is not associated with activation in a given area – it simply means that the activation for condition A is weaker than that for condition B, which can still be quite strong. This means that the subtraction method may not be an ideal candidate for falsifying the claim that two functions rely on the same neural substrates.
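The caveat about the subtraction logic can be illustrated with a minimal numerical sketch. The activation values below are invented purely for illustration; they are not taken from any of the studies discussed in this chapter.

```python
import numpy as np

# Invented mean activation values for a handful of voxels in some region,
# under condition A (e.g., irregular forms) and condition B (e.g., regular
# forms, assumed to involve one additional cognitive component).
activation_A = np.array([2.1, 1.8, 2.4, 2.0])   # arbitrary units
activation_B = np.array([2.9, 2.6, 3.3, 2.7])

# Under the subtraction logic, the component of interest I is estimated
# as the difference between the two conditions: I = B - A.
component_of_interest = activation_B - activation_A
print(component_of_interest)      # [0.8 0.8 0.9 0.7]

# Crucially, a reliable difference B - A > 0 does not show that condition A
# produces no activation in this region: A itself is well above zero here.
# The subtraction therefore cannot establish that the two conditions rely
# on disjoint neural substrates.
print(activation_A.mean())        # 2.075 -- substantial activation for A as well
```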


A more specific problem of the above studies is that chunked and composite items might actually be considered to differ in more than a single cognitive property: under a dual-mechanism view, the processing of chunks requires the retrieval of a single form. By contrast, the parsing of complex items requires at least segmentation, retrieval of individual morphemes, and possibly also their subsequent recomposition.

3.4. Conclusion: Can we reasonably expect corpora to predict entrenchment in the mind?

This chapter has explored the extent to which we can reasonably expect to gain neuropsychologically realistic insights into a speaker's linguistic knowledge from quantitative generalizations across corpus data, as posited by the corpus-to-cognition principle. For all the methodological and theoretical problems that the corpus-to-cognition principle brings about (cf. sections 3.1 and 3.2), a conservative interpretation of existing empirical research – most of which has been conducted outside the usage-based framework (cf. section 3.3) – suggests that corpora can, after all, be taken to inform us about certain aspects of entrenchment in the brain. Nonetheless, too simple an equation between the individual, mental and potentially creative level on the one hand, and average collective performance on the other would yield theoretical inconsistencies, leading to some important caveats. First, statistical methods that aggregate over large corpora will, by their very nature, reveal generalizations pertaining to an idealized average brain. Interesting as these findings might be, they will miss important generalizations at a lower level of granularity – a level which usage-based models independently need in order to model the dynamics of language. More concretely, with regard to entrenchment, there is good reason to expect systematic differences between subjects with more or less holistic cognitive styles, and a statistical model that handles these differences as noise will be highly distorting with regard to the minds of real language users. Second, even if we do find correlations between statistical models and cognition, we should probably be more cautious in our statements about the actual knowledge of language users. As shown above, statistics and distributions may very well be no more than a surrogate for understanding what goes on in the minds of real individuals, who may be (partly or exclusively) driven by functional factors rather than unconsciously tracking and encoding frequencies. Third, in monomorphemic lexemes, the so-called 'word frequency effect' has been shown to be easily replicable and clearly apparent in both behaviour and neurophysiology (cf. section 3.3.1). However, the situation


is far less straightforward with regard to multi-morphemic strings. While several behavioural studies report holistic cognitive anchoring for high-frequency sequences, most of them actually merely demonstrate increased ease of processing (cf. section 3.3.2). Although this finding is both suggestive and interesting by itself, it does not actually allow us to infer holistic storage, since participants may simply have retrieved and assembled the component morphemes with greater ease and rapidity. Under such an interpretation, frequency would merely shape language processing (or: performance), but it would not impact representation (or: competence). Such a finding, however, would not unequivocally support usage-based over competing theories, since, as O'Grady (2008b: 23) puts it, "everyone agrees that there are frequency effects in language" (Keller and Asudeh 2002: 240; Chomsky 2007: 10). The real argument within linguistics is how far these effects go, and usage-based linguists are unanimous in claiming that linguistic experience shapes the mental representation that underlies processing – as witnessed by their definition of entrenchment, which encompasses a distinction between holistic and (merely) compositional processing. Unfortunately, the neurological chunking studies presented in sections 3.3.4 and 3.3.5 do not contribute to settling the issue. One important problem of these studies is that they do not define holistic chunks in terms of frequencies and indeed seem to be dealing with a rather disparate set of phenomena ranging from highly emotional expressions (e.g., curses) via social formulae (e.g., formulas of greeting and thanking like Have a nice day or See you later) and irregular or weakly productive morphology (e.g., bought, serenity) to non-literal expressions (e.g., He's at the end of his rope) (Van Lancker and Kempler 1987). Interestingly, many of these studies emphasize that the chunks differentially affected by neurological conditions are highly familiar, but in their understanding, 'familiarity' seems to be tantamount to 'exhibiting idiosyncrasies that have to be known for native-like production'. Consider, for example, the following description of formulaic expressions: Most importantly, and as an essential, even definitional feature, FEs [formulaic expressions, A.B-D.] have a unique coherence not present in novel utterances. Word selection and word order are determined; intonation is usually stereotyped, in that choices for sentence accent are limited: No man is an island sounds "wrong" with an accent on man or is; I wouldn't want to be in his shoes does not sound native, or well-formed, when shoes carries the accent (these kinds of "errors" are heard in second language speakers, (Van Lancker-Sidtis, 2004)). FEs are "familiar" in the sense that a native


speaker will recognize them as having this special status. (Van Lancker and Rallon 2004: 208)

As shown in chapter 2, it is commonly accepted that expressions of this kind have to be stored in a lexically specific format by virtue of their idiosyncrasy; in other words, they are not adequate to demonstrate chunking in sequences that can be fully subsumed under more general constructional schemas. A major purpose of the empirical part of this work will be to gain a clearer understanding of the exact relationships between token frequencies in corpora, chunking in mental representation, and ease of processing. Is the usage-based definition of entrenchment empirically justified? If so, how are the different defining features of entrenchment related to each other? In particular, is the distinction between representation and processing a gradual or a categorical one? The chunking studies in sections 3.3.4 and 3.3.5 interpreted the situation in terms of a clear-cut dichotomy between a compositional and a holistic module. This, however, should be seen as an axiom rather than an empirical claim, since the data were actually silent on this issue (for detailed arguments, see pp. 56 and 60). This leads on to the question of how entrenchment should be modelled. Dual-mechanism models assume a clear-cut dichotomy between two cognitive modules: a computational module, which parses complex sequences on the basis of morphemes and abstract rules, and a holistic module, which retrieves holistic chunks from memory. By contrast, according to the usage-based view of language, "all linguistic experience (be it atomic or complex) is processed by the same cognitive mechanism" (Arnon 2009: vi). This approach tallies perfectly with connectionist models, which hold that there is only one way of processing language. According to connectionists, there is no such thing as (de)composition, but only a network of holistically stored entries which are interconnected through associations of gradable strengths. The strength of an associative link depends on the frequency, degree and consistency of (phonological, orthographic and semantic) overlap between the entries involved (Rueckl et al. 1997). At first sight, dual-mechanism models and associationist models seem to stand in maximal opposition to each other. Crucially, however, a closer look reveals unexpected points of convergence. Thus, with regard to transparent multi-morphemic strings, both families of models allow for redundancy. Of course, in the case of dual-mechanism accounts, the notion of redundancy refers to cooperation and competition between qualitatively distinct cognitive modules (retrieval of stored units versus parsing). By contrast, the usage-based kind of redundancy involves associations between


qualitatively identical representations within a single, undifferentiated neurocognitive system (or, in construction grammar terminology, between adjacent layers in a hierarchically organized taxonomical network, cf. section 2.1). The moot question, though, is whether these kinds of redundancy are at all empirically distinguishable. Even more striking similarities between single-system accounts and dual-mechanism models also deserve consideration. Thus, Walenski and Ullman (2005) argue that besides the acquisition, storage and retrieval of idiosyncrasies (e.g., monomorphemic words and non-transparent idioms), the mental lexicon also performs less prototypical cognitive operations. One of these operations is the generation of productive schemas via associative generalizations over stored entries which are perceived as similar on some dimension (cf. Pinker 1999; Pinker and Ullman 2002). Walenski and Ullman (2005: 337) illustrate this by proposing that people will compute the form splang as the English past tense form of the pseudo-word spling by analogizing across memory traces for similar-sounding word pairs like spring – sprang, sing – sang and ring – rang (see also Bybee and Slobin 1982). Section 3.2 mentioned that according to Hartshorne and Ullman (2006), this kind of associative computation also extends to overtly complex forms. Walenski and Ullman (2005: 340) even go so far as to suggest that the result of associative generalizations in declarative memory may be fully abstract constructional schemas: "It should be emphasized that there appear to be many ways by which the lexical/declarative memory system can learn and process complex structures that are also computable by the grammatical/procedural system. In addition to memorizing chunks …, individuals may depend on stored schemas or constructions." This conception of the mental lexicon as a memory which allows for schematic abstractions and analogy-based associative generalizations is remarkably similar to the constructional network posited by usage-based cognitive approaches (cf. section 2.1; see also Prasada and Pinker 1993). If prominent advocates of the dual-mechanism paradigm go along with the usage-based idea of a continuum of holistically stored structures ranging from monomorphemic items to complex schemas, why do they still cling to the idea of the existence of a separate procedural module? To put it differently, what is the added value of positing a computational rule module, if you explicitly acknowledge the existence of fully abstract schemas generated in a superpositional mental lexicon? Walenski and Ullman (2005: 340) respond by emphasizing "that it is surely not the case that any complex structure computed by the grammatical system can also be memorized in, or otherwise depend on, lexical/declarative memory. For example,


structures that involve long-distance dependencies may cause particular difficulties for this system." As a result, they argue that the mental lexicon must be seen as "a qualitatively different beast from the compositional computations that are subserved by the grammatical/procedural system" (Walenski and Ullman 2005: 337). All in all, this chapter leaves us with the following research questions, which will be explored in the remainder of this work: Is it possible to find independent neuro- and psycholinguistic support for the claim that compositional high-frequency sequences are processed and represented in a special way? How is entrenchment related to token frequency and other stimulus-related variables, and how do these variables interact? On this basis, does it make sense to assume a principled distinction between holistically entrenched chunks and 'the rest', in other words, is entrenchment something categorical rather than a matter of degree? If so, where are the relevant frequency thresholds that index these qualitative 'quantum leaps'? How should entrenchment be modelled? Are generative dual-mechanism models or usage-based single-system models more appropriate to account for entrenchment? Answers to these questions could also inform ongoing debates about possible dividing lines between lexicon and grammar, the fixed and the free as well as storage and computation. While, as mentioned above, all usage-based linguists contend that high token frequency leads to chunking, the division of labour between item-based chunks and more abstract constructions in normal language use has been a matter of considerable debate in the literature (for diverging estimates of the size of the multiword lexicon, cf. Erman and Warren 2000; Foster 2001; Jackendoff 1995; Mel'čuk 1996; Becker 1975; Bolinger 1976; Moon 1998; Altenberg 1998). To conclude, it seems wise to adhere to a weak version of the corpus-to-cognition principle. With regard to token frequencies in corpora and their relationship to entrenchment, such a version has the advantage of making clear, interesting and testable predictions while still avoiding theoretical inconsistencies. It also acknowledges that while corpus data may, to some extent, be used as a yardstick for language representation in the brain of an average language user (which may, in its turn, be rather weakly representative of actual brains), such data have to be tested against and complemented with behavioural and neural findings to achieve a more complete picture of the relationship. Before turning to the experiments, however, we will have to explore how entrenchment can be operationalized.

Chapter 4
Operationalizing entrenchment

Chapter 2 demonstrated that the notion of entrenchment is central to usage-based approaches, with transparent multimorphemic strings at the bottom of the constructional hierarchy playing a particularly prominent role. Chapter 3 showed that the relationship between token frequencies in corpora and entrenchment in the minds of speakers still awaits experimental testing, since research is sparse and has left crucial questions unresolved. Another issue which becomes evident from the studies presented in chapter 3 (notably section 3.3.3) is that it is extremely difficult to get an experimental handle on chunking, which is one of the defining features of entrenchment (cf. chapter 2). Thus, Wray (2009), who works on 'formulaic sequences' (a cover term for phenomena like idioms, collocations and proverbs), claims that there is no independent way of experimentally distinguishing between holistic and compositional processing. Sinclair and Mauranen (2006: 6) make a similar point: "In principle we could have devised a psycholinguistic experiment to bolster our claim [that chunking is a natural and unavoidable way of perceiving language text as it is encountered], but we have not done so. Such a project would not be easy, because it is very difficult to specify precisely the kind of behaviour that would be required to demonstrate the validity of our supposition" (cf. also Dahl 2004: 91). This chapter will set the stage for the experiments to be presented in chapter 5. Before we can test whether corpus-derived claims about the cognitive entrenchment of compositionally analyzable strings are psychologically realistic, we first need to examine even more closely than before how entrenchment has been defined in the literature (section 4.1). It will also be necessary to specify how the psychological realism of a statement can be assessed in experimental terms (section 4.2), and, most importantly, to operationalize entrenchment (section 4.4), which, as mentioned above, is notoriously difficult. To get some fresh ideas on how to tackle this problem, section 4.3 will highlight different strands of linguistic and non-linguistic research which have been important sources of inspiration for the experiments presented in this work. More specifically, it will be concerned with the domain of chunking in vision as described by Gestalt psychology (section 4.3.1), with certain strands of priming research (section 4.3.2), with extant research on frequency effects in complex words (section 4.3.3),


and with parametric fMRI studies on word frequency effects (section 4.3.4). All in all, this chapter will argue that insights and methods from different lines of research can be fruitfully combined to develop a promising experimental paradigm. At the same time, this chapter also aims to introduce the unfamiliar reader to some essential neuroanatomical and neuroimaging notions, which will be indispensable for understanding the experiments to be presented below.

4.1. Defining entrenchment

This section will elaborate on the notion of entrenchment with a special focus on the moot question of whether entrenchment should be thought of as something categorical or gradient (cf. section 3.4). It is important to get more clarity on how the literature deals with this point, since it will be crucial to my operationalization. Looking at definitions of entrenchment, it becomes clear that the criteria they invoke fall into two sets which are not necessarily consistent with each other. The first set, which may be subsumed under the heading of chunking, is highlighted by De Smet and Cuyckens (2007: 188), who claim that an entrenched construction "represents an automated, routinized chunk of language that is stored and activated by the language user as a whole, rather than 'creatively' assembled on the spot." In other words, chunking refers to the idea that an entity which is made up of smaller sub-entities comes to be perceived as a unit, with unit status implying retrieval in a single step rather than access to and composition of its component parts. At a phenomenal level, being a cognitive unit is an all-or-nothing question (for more detail, cf. section 4.3.1.2). Moreover, in common understanding, it pertains to the relatively stable level of representation more than to the dynamic level of processing in that it refers to the storage format in our mental inventory. This in turn suggests that with higher levels of entrenchment, there must be a point where a compositionally assembled sequence comes to be reanalyzed as a single precompiled unit which can be directly retrieved from the mental lexicon. The problem with this interpretation is that it is not immediately obvious how it relates to the second set of entrenchment criteria, which refer to features that are inherently gradable, such as degree of automaticity, strength of mental representation, processing effort, and degree of conscious monitoring (Handl 2011: 67; Langacker 2008: 16). What is more, these continuous features seem to be related to processing, viz. to changes in the


use of stored entities rather than their inventory, as they imply that the process of on-line concatenation gets easier and more fluid. To summarize, existing entrenchment definitions can be taken to suggest that there must be a point at which gradual differences in processing ease turn into qualitative differences in representation. Most definitions actually explicitly encompass both aspects, as the following quotes show: Entrenchment comes in degrees, even beyond the minimum threshold required for independent storage. (Croft and Cruse 2004: 292). With repeated use, a novel structure becomes progressively entrenched, to the point of becoming a unit; moreover, units are variably entrenched depending on the frequency of their occurrence. (Langacker 1987: 59)

Bybee (2007) is more specific with regard to the assumed direction of causality: Each usage event leads to a continuous increase in ease of access and fluency of composition, and these gradual changes in turn engender a qualitative change, the demotion of sub-units. Each token of use of a word or sequence of words strengthens its representation and makes it more easily accessed. In addition, each instance of use further automates and increases the fluency of the sequence, leading to fusion of the units. (Bybee 2007: 324; cf. also Bybee 2007: 10, 279)

Since this aspect is not explicitly mentioned in the above quotes, it should be recalled that according to usage-based models, holistic and compositional modes of processing and representation for one and the same string are not necessarily mutually exclusive. Thus, as shown in section 2.1, idiomatically combining expressions represent concrete, holistic chunks which can nonetheless be analyzed compositionally. The idea of potentially redundant representations for a given linguistic sequence also follows from the usage-based assumption that in language acquisition, emerging abstractions (i.e., abstract schemas and single morphemes) do not necessarily supersede the instances (i.e., holistic chunks) from which they emerge (cf. section 3.1). To conclude, let us adopt the following working hypothesis with regard to the entrenchment of compositional multimorphemic sequences: Higher token frequencies in usage will correlate with a gradual increase in ease of processing, more precisely in enhanced fluidity in composition or parsing. At some point, this process will lead to a new, holistic representation. After this point, facilitation – more precisely, ease of retrieval, possibly in combination with fluidity in parsing – will still continue to increase as a function of frequency. While the literature is not very specific as to how exactly chunking comes about, it strongly suggests that the solution may reside in the nature of the relationship between representation and processing, which might be more intricate than assumed under the default view. This, in turn, suggests that in operationalizing entrenchment, we will have to keep track of both aspects – chunking in representation and fluidity in processing – to get a clearer picture of whether they are actually distinguishable and, if so, how they are related.
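This working hypothesis can be given a deliberately simple computational rendering. The following toy model is only meant to make the hypothesized shape of the relationship explicit – gradient facilitation plus a possible qualitative switch to holistic retrieval above some frequency threshold. The functional form, the threshold and all parameter values are my own illustrative assumptions, not claims about actual processing latencies.

```python
import numpy as np

def predicted_latency(token_freq, threshold=500.0,
                      base_rt=900.0, comp_gain=60.0, retrieval_bonus=120.0):
    """Toy latency model for a transparent multimorphemic string.

    Below the (hypothetical) frequency threshold, only compositional
    processing applies, and it speeds up gradually with log frequency.
    Above the threshold, a holistic representation is assumed to exist
    and to yield an additional, discrete saving.
    """
    log_f = np.log1p(token_freq)
    latency = base_rt - comp_gain * log_f      # gradient facilitation
    if token_freq >= threshold:                # hypothesized chunking point
        latency -= retrieval_bonus             # holistic retrieval saving
    return latency

for f in (1, 10, 100, 499, 500, 5000):
    print(f"{f:>5} occurrences -> {predicted_latency(f):6.1f} ms (toy values)")
```

If entrenchment turned out to be gradient rather than categorical, the retrieval_bonus term would be zero and the discontinuity would disappear – which is precisely the kind of contrast the experiments below are designed to detect.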

4.2. Assessing the psychological realism of a statement in experimental terms

Before one can test whether corpus-derived claims on the cognitive entrenchment of compositionally analyzable strings are psychologically realistic, it is necessary to gain a clear understanding of what it means to experimentally assess the psychological realism of a statement. In this connection, it is important to point out that usage-based linguists usually talk about the ‘mind’ – a notoriously ill-defined term which seems deliberately non-committal as to the brain. Here, ‘mind’ will be understood as referring to subjective phenomena of cognition like thought, free will and sensory perception, whereas ‘brain’ will be taken to refer to the underlying physical structure in our heads (cf. Searle 1984, 2004). Although mental and cerebral phenomena are qualitatively different, we will proceed from the assumption that they are intrinsically related in that there is no mental activity without brain activity (more precisely, I would adopt a non-reductive, emergentist version of physicalism as proposed by Murphy and Brown [2007], but I will not delve further into this theory here, as it is not immediately relevant to the discussion at hand). As a consequence – and this probably goes without saying – it will be assumed that the issue of entrenchment in the mind is, in principle, amenable to empirical testing by means of brain research, and that psychologically realistic models are models which are in line with the data produced by such research. Now there are, of course, several more or less direct ways of ‘getting at the brain’ and of producing psychologically realistic accounts. A relatively weak way of interpreting the commitment to psychological realism would be to seek to achieve transparent and lawful relations between a given model and what is known about the brain from patholinguistic studies (cf. Schnelle 2003: 339). To gain insights into the cerebral anatomy of language, researchers have traditionally resorted to investigating language disorders, more particularly aphasic syndromes. Their main


principle has been to relate brain lesions to linguistic deficits and vice versa in order to elaborate a general functional model of language (cf. section 3.3.4). Although work in this paradigm has generated interesting hypotheses and continues to inform current debates on language representation in the brain, it is problematic in that it is based on case studies which are not reproducible. Another problem is that it only affords insights into the function of lesioned brains, which, for reasons of neuro-plasticity, are not necessarily representative of ‘normal’ brains (cf. section 3.3.1). Moreover, due to the distributed network anatomy of higher cognitive functions, lesions may result in important language deficits, even if the relevant brain regions are only indirectly connected to critical language areas. All in all, according to Démonet and Thierry (2001: 49), the traditional aphasiological approach to language deficits has not been able to relate linguistic impairments to reproducible and reliable brain sites. Under a somewhat more ambitious interpretation, a psychologically realistic model could be defined as a model which is in line with externally observable and experimentally reproducible behaviour. Analysis of behavioural data like reaction times and error rates is the traditional technique of experimental psychology and psycholinguistics. Although button-press tasks and similar experimental methods have provided valuable insights into the nature of language representations (some of which will be reviewed in 4.3.2), these approaches still remain ‘outside’ the brain and therefore provide us with a very indirect reflection of the corresponding mental activity. However, if you wish to investigate subtle questions of cognitive linguistic model-building, an exclusive reliance on behavioural data seems problematic, since there is no one-to-one relation between behavioural indicators and mental activity. Thus, tasks can be resolved on the basis of different strategies, answers can be given at different levels of certainty, and redundant cognitive processes can give rise to identical behaviours, making it difficult to infer and distinguish between different mechanisms (Pulvermüller 2007: 121; Ullman 2007: 275). More recently, non-invasive neuroimaging techniques have become available. These techniques fall into electro-magnetic techniques (most prominently electro-encephalography [EEG] and magnetoencephalography [MEG]), which offer excellent temporal, but relatively poor spatial resolution, and hemodynamic (or: metabolic) techniques (like positron emission tomography [PET] and functional magnetic resonance imaging [fMRI]), which offer poorer temporal, but impressive spatial resolution at the millimetre scale. In the following, I will restrict myself to giving some background information on the fMRI technique, which will be exploited in three of my own experiments. fMRI capitalizes on the fact that


increased neural activity in a given brain region triggers changes in blood flow and blood oxygenation after a delay of a few seconds. Importantly, local increases in metabolic activity induce minute changes in magnetic blood properties. These are captured by means of the so-called Blood Oxygenation Level Dependent (BOLD) contrast, which reflects the ratio of oxygenated and deoxygenated blood in a given volume element (voxel or volumetric pixel) (Bornkessel-Schlesewsky and Friederici 2007: 408). fMRI is thus an indirect technique for pinpointing brain regions that are active during a specific cognitive task, but it does not directly measure the activity of neurons. It is important to keep in mind that the hemodynamic response which supplies active neurons necessarily lags behind the neural activity that triggers it, leading to a relatively weak temporal resolution. In fMRI research, brain activation for a component of interest is typically obtained by the so-called ‘subtraction method’. This means that activation for a control condition (called ‘baseline condition’) is subtracted from activation for a critical condition. The baseline condition may either be cognitive rest, a low-level baseline (e.g., a fixation cross [+] or a row of hashmarks [####] in a visual word recognition experiment), or ideally a condition which differs from the critical condition in only one cognitive component (for more details, cf. section 3.3.5). Results of fMRI research are typically presented in the form of statistical parametric maps indicating statistically significant differences in brain activity between different conditions. To conclude, although lesion studies and psycholinguistic experiments are helpful in providing a first approximation to cognitive linguistic issues, they can be fruitfully combined with brain imaging methods, which provide more direct access to the cognitive mechanisms underlying our linguistic behaviour. Needless to say, no technique is without problems, and it is in the interest of theoretical robustness to compare and discuss data from as many different strands of empirical research as possible. In the present work, usage-based entrenchment predictions will be tested, first, against behavioural data and, second, against fMRI data. Note that although any externally observable linguistic performance must ultimately correspond to some cerebral activity, we still lack a model of how these two qualitatively different levels of description can be integrated (more precisely, we lack ontological bridging rules establishing principled correspondences between neurophysiology and behaviour, cf. Nelson 2012). As a result, it is not to be excluded that data from both levels may diverge (we will come back to this problem in section 7.3.4). One problem is that behavioural performance data such as response times or accuracy tend to give us a univariate perspective on language processing in that they provide a single measure at a given moment in the course of a cognitive process; they


can thus be said to compress "a complicated network of neural computation into a single behavioral output" (Small and Nussbaum 2005: 304). By contrast, fMRI yields a multivariate data set which reflects all of the activity occurring in different neural networks over time at a given level of temporal and spatial granularity.
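To make the temporal characteristics of the BOLD signal described in this section more tangible, the following sketch convolves a simple event-related stimulus time course with a canonical double-gamma haemodynamic response function. The specific parameter values reflect a common textbook parameterization and are given purely for illustration; they are not the settings used in the experiments reported below.

```python
import numpy as np
from scipy.stats import gamma

TR = 1.0                                  # sampling interval in seconds
t = np.arange(0, 30, TR)

# Canonical double-gamma HRF (peak around 6 s, undershoot around 16 s);
# a standard simple parameterization, used here only for illustration.
hrf = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6
hrf /= hrf.sum()

# A simple event-related stimulus time course: brief events at 10 s and 40 s.
timecourse = np.zeros(60)
timecourse[[10, 40]] = 1.0

# The predicted BOLD response is the stimulus time course convolved with
# the HRF: it rises and peaks only several seconds after each event,
# which is why fMRI has a comparatively weak temporal resolution.
predicted_bold = np.convolve(timecourse, hrf)[:60]
print(np.argmax(predicted_bold))          # peak lags several seconds behind the first event
```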

4.3. Sources of inspiration

Remember that the experiments to be presented in chapter 5 aim to explore potential relationships between the different defining criteria of entrenchment (cf. section 4.1) and corpus-extracted token frequencies on the basis of a combination of behavioural and neuroimaging experiments. As this study breaks new ground and entrenchment is notoriously difficult to operationalize, it seems natural to turn to research dealing with the relationships between holistic chunks and their parts in other domains of cognition. Section 4.3.1 will argue that Gestalt psychology, which examines how unified patterns emerge from lower-level units in visual perception, has yielded insights which are both relevant and transferable to linguistic sequences. Section 4.3.2 will then go on to present the masked visual priming paradigm, which has proven to be a sound and successful method to measure part-whole relationships in the context of other linguistic debates. An important advantage of this method is that it is equally well established in behavioural and neuroimaging research, which makes it an ideal candidate for combining both approaches. Section 4.3.3 will provide a review of studies investigating frequency effects on the processing of English transparent derivatives. These studies indicate that part-whole relationships in morphology might be much more intricate than assumed in the working definition in section 4.1. However, these studies do not allow us to draw definite conclusions, as they do not test for part-whole relationships proper. Moreover, they are not suited to track gradient changes as a function of frequency because they use categorical designs, which involve assigning words to pre-determined frequency categories (e.g., high versus low). Section 4.3.4 will present parametric studies of neural activation, which are better suited to answering my research questions, especially when combined with regression techniques on behavioural response data.
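The contrast between categorical and parametric treatments of frequency can be illustrated with a brief sketch. The response times below are simulated from invented values; the point is simply that a median split into 'high' and 'low' frequency bins discards the gradient information that a regression on continuous (log) frequency retains.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented item set: token frequencies and simulated response times that
# decrease linearly with log frequency (plus noise).
freq = rng.integers(1, 2000, size=40)
log_freq = np.log(freq)
rt = 800 - 40 * log_freq + rng.normal(0, 30, size=40)

# Categorical design: median split into 'low' vs. 'high' frequency items.
median = np.median(freq)
low, high = rt[freq < median], rt[freq >= median]
print(f"low-frequency mean RT:  {low.mean():.0f} ms")
print(f"high-frequency mean RT: {high.mean():.0f} ms")
# Two cell means cannot reveal whether the effect is gradient, linear,
# or shows a discontinuity at some frequency threshold.

# Parametric treatment: regress RT on continuous log frequency.
slope, intercept = np.polyfit(log_freq, rt, 1)
print(f"estimated slope: {slope:.1f} ms per log-frequency unit")
```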

4.3.1. Gestalt psychology

Section 4.3.1.1 sets out the rationale for operationalizing linguistic entrenchment in the light of findings from non-linguistic strands of research. Section 4.3.1.2 demonstrates that some principles of perceptual organization proposed by Gestalt psychologists in the 1920s and 1930s (Ellis 1999; Wong 2010a, b) are both relevant and transferable to linguistic chunks.

4.3.1.1. Why seek inspiration from non-linguistic lines of research?

In the context of research on linguistic entrenchment, it is pertinent to look at Gestalt psychology for several reasons. The founding fathers of this school, most prominently Max Wertheimer, Kurt Koffka, and Wolfgang Köhler, sought to explain how humans organize visual input into unitary meaningful shapes. These shapes, called Gestalts, represent more than, and are in fact cognitively prior to, the sum of the individual component sensations they are made up of. The striking parallels between visual Gestalts and highly entrenched linguistic constructions suggest that visual Gestalts constitute a fertile and well-established ground to draw on for inspiration with regard to linguistic constructions. The idea of proceeding from an analogy between Gestalt perception in vision and holistically stored language sequences is theoretically consistent for research situated within the usage-based framework, which assumes that language is cut from the same cloth as the rest of the mind (see MacWhinney 2002: 249; Raible [1981] 2011 and Talmy 2003 for the language-Gestalt interface, as well as Landau and Jackendoff 1993; Jackendoff 1999a and Talmy 2000: 90–92 for the language-vision interface). Croft and Cruse (2001: 2) elaborate on the idea that linguistic cognition fundamentally works like other aspects of cognition: [L]anguage is not an autonomous cognitive faculty. … [T]he representation of linguistic knowledge is essentially the same as the representation of other conceptual structures, and … the processes in which that knowledge is used are not fundamentally different from cognitive abilities that human beings use outside the domain of language. … [T]he cognitive processes that govern language use … are in principle the same as other cognitive abilities. That is, the organization and retrieval of linguistic knowledge is not significantly different from the organization and retrieval of other knowledge in the mind, and the cognitive abilities that we apply to speaking and understanding language are not significantly different from those applied to other cognitive tasks, such as visual perception, reasoning or motor activity.


By contrast, modular linguists, who assume that the human linguistic faculty is clearly distinct from and informationally encapsulated with regard to other cognitive capabilities, would arguably be averse to transferring knowledge of vision to language (compare, for instance Fodor 1983 and Culicover 2005). Let us note in passing that there are two theoretical positions which go by the name ‘linguistic modularity’: On the one hand, this term is used to refer to the assumption that language is a distinct module of cognition. On the other hand, it denotes the idea of there being neurocognitive dissociations between separate modules within the linguistic system (e.g., between lexis and syntax or syntax and semantics) (Eisenbeiß 2009; Dapretto and Bookheimer 1999). Although these two meanings are not always clearly kept apart, it is important to grasp that linguists who subscribe to one version of modularity will not necessarily adhere to the other. The present section will be concerned with the first kind of modularity, but it should be clear from chapter 2 that usage-based linguists, who assume that language is “constructions all the way down” (Goldberg 2006: 18) and that abstract syntactic structure is inherently meaningful, also emphatically reject the second kind of modularity. Besides considerations of theory-internal consistency, there are other arguments that militate against a modular take on language and, by the same token, support an approach which acknowledges that linguistic cognition relies on transversal cognitive functions such as bodily experience, visual perception and music. For example, recent experimental research conducted within the embodiment framework has demonstrated that sensory-motor aspects of our bodily interaction with the physical environment are part of the meaning of words (cf. Gibbs 2006; Meteyard and Vigliocco 2008; Pulvermüller 2005; Barsalou 2008 for overviews). Thus, different neuroimaging studies attest to the fact that primary sensory and motor areas in the brain are activated by constructions referring to perception and motion (Hauk, Johnsrude, and Pulvermüller 2004; Vigliocco et al. 2006). Tettamanti et al. (2005), for instance, conduct an fMRI experiment which reveals that the auditory processing of sentences depicting actions performed with the legs, the hands or the mouth selectively activates cerebral regions where the relevant actions are motorically coded. Likewise, recent behavioural research has demonstrated interferences between language comprehension and non-linguistic tasks related to perception and action (e.g., Richardson et al. 2003). Glenberg and Kaschak (2002) present subjects with imperative sentences (e.g., Close the drawer!), sentences describing the transfer of concrete objects (e.g., You delivered the pizza to Andy.), and sentences describing the transfer of abstract entities

(e.g., Liz told you the story.). The subjects are required to determine as quickly as possible whether these sentences make sense by pressing a button which either requires moving their hand away from or toward their bodies. Crucially, when a sentence implies movement in one direction, subjects are slower in making decisions which require hand movement in the opposite direction. In a similar vein, linguists have shown that contrary to long-held assumption, the phenomena of unbounded recursion, instantiation of variables, headed constituents and hierarchical structure are not restricted to language, but also exist in the domains of music, vision and motor control (Jackendoff and Pinker 2005; Jackendoff 2007). Likewise, Walenski and Ullman (2005) argue that the neurocognitive substrates underlying declarative and procedural memories for language (cf. section 3.3.5) actually subserve domain-independent functions. Thus, declarative memory, which is related to stored units in the mental lexicon, also modulates semantic memory, which contains factual knowledge (e.g., Conakry is the capital of Guinea), and episodic memory, which contains autobiographical events which can be explicitly stated (e.g., Yesterday, I bought a plush duck for my son). By contrast, procedural memory relates to the acquisition and performance of cognitive and motor routines, especially those which involve sequences (e.g., driving a car or tying shoes). The procedural memory network is usually considered to operate rapidly, automatically, and without the need for conscious monitoring. The non-modular approach also has the obvious practical advantage of making language cognition amenable to experimental techniques and paradigms originally developed for other cognitive domains. Thus, Walenski and Ullman (2005) expect that inspiration from other cognitive disciplines will give a considerable boost to linguistic research: Importantly, the acknowledgement of such shared neurocognitive substrate is expected to lead to entirely new directions in the study of language. Theories and data from other cognitive domains are likely to generate novel predictions about language that would be far less likely to be entertained in the isolated study of language alone. Because neurocognitive substrates often work similarly across the different domains they subserve …, discoveries in one domain may very well be applicable in another.

The next section will demonstrate that Gestalt psychology has yielded insights which are both relevant and transferable to linguistic chunks.

4.3.1.2. What makes a chunk in Gestalt psychology?

Now, according to Gestalt theory, what does it mean for a collection of elements to form a holistic chunk? In a nutshell, it means that the whole takes precedence over the component parts, which involves the following features (note that in the following, I will stick to linguistic and/or self-explanatory terminology):

(i) Pattern completion: the component parts of a given chunk will evoke the whole;
(ii) Emancipation: the perception of the whole will be autonomous from that of its parts;
(iii) Top-down coercion: the parts will be perceived and interpreted in the light of stored knowledge of the whole (rather than vice versa);
(iv) Ease of memory: it will be easier to remember chunked than non-chunked bits of information.

Let us go through these phenomena in turn and illustrate that they have what it takes to inform us about linguistic cognition. In visual perception, pattern completion refers to the phenomenon that the mind will complete incomplete figures to match them with a holistic mental representation. For example, in Figure 1, you will perceive a circle, although actually only a few lines have been drawn.

Figure 1. An example of pattern completion: The perceiver unconsciously completes the missing lines to match the stimulus with the holistic mental concept of a circle. From: Gestalt psychology. (2011, May 16). In Wikipedia, The Free Encyclopedia. Retrieved 08:40, May 20, 2011, from http://en.wikipedia.org/ w/index.php?title=Gestalt_psychology&oldid=429425456 (caption mine).

In Gestalt theory, the perceptual principle that makes us complete missing information is referred to as the Law of Continuity. It is also known to apply in the domains of auditory and kinetic perception. Another principle that underlies this ability is captured by the so-called Law of Closure, according to which we are naturally inclined to perceive figures in such a way as to increase their regularity and symmetry. As an aside, it would be interesting to examine the extent to which the neural substrates underlying the Law of Closure overlap with the neural substrates for analogy-formation and regularization in language (cf. section 2.5). Here, I will restrict myself to the Law of Continuity and show that it also holds in language. Consider, for example, sentences (1) to (3):

(1) Where there is ___, there is fire.
(2) The ___, the merrier.
(3) You can’t make an omelette without breaking ___.

There is no doubt that native speakers of English will be able to generate the deleted items. The reason is that they will have stored the relevant expressions in a strong holistic format such that subparts will be sufficient to activate the whole chunk-level representation. In psycholinguistics, the probability of informants producing the removed word in such sentences is commonly referred to as ‘cloze predictability’, and it is interesting to note that the word ‘cloze’ itself is commonly said to derive from ‘closure’ in Gestalt theory (Ashby, Rayner, and Clifton 2005; Frisson, Rayner, and Pickering 2005; Kliegl et al. 2004; Taylor 1953). Harris (1998) argues that the same principle applies when we anticipate the completion of an idiom on hearing its initial part (cf. her experiment reported in section 3.3). Extrapolating McClelland and Rumelhart’s (1981; Rumelhart and McClelland 1982) Interactive Activation Model of word recognition to idioms, she suggests that partial input activates a unitary higher-level representation, which in turn top-down activates missing component parts. The top-down coercion experiment to be introduced in section 4.4 will examine whether this finding extends to high-frequency sequences. The degree to which an item evokes a larger pattern which it coconstitutes arguably refers to a phenomenon which is inherently gradable and best captured in terms of association strengths between the relevant item and the overall pattern. However, there is no doubt that elements which are mentally associated do not necessarily form a chunk. To take a straightforward linguistic example, even if we found that like was preferentially associated with the variable X in I don’t X this, this would not allow us to conclude to the existence of a chunk-level representation. The reason

is that I don’t like this could still be compositionally assembled from its component parts, but with greater ease than, for instance, I don’t hate this (cf. Tremblay, Derwing, and Libben 2009). Such a finding could easily be accommodated within a constraint-based lexicalist account which posits that lexemes in the mental lexicon are associated with frequency information (e.g., MacDonald, Pearlmutter, and Seidenberg 1994; Trueswell and Tanenhaus 1994). By contrast, the phenomenon of emancipation seems to imply the existence of a chunk-level representation. Emancipation refers to the finding that certain complex entities are not recognized by first identifying their individual component parts and then putting these parts together in a bottom-up fashion, but rather via holistic retrieval of the ready-made higher-level concept all at once. For illustration, consider the Dalmatian depicted in Figure 2. If you do recognize it (note that some people actually don’t), you will not have the impression of assembling it from its body parts (paws, snout, tail etc.), but rather of perceiving it as a unified whole. Another way of putting this is that the perception of a Gestalt implies low awareness for the individual parts that make it up (Poljac, de-Wit, and Wagemans 2012). In Gestalt psychology, the technical term for a perceptual pattern that suddenly emerges from simpler parts is ‘emergence’ (Köhler 1921; Koffka 1935; Maier 1930, 1931; Wertheimer 1925, 1959). The reader will probably agree that the phenomenal experience of the Dalmatian perception is categorical (rather than gradient): You either do recognize it, or you don’t, with there being no intermediate level of partial recognition. This subjective experience of a new insight or percept suddenly popping into mind, seemingly ‘out of nothing’, has been termed Aha-Erlebnis (sometimes translated into English as ‘light bulb moment’) by Karl Bühler (1907) (for a recent state-of-the-art overview, cf. Knoblich and Öllinger 2006).

Figure 2. Picture of a Dalmatian illustrating the principle of emergence. From: Gestalt psychology. (2010, April 10). In Wikipedia, The Free Encyclopedia. Retrieved 11:13, April 10, 2010, from http://en.wikipedia.org/w/ index.php?title=Gestalt_psychology&oldid=428146601 (caption mine).

Things are actually even more complex, as, indeed, it would seem that the recognition of individual body parts of the Dalmatian depends on prior recognition of the dog as a whole and its subsequent decomposition (cf. Hochstein and Ahissar 2002). This leads on to the next relevant feature of holistic chunks, namely top-down-coercion. The notion of top-down coercion refers to the phenomenon that subparts of a chunk will be perceived and interpreted in the light of stored knowledge of the whole (rather than bottom-up composition from cognitively primitive parts). This highlights an insightful parallel between visual Gestalts and linguistic constructions: As mentioned in section 2.1, once retrieved as meaningful entities, constructions tend to exert top-down coercion onto their component morphemes. In this connection, it is interesting to note that the linguistic phenomenon of constructional top-down coercion fits in perfectly with the concept of downward causation as it is currently being discussed in emergentist

epistemology (for more details, see, for example, Murphy, Ellis, and O’Connor 2009). The notion of downward causation presupposes a multilayered view of the world, with different levels being hierarchically arranged in terms of increasing complexity and with higher levels emerging from lower levels. Downward causation has been formulated as an alternative to received reductionist accounts, which posit that “in all higher-level systems, the parts unilaterally determine the behavior of the whole, and are not affected by their relations to one another or to the whole” (Murphy, Ellis, and O’Connor 2009: 4). In emergentist philosophy, higher levels are assumed to exert top-down causation onto the behaviour of the lower-level parts they are made up of. This account has been applied to phenomena as various as the body-mind problem, which seeks to explain how physical matter in the brain can give rise to the ontologically different level of subjective consciousness, and a wide range of natural and social phenomena. Popper and Eccles (1984: 20) illustrate the phenomenon of physical downward causation in the following way: Stars are undesigned, but one may look at them as undesigned "machines" for putting the atoms and elementary particles in their central region under terrific gravitational pressure, with the (undesigned) result that some atomic nuclei fuse and form the nuclei of heavier elements; an excellent example of downward causation, of the action of the whole structure upon its constituent particles.

The analogy with abstract constructions at high levels of the constructional hierarchy imposing constraints onto potential lexical slot-fillers is obvious. In the following, it will be shown that the analogy readily extends to the relationship between complex word strings and their component morphemes as well as morphemes and their component letters, respectively. An illustration of the former kind of relationship is the cartoon in Figure 3, which exploits the fact that given the (linguistic and non-linguistic) context, readers will have no problem homing in on the respective meanings of ‘smurf’ and ‘snork’.

Figure 3. Cartoon illustrating the principle of downward causation in language. In this example, complex linguistic constructions exert top-down pressure onto their constituent morphemes, allowing the reader to infer the meanings of ‘smurf’ and ‘snork’. Retrieved 14:40, September 28, 2012, from http://bluebuddies.com/ubb/ultimatebb.php/topic/1/2071.html, with permission from BlueBuddies.com (caption mine).

Another example of top-down coercion at a concrete multi-word level is idiomatically combining expressions, which, once understood in a holistic fashion, devolve function upon their component lexemes (cf. section 2.1). Consider, for example, the following sentences:

(4) She spilled the beans about Tom. (Croft 2001: 181)
(5) Tom pulled strings to get the job. (Croft 2001: 181)

In the light of the idiosyncratic (and therefore necessarily holistic) expressions that they constitute, the words spilled, beans, pull and strings take on a special meaning which they do not have in other contexts. An illustration of top-down coercion at an even lower level of linguistic part-whole relationships is given in Figure 4:

Figure 4. An example of top-down coercion from words onto their constituent letters. From: Top-down and bottom-up design. (2011, April 21). In Wikipedia, The Free Encyclopedia. Retrieved 08:57, May 20, 2011, from http://en.wikipedia.org/w/index.php?title=Top-down_and_bottom-up_design& oldid=425225865 (caption mine).

The second character of each of these words is ambiguous between H and A, but readers will easily deduce their respective function, given that they will have independent holistic knowledge of the relevant words. Closely related is the text in (6), which is amazingly easy to read, although most of its words have had their letters jumbled.

(6) Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteer be at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe. (Velan and Frost 2007: 913)

(7) According to a researcher at Cambridge University, it doesn't matter in what order the letters in a word are, the only important thing is that the first and last letter be at the right place. The rest can be a total mess and you can still read it without problem. This is because the human mind does not read every letter by itself but the word as a whole.
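As a purely illustrative aside, the kind of interior-letter transposition behind (6) can be sketched in a few lines of Python; only the simplest constraint (keeping the first and last letters in place) is implemented here, and the further restrictions discussed in the next paragraph are deliberately ignored:

# Illustrative sketch: shuffle word-internal letters, keep exterior letters fixed.
import random
import re

def jumble_word(word):
    if len(word) <= 3:
        return word  # too short to have more than one interior letter
    interior = list(word[1:-1])
    random.shuffle(interior)
    return word[0] + "".join(interior) + word[-1]

def jumble_text(text):
    # \w+ picks out word tokens; punctuation and spaces pass through unchanged
    return re.sub(r"\w+", lambda m: jumble_word(m.group(0)), text)

print(jumble_text("According to a researcher at Cambridge University"))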

This text and its corrected version in (7) nicely illustrate a finding which has been known in the psycholinguistic literature since the 1970s (cf. Reicher 1969), namely that letter processing in words is easier than in nonwords, for example under conditions of brief presentation (cf. the relevant experiments by Harris (1998) reported in section 3.3). This effect, which goes by the name of word superiority effect, has been attributed to the fact that letters in words benefit from top-down activation from unitized wordlevel representations (Rastle 2007: 76; cf. Carr 1986 for a review). Note, however, that the text itself and the claims that it makes should be taken with some caution for several reasons. First, it started out as an internet hoax which was not actually bolstered by any original research at Cambridge University when it first started to circulate around 2003 (for

details, see Rayner et al. 2006 and http://www.mrccbu.cam.ac.uk/people/matt. davis/cmabridge/ [sic!]). Second, it is by now well-established that things are much more complex than suggested, since letters have to be jumbled according to certain rules. Thus, transpositions between letters that are distant of each other, which give rise to another existing word, which cross morphemic boundaries, which affect exterior letters, and which disrupt the sound structure of the original word will be significantly more difficult to read (Andrews 1996; Grainger and Whitney 2004; McCusker, Gough, and Bias 1981; Perea and Lupker 2003; Van Orden 1987). Moreover, it has been suggested that effects may vary depending on the language and alphabet used (Velan and Frost 2007). Unlike pattern completion, emancipation and top-down coercion necessarily imply a chunk-level representation and, indeed, can be seen as representing complementary perspectives on this very phenomenon, with emancipation referring to the perception of the chunk-level and top-down coercion to that of its subparts. Besides vision, my experiments will draw on another cognitive source of inspiration. Since the mid-1950s, holistic chunks have played a prominent role in research on short-term memory. In a seminal paper, Miller (1956) claims that human immediate memory is limited to about seven (plus or minus two) chunks of information at a time. Interestingly, he argues that while the number of chunks in short-term memory is fixed, we may augment our memory span by mentally reanalyzing information into fewer meaningful chunks, thereby expanding the size of each chunk. Miller (1956: 93) himself provided the following insightful linguistic example: A man just beginning to learn radio-telegraphic code hears each dit and dah as a separate chunk. Soon he is able to organize these sounds into letters and then he can deal with the letters as chunks. Then the letters organize themselves as words, which are still larger chunks, and he begins to hear whole phrases.
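To make the recoding idea in Miller's example concrete, the following purely illustrative sketch segments the same stream of dits and dahs once without and once with a small inventory of stored chunks; greedy longest-match segmentation is merely a convenient stand-in, not a claim about the underlying memory mechanism:

# Illustrative only: more stored chunks -> the same signal is held as fewer units.
def segment(sequence, chunk_inventory):
    units, i = [], 0
    while i < len(sequence):
        for j in range(len(sequence), i, -1):  # try the longest candidate first
            if j - i == 1 or sequence[i:j] in chunk_inventory:
                units.append(sequence[i:j])
                i = j
                break
    return units

signal = "dahdahdah dit dahdahdah"                      # a short stream of dits and dahs
novice = segment(signal, set())                         # every character is its own unit
expert = segment(signal, {"dit", "dah", "dahdahdah"})   # letter-sized chunks are stored
print(len(novice), len(expert))                         # 23 units versus 5 units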

Miller's example suggests that the size of chunks varies as a function of experience, which implies that experience with sequences must somehow become ingrained into long-term memory – a fact I will capitalize on in one of my experiments (cf. section 4.4). Note that while later research has suggested that the human memory span may actually be smaller and depend on the nature of stimuli to be memorized, the basic idea that we can and actually do increase our mental storage capacity by recoding atomic bits of information into higher-level chunks is still widely accepted (Baddeley 2000; Cowan 2001; Hulme et al.

1995; Mathy and Feldman 2012; Schweickert and Boruff 1986). Just for the sake of completeness – and to reinforce my initial claim that language fundamentally works like other aspects of cognition – let us point out that chunks have also been argued to play a prominent role in the processing of motor sequences (Terrace 2002; Sakai, Kitaguchi, and Hikosaka 2003; for an overview of chunking processes in human perception and learning in different cognitive domains, cf. Gobet et al. 2001). This section has reviewed research on non-linguistic domains of cognitive chunking with a view to gaining a more concrete understanding of the gradable and categorical features of entrenchment. Research on vision and memory suggests that entrenchment is best modelled in terms of an interplay between parts and wholes, or, to put it in other words, in terms of meronomic relationships. All in all, with regard to the research questions outlined in section 3.4, this section invites the following predictions: If token frequencies of multimorphemic sequences correlate with their entrenchment in the minds of speakers,

(i) higher token frequencies should correlate positively with ease of activation from individual morphemes (i.e., parts) to multimorphemic sequences (i.e., wholes) (see the illustrative sketch below);
(ii) there should be a frequency threshold indexing the point where compositional bottom-up processing turns into holistic retrieval;
(iii) higher-frequency sequences with jumbled letters should be easier to decipher than lower-frequency sequences with jumbled letters;
(iv) higher-frequency sequences should be easier to memorize than lower-frequency sequences.
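By way of illustration only, a prediction like (i) could be probed by regressing priming latencies on a continuous (log-transformed) token frequency measure; the following sketch uses simulated data and hypothetical variable names ('rt', 'log_token_freq', 'subject') and is not meant to anticipate the analyses reported in later chapters:

# Purely illustrative: treat token frequency as a continuous predictor of
# part-to-whole priming latencies. All values below are simulated.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_subjects, n_items = 20, 40
data = pd.DataFrame({
    "subject": np.repeat(np.arange(n_subjects), n_items),
    "log_token_freq": np.tile(rng.uniform(0, 8, n_items), n_subjects),
})
# Simulate faster responses for higher-frequency sequences, as in prediction (i)
data["rt"] = 650 - 12 * data["log_token_freq"] + rng.normal(0, 40, len(data))

# Mixed-effects regression with by-subject random intercepts
model = smf.mixedlm("rt ~ log_token_freq", data, groups=data["subject"]).fit()
print(model.params["log_token_freq"])  # negative slope = facilitation with frequency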

Section 3.4 emphasized the need for an operationalization which keeps track of the gradual as well as the categorical dimensions of entrenchment. From the present section, pattern completion naturally emerges as a promising candidate to measure gradual changes of processing fluidity, while emancipation and top-down coercion suggest themselves as criteria for chunking. Memory storage, by contrast, is vague as to the underlying cognitive representations: While better mnemonic performance may be due to holistic storage, it may equally well rely on greater strength of association between component parts – a fact which does not seem to have been acknowledged by Miller (1956). Chapter 5 will nevertheless present a memory experiment, as memory tasks provide valuable insights with relatively little experimental effort. This experiment, however, should merely be seen as an ancillary experiment with the potential to further substantiate the findings from the main experiments. The next section will present the

experimental paradigm which will be exploited in my neurobehavioural investigations, the masked priming paradigm.

4.3.2. The masked priming paradigm

This section will provide an introduction to priming. It will be demonstrated that masked priming represents a robust and well-established paradigm to test part-whole relationships both at a behavioural and at a neural level. Generally speaking, priming investigates how the presence of one stimulus, the prime, affects the processing of another stimulus, the target. In neuroand psycholinguistics, priming experiments are usually combined with lexical or phrasal decision tasks, where participants are asked to determine as quickly and accurately as possible whether a given target represents a real expression of the language. In forward priming, which will be exploited in my experiments, the prime precedes the target. At the behavioural level, response latencies in lexical decision tasks vary as a function of the degree of mental relatedness between prime and target such that strongly related stimuli yield faster response times. To give an example of semantic priming, responses to the target nurse will be quicker after a prime like doctor than after pink, presumably because doctor and nurse are metonymically related. The fMRI correlate of behavioural facilitation is the phenomenon of ‘fMRI suppression’, which refers to the fact that cerebral regions which are activated by a prime will display significantly reduced neural activation in response to a strongly related target. Crucially, robust fMRI suppression effects tend to accompany relatively small behavioural effects. Broadly speaking, quicker responses will be associated with greater reductions in BOLD signal, and neural activity will decrease most in cases of stimulus repetition (for reviews, see Buckner et al. 2000; Henson 2003; Schacter and Buckner 1998; Wiggs and Martin 1998). Importantly, research has shown that manipulations of the stimulus onset asynchrony (SOA; i.e., the delay between prime and target) make it possible to track different stages in visual word recognition – a finding which has recently received independent support from electrophysiological research (Hauk et al. 2006; Pulvermüller 2007). More precisely, short prime-target intervals are known to reflect early stages of processing which tap into aspects of physical relatedness (like length, font or intensity). With increasing SOAs, responses come to depend less on surface properties processed in posterior, modality-specific regions of the brain (such as the visual cortex in BAs 17, 18 and 19) and more on amodal conceptual repre-

sentations in anterior regions (such as the left frontal cortex). As a result, researchers in visual word processing have postulated a gradient of neural priming effects ranging from early, automatic and perceptual regions in the occipital lobes to later more complex, concept-driven and conscious regions in the frontal lobes (McCarthy et al. 1995; Dehaene et al. 2004; Orfanidou, Marslen-Wilson, and Davis 2006; Gold and Rastle 2007). Priming studies which use primes that do not reach the participants’ consciousness are referred to as masked priming experiments (for a review of this paradigm, see Kinoshita and Lupker 2003). The most widely used masked priming paradigm, which goes back to Forster and Davis (1984), exploits the so-called ‘sandwich technique’. This technique involves inserting a very briefly displayed prime between a forward mask (e.g., a row of hash marks) and a target which functions as a backward mask, as illustrated by Figure 5.

Figure 5. Schematic representation of masked priming using the ‘sandwich technique’.
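For concreteness, the screen sequence of a single sandwich trial can be spelled out as a minimal Python sketch; the durations below are illustrative placeholders rather than the parameters of the experiments reported later, and real presentation software would lock them to the monitor refresh rate:

# Minimal sketch of one forward-masked priming trial in the 'sandwich' layout:
# forward mask -> briefly displayed prime -> target acting as backward mask.
from dataclasses import dataclass

@dataclass
class Screen:
    content: str
    duration_ms: int

def sandwich_trial(prime, target, prime_ms=33):
    return [
        Screen("#" * len(target), 500),   # forward mask (row of hash marks)
        Screen(prime.lower(), prime_ms),  # very brief prime, lowercase
        Screen(target.upper(), 2000),     # uppercase target doubles as backward mask
    ]

trial = sandwich_trial("dog", "dog")
soa = trial[1].duration_ms  # with no gap between prime and target, SOA = prime duration
print(soa)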

To avoid priming at a purely perceptual level, which would be uninformative with regard to linguistic processes, the prime is typically displayed in lowercase (e.g., dog), whereas the target is presented in uppercase (e.g., DOG) (cf. Dehaene et al. 2001). Crucially, although the masked prime will be invisible to most subjects, it will still exert a measurable influence on the perception of the target (cf. Forster et al. 1987). The obvious advantage of masked priming resides in the fact that participants cannot possibly employ strategies: Although they will have the impression of merely reacting to the targets, they will actually be uncon-

sciously responding to prime-target relationships. A potential shortcoming of this technique is that due to extremely short SOAs of maximally 50 ms, it can only inform us about relatively early processing stages (Bozic et al. 2007). This explains why the neural effects of masked primes tend to be limited to posterior brain regions and why results obtained under masked conditions are not necessarily replicated when the same primes are unmasked. Thus, it is by now well-established that so-called ‘pseudo-derived’ words (such as corner, handsome or hardly) only prime their ‘pseudostems’ (corn, hand or hard) under masked conditions. Crucially, this priming effect is only observed when the primes are fully segmentable into independently existing morphemes (i.e., pairs like harpoon – HARP or brothel – BROTH do not prime). This suggests that in initial stages of word processing, the brain attempts to segment incoming orthographic input into formal units before higher-level semantic representations are accessed. These findings agree which with Taft and Forster’s (1975) famous affixstripping model, according to which complex words are obligatorily decomposed into morpho-orthographic sub-parts before semantic whole-word access (cf. also Taft and Hambly 1985). Many researchers have argued that masked priming for pseudomorphological pairs is equivalent in magnitude to that for fully transparent pairs such as teacher – TEACH (cf. Lavric, Clapp, and Rastle 2007; Marslen-Wilson, Bozic, and Randall 2008; Rastle, Davis, and New 2004; Rastle and Davis 2008; for similar results in other languages, cf. Boudelaa and Marslen-Wilson 2005 [Arabic]; Longtin and Meunier 2005 [French]; Longtin, Segui, and Halle 2003 [French]). Others have found significantly weaker effects for non-transparent pairs, which has been taken to point to an effect of morpho-semantics at very early stages of processing (e.g., Diependaele, Sandra, and Grainger 2005; Feldman, O'Connor, and Moscoso del Prado Martín 2009; Morris et al. 2007). The time course of access of semantic information remains an interesting topic for further investigation. Be this as it may, researchers are unanimous in acknowledging that in overt priming, pseudo-morphological relationships yield inhibition, with facilitation only obtaining under conditions of transparency (MarslenWilson 2007; Marslen-Wilson et al. 1994; Meunier and Longtin 2007; Rastle and Davis 2003). There has been a heated debate among neurolinguists about whether this kind of facilitation should be explained in terms of semantic or morphological relatedness. Advocates of the ‘morphological’ account maintain that morphology exists as a type of linguistic knowledge of its own, while exponents of the opposite view – most promi-

nently connectionists – argue that morphemes are nothing but systematic form-meaning overlaps between stored whole forms in an associative memory (Joanisse and Seidenberg 1999; McClelland and Patterson 2002; Plunkett and Marchman 1993; Raveh 2002; Rumelhart and McClelland 1986). Although the debate itself is not directly relevant to my research questions, it has the advantage of having sparked intensive discussion and insightful neuro-imaging research into the nature of meronomic relationships between complex words and their stems. In the following, I will report some relevant fMRI priming studies which have been an important source of inspiration for my own experiments. All these studies combine visual priming with a lexical decision task and use an event-related fMRI design, which allows the investigator to detect brain responses to single events from randomly intermixed task conditions. This review will also serve to give the unfamiliar reader a more concrete idea of how fMRI priming experiments operate in practice.

Devlin et al. (2004) conduct a masked priming study (SOA: 33 ms) to examine three kinds of prime-target relationships: pairs sharing form (e.g., corner – CORN), pairs sharing meaning (e.g., idea – NOTION) and pairs sharing both form and meaning due to a common morpheme (e.g., boldly – BOLD). Devlin et al. (2004) first identify the neural circuits activated by word reading by subtracting the activation for consonant letter strings from that for unrelated word pairs like ozone – HERO.17 Within these circuits, they compare the fMRI suppression for each priming group to that for unrelated pairs. The shared-form condition yields a reduced BOLD response in the left occipito-temporal cortex, the shared-meaning condition induces a reduction in the left middle temporal gyrus, and the morphological condition results in reductions in the left posterior occipito-temporal cortex and the left middle temporal gyrus. In behavioural terms, only the shared-form and morphological conditions correlate with significant facilitation. All in all, Devlin et al.’s (2004) results further corroborate the abovementioned claim that the neural effects of masked primes are limited to posterior cerebral regions. They also lend credence to another hypothesis which has been discussed in the literature, the idea that the left posterior fusiform gyrus (or: visual word form area) is engaged in the extraction of meaningful primitive units at early stages of visual recognition (Dehaene et al. 2001; Devlin et al. 2006; Marslen-Wilson and Tyler 2007; Vinckier et al. 2007; see also Gold and Rastle 2007). Moreover, according to Devlin et al. (2004), their results support the notion that morphology can be reduced to systematic overlaps between form and meaning.

17. This procedure yielded the following activation regions: the bilateral posterior angular gyri, the left precentral gyrus, the left posterior occipito-temporal cortex, the left middle temporal gyrus, and the left frontal operculum.

This conclusion is challenged by Gold and Rastle (2007) on the grounds that the monomorphemic primes in Devlin et al.’s (2004) shared-form condition actually look pseudo-morphological (e.g., corner – CORN). To address this concern, Gold and Rastle (2007) conduct a masked priming fMRI study (SOA: 30 ms) which distinguishes between neural priming for pseudo-morphological pairs (corner – CORN), formal pairs (e.g., brothel – BROTH), and semantically related pairs (e.g., bucket – PAIL). For each condition, areas of repetition suppression are defined as those which exhibit significantly lower BOLD activation than an unrelated condition (e.g., distinct – CHEAP) within regions activated by all word conditions relative to a baseline fixation.18 Crucially, the pseudo-morphological condition triggers neural priming in a left-lateralized region which does not overlap with either form or meaning effects: an extrastriate region in the anterior part of the middle occipital gyrus (BA 19). In behavioural terms, significant facilitation is found only for the pseudo-morphological condition. According to Gold and Rastle (2007), these results demonstrate that morphology cannot be reduced to the joint effects of form and meaning. In my view, however, their study merely shows that morpho-orthographic units have a special status in the mind and are automatically extracted from visual input before their meaning is accessed (cf. Lewis, Solomyak, and Marantz 2011). As the study does not include transparent pairs (like hunter – HUNT), it cannot be taken to provide insights into real morphemes, which are associated with meanings by their very definition.

18. This yielded a mainly left-hemispheric network including the middle temporal gyrus, the occipito-temporal cortex, supramarginal and angular gyri as well as the inferior prefrontal cortex.

A closely related fMRI study uses another paradigm thought to reflect the neural effects of early stages of complex word processing: delayed repetition priming. This technique involves separating primes and their targets by varying numbers of intervening words and thereby reduces the possibility of participants employing conscious strategies. Bozic et al. (2007) apply this technique to word pairs exhibiting the following kinds of relationships: transparent (e.g., bravely – BRAVE), opaque (e.g., department – DEPART), repetition (half complex, half simple, e.g., mist – MIST or lately – LATELY), form (e.g., scandal – SCAN) and meaning (e.g., accuse – BLAME).

Compared to the combined effects for form and meaning pairs, those for opaque and transparent pairs yield fMRI suppression in left frontal regions, with a significant cluster involving the inferior frontal operculum, the insula, the rolandic operculum and the precentral gyrus. The following statistical parametric map renders this cluster at an uncorrected threshold of p < 0.01:

Figure 6. Regions showing significant repetition suppression for opaque and transparent pairs compared to form and meaning pairs. The colour bar indicates voxel-level t-values (i.e., activation strength). The plot on the right illustrates the effect size for the relevant conditions in the peak voxel, with solid bars representing the primes and shaded bars representing their targets. Reprinted from Journal of Cognitive Neuroscience, 19(9), Bozic, Mirjana, William D. Marslen-Wilson, Emmanuel A. Stamatakis, Matthew H. Davis, and Lorraine K. Tyler, Differentiating morphology, form, and meaning: Neural correlates of morphological complexity, 1464–1475, 2007. Reprinted by permission of MIT Press Journals (caption modified).

The study also finds significant behavioural priming for opaque (17 ms) and transparent (13 ms) pairs (the difference between these priming effects is not significant), whereas pairs in the form and meaning conditions do not prime. A second contrast compares the activation elicited by first (unprimed) presentations of complex and pseudocomplex words to that of length-matched simple words from the form and meaning conditions in whole-brain analysis. This yields activation in the left pars orbitalis, which is significant after small volume correction (SVC). The statistical parametric map in Figure 7 illustrates this result, again at an uncorrected voxel threshold of p < 0.01:

Figure 7. Cluster showing enhanced activation for first (unprimed) presentations of complex and pseudocomplex words relative to length-matched simple words from the form and meaning conditions in whole-brain analysis after SVC. The colour bar indicates voxel-level t-values. The plot on the right shows the effect size for the relevant conditions in the peak voxel. Reprinted from Journal of Cognitive Neuroscience, 19(9), Bozic, Mirjana, William D. Marslen-Wilson, Emmanuel A. Stamatakis, Matthew H. Davis, and Lorraine K. Tyler, Differentiating morphology, form, and meaning: Neural correlates of morphological complexity, 1464–1475, 2007. Reprinted by permission of MIT Press Journals (caption modified).

Overall, these results suggest that morpho-orthographic units exist as a mental category of their own and that left inferior frontal regions underlie their extraction. Unfortunately, Bozic et al. (2007) do not compare neural priming effects of opaque and transparent pairs. As a consequence, like Gold and Rastle’s (2007) study, their research does not shed light on possible neural effects of semantic transparency. Also note that a major puzzle about this paradigm is that it reveals activity in anterior brain regions even though it taps into early morpho-orthographic stages of word processing. This section has shown that visual priming represents a well-established and robust paradigm for examining linguistic part-whole relationships at a behavioural and a neural level. Quite a few studies have successfully used this paradigm to investigate meronomic relationships between different kinds of items with a view to exploring the conditions under which they are decomposed or holistically accessed. Crucially, even pseudo-bimorphemic

lexemes such as corner are accessed in a decompositional manner in initial stages of word processing. This analysis is then rejected in subsequent stages, where holistic processing gains the upper hand. This finding suggests another important point which does not seem to have been acknowledged in the usage-based literature so far, namely that chunking may vary as a function of the processing stage under consideration. The work reviewed in this section also raises a number of questions. Although it shows that morpho-orthographic parts of complex words are accessed before semantic whole-word properties, it does not inform us about the exact nature of word-level processing. More specifically, it does not tell us how people construct the meaning of the whole from individual morpho-orthographic segments. Do they pair each morpho-orthographic segment (such as corn, teach, and -er) with its meaning, before attempting to combine the resulting morphemes into a complex word? Or, alternatively, do they first integrate the morpho-orthographic units into a larger orthographic representation, which then gets holistically mapped onto the relevant functional representation? And in which cases besides idiosyncrasy do we have a stored whole-word representation available in the first place? A usage-based approach which integrates the above findings would predict that there is no all-or-nothing answer to this question, but that effects depend on usage-frequency. More precisely, one would expect that in early stages of visual word processing, the meanings of low-frequency derivatives are put together from smaller form-meaning associations, whereas the meanings of high-frequency derivatives are retrieved in a precompiled, holistic fashion. The effects of frequency-related chunking should be weakest in the earliest stages of processing. This hypothesis gains support from the only priming study so far to have examined frequency effects between derivatives and their bases. Meunier and Segui (1999b) conducted two purely behavioural cross-modal priming experiments in French.19 The first experiment investigated priming effects between bases and derived lexemes (like travail – travailleur, ‘work’ [noun] – ‘worker’). The second experiment reversed this order by presenting participants with derivatives prior to bases (e.g., travailleur – travail). Priming in the former experimental condition was compared to identity priming between derived items (e.g., travailleur – travailleur), while priming in the latter condition was contrasted with stem-stem identity priming (e.g., travail – travail). 19. Cross-modal priming is a technique which involves subjects making a lexical decision on a visually presented item after the offset of an acoustic prime.

In the first experiment, bases primed high-frequency derivatives much more strongly than low-frequency derivatives, but priming for high-frequency derivatives was still weaker than identity priming between derivatives. As far as the second experiment is concerned, base priming with low-frequency derivatives was as strong as identity priming between stems, with behavioural facilitation after high-frequency derivatives being much weaker. On this basis, Meunier and Segui (1999b) claim that high-frequency derivatives have lexical representations of their own, whereas low-frequency derivatives are morphemically represented, which will facilitate the recognition of their bases. They also hypothesize that base primes will differentially pre-activate members of their morphological family (which encompasses all lexemes in which the stem occurs as a constituent) as a function of their respective surface frequency, such that higher-frequency derivatives will receive more activation. This study is extremely interesting in that it indicates that the asymmetric part-whole relationships known from visual Gestalts readily extend to bi-morphemic sequences in the language. More concretely, with higher frequency, parts increasingly activate wholes, but wholes decreasingly activate parts, as predicted in section 3.4. Unfortunately, to the best of my knowledge, Meunier and Segui’s (1999b) results have never been replicated in French, let alone English. Moreover, they are silent as to the central question of how gradual changes in processing are related to chunking. Due to its factorial design, which involves assigning stimuli to predefined categories, their study is not sensitive to the possibility of graded differences between derivatives of higher or lower frequencies. This problem also holds for most other studies on frequency effects, which will be reviewed in the next two sections.

4.3.3. Frequency effects in English derivatives

Traditionally, psycholinguistic investigations into frequency effects on multi-morphemic words have pitted the usage frequencies of individual constituents (morphemes, syllables, etc.) against those of whole words to examine which level is more predictive of behavioural performance. In research on derivatives, the term ‘surface frequency’ is commonly used to refer to the count of how frequently a particular base and derivational affix occur together in language use across different possible inflectional variants (e.g., government and governments). Surface frequencies are thus equivalent to token or lemma frequencies of derivatives. By contrast, the term ‘base frequency’ refers to how frequently the base morpheme occurs on its

own and in other word forms (e.g., govern, governed, governing, governs) – in other words, it denotes the token or lemma frequency of a root (Taft 2004; Vannest et al. 2011; Baayen, Tweedy, and Schreuder 2002). For example, if in a lexical decision task involving derived words, higher surface frequencies lead to quicker response times, this will be taken to provide evidence for whole-word retrieval and holistic representation (all the more so when constituent frequencies are controlled for). By contrast, if response times vary as a function of base frequencies, everything else being equal, this will be interpreted in terms of decompositional access to complex words and morphemic representation (Hay 2001: 1062). Alternatively, under a multi-stage approach assuming obligatory initial access to morpho-orthographic constituents (see section 4.3.2), base frequency effects will be taken to reflect initial phases of processing, whereas surface frequency effects will bear witness to subsequent whole-word stages (Baayen, Wurm, and Aycok 2008; Taft 2004). This section will limit itself to presenting research on the visual recognition of suffixed bimorphemic English derivatives, which will be at the core of the experiments to be presented in 4.4.20

20. For frequency effects in word production, see Bien, Levelt, and Baayen 2005; Jescheniak, Meyer, and Levelt 2003; for frequency effects on inflectional morphology, see Alegre and Gordon 1999; Davis, Van Casteren, and Marslen-Wilson 2003 and Taft 1979; for frequency effects on compound words, see Fiorentino and Poeppel 2007 and Juhasz et al. 2003; for frequency effects on the auditory processing of derivatives in French, see Meunier and Segui 1999a; for frequency effects in French, Dutch, Italian and German derivatives, cf. Bertram, Schreuder, and Baayen 2000; Burani and Caramazza 1987; Colé, Beauvillain, and Segui 1989; and Clahsen and Neubauer 2010, respectively.

Why focus on this kind of stimuli? First, research into derivation has so far mainly been behavioural, while the neuro-anatomical picture for other areas of morphology is much better established. What is more, the scarce existing neurological data on derivational morphology are inconsistent and still subject to considerable debate (Bozic et al. 2007: 1465; Marslen-Wilson 2007: 182). Neurobehavioural research into derivatives therefore promises exciting new insights into a relatively unexplored part of morphology, while at the same time allowing us to draw on the strengths of existing empirical studies on related topics. A further advantage resides in the fact that while all languages have derivational suffixes, languages do not necessarily have inflectional affixes (Bybee 2010: 3). This offers the long-term prospect of extending the paradigms used in this work to other languages. Third, inflectional processes are usually considered to be fully
productive, while derivational formations tend to be subject to more or less arbitrary constraints of varying scope (Haspelmath 2002: ch. 4.3). This suggests that derivatives may be associated with a much more heterogeneous neurobehavioural profile than word forms created through inflectional affixation. It also indicates that it might be insightful to explore the predictive value of different productivity metrics, which would help us to achieve a more complete picture of entrenchment than an account restricted to token frequencies. Last but not least, as a matter of fact, current fMRI research implies experimental settings which are extremely distorting with regard to real-life communication: Subjects have to lie still in the confined space of a dark and noisy scanner which is usually located in an unfamiliar, clinical environment. Lexical decision studies, for example, expose subjects to single words on a screen in the absence of any meaningful context and have them indicate as quickly as possible whether these words are ‘real’ via button press. In her study of morpheme order across languages, Bybee (1985) argued that derivational morphemes are more semantically relevant to the bases they attach to than inflectional morphemes, which have a much wider scope and interact with the syntax of the whole sentence. Crosslinguistically, this is reflected by the tendency for derivational affixes to appear closer to verb stems than inflectional affixes. This indicates that in fMRI priming experiments, it might be somewhat more natural (or, for that matter, less unnatural) to have subjects judge derived than inflected words, whose understanding is more dependent upon a sentential context. To avoid misunderstandings, let me mention that in contrast to many psycholinguists (such as, for example, Laudanna, Badecker, and Caramazza 1992 or Miceli and Caramazza 1988), Bybee (1985) and other usage-based linguists do not think of inflection and derivation as distinct phenomena, but as processes which vary along a continuum, such that affixes which are less productive and more relevant to their base are more derivational (cf. also Dressler 1989; Dalton-Puffer 1996). This being said, let us proceed to the overview of non-priming research into frequency effects on the visual recognition of suffixed bimorphemic English derivatives. Vannest and Boland (1999) conducted a lexical decision experiment on words involving the productive and phonologically neutral suffix -less and found a base frequency effect, which they interpret in terms of morphemic access. Interestingly, this base frequency effect did not hold for the suffixes -ity and -ation, suggesting that the relevant derivatives are stored and represented holistically (see also section 3.3.5). Vannest and Boland (1999) attribute this result to the idiosyncratic behaviour of -ity and -ation, which tend to trigger phonological changes in the stem to

which they attach (compare serene – serenity to point – pointless). By contrast, surface frequency effects were observed for both groups of derivatives. This finding agrees with the results from a lexical decision study by Bradley, Garrett, and Zurif (1980), who found both base and surface frequency effects for the phonologically neutral suffixes -ness, -er, and ment, but only surface frequency effects for -ion. Ford, Davis, and Marslen-Wilson (2010) conducted a statistically much more sophisticated visual lexical decision study. Unlike earlier psycholinguistic studies on frequency effects in English derivatives, they exploited multiple regression analysis, which implies that they did not dichotomize the predictor variables under consideration. This contrasts with more traditional factorial designs, which arbitrarily assign stimuli to predefined categories (e.g., high versus low surface frequencies), even if they actually vary along a continuous parameter. In line with earlier research, Ford, Davis, and Marslen-Wilson (2010) found higher base frequencies to correlate with lower response times in derivatives involving productive suffixes, leading to the conclusion that these words must be represented in a morphemic fashion. By contrast, response times for derivatives involving nonproductive affixes did not vary as a function of base frequency. Ford, Davis, and Marslen-Wilson (2010) also investigated the effects of morphological family size, which counts the number of complex words in which a base occurs as a constituent and is often regarded as a potential confound of base frequency effects (cf. section 5.1.1.1). After transforming the variables of family size and base frequency to reduce multicollinearity, Ford, Davis, and Marslen-Wilson (2010) observed independent facilitatory effects for both predictors. Intriguingly, the effect of family size obtained irrespective of suffix productivity, which is interpreted as supporting the claim that it reflects processes of semantic association rather than morphological processes proper (Schreuder and Baayen 1997; Bertram, Baayen, and Schreuder 2000; De Jong, Schreuder, and Baayen 2000). Consistent with earlier studies, Ford, Davis, and Marslen-Wilson (2010) also found surface frequency effects, which, in the case of derivatives formed with productive affixes, were markedly stronger than those of base frequency. Overall, the foregoing studies point to the possibility that derivatives involving productive, phonologically neutral and semantically transparent suffixes might be processed in a manner which is both holistic and compositional at the same time, as witnessed by the simultaneous existence of base and surface frequency effects – a finding which is also supported by research on other languages (e.g., Burani and Caramazza 1987 for Italian). By contrast, derivatives involving non-productive suffixes are only affected by surface frequencies. This indicates that they are processed in an exclu-

sively holistic fashion. To put it in different terms, the above-presented studies suggest that all complex derivatives are processed holistically, but that some of them are additionally accessed via their component morphemes – an interesting assumption, which turns traditional morpheme-based models that consider composition as the default case on their head.

Dual processing is exactly the view defended by Hay (2001), who conducted a pen-and-paper experiment which revealed that subjects perceive derivatives exhibiting a higher base than surface frequency (e.g., rekindle, inadequate and invulnerable) as significantly more complex than matched counterparts with a comparatively lower base frequency (e.g., refurbish, inaudible or incongruous). The finding that relative frequencies (i.e., the ratio between surface and base frequencies) are more predictive of morphological decomposition than simple surface frequencies was further corroborated by a dictionary study showing that derivatives exhibiting a high relative frequency tend to be more semantically transparent than those which are less frequent than their bases. Hay (2001) interprets these findings in terms of dual-route access. More specifically, she claims that there are two pathways for handling complex words – a holistic retrieval route and a compositional parsing route – and that relative frequencies determine which of the two routes dominates morphological processing. She also maintains that previously observed effects of whole-word surface frequency may actually represent artefacts of relative frequency, as the two measures are strongly correlated. Although these and similar results presented by Hay (2001, 2003) clearly suggest that relative frequency should be included in the set of predictor variables for my own experiments, some limitations of Hay’s (2001) study warrant mention as well. Her psycholinguistic experiment involved participants making conscious judgments about decomposability. However, it is not to be excluded that decomposability judgments in a metalinguistic forced-choice experiment including instructions explicitly defining the notion of complexity do not directly reflect actual decompositional activity in real language use.21

21. The instructions explicitly define complex words as words which “can be broken down into smaller, meaningful, units” (cf. Hay 2001: 1048).

Vannest et al. (2011) report an event-related fMRI lexical decision study comparing the effects of base frequency on BOLD activity for different kinds of lexemes: ‘decomposable’ items, whose derivational suffixes do not affect the phonology of the base they attach to (e.g., agreeable), ‘whole-word’ items, which involve affixes which tend to modify the phonology of
their base (e.g., location), and matched high- and low-frequency monomorphemic items (see also section 3.3.5). The two groups of complex items are matched for average surface frequencies, which Vannest et al. (2011) describe as relatively low. Vannest et al. (2011) restrict their analyses to regions of interest where effects of morphological complexity and word frequency have been observed in prior studies22 and which are activated in performing the lexical decision task (relative to a resting baseline, a fixation cross). As response times vary as a function of stimulus group (consistent with Vannest and Boland 1999), Vannest et al. (2011) conduct a regression analysis to identify and partial out BOLD signal change that might reflect perceptual or motor activation associated with longer response times.23 They then go on to run a 3x2 ANOVA of word type (simple, ‘wholeword’ and ‘decomposable’) and frequency (high versus low) on the residual BOLD response, reporting results at p < 0.05. Regions with a main effect for frequency are the bilateral thalamus (left: p = 0.003; right: p = 0.006), the left cerebellum, the left SMA and, more marginally, the caudate nucleus (left: p = 0.06; right: p = 0.08), with lower frequency words leading to increased activation. From among these areas, only the right thalamus displays a statistically significant interaction with word type: As shown in Figure 8, low frequencies yield a stronger BOLD response for simple and ‘decomposable’ words, whereas ‘whole-word’ derivatives do not show any significant effects of frequency.

22. These include the supplementary motor area, the superior, middle and inferior temporal gyri, the inferior frontal gyrus, the insula, the angular gyrus, the inferior occipital cortex, the caudate nucleus, the fusiform gyrus, the thalamus and the cerebellum. 23. The relevant regions were in the left supplementary motor area, the right middle temporal gyrus, the right inferior occipital cortex, the caudate nucleus, the thalamus, and the cerebellum bilaterally.

Figure 8. Region of interest displaying a frequency by word-type interaction after removal of the effect of response time. The frequency effect is graphed as a function of word type. Reprinted from Brain Research, 1373, Vannest, J., E. L. Newport, A. J. Newman, and D. Bavelier, Interplay between morphology and frequency in lexical access: The case of the base frequency effect, 144–159, 2011, with permission from Elsevier (caption modified).

Two other areas, the bilateral insulae and the left inferior occipital gyrus, also show an interaction of word type by frequency. In these areas, only simple words elicit a large frequency effect. Regions which are responsive to morphological complexity are the left inferior frontal and left superior temporal gyri, as well as the right inferior temporal and left angular gyri (p = 0.07), with all regions showing stronger engagement for complex (‘decomposable’ plus ‘whole-word’) items versus simple words. Within the regions displaying this main effect of word type, planned contrasts reveal that regions exhibiting stronger activation for ‘decomposable’ derivatives than monomorphemic words are the left inferior frontal gyrus and the left superior temporal gyrus, as illustrated by the following figure:

Figure 9. Regions of interest displaying a main effect of word-type (complex>simple): left inferior frontal gyrus and left superior temporal gyrus. For each region of interest, the mean residual percent signal change (after RT regression) is graphed as a function of word type. Reprinted from Brain Research, 1373, Vannest, J., E. L. Newport, A. J. Newman, and D. Bavelier, Interplay between morphology and frequency in lexical access: The case of the base frequency effect, 144–159, 2011, with permission from Elsevier (caption modified).

A similar, but statistically somewhat less significant effect obtains for the planned contrast between ‘whole-word’ items and simple words. Given its simultaneous focus on frequency effects and holism, this study is highly pertinent to the research questions at hand. All in all, it shows that usage frequency and morphological status affect the mental lexicon in ways which cannot be explained away by effects of variation in motor response preparation and execution. As far as frequency contrasts are concerned, ‘decomposable’ derivatives can be seen as occupying the middle ground between monomorphemic items, which show the strongest and most wide-ranging BOLD effects, and ‘whole-word’ derivatives, which are not sensitive to differences in base frequency at all. This indicates that stem morphemes are more prominent in monomorphemic words than in low-frequency transparent derivatives, which in turn are processed in a more stem-driven fashion than matched non-transparent items. Overall, this would tie in with the connectionist notion of ‘morphemicity’ being a gradient rather than a categorical issue. It remains to be seen whether such an account could be integrated with the above-mentioned dual-processing claim, according to which morphemic and holistic processing are not mutually exclusive. To compare the validity of different models and to find out how holistic and compositional levels of representation are related, it will be necessary to keep track of what happens in between factorial extremes, both at a behavioural level and at a neural level. The next section will show that a few neurolinguistic studies have recently begun to treat word-related features such as frequency as non-categorical variables. All of these parametric studies have dealt with monomorphemic words, and the experiments in chapter 5 will be the first to extend this paradigm to bi-morphemic sequences.

4.3.4. Parametric studies on word frequency effects

Unlike classical factorial or categorical studies, parametric experiments make it possible to “exploit the full range of values available for continuous variables”, leading to enhanced statistical sensitivity (Hauk, Davis, and Pulvermüller 2008: 1185, cf. also Baayen 2010a). More concretely, rather than subtracting activation for a baseline condition from activation for a condition of interest (cf. section 4.2), parametric studies track correlations between one or several predictors of interest (e.g., surface frequency or family size) and the BOLD signal within one and the same critical condition. Parametric studies exhibit greater ecological validity than traditional studies, since researchers are not required to assign stimuli which vary along a continuous parameter to more or less arbitrarily predefined categories (such as high, middle and low frequency). An alternative way of creating factors is to select items which represent extreme values along a variable of interest (e.g., compare the 30 most frequent words of a language to the 30 least frequent ones), but this usually results in stimulus samples which are not representative of the population of interest (Ford, Marslen-Wilson, and Davis 2003). Another important advantage of parametric studies is that they are not subject to baseline effects. As mentioned in section 4.2, an ideal baseline represents a condition which differs from the condition of interest in a
single cognitive component. A good baseline is rather difficult to come by, and the choice will be contingent on a priori assumptions. For example, a connectionist linguist will assume that the processing of a transparent past tense form like claimed does not involve segmentation, whereas advocates of the dual-mechanism account will maintain that such a word gets segmented into morphemes, which are subsequently assigned to different processing destinations (cf. section 3.3.5). As a result, proponents of the dual-mechanism model will assume that the contrast ‘regular minus irregular past tense’ yields activation for segmentation and morphemic access, whereas connectionists will assume that this contrast reflects, say, differences in processing ease. Many studies use pseudo-words as baseline controls, but here the problem is that activation for words relative to pseudowords may well reflect deactivation for pseudowords rather than real activation for words (Hauk et al. 2008; Mechelli, Gorno-Tempini, and Price 2003). Contrasts using a resting baseline are equally difficult to interpret, as a cognitively wakeful ‘resting state’ actually involves quite a lot of activity in the so-called ‘default mode network’, which may be related to cognitive activities such as the retrieval of memories, thinking about the future, or the interpretation of others’ behaviours and perspectives (Mazoyer 2001). Parametric studies do not run into such baseline problems, as they track activation changes within groups of relatively homogeneous stimuli which naturally vary along some parameter of interest. Like the foregoing section, this section focuses on research into visual word recognition in English.24

24. For parametric fMRI studies on picture naming in English, see Graves et al. 2007 and Wilson, Isenberg, and Hickok 2009; for a parametric fMRI study on word reading in English, see Graves et al. 2009.

Since research on complex words is still lacking, we will have to make do with the only existing parametric study of frequency effects on monomorphemic words. Hauk, Davis, and Pulvermüller (2008) present the results of a multiple linear regression analysis of fMRI data acquired from 21 right-handed monolingual speakers of English, who silently read mono-syllabic words presented on a screen one after another. On the basis of the non-parametric literature on frequency effects in single word processing (cf. section 3.3.1), Hauk, Davis, and Pulvermüller (2008) define areas of interest in left and right frontal areas and the left fusiform gyrus. As expected from the electrophysiological and metabolic imaging literature (cf. section 3.3.1), Hauk, Davis, and Pulvermüller (2008) observe negative correlations between brain activation and frequency (SVC, p < 0.05). In the left fusiform gyrus, this result is interpreted in terms of prior claims that this area is not engaged in the processing of surface
properties like size or font, but rather reflects processing at a deeper linguistic level (see section 4.3.2). This hypothesis is further strengthened by the finding that activation in this area is not modulated by orthographic variables like word length or orthographic typicality. Negative correlations in the bilateral inferior frontal gyri are interpreted as reflecting the integration of lexical, phonological and semantic representations. Negative correlations in the bilateral insulae25 are hypothesized to be related to phonological processing (e.g., to enhanced difficulty of access to low-frequency phonological forms). Further peaks of activation at a more lenient statistical threshold (p < 0.001, uncorrected) are observed in the right precentral gyrus, in the frontal superior medial gyrus as well as the left supplementary motor area. The only area to display positive correlations is the right middle cingulate (p < 0.001, uncorrected). The following figure renders Hauk, Davis, and Pulvermüller’s (2008) results on the surface of a standard brain.

Figure 10. Parametric effects of surface frequency on single-word reading. Negative correlations with frequency are rendered in green, activation for the factorial contrast of all words versus baseline (a string of hashmarks) is rendered in red, and their overlap in yellow. Activation levels are rendered at an uncorrected significance level of p < 0.001. Reprinted from NeuroImage, 42/3, Hauk, O., M. H. Davis, and F. Pulvermüller, Modulation of brain activity by multiple lexical and word form variables in visual word recognition: A parametric fMRI study, 1185–1195, 2008, with permission from Elsevier (caption modified).

25. cf. Figure 41 in the appendix.


This figure illustrates a point which will be crucial in the discussion of my own fMRI results in chapter 7: Although different studies have suggested that left inferior frontal regions might be involved in the segmentation of morphologically complex words (cf. sections 3.3.4 and 3.3.5), activation in these regions need not reflect parsing, but may just as well be attributed to aspects of monomorphemic word recognition. Thus, according to Ullman’s (2007) model of the biocognition of the mental lexicon, the left inferior frontal cortex (more specifically, BAs 47, 45 and 44) underlies the selection and retrieval of lexical chunks stored in temporal areas (for more details, see section 3.3.5). Likewise, Buckner et al. (2000), who conduct an fMRI experiment combining repetition priming with a cued word-generation task, show that behavioural priming for repeated items is associated with reduced BOLD activation in left inferior frontal regions (BAs 44, 45, 47 and 6). This result holds irrespective of cue modality (auditory or visual), which leads Buckner et al. (2000) to conclude that these regions subserve the selection and retrieval of lexical knowledge. Yet another strand of research argues that left inferior frontal activity may be sensitive to more general, domain-independent cognitive processes, such as increases in task difficulty, differences in attention load and working memory demands, or performance effects (Graves et al. 2009). It follows that without further stipulation, increases in left inferior frontal activation cannot be seen as an unambiguous touchstone of chunking.

4.4. Operationalizing Entrenchment

Linguistic entrenchment is a complex concept which has so far eluded comprehensive psycho- and neurolinguistic operationalization. The goal of the present chapter was to offer a way out of this impasse by demonstrating that insights and methods from different lines of linguistic and non-linguistic research can be fruitfully combined to develop promising experimental paradigms. Section 4.1, which provided a review of usage-based entrenchment definitions, led to the following operationalization of the relationship between usage frequencies and entrenchment: Higher token frequencies will correlate with a gradual increase in processing ease, more precisely enhanced fluidity in composition or parsing. At some point, this process will lead to a new, holistic representation. After this point, facilitation – more precisely, ease of retrieval, possibly in combination with fluidity of parsing – will still continue to increase as a function of frequency. Although this operationalization might strike the reader as rather broad, it is important to emphasize that it is still narrow enough to cover all essential research questions brought up in section 3.4. At the same time, it is of course consistent with the entrenchment definitions offered in the literature. After presenting different ways of experimentally assessing the psychological realism of a statement (section 4.2), this chapter demonstrated that although no research tailored to the specific research questions at hand has been done so far, it is possible to integrate aspects from work on more or less related topics to develop an appropriate paradigm (section 4.3). Thus, it was argued that Gestalt psychology, which explores the relationship between unified percepts and their lower-level constituents in visual perception, has yielded insights which are both relevant and transferable to language. More specifically, section 4.3.1 argued that linguistic entrenchment is best modelled in terms of part-whole (or: meronomic) relationships and further specified my predictions concerning the link between entrenchment and token frequencies. Section 4.3.2 demonstrated that masked visual priming constitutes a well-suited paradigm to test these predictions for several reasons: It allows us to combine behavioural and neuroimaging methods, it is both robust and well-established, and results cannot be distorted by conscious strategies of participants. Moreover, it was shown that quite a few studies have successfully used this paradigm to investigate the relationship between complex words and their parts in the context of other linguistic debates. The overview of research on frequency effects in English bi-morphemic derivatives in section 4.3.3 revealed that although suggestive findings have been reported, they are not directly relevant to the research questions at hand. One problem resides in the fact that these experiments used lexical decision without priming and therefore did not explore meronomic relationships. Moreover, due to their factorial design, most of these studies were not sensitive to the possibility of graded differences between derivatives of different usage frequencies. Section 4.3.4 demonstrated that this problem can be solved by means of parametric fMRI studies, which are not prone to distorting baseline effects and do not coerce continuous data into a categorical format, which leads to enhanced statistical sensitivity and allows us to track gradient effects of frequency. The five experiments to be presented in chapter 5 will examine the relationship between usage frequencies and entrenchment on the basis of a combination of behavioural and neuroimaging experiments. All of these experiments will focus on bimorphemic suffixed derivatives, which have been argued to provide an ideal testing ground for frequency-related entrenchment (cf. section 4.3.3). Three of these experiments will make use of a
parametric fMRI design in combination with multiple regression on response times in masked visual priming with lexical decision. The pattern completion (or: part-to-whole priming) test will examine whether the degree to which a base (e.g., pale) is mentally associated with a derivative that it co-constitutes (e.g., paleness) varies as a function of usage frequency. More concretely, this experiment will involve part-to-whole priming (e.g., pale – PALENESS), the usage-based prediction being that higher entrenchment should correlate with stronger neural and behavioural priming, i.e., quicker response times and weaker neural activity. Section 4.3.1 demonstrated that pattern completion makes for a good candidate to measure gradual changes of processing fluidity. The emancipation (or: whole-to-part priming) test will reverse this order of priming (i.e., gauntness – GAUNT). Under a usage-based view, more entrenched derivative ‘wholes’ should be more emancipated from and less mentally related to the morphemic parts they are made up of. This means that lower frequencies should result in stronger whole-to-part priming. Section 4.3.1 argued that emancipation tests for the existence of a ‘chunked’ or holistic representation, which seems to be a categorical feature. From the present operationalization, it follows that only those variables which turn out to be significant predictors in both tests while at the same time displaying mirror-inverted regression slopes can rightly be regarded as gauging entrenchment. Another way of putting this is that a variable which correlates with significant facilitation in the former test must correlate with significant inhibition in the latter test (or vice versa) to be considered as an entrenchment predictor. It might be helpful to think of this requirement for inversely related meronomic relationships in terms of the following analogy: When one thinks of a steering wheel, one will think of a car; but when one thinks of a car, one will not necessarily think of a steering wheel. Exactly the same kind of asymmetric part-whole-relationship is expected to hold between a highly entrenched derivative and its base morpheme: The base will strongly evoke the derivative, but the derivative will not necessarily be perceived in terms of its base. As argued above (section 4.3.1), these two experiments would be sufficient by themselves to get a grip on the gradual and categorical dimensions of entrenchment, and I therefore consider them as my main experiments. By contrast, the three experiments to be presented in the following are less comprehensive in scope. As a consequence, they should not be regarded as touchstones for the validation or falsification of entrenchment, but rather as supplementary experiments which can be expected to add to our understanding of the phenomenon.
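
To make this criterion concrete, the decision rule can be written out as a small R function (a minimal sketch with hypothetical argument names and invented example values; the actual analyses in chapter 6 derive the relevant slopes and p-values from mixed-effects models):

    # A predictor gauges entrenchment only if it is significant in BOTH tests
    # AND its regression slopes on (log) RT point in opposite directions.
    is_entrenchment_predictor <- function(slope_part_to_whole, p_part_to_whole,
                                          slope_whole_to_part, p_whole_to_part,
                                          alpha = 0.05) {
      significant_in_both <- p_part_to_whole < alpha && p_whole_to_part < alpha
      mirror_inverted     <- sign(slope_part_to_whole) != sign(slope_whole_to_part)
      significant_in_both && mirror_inverted
    }

    # Example with invented numbers: facilitation (negative slope on log RT) in
    # part-to-whole priming, inhibition (positive slope) in whole-to-part priming.
    is_entrenchment_predictor(-0.04, 0.001, 0.03, 0.01)   # TRUE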

The top-down coercion (or: jumbled target priming) experiment will take advantage of the fact that, in order to avoid response bias and to ensure that participants maintain a relatively constant level of attention, lexical decision tasks typically involve as many wrong items (to provide for ‘no’-answers) as items of interest (associated with ‘yes’-answers). This means that data for ‘no’-answers are a natural fall-out of lexical decision paradigms. It was decided to include stems involving jumbled letters in the ‘no’-condition and to code the relevant behavioural and neural responses to explore whether they are modulated by usage frequency. This experiment will be exploratory rather than hypothesis-driven. A Gestalt-inspired interpretation of entrenchment would predict strong frequency effects for ‘false’ jumbled stems under non-primed conditions: If high-frequency derivatives are holistically represented, they will exert top-down coercion on their component parts, and letter transpositions should be more difficult to detect. However, to avoid unconscious response bias through the absence of a prime, it will be necessary to stick to the priming format of the main conditions, and I do not have any hypothesis as to the possible effect of a ‘real-word’ prime on a ‘non-word’ jumbled target (e.g., attachment – ATATCH). A fourth experiment, which has not been mentioned so far, will compare the behavioural results for the two main experiments to those for two control conditions involving monomorphemic items of matched length. If, as suggested in section 2.5, high levels of entrenchment go along with the fusion and accretion of syntagmatic structure, response times to highly entrenched derivatives should be similar to those for monomorphemic items. More specifically, I predict that with higher usage frequency, response times in part-to-whole priming with bimorphemic targets (e.g., pale – PALENESS) should become more and more similar to those in part-to-whole priming with monomorphemic targets (e.g., negle – NEGLECT or approa – APPROACH). In a similar vein, response times in whole-to-part priming with bimorphemic primes (e.g., gauntness – GAUNT) should become increasingly like those in whole-to-part priming with monomorphemic primes (e.g., dialect – DIAL, triple – TRIP). As can be seen from the examples, besides being monomorphemic, word pairs in the control conditions each differ from those in the relevant main conditions along one additional dimension. Thus, unlike primes in the regular pattern completion test, those in the relevant control task do not represent self-contained monomorphemic items. Likewise, unlike targets in the regular emancipation test, those in the relevant control task are not semantically related to the prime word. As a consequence, it seems likely that the control conditions will elicit considerably stronger entrenchment effects than the most strongly entrenched items in the relevant main
conditions. Another implication of this stimulus selection is that the control condition does not lend itself to comparison with the main conditions by means of fMRI analysis, as the method of cognitive subtraction (cf. section 4.2) would not uniquely isolate the component of interest ‘multimorphemic structure’ (as opposed to ‘monomorphemic chunk’). My last experiment, the memory task, will be purely behavioural and concern memory storage, the prediction being that higher-frequency derivatives should be easier to memorize than matched lower-frequency counterparts. While this experiment is not suited to yield direct insights into mental representations, as better mnemonic performance may either be due to holistic storage or to greater strength of association between component parts, it has the potential to provide valuable insights with relatively little experimental effort. Before tackling the experiments in detail, let us anticipate a question that may have arisen in the course of this chapter. While I explained why derivatives make for an ideal candidate to examine entrenchment within the domain of morphology, I did not mention why I decided to focus on morphology in the first place. As mentioned above, this study breaks new ground. Although my paradigm could be readily extended to longer sequences, I decided to start small and to stack the odds in favour of entrenchment by using stimuli that are relatively short and written in plain form. If this study finds evidence in support of entrenchment in bimorphemic strings, it makes sense to conduct experiments with longer sequences to see how far we can push the boundaries of entrenchment. If, by contrast, this study falsifies entrenchment, it seems idle to spend further effort applying this paradigm to other sequences. From a usage-based perspective, it is fully legitimate to explore word-formation to study entrenchment, as this school assumes that there is no qualitative difference between morphological and syntactic knowledge (cf. Booij 2010a,b).

Chapter 5 Experimental design

This chapter, which will present the design for each of the five experimental conditions introduced in section 4.4, is subdivided into two broad sections. Section 5.1 will be devoted to presenting the stimuli for each condition, and section 5.2 will describe the relevant experimental procedures. Each section will first deal with the main experiments, before turning to the supplementary experiments. In the interest of readability, predictor variables such as RELATIVE FREQUENCY will henceforth be printed in SMALL CAPS. A list of all predictor variables and their abbreviations can be found on page xiii.

5.1. Stimuli

5.1.1. Stimuli for the masked priming fMRI study

5.1.1.1. Stimuli for the main conditions

As stated in section 4.3.3, all experiments will focus on suffixed bimorphemic derivatives like kissable, employer, or careerist. The experiments will be restricted to items which are free of idiosyncrasies, as peculiarities of any sort by definition lead to holistic storage at some level (cf. section 2.1). Thus, items which look as if they were constructed from two independently existing morpho-orthographic units, but whose meanings are not composed from the relevant simplex meanings (e.g., jumper, number, toothsome or footling), were excluded. This also holds for derivatives with (synchronically) homonymous bases (e.g., trainable, fineness, fairly, leftist). Likewise, derivatives whose base orthography or pronunciation gets modified under the influence of the suffix (e.g., wrapper, studious or musician; signify or carbonic) were not considered, as these phenomena might entail listedness in some format. Section 3.2 mentioned that emotionally arousing words are known to have a special status in memory. Items like bloody or orgasmic were therefore avoided. It was also attempted to exclude stimuli which are highly jargon- or culture-specific. Thus, voiceless is likely to be distinctly entrenched in the minds of phoneticians. In a similar vein, native speakers from the United Kingdom will probably be more familiar with words like peeress than speakers from the United States.


Another group of stimuli excluded from the analysis comprises items whose usage frequency has been subject to significant changes over the last few years. For example, the usage frequency of the word wireless is likely to have significantly increased with the advent of wireless telecommunication – a fact which will not be accurately reflected by corpora compiled more than a couple of years ago. Bertram and Hyönä (2003) argue that greater word length may favour compositional over whole-word processing as more fixations may be needed, and Staub and Rayner (2007) claim that the number of visual fixations on a word is related to its length. It does indeed seem plausible that a word like antidisestablishmentarianism, which is often claimed to be the longest word of English, has to be segmented for reasons of visual acuity. To make sure that the words under investigation could be processed at a single glance, only derivatives of two to three syllables and between six and ten letters were chosen. To draw on a broad sample of different suffixes exhibiting different usage frequencies and degrees of productivity, items with the following suffixes were considered: -able, -age, -al, -ance, -ant, -dom, -en, -er, -ess, -fold, -ful, -hood, -ify, -ish, -ism, -ist, -less, -let, -like, -ly, -ment, -ness, -or, -ous, -ry, -y. Moreover, for these morphemes, different kinds of productivity measures are available from another study (Hay and Baayen 2001), allowing me to take these measures into account as additional predictors in my response time analyses. The aim of the present study is to find out how far pure TOKEN FREQUENCIES in corpus data can get us in predicting entrenchment. To reduce the risk of prejudging the analysis and to be true to the spirit of usage-based research, which aims to induce generalizations from concrete instances of language use in a maximally empirical fashion, a suffix selection which might reflect theoretical assumptions or a priori classifications was avoided. As cautioned by Paul (1891: xxxvi), abstractions tend to “obtrude themselves between the eye of the observer and the concrete phenomena, and disturb his vision” (for a similar statement, see also Paul 1891: 11). As a consequence, my sample includes suffixes which also function as free morphemes (e.g., -like, -less, -wise), level-I-affixes (e.g., -al and -y), level-II-affixes (e.g., -ness, or -ful; cf. section 3.3.5), affixes which create different output categories (i.e., -ist creates a noun, -ify creates a verb, and -ful does so for an adjective), and adverbial -ly, which is considered an inflectional suffix by some linguists (Haspelmath 1995: 50; Dalton-Puffer and Plag 2001). Likewise, some of the suffixes are homonymic. Thus, the English suffix -er can be used to build both the synthetic
comparative form of adjectives (e.g., greater) and agentive deverbal nouns (e.g., teacher). The stimuli were extracted from the English version of the CELEX lexical database (Baayen, Piepenbrock, and Gulikers 1995), which is based on a version of the COBUILD Corpus of the University of Birmingham (about 17.9 million words). This corpus contains mainly British English texts, but data from other varieties of English (e.g., American and Australian English) are also included. The corpus contains 16.6 million tokens of written text and 1.3 million tokens of transcripts of spoken interactions (for more details on COBUILD, cf. Sinclair 1987). To make sure that my stimuli covered a large and representative TOKEN FREQUENCY spectrum, I started the stimulus selection by dividing the group of eligible stimuli (as defined above) into three SURFACE FREQUENCY bins: high, middle and low. As mentioned in section 4.3.3, the SURFACE FREQUENCY of a derivative denotes the sum of absolute frequencies of its inflectional variants (e.g., the summed frequency of government and governments) in a corpus. With regard to derivatives, SURFACE (or: LEMMA) FREQUENCIES are thus equivalent to TOKEN FREQUENCIES, which are assumed to determine entrenchment in usage-based model-building (cf. section 2.2). Note that derivatives exhibiting multiple entries as a result of conversion (e.g., combatant, which is listed both as a noun and as an adjective) had to be excluded from the main experiments for practical reasons, since it was not clear how their SURFACE FREQUENCIES should be counted. Moreover, the experiments were restricted to derivatives occurring at least once in the corpus.26

26. Note that the lemmas included in CELEX were drawn from two dictionaries before getting tagged with corpus-extracted information. As a result, some lemmas are associated with a usage frequency of zero.

From each frequency bin, I then went on to select items that were maximally distinct in terms of BASE FREQUENCY. Remember from section 4.3.3 that this notion refers to TOKEN FREQUENCIES at the root level, and that quite a few psycholinguistic studies attest to BASE FREQUENCY effects on morphological processing. Although effects at this token level are not predicted under the usage-based view, it is important to find out whether they are relevant to entrenchment and if so, whether they interact with SURFACE FREQUENCIES. Moreover, the manipulation along the BASE FREQUENCY dimension allows us to keep track of the potential effects of RELATIVE FREQUENCY, a composite measure which results from dividing
SURFACE by BASE FREQUENCY, and which has also been claimed to affect morphological processing (cf. section 4.3.3). For example, the high-frequency item government has a SURFACE FREQUENCY of 7693 and a BASE FREQUENCY of 340; it therefore has a RELATIVE FREQUENCY of approximately 22.63 (i.e., the ratio of 7693 and 340). Likewise, the low-frequency item speakable has a SURFACE FREQUENCY of 1 and a BASE FREQUENCY of 6655; its RELATIVE FREQUENCY therefore amounts to about 0.00015. The scatterplot in Figure 11 presents the relationship between SURFACE and RELATIVE FREQUENCIES in my stimulus sample (the low-frequency bin is in black, the middle-frequency bin is in light gray, and the high-frequency bin is in dark gray). This figure also illustrates a point which will be important to keep in mind in conducting regression analyses: the fact that both frequency measures are strongly correlated in my data (Spearman’s rank correlation: rho = 0.711; p < 0.001). Note that in compliance with common psycholinguistic practice, I use logarithmically transformed numerical values for many predictor variables as well as the response time (RT) variable. For example, the SURFACE FREQUENCY of government is assigned the log-transformed value of 8.948066 (LOG RELATIVE FREQUENCY: 3.119120), while the SURFACE FREQUENCY of speakable is transformed into the log-frequency of 0 (LOG RELATIVE FREQUENCY: -8.803124). The reasons for this transformation are threefold. First, it is assumed that log-transformed frequencies represent a more accurate translation of the cognitive effect of perceived frequencies than raw frequencies (the frequency difference between 1 and 10 will be cognitively more relevant than that between 10001 and 10010). Second, the log-transformation of asymmetric predictor variables reduces the risk of overly influential outlier stimuli distorting the effects, thus obscuring the main trend characterizing most data points (Baayen 2008a). Third, with regard to response variables like RTs, the transformation allows us to reduce the skewing in the distribution of response variables, which is necessary in view of the normality assumptions underlying many statistical methods such as regression modelling and ANOVA.
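
Since these values will recur throughout the analyses, the computations can be spelled out briefly. The following R lines reproduce them for the two examples just given, using the CELEX counts reported above (a minimal illustration, not the actual extraction script):

    # CELEX counts reported above: SURFACE FREQUENCY is the summed frequency of a
    # derivative's inflectional variants, BASE FREQUENCY the token frequency of its root.
    surface <- c(government = 7693, speakable = 1)
    base    <- c(government = 340,  speakable = 6655)

    # RELATIVE FREQUENCY = SURFACE FREQUENCY / BASE FREQUENCY
    relative <- surface / base
    relative          # government ~ 22.63, speakable ~ 0.00015

    # Natural-log transformation, as used for the predictors and the RT variable
    log(surface)      # 8.948066 for government, 0 for speakable
    log(relative)     # 3.119120 for government, -8.803124 for speakable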

Figure 11. LOG SURFACE FREQUENCIES as a function of LOG RELATIVE FREQUENCIES for 216 English suffixed derivatives in three surface frequency bins (low: black; middle: light gray; high: dark gray).

The stimuli from the different frequency bins were matched as much as possible for suffix, NUMBER OF SYLLABLES and NUMBER OF LETTERS. Additionally, all psycholinguistic parameters which are independently known to potentially affect word processing, but which could not be held constant across bins, were included in the regression analyses as control variables. When these parameters were not available from the CELEX database, they were extracted from the Hyperspace Analogue to Language (HAL) corpus via the search engine of the English Lexicon Project (ELP) at http://elexicon.wustl.edu/ (Balota et al. 2007). HAL contains approximately 131 million words gathered from about 3000 internet news groups in 1995.


For subsequent comparison with CELEX, SURFACE and BASE FREQUENCY data were also collected. Note that the ELP interface also provides access to the Kučera and Francis (1967) frequency norms, which are derived from a corpus of approximately one million words of American English texts and which I also wanted to include in my comparison. Unfortunately, however, it turned out that many of the derivatives under investigation were not included in these frequency norms, thereby further substantiating the hypothesis that this corpus might not be appropriate for psycholinguistic purposes (e.g., Balota et al. 2004; Brysbaert and New 2009; Zevin and Seidenberg 2002). The following control variables were extracted for each base and derivative, respectively: NUMBER OF SYLLABLES IN THE DERIVATIVE; NUMBER OF PHONEMES; NUMBER OF PHONOLOGICAL NEIGHBOURS; NUMBER OF ORTHOGRAPHIC NEIGHBOURS; NUMBER OF PHONOGRAPHIC NEIGHBOURS; LOG AVERAGE BIGRAM FREQUENCY; LOG SUMMED BIGRAM FREQUENCY BY POSITION and LENGTH IN LETTERS. Moreover, the DERIVATIVE-BASE LETTER RATIO was computed for each item. While the first variables are self-explanatory, the latter deserve explanation. The NUMBER OF (PHONOLOGICAL, ORTHOGRAPHIC and PHONOGRAPHIC) NEIGHBOURS refers to the number of words which can be obtained by changing a single element (phoneme, letter, or both) (e.g., Coltheart et al. 1977). In other words, neighbourhood measures refer to the degree of similarity that a given word exhibits to other words. Words exhibiting many neighbours have been shown to be recognized more slowly than those displaying ‘sparse’ neighbourhoods, presumably as a result of competition between similar items (e.g., Luce and Pisoni 1998; Imai, Walley, and Flege 2005). Conversely, ‘dense’ neighbourhoods have been found to be facilitatory in word production (Goldrick and Rapp 2007). The notion of BIGRAM FREQUENCY (or: SUBLEXICAL FREQUENCY) refers to the frequency of a sequence of two letters, for example the frequency of TH in English. The AVERAGE BIGRAM FREQUENCY sums the bigram frequencies of a word, before dividing this sum by the number of consecutive bigrams. The SUMMED BIGRAM FREQUENCY BY POSITION additionally takes into account the position of bigrams within a word (Balota et al. 2007). This means that, for example, the SUMMED BIGRAM FREQUENCY BY POSITION count for BR in brain will only consider words starting in BR (e.g., brand, brisk, etc.). Moreover, each stimulus was coded for measures which seemed theoretically interesting, either because they have been reported to be relevant to morphological processing in the literature or because they represent potential confounds to the TOKEN FREQUENCY measures presented above. These
measures can be subdivided into two broad classes: base-related measures and affix-related measures. In very general terms, base-related measures can be seen as gauging the degree of paradigmatic uncertainty as to the word constituent which comes next (i.e., to the right of the base). More specifically, LOG MORPHOLOGICAL FAMILY SIZE (or: LOG FAMILY SIZE; BASE PRODUCTIVITY) counts the number of members in the morphological family of a base. In other words, this parameter refers to the type frequency of morphologically complex words in which the base occurs as a constituent (cf. Pylkkänen et al. 2004; Moscoso del Prado Martín, Kostić, and Baayen 2004; Lüdeling and de Jong 2002; Hay and Baayen 2005). For example, the morphological family of thick includes thicken, thickly and thickness. It seems natural to assume that a derivative containing a highly versatile base should be more readily segmentable than a derivative with a base whose occurrence is more context-dependent, and there is, indeed, some psycholinguistic research supporting this view (De Jong et al. 2002). The LOG MORPHOLOGICAL FAMILY FREQUENCY (or: LOG FAMILY FREQUENCY) variable relates to the summed absolute frequency of the root in derived and compound words (Burani and Caramazza 1987; Schreuder and Baayen 1997). By contrast, affix-related measures relate to the productivity of suffixes, assessing the degree to which a suffix can be used to form new words. This study will take into account productivity measures which are readily available for the suffixes under consideration (from Hay and Baayen 2001, on the basis of CELEX). These include the so-called REALIZED PRODUCTIVITY (or: TYPE FREQUENCY) of an affix, which refers to the number of distinct words in which the affix occurs; the NUMBER OF HAPAX LEGOMENA with a given suffix; and the CATEGORY-CONDITIONED DEGREE OF PRODUCTIVITY (or: POTENTIAL PRODUCTIVITY), which is gauged by dividing the number of hapaxes with a given affix by the absolute frequency of words containing the affix (Baayen 2009; Plag 2006). It seems likely that more productive affixes will be easier to segment out. On the whole, 72 stimuli for each of the three frequency bins were extracted. Each frequency bin was then split up into two equal groups of 36 closely matched items (in terms of suffix, NUMBER OF SYLLABLES IN THE DERIVATIVE, and NUMBER OF LETTERS IN THE BASE), with one group being assigned to the whole-to-part priming task (e.g., government – GOVERN), and the other group providing the stimuli for the part-to-whole priming task (e.g., settle – SETTLEMENT). A complete list of the stimuli in the different frequency bins can be found in the appendix (Table 29 to Table 34).
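
To illustrate how these base- and affix-related measures are derived from corpus counts, the following R sketch computes them over a small toy lexicon (invented counts, purely for exposition; the actual values come from CELEX and from Hay and Baayen 2001):

    # Toy lexicon: each row is a complex word with its base, suffix and token frequency.
    lex <- data.frame(
      word   = c("thicken", "thickly", "thickness", "gauntness", "broadness", "kissable"),
      base   = c("thick",   "thick",   "thick",     "gaunt",     "broad",     "kiss"),
      suffix = c("-en",     "-ly",     "-ness",     "-ness",     "-ness",     "-able"),
      freq   = c(52,        310,       128,         1,           4,           17)
    )

    # Base-related measures, here for the base 'thick':
    fam              <- lex[lex$base == "thick", ]
    family_size      <- nrow(fam)       # MORPHOLOGICAL FAMILY SIZE: number of family members
    family_frequency <- sum(fam$freq)   # MORPHOLOGICAL FAMILY FREQUENCY: summed token frequency

    # Affix-related measures, here for the suffix '-ness':
    ness           <- lex[lex$suffix == "-ness", ]
    realized_prod  <- nrow(ness)                # REALIZED PRODUCTIVITY: number of distinct types
    hapaxes        <- sum(ness$freq == 1)       # NUMBER OF HAPAX LEGOMENA with this suffix
    potential_prod <- hapaxes / sum(ness$freq)  # CATEGORY-CONDITIONED DEGREE OF PRODUCTIVITY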


5.1.1.2. Stimuli for the ‘no’-conditions

As mentioned in section 4.4, lexical decision experiments typically involve as many wrong items (to provide for ‘no’-answers) as items of interest (associated with ‘yes’-answers) in order to ensure that participants process the targets at some processing depth and to maintain a relatively constant level of attention throughout the experiment. To avoid response bias, the differences between ‘yes’- and ‘no’-conditions have to be kept to a minimum. This means that I had to extract an additional 216 derived words from CELEX, half of which were assigned to the ‘false’ whole-to-part priming task, with the other half being used for the ‘false’ part-to-whole priming task. Care was taken to select stimuli which covered the whole SURFACE FREQUENCY range and approximated the letter lengths of the main conditions. Moreover, the different suffixes used in the ‘no’-conditions were represented in proportion to their share in the main conditions. However, I had to be more flexible with regard to other criteria. Thus, I included the homonymic base still and the base herb, which could lead to confusion as a similar word with a different meaning also exists in German (German herb corresponds to English ‘bitter’). Moreover, some stimuli had a BASE or SURFACE FREQUENCY of zero. To avoid the log of zero, these stimuli were assigned the value of 1 before transformation. The targets for the part-to-whole task were distorted by creating illegal base-suffix combinations, giving rise to pairs involving impossible derivative targets (e.g., ticket – TICKETMENT, pool – POOLAGE and solar – SOLARER). The targets for the whole-to-part task were constructed by changing the position of a consonant in each base, which created targets with jumbled letters (e.g., sugary – SGUAR, builder – BLUID, seasonal – SESAON and priceless – PIRCE). When possible, I created transpositions that did not affect word boundaries, as the jumbling of exterior letters has been shown to be particularly disruptive to the word superiority effect (cf. section 4.3.1). Obviously, this was not possible in the case of three-letter bases like owl, which gave rise to the priming pair owlish – WOL. Remember that the word superiority effect explains the ease of letter processing in words compared to non-words by invoking top-down activation from unitized word-level representations. This invites the hypothesis that if entrenchment does indeed create or reinforce unit status, jumbled high-frequency targets should be easier to read than jumbled low-frequency targets. Conversely, errors should be more difficult to detect under conditions of higher entrenchment. As mentioned above, I decided to capitalize on the fact that behavioural and neural responses to such items are a natural fall-out of my paradigm, and to explore whether top-down coercion in
jumbled items is affected by usage frequency. However, it has to be emphasized that this part of the experiment was exploratory rather than hypothesis-driven. Although a Gestalt-inspired interpretation of the word superiority theory would predict strong frequency effects under non-primed conditions, I had to stick to the priming format from the main conditions, and did not have any hypothesis as to the possible effect of a ‘real-word’ prime on a ‘non-word’ jumbled target. A complete list of the stimuli in the ‘no’-conditions can be found in the appendix (Tables 37 and 38).

5.1.1.3. Stimuli for the control conditions

The experiment also involved three control conditions, which will be briefly presented in this section. The aim of the control conditions was twofold: First, I wished to contribute to the ongoing debate on whether masked priming taps into conceptual rather than merely orthographic representations on the basis of 36 morpho-orthographically segmentable primes (cf. section 4.3.2). This topic is not of immediate interest to entrenchment and will be the subject of a separate publication. It is mentioned here only to provide a complete picture of the experimental situation and the stimuli presented. Second – and this is of considerable relevance to entrenchment – in consonance with usage-based assumptions, I hypothesized that higher entrenchment should make complex strings more unit-like in the sense that their processing should be more similar to the processing of monomorphemic words of equal letter length. More specifically, I predicted that with higher usage frequency, response times in the pattern completion test (i.e., in part-to-whole priming with bimorphemic targets) should become more and more similar to response times in part-to-whole priming with monomorphemic targets (e.g., negle – NEGLECT or approa – APPROACH). This prediction was tested on the basis of 36 items extracted from the HAL corpus. To maximize priming effects, I chose monomorphemic items whose onsets do not have frequent competitors. Thus, words like control were excluded because the first letters of control are likely to activate a range of alternative entries like contrast, contract, contain, etc. This also holds for items like maintain, which includes a self-contained word at its onset (main). When I did use onsets with competitors (e.g., cathed priming CATHEDRAL), these competitors were either extremely rare (e.g., catheter) or I made sure to include their point of disambiguation in the prime. For example, the letters vic represent the onset for different words (e.g.,
victim, vicar, victory, vice). To exclude any competition between these entries, I presented victi as a prime for the target VICTIM. Some of the target words (e.g., difficult, problem, character) are themselves part of existing longer words (difficulty, problematic, characteristic). This was accepted as long as the containing and contained words did not diverge in segmental pronunciation (differences in stress were accepted). The pairs in this condition were matched with pairs from the main conditions in terms of orthographic overlap between prime and target. In a similar vein, as far as the emancipation test (i.e., whole-to-part priming with bimorphemic primes) is concerned, I hypothesized that with higher entrenchment, response times should become increasingly similar to response times for whole-to-part priming with monomorphemic primes (e.g., dialect – DIAL, triple – TRIP). This prediction was examined on the basis of 36 pairs, which were tightly matched with pairs from the main conditions in terms of orthographic overlap between prime and target. As can be seen from the examples, the targets represented existing words which are not semantically or morphologically related to the primes. The priming pairs were restricted to stimuli exhibiting the same pronunciation, thus excluding pairs like lament – LAME or stampede – STAMP. Moreover, care was taken to select primes which are not morpho-orthographically segmentable (such as hardly or corner), as various empirical studies have indicated that such primes might be subject to initial decomposition in the processing stages reflected by masked priming (for details and references, see chapter 4.3.2). As a result, items of this kind do not provide an appropriate point of reference for holistic processing. A complete list of the stimuli in the control conditions is to be found in the appendix (Tables 35 and 36).

5.1.2. Stimuli for the memory experiment

The memory task was a simple pen-and-paper experiment consisting of two lists of words, with each list containing 20 derivatives from the whole-to-part priming condition. This means that the stimuli under investigation had not been consciously perceived in the prior fMRI experiment due to masking. The low-frequency list was restricted to words from the low-frequency bin, while the high-frequency list only included words from the high-frequency bin (i.e., from the black and the dark gray group on the scatterplot in Figure 11, respectively). Words on the lists were closely matched in terms of length in letters. As certain suffixes preferentially occur in certain frequency bins, it was impossible to match the lists for suffixes. It was
therefore decided to match the suffixes in terms of length and number of repetitions, such that, for example, as the low-frequency group contained three derivatives in -ness (broadness, commonness, gauntness), the high-frequency list contained three words ending in -ment (amazement, government, equipment) (the full lists are available in the appendix).

5.2. Experimental procedures

5.2.1. Experimental procedure for the masked priming fMRI studies

Nineteen right-handed native speakers of English with normal or corrected-to-normal vision who reported no neurological or developmental disorders were recruited (9 male, 10 female, mean age: 25.9 years, range: 19-61 years, SD: 9.9 years). Most participants were foreign exchange students, with the majority coming from the United States (13 subjects), and the others coming from New Zealand (two subjects), the Anglophone part of Canada (two subjects), Great Britain (one subject) and Australia (one subject). None of the participants presented contraindications to fMRI scanning. Participants were paid for their participation. To avoid strong interference between different languages, the sample was restricted to participants who were not bi- or multilingual (in the sense of having achieved native or native-like command in at least one language other than English before adolescence). All participants gave written informed consent after the experimental procedure had been explained to them and all their questions had been answered in a way they considered to be satisfactory. The study was approved by the ethics committee of the university.27 To familiarize participants with the task and to discuss possible questions, they were given an offline practice session of 40 prime-target pairs (20 whole-to-part, 20 part-to-whole, with half of the targets being non-words, respectively) before the actual fMRI experiment started. These pairs were similar, but not identical to the stimuli used in the real experiment. Participants made lexical decisions on the targets by pressing the right or the middle button of a three-button mouse with the index or middle finger of their right (i.e., dominant) hand. RTs were recorded with tenth-of-a-millisecond accuracy from target onset. Accuracy feedback was given.

27. If required, the circular to recruit participants, the information sheet, the consent form for participants, the magnetic resonance safety screening form, as well as the application to the ethics committee can be obtained directly from the author.


Before the practice session started, the following instructions were presented in white font on a black computer screen (participants clicked through the relevant slides at their own pace):

1 WELCOME! Please read the following instructions VERY CAREFULLY and ask the experimenter any questions that you may have. Press the space bar to proceed to the instructions.

2 Throughout the experiment, you will be presented with word pairs. Each word pair will be preceded by a row of hash marks (####), and words forming a pair will always be flashed on the screen one after another. Please press the space bar for the next page.

3 Within a pair, the first word will always be a real English word. It will be printed in lowercase and flashed on the monitor so briefly that you will barely be able to read it (if at all). The second word, by contrast, will always be printed in uppercase. It will remain on the screen for quite some time, and it may or may not be a real English word. Please press the space bar for two EXAMPLES IN SLOW MOTION!

4 Your task will now be to focus on the SECOND WORD of each pair and to decide whether it is a real word or a non-word. If it is a real English word, press the left mouse button with your left middle finger. If it is a non-word, press the right mouse button with your left index finger. Please make this decision as quickly and as accurately as possible! If you feel that you have missed an item or hit the wrong button, don't worry! Just go ahead and concentrate on the upcoming items. Also keep in mind that you have to decide whether the SECOND word is a real word; just ignore the first word! Press the space bar to proceed to the next page.

5 Before the practice session starts, note that this experiment involves two different kinds of non-words. Some items are non-words because they have their letters jumbled (e.g., PLIGRIM). Other items are non-words because they are made up of parts which do not fit together (e.g., KINDATION).


In making your decision, please remember that we are neither interested in whether the words in question could occur in poetic, playful or ironic language, nor in whether they exist as proper names or in abbreviations. What we want to know is whether you would consider these words as normal in natural language usage. Press the space bar when you are ready to START THE PRACTICE SESSION.

Nine of the participants were presented with a modified version instructing them to press the right mouse button for real words, and the left mouse button for non-words. The practice of inverting response buttons for half of the participant sample is common in neurolinguistic experiments. It aims to avoid or counterbalance potentially confounding effects of motor activity on RTs and neural activation (for example, if motor responses with the index finger are quicker and easier to perform than those with the middle finger, this may distort the results). While in the great majority of masked priming studies participants are not informed about the presence of masked primes, I decided to do so, as the SOA of 60 ms was somewhat longer than the SOAs in traditional masked priming studies, which tend to be below 50 ms (cf. chapter 4.3.2). I deliberately diverged from this default timing so as not to tap into the earliest stages of visual word processing, which have been shown to be related to the perceptual extraction of morpho-orthographic sub-units, irrespective of conceptual whole-word properties, which are reflected by later processing stages. Although, from a usage-based perspective, it cannot be excluded that extreme cases of chunking may ultimately affect what is considered as a morpho-orthographic unit, it was decided to stack the odds in favour of whole-word effects by focusing on a somewhat later stage than is usual in masked priming studies. In this stage, which is probably best characterized as preconscious (rather than unconscious), stimuli cannot be reported, and experiments therefore cannot be distorted by conscious strategies. Unlike in traditional masked priming, however, some participants will tend to perceive a kind of flash before the target stimulus. To avoid confusion during the experiment, this was anticipated by informing the participants of the primes. (Note that the flash-perception prediction was confirmed by the answers provided on the debriefing questionnaire in the appendix). Stimulus presentation and data recording were controlled by the Presentation software (http://www.neurobs.com/) running on a PC. This commercial software allows for stimulus delivery and response time recordings with tenth-of-a-millisecond precision. The stimuli were displayed in Arial font, size 28, in white letters on a black background.


In more detail, the timing was the following: Each experimental trial lasted 2560 ms. The forward mask, a row of hash marks (####), was displayed for 500 ms, the prime for 60 ms, and the target for maximally 600 ms (shorter if subjects responded before). This sequence was replaced by a black screen, which left an additional 1400 ms for responding. The fact that targets disappeared as soon as subjects responded provided feedback to encourage quick answers. At the same time, the targets were not presented for longer than 600 ms to make sure that neural differences between stimuli triggering short and long RTs could not be attributed to significant differences in visual exposure. The intertrial interval (black screen) varied pseudorandomly between 0 and 2190 ms (mean: 1095 ms) to enable fMRI jittering. Items were presented in four sessions to prevent fatigue, with each session containing 153 stimulus pairs and lasting for approximately 10 minutes. The first session was actually somewhat longer, as it contained an additional short practice block with half of the stimulus pairs from the offline practice session, followed by a countdown before the start of the real experiment, in an attempt to help participants get accustomed to the scanner and its noise. Sessions were separated by short breaks to provide resting time for the participants. Within each session, the order of presentation of different stimulus types was pseudo-randomized in an event-related design (the same for each participant). A schematic representation of a priming trial is provided in Figure 12.

Figure 12. Schematic representation of a priming trial.

Imaging was performed on a Siemens Tim-Trio 3T scanner at the Freiburg Brain Imaging Laboratory, Freiburg University Hospital.

5.2.2. Experimental procedure for the memory task

The instructions for the memory task were as follows: Before we come to the concluding questionnaire, let’s briefly do a very short memory task. You will now be presented with a list of 20 words. You will have one minute to memorize these words, after which I will ask you to write down as many of the words as you can remember. The task will then be repeated with a second list of words.

To counterbalance potential effects of practice or fatigue and of between-list interference on performance, nine subjects were first presented with the high-frequency list, while ten subjects started with the low-frequency list. After the experimental sessions, participants were asked to fill in a brief questionnaire (cf. appendix), which included the following questions relating to the memory task: Which list from the memory task did you find easier to memorize? Can you explain why? Did you make use of any particular strategy which helped you to memorize the words?

Subjects were not informed about frequency differences between the lists, and one aim of this questionnaire was to find out whether they actually perceived any differences and how they analyzed them. Note that due to the relatively open format of the questionnaire, subjects did not provide equally long and exhaustive answers to all questions, making them more readily amenable to qualitative interpretation than to statistical analysis.

Chapter 6 Behavioural data analysis

6.1. Main conditions

6.1.1. General method

Incorrect responses were discarded from the analyses. The relationship between log-transformed RTs and the different predictors presented in section 5.1 was examined using linear mixed-effects models by means of the function lmer from the lme4 package in the R statistical programming environment, version 2.12.1 (Pinheiro and Bates 2000; Baayen, Davidson, and Bates 2008; R Development Core Team 2011). SUBJECT was fed into each model as a random variable to account for variance between individual speakers. Outliers with a standardized residual greater than 2.5 standard deviations from zero were discarded from the models, leading to the removal of 1.0 to 2.76 percent of the data points, depending on the model. P-values of fixed effects and highest posterior density confidence intervals were estimated by means of Markov chain Monte Carlo sampling on the basis of 10,000 samples with the function pvals.fnc from the languageR package (Baayen 2008).
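To make the modelling pipeline explicit, the steps just described might look as follows in R (a sketch with a hypothetical data frame dat and illustrative variable names; pvals.fnc presupposes the older lme4 versions current at the time):

    library(lme4)         # pre-1.0 version, as required by languageR's pvals.fnc
    library(languageR)

    # Simple mixed-effects model with SUBJECT as a random intercept
    m0 <- lmer(logRT ~ logRelFreq + (1 | Subject), data = dat)

    # Discard data points with standardized residuals beyond 2.5 SD and refit
    keep <- abs(scale(resid(m0))[, 1]) < 2.5
    m1   <- lmer(logRT ~ logRelFreq + (1 | Subject), data = dat[keep, ])

    # MCMC-based p-values and 95% HPD intervals (10,000 samples)
    pvals.fnc(m1, nsim = 10000)$fixed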

6.1.2. Simple linear mixed-effects regression analyses

Remember that section 4.4 claimed that entrenchment should be associated with asymmetric meronomic relationships between complex sequences and their morphemic constituents. More concretely, drawing on findings from Gestalt psychology, it was hypothesized that with higher entrenchment, parts (i.e., constituent morphemes) should increasingly prime wholes (i.e., complex sequences), whereas wholes should decreasingly prime parts. Applied to the bimorphemic derivatives at hand, this asymmetry should be reflected behaviourally by enhanced facilitation (or shorter log RTs) in masked part-to-whole priming and enhanced inhibition (or longer log RTs) in masked whole-to-part priming. From the foregoing, it follows that only predictors which turn out to be significant in both main conditions while at the same time displaying mirror-inverted regression slopes can rightly be regarded as gauging entrenchment. In other words, the usage-based hypothesis that TOKEN FREQUENCIES correlate with entrenchment will be considered true only if the following two conditions are satisfied simultaneously:

(i) Higher LOG TOKEN FREQUENCIES lead to significantly lower log RTs in part-to-whole priming (e.g., cheer – CHEERFUL);
(ii) Higher LOG TOKEN FREQUENCIES lead to significantly higher log RTs in whole-to-part priming (e.g., marginal – MARGIN).

I started the regression analysis of log RTs by investigating the predictive power of each independent variable in isolation for each of the two main conditions. From among the models whose single fixed-effect factor survived the α-level of p < 0.05 in both main conditions, I identified those whose coefficients had opposite signs. This left me with three predictors which, according to my operationalization, must be seen as indexing entrenchment: LOG RELATIVE FREQUENCY, DERIVATIVE-BASE LETTER RATIO, and NUMBER OF PHONOGRAPHIC NEIGHBOURS OF THE BASE. Table 1 to Table 6 present the fixed-effect part of each model, with Table 1 and Table 2 referring to LOG RELATIVE FREQUENCIES, Table 3 and Table 4 relating to DERIVATIVE-BASE LETTER RATIOS, and Table 5 and Table 6 concerning the NUMBER OF PHONOGRAPHIC NEIGHBOURS OF THE BASE.
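Before turning to the tables themselves, the screening step just described might be sketched as follows (illustrative code; the data frames dat.p2w and dat.w2p and the predictor names are placeholders rather than the actual analysis script):

    library(lme4)
    library(languageR)

    predictors <- c("logTokenFreq", "logRelFreq", "DerBasLetRat", "PhonGraNeiBa")

    # Fit each predictor on its own and extract its MCMC summary
    screen <- function(pred, dat) {
      m <- lmer(as.formula(paste("logRT ~", pred, "+ (1 | Subject)")), data = dat)
      pvals.fnc(m, nsim = 10000)$fixed[pred, ]
    }
    p2w <- lapply(predictors, screen, dat = dat.p2w)   # part-to-whole priming
    w2p <- lapply(predictors, screen, dat = dat.w2p)   # whole-to-part priming
    # Retain only those predictors that reach p < 0.05 in both conditions
    # with mirror-inverted (opposite-sign) slopes.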

Table 1. Coefficient for the single fixed-effect factor LOG RELATIVE FREQUENCY in a mixed-effects model fitted to log RTs in part-to-whole priming. MCMCmean: Markov chain Monte Carlo mean for the estimated coefficients; Lower, Upper: 95% highest posterior density intervals; pMCMC: Markov chain Monte Carlo p-value.

                 Estimate   MCMCmean     Lower     Upper    pMCMC
    (Intercept)    9.0079     9.0074    8.9500     9.065   0.0001
    logRelFreq    -0.0285    -0.0285   -0.0331    -0.024   0.0001

Table 2. Coefficient for the single fixed-effect factor LOG RELATIVE FREQUENCY in a mixed-effects model fitted to log RTs in whole-to-part priming.

                 Estimate   MCMCmean     Lower     Upper    pMCMC
    (Intercept)    8.8946     8.8943    8.8446    8.9405   0.0001
    logRelFreq     0.0084     0.0084    0.0053    0.0114   0.0001

Table 3. Coefficient for the single fixed-effect factor DERIVATIVE-BASE LETTER RATIO in a mixed-effects model fitted to log RTs in part-to-whole priming.

                  Estimate   MCMCmean     Lower     Upper    pMCMC
    (Intercept)     8.8607     8.8609    8.7635    8.9523   0.0001
    DerBasLetRat    0.1341     0.1338    0.0862    0.1822   0.0001

Table 4. Coefficient for the single fixed-effect factor DERIVATIVE-BASE LETTER RATIO in a mixed-effects model fitted to log RTs in whole-to-part priming.

                  Estimate   MCMCmean     Lower     Upper    pMCMC
    (Intercept)      9.008     9.0075    8.9322    9.0829   0.0001
    DerBasLetRat    -0.086    -0.0859   -0.1233   -0.0493   0.0001

Table 5. Coefficient for the single fixed-effect factor PHONOGRAPHIC NEIGHBOURHOOD SIZE OF BASE in a mixed-effects model fitted to log RTs in part-to-whole priming.

                  Estimate   MCMCmean     Lower     Upper    pMCMC
    (Intercept)     9.0599     9.0594     9.004    9.1181   0.0001
    PhonGraNeiBa    0.0054     0.0054     0.002    0.0088   0.0014


Table 6. Coefficient for the single fixed-effect factor PHONOGRAPHIC NEIGHBOURHOOD SIZE OF BASE in a mixed-effects model fitted to log RTs in whole-to-part priming.

                  Estimate   MCMCmean     Lower     Upper    pMCMC
    (Intercept)     8.9004     8.9003    8.8546    8.9481   0.0001
    PhonGraNeiBa   -0.0083    -0.0083   -0.0105   -0.0062   0.0001

To illustrate the fact that the predictors have opposite effects in the two conditions, their effects are visualized in Figure 13 to Figure 15 by means of the function plotLMER.fnc from the languageR package. The left side of each figure relates to part-to-whole priming, while the right side refers to whole-to-part priming. To familiarize the reader with such figures, and because LOG RELATIVE FREQUENCIES will take on a special importance in the course of this work, the relevant figure is presented in a larger format than those for the other predictors.
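A figure of this kind can be obtained from a fitted model with plotLMER.fnc roughly as follows (a sketch; m.p2w and m.w2p stand for the single-predictor models underlying Table 1 and Table 2, and the labels are illustrative):

    library(languageR)

    par(mfrow = c(1, 2))                       # left panel and right panel
    plotLMER.fnc(m.p2w, pred = "logRelFreq",   # model for part-to-whole priming
                 ylabel = "log RT")
    plotLMER.fnc(m.w2p, pred = "logRelFreq",   # model for whole-to-part priming
                 ylabel = "log RT")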

Figure 13. Effect size for the single independent variable LOG RELATIVE FREQUENCY in a linear mixed-effects model fitted to log RTs in part-to-whole priming (left panel) and whole-to-part priming (right panel). The solid black lines represent predicted log RT values. The dotted black lines visualize the MCMC-based HPD intervals for these values.

Figure 14. Effect size for the single independent variable DERIVATIVE-BASE LETTER RATIO in a linear mixed-effects model fitted to log RTs in part-to-whole priming (left panel) and whole-to-part priming (right panel).

Figure 15. Effect size for the single independent variable NUMBER OF PHONOGRAPHIC NEIGHBOURS OF BASE in a linear mixed-effects model fitted to log RTs in part-to-whole priming (left panel) and whole-to-part priming (right panel).

Crucially, these statistics show that contrary to usage-based expectations, the LOG TOKEN FREQUENCIES of derivatives are not predictive of entrenchment. By contrast, and in line with Hay’s (2001) results (cf. section 4.3.3), LOG RELATIVE FREQUENCIES can rightly be viewed as gauging entrenchment, since higher LOG RELATIVE FREQUENCIES yield lower log RTs in part-to-whole priming, but higher log RTs in whole-to-part priming. In other words, a base morpheme will strongly evoke a derivative which is more frequent than the base itself is in isolation. Conversely, the perception of a derivative which is more frequent than its base will be relatively autonomous of the base morpheme. Importantly, the other significant entrenchment predictors are not directly related to frequencies. The results for DERIVATIVE-BASE LETTER RATIOS show that derivatives with a relatively longer suffix will be less entrenched. Although I am not aware of any claim to this effect in the literature, this finding seems intuitively plausible, as a longer suffix will be perceptually more salient and therefore presumably more prone to extraction. Another finding that does not seem to have been discussed in the literature is that derivatives including bases with more PHONOGRAPHIC NEIGHBOURS are less entrenched, a result which is easily accounted for when we consider that language users arguably cannot afford to store morphemes with many competitors in a fuzzy holistic Gestalt form. A morpheme ‘living’ in a dense neighbourhood must be stored in a ‘high-resolution’ format including precise phonographic detail in order to be distinguishable from similar items. At the same time, the fact that phonographic rather than orthographic neighbourhood values yielded significant results suggests an interesting point which I had not anticipated, namely that visual word recognition does not exclusively rely on visual information, but also on the sound structure of a word.

In a second step, I decided to include these three predictors in a multiple mixed-effects regression model with lmer to test whether they are jointly predictive of entrenchment for each condition. I found that although the collinearity between the predictors was not unreasonably high (the R function collin.fnc yielded a kappa value of 17.35 for part-to-whole priming and 18.07 for whole-to-part priming), the predictors were not jointly significant in either model. Thus, as shown in Table 7 and Table 8, the LOG RELATIVE FREQUENCY predictor was the only variable to survive in both conditions.
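The joint model and the collinearity diagnostic might be coded along these lines (a sketch; dat.p2w is a hypothetical data frame for the part-to-whole condition, and the predictor names are illustrative):

    library(lme4)
    library(languageR)

    # Are the three entrenchment predictors jointly significant?
    m.joint <- lmer(logRT ~ logRelFreq + PhonGraNeiBa + DerBasLetRat + (1 | Subject),
                    data = dat.p2w)
    pvals.fnc(m.joint, nsim = 10000)$fixed

    # Condition number kappa for the three fixed-effect predictors
    collin.fnc(dat.p2w[, c("logRelFreq", "PhonGraNeiBa", "DerBasLetRat")], 1:3)$cnumber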


Table 7. Coefficients for the fixed-effect factors in the mixed-effects model fitted to log RTs in part-to-whole priming. MCMCmean: Markov chain Monte Carlo mean for the estimated coefficients; Lower, Upper: 95% highest posterior density intervals; pMCMC: Markov chain Monte Carlo p-value.

                  Estimate   MCMCmean     Lower     Upper    pMCMC
    (Intercept)     9.0194     9.0193    8.9219    9.1181   0.0001
    logRelFreq     -0.0284    -0.0284   -0.0335   -0.0230   0.0001
    PhonGraNeiBa    0.0015     0.0015   -0.0019    0.0050   0.4052
    DerBasLetRat   -0.0093    -0.0095   -0.0612    0.0482   0.7398

Table 8. Coefficients for the fixed-effect factors in the mixed-effects model fitted to log RTs in whole-to-part priming.

                  Estimate   MCMCmean     Lower     Upper    pMCMC
    (Intercept)     8.9247     8.9249    8.8475    9.0069   0.0001
    logRelFreq      0.0080     0.0080    0.0043    0.0114   0.0001
    PhonGraNeiBa   -0.0082    -0.0083   -0.0105   -0.0060   0.0001
    DerBasLetRat   -0.0002    -0.0002   -0.0432    0.0462   0.9936

This section has shown that the phenomenon of entrenchment correlates with several usage variables, with LOG RELATIVE FREQUENCY emerging as the most robust entrenchment predictor.


6.1.3. Multiple mixed-effects regression analyses

6.1.3.1. Introductory remarks on methodology

To obtain a more complete picture of the variables modulating the behavioural responses in the main experiments, the log RTs from the main conditions were subjected to multiple mixed-effects regression with backward elimination. This method makes it possible to explore the correlation between log RTs and all potential predictor variables for each condition on its own. The variables in the final models were arrived at by means of manual backward elimination, which means that the analysis starts with all potential predictors, and that non-significant predictors are deleted one by one until only significant predictors are left. To avoid a model whose predictors are too strongly correlated, the analysis for each condition of interest started by visualizing the correlational structure of all variables. To get a first impression of potential collinearities, I used the function pairscor.fnc, which plots a scatterplot matrix of pairwise correlations between all predictors and indicates both nonparametric (Spearman) and parametric (Pearson) correlations. In a second step, I conducted a hierarchical agglomerative cluster analysis with the function hclust (after creating a squared correlation matrix with the function cor and converting this matrix into a distance object with dist). From clusters of highly correlated predictors which were not, or not directly, frequency-related (e.g., variables relating to the NUMBER OF PHONOLOGICAL, ORTHOGRAPHIC and PHONOGRAPHIC NEIGHBOURS or LENGTH IN LETTERS), I selected the variable which was most strongly correlated with log RT and discarded the other predictors as a strategy to reduce collinearity. By contrast, highly collinear frequency-related variables (notably LOG RELATIVE and SURFACE FREQUENCIES, as well as LOG FAMILY FREQUENCY and LOG FAMILY SIZE) were retained in the model and compared with a view to examining which of them (if any) provided the best model fit, everything else being equal. Variables with a significance level above p = 0.05 were successively dropped from the model, after excluding the existence of significant quadratic relationships. The partial effects for each predictor in the final model (i.e., the effects for a given predictor when the other predictors in the model are held constant) were visualized by means of the function plotLMER.fnc. Partial effects for polynomials were drawn on the basis of the same function, after refitting the relevant mixed-effects model with a term for a raw polynomial. Collinearity between predictors in the final model was assessed by means of collin.fnc, and the condition number kappa never exceeded 16.61.
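The collinearity screening and the test for quadratic terms might be implemented along the following lines (a sketch; preds is a hypothetical character vector holding the candidate predictor names, and dat.p2w is an illustrative data frame):

    library(lme4)
    library(languageR)

    # Scatterplot matrix with Spearman and Pearson correlations for the candidate predictors
    pairscor.fnc(dat.p2w[, preds])

    # Hierarchical agglomerative clustering based on a squared correlation matrix
    plot(hclust(dist(cor(dat.p2w[, preds])^2)))

    # Checking whether a raw quadratic term improves on the linear fit
    m.lin  <- lmer(logRT ~ logRelFreq + logFamFreq + (1 | Subject), data = dat.p2w)
    m.quad <- lmer(logRT ~ logRelFreq + logFamFreq + I(logFamFreq^2) + (1 | Subject),
                   data = dat.p2w)
    anova(m.lin, m.quad)                  # likelihood-ratio comparison of the two fits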


Linear predictors in the final model were tested to check whether the inclusion of a quadratic term made a significant addition to the model.

6.1.3.2. Multiple regression results for part-to-whole priming

Manual backward elimination in multiple mixed-effects regression analysis revealed that a model including LOG FAMILY FREQUENCY (involving both a linear and a quadratic component) and LOG RELATIVE FREQUENCY as predictors was best suited to predict log RTs in part-to-whole priming (cf. Table 9). In line with my findings from section 6.1.2, higher LOG RELATIVE FREQUENCIES yielded significantly lower log RTs, as illustrated by the partial effects plot in Figure 16 (left panel). The linear component of LOG FAMILY FREQUENCY indicates that derivatives whose bases exhibit high TOKEN FREQUENCIES within complex words are reacted to more quickly. The positive slope for the quadratic component of LOG FAMILY FREQUENCY complements this finding by showing that derivatives whose bases display extremely high or low FAMILY FREQUENCIES are reacted to more slowly than those with bases in the middle range (cf. the relevant partial effects plot in Figure 16 (right panel)).

Table 9. Coefficients for the fixed-effect factors in the mixed-effects model fitted to log RTs in part-to-whole priming. MCMCmean: Markov chain Monte Carlo mean for the estimated coefficients; Lower, Upper: 95% highest posterior density intervals; pMCMC: Markov chain Monte Carlo p-value.

                  Estimate   MCMCmean     Lower     Upper    pMCMC
    (Intercept)     9.2399     9.2395    9.1368    9.3390   0.0001
    logRelFreq     -0.0301    -0.0300   -0.0349   -0.0256   0.0001
    logFamFreq     -0.0683    -0.0683   -0.0954   -0.0419   0.0001
    logFamFreq^2    0.0045     0.0045    0.0025    0.0066   0.0001

Figure 16. Partial effects for the predictors LOG RELATIVE FREQUENCY (left) and LOG FAMILY FREQUENCY (right) in a linear mixed-effects model fitted to log RTs in part-to-whole priming. The solid black lines represent predicted log RT values. The dotted black lines visualize the MCMC-based HPD intervals for these values.

First of all, this model underscores the importance of entrenchment effects (as indexed by LOG RELATIVE FREQUENCIES) in predicting behavioural reactions to linguistic stimuli. Moreover, it raises the question of how the nonlinear effects of LOG FAMILY FREQUENCY can be theoretically handled. This issue will be discussed in the next section.

6.1.3.3. Multiple regression results for whole-to-part priming

Multiple mixed-effects regression with manual backward elimination revealed that a model including LOG FAMILY FREQUENCY (again with a linear and a quadratic component), ORTHOGRAPHIC NEIGHBOURHOOD SIZE OF THE BASE, NUMBER OF SYLLABLES IN THE DERIVATIVE and LOG RELATIVE FREQUENCY as predictors provides the best fit to log RTs in whole-to-part priming, as shown by the following table of fixed-effects coefficients:


Table 10. Coefficients for the fixed-effect factors in the mixed-effects model fitted to log RTs in whole-to-part priming. MCMCmean: Markov chain Monte Carlo mean for the estimated coefficients; Lower, Upper: 95% highest posterior density intervals; pMCMC: Markov chain Monte Carlo p-value.

                  Estimate   MCMCmean     Lower     Upper    pMCMC
    (Intercept)     9.0014     9.0014    8.9104    9.0934   0.0001
    logRelFreq      0.0069     0.0069    0.0038    0.0100   0.0001
    SylDer          0.0504     0.0504    0.0280    0.0711   0.0001
    OrtNeiBas      -0.0053    -0.0053   -0.0073   -0.0031   0.0001
    logFamFreq     -0.0610    -0.0610   -0.0785   -0.0435   0.0001
    logFamFreq^2    0.0043     0.0043    0.0028    0.0057   0.0001

In line with my findings from section 6.1.2, higher LOG RELATIVE FREQUENCIES correlate with significantly higher log RTs, as illustrated by the partial effects plot in Figure 17:

Figure 17. Partial effects for the predictor LOG RELATIVE FREQUENCY in a linear mixed-effects model fitted to log RTs in whole-to-part priming (cf. Table 10). The solid black line represents predicted log RT values. The dotted black lines visualize the MCMC-based HPD intervals for these values.


Intriguingly, the linear and quadratic effects of LOG FAMILY FREQUENCY in the whole-to-part priming task turned out to be highly similar to those in the part-to-whole priming task: While derivatives whose bases are highly frequent in complex words are reacted to more quickly overall, derivatives whose bases display extremely high or low FAMILY FREQUENCIES elicit slower log RTs than those with bases in the middle range (cf. the relevant partial effects plot in Figure 18). In other words, the effects of LOG FAMILY FREQUENCY on log RTs in the main experiments are not inversely related, which clearly indicates that this predictor cannot be taken to unequivocally reflect entrenchment at a concrete lexical level as operationalized in section 4.4.

Figure 18. Partial effects for the predictor LOG FAMILY FREQUENCY in the same model (cf. Table 10).

How can we make sense of the fact that high LOG FAMILY FREQUENCIES facilitate priming in both main conditions? Interestingly, at least as far as whole-to-part priming is concerned, the linear component of LOG FAMILY FREQUENCY is easily accounted for in terms of entrenchment along the following lines: A base exhibiting a high summed TOKEN FREQUENCY across derived and compound words is highly malleable and will therefore be perceived as an autonomous item having ‘a life of its own’. By contrast, a base with a low LOG FAMILY FREQUENCY will be perceived as being contingent on a particular context, for example, a specific derivative. It seems plausible that an autonomous morpheme will be more salient and easier to segment out, leading to weaker entrenchment and, consequently, quicker log RTs in the whole-to-part priming task. The problem, however, is that an explanation in terms of entrenchment at a concrete lexical level is not consistent with the findings for the part-to-whole priming task, as high levels of autonomy inherently exclude a great strength of meronomic association. One tentative explanation which might be worth exploring in the future is that morphemes with a high LOG FAMILY FREQUENCY might somehow be cognitively tagged for frequently occurring as part of complex words rather than on their own, leading to the pre-activation of the whole network of morphological family members. This hypothesis could be translated into usage-based construction grammar parlance by claiming that bases which tend to occur in complex words will evoke a semi-schematic construction pattern (consisting of a concrete base and an open slot), which will then top-down activate all potential slot fillers (be it in terms of concrete morphemes or in terms of abstract features), thus facilitating the recognition of the upcoming morpheme. Under such an account, LOG FAMILY FREQUENCY would, after all, reflect entrenchment, but at a semi-abstract level of the constructional hierarchy (cf. section 2.1). Quadratic effects in the part-to-whole priming task might then result from the absence of a unitary higher-level schema for bases exhibiting too high a family frequency, which might prevent the construction of a single, coherent higher-level construction (especially if the tokens to generalize across are highly diverse). A more mundane explanation would be that the predictor LOG FAMILY FREQUENCY simply does not index entrenchment, but other cognitive processes which still remain to be identified. Although I have at present no definite solution to offer to the LOG FAMILY FREQUENCY effect, I would like to point out that a cognitive system which warrants enhanced performance for items of higher LOG FAMILY FREQUENCY in different experimental conditions is, of course, highly functional.

Figure 19 (left panel), which visualizes the partial effects of the predictor NUMBER OF SYLLABLES IN THE DERIVATIVE, shows that longer words elicit slower log RTs. This effect is arguably related to the strongly correlated variable of DERIVATIVE-BASE LETTER RATIO (already discussed in section 6.1.2) in that longer words will involve proportionally shorter and therefore less prominent suffixes. As a consequence, their parts will be more difficult to segment out, leading to slower reaction times in tasks requiring morpheme extraction. In a similar vein, the ORTHOGRAPHIC NEIGHBOURHOOD variable, whose partial effects are presented in Figure 19 (right panel), is strongly related to the NUMBER OF PHONOGRAPHIC NEIGHBOURS OF THE BASE variable discussed in section 6.1.2, and I will not repeat the argument here.

Figure 19. Partial effects for the predictors NUMBER OF SYLLABLES IN DERIVATIVE (left panel) and ORTHOGRAPHIC NEIGHBOURS OF BASE (right panel) in the same model (cf. Table 10).

6.2. Behavioural analyses for the supplementary experiments

6.2.1. Mixed-effects regression analyses for jumbled target priming

The regression studies presented in section 6.1.2 found three variables gauging entrenchment: LOG RELATIVE FREQUENCY, DERIVATIVE-BASE LETTER RATIO, and NUMBER OF PHONOGRAPHIC NEIGHBOURS OF THE BASE. This section will restrict itself to exploring whether these entrenchment variables are equally predictive of log RTs in the jumbled target priming task by means of linear mixed-effects models with subject as a random variable, using the methods presented in the previous sections of this chapter. Recall that while I predicted that higher levels of entrenchment should slow down log RTs in experiments exploring unprimed top-down coercion, I had no specific intuitions as to possible effects of masked derivatives on the perception of jumbled-letter morphemes (cf. section 4.4). I started by fitting a separate model for each predictor. The results in Table 11 to Table 13 clearly demonstrate that each predictor is significant on its own:


Table 11. Coefficient for the single fixed-effect factor LOG RELATIVE FREQUENCY in a mixed-effects model fitted to log RTs in jumbled target priming. MCMCmean: Markov chain Monte Carlo mean for the estimated coefficients; Lower, Upper: 95% highest posterior density intervals; pMCMC: Markov chain Monte Carlo p-value.

                 Estimate   MCMCmean     Lower     Upper    pMCMC
    (Intercept)    9.0127     9.0126    8.9553    9.0682   0.0001
    logRelFreq     0.0064     0.0064    0.0009    0.0121   0.0254

Table 12. Coefficient for the single fixed-effect factor DERIVATIVE-BASE LETTER RATIO in a mixed-effects model fitted to log RTs in jumbled target priming.

                  Estimate   MCMCmean     Lower     Upper    pMCMC
    (Intercept)      9.088     9.0873    9.0170    9.1564   0.0001
    DerBasLetRat    -0.056    -0.0559   -0.0863   -0.0279   0.0001

Table 13. Coefficient for the single fixed-effect factor PHONOGRAPHIC NEIGHBOURHOOD OF BASE in a mixed-effects model fitted to log RTs in jumbled target priming.

                  Estimate   MCMCmean     Lower     Upper    pMCMC
    (Intercept)     9.0151     9.0149    8.9626    9.0663   0.0001
    PhonGraNeiBa   -0.0084    -0.0084   -0.0104   -0.0064   0.0001

It is interesting to note that for each predictor, the slope points in the same direction as in the whole-to-part priming task. This is illustrated by the charts in the left panels of Figure 20, which visualize (from top to bottom) the effects of the variables LOG RELATIVE FREQUENCY, DERIVATIVE-BASE LETTER RATIO and NUMBER OF PHONOGRAPHIC NEIGHBOURS in jumbled target priming. The figures on the right represent the effects for the relevant predictors in whole-to-part priming (cf. Table 2, Table 4 and Table 6). The solid black lines in the figures represent predicted log RT values. The dotted black lines visualize the MCMC-based HPD intervals for these values.


Figure 20. Effect sizes for different single independent variables in jumbled target priming, in comparison to the relevant effect sizes in whole-to-part priming. Top left panel: Effect size for the single fixed-effect factor LOG RELATIVE FREQUENCY in jumbled target priming. Top right panel: Effect size for the single fixed-effect factor LOG RELATIVE FREQUENCY in whole-to-part priming (cf. Table 2). Middle left panel: Effect size for the single fixed-effect factor DERIVATIVE-BASE LETTER RATIO in jumbled target priming. Middle right panel: Effect size for the single fixed-effect factor DERIVATIVE-BASE LETTER RATIO in whole-to-part priming (cf. Table 4). Bottom left panel: Effect size for the single fixed-effect factor NUMBER OF PHONOGRAPHIC NEIGHBOURS OF BASE in jumbled target priming. Bottom right panel: Effect size for the single fixed-effect factor NUMBER OF PHONOGRAPHIC NEIGHBOURS OF BASE in whole-to-part priming (cf. Table 6).

These results clearly corroborate the hypothesis that higher levels of entrenchment entail a reversal in the direction of composition such that sub-constituents come to be perceived and interpreted in the light of stored knowledge of the whole rather than vice versa (i.e., perception of the whole via bottom-up composition of cognitively primitive parts). They also highlight the apparent paradoxes related to entrenchment: Although with stronger entrenchment, wholes are increasingly emancipated from their constituent parts, they also exert an increasing top-down pressure on these very parts, as witnessed by the fact that spelling mistakes become increasingly difficult to identify. In view of these findings, which clearly confirm that top-down coercion varies as a function of entrenchment, I decided to test whether the predictors under consideration are jointly predictive in a multiple regression model. Table 14 reveals that the only predictor to emerge as significant in such a model is PHONOGRAPHIC NEIGHBOURHOOD.

Table 14. Coefficients for the fixed-effect factors in a mixed-effects model fitted to log RTs in jumbled target priming.

                  Estimate   MCMCmean     Lower     Upper    pMCMC
    (Intercept)     9.0174     9.0179    8.9411    9.0931   0.0001
    logRelFreq     -0.0005    -0.0005   -0.0065    0.0055   0.8756
    PhonGraNeiBa   -0.0094    -0.0094   -0.0118   -0.0069   0.0001
    DerBasLetRat    0.0050     0.0049   -0.0327    0.0428   0.8072


For a simple reason which has gone unmentioned so far, this result is anything but surprising. Remember that the notion of downward causation presupposes a multi-layered view of the world, with different levels being hierarchically arranged in terms of complexity (cf. section 4.3.1.2). Now, complex words involve at least three hierarchically layered levels: the letter level, the morpheme level and the word level, in increasing order of complexity. The top-down coercion test, which requires participants to identify jumbled target morphemes as wrong, inherently focuses on the relationship between the morpheme and the letter level. As a consequence, it seems natural to expect that predictors referring to the relationship between the bases and their component letters, such as the variable indexing PHONOGRAPHIC NEIGHBOURHOOD density, will affect log RTs most strongly. However, due to the presence of a masked ‘real word’ derivative before the jumbled bases, my experiment included an additional, less immediate dimension of top-down pressure: It also explored the effect of whole complex word primes on individual letters. The finding that simple linear mixed-effects regression models relating to LOG RELATIVE FREQUENCY and DERIVATIVE-BASE LETTER RATIO yield significant results clearly attests to the fact that top-down coercion is a transitive phenomenon which percolates through adjacent complexity layers, reinforcing the claim that the phenomenon of entrenchment is much more intricate than usually assumed. Moreover, my findings signal the need to revise claims pertaining to the word superiority effect (cf. section 4.3.1): While it might be true that known words are not usually recognized in terms of their component letters, but rather on the basis of their shapes, this is even truer of words which are highly entrenched. While further exploration of the variables affecting the recognition of jumbled morphemes would presumably have brought to light additional predictors pertaining to the morpheme level (e.g., BASE FREQUENCY or different types of BIGRAM FREQUENCIES), I stopped my behavioural analysis of jumbled target priming at this point, since the primary purpose of my study was to investigate entrenchment in complex sequences.

6.2.2. Behavioural analyses for the comparison with monomorphemic controls

Let us now come back to the prediction that with higher levels of entrenchment, bimorphemic derivatives should increasingly behave like monomorphemic items of matched length. More concretely, with regard to part-to-whole priming, this means that with higher usage frequency, log RTs should become increasingly similar to log RTs in part-to-whole priming with monomorphemic targets (e.g., negle – NEGLECT or approa – APPROACH). I tested this hypothesis by means of an analysis of variance with lmer, using SUBJECT as a random variable. P-values of fixed effects and highest posterior density confidence intervals were estimated by means of Markov chain Monte Carlo sampling with the function pvals.fnc. Incorrect responses were discarded from the analyses. As shown in Table 15, planned comparisons on the basis of treatment coding with monomorphemic words as a reference level revealed that log RTs for complex words from the low- and middle-frequency bins (as defined in section 5.1.1.1) were significantly different from log RTs for simple words. By contrast, log RTs for high-frequency derivatives did not differ significantly from those for simple words, thus confirming the hypothesis that the processing of highly entrenched items is similar to that of monomorphemic items.
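A planned comparison of this kind might be set up as follows (a sketch; the factor Bin, with the levels Mono, High, Middle and Low, is an illustrative stand-in for the actual coding of the stimuli):

    library(lme4)
    library(languageR)

    # Treatment coding with the monomorphemic controls as the reference level
    dat$Bin <- relevel(factor(dat$Bin), ref = "Mono")

    m.bins <- lmer(logRT ~ Bin + (1 | Subject), data = dat)
    pvals.fnc(m.bins, nsim = 10000)$fixed    # each frequency bin vs. the monomorphemic controls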

Table 15. Result of an ANOVA, conducted with lmer, comparing mean log RTs to bimorphemic stimuli from three frequency bins and mean log RTs to tightly matched monomorphemic words in the part-to-whole priming task. SUBJECT was fed into the model as a random variable.

                 Estimate   MCMCmean     Lower     Upper    pMCMC
    (Intercept)    9.0037     9.0037    8.9452    9.0612   0.0001
    PC$High       -0.0020    -0.0020   -0.0294    0.0253   0.8748
    PC$Middle      0.0836     0.0836    0.0559    0.1123   0.0001
    PC$Low         0.2063     0.2061    0.1746    0.2377   0.0001

In a second step, applying the same method as before, I examined the hypothesis that with higher usage frequency, log RTs in whole-to-part priming with bimorphemic primes should become increasingly like those in whole-to-part priming with monomorphemic primes (e.g., dialect – DIAL, triple – TRIP). As shown in Table 16, planned comparisons with lmer confirmed this prediction: While log RTs for low-frequency bimorphemic words were significantly lower than those for simple words (one-tailed: p = 0.029), those for derivatives from the middle- and high-frequency bins did not differ significantly from the reference level. This finding clearly demonstrates that highly entrenched derivative ‘wholes’ (such as government) are as emancipated from their constituent parts (e.g., govern) as monomorphemic items (e.g., triple) are from their pseudo-constituent morphemes (e.g., trip), at least at the stage of processing investigated in my experiments.

Table 16. Result of an ANOVA, conducted with lmer, comparing the mean log RTs for bimorphemic stimuli from three frequency bins to log RTs for tightly matched monomorphemic words in the whole-to-part priming task. SUBJECT was used as a random variable.

                 Estimate   MCMCmean     Lower     Upper    pMCMC
    (Intercept)    8.9036     8.9033    8.8519    8.9556   0.0001
    EM$High       -0.0043    -0.0042   -0.0309    0.0220   0.7602
    EM$Middle     -0.0105    -0.0104   -0.0381    0.0162   0.4648
    EM$Low        -0.0256    -0.0255   -0.0530    0.0010   0.0674

Overall, the results reported in this section highlight the robustness of entrenchment effects. Although my selection of control stimuli deliberately stacked the odds in favour of strong entrenchment effects (cf. section 4.4), contrary to expectation, the p-values in Table 15 and Table 16 show that monomorphemic controls did not yield stronger effects of entrenchment than high-frequency bimorphemic items.

6.2.3. Behavioural analyses for the memory task

The behavioural analysis for the last ancillary experiment, the memory task, involved three steps. I started by conducting statistical analyses testing the hypothesis that items on the high-frequency list should be easier to retain than those on the low-frequency list. In a second step, I considered the answers given on the questionnaire used to debrief participants after the memory experiments. Due to the relatively open format of the questions, subjects did not provide equally long and exhaustive answers to all of them, making them more readily amenable to qualitative interpretation than to


statistical analysis. Finally, a qualitative analysis of the mistakes made in the memory task was performed. Statistical analyses were again performed by means of the R statistical programming environment, version 2.12.1. A vector containing the number of correctly memorized words per subject was constructed for each list. Vectors were first tested for normality by means of the Shapiro-Wilk test (function shapiro.test). Since the vectors turned out to exhibit skewed distributions, I performed one-tailed paired Wilcoxon signed rank tests (function wilcox.test) after resolving ties (function jitter), the prediction being that performance on the high-frequency list should be better. A first Wilcoxon test comparing the overall performance on the highand the low-frequency list corroborated the hypothesis that high-frequency derivatives are easier to retain (V = 152, p = 0.01). This difference became even more significant after excluding a male subject who reported “a complete blackout” on the high-frequency list (V = 144, p = 0.0045). It is important to note that this effect obtains in spite of several participants grouping together words associated with feminity on the low-frequency list as a mnemotechnic device (wifely, womanish, poetess, lioness). Several subjects explicitly reported using this technique on the questionnaire, while many others seem to have employed it without consciously noticing, as far as one can tell from the order of items memorized. Other comments in the questionnaire were instructive in other respects. As far as the high-frequency list is concerned, it is interesting to note that at least some subjects considered the relevant derivatives to be monomorphemic. For example, one subject wrote that the words on this list “didn’t have suffixes”, and another participant analyzed them as having “fewer morphemes”. Many subjects noticed that words on the high-frequency list were “more common”. By contrast, words on the low-frequency list were generally perceived as complex. One participant, for instance, wrote that the list was composed of “2-part-words”. Several participants also noted that the words were “more unusual”. The errors made on the low-frequency list were equally telling. The most common mistakes were confusions between different suffixal paradigmatic competitors. For example, the target item gauntness was confounded with gauntly, gauntless (twice), gauntisch (sic!). Likewise, hatful was mixed up with hatless and hatly, playful with playable, womanly with womanish (six times!), careerist with careerish, commonness with commonist and commonplace, wifely with wifey, lioness with lionness, speakable with speaking, and shallowly with shallowness. Many of these items are not at all attested in CELEX (e.g., commonist, wifey, hatly, careerish, lionness), and those which are attested are sometimes even less frequent

146

Supplementary experiments

146

than the target word in terms of absolute frequencies (e.g., shallowness versus shallowly [6 versus 8 occurrences] and womanish versus womanly [7 versus 29 occurrences]; note that since the confounded items contain the same base, LOG RELATIVE FREQUENCIES will vary accordingly). Others paradigmatic confounds were more frequent than the target word (commonplace versus commonness [107 versus 1 occurrences], hatless versus hatful [6 versus 1 occurrences], and playful versus playable [53 versus 5 occurrences]). Considerably less frequently, I found that subjects had confounded the base morpheme while retaining the right suffix (e.g., readable instead of speakable, or scarcify (twice) instead of scarify). Moreover, there were unsystematic errors which could not readily be assigned to a target word (e.g., proton, weakly), some of which obviously contained orthographic errors (camear, countnish). In two cases, subjects only reproduced the target base (woman instead of womanish and wife instead of wifely), a mistake which also occurred on the high-frequency list (smooth instead of smoothly and bitter instead of bitterness). In general, mistakes on the high-frequency list were relatively rare and unsystematic, involving items which could not be assigned to target words (i.e., greatness, electronic, powerful, delightful, chief, wishful, management, deathly). To summarize, this section corroborated my entrenchment hypotheses on the basis of new kinds of data pertaining to memory performance, selfreport and systematic error patterns. The fact that memory performance was significantly better on high- than low-frequency items supports the view that complex high-frequency sequences enjoy a special, presumably unitary processing status. Answers on the questionnaire revealed that linguistically naive language users are to some degree aware of this. By contrast, the large number of suffix confusions on the low-frequency list strongly suggests that low-frequency derivatives are not represented in a holistic format. Rather, participants seem to have split these words up into their morphemic parts, only retaining the semantically rich and salient bases, while forgetting the suffixes. The fact that many participants produced base-suffix combinations which are even rarer than the actual target words indicates that they might have retained the relevant bases together with the very coarse-grained piece of information that these bases occurred in an unusual combination with some affix. Interestingly, virtually all cases of suffix confusion involved derivational suffixes actually used in the experiment, which indicates that participants might have analyzed them as belonging to a common category. Although these results clearly converge with the results for the priming experiments, it must be mentioned they
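The significance tests reported at the beginning of this section might be run along the following lines (a sketch; high and low are hypothetical vectors holding each participant’s recall scores for the two lists):

    # Normality check for the per-subject recall scores
    shapiro.test(high)
    shapiro.test(low)

    # One-tailed paired Wilcoxon signed-rank test, resolving ties with jitter()
    wilcox.test(jitter(high), jitter(low), paired = TRUE, alternative = "greater")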

6.3. Conclusions

Statistical analyses for the main experiments revealed that contrary to widespread usage-based assumptions, the only frequency metric to be a significant entrenchment predictor is LOG RELATIVE FREQUENCY. More specifically, higher LOG RELATIVE FREQUENCIES correlate with shorter log RTs in masked part-to-whole priming (e.g., doubt – DOUBTFUL), but with longer log RTs in masked whole-to-part priming (e.g., dweller – DWELL), thus reflecting the asymmetric meronomic relationships required by my entrenchment operationalization. To make sure that these findings were more than an artefact of accidental characteristics of the CELEX database, I examined whether LOG RELATIVE FREQUENCIES from the HAL corpus were equally predictive. Table 17 and Table 18, which are highly similar to Table 1 and Table 2 respectively, clearly confirm that my findings generalize beyond CELEX.

Table 17. Coefficient for the single fixed-effect factor LOG RELATIVE FREQUENCY as gleaned from the HAL database in a mixed-effects model fitted to log RTs in part-to-whole priming. MCMCmean: Markov chain Monte Carlo mean for the estimated coefficients; Lower, Upper: 95% highest posterior density intervals; pMCMC: Markov chain Monte Carlo p-value.

                 Estimate   MCMCmean     Lower     Upper    pMCMC
    (Intercept)    9.0015     9.0014    8.9408    9.0569   0.0001
    logRelFreq    -0.0248    -0.0248   -0.0291   -0.0205   0.0001

Table 18. Coefficient for the single fixed-effect factor LOG RELATIVE FREQUENCY as gleaned from the HAL database in a mixed-effects model fitted to log RTs in whole-to-part priming.

                 Estimate   MCMCmean     Lower     Upper    pMCMC
    (Intercept)    8.8953     8.8953    8.8503    8.9430   0.0001
    logRelFreq     0.0067     0.0067    0.0038    0.0094   0.0001


Crucially, the two other entrenchment predictors which emerged as statistically significant from my behavioural analyses of the main conditions were not frequency-related. Thus, I found that derivatives exhibiting a higher DERIVATIVE-BASE LETTER RATIO yield inhibition in part-to-whole priming, while at the same time eliciting facilitation in whole-to-part priming. In other words, this means that derived items displaying proportionally longer suffixes are significantly less entrenched. Although I am not aware of any claim to this effect in the literature, this finding seems intuitively plausible, as a longer suffix will be perceptually more salient and therefore presumably more prone to extraction. Another finding which does not seem to have been acknowledged in the literature is that derivatives including bases with a higher NUMBER OF PHONOGRAPHIC NEIGHBOURS are less entrenched, a result which is easily accounted for when considering that language users arguably cannot afford storing morphemes with many competitors in a fuzzy holistic Gestalt format. A morpheme ‘living’ in a dense neighbourhood must be stored in high-resolution-format including precise phonographic detail in order to be distinguishable from similar items. This strongly suggests that the paradigmatic salience of an entity has a decisive role to play in modulating entrenchment. I propose to use this term to refer to the perceptual distinctiveness of a given entity from other entities stored in the mind, an idea which obviously relates to the structuralist idea that the identity of a sign is at least in part defined relationally and on the basis of negative evidence, through evocation of and demarcation from what it is not. At the same time, the fact that PHONOGRAPHIC rather than ORTHOGRAPHIC NEIGHBOURHOOD values yielded significant results suggests an interesting point which I had not anticipated, namely that visual word recognition does not exclusively rely on visual information, but also to some degree on the sound structure of a word. In a further step, I explored whether the three entrenchment predictors identified in the main experiments were equally predictive of log RTs in jumbled target priming, which represented the first supplementary experiment. While each predictor was highly significant on its own, the variable NUMBER OF PHONOGRAPHIC NEIGHBOURS of base was associated with stronger entrenchment effects than LOG RELATIVE FREQUENCY or DERIVATIVE-BASE LETTER RATIO. This result again highlighted the apparent paradoxes related to entrenchment: Although with stronger entrenchment, wholes get increasingly emancipated from their constituent parts, they also exert an increasing top-down pressure on these parts, as witnessed by the fact that spelling mistakes become increasingly difficult to identify. On a more general plane, the result also corroborated the hypothesis that


higher levels of entrenchment entail a reversal in the quality of composition such that sub-constituents come to be perceived and interpreted in the light of stored knowledge of the whole rather than vice versa (i.e., perception of the whole via bottom-up composition of cognitively primitive parts). But why should the variable NUMBER OF PHONOGRAPHIC NEIGHBOURS of base be associated with stronger entrenchment effects than LOG RELATIVE FREQUENCY, which was the most robust entrenchment index in the main experiments? Remember that the notion of downward causation presupposes a multi-layered view of the world, with different levels being hierarchically arranged in terms of complexity (cf. section 4.3.1). Complex words involve at least three superposed levels: the letter level, the morpheme level and the word level, in increasing order of complexity. The jumbled target priming test, which requires participants to identify jumbled target morphemes as wrong, inherently focuses on the relationship between the morpheme and the letter level. As a consequence, it seems natural to expect that predictors referring to the relationship between the bases and their component letters, such as the variable indexing PHONOGRAPHIC NEIGHBOURHOOD density, will affect log RTs most strongly. However, due to the presence of a masked ‘real word’ derivative before the jumbled bases, this experiment included an additional, less immediate dimension of top-down pressure, as it also explored the effect of whole complex word primes on individual letters. The finding that variables gauging meronomic relationships beyond the letter level yield significant results clearly attests to the fact that top-down coercion is a transitive phenomenon which percolates through adjacent complexity levels, reinforcing my claim that the phenomenon of entrenchment is much more intricate than usually assumed. The second supplementary experiment compared the behavioural results for the main experiments to those for two control conditions involving monomorphemic items, the prediction being that with higher levels of LOG RELATIVE FREQUENCY, bimorphemic derivatives should increasingly behave like monomorphemic controls of matched length. An ANOVA confirmed that log RTs in part-to-whole priming with bimorphemic targets (e.g., pale – PALENESS) become more and more similar to those in partto-whole priming with monomorphemic targets (e.g., negle – NEGLECT or approa – APPROACH). In a similar vein, log RTs in whole-to-part priming with bimorphemic primes (e.g., gauntness – GAUNT) become increasingly similar to those in whole-to-part priming with monomorphemic primes (e.g., dialect – DIAL, triple – TRIP). As can be seen from these examples, word pairs in the control conditions differed from those in the relevant main condition along an additional dimension, besides being monomorphemic. Thus, unlike the primes in the


regular part-to-whole priming task, those in the relevant control task did not represent monomorphemic items. Likewise, unlike targets in the regular whole-to-part priming task, targets in the control task were not semantically related to the prime word. This design clearly stacked the odds in favour of strong entrenchment effects in the control conditions. I therefore expected that the control conditions should go along with considerably stronger entrenchment effects than the most strongly entrenched bimorphemic items. This hypothesis was falsified, attesting to the robustness of frequencyrelated entrenchment effects in multimorphemic items. My last supplementary experiment, the memory task, showed that memory performance was significantly better on high- than low-frequency items, thereby supporting the view that complex high-frequency sequences enjoy a special – presumably unitary – processing status. This interpretation is further strengthened by a qualitative analysis of systematic error patterns and answers on a questionnaire used to debrief participants after the memory experiments. To reiterate my main finding, LOG RELATIVE FREQUENCY emerged as the only significant frequency-related entrenchment predictor, which agrees with Hay’s (2001, 2003) results. Do these findings now totally undermine the usage-based tenet that entrenchment is affected by TOKEN FREQUENCIES in language use (cf. section 2.2)? Taking a conciliatory approach, I would like to claim that they do not, for two reasons. First, if you believe that TOKEN FREQUENCIES affect representation strength, it appears logical to assume that they do so at different levels of language representation (i.e., morphemes, complex words, phrases, etc.). (This interpretation obviously presupposes the existence of mental levels of representation which broadly correspond to the relevant theoretical linguistic levels. More details on what these mental levels are, how they arise, and how they relate to the idea of full storage of tokens will be provided in section 8.2). Under such an interpretation it is no coincidence that LOG RELATIVE FREQUENCY, which represents the log-transformed ratio between TOKEN FREQUENCIES at different complexity levels, correlates with entrenchment. The claim that entrenchment arises from the interplay of TOKEN FREQUENCIES at different levels of language representation is schematically illustrated in Figure 21:


Figure 21. Schematic illustration of entrenchment as arising from the interplay of TOKEN FREQUENCIES at different levels of language representation.