English Pages 320 [331] Year 2021
Introducing Linguistic Research
Over the past decades, conducting empirical research in linguistics has become increasingly popular. The first of its kind, this book provides an engaging and practical introduction to this exciting versatile field, providing a comprehensive overview of research aspects in general, and covering a broad range of subdiscipline-specific methodological approaches. Subfields covered include language documentation and descriptive linguistics; language typology; corpus linguistics; sociolinguistics and anthropological linguistics; cognitive linguistics and psycholinguistics; and neurolinguistics. The book reflects on the strengths and weaknesses of each single approach and on how they interact with one another across the study of language in its many diverse facets. It also includes exercises, example student projects and recommendations for further reading, along with additional online teaching materials. Providing hands-on experience, and written in an engaging and accessible style, this unique and comprehensive guide will give students the inspiration they need to develop their own research projects in empirical linguistics.
Svenja Völkel is a senior researcher/lecturer in linguistics at the Johannes Gutenberg University of Mainz, Germany. She has longstanding research and teaching experience in a broad field of topics, including language typology, anthropological linguistics, language contact, and cognitive linguistics. Her regional focus is on Oceania.
Franziska Kretzschmar is a postdoctoral research fellow at the Leibniz Institute for the German Language Mannheim, Germany. She has extensive research and teaching experience in psycho- and neurolinguistics, studying primarily language comprehension and reading both from a basic and an applied perspective.
Introducing Linguistic Research SVENJA VÖLKEL Johannes Gutenberg University of Mainz
FRANZISKA KRETZSCHMAR Leibniz Institute for the German Language Mannheim
University Printing House, Cambridge CB2 8BS, United Kingdom One Liberty Plaza, 20th Floor, New York, NY 10006, USA 477 Williamstown Road, Port Melbourne, VIC 3207, Australia 314–321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre, New Delhi – 110025, India 103 Penang Road, #05–06/07, Visioncrest Commercial, Singapore 238467 Cambridge University Press is part of the University of Cambridge. It furthers the University’s mission by disseminating knowledge in the pursuit of education, learning, and research at the highest international levels of excellence. www.cambridge.org Information on this title: www.cambridge.org/9781107185500 DOI: 10.1017/9781316884485 © Svenja Völkel and Franziska Kretzschmar 2021 This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press. First published 2021 A catalogue record for this publication is available from the British Library. ISBN 978-1-107-18550-0 Hardback ISBN 978-1-316-63642-8 Paperback Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.
Contents
List of Figures
List of Tables
Preface
Acknowledgements
Part I Research Basics
1 Empirical Research in Linguistics
1.1 Basics of Empirical Research
1.2 Aspects of the Research Process
1.3 Summary
1.4 Exercises and Assignments
1.5 Further Reading
2 Basic Research Methods for Data Collection
2.1 Research Design and Fundamental Considerations on Data Collection
2.2 Observation
2.3 Survey
2.4 Experiment
2.5 Mixed-Methods Design
2.6 Summary
2.7 Exercises and Assignments
2.8 Further Reading
Part II Specific Research Approaches of Linguistic Subdisciplines
3 Language Documentation and Descriptive Linguistics
3.1 Research Aims and Questions
3.2 The Documentary and the Descriptive Approach
3.3 Methodology
3.4 Basic Research Findings
3.5 Summary
3.6 Exercises and Assignments
3.7 Further Reading
4 Language Typology
4.1 Research Aims and Questions
4.2 The Typological Approach
4.3 Methodology
4.4 Basic Research Findings
4.5 Explanation and Interpretation of the Results
4.6 Summary
4.7 Exercises and Assignments
4.8 Further Reading
5 Corpus Linguistics
5.1 Research Aims and Questions
5.2 Corpus-Linguistic Approaches
5.3 Methodology
5.4 Basic Research Findings
5.5 Summary
5.6 Exercises and Assignments
5.7 Further Reading
6 Sociolinguistics and Anthropological Linguistics
6.1 Research Aims and Questions
6.2 The Sociolinguistic and the Anthropological Linguistic Approach
6.3 Methodology
6.4 Basic Research Findings
6.5 Summary
6.6 Exercises and Assignments
6.7 Further Reading
7 Cognitive Linguistics and Psycholinguistics
7.1 Research Aims and Questions
7.2 Cognitive Approaches in Linguistics
7.3 Methodology
7.4 Basic Research Findings
7.5 Summary
7.6 Exercises and Assignments
7.7 Further Reading
8 Neurolinguistics
8.1 Research Aims and Questions
8.2 Neurolinguistic Approaches
8.3 Methodology
8.4 Basic Research Findings
8.5 Summary
8.6 Exercises and Assignments
8.7 Further Reading
Part III Linguistic Research across the Discipline
9 Insights from Linguistic Research
9.1 Summary of the Subdiscipline-Specific Research
9.2 Interfaces of the Subdisciplines
9.3 Inter- and Multidisciplinarity
9.4 Current Trends in Linguistics: Technological Impact & Data Management
9.5 Concluding Remarks
9.6 Exercises and Assignments
References
Index
Figures
1.1 The process of research
1.2 The empirical and theoretical approach
1.3 The island of research
1.4 The main stages of a research process
2.1 Prototypical usages of research methods according to three dimensions
2.2 Types of observation
2.3 Types of survey
2.4 Types of experiment
2.5 Single-method and mixed-methods approaches
3.1 Example of an ELAN file
5.1 Approaches to corpus linguistic research
5.2 A typology of corpora
5.3 Concordance lines (from COSMAS II for DeReKo)
6.1 Stages of the researcher–community relationship in ethnographic fieldwork
7.1 Underlying design of the non-linguistic spatial experiment
8.1 The classical Wernicke–Geschwind model of language in the brain
8.2 Neurocognitive and behavioural methods ordered by their spatial and temporal resolution
8.3 Hypothetical ERP waveforms
9.1 Interfaces of the subdisciplines
Tables
1.1 Basic kinds of research per linguistic subdiscipline
1.2 Basic research questions of linguistic subdisciplines
1.3 Example data for statistics
1.4 Statistical measures and their visualisation depending on the level of measurement
1.5 Linguistic subdisciplines and their basic procedure
1.6 Linguistic subdisciplines and their basic outcomes
2.1 Basic questions about research design
3.1 Example research aims/questions in language documentation and descriptive linguistics
3.2 Documentary and descriptive research
3.3 Checklist for fieldwork preparation
4.1 Example research questions in language typology
4.2 Language families according to Ruhlen (1987) and Voegelin & Voegelin (1977)
4.3 Rijkhoff’s sample per genetic group and per sample size
4.4 Rijkhoff’s diversity values per genetic group
4.5 Dryer’s results for word order in relation to adposition per geographic area
4.6 Examples for the different kinds of universals
4.7 Tetrachronic table
5.1 Example research questions in corpus linguistics
6.1 Example research questions in sociolinguistics and anthropological linguistics
7.1 Example research questions in cognitive linguistics and psycholinguistics
7.2 List of common experimental task types in cognitive linguistic research
7.3 List of common experimental methods for data collection
7.4 Tasks and paradigms in language acquisition research with children
8.1 Example research questions in neurolinguistics
8.2 Temporal measures in neurolinguistics
8.3 Spatial measures in neurolinguistics
9.1 Summary of the basic research aims, methods, and findings per subdiscipline
Preface
This book is an introduction to linguistic research, particularly focusing on the multifaceted empirical investigation of human language. While the language sciences have probably always been empirically oriented to some extent, the second half of the twentieth century has seen a remarkable increase in empirical studies of language. Currently, many linguistics departments are specialised in one or more forms of empirical research, passing on in-depth knowledge to their students. As a result, many excellent introductions to the major empirical disciplines in linguistics – such as sociolinguistics, corpus linguistics, or psycholinguistics – have already been written. Given such broad coverage, one would be forgiven for asking why we felt the need to write yet another introductory book on empirical linguistics. The story of this introduction begins a few years ago, when we decided to teach a seminar on empirical linguistics together, after having taught empirical classes on our respective research interests (Völkel: language typology, anthropological linguistics, and cognitive linguistics; Kretzschmar: psycho- and neurolinguistics). It soon became very apparent that research questions and methods, basic terminology, and implications for linguistic theory differed significantly between linguistic subdisciplines. Because most advanced linguists specialise in no more than one or two empirical approaches (as our own research affiliations demonstrate), there were virtually no textbooks available for a cross-disciplinary introduction to empirical linguistics targeting undergraduate students and their instructors – in stark contrast to the wealth of subdiscipline-specific introductions and the few compilations for advanced researchers. 
After experiencing this for ourselves, we found that this situation makes it extremely challenging to teach a broad spectrum of different methodological approaches and, therefore, most empirical classes will tend to focus on the instructor’s or institution’s own areas of research expertise. Since it is our belief that transferring knowledge between linguistic subdisciplines is essential for increasing our understanding of language, a textbook providing a cross-disciplinary introduction to both empirical research in general and linguistic subdisciplines in particular seems to be an optimal starting point to foster such cross-disciplinary dialogue right from the ground up – i.e., by merging the specifics with teaching the basics of empirical linguistics to students.
Therefore, the goal of this textbook is to provide an introductory and diversified insight into the vast realm of linguistic research, offering both an overview of the theoretical foundations of empirical studies on the one hand and the basic research methods and distinct approaches of various linguistic subdisciplines on the other. Our aim is to provide a comprehensive general introduction to empirical linguistics that does not focus on individual methods alone. Instead, the book puts particular emphasis on identifying the commonalities, differences and interrelations between the empirical approaches towards the study of language in addition to presenting subdiscipline-specific methodologies. To the best of our knowledge, a textbook such as the present one including methodological approaches ranging from documentary, descriptive, and anthropological-linguistic field studies to laboratory experiments in neuro- and psycholinguistics does not currently exist.
Who Is This Book For?
Empirical investigations generally require, at the very least, expertise in the basic terminology of the scientific discipline. Thus, Introducing Linguistic Research is meant for MA students or advanced BA students in linguistics programmes – be it general or comparative linguistics; linguistic studies with a genealogical or a geographic focus (such as African studies); or a language-specific discipline (such as English linguistics). It may also serve as a point of departure or a referral tool for empirical PhD projects. At the same time, we also intend to address instructors of introductory classes in empirical linguistics. The book provides a general teaching guideline and comprehensive fundamental information for teaching research approaches with which one is perhaps less familiar, at least at a basic level.
How Is the Book Structured?
The following brief synopsis outlines the structure of the book which is made up of three major parts: Part I addresses empirical research in general. Chapter 1 provides introductory information on empiricism; the stages and elements of the research process; quality criteria; basic types of research questions and methodological approaches; data collection; documentation and analysis; interpretation, reflection and presentation of the findings; and research ethics. Chapter 1 also contains detailed instructions for keeping research diaries and making poster presentations as forms of documentation and presentation which we recommend for student projects. Chapter 2 presents the three main research methods – observation, survey (interviews and questionnaires) and experiment. For each method, we outline a range of subtypes, including the consequences of various
methodological decisions when methods have to be chosen in accordance with a specific research context. Part II focuses on empirical approaches and methodological procedures stemming from distinct linguistic subdisciplines. In contrast to other linguistic textbooks, we have not chosen linguistic domains – such as phonetics and phonology; morphology and syntax; semantics and lexicon; and pragmatics – as the basic underlying structure, since we have observed more diversity in research within a linguistic domain in contrast to more homogeneity in the main approach of a linguistic subdiscipline. In selecting subdisciplines, we aimed for a broad range of distinct empirical approaches covering field work and laboratory experiments; research on individual languages and comparative studies; and so on. Nevertheless, this is by no means an exhaustive list of linguistic research approaches. Focusing on current core approaches in linguistics, we have, for instance, left out all kinds of empirical research in applied linguistics – such as studies on language teaching, language translation, natural language processing, forensic linguistics, and so on. Chapters 3–8 describe the fundamental research questions; empirical approaches and findings of language documentation and descriptive linguistics; language typology; corpus linguistics; sociolinguistics and anthropological linguistics; cognitive linguistics and psycholinguistics; and neurolinguistics. Cross-disciplinary fields such as language acquisition, language contact or language change do not have separate chapters dedicated to them as they can each be studied with multiple of the specific empirical approaches mentioned above. Consequently, in Chapters 3–8 we provide a list of research questions including the cross-disciplinary fields and linguistic domains. Finally, Part III addresses linguistic research across the disciplines. 
Chapter 9 provides an overview contrasting the distinct empirical approaches, including their characteristics, as well as their strengths and weaknesses. Based on this summary, we discuss how the subdisciplines are empirically intertwined and how the findings of each subdiscipline contribute to the overall understanding of the complexity of human language. We use the cross-disciplinary fields of language acquisition and language contact as illustrative examples. Overall, we emphasise that no single approach is sufficient to answer all questions in linguistics and thus view mutual understanding of and collaboration between linguistic subdisciplines as a prerequisite for future linguistic endeavours. All chapters include short exercises for students to gain experience with different methodological issues and the basic methods, and there are ideas for small research projects that can be carried out in parallel to class meetings. Given its cross-disciplinary nature, our textbook cannot offer in-depth information as can be found in other more specialised books. Thus, for further specialisation on a particular methodological approach, more specific and detailed literature should be consulted. For this purpose, we list suggestions for further reading that we find particularly helpful at the end of each chapter.
How to Use This Book
Information for Students
The entire book provides you with a broad and comprehensive introduction to research in linguistics including different methodological approaches that study different fundamental research questions and, therefore, provide insight into different aspects of language. Depending on your educational level in empirical linguistics, you can use this book in different ways. If you are a beginner who has not yet worked empirically, we recommend working through the entire book in order to find your own research area of interest. If, however, you have already gained empirical experience and know your empirical focus, it is possible to work on individual subdisciplines in Part II, but you should always consult Part I, as it provides essential basic information without which Part II may be difficult to understand. Further readings for specialisation on various empirical aspects are given at the end of each chapter in Part II. Used in this way, our book serves as a starting point for subdiscipline-specific studies. However, even in this case, it may be useful to make yourself familiar with the other linguistic subdisciplines of Part II in order to get an understanding of where your field of research is situated within linguistics and how it interacts with research in other linguistic subfields. In this case, you can work with the book in reverse or non-chronological order and start with Part III. While Part I describes general aspects and procedures of a research project, Chapters 3–8 of Part II are structured according to the methodological process that underlies empirical research in general. This will enable you to follow and compare subdiscipline-specific research step by step: general and specific research questions, the particular approach, a detailed description of the methodology and its components and the basic findings of each linguistic subdiscipline. For hands-on experience with linguistic research, short exercises in Parts I and II will familiarise you with various aspects.
Furthermore, the lists of possible general research questions in linguistics (Chapter 1, Section 1.2.1) and discipline-specific questions and project ideas (Chapters 3–8: ‘research questions’ and ‘exercises and assignments’) may inspire you to pursue your own research projects. The book may be a suitable starting point for independent research studies, but you should also consult your instructor about the particulars of the envisaged project.
Information for Instructors
The book resulted from a teaching schedule that we designed for our own purposes and implemented as an empirical module for the first year of the master’s program in linguistics at the Johannes Gutenberg University of Mainz. Thus, we propose a syllabus based on the average duration of a German semester, which typically comprises about 13–15 weeks of teaching with two sessions (90 minutes each) per week, plus lecture-free time of 8–12 weeks:
Chapter 1: approx. 4 sessions or 6 hours
Chapter 2: approx. 3 sessions or 4.5 hours (1 session or 90 minutes per method)
Chapters 3–8: approx. 9 sessions or 13.5 hours (1–2 sessions per chapter/subdiscipline)
Chapter 9: approx. 1 session or 90 minutes
The chapters include student exercises that can be worked on in class or as homework. They aim to give hands-on experience with various research components and techniques. In the final part of the course (i.e., the remaining sessions) and the beginning of the lecture-free period students should test their new theoretical and practical knowledge in their own small research project (about 4–6 weeks). Individual experience in designing and conducting small empirical projects produces valuable insight into what works, what does not, and why. Thus, the introductory class on empirical linguistics is intended to end with a project phase, including the following:
Joint project meetings: approx. 3 sessions or 4.5 hours (ideally 1 session or 90 minutes biweekly)
Autonomous work on projects: remaining sessions/hours
Possible additional tutorials: e.g., statistics (4 sessions or 6 hours)
This teaching schedule is easily adaptable to a shorter course-run (e.g., trimester) by shifting the project phase partly or entirely into the lecture-free time and by offering an online platform for students’ dialogue regarding their questions and progress. The students’ task is to develop a research question (some general project ideas can be found in Chapter 1 and discipline-specific ones are included at the end of Chapters 3–8) and a research design with adequate variables and methods to subsequently collect and/or analyse data. Projects can be carried out as group work or alone, depending on the research plan. As for group work, each group member should take part in every critical research component as laid out above. Students may freely choose a research topic, but it is also possible to specify a linguistic topic (e.g., the syntax of noun phrases) or domain (e.g., semantic research) that should be addressed by all projects, with each project taking a different methodological perspective. For this purpose, we have included specific research questions per linguistic domain for each subdiscipline (Chapters 3–8: ‘research questions’). Such thematic specification may result in interrelated and complementary findings (cf. Chapter 9). From our own experience, student projects were especially successful when the individual research process was accompanied by joint project meetings in which students report on their progress and in which students and instructor
discuss problems or open questions. In this way, students learn and profit from each other’s experience. If necessary, additional meetings for projects can be offered. Instead of the students’ performance assessment in terms of seminar papers, we recommend more research-relevant assessment formats such as an exposé/ proposal, a research diary or a poster presentation (which are described in detail in Sections 1.2.4 and 1.2.10). We encouraged our students to keep their research diary from the beginning until the end of the course. The research diary can be started with a loose collection of research ideas, topics, and questions from the beginning of the class. It can then be continued at the beginning of the project phase, starting with an exposé/proposal that should be assessed by the instructor and later updated by the student during the research process. The research diary serves several purposes. First, it helps students to structure their research project in terms of time and work capacity from the start. Second, the individual research process becomes transparent and comprehensible to both the student and the instructor which is especially helpful should problems arise. The research diary ends with a reflection on the feedback given at each project’s poster presentation. The poster presentation should take place at the end of the semester and is thought of as a student conference where students present and discuss the results of their projects. After a short period of time for everyone to familiarise themselves with the content of all posters, students have about 10 minutes to present their poster which is followed by a short discussion and feedback session. Additional teaching materials are available online: https://doi.org/10.14618/ ids-pub-10454 (Kretzschmar & Völkel 2021). These include presentation slides following the structure of the book chapters.
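Instructors adapting the proposed syllabus to their own term length may find it useful to tally the session budget. The following is a minimal sketch in Python that uses only the figures given in the schedule above; the names `TAUGHT_SESSIONS`, `hours`, and `remaining_sessions` are hypothetical helpers introduced for illustration, and the optional statistics tutorials are deliberately left out of the count.

```python
SESSION_MINUTES = 90  # length of one session, as stated in the syllabus

# Sessions per teaching block, copied from the proposed schedule.
TAUGHT_SESSIONS = {
    "Chapter 1": 4,
    "Chapter 2": 3,
    "Chapters 3-8": 9,
    "Chapter 9": 1,
}
PROJECT_MEETINGS = 3  # joint project meetings in the project phase


def hours(sessions):
    """Convert a number of 90-minute sessions into hours."""
    return sessions * SESSION_MINUTES / 60


def remaining_sessions(weeks, sessions_per_week=2):
    """Sessions left for autonomous project work in a term of the given length."""
    total = weeks * sessions_per_week
    return total - sum(TAUGHT_SESSIONS.values()) - PROJECT_MEETINGS


if __name__ == "__main__":
    taught = sum(TAUGHT_SESSIONS.values())
    print(f"Taught sessions: {taught} ({hours(taught)} hours)")
    for weeks in (13, 15):
        print(f"{weeks}-week semester: {remaining_sessions(weeks)} sessions remain")
```

Under these assumptions, the taught chapters take up 17 sessions, and a semester of 13 to 15 weeks leaves roughly 6 to 10 sessions for autonomous project work before the lecture-free period begins.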
Acknowledgements
First of all, we are very grateful to our families and friends for their encouragement and understanding that so much time and energy has gone into this project. Furthermore, our thanks go to the many people and institutions in the academy who supported us in various ways. Without their support, we would not have been able to write this book. The conceptual framework of this book was developed as part of an innovative research-oriented teaching curriculum which was awarded an extensive grant by the ‘Gutenberg Teaching Council’ (GLK) of the University of Mainz in 2014. Moreover, various colleagues of different linguistic departments at the University of Mainz supported us with their ideas for empirical studies, including Kristin Kopf (German Linguistics), Ulrike Schneider (English Linguistics) and Raimund Kastenholz (African Studies). They complemented our own empirical knowledge which is of course shaped by our former university teachers, most of all Walter Bisang (Language Typology), Jürg Wassmann (Cultural Anthropology), and Matthias Schlesewsky (Psycho- and Neurolinguistics). Furthermore, we are particularly grateful to Marion Grein (German as a Foreign Language) for the constructive collegial exchange regarding didactic ideas in research-oriented teaching over the years. We are also very grateful to our colleagues R. Muralikrishnan and Lisa Friederich who tested the curriculum in their classes and gave us valuable feedback on previous versions of our teaching manuscripts. Finally, we thank the many researchers who agreed to contribute to our research colloquium on empirical research methods in linguistics held at the University of Mainz in 2016 for inspiring presentations and discussions on subdiscipline-specific and interdisciplinary research in the broad field of empirical linguistics: Harald R. Baayen, Balthasar Bickel, Michael Cysouw, Sonja Eisenbeiss, Adolfo García, Tanja Kupisch, Ulrike Mosel, Petra B. Schumacher, and Gunter Senft. 
Finally, our thanks go to Robert Mitchell for carefully proofreading our book; to Helen Barton and the editorial team of Cambridge University Press for their patient assistance, to the anonymous reviewers for their valuable feedback; and to the students who enthusiastically supported our project with their assistant work (above all Svenja Lüll, Karin Kuldva, Martin Schröder, Nairi Demirkiran, Leonie Steimel, Annika Esch, and Jacqueline Wiedner) or by their contributions in class.
PART I Research Basics
1 Empirical Research in Linguistics
In this introductory chapter, we address some basics of empirical research (1.1) and illustrate some aspects of the research process (1.2). Our focus is on key concepts and classifications relevant for linguistic studies in distinct subdisciplines. After covering these foundations, the chapter contains suggestions for smaller exercises (to gain practical experience with individual research procedures) and tasks (to support readers in developing their own empirical project). Finally, we provide a summary of the contents of the chapter (1.3), followed by suggestions for further in-depth readings on diverse issues in linguistic research (1.4).
1.1 Basics of Empirical Research
Section 1.1 provides an overview of empirical research. Starting from considering what research is in the first place (1.1.1) and also looking at the interaction of empiricism and theory (1.1.2), we focus on the research process and its stages (1.1.3), as well as research components and basic classifications of types of research (1.1.4). Finally, in Sections 1.1.5 and 1.1.6, we address two important basic requirements of empirical research, namely quality criteria and research ethics.
1.1.1 Research
As many well-known researchers have aptly noted, an impression, or even a strong feeling that something is or is not the case, does not mean that this is actually true. According to Barnett (1948: 58), Einstein once said that ‘common sense is actually nothing more than a deposit of prejudices laid down in the mind prior to the age of eighteen’. Durkheim (2006 [1897]) similarly points out that common sense is vague and unreliable and that we can only learn about the social world by thorough research.
But What Exactly is ‘Research’?
Research is the systematic search for new knowledge. It is generally pursued in academic disciplines – for which gaining new insights and contributing to the growth of knowledge is a crucial aim and systematic research a crucial approach. This means that existing knowledge about the phenomena in the world (such as language) always constitutes the foundation for further research. In this way, knowledge is constantly being questioned and built upon. In detail, research questions or even concrete hypotheses are formulated and then systematically examined on the basis of concrete experienced facts by using an adequate methodological procedure. The research taking place in this framework is required to fulfil certain scientific standards (i.e., quality criteria and research ethics, cf. Sections 1.1.5 and 1.1.6). Curiosity and a fundamental interest in finding answers to as-yet-unknown phenomena, as well as common sense and an analytical understanding of complex issues, are basic requirements for successful research. Common sense and intuition are vague, imprecise and unreliable – as stated above – and cannot replace systematic data-based research. Nevertheless, they remain important key tools for research. On the part of the researcher, common sense and intuition can serve as general guidance. They help to identify interesting issues, questions and hypotheses, which are the starting point for every research project, to find appropriate methods for investigation, and to interpret and reflect on the findings. However, it is crucial to ensure that the researcher’s intuition does not result in personal bias (cf. Section 1.1.5). On the part of research participants, intuition can provide valuable data, such as linguistic judgements or self-evaluation.
1.1.2 Empiricism and Theory
Empiricism and theory are often perceived to be a pair of terms with opposing meaning. However, as Popper (1963, 1973) has pointed out, research is an evolutionary process, consisting of the following three components: problems/issues, theoretical considerations, and empirical examination (see Figure 1.1). This means that empiricism and theory actually go hand in hand. A theory is a mental system describing and explaining a phenomenon based on existing scientifically produced knowledge. This abstract system or construct serves not only to explain regularities regarding the phenomenon but also to make predictions. In the further process, these theoretical assumptions need to be verified by systematic data-based research. Based on new empirical findings, new issues arise. The empirical study either confirms the theory, necessitates modifications, or disproves the theory completely, in which case a new one must be formulated. In any case, it is always an improvement of (academic) knowledge in a continuous evolutionary process.
Figure 1.1 The process of research (components: (new) problems/issues, theoretical considerations, empirical examination)
Although scientific progress is based on theoretical considerations as well as empirical investigations, at base, empiricists and theorists take different approaches (see Figure 1.2).
Figure 1.2 The empirical and theoretical approach (empirical approach: bottom-up, data-driven, inductive-analytical, from the particular to the general; theoretical approach: top-down, theory-driven, deductive-theoretical, from the general to the particular)
While empiricists start with data (i.e., actually occurring facts), which is analysed in order to identify general underlying structures or practices, theorists operate in the opposite direction, starting with a theoretical framework and deriving individual predictions that can subsequently be tested. More precisely, the theorist searches for evidential data in support of theoretical considerations. A radical empiricist would conduct purely explorative studies and abstain from any kind of anticipation regarding the data to be analysed. In contrast, a radical theorist would disregard empirical data – either for the development of a theory or even for its verification. These two opposing positions, however, rarely occur in their extreme form. Just as empirical research generally includes theoretical assumptions of some kind (i.e., a specific research focus or procedure based on theoretical premises, such as hypotheses or criteria for data classification), a theory needs to be tested against empirical findings. Thus, ideally, there is a dialogue between theorists and empiricists.
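The contrast between the bottom-up and the top-down direction can be made concrete with a small sketch. The Python example below is only an illustration under stated assumptions: the observations are invented toy records (not real typological data), and the function names `induce_generalisations` and `check_prediction` are hypothetical. The first function proceeds from particular data points to a general pattern; the second starts from a general claim and checks one prediction derived from it against the same data.

```python
from collections import Counter

# Invented toy observations (NOT real data): each record pairs a language's
# basic verb-object order with the type of adposition it uses.
OBSERVATIONS = [
    ("VO", "preposition"),
    ("VO", "preposition"),
    ("VO", "preposition"),
    ("OV", "postposition"),
    ("OV", "postposition"),
    ("OV", "preposition"),
]


def induce_generalisations(observations):
    """Bottom-up (data-driven): tally the data, report the majority pattern per order."""
    counts = Counter(observations)
    generalisations = {}
    for order in sorted({o for o, _ in observations}):
        candidates = {adp: n for (o, adp), n in counts.items() if o == order}
        generalisations[order] = max(candidates, key=candidates.get)
    return generalisations


def check_prediction(observations, order, predicted_adposition):
    """Top-down (theory-driven): count cases supporting and contradicting a prediction."""
    relevant = [adp for o, adp in observations if o == order]
    supporting = sum(adp == predicted_adposition for adp in relevant)
    return supporting, len(relevant) - supporting


if __name__ == "__main__":
    print(induce_generalisations(OBSERVATIONS))
    print(check_prediction(OBSERVATIONS, "OV", "postposition"))
```

In the toy data, the single contradicting case returned by `check_prediction` corresponds to the situation described above in which a finding necessitates modifying the theory rather than simply confirming or rejecting it.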
1.1.3 The Research Process and its Main Stages
The illustration of Alemann (1984²: 152–153, redrawn by Angelika Morgh, with modifications and translations by the authors) in Figure 1.3 offers a good depiction of the empirical research process in its overall complexity. A clear takeaway from this illustration is that the research process is by no means a straight path but rather a winding road with detours, forks, and backtracking, none of which a researcher can entirely anticipate in advance. The researcher has to evaluate options, make decisions, cope with challenges, work simultaneously on different aspects, recognise interrelationships, and so on. All in all, it is crucial to plan a research project thoroughly but also to remain flexible regarding potential replanning for practical reasons (e.g., the sudden unavailability of data sources, uncooperative behaviour of research participants, or misunderstandings resulting in unusable data). Thus, research practice entails, by definition, the unforeseeable, the best-laid plans notwithstanding. Even researchers with extensive research experience will therefore still encounter challenging situations during their projects. Nevertheless, previous research experience does provide helpful guidance when navigating this sometimes-difficult path. Ultimately, empirical expertise is gained primarily through practice. Therefore, this book can only raise awareness regarding
Figure 1.3 The island of research. Map labels: Ocean of experience, How-to-proceed mount, Mine of serendipity, Forest of fatigue, Jungle of data analysis, Peaks of confusion, Jungle of authorities, Pass of money, Mountain of hypotheses, City of hope, River of words, Bay of literature, Swamp of data, Canyon of despair, Path of redesign, Research design, Plains of report writing, Trail of more data, Where-am-I fog, Gate of tactics, Wreck heap of rejected hypotheses, Bay of idleness, Trail of revision, Delta of publishers, River of data, The huge fundless desert, Uncharted territory, Swamp of lost manuscripts, Sea of theory, Isle of omniscience, Delta of dirty data.
Figure 1.4 The main stages of a research process
Topic identification & planning phase: 1. find an empirically accessible research issue (specific questions/hypotheses); 2. specify a suitable research design (methodological considerations)
Implementation & evaluation phase: 3. conduct the empirical research (data collection & editing); 4. evaluate the collected data (data analysis & interpretation)
Presentation phase: 5. publish the research project & its outcomes (research publication)
Retrospective reflections on methodology: adjustments based on pre-tests/pilot studies, to local circumstances, etc.
numerous fundamental aspects of research. Consequently, we strongly encourage any empirically interested beginners to conduct studies of their own as soon as possible to complement the insights available here. The main stages of empirical research projects are depicted in Figure 1.4. These stages basically follow on from one another as indicated by the arrows. Nevertheless, information acquired at later stages may lead to reconsidering aspects of a prior stage (e.g., a specification, a modification, or even a complete rescheduling of a research issue or an empirical procedure). At a certain stage (of data collection and/or analysis), however, replanning the research project may no longer be feasible – generally for practical reasons, such as limited time or financial resources. In that case, such considerations can only be discussed in the post-study reflections and appropriate recommendations can be made for subsequent studies.
1.1.4 Research Components and Basic Classifications of Research
The basic components of empirical research are:
• the researcher (who): This is the person who plans and conducts the empirical investigation. It might be a single person or a group of multiple researchers working together on a project. In the case of linguistic studies, the researchers are generally linguists. Depending on the performed activity, they can be called observers, interviewers, experimenters, analysts, and so on. An unpopular moniker in linguistics is ‘armchair linguist’ – a term that is used by corpus linguists (who study natural language data) to denote linguists working with conveniently generated language data. Field linguists studying a language by direct participation in the life of its speakers, in turn,
can label all linguists who study a language outside of this context ‘armchair linguists’.
• the research issue (what): This is the research topic, or more precisely the research question or hypothesis to be answered or tested by the empirical investigation. In the case of linguistic studies, this pertains to a linguistic topic of any area (e.g., phonology, syntax, pragmatics, or language contact). Alongside the linguistic area, we can distinguish between research on language production or use (speaking/signing and writing) and research on language perception (listening and reading). Another basic distinction is between topics in applied linguistics (the study of language in relation to practical application or real-life issues, such as language teaching, forensic linguistics, translation, and language software) and topics in academic core disciplines of linguistics (as addressed in Part II). Thus, the object of research can be past and present languages and their varieties (see temporal research framework: diachronic vs. synchronic language/dialect studies) – either single languages (studies on individual languages/varieties) or more than one language, as in comparative/cross-linguistic studies (cf. Section 1.2.8), contrastive studies (cf. Section 1.2.8), or contact studies. A further distinction is made between languages and language varieties (dialects, registers, sociolects, etc.) with different numbers of speakers, degrees of diffusion, and levels of research (e.g., better researched or major vs. minor or less studied/unstudied languages or varieties).
• the research aim (why): The general aim is to answer the research question and thus to contribute to knowledge growth, and, in the specific case of linguistic research, to the understanding of language in all its aspects.
• the research design (how): This is the empirical approach and the methodological procedure by which the research issue is to be investigated. In linguistics, this differs fundamentally across the distinct subdisciplines, as we will make clear in Part II.
• the research participants/subjects (whom/who): As language is produced by humans, its speakers, writers, hearers, or readers can be regarded as data sources, or, more specifically, data providers or suppliers. Depending on the method of data collection, they are also called informants, interlocutors, or respondents in survey studies. In linguistic studies, these are people with the language skills relevant for the research topic – that is, mainly native speakers but also certain subgroups (e.g., children – for the study of first language acquisition; certain bilinguals – for the study of contact phenomena, such as code-switching) or learner groups (for the study of foreign language acquisition). The researcher is generally not a research participant; where this is the case, it is in the context of introspective research (cf. Section 1.2.5). The number of research participants can vary
between one person (single case studies) and much larger numbers (cf. Section 1.2.3).
• the research data: This is the data that is to be analysed. In linguistics, this is language data and/or data from language-related tasks (such as acceptability judgements or reaction time measures), depending on the research topic. It can be collected or compiled within the empirical study or it can already be available from previous investigations. Thus, we distinguish between primary data (i.e., data collected by the researcher for the specific research issue), secondary data (i.e., data that was collected by someone else for another purpose but is usable for the researcher’s own analysis), and tertiary data (i.e., convenient data that is already edited/processed by someone else and can be used for further analysis). Furthermore, natural language data can be distinguished from language data generated explicitly by/for linguistic research. The transitions between these types of data are fluid (e.g., the compilation of primary data that already exists or quasi-natural language data resulting from conversations initiated by the researcher) and different combinations are possible (e.g., natural speech vs. elicited primary data). A final distinction pertains to the ethical treatment of data: confidential data (i.e., the research participants are guaranteed that nobody other than the researcher(s) can know their identity) versus anonymous data (i.e., not even the researcher knows the identity of the research participants).
• the research location (where): This is the environment in which the research or, more precisely, the data collection takes place. A basic distinction in linguistics is made between field research (i.e., research in a natural surrounding with numerous interacting variables) and laboratory studies (i.e., research in an artificially controlled environment with a reduced number of variables). Depending on the object of research, field studies can take place in different locations, while most laboratories are in industrialised Western cities with the corresponding infrastructure. However, mobile laboratory equipment allows for a combined research environment. Research based on secondary or tertiary data takes place in an office environment; this may include the consultation of archives and libraries.
• the temporal research framework (when): This is the time in which the research is carried out. Altogether, it encompasses the time frame starting from the research-planning phase up until the publication of the empirical findings and, in particular, the time devoted to data collection. The data can either be collected at a certain point in time such as in synchronic research (i.e., studies on language at a specific point in time as is, e.g., generally the case in language typology) or in cross-sectional studies (i.e., investigations of a population at a specific point in time – e.g., comparing different age groups) or repeatedly over a longer period such as in diachronic research
(i.e., studies on language as it develops over time, e.g., in historical linguistics) or in longitudinal studies (i.e., investigations of the same individual(s) over time – e.g., observing their change).
Besides the kinds of research mentioned above, which result from basic distinctions regarding a basic component (field vs. laboratory research, diachronic vs. synchronic research, monolingual vs. comparative or multilingual research, etc.), there is a further fundamental classification of empirical research relating to the research question/hypothesis (cf. Section 1.2.1), as well as to the data and the research design:
• qualitative studies: Typically, this kind of research aims at textual descriptions of a complex research issue (multiple interacting variables in natural situations) from a personal perspective (subject-related). In order to capture the multiple interacting parameters, qualitative studies are generally based on natural language data obtained from small samples (see Section 1.2.3) or even single cases, which is then analysed interpretatively (in-depth analysis). A potential criticism concerns the extent to which the data is representative and generalizable, but this is generally not the aim of case studies.
• quantitative studies: Typically, this kind of research aims at numeric presentations (object-related) of a restricted research issue (a limited number of focused measurable variables under standardised conditions). Besides measurable/countable units, quantitative studies require certain methodological procedures (large samples of standardised data) and analytical tools (statistical evaluation). A potential criticism concerns whether the collected data is realistic and corresponds to natural data.
Table 1.1 gives an overview of the linguistic subdisciplines as presented in Part II and indicates the kinds of research this book deals with.
Table 1.1 Basic kinds of research per linguistic subdiscipline
Linguistic subdiscipline: Basic kinds of research
Language documentation & descriptive linguistics: Field research, collection and analysis of primary language data
Language typology: Comparative/cross-linguistic research, usually by use of secondary data (grammars)
Corpus linguistics: Analysis of natural language, generally by use of secondary data (ready-made corpora)
Sociolinguistics & anthropological linguistics: Field research, analysis of language data in the context of socio-cultural data of the speakers
Cognitive linguistics & psycholinguistics: Laboratory (or field) research and/or analysis of natural language data or typological findings
Neurolinguistics: Laboratory research
1.1.5 Quality Criteria
Empirical research is based on general quality criteria that need to be met in order to fulfil scientific standards and ensure that one’s research findings will be taken seriously. The three fundamental quality criteria are:
1. objectivity (or neutrality): This criterion strives for the independence of the research results from the researcher or any person involved in conducting the research (analysts, data compilers, etc.). This means that the same research conducted by other researchers should deliver the same results. However, absolute objectivity is an ideal that can never be fully achieved. Observations, for instance, are generally more or less shaped by the observers’ perspective and even just by their presence – regardless of whether they are aware of their influence or not (cf. Section 2.2). Therefore, intersubjectivity can be considered a more realistic aim. It implies that a described fact is equally recognisable and transparent (or comprehensible) not only for the researcher but also for various other people. In this way, intersubjectivity is contrasted with subjectivity.
2. reliability (or dependability): This criterion aims at the exactness of the collected data, i.e., that the research method produces consistent results that are reproducible with the same methods under the same conditions (replicability). The exact repetition of research processes, however, is more realistic in quantitative and/or laboratory studies than in qualitative and/or field research, in which it is only possible to find similar/comparable conditions but not identical ones. Reliability can be tested by:
– the exact repetition of the research (where possible) – by the same or even different researchers (e.g., inter-observer reliability);
– parallel studies, i.e., the same group of research participants (sample, cf. Section 1.2.3) is investigated twice (if the first investigation does not affect the second one) or two or more comparable groups are investigated by use of the same methodological procedure;
– a split-half comparison of the research results, i.e., the data is divided in halves, then each half is evaluated/analysed separately, and the two sets of results are compared.
3. validity (or trustworthiness, credibility): This criterion means that the research procedure actually needs to measure what is intended and that the research successfully answers the research question. A distinction can be made between internal validity, i.e., the controllability of parameters to rule out alternative influencing factors, and external validity, i.e., the generalisability of research outcomes. The latter means applying the conclusions of an empirical study outside its proper context/scope of investigation such as to other research locations (ecological validity: applicability of laboratory findings to natural settings), to other people (population
validity), or over time (historical validity). Validity has to be tested regarding a multitude of research components, such as the representativeness of the selected research population, the appropriateness of the method of data collection used and of the evaluation procedure applied, the plausibility of the conclusions (generalisations/interpretations), and so on. First of all, we can ask ourselves whether the research results are plausible. We can then try to rule out any kind of research bias. Furthermore, the methodology has to be sensitive enough to show robust results. The scope of application (or applicability) describes under which circumstances research results are valid. The data may be representative for a certain subpopulation or only under particular circumstances. Thus, careful consideration has to be given to the extent to which research outcomes are generalizable.
Conversely, several fundamental issues can have a negative impact on research quality. Good scientific practice includes the researcher’s awareness of their own role in the research process. Therefore, we will point out some effects that the researcher may cause unintentionally:
• personal bias: These are distorted research outcomes that result from the researchers’ subjective viewpoint (selective perception) – their personal beliefs, thoughts, expectations, feelings, or attitudes (see, for instance, Bargh, Chen & Burrows 1996). As people are often not aware of themselves and take their own perspective for granted (cf. exercise in Section 2.2.4), researchers need to be conscious and critical of their self-perception via self-reflection. They should ask themselves questions like:
– Do I have expectations or a certain attitude (such as personal preference or animosity) regarding the research topic? Does this become apparent in my behaviour or empirical design (e.g., questions)?
– Do the research parameters reflect only my own speaker-, language-, or culture-specific categories that may be inappropriate for the research participants? Do I interpret the data in a way that is also appropriate given the categories of the research participants?
• reciprocal effects: These are (unwanted) research effects that result purely from the researcher’s presence, i.e., the researcher’s presence leads to certain behaviours on the part of the research participants, which would not have occurred or would have occurred differently in the case of the researcher’s absence (e.g., actions of hospitality and politeness vis-à-vis the researchers or restrained and sceptical behaviour in their presence). The observer’s paradox (cf. Section 2.2.3), for instance, acknowledges response effects of the observer’s presence. However, this holds true for any kind of research interaction, including observations, surveys, and experiments (cf. Sections 2.3.3 and 2.4.3). The use of media, in particular, may cause or intensify
reciprocal effects (e.g., certain behaviours resulting from wanting to be recorded or from unease in the presence of cameras and recorders) (Lang & Lang 1953, 1973). Work with research participants always involves reciprocal effects. Thus, the aim cannot be to avoid them but instead to consider their impact on research. The recognition of reciprocal effects is based on the awareness of interpersonal ideas and motives.
– How do the research participants perceive me (my role, my position, my culture, my behaviour, my research project, etc.) and how am I treated (e.g., as a guest, a language learner, or a person of high status)? Do they have any expectations regarding my research?
Finally, the crucial question to reflect on is whether and how these reciprocally caused behaviours affect the appropriateness and representativeness of the research outcome:
– To what extent do the collected data represent behaviours that occur in natural situations? Is it possible that the research data reflect ideas that the research participants have about what I want to hear? Or do the research participants not correct my deficient language examples in order to avoid offensive behaviour?
The fact that the researcher and research participant have mutual ideas and expectations of each other is not avoidable, but it is crucial to be aware of these interpersonal processes and to take into consideration their impact on the research process. Otherwise, unconsidered personal bias and reciprocal effects threaten the validity of research outcomes.
In summary, these quality criteria need to be considered as crucial guidelines throughout the entire research process, i.e., during the period of careful methodological preparation, as well as during the process of data collection and analysis (retrospective reflections on the methodology).
1.1.6 Research Ethics
The ethical foundation of research deals with the protection of shared human values. In general, respectful and responsible behaviour that does not abuse and exploit trust is a fundamental guideline of research interaction with respect to different parties, such as the researcher and the academic community, as well as the research participants, their society and environment (e.g., animals, plants, and other resources).
• ethical guidelines regarding research participants: Ethical behaviour means protecting participants’ personal rights and their well-being. First and foremost, they should not be harmed or compelled to do anything against their will. This implies that people are aware of the research and its consequences for them and participate voluntarily on this basis. Hence, the work with infants is
particularly delicate. The researcher generally needs to inform the research participants about the research aim and the precise research issue, as well as the use of recording devices prior to data collection (cf. Section 2.1.6). In some cases, however, this information would cause unwanted awareness and ‘unnatural’ behaviour on the part of the research participants and, consequently, compromise the usability of the data. In this case, it can be an option to give more general information about the topic or possibly to inform the research participants immediately after data collection. Furthermore, the publication of data needs to be approved explicitly (ownership and property rights), irrespective of whether the research participants are guaranteed confidentiality (i.e., only the researcher knows their identity) or even anonymity (i.e., nobody knows their identity, not even the researcher). In linguistic experimental studies, in particular, it is ethically problematic to ask research participants to do something that could have a negative impact on their health (e.g., the administration of alcohol or medication to test its impact on language behaviour or PET-experiments where participants are given a radioactive contrast agent) – even though they may be willing to participate. The ethical considerations in linguistic field studies pertain particularly to the work with other culture and language groups, i.e., their customs and habits must be respected which requires awareness and knowledge of local rights as well as informal rules of conduct. Generally, respectful behaviour requires sensitivity and empathy, as the relevant ethical issues are not universal but depend on the kind of research, the specific topic, the procedure, the requested tasks, the cultural setting, and so forth. 
Further, whether and how to adequately compensate research participants is a challenging and controversially discussed issue – the amount and kind of compensation (services, items, and/or money) and the beneficiary (entire community or main informants only). In the laboratory, we generally need to attract people by payment, and in the field, it is quite challenging to compensate research participants without causing community-internal hassles (e.g., by envy or offense). However, in any case, the researcher should express gratitude for the data and offer the research participants access to the research outcome (e.g., in the form of reports). Meanwhile, most universities, research institutions, and funding agencies have ethical review boards and specific ethics policies that should be followed in addition to the frameworks of national and international rights. The work with research participants of other countries and/or the work in other countries calls for the compliance with the relevant local standards in the form of the local laws as well as informal requirements. Please also consider whether official approval (e.g., an ethical appraisal or research permission) of any institution monitoring interpersonal behaviour is required.
• ethical requirements vis-à-vis the academic community: Researchers are required to share their scientific knowledge, i.e., to publish the research and, optimally, provide the basic data. In this process, academic standards need to be fulfilled. This includes that the work of other people has to be identified (their ideas, theories, data, findings, etc.) in accordance with citation conventions. Otherwise, the intellectual property rights of the originator are violated. This is called plagiarism. Furthermore, good research practice means that the researcher presents clean data and transparent analyses, i.e., no falsification by the invention, manipulation, or unmentioned omission of data and/or methodological details, whereas unintentional, honest errors and diverging or even conflicting viewpoints are not considered scientific misconduct.
In situations of conflicting interests, rights, and values, the options have to be evaluated in order to find a practicable and ethically justifiable way of proceeding with research. The researcher’s interests and the academic practices of good research may conflict with the interests of funders or research participants. Furthermore, work in other countries might involve handling divergent local rights and informal practices. Even within the group of research participants, interests and rights may diverge. In the case that the researcher does not find an ethically practicable solution, the proposed empirical study is ultimately not acceptable and no longer feasible.
1.2 Aspects of the Research Process
In this section, we address certain aspects of the research process. Planning an empirical project includes the formulation of a concrete research question or hypothesis (1.2.1), the operationalisation of its parameters in terms of variables and values (1.2.2), considerations regarding the required data and sampling procedures (1.2.3), and the specification of further methodological proceedings to answer the research question. Practical aids for structuring and planning empirical projects, namely an exposé/proposal and a research diary, are described in Section 1.2.4. The implementation of the empirical study comprises, first and foremost, data collection (1.2.5, and the basic methods are addressed in more detail in Section 2), including data documentation in the form of written notes and recordings (1.2.6). Further aspects of implementing empirical research are data editing/pre-processing – such as transcription, translation, and annotation (1.2.7) – and data analysis (1.2.8). Subsequently, the results of analysis need to be reflected upon and possibly interpreted (1.2.9). The final stage involves the presentation and publication of the empirical project and its outcome in the form of posters, talks, articles, and/or monographs (1.2.10).
1.2.1 Research Questions and Hypotheses
The overall aim of linguistic research is to find answers to open questions about language such as ‘How did language evolve?’ or ‘What is language?’ As these are very general questions that cannot be answered in a single study, the first step of an empirical project is the formulation of a more specific research question or hypothesis guiding the entire research. While hypotheses are generally precise assumptions or predictions (formulated as statements) that are empirically verifiable or falsifiable (i.e., they can be proven correct or false by an empirical study), research questions are more openly formulated as interrogatives (what, how, why, who, to what extent, is/are, do/does, etc.). There are some basic research questions which are associated with fundamental types of empirical research, such as:
• descriptive studies: The overall aim is to describe a phenomenon. Thus, the basic question is ‘What is the case?’
• explanatory studies: The overall aim is to find an explanation for a phenomenon. Thus, the basic question is ‘Why is something the case?’
While some explanatory questions can hardly be answered by empirical research but rather by theoretical considerations, correlations between two parameters can be tested empirically (‘What happens to B, when A changes?’). However, it is quite challenging to find direct empirical evidence for a cause-effect relationship (one parameter being the cause of the other parameter) beyond correlation. There is a fundamental order in which these research questions can be followed. As long as we have no knowledge about what the case is, we cannot raise the question of why it is the case. Depending on the state of research regarding a particular topic, another basic classification of research is made:
• explorative or hypothesis-generating studies: If a topic is relatively unknown or unstudied and thus no specific ideas or assumptions yet exist, explorative studies aim at getting a first overview of the situation. Thus, the research questions are of a more general and open nature.
• problem-oriented or hypothesis-testing studies: In contrast to explorative studies, problem-oriented studies have a specific research focus as they build upon previous knowledge regarding a topic. The basic question of hypothesis-testing studies, for instance, is ‘Is something the case given specific parameters?’
Furthermore, we have the distinction between:
• qualitative studies: This kind of research aims at textual descriptions of a research issue. Thus, the basic question is ‘How is something?’ Explorative studies are predominantly qualitative studies.
• quantitative studies: This kind of research aims at numeric presentations of a research issue. Thus, the crucial question is ‘How frequently or to what extent does a particular phenomenon occur?’ Hypothesis-testing studies are predominantly quantitative studies.
The distinct approaches do not exclude one another. There are numerous studies combining them, e.g., a qualitative part to describe a topic in all its occurring forms and interacting factors and a quantitative part that follows up with the detailed analysis of single parameters. Regardless of which kind of research type is chosen, it generally follows along the lines of comparison – that is, the overall research aim is to compare two aspects (languages, speakers, linguistic forms, etc.).
Hypotheses are precise assumptions to be tested. Thus, they generally build upon basic knowledge about a research topic. Depending on the postulated relationship between single factors, different kinds of hypotheses can be distinguished:
• null hypothesis: It postulates that there is NO systematic relationship between the dependent variable (B) and the independent variable (A).
• alternative hypothesis: It postulates that there is a systematic relationship between the dependent variable (B) and the independent variable (A). This means that in the case A occurs or changes, B is affected. This interrelated change can be postulated in a nondirectional way (i.e., B changes in any manner) or in a directional way (i.e., B changes in a certain manner, such as increasing or decreasing, improving or impairing, etc.).
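To make the contrast between the null hypothesis and the two variants of the alternative hypothesis more tangible, the following minimal sketch uses a simple permutation test. It is only an illustration under stated assumptions: the reaction times are invented, the function name `permutation_test` and its parameters are hypothetical, and a permutation test is just one of several procedures that could be used here. The independent variable (A) is the word type shown to participants and the dependent variable (B) is the reaction time.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, directional=False,
                     n_permutations=5000, seed=0):
    """Estimate how often a difference at least as large as the observed one
    arises if group membership (A) is unrelated to the values (B),
    i.e. if the null hypothesis holds."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = mean(group_b) - mean(group_a)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign values to the two groups
        diff = mean(pooled[n_a:]) - mean(pooled[:n_a])
        if directional:
            # directional alternative: B is larger in the second group
            extreme += diff >= observed
        else:
            # nondirectional alternative: B differs in either direction
            extreme += abs(diff) >= abs(observed)
    return observed, extreme / n_permutations


# Invented reaction times in milliseconds (hypothetical data).
high_frequency_words = [512, 498, 530, 505, 521, 490, 515, 508]
low_frequency_words = [560, 545, 572, 538, 590, 551, 566, 549]

observed_diff, p_nondirectional = permutation_test(
    high_frequency_words, low_frequency_words)
_, p_directional = permutation_test(
    high_frequency_words, low_frequency_words, directional=True)

print("observed difference in means:", observed_diff)
print("share of shuffles at least as extreme (nondirectional):", p_nondirectional)
print("share of shuffles at least as extreme (directional):", p_directional)
```

A small share of shuffles that are at least as extreme as the observed difference speaks against the null hypothesis. Because both calls use the same seed and hence the same shuffles, the directional count can never exceed the nondirectional one.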
Furthermore, alternative hypotheses are classified according to the kind of logical relationship: •
•
deterministic hypotheses: These are conditional if-then statements (‘If A, then B’), describing a relationship between two parameters of which A represents the independent variable and B the dependent one (cf. Section 1.2.2). probabilistic hypotheses: These are conditional statements of the form ‘The Xer A, the Yer B’ describing a relationship of tendency between two parameters of which A represents the independent variable and B the dependent one (cf. Section 1.2.2).
Both kinds of hypotheses are unidirectional which means that no statement is made about the opposite relationship between A and B. Thus, ‘If B, then A’ or ‘The Yer B, the Xer A’ have to be examined separately and cannot be automatically deduced. How to Develop Your Own Research Question or Hypothesis
Overall, it is crucial to find a research question or hypothesis that is suitable for an empirical research project, interesting, and feasible within the project framework. Certain kinds of questions are difficult or impossible to investigate empirically. Questions like ‘How did human language evolve?’ or ‘How many languages will become extinct by the end of the 21st century?’ are better approached theoretically. In order to find a suitable empirical research question, we recommend starting with a loose collection of linguistic topics on the basis of personal interest and/or familiarity and narrowing down these considerations step by step until you arrive at a concrete and clear-cut research question or hypothesis that is well-grounded in the current state of knowledge. In this selective process of topic delimitation and specification, it is crucial to consult linguistic resources on the chosen topic (literature review). Recent textbooks and handbooks, in which specialists summarise key issues, are a particularly good starting point to gain a quick insight into a particular field. Instead of developing one’s own research project from the ground up, it can be easier to replicate other studies with marginal changes regarding a single parameter.
Whatever research topic is chosen, it is helpful to work on topics of great personal interest or even enthusiasm in the first place. Otherwise, there is a greater risk of aborting or inattentively pursuing the research once difficulties inevitably arise (cf. Figure 1.2). Other than personal interest, it is important that the research topic is relevant for the discipline and that the specific research project addresses so far unstudied or understudied aspects. Furthermore, a research project can only be successful if it is practicable with regard to the required resources and the timeframe available. In order to gain an overview of the feasibility of the project, it needs to be planned, organised, and structured carefully. Therefore, we recommend writing an exposé/proposal (cf. Section 1.2.4), discussing it with a supervisor and/or fellow students, and revising it throughout the entire research period.
Now that you have learnt more about research questions, proceed with exercise 1.1 in Section 1.4. As a starting point for developing specific research ideas, you can use Table 1.2 which provides an overview of the basic research questions of various linguistic subdisciplines as presented in Part II of the book. More specific examples of research questions which may inspire you are then given in the first table of Chapters 3–8 – for each linguistic domain (phonetics & phonology, morphology & syntax, lexicon & semantics, pragmatics & discourse) as well as for the cross-disciplinary fields of language acquisition, language contact, and language change.
1.2.2 Operationalisation: Variables & Values
Table 1.2 Basic research questions of linguistic subdisciplines
Language documentation & descriptive linguistics
• How do native speakers linguistically behave in various naturally occurring contexts?
• What are the underlying structural patterns of an unstudied or less-studied language?
Language typology
• What are the common linguistic features, what are the differences of the languages of the world and how are the different features distributed?
Corpus linguistics
• What are quantitative or qualitative patterns of language use in natural situations?
Sociolinguistics & anthropological linguistics
• How is language used in different social contexts, and is there a correlation between language-internal variation and the social features of its speakers?
• Is there an interrelation between language and culture, or more precisely, in which way do linguistic forms and practices reflect culture-specific meaning?
Cognitive linguistics & psycholinguistics
• Which cognitive conceptualisations are reflected in language, i.e., how is knowledge organised in the speakers’ mind? And which mental processes are active (and when) during language production, language comprehension, and language acquisition?
• Is there an interrelation between language and thought, i.e., does a language influence its speakers’ cognition?
Neurolinguistics
• When and where is language processed in the brain? What is the genetic basis of human language? How does language processing take place in patients with neurological disorders and/or lesions?
Operationalisation refers to the process of making a research question or hypothesis empirically measurable. This includes the identification of basic parameters or research units relevant for the study (called variables), the definition of their characteristic values or defined levels, and the selection of research objects carrying these values. For instance, the non-linguistic variable ‘sex’ has the values ‘male’ and ‘female’ and the carriers of these features or traits are humans. The linguistic variable ‘gender’ has the values ‘masculine’, ‘feminine’, and possibly ‘neuter’ or other gender categories – depending on the studied language(s). Variables and their characteristic values/levels need to be well-defined according to the current state of scientific knowledge. Thus, it is vital to know the relevant literature (e.g., about gender categories and their definitions). Furthermore, they need to fit the research question or hypothesis (e.g., gender and sex are the key variables of a question like ‘To what extent does the grammatical gender of human-describing German nouns correspond to sex?’). In order to measure each case exactly once, the values/levels of each variable need to be:
– disjoint: the levels must not overlap (i.e., each case must be assignable to one level only); and
– exhaustive: the levels must cover the entire spectrum (i.e., each case must be assignable to a level).
If we take the variable ‘age’ as an example, the levels ‘0–20 years’, ‘20–40 years’, ‘40–60 years’, ‘60–80 years’, and ‘80–100 years’ do not meet the criteria. First, the levels overlap (the age values 20, 40, 60, and 80 are assignable to two levels each), and second, the levels do not cover all possible cases (people who are older than 100 years cannot be assigned to a level). Thus, the levels need to be defined differently, such as: ‘0–19 years’, ‘20–39 years’, ‘40–59 years’, ‘60–79 years’, and ‘80 years and older’.
Furthermore, the levels need to match the research question or hypothesis. The above proposed age levels are, for instance, not suitable for a study on first language acquisition. In this case, the most relevant age of about 0–10 years (the core age being considered depends on the specific research topic) needs to be sub-differentiated much more finely, while peripheral age groups can be combined into broader levels and irrelevant age groups can be excluded from the study. Suitable levels are, for instance, ‘0–6 months’, ‘7–8 months’, ‘9 months’, ‘10 months’, ‘11 months’, ‘12–13 months’, and ‘14–20 months’ if the main age range being considered is between 7 and 13 months and the research participants are no older than 20 months of age.
Depending on the kind of research, the variables, values/levels, and carriers need to be identified and defined at different stages of the research project. While explorative studies generally aim at discovering parameters and values/levels as a part of their findings, in other empirical projects the researcher already needs to be aware of them from the beginning. In particular, quantitative hypothesis-testing studies require a precise definition prior to data collection. However, even in qualitative descriptive studies it is important to deal with variables and values/levels – in order to narrow down the research topic and to identify distinct parameters during data collection and analysis.
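The two criteria for the age levels discussed above (disjoint and exhaustive) can also be checked mechanically. The following Python sketch is only an illustration and not part of the book's materials: the function names, the representation of a level as a pair of inclusive bounds in whole years, and the tested age range of 0 to 120 are assumptions made for this example.

```python
from typing import Optional

# Each level is (lowest age, highest age) in whole years, both inclusive.
# None as the upper bound stands for an open-ended level ('80 years and older').
Level = tuple[int, Optional[int]]


def levels_containing(age: int, levels: list[Level]) -> list[Level]:
    """Return every level to which the given age can be assigned."""
    return [(lo, hi) for lo, hi in levels
            if lo <= age and (hi is None or age <= hi)]


def check_levels(levels: list[Level], ages: range) -> tuple[list[int], list[int]]:
    """Return (ages assignable to several levels, ages assignable to no level)."""
    overlapping = [a for a in ages if len(levels_containing(a, levels)) > 1]
    uncovered = [a for a in ages if not levels_containing(a, levels)]
    return overlapping, uncovered


flawed = [(0, 20), (20, 40), (40, 60), (60, 80), (80, 100)]
revised = [(0, 19), (20, 39), (40, 59), (60, 79), (80, None)]

ages_to_test = range(0, 121)  # assumed range, chosen only for this check

print(check_levels(flawed, ages_to_test))   # overlaps at 20, 40, 60, 80; ages above 100 uncovered
print(check_levels(revised, ages_to_test))  # both lists are empty
```

A set of levels is disjoint if the first list is empty and exhaustive (for the tested range) if the second list is empty.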
While hypothesis-testing studies generally focus on a limited number of variables, descriptive studies in natural contexts face a complex network of interwoven variables. Overall, different kinds of variables are distinguished. The most basic distinction is derived from the overall research design (i.e., research questions/hypotheses):
• independent variable (A): a variable that is assumed to have an impact on the dependent variable (B);
• dependent variable (B): a variable that is assumed to change under the impact of the independent variable (A);
• intervening variables or confounding factors: all other parameters that may also have an impact on the dependent variable apart from the independent variable.
➔ In descriptive field studies, all these parameters have to be recognised and described, while laboratory hypothesis-testing studies try to eliminate their impact by keeping them constant (at one value) or by equalizing them (working with a large quantity of diverse values regarding this parameter).
Furthermore, variables can be categorised in terms of their values:
• categorical variables: There is a finite number of values which cannot necessarily be put into a logical order/sequence. These may be only two categories (dichotomous variables, e.g., sex: ‘male’ vs. ‘female’) or more than two (polytomous variables, e.g., gender: ‘masculine’ vs. ‘feminine’ vs. ‘neuter’).
➔ The values of categorical variables generally describe qualities or conditions. Such variables are qualitative variables.
• discrete variables (or count variables): These are numeric variables with values representing discrete points on a scale which are not decomposable, i.e., there are no intermediate values between two adjoining values. Regarding the number of siblings, for instance, the value can be 3 or 4 but you cannot have 3.8 siblings.
• continuous variables: These are numeric variables with continuous values. This means that the values are decomposable, i.e., there is an infinite number of intermediate values between any two values. The time you spend reading, for instance, can be 2 or 3 hours per week but also 2 hours and 38 minutes, and so forth.
➔ The levels of numeric variables can be categorised into uniform (i.e., of the same size) or otherwise suitable units (e.g., the proposed age values for first language acquisition).
➔ The values/levels of numeric variables generally describe quantities or dimensions (e.g., age or number of occurrences). Such variables are quantitative variables.
Regarding the values, we also distinguish between different levels of measurement:
– nominal scales: The alternative values do not indicate a ranking of the differences (e.g., sex: ‘male’ and ‘female’).
– ordinal scales: The alternative values can be ordered hierarchically but there are no metrical intervals, i.e., no intervals of the same distance (e.g., ‘always’-‘often’-‘seldom’-‘never’).
– metric scales: The alternative values represent metrical intervals, i.e., intervals of the same size. If the scale also starts from zero, it is a ratio scale (e.g., age: 0, 1, 2, 3, . . . years). If there is no natural zero point, it is an interval scale (e.g., annual information: 2008, 2009, 2010, . . .).
Another categorisation of variables is made in terms of their carrier:
• individual variables: The carrier of the features is a single person/language/etc. Furthermore, individual variables may be absolute (i.e., their values describe features of one carrier, e.g., person deixis) or relational (i.e., their values describe features of two carriers in relation to each other, e.g., honorifics or relational deixis).
• collective variables: The carriers of the features are groups of people/languages/etc.
Finally, we distinguish between:
• manifest variables: They pertain to observable properties (e.g., usually sex).
• latent variables: They pertain to properties that are not directly observable (e.g., exact age). Thus, latent variables require other methods of data collection (e.g., a survey).
Now that you have learnt more about the operationalisation of research questions, proceed with exercise 1.2 in Section 1.4.
1.2.3 Data Foundation and Sampling
A fundamental part of planning an empirical study is the specification of a research design that serves to answer the research question/hypothesis and meets basic research requirements (quality criteria, research ethics, etc.). Initial methodological considerations result from the data that is needed to answer the specific research question or hypothesis. Depending on the topic, researchers need to collect or compile the data from different sources or they can access existing databases.
No researcher can study an endless amount of data, taking into account all languages, all texts or language examples, and/or all speakers that are relevant for the research topic. Thus, we work with samples – i.e., representative selections of the basic set/population (which is the pool of potential languages, research participants or language items). Two important aspects are the size of the sample and the selection procedure.
The size may vary between one instance (e.g., single case studies investigate one speaker), which implies a selection but does not really represent a sample, and a huge number of instances (e.g., in electronic databases). For reasons of feasibility, an increase in number or scope regarding one aspect (i.e., languages, speakers, etc.) generally comes with a decrease regarding others. For instance, sociolinguistic studies that investigate variation at the speaker level generally focus on one language, while typological studies usually compare multiple languages without taking language-internal variation into account. Similarly, in research on complex phenomena for which data on multiple aspects is needed to obtain a comprehensive understanding, case studies are feasible (decreasing the number of research participants). Otherwise, the scope of a study is not realistic for an individual researcher.
Regarding selection, there are two fundamental sampling methods available:
• random sampling: This is a selection solely on a random basis. In order to run a random selection, all elements of the basic set (e.g., languages or speakers) need to be known/listed and each element needs to have an equal chance of occurring in the sample. Such sampling is used especially in large-scale surveys (e.g., telephone- or internet-conducted studies).
• quota sampling (or systematic/purposive sampling): This is a selection guided by relevant research criteria. Elements of distinct subgroups regarding a certain variable (e.g., sex, age, or language family) are included in the sample. In this process several criteria can be considered, i.e., the subgroups are further subdivided according to another relevant aspect. The total number of subgroups then increases multiplicatively (i.e., the numbers of values of all variables are multiplied). For instance, if speakers are grouped according to sex (male vs. female) and age (younger vs. middle-aged vs. older), we get the following 6 subgroups: ‘younger males’, ‘middle-aged males’, ‘older males’, ‘younger females’, ‘middle-aged females’, and ‘older females’.
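The two sampling methods can be sketched in a few lines of Python. This is a minimal illustration under invented assumptions: the 60 speakers, their randomly assigned attributes, the sample sizes, and the helper `quota_sample` are all hypothetical and only serve to show the logic.

```python
import itertools
import random

# Hypothetical basic set: 60 listed speakers, each with a sex and an age group.
sexes = ["male", "female"]
age_groups = ["younger", "middle-aged", "older"]
rng = random.Random(42)  # fixed seed so the illustration is repeatable
speakers = [
    {"id": i, "sex": rng.choice(sexes), "age_group": rng.choice(age_groups)}
    for i in range(60)
]

# Random sampling: every listed speaker has the same chance of being selected.
random_sample = rng.sample(speakers, k=12)

# Quota sampling: the subgroups are all combinations of the variables' values,
# so their number is the product of the numbers of values (2 x 3 = 6).
subgroups = list(itertools.product(sexes, age_groups))


def quota_sample(population, per_group):
    """Take up to `per_group` speakers from each sex-by-age subgroup."""
    selected = []
    for sex, age_group in subgroups:
        members = [s for s in population
                   if s["sex"] == sex and s["age_group"] == age_group]
        selected.extend(members[:per_group])
    return selected


print(len(random_sample), len(subgroups))  # 12 6
print(len(quota_sample(speakers, per_group=2)))
```

Adding a third variable with, say, four values would multiply the number of subgroups to 24, which shows how quickly quota designs grow.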
Both kinds of sampling methods may be combined. It is, for instance, possible to choose specific subgroups based on relevant research criteria (e.g., male vs. female participants; postpositional vs. prepositional languages) and then to run a random selection within this/these subgroup(s). In contrast to quota sampling, it is also possible to reduce the basic set of all carriers based on aspects relevant to the research topic (e.g., bilinguals for research on code-switching, children for research on first language acquisition). This is called systematic/purposive preselection. The sampling then takes place within this basic subset.
An unbiased data set is a fundamental prerequisite in order to avoid inaccurate and meaningless research outcomes. Thus, we strongly recommend carefully considering the representativeness of the sample (i.e., the fact that the sample has the characteristics of the basic set or population that are relevant to the research question):
• regarding its size: There is a critical minimum number of sample units to be included in a statistical sample, if inferences about the total population are to be made. Beyond this, the larger the sample size, the higher the precision when estimating unknown parameters (e.g., the proportion of verb-final languages with postpositions across the languages of the world).
• regarding the selection procedure: Random samples of sufficient size are more likely to be representative of the characteristics of the basic set/population that are relevant to the research question. By contrast, quota samples need to be checked for representativeness, particularly regarding the parameters that are relevant for the empirical project. In order to study gender-specific variation in language use, for instance, men and women should be selected in a balanced and representative manner.
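To illustrate why larger samples give more precise estimates, the following sketch computes the standard error of a sample proportion, sqrt(p(1 - p)/n), for several sample sizes. It assumes simple random sampling from a large population; the proportion of 0.4 and the sample sizes are made-up values for illustration, not figures from any study.

```python
import math


def standard_error(p: float, n: int) -> float:
    """Standard error of a sample proportion p estimated from n units,
    assuming simple random sampling from a large population."""
    return math.sqrt(p * (1 - p) / n)


assumed_proportion = 0.4  # made-up value for illustration only
for n in (25, 100, 400):
    se = standard_error(assumed_proportion, n)
    print(f"n = {n:3d}: standard error = {se:.3f}")
```

Under these assumptions, quadrupling the sample size halves the standard error, so precision improves with sample size but with diminishing returns.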
From a practical point of view, the most problematic aspect of any kind of sampling is the unequal access to elements of the basic set/population. This bears the risk of bias in favour of elements that are more readily available. In typological studies, for instance, unstudied languages are generally not included in the language samples and, thus, these kinds of languages are underrepresented; in laboratory studies, students of Western industrialised societies are overrepresented, by far, among the research participants; and in most ready-made corpora, oral text types are underrepresented in comparison to written ones. Thus, the collection of data should include considerations of unbiased sampling, just as working with an existing available database requires a review of its composition and awareness of the selection criteria.
When data on relevant subpopulations are not available in a representative manner, they can be weighted to compensate for the imbalance. For instance, if written data are overrepresented in a corpus, this can be compensated by a representatively higher weighting of the oral data in the analysis. Alternatively, only as much written data from the corpus can be analysed as is proportional to the oral data available.
Sampling units are the individual elements of the basic set/population that have been selected to be part of the sample. If sampling takes place in multiple stages, there are generally also several sampling units in successive order of clustering. For instance, in a study on language use across Germany a primary sampling unit could be the postal area (sampling by area code), the secondary unit the household (sampling by household within the previously sampled postal areas), and the final unit the individual (sampling by adult within the previously sampled households). Similarly, a first sampling unit in cross-linguistic research could be the language family, then the subfamily, and finally the individual language.
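The multi-stage procedure with postal areas, households, and individuals can be sketched as follows. The nested population, the numbers of units per stage, and the function `multistage_sample` are hypothetical and only illustrate how the sampling units follow one another.

```python
import random

rng = random.Random(7)  # fixed seed so the illustration is repeatable

# Hypothetical basic set: 10 postal areas, 5 households each, 2 adults each.
population = {
    f"area_{a}": {
        f"household_{a}_{h}": [f"adult_{a}_{h}_{p}" for p in range(1, 3)]
        for h in range(1, 6)
    }
    for a in range(1, 11)
}


def multistage_sample(population, n_areas, n_households, rng):
    """Sample areas, then households within them, then one adult per household."""
    selected = []
    # Primary sampling unit: the postal area.
    for area in rng.sample(sorted(population), n_areas):
        households = population[area]
        # Secondary sampling unit: the household within a sampled area.
        for household in rng.sample(sorted(households), n_households):
            # Final sampling unit: one adult within a sampled household.
            selected.append(rng.choice(households[household]))
    return selected


sample = multistage_sample(population, n_areas=3, n_households=2, rng=rng)
print(sample)
```

Each later stage only draws from the units selected at the previous stage, so a complete list of all adults is never required, only lists of areas, of households in the sampled areas, and of adults in the sampled households.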
Sampling units have to be distinguished from the unit of measurement and the unit of analysis which may be, but are not necessarily, the same. While the unit of analysis is the entity about which you want to make a statement as determined by the research question/hypothesis (cf. Section 1.2.8), the unit of measurement (also called unit of observation) is the entity described by the (collected) data. For instance, in studies on the cross-linguistic distribution of postpositional languages (sampled by language family, subfamily, and finally language) the unit of analysis as well as the unit of measurement is the individual language.
Now that you have learnt more about data foundation and sampling, proceed with exercise 1.3 in Section 1.4.
1.2.4 Practical Aids for Planning and Structuring Your Project: Exposé/Proposal & Research Diary
Empirical research involves careful planning and structuring, particularly prior to the implementation period but also over the course of data collection and analysis. In order to develop a methodological procedure, the following questions provide useful guidelines:
• What is the specific object of research?
• What kind of data is needed and where does it come from? Does the data need to be collected or is it already available? According to which criteria is the data selected?
• If the data needs to be collected, what kind of research participants are needed to gain the appropriate data? And how can they be found and motivated to participate in the study?
• What method(s) of data collection and what research location is/are most suitable? Are these methods feasible within the research framework and the limitations it imposes?
• How should the data be analysed (appropriate method of data analysis) and what does that mean for the method of data collection?
In the following, we describe two practices that support the planning and structuring of your own empirical project, namely, writing an exposé/proposal and a research diary. The aim of an exposé/proposal is to give an overview of a research project at the end of the planning process. It should contain information about the following aspects:
• reasons for the choice of the topic (the process of its identification, its topical relevance, reasons for choosing it, etc.)
• preliminary work and prior knowledge
• topic specification and delimitation (the research aim, the question or hypothesis, and the focus of the investigation)
• literature on the topic (literature research, relevance of publications, academic basis for your own research, etc.) and state of research
• methods of data collection and analysis (considerations beforehand, reasons for the decisions made, details of the procedure, required resources/equipment/permissions, etc.)
• work-plan and time schedule (chronological order, length of the distinct periods, etc.)
The exposé serves the researchers themselves as a framework, i.e., it helps them to gain an overview of the entire project and to recognise whether it is feasible within the scope of research. Furthermore, it is intended to inform supervisors about the project and the researcher’s considerations. On this basis, they approve the study and provide guidance regarding aspects of the research. Finally, the exposé is a central component in the process of applying for research funding. Generally, funding agencies base their decision on this information. Now that you have learnt more about the research exposé/proposal, proceed with exercise 1.4 in Section 1.4. In the subsequent research process, we strongly recommend keeping an eye on the exposé, particularly the work-plan and time schedule, and adding to it or adjusting it if necessary.
A research diary documents the entire research process from the researcher’s perspective in chronological order. In contrast to other scientific records (cf. Sections 1.2.6 and 1.2.10), it includes personal notes, such as thoughts, ideas, actions, reactions, observations, impressions, interests, comments, experiences, values, conflicts, reflections, and even emotions and assumptions on the part of the researcher. Hence, it provides an insight into the individual research process and its development in (personal) detail: Which aspects have I already taken into account? What inspired me? Which thoughts did I reject? What still needs to be done? And so forth. For the researcher, the diary serves (self-)reflection and organisation. It is a reminder of completed and outstanding project tasks, a pool of ideas regarding further studies, a means to document the research activities, and a resource for the identification of problems. Although the practice of writing research diaries is predominantly used in qualitative social science field research, it is a useful tool for supporting any kind of empirical process. Research diaries are individual and personal documents. They are generally not published and only given to others when this has been explicitly agreed upon. Despite the great freedom writing a research diary implies, there are some fundamental requirements:
• accuracy/exactness of the notes (the entries need to be made in a timely manner to avoid effects of forgetfulness and the distortion of events over time)
• regularity of the entries (usually on a daily basis)
• systematic, clear, and transparent structure in chronological order (by date and possibly further sub-structuring units, e.g., ‘to do list’ vs. ‘descriptions of performed activities’ or ‘facts’ vs. ‘impressions/feelings’ – indicated by colour or by page division)
Possible contents are:
• development of the research question (areas of interest, topic selection, specification of research questions or hypotheses)
• literature (e.g., accessibility, completion status of screened literature, relevant content and quotes, bibliographical data)
• methodological considerations (self-reflections and reflections of pros and cons with regard to the research question/hypothesis, access to needed resources, such as funding, research permissions, participants, data, technical equipment, software, need of pre-tests, plan of procedure, etc.)
• implementation period (e.g., functionality and effectiveness of procedures, reasons for problems or failure, necessary adaptations to unexpected occurrences, personal perceptions, self-observation and reflections)
• teamwork (e.g., interpersonal relationships, content of group meetings, division of labour and clarification of interfaces, understanding of the work coresearchers are doing, difficulties and successes of the joint project)
Now that you have learnt more about the research diary, proceed with exercise 1.5 in Section 1.4.
1.2.5 Data Collection
Empirical studies not building upon already-available data involve a stage of data collection. The most important aspect of data collection as well as analysis is a consistent methodological procedure. The three basic methods of data collection are:
• observation: This is the collection of information that can be perceived through the senses (particularly visible information). In field linguistics, we primarily observe naturally occurring language behaviour which is, as far as possible, unaffected by the researcher or the research project – regardless of whether the data is recorded or otherwise documented. Note that observation here is understood as a method of data collection that is distinct from observation as a unit of any kind of measurement (cf. Section 1.2.3).
➔ introspection: This describes a process of self-observation, i.e., researchers retrieve the information from themselves. Thus, there is no data record that is accessible to others. Introspection is only practicable if the researcher can be classified as a competent speaker with regard to the object of research. As a basic method of data collection, it carries an increased risk of personal bias (cf. Section 1.1.5).
• survey (interviews and questionnaires): This is the collection of information via verbal interaction (questions and answers). The research participants (or more precisely, the informants or interlocutors) are asked to share their knowledge. However, not all kinds of information are accessible via direct questioning (e.g., unconscious processes or detailed recalls of long-term memories). Furthermore, answers given have to be understood in the context of the informant’s sociocultural conversational norms and practices as well as the research setting.
➔ elicitation: This is a particular linguistic method of data collection. In its general meaning, it describes any kind of language data collection from a suitable research participant. In the following, however, we use it in its more specific sense of obtaining conveniently generated language data by systematic survey (cf. Chapter 3).
• experiment: This is the collection of systematic data regarding a particular aspect under controlled conditions (generally in laboratories). The research participant is given stimuli in which the independent variable is manipulated (generally by linguistic stimuli). The researcher then observes effects on the dependent variable, i.e., reactions or the ensuing behaviour of the research participants.
These basic methods are covered in detail in Chapter 2 – their subtypes or specific techniques, their pros and cons, the optimal context for their use, etc. Particularly in experiments and surveys, stimuli are used. These are items (such as texts, sentences, words, or pictures – generally representing contrasting or minimal pairs) that aim at the activation of a certain behaviour, called a response, on the part of the research participants. Furthermore, there are different tasks in which the stimuli are used. The research participants may be asked, for instance, to read, to evaluate, to complete, to match or to name stimuli. Thus, specific techniques of data collection have been developed (e.g., matched guise technique in sociolinguistics or director-matcher games in cognitive linguistics). The kind of method which is appropriate for a specific empirical project strongly depends on the kind of variables – i.e., natural language data needs to be collected by the observation of research participants in their natural surroundings; unconscious language behaviour cannot be captured by direct questioning; latent variables can only be captured by surveys; anonymous data (which is appropriate for studying sensitive issues in non-longitudinal research) can only be collected via large quantitative non-traceable questionnaires, etc. Furthermore, methodological selection is determined by the topic of the research parameters, such as the investigated language skills (speaking/signing, writing, listening, and reading) or the core areas of linguistics pertinent to the study. In general, creativity and an overall empirical understanding are needed in order to develop an appropriate research design.
Various linguistic subdisciplines (such as language documentation & descriptive linguistics, language typology, corpus linguistics, sociolinguistics & anthropological linguistics, cognitive linguistics & psycholinguistics, neurolinguistics) have developed particular procedures of data collection and/or analysis to study distinct aspects of language. This will be covered in Part II of the book. As well as using a single method, it is possible to combine methods via cross examination or mixed methods (cf. Section 2.5). This describes the systematic combination of different methods and/or data sources in order to answer a certain research topic from complementary perspectives. If different methods lead to converging results, this strengthens the trustworthiness of individual findings and may allow for further generalisations across studies. Thus, different research approaches should not be regarded as conflicting but rather as supplementary strategies to gain a complex and profound understanding of a linguistic topic. Once a methodological design is specified, it may be necessary or prudent to conduct pretests/pilot studies. These are smaller collections of data prior to the main phase of data gathering. Their purpose is the development or improvement of the empirical design by pretesting the actual methods with a smaller data sample (e.g., fewer research participants) or by collecting essential information for the creation of more precise methodological tools. Furthermore, preliminary minor data samples can serve to gain an insight into the results to be expected.
Empirical Research in Linguistics
Pretests are particularly advisable in studies that involve data collection and analysis on a large scale and/or at great expense. Recognising methodological shortcomings in pretests helps to avoid collecting useless data on a large scale and wasting resources.
1.2.6 Data Documentation: Written Notes & Recordings
There are basically two distinct techniques of data documentation:
• written notes (such as protocols or detailed reports, handwritten or in electronic versions): The researcher can take these notes simultaneously (i.e., during the situation of data collection) or after the observation, interview, or experiment. The longer the time between data collection and data capturing, the more information will be lost; details in particular will be difficult or impossible to recall. Therefore, we strongly recommend reviewing notes or writing more detailed reports immediately or as soon as possible after the period of data collection. A further option is to instruct other persons to take (additional) notes (e.g., in diary studies, research participants are asked to write self-reports).
• recordings by use of technical equipment (i.e., audiotapes, video recordings, photographs, recordings of special devices for laboratory experiments, etc.): First of all, the researcher needs to be familiar with the technical equipment. As the recording runs parallel to the data collection, a second person supporting the researcher can be helpful, particularly if the handling of the technical equipment is elaborate.
Each of these methods can be used separately. In most studies, however, both kinds of data documentation or different recording techniques are combined complementarily, as each one has its advantages and disadvantages: Writing notes is generally an insufficient method for capturing large volumes of data in a short period of time (high density of information). Thus, technical equipment is essential, particularly audio-/video recordings for documenting naturally occurring speech and visualising devices for collecting and capturing psycho-/ neurolinguistic data. They document the event of data collection in a greater complexity and allow for repeated viewing or listening and, thus, enable access to data that were not recognised the first time (temporal dimension). However, it is important to be aware that no media technique records all facts. Although video recordings log visual and auditory data in comparison to audio-recordings with just audio data, every type of recorder is limited by its positioning and its perspective (selective focus). An attentive researcher generally gains a more comprehensive overview of the entire situation and can react to a change of location or perspective in contrast to installed devices (mobility). Moreover,
Research Basics
certain data, such as background information about the research participants (social status, profession, personal relationships, etc.) and about the overall setting or event (sketches of the overall arrangement, relationships between events, significant activities outside the range of recording, etc.) can only be captured by additional written notes. A further advantage of taking (hand)written notes is their usability in situations in which electronic or technical recordings are impossible (no electricity, no portable devices, etc.) or inappropriate (e.g., cultural or situational rules of conduct). Furthermore, the use of recording equipment generally leads to participants focussing more on the research situation which may result in intensified reciprocal effects and thus biased data (cf. Section 1.1.5). In sum, we strongly recommend carefully evaluating the options and their advantages and disadvantages regarding a specific research project.
1.2.7 Data Editing: Transcription, Translation, and Annotation
Before being able to evaluate linguistic data, the collected material generally needs to be edited or preprocessed. First and foremost, this entails the removal of unusable data points (e.g., typographic errors in written corpora or audio recordings of too low quality). A further editing process of linguistic primary data includes the transcription of oral texts (i.e., the systematic transfer of recorded data into a written form) or, if necessary, the transliteration of written sources (i.e., the transfer of the text into another writing system), and the annotation of these transcripts or any other kind of written documents (i.e., the addition of grammatical information). A particular kind of annotation is tagging, which is used in computational and corpus linguistics (cf. Chapter 5). This is the categorisation of linguistic units based on grammatical information. Furthermore, working with languages that are not commonly comprehensible requires the translation of the text into a widely known language. For each of these data preparation processes, there are different variants on offer (as we will show). All in all, data editing is a means to an end; in other words, the researcher only needs to do what is necessary for the study. This also means choosing the most suitable variant of transcription/transliteration, annotation, and translation. In any case, it is important to be consistent and to follow general conventions. A transcription is generally a written representation of a text as it was uttered, more precisely an unadjusted/uncorrected accurate version (incl. repetitions, slips of the tongue, break-offs, false starts, self-repair, etc.) illustrating the text sequence (incl. turn-taking and parallel statements). Other than such preserving forms of transcript, there are also more cleaned-up versions that do not include such features.
Depending on the intended use, the researcher can choose the kind and complexity of transcription (i.e., which aspects need to be transcribed and which ones are irrelevant for the study). It may include or exclude prosodic features (e.g., stress, tone, lengthening, and pauses), paralinguistic ones
(e.g., vocal quality, speech tempo, and vocal events such as laughing) and nonverbal communication (e.g., gestures, facial expressions, and eye contact). There are different kinds of transcriptions that can be distinguished:
• phonetic or phonemic transcription: This is a representation of the text on a sound level (phonetically: articulatory characteristics; phonologically: distinctive sounds), using a conventional system, such as the International Phonetic Alphabet (IPA) devised by the International Phonetic Association (www.internationalphoneticassociation.org/content/ipa-chart).
• orthographic transcription: This is a representation of the text on a grapheme level, using a writing system (generally the conventionalised one of the recorded language).
In comparison to an orthographic transcription, a phonetic or phonemic one is more time-consuming. Depending on the research topic, it may be necessary to work with more detailed transcription conventions regarding the encoding of prosodic, paralinguistic, and nonverbal features. For conversation, discourse, and dialogue studies, there are sophisticated transcription systems, such as GAT 2, CAT, or HIAT. The annotation of grammatical or otherwise relevant information enables readers to understand linguistic analyses without knowledge of the original language. Depending on the encoded information, we distinguish between:
• an interlinear or morpheme-by-morpheme glossing, based on a conventional system such as described by Lehmann (1982, 2005), Comrie, Haspelmath and Bickel (2015) (the Leipzig glossing rules) or Haig and Schnell (2011) (GRAID).
  – glosses of lexical morphemes: the basic meaning, in lowercase (e.g., ‘go’ in example 1)
  – glosses of grammatical morphemes: in upper case or small capitals, using conventional abbreviations (e.g., ‘PST’ for past tense in example 1)
  (1) TONGAN:
      Na‘e ‘alu ‘a  Pita  ki  kolo ‘aneafi.
      PST  go   ABS Peter ALL town yesterday
      ‘Peter went to town yesterday.’
• the annotation or tagging of other linguistic units (such as word classes or phrases) or of other linguistically relevant information (e.g., status of speech act participants).
The annotations need to be adequate, consistent throughout the document, and as detailed as necessary (i.e., containing as much information as needed for the specific research).
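The glossing conventions above can be illustrated with a short Python sketch. It pairs each word of example (1) with its gloss and separates grammatical glosses (upper case) from lexical ones. The function names are hypothetical and chosen for illustration only; the sketch assumes one gloss per word and is not part of any glossing standard or software.

```python
# Illustrative sketch only: all function names are hypothetical.

def align_gloss(source: str, gloss: str) -> list[tuple[str, str]]:
    """Pair each word of the source line with the gloss in the same position."""
    words = source.rstrip(".").split()
    glosses = gloss.split()
    if len(words) != len(glosses):
        raise ValueError("source and gloss lines must contain the same number of words")
    return list(zip(words, glosses))


def is_grammatical(gloss: str) -> bool:
    """Grammatical glosses are written in upper case (e.g. 'PST'), lexical ones are not."""
    return gloss.isupper()


def format_interlinear(pairs: list[tuple[str, str]]) -> str:
    """Print source and gloss in two lines with vertically aligned columns."""
    widths = [max(len(word), len(gloss)) for word, gloss in pairs]
    top = " ".join(word.ljust(width) for (word, _), width in zip(pairs, widths))
    bottom = " ".join(gloss.ljust(width) for (_, gloss), width in zip(pairs, widths))
    return top.rstrip() + "\n" + bottom.rstrip()


# Example (1) from the text (Tongan).
source_line = "Na‘e ‘alu ‘a Pita ki kolo ‘aneafi."
gloss_line = "PST go ABS Peter ALL town yesterday"

pairs = align_gloss(source_line, gloss_line)
grammatical = [gloss for _, gloss in pairs if is_grammatical(gloss)]
lexical = [gloss for _, gloss in pairs if not is_grammatical(gloss)]

print(format_interlinear(pairs))
print("grammatical glosses:", grammatical)
print("lexical glosses:", lexical)
```

A check of this kind is mainly useful for consistency: a mismatch between the number of words and the number of glosses is reported immediately instead of going unnoticed in a long document.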
The translation does not need to meet professional standards. It is generally a free translation of the text into a language widely known by researchers or a lingua franca of the native speakers. Alternatively, the translation can be more literal if this helps to make grammatical structures clearer. The entire editing process includes the segmentation of the text, i.e., its subdivision into units (e.g., words, sentences, sounds, and/or phrases). While spaces standardly indicate word boundaries, segmentable morpheme boundaries are marked by hyphens (e.g., talk-s) and clitic boundaries by equals signs (e.g., Peter=’s car). Components of nonsegmentable morphemes are separated by dots within the annotations (e.g., my: 1SG.POSS). Transcription software (such as Praat, ELAN, Toolbox or EXMARaLDA) may expedite the work, but only if learning to use it does not require more effort than the gains one can expect from it. With these programs, sound or video files can be loaded and then played and edited in different tiers. There are multi-tiered formats that may include, for instance, a time-aligned transcription (sentence by sentence), further linked annotations (generally morpheme by morpheme), and a translation (sentence by sentence). Also, most transcription software allows for (complex) data searches and thus supports the data analysis. Now that you have learnt more about data editing, proceed with exercises 1.6 to 1.8 in Section 1.4.
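The boundary markers described in this section (hyphens for segmentable morphemes, equals signs for clitics, dots inside glosses) lend themselves to a simple illustration. The following Python sketch is a minimal example under the assumption that these three symbols are used only as boundary markers; the function names are hypothetical and do not correspond to any of the programs mentioned above.

```python
import re

# Illustrative sketch only: function names are hypothetical.

def segment_word(word: str) -> list[tuple[str, str]]:
    """Split a word at '-' (morpheme boundary) and '=' (clitic boundary).

    Returns (morph, marker) pairs, where marker is the boundary symbol
    that precedes the morph ('' for the first morph of the word).
    """
    # The capture group keeps the boundary symbols in the result list.
    parts = re.split(r"([-=])", word)
    segments = [(parts[0], "")]
    for marker, morph in zip(parts[1::2], parts[2::2]):
        segments.append((morph, marker))
    return segments


def split_gloss(gloss: str) -> list[str]:
    """Split the gloss of a nonsegmentable morpheme into its components."""
    return gloss.split(".")


print(segment_word("talk-s"))
print(segment_word("Peter='s"))
print(split_gloss("1SG.POSS"))
```

In this sketch, `segment_word("talk-s")` yields the stem followed by a suffix marked with a hyphen, while the clitic in `Peter='s` is marked with an equals sign, mirroring the two examples in the text.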
1.2.8 Data Analysis
Data analysis is the systematic search for specific patterns in linguistic form and function. There are multiple analytical methods available which are based on distinct theoretical frameworks or metalanguages. Depending on the research question/hypothesis and the collected data, an appropriate method of data analysis needs to be chosen. While statistical evaluation is crucial for quantitative research, the analysis of qualitative data involves techniques that reveal underlying linguistic structures, processes, and relationships. In explorative studies, the data is evaluated in search of any systematic patterns, whereas data evaluation in hypothesis-testing research is strictly oriented towards the verification or falsification of said hypotheses. In fact, it is not possible to readjust for other data patterns in this type of study but rather their recognition generally results in new research projects. In general, the units of analysis are the entities about which you want to make a statement as determined by the research question/hypothesis. They depend on the theoretical framework used and the variables and values defined accordingly. Analysis units have to be distinguished from the units of measurement which may be, but are not necessarily the same (cf. Section 1.2.3). Linguistic analyses basically measure frequencies/probabilities, contexts/co-texts, and meanings of linguistic units.
There are numerous analytical methods used in empirical linguistics, so the following list cannot aim to be complete but only gives some insight:
• distribution analysis: to study the (complementary or contrastive) distribution of linguistic units (e.g., distinct environments of allomorphs)
• feature analysis: to study the distinctive features of linguistic units (e.g., contrasting articulatory and acoustic features of sounds)
• component analysis: to study the components of a word’s meaning (e.g., distinctive features of kinship terms)
• conversation analysis: to study conversational techniques (e.g., turn-taking, repair, or adjacency pairs such as question-answer)
• discourse analysis: to study the generation of meaning (e.g., the expression of rejection in different social contexts)
• content analysis: to study communicative patterns by quantification (e.g., the use of positively vs. negatively connoted terms regarding a certain topic as defined in a codebook)
• network analysis: to study linguistic phenomena (e.g., language change) from an interactional perspective (e.g., how do individual speakers relate to each other and how do linguistic expressions spread within this network)
• corpus analysis: to study the occurrence of linguistic forms (frequency, co-occurrence, and distribution) in natural language data, particularly language-internal variation
• contrastive analysis: to study the structural differences and similarities of two languages (e.g., for establishing language genealogies based on the extent of similarities, or for identifying challenges in second language acquisition based on differences to the mother tongue)
• error analysis: to study the process of language acquisition (e.g., the kinds of grammatical errors occurring at different stages of first language acquisition, or patterns of L1-transfer in foreign language acquisition)
• comparative analysis: diachronically, to study language change; synchronically, to study differences between languages
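Two of the methods listed above, corpus analysis (frequency of forms) and content analysis (quantification based on a codebook), can be sketched in a few lines of Python. Everything in the example is invented for illustration: the codebook, the sample text, and the function names are assumptions, and the very simple tokeniser would not be adequate for real corpus work.

```python
import re
from collections import Counter

# Hypothetical codebook, invented for this illustration only.
CODEBOOK = {
    "positive": {"good", "helpful", "clear"},
    "negative": {"bad", "confusing", "slow"},
}


def tokenize(text: str) -> list[str]:
    """Very simple tokeniser: lower-case the text and keep alphabetic strings."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(text: str) -> Counter:
    """Absolute frequency of every word form in the text."""
    return Counter(tokenize(text))


def code_counts(text: str, codebook: dict[str, set[str]]) -> dict[str, int]:
    """Count how many tokens fall into each codebook category."""
    tokens = tokenize(text)
    return {
        category: sum(1 for token in tokens if token in terms)
        for category, terms in codebook.items()
    }


# Invented toy text, not taken from any real data set.
sample = (
    "The new course is good and clear, but the old website was slow "
    "and confusing. Good teaching helps."
)

frequencies = word_frequencies(sample)
counts = code_counts(sample, CODEBOOK)

print(frequencies.most_common(3))
print(counts)
```

In an actual study, the codebook categories and their terms would be defined and documented before the data is evaluated, so that the coding remains consistent and replicable.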
In the following, we include some basic remarks on statistical evaluation as a starting point for further readings (cf. Section 1.4). Basically, there are two kinds of statistics: descriptive and inferential statistics. The difference between the two is that descriptive statistics are used to describe systematic patterns in any data set, whereas inferential statistics are additionally employed when generalisations from the sample to the basic set/population are of interest. This is most often the case when samples are comparatively small or otherwise not representative of the basic set/population. In addition, statistical methods are differentiated depending
on whether they highlight differences between data points regarding a single variable or correlations (i.e., similar outcomes of at least two variables). Importantly, there are few if any causal relationships in linguistic research (even in laboratory experiments), so in most cases you will have to carefully interpret your statistical results as correlational effects (i.e., as x changes, y changes accordingly) rather than causal effects (i.e., x is the underlying cause for changes in y). Descriptive statistics allow for the concise presentation of complex, mostly quantitative, data – i.e., they describe distributions of individual data points within the data set. The simplest description includes absolute or relative frequencies. These are used to describe distributions pertaining to basic characteristics of the sample. If the data is collected from 3 women and 6 men, as in our example (Table 1.3), you could describe the gender distribution in absolute terms (‘3 women and 6 men participated in the study’) or you could describe it in relative terms (‘1/3 of the participants were female’). Combining both types of frequency is preferable as it provides more detailed information than just one of them alone. Besides frequency, measures of central tendency and of dispersion can be distinguished. Measures of central tendency (also measures of location) describe where in a data set most data points occur. Imagine a cloud made up of points, each one representing a response from a study participant. Measures of central tendency depict cases where data points within the cloud would occur in close vicinity. 
The three most commonly used measures are mode (the value shared by most data points, i.e., 21 words in the example presented in Table 1.3, as this is the most frequent value), arithmetic mean (where on average most data points occur; 171/9 = 19 words in our example, that is the sum of all values divided by the number of values), and median (the cut-off point that splits the data set into two halves with equal numbers of data points above and below that value; 17 words in our example, as 4 values are above and 4 values below this middle value; for an even number of values, the median is the arithmetic mean of the two middle values). Usually, measures of central tendency are calculated for
Table 1.3 Example data for statistics
Participant   No. of words
Peter         14
Paul          32
John          11
David         17
Henry         24
James         21
Mary          15
Lucy          21
Susan         16
the dependent variable across participants to reflect patterns per condition (corresponding to the level of one independent variable or the combination of levels from more than one independent variable). While these measures are very informative, they require measures of dispersion (also measures of scale) to allow their sensible interpretation. These describe whether, overall, data points are located at positions similar or dissimilar from one another within the data set. There are several measures of dispersion such as range (between the two most extreme data points; e.g., 32 – 11 = 21 words in the example presented in Table 1.3), variance, standard deviation, or standard error. Their calculation is somewhat more difficult than for measures of central tendency, so we will not go into detail here. Note, however, that at the participant level they provide critical information, for instance, as to how well the mean across participants also holds for individuals. Therefore, they should always be reported together with a measure of central tendency. Inferential statistics are used to calculate the probability that a specific pattern occurs given the characteristics of the data set and serve as the basis to draw inferences about the representativeness and generalisability of the findings. In frequentist approaches, a specific significance level (e.g., p-value < .05) is used to assess whether the results will generalise from the sample to the population (if so, future studies with different participants should replicate the results). Bayes approaches refrain from using significance levels by comparing the probabilities of occurrence for different data patterns to one another. In any case, it is important that the characteristics of the data set are known to the researcher. 
For instance, many advanced tests require that the data conform to a number of statistical assumptions in terms of, e.g., sample size, normal distribution of the data points, and independence of data points that are to be explained by different independent variables. If a data set follows these assumptions, parametric tests can be used (e.g., t-test, ANOVA in frequentist approaches). If not, non-parametric tests have to be used (e.g., chi-squared test, Wilcoxon rank sum test). Generally speaking, the correct choice of statistical method depends on several aspects of your research design, most notably the level of measurement for both dependent and independent variables (cf. Section 1.2.2), sample size, and your focus on differences or correlations between variables. Statistics textbooks include guidelines on which test to use for a given research design. Table 1.4 provides you with some basic information on statistical measures, as well as data visualisation techniques, that are applicable depending on the level of measurement (Meindl 2011: 98). Statistical software (such as SPSS, SAS, or R) may expedite the evaluation, but you need to be familiar with its use (i.e., how to enter the data and how to query statistical measures). In empirical research, methods of data collection and analysis often go hand in hand. Table 1.5 gives an overview of specific standard procedures to collect and analyse linguistic data per linguistic subdiscipline (see Part II of the book for further details).
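The descriptive measures reported for Table 1.3 can be reproduced with a few lines of code. The following sketch uses Python's standard `statistics` module rather than one of the packages named above, simply because it needs no installation. The word counts are copied from Table 1.3; the assignment of participants to gender groups is an assumption based on the text (3 women and 6 men), since Table 1.3 itself has no gender column. Variance and standard deviation are computed as sample statistics (dividing by n – 1), which is one common convention among several.

```python
import statistics
from collections import Counter

# Word counts per participant, copied from Table 1.3.
words = {
    "Peter": 14, "Paul": 32, "John": 11, "David": 17, "Henry": 24,
    "James": 21, "Mary": 15, "Lucy": 21, "Susan": 16,
}
# Assumption: gender is not a column of Table 1.3; this grouping follows
# the text, which mentions 3 women and 6 men.
women = {"Mary", "Lucy", "Susan"}

values = list(words.values())
n = len(values)

# Absolute and relative frequencies of a sample characteristic.
absolute = Counter("female" if name in women else "male" for name in words)
relative = {group: count / n for group, count in absolute.items()}

# Measures of central tendency.
mode = statistics.mode(values)        # 21, the most frequent value
mean = statistics.mean(values)        # 171 / 9 = 19
median = statistics.median(values)    # 17, the middle value of the sorted data

# Measures of dispersion.
value_range = max(values) - min(values)   # 32 - 11 = 21
variance = statistics.variance(values)    # sample variance: 320 / 8 = 40
std_dev = statistics.stdev(values)        # square root of the variance
std_error = std_dev / n ** 0.5            # standard error of the mean

print(f"frequencies: {dict(absolute)} / {relative}")
print(f"mode={mode}, mean={mean}, median={median}")
print(f"range={value_range}, variance={variance}, "
      f"sd={std_dev:.2f}, se={std_error:.2f}")
```

The large standard deviation relative to the mean shows why a measure of central tendency should not be reported on its own: a mean of 19 words says little about individual participants such as Paul (32) or John (11).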
Table 1.4 Statistical measures and their visualisation depending on the level of measurement
Nominal scale
  Interpretation: qualitative differences (e.g., male vs. female)
  Mathematical operations: NONE
  Statistical operations: frequencies & mode
  Visualisation techniques: bar/pie charts
Ordinal scale
  Interpretation: order relation/ranking (e.g., few, some, many, all)
  Mathematical operations: NONE (e.g., no sums, i.e., you cannot add up ‘few’ and ‘many’)
  Statistical operations: frequencies, mode, median, quartiles
  Visualisation techniques: (bar/pie charts), boxplots
Interval scale
  Interpretation: length of the interval between values (e.g., –10°C, 0°C, 10°C, 20°C, 30°C)
  Mathematical operations: sums & differences (e.g., no products, i.e., the statement ‘20°C is twice as warm as 10°C’ is not a sound statement – in which relation would –10°C be?)
  Statistical operations: frequencies, mode, median, quartiles, arithmetic mean, range, standard deviation
  Visualisation techniques: (bar/pie charts), boxplots & scatter diagram
Ratio scale
  Interpretation: ratio of values (e.g., 0 years, 1 year, 2 years, 3 years, 4 years)
  Mathematical operations: sums, differences, products & ratios
  Statistical operations: frequencies, mode, median, quartiles, arithmetic mean, range, standard deviation
  Visualisation techniques: (bar/pie charts), boxplots & scatter diagram
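The dependency between level of measurement and permissible statistical operations in Table 1.4 can be expressed as a simple lookup. The following Python sketch transcribes the row of statistical operations from the table; the dictionary and function names are hypothetical and serve only as an illustration of how such a check might be written.

```python
# Statistical operations per level of measurement, transcribed from Table 1.4.
# The names STATISTICAL_OPERATIONS and is_permissible are hypothetical.
_ORDINAL = {"frequencies", "mode", "median", "quartiles"}
_METRIC = _ORDINAL | {"arithmetic mean", "range", "standard deviation"}

STATISTICAL_OPERATIONS = {
    "nominal": {"frequencies", "mode"},
    "ordinal": _ORDINAL,
    "interval": _METRIC,
    "ratio": _METRIC,
}


def is_permissible(measure: str, scale: str) -> bool:
    """Return True if Table 1.4 lists the measure for the given scale."""
    if scale not in STATISTICAL_OPERATIONS:
        raise ValueError(f"unknown level of measurement: {scale}")
    return measure in STATISTICAL_OPERATIONS[scale]


# A ranking such as 'few, some, many, all' is ordinal: a median is
# meaningful, an arithmetic mean is not.
print(is_permissible("median", "ordinal"))
print(is_permissible("arithmetic mean", "ordinal"))
```

A check like this is a reminder rather than a safeguard: statistical software will happily compute a mean for ordinal codes such as 1 to 4, so the decision about whether a measure is meaningful remains with the researcher.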
Table 1.5 Linguistic subdisciplines and their basic procedure
(Subdiscipline: basic method of data collection and analysis)
Language documentation & descriptive linguistics:
  • Recordings of natural language data; transcription, annotation and translation of these recordings
  • Elicitation and/or corpus analysis of recorded, transcribed and annotated texts
Language typology:
  • Cross-linguistic analysis of data from language descriptions or surveys (mainly questionnaires)
Corpus linguistics:
  • Analysis of natural language data
Sociolinguistics & anthropological linguistics:
  • Participant observation and surveys (mainly interviews) and/or analysis of linguistic data in relation to socio-cultural data
Cognitive linguistics & psycholinguistics:
  • Laboratory or field experiments (cognitive tasks) and/or language analysis with regard to underlying concepts and processes
Neurolinguistics:
  • Laboratory experiments; quantitative analysis of neurobiological basis of language processing
1.2.9 Reflection and Interpretation of Analysis Findings
Generally speaking, the outcome of data analysis is information that the data has revealed, i.e., what is the case and also what is not the case. Once the data is analysed, it is crucial to carry out a critical review of the research design and the data. These retrospective reflections include questions such as:
• Are the data and the research outcome plausible?
• Do the results answer the research question?
• Can all kinds of bias be ruled out?
• Does the empirical study meet scientific standards (i.e., quality criteria and research ethics)?
In general, methodological critique relates to the suitability of the methodological procedure, the correctness of data analysis, and the validity of results. Furthermore, the results of data analysis can be interpreted. It is important to distinguish data analysis from the interpretation of analysis results. While data analysis results in summarised statements about the facts as revealed by the research data, the interpretation of these facts is information which the data itself does not show but which can be deduced or concluded from it. Thus, it is crucial to carry out data analysis prior to and separately from the interpretation of its results. Some research questions can only be investigated empirically on the basis of variables that are an indicator for a certain phenomenon. In this case, the findings are interpreted regarding the phenomenon that is not explicitly expressed by the data. For instance, while frequencies of form variants can be measured directly, cognitive effort can only be measured indirectly (e.g., via reaction times). A preliminary step of interpretation comprises the evaluation of data in the research context. In the field, this includes the consideration of culture-specific behaviours of the research participants (e.g., conversational practices have an impact on survey data). In the laboratory, the basic question is whether the artificial and generally unfamiliar environment has a distorting impact on the research participants’ behaviour. Table 1.6 shows the basic outcomes of the various linguistic subdisciplines as presented in Part II of the book.
1.2.10 Presentation and Publication
A basic requirement of academic research is its publication. Research findings can be published and/or presented in various ways:
• conference and workshop presentations (posters and talks)
• publications (articles in scientific journals or edited volumes and monographs)
Table 1.6 Linguistic subdisciplines and their basic outcomes
(Subdiscipline: basic outcomes)
Language documentation & descriptive linguistics:
  • Edited text corpora, (reference) grammars, and/or dictionaries
Language typology:
  • Universals (cross-linguistic commonalities) & rara (cross-linguistic particularities), typologies (cross-linguistic variation), and distributional maps
Corpus linguistics:
  • Patterns of language use, generally language-internal variation
Sociolinguistics & anthropological linguistics:
  • Relationships between cross-linguistic or language-internal variation and socio-cultural parameters of the speakers
Cognitive linguistics & psycholinguistics:
  • Mental conceptualisations as encoded in language, patterns of language behaviour (e.g., speed and accuracy), and relationships between language and thought
Neurolinguistics:
  • Patterns of brain activation in time and space
The basic elements of publications and presentations are the following:
– title: It should be informative and concise in the sense that the addressed audience/readership gets an idea of the precise topic and/or the fundamental findings.
– author(s) and their affiliation
– table of contents (particularly in monographs and edited volumes): It is a list of the chapter titles and subtitles in order of appearance, with their respective starting page numbers.
– glossary (particularly in monographs and edited volumes): This is a list explaining the technical abbreviations and symbols used in the publication (e.g., for annotations).
– abstract (particularly in articles): It briefly describes the entire study (topic & research question/hypothesis, empirical approach, and main finding) in one paragraph. (Note: An abstract is not identical to the introduction.)
– introduction: First, it outlines the topic and the structure of the publication or presentation. Furthermore, it provides basic information regarding the research topic, including the research question/hypothesis, its embedding into the current academic debate on the topic, and basic knowledge as relevant for the study.
– main part: This is the description of the empirical study. It includes sections on:
  a. Methods (of data collection and analysis)
  b. Results/findings
  c. Discussion/interpretation of the findings
– conclusion: It briefly summarises the research outcome(s), discusses its/their meaning for the overall topic and identifies further research issues that result from the study.
– references: This is a list of all the sources that are used in the study.
– appendix (possibly): This is additional information, such as the (edited) data records, insights into data analyses, etc.
As a poster is often the first option for early career researchers to present their empirical projects (final or preliminary results) to a broader scientific audience, we will introduce this presentation format in more detail. Overall, the poster contains key information about a project in a clearly arranged form on one large page (generally DIN A0 in portrait or landscape format). As posters can look quite different, we will focus on some general recommendations:
• layout:
  – head section (incl. title, author(s), research affiliation, funding institution)
  – a clear comprehensible structure (sections/boxes with subheadings arranged in sequence guided by the content and taking reading direction into consideration)
  – an engaging amount of text (written in full sentences or in keywords)
  – visualisations (illustrations, tables, etc.), which help to give a fast overview of an aspect and make the overall appearance more appealing
  – font size (font sizes smaller than 18pt may not be legible from a distance; references, however, can be printed in a smaller font size)
  – legible fonts and colours (different colours/fonts/font sizes indicate distinct aspects/sub-issues; similar aspects should be presented in the same way)
• content: essential information on the research project (e.g., topic: current state of research, research question/hypothesis of the study; methodological procedure: data collection, data analysis; results including examples; discussion; references)
The poster may serve as a guideline for a short oral presentation of the research project. However, it should also be meaningful on its own, i.e., without further explanation and comment by the author(s). Any type of presentation of the empirical study leads to feedback from a scientific audience. This exchange of information is valuable for a project in progress and for subsequent research. So when giving feedback, make sure it is constructive and precise. Now that you have learnt more about presentation and publication, proceed with exercise 1.9 in Section 1.4.
1.3 Summary
Empirical research is the systematic data-driven search for new knowledge by use of an adequate methodological procedure in compliance with scientific standards (quality criteria and research ethics). The research process consists of several stages, namely the identification of a research issue (a
question or hypothesis), the planning of a suitable research design (identification of the research parameters, specification of the needed data foundation and methodological considerations of how to get and analyse the data), the implementation of the empirical project (data collection, documentation, and editing), the evaluation of the data (data analysis, retrospective reflection, and interpretation of the analysis findings), and the presentation or publication of the research project and its results. In linguistic studies, research objects are past and present languages or language varieties. As language is produced and processed by humans, they are the data providers or research participants. Language or language-related data is collected via observations, surveys, and/or experiments, and documented in the form of written notes and/or recordings. Prior to analysis, the data generally needs to be edited or preprocessed (e.g., the transcription of oral texts, the annotation of these transcripts and/or possibly translations). In sum, linguistic research comprises a broad range of distinct approaches, such as field vs. laboratory studies, the analysis of primary data vs. secondary or tertiary data, qualitative and/or quantitative research, language-comparative vs. single-language studies, diachronic vs. synchronic research, etc. The development of a specific empirical study always requires an awareness of its particular research components, knowledge of the different methodological options (as outlined throughout the chapter), and considerations with regard to their strengths and weaknesses as pertaining to the research project. Thorough planning and structuring of the empirical study (e.g., by means of exposé/proposal and/or research diary) are essential, as the research process will inevitably confront researchers with unforeseeable situations.
As empirical expertise is primarily gained through practice, we strongly encourage anyone to use the information in this book for developing and conducting their own research.
1.4 Exercises and Assignments
Exercises for students which can be included during a session on the basics of empirical research or as part of project work (cross-references in the text indicate the thematic context for each exercise):
1.1 What could be researched in linguistics? Write down a list of general topics, specific research questions, and/or hypotheses. Advice: You can start your research diary (cf. Section 1.2.4) with this loose collection of research issues. It can serve as an initial pool of ideas for developing your own research project.
1.2 Operationalise the following hypotheses: ‘Women use a broader vocabulary than men’ and ‘Older children use a broader vocabulary than younger children’:
    a. What are the variables (dependent & independent)?
    b. What kind of variables are they?
    c. Which characteristic values can the variables take (disjoint & exhaustive, as well as suitable for the research topic)?
    d. Who/what is the carrier of the characteristics?
    Alternatively, you can operationalise your own linguistic hypothesis.
1.3 Discuss which data foundation is needed for different kinds of research (e.g., studies on English-Spanish language contact or comparative studies on Austronesian languages) and which considerations regarding sampling are relevant. If you already have your own research question or hypothesis, specify the necessary data foundation and sampling procedures for this project.
1.4 Write an exposé/proposal for your own research project (prior to the implementation period). With a length of about three pages, it should include information about the six aspects mentioned in Section 1.2.4.
1.5 Write a research diary accompanying your own research project. We recommend starting it with a loose collection of possible topics and finishing it with personal reflections on feedback regarding the final research publication (e.g., in student projects, feedback on a poster presentation).
1.6 Imagine specific research topics and consider which data preparation process(es) is/are essential.
1.7 Record a sequence (to make it easier, in a language you are familiar with) and transcribe it or part of it phonetically using IPA.
1.8 Take a written text (to make it easier, in a language you are familiar with) and annotate it or part of it using the Leipzig glossing rules. Note: If you take a text that is already annotated, you can compare your own version with this interlinear glossing afterwards.
1.9 Develop a poster presenting your own completed research project. If you have the possibility to present it, you can use the feedback for writing a publication.
1.5 Further Reading
Other monographs and particularly omnibus volumes on empirical research in linguistics are Schlobinski 1996, Wray & Bloomer 2013³, Litosseliti 2010, Ender, Leemann & Wälchli 2012, Podesva & Sharma 2013 and Krug & Schlüter 2013. Furthermore, Albert & Koster 2002, Albert & Marx 2010, Paltridge & Phakiti 2010, Müller & Ball 2013 and Settinieri et al. 2014 address the topic with a focus on applied studies. Research methods for studying sign languages can be found in Orfanidou, Woll & Morgan 2015, whereas Wei & Moyer 2008 have a focus on bi-/multilingualism research, and Blom & Unsworth 2010 and Blume & Lust 2016 address research on language acquisition. Helpful literature for gaining a first overview of linguistic topics includes Malmkjær 2002², Aronoff & Rees-Miller 2003, Allan 2016, and many other handbooks and introductions to linguistics or linguistic subfields. Aspects of linguistic analysis, in particular, are described in Heine & Narrog 2010 and Puglielli & Frascarelli 2011. For further insights into quantitative research, we recommend Köhler, Altmann & Piotrowski 2005 and Rasinger 2013², while methods of qualitative research are addressed in Lindlof & Taylor 2011³ and Saldaña 2013². Creswell 2009³ and Davies 2007 describe qualitative and quantitative research approaches. Literature on statistics for linguists includes Woods, Fletcher & Hughes 1986, Baayen 2008, Johnson 2008, Meindl 2011, Gries 2013²a, Eddington 2015, Levshina 2015 and Grant 2017. In addition, there are numerous books on statistics for social scientists, such as Field 2018⁵, Field, Miles & Field 2012 and Bortz & Schuster 2010. A classic textbook that includes a comprehensive overview of both the basic foundations of empirical research and statistics is Kerlinger 1986³. Many introductions to statistics also cover at least one statistical software package (e.g., SPSS, SAS or R). For practical reasons, you should therefore also think about which software you are going to use when you choose a statistics textbook to work with. Further readings on the basic methods of data collection are listed in Chapter 2 and on more specific methodological aspects (e.g., of field or laboratory research) in the respective subchapters of Part II of the book.
2
Basic Research Methods for Data Collection
Upon finding a research question, the next step is to think about a research design including choosing the research method best suited to finding results which answer the research question. This chapter will familiarize you with basic considerations on methods for data collection (Section 2.1) and introduce the three basic methods for collecting primary data yourself: observation (Section 2.2), survey (Section 2.3), and experiment (Section 2.4). We will consider the fundamentals of each method, and also briefly allude to mixed-methods approaches (Section 2.5). Finally, the chapter contains a summary (Section 2.6) and recommendations for further reading (Section 2.7).
2.1
Research Design and Fundamental Considerations on Data Collection
After detailing several fundamental concepts in Chapter 1, we will now familiarize you with the three basic methods to collect primary data: observation, survey, and experiment. These methods can be arranged according to central fundamental oppositions such as qualitative vs. quantitative research, explorative/hypothesis-generating vs. problem-oriented/hypothesis-testing research, or according to continua such as naturalistic vs. reductionist research designs. Figure 2.1 illustrates the position of the prototypical usages of these methods according to each of these dimensions. Observational methods represent a prototypical means of qualitative and explorative investigation of language in its natural setting, where the researcher only minimally interferes with the speaker’s language behaviour. In contrast, experimental approaches are prototypically employed with quantitative, hypothesis-testing designs studying language in reductionist laboratory settings. Here the researcher explicitly manipulates a limited set of variables and, thus, interferes with participants’ language behaviour. Surveys fall in between these two contrasting positions, depending on whether interviews (tending to be qualitative and explorative research) or questionnaires (tending to be quantitative and hypothesis-testing) are utilized.
Figure 2.1 Prototypical usages of research methods according to three dimensions [dimensions: qualitative vs. quantitative design; exploration/hypothesis-generation vs. falsification/hypothesis-testing; naturalistic vs. reductionist design; methods shown in this order: observation, survey: interview, survey: questionnaire, experiment]
No matter which method you choose, an absolute prerequisite for data collection is that you find participants willing to cooperate. The choice of method depends on a number of questions you should ask yourself before beginning data collection. It cannot be overstated that this is one of the most important initial steps of your project (along with the literature review), because it will determine the validity, reliability, and replicability of your empirical results. These questions are related to the four basic aspects of your research design:
• basic approach: Qualitative, quantitative, or both? Hypothesis-generating/explorative or problem-oriented/hypothesis-testing? Number and kind of dependent and independent variables? Determination of intervening variables and/or control of confounding variables?
• sampling: Cross-sectional or longitudinal design? Sampling procedure and sample size?
• method for data collection: Observation, survey, experiment, or mixed-methods?
• data analysis: Which analysis approach is appropriate to analyse the data?
The research design is the starting point of any empirical project, as it lays out the matrix of the study, from research problem and question to operationalisation of variables and levels of measurement, and, finally, to data collection and analysis (cf. Section 1.1.3). Hence, choosing a method for data collection should be in line with the upcoming stages in the research process (cf. Figure 1.4). Ideally, you should be able to justify the choices you make in the development of your research design. Often, there is no single optimal design, but several feasible options, each with its own (dis)advantages. Multiple options notwithstanding, you should strive to come up with a research design which yields reliable, valid, and replicable data and in which the choices regarding each of the four basic aspects noted above do not contradict each other. Table 2.1 lists some specific questions regarding each of the above basic aspects that you need to answer when developing your research design and, thus, when choosing a method for data collection that is able to produce suitable results. We recommend that you consult both the relevant literature and your project advisor to discuss your choices and the reasons for them, especially when attempting your first empirical project.
In addition to these general considerations, there is the practical aspect of actually conducting empirical research: you should supplement the data collection with a protocol that documents meta information such as disruptions, details about the location/time of data collection, participants, etc. Such a protocol is not identical with a research diary (cf. Section 1.2.4), but supplements data documentation (cf. Section 1.2.6). Ideally, each individual data file has its own protocol. The protocol should include any information that may become relevant for data analysis. It will provide you with additional information about which data points you may have to exclude from analysis to ensure high data quality, or in which context the data needs to be interpreted. Such protocols are important for any method of data collection. At the least, a protocol should include the following information:
• unique code to identify the data points unequivocally, e.g., participant ID and/or ID of the observation/survey/experiment unit
• date, time, and location of the data collection
• ID of the person responsible for data collection
• specifics of the recording session, e.g., (un)expected pauses, systematic units during recording
• additional remarks on specific or general aspects of the data collection
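As an illustration only, the minimal protocol contents listed above could be kept as one structured record per data file. The following Python sketch is not part of the original text: the class name `SessionProtocol`, all field names, and the example values are hypothetical and would need to be adapted to the project at hand.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime
from typing import List
import json


@dataclass
class SessionProtocol:
    """Meta information for one data file (all names are illustrative)."""
    record_id: str          # unique code, e.g. participant ID plus unit ID
    collected_at: datetime  # date and time of the data collection
    location: str           # where the data were collected
    collector_id: str       # ID of the person responsible for data collection
    session_notes: List[str] = field(default_factory=list)  # e.g. pauses, disruptions
    remarks: str = ""       # additional specific or general remarks

    def to_json(self) -> str:
        """Serialise the protocol so it can be stored next to the data file."""
        data = asdict(self)
        data["collected_at"] = self.collected_at.isoformat()
        return json.dumps(data, ensure_ascii=False, indent=2)


# Hypothetical example entry for one observation session.
protocol = SessionProtocol(
    record_id="P017-OBS03",
    collected_at=datetime(2021, 5, 4, 14, 30),
    location="community centre, room 2",
    collector_id="RA02",
    session_notes=["unexpected pause after 12 min (phone call)"],
    remarks="background noise from the street",
)
print(protocol.to_json())
```

Keeping one such record per data file makes it straightforward to filter out sessions with documented disruptions before analysis.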
In the following sections, methods are arranged in sequence from the most explorative/naturalistic to the most hypothesis-testing/reductionist based on their prototypical uses. Bear in mind, however, that none of these methods is exclusively bound to being used in hypothesis generation vs. hypothesis testing or reductionist vs. naturalistic research; rather, there are standard usages, and some research phenomena may benefit from the complementary use of several methods. The following subchapters on observation, survey, and experiment provide you with definitions, the range of subtypes to choose between, aspects of methodological implementation, and advantages and disadvantages of each method. In the sections on definitions and subtypes of each method you will see that the transition between the three methods is often fluid. The subchapters on aspects of methodological implementation will focus on those aspects in the research design that may introduce severe bias if they are not implemented thoroughly. For observation, this pertains to the observer’s paradox, personal bias and the development of an appropriate coding scheme which will ultimately determine the variable values/levels included in analysis and interpretation of the data (Section 2.2.3). For surveys, inappropriate question formats, sampling issues, respondent-based errors and interviewer effects may bias your results (Section 2.3.3). For experimental methods, we will concentrate on the design of appropriate tasks and stimuli/test materials and the control of confounding variables to maintain internal validity (Section 2.4.3). Needless to say, these aspects – variable definition, sampling, confounds – essentially apply to all methods, but in the literature they are often discussed prominently with respect to a particular method.
Table 2.1 Basic questions about research design
Design aspect: Basic approach
• What is my research question or hypothesis?
• What kind of data is needed to investigate it? What are the research parameters?
Why it matters: Your answer to these questions will typically determine whether you do exploratory vs. hypothesis-testing research, and whether you focus on nominal differences (a more qualitative perspective) or quantification of differences or commonalities (a more quantitative perspective).
Design aspect: Method for data collection
If you need to collect primary data, you should consider the following aspects:
• Which method is adequate to collect the data? Does my choice of method conform with the current standard in the research field?
• To what extent can the method measure what I want to study? What is directly measurable and what can be concluded?
• Do I have access and sufficient experience, time, and money to master the method I want to use in my project?
Why it matters: Some methods prohibit direct interference by the researcher (e.g., hidden systematic observation), while others do not work without it (e.g., task instruction in experiments). This will determine whether you use more naturalistic or more reductionist methods for data collection. Importantly, if the chosen method for data collection cannot deliver sensible and reliable results, you should rethink your method choices! There will certainly be a trade-off between your answer to the last question and the preceding ones. The method that is most suitable to study your research question/hypothesis may not be the one you can master in your project (due to limitations in time, funding, or skills). In case you work with a more feasible compromise, shortcomings of the method used should be considered as part of the discussion and interpretation of results in the study report.
Design aspect: Sampling
• Do I have access to the population, or is it restricted in time and/or space? How many participants do I need, and how do I appropriately sample my participants?
Why it matters: Special populations (e.g., children, older adults, indigenous people in remote areas) need additional preparation in terms of access to the sample, ethical considerations, and special care to be taken during data collection and analysis. A longitudinal design in which the same participants are studied several times during their lifespan requires long-term access. In general, the more labour- and time-consuming the data collection per participant is, the smaller the sample size.
Design aspect: Data analysis
• Which type of analysis is required to analyse the data appropriately? Do I have the necessary (methodological or statistical) knowledge? Have I planned sufficient time to analyse the data during my project?
Why it matters: There is a tight connection between research design, data collection, and data analysis. Make sure that, once you have chosen a method, you also take the necessary steps to learn appropriate analysis techniques (e.g., discourse analysis, inferential statistics) to make the most of your data.
2.2
Observation
In this section, you will be introduced to the fundamentals of observation: definition (Section 2.2.1), the different types of observation (Section 2.2.2), observer’s paradox, personal bias, and the importance of a well-defined coding scheme (Section 2.2.3), and specific strengths and weaknesses of carrying out an observational study (Section 2.2.4). Finally, the section includes exercises/tasks that will support you in gaining practical experience with the method of observation. Note that, throughout this chapter, observation is defined as a specific method for data collection and is not used in the general sense of studying particular entities (cf. Section 1.2.3: unit of observation/measurement) as part of any kind of research regardless of the employed method of data collection.
2.2.1
Definition
Observation is defined as a collection of spontaneous, perceivable data without systematic verbal interaction between researcher and participants. In linguistics, the object of observation is, generally, language behaviour of speakers in its natural setting such as in language documentation (cf. Chapter 3) or the compilation of corpora (cf. Chapter 5). The observation either concentrates on a set of variables which are defined prior to data collection or leads to the exploration of parameters that can be studied in further problem-oriented research. As such, observation is a systematic scientific method that differs from observation as we understand it in everyday life. In contrast to the other two main research methods (survey, experiment), observation explicitly aims to interfere as little as possible with the natural communication situation, yielding a high degree of naturalness. Advanced technical equipment (such as in neurolinguistic experiments) is typically not required. The resulting data are comprehensive in that they contain a multitude of different variables and can be analysed both qualitatively and quantitatively.
2.2.2
Types of Observation
There are several dimensions or criteria to distinguish types of observation (cf. Ricart Brede 2014), and each captures a different aspect present during observation. We will describe these criteria one after the other (with the resulting types), but note that they can be combined freely when planning an observational study. Figure 2.2 provides a summary of the criteria and resulting types of observation.
Figure 2.2 Types of observation [criteria and resulting types: researcher participation (participant vs. non-participant observation); speaker awareness (open vs. hidden observation); situation of observation (observation in natural vs. researcher-induced situations); object of research (external vs. introspective observation)]
The main criteria differentiating between types of observation are:
• degree of participation of the researcher in the situation
• speakers’ awareness of the observation
Degree of participation yields the two major types of observation:
• participant observation: The researcher participates in the situation to be observed by interacting with participants in their natural setting. This method is used in sociolinguistics and anthropological linguistics (cf. Chapter 6.3.1) where it specifically includes the interaction with speakers in their communicative situation. Participant observation allows researchers to obtain an emic perspective (cf. Section 2.2.3).
• non-participant observation: The researcher is an external bystander and does not participate in the situation to be observed. In contrast to participant observation, it is easier to conduct for researchers, as they do not have to simultaneously observe and keep data records while actively participating. This reduces the risk of selective perception.
Regarding the speaker’s awareness of the observation, two types can be distinguished based on whether the researcher hides their role and intentions from the informants or not:
• hidden observation: The researchers hide their role and intentions from the informants. This can be problematic in ethical terms. Some study designs may need approval by an ethics review board, which in turn necessitates that participants in the study give their informed consent for participation prior to data collection. So, even if your research question requires that your participants remain uninformed about the observation taking place, you should inform them about the study after you have finished data collection (cf. Section 1.1.6).
• open observation: The researchers do not hide their role and intentions during data collection. This is strongly linked to the observer’s paradox (cf. Section 2.2.3), yielding unpredictable consequences with regard to data quality. Depending on the topic, it can be advisable not to provide participants with overly detailed information about the research aim and topic.
There are two further criteria distinguishing subtypes of observation:
• situation of the observation: The researcher observes spontaneous behaviour in a fully natural situation vs. in a researcher-induced situation.
• object of research: The researcher observes the linguistic behaviour of others (external observation) versus the researchers or speakers observe their own behaviour (introspective observation).
As far as introspective observation is concerned, we have already pointed to the problematic use of researcher-based introspection as the sole method of data collection (cf. Section 1.2.5). In contrast, participant-based introspection is often used in diary studies in which informants are required to document how they use language in certain contexts or situations. For instance, participants can be asked to document situations and words of refusal which can later be used in developing questionnaires with authentic items (Grein 2007). As participant-based introspection itself includes verbal instruction, it falls in between observation and survey. By asking for very specific information you are likely going to alter your participant’s awareness of it. Thus, the more detailed the instructions for introspective observation, the more likely the transition is from observation to survey.
2.2.3
Designing and Conducting Observations
In conducting observations, it is important to be aware of participant and researcher attitudes towards each other. The aim to observe language behaviour with minimal interference conflicts with the reality of scientific investigation. This conflict is described in the observer’s paradox, introduced to sociolinguistics by William Labov (1972: 209): ‘The aim of linguistic research in the community must be to find out how people talk when they are not being systematically observed; yet we can only obtain this data by systematic observation’. In other words, the mere presence of a linguist can change the (linguistic) behaviour of speakers, thereby potentially changing the empirical outcome. Participants may adjust their language to meet normative expectations (i.e., societal forces that define ‘good’ or standard language) and therefore refrain from using linguistic forms they might use in situations without the researcher’s presence (Sampson 2001). Or speakers may use a linguistic form only in conversations not involving the researcher as a speech act participant (Everett 2001).
Importantly, variants of the observer’s paradox are present in the case of any research method, though manifested in different ways. In surveys, participants may unconsciously form their response based on normative or societal considerations or simply based on whether they like or dislike the interviewer. In experiments, they may feel less comfortable knowing that their performance in an experiment is constantly being monitored, or they may respond to the experimental task in unforeseen ways.
While the observer’s paradox clearly revolves around participant attitudes, researcher attitudes can lead to two kinds of personal bias (cf. Chapter 1.1.5). The first one is the observer bias and pertains to expectations that researchers can develop because of their awareness of the hypothesis being tested in an empirical study (Cordaro & Ison 1963; cf. Section 2.3.3 for comparable researcher-based errors in surveys). These expectations may shape how the experimenter interacts with participants and evaluates participant behaviour or responses, subsequently affecting the overall distribution of results. The second one, researcher bias, pertains to the distinction between an emic and an etic perspective during data analysis (Pike 1967²). Emic perspective means an ‘insider’s view’ on the collected data. For example, such a view may be developed when a researcher becomes a member of a speaker community to study their language use (e.g., via participant observation, cf. Section 2.2.2). In this case, researchers have an internal view on language use and structure because they are part of the linguistic and cultural group, in addition to the external academic perspective. In contrast, an etic perspective means that researchers have an external view on the data only; they are not a member of the speaker community being observed. Of course, evaluations of linguistic utterances can differ fundamentally depending on how researchers weigh emic and etic perspectives. For this reason, linguistic fieldwork, for instance, often requires thorough self-reflection regarding the researchers’ standing in a speaker community and the possible pitfalls caused by what they do or do not know in terms of internal rules constraining language use (cf. Chapter 6 for examples).
Besides the four basic aspects of a research design (cf. Section 2.1), an observational research plan must include the following list of items:
• object of research: Who will you observe and do you have access to the relevant speaker community? How many speakers can you manage to observe?
• location and time: When and where do you collect your data? On how many occasions can you collect data?
• longitudinal or cross-sectional design
• type of observation
• data documentation: Do you need recording devices in addition to handwritten notes to collect data? Additional devices may intensify the observer’s paradox or may be problematic regarding research ethics.
• coding scheme (category system or observation units): What are the categories/variables that you need to capture and how many? What are the variable values/levels and are they defined to be precise, disjoint, and exhaustive? Are the categories directly observable or indirectly inferable?
The coding scheme is the core of your observational study. The categories for observation (variables and their values/levels) can be deduced by reviewing the relevant literature for your research topic. Categories contain one or more observable units or items, which you will analyse in terms of their quality or quantity. It may be challenging to define precise, disjoint, and exhaustive values/levels (cf. Section 1.2.2), particularly for qualitative variables. In case no existing categories can be adopted from the literature, they need to be defined for the research project at hand by prior exploration. As for category definition, the categories and their values/levels need to be observable or, at least, indirectly inferable by observation. Otherwise, you will need a method, such as a survey, that prompts participants to provide the relevant information. We strongly recommend pre-testing your coding scheme. Ideally, such a pre-test should be conducted with several observers using the same coding scheme in identical situations. If they come to dissimilar results, you should adjust the coding scheme before actual data collection. Ill-defined categories for observation (e.g., non-exhaustiveness or inappropriateness due to a lack of emic understanding) are a major source of unreliable or invalid data alongside observer’s paradox and personal bias.
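One possible way to quantify whether two observers in such a pre-test "come to dissimilar results" is a chance-corrected agreement measure. The text does not prescribe a specific statistic; the sketch below assumes Cohen's kappa for two observers and nominal categories, and the category labels and codings are invented for illustration.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Chance-corrected agreement between two observers' category codes."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("Both observers must code the same non-empty set of units.")
    n = len(codes_a)
    # Observed agreement: share of units that received the same category.
    p_observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Expected agreement if both observers coded independently
    # at their own category base rates.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both observers used a single identical category throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical pre-test: two observers code the same ten utterances.
observer_1 = ["standard", "dialect", "dialect", "mixed", "standard",
              "standard", "dialect", "mixed", "standard", "dialect"]
observer_2 = ["standard", "dialect", "mixed", "mixed", "standard",
              "dialect", "dialect", "mixed", "standard", "dialect"]

kappa = cohens_kappa(observer_1, observer_2)
print(f"Cohen's kappa: {kappa:.2f}")
```

A value near 1 indicates that the categories are applied consistently, while a low value suggests that the category definitions need to be sharpened before data collection. What counts as acceptable depends on the conventions of the research field.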
2.2.4
Advantages and Disadvantages of Observations
Observational methods have the following strengths. Specifically, they can be used:
• in an exploratory manner (which is not impossible with other methods, but less often applicable) in order, e.g., to obtain an idea of relevant variables in a new research field.
• to investigate latent variables and processes not consciously accessible for speakers.
• to circumvent possibly biased self-reports made by speakers in surveys or experiments.
• to study language in special populations (e.g., infants, indigenous people) who cannot be studied with surveys or experiments where response formats are defined beforehand.
• to study language in real-life communication, so allowing for a high degree of naturalness of the collected data.
• when the use of technical equipment and intrusions into a speaker’s environment is infeasible as, e.g., in the study of language behaviour in remote areas.
Weaknesses of observational methods include:
• They are time-consuming both in terms of data collection and analysis.
• Although they follow a research plan that is designed to be controllable, transparent, and reproducible, it may be difficult at times to follow the research plan thoroughly. For example, when informants behave in an unexpected way, the researcher needs to spontaneously adapt, and documentation of the relevant categories can become error-prone without accompanying video or audio recordings.
• Results are especially susceptible to personal bias. Therefore, observational methods require the researcher to be particularly aware of their cultural background when it comes to (i) data documentation (i.e., what is documented, what is left out) and (ii) interpretation of the observed patterns (i.e., what counts as usual or unusual linguistic behaviour and why).
• Hidden observation can be legally and morally problematic; it may therefore not always be feasible.
Now that you have learnt more about observation, proceed with exercises 2.1 to 2.3 in Section 2.7.
2.3
Survey
In this section, you will be introduced to the essentials of surveys: definition (Section 2.3.1), the different types of survey (Section 2.3.2), the importance of question formats and the impact of participant sampling, interviewer and respondent behaviour on data quality (Section 2.3.3), and specific strengths and weaknesses of carrying out a survey study (Section 2.3.4). Finally, the section includes an exercise/task that will support you in getting practical experience with the method of surveys.
2.3.1
Definition
Surveys are a research method utilising communication between researcher and participants with the goal of collecting verbal information from the participants on a specific research topic. In linguistics, surveys are a common method to collect language data in a systematic way (called elicitation), such as in descriptive linguistics (cf. Chapter 4). In contrast to the observation of natural language behaviour, this method is better suited for getting information on seldom occurring phenomena. Apart from speakers’ language knowledge, extra- or para-linguistic information collected in surveys includes speakers’ beliefs and attitudes (e.g., whether a speaker prefers the British or American variety of English), their behaviours (e.g., how often a multilingual speaker uses one of their languages during the day), and (non-observable) demographic characteristics such as age or membership in a sociocultural speaker community. The communicative situation is externally triggered in the sense that the researcher determines the topic and the formats in which the participants are supposed to respond. Thus, survey methods contrast with observational methods (cf. Section 2.2) in that they restrict the range of investigated variables by directing the communication situation of the speakers, and in that participants are typically aware of the survey taking place. However, the range of variables need not be as reduced as is the case in an experimental approach (cf. Section 2.4), and surveys are also not as tied to a specific infrastructure or research equipment, as is often the case for experiments. Surveys are employed for quantitative and qualitative research to an almost equal degree. Crucially, the classification as quantitative vs. qualitative survey research is strongly linked to what type of survey method is implemented (Section 2.3.2) and it depends on the degree of standardisation of possible responses (Section 2.3.3). Also, the quantitative vs. qualitative approach in a survey study determines to a certain extent how research participants are referred to. In more qualitatively oriented research (e.g., interviews), participants are often called informants/interlocutors, while in more quantitatively oriented research (e.g., questionnaires), they are called respondents.
2.3.2
Types of Survey
Surveys are classified according to two related criteria:
• modality, distinguishing between surveys as written questionnaires vs. oral interviews – this distinction goes hand in hand with absence vs. presence of the researcher during data collection
• level of standardisation in questions: a. closed-ended/standardised (response categories are provided by the researcher) vs. open-ended/non-standardised (the kind of response is up to the interviewee); b. predetermined order of questions vs. flexible order
[Figure 2.3 Types of survey. Survey methods divide into questionnaires (written administration; paper-and-pencil, telephone, and web-based formats; tendency for closed-ended questions; quantitative approach) and interviews (oral administration; structured, semi-structured, and non-structured interviews as well as focus groups; tendency for open-ended questions; qualitative approach).]
There is a very strong tendency for surveys with closed-ended questions to be administered in written form, whereas surveys with open-ended questions tend to be administered in oral form. Also, the decision to use predominantly closed-ended or open-ended questions leads to a quantitative or qualitative research approach, respectively. Figure 2.3 shows the resulting classification with the two basic types of survey and their subtypes:
• Questionnaires are administered in written form and work predominantly with closed-ended questions. Consequently, the kind and number of questions is fixed, as is the order in which they are submitted to the respondents. They are the primary means of conducting quantitative survey research. There is no further distinction other than how questionnaires are submitted to participants. Paper-and-pencil formats are easy to use, but require direct access to participants or else have to be mailed to them. In telephone surveys, the interviewer simply reads out the questions that the respondent then answers. While telephone surveys may be quicker in collecting data, it may not be possible to use all the materials necessary to make the question items fully comprehensible (i.e., visual information). Therefore, some complex questions may not be properly addressable in a telephone survey. Web-based formats usually refer to either electronic copies of paper-and-pencil questionnaires sent by e-mail or to survey platforms on the web. The latter are advantageous as the data of each participant in the online survey are automatically recorded in an easy-to-export format, thereby saving time during the initial stages of data processing. However, web-based and physical mail (and to some extent telephone) formats have two major disadvantages. First, researchers do not know the circumstances under which participants fill out the questionnaire, i.e., whether they are attentive, whether they follow the order of questions (in paper-and-pencil questionnaires), and whether they answer the questions without the support or influence of others. Second, the response rate is typically lower than when questionnaires are distributed personally. In web-based and mail formats, the response rate tends to be about 20–25 per cent and therefore researchers often work with benefits or monetary compensation as a means of motivation.
• Interviews are administered in oral form and work predominantly with open-ended questions. They are the primary means of doing qualitative survey research. Interviews are further distinguished based on whether they take place face-to-face with the interviewer and one informant/interlocutor at a time, or whether a group of informants/interlocutors discuss a topic on their own, while the interviewer records the discussion (focus groups). The use of focus groups limits researcher interference with the communication situation as the researcher only chooses the topic of the conversation. Consequently, focus groups are often used in qualitative-exploratory research, when further aspects of a research topic or even possible values/levels of a variable are to be discovered. Face-to-face interviews can take on many formats and mixing different subtypes is common.
In an expert interview, your informant holds a privileged position informing you about your research topic. This may be the case, for instance, for language teachers who have privileged access to learners’ issues and developmental acquisition processes. The other types of interview differ in how much the interviewer guides the informant through the interview, regardless of whether or not the informant is an expert. In structured interviews, the researcher strictly follows a predetermined list of questions. Such a straightforward question-answer format is predominantly used in problem-oriented research. Semi-structured interviews, instead, allow for discussions, longer answers, follow-up questions, and a variable order of going through the questions. The interviewer develops a list of (usually open-ended) questions as a guideline and is flexible in reacting to the informant’s answers, but makes sure that all questions are answered. In non-structured interviews, the informants/interlocutors are asked to talk about whatever seems to be important to them. In this way, they determine the thematic focus of the interview. The researcher intervenes as little as possible and only asks questions to motivate the informants/interlocutors to continue their narrative or to follow up on certain aspects mentioned by the informants/interlocutors. Such an open interview format is generally used in explorative research to determine topics and aspects that are relevant to the respondents.
The transition between interviews and questionnaires is fluid. This becomes apparent in cases in which the researchers ask participants to fill out paper-and-pencil questionnaires in their presence for a better response rate or in which they interview participants following a set of questions with fixed order and wording for better comparability.
2.3.3
Designing and Conducting Surveys
Besides the four basic aspects of research design (cf. Section 2.1), the following aspects need to be considered carefully:
• question format, including kind, wording, number and order of questions
• control of sampling and nonsampling errors
Let us first consider the types of questions you can use and, particularly, the difference between closed-ended and open-ended question formats, which is the major determinant of whether a survey counts as quantitative or qualitative research:
• closed-ended questions: They can be used when the range of possible answers to a question is known. For instance, when asking for linguistic behaviour within the time frame of one week, one can easily determine possible response categories based on day counts:
In a week, how often do you use your minority language in a public space?
☐ daily ☐ 6–5 days ☐ 4–3 days ☐ 2–1 days ☐ never
Hence, the researcher is responsible for the choice and wording of the possible responses.
• open-ended questions: They can be used when one cannot determine the range of possible responses with certainty. For instance, when asking for personal attitudes towards or experiences with dialects, the respondent alone determines the wording of the response:
What benefits do you see in using your minority language/dialect in public space?
_____________________________________________________
If necessary, one can combine an open-ended response category such as ‘other: __________’ with an otherwise closed-ended format. The main advantage of closed-ended question formats is their inherent standardisation, i.e., each respondent is presented with the same set of response categories to choose from. This, in turn, facilitates statistical analysis as you can directly (and with quite a high level of certainty) compare response categories. Closed-ended question formats may also lower prestige or self-deception biases (see below) with sensitive questions, as participants cannot freely choose their response. A major disadvantage of closed-ended question formats is that the response categories the researcher chooses do not necessarily mirror the mental or sociocultural categories the individual respondent may have. In the abovementioned example for closed-ended questions, for instance, respondents may want to differentiate between six and five days, but they cannot do this due to the fixed response categories. Clearly, an open-ended question format may be better suited to fully capture the response categories the respondent wishes to express.
In surveys with open-ended questions, it is even more important that the question wording is clear and does not confuse the informant, as there are no specified answer options that help in understanding the aim of the questions. Open-ended questions are superior for more exploratory designs or non-quantitative research. A drawback of open-ended questions is that the responses can be repetitious or irrelevant for the current research question. In addition, results may be more difficult to analyse because the researcher needs to assign uniform codes to different types of responses (across participants) for analysis, which is more challenging when there is a higher degree of variation in the data or the data are less easy to interpret.
Finally, closed-ended questions can take several specific forms:
• alternative responses (single choice) vs. multiple responses (multiple choice): The respondent is allowed to give exactly one vs. several responses to a question.
• binary vs. polytomous categories: There are two vs. more than two response categories to choose between.
• ranking vs. rating/scaled response: With ranking questions, participants pick one category from a list of alternatives (e.g., ‘Do you prefer English, French, or German?’), and with rating questions, they judge each member of the list separately on a scale (e.g., ‘How much do you like English/French/German on a scale from 1 – very much to 7 – not at all?’). Scales with an uneven number of response categories have a potentially ambiguous mid-point (indecisive vs. deliberate choice), while scales with an even number of response categories induce the participants to make a forced choice towards one of the two poles. Ranking and rating responses can, to some extent, also apply to open-ended questions (e.g., ‘Which language do you prefer?’ or ‘How much do you like your native language?’).
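The point that standardised response categories facilitate statistical analysis can be illustrated with a minimal Python sketch. It tallies answers to the closed-ended example question on weekly minority-language use. The function name `tally` and the six answers are invented for illustration, and the category labels are written with plain hyphens.

```python
from collections import Counter

# Response categories from the example question on weekly minority-language use.
CATEGORIES = ["daily", "6-5 days", "4-3 days", "2-1 days", "never"]


def tally(responses):
    """Count how often each predefined category was chosen.

    Raises ValueError for any answer outside the fixed categories,
    which should not occur with a closed-ended item.
    """
    unknown = [r for r in responses if r not in CATEGORIES]
    if unknown:
        raise ValueError(f"unexpected responses: {unknown}")
    counts = Counter(responses)
    # Report every category, including those that nobody chose.
    return {category: counts.get(category, 0) for category in CATEGORIES}


# Hypothetical answers from six respondents (invented data).
answers = ["daily", "never", "4-3 days", "daily", "2-1 days", "daily"]
print(tally(answers))
```

Because every respondent chooses from the same categories, the counts are directly comparable across participants. Open-ended answers would first have to be coded into such categories by the researcher.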
The preparation of questions is central to any survey project and should closely match your research question and hypothesis. Regarding the wording of questions, it is crucial to be as precise and clear as possible and to avoid ambiguity, particularly in surveys (questionnaires) in which the researcher is not present to explain the question in case of misunderstandings. Furthermore, the questions should not be suggestive or embarrass the participants. Regarding number and order of questions, it is important to avoid unnecessary questions, as participants generally do not participate attentively in surveys for longer than about 20–30 minutes in one session. The order of questions also matters, because it can produce unwanted carryover effects between questions. As a rule, more general or open-ended questions should precede more specific or closed-ended questions. For instance, the question ‘How would you address an audience in class or at a conference?’ needs to be asked before the question ‘What do you think about gendering?’ so that participants are not prompted to reflect on the topic of gendering while answering the first question. In general, it is advisable to pre-test kind, wording, number, and order of questions with a small group of participants who are not taking part in the main study.
In what follows, we will briefly discuss the main problem for survey validity: survey errors. There are two main types of survey errors that may affect data quality:
• sampling error: This impacts all methods of data collection, but is most intensively discussed in survey research. The major concern is to carefully select a sampling procedure (sampling criteria and sample size, cf. Section 1.2.3) in order to keep the sampling error as small as possible. Recall from Section 1.2.3 that a sample should be representative in terms of the characteristics of the basic set/population that are relevant to the research question and that sampling procedures are more or less likely to yield representative samples. In qualitative studies, it is especially crucial to work with a sample of participants who have representative characteristics of the population, as the number of participants in relation to the entire population is generally smaller than in quantitative studies. In the latter, increasing the sample size (given adequate sampling criteria) reduces the sampling error, because the quantitative estimate (e.g., the mean and the associated standard deviation) of a larger sample is more likely to come closer to the true value in the population.
• nonsampling error: This pertains to participant and researcher behaviour. It comes in many forms (Biemer & Lyberg 2003) and can affect quantitative and qualitative research alike. Respondent-based errors include nonresponse and self-selection, i.e., situations in which the representativeness of your sample can be corrupted depending on who does or does not agree to participate in the study. In surveys, nonresponse error is present when a significant proportion of contacted persons do not reply to your survey call or do not answer the entire set of question items in the survey. Self-selection bias emerges especially with mail-based formats when only a specific subset of respondents replies, because they are, e.g., particularly interested in the survey topic. This can skew the overall results (Albert & Koster 2002). Imagine you sent out a questionnaire to speakers of a minority language asking them to rate their social standing depending on whether they use the minority or majority language in public. It may be the case that only those speakers who are particularly sensitive to this issue – either positively or negatively – respond to the survey and this may have an impact on whether the overall attitude measured will be positive or negative. Also, such a subset may show characteristics that are not representative of the population to which you want to generalise your results. For example, the subset of speakers may fall within a certain age range or exhibit a particular socioeconomic status so that the generalisability of the results is further lowered. In addition, respondents may also lower the accuracy of survey results if they are susceptible to one of the following response biases based on self-presentation (Wagner 2010):
– prestige bias: Participants choose answers that will make them appear higher on a perceived prestige scale.
– self-deception bias: Participants choose answers conforming to how they would like their own behaviour to be.
– acquiescence bias: Participants choose answers to please the researcher.
As for researcher-based errors, they can occur at all stages of the research process (cf. Section 1.1.3). During the operationalisation stage, question formats may be worded so that they do not reflect the phenomenon they are supposed to investigate, thereby reducing internal construct validity (specification error).
During data collection, interviewer effects are probably the most prominent source of skewed results. These effects describe situations in which interviewers’ personal characteristics or behaviour influence responses given by interviewees (cf. Section 2.2.3 on the observer’s paradox, observer and researcher biases). On the one hand, interviewer effects can arise from social relations when, for example, gender differences come into play or when researchers aim to interview speakers of another sociocultural group they are not part of and, therefore, may not be aware of the code of conduct (e.g., conversational practices). Other interviewer effects can be traced back to the researcher’s strategies during the interview. The Rutledge Effect in the Linguistic Atlas of the Gulf States (LAGS) is an example of how interviewer expectations and resulting suggestive questions led to significantly skewed results in the distribution of some grammatical phenomena (Bailey & Tillery 1999, 2004). Intensive interviewer training,
standardisation of questions, the employment of more than one interviewer (if possible), or separate analyses of the data from multiple interviewers are common recommendations to mitigate interviewer effects. Finally, errors during data processing may affect data quality when coding is erroneous, as, for instance, with illegible handwriting, false transcription of audio recordings, etc.
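The relationship between sample size and sampling error described above can be made tangible with a small simulation. The following Python sketch assumes an invented population of 10,000 attitude ratings on a 1–7 scale and simple random sampling. The function name `mean_abs_sampling_error` and all numbers are illustrative and are not taken from any real survey.

```python
import random

random.seed(1)  # fixed seed so that the sketch is reproducible

# Hypothetical population: 10,000 attitude ratings on a 1-7 scale.
population = [random.randint(1, 7) for _ in range(10_000)]
true_mean = sum(population) / len(population)


def mean_abs_sampling_error(sample_size, repetitions=200):
    """Average absolute gap between a sample mean and the population mean."""
    gaps = []
    for _ in range(repetitions):
        # Simple random sample without replacement.
        sample = random.sample(population, sample_size)
        sample_mean = sum(sample) / sample_size
        gaps.append(abs(sample_mean - true_mean))
    return sum(gaps) / repetitions


for n in (10, 100, 1000):
    print(n, round(mean_abs_sampling_error(n), 3))
```

In this sketch the average gap shrinks as the sample grows. Note that this only holds for random samples: a larger sample does not repair nonresponse or self-selection, which are nonsampling errors.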
2.3.4
Advantages and Disadvantages of Surveys
Survey methods have the following strengths:
• Written questionnaires can be more efficient to collect large amounts of data within a relatively short period of time.
• Surveys can collect data on psychological constructs (e.g., attitudes, beliefs) that may only be indirectly observable with observational methods.
• Surveys can be used in qualitative and quantitative research designs as well as for explorative and hypothesis-testing studies. As such, they can combine flexibly with one of the other main methods for data collection.
Weaknesses of surveys include:
• For surveys sent out via (e)mail or distributed on online platforms, the situation in which participants answer survey questions is rarely controllable for the researcher.
• The setting of data collection is less natural than with observational methods.
• Surveys are associated with a number of researcher-based and respondent-based confounds, some of which may be hard to control for (e.g., response rate).
Now that you have learnt more about surveys, proceed with exercise 2.4 in Section 2.7.
2.4
Experiment
In this section, you will be introduced to the essentials of experiments: definition (Section 2.4.1), the different types of experiments (Section 2.4.2), the experimental design including appropriate tasks and stimuli and the requirements of internal vs. external ecological validity (Section 2.4.3), and specific strengths and weaknesses of carrying out an experimental study (Section 2.4.4). Finally, the section includes an exercise/task that will support you in getting practical experience with the design of experiments.
2.4.1
Definition
In an experiment, the researcher systematically varies or manipulates the levels of at least one independent variable in order to observe its influence on at least one dependent variable. Potential confounding variables are controlled for by the researcher, and the setting for data collection is held constant as much as possible. The goal of experimental research is to identify a cause-effect relationship between independent and dependent variables. Because such relationships are especially difficult to isolate in the humanities and social sciences (Tanner 2002²), experiments in the language sciences often investigate correlation between variables, which is a necessary precondition to causation (cf. Section 1.2.1). Experiments are the primary means for testing a hypothesised relationship between dependent and independent variables, and, therefore, follow a highly reductionist approach to isolate the critical variables from real-life communication and to identify possible confounding factors that the researcher, ideally, excludes from the experimental setting. Experiments are typically conducted with quantitative research designs.
2.4.2
Types of Experiment
Experimental types are defined based on how quality criteria (cf. Section 1.1.5) are integrated into the basic experimental design and the experimental setting. In Figure 2.4, you can see the types of experiment resulting from these two strands of the definition. Experimental designs differ regarding the degree to which the researcher has control over confounding variables to ensure high certainty in interpreting the relationship between dependent and independent variable (internal validity or ‘controllability’). Experimental settings differ regarding the degree to which the data generalise to further populations (external population validity) and natural situations (external ecological validity or ‘naturalness’). Depending on how and where experiments are conducted, one can achieve different degrees of controllability and naturalness. There are links between experimental design and the experimental setting in which they can be implemented (indicated by the lines in Figure 2.4).
[Figure 2.4 Types of experiment. Experimental setting (external ecological or population validity, ‘naturalness’): laboratory experiment, field experiment, natural experiment. Experimental design (internal validity, ‘controllability’): true experiment, quasi-experimental, pre-experimental.]
A first distinction can be made according to the research location or experimental setting:
• experiments in the laboratory: They yield higher internal validity, because in their highly controlled setting many confounding variables are controlled for, so that they cannot bias the experimental outcomes. However, the data are collected in a less natural situation, because speakers are removed from their natural environment and are presented with test materials (linguistic stimuli) and/or tasks (Section 2.4.3) constructed for the experimental purpose at hand. The findings may, therefore, be less easily generalisable to different speaker communities and to an individual’s language use in everyday life.
• field experiments: Field data are collected in a speaker’s natural sociocultural environment to avoid distortions resulting from dislocation and contact with other languages and cultures. Stimuli and tasks are usually more heavily adapted to meet local conditions and to familiarise participants with them. Such adaptations come along with methodological differences across recording sites and, thus, the minimisation of cross-linguistic comparability. By implication, high external ecological validity comes with a reduction of internal validity of the experimental design, because influences from confounding variables cannot be controlled for in an optimal way.
Lab experiments are typically conducted in psycho- and neurolinguistics (Chapters 7 and 8), whereas field experiments are carried out in cognitive linguistics, sociolinguistics or anthropological linguistics (Chapters 6 and 7). The experimental design employed in a study is the main basis for determining further experimental types. Experimental designs are classified according to three design criteria (cf. Keppel 1991³; Kirk 2003):
• randomisation: This means that every participant has an equal chance to be assigned to any level of the independent variable(s), so that the experimental results cannot be explained by individual differences that correlate with the independent variable(s). This is the hallmark of a valid experiment.
• replication: In a narrow sense, this means that two or more participants are tested in the experiment. In a wider sense, replication makes an additional distinction as regards the number of tests for each participant. In a between-subjects design, a participant contributes exactly one observation, i.e., one measurement of the dependent variable obtained on one level of the independent variable. In a within-subjects or repeated-measures design, a participant contributes more than one observation obtained from all levels of the independent variable(s).
• control of confounding variables: This pertains to how confounds can be addressed in experimental studies (cf. Section 2.4.3).
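The randomisation criterion and the contrast between between-subjects and within-subjects designs can be sketched in a few lines of Python. The function names, participant labels, and level names below are hypothetical. Shuffling the order of levels for each participant in the within-subjects case is a common practice assumed here for illustration and is not something the text above prescribes.

```python
import random


def assign_between_subjects(participants, levels, seed=None):
    """Randomly assign each participant to exactly one level.

    Participants are shuffled and then dealt out to the levels in turn,
    so group sizes stay as equal as possible. When the number of
    participants is a multiple of the number of levels, every
    participant has the same chance of ending up in any level.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {level: [] for level in levels}
    for index, participant in enumerate(shuffled):
        groups[levels[index % len(levels)]].append(participant)
    return groups


def assign_within_subjects(participants, levels, seed=None):
    """Give every participant all levels, each in a shuffled order."""
    rng = random.Random(seed)
    return {p: rng.sample(levels, k=len(levels)) for p in participants}


# Hypothetical participant codes and level names.
participants = [f"P{i:02d}" for i in range(1, 9)]
levels = ["treatment", "control"]

print(assign_between_subjects(participants, levels, seed=42))
print(assign_within_subjects(participants, levels, seed=42))
```

In the between-subjects case each participant appears in one group only, whereas in the within-subjects case each participant is paired with every level and thus contributes several observations.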
Randomisation and control of confounds are the main criteria to define the four major types of experiments, some of which may have several subtypes depending on how they weight one of the design criteria over the others (see Salkind 2010 for an overview).
• true experiment (or interventional design): It typically investigates one independent variable and emphasises the randomisation criterion in requiring at least two participant groups with randomly selected participants: an experimental group and a control group. In this basic between-subjects design, the experimental group is exposed to one level of the independent variable (called treatment or intervention), while the control group is exposed to the other level (or no treatment level at all). This basic between-subjects design is often varied to yield specific subtypes, such as the pre-post test design (Rost 2007²; Salkind 2010; Abbuhl, Gass & Mackey 2013). That is, both groups are tested once before and once after the treatment, which allows the researcher to infer that a difference between pre- and post-test scores in the experimental group is most likely caused by the independent variable. In another basic subtype, the factorial design, researchers test for the influence of one independent variable in the presence of one or more further independent variables. Factorial designs often co-occur with a within-subjects or repeated-measures design as this combination reduces sample size and increases experimental power (i.e., the sensitivity to reveal significant changes induced by the independent variables). However, internal validity is compromised when confounding variables specific to repeated-measures designs are not controlled for (cf. Section 2.4.3).
• quasi-experiment: This is similar to the between-subjects design of a true experiment in comparing the responses of an experimental and a control group at different levels of the independent variable. However, for ethical or practical reasons, participants are not randomly assigned to one of the groups, so the influence of the independent variable may be confounded by characteristics of the participants. This may be the case, for example, when language use is bound to gender differences (e.g., female and male participants can only fall into one group), or when sociocultural reasons prohibit random assignment (e.g., when group membership cannot be anonymous and therefore may have negative personal consequences for some participants).
• pre-experimental designs: They deviate from randomisation of participants and from systematically using a control and experimental group. They usually do not include a control group, but test for changes in the dependent variable in only one experimental group which is assigned to only one level of the independent variable. Subtypes of this design may or may not measure the dependent variable before and after the treatment. For example, some only focus on the outcome after treatment (one-shot case studies) or compare post-test outcomes to another group that did not undergo randomisation and was not matched to the experimental group to control for possible a priori differences (Tanner 2002²; Rost 2007²).
• natural experiments: They are not experiments in the strict sense because the researcher does not deliberately manipulate the independent variable. Instead, they take place after natural phenomena or historical events and, hence, are outside of proper experimental control. Natural experiments require the researcher to be present in the respective field, and research questions and variables are highly dependent on the context-specific setting. Thus, natural experiments are highly spontaneous and variable. An example is the emergence of Nicaraguan Sign Language in the 1980s, which was initiated by changes in government and associated reforms in the educational sector in Nicaragua (Emmorey 2001).
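As a rough illustration of the pre-post test logic described under true experiments above, the following Python sketch compares the average gain of an experimental group with that of a control group. All scores and the function name `mean_gain` are invented. The result is a purely descriptive difference, so a real study would additionally need inferential statistics.

```python
def mean_gain(pre_scores, post_scores):
    """Average post-minus-pre change across the participants of one group."""
    gains = [post - pre for pre, post in zip(pre_scores, post_scores)]
    return sum(gains) / len(gains)


# Invented test scores, one value per participant and measurement point.
experimental_pre = [10, 12, 11, 9]
experimental_post = [15, 16, 14, 13]
control_pre = [10, 11, 12, 9]
control_post = [11, 11, 13, 10]

experimental_gain = mean_gain(experimental_pre, experimental_post)
control_gain = mean_gain(control_pre, control_post)

# The part of the experimental group's gain that exceeds the control
# group's gain is the candidate effect of the treatment.
print(experimental_gain - control_gain)
```

Subtracting the control group's gain removes changes that would have occurred without the treatment, such as practice effects from taking the same test twice.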
The latter three designs are sometimes referred to as weak experimental designs (Rost 2007²), because they have poorer internal validity – compared with the strong design of true experiments – making it difficult to delineate a cause-and-effect relationship between dependent and independent variables.
2.4.3
Designing and Conducting Experiments
Besides the four basic aspects of a research design (cf. Section 2.1), the experimental research design must include the following list of items:
• preparation of linguistic stimuli/test materials
• experimental task
• experimental design, including control for confounds
• technical equipment for data recording (particularly in the lab)
• implementation of research hypotheses as statistical hypotheses to choose descriptive and inferential statistics
We will now briefly discuss the first three items. For reasons of space, we will not elaborate on technical equipment, because most (laboratory) experiments require their own specific technical equipment with which you will need to familiarize yourselves. Also, we will exclude statistics here, but it is important that experimental design and statistical analysis go hand in hand. There are plenty of introductions on statistical design in experimental research in psychology and experimental linguistics (cf. Section 1.2.8) and we highly recommend that you familiarise yourself with statistical analysis as early as possible in case you want to carry out an experiment (cf. Section 1.4 for further readings). Let us begin with experimental stimuli/items, which are used to trigger a specific response in the participant. The preparation of (non-)linguistic stimuli is constrained by the experimental design: in order to adequately test a research question or hypothesis, the content of the stimuli must reflect or be associated with the levels of the independent variable(s) either alone or in combination with the experimental task (see below). Moreover, just as your participant sample should be representative of the speaker population of a language, the sample set of words, sentences, etc. is thought to be representative of the infinite population of utterances in a language (see Clark 1973 for analytical consequences). In many experimental approaches (especially in psycho- and neurolinguistics) stimuli are prepared so that they represent minimal pairs/contrasts (akin to minimal pairs in phonology), i.e., sets of words, sentences, etc. that are altered in one specific feature and/or one specific position to yield changes in meaning or structure, while all other properties are kept the same. For instance, in English, the word order variation in relative clauses (subject relative: ‘This is the president that insulted the reporter.’ vs. 
object relative: ‘This is the president that the reporter insulted.’) determines their meaning, with otherwise identical lexical meanings. Conceiving stimuli can introduce language-specific confounds which may bias the results just like participants, task, or experimental design. Stimuli that are constructed for one level of the independent variable may be more frequent than others, have ambiguous or additional meanings or be implausible compared to stimuli prepared for another level. Therefore, stimulus preparation needs to be carefully checked and corrected regarding such confounds (see Keating & Jegerski 2015). If correction is not possible, such caveats need to be addressed during statistical analysis or when interpreting the results. Minimal pairs allow inferences about whether the independent variable associated with particular stimuli also correlates with changes in the dependent variable. However, in some cases stimuli constructed as minimal pairs result in artificial utterances and therefore affect ecological and external validity. For example, psycho- and neurolinguistic experiments often employ violation paradigms, in which grammatically licensed structures (e.g., a phrase such as the jogger) are contrasted with ungrammatical ones (e.g., a phrase-structure violation such as *the went), which rarely occur in natural language use (without a subsequent correction by the speaker).
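The minimal-pair logic discussed above can be sketched as a small Python helper that builds the subject-relative and object-relative versions of an item from the same lexical material. The function name `relative_clause_pair` and the sentence frame are illustrative assumptions modelled on the example sentences in the text.

```python
def relative_clause_pair(head, verb, other):
    """Build a subject-relative / object-relative minimal pair.

    Both sentences contain exactly the same words; only the word
    order inside the relative clause differs.
    """
    return {
        "subject_relative": f"This is the {head} that {verb} the {other}.",
        "object_relative": f"This is the {head} that the {other} {verb}.",
    }


pair = relative_clause_pair("president", "insulted", "reporter")
for condition, sentence in pair.items():
    print(condition, "->", sentence)
```

Generating both versions from one template keeps all other properties of the item constant across the two levels of the independent variable. It does not, however, check for confounds such as frequency or plausibility, which still need to be controlled separately.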
Another unique characteristic of linguistic experiments is the experimental task, which serves two functions. First, it ensures that participants are attentively engaged with language during the experiment so that the stimuli can actually trigger a measurable response. Second, it specifies or restricts the way in which participants should respond to the stimuli so that the response delivers the kind of measurement that is central to the research hypothesis. If, for example, you want to study how speakers memorise words, you need to use a task that engages memory processes. If you target a particular linguistic domain such as semantics or syntax, the task – in addition to the stimuli – should facilitate responses that will reveal variation in this domain. In the above example for relative clause interpretation, for instance, counting phonemes would be a useless task if you were interested in whether people find object relative clauses more difficult to understand than subject relative clauses. Tasks can target purely linguistic information or processes, as in comprehension or production tasks, or they can target a mixture of linguistic and other cognitive processes, as in memory or categorisation-based tasks. A detailed list of task types is provided in Table 7.2 in Chapter 7 on cognitive linguistics and psycholinguistics. The choice of task is critical as it may introduce confounds, hence affecting validity and reliability of the experimental outcome. The task should be appropriate for the sample, research question and hypothesis and it should be feasible in the experimental setting in the laboratory or in the field. There are two types of task-induced confounds:
• task demands: A task may be too difficult or too easy for the participants, so they will either perform very poorly or at a chance level (floor effects) or almost perfectly (ceiling effects) – neither case would reveal meaningful differences between levels of the independent variable(s).
• task effects: A task may not target the linguistic knowledge or process that is of interest to the research question, or it may simply be difficult to generalise the task-specific results to language use in everyday life. Even the wording of the task instruction may lead to changes in the experimental results. For example, in cognitive-linguistic experiments participants react quite differently depending on whether one instructs them to respond ‘as quickly as possible’ or ‘as accurately as possible’ (the speed-accuracy trade-off phenomenon). In the first case, they may make more errors in their speedy responses, while in the latter they may appear unduly slow, but more accurate overall. Unless one intends to focus on either processing speed or processing accuracy (cf. Section 7.3.3 for definition), the instruction should not use words referring to speed or accuracy, or it should emphasise both.
Sometimes a pre-test of the task (instruction) is necessary in order to avoid the above task-specific confounds.
Finally, researchers need to control for confounding variables in their experiment (cf. Section 1.2.2), as confounds are considered serious threats to validity, especially internal validity (see Campbell & Stanley 1963; Shadish, Cook & Campbell 2002 for overviews). Generally speaking, confounds especially impacting on internal validity are introduced by:
• participants
• the researcher
• experimental task and stimuli
• specifics of data collection
• the basic experimental design
The following strategies are feasible to control for confounds introduced by the first four of the above sources of confounding:
• randomisation or random variation: Random assignment of participants to levels of the independent variable(s) is thought to control for confounds induced by individual differences in, e.g., age, gender, or cognitive abilities. The same rationale is possible for linguistic stimuli. For example, if researchers are interested in word properties other than lexical frequency (e.g., morphological structure, lexical meaning), they may choose to randomly sample as many words with varying frequency estimates as possible. Whatever the effects they find in response to the specific word properties, they should be independent of the well-known effect that lexical frequency has on word recognition (e.g., Brysbaert et al. 2011).
• keep the confounding variable constant: The confounding variable is held at a fixed level. For example, neurolinguistic studies usually sample from the population of right-handers, because they represent the majority of humans and because left-handers show different brain activation patterns which may bias the overall result (e.g., Knecht et al. 2000). Confounds in data collection can be minimised by keeping the recording site identical for each experimental session.
• blocking or matching procedures: This ensures that the levels of the confounding variable are evenly distributed across participant groups and/or linguistic stimuli (Kirk 2003; Myers, Well & Lorch 2010; Abbuhl, Gass & Mackey 2013). For example, in studies on developmental language disorders in children, the control group (e.g., typically developing children) is matched with the experimental group (atypically developing children) regarding age or cognitive ability.
• systematic variation: The confounding variable is treated as an independent variable and also enters statistical analysis (Myers, Well & Lorch 2010; Sassenhagen & Alday 2016). For example, with this procedure, researchers can investigate task-specific confounds by including task variation as an independent variable.
• exclude the confounding variable from the experimental setting: This procedure applies especially to confounds that are very unlikely to be of interest as a possible independent variable.
• blind protocols: This only applies to researcher-based confounds, particularly observer bias (Section 2.2.3). All the people involved in collecting the data are unaware of the research question and hypotheses.
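To make the first of these strategies concrete, the following minimal Python sketch shows one possible way of randomly assigning participants to the levels of an independent variable. It is an illustration only and is not taken from the sources cited above: the function name assign_randomly, the participant IDs, and the two instruction conditions are hypothetical.

```python
import random
from collections import Counter


def assign_randomly(participants, conditions, seed=None):
    """Randomly assign participants to conditions with (near-)equal group sizes.

    Hypothetical helper for illustration; a fixed seed makes the
    assignment reproducible so that it can be documented.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    # Deal the shuffled participants out to the conditions in turn,
    # so that group sizes differ by at most one.
    return {p: conditions[i % len(conditions)] for i, p in enumerate(shuffled)}


# Hypothetical participant IDs and levels of the independent variable.
participants = [f"P{n:02d}" for n in range(1, 13)]
conditions = ["speed_instruction", "accuracy_instruction"]

assignment = assign_randomly(participants, conditions, seed=1)
group_sizes = Counter(assignment.values())
print(group_sizes)
```

Because the order of participants is shuffled before they are dealt out, individual differences such as age or cognitive ability are not systematically tied to one condition, although chance imbalances remain possible in small samples.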
There are two additional confounds that come with a within-subjects or repeated-measures design:
• carryover effects: Participants may react differently to a level of the independent variable depending on the order in which the levels of the independent variable(s) are presented. (This is different from priming where the impact of item presentation order is systematically tested, cf. Sections 7.3.3 and 7.3.5).
• order effects: When participants are tested several times in the course of an experiment, they may become more practiced or show signs of fatigue, which may alter their experimental performance, sometimes leading to ceiling or floor effects.
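Because both confounds depend on the order of presentation, a common remedy is to vary that order systematically across participants. The following minimal Python sketch, which is an illustration and not a procedure taken from the cited literature, builds a cyclic Latin square; the function names and the condition labels A, B, and C are hypothetical.

```python
def latin_square_orders(conditions):
    """Return the rows of a cyclic Latin square over the given conditions.

    Each row is one presentation order; every condition occurs exactly
    once in every serial position across the rows.
    """
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]


def order_for(participant_index, conditions):
    """Pick a presentation order by cycling through the rows of the square."""
    orders = latin_square_orders(conditions)
    return orders[participant_index % len(orders)]


conditions = ["A", "B", "C"]  # hypothetical levels of the independent variable
orders = latin_square_orders(conditions)
for row in orders:
    print(row)
```

A cyclic square of this kind balances the serial position of each level, but not which level immediately precedes which (here, B always directly follows A), so it addresses practice and fatigue more directly than carryover from one specific level to the next.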
These specific confounds can be accounted for by counterbalancing, that is, each participant is tested in a different order of the levels and in some cases with only a subset of the stimuli representing a particular level (for different counterbalancing procedures, see Myers, Well & Lorch 2010; Abbuhl, Gass & Mackey 2013).
2.4.4
Advantages and Disadvantages of Experiments
Experimental approaches have the following strengths:
• Experiments in principle allow for the investigation of cause-and-effect relationships between two variables, which none of the other methods for data collection can do.
• They follow a reductionist approach, making it possible to study local linguistic features in a very detailed manner.
• With a high level of internal validity, the relationship between two variables can be inferred from the data with high certainty.
• Experiments allow the researcher to study linguistic knowledge of which a speaker is not consciously aware.
Weaknesses of experiments include:
• Experiments require high control over the experimental setting, so they cannot be conducted outside of the lab in many cases.
• Experimental approaches with a high level of internal validity are often artificial in terms of the kind of stimuli presented and the situation in which the participants are tested, thereby lowering external ecological and population validity.
• Lab experiments often sample students, which limits generalisability to the speaker population of a language (low external population validity; Henrich, Heine & Norenzayan 2010; Rad, Martingano & Ginges 2018).
Now that you have learnt more about experiments, proceed with exercise 2.5 in Section 2.7.
2.5
Mixed-Methods Design
Throughout Chapter 2, we have emphasised that in empirical linguistics, the major questions to ask yourself are whether your research is explorative or more problem-oriented, whether you adopt a qualitative or quantitative approach, and whether you value the naturalness of the data more than controllability of variables or vice versa. Your answers to these questions in part determine your choice of a method for data collection (see also Gilquin & Gries 2009; Krug, Schlüter & Rosenbach 2013). If your research question clearly falls within either a qualitative or a quantitative approach, or within either an exploratory or a hypothesis-testing approach, then a single-method design may be sufficient. More complex or comprehensive research questions, by contrast, are investigated by means of a mixed-methods design. In this section, you will be introduced to the essentials of mixed-methods designs: definition (Section 2.5.1), the different types of mixed-methods approaches (Section 2.5.2), and specific strengths and weaknesses of carrying out mixed-methods studies (Section 2.5.3).
2.5.1
Definition
Mixed-methods designs are defined as ‘a procedure for collecting, analyzing, and mixing quantitative and qualitative data at some stage of the research process within a single study in order to understand a research problem more completely’ (Ivankova & Creswell 2009: 137). Qualitative and quantitative research should be combined systematically or, ideally, be integrated with one another at the stage of research design, analysis or interpretation (Tashakkori & Creswell 2007; Ivankova & Creswell 2009; Agouri 2010). Validation of research findings from single-method designs is repeatedly mentioned as a core function of mixed-methods approaches. The discussion of converging, diverging, or complementary findings provides the basis to elaborate the description and explanation of a multifaceted research problem (Aguado 2014; cf. Sections 9.2 and 9.3). For instance, observational methods emphasise the naturalness of data (i.e., external ecological validity) at the expense of internal validity and
generalisability of the results, while the exact opposite holds for experimental methods. Here, mixed-methods designs allow the researcher to compensate for the respective methodological limitations in order to increase validity and reliability of the findings (see Creswell & Plano Clark 2017 for examples). Moreover, mixed-methods approaches are advantageous when single-method designs provide only insufficient data to fully answer a research question. Under these circumstances, mixed-methods approaches are not simply complementary, but in fact represent an independent approach for studying a research phenomenon comprehensively. This latter type of design is associated with its own research questions and hypotheses. The top panel of Figure 2.5 takes up the idea from Section 2.1 that the prototypical usages of observational, survey, and experimental methods can be positioned on a continuum from qualitative to quantitative research. Qualitative data collection and analysis come with observational and interview methods, whereas quantitative data collection and analysis go hand in hand with questionnaires and experiments. Figure 2.5 adds to this that qualitative datasets include many variables to study, while quantitative datasets generally focus on fewer variables. Hence, the richness of parameters declines from qualitative to quantitative research. However, some research questions necessitate the systematic combination of both qualitative and quantitative approaches, because they can only be properly addressed when investigated from different perspectives. In such cases, mixed-methods designs are in order.
Figure 2.5 Single-method and mixed-methods approaches (top panel, single-method designs: qualitative data and analysis, quantitative data and analysis, observation, experiment; bottom panel, mixed-methods designs: qualitative data and analysis, quantitative data and analysis, exploratory design, explanatory design, triangulation design)
Before explaining the different types of mixed-methods design (Section 2.5.2), we want to note that mixed-methods designs are currently more common in applied linguistics (language teaching, second language acquisition) than in the subdisciplines of empirical linguistics that we will introduce in Part II of this book. Nevertheless, these subdisciplines have a long history of systematically combining several methods for data collection. For instance, observation and interviews or elicitation are used conjointly in linguistic fieldwork in documentary/descriptive linguistics or anthropological linguistics, while experiments, corpus data, and surveys are systematically combined in cognitive linguistics and psycholinguistics (cf. an example of multi-method combinations to examine frequency effects in Section 7.4). These multi-method combinations pave the way for an increase in applying mixed-methods approaches in these fields. We will discuss the additional complexity in data interpretation that arises with the use of multi-method or mixed-methods designs in Section 9.2.
2.5.2
Types of Mixed-Methods Approach
According to Ivankova and Creswell (2009: 138–139), types of mixed-methods design can be described according to the following classification criteria:
• timing, i.e., whether quantitative and qualitative data collections are carried out sequentially (one after the other) or concurrently,
• weighting, i.e., whether one data type is prioritised or both types are equally important,
• mixing, i.e., whether the two data types are integrated during data collection, analysis, or interpretation.
From these criteria, Creswell and colleagues (Ivankova & Creswell 2009; Creswell & Plano Clark 2017) derive the following major types, with the first three also exemplified in the bottom panel of Figure 2.5:
• explanatory design: Quantitative data is collected and analysed before qualitative data collection and analysis, which, in the final phase of data interpretation, is used to explain or clarify patterns in the quantitative data. Quantitative data is typically prioritised. Mixing occurs when conducting the qualitative research part (choice of method for data collection and sampling) and when interpreting both data sets conjointly. For example, a variationist sociolinguist is interested in when speakers use sub-standard varietal expressions of their native language, and first quantifies the use of varietal expressions in dialogues by means of a survey. The researcher then conducts interviews to elucidate further contextual restrictions on the use of varietal forms for those participants who show a deviant usage pattern. This approach is useful, for instance, when you work with a small sample and need to back up the quantitative results with qualitative results.
• exploratory design: Qualitative data is collected and analysed before quantitative data collection and analysis. In a first exploratory stage, qualitative data is used to develop variables or coding schemes that are then tested for their applicability in the quantitative stage of data collection and analysis. Qualitative data is typically prioritised. Mixing occurs when developing the variables/coding schemes for the quantitative research and when assessing the variables/coding schemes derived from the first phase by means of the quantitative results. For example, a researcher is interested in the use of kinship terms in an as yet undescribed language and therefore interviews a native speaker of that language by using an ethnographic interview. In order to generalise their findings from that one speaker to the population, the researcher administers an (oral) questionnaire or conducts an experiment with other natives on how they categorise kinship terms.
• triangulation (or convergent) design: Qualitative and quantitative data are collected concurrently from the same sample and on the same topic (see also Aguado 2014). Weighting is fully flexible. Mixing occurs during data analysis (including conversion of one data type into the other) or interpretation of the results. For example, a researcher is interested in code switching and provides bilinguals with a questionnaire including closed-ended and open-ended questions. There are short passages with blanks in which the bilinguals are supposed to fill in words in one of their languages. For each blank, the researcher asks their participants to reflect on (and write down) their motivation to choose that word. At the end, the researcher uses open-ended questions to collect data on the speakers’ attitude towards code switching in general.
• embedded design: One data type is nested within the other in order to answer a secondary research question. The main research question and method can be clearly described as either qualitative or quantitative, hence the main approach is prioritised. Datasets can be collected in sequence or in parallel. Mixing occurs during data analysis (for datasets collected simultaneously) or during interpretation of research findings (for sequential datasets). For example, a researcher is interested in code switching and therefore asks bilinguals to keep a diary over the course of a week in which they note all instances of code switching. To account for possible confounding differences in language proficiency, the researcher collects scores on proficiency tests for language production and comprehension at the beginning of their study.
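As a compact summary, the four designs can be written down along the classification criteria of timing, weighting, and mixing. The following Python sketch is merely one possible way of encoding the descriptions given here; the class name, field names, and the helper function are hypothetical, and the field values paraphrase the text rather than quoting the cited typology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MixedMethodsDesign:
    timing: str    # "sequential", "concurrent", or "either"
    priority: str  # which data type is typically weighted more heavily
    mixing: str    # stage(s) at which the two data types are integrated


# Hypothetical encoding of the four design types described in this section.
DESIGNS = {
    "explanatory": MixedMethodsDesign(
        "sequential", "quantitative",
        "planning of the qualitative part; joint interpretation"),
    "exploratory": MixedMethodsDesign(
        "sequential", "qualitative",
        "development and assessment of variables/coding schemes"),
    "triangulation": MixedMethodsDesign(
        "concurrent", "flexible",
        "data analysis or interpretation"),
    "embedded": MixedMethodsDesign(
        "either", "main approach",
        "data analysis (parallel) or interpretation (sequential)"),
}


def designs_allowing(timing):
    """List the designs whose timing is compatible with the given value."""
    return sorted(name for name, design in DESIGNS.items()
                  if design.timing in (timing, "either"))


print(designs_allowing("concurrent"))
print(designs_allowing("sequential"))
```

Looking up the designs by timing in this way shows at a glance that only the triangulation and embedded designs are compatible with collecting both data types at the same time.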
2.5.3
Advantages and Disadvantages of Mixed-Methods Approaches
Mixed-methods approaches have the following strengths:
• They allow the investigation of a research topic with complementary methods, enabling the compensation of a single method’s shortcomings regarding validity or reliability.
• They allow the researcher to investigate a complex research problem from more than one perspective.
Weaknesses of mixed-methods approaches include:
• They are more complex than single-method designs in terms of data collection and analysis, thus requiring more resources (researchers with particular skills, funding, etc.).
Due to the complexity of mixed-methods approaches, we recommend that beginners first gain experience with single-method studies.
2.6
Summary
The three basic methods for data collection, namely observation, survey, and experiment, are used to different extents in exploratory vs. problem-oriented research, studies with a naturalistic vs. reductionist approach, and studies with a qualitative vs. quantitative design. While observation and (particularly unstructured) interviews are the central methods of explorative research, experiments and questionnaires (particularly with closed-ended question formats) are used in more problem-oriented research focusing on a reduced number of aspects. In linguistics, observation can produce a wealth of data about people’s linguistic behaviour in diverse natural settings. Surveys, by contrast, are useful to gain data focusing on specific aspects regarding people’s linguistic knowledge, their attitudes and mental states, and their demographic and sociocultural characteristics. In experiments, confounding variables are controlled in order to discover causal relationships between single variables. Such a strongly reductionist method, however, comes at the expense of naturalness. Thus, the different methods of data collection serve to answer distinct kinds of research questions. Mixed-methods research is the systematic combination of multiple qualitative and quantitative methods to investigate a single research question or topic comprehensively. The limitations of single methods are to be overcome by other methods that supplement the findings. The following chapters (3–8) on research in distinct linguistic subdisciplines will show you which particular methods and combinations of methods have been developed to get answers on different aspects of language. In Chapter 9, we will finally illustrate how the subdisciplines with their specific research questions, methods, and findings build on one another, and thus contribute a multitude of aspects to a comprehensive understanding of language in its complexity.
2.7
Exercises and Assignments
Exercises for students that can be included during a session on basic research methods for data collection or as part of project work (cross-references in the text indicate the thematic context for each exercise):
2.1 Read Miner (1956) and discuss the observational description of ‘Nacirema’. Who is the object of observation and what does the author want to illustrate with his description?
2.2 Conduct an observation of about 30 minutes at the market place or at any other public space in your town and write a protocol. What was your observational focus? Which coding scheme did you use and what problems did you face?
2.3 Conduct an observation sitting in a public place with a newspaper which has a big hole in its centre. Observe the people through this hole and collect information on how you feel in this situation, how people react to you, and what impact these two parameters have on the observation.
2.4 Prepare a survey on bi-/multilingualism. Choose whether your survey will be administered as a semi-structured interview with open-ended questions or as a questionnaire with closed-ended questions. Find 10–15 questions and recruit about 5 bi-/multilinguals as respondents. Discuss your experience of designing and conducting the survey in class.
2.5 Find a research paper with an experimental approach and summarise the main components of the experimental design, i.e., hypothesis, method for data collection, experimental task, possible confounds, etc. Discuss to what extent the study meets the quality criteria of reliability, internal and external validity, and how this influences the experimental results.
2.8
Further Reading
There are many excellent introductions for each of the single methods and for mixed-methods approaches, also in neighbouring disciplines such as sociology and ethnography (observation and survey) or psychology and cognitive science (survey and experiment). A comprehensive encyclopedia of research design is Salkind 2010. Bakeman & Quera 2011 provide a comprehensive introduction to quantitative analysis of data from observational studies and on how to design a coding scheme. Qualitative approaches with observational and survey methods are described in Beer 2008 and Schlehe 2008. Porst 2014⁴, Moosbrugger & Kelava 2012, Daase, Hinrichs & Settinieri 2014, Visser, Krosnick & Lavrakas 2000 and Rea & Parker 2014⁴ are easy introductions to survey research and provide valuable information on developing question formats. Biemer & Lyberg 2003 provide a comprehensive guide to survey quality. Kirk 2003, Myers, Well & Lorch 2010, and Shadish, Cook & Campbell 2002 provide good introductions to the different types of experimental design, but target somewhat advanced students. Gass 2010, Abbuhl, Gass & Mackey 2013, and Hussy, Schreier & Echterhoff 2013 are comprehensive introductions to experimental design for beginners. The article by Keating & Jegerski 2015 includes further details about stimuli preparation for linguistic experiments. The volumes edited by Settinieri et al. 2014, Paltridge & Phakiti 2010, and Podesva & Sharma 2013 provide a range of chapters about each of the methods presented here as well as further, more detailed information about data analysis. Creswell & Plano Clark 2017 provide a detailed recent introduction to mixed-methods approaches. Further typologies of and introductions to mixed-methods can be found in Agouri 2010, Aguado 2014, Bryman 2006, Doyle, Brady & Byrne 2009, Greene, Caracelli & Graham 1989, and Tashakkori & Teddlie 1998. Denzin 1970 remains a classic in mixed-methods research. Hashemi & Babaii 2013 provide a summary of mixed-methods studies in second language research.
PART II
Specific Research Approaches of Linguistic Subdisciplines
3
Language Documentation and Descriptive Linguistics
Language documentation – which is also known as documentary linguistics – and descriptive linguistics are two closely interrelated linguistic subfields concentrating on the collection and/or analysis of primary data for the purposes of documenting and describing languages. As these linguistic subfields generally address languages on which no or very few previous studies exist, research is based on linguistic fieldwork within the native-speaking community. This chapter provides information on the similarities, differences, and intersections between documentary and descriptive research, specifically concerning fundamental research aims and questions (Section 3.1), the distinct approaches (Section 3.2), and specific methodological issues, such as considerations on research objects, fieldwork, as well as techniques and procedures of data collection, editing, and analysis, such as transcription, annotation, and elicitation (Section 3.3). Finally, we will outline the basic outcomes of documentary and descriptive research in linguistics, namely text corpora, grammars, and dictionaries (Section 3.4). All in all, this chapter offers readers a broad overview of multiple empirical aspects and fundamental documentary and descriptive considerations. At the end of the chapter, the main points are summarised in Section 3.6, followed by exercises on individual methodological aspects and ideas for your own research project (Section 3.7), and suggestions for further in-depth readings on selected issues in language documentation and descriptive linguistics (Section 3.8).
3.1
Research Aims and Questions
The fundamental research aim of language documentation is the collection of primary language data. Ideally, it compiles language records of a broad range of representative natural written and oral texts. In order to preserve this data and to ensure future accessibility for other interested parties, the recordings must be edited (transcribed, translated, and annotated) and published. The basic research aim of language documentation is to record
• how native speakers behave linguistically, i.e., how they use language to express various facts and topics in different naturally occurring situations and contexts.
Descriptive linguistics, on the other hand, has as its main aim the structural description of a language, ideally in all its aspects. This means that descriptive linguists have an analytical research focus, attempting to detect general structural patterns underlying the linguistic data. Accordingly, their fundamental research question is:
• How is a language structured, i.e., what are the underlying structural patterns of a certain language?
Ultimately, the overall aim of documentary and descriptive research is the advancement of knowledge on language and its diverse forms. Both subfields pursue this aim by providing language data and structural information on individual languages and language varieties which have not yet been studied (sufficiently). In order to ensure relevance of research outcomes for future (research) purposes, primary data collection as well as language description contain information on a broad range of linguistic areas. The transcription of audio- and video-recordings provides primary phonetic and phonological data and their annotation serves to secure morphological, syntactic, semantic, and pragmatic information. Pragmatic and discourse aspects are probably the most complex and challenging for language documentation as they require more metadata (e.g., contextual information, social characteristics of the speech act participants). Language descriptions typically focus on statements about phonetic, phonological, morphological, and syntactic properties. Specific semantic and pragmatic aspects (such as word classes, focus, honorifics, or body-part terminologies) may also be included. Furthermore, the collection of lexical data may be an additional aim. Table 3.1 contains specific exemplary research questions per linguistic domain and cross-disciplinary field. In addition to the primary research aim of documenting and describing as many languages as possible in order to expand upon fundamental knowledge on language diversity as an empirical basis for further linguistic studies (typological research or research on language change, etc.), there may be secondary goals such as:
• preserving linguistic and cultural diversity (reclamation or revitalisation of endangered languages by developing teaching materials/programs and/or writing systems for oral languages, etc.) – this objective is pursued especially by several international organisations promoting endangered language documentation such as UNESCO;
• acquiring linguistic and cultural knowledge in order to reach local people for political or religious purposes (e.g., missionary activities – some religious aid organisations combine missionary and educational work which includes language training);
• aiding local interests, i.e., the researcher works in service of indigenous issues, ideas, and projects (e.g., strengthening the status of their minority language).
If documentary or descriptive work is done by other parties, such secondary goals may actually be the main research aims. Linguists, however, focus on the
Language Documentation and Descriptive Linguistics
81
Table 3.1 Example research aims/questions in language documentation and descriptive linguistics

Language documentation
  Linguistic domains:
    Phonetics & phonology
      1. Recordings of different speakers to gather sound data and capture the range of variation within the speaker community (e.g., allophony)
    Morphology & syntax
      2. Segmentation of texts with regard to word and morpheme boundaries
      3. Recordings of different text types in order to capture a broad range of morpho-syntactic structures
    Lexicon & semantics
      4. Recordings of different topics in order to capture a broad range of lexical items per semantic field
    Pragmatics & discourse
      5. Recordings of dialogical texts
      6. Metadata of the various speech act participants
  Cross-disciplinary fields:
    Language acquisition
      7. Recordings of child speech (e.g., children of different age groups) and/or child-directed speech (i.e., linguistic interactions between parents or older siblings and young children)
    Language contact
      8. Recordings of language varieties in multilingual areas and (if not yet available) of the contact languages
    Language change
      9. Recordings of a language variety at different periods in time or of different speaker generations at one point in time

Descriptive linguistics
  Linguistic domains:
    Phonetics & phonology
      10. What is the sound inventory of the language and what sounds make a phonemic distinction?
      11. Is it a tone language – if so, which tones are distinguished?
      12. Does the language allow consonant clusters within syllables – if so, which combinations occur in which position?
    Morphology & syntax
      13. What kind(s) of word formation occur (e.g., kinds of affixation, reduplication, composition)?
      14. Does the language have an ergative or accusative case pattern?
      15. How is ‘number’ marked (e.g., on the verb and/or the noun phrase)?
    Lexicon & semantics
      16. How is ‘kinship’ terminology structured?
      17. Which lexical items are used to express ‘emotions’?
    Pragmatics & discourse
      18. Does the language have linguistic means to express a difference in status/rank – if so, which one(s)?
  Cross-disciplinary fields:
    Language acquisition
      19. What linguistic structures are particularly complex and when/how are they acquired by children?
    Language contact
      20. Are there features in the language indicating contact with other languages?
    Language change
      21. Does the language under study have spatial prepositions derived from ‘body-part’ terminology (a typical grammaticalisation scenario)?
Specific Research Approaches of Linguistic Subdisciplines
aforementioned primary aim of documentary and descriptive research. This does not exclude possible secondary goals per se, but if they interfere or conflict with the primary goal, they will (or should) be given low(er) priority status. Such issues are discussed further in Section 3.3.2 in the context of research ethics.
3.2 The Documentary and the Descriptive Approach
Based on Himmelmann (1998) and Lehmann (1999), who were the first to argue for establishing language documentation as a separate subdiscipline, the majority of researchers currently active in the field distinguish between a documentary and a descriptive approach. However, it is also clear that research in the two disciplines is closely interrelated, complementary, and even partially overlapping. Hence, it is difficult to draw a clear dividing line between language documentation and descriptive linguistics (e.g., Chelliah & de Reuse 2011). Consequently, we cover them both in one chapter. In the following, we outline the fundamental distinctions as well as the entire range of basic documentary and descriptive research activities so that, in the end, you can make your own empirical choices regarding research scope and methodological procedure.

Typically, language documentation is mainly focussed on the collection of primary data of endangered languages prior to their extinction. The aim is to preserve texts of native speakers while the language is still in use. This kind of research activity includes recording oral texts and collecting written documents as well as editing this data (transcription, translation, and annotation) in order to preserve primary data and to make it accessible to others (not only linguists but also the speech community itself and researchers of other disciplines) for future purposes. Thus, the general outcomes of documentary linguistics are edited representative text corpora of endangered languages. The main idea motivating the preservation of primary data on endangered languages is that language death means the loss of information on linguistic diversity (cf. language typology) and the loss of culture-specific knowledge (cf. anthropological linguistics). Due to the urgent need to document numerous languages while it is still possible, this work is performed not only by linguists but also by other specifically trained people.
In descriptive linguistics, the focus is on the analysis of language data for the purpose of describing the underlying structural patterns – with a particular interest in as yet undocumented or undescribed languages (in connection with language documentation) but more generally also any kind of language. The analytic work includes distributional tests on formal parameters and the analysis of semantic properties – based on systematic elicitation and/or on text corpora (self-collected or collected by others). The research outcomes are typically comprehensive grammars, which are intended for a linguistic audience (e.g., typologists), and possibly supplementary dictionaries. While the fundamental research aims, methods, and outcomes of language documentation and descriptive linguistics differ, they both have a similar research object: undocumented, undescribed, or even endangered languages
or language varieties. These are generally minor languages of non-Western societies which have not yet been in the focus of linguistic research (i.e., barely studied or unstudied languages). If no previously collected primary data is available, documentary as well as descriptive research is generally based on linguistic fieldwork within the native speaking community.

Furthermore, the two linguistic subdisciplines constitute ‘complementary activities that can cross-fertilise one another’ (Austin 2010: 12). First, previously collected primary language data provides the empirical foundation for every structural analysis. Second, most editing tasks (such as transcription, including segmentation) of primary language data involve some sort of analytical consideration (Himmelmann 1998: 162–163). In sum, the joint performance of documentary and descriptive research results in the most comprehensive representation of a language or language variety – including a broad text corpus, a grammar based on corpus and elicitation data, and a dictionary.

Table 3.2 provides an overview of documentary and descriptive research, more precisely the entire range of options with respect to single aspects (researcher, research outcome, etc.), including the most typical or common procedure.

Table 3.2 Documentary and descriptive research

WHO? (researcher)
  • primarily linguists but also non-linguists (trained documenters)
  • typically non-native speakers of the studied language but also native speakers (introspection and/or the work with relatives and friends as informants)
WHAT? (research outcome)
  • ideally a three-parter consisting of an edited text collection, a dictionary, and a comprehensive grammar but also publications on single aspects (cf. Section 3.4)
WHAT? (object of research)
  • as yet unstudied or barely studied languages or language varieties, i.e., primarily minor (endangered) languages but also varieties of pluricentric major languages
WHOM? (informants)
  • competent native speakers without or even with linguistic knowledge (cf. Section 3.3.2)
WHERE? (research location)
  • typically in the field (i.e., in the natural native environment of the native speakers) but also outside the native environment (e.g., the work with migrated native speakers)
  • the usual field sites are non-Western locations that are difficult to access, but Western sub-settings are possible, too (cf. Section 3.3.2)
HOW? (research techniques)
  • ideally the compilation and editing (i.e., transcription, translation, and annotation) of a representative text corpus, including (monolingual or bilingual) elicitation and corpus analysis (e.g., distributional tests, structural and semantic feature analysis) but also just one or more of these techniques on their own (cf. Section 3.3.3)
FOR WHOM? (data users)
  • for linguists and researchers of other disciplines but also for the language community itself (e.g., the development of language teaching materials) and international interest groups (e.g., the preservation of language- and culture-specific knowledge)
3.3 Methodology
Documentary and descriptive research consists of several fundamental methodological components, which will be presented in the following section: first, the selection of a language or language variety that serves as a research topic (Section 3.3.1); then, aspects of descriptive and documentary fieldwork (Section 3.3.2); and, finally, techniques of collecting, editing, and analysing primary language data (Section 3.3.3), i.e., methodological procedures for compiling an edited text corpus such as the recording of different text types, including the transcription, translation, and annotation of these texts (Section 3.3.3.1), and/or procedures for writing a grammar and a dictionary such as elicitation (Section 3.3.3.2). We present and discuss different options and considerations regarding these central aspects so that readers gain an understanding of the underlying methodological reasoning and a guideline for setting up their own empirical projects.

3.3.1 Objects of Research: Selecting a Language
Basically, every language or language variety can become the object of research in documentary and especially descriptive studies. The research focus, however, is clearly on barely studied or unstudied languages or language varieties, i.e., those which have not yet been documented or described. Language documentation even narrows the choice down to endangered languages. In general, different kinds of research objects can be distinguished – according to their specifics (minor languages or varieties of major languages, oral vs. written languages, endangered languages, etc.), their current state of research (e.g., undocumented and/or undescribed languages), and their area of occurrence. Research considerations with regard to several different kinds of languages are presented below:

• undescribed & undocumented languages or language varieties: It is assumed that about two-thirds of the estimated 5,000–7,000 languages of the world are still unstudied, i.e., there are no published records of any kind. Even though this means that there are numerous potential research objects yet to be covered, it is quite challenging to locate them. As they have not yet been in the focus of linguistic research, it is generally not an option to search for them in databases such as Ethnologue (Grimes 1997/Simons & Fennig 2017, 20th edn) or the World Atlas of Language Structures (Dryer & Haspelmath 2013) or in linguistic literature in general – and this is obviously even more the case for languages that are not even known to exist. Occasionally, one can find mention of them in linguistic studies of neighbouring languages, anthropological studies on the region, or documents of regional administrations or sociopolitical organisations. This means that their existence has already been confirmed and that they possibly occur in lists (e.g., listings of the languages of the world or calls from documentary research institutions for prospective projects). Generally, hot spot areas of language documentation and description are remote areas or areas that are difficult to access (e.g., Papua or the Amazon region) or areas with dominant major languages (Siberia, India, Latin America, Australia, etc.).
  Documented but undescribed languages or language varieties, i.e., those with edited language records but no or little descriptive work done on them, can be best traced in larger archives of language-documentation projects (e.g., for endangered languages, see below). However, it is not uncommon that linguists not only document a language but also publish analytical findings themselves.

• minor languages or language varieties vs. varieties of major languages: The typical research topic in documentary and descriptive linguistics is minor unstudied languages, i.e., languages with small language groups (sometimes even fewer than 1,000 speakers). Around 95% of the world’s presently existing languages are spoken by only 5% of its population. Most of these minor languages are located in Africa, Asia, the Pacific, and also Latin America. The world’s major languages, each with 100 million native speakers or more (such as English, Spanish, Arabic, Portuguese, Hindi, French, Russian, and Mandarin Chinese), are generally well documented and well described (cf. Chapter 5: large ready-made corpora). But other than the standard variety or standard varieties, there are numerous varieties that are not or are less extensively documented and described. This is reflected in more recent research on non-dominant (minor) varieties of pluricentric major languages (e.g., World Englishes). Such non-dominant varieties can typically be found in areas in which one of the major languages plays a central role, albeit in co-existence with another language or other languages (i.e., situations of language contact). While minor languages tend to be endangered, such new contact varieties are constantly evolving.

• oral vs. written languages: Many minor languages have oral traditions and either no writing system or one that was implemented only relatively recently. Thus, there are no or relatively few written documents of such languages. In contrast to documentary and descriptive studies on written languages, this means more work for researchers. The great majority of texts, if not all, are based on oral records, which require time-consuming transcription prior to subsequent editing tasks. Furthermore, working with oral languages means that there is often no standard variety. In this case, it is possible to document or describe one specific variety only, whereas capturing the language as a whole represents a challenge due to the dialectal variation.

• endangered languages: Endangered or moribund languages are languages that are at risk of becoming extinct (i.e., the last speakers are dying out or are shifting to other languages). Due to imminent language death, their documentation and description is urgent as long as primary data is still available. Therefore, these languages are the primary object of documentary research in particular. Several institutional programs (e.g., the ‘Endangered Languages Documentation Programme’ at SOAS/London, the ‘DoBeS’ project of the Volkswagen foundation, the ‘Documenting Endangered Languages’ program of the National Science Foundation, etc.) pursue the documentation of endangered languages in order to preserve linguistic diversity and culture-specific knowledge as encoded in each language (i.e., unique means of communication and instruments of conveying cultural identity). Language endangerment is a gradual phenomenon ranging from potential to serious endangerment. The risk of extinction can be captured by the following criteria of endangerment:
  – low number of speakers in absolute and relative numbers, especially younger ones, which is a sign of absent or declining language transmission to the children of the next generation
  – little prestige and benefit of the language (community attitudes, language policies, etc.): e.g., a low level of identification with the language, no media or school education in this language (poor opportunities & perspectives), decreasing numbers of contexts/domains of use in daily life (restricted communicative functions)
  – no writing system (no written documents) and a high degree of variability (no standard variety)
  Many languages have already gone extinct (particularly in Australia and the Americas) and numerous others are expected to follow in the near future. It is estimated that by the end of this century about half of the world’s current languages may disappear as a result of neocolonialism, globalisation, Westernisation, urbanisation, etc. Larger numbers of endangered languages are located in the area of Central and Eastern Siberia, Northern Australia, Central America, and the Northwest Pacific Plateau. They are listed in Ethnologue (Grimes 1997/Simons & Fennig 2017, 20th edn) and institutional databases for endangered language documentation, e.g., the ‘Endangered Languages Project’ administered by the University of Hawai‘i or the UNESCO ‘Atlas of the World’s Languages in Danger’ (Moseley 2010).
Once the research object has been chosen, it is beneficial to search for all kinds of published or unpublished data on the language and its speakers as well as information on neighbouring languages and the respective language family prior to the actual data collection period in the field.
3.3.2 Documentary and Descriptive Fieldwork
Research on undocumented and undescribed or even endangered languages is generally based on the collection of primary language data. The resulting necessity to work with native speakers means that researchers either find native speakers in their own local environment (e.g., the ‘Endangered Language Alliance’ works with native speakers living in the city of New York) or have to move to the native speakers’ community. The disadvantage of the first option is that native speakers are generally outside their natural and cultural surroundings and, hence, are in permanent contact with another language, which usually affects their language behaviour. Therefore, if you do not wish to study a variety of migrants, linguistic fieldwork within the native speakers’ community is strongly advised. Once a research destination is determined, several aspects and considerations are crucial for successful linguistic fieldwork. These include:

• practical & methodological preparations prior to field stay: The majority of documentary and descriptive field sites are located in comparatively remote or difficult-to-access areas within non-Western contexts. This means that time in the field has to be carefully prepared for, not only from a thematic and methodological point of view but also with regard to practical aspects (see Table 3.3). Alongside consulting published information on the field site (e.g., advice issued by the Foreign Office, recommendations in travel guides, and scholarly literature such as ethnographies), it may be helpful to contact other documentary and descriptive linguists and/or researchers of other disciplines with the same regional focus to learn from their practical as well as methodological experience in the field. On the way to the field site or during the initial phase on-site, it is recommended to visit local archives, libraries, and museums (generally situated in larger cities), to get in touch with local organisations (e.g., churches, schools, and development aid associations), and to meet colleagues from the country in question. These are important sources of additional information.

Table 3.3 Checklist for fieldwork preparation

Practical preparations
  - How can the field site be reached and what is a feasible and culturally appropriate way to gain access to the local community (e.g., conveyance tickets, passport and visa, and first contacts on-site)?
  - Where is it possible to stay (e.g., in tourist accommodations or host families)?
  - What does the political and medical situation look like (e.g., travel and security advice, travel health insurance, travel pharmacy, and necessary vaccinations)?
  - Which important items have to be brought for the trip (for daily life and for research), what is available on-site, and which items cannot or should not be used (e.g., no or limited access to power for the use of electronic devices)?
  - Are research permissions required (e.g., from local authorities or tribal leaders on-site, and/or from research organisations in the home country)?
  - What are suitable presents in return for hospitality and research support?
Methodological considerations
  - Which methodological procedures and techniques are suitable for achieving the research goal (cf. Section 3.3.3)?
  - Which technical equipment is needed (e.g., recording devices), am I familiar with its use, and does it work?
  - How much time in the field will be necessary?
  - What is already known about the chosen language, its language family, and the neighbouring languages, including a lingua franca or a second language of the speakers (e.g., published data, and unpublished information in local archives, cf. Section 3.3.1)?
  - What does it mean to conduct fieldwork in a culturally unfamiliar context (e.g., acculturation processes and psychological challenges such as culture shock, and the local behaviour vis-à-vis foreigners, cf. Section 6.3.1)?

• awareness of typical challenges in the field: On-site, fieldwork requires good organisation, which includes the permanent reconsideration and reworking of previous research steps and results as well as the development of further plans. This is of central importance as it is extremely difficult if not impossible to gather missing data after the conclusion of the field period. Nevertheless, and despite the importance of plans in the field, it is equally essential to remain flexible and open to unexpected and unplanned opportunities. Quite often, apparently reliable options (sources of information, specific methodologies, etc.) do not work out and suddenly other options arise. Therefore, time and patience are important requirements, and sometimes important insights come from pure coincidence.
  A key factor of documentary and descriptive fieldwork is to find informants with the required language skills (i.e., competent and proficient native speakers) who are willing and allowed to share their knowledge. Particularly in the case of languages with low prestige, appropriate speakers may not make themselves known (cf. Harrison & Anderson 2008: the Chulym in Siberia). In other cases, people may
pretend to have the required language skills, which leaves the researcher with the challenge of identifying incorrect and thus unusable data (cf. Harrison & Anderson 2008: the Kallawaya in Bolivia). Then again, people may not want to share their knowledge on certain genres which are related to secret or sacred practices, and so forth. Furthermore, working with endangered languages means that there are only a few speakers left. As they are generally older people, the researcher may have to deal with hearing impairment and unclear pronunciation.
  Finally, a culturally appropriate way of compensating the informants has to be found. Generally, returning services and offering locally popular items (including money) as a gift or a payment have proven feasible. Local practices or payments may serve as a reference. In order not to offend people or cause interpersonal discontent, it is particularly important to consider group-internal structures. For this reason, it is recommended to compensate not only (main) research participants but also the entire language community (e.g., by supporting village activities).
  For the collection and evaluation of good data, emic as well as etic considerations are necessary. Typological knowledge, for instance, is extremely helpful for descriptive research in particular (cf. Section 3.3.3.2). It provides an overview of the overall range of language structures and how they can be described cross-linguistically (cf. Chapter 4) – this allows for the classification of language-specific data from an etic perspective. Conversely, the acquisition of a culture/language-specific perspective provides important emic information such as the classification of indigenous text types/genres in documentary research (cf. Section 3.3.3.1), the self-demarcation towards other language varieties, and the impact of language-specific communicative practices (with regard to knowledge sharing or politeness strategies) on data quality. A lack of emic awareness easily results in insensitive behaviour on the part of the researcher, inappropriate methods of data collection, and misinterpretation of the data. In some societies, for instance, it may not be customary or even appropriate to correct false or imperfect statements (particularly those of respected persons, e.g., Western foreigners), and thus, elicitation by target language manipulation (cf. Section 3.3.3.2) easily fails and provides invalid data.
  Depending on the research scope, the duration of field research varies, but generally a period of 9 to 12 months is recommended. In less time, it is difficult or even unfeasible to gain access to the community, to learn about culture-specific ideas and practices, and to collect the requisite linguistic data.

• fieldwork ethics: Working within a native speaking community requires appropriate behaviour on the part of the researcher. Therefore, documentary and
descriptive linguists adhere to several principles of research ethics. The basic responsibility of the fieldworker with regard to the host community is, of course, to respect the local people and their emic way of life via adequate behaviour. This includes first and foremost that research is only possible in places where the researchers and their work are welcomed, accepted, or at least tolerated. Even if the research is supported, it may be the case that individual informants or even the entire language community wish for restrictions regarding the use or publication of the provided data – these wishes and rights of local sources should be respected, even to the detriment of the researcher’s scientific interests.
  Furthermore, language documentation in particular emphasises the return of knowledge to the indigenous language group. As a sign of respect and gratitude for the local knowledge that the people shared with the researcher during the field period, the researcher is supposed to bring back research outcomes (e.g., audio data, edited text collections, descriptions, and/or other language materials) to the indigenous language group.
  However, the kind and extent of involvement in or promotion of local activities is a very sensitive issue. Linguistic fieldwork is based on the interaction with the local community in order to study the local language, which generally includes its acquisition (at least to a certain extent) and the acquisition of culture-specific language-relevant knowledge. However, it is debatable to what extent the researchers may pursue secondary goals, such as the development of a writing system or language teaching materials/concepts, possibly even combined with missionary purposes (cf. Section 3.1). Although some organisations may explicitly promote one or more of these aims, missionary goals are generally seen as more controversial than activities of language preservation. Nevertheless, even the implementation of revitalisation measures depends on several considerations. In general, it is definitely more legitimate if an issue arose from the request of a native speaker than if it is based on an impulse from the researcher (with or without subsequent local interest). In the first case, the project may still entail unconsidered or even unintended consequences (e.g., the empowerment of previously minor groups or individuals); in the latter case, negative local attitudes towards the researcher’s project generally result in failure.
  Similarly, it is not uncontroversial to what extent local people may instrumentalise the researcher for their own purposes. If it is feasible within the research framework, it may be acceptable to react to some explicit local requests (such as the desire for word lists or dictionaries). But what about other requests which are in conflict with academic standards and morals (e.g., native ideas of structuring a dictionary or describing a linguistic phenomenon, and the local
policy on dialectal variation and language standards)? In numerous cases, there is no surefire way to deal with these ethical issues. It is, however, crucial to be aware of these topics in order to be able to consider and evaluate the options and consequences deliberately. Even purely documentary and/or descriptive research without any secondary goals will leave traces in the field – no matter how carefully the researcher proceeds. For instance, even the purest research interest in a minority or endangered language generally results in its increased appreciation by the language community (cf. Harrison & Anderson 2008: the Chulym in Siberia).

All in all, linguistic fieldwork varies along the following parameters:

– the language of interaction with informants: In monolingual fieldwork, the main language of interaction is the language that constitutes the research topic (i.e., the informants’ native language that the researcher does not know prior to fieldwork). If the researcher and the informant use a shared language (i.e., not necessarily the native language of either the researcher or the informant but a language they both know), it is bilingual fieldwork.
– the number & kind of informants: It is possible to work with individual main informant(s) or with larger groups of informants. Generally, all informants are native speakers with language competence (i.e., a high degree of language proficiency regarding the particular language, language variety or genre) as well as availability and suitability as a consultant (i.e., willingness and talent). Research should never be based on only one informant, least of all when that informant is the researcher (i.e., pure introspection). In order to capture the entire range of language-internal variation, informants with different sociocultural characteristics (sex, age, status, etc.) should be consulted. If informants have additional linguistic knowledge (i.e., structural language information), this allows for a survey of metadata in addition to the collection of primary data. This may be helpful (e.g., for a faster understanding of the language structure) as well as problematic (e.g., as misleading information or cognitively reflected, unnatural primary data).
– the number & affiliation of the researcher(s): Fieldwork may be conducted by individual linguists or by research teams (either projects from interdisciplinary collaborations or just groups of linguists). Working in teams requires good organisation of who contributes what. If a researcher is already familiar with the field site and/or the language group, the fieldwork period may be shorter due to faster access.
– the field site: Field research may be conducted in rural vs. urban areas (which entails different levels of infrastructure and access to organisations/institutions and thus special groups of informants), in familiar vs. less familiar contexts (which entails different levels of contact to the community prior to research), in peaceful vs. dangerous surroundings (with different levels of interpersonal conflicts and thus more or less delicate access to informants), etc.
– the research focus (documentary vs. descriptive fieldwork) and the specific techniques of data acquisition (see below).

3.3.3 Techniques and Procedures of Data Collection and Analysis
Documentary and descriptive research consists of several procedures and methodological techniques that are described in the following: how to compile an edited text corpus (Section 3.3.3.1), a grammar, and/or a supplementary dictionary (Section 3.3.3.2). While an edited text collection of natural primary data is achieved by recording and editing (i.e., transcribing, translating, and annotating) of texts, grammars and additional dictionaries are generally based on elicited data. Elicitation is data collection driven by analytical considerations. Another methodological approach to grammar and dictionary writing is the analysis of a previously compiled text corpus. Consequently, only compiling existing written documents without editing is not regarded as documentary research, just as descriptive studies should not only be based on introspection. Interacting with native speakers for data collection and analysis is a crucial requirement of documentary and descriptive research.
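The distinction drawn above, between merely collecting texts and producing an edited text collection, can be made concrete with a small bookkeeping sketch in Python. It assumes nothing beyond the three editing steps named in the text; the class name, field names, and the sample records are hypothetical and serve only as illustration, not as a standard archive format.

```python
from dataclasses import dataclass, field

# The three editing steps named in the text; the labels are illustrative.
EDITING_STEPS = ("transcription", "translation", "annotation")


@dataclass
class TextRecord:
    """One recorded text and the editing steps completed for it (hypothetical layout)."""
    text_id: str
    text_type: str                               # emic or working genre label
    completed: set = field(default_factory=set)  # editing steps already done

    def missing_steps(self):
        """Return the outstanding editing steps, in the order listed above."""
        return [step for step in EDITING_STEPS if step not in self.completed]

    def is_edited(self):
        """In this sketch, a text counts as edited only when all three steps are done."""
        return not self.missing_steps()


def editing_backlog(records):
    """Map the id of each unfinished text to its outstanding editing steps."""
    return {r.text_id: r.missing_steps() for r in records if not r.is_edited()}


# Invented sample records, for illustration only.
records = [
    TextRecord("rec001", "narrative", {"transcription", "translation", "annotation"}),
    TextRecord("rec002", "conversation", {"transcription"}),
    TextRecord("rec003", "song"),
]
backlog = editing_backlog(records)
```

A list of this kind makes it easy to see, while still in the field, which recordings would not yet qualify as part of an edited corpus.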
How to Compile an Edited Text Corpus In order to compile an edited text collection the following methodological steps are necessary:
3.3.3.1
a. audio- and/or video-recording of a broad range of different text types: A text is a naturally occurring oral or written language sequence and different text types (or genres) are defined by their channel, communicative purpose, and other situational characteristics such as formal vs. informal register, spoken vs. written modality, medium (radio, internet, print media, etc.), spontaneous/real-time vs. planned/edited text, speaker- and addressee-specific texts (e.g., language of children vs. adults vs. speakers with language disorders, child- vs. adult- vs. authority-directed speech), private vs. public setting, topic, etc. In order to create a representative record of the linguistic practices of a speech community, the corpus should include as many diverse text varieties as possible. Due to language-specific peculiarities, it is probably impossible to establish a cross-linguistically useful typology of text types (narratives, conversations, songs, etc.). Thus, a suitable classification of text types which are representative of a particular language is based on emic categories (e.g., Trobriand Islanders’ ways of speaking, Senft 2010). For direct comparison, it is even an option to include the same texts in different genres (e.g., narratives and their edited versions, Mosel 2014).
b. transcription of the audio-/video-material (cf. Section 1.2.7): Generally, language documenters provide a phonetic or a phonemic transcription based on IPA. If the language has a conventionalised writing system, an orthographical/grapheme-based transcription is an alternative or additional option. While a phonetic or phonemic transcription contains valuable information on articulatory characteristics or distinctive sound/tone features, an orthographic transcription offers better usability of the data for the speech community itself. Especially for a phonetic/phonemic transcription, some prior training is strongly recommended. Furthermore, each kind of transcription may include or exclude the encoding of prosodic and/or paralinguistic features (such as pauses or laughing).
c. translation of the text into a widely known language (cf. Section 1.2.7): In order to bring the text corpus to a broad audience, it needs to be translated into a language that is widely known by the interested parties, i.e., a language of academic relevance (such as English) and/or an important language in the region/country of the speakers (such as a lingua franca or an official national language). The overall aim of the translation is to provide an idea of a text’s meaning as relevant for analysis. Therefore, a free translation that does not need to meet the standards of a professional translation is sufficient. Additionally or alternatively, a more literal translation (i.e., a translation which is not necessarily idiomatic in the target language but closer to the structure of the source language) may be provided if it serves a purpose (i.e., in otherwise misleading instances). Generally, it is common practice to translate on a sentence level, but it is also possible to choose smaller or even larger units (e.g., to add literal translations only for certain phrases).
d. annotation, generally, an interlinear morphemic glossing (cf. Section 1.2.7): In order to capture as much information as possible for all kinds of further data use, annotations are desirable on as many distinct linguistic levels as possible. The most basic kind of annotation is interlinear morphemic glossing. It can be supplemented by annotations regarding word classes, semantic role, syntactic function, discourse function, level of politeness and respect, etc. Annotation includes the segmentation of the text into respective linguistic units – in the case of an interlinear morphemic glossing primarily into morphemes but also words, intonation units, or sentences. With little or no structural knowledge on an as yet unfamiliar and undescribed language, this can be quite challenging. For helpful advice see, for instance, Himmelmann (2006).
e. addition of relevant metadata: Metadata are relevant details for the interpretation and analysis of the linguistic data. They include information on text type, the speaker(s) and other speech act participants (sex, age, status, level of education, etc.), the context in which the text was recorded, and so forth.
While step one is the actual data collection, all further steps pertain to editing the collected data. This additional work is crucial to making the raw data (i.e., the unedited texts) meaningful and useful for those who do not know the language. Annotation (and even phonetic transcription) can even be regarded as data analysis on a basic level. It is important to bear in mind that not only data recording is part of the field period. Collaborating with native speakers in particular is also necessary for translating and annotating texts. It is recommended to do this work with native speakers other than those recorded as this offers an opportunity for the verification of text quality. Ultimately, metadata can only be gained in the situation and context of recording. While some information is observable (e.g., the place of recording), other information has to be elicited (e.g., the social relationship or status of the speech act participants vis-à-vis each other). Software tools such as ELAN and/or TOOLBOX are helpful instruments commonly used in documentary linguistics. ELAN, for instance, allows the compilation of an edited corpus consisting of audio-/video-text files with multiple related information on separate tiers (three-tiered format: a transcription, an annotation, and a translation tier). First, the transcription is time-aligned to the sound file and then the translation is linked to the transcription tier (segmented per sentence). Finally, an annotation tier with interlinear morphemic glossing is linked to the transcription tier (this time, segmented per morpheme). Further annotation tiers containing information on phrasal or clause structure, etc., may be added (segmented accordingly). Figure 3.1 illustrates this ELAN format based on an example from Mosel (2014: 137). The entire corpus consists of distinct text files of this kind. 
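The three-tiered format described above can be pictured as a small data structure. The following Python sketch is only an illustration and not ELAN's own file format: the class name, the field names, and the convention of separating morphemes by hyphens are assumptions made for this example, and the English sentence is a placeholder rather than documentary data.

```python
from dataclasses import dataclass


@dataclass
class AnnotatedSentence:
    """One consecutively numbered sentence with three linked tiers (hypothetical layout)."""
    number: int
    start_s: float        # start of the time-aligned segment, in seconds
    end_s: float          # end of the segment, in seconds
    transcription: str    # words separated by spaces, morphemes by hyphens
    glosses: str          # same layout: one gloss per morpheme
    translation: str      # free translation, one per sentence

    def morpheme_gloss_pairs(self):
        """Pair every morpheme with its gloss; fail loudly if the tiers are misaligned."""
        words = self.transcription.split()
        gloss_words = self.glosses.split()
        if len(words) != len(gloss_words):
            raise ValueError(f"sentence {self.number}: word count differs between tiers")
        pairs = []
        for word, gloss_word in zip(words, gloss_words):
            morphemes = word.split("-")
            glosses = gloss_word.split("-")
            if len(morphemes) != len(glosses):
                raise ValueError(f"sentence {self.number}: '{word}' vs '{gloss_word}'")
            pairs.extend(zip(morphemes, glosses))
        return pairs


# Placeholder example in English, used only to show the alignment of the tiers.
example = AnnotatedSentence(
    number=1, start_s=0.0, end_s=1.8,
    transcription="the dog-s bark-ed",
    glosses="DEF dog-PL bark-PST",
    translation="The dogs barked.",
)
pairs = example.morpheme_gloss_pairs()
print(pairs)
```

The alignment check mirrors the segmentation logic in the text: the translation is linked per sentence, whereas the glossing is linked per morpheme, so a mismatch in morpheme counts signals an editing error.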
For referential purposes, each file should be labelled unambiguously – abbreviations indicating the text source and information on the text type have proven to be useful for further classification. In addition, each sentence within a file is numbered consecutively. Additional metadata concerning the entire corpus (i.e., information on its content and structure, the language, the collector, the methodological procedure, etc.) as well as text file-specific metadata (i.e., information on speaker, date and location, setting and circumstances of data collection, text type and content, social status of all speech act participants, etc.) need to be captured separately. ELAN allows for simple as well as multilayer searches (i.e., complex searches on more than one tier) with the query language ‘regular expression’. This is an important aspect for a subsequent corpus-based linguistic analysis. TOOLBOX,
in particular, facilitates the automatic generation of a dictionary and can be used in combination with ELAN. The software tool ELAN is freely available on the MPI homepage (https://tla.mpi.nl/tools/tla-tools/elan/) and TOOLBOX on the SIL homepage (https://software.sil.org/toolbox/).

Figure 3.1 Example of an ELAN file (labelled parts: sound file, referential code, transcription tiers, translation tier, morphological segmentation, and annotation tier with interlinear morpheme glossing)
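To give an impression of what a multilayer search with regular expressions does, the following Python sketch filters sentences that match a pattern on more than one tier at the same time. It does not use ELAN's own search interface; the record layout, the file label `narr01`, the function name `search_tiers`, and the English placeholder sentences are all assumptions made for illustration.

```python
import re

# Hypothetical edited sentences; each dict holds one sentence with its tiers.
sentences = [
    {"id": "narr01.001", "transcription": "the dog-s bark-ed",
     "gloss": "DEF dog-PL bark-PST", "translation": "The dogs barked."},
    {"id": "narr01.002", "transcription": "the cat sleep-s",
     "gloss": "DEF cat sleep-3SG", "translation": "The cat sleeps."},
    {"id": "narr01.003", "transcription": "the cat-s sleep",
     "gloss": "DEF cat-PL sleep", "translation": "The cats sleep."},
]


def search_tiers(records, **patterns):
    """Return the ids of records in which every named tier matches its pattern."""
    compiled = {tier: re.compile(pattern) for tier, pattern in patterns.items()}
    return [
        record["id"]
        for record in records
        if all(regex.search(record[tier]) for tier, regex in compiled.items())
    ]


# Simple search on one tier: every sentence containing a plural gloss.
plural_hits = search_tiers(sentences, gloss=r"-PL\b")

# Multilayer search: plural gloss AND a translation mentioning 'cat' or 'cats'.
combined_hits = search_tiers(sentences, gloss=r"-PL\b", translation=r"(?i)\bcats?\b")
print(plural_hits, combined_hits)
```

Because each sentence carries its referential code, the hits can be traced back to the exact text file and sentence number, which is what makes such searches useful for subsequent corpus-based analysis.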
3.3.3.2 How to Write a Grammar and a Supplementary Dictionary
Descriptive linguists traditionally use elicitation techniques in order to gather data for grammatical descriptions and supplementary dictionaries. The term elicitation is used in different ways throughout the linguistic literature, which makes a specific definition of our use necessary. By elicitation, we mean the collection of data from native speakers that is systematically controlled by the fieldworker. In this sense, it should be distinguished from the collection or recording of naturally occurring oral texts for edited text corpora. As elicitation aims at linguistic descriptions, it represents a combination of data collection and analysis: a systematic inquiry of language data guided by analytical considerations. There are several elicitation techniques and methodological procedures that can be differentiated according to two fundamental criteria:
a. the language of inquiry: If the researcher and the informant(s) can draw on a joint language (such as a lingua franca of the region), elicitation can be conducted bilingually (i.e., by use of translation). Otherwise, monolingual elicitation is the only option. In order to avoid translation problems, to reduce the impact of contact phenomena, and to achieve more immersion in the target language, one could even argue for monolingual elicitation as the method of choice in situations in which a joint language is available (e.g., Everett 2001). However, it is important to take into account that monolingual elicitation is generally more time-consuming because the researcher has to learn to speak the language – at least at a certain average level.
b. the researcher’s guideline: Chelliah & de Reuse (2011: 361, 367) distinguish between schedule-controlled elicitation (i.e., elicitation guided by a questionnaire) and analysis-controlled elicitation (i.e., elicitation guided by analysis). It is also possible or even advisable to have a schedule to fall back on if the basic procedure is analysis-controlled. Generally, analysis-controlled elicitation is more challenging, especially for inexperienced researchers.
In detail, the basic techniques of elicitation are (e.g., Bowern 2008: 77–84; Chelliah & de Reuse 2011: 361–381):
• translation of words, phrases, sentences, or other linguistic sequences into the target language (i.e., the language to be described)
• back-translation from the target language – this technique is often very difficult for the informants due to imperfect or insufficient language skills in the language to be translated into, and, thus, the data is problematic for analysis. However, back-translation subsequent to translation (with different informants) is a useful technique to check the previously collected data
• naming of items or description of situations, actions, etc. as shown or performed by the researcher, with or without stimulus tools (e.g., items, pictures, videos) – this technique is primarily used in monolingual research; as the researcher’s knowledge of the target language grows, target language interrogation can be added
• target language interrogation, i.e., questions such as ‘What do you do?’, ‘What is x?’, ‘Where is x?’, are asked in the target language
• target language manipulation, i.e., the informant is asked to react to target language data, for instance: acceptability judgements (e.g., of sentences with varying word order of the same items), correction, substitution of individual elements (e.g., words or constituents of a sentence), creation of words/phrases/sentences/texts with particular elements, or completion of sentences, etc.
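For acceptability judgements on sentences with varying word order of the same items, the candidate sentences can be generated systematically before a session. The Python sketch below is a minimal illustration under simple assumptions: the function name `word_order_variants` is hypothetical, the constituents are English placeholders, and in real fieldwork the researcher would still select and present the variants by hand.

```python
from itertools import permutations


def word_order_variants(constituents):
    """Return every ordering of the given constituents as one candidate sentence each."""
    return [" ".join(order) for order in permutations(constituents)]


# Three constituents yield 3! = 6 orderings to be judged by the informant.
variants = word_order_variants(["the dog", "chased", "the cat"])
for variant in variants:
    print(variant)
```

Since the number of orderings grows factorially with the number of constituents, such lists are best kept to short sentences so that sessions do not become tiring for the informants.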
Regardless of the technique(s) used, we recommend starting with the elicitation of single lexical items and then moving from simple sentences (i.e., basic construction types) to more complex sentences and phrase-internal structures. Another option is to incorporate research on single lexical items into morpho-syntactic elicitation. Either way, the researcher gathers lexical, semantic,
phonological, morphological, syntactic, and pragmatic information on the target language. While lexical and semantic information is primarily gained through the elicitation of single lexical items per semantic fields (i.e., wordlist elicitation using translational and/or picture-prompted techniques), morphological and syntactic data, including data on the lexicon-grammar interface (information on word classes, morpho-syntactic frames, etc.) are gathered by elicitation of simple and complex phrases and sentences. For pragmatic and discourse analyses (e.g., data on information structure), even longer linguistic sequences need to be elicited. Phonological sound analyses are primarily based on single lexical items, but for information on stress and tone, the elicitation of phrases and sentences is necessary as well. Depending on the linguistic area or the specific topic, different questions, tools, or linguistic prompts are used: e.g., differently coloured items for the elicitation of colour terminologies, questions such as ‘Where is X?’/’How does one get from Y to Z?’ to gather data on spatial descriptions, simple intransitive and transitive sentences with the same two arguments occurring as subject of an intransitive sentence (S) and as subject (A) and object (O) of a transitive sentence to get information on potential ergativity, items with different characteristics (objects of different shapes, items differing in animacy, etc.) for the elicitation of classifier or class-based systems, etc. In any case, it is extremely important to elicit minimal pairs, i.e., data that only differs in the specific aspect of interest at that time (e.g., number: 3Sg-m vs. 3Dl-m vs. 3Tr-m vs. 3Pau-m). Otherwise, linguistic entities and properties cannot be clearly identified (e.g., on the basis of comparing 3Sg-f vs. 1Pl-m, it is not clear if the distinction relates to person, number, or gender). 
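The minimal-pair requirement can be made explicit by comparing feature bundles. The following Python sketch is an illustration only: the dictionary representation of person, number, and gender and the function names are assumptions, not an established annotation standard.

```python
def differing_features(a, b):
    """List the features on which two feature bundles disagree."""
    return sorted(key for key in a.keys() | b.keys() if a.get(key) != b.get(key))


def is_minimal_pair(a, b):
    """Two forms are a minimal pair if they differ in exactly one feature."""
    return len(differing_features(a, b)) == 1


# Feature bundles corresponding to the abbreviations used in the text.
third_sg_m = {"person": 3, "number": "Sg", "gender": "m"}
third_dl_m = {"person": 3, "number": "Dl", "gender": "m"}
third_sg_f = {"person": 3, "number": "Sg", "gender": "f"}
first_pl_m = {"person": 1, "number": "Pl", "gender": "m"}

print(differing_features(third_sg_m, third_dl_m))  # only number differs
print(differing_features(third_sg_f, first_pl_m))  # person, number and gender all differ
```

In the second comparison three features change at once, so any formal difference between the two elicited forms cannot be attributed to a single category.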
Typological knowledge (i.e., familiarity with the diversity of linguistic structures in the world’s languages; cf. Chapter 4) is very helpful in the search for structural patterns, particularly in analysis-controlled elicitation without a questionnaire. Nevertheless, it is crucial not to press linguistic data into familiar categories but to stay open minded with regard to alternative findings. In general, it is important to constantly question one’s own (provisional) analysis. Based on new data, prior conclusions might turn out to be wrong or imprecise. Thus, it is necessary to ask for a number of comparatively similar (i.e., in some way redundant) data in order to gain confidence with regard to one’s analytical findings. As such elicitation sessions carry the risk of being tiring, exhausting, or even boring or annoying for the informants, the sessions should not be too long. Otherwise, the informant may provide imprecise or even incorrect data due to a lack of concentration or interest or they may not return for a further session. In order to spread the workload, to filter out speaker-specific particularities, and to check or double-check the data of individual informants, it is advisable to work with several native speakers. Apart from elicitation, language description can also be based on the analysis of an edited text collection (corpus analysis) – regardless of whether the researcher previously compiled and edited the text corpus him-/herself or
whether this work was done by others. The outcome is corpus-based grammars and dictionaries. The great advantage of this methodological procedure as opposed to elicitation-based grammars and dictionaries is that the data consists of naturally occurring language material. There are, however, disadvantages, first and foremost a more time-consuming procedure if the researcher has to start by building up the corpus themselves. Furthermore, rarely occurring linguistic phenomena and lexical items may not appear in text collections and can, thus, be better detected by systematic elicitation than by corpus analysis. In order to profit from both methodologies, it is possible to combine elicitation and corpus analysis by including elicited data as a separate text type in the text collection (as practiced, for instance, in some DoBeS projects). In this way, rare linguistic phenomena and lexical items are included in the analysis and it can also be established whether or how frequently linguistic phenomena occur in different types of natural speech. This very detailed analysis, however, is based on a very labour-intensive methodological approach in terms of data collection and its analysis.
3.4 Basic Research Findings
The most comprehensive outcome of documentary and descriptive research is a three-parter of an edited text corpus, a reference grammar (based on corpus analysis as well as elicitation data), and a supplementary dictionary. The achievement of this goal is generally a lifetime project for individual researchers or only achievable within years in larger teams, which becomes apparent if one considers the requirements for each of said outcomes. The following requirements are formulated for edited text collections in (endangered) language documentation (e.g., Himmelmann 1998; Woodbury 2011) – ideally, they are:
• extensive or even exhaustive – i.e., they need to comprise a large number of texts and a substantial total word count. To give you an idea, a documentary text collection of about 100,000–250,000 words represents a comprehensive corpus size. Such a number of words might seem relatively small compared to the size of the huge readymade corpora of the major languages, comprising millions of words (cf. Chapter 5). Considering the enormous editing effort, however, a single researcher can hardly provide any more. If the corpus consists of mainly oral genres, 30,000 words may even be more realistic for a three-year-long project conducted by a single researcher.
• representative – i.e., they need to include a broad range of diverse text types that are characteristic of the specific language (cf. Section 3.3.3.1). Furthermore, the texts of each genre need to be produced by as many speakers or writers as possible – with different characteristics (regarding sex, age, level of education, dialects, etc.). In this way, language-internal variation is captured. A more limited focus is basically possible, but then it has to be mentioned that the corpus is representative of this subgroup only (a language variety, children’s language, etc.). Most documentary language corpora only represent the language of a certain time and/or a certain region (i.e., the time and place of field research) – for diachronic data, multiple field research phases over a longer period are necessary.
• expandable – i.e., they need to provide the possibility of expansion over time (i.e., a monitor/dynamic corpus). This means that the same or even other documentary linguists should be able to add further texts (e.g., diachronic data, other varieties of a language, more text types and/or more texts per text type).
• opportunistic – i.e., they consist of texts that were collected for a specific task and/or in a specific situation, in this case the particular situation of documentary field research.
• portable – i.e., they need to be accessible and useable on demand by a broad audience in different places.
• comprehensible and transparent – i.e., they need to meet scientific and linguistic standards in a systematic manner (cf. the edited format as described in Section 3.3.3.1) so that others can understand and work with the primary data.
• preservable – i.e., they need to remain accessible in the future – especially in the case of endangered language documentation, they may be the only data available.
• ethical – i.e., they should only contain data that is authorised by the speakers/writers and the entire language group for particular purposes (private use vs. publication, etc.). The informants’ interests (e.g., the wish to redact texts or the prohibition to publish certain text types) may conflict with the principle of representativeness guiding the research.
• reliable – i.e., the primary data comes from reliable and competent informants/sources and the carefully edited information meets professional standards.
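Whether a growing collection moves towards the requirements of extensiveness and representativeness can be monitored from the text file-specific metadata. The Python sketch below is a rough illustration: the metadata fields, the file labels, and all word counts are invented placeholders, and the function name `corpus_profile` is hypothetical.

```python
from collections import Counter

# Invented placeholder metadata, one entry per text file.
texts = [
    {"file": "narr01", "text_type": "narrative", "speaker": "A", "words": 1200},
    {"file": "conv01", "text_type": "conversation", "speaker": "B", "words": 800},
    {"file": "conv02", "text_type": "conversation", "speaker": "C", "words": 500},
    {"file": "song01", "text_type": "song", "speaker": "A", "words": 300},
]


def corpus_profile(metadata):
    """Summarise total size, words per text type, and number of distinct speakers."""
    total_words = sum(entry["words"] for entry in metadata)
    words_per_type = Counter()
    for entry in metadata:
        words_per_type[entry["text_type"]] += entry["words"]
    speakers = {entry["speaker"] for entry in metadata}
    return total_words, dict(words_per_type), len(speakers)


total_words, words_per_type, speaker_count = corpus_profile(texts)
print(total_words, words_per_type, speaker_count)
```

Such a summary only counts what the metadata record; whether the text types are characteristic of the language remains a qualitative judgement made together with the speech community.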
While the requirements of extensiveness, representativeness, and expandability aim at the exhaustive documentation of the entire language, comprehensibility, reliability, and ethicality ensure the data and research quality. Finally, portability and preservability ensure the permanent availability of the data for future purposes. This is achieved by its publication and archiving. With contemporary technology, it is common practice to store the data in digitalised form primarily in internet databases (e.g., the DoBeS archive). However, another option is the publication of the recorded data together with its edited information on data storage media or separately on printed media along with audio and video carriers. Generally, the advantages of digital archives are better access for users, the option to expand the text collection or to supplement editing information, better searchability for specific items, and the adaptation of the storage system to technological changes. The challenges are,
however, dealing with issues of access management (e.g., access restrictions to certain data, allowance to add data) taking legal and ethical issues into consideration in order to guarantee that the original primary data remain unchanged, etc. Likewise, several requirements exist for reference grammars (e.g., Payne 1997; Mosel 2006; Noonan 2006; Payne & Weber 2007). Most importantly they need to be:
• extensive or comprehensive – i.e., they need to comprise general information about the language and its speakers as well as a maximally broad range of structural aspects regarding all linguistic areas and their interfaces with an equally balanced degree of detailedness (e.g., Comrie & Smith 1977; Lehmann 1980; Mosel 1987, 2006):
  – general data: location of the language and its genetic relatedness, number of speakers, specific characteristic of the language community, dialectal variation, data sources of the grammar, etc.
  – phonetics/phonology: vowel and consonant inventory, syllable structure, prosodic features (such as stress or tone), etc.; and orthography
  – morphology: inflection vs. derivation, kinds of affixation and reduplication, other kinds of word formation (e.g., composition), etc.
  – syntax: simple sentence types (word order, nominal vs. verbal sentences, intransitive vs. transitive sentences, etc.), subordination and coordination, phrase structures (including kinds of adposition, phrasal elements and their order), negation, questions, comparison & equation, possession, passive or antipassive constructions, etc.
  – possibly semantics: particular semantic fields (such as kinship terminologies, body-part terminologies, numerals, etc.), word classes, etc.
  – possibly pragmatics: deictic expressions, honorifics, information structure (topic and focus encoding), etc.
  – possibly texts of particular types
  As a general rule, structural descriptions should not only include the presence but also the absence of individual features, and the conditions under which they occur.
• transparent – i.e., they need to be clearly structured in a concise way. Based on the structure, different kinds of grammars can be distinguished:
  – Usually, the linguistic areas are dealt with in the above listed ascending order starting from smaller and simpler to bigger and more complex units that build upon each other. Alternatively, a reverse descending order can be chosen.
  – Most grammars have a primarily semasiological structure (i.e., a form-to-function approach: the basic entities are particular forms and it is described which functions are expressed by these forms). Another option is an onomasiological structure (i.e., a function-to-form approach: the basic entities are functional domains and it is described how they are formally expressed) or a combination of both (e.g., describing morphology semasiologically and syntax onomasiologically).
• maximally theory neutral – i.e., theory-specific terminologies and analyses should be avoided in order to be usable by a broad audience of various linguists, independent of theoretical trends in linguistics. A commonly proposed theoretical framework in descriptive linguistics is ‘basic linguistic theory’ (Dixon 2010, 2012). It uses only generally accepted fundamental terms and notational conventions.
• reliable & traceable – i.e., the described features should be amply illustrated by appropriate and authentic examples (with interlinear morpheme glossing and a translation into a common academic language). Depending on the kind of underlying data, distinct grammars arise:
  – descriptive grammars (i.e., the description is based on how language is actually used) vs. prescriptive grammars (i.e., the description is based on how language should be used)
  – grammars based on spoken vs. written data
  – corpus-based grammars vs. grammars based on elicited data (cf. Section 3.3.3.2)
The supplementary lexical data, including information on the lexicon-grammar interface (pronunciation, word class, gender of nouns, argument structure of verbs, idiomatic expressions, etc.), are generally collected in a dictionary. Depending on the structure and the language, different kinds of dictionaries can be distinguished:
– While they are usually organised in alphabetical order, another option is to structure the lexical items according to semantic fields.
– While monolingual dictionaries make use of one language only, bilingual (or even multilingual) ones include translations into another language. For linguistic purposes, most dictionaries are bilingual and unidirectional (i.e., the focus is on the described language with its translations into a widely known language, possibly with a reverse finder list) rather than symmetrically bidirectional.
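A reverse finder list can be derived mechanically from the entries of a unidirectional bilingual dictionary. The Python sketch below illustrates the idea with a few German headwords translated into English; the entry fields and the function name `reverse_finder_list` are assumptions for this example and do not reproduce the TOOLBOX database format.

```python
from collections import defaultdict

# A handful of German entries with English translations, used as placeholders.
entries = [
    {"headword": "Hund", "word_class": "noun", "gender": "m", "translations": ["dog"]},
    {"headword": "Katze", "word_class": "noun", "gender": "f", "translations": ["cat"]},
    {"headword": "Haus", "word_class": "noun", "gender": "n", "translations": ["house", "home"]},
    {"headword": "Heim", "word_class": "noun", "gender": "n", "translations": ["home"]},
]


def reverse_finder_list(dictionary_entries):
    """Map each translation to the headwords under which it appears, alphabetically."""
    index = defaultdict(list)
    for entry in dictionary_entries:
        for translation in entry["translations"]:
            index[translation].append(entry["headword"])
    return {translation: sorted(headwords) for translation, headwords in sorted(index.items())}


finder = reverse_finder_list(entries)
print(finder)
```

The finder list only points back to the headwords; the full lexical information (word class, gender, and so on) stays in the main entries, which is what keeps such a dictionary unidirectional rather than symmetrically bidirectional.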
Cross-referencing can be used to indicate interrelations within a particular grammar (features of distinct linguistic areas) or a dictionary (different lexical items) as well as between different outcomes (a grammar and its supplementary dictionary or a corpus-based dictionary/grammar and its underlying text collection). All said, the outcomes described thus far focus on language documentation or description which is maximally comprehensive for the widest linguistic audience possible. Depending on the research framework, the extent of outcomes as well as the kind of outcomes may differ. Less comprehensive outcomes are text
corpora/grammars/dictionaries with less informational detail and/or a narrower thematic research focus, such as:
• individual gathered texts (maybe even on particular text types only and/or without editing information) vs. representative edited text corpora of different sizes
• grammar sketches or short grammars (maybe even on individual linguistic aspects only) vs. reference grammars of different sizes
• simple word lists (maybe even on individual semantic fields only) vs. dictionaries of different sizes
While language documentation addresses different audiences including nonlinguists with its edited text corpora (cf. Section 3.2), reference grammars and their supplementary dictionaries are primarily written for a linguistic readership (e.g., as a data foundation for typological studies, cf. Chapter 4). Descriptive outcomes for other interested parties and purposes, however, also exist, e.g.:
• textbooks or learning guides (for language teachers and learners)
• monolingual dictionaries (primarily for the language community)
3.5 Summary
This chapter presented the similarities and differences between empirical approaches towards language documentation and descriptive linguistics. In order to gather information on as yet barely studied or unstudied languages or language varieties, documentary and descriptive research involves several fundamental methodological techniques and procedures. First, a suitable research object has to be selected. This can be quite challenging, as undescribed and undocumented or even endangered languages or language varieties with mostly minor language communities and often hardly any written documents (oral traditions) are difficult to locate. Second, the collection of primary data is based on a period of field research, i.e., working with native speakers within their natural and cultural environment. As the researcher is generally not familiar with the local way of life, it is crucial to acquire language- and culture-specific practices. Without careful preparation (other than methodological aspects, especially practical and psychological considerations) and adaptability to local circumstances, linguistic fieldwork is unlikely to be successful. Once competent and cooperative informants are found, the data are collected and/or analysed by use of different methodological techniques and procedures. Edited text collections or corpora (the typical outcome of language documentation) result from the recording of a broad range of different text types that are representative of the language and their editing, comprising a transcription, a translation into a widely known language, annotations (e.g., an interlinear morphemic glossing) and metadata. The edited information promotes the
understanding of the primary language data for further use (e.g., linguistic analyses). Reference grammars and supplementary dictionaries (the typical outcomes of descriptive linguistics) are based on analytical studies. This means that language data are systematically elicited in order to describe the underlying structural patterns of the language regarding all linguistic areas. Alternatively or additionally, grammars and dictionaries may be based on the analysis of corpus data. Language descriptions are a crucial basis for further linguistic research such as cross-linguistic studies in language typology.
3.6 Exercises and Assignments
Exercises for students which can be included during a session on language documentation and descriptive linguistics or as part of project work:
3.1 Develop your own specific typological research questions (similar to the ones in Section 3.1).
3.2 Watch the film The Linguists (Harrison & Anderson 2008) – particularly Chapters 3, 6, 9, 11, 12, and 14 – and discuss which challenges the documentation or description of endangered languages in the field involves and how you can deal with them.
3.3 Practice transcription and annotation: For this purpose, record a short oral text of about 2–3 minutes in length in a language you are familiar with (watch out: conversations of multiple speakers are more difficult to transcribe and annotate) and begin transcribing it (orthographic and/or phonetic transcription). Afterwards, you can add an interlinear morphemic glossing of the text. Alternatively, it is possible to practice annotation without a prior transcription by working on a written text.
3.4 Watch Everett’s presentation of monolingual elicitation (Everett 2013) and work out by which examples and items he tries to get information on which particular linguistic structures.
3.5 How does the elicitation of Everett (2013) differ from the elicitation of Harrison & Anderson (2008, Chapter 9)?
3.6 Conduct your own small elicitation: For this purpose, choose a language you are not familiar with and for which you can find at least one native speaker as an informant. Focus on a particular linguistic aspect (e.g., number encoding/marking) and elicit (monolingually or bilingually) data concerning this aspect. Chapter 6.2.1 in Bowern (2008: 74–76) or questionnaires may be helpful in this regard. Afterwards, you can compare your results of structural analysis with the published data in a reference grammar (for this purpose, it is crucial not to consult any source on the respective language prior to the completion of elicitation). In this way, you can check your own work.
3.7 Discuss considerations regarding a representative text collection of different text types occurring in a particular context or setting (e.g., at university or in your personal daily life).
More extensive exercises for student research projects or (mid-)term papers:
3.8 Write a grammar sketch of a language you are unfamiliar with or a language description on a particular linguistic aspect of a language you are unfamiliar with,
a. based on elicitation with at least one native speaker,
b. based on available edited text collections (e.g., in databases on endangered language documentation), and/or
c. based on unedited written parallel texts (e.g., Bible texts, declarations of the United Nations or texts of other international organisations in the chosen language that are comparable with the same texts in a familiar language). If the texts are not aligned yet, this needs to be done prior to analysis.
Afterwards, you can compare your results of structural analysis with the published data in a reference grammar (for this purpose, it is crucial not to consult any source on the respective language prior to the completion of elicitation). In this way, you can check your own work. Alternatively, you can check the published information prior to your own studies and focus on providing additional information on specific aspects which are not yet described in detail.
3.9 Develop your own documentary or descriptive research project.
3.7
Further Reading

Language Documentation
Comprehensive handbooks on language documentation are Gippert, Himmelmann & Mosel 2006 and Grenoble & Furbee 2010. Journals and book series with a focus on documentary linguistics are ‘Language Documentation and Description’ (SOAS, ed. by Austin and guest editors) and ‘Language Documentation & Conservation’ (University of Hawai‘i Press). Austin 2007 provides insights into documentary training. Endangered languages are addressed in particular in Lehmann 1999, Seifart 2000, Grenoble & Whaley 2006 (revitalisation), Austin & Simpson 2007, Brenzinger 2007, Austin & Sallabank 2011, Haig et al. 2011, Jones & Ogilvie 2013 (revitalisation), and Thomason 2015. Jones 2019 focuses on new technologies in endangered language documentation.

Regarding various methodological issues of language documentation, we have the following reading suggestions: For information about the collection of phonetic data, Maddieson 2001 and Bird & Gick 2006 are useful references. Editing practices are described well in Schultze-Berndt 2006 (annotation including transcription, translation, glossing), Lieb & Drude 2000, Himmelmann 2006 (segmentation) and Bergqvist 2007 (metadata). The challenging documentation of pragmatics is illustrated in Grenoble 2007. The aspect of archiving is covered in Gippert, Himmelmann & Mosel 2006, Austin & Sallabank 2011 and Haig et al. 2011. Ostler 2008 and Lüpke 2005 address corpora of minor, less-studied languages and their value for linguistics. The challenging topic of language contact in documentary linguistics is dealt with in Comrie & Golluscio 2015.

Descriptive Linguistics
Introductory textbooks and handbooks on descriptive linguistics include Gleason 1961, Mosel 1987, Bouquiaux & Thomas 1992, Dürr & Schlobinski 2006 (3rd edition), Ameka, Dench & Evans 2006, Payne & Weber 2007 and Chelliah & de Reuse 2011 – a more practical guide is provided by Robinson & Gadelii 2003. Different aspects of elicitation as a crucial method in descriptive studies are extensively described in Bowern 2008, Chelliah & de Reuse 2011, Majid 2012 and Everett 2013 (a video demonstration of monolingual elicitation) – Bowern 2008 and Chelliah & de Reuse 2011 even provide insights regarding the specific empirical procedures differentiated according to linguistic area (phonetics/phonology, morphology/syntax, lexicon/semantics, discourse/pragmatics). Questionnaires for elicitation of various linguistic areas and topics as well as those with a focus on individual language families or areas are, for instance, listed on the homepage of the Max Planck Institute for Evolutionary Anthropology (www.eva.mpg.de/lingua/tools-at-lingboard/questionnaires.php), in Chelliah & de Reuse 2011 (365–366), and in Lahaussois & Vuillermet 2019. Grammar series are published by Lincom Europa, Routledge, Cambridge, and others. Furthermore, grammars appear in book series or omnibus volumes of specific language families or areas, and as single monographs with well-known publishers in linguistic academia as well as with small local printing agencies of the respective language communities.

The differentiation between documentary and descriptive linguistics is specifically described in Himmelmann 1998, Lehmann 2001, Akinlabi & Connell 2008 and Himmelmann 2012. Various aspects of linguistic fieldwork are addressed in Newman & Ratliff 2001, Thieberger 2012, Foley 2002, ‘Sprachtypologie und Universalienforschung’ (volume 60 (1), including Aikhenvald 2007 and Dixon 2007), Chelliah & de Reuse 2011, Gippert, Himmelmann & Mosel 2006, Everett 2001 (monolingual fieldwork) and Kastenholz 2002. Practical guidelines in particular are provided by Crowley & Thieberger 2007, Bowern 2008, and Sakel & Everett 2012, and the film The Linguists (Harrison & Anderson 2008) offers vivid practical insights into documentary and descriptive fieldwork.
4
Language Typology
Language typology is a linguistic subdiscipline studying language from a cross-linguistic perspective. Thus, the research objective is to analyse the languages of the world comparatively in search of structural commonalities and differences. In this chapter, we will outline fundamental typological research questions and specific examples (Section 4.1), the typological approach (Section 4.2), and its specific methodology, including cross-linguistic comparison, sampling, and data sources (Section 4.3). Further, the chapter covers basic information on typological findings, such as kinds of universals, rara, and typologies (Section 4.4), and common explanations and interpretations of such results (Section 4.5). In sum, this chapter offers readers a broad overview of multiple empirical components and fundamental typological considerations, as well as information on how some of them are linked together. Specific types of research questions, for instance, require certain kinds of samples, just as they result in a particular sort of findings. Finally, the chapter provides a summary (Section 4.6), exercises to practice methodological steps and to develop your own research projects (Section 4.7), and suggestions for further reading regarding various aspects of typological research (Section 4.8).
4.1
Research Aims and Questions
In language typology, we aim to answer the following fundamental research questions:
• To what extent do the languages of the world exhibit universal patterns (i.e., shared characteristics)?
• How do they differ structurally and how wide is the spectrum of variation (i.e., linguistic diversity)?
• What is the distribution of different linguistic features in the languages of the world?

Based on these descriptive studies, linguists are, further, interested in finding explanations for these findings:
• Why do the structures occur which are actually to be found in the languages of the world and not others (which would also be logically feasible)?
However, these explanations are only partially provided by typological studies (cf. Section 4.5). Areal typology (i.e., cross-linguistic studies with a geographic focus), for instance, may show how certain structures diffuse in contact situations. Studies in other linguistic subfields also contribute explanations to this fundamental question. Psycho- and neurolinguistics, for instance, investigate the connection between linguistic universals and processing preferences or the neurobiological underpinnings of language (cf. Chapters 7 & 8), whereas anthropological-linguistic studies provide information on the relationship between linguistic particularities and culture-specific traits (cf. Chapter 6). The ultimate aim pursued by this kind of research is to find out what these cross-linguistic similarities and variations tell us about the evolution, function, and characteristics of human language.

The first typological studies (i.e., the early comparative work of Friedrich von Schlegel and Alexander von Humboldt) dealt with morphology: the classification of synthetic (agglutinating vs. inflecting) vs. analytic (isolating) language structures (Humboldt 1936). Subsequently (as the work of Joseph Greenberg, the founder of modern typology, shows), the research focus shifted increasingly to syntax, particularly the word order of subject, verb, and object (Greenberg 1963). Today, typologists conduct research in all domains of linguistics, including phonology, semantics, pragmatics, and discourse. Furthermore, even linguistic processes such as language acquisition and language change are studied cross-linguistically. This broad thematic scope is readily apparent from a glance at the introductory literature (e.g., Whaley 1997; Velupillai 2012; Moravcsik 2013) as well as typology handbooks (e.g., Haspelmath et al. 2001; Song 2011). Table 4.1 contains specific exemplary research questions from each area.
4.2
The Typological Approach
The methodological approach in language typology is inductive-analytic. In other words, it is an empirical ‘bottom-up’ approach (cf. Section 1.1.2). Basically, linguistic features of the distinct languages of the world are analysed comparatively. This systematic cross-linguistic study ascertains information about structural features which (almost) all languages have in common (universals) or which occur very rarely (rara); it reveals in which aspects the languages of the world vary (a typology of all existing types) and how these different types are geographically distributed (typological maps). The universals of language typology are, thus, established on empirical grounds – in contrast to the theoretically deduced concept of a universal grammar in generative approaches (e.g., Chomsky 1986; cf. Section 7.2).
Table 4.1 Example research questions in language typology

Linguistic domains (phonetics & phonology; morphology & syntax; lexicon & semantics; pragmatics & discourse; cross-domain):
1. Do languages with a comparatively limited consonant inventory tend to have a greater vowel inventory?
2. What sounds or sound combinations (e.g., consonant clusters in the syllable onset) can be found in all languages of the world and which are rare or do not occur?
3. Is tone a feature occurring equally in the languages of the world or is there an areal-specific dominance?
4. Do languages with contour tones always have level tones as well?
5. Are questions always indicated by a rising sentence-final intonation in languages marking questions by intonation?
6. Do prepositional languages predominantly have prefixes while postpositional languages generally utilise suffixes?
7. What types of compounding can be distinguished in the languages of the world? Which are the most frequent?
8. How is ‘number’ formally encoded in the languages of the world?
9. Is definiteness always encoded by articles (or are other formal means available)?
10. Do all non-ergative languages encoding features of the object (person, number, gender, case, etc.) in the verb also mark similar features of the subject in the verb (agreement)?
11. Which arguments can be modified by a relative clause (e.g., in accusative vs. ergative languages)? And how are they formally expressed in the relative clause?
12. How do the verbs of emotion differ cross-linguistically with regard to their valency?
13. Are there universal categories in classifier systems, e.g., (a) separate marker(s) for ‘food/drink’ in languages with indirect possessive markers/classifiers?
14. What are the universal units in the lexicon?
15. Are all Eskimo-type kinship terminologies also lineal terminologies (both emphasising the core family)?
16. What rare (non-decimal or non-quintal) numeral systems can be found in the languages of the world (e.g., a base-twelve system)?
17. Is politeness generally marked by more indirect speech acts?
18. By which linguistic means is a topic shift indicated in the languages of the world?
19. Do all languages follow the maxim of quantity (be as informative as required and not more) to the same degree?
20. Complexity: Do languages with a complex phonology (e.g., a large phoneme inventory) exhibit simplicity in morphology or syntax?
21. Animacy: What structural features are governed by animacy as an underlying principle?

Cross-disciplinary fields:
Language acquisition
22. Are there universal processes of language acquisition (e.g., is phonetics generally learnt before syntax and are regular formal patterns generally overgeneralised)? To what extent do we find language-specific variation?
Language contact
23. What typological features do pidgins (& creoles) have in common?
24. Which loanwords are widespread cross-linguistically? And how is their distribution?
25. Is there a tendency for universal features to spread in situations of language contact between statistically universal features and rara?
Language change
26. Is there evidence for cyclic language change from agglutinating languages to inflecting languages to isolating languages which change back into agglutinating languages?
27. In which way do artificial languages differ from natural languages? And how stable are they in terms of rare, uncommon characteristics?
The following definitions and descriptions of language typology expound these fundamental aspects of the linguistic subdiscipline:

The broadest and most unassuming linguistic definition of ‘typology’ refers to a classification of structural types across languages. In this [. . .] definition, a language is taken to belong to a single type, and a typology of languages is a definition of the types and an enumeration or classification of the languages into those types. [. . .] This definition introduces the basic connotation that ‘typology’ has to contemporary linguists: typology has to do with cross-linguistic comparison of some sort. (Croft 1990: 1; emphasis by the authors)

There is a basic unity that underlies the awesome diversity of the world’s languages. [. . .] These [common properties are] referred to as language universals [. . .] In its most general sense, typology is [the] classification of languages or components of languages based on shared formal characteristics. [. . .] typology has the goal of identifying cross-linguistic patterns and correlations between patterns. [. . . Therefore,] (a) Typology utilises cross-linguistic comparison, (b) typology classifies languages or aspects of languages, and (c) typology examines formal features of languages. (Whaley 1997: 4, 7; explanatory notes by the authors)
Overall, languages can be classified in all linguistic domains (cf. Table 4.1) in terms of structural and/or functional features.
4.3
Methodology
Typological research comprises several fundamental methodological components: first, linguistic features or parameters that can serve as a topic of comparative cross-linguistic analysis (Section 4.3.1); second, a suitable and representative sample of the world’s languages as the object of analysis (Section 4.3.2); and, finally, adequate data sources providing information on linguistic features or parameters in the sample languages (Section 4.3.3). In the following, we will present and discuss different options and considerations regarding these crucial aspects of typological research in order to provide readers with an understanding of the underlying methodological considerations of typological studies and a guideline for setting up their own empirical projects.

4.3.1
Cross-Linguistic Comparison of Linguistic Features
A fundamental requirement for research that aims at uncovering structural commonalities and differences in the languages of the world is the comparability of individual languages. If this precondition is not met, it is impossible to make cross-linguistic statements. Therefore, we need two crucial methodological components for typological research: First, a metalanguage, i.e., a description and categorisation tool that is not language-specific or emic, but rather one that applies to all languages equally. Semantic issues, for instance, are not comparable by use of language-specific terminology as the conceptual content generally differs between languages (e.g., kinship terminology; Völkel 2016). The International Phonetic Alphabet (IPA), for instance, is a well-known metalanguage in phonetics. It is an excellent tool for describing the sound systems of any language on comparable grounds. Second, typological studies require comparable structural features or groups of features according to which each language can be classified. The researcher must therefore identify the parameter(s) to be compared (i.e., the variable(s) of the respective research topic or question) and their values. Depending on the research question, parameters and values vary in quantity and quality. A study may deal with one parameter only (such as question 2 in Table 4.1: kinds of consonants) or it may investigate correlations between two or even more logically independent parameters (such as question 1 in Table 4.1: number of consonants & number of vowels). According to the number of parameters, the respective results are unrestricted (one parameter) or implicational (two parameters) universals (cf. Section 4.4). In order to classify the languages regarding this/these parameter(s), different values must be identified for each parameter.
Either the values are logically deducible types (e.g., kinds of adposition) or one might search for classifications concerning the research topic within the specialised literature (e.g., kinds of composition). The values of categorial (e.g., kinds of adposition) as well as continuous parameters (e.g., number of consonants) must be disjoint and comprehensive, so that each language can be clearly assigned to one type and one type only. For the parameter ‘kind of adposition’, for instance, there are the values ‘prepositional’ (e.g., Tongan) and ‘postpositional’ (e.g., Japanese). However, languages may also exhibit no kind of adposition or make use of both kinds. Such cases can either be considered as separate values or subsumed under one (unspecified rest) category. If these two cases are not considered at all, that means that the respective language types are a priori excluded from the study. Accordingly, the results only relate to purely prepositional and purely postpositional languages. Alternatively, languages with
both types of adposition can be classified according to their predominant main type of adposition. During the classificatory work on the individual languages, unconsidered types may appear (such as the rare type of inpositional languages) which must then be added to the types defined thus far (either as a separate type or in a joint category), or this type of languages must be excluded from the scope of research.

The number of values depends on the topic, the research focus, the level of detail, etc. In word order typology, there are generally only two values (RelN and NRel, OV and VO, possessor-possessum and possessum-possessor, etc.). Other typological studies operate with more values (such as those on the number of consonants), but in view of being able to make significant statements on meaningful groups of languages, it is generally not advisable to work with too many, overly detailed values (especially for the correlation of two parameters). Taken to the extreme, this would result in as many types as languages studied, making meaningful typological statements impossible.

Due to the large extent of formal (or particularly morphosyntactic) diversity in the languages of the world (cf. Evans & Levinson 2009; Bickel 2014b), it is in many cases impossible to define comparable grammatical categories (such as relative clauses or genitive constructions) purely on formal grounds. In these cases, the cross-linguistic comparison must ultimately be carried out on a semantic/pragmatic-cognitive level. For this purpose, Croft (1990: 12) specifies the following methodological steps:
1. determine the particular semantic(-pragmatic) structure or situation type that one is interested in studying;
2. examine the morphosyntactic construction(s) used to express that situation type; and
3. search for dependencies between the construction(s) used for that situation and other linguistic factors – other structural features, other external functions expressed by the construction in question, or both.
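The classification of languages by parameter values and the search for dependencies between two parameters (step 3 above) can be sketched in a few lines of Python. This is a minimal illustration only: the language labels and their codings are invented placeholders rather than data on real languages, the helper names check_values and cross_tabulate are hypothetical, and subsuming mixed or missing adposition types under a rest category 'other' is one of the options discussed in this section, not the only one.

```python
from collections import Counter

# Hypothetical codings for invented languages (placeholders, not real data).
CODINGS = {
    "lang_A": {"adposition": "prepositional", "affixation": "prefixing"},
    "lang_B": {"adposition": "prepositional", "affixation": "prefixing"},
    "lang_C": {"adposition": "postpositional", "affixation": "suffixing"},
    "lang_D": {"adposition": "postpositional", "affixation": "suffixing"},
    "lang_E": {"adposition": "prepositional", "affixation": "suffixing"},
    "lang_F": {"adposition": "other", "affixation": "suffixing"},
}

# Assumed value sets; "other" is an unspecified rest category.
ADPOSITION_VALUES = {"prepositional", "postpositional", "other"}
AFFIXATION_VALUES = {"prefixing", "suffixing"}


def check_values(codings, parameter, allowed):
    """Ensure every language carries one allowed value for the parameter."""
    for language, coding in codings.items():
        if coding.get(parameter) not in allowed:
            raise ValueError(f"{language}: no valid value for {parameter!r}")


def cross_tabulate(codings, param_a, param_b):
    """Count how often each combination of two parameter values occurs."""
    return Counter((c[param_a], c[param_b]) for c in codings.values())


check_values(CODINGS, "adposition", ADPOSITION_VALUES)
check_values(CODINGS, "affixation", AFFIXATION_VALUES)
table = cross_tabulate(CODINGS, "adposition", "affixation")
for (adposition, affixation), count in sorted(table.items()):
    print(f"{adposition:15} {affixation:10} {count}")
```

Because each language stores a single value per parameter, the values are disjoint by construction, and check_values enforces that the value set is exhaustive. The more values each parameter has, the more cells the table contains and the fewer languages fall into each cell, which is why overly detailed value sets make correlations hard to interpret.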
Genitive constructions, for instance, describe possessive relationships – i.e., the semantic situation of ownership between possessor and possessum (possessed entity). However, not all languages encode possessive relationships in the same morphosyntactic way. Thus, the semantic situation allows for cross-linguistic comparison and the identification of distinct structural construction types that are used to encode possessive relationships (e.g., morphosyntax: juxtaposition, compounding, adpositional case marking, case inflection; constituent order: possessor-possessum, possessum-possessor). Another topic that is challenging to compare is word classes. As many languages do not distinguish nouns, verbs, and adjectives on a lexical level (so-called lexically flexible languages – i.e., single lexemes can function as noun, verb, and adjective without morphosyntactic derivation), typological studies define word classes in terms of pragmatics (N: head of a referential phrase, V: head of a predicate phrase, Adj: modifier of a referential phrase). Semantic maps and multidimensional scaling are methods used in typology to analyse and visualise formal similarity (Haspelmath 2003; Croft & Poole 2008). Connecting lines or the proximity of dots indicate functions expressed
by the same (formal) category in any language. These methods can be used to detect the underlying semantic concepts of formal categorisation in the languages of the world and to define or examine structural types on empirical grounds.

In summary, typological studies analyse different kinds of parameters comparatively: formal features (e.g., kinds of adposition or number of consonants), formal features that are used to express one semantically/pragmatically defined situation type (e.g., possessive constructions or the encoding of focus), functional features that are expressed by the same formal linguistic means (e.g., functions of rising intonation), semantic features (e.g., kinship or colour terminologies), and even linguistic processes (such as grammaticalisation, language contact phenomena, or processes in language acquisition). Research on cross-linguistic typologies of linguistic processes is probably the most challenging and most complex with respect to the definition of grounds for comparison. Finally, it should be noted that for the sake of comparability, typologists cannot account for each language in every detail. Thus, the cross-linguistic classification of languages into types generally entails simplified language representations. For instance, if languages make use of different types to varying extents, they may be classified according to their most prominent type without further subspecification.

4.3.2
Sampling
The subject matter of typological research is the languages of the world. Therefore, the most significant results would be gained if a study took all languages (i.e., past, present, and future) into account. However, this is impossible for several reasons. First of all, we have no knowledge of the future, very little information on the past, and we do not even know exactly how many languages currently exist. Rough estimates range from 6,500 to 7,000. The imprecision of this estimate is due to several reasons:
a. Languages are in constant flux – i.e., while in some places languages become extinct, in others, new varieties emerge.
b. In several cases, it is difficult to determine whether two linguistic varieties should be considered dialects of a single language or already classified as distinct languages (dialect continua in particular are the subject of such debates). As several linguists (e.g., Pereltsvaig 2012; Bickel 2014a, 2014b) aptly point out, it is ultimately an arbitrary decision where to draw the line between language and dialect (based on the number of commonalities or the degree of mutual intelligibility).
c. In several areas around the world (South America, New Guinea, etc.), we still have too little knowledge of the languages being spoken there. This means that not all languages are known and/or documented.

Only about one-third of languages currently in use are described and the situation is even worse for languages used in the past. We only have records and descriptions of comparatively few extinct languages and former varieties when considering the presumed breadth of the history of human language. The consideration of proto-languages in typological samples is problematic, as we generally do not have transmitted records (such as texts or oral documents) of these earliest
varieties and the descriptions are pure reconstructions based on present languages of the genetic group combined with general knowledge on processes of language change.

These difficulties notwithstanding, it would be conceivable to take all the currently described languages into account – i.e., those with available, sufficient, and reliable information. However, the effort of working with such an enormous quantity of languages would be too great and time-consuming for a single researcher. Research groups setting up typological databases do aim to cast such a wide net and some allow for a steady expansion taking recent data into account. Such attempts notwithstanding, in a study based on all or most described languages, specific language families or regions would be underrepresented due to fewer descriptions currently being available compared with those of other linguistic groups and regions.

Faced with all these issues, typologists cannot consider all the languages of the world, nor do they work with all languages currently described. Instead, typological studies are based on a representative selection of languages, a language sample (cf. Section 1.2.3). In order to avoid bias, several more or less important sample criteria should be considered when creating a language sample (cf. Rijkhoff et al. 1993; Rijkhoff & Bakker 1998):

1. sample size: For representative statements, an appropriate number of languages is required. As a reference point, 30 languages are generally regarded as a minimum requirement, while 400–600 languages (e.g., Tomlin 1986; Dryer 1992) are considered extremely extensive samples. The possible or appropriate sample size may also depend on other sampling criteria under consideration. For instance, if one wishes to build a sample of genetically and areally independent languages, the sample size clearly decreases the larger the defined geographic areas and the higher the chosen level of genetic relatedness.

2. genetic relatedness: As languages of a language family have similar or common linguistic features due to historic relatedness, the overrepresentation of individual language families leads to bias – i.e., the linguistic features of these language families are overrepresented. In the past, this held mostly for the Indo-European languages (e.g., Greenberg 1963). In its extreme, such an overrepresentation can result in the postulation of universals purely on Indo-European grounds.
➔ language classification in terms of genetic relatedness: The genetic relatedness of languages is captured by stratified representations of languages belonging to genetic groups: subgroups belonging to larger language families (genera) which are, in turn, part of macrofamilies (phyla). While the classification of many languages is more or less clear (with minor variation with regard to subdivision), some classificatory aspects are discussed more
controversially (e.g., the classification of Japanese, the Austro-Thai relation, or the Indo-Pacific language family). This becomes apparent when considering differences between various classifications of the languages of the world, such as Voegelin and Voegelin (1977), Ruhlen (1987), Grimes (1997)/Simons and Fennig (2017, 20th edition) (Ethnologue), and Dryer and Haspelmath (2013) (WALS). Regardless of which classification a sample is ultimately based on, it should be used consistently throughout the whole typological study. Rijkhoff & Bakker (1998) demonstrate how different classifications lead to different samples although the sampling method remains the same. Table 4.2 compares the basic classifications of Ruhlen (1987) and Voegelin and Voegelin (1977); the major difference lies in the level of Amerind subclassification.
For sample building, the following questions are pertinent:
a. How many and what languages per language family should be included in the sample? Generally, each language family is considered according to its size (i.e., the number of languages, including the known extinct ones). Another aspect that may be important is the level of subcategorisation (i.e., the level of relatedness or the number of stratified levels). Languages belonging to different sub-groups on a higher level are genetically less related than languages of different sub-groups on a lower level (cf. Section 4.3.2.1 for considerations on selection).
b. How to deal with language isolates, unclassified languages, and pidgins and creoles? Language isolates (and unclassified languages) are generally treated as distinct language families with a single representative, and pidgins and creoles as one single language family.
c. What about constructed or artificial languages?
If such languages are spoken in daily life over a longer period by a community, they may be taken into consideration (e.g., Esperanto, Interlingua, or Volapük), as this means that their linguistic patterns are usable and that they serve as possible linguistic systems. Conversely, linguistically impracticable or hardly learnable patterns would probably change rapidly. Such languages rarely occur in typological samples. If they are included, considerations regarding their relatedness and their context of construction must be taken into account.

3. geographic proximity: Languages spoken in the same area can share linguistic features although they belong to different genetic language groups. Such structural similarity results from convergence in situations of language contact. The Balkans (Albanian, Romanian, Bulgarian, Macedonian, Modern Greek) and South Asia (the Indo-Aryan, the Dravidian, the Munda, and the Sino-Tibetan languages of India) are examples of so-called Sprachbünde or linguistic areas. Thus, the overrepresentation of linguistic areas in a sample can similarly be a source of bias. Therefore, several typological studies classify languages according to linguistic areas of languages sharing structural features. Dryer (1989, 1992), for instance, distinguishes six large areas: Africa, Eurasia, Southeast Asia and Oceania, Australia and New Guinea, North America, and South America.

4. environmental similarity (socio-cultural relatedness & similarity of natural surrounding): If people of different languages share the same cultural ideas (e.g., the same religion), these ideas may be reflected in joint linguistic features. Depending on the topic of a given study, this criterion is of more or less importance for the sample. The lexicon or the semantics of culturally related languages in particular can show similar features. A sample with an overrepresented cultural context is then biased. Just as a shared socio-cultural environment may result in linguistic similarities, the same holds for a similar natural surrounding (in colour terminology, in absolute parameters for spatial reference, in taxonomies of animals, plants, food, etc.).

5. typological similarity: This criterion is problematic as it involves circular considerations. If one wishes to study typological issues, how can typological facts be assumed a priori? It is only possible to consider typological facts resulting from previous research. If typological studies have shown that certain languages share many features, it is feasible that they also show similarities regarding the feature to be studied. Hence, typological knowledge of previous research provides the possibility to avoid bias by the overrepresentation of typologically similar languages.

6. data accessibility: This is probably the most important criterion in practice, although it should not be. Ideally, each language should have an equal chance of being represented in a sample. However, as discussed earlier, not all languages are documented equally (see ‘databases’). As generally more than one language of a genetic group and/or a geographic area and/or other shared traits are available, one may choose a language(s) with easily accessible data on the research topic. However, this entails the excessive and questionable occurrence of well-described languages in most samples (such as Samoan as a popular representative of the Polynesian languages). Furthermore, especially in small and insufficiently documented language families (e.g., certain isolates), this procedure (the search for alternative languages) is problematic or perhaps not even possible.

Table 4.2 Language families according to Ruhlen (1987) and Voegelin and Voegelin (1977)

Ruhlen (1987): Macrofamily [number of known languages: living, extinct]
1. Khoisan [31, 2]
2. Niger-Kordofanian [1064, 4]
3. Nilo-Saharan [138, 0]
4. Afro-Asiatic [241, 17]
5. Caucasian [38, 0]
6. Indo-Hittite [144, 36]
7. Uralic-Yukaghir [24, 3]
8. Altaic [63, 3]
9. Chukchi-Kamchatkan [5, 0]
10. Elamo-Dravidian [28, 1]
11. Sino-Tibetan [258, 10]
12. Austric (incl. Austronesian) [1175, 11]
13. Indo-Pacific [731, 17]
14. Australian [170, 92]
15. Eskimo-Aleut [9, 0]
16. Na-Dene [34, 7]
17. Amerind [583, 271]
Language isolates: Basque, Burushaski, Etruscan, Gilyak, Hurrian, Ket, Meroitic, Nahali, Sumerian
Unclassified languages:
– in New Guinea: Busa, Messep, Nagatman, Pauwi, Porome, Taurap, Warenbori, Yuri;
– in South America: Arara, Carabayo, Chiquitano, Guaviare, Kohoroxitari, Mutus, Yari, Yuwana
Pidgins & creoles [38]

Voegelin and Voegelin (1977): Macrofamily [number of languages]
1. Khoisan [48]
2. Niger-Kordofanian [1036]
3. Nilo-Saharan [114]
4. Afroasiatic [209]
5. Caucasian [33]
6. Indo-European [153]
7. Yukaghir [2]
8. Ural-Altaic [89]
9. Chukchee-Kamchatkan [5]
10. Yukaghir [2]
11. Dravidian [22]
12. Sino-Tibetan [289]
13. Austro-Asiatic [109]
14. Austronesian [778]
15. Indo-Pacific [754]
16. Australian [259]
17. Eskimo-Aleut [6]
18. Na-Dene [32]
19. Wakashan [6]
20. Chimakuan [2]
21. Salish [23]
22. Hokan [36]
23. Penutian [78]
24. Yuki [2]
25. Macro-Algonquian [30]
26. Macro-Siouan [26]
27. Waicurian [2]
28. Aztec-Tanoan [30]
29. Oto-Manguean [25]
30. Macro-Chibchan [60]
31. Ge-Pano-Carib [269]
32. Andean-Equatorial [250]
Language isolates: Ainu, Aricapu, Baenna, Basque, Beothuk, Burushaski, Callahuaya, Calusa, Gilyak, Juma, Keres, Kutenai, Nahali, Natu, Tarascan
Pidgins & creoles [34]
Language Typology
The importance of the different criteria also depends on the respective typological research question. The sample size, for instance, should be larger for the study of statistical universals (95% of only 30 languages is less meaningful than 95% of 100 languages), whereas the verification of absolute universals allows for smaller and genetically or geographically biased samples – i.e., one could even search for counterexamples in a specific area or genetic group of languages. For more information on the different types of universals, see Section 4.4.

Finally, it should be noted that not all typological research is based on the cross-linguistic comparison of the languages of the world. Depending on the research question, it is possible to focus on specific subgroups only, such as language areas (in areal typology), genetic groups, or particular kinds of languages (sign languages, pidgins & creoles, artificial languages, etc.). Of course, such research is based on a correspondingly specific sample basis, which must nevertheless be established with the same aim of minimising bias.

According to the basic research question, the following two kinds of samples are distinguished:

1. probability sample: This sample is used to study the distribution of probabilities for different linguistic types (single parameters or correlations of parameters). Such a sample is extremely vulnerable to bias. Therefore, as many distorting factors as possible must be taken into consideration: at minimum, the criteria of genetic relatedness and geographic proximity. In probability samples, genetic language groups are generally represented according to their size (i.e., the number of languages per genetic group). Greenberg (1963), Hawkins (1983), Tomlin (1986), and Dryer (1989, 1992) provide examples of probability samples.

2. variety (or diversity) sample: This sample is used to discover the spectrum of variation regarding a linguistic issue – i.e., the range of formal types that exist to express a cognitive-semantic aspect (e.g., possession, topic change) or the range of formal types regarding one linguistic aspect (e.g., tones, affixation) in the languages of the world. Thus, these studies provide possibilities, not probabilities. In order to capture the diversity of types or possible languages, variety samples are made up of languages belonging to as many distinct genetic groups (and geographic areas) as possible. Depending on the size of geographic areas, the level of genetic relatedness, and the number of further criteria, it may be difficult to find a representative number of independent languages (e.g., Perkins 1989: only 43 genealogically and areally independent languages). Rijkhoff et al. (1993, 1998) and Miestamo, Bakker and Arpe (2016) provide different kinds of variety samples.

Furthermore, samples can be distinguished according to sampling criteria.

3. random sample: To build this kind of sample, no sampling criteria are considered at all and the languages are chosen purely at random. This means that each language has the same probability of occurring in the sample regardless of its genetic, geographic, environmental, or typological characteristics. However, it is recommended to check the language selection for possible bias after the selection process. This kind of sample can be used for all kinds of typological research questions.

4. convenience sample: In building this kind of sample, practical criteria (such as familiarity with particular languages or data accessibility) play a major role. From a theoretical perspective, such criteria should not be considered important, but for practical reasons they are extremely relevant considerations. In fact, most samples are convenience samples, as we can only work with languages for which we have access to relevant and reliable data.
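The recommendation to check a random sample for bias after drawing it can be illustrated with a small sketch. The inventory below is purely hypothetical (three invented groups of very different size), and all function names are our own; the sketch only shows how one might tally a random draw per genetic group and spot groups that were missed.

```python
import random
from collections import Counter

# Hypothetical inventory: invented group labels and sizes, not real data.
GROUP_SIZES = {"Group A": 40, "Group B": 8, "Group C": 2}


def build_inventory(group_sizes):
    """Return one (language, group) pair per language in every group."""
    return [
        (f"{group} language {i}", group)
        for group, size in group_sizes.items()
        for i in range(1, size + 1)
    ]


def draw_random_sample(inventory, k, seed=None):
    """Draw k distinct languages; every language has the same chance."""
    rng = random.Random(seed)
    return rng.sample(inventory, k)


def group_counts(sample):
    """Tally how many sampled languages belong to each group."""
    return Counter(group for _, group in sample)


def missing_groups(inventory, sample):
    """List the groups of the inventory that the sample does not cover."""
    all_groups = {group for _, group in inventory}
    return sorted(all_groups - set(group_counts(sample)))


inventory = build_inventory(GROUP_SIZES)
sample = draw_random_sample(inventory, k=10, seed=1)
print(group_counts(sample))
print("Groups without a representative:", missing_groups(inventory, sample))
```

Because the draw ignores group membership, large groups tend to dominate and small groups may be absent; a check of this kind makes such imbalance visible before the sample is used.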
To form a sample, a researcher basically has two different options: either they can use an existing and proven sample that is suitable for their own research (e.g., a sample provided in Dryer and Haspelmath (2013) or in Whaley (1997), the probability sample of Dryer (1992) or the variety sample of Rijkhoff and Bakker (1998)) or they can build their own sample by taking the above-described sampling criteria and sample types into consideration. By way of illustration, we will now introduce the samples of Rijkhoff et al. and Dryer.
4.3.2.1
Variety Sample of Rijkhoff et al.
The variety sample of Rijkhoff et al. (1993, 1998), which is explicitly designed to identify linguistic diversity, is based solely on the criterion of genetic relatedness. ‘Diversity values’ are calculated for each macrofamily based on the number of subordinate levels and the number of nodes per level (using Ruhlen’s classification). The number of sample languages per macrofamily is then proportional to its diversity value. Table 4.3 shows how many languages per macrofamily should be represented for different sample sizes (Rijkhoff & Bakker 1998: 274). The sample includes at least one language of each language family, including language isolates (LI). If more than one representative per macrofamily is included, they should be selected from subfamilies which are as diverse as possible (i.e., from those diverging at the highest level of subordination). Language isolates for which no data is available cannot be considered. In this case, a further representative of a macrofamily must be included to maintain the sample size. Which macrofamily is then additionally represented must be calculated based on the diversity values, which are shown in Table 4.4 (Rijkhoff & Bakker 1998: 272). This is done as follows: for each macrofamily, the diversity value divided by its current number of representatives gives its current value, and the next representative is always assigned to the macrofamily with the highest current value.
Table 4.3 Rijkhoff’s sample per genetic group and per sample size
Macrofamilies       Sample size:
(Ruhlen 1987)         30  40  50  60  70  80  90 100 125 150 175 200 225 250
Afro-Asiatic           1   2   2   3   4   5   5   6   8   9  11  12  14  16
Altaic                 1   1   1   1   1   1   2   2   2   3   3   3   4   4
Amerind                2   5   7   9  12  14  16  18  24  29  35  40  45  51
Australian             1   2   3   4   4   5   6   7   9  11  13  15  17  19
Austric                2   4   5   7   9  11  12  14  19  23  27  31  35  39
Caucasian              1   1   1   1   1   1   1   1   1   2   2   2   2   3
Chukchi-Kamchatkan     1   1   1   1   1   1   1   1   1   1   1   1   1   1
Elamo-Dravidian        1   1   1   1   1   1   1   1   1   1   1   2   2   2
Eskimo-Aleut           1   1   1   1   1   1   1   1   1   1   1   1   1   1
Indo-Hittite           1   1   2   2   3   3   4   4   5   7   8   9  10  11
Indo-Pacific           2   3   5   7   8  10  11  13  17  20  24  28  32  35
Khoisan                1   1   1   1   1   1   1   1   1   1   1   2   2   2
Na-Dene                1   1   1   1   1   1   1   1   1   2   2   2   2   3
Niger-Kordofanian      1   3   4   5   6   7   8   9  12  15  18  20  23  26
Nilo-Saharan           1   1   2   3   3   4   4   5   6   7   8  10  11  12
Sino-Tibetan           1   1   2   2   3   3   4   4   5   6   7   9  10  11
Uralic-Yukaghir        1   1   1   1   1   1   1   1   1   1   1   1   1   1
Pidgins & Creoles      1   1   1   1   1   1   2   2   2   2   3   3   4   4
Basque (LI)            1   1   1   1   1   1   1   1   1   1   1   1   1   1
Burushaski (LI)        1   1   1   1   1   1   1   1   1   1   1   1   1   1
Etruscan (LI)          1   1   1   1   1   1   1   1   1   1   1   1   1   1
Gilyak (LI)            1   1   1   1   1   1   1   1   1   1   1   1   1   1
Hurrian (LI)           1   1   1   1   1   1   1   1   1   1   1   1   1   1
Ket (LI)               1   1   1   1   1   1   1   1   1   1   1   1   1   1
Meroitic (LI)          1   1   1   1   1   1   1   1   1   1   1   1   1   1
Nahali (LI)            1   1   1   1   1   1   1   1   1   1   1   1   1   1
Sumerian (LI)          1   1   1   1   1   1   1   1   1   1   1   1   1   1
Table 4.4 Rijkhoff’s diversity values per genetic group
Afro-Asiatic          55.53    Indo-Hittite          39.71    Basque (LI)        1.00
Altaic                14.79    Indo-Pacific         123.39    Burushaski (LI)    1.00
Amerind              178.44    Khoisan                6.97    Etruscan (LI)      1.00
Australian            67.58    Na-Dene                9.44    Gilyak (LI)        1.00
Austric              137.41    Niger-Kordofanian     90.38    Hurrian (LI)       1.00
Caucasian              8.54    Nilo-Saharan          42.18    Ket (LI)           1.00
Chukchi-Kamchatkan     2.47    Sino-Tibetan          38.52    Meroitic (LI)      1.00
Elamo-Dravidian        7.43    Uralic-Yukaghir        4.93    Nahali (LI)        1.00
Eskimo-Aleut           3.34    Pidgins and Creoles   13.47    Sumerian (LI)      1.00
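The assignment rule described for Rijkhoff’s sample (divide each diversity value by the current number of representatives and give the next slot to the macrofamily with the highest current value) can be sketched as follows. The diversity values are those of Table 4.4; the function names are our own. The text states the rule only for replacing an unavailable isolate, so applying it repeatedly, starting from one language per group, is an assumption of this sketch; for the sample sizes 30, 40 and 50 it does reproduce the corresponding columns of Table 4.3.

```python
# Diversity values per genetic group, taken from Table 4.4.
DIVERSITY = {
    "Afro-Asiatic": 55.53, "Altaic": 14.79, "Amerind": 178.44,
    "Australian": 67.58, "Austric": 137.41, "Caucasian": 8.54,
    "Chukchi-Kamchatkan": 2.47, "Elamo-Dravidian": 7.43,
    "Eskimo-Aleut": 3.34, "Indo-Hittite": 39.71, "Indo-Pacific": 123.39,
    "Khoisan": 6.97, "Na-Dene": 9.44, "Niger-Kordofanian": 90.38,
    "Nilo-Saharan": 42.18, "Sino-Tibetan": 38.52, "Uralic-Yukaghir": 4.93,
    "Pidgins & Creoles": 13.47,
    "Basque (LI)": 1.00, "Burushaski (LI)": 1.00, "Etruscan (LI)": 1.00,
    "Gilyak (LI)": 1.00, "Hurrian (LI)": 1.00, "Ket (LI)": 1.00,
    "Meroitic (LI)": 1.00, "Nahali (LI)": 1.00, "Sumerian (LI)": 1.00,
}


def next_macrofamily(counts):
    """Return the group with the highest current value.

    The current value of a group is its diversity value divided by the
    number of representatives it already has in the sample.
    """
    return max(counts, key=lambda group: DIVERSITY[group] / counts[group])


def allocate(sample_size):
    """Start with one language per group, then add slots one at a time."""
    counts = {group: 1 for group in DIVERSITY}
    if sample_size < len(counts):
        raise ValueError("sample must contain at least one language per group")
    for _ in range(sample_size - len(counts)):
        counts[next_macrofamily(counts)] += 1
    return counts


# Replacing an isolate without data in a 30-language sample: the isolate is
# dropped and its slot goes to the group with the highest current value.
counts = allocate(30)
del counts["Etruscan (LI)"]
print(next_macrofamily(counts))
```

In the usage example, Etruscan stands in for an isolate without usable data; which isolates actually lack data depends on the research topic.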
Table 4.5 Dryer’s results for word order in relation to adposition per geographic area
              Africa   Eurasia   SE Asia &   Australia &   North     South     Total
                                 Oceania     New Guinea    America   America
OV & Postp      15       26         5           17           25        19       107
OV & Prep        3        3         0            1            0         0         7
VO & Postp       4        1         0            0            3         4        12
VO & Prep       16        8        15            6           20         5        70
4.3.2.2
Probability Sample of Dryer
The probability sample of Dryer (1989, 1992) aims to measure statistical tendencies. In order to study the impact of large language families on the occurrence of typological features, Dryer takes the criterion of genetic relatedness into consideration by distinguishing 252 genera. Furthermore, he controls the sample for geographic bias by grouping the genera into six large areas: Africa, Eurasia, Southeast Asia & Oceania, Australia & New Guinea, North America, and South America. Thus, it can be shown whether a structural feature occurs on a global scale or only regionally. For Dryer, a statistical tendency is only universal if it holds true in all geographic areas, such as the statistical preference of languages with OV order to be postpositional and the statistical correlation of VO order and prepositions, as shown in Table 4.5 (Dryer 1992: 83). The values in this table do not represent single languages but rather genera. If a correlating type is found in a genus, it is counted as ‘one’ in the corresponding table field. However, if the languages of a genus show distinct correlating types, the genus is subdivided and each sub-genus is counted separately. Languages with both parameter types (e.g., OV and VO) and languages that do not have a parameter (e.g., no adpositions) are not included in the sample. Dryer’s extremely large sample of 625 languages is published in Dryer (1992: 133–135).
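Dryer’s requirement that a tendency must hold in every geographic area can be checked mechanically against the genus counts of Table 4.5. The following sketch uses those counts; the function name and the simple ‘more genera than the competing type’ comparison are our own simplification of the criterion, not Dryer’s exact procedure.

```python
AREAS = [
    "Africa", "Eurasia", "SE Asia & Oceania",
    "Australia & New Guinea", "North America", "South America",
]

# Number of genera per type and area, in the order of AREAS (Table 4.5).
GENERA = {
    ("OV", "Postp"): [15, 26, 5, 17, 25, 19],
    ("OV", "Prep"): [3, 3, 0, 1, 0, 0],
    ("VO", "Postp"): [4, 1, 0, 0, 3, 4],
    ("VO", "Prep"): [16, 8, 15, 6, 20, 5],
}


def holds_in_all_areas(order, preferred, other):
    """True if, for the given verb-object order, the preferred adposition
    type is found in more genera than the other type in every area."""
    return all(
        p > o
        for p, o in zip(GENERA[(order, preferred)], GENERA[(order, other)])
    )


print(holds_in_all_areas("OV", "Postp", "Prep"))
print(holds_in_all_areas("VO", "Prep", "Postp"))
```

Both checks succeed for the figures in Table 4.5, although the margin differs considerably between areas (compare VO genera in South America with those in Southeast Asia & Oceania).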
4.3.3
Data Sources
Once a sample is ready, data regarding the linguistic feature to be examined is needed for each sample language. As it is impossible for typologists to have expertise in all the languages they study, this data can stem from different sources:

a. secondary data: The data is retrieved from reference grammars, dictionaries, and linguistic articles. In this case, one is forced to rely on information from other linguists. Even if careful preliminary work by descriptive linguists exists, it is still possible that this linguistic information is insufficient or not differentiated enough for the typological research topic. In this case, the situation is comparable to undocumented languages or languages with unpublished records. If no secondary data is available, the missing data must be obtained in other ways (primary data) or the respective languages must be excluded retrospectively from the sample and an adequate replacement must be found (e.g., another language of the same subfamily). Working with various linguistic sources (such as grammars) means that one has to deal with different representations which must be standardised for one’s own typological study. This can be painstaking and time-consuming work, but it must be done carefully.

b. elicited primary data: The data is collected by the researchers themselves. Most such data in typology is collected via questionnaires (cf. Sections 3.3.3.2 and 3.7). This is a relatively practicable method to gather standardised data on a specific issue (compared to recordings of real-life language in the field). The EUROTYP project (areal typology) used this method and provides guidelines for a typological questionnaire (Bakker et al. 1993, particularly Chapter 5). The collection of primary data obviously requires access to native speakers (or to linguists specialising in the particular language). For typological studies, this is a comparatively labour-intensive, time-consuming, and expensive kind of data source, particularly when working on barely accessible languages (such as languages in remote areas). However, this can be the only option, especially for studies on linguistic topics not generally described in grammars or dictionaries (such as issues in pragmatics, oral variation, or dialectal varieties, etc.) and when more detailed work is being pursued on underdocumented language families and geographic areas.

c. tertiary data: The data retrieved from typological databases is cross-linguistically comparable information on linguistic features. This means that the language-specific data (from secondary or primary sources) has already been pre-processed for typological purposes. The Association for Linguistic Typology (ALT) provides lists of typological databases on diverse linguistic topics. The ‘World Atlas of Language Structures (WALS)’ is probably one of the best-known databases. Another one is ‘Glottobank’. Computer technology has facilitated the development of huge databases which are generally not developed by single researchers but by whole teams of typologists. This data source is viable for typological issues or aspects which have not already been analysed within the database project. The work with large-scale datasets comes along with an increasing use of more complex statistical approaches and models (e.g., Cysouw 2005; Croft & Poole 2008 on multidimensional scaling, and Bickel 2011).

d. compiled primary data: A less commonly used alternative is the work with written texts, i.e., a form of natural language data. For structural comparability across all sample languages, a parallel text corpus (cf. Section 5.3.1) consisting of the same text(s) in numerous languages (such as the Bible or texts of global organisations) must be built. These texts are then multi-aligned: sentence-by-sentence and/or word-by-word alignment, depending on the research topic (Cysouw & Wälchli 2007). The work with this kind of data means that no access to native speakers is necessary (in contrast to elicited primary data) and that language-internal variation can be studied better than with secondary data. However, there are only very few and very special kinds of parallel texts available.
Furthermore, it is, of course, possible to combine some or all of these data sources. However, this must be done systematically to ensure that work proceeds with equivalent and comparable data for all sample languages. Questionnaires, for instance, could provide complementary data on languages with no, unavailable, or insufficient data from reference grammars. The access to unpublished data or the expertise of linguistic specialists for a language may be another data source.

Working exclusively with reference grammars is by far the most common practice in typology. Despite the advantages of this procedure, it bears the risk of the scientific community constantly working with the same well-described languages. Furthermore, the information retrieved from grammars can be criticised by specialists on the individual language as too undifferentiated or shallow. It is indeed true that grammars themselves contain more or less undifferentiated information, generally not describing language-internal variation (differences between different speakers, genres, etc.). To study such aspects cross-linguistically, primary data collection is unavoidable.
4.4
Basic Research Findings
Ultimately, the outcome of typological research depends on the kind of research question being pursued. First, studies on the scope of typological variation tell us which different types exist regarding a certain parameter (X). The result, that is the complete list of defined types, is called a typology of X. Second, quantitative studies on the frequency of occurrence of certain types result in probability statements, such as different kinds of universals. Third, studies on the distribution of typological features in the languages of the world provide information on correlations between features and geographic areas of occurrence. This might even entail diachronic insights into the diffusion of linguistic types.

Universals (quantitative statements on certainties or probabilities) are probably the most prominent findings in typology. Therefore, we will focus on this kind of result in more detail and start with a detailed description of different kinds of universals. For a feature or property to be considered universal, it generally must be found in (almost) all languages of the world – i.e., in at least 95% of languages. Based on this quantitative aspect, the following kinds of universals are differentiated:

• absolute universals (exceptionless statements): These are features or properties occurring in all languages – i.e., 100% of languages exhibit a given feature: ‘All languages have . . .’. Other logically possible types regarding this feature are not attested. Thus, the converse statement relates to impossible language features: ‘No language has . . .’. All these statements express certainties.

• statistical universals (probability statements): These are features or properties found in the vast majority of languages – i.e., not all but at least 95% of languages exhibit the feature: ‘Most languages have . . .’. Seldom attested features or properties, i.e., those found in no more than 5% of languages, are called rara. Depending on the degree of attested frequencies, features occurring in at most 1% of languages are called rarissima; unicale are features attested in a single language only (Cysouw & Wohlgemuth 2010). Such improbable language features constitute the other end of the scale: ‘Very few languages have . . .’. All these statements express probabilities – i.e., statistically significant tendencies.
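The frequency thresholds just named (100%, at least 95%, at most 5%, at most 1%, a single language) can be summarised in a small classification sketch. The function name, the label strings and the order in which the conditions are tested are our own choices; the counts in the usage lines are invented for illustration and do not come from any study.

```python
def classify_feature(n_with_feature, n_languages):
    """Label a feature by the share of sample languages that exhibit it.

    Integer arithmetic is used so that the 95%, 5% and 1% thresholds are
    compared exactly.
    """
    if n_languages <= 0 or not 0 <= n_with_feature <= n_languages:
        raise ValueError("counts must satisfy 0 <= n_with_feature <= n_languages")
    if n_with_feature == n_languages:
        return "absolute universal"        # 'All languages have ...'
    if 100 * n_with_feature >= 95 * n_languages:
        return "statistical universal"     # 'Most languages have ...'
    if n_with_feature == 0:
        return "unattested"                # 'No language has ...'
    if n_with_feature == 1:
        return "unicale"                   # attested in a single language
    if 100 * n_with_feature <= n_languages:
        return "rarissima"                 # at most 1% of languages
    if 100 * n_with_feature <= 5 * n_languages:
        return "rara"                      # at most 5% of languages
    return "neither universal nor rare"


# Invented counts for a hypothetical sample of 1,000 languages.
for count in (1000, 958, 500, 42, 8, 1, 0):
    print(count, classify_feature(count, 1000))
```

Note that such labels are only as reliable as the sample behind the counts: the same share is less meaningful in a sample of 30 languages than in one of 100.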
A further classification of universals pertains to the number of parameters (cf. Section 4.3.1):

• unrestricted (or unconditional) universals: These universals refer to one parameter (X) only. Regarding this parameter, all or most languages belong to a particular value (a): ‘All/most languages have X:a’. The other values of this parameter (X:b, X:c, . . .) are not/barely attested.

• implicational universals: These universals state correlations between two logically independent parameters (if X, then Y). All/most languages with a particular value (a) of one parameter (X) belong to a particular value (a) of another parameter (Y): ‘X:a implies Y:a’ (X:a → Y:a). For X:a languages, other values for Y (Y:b, Y:c, . . .) are not/barely attested. If even more logically independent parameters (X, Y, Z, etc.) are correlated in chains of implicational universals, these are hierarchies. All/most languages with a particular value (a) of one basic parameter (X) and a particular value (a) of parameter Y belong to a particular value (a) of another parameter (Z): ‘In X:a languages, Y:a implies Z:a’ (P → (Q → R)). For languages with X:a and Y:a, other values for Z (Z:b, Z:c, . . .) are not/barely attested.
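How an implicational statement of the form ‘X:a implies Y:a’ is evaluated against data can be sketched as follows. The language records are invented placeholders and the function name is our own; the sketch counts the languages that satisfy the antecedent and, among them, the exceptions to the consequent, and shows that the reversed statement need not hold.

```python
def implication_counts(languages, if_param, if_value, then_param, then_value):
    """Return (number of languages with the antecedent value,
    number of those languages that violate the consequent)."""
    relevant = [lang for lang in languages if lang[if_param] == if_value]
    exceptions = [lang for lang in relevant if lang[then_param] != then_value]
    return len(relevant), len(exceptions)


# Invented sample: six languages with X:a and Y:a, three with X:b and Y:a,
# four with X:b and Y:b; no language combines X:a with Y:b.
LANGUAGES = (
    [{"X": "a", "Y": "a"}] * 6
    + [{"X": "b", "Y": "a"}] * 3
    + [{"X": "b", "Y": "b"}] * 4
)

forward = implication_counts(LANGUAGES, "X", "a", "Y", "a")
reverse = implication_counts(LANGUAGES, "Y", "a", "X", "a")
print("X:a implies Y:a:", forward)  # (6, 0): no exceptions
print("Y:a implies X:a:", reverse)  # (9, 3): three exceptions
```

With zero exceptions the forward statement would count as an absolute implicational universal for this invented sample; a small share of exceptions would make it a statistical one.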
The following examples in Table 4.6 illustrate the different kinds of universals.

Table 4.6 Examples for the different kinds of universals
(For each kind of universal: an example universal, followed by the parameter(s) and their values.)

Absolute unrestricted universals
Example universal: All languages have vowels and consonants.
Parameter(s) & their values: X: sound inventory (a: consonants & vowels; b: vowels only; c: consonants only; d: neither consonants nor vowels)

Statistical unrestricted universals
Example universal: In most languages (95.77%), the subject precedes the object (SOV, SVO, VSO). Only in 4.23% of the languages, the object precedes the subject (VOS & OVS; OSV does not occur at all) (Tomlin 1986: 22).
Parameter(s) & their values: X: word order (subject, object) (a: SO; b: OS)

Absolute implicational universals
Example universal: If the genitive follows the noun, then the relative clause follows the noun: NG → NRel (Hawkins 1983: 83).
Parameter(s) & their values: X: constituent order (noun, genitive) (a: NG; . . .); Y: constituent order (noun, relative clause) (a: NRel; b: RelN)

Statistical implicational universals
Example universal: With overwhelmingly more than chance frequency, languages with dominant order VSO have the adjective after the noun (Greenberg 1963: 85).
Parameter(s) & their values: X: word order (subject, verb, object) (a: VSO; . . .); Y: constituent order (noun, adjective) (a: NA; b: AN)

Hierarchies
Example universal: Prepositional noun modifier hierarchy: If a language has prepositions, then if the noun precedes the demonstrative or the numeral, then the noun precedes the adjective, and if the noun precedes the adjective, then the noun precedes the genitive, and if the noun precedes the genitive, then the noun precedes the relative clause: Prep → ((NDem ∨ NNum → NA) & (NA → NG) & (NG → NRel)) (Hawkins 1983: 75).
Parameter(s) & their values: X: adposition (a: Prep; . . .); Y: constituent order (noun, demonstrative) (a: NDem; b: DemN); Z: constituent order (noun, numeral) (a: NNum; b: NumN); . . .; R: constituent order (noun, relative clause) (a: NRel; b: RelN)

The typological findings of implicational universals with two parameters (X and Y) and two values per parameter (a and b) can be represented in so-called tetrachoric tables (see Table 4.7). This representation illustrates that implicational universals, such as ‘X:a implies Y:a’ (X:a → Y:a), are unidirectional. This means that the statement is not reversible, i.e., ‘Y:a implies X:a’ is incorrect (*Y:a → X:a). Based on the information in the tetrachoric table, languages with Y:a can have X:a or X:b.

Table 4.7 Tetrachoric table
                           Parameter X (value X:a)   Parameter X (value X:b)
Parameter Y (value Y:a)               +                         +
Parameter Y (value Y:b)               -                         +

The broadest compilation of postulated typological generalisations is provided by the Universals Archive in Konstanz, Germany (https://typo.uni-konstanz.de/archive). It not only lists the universals (consecutively numbered for unequivocal reference), but also provides information about the topical domain, the kind of universal, the sample base, and the literature source. Further, less extensive lists of universals can be found in some typological literature, such as Newmeyer (2005: 4–6, 16–17, 206–207) or Scalise, Magni and Bisetto (2009). The Universals Archive also includes a list of rara (which are generally less systematically researched) and hierarchies. Furthermore, Croft (1990) also contains a chapter on hierarchies.

Typological studies can test and specify existing typological statements (e.g., by using more elaborate samples or refined parameters and values) or new ones can be established. Absolute universals are already disproved by a single counterexample. Thus, absolute unrestricted universals in particular are comparatively rare. The majority of universal statements are obtained either by acknowledging a few counterexamples (i.e., statistical universals) or by excluding these cases from the scope of the statement (i.e., implicational universals).

Typologies are research outcomes providing information on the scope of typological variation. A typology of X shows the spectrum of different types: ‘Languages express X by X:a, X:b, . . . or X:z’. This list contains all types detected so far, thus implying that no other type has been attested (cf. ‘absolute universals’). Less exhaustive statements only confirm the existence of single types: ‘There are languages with X:a’, without any statement on the existence or absence of other types. Basic classifications of languages pertain to morphology, or more precisely to the boundedness and segmentability of grammatical markers such as TAM and case (analytic vs. synthetic languages, of which the latter can be further subdivided into agglutinating vs. inflectional languages), to phonology (tone vs. non-tonal languages), to the predominant morpho-syntactic order (languages with head-dependent order vs. languages with dependent-head order), or to the case system (accusative vs. ergative languages). Overall, the more detailed the classification, the higher the number of types or subtypes (e.g., different kinds of split-ergative languages).

The distribution of typological features, i.e., the information about how single types are distributed around the world, is generally presented in typological maps. Languages with X:a are indicated by dots of the same colour on the world map (see, for instance, Dryer & Haspelmath 2013). Thus, these maps show a correlation of features and geographic area (and partly also language family or environmental aspects). Diachronic typological data can then show which typological features spread, which ones are replaced, and where this takes place. With sufficient synchronic typological data, these linguistic developments may even be traced by statistical approaches and models of preference (e.g., Bickel 2015). Another option is to study the distribution of typological features in relation to geographic areas as Dryer did (cf. Table 4.5) and/or in relation to language families (probabilities of features per language family).

In sum, typological studies reveal quantitative (i.e., ‘all/most/no/very few language(s) have/has . . .’) or purely existential information (i.e., ‘there are languages that have . . .’) on the occurrence of individual structural features and correlations between several such features as well as correlations between structural types and other distributional parameters (such as geographic areas, genealogical groups, environmental conditions, kinds of languages, etc.). Furthermore, typological research provides general cross-linguistic information on linguistic processes such as grammaticalisation, language acquisition, and contact phenomena.
4.5
Explanation and Interpretation of the Results
The reasons for cross-linguistic similarities, differences, and distributional facts are multifaceted. While generative approaches explain universals by a genetic language faculty device (called ‘universal grammar’ by Chomsky), typological approaches generally emphasise functional reasons (e.g., Moravcsik 2013: 243–275). Language is a means to express thought and serves communicative tasks. Therefore, language universals are explicable by common experiences and shared conditions of all humans. Gravity, for instance, shapes spatial reference; the physiology of the human vocal tract shapes sound inventories; the number of fingers shapes basic counting systems (decimal or quinary); and the human brain shapes processing capabilities and preferences. Other experiences, values and conditions may only be shared by certain but not all peoples (e.g., technological skills or weather conditions) and, as a consequence, only the languages of these groups may show similar linguistic structures. Similarly, possible reasons for cross-linguistic differences and rara are distinct or special environmental conditions – i.e., culture-specific natural surroundings, practices, and values. Universals in basic colour terminology, for instance, are explained by human physiology, more precisely eye anatomy and visual perception, while certain differences and particularities are often related to cultural factors (Berlin & Kay 1969; Kay & McDaniel 1978). Culturally important ideas, concepts, and categorisations are most likely recognisable in semantics.

Moreover, the actual distribution of language structures is explicable by historical developments. Which patterns spread and which become extinct in situations of language contact is largely dependent on historical contingencies based on social, political, and environmental factors (cf. Bickel 2015). Thus, the hypothetical political dominance of minor genetic language groups would have resulted in different distributional patterns than we observe at present and likely even different cross-linguistic findings. Altogether, language use, language acquisition, and language socialisation determine which patterns are passed on and which change, also in situations without language contact – i.e., language-internal processes of drift or grammaticalisation (cf. Bybee 2010).
The following interacting or competing motivations underlying language use and, thus, explaining cross-linguistic findings are generally discussed in the typological literature:

• discourse: In order to convey information, speakers produce meaningful messages by structuring their speech (topic-focus, antecedents-referents, etc.). In this way conversational practices have an impact on linguistic structures, such as constituent order, the choice of full NPs or proforms, switch-reference marking, definite or indefinite articles, and so on. Discourse strategies that underlie human communication (such as the cooperative principle and conversational maxims and practices) thus result in formal regularities (cf. Hopper & Thompson 1993).

• processing: The human parser prefers linguistic structures that can be processed efficiently – i.e., that require minimal cognitive effort in the linear process of speech analysis. Phrases or sentences containing (temporary) ambiguities or cognitively difficult assignments in the course of processing are disfavoured. This explains, for instance, why the recursion of relative clauses is limited and why long dependents between their head and the head of a higher phrasal level (such as RelN order in prepositional languages) rarely occur (cf. Hawkins 1999, 2011).

• economy: Efficient communication involves the following processes for minimal speech effort: Frequently used forms become/are shorter (e.g., abbreviations and clitics) and highly predictable elements even remain unexpressed/are deleted (e.g., elliptical structure, pro drop, and the elimination of other redundant formal encoding). The use of deixis and the fact that a default is generally formally less marked than a non-default are further economically motivated instances of familiarity. In contrast, longer and more transparent forms occur with less familiar concepts (cf. Haiman 1983). Economy in its extreme may lead to ambiguity or difficulties in understanding, which counteracts other motivations, such as processing or discourse strategies.

• iconicity: Linguistic structures tend to mirror an experienced (semantic) property of the designated item. In languages that distinguish different kinds of possession (such as Oceanic languages), inalienable/inherent possessions are described by direct constructions and alienable possessions by indirect constructions. Thus, the concept of closeness is mapped onto linguistic forms: Closer relationships are expressed by closer formal encoding. Economy and iconicity are often represented as competing motivations (cf. Haiman 1983; Bybee 2011).
A typologically interesting aspect of language contact is the development of rare types which contradict the fundamental motivations. Bisang (2004: 37, following Tosco 1994) describes an East Cushitic contact variety (Bayso) that is a counterexample to Hawkins’ universal (1983: 64): ‘If a language has OV word order, then, if the adjective precedes the noun, the genitive precedes the noun’.
(2) Highland East Cushitic languages: OV with AN & GenN
    Lowland East Cushitic languages: OV with NA & NGen
    Bayso: OV with AN & NGen
Through contact with Highland East Cushitic languages, Bayso has changed its NMod structure and adopted AN. As this counterexample to Hawkins’ universal is disfavoured for processing reasons, the subsequent question that arises is how stable such rare types are diachronically. Further, if Bayso goes on to change to GenN, this might be due to persistent contact or cognitive preferences.
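The logic of testing an implicational universal against the three types in (2) can be sketched in a few lines of Python. This is only an illustration: the dictionary encoding of the types and the function name `violates_hawkins` are assumptions made here and do not stem from Hawkins, Bisang, or Tosco.

```python
# Word-order types from example (2); this encoding is an illustrative assumption.
LANGUAGE_TYPES = {
    "Highland East Cushitic": {"basic": "OV", "adj": "AN", "gen": "GenN"},
    "Lowland East Cushitic": {"basic": "OV", "adj": "NA", "gen": "NGen"},
    "Bayso": {"basic": "OV", "adj": "AN", "gen": "NGen"},
}


def violates_hawkins(lang_type):
    """Return True if a type contradicts 'OV and AN implies GenN'."""
    antecedent = lang_type["basic"] == "OV" and lang_type["adj"] == "AN"
    return antecedent and lang_type["gen"] != "GenN"


# Collect every type for which the antecedent holds but the consequent fails.
counterexamples = [name for name, t in LANGUAGE_TYPES.items() if violates_hawkins(t)]
print(counterexamples)
```

Only types that satisfy the antecedent (OV and AN) can count as counterexamples; the Lowland East Cushitic type (NA) is therefore compatible with the universal even though it has NGen.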
Language Typology
In sum, the existence of cross-linguistic generalisations and the distribution of linguistic types in the languages of the world can be explained by a multitude of interacting factors: biological, environmental, historical, discourse related, cognitive, etc. Therefore, typological findings are of interest for further research in other linguistic subdisciplines. Anthropological-linguistic and sociolinguistic research (cf. Chapter 6), for instance, may reveal explanations for rara, and cognitive linguistics, psycho- and neurolinguistics (cf. Chapters 7 and 8) may provide more insights into cognitive motivations. Furthermore, historical linguistics, contact linguistics, and related fields may explain the distribution and diffusion of types (including the development and stability of rara).
4.6
Summary
This chapter presented the empirical approach of language typology. In order to study structural commonalities and differences in the languages of the world, typological research involves several fundamental methodological considerations and procedures. First, due to the impossibility of comparing all languages, we have to form a well-balanced sample considering several sample criteria – from genetic relatedness and geographic proximity to data accessibility, which depends on the different kinds of data sources. Furthermore, the appropriate kind of sample is dependent on the specific research question – i.e., for research on the range of typological variation, a variety sample is appropriate, while a probability sample is used for investigations on the frequency of occurrence of different linguistic types. Second, in order to identify cross-linguistic commonalities and differences, comparable linguistic parameters are needed. Each language can then be classified according to a certain type depending on its value in terms of this parameter. A fundamental requirement for such cross-linguistic comparison is a metalanguage – i.e., a description and categorisation tool that applies to all languages. Finally, the results of typological studies depend on the research question being pursued. Language universals and rara, for instance, arise out of investigations on how frequent linguistic types are. Studies on the range of structural variation, on the other hand, result in a typology – i.e., a list of finite types. Explanations for the spectrum of attested linguistic types, common and uncommon or even unattested types are diverse, ranging from circumstances of language diffusion and language contact to sociocultural, physical, and cognitive accounts. At this juncture, typological research interacts with various other linguistic and even non-linguistic subdisciplines, such as psycho- and neurolinguistics, anthropological linguistics, biology or history.
4.7
Exercises and Assignments
Exercises for students which can be included during a session on language typology or as part of project work:
4.1 Develop your own specific typological research questions (similar to the ones in Section 4.1).
4.2 Discuss which kind of sample (variety sample, probability sample, etc.) and which sample basis (the languages of the world or a specific group of languages) is reasonable for research on a specific research question in Section 4.1.
4.3 Find two examples for each kind of universal (absolute unrestricted, absolute implicational, statistical unrestricted, & statistical implicational) – you may use the Universals Archive (https://typo.uni-konstanz.de/archive) or Newmeyer (2005: 4–6, 16–17, 2006–2007) for your search.
4.4 Take a statistical universal or a specific research question of Section 4.1 and define its parameters and values.
4.5 Discuss possible explanations for the rara of languages with ‘case distinction exclusively by tone’ (Plank 1995).
4.6 Discuss possible explanations for the universal: ‘The longer a sentence is, the more likely it is to be classified as belonging to a more formal register’ (Östman 1989).
Furthermore, smaller typological exercises can be found in Velupillai (2012) and Moravcsik (2013).
More extensive exercises for student research projects or (mid-)term papers:
4.7 Set up a variety sample and examine one absolute universal with this sample:
    a. with 30 languages based on Rijkhoff and Bakker (1998) – only with languages available at your institution, or
    b. with 30 genetically and geographically maximally unrelated languages by using Rijkhoff and Bakker (1998) (number of languages per genetic macrogroup) and Dryer (1992) (categorisation according to geographic macroareas).
4.8 Set up your own variety sample of pidgin & creole languages – taking the criteria of genealogical relatedness and geographic proximity into consideration – and analyse the variety of distinctions within the system of personal pronouns (person, number, gender, etc.). You can then compare your findings to the overall variety in the languages of the world (e.g., Cysouw 2003; Bhat 2004; Siewierska 2004).
4.9 Set up a probability sample of 50 languages, in which the number of sample languages per genetic group should be proportional to the size of each genetic group, to examine whether a statistical universal of your choice is valid (i.e., shows the same tendency) across all geographic areas. Consider the criterion of geographic proximity by using Dryer’s categorisation according to geographic macroareas in your sample.
4.10 Develop a questionnaire for typological research on AN/NA word order (considering which adjectival lexemes allow which constituent order, in which context which order is used, how frequent/dominant each order is, etc.), and practise this survey on three different languages (5 speakers per language) – what are your experiences with the questionnaire and what results does the research provide?
4.11 How are ergative languages vs. split ergative languages vs. accusative languages distributed in the Oceanic languages? Develop your own sample to study this question.
4.12 Develop your own research question on a specific topic and develop an adequate typological research procedure.
4.8
Further Reading
Introductory and comprehensive textbooks on language typology are Comrie 1981, Ramat 1987, Croft 1990, Whaley 1997, Song 2001, Velupillai 2012 and Moravcsik 2013. Handbooks providing a general overview of the linguistic subfield include Shopen 1985, Haspelmath et al. 2001, Mairal & Gil 2006, Song 2011, and Aikhenvald & Dixon 2017. Linguistic journals with a typological focus are ‘Linguistic Typology’ (journal of the Association for Linguistic Typology, published by Mouton de Gruyter), ‘Studies in Language’ (Benjamins), and ‘Language Typology and Universals (STUF)’ (de Gruyter). Furthermore, typological studies are published, among others, in ‘Linguistics’ (Mouton de Gruyter). Typologically focused book series are, for instance, ‘Typological Studies in Language’ (Benjamins) and ‘Explorations in Linguistic Typology’ (Oxford University Press). The most important typological institution is the Association for Linguistic Typology (ALT). On its website (www.linguistic-typology.org), it lists typological databases and related resources. Typological issues per linguistic domain (lexicon & semantics, phonetics & phonology, morphology & syntax) are described in Whaley 1997, Haspelmath et al. 2001, Mairal & Gil 2006, Song 2011, Velupillai 2012, and Moravcsik 2013, with Velupillai 2012 including sign languages. In order to stress the diversity of the languages of the world, Evans & Levinson 2009 and Bickel 2014b also give an overview of central typological findings per domain. Mairal & Gil 2006 instead focus on universals.
Comrie 1981, Mallinson & Blake 1981, Shopen 1985, and Song 2001 discuss morphological and syntactic issues only, and Aikhenvald & Dixon 2017 contains articles on phonology, morphology and individual syntactic and semantic topics. Pragmatic issues are addressed in Haspelmath et al. 2001 and specific textbooks on pragmatics with a typological perspective (such as Levinson 1983, Wierzbicka 2003, and Senft 2014). Language change (including grammaticalisation and language contact) from a typological perspective is the subject of Croft 1990, Shibatani & Bynon 1995, Haspelmath et al. 2001, Mairal & Gil 2006, Song 2011, Velupillai 2012, Moravcsik 2013, and Aikhenvald & Dixon 2017. Furthermore, Song 2011 contains typological information on language acquisition. Further aspects and examples of the spread of linguistic types in situations of contact are published in Aikhenvald & Dixon 2001, 2006, and Bickel 2015². Hickey 2017 addresses typology from an areal perspective. A survey of the languages of the world is provided by Campbell 1995 and Pereltsvaig 2012. Typological profiles of various linguistic areas and language families are also included in Haspelmath et al. 2001 and Aikhenvald & Dixon 2017. The World Atlas of Language Structures (Dryer & Haspelmath 2013) and Glottobank (https://glottobank.org) are also great sources for typological information on numerous languages. The typology of specific kinds of languages, e.g., sign languages, is the subject of Velupillai 2012 and Aikhenvald & Dixon 2017, with the latter also including articles on mixed languages, creoles, and secret languages. Different approaches to language typology are presented in Shibatani & Bynon 1995 and Haspelmath et al. 2001. Guidelines for structuring parameters and values can be found in Bakker & Siewierska 1991. For more detailed discussions on sampling issues, see Bell 1978, Dryer 1989, Rijkhoff et al. 1993, Rijkhoff & Bakker 1998, Perkins’ article in Haspelmath et al.
2001 and Bakker’s article in Song 2011. Miestamo, Bakker & Arpe 2016 focus on variety sampling. Specific information on quantitative methods and statistics in typology can be found in Cysouw 2005 and Bickel 2011. Alongside introductions and handbooks, further detailed descriptions on motivations explaining typological findings can be found in Newmeyer 2005, MacWhinney, Malchukov & Moravcsik 2014, and Schmidtke-Bode et al. 2019. Cross-disciplinary perspectives focusing upon hierarchies are published in Bornkessel-Schlesewsky, Malchukov & Richards 2015.
5
Corpus Linguistics
Corpus linguistics investigates language use in its natural context with different types of corpora as its data base. It is regarded as a methodological approach to empirical linguistics rather than a subdiscipline with its own unique research questions. After introducing the status of corpus linguistics as a methodology (Section 5.1) and the range of its, mostly quantitative, research questions and aims, we then present the three corpus linguistic approaches showing how linguists use corpus data (Section 5.2). Subsequently, Section 5.3 gives an overview of the major components of the corpus linguistic methodology. First, we provide a prototype definition of what a corpus is followed by an overview of the major types, ranging from more to less prototypical corpora, and we detail the criteria of corpus design (Section 5.3.1). On this basis, Sections 5.3.2 and 5.3.3 address the typical steps of carrying out an analysis and consider specific types of analysis. Following a brief discussion of the web-based corpus approaches (Section 5.3.4), we present the various research outcomes (Section 5.4) that result from the intersections of corpus linguistics with other subdisciplines in linguistics, summarise the chapter (Section 5.5), and give recommendations for exercises (Section 5.6) and further reading (Section 5.7).
5.1
Research Aims and Questions
Corpus linguists are generally interested in how speakers use their language(s) in natural contexts (i.e., uninfluenced by the researcher) and use authentic natural language data, as provided in corpora, to answer their research questions. They approach research from an inductive perspective, and quantitative analyses are the predominant kind. Although corpus linguistics is probably the most widespread approach in empirical linguistics at the time of writing, it nevertheless seems somewhat difficult to define its exact status. Corpus linguistics can either be defined as a linguistic discipline in its own right with a distinct theoretical basis (e.g., Tognini-Bonelli 2001) – hence, pursuing research questions specific to the field – or as an empirical method of pursuing research questions from other linguistic fields such as cognitive and psycholinguistics (cf. Section 7.3.5: analyses of natural spontaneous speech) or from linguistic core areas such as semantics or morphology. Currently, the majority of corpus linguists favour the
latter approach, defining corpus linguistics as a particular methodology of investigating language empirically (e.g., Leech 1992; McEnery & Wilson 1996/2001²; Gries 2012; McEnery & Hardie 2012): ‘As a corpus linguist I consider myself primarily a methodologist and CL primarily a methodology, to be applied to whatever theory seems most appropriate for the task at hand’ (Hardie, bcd; cited in Gries 2012: 42). Defining corpus linguistics thus, there are two corollaries distinguishing corpus linguistic research from other empirical disciplines:
• Research questions in corpus linguistics are not specific regarding their content, rather they are united, representing a usage-based approach to linguistics that relies on empirical data with specific corpus linguistic methods for data collection and analysis.
• Corpus linguistics mainly provides secondary data because the majority of existing corpora is designed to allow researchers from different linguistic disciplines to (re-)use them to answer their research questions.
Pursuing a usage-based approach, researchers study the qualitative differences between patterns of language use and their corresponding quantitative estimates. That is, they study the distribution or frequency of a linguistic pattern across and within different strata or varieties of one or more languages as represented in the corpus (e.g., areal, dialectal or sociolinguistic). A more specific differentiation concerns the unit of analysis: variationist approaches use variants of linguistic patterns as the unit of analysis, while text-linguistic approaches investigate possible differences between registers or varieties, i.e., specific text types are the unit of analysis (cf. Biber & Jones 2009). Distributional patterns may be analysed from a synchronic or a diachronic perspective, allowing researchers to study when or where these patterns emerged or became extinct (e.g., in historical linguistics, discourse analysis, language acquisition).
Corpora provide researchers with authentic, natural data of unconstrained language use or performance. That is, the documented speech was produced in a natural communicative setting without interference from the researchers themselves in the majority of cases (Sinclair 1996). Also, the data is authentic as it reflects the actual, spontaneous language use of speakers (and, vice versa, the actual language percept of hearers). From this perspective, corpus data represent a subtype of observational data: They are typically unaffected by the observer’s paradox (cf. Section 2.1), yet they contain additional information that may be less relevant for the research question at hand. The naturalness of corpus data may vary depending on how the data were obtained. For instance, learner corpus data from second-language (L2) learners are based on non-spontaneous language use elicited with educational tasks (cf. Section 5.3.1). In projects aiming to document an as yet undescribed language, natural language data can also be complemented by elicited data (cf.
Chapter 3) to provide a larger quantity of some types of utterance or language use in special contexts. The naturalness of elicited data, in turn, varies as a function of whether, for example, informants provide free
narratives on a specific topic or whether they produce narratives prompted by visual stimuli such as story books (Klamer & Moro 2020). As long as the two sources of data collection (natural language data and elicited data) are clearly documented in the corpus’ meta information (see below), the researcher can compare both types of data to investigate whether, for instance, the mode of data collection matters. Hence, a minimal definition of corpora is that they are collections of language data in written, spoken, or signed form, and that the data were obtained in natural situations. However, this minimal definition is clearly insufficient to properly distinguish linguistic corpora from other collections of texts such as in archives, so it will be further specified in Section 5.3.1. The following Table 5.1 gives an overview of some typical research questions pursued within a corpus linguistic approach.
Table 5.1 Example research questions in corpus linguistics

Linguistic domains:
Phonetics & phonology
  1. What kinds of spelling errors are common for L2 learners of English whose L1 language has a transparent orthography?
  2. What is the relationship between information structure (e.g., focus) and prosodic prominence, i.e., accentuation in speech?
Morphology & syntax
  3. What is the distribution of subject-before-object vs. object-before-subject word order in a free word order language?
  4. In languages with the dative alternation, which ditransitive verb (e.g., give, send) shows a bias towards a double object or prepositional object argument structure?
Lexicon & semantics
  5. Does the frequency of words from a particular semantic field change depending on language variety or text genre?
  6. How does the meaning of a word change across time?
Pragmatics & discourse
  7. What is the relative frequency of different forms of anaphoric expressions (null, pronominal, full) in the major varieties of English?
  8. What discourses give rise to different semantic connotations (also semantic prosody) that distinguish the use of near-synonyms (e.g., error, howler, and faux pas)?

Cross-disciplinary fields:
Language acquisition
  9. How frequent are certain patterns in child language at the age of 2 vs. 3 years such as repetition (of a prior utterance) or multi-word phrases?
  10. What kinds of neologism do young children produce spontaneously and are these based on regular morphological processes (e.g., noun-verb derivation)?
Language contact
  11. Are deviations (e.g., phonetic differences) of the Hong Kong variety of English from British English influenced by English-Chinese bilingualism?
Language change
  12. Considering the German spelling reform in 1996, how long did the change in spelling from ‘ph’ to ‘f’ (e.g., Photograph/Fotograf ‘photographer’) take as exemplified by letters to the editor of national newspapers?
5.2
Corpus-Linguistic Approaches
There are three different ways of doing corpus linguistic research, as summarized in Figure 5.1 (cf. McEnery & Hardie 2012; Lemnitzer & Zinsmeister 2015³). The first approach, corpus-informed research, is a collection of more or less ‘anecdotal evidence’ from corpora, and, in the view of many corpus linguists, it does not count as a proper corpus-linguistic approach. This is because both selection of a corpus and corpus search are unlikely to follow the standards in the field of corpus linguistics (cf. Section 5.3). As you can infer from Figure 5.1, this kind of corpus search is typically conducted when the researcher has developed a theoretical account of a certain linguistic phenomenon and seeks empirical evidence in support of a hypothesis laid out in said account. Often the empirical findings are taken as evidence for the existence of a specific phenomenon (therefore somewhat adhering to qualitative research), while its distribution, frequency and other specifics are not always taken into account in the research. This is a deductive and qualitatively-oriented approach, as is common to linguistic theory (especially in the generative tradition).
[Figure 5.1 Approaches to corpus linguistic research. Panels: corpus-informed search (top-down approach), corpus-based analysis (top-down & bottom-up approach), corpus-driven analysis (bottom-up approach).]
The other two approaches are at the core of corpus linguistics and are divided as to the extent of their commitment to pre-existing theoretical assumptions. Corpus-based analysis is carried out when the researcher has a specific
(empirical) hypothesis, which is then tested against empirical data based on a carefully chosen corpus and an exhaustive corpus analysis, which can be qualitative, quantitative, or both. The results are then interpreted as either confirming or falsifying the hypothesis – hence, the conclusion derived from the corpus analysis can refine the original hypothesis. In addition to this top-down approach to corpus-based research, it is possible to adopt a bottom-up approach, when, for instance, the corpus is investigated with an exploratory goal. Here, theoretical considerations may still be present, given that most corpus-based studies are conducted with annotated corpora, which introduce pre-existing theoretical assumptions via the annotation system (as annotation in any other way would be impossible; cf. Section 5.3.2). For example, the annotation of word forms based on morphosyntactic word classes is a very common annotation scheme. But what exactly makes a word? This is a controversial question with direct repercussions on the annotation schemes. If words are defined as separated by spaces in written language, then one would have to treat non-concatenated compounds in English as two separate words (e.g., washing machine, printer cartridge), which, however, denote a single concept. If words are defined only by morphological criteria, it would be difficult to distinguish homographs in some languages that have a distinct profile in their syntactic position in a sentence (e.g., English to may be an infinitive marker or a preposition; cf. Baker 2010a: 98). Thus, theoretical considerations will necessarily come into play even for very basic annotation systems. Corpus-driven analysis is, by contrast, a strictly bottom-up approach in the sense that the researcher takes no theoretical premises into consideration when conducting the corpus analysis. 
In some cases, the corpus analysis itself is seen as equivalent to building a theory of language (see chapter 6 in McEnery & Hardie 2012 for an in-depth discussion). The distinction between corpus-based and corpus-driven analysis also has practical consequences in terms of the corpus that is used. Corpus-based research often relies on annotated corpora, whereas corpus-driven research would avoid using annotated corpora in order to not make a theoretical commitment prior to analysis (see also Gries 2012 for a summary). Corpus-driven approaches are practically difficult to implement as linguistic categorization at some level always takes place (see discussions in Lüdeling 2007; Gries 2012). Therefore, most of the current corpus-linguistic research is corpus-based.
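The point made above, that even a basic decision such as ‘what counts as a word’ carries theoretical assumptions, can be illustrated with a small Python sketch. The example sentence, the compound list, and the function name `merge_compounds` are invented for illustration; real corpora rely on far more elaborate tokenisation and annotation schemes.

```python
from collections import Counter

# Invented example sentence containing two non-concatenated compounds.
sentence = "I want to take the printer cartridge to the washing machine"

# Definition 1 (assumed here): a word is any string separated by spaces.
tokens = sentence.split()

# Definition 2 (assumed here): listed compounds count as a single word.
COMPOUNDS = {("printer", "cartridge"), ("washing", "machine")}


def merge_compounds(tokens):
    """Join adjacent tokens that form a listed compound into one unit."""
    merged, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in COMPOUNDS:
            merged.append(" ".join(pair))
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


words = merge_compounds(tokens)
print(len(tokens), len(words))   # the word count depends on the definition
print(Counter(tokens)["to"])     # one form, two functions (infinitive marker, preposition)
```

The two definitions yield different word counts for the same sentence, and a purely form-based count of to conflates the infinitive marker with the preposition; separating them requires exactly the kind of morphosyntactic annotation that corpus-driven research tries to avoid.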
5.3
Methodology
In this section, we will define what a corpus is, define its criteria, present a typology of corpora (Section 5.3.1), and describe the basics of corpus annotation (Section 5.3.2) before turning to corpus analysis (Section 5.3.3) and the peculiarities of working with the web as a corpus resource (Section 5.3.4). There is some variation in the literature as to which criteria are given priority in a
corpus typology, because they may apply to some corpora more than to others. The reason for this is that these criteria constitute a corpus design which can be loosely defined as a set of criteria according to which the data entering a corpus are selected and enriched with further linguistic information (corpus compilation and annotation) or according to which a corpus is chosen for analysis in order to address a research question (corpus retrieval). Thus, the validity of any corpus analysis crucially depends on the corpus design and the extent to which it pertains to the researcher’s own research question. The core of any corpus design results from researchers’ reflections on (at least) the following basic questions that centre around sampling issues, especially representativeness, sample size and balance, as well as practical issues (cf., e.g., Biber 1993; Adolphs & Knight 2010; Nelson 2010; Reppen 2010 for more specific details about corpus design when building a corpus):
• Based on the research question, what is the basic set/population that the corpus, as a sample of the population, is supposed to represent? Which languages, varieties, registers etc. should be included in the corpus and in what proportion (to one another or relative to their proportion in the population)? What type of sampling procedure is used (cf. Biber 1993 or Nelson 2010 for brief introductions to sampling theory in corpus linguistics)?
• Based on practical aspects, how are the units of observation per language, variety, register etc. collected (i.e., issues of data availability, accessibility, and ethics)?
• Based on both the research question and practical aspects, how many units of observation should be collected (i.e., number of texts, text length) and how many can be collected and edited (e.g., transcription, annotation, metadata) given the time constraints of the project?
Given that the world wide web does not adhere to a research design employed by the researcher in corpus compilation or retrieval, Section 5.3.4 is concerned with the additional considerations coming into play when the web is used as a corpus. Section 5.7 contains further links to web pages and references with lists of existing corpora.
5.3.1
The Data Set: Corpora
In Section 5.1, we provided a minimal definition of what a corpus is, which, however, fails to distinguish linguistic corpora from other types of collections of linguistic data. Here, we expand on this initial definition with proposals defining corpora along a cline of corpus prototypicality (Gilquin & Gries 2009: 6; McEnery & Hardie 2012; Lemnitzer & Zinsmeister 2015³: 13), using the following criteria (which are comparable to the criteria in language documentation, cf. Section 3.4):
A linguistic corpus is a systematic collection of authentic texts that:
• is digitized and machine-readable
• is compiled to be representative and balanced as regards a particular language variety, register, or genre (i.e., representation of all instances of the variety/register/genre and in a proportion that mirrors the proportion in the population)
• may contain metadata, text mark-up and linguistic annotation
Systematic collection means that the texts are collected following a specific corpus design (cf. Section 5.3). The ‘term text here denotes a file of machine-readable data’ (McEnery & Hardie 2012: 2), which may include linguistic utterances of any modality. According to this definition, a text is a unit of observation or measurement that may contain one or more units of analysis (cf. Section 1.2.3) such as variants of particular linguistic features or variants of specific text types (i.e., register or dialectal variants). Authenticity refers to the fact that the data spontaneously originate from natural communicative situations of speakers, but not from artificial experimental settings induced by the researcher. In general, the quality of a corpus and, hence, of the corpus search performed largely depends on its representativeness and balance, ideals that sometimes counteract one another (see below). Any corpus should be compiled in a way to maximise potential representativeness with regard to the language (or variety, register, genre, etc.) that the corpus is supposed to reflect. That is to say, the corpus is compiled so that it represents the full range of variation in that type of language concerning the varying contexts of use and the range of linguistic feature distributions (cf. Biber 1993 for a discussion of representativeness in corpus linguistics). Balance is somewhat more difficult to define as it may have two different meanings. The first (given in our prototype definition above) is that a corpus is stratified so that it corresponds to the proportion with which the respective strata would emerge in the language. The second refers to the matter that, within a corpus, there is an equal amount of data for each stratum of the language.
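The two meanings of balance can be made concrete with a brief Python sketch. The genre shares and corpus sizes below are invented for illustration, and the function names are hypothetical; the sketch only shows how the two readings lead to different corpus compositions.

```python
def proportional_allocation(population_shares, corpus_size):
    """Balance in the first sense: strata mirror their share in the language."""
    return {s: round(share * corpus_size) for s, share in population_shares.items()}


def equal_allocation(population_shares, corpus_size):
    """Balance in the second sense: every stratum receives the same amount of data."""
    per_stratum = corpus_size // len(population_shares)
    return {s: per_stratum for s in population_shares}


# Hypothetical shares of three written genres in a language (invented figures).
shares = {"newspaper": 0.6, "fiction": 0.3, "academic": 0.1}

print(proportional_allocation(shares, 1_000_000))  # word counts per genre, sense 1
print(equal_allocation(shares, 900_000))           # word counts per genre, sense 2
```

Under the first reading, the smallest genre contributes only a tenth of the corpus, so phenomena typical of that genre will be rare; under the second, it contributes as much as the largest genre, at the cost of no longer mirroring the population.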
Keep in mind, however, that perfect representativeness and balance is an ideal rather than an achievable goal in practice, simply because any (existing) language is an infinite, ever-growing repository whose exact composition cannot be truly estimated and because research questions also influence what counts as representative or balanced. In addition, representativeness and balance may also counteract each other, which is nicely exemplified for Swedish. About two thirds of all written texts in Swedish per year have been found to belong to newspaper texts (Gellerstam 1992). A truly representative corpus of (written) Swedish would adhere to this distribution yet compromise balance within the corpus as other text varieties are underrepresented – which may skew results for linguistic phenomena that occur more often in text genres other than newspapers. Corpus size, although not being a definitional criterion, is an important factor because insufficient size may affect representativeness in particular. Together, representativeness, balance and corpus size are of utmost importance in order to
achieve external population validity in corpus linguistic research: Researchers want to make inferences about a particular speaker and text-type population – which, however, is only possible with the best possible trade-off between representativeness and balance and sufficient size of the corpus as a whole. Otherwise, whatever the systematic patterns revealed in a particular corpus are, it cannot be concluded that they will also hold for natural language data of the population outside of the queried corpus.
Metadata (also meta information) describes additional extra-linguistic information that is not encoded in the primary corpus data itself. This may be, for instance, the date and time of recordings or other contextual characteristics such as codes to subsets of the data or speaker demographics. Textual mark-up includes additional information in the text that is not part of the linguistic utterances themselves, but helps to identify further language-related information. This may include, for instance, the use of typographical formatting in written corpora (i.e., quotation marks, different fonts), paralinguistic information in spoken corpora (e.g., laughing, coughing) or when a speaker begins and finishes speaking. Textual mark-up information may, therefore, also become relevant for the analysis of pragmatic items such as non-verbal markers of speech acts or speaker evaluations. Linguistic annotation, finally, describes additional linguistically informed information that is linked to the raw language data.
These criteria can be regarded as defining a prototypical corpus, if all of the criteria apply, but corpora meeting only some of these criteria will still qualify as corpora, though in a less prototypical sense. As we shall see below, this prototypicality-based definition has advantages when it comes to separating corpus subtypes.
In addition, it is compatible with the basic distinction between specialised corpora and ready-made corpora, which is relevant for whether a corpus provides primary or secondary data. Basically, the distinction depends on the researchers performing all three basic steps in corpus research themselves or only some of them. These three steps (Rayson 2015) are:
• corpus compilation: systematic collection of texts following a specified corpus design, including preprocessing steps such as digitization, text mark-up, or transcription
• corpus annotation: adding linguistic annotation and/or metadata
• corpus retrieval: searching the annotated corpus following some research question
If the researcher compiles and annotates a corpus for a specific research question, then the corpus represents primary data. Such self-made specialised corpora are compiled for a specific research purpose (i.e., they may not be recyclable for other research questions), are often smaller in size and may not be representative of or balanced for all variants or discourse domains of a language. If, by contrast, the researcher queries a ready-made corpus to gather empirical data, the corpus represents secondary data, because compilation and annotation have been performed by other research groups, probably with
different research goals in mind. Such ready-made corpora are typically large mega corpora that allow multiple analyses for various research questions. Distinguishing between these three steps in corpus usage also helps to elucidate the focus with which different linguistic subdisciplines work with corpus data. Corpus compilation and annotation are at the heart of linguistic documentation (Chapter 3), because the compilation of self-made specialised corpora is the prevalent means of documenting an understudied language. In corpus linguistics, as introduced in the current chapter, corpus retrieval and the ways of analysing corpus data are predominant, as much work is devoted to a corpus-based falsification of linguistic hypotheses and, consequently, the use of ready-made corpora prevails. Of course, there are also projects combining all these steps in equal measure such as corpus-based grammars or projects compiling general corpora that ought to be representative for a language as a whole. In any case, when working with secondary corpus data, it is of the utmost importance to become familiar with and reflect upon the first two steps of corpus compilation and annotation. Compilation and annotation directly impact the corpus design, and determine the kind of data that can be retrieved and the kind of analysis that can be performed. There are several criteria that feed into a corpus design and help to distinguish among corpus types. Here, we explicitly discuss only criteria at a macro-level that serve to compare different corpora regarding their general corpus design. They can be distinguished from corpus-internal (micro-level) criteria that are bound to the information in the texts or their meta information. Such micro-level criteria for text selection may be, for instance, whether a text stems from an adult vs. an adolescent speaker, from a woman vs. a man, or from a novel vs. an online blog. 
Micro-level criteria of course also contribute to the overall corpus design, but are less suited to depict a typology of the main corpus types. Figure 5.2 presents a corpus typology that covers most of the macro-level criteria discussed in the various subfields of corpus linguistics. It is largely based on the typology provided by Lemnitzer and Zinsmeister (2015, 3rd edn: 137; Figure 25) in German, albeit with some minor additions. Representativeness and balance are the core concepts guiding how researchers deal with the sampling criteria in the upper part of Figure 5.2. Function means that corpora are compiled for specific purposes. For example, the British National Corpus (BNC) for English or the Deutsches Referenzkorpus (DeReKo) for German are general corpora that are supposed to reflect a language as a whole, whereas the CHILDES corpus is designed to exclusively include cross-linguistic data on child language acquisition. Language selection denotes whether a corpus contains data from one or more languages. Monolingual corpora contain entries from one language and mostly refer to monolingual native speakers of a particular language or language variety. Learner corpora (e.g., the International Corpus of Learner English (ICLE) or Falko, a corpus for German as a foreign language) are a special subtype as they
Figure 5.2 A typology of corpora
[Figure not reproduced. Labels recoverable from the figure: Sampling; Function; Language selection (monolingual corpus: ‘native-speaker’ corpus, learner corpus; bi-/multilingual corpus: comparable corpus, parallel corpus); Modality (written corpus, spoken corpus, sign language corpus, multimodal corpus, computer-mediated communication (CMC) corpus, web as corpus); Size; Persistence (monitor corpus, sample corpus); Corpus "body" (accessibility; language relation: general or reference corpus, specialised/opportunistic corpus); Editing (none; annotation types: morphology, syntax/treebank, semantics, pragmatics, errors, phonetics/phonology/orthography).]
include texts from foreign language learners, often built on specific genres like essays, grammatical exercises and tests, and other educational text forms that are based on elicitation (cf. Granger 2008). Therefore, texts in learner corpora do not qualify as authentic in the strict sense of representing spontaneous language use in a natural communicative situation, unless the educational context in language classes is considered a type of register in its own right. Typically, a learner corpus represents data from learners of a particular mother tongue learning a particular foreign language. Among bi-/multilingual corpora, we differentiate between parallel corpora and comparable corpora. Comparable corpora include texts from at least two languages or language varieties, which are selected following the same sampling procedure (identical text genres, comparable proportions of the texts within each language subset etc.). Examples are the International Corpus of English (ICE) or the INL (Instituut voor Nederlandse Lexicologie) Corpus. Parallel corpora, by contrast, explicitly align two languages with one another, by including texts from the source language and one
or more other languages, which are translations of the source language (e.g., the Oslo Multilingual Corpus [OMC]). The individual sentences in source texts and translated texts are aligned so that researchers can investigate, for example, to what extent translations differ from the original. Alignment can be unidirectional (a source language A and its translation counterpart in the target language B) or bidirectional (a source language A and its translation in language B, as well as language B as a source language and its translation in language A or C). Modality describes how the data entering the corpus were produced. Most corpora include data from written language, because they are the easiest to acquire and process. Spoken language corpora (e.g., the London-Lund Corpus of Spoken English) are less frequent, as audio recordings are more difficult to obtain in naturalistic communication settings (with high quality). They are also smaller than written language corpora, because they require transcripts to be accessible for corpus search tools. Such transcriptions have to be carried out manually for the most part, which is still a time-consuming process. The same holds for corpora of sign language (e.g., British Sign Language [BSL] corpus), in which transcriptions and annotation of video recordings have to be performed manually. Multimodal corpora include audio-visual recordings so that language can be studied in the context of non-verbal communicative signals such as gestures (e.g., the Scottish Corpus of Texts and Speech [SCOTS]). Computer-mediated communication (CMC) corpora include texts generated electronically which depend on computers and the internet as a medium of communication (e.g., Düsseldorf CMC Corpus, Thai Chat Corpus; cf. Beißwenger & Storrer 2008). The text types included are characterised by not fitting in with what is traditionally considered a written or spoken text. 
This is because, inter alia, texts such as from instant messaging, emails, or weblogs include visuo-orthographic features reminiscent of phonological information in spoken language (e.g., capitalisation or emoticons signalling emotional prosody), while lacking other features of natural face-to-face communication. Corpus size, understood as the total number of tokens in a corpus, is a crucial topic in corpus linguistics. On the one hand, it has been argued that ‘the default value of [q]uantity is large’ (Sinclair 1996: 6), because small-sized corpora are of limited use for some linguistic fields such as lexicography, the study of rare linguistic constructions that may be missed with insufficient corpus size, or for studying phenomena with a high degree of variation. On the other hand, analyses on very frequent phenomena (e.g., pronouns, function words, tense marking) may not require as large a corpus to reveal reliable results (Biber 1990, 1993). Generally speaking, adequate corpus size results from the interplay of research question, corpus representativeness, and, especially when building your own corpus, from practical aspects (Reppen 2010). As mentioned above, the corpus must be large enough to accurately represent the language (variety) relevant to the research question, as otherwise frequency estimates, for instance, may be distorted. For example, the British National Corpus (BNC), the general reference corpus of British English, contains only about 10 per cent spoken texts, so it is
not representative or balanced regarding language modality (spoken vs. written). Yet, with nearly 100 million words (referred to as tokens) in the BNC overall, even a subcorpus containing only 10 per cent of the data is sufficiently large to allow for meaningful analyses within the spoken modality (Hunston 2008). The usefulness of small-sized vs. large corpora depends on the research question: When the corpus is used to test a very specific research question (e.g., on workplace discourse or ritual speech), a smaller corpus suffices to investigate the particular context of language use. Evidently, it would then be impossible to generalise the results to language use in other contexts or to use the corpus for research questions targeting other discourse domains. A mega corpus with several hundred million words (usually a general corpus), by contrast, includes texts from various contexts or discourse domains and so may facilitate myriad different analyses focusing on diverse contextual conditions of natural language use or on the use of linguistic features across different contexts. Finally, corpus size varies between disciplines due to practical issues. In the field of language documentation, corpora are much smaller compared with many ready-made corpora of the major languages used in corpus linguistics, mainly due to the enormous efforts in data compilation and annotation that are carried out by only one researcher (cf. Section 3.4). Oral text genres in particular require the transcription of the recorded materials in addition to the elaborate annotation work on little-studied or unstudied languages (cf. Section 3.3.3.1). Corpus size includes considerations about text length as well. There is a risk that rare linguistic features in particular are not properly represented in short text samples, which, in turn, can lead to unreliable frequency counts (Biber 1990, 1993). 
Now, texts are typically included in their entire length, but if extracts are used instead of full texts, one should pay attention that, per text genre, the beginning, middle, and end parts of texts enter the corpus to an equal extent as these text parts are likely to differ linguistically (cf. Nelson 2010). The criterion of persistence is strongly related to corpus size, because it captures the extent to which the corpus can still change after its first compilation. Sample corpora are fixed in size and content, providing a snapshot of a language (variety) at a particular time and distribution. In contrast, monitor corpora change constantly, either because new data are added systematically (as in the DeReKo corpus for German) or because parts of the data are replaced with new ones due to copyright issues or ethical considerations. Therefore, analyses conducted on monitor corpora may have lower replicability, if the version of the corpus at the time of analysis is not carefully documented. Note that while the web may appear like a huge monitor corpus, there are good reasons to consider it as a corpus type of its own (see Section 5.3.4). The criterion of corpus ‘body’ captures aspects of the actual overall corpus entity. There may be different limitations on accessibility, most notably licensing for the corpus or parts of it. That is, not all parts of a corpus may be freely available to the researcher wanting to compile a corpus or to other researchers working with a ready-made corpus. Language relation describes
whether the corpus was compiled to provide a reference frame for the language as a whole, including all its varieties, thereby constituting a general or reference corpus (e.g., DeReKo for German, the BNC, the Corpus of Contemporary American English (COCA)), or whether it reflects an a priori specified variety of a language (specialised or opportunistic corpus). Given their function, general reference corpora are typically larger than specialised corpora that are developed for more specific research purposes or questions. Sometimes corpora are classified as opportunistic, because circumstances of data collection imposed limitations on achieving higher levels of balance and representativeness. The final criterion, corpus editing, describes the amount of linguistic labelling or annotation applied to the raw data, and given the central status of linguistic annotation for many corpus linguistic studies, it will be described in more detail in Section 5.3.2.
5.3.2 Corpus Editing: From Raw Data to Annotated Linguistic Categorisation
We briefly touched upon linguistic annotation in the discussion of corpus-based vs. corpus-driven approaches in Section 5.2 (cf. also Section 1.2.7), showing that the annotation type may alter the outcome of a corpus query. Hence, the quality of data annotation (if present) is a central aspect in any corpus linguistic project, be it conducted with ready-made corpora or with bespoke corpora custom compiled for the research question at hand. Because the majority of corpus analysis is based on annotated data, corpus linguistic research necessarily entails a qualitative component, even when the main research question is quantitative in nature (cf. Lüdeling 2007, 2017). Counting linguistic elements of some sort necessarily presupposes a qualitative distinction between them, as they are categorised as members of different linguistic categories. And, this is essentially what corpus annotation does: It assigns linguistic elements (e.g., words, phrases) to a linguistic category and, thereby, helps to distinguish the elements in the corpus from one another. Thus, it is linguistic annotation that makes a corpus usable for various quantitative and qualitative analyses beyond lexical single-word searches. Annotation can be performed for any linguistic level from phonetics/phonology to morphosyntax and semantics/pragmatics, and it is based on two interconnected steps, which, nowadays, are often automatically provided with larger ready-made corpora:
• tokenisation: With tokenisation, the raw data is split up into segments (tokens) that are supposed to be the smallest units of analysis. Tokens can be linked to any linguistic or non-linguistic form. Most often tokens are associated with written word forms, separated by, e.g., spaces or other formal indices a writing system may use to signal individual word forms. However, this association between tokens and written word forms is problematic in many respects. First, tokens can also include smaller units (e.g., syllables), non-verbal characters (e.g., punctuation, symbols for non-verbal units such as gestures) or larger linguistic sequences (e.g., syntactic phrases, discourse-pragmatic units). Second, even for tokens based on written word forms separated by spaces there is no one-to-one correspondence, as there are clear cases where word forms incorporate more than one word (e.g., contractions such as we’ll in English or zum (zu + dem ‘to + dative determiner’) in German) or, vice versa, where more than one word form corresponds to a concept (e.g., proper names such as Los Angeles). For spoken or multimodal corpora tokenisation is performed with the transcriptions of the spoken data that are then linked to a sound file.
• tagging: The above definition of tokens inherently implies that they are linked to distinct linguistic categories, which is known as tagging. A tag set includes an exhaustive list of ideally non-overlapping categories that can be assigned to tokens – i.e., tokens are tagged as carriers of a particular type of linguistic information. Thus, the key component of tagging is a well-defined set of tags that captures most of the forms accurately and efficiently, as otherwise a corpus query may lead to false negative (misses) or false positive (incorrect hits) results. The most frequent types of tagging are parts-of-speech (POS) tagging and lemmatisation, and many further annotation schemes build on these two. With POS tagging, each token is assigned a tag to indicate its morphosyntactic word class in a language (verbs, nouns etc.). Lemmatisation means that tags are associated with a superordinate base form or lemma. This tagging procedure is often applied when a language exhibits inflectional morphology – i.e., inflected forms belonging to the same lemma (e.g., book and books are the two tagged forms of the lemma BOOK).
A corpus can be annotated simultaneously with more than one tag set or remain untagged altogether (e.g., the Enron Email Dataset, a corpus for computer-mediated communication). There is a tendency that specialised corpora are only tagged with a tag set that is relevant for the original research question and analysis.
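The two annotation steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any particular tagger: the regular expression, the tag labels and the mini-lexicon `LEXICON` are invented for the example, and real taggers use far richer tag sets and statistical models.

```python
import re

# Hypothetical mini-lexicon: word form -> (POS tag, lemma). Illustrative only.
LEXICON = {
    "we": ("PRON", "we"),
    "'ll": ("AUX", "will"),
    "read": ("VERB", "read"),
    "two": ("NUM", "two"),
    "books": ("NOUN", "book"),
    "book": ("NOUN", "book"),
    ".": ("PUNCT", "."),
}

def tokenise(text):
    """Split raw text into tokens: clitics such as 'll, word forms, punctuation."""
    return re.findall(r"'\w+|\w+|[^\w\s]", text.lower())

def tag(tokens):
    """Attach a POS tag and a lemma to each token; unknown forms get 'UNK'."""
    return [(tok,) + LEXICON.get(tok, ("UNK", tok)) for tok in tokens]

raw = "We'll read two books."
whitespace_units = raw.split()   # what a naive split on spaces would yield
tokens = tokenise(raw)
tagged = tag(tokens)

for token, pos, lemma in tagged:
    print(f"{token}\t{pos}\t{lemma}")
```

Splitting on spaces alone treats We'll and books. as single units, whereas the tokeniser separates the contracted auxiliary and the full stop. The lemma column shows how books is linked to the base form book.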
A tag set is necessarily linguistically informed and can be defined for any linguistic domain. For instance, while POS tagging contains morphological and (sometimes) syntactic information to define a tag set such as morphological word class (e.g., adposition or particle) and syntactic distribution (e.g., postposition or preposition as types of adposition), semantic tag sets can include labels for relational concepts such as semantic role for arguments in a sentence. Annotation guidelines include information about how individual tags are defined in linguistic terms (e.g., what counts as an adjective, what counts as an adverb?)
or how ambiguous cases are handled. Therefore, it is necessary to know the annotation scheme of a selected corpus in depth, including tag set(s) and annotation guidelines, not only during corpus annotation itself, but also in order to assess the quality of the search results yielded by the corpus query. The same holds for when analysis results from different corpora are being compared with one another: If the corpora come with different annotation schemes, the comparison may be of little added value if there is no way to find a consistent additional annotation scheme that applies to the corpora. The problem of diverging annotation schemes is probably most striking for corpora with syntactic annotation, i.e., treebanks (e.g., the PennTreebank, Negra treebank for German). Syntactic annotation captures sentence structure in terms of phrase structure and grammatical relations between sentence constituents. Just as there are different formal and functional grammatical theories that substantially differ in how they describe constituent structure and grammatical relations, syntactic annotation schemes also differ as to whether they adhere to, for example, formal generative or functional dependency syntax. Linguistic annotation can be done fully automatically with software tools (taggers), fully manually or via automatic annotation that is manually checked for errors. The kind of procedure that is chosen depends on several factors. First and foremost, the level of accuracy that a tagging procedure can achieve for a given linguistic domain is critical. For example, accuracy of fully automatic tagging is generally high for POS tagging (above 90 per cent), but lower for semantic and pragmatic annotation. Second, the size of the corpus to be annotated may prohibit fully manual annotation. 
Finally, there are many taggers for a few major languages (English, German, Chinese, or French), but very few if any for the majority of lesser studied languages in the field of corpus linguistics. In general, there are two search options depending on data type (tagged vs. raw) and the availability of an advanced tool for data retrieval (see Section 5.3.3): With tagged corpora, one can perform one search for all instances of a specific tag. With raw data without any tags, one can tag the data first and then perform searches as in tagged corpora. Or one can proceed with the untagged data and perform separate lexical searches for all tokens that may fall under a hypothetical tag. This decision is a question of efficiency in terms of the project timeline.
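The difference between the two search options can be illustrated with a toy example. The sketch below is an assumption-laden simplification: the corpus, the tag label `PRON` and the word list `pronoun_forms` are invented, and the list is deliberately incomplete to show how a lexical search over untagged data can miss instances.

```python
# Toy tagged corpus: (token, POS tag) pairs; the tag names are illustrative.
tagged_corpus = [
    ("she", "PRON"), ("saw", "VERB"), ("him", "PRON"), ("and", "CONJ"),
    ("they", "PRON"), ("left", "VERB"),
]

def search_by_tag(corpus, wanted_tag):
    """Tagged data: a single query returns every token carrying the tag."""
    return [tok for tok, t in corpus if t == wanted_tag]

def search_by_forms(tokens, forms):
    """Raw data: look up each word form that may fall under the hypothetical tag."""
    wanted = set(forms)
    return [tok for tok in tokens if tok in wanted]

raw_tokens = [tok for tok, _ in tagged_corpus]
pronoun_forms = ["she", "he", "him", "it"]  # deliberately incomplete list

hits_tagged = search_by_tag(tagged_corpus, "PRON")
hits_lexical = search_by_forms(raw_tokens, pronoun_forms)
missed = [tok for tok in hits_tagged if tok not in hits_lexical]

print(hits_tagged)
print(hits_lexical)
print(missed)
```

The lexical search is only as complete as the list of forms supplied by the researcher, which is why it can be slower to set up and more prone to false negatives than a search over tags.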
5.3.3 Corpus Analysis: From Query to Evaluation
There are four broadly identifiable steps in using a corpus linguistic method with existing corpora:
• research question and definition of linguistic phenomenon to be studied
• appropriate corpus selection and search with specific query tools
• analysis: quantitative and/or qualitative, macroscopic or microscopic, synchronic, or diachronic
• statistical tests for quantitative analyses and interpretation
Having identified a research question that can be studied with corpus linguistic tools, the next step is to define what kind of linguistic phenomenon should be searched for. Thus, corpus linguistic analysis includes a coding scheme similar to studies using observational methods (cf. Section 2.2). This may not be as trivial as it first appears for two reasons. First, some phenomena may reveal considerable variation in the choice of lexical forms, which, in turn, may impact the results if some variants yield higher estimates than others. So, the coding scheme (the lexical forms used in the corpus query) should represent an exhaustive list or a balanced subset of instances for the studied linguistic phenomenon. Second, some searches may be facilitated by the use of lemmas, as the search for single word forms may be very time consuming. Thus, the choice of the object of research may partly depend on what kind of query tool and/or annotation is available with the selected corpus. Next, the researcher chooses a corpus whose design and size are appropriate for the research question. For instance, for studies aiming to examine different varieties of a language, the corpus (or corpora) should contain all the relevant varieties of that language or at least as many as possible. For studies with multiple corpora, the annotation systems must be comparable, as meaningful comparisons across the corpora will be difficult or even impossible otherwise. Further considerations in corpus design (e.g., speaker groups, registers and time periods covered by the corpus) and access to the full corpus apply to all research projects. When quantitative estimates are of special interest, then corpus size also matters. Larger corpora are better suited to provide enough instances of rare phenomena; hence, large ready-made corpora may be a better choice. For other phenomena, self-made specialised corpora of smaller sizes may be better as they limit the query to the relevant genres or registers. 
But this is more time consuming and may not allow for replicative work if that corpus is not made available to other researchers. The researcher then performs a corpus query by using specific software or query tools. Indeed, the availability of such tools is a unique feature of corpus linguistic analyses. Some corpora bring their own query tools to facilitate searches (e.g., BNCweb for the BNC, Cosmas II for DeReKo), while others require that the researcher uses third-party software to process a corpus (e.g., WordSmith, AntConc, ELAN). Some of them can be used for data annotation (e.g., AntConc for POS tagging, ELAN for tagging at various levels), others for displaying and initial analysis of the data, and others serve multiple purposes (e.g., ELAN can be used for tagging and search). Tools for data display and initial analysis are known as concordancers. They are used to retrieve the data from the corpus and display them to facilitate analysis, in a display format called concordance line, which displays the target item (a word, morpheme, phrase etc.) and the immediate context to its left and right on a single line. The most common way of displaying concordance lines is via the Key Word In Context (KWIC) concordancing in which the concordance lines are centred for the item that has been searched for (the keyword) and listed (cf. Figure 5.3). Concordance lines
Figure 5.3 Concordance lines (from COSMAS II for DeReKo)
can then be listed randomly or alphabetically or they can be sorted in a manner contingent on more sophisticated contextual parameters such as the syntactic category of a word to the left or right of the keyword. In this way, concordance lines already give a first visual impression of what instances and possible patterns there are in the corpus. These can then be further explored statistically and interpreted in qualitative terms. Figure 5.3 illustrates a random selection for the German keyword ‘Dealer’, a loanword from English that, however, exclusively denotes drug dealers in German and, hence, has a narrower meaning than in English. In a next step, further types of analyses can be performed to shed light on the distribution, the contextual motivation and linguistic specification of systematic occurrences (see Sections 5.3.3.1 and 5.3.3.2). In general, these analyses proceed along at least three dimensions, each involving a continuum between two poles:
• quantitative distribution and qualitative function: At a very basic level quantitative analyses correspond to some kind of frequency list, whereas qualitative analysis types involve further scrutiny of concordance lines. Corpus analyses are often associated with purely quantitative research, but it is important to keep in mind that quantitative and qualitative analyses go hand in hand in corpus linguistics, mainly for two reasons. First, linguistic annotation is a qualitative categorisation system that provides the basis for the quantitative analysis of how variants for a category are distributed in a corpus. Second, quantitative estimates of an element’s distribution in different contexts inform qualitative analyses about which contexts favour, for instance, certain meanings over others.
• macroscopic and microscopic approach (Biber 1988): This distinction refers to the unit of analysis. In adopting a microscopic or variationist perspective, researchers are interested in describing the existing variants of a linguistic element. With a macroscopic or textlinguistic approach, researchers analyse the differences between text genres/language varieties/registers with respect to a linguistic element. For example, a researcher may be interested in the development of nominal structure in monolingual children, and would therefore search for all variants of nominal or nominal-like elements in a corpus on child language (e.g., CHILDES). This would follow the first goal of describing all existing variants of a phenomenon. This approach is often inductive as it can be used without prior hypotheses. If, instead, the researcher were interested in whether there are functional differences in the use of, for example, restrictive vs. appositive relative clauses between different text genres or registers, this analysis would be guided by the second goal above. Research projects in which two or more corpora are being compared with respect to what they reveal about the function of the variants of a linguistic element also relate to the microscopic approach (cf. Meyer 2002; Biber & Jones 2009).
• variation in space and variation in time: Synchronic studies compare several language varieties, genres, or registers at a single point in time, emphasizing regional or sociolectal variation. Diachronic studies compare a language variety, a genre, or a register over time, hence, focusing on temporal variation within a region or sociolect.
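Since all three dimensions start from concordance lines, it may help to see how a KWIC display of the kind described earlier in this section could be produced. The sketch below is a minimal illustration and not the algorithm of any existing concordancer: the function names `kwic` and `show`, the window size and the sample sentence are assumptions made for the example.

```python
def kwic(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for each hit of keyword."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines

def show(lines, width=25):
    """Centre the keyword by right-aligning the left context."""
    return [f"{left:>{width}}  {kw}  {right}" for left, kw, right in lines]

# Invented sample text, already tokenised on whitespace.
text = ("the dealer sold the car and a dealer bought it while "
        "another dealer waited").split()

lines = kwic(text, "dealer")

# Sort the concordance lines by the first word to the right of the keyword.
by_right = sorted(lines, key=lambda line: line[2].split()[:1])

for row in show(by_right):
    print(row)
```

Sorting by the right-hand context groups hits that are followed by the same word, which is one simple way of making recurrent patterns visible before any statistical analysis.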
Analysis types based on these dimensions can, in principle, be flexibly combined with one another, but, in actual research, there are preferences, which is mainly due to practical considerations. For example, a macroscopic approach prefers the study of synchronic variation, while the microscopic approach is compatible with synchronic and diachronic investigations. Diachronic studies often include more qualitative research components given the somewhat smaller amount of data. No matter which analytical dimension prevails, there is a basic, somewhat technical, principle of analysis applying to all kinds of analyses: exhaustiveness. In order to avoid false positive (i.e., assuming a systematic pattern in the data where there is none) or false negative findings (i.e., missing a systematic pattern), analyses are usually conducted on the full corpus or on randomly chosen subsets of the data that are representative of the entire corpus, when, for example, the research targets high-frequency forms (e.g., pronouns, conjunctions) that would yield too many hits to allow a time-efficient analysis (cf. Baker 2010a). Exhaustiveness entails that new discoveries during the analysis should also be recorded and the coding scheme adjusted in a transparent way. Finally, for quantitative analyses, inferential statistical tests (cf. Section 1.2.8) are necessary to determine when two data patterns reliably differ from each other and, hence, are allowed to be interpreted as a significant result. As with other subdisciplines that use inferential statistics (cf. Chapters 7 and 8), there are many types of statistical tests that test partly different aspects of the data and that exhibit varying degrees of sensitivity to distortions in the data (e.g., very infrequent items that may even stem from typographical or annotation errors; cf. Biber & Jones 2009; McEnery & Hardie 2012: chapter 2.6). 
The correct application of statistical methods requires at least basic knowledge in statistics and preferably also advanced knowledge of the standard tests in the field.
In the following sections, we will introduce the most common types of analysis, beginning with more quantitatively oriented analysis techniques (Section 5.3.3.1) and followed by more advanced quantitative and qualitative techniques (Section 5.3.3.2).
5.3.3.1 Common Quantitative Analyses
The most common analyses are quantitative measures of the number of occurrences of a particular target unit. That is, they measure the frequency of words or multi-word expressions/sequences (e.g., phrases, idioms) within a corpus or in comparison to another corpus. These analysis types are:
• frequency list
• type-token ratio
• keyword list
• collocation analysis (collocation, colligation, semantic preference, semantic prosody)
• n-gram
A frequency list displays the frequency of all words in a corpus, either in alphabetical or ranked order from most frequent to least frequent. Such lists are sometimes integrated into dictionaries (e.g., the German online frequency dictionary wortschatz.uni-leipzig.de), so that dictionary users also get a glimpse of how frequently a word occurs across (mostly written) text types or language varieties. An important aspect to consider when frequency lists are compared for different corpora is that the raw frequency values (absolute values) need to undergo some form of normalisation – i.e., a correction for differences in the overall size of the corpora being compared. Such a normalised frequency is obtained by first dividing the raw frequency value of a target word by the overall word count of the corpus. The result is then multiplied by a base of normalisation, e.g., 1,000 or 1,000,000, to give the target word’s average occurrence per thousand or million words. In this way, normalised frequencies for one or more target words in different corpora can be directly compared to one another, independent of the size of the corpora being compared (Biber & Jones 2009: 1299).
Two further frequency counts that have a relational or comparative flavour to them are the type-token ratio and keyword lists. The type-token ratio (TTR) indexes how varied the vocabulary of a corpus is. The total number of types (unique word forms associated with a lemma) is divided by the total number of tokens (individual occurrences of word forms) in a corpus. The closer the TTR is to 1, the more varied is the vocabulary used in a stretch of text. This kind of analysis is useful for investigations on language development (cf. Diessel 2008). Keyword lists also measure the frequency of single words, but always include a comparative aspect (cf. Evison 2010; Gatto 2014). Keywords are words that occur significantly more frequently (positive keywords) or less frequently
(negative keywords) in a target corpus in comparison to a reference or benchmark corpus. In many cases the target corpus is a small-sized specialised corpus, while the reference/benchmark corpus is larger in size (often in the form of a general reference corpus). The underlying rationale is that when a keyword is significantly more (in)frequent in the target corpus compared to the reference/benchmark corpus, then its use is tied to a certain context of use. Significant differences in keyword distributions can be statistically assessed with t-tests (evaluating the confidence of whether a difference is reliable) or the mutual information (MI) score (assessing the likelihood or strength of two words occurring together). It is also possible to use this approach to investigate the ‘keyness’ of multi-word expressions; in this case, the technique is sometimes referred to as a key-cluster listing. Keyword lists are particularly useful in stylistics or forensic linguistics that compare different text genres, registers or speakers with one another, with the aim of identifying author-specific characteristics of vocabulary use.
There are also several indices measuring the systematic co-occurrence of two or more words within the same co-text – i.e., those words tend to co-occur in the same co-text more often than would be expected if their distribution were determined purely by chance. These multi-word expressions or multi-word sequences typically include expressions such as idioms (e.g., kick the bucket) or set phrasal constituents (e.g., all the best), but also lexical items that exhibit some affinity to each other (e.g., typical modifier-head combinations such as the alleged offender in juridical contexts). Such types of co-occurrence are subsumed under the umbrella term collocation analysis, which can be regarded as a summary statistic of the co-text surrounding a target item.
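The frequency measures introduced earlier in this section (ranked frequency list, normalised frequency, and type-token ratio) can be illustrated with a minimal sketch. The two toy corpora, the whitespace tokeniser, and all function names are assumptions made for illustration only; real corpus tools tokenise far more carefully.

```python
from collections import Counter


def tokenize(text):
    """Naive tokeniser: lower-case the text and split on whitespace."""
    return text.lower().split()


def normalised_frequency(raw_count, corpus_size, base=1_000_000):
    """Raw frequency divided by corpus size, multiplied by the normalisation base."""
    return raw_count / corpus_size * base


def type_token_ratio(tokens):
    """Number of types (distinct word forms) divided by the number of tokens."""
    return len(set(tokens)) / len(tokens)


# Two toy corpora of different sizes (invented example sentences).
corpus_a = tokenize("The cat sat on the mat and the dog sat on the rug")
corpus_b = tokenize("The bird sang")

freq_a = Counter(corpus_a)
freq_b = Counter(corpus_b)

# Frequency list in ranked order, most frequent word first.
ranked_a = freq_a.most_common()

# Occurrences of "the" per thousand words in each corpus.
the_per_1000_a = normalised_frequency(freq_a["the"], len(corpus_a), base=1_000)
the_per_1000_b = normalised_frequency(freq_b["the"], len(corpus_b), base=1_000)

ttr_a = type_token_ratio(corpus_a)

print(ranked_a[:3])
print(round(the_per_1000_a, 1), round(the_per_1000_b, 1))
print(round(ttr_a, 2))
```

In this toy example the raw count of ‘the’ is higher in the first corpus, but its normalised frequency is higher in the second, which is why raw counts from corpora of different sizes should not be compared directly. Note also that the TTR tends to fall as a text gets longer, so it is best compared across samples of similar length.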
In general, a collocation analysis assumes a bi-partite structure with a node (a target word) and its collocates (neighbouring words that systematically co-occur with the node to its left or right). The span within which collocates to a node can be searched for can be set freely by the researcher. In this sense, collocation analysis essentially informs the generation of the keyword lists described above. Collocation analysis is a ubiquitous analysis type in corpus linguistics. It always involves inferential statistics to identify significant collocations, via probability measures such as log-likelihood tests, z-score tests or t-tests, and the MI score (see above) to test the associated collocational strength. As speakers’ intuitions about what is or is not a collocation are often unreliable, inferential statistics are of the utmost importance to correctly identify collocations, and so many analysis tools include such inferential tests as built-in functions. The literature distinguishes between four subtypes of collocation, depending on what kind of relation between node and collocates is being investigated (Xiao 2015).
• Collocation in a narrow sense is used when the relation between node and collocates is captured in lexical-semantic terms (i.e., there is a relation between node and further words).
• Colligation captures the relation between a node and its morphosyntactic environment – i.e., the morphosyntactic or grammatical categories of collocates. For example, the difference between adjectives and adverbs can be stated as a difference in whether they colligate with nouns or verbs.
A more qualitative version of collocation analysis is brought about by investigating the meaning of collocations, specifically the semantic relations between node and collocates that may give rise to pragmatic connotations or multi-word patterns signalling a preferred semantic combination of words. Hence, this is achieved by analysing the semantic content of collocates.
• Semantic preference means that the node word is preferentially combined with collocates from (any) specific semantic fields without taking on an additional connotation from the collocates. For example, in a study of the near synonymous adjectives ‘big’, ‘large’, and ‘great’, Biber and colleagues (Biber, Conrad & Reppen 1998: 43–51) found that ‘big’ preferably combines with nouns denoting physical size in academic and fiction prose. In fiction texts, ‘large’ also combines most with nouns denoting physical size, whereas it combines with nouns of quantity and amount in academic prose. Finally, ‘great’ also refers to quantities and amounts across the two text genres/registers, but has a wider range of senses than the other two adjectives.
• Semantic prosody describes speaker attitude or evaluative-emotive content (negative, positive, or neutral), expressed with a node word when it is put in a context with collocates that have a particular evaluative-emotional meaning component (cf. Louw 1993). For example, the English lemma CAUSE has been found with collocates of negative association (e.g., cause of death; Stubbs 1995), and, hence, a critical assumption for language productivity that follows from this observation is that speakers would use CAUSE whenever they intend to convey this negative prosody.
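The node, span, and association measures mentioned above can be made concrete with a small sketch. It counts how often a collocate falls within the span around a node and derives an MI score and a t-score from the observed and expected co-occurrence counts. The formula for the expected count is one common simplification and is an assumption here, as are the toy sentence and all function names; corpus tools differ in the exact definitions they implement.

```python
import math
from collections import Counter


def cooccurrence_counts(tokens, node, span):
    """Count the words found within `span` tokens to the left or right of each node occurrence."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        start = max(0, i - span)
        end = min(len(tokens), i + span + 1)
        for j in range(start, end):
            if j != i:
                counts[tokens[j]] += 1
    return counts


def association_scores(tokens, node, collocate, span):
    """Return observed count, expected count, MI score and t-score for a node-collocate pair."""
    freq = Counter(tokens)
    observed = cooccurrence_counts(tokens, node, span)[collocate]
    # Assumed chance model: co-occurrences expected if both words were spread
    # evenly over the corpus, given a window of 2 * span positions per node.
    expected = freq[node] * freq[collocate] * 2 * span / len(tokens)
    if observed == 0:
        return observed, expected, None, None
    mi = math.log2(observed / expected)
    t_score = (observed - expected) / math.sqrt(observed)
    return observed, expected, mi, t_score


# Invented toy text; a real analysis would use a full corpus.
tokens = "strong tea and strong coffee but powerful engine and strong tea again".split()
observed, expected, mi, t_score = association_scores(tokens, "strong", "tea", span=1)
print(observed, expected, mi, round(t_score, 3))
```

Scores computed on a text this small are not interpretable; the sketch only shows how span, observed frequency, and expected frequency enter the calculation.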
Some research approaches derived from corpus linguistics (such as computational linguistics) employ a concept strongly related to collocations: n-grams (also lexical bundles, clusters or recurrent combinations). ‘N’ refers to the number of elements in the multi-word expression. When the number of words is set to two (2-gram) for an analysis of English prepositions, results would include sequences such as ‘instead of’. When set to three (3-gram), results would include three-word sequences such as ‘in lieu of’ in English, consequently allowing for the analysis of contextual differences in use between these two sequences with similar meaning. When combined with concordance lines, concgrams can reveal recurrent patterns, both continuous and discontinuous. They are therefore an interesting tool for phraseologists who are interested in whether additional material can intervene between the constituent parts of a phrase or idiom.
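Extracting contiguous n-grams amounts to sliding a window of n tokens over the text and counting each sequence. The following minimal sketch uses an invented example sentence and a hypothetical helper function `ngrams`; it covers contiguous sequences only, not discontinuous patterns.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous sequences of n tokens, in order of occurrence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Invented example sentence, split on whitespace.
tokens = "she came instead of him and he left in lieu of notice instead of staying".split()

bigram_counts = Counter(ngrams(tokens, 2))
trigram_counts = Counter(ngrams(tokens, 3))

print(bigram_counts[("instead", "of")])
print(trigram_counts[("in", "lieu", "of")])
```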
5.3.3.2
Further Types of Analyses
The types of analysis introduced in this section also investigate systematic patterns of language use and their variation within and across language varieties, but deviate from the types of analysis presented in the previous section on methodological grounds regarding the basic research approach or specifics of the analysis:
• a cross-domain analysis: collostructional analysis
• a multi-method approach: corpus-assisted discourse analysis
• a multivariate analysis: multi-dimensional (MD) analysis
Collostructional analysis is a variant of collocation analysis that exclusively focuses on the interplay of a lexeme and a syntactic construction (hence, the blend of collocation and construction; Stefanowitsch & Gries 2003; Gries & Stefanowitsch 2004a, 2004b; Gries 2013b). As such it relates to the debate on whether syntax provides grammatical rules to combine lexical meanings during sentence composition or whether syntactic constructions can have their own meaning (Goldberg 1995). The approach investigates the reciprocal relationship between items stored in the mental lexicon (e.g., verbs) and syntactic constructions, specifically to what extent a lexical item prefers some constructions over others or, vice versa, to what extent a syntactic construction attracts or repels a lexical item over others. A prime example stems from the dative alternation with ditransitive verbs in English (i.e., the choice between double object or prepositional object structures) that show variable linking of verbal arguments and syntactic function, e.g., (a) ‘Lisa gave Mary the apple’ vs. (b) ‘Lisa gave the apple to Mary’. Collostructional analysis is used, for example, to identify which ditransitive verbs (e.g., ‘give’, ‘take’, ‘send’) favour a double object structure (a) or a structure with a prepositional object (b) or whether the two syntactic structures prefer ditransitive verbs to a different degree. It therefore helps to describe variation at the interface of lexical and construction-based knowledge in more detail.
Corpus-assisted discourse analysis is a multi-method approach that combines qualitative and quantitative analyses (cf. also Section 2.5 on mixed methods) and originates from critical discourse analysis (cf. Section 6.1).
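To illustrate the collostructional logic described earlier in this section, the attraction between a verb and a construction can be sketched as a test on a 2×2 table of corpus counts. The sketch below uses a one-sided Fisher exact test as the association measure; that choice, the function name, and all counts are assumptions made for illustration and are not taken from the studies cited.

```python
from math import comb


def fisher_attraction_p(a, b, c, d):
    """One-sided Fisher exact p-value for a 2x2 table.

    a: target verb in the target construction
    b: target verb in other constructions
    c: other verbs in the target construction
    d: other verbs in other constructions
    Returns the probability of seeing at least `a` co-occurrences by chance.
    """
    verb_total = a + b
    construction_total = a + c
    n = a + b + c + d
    k_max = min(verb_total, construction_total)
    favourable = sum(
        comb(verb_total, k) * comb(n - verb_total, construction_total - k)
        for k in range(a, k_max + 1)
    )
    return favourable / comb(n, construction_total)


# Hypothetical counts: 'give' in the double object construction vs. elsewhere.
a, b, c, d = 40, 60, 60, 840
expected = (a + b) * (a + c) / (a + b + c + d)  # co-occurrences expected by chance
p_value = fisher_attraction_p(a, b, c, d)

print(expected)
print(p_value < 0.001)
```

With these invented counts the verb occurs in the construction four times as often as chance would predict, so the p-value is very small and the verb would count as attracted to the construction.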
The starting point of a discourse analysis is a qualitative analysis of a small number of texts (also called close reading) that seeks to identify recurrent patterns and meanings conveyed by speaker groups in (mostly) socio-political discourse who intend to signal social power (a)symmetries. Such qualitative analyses are then complemented by quantitative estimates derived from keyword lists and keyword clusters or collocations, as extracted from specialised corpora. This especially facilitates the comparison between the language use of different discourse participants or between different discourse types (e.g., various newspapers, academic fields). Such variationist analyses can be performed in an intra-corpus manner or by comparing several corpora. Some typical research questions have
investigated the semantic connotations (cf. Section 5.3.3.1 on semantic prosody and semantic preference) of gender stereotypes and how discourses vary in expressing them. Others have looked at how different media involved in a particular socio-political discourse shape concepts for sociological variables such as ethnicity by using specific lexical patterns (see Baker 2010 for an excellent overview).
The third approach, multi-dimensional (MD) analysis, is characterised by its particular statistical approach (multivariate analysis) that takes multiple linguistic features into account simultaneously. As summarised in McEnery and Hardie (2012: 104–115), MD analysis resulted from the observation that variationist studies had produced partly contradictory results, and located the source of this divergence in the heterogeneous use of linguistic features across the studies (Biber 1986). It uses a statistical procedure named factor analysis which tests whether several linguistic features such as aspect, nominalisations, or noun modifiers co-vary with one another. Features that co-vary form a cluster and are then interpreted as a single functional factor. Factors describe linguistic variation in a particular dimension that is independent of other clusters. In an early MD study, Biber (1988) analysed 67 features and identified five statistically significant dimensions for spoken and written Standard English (e.g., dimension 2: narrative vs. non-narrative concerns, dimension 3: explicit vs. situation-dependent reference). He also ranked registers (understood here as a group of texts defined by their context of use, purpose, etc.) on each dimension, which yields a typicality ranking of registers per dimension.
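The notion that linguistic features co-vary can be illustrated with pairwise correlations between feature rates across texts. This is not a factor analysis; it only shows the kind of co-variation that a factor analysis takes as its input. The feature names, the normalised rates, and the helper function are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical normalised rates (per 1,000 words) of three features in four texts.
rates = {
    "past_tense": [10, 20, 30, 40],
    "third_person_pronouns": [12, 22, 29, 41],
    "present_tense": [40, 30, 20, 10],
}

# Correlation for every pair of features.
features = list(rates)
correlations = {
    (a, b): pearson(rates[a], rates[b])
    for i, a in enumerate(features)
    for b in features[i + 1:]
}

for pair, r in correlations.items():
    print(pair, round(r, 3))
```

In this invented data set the first two features rise and fall together, while the third runs in the opposite direction. A factor analysis would group such positively and negatively co-varying features into the two poles of a single dimension.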
Using this set of dimensions, MD analysis has subsequently been applied to corpora in diachronic studies to shed light on the development of stylistic features in different registers or in cross-linguistic investigations to highlight similarities and divergences of registers in genetically different languages (Biber & Finegan 1989; Biber 1995). Finally, the approach can also be applied to specify features of understudied or new text types and registers, hence, creating further register-specific dimensions (see McEnery & Hardie 2012 for further discussion).
5.3.4
The Web as and for Corpus
The internet has undoubtedly changed human communicative behaviour (cf. computer-mediated communication corpora in Section 5.3.1), but it has also affected linguists’ access to various, open-ended sources for their research. It is an electronic repository of enormous amounts of authentic language data produced in fully natural contexts. On the one hand, researchers can use the web as a corpus, making queries either via general search engines or by means of linguistically more advanced metasearch tools that work on top of search engines such as WebCorp performing online concordancing and word lists. This is known as the web as corpus approach. Obviously, the use of commercial search engines can be problematic because they are not designed with corpus linguistic criteria of data collection in mind, which may lower the precision and
recall of data extraction or prohibit some searches (Bergh & Zanchetta 2008; Lew 2009; Diemer 2011). For instance, search engines would typically only return one hit per web page (even if there are multiple hits), rank the results list based on factors unrelated to the research question, or only search within indexable web pages.
On the other hand, researchers can also use the internet to systematically collect texts to compile a corpus for their research question at hand (cf. Sharoff 2006a, b). For example, researchers may pre-select thematically grouped webpages and extract linguistic information from them (e.g., digital newspaper websites, literary archives such as Project Gutenberg). This is known as the web for corpus approach. This approach is associated with tools that automatically search the web for relevant contents matching criteria set by the researcher, downloading the web content and compiling the corpus (e.g., KWiCFinder; Fletcher 2001, 2004). An advantage of this approach is that the data can be re-used in other research projects or be subjected to reproducibility studies in order to validate research findings. This addresses the omnipresent problem that the internet is in flux, with pages being updated, added or deleted, and with indexing and search strategies of commercial search engines being modified.
There are in fact good reasons to work with the web as corpus both content-wise and regarding data availability:
• mega corpus of infinite size
• continuous updating
• free and easy data access
First of all, the internet is an ever-growing mega corpus that is unparalleled when compared to any conventional corpus to date. With its dynamic composition and growth, the internet is an optimal database to study rare phenomena for which conventional corpora may not allow for a sufficient or only a biased number of observations. Relatedly, its continuous updating makes the study of language use in all its facets possible in a more timely manner. Together, growth rate and continuous updating can provide a speedy supply of new data for research that cannot be delivered as quickly by sample or monitor corpora requiring months or years for compilation. Thus, the fast growth rate and the increasing access to online communication worldwide make it possible to study linguistic productivity or change in virtually all linguistic domains as it happens (Leech 2007; Diemer 2011). Finally, the web user has free and easy access to much, if not most, of the public indexable web pages, which in turn mitigates research biases and discrimination due to the availability of funding to purchase access rights etc. However, pursuing the web as/for a corpus approach also necessitates a careful reflection of the most important criteria in corpus design (see Section 5.3.1) – authenticity, representativeness, balance, and size – because the internet has not been compiled following any design criteria as is common for conventional corpora (cf. Gatto 2014: 41–63):
• authenticity: The internet contains noisy data in the sense of unedited texts, including, e.g., user errors such as grammatical mistakes or misspellings and their possible corrections, or formatting and coding errors affecting the representation of special characters (such as ‘mojibake’ where special characters are garbled or the replacement of ‘ß’ by ‘ss’ in German texts). This may be problematic in cases where errors cannot be easily identified as such or where there is variation in what counts as an error. For example, errors from non-native speakers of a language may be highly informative – which is one reason why there are learner corpora – but they may distort the data when native speaker competence is of interest. Thus, this necessitates systematic metadata identifying speaker groups, which, however, are not systematically available in the case of many web pages.
• size: There are only approximate estimates of the size of the web, and there is no consensus on the unit of size calculation: is it, e.g., the number of web pages or bytes of data? From an analytical perspective, the overwhelming size of the web poses additional challenges to linguists. Larger amounts of data can reveal more significant patterns, i.e., a more fine-grained picture overall, but many of these may be unexpected, calling for more exploratory work and possibly different statistical evaluation. Also, if the exact size of the corpus is unknown, what do the quantitative frequency estimates mean, also considering that many of them will be based on a sample of the web?
• representativeness: The notion of representativeness implies that it is a direct consequence of the sampling procedure adopted for text selection and corpus compilation. In this sense, the web, as anything but a carefully designed sample, cannot be regarded as a representative snapshot of a language (Leech 2007). However, its size allows researchers to compile web-based specialised corpora for variants of rare or newly emerging phenomena that would be harder to find and analyse in conventional corpora.
• balance: The web’s composition and its degree of balance have been debated for decades, as it remains far from clear what categories should be used to classify its subparts (see Kilgarriff 2001a; Bergh & Zanchetta 2008; Gatto 2014). Two of those subparts stand out in the discussion: medium of communication and languages covered. The web encompasses a broad range of texts from those with an analogue correspondent (e.g., literary archives) to those that are unique to the web (e.g., Twitter), many of them crossing the traditional boundaries of written vs. spoken texts. So, the medium of communication has an essentially different meaning for web-as-corpus studies. While the internet has long been recognised as predominantly being a ‘corpus of English’ (Bergh & Zanchetta 2008: 313), it is now acknowledged
that it has developed into a multilingual corpus, with major and minor languages as well as a faster growth rate for languages other than English (de Schryver 2002; Gatto 2014).
In summary, the web-as/for-corpus approach bears great promise for the study of authentic language use, especially if linguists aim to study language productivity and change as they happen. This sets apart the web as corpus approach from the usage of CMC corpora that assemble texts of computer-mediated communication (cf. Section 5.3.1) and are mostly sample corpora of a fixed and smaller size. However, the wealth of unstructured and uncontrolled data from the internet also poses a number of challenges to researchers when it comes to data extraction and meaningful interpretation of the research findings.
5.4
Basic Research Findings
Research in corpus linguistics is interested in the distribution or frequency of linguistic patterns across and within language varieties or discourse domains, be it very frequent or very infrequent ones. Hence, corpus linguistics provides descriptive analyses of patterns of language use and language-internal variation. Most of the research can be classified as quantitative, but for research targeting discourse practices and for research based on annotated corpora, a qualitative perspective is equally important. Discourse practices can only be fully understood against the social background of the discourse participants. And in general, any calculation of frequency patterns of distinct linguistic units in an annotated corpus hinges on a thorough definition and annotation of what counts as one unit versus another. So, meaningful quantitative estimates are only possible on the grounds of a carefully implemented annotation. In order to interpret quantitative research results, researchers have to bear in mind the following:
• The analysis outcome (e.g., the distribution of a linguistic unit) depends on the criteria for corpus selection or compilation (i.e., its representativeness, balance, size, quality of annotation, etc.) and the quality of the corpus query (i.e., being exhaustive, exclusion of irrelevant hits, etc.).
• The absence of positive evidence for a pattern or distribution does not constitute absolute evidence against that pattern being part of language grammar. Such evidence can only be obtained in combination with elicitation or experimental designs (cf. Lemnitzer & Zinsmeister 2015³: 51–54).
• Corpus linguistics is concerned with language as a product that results from cognitive processes. This means that, on the one hand, corpus editing and annotation will be biased towards a certain language code (e.g., the standard variety of a language) and, on the other, that corpus data tend to represent the final product of language, but not the intermediate stages. Thus, process-related aspects of language use (errors such as misspellings, improper use of meanings, corrections, etc.) may not be accessible for analysis in every corpus type (due to editing and lack of annotation). Again, this can be complemented with other research methods (e.g., from psycholinguistics and cognitive linguistics, cf. Chapter 7).
• Combining corpus-linguistic evidence with experimental or elicitation-based evidence provides a unique opportunity for cross-methodical comparison – and, hence, cross-disciplinary research – which may yield insights into phenomena that elude a systematic investigation with only one empirical research method. An obvious example is the influence of experimental task or observer bias on the data, which can be assessed indirectly via comparison with corpus data being typically of natural and spontaneous origin.
• A final aspect concerns the reproducibility and replicability of research findings. Results are reproducible if they can be obtained by repeating an analysis with the original data and analysis code (but independent of the original investigator), while replicability is defined as obtaining consistent results across different studies that pursue the same research question. The trend of compiling new corpora for specific research questions makes it more difficult to assess corpus quality with respect to whether a corpus enables consistent results for similar research questions. A similar point can be made for the web as corpus, especially for corpora created ad hoc or corpora based on hits from what later becomes a broken link.
Corpora have a strong impact on empirical linguistics as most corpora can be analysed to answer a variety of research questions from different linguistic domains and subdisciplines. This is why we did not classify corpus-based research as a source of primary data (cf. Section 1.1.4), but as primarily providing secondary data. Also, in contrast to documentary linguistics where the focus is on corpus compilation, i.e., the generation of primary data (cf. Chapter 3), corpus linguistics also puts particular emphasis on corpus analysis. The following list gives an overview of research outcomes in various linguistic subdisciplines which are obtained, among others, by corpus linguistic analysis, with the strength of impact varying amongst the different subdisciplines:
• usage-based grammars of major languages as well as of documented languages (descriptive linguistics, cf. Chapter 3)
• lexicography of major languages as well as of documented languages (descriptive linguistics, cf. Chapter 3)
• diachronic language change (historical linguistics)
• synchronic language-internal variation: variety-specific or speaker-specific variation (sociolinguistics, cf. Chapter 6)
• synchronic cross-linguistic variation: analysis of parallel texts (comparative or typological linguistics, cf. Chapter 4)
• discourse and pragmatics
• language and cognition (psycholinguistics, cf. Chapter 7)
• language learning and teaching (applied linguistics)
• natural language processing (computational linguistics)
Traditionally, linguistics has always been concerned with describing and explaining the grammar of a language or with writing a reference grammar for an as-yet unexplored language. Corpora have had a major impact in this field, especially for major languages, as they contributed to projects developing usage-based grammars. Usage-based grammars do not merely describe the system of a language, but also provide information about the frequency of grammatical forms in that language. Well-known examples are the corpus-based grammar of Modern English (Biber et al. 1999) and the reference grammar of the English-based Creole Tok Pisin (Verhaar 1995). This has had repercussions in theoretical linguistics, initiating debates on the binary vs. gradient nature of linguistic concepts. In this way, grammars became more similar to dictionaries in the field of lexicography, where the frequency of word meanings and senses, besides the list of word meanings themselves, has always been a significant piece of information. Dictionaries may also contain collocational information (e.g., Davies & Gardner 2013 for contemporary American English).
Related to this, corpus analysis also gains importance in the field of descriptive linguistics (cf. Chapter 3), where documentary corpora are not only compiled to be stored in archives but used as a data foundation for the description of the vocabulary and grammar of endangered languages. While these corpora may not be typical when their size is compared to the size of ready-made corpora for major languages, they typically include authentic natural data and balance various text types and genres. To the extent that these are complemented by elicited data, they allow for interesting methodical comparisons between natural language data and elicited data (e.g., judgements or text editing by informants), if elicited data is treated as a separate text type (cf. Mosel 2014).
This can shed light on the correspondence between elicitation methods and corpus linguistics methods in terms of linguistic forms and their functions. Another prominent field where corpus linguistics methods are heavily used is the study of language change and variation, both in time (diachronic language change studied in historical linguistics) and in space (e.g., language-internal and cross-linguistic types of synchronic variation as studied in comparative or typological linguistics). For example, many studies in historical linguistics have explored changes in encoding pragmatic information such as speech acts across time. With multi-dimensional (MD) analysis, changes in significant markers of text registers have been investigated from a diachronic perspective. For investigations of synchronic variation, one focus is on the text as the unit of analysis. For instance, investigating the ‘World Englishes’ is probably the most prolific
research field within variationist English Linguistics and has resulted in large corpora including several English varieties (e.g., the International Corpus of English). When the individual speaker and their style is the central unit of analysis, as in corpus-based variationist sociolinguistics or dialectology (cf. Chapter 6; Szmrecsanyi 2017), spoken corpora in particular have proved to be significant tools, because they provide sociolinguists and dialectologists with large amounts of data. In addition, large general corpora such as the BNC not only include demographically diverse speaker profiles, but also provide detailed metadata on the speech genres. However, work with spoken data often needs speech annotation prior to analysis, so the number of sociolinguistic studies using corpora is somewhat smaller than sociolinguistic studies using other research methods. Finally, areal distributions of linguistic features are increasingly investigated from a cross-linguistic or typological perspective by means of parallel corpora or distributional analyses (cf. Chapter 4). Pragmatics and discourse studies have benefited from corpora well before the rise of multimodal corpora that facilitated exploring language in natural discourse (cf. Baker 2006, 2010b; Aijmer & Rühlemann 2017). When corpus linguistics methods are combined with analysis techniques from critical discourse analysis (CDA) or conversation analysis, this essentially shows the benefits of combining quantitative and qualitative research. On the one hand, the qualitative text analysis from CDA, for instance, can reveal an in-depth description of how language signals social relations between different groups of speakers in a particular context (cf. Chapter 6), while quantitative estimates from corpora add to the generalizability of the results. 
In disciplines such as stylistics and forensic (socio)linguistics, quantitative analyses such as keyword lists can help identify the language style of authors or individuals. Such studies are often devoted to specific text genres, such as poetry and drama in stylistics or, in forensic linguistics, blackmail letters or plagiarised texts where the writer needs to be identified.
Corpora also have a growing impact in research on language cognition (cf. Chapter 7). Language acquisition and language production research has traditionally relied on specialised learner or developmental corpora, investigating systematic (error) patterns in speech and/or language in the recorded utterances. Similarly, there is a growing field of corpus-based cognitive linguistics in which corpus data are used to investigate the cognitive concepts underlying language use.
Corpora also gain more and more importance in applied linguistics, especially in the field of language pedagogy and second language teaching. Corpora are used to prepare teaching materials so that second language learners, for instance, become familiar with more natural uses of the language they are learning. This may include contextual embeddings of vocabulary or syntactic constructions that are tied to particular speaker groups – knowledge about language use that is not typically included in standard textbooks or grammars.
Finally, natural language processing (NLP) or, more generally, computational linguistics makes heavy use of large-scale corpora for various purposes such as the development of algorithms for speech recognition, machine translation, automated
checking for spelling or grammar errors in texts, or machine-human interface systems (such as Siri, Alexa, etc.). Mega corpora such as the web are especially useful as they contain massive amounts of linguistic utterances from which machines learn the regularities and meaning of human language.
5.5 Summary
This chapter has introduced corpus linguistics, which is not considered a subdiscipline in its own right but a methodology that intersects with various other linguistic subdisciplines. Corpora aim to provide authentic data of language use in natural contexts; hence, they represent a special type of observational data. There are three general ways of using corpora in linguistic research – corpus-informed, corpus-based and corpus-driven – with corpus-based research being the dominant kind. Because corpus linguistics interacts with many different linguistic subdisciplines, there is a huge variety of corpora with markedly different characteristics. Therefore, corpus types are ranked according to a corpus typology consisting of a few macroscopic criteria that determine data sampling for conventional corpora. The most important criteria in corpus design are representativeness, balance, and size. The web, in that it is not a systematically sampled corpus, represents its own kind of corpus. The analysis approach of corpus linguistics is largely quantitative, but can be complemented with qualitative analyses, especially in discourse-pragmatic research. The research outcome of corpus linguistics is a description of the frequency or distribution of systematic patterns in natural language use, which can be combined with other research methods such as experiments for explanatory purposes. Common analysis types measure frequency counts of individual words or of word clusters that tend to co-occur, resulting in lexico-semantic or lexico-syntactic collocations. The quality of the quantitative findings crucially relies on the quality of corpus compilation (for specialised corpora) or corpus selection (of ready-made corpora) with respect to the research question and variables to be studied.
While corpora provide a wealth of natural language data, tokens are distributed following Zipf’s Law – i.e., few items are very frequent, whereas the majority of items occur only once. As a consequence, the study of rare phenomena requires large corpora and the absence of positive evidence for a phenomenon is not absolute evidence against its existence in grammar.
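The Zipfian skew described above can be made visible with a simple frequency list. The following Python lines are only a minimal sketch: the function name `frequency_profile`, the deliberately naive tokeniser, and the sample sentence are invented for illustration and do not come from any corpus tool discussed in this chapter.

```python
import re
from collections import Counter

def frequency_profile(text):
    """Return a ranked frequency list and the share of types occurring once."""
    # Naive tokenisation (an assumption): lower-case alphabetic strings only.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    if not counts:
        return [], 0.0
    ranked = counts.most_common()  # most frequent types first
    hapaxes = sum(1 for freq in counts.values() if freq == 1)
    return ranked, hapaxes / len(counts)

# Invented sample text, far too small for a real analysis.
sample = ("the cat sat on the mat and the dog sat on the rug "
          "while a bird sang in the tree")
ranked, hapax_share = frequency_profile(sample)
for rank, (word, freq) in enumerate(ranked[:3], start=1):
    print(rank, word, freq)
print(f"share of types that occur only once: {hapax_share:.2f}")
```

Even in this tiny sample, one item accounts for a quarter of all tokens while most types occur a single time. With a real corpus, the same profile shows why rare phenomena can only be studied in very large collections.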
5.6 Exercises and Assignments
Exercises for students which can be included during a session on corpus linguistics or as part of project work:

5.1 Many learner corpora consist of raw data without further linguistic annotation. Discuss the problems that may arise when automatic software for POS tagging or lemmatisation, which has usually been trained on the basis of native speaker material, is applied to annotate learner corpora.

5.2 ‘Time flies like an arrow, fruit flies like a banana.’ Discuss to what extent the morphosyntactic ambiguities in this sentence pose a challenge to the automatic annotation regarding word classes, phrase structure, and semantic meaning. Can you think of further similar examples including other linguistic domains also (e.g., pragmatics, phonology/orthography)?

5.3 The type-token ratio (TTR) measures the lexical richness or variety in vocabulary and is thus particularly suited for research on language acquisition or language complexity. Find two texts in different modalities (spoken, written, or signed) for a (i) child or (ii) an adult foreign language learner and compare their vocabulary richness for the modalities. Discuss the implications of your results. What kind of additional analyses may also be useful?

5.4 The German alphabet includes a few special characters, one of them being ‘ß’ (eszett), representing a voiceless dental fricative just like the letter ‘s’ or the combination ‘ss’. A major spelling reform from 1996 partly changed the rules according to which writers in Germany had to use either eszett or ‘ss’. For example, after a long vowel the use of ‘ß’ became obligatory (as in Fuß ‘foot’), and ‘ss’ after a short vowel (as in Fluss ‘river’). Note that in some German-speaking countries only ‘ss’ is used (e.g., Switzerland) and that for typographical reasons, ‘ss’ has been used instead of eszett before the spelling reform. Discuss to what extent a general corpus of German vs. the web as corpus is appropriate to investigate variation in these spelling alternatives in the first 5–10 years after the spelling reform.

5.5 Take a research question from Table 5.1 (or another one that comes to your mind). Which corpus or dataset is useful to investigate that research question?

5.6 Choose a specific text genre or register and perform a keyword analysis by comparing the frequency lists for the specific genre or register with the frequency list of a general reference corpus. What content is particularly frequent for that genre or register and what does this tell you about that genre/register? For example, compare frequency lists for work-related registers (banking conversation, academic texts) or special populations (youth speech) with a general corpus.
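The type-token ratio and the keyword comparison from the exercises above can be tried out in a few lines of Python. This is only a minimal sketch under several assumptions: the tokenisation is a plain whitespace split, the function names and the two miniature word lists are invented for illustration, and the log-likelihood statistic is one common keyness measure chosen here for convenience (the exercise itself does not prescribe a particular statistic).

```python
import math
from collections import Counter

def type_token_ratio(tokens):
    """Number of distinct word forms (types) divided by the number of tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood keyness score for one word in two corpora."""
    total = freq_target + freq_ref
    expected_target = size_target * total / (size_target + size_ref)
    expected_ref = size_ref * total / (size_target + size_ref)
    score = 0.0
    for observed, expected in ((freq_target, expected_target),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score

def keywords(target_tokens, ref_tokens, top=3):
    """Words overrepresented in the target corpus, ranked by keyness score."""
    target, ref = Counter(target_tokens), Counter(ref_tokens)
    n_target, n_ref = len(target_tokens), len(ref_tokens)
    scored = []
    for word, freq in target.items():
        # Keep only words that are relatively more frequent in the target.
        if freq / n_target > ref[word] / n_ref:
            scored.append((word, log_likelihood(freq, n_target, ref[word], n_ref)))
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top]

# Invented toy data, far too small for a real analysis.
banking = "the bank offers a loan and the loan has a rate".split()
general = "the dog and the cat sat on a mat in the sun".split()

print(f"TTR of the banking sample: {type_token_ratio(banking):.2f}")
print(keywords(banking, general))
```

Note that the TTR depends heavily on text length, so texts of very different sizes should not be compared directly. For real data, the word lists would come from a properly tokenised specialised corpus and a general reference corpus.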
5.7 Further Reading
There are many introductions to corpus linguistics, which, however, differ somewhat in whether they follow a corpus-based or a corpus-driven approach. Discussion about the distinction between corpus-based and corpus-driven approaches can be found in Gries 2012 and McEnery & Hardie 2012.
Introductions and articles with a corpus-driven focus are Teubert 2005, Tognini-Bonelli 2001, Stubbs 2001 and textbooks from the intellectual father of the corpus-driven approach, Sinclair (1991, 2004). Introductions to corpus-based research are Kennedy 1998, Baker 2010a, McEnery & Hardie 2012, Lemnitzer & Zinsmeister 2015 (3rd ed.), McEnery & Wilson 1996/2001 (2nd ed.), Stefanowitsch 2020, and Meyer 2002. Hunston 2002 and O’Keeffe, McCarthy & Carter 2007 introduce corpora in various branches of applied linguistics (e.g., language teaching, stylistics, forensic linguistics). An introduction to natural language processing is Manning & Schütze 1999. Abeillé 2003 is an early collection of work on treebanks in different languages. Works that specifically cover aspects of corpus compilation include McEnery, Xiao & Tono 2006 and Wynne 2005. Web-based linguistics is introduced in Crystal 2011, Gatto 2014, Kilgarriff 2001b, Kilgarriff & Grefenstette 2003, and the edited volume by Hundt, Nesselhauf & Biewer 2007. Sharoff 2006a and 2006b give a practical overview of compiling one’s own corpus from the web. Introductions to statistical analyses of corpus data can be found in Gries 2009 and Kilgarriff 2001a. The edited volume by Baker & Egbert 2016 provides a comparative overview of different corpus linguistic analyses, all applied to the same corpus. Bondi & Scott 2010 and Scott & Tribble 2006 provide introductions to keyword analyses. In addition, there are many handbooks and edited volumes with highly informative target articles on the topics discussed in this chapter. The handbook edited by Lüdeling & Kytö 2008 probably covers the widest range of topics in corpus linguistics.
Somewhat more recent handbooks are O’Keeffe & McCarthy 2010, which also includes chapters on the compilation and annotation procedures for corpora of different modalities (e.g., Adolphs & Knight 2010 on spoken corpus compilation), and Biber & Reppen 2015, which focuses on research outcomes in linguistic subdisciplines working with corpora. The volume edited by Baker 2009 includes a variety of different topics, from corpus pragmatics to web as corpus and aspects of corpus annotation. The leading scientific journals are the ‘International Journal of Corpus Linguistics (IJCL)’ (Benjamins), ‘Corpora’ (Edinburgh University Press), ‘Corpus Linguistics and Linguistic Theory’ (De Gruyter), and the ‘ICAME Journal’ (Sciendo), but given the highly interdisciplinary nature of the research questions pursued with corpus linguistics methods, studies including corpus analyses can be found in a myriad of journals from different linguistic subdisciplines. There are also many resources on how to find existing corpora and resources that are appropriate for one’s research question. The articles by Xiao 2008, Lee 2010 and Ostler 2008 review existing corpora of major and less studied languages. The books by Meyer 2002 (Appendices 1 and 2) and McEnery & Hardie 2012 (Chapter 2) provide further information on corpus resources, including tools for tagging and analysis. Raysen 2015 reviews tools for corpus compilation and analysis. The article by Allwood in the handbook edited by Lüdeling & Kytö 2008 on multimodal corpora includes links to tools for analyzing multimodal corpus data. The following list of URLs provides further resources and links to corpora:
• www.english-corpora.org (a list of English corpora, including those with online access)
• linguisticsweb.org (a site with resources and list of corpora, including an access portal to some corpora)
• the open multilingual Wordnet (http://compling.hss.ntu.edu.sg/omw/; a lexical database)
• www.cmc-corpora.net (a resource site with references to publications based on computer-mediated communication (CMC) corpora)
• http://martinweisser.org/corpora_site/CBLLinks.html (originally maintained by David Lee; probably one of the most detailed descriptions of corpora available)
• http://ucrel.lancs.ac.uk/links.html (links to various resources in corpus linguistics)
• http://cecl.fltr.ucl.ac.be (links to learner corpora and a bibliography on learner corpus research)
• WebCorp: www.webcorp.org.uk/live/ (online concordancing tool)
6 Sociolinguistics and Anthropological Linguistics
Sociolinguistics and anthropological linguistics – the latter also known as linguistic anthropology (depending on which aspect is being focussed on) or as ethnolinguistics – are two interdisciplinary linguistic subfields investigating language in its socio-cultural context. More precisely, this means considering the relationship between language use or linguistic forms and speakers’ social features or culture-specific ideas and values. While sociolinguistic studies traditionally examined statistical correlations of linguistic features and speakers’ social parameters in Western contexts, the traditional focus of anthropological-linguistic field research was on linguistic features expressing cultural traits of non-Western societies. Now, however, both subdisciplines have converged significantly with regard to research topics, objects of investigation, and empirical methods. In this chapter, we will review key aspects of research in the methodologically heterogeneous field of sociolinguistics and anthropological linguistics. The diversity of the approaches notwithstanding, we will outline their shared core ideas and fundamental methods. First, the chapter covers the fundamental research aims and questions (Section 6.1) and the basic approaches of sociolinguistics and anthropological linguistics (Section 6.2). Subsequently, we address selected key issues of empirical research such as participant observation as the method that is vital for ethnographic fieldwork, data types and techniques of data collection, the identification and sampling of research participants, and methods of data analysis (Section 6.3). Furthermore, the chapter contains an overview of possible research outcomes, illustrating how diverse the relationships between language and sociocultural factors can be (Section 6.4).
Finally, the chapter is summarised in Section 6.5 followed by exercises on individual methodological aspects, ideas for your own research project (Section 6.6), and suggestions for further in-depth readings on key issues in sociolinguistics and anthropological linguistics (Section 6.7).
6.1 Research Aims and Questions
Starting from the assumption that language is shaped by its sociocultural context, the fundamental research aim of sociolinguistics and anthropological linguistics is to discover social parameters or culture-specific concepts that correlate with or even induce linguistic variation (cross-linguistic and
language-internal variation). Thus, the research aims to uncover sociocultural meaning that is encoded in linguistic forms and language use. Sociolinguistics is primarily interested in the relationship between language and society. Depending on the perspective on the relationship, distinct subareas can be distinguished. Sociolinguists in the narrow sense (also called microsociolinguists) are primarily linguists who investigate the effect of social aspects (e.g., the speakers’ age or their place of origin/living) on language use (primarily language-internal variation), as their basic research questions illustrate:

• variationist sociolinguistics: Does the use of linguistic variants correlate systematically with social features of the speaker? And which social parameters have an impact on language variation (gender, age, etc.)?
  ➔ including the large subarea of research on language and gender: How does the speech of women and men differ? How is gender constructed through language? But also, how is gender encoded in linguistic forms?
• dialectology and lately researchers on pluricentric languages also study language varieties along the dimension of space: What are regionally divergent language features (i.e., distinctive characteristics of dialectal varieties) and how is such variation distributed geographically?
• interactional sociolinguistics: What are conversational practices and discourse strategies which are characteristic for a particular language variety? How do people use language in different social contexts or settings? What is the social meaning encoded in speech?
• historical sociolinguistics: What sociocultural parameters affect language change?
Besides these diverse research traditions, sociolinguistics in a broader sense (also called macro-sociolinguistics) also includes research on the effect of language on society which is primarily conducted by sociologists and social psychologists:

• sociology/social psychology of language: What attitudes and perceptions do speakers have about a language or language variety? How do these beliefs and ideologies shape (linguistic) behaviour? And, how is identity linked to language, i.e., how do sociocultural communities define themselves by language?
Anthropological linguistics, on the other hand, primarily aims to uncover the relationship between language and culture. In this perspective, language is a tool for encoding cultural ideas/concepts in language-specific categories and practices. Similar to the different perspectives on the relationship in sociolinguistics, we can distinguish between anthropological linguistics and linguistic anthropology. While the former is primarily home to linguists with a particular interest in the impact of culture on cross-linguistic variation, the latter primarily houses social/cultural anthropologists studying language as a cultural practice. Hence, the fundamental research questions are:

• anthropological linguistics (closely linked to cognitive/cultural linguistics, cf. Chapter 7): To what extent do patterns of cross-linguistic variation reflect cultural ideas and practices? And, what is the cultural meaning encoded in linguistic forms (lexical items, semantic parameters, and formal categorisation)?
• linguistic anthropology (similar to interactional sociolinguistics): What are culture-specific conversational practices or ‘ways of speaking’? What are emic genres? And what is the cultural meaning encoded in these linguistic practices?
• research on language & culture contact: Which linguistic phenomena occur in multilingual settings? What determines language choice in code-switching? How do languages change (transfer), emerge (pidgins & creoles), or vanish (language death)?
• research on language socialisation & acquisition: In what way does language socialisation differ across various cultural settings? And how does first language acquisition vary cross-culturally?

In several areas of study, there is significant overlap between sociolinguistics and anthropological linguistics, particularly in the fields of conversation/discourse practices, language and politeness, and language and gender. Table 6.1 contains specific examples of research questions from each linguistic area. While socio- and anthropological-linguistic studies generally aim at uncovering relationships between language and sociocultural contexts, some approaches go beyond the academic framework pursuing applied objectives. For instance, sociolinguists working in the tradition of critical discourse analysis and the early variationist approach investigate the relationship between language and social issues such as power and poverty, wishing to uncover social abuses and, eventually, even to find solutions for these problems. Another example is forensic sociolinguistics which aims to identify a speaker via language material:

• Does knowing certain social features of a speaker allow for predictions about linguistic choices? Can we draw conclusions on a speaker’s social features based on a given text (oral or written)?

And, in documentary anthropological linguistics, it can be an applied objective to preserve cultural knowledge as it is encoded in endangered languages (cf. Chapter 3).
Table 6.1 Example research questions in sociolinguistics and anthropological linguistics

Linguistic domains:

Phonetics & phonology
1. What group of speakers (e.g., male vs. female, old vs. young, more vs. less educated) use different phonetic variants?
2. How does emotionally loaded speech (e.g., expressing aggression, fear, or affection) differ from ‘normal’ speech in terms of prosody – in a particular language? Are there cross-linguistic similarities or differences?

Morphology & syntax
3. What morpho-syntactic features characterise dialectal variation of a language? How are these features distributed geographically?
4. Do speakers of languages with a past/non-past tense system have a culturally distinct concept of time (i.e., behaviour with regard to the future such as the importance of planning ahead or making appointments, and with regard to the past such as the importance of genealogies) than speakers with a future/non-future tense system?

Lexicon & semantics
5. How do lexical innovations spread in youth language?
6. What terminological distinctions are made in a language regarding animals? Does this classification reflect the culture-specific manner of dealing with these animals (e.g., food vs. non-food, farmed vs. wild, caught by men vs. by women)?
7. What lexemes or semantic fields are taboo and what are their substitutes or strategies of avoidance?

Pragmatics & discourse
8. How is politeness encoded in a specific language (e.g., honorifics), and which social differences are encoded by this system (e.g., rank, age, or intimacy)?
9. What are turn-taking and repair strategies in political debates on TV or in video-conferences with variable connection quality?
10. What are the source domains for metaphors to talk about ‘love’ and ‘hate’ in different languages? And what is the cultural significance of the findings?

Cross-disciplinary fields:

Language acquisition
11. What strategies do parents use to support the first language acquisition of their children (simplified talk, direct/indirect correction of mistakes, repetition, etc.)? How do these strategies differ from societies in which children are primarily raised by their older siblings?

Language contact
12. What attitudes and ideas are linked to the languages of bilingual societies? Do they differ between distinct groups of speakers (e.g., with different cultural or social backgrounds)?
13. Which socio-cultural factors are associated with language death?

Language change
14. What’s the social status of speakers who initiate or determine the distribution of new linguistic variants within a speaker community?

6.2 The Sociolinguistic and the Anthropological Linguistic Approach

Both subdisciplines study language in its sociocultural context. Language is regarded as an instrument of social and cultural interaction. Community-specific meaning is encoded in linguistic forms and practices, and
language use reveals information about the speakers’ belonging and their interpersonal relationships within a community. Furthermore, language may establish, strengthen, or weaken socio-cultural structures. This means not only that sociocultural parameters can have an impact on language and language behaviour, but also vice versa. In general, research focuses on language as a sociocultural means of expression situated among other practices (such as clothing, the sequence and type of action, spatial arrangements, etc.). In Tonga, for instance, social stratification is expressed via linguistic means (a referent honorific system called the language of respect, the order of address in speech preludes, and the semantic parameters of kinship terminology) as well as nonlinguistic practices (the height of waist mats which are worn, sitting order, and the exchange of gifts and services). At the same time, these practices serve to indicate social rank and status among group members, and thus can maintain old, reinforce new, or even challenge established social structures (Völkel 2010). In the broad field of sociolinguistics and anthropological linguistics, there are numerous other examples illustrating relationships between the two fundamental parameters of language and society/culture. If one wishes to make a distinction between the two disciplines, this can be done by emphasising differences with regard to parameters such as the precise research aim or topic, the conventional research location, and the traditional methodology. While sociolinguists primarily investigate the relationship between social parameters and language-internal variation, anthropological-linguistic studies focus on the relationship between cultural aspects and cross-linguistic variation. Typological studies produce results on cross-linguistic variation and distinctive linguistic features or language characteristics (cf.
Chapter 4), whereas corpus analyses of language usage data mainly provide information on language-internal variation such as alternative pronunciations, grammatical forms, words, speech styles, or language varieties (cf. Chapter 5). The primary aim of (variationist) sociolinguistics is to detect social features of the speakers (independent parameter) which correlate with their linguistic behaviour (dependent parameter), while in anthropological linguistics the search is for the cultural meaning found in linguistic forms and practices. Foley (1997: 4) offers a good example of these differences in research focus. A sociolinguistic investigation (comparable to Labov’s study, cf. Section 6.3.4) would show that the use of Yimas (a Papuan language) vs. Tok Pisin (a creole and the lingua franca of Papua New Guinea) in a contemporary Yimas village correlates with gender and age of the speakers: older people tend to speak Yimas and younger ones Tok Pisin; female speakers tend to use Yimas and male ones Tok Pisin. In contrast, anthropological-linguistic studies focus on cultural aspects determining language choice: Yimas is the language of village life (the domain of females) and of the traditional way of life (the domain of older people), while Tok Pisin is the language of political and economic interaction with other ethnic groups (the domain of males) and of the modern world (the domain of younger people). By such complementary collaboration, anthropological linguistics
makes sociolinguistic quantification meaningful, as Eckert (2014) frames it. The qualitative research adds an emic, contextual and explanatory perspective to the previously found etic correlation parameters. Following in the anthropological tradition, anthropological linguists are mainly interested in rural areas of distant non-Western small-scale societies (see the anthropological-linguistic study in Foley’s example). This is also the typical kind of research location in documentary and descriptive linguistics (cf. Chapter 3). In contrast, sociolinguistic studies are primarily conducted in urban settings of Western societies (e.g., suburbs, high schools, companies, organisations, or even internet forums). However, at base the research location in both subdisciplines is the field (i.e., the natural environment of the speakers), and even the distinction made above with regard to discipline-specific field sites becomes increasingly blurred: anthropological linguists conducting studies in urban areas of non-Western locations or even in Western societies (e.g., Underhill 2012), sociolinguists doing research in non-Western contexts (see the sociolinguistic study in Foley’s example). An interesting field combining sociolinguistics and anthropological linguistics is to be found in comparative studies of varieties of pluricentric languages (e.g., World Englishes). As these varieties develop in distinct cultural environments around the world, the question arises whether linguistic differences between the language varieties can be associated with culture-specific ideas and practices. In comparison to the investigation of native languages, the study of English varieties generally includes aspects of language and culture contact. Moreover, the traditional distinction between the two subfields in terms of research focus and methodology is becoming increasingly less clear. 
The predominantly quantitative approach discovering sociolinguistic correlations is increasingly complemented by qualitative research and, vice versa, the predominantly qualitative approach in anthropological linguistics by quantitative studies. Therefore, sociolinguists have adopted the anthropological-linguistic core method of ethnographic fieldwork and participant observation as an important means of collecting natural language data and of gaining an emic understanding of a community’s socio-cultural ideas and practices (cf. Section 6.3.1). Otherwise, socio- and anthropological-linguistic techniques of data collection and analysis are as diverse as the kinds of relationships between language and society/culture, and this applies particularly in the contemporary complex global world in which people are interrelated in diverse ways (interacting physically or via media) and have mixed cultural backgrounds. Some socio- and anthropological-linguistic research is even conducted without the key method of fieldwork. However, the pure analysis of secondary-language data (i.e., the use of pre-existing corpora from documentary projects or other ready-made corpora that have not been compiled by the researchers themselves; cf. Chapter 5) is only possible if the relevant sociocultural parameters are known. This means that relevant social parameters are annotated in a corpus (e.g., generally the speaker’s gender, but hardly any information on the social
relationships among speech act participants) and/or that the researcher is familiar with cultural ideas and practices (e.g., Underhill 2012 studies languages and cultures with which he was familiar prior to the research). In order to investigate the relationship between language and society/culture empirically, it is possible to proceed from patterns of linguistic variation (language-internally or cross-linguistically) and then search for sociocultural meaning or parameters that correlate with or explain the linguistic differences. Alternatively, you can proceed from a sociocultural parameter (e.g., gender, age, or ethnicity) and study communicative practices and language features within this (sub)community or sociocultural context (e.g., among male adults working in higher positions of Western companies), optionally in comparison with other (sub)communities and contexts (e.g., vis-à-vis female adults working in higher positions of Western companies, or vis-à-vis male adults working in upper positions of Asian companies).
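The variationist logic described in this section, with a social feature of the speakers as the independent parameter and their linguistic behaviour as the dependent parameter, can be sketched as a simple cross-tabulation. The counts below are invented purely for illustration and do not stem from any study cited here; the group and variant labels and the function name `chi_square` are likewise hypothetical. The Pearson chi-square statistic is used as one common way of checking whether two such parameters are associated.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if group and variant were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic

# Invented counts: rows are two speaker groups (independent, social parameter),
# columns are two linguistic variants (dependent parameter).
counts = {
    "group 1": {"variant A": 40, "variant B": 10},
    "group 2": {"variant A": 15, "variant B": 35},
}
for group, variants in counts.items():
    total = sum(variants.values())
    share_a = variants["variant A"] / total
    print(f"{group}: {share_a:.0%} variant A (n = {total})")

table = [list(variants.values()) for variants in counts.values()]
print(f"chi-square = {chi_square(table):.2f}")
```

For a 2 × 2 table, values above 3.84 are conventionally taken to indicate an association at the 5 per cent level. Such a figure only establishes a correlation; what the pattern means to the speakers still has to be worked out with qualitative, emic research of the kind described above.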
6.3 Methodology
The methodological procedures employed in socio- and anthropological-linguistic research are as diverse as the topics they cover. Therefore, we cannot discuss every methodological approach or technique of data collection and analysis but will focus instead on key methods and basic considerations. This includes the fundamental approach of ethnographic fieldwork and participant observation (Section 6.3.1), different kinds of linguistic and sociocultural data and corresponding techniques of data collection including the sociolinguistic interview and some experimental tasks (Section 6.3.2), the identification and sampling of research participants (Section 6.3.3), and important methods of data analysis (Section 6.3.4). By covering these aspects, we aim to provide you with an overview of the methodological field and a foundation of basic options in order to be able to conduct your own projects.

6.3.1 Ethnographic Fieldwork and Participant Observation
The fundamental methodological approach in sociolinguistics and anthropological linguistics is fieldwork, or more precisely ethnographic fieldwork, i.e., the study of language in the natural environment of its speakers. In contrast to documentary and descriptive fieldwork (cf. Section 3.3.2), sociolinguists and anthropological linguists do not only perceive the field as a concrete place populated by native speakers and, thus, offering unlimited access to language data, but more broadly as the speakers’ socio-cultural context in which language is embedded. Ethnographic fieldwork aims to provide systematic descriptions from an inside perspective (emic view of the research participants). In socio- and anthropological-linguistic studies, this does not only include the collection of language data but also sociocultural data on speakers such as
social characteristics, interpersonal relationships, social interaction, cultural concepts and practices, ideas and perceptions of belonging and differentiation, and so on (cf. Section 6.3.2). The focus is on language as a means of sociocultural interaction. Thus, the fieldwork aims at gaining a comprehensive understanding of interrelations between local sociocultural and linguistic ideas, patterns and practices. A fundamental characteristic of fieldwork is the use of diverse data collection techniques in accordance with the specific research aim and the stage of research. Generally, there is a first explorative stage in which qualitative data is collected via ethnographic fieldwork and participant observation in order to gain an overview of the entire field. On the basis of this holistic information, a more problem-oriented/focused stage of qualitative and/or quantitative research on a specific topic may follow. This includes the use of data collection techniques such as systematic observation, focused surveys, and even experimental tasks in the field (cf. Section 6.3.2). In general, ethnographic fieldwork (particularly anthropological-linguistic research) takes place in sociocultural and language communities that the researcher is not part of or familiar with prior to research. This means that the first explorative stage is extremely important to gain an understanding of the sociocultural habits that are relevant to or of interest for the research. In this process, it is not uncommon that initial research questions/topics are adjusted or even changed onsite, as other aspects turn out to be more relevant for subsequent systematic research. In keeping with anthropological tradition, the fundamental method of ethnographic fieldwork is participant observation. In contrast to other kinds of observation, the researcher takes on the role of participant in the setting of observation (cf. Section 2.2.2).
Thus, it is a targeted research approach in which participation is more than mere presence in the field. Participation means getting involved in community interactions and learning to become a more or less competent member of the community/group to be studied. This socialisation process includes the acquisition of the requisite language skills and sociocultural knowledge for appropriate behaviour on the basis of attentive observation and imitation as well as verbal interaction and comprehension questions (e.g., ‘What’s happening here?’ or ‘Why is s.th. being done?’). However, participation does not mean taking on an influential position within a group by initiating major activities. In anthropology, this methodological approach supplanted the formerly established ‘armchair anthropology’ as the standard method, as the following quotation makes clear: The anthropologist must relinquish his comfortable position in the long chair on the veranda of the missionary compound, Government station, or planter’s bungalow, where, armed with pencil and notebook and at times with a whisky and soda, he has been accustomed to collect statements from informants, write down stories, and fill out sheets of paper with savage texts (Malinowski 1954: 146–147).
Thus, ethnographic fieldwork means that a researcher builds a stronger relationship with the research participants than is the case in documentary or descriptive fieldwork. This methodological procedure has both advantages and disadvantages. By becoming a member of the community, the researcher gains an emic understanding. This is essential for successful socio- and anthropological-linguistic investigations. However, observing while participating can be challenging. First, the simultaneity of these tasks easily results in excessive practical demands, and, second, it is generally a balancing act to maintain an observer’s ‘neutral’ and distanced perspective while gaining an emic, involved perspective at the same time. With non-participant observation, however, comes a risk of researcher bias that should not be underestimated (cf. Section 1.1.5) or, more precisely, the perception and description of a research issue from a strongly observer-specific point of view that has little to do with the inside perspective (cf. Section 2.1, as illustrated by the ‘Nacirema’ text of Miner 1965). Furthermore, participant observation facilitates the minimisation of reciprocal effects (cf. Section 1.1.5). Although, according to the observer’s paradox, research participants behave differently or unnaturally due to their awareness of being observed or researched, long-term participation means that, after a while, the researcher is perceived increasingly as a group member (a familiar insider) and seen less as an observer or researcher. Thus, ethnographic fieldwork is a very time-consuming process. Generally, it takes about a year to become socialised as an almost competent group member, to gain access to more sensitive topics, and to be able to understand the linguistic data in its sociocultural and situational context, taking the speakers’ perspective into consideration.
Participant observation usually requires interpersonal skills such as empathy, politeness, patience, and modesty – although the value of personal qualities and the specific rules of polite conduct differ cross-culturally. Finally, good listening and observation skills, open-mindedness, and flexibility are needed to gain an emic perspective and to adapt to situations and conditions in the field. Overall, fieldwork means that you cannot be prepared for everything and you will certainly experience unexpected situations. Therefore, we strongly recommend not underestimating the importance of careful preparation prior to fieldwork – not only in terms of thematic and practical issues of relevance, but also in terms of methodological and ethical awareness (cf. Section 3.3.2 for details). Field research and participant observation in a foreign, unfamiliar community require a high level of personal involvement and can therefore be emotionally demanding. Issues that often arise include culture shock (an initial feeling of discomfort, loneliness and incompetence in an unfamiliar cultural environment), interpersonal and intercultural misunderstandings, and denial of access to in-group data – particularly at the beginning of the post-initial research stages (cf. Figure 6.1). In the further course of the field project, the researcher gains a deeper cultural and linguistic understanding of and familiarity with the native speaking community – this facilitates not only better access to empirical data but
also improves one’s personal well-being.
Figure 6.1 Stages of the researcher-community relationship in ethnographic fieldwork
Initial stage:
researcher: unfamiliarity with the sociocultural group; curiosity, fascinated by the unknown
research participants: they treat the researcher as a guest/foreigner
Post-initial stages:
researcher: increasing sociocultural and linguistic competence for increasingly intense participation
research participants: at first, they treat the researcher as a ‘child’ who needs to be taught step by step (with growing expectations regarding the appropriateness of their behaviour/skills) until they have proven to be competent group members who are fully integrated into the local community
The quality of participation strongly depends on the intensity of interaction between the researcher and the community, the stage of research and the researcher’s level of competence at that stage (as indicated in Figure 6.1). Furthermore, the researcher’s social characteristics (e.g., being male or female, being an adult or a child) have an impact on the role that the researcher takes or is assigned. Here are some personal experiences for illustration: Despite much preparation on the Tongan way of life by reading ethnographic work of other researchers and by getting personal advice from experienced fieldworkers prior to my own fieldwork, it took me several months of participation in family and village life until I was regarded as competent enough for several tasks. In other instances, I was still treated as a Tongan child with insufficient competence and the need for advice and guidance. But this was already progress compared to the initial role as a ‘white foreigner’ who was given special treatment and not expected to know appropriate Tongan behaviour at all. Due to the locals’ image of and prior
experience with other ‘white foreigners’, some of my closest informants did not share their ideas on certain topics until I had lived within their household for nearly a year in which I had proven myself to be open-minded and truly interested in learning the Tongan way even though this meant putting aside formerly acquired knowledge (i.e., taking their beliefs seriously despite Western practices and thinking). Another aspect was that access to various aspects of local life depended on the researcher’s social characteristics. As a female researcher in Tonga, for instance, it was easy to participate in mats weaving but not in fishing which is a traditional male domain. Similarly, in South Central Africa, only circumcised men have access to initiation schools, their secret knowledge, and corresponding language styles. Or, as an adult researcher in Western schools, it is generally easier to gain access to the group of teachers than to students. Thus, participation is always defined by the researcher’s role within a community (depending on the researcher’s characteristics and skills) which can change in the course of the research process. Even though most ethnographic fieldwork takes place in communities that the researcher is not part of, it is also possible to study one’s own community. If this is not only done via introspection or secondary data, the researcher conducts fieldwork as an insider. In this case, the great advantage is generally faster and more direct access to group members and insider knowledge. This level of involvement and understanding, however, may come with a lack of distance, blocking an observer perspective, e.g., certain aspects are not perceived at all or certain information is believed to be known without verification. Aside from taking systematic notes on research aspects, including thorough methodological information (how, when, where, who, from whom, and under which circumstances was the data collected) for transparency (cf. 
Section 1.1.5: quality criteria), we strongly recommend writing a detailed field diary with notes on personal experiences and observations that attract your attention, comments regarding your feelings, etc. (cf. Section 1.2.4). This serves to capture information that might turn out to be relevant once you become more familiar with the field. My Tonga diaries, for instance, reveal that there were numerous aspects that I no longer perceived to be remarkable after I had spent more time in the field. Moreover, subsequent reflection on individual notes in the field diary revealed relevant information in connection with the scientific records.
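The methodological information recommended above (how, when, where, who, from whom, and under which circumstances) can be kept as one structured record per recording or note. The following Python sketch is only an illustration: the class name, the field names, and the example values are assumptions made for this sketch and are not taken from any particular fieldwork tool.

```python
from dataclasses import dataclass, asdict
from datetime import date


@dataclass(frozen=True)
class CollectionRecord:
    """Methodological metadata for one recording or note (hypothetical layout)."""
    how: str             # collection technique
    when: date           # date of collection
    where: str           # place or setting
    collected_by: str    # who collected the data
    collected_from: str  # pseudonym or code of the speaker(s)
    circumstances: str   # situational notes


def missing_fields(record: CollectionRecord) -> list[str]:
    """Return the names of text fields that were left empty."""
    return [
        name
        for name, value in asdict(record).items()
        if isinstance(value, str) and not value.strip()
    ]


# Invented example entry; the 'circumstances' field is left empty on purpose.
entry = CollectionRecord(
    how="audio recording during participant observation",
    when=date(2021, 1, 1),
    where="village meeting house",
    collected_by="researcher",
    collected_from="speaker S01",
    circumstances="",
)
print(missing_fields(entry))  # ['circumstances']
```

A check of this kind can be run at the end of each field day, so that gaps in the documentation are filled while the situation is still fresh in memory.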
6.3.2 Data Types and Techniques of Data Collection
Studies in sociolinguistics and anthropological linguistics are based on two fundamental types of data that are investigated in relation to one another:
• language data (i.e., language-internal or cross-linguistic variation), and
• extra-linguistic data (i.e., sociocultural characteristics of the speakers and situational parameters pertaining to the sociocultural context).
Depending on the specific research topic and question, particular types of language and sociocultural data are needed. While some approaches rely on existing data (e.g., ready-made language corpora as used in variationist sociolinguistics) and concentrate on the analysis, in other cases data collection is an essential part of the research project. In this section, we focus on data collection and, consequently, on the latter approaches. In Section 6.3.3 on data analysis, we will then include the former approaches. As different types of data involve distinct methods of data collection, the study of language in its sociocultural context is methodologically as diverse as the thematic approaches to the relationship of language and culture/society. Thus, we cannot cover all possible methods of data collection but only the most crucial ones. Overall, this section will provide you with an idea of some basic data types and how they can be collected.
6.3.2.1 Language Data
With regard to basic methods of data collection, a major distinction can be made between the following types of language data:
• (quasi-)natural language data – i.e., data representing the speakers’ actual language use (performance): Typically, natural language data are gained by the observation (and recording) of natural speech. Alongside oral data, the study of natural language use also includes written genres such as print media and computer-mediated communication (e.g., e-mails, texting, and online chats). The investigation of written genres is based on the representative selection and compilation of accessible texts. In contrast to natural language data, which would have been uttered or written (ideally in the same way) whether or not the research had taken place, quasi-natural language data is triggered by the researcher, but generally without drawing attention to the subject of language (e.g., the sociolinguistic interview as described below). In comparison with the recording and transcription of oral genres, the compilation of written genres is generally less labour-intensive. However, accessible natural language data that has not been produced in the presence of the researcher, such as ready-made corpora (cf. Section 5.3.1) or pre-existing written documents (i.e., texts written outside the research context such as online chats), can only be used if the relevant extra-linguistic data is also available (e.g., information on social characteristics of the speaker/author, or prosodic annotation). On the other hand, the use of pre-existing written documents does not entail any risk of researcher bias and observer’s paradox effects (cf. Section 2.2.3).
• elicited language data – i.e., data representing the speakers’ knowledge of grammatical rules (competence): Elicited data in the narrow sense is gained by systematic survey (e.g., elicitation tasks such as acceptability judgements or minimal pair tests, cf. Section 3.3.3.2). In contrast to natural language data, the researcher has much more control over the occurrence of essential research data (e.g., phonemes, lexemes, constructions, or genres). However, elicitation generally involves cognitive awareness on the part of the speaker (reflected language data) and the data tends to fall more in line with and be more representative of (perceived) language norms than individual behaviour (idealisation of the data). If researchers do not collect the data themselves, they can use grammars – again only if the needed information is published.
Depending on the research topic, there are tendencies as to which data are used. The data basis of studies investigating the connection between language-internal variation and sociocultural parameters of the speakers and/or the context (such as court language, or language and linguistic interaction patterns at comprehensive schools with a heterogeneous student body of different cultural backgrounds and social classes) is generally natural language data. For research topics such as conversation or discourse practices, oral genres are studied. Similarly, more complex sociocultural parameters (e.g., social relationships of speech act participants) tend also to be studied in the context of oral genres. Written genres, instead, are primarily investigated in terms of basic social speaker characteristics (e.g., gender variation) or sub-genres (e.g., specifics of computer-mediated language). On the other hand, studies on cross-linguistic variation generally work with elicited language data, primarily because the work with natural language data of multiple languages would be too labour-intensive (cf. Section 4.3.3); they build on typological findings. Furthermore, investigations of language-specific features in comparison to other languages do not focus on language-internal variation but on characteristics shared by the entire language community. Altogether, the more specific the topic of research, the less likely it is that pre-existing data, whether natural or elicited, are available for use.
In more detail, socio- and anthropological-linguistic research makes use of a broad range of data collection techniques including different kinds of observation, survey, and even experimental tasks. In the following, we will present some selected methods of language data collection as used in the two subdisciplines.
a. participant observation and speech recording: This is the most common collection method of naturally occurring speech. In order to keep reciprocal effects (i.e., observer’s paradox effects) and researcher bias (resulting from a lack of emic understanding) at a minimum, the observation is characterised by
long-term field research and the participation in community life (cf. Section 6.3.1). While observation without recording is less obtrusive/conspicuous and has less effect on the naturalness of data, it is nearly impossible to capture the same amount of linguistic information and details by taking notes as compared to audio- or video-recordings, which can be listened to repeatedly (cf. Section 1.2.6). The type of data documentation ultimately also depends on whether technical equipment is at hand in situations that turn out to be relevant and important for research. Depending on whether researchers are familiar with a language, they can do the subsequent requisite editing of the recorded data (e.g., transcription, annotation, and translation) themselves or they can be assisted by native speakers. The type and extent of transcription and annotation depends on the specific research topic, and translations are only necessary in investigations of languages that the prospective academic readership is not expected to know (cf. Section 1.2.7).
b. sociolinguistic interview: Interviews generally lead to a greater awareness of the research situation and make behaviour less natural. In contrast to interviews with a strict question-answer format, the sociolinguistic interview has an informal conversation-like format that aims for minimal impact on the part of the researcher in order to gain a corpus of (quasi-)natural language data. The researcher needs to motivate research participant(s) to talk without reflecting on their language. This can be achieved by use of open questions about local events and topics that are interesting and familiar to the research participants like ‘How did you meet your partner?’, ‘How did you experience the last cyclone?’, or ‘How did you become a [group member, e.g. banker or club member]?’. 
By recalling personal experiences, the speakers produce long narrative responses, and the longer the interview, the less attention is usually paid to language use. Too general questions, potentially traumatic personal issues, and culturally delicate topics should be avoided. In comparison to the observation and recording of natural speech, interview data can be criticised for being less natural, but it has the advantage that the researcher can gather systematic language data more easily. Linguistic items that are essential for a research project (e.g., topics) can be triggered by corresponding questions. Furthermore, the researcher can determine whether to question individual speakers for narrative speech or multiple speakers at the same time for interacting dialogic speech. However, it is difficult to access distinct speech styles of a single speaker (e.g., formal vs. informal) as this requires different contexts and speech act participants which are difficult to simulate in an interview context.
c. elicitation & (experimental) tasks: Depending on the research topic, it can be necessary or advisable to use even more systematic techniques of data collection such as (experimental) tasks with corresponding stimuli. In doing so, the researcher can ensure that relevant linguistic items (sounds, words, etc.) occur in the data, even though this comes – to a lesser or greater degree – at the expense of naturalness or authenticity of the data. Picture-prompted storytelling, for instance, can be used to produce narratives that include the description of particular items, situations, and procedures as triggered by visual stimuli (e.g., the often used frog story of Mayer 1969). It is advisable to use stimuli that the speakers are familiar with in their sociocultural context. In the case of such tasks, not only can the researcher influence language items occurring in the data, but also the data of different speakers who performed the same task are more easily comparable (cross-linguistically and language-internally). Still, picture-prompted storytelling creates quasi-natural or semi-natural language data, which is more natural than the data from other elicitation tasks. Compared to natural language data, however, it comes with a decline of naturalness in terms of item frequency, information density, and so on, as the investigation of Klamer and Moro (2020) shows. Alongside the key element of initiated (quasi-)natural speech, the classic sociolinguistic interview (as developed by Labov) also includes more systematic tasks, namely reading tasks (words, minimal pairs, and short texts) to gather pronunciation data on certain sounds in different contexts. Whenever both (quasi-)natural and systematic data are collected, the issue of order is important. As systematic tasks draw attention to language, the informal conversation-like interview data are likely to be more natural if their collection precedes the systematic task. 
An experimental task in sociolinguistics that is used to investigate speech perception and language attitudes is the matched-guise task. Research participants basically listen to distinct dialect or language varieties without knowing that these are uttered by the same speaker. Afterwards, they are asked to attribute social characteristics to the speaker which reveals the research participants’ attitudes regarding the speaker and indirectly the dialect or language variety. In contrast, direct inquiry into attitudes regarding a language or language variety focuses attention directly on the topic and can lead to self-conscious, socially approved answers instead of personally held attitudes. Furthermore, people are often not aware of their attitudes and cannot consciously retrieve this information.
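How the attributions collected in a matched-guise task can be compared across guises may be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a standard analysis procedure: the rating scale (1 to 5), the trait labels, the guise names, and all numbers are invented toy values, and a real study would additionally need an appropriate statistical test.

```python
from statistics import mean

# Invented toy ratings on a 1-5 scale: ratings[guise][trait] -> listener ratings.
# Both guises are produced by the same speaker; listeners are not told this.
ratings = {
    "variety_A": {"friendly": [4, 5, 3], "educated": [2, 3, 1]},
    "variety_B": {"friendly": [3, 2, 4], "educated": [4, 5, 3]},
}


def guise_differences(data, guise_a, guise_b):
    """Mean rating of guise_a minus mean rating of guise_b, per trait."""
    return {
        trait: mean(data[guise_a][trait]) - mean(data[guise_b][trait])
        for trait in data[guise_a]
    }


differences = guise_differences(ratings, "variety_A", "variety_B")
for trait, diff in differences.items():
    # A positive value means the trait was attributed more strongly to variety_A.
    print(f"{trait}: {diff:+.2f}")
```

Because the speaker is held constant, systematic differences in such ratings point to attitudes towards the variety rather than towards the individual voice.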
6.3.2.2 Extra-linguistic Data
Ethnographic fieldwork, participant observation, and inquiries (interviews or questionnaires) are also the fundamental methods employed to gather extra-linguistic data that are relevant in relation to language. This includes the following extra-linguistic parameters:
• sociocultural characteristics of the speakers: Parameters that are studied in terms of their impact on language are generally gender, age, level of education, occupation, social class or socio-economic status, place of residence, religious persuasion, and/or ethnic background of the speaker. In numerous ready-made corpora, the gender and possibly the age, regional origin and/or the educational level of the speaker are the only annotated pieces of sociocultural information. This annotated information is furthermore only usable if the annotated values (e.g., level of education: ‘primary’, ‘secondary’, ‘tertiary’) are identical with or can be transformed into the defined levels in your own project (e.g., level of education: ‘no literacy’, ‘literacy’). Another parameter that influences language production and perception is the speaker’s attitude towards linguistic variants (e.g., sounds, words or language varieties/languages). It determines the speaker’s choice in language production and the perception of others based on their choice – consciously or unconsciously (cf. matched-guise task in Section 6.3.2.1).
• sociocultural context: First of all, these are cultural parameters, i.e., culture-specific ideas and practices that have an impact on linguistic forms (e.g., cross-linguistic differences in the encoding of space, possession, time, emotion, and personhood) and linguistic practices (e.g., cross-linguistically different conversational maxims/practices or speech acts of greeting). Furthermore, the sociocultural context includes situational parameters, i.e., contexts in which the same person speaks differently depending on the particular social environment of interaction. These parameters are:
– the setting (i.e., the environment of interaction): A common distinction, for instance, is made between ‘on-stage’ vs. ‘off-stage’ or ‘formal’ vs. ‘informal’ settings. 
In numerous cases, a different setting also includes a different constellation of social interactors (e.g., ‘in family’ vs. ‘at work’), but this need not necessarily be the case. Thus, we treat the two parameters separately.
– the social constellation (i.e., the persons that may have an impact on the speaker’s language, namely the addressee, the referent
and/or the bystander): Relevant parameters can be various sociocultural characteristics – either of the addressee(s), the referent(s) and/or the bystander(s) only (e.g., absolute age, gender, profession, or status) or, in most cases, in relation to the speaker (e.g., relative age, gender, profession, or status). Absolute status (or profession) of the addressee, for instance, is the determining factor for the use of a certain language to address royalty (or a judge). An example of the impact of relative status/familiarity between addressee and speaker on language is the use of T/V pronouns (e.g., German ‘du’ vs. ‘Sie’ in 2Sg). Based on a relative parameter, a fundamental distinction is made between ‘ingroup’ vs. ‘outgroup’ (e.g., the use of a certain language among the initiates of a male association). In most studies only the addressee is considered relevant but it is important not to disregard referents and bystanders. The Tongan language of respect, for instance, is an honorific system that encodes the referent’s social rank in relation to the speaker’s rank (Völkel 2010), and in numerous Australian languages, such as Guugu Yimidhirr and Dyirbal, the speaker’s kinship relationship to bystanders (mother/brother-in-law) plays an important role in language behaviour (Haviland 1979; Dixon 1980). Other relevant relational parameters may be interpersonal sympathies and antipathies.
A fundamental distinction can be made in terms of the view from which extra-linguistic sociocultural parameters are defined:
• etic parameters & values: These are predetermined variables and values that are believed to be of cross-cultural relevance, e.g., gender (male vs. female), age (young vs. middle-aged vs. old), nationality/citizenship, or social environment (urban vs. rural), and
• emic parameters & values: These are variables and values that are relevant to the speakers themselves, e.g., social status is of more or less relevance in different sociocultural contexts and is defined according to distinct parameters such as age (e.g., in traditional egalitarian Aboriginal societies), special skills as an orator or fighter (e.g., in traditional Melanesian ‘big men’ societies), descent and kin relationships (e.g., in traditional Polynesian chiefdom societies), occupation, financial and educational background (e.g., in Western societies).
The work with emic vs. etic parameters & values differs greatly in the effort required for their collection and in the fundamental procedure of data collection. An emic understanding of social categories and cultural concepts of an unfamiliar or even unstudied community is gained primarily via ethnographic fieldwork by means of participant observation and open interviews – an elaborate method (cf. Section 6.3.1). Although common sociocultural ideas
and practices are a basis for successful interaction within a community, this does not necessarily mean that they are shared by all members in the same way. If a study aims at a more detailed analysis of this group-internal heterogeneity, complementary methods such as cultural consensus analysis (i.e., a survey approach to identify the distribution of shared knowledge within a community) can be used. Etic sociocultural data, however, are simply gained by systematic inquiry (questionnaires or interviews). In order to get a comprehensive understanding both these methodological approaches can be combined. While etic categories are useful for cross-cultural/-linguistic comparability, emic categories provide more precise culture-/language-specific insights. Socio- and anthropological-linguistic studies investigate different extralinguistic and linguistic parameters of people that are part of sociocultural (sub)groups and/or language communities that can be defined in multiple ways and captured by diverse methods. This issue is covered in Section 6.3.3. In sum, ethnographic field research is the fundamental research method, particularly in anthropological linguistics. It offers a platform for the combination of various techniques of data collection such as different kinds of observation, inquiry, and even experiments. While some sociolinguists and anthropological linguists consider ethnographic fieldwork with participant observation and the recording of naturally occurring speech among native speakers to be the only appropriate method of data collection, others use a broader range of techniques. The expected amount of work differs widely depending on the methods employed. Overall, the methodological approach must be tailored to answer the clearly defined research question and be feasible within the research framework. 
For this second reason, most socio- and anthropological-linguistic studies involving labour-intensive collections of language data (e.g., the transcription and annotation of extensive speech data) work with predetermined etic sociocultural parameters (e.g., gender) that are comparatively simple to acquire. By contrast, investigations involving labour-intensive collections of sociocultural parameters (e.g., emic data) tend to reduce the workload needed for language data collection (e.g., working with systematically elicited language data or smaller sets of natural language data).
6.3.3 Research Participants: Identification & Sampling
The study of a particular language or language variety as a means of socio-cultural interaction inevitably involves working with speakers of that language or language variety. Depending on the level on which the relationship between socio-cultural parameters and linguistic variation is being investigated, different (sub-)groups of speakers will constitute the basic population or speech community (Morgan 2014). In any case, it will be a socioculturally significant or meaningful group of speakers with a common language foundation (i.e., a language variety, a language, or languages). The fact that they share a linguistic foundation, however, does not necessarily mean that there is no language- or variety-internal variation. A common linguistic foundation means
that the speakers can directly or indirectly interact based on shared linguistic norms and communicative practices. Depending on the research focus, the specific groups/communities to be studied can
– either be identified in the course of the research project (e.g., speakers who use the same/similar linguistic features, such as the urban upper-class population);
– or one can start from a defined group of speakers (e.g., speakers with the same native language such as Tongans).
Anthropological linguists, for instance, primarily study communities defined by a shared language or languages (language communities or multilingual speech communities) and/or a common cultural background (ethnic groups). Single language and ethnic groups can then be contrasted cross-linguistically/-culturally. As language groups and ethnic groups do not necessarily correspond, it is also possible to study communities sharing the same language but differing in cultural aspects or vice versa. In contrast, sociolinguists primarily investigate speech communities that are defined by a shared language variety (dialect or vernacular groups) and/or that are to be identified by shared social parameters (e.g., people with the same level of education, at the same workplace, or within the same circle of friends). Two fundamental kinds of speech communities defined in sociolinguistics are:
• social networks: These are people who are cross-linked by shared association (social structure). The social network is a group of somehow related people (Milroy 1987²; Milroy & Gordon 2003²). The relationships can be strong (e.g., among friends or kin) or weak (e.g., among acquaintances). Strong ties are characterized by frequent and intense contact and a wide variety of topics (multiplex), while weak ties are characterized by rare and irregular contact and single topics (uniplex). Furthermore, the relationship between two members of a network can be direct (first-order contacts) or indirect (second-order contacts), and symmetrical (mutual) or asymmetrical (one-sided). Network density is calculated by the number of direct ties in relation to the total number of possible ties. Thus, a social network can be close-knit or open consisting of a few clusters (i.e., segments of high density).
• communities of practice: These are people that are cross-linked by shared practices (social practice). A community of practice is a group of people with shared interests and activities. They interact based on shared common knowledge, ideas, and practices, including a shared language and communicative behaviour (Eckert & McConnell-Ginet 2003).
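The density measure mentioned for social networks (direct ties in relation to all possible ties) can be sketched in Python as follows. The sketch assumes symmetrical ties, so that a network of n members has n(n−1)/2 possible ties; for asymmetrical ties the denominator would be n(n−1) instead. The function name, the member labels, and the example ties are invented for illustration.

```python
def network_density(members, ties):
    """Share of possible ties that are actually present (symmetrical ties assumed)."""
    members = set(members)
    # Store each tie as an unordered pair so that A-B and B-A count only once;
    # self-ties and ties to people outside the network are ignored.
    observed = {
        frozenset(tie)
        for tie in ties
        if len(set(tie)) == 2 and set(tie) <= members
    }
    possible = len(members) * (len(members) - 1) // 2
    if possible == 0:
        raise ValueError("density is undefined for fewer than two members")
    return len(observed) / possible


# Invented example: four people, four direct ties out of six possible ones.
members = ["A", "B", "C", "D"]
ties = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
density = network_density(members, ties)
print(round(density, 2))
```

A value close to 1 corresponds to a close-knit network, whereas a low value indicates an open network; the subgroup A, B, C in the example forms a cluster of maximal density.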
Socioculturally significant groups of speakers can also be defined on emic or etic grounds (cf. Section 6.3.2.2). The definition of cultural or ethnic groups, for instance, can be based on etic parameters (e.g., citizenship or country of origin) or emic parameters (e.g., sense of belonging, ideas of in-group vs. outgroup, or shared ideas and practices). While etic data are primarily gained by survey, emic
data require more elaborate methods. Due to positive self-presentation and/or absence of awareness or self-reflection on the part of research participants, the data resulting from direct questioning can differ from data obtained via participant observation or indirect questioning. In order to gain a comprehensive understanding, different methods can be used to complement one another, e.g., participant observation of interacting behaviour and network analysis based on survey data (cf. Section 6.3.4) to define a social network.
In the global world of today with people of multiethnic backgrounds and long-distance (in)direct relationships via modern media, the definition and delimitation of socio-cultural communities (such as social networks or ethnic groups) becomes ever more complex.
Once the language or speech community is defined, the researcher can choose to study the entire group of speakers (if it is a smaller community), a representative sample, and/or single experts or representative cases. Generally, the researcher first chooses a specific field site where speakers of the language or language variety to be studied are located. This choice may be guided by practical considerations (e.g., pre-existing contacts or easy accessibility) and/or scientific considerations (e.g., a location with a representative population or a particular linguistic situation). On-site, it is then essential to find research participants. The identification of main informants (specialists) generally takes place during the explorative field stage. While getting to know the people, some turn out to have special skills or characteristic features relevant for the research (e.g., good storytellers, language teachers, or experts of a particular genre). Such exceptional talents are generally not representative of the entire community. Instead, for a representative case study it is important to choose an average person with regard to social parameters and linguistic behaviour.
An alternative is to investigate a representative sample – i.e., a selection of speakers from the entire community (cf. Section 1.2.3). Basic considerations regarding selection relate to the number of research participants and to sampling criteria. The total number largely depends on the research aim and the overall research scope of the project. Furthermore, the number of socio-cultural criteria to be investigated (e.g., two: males vs. females; or six: young females vs. middle-aged females vs. old females vs. young males vs. middle-aged males vs. old males) is a deciding factor, because in quota samples each subgroup needs to include a certain number of research participants (as a rough point of reference: at least 10–20) in order to allow for quantitative statements. As a rule, the more heterogeneous a community or subgroup is expected to be, the larger the entire sample or the subgroups should be. Regarding selection criteria, sociolinguists and anthropological linguists work with different kinds of samples, a select few of which we will now introduce:
• quota sampling according to social criteria: Research participants are selected according to their social and language characteristics (e.g., gender, age, mono-/bilingualism). Which criteria are relevant for a study is predetermined. These can be emic or etic categories, although emic categories need to be determined by pre-studies.
• snowball sampling: Research participants are identified and selected based on acquaintance, friendship, kinship, or any other kind of relationship (including shared activities). After a first research participant has been selected, further ones are identified by the snowball system (also called ‘friend of a friend’ principle). By selecting research participants based on some kind of relatedness, socioculturally relevant groups of speakers, such as social networks or communities of practice, can be identified.
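How quickly a quota sample grows with the number of social criteria can be worked out before going into the field. The sketch below is an illustration only: it crosses the criteria from the six-subgroup example above and multiplies the number of resulting cells by the rough reference value of 10–20 participants per subgroup. The function name `quota_plan` and the dictionary layout are assumptions made for this example.

```python
from itertools import product


def quota_plan(criteria, per_cell):
    """Cross all values of the given criteria and return (cells, total size)."""
    # Each cell is one combination of values, e.g. ("female", "young").
    cells = list(product(*criteria.values()))
    return cells, len(cells) * per_cell


# Criteria taken from the six-subgroup example in the text.
criteria = {
    "gender": ["female", "male"],
    "age": ["young", "middle-aged", "old"],
}

cells, minimum = quota_plan(criteria, per_cell=10)
_, upper = quota_plan(criteria, per_cell=20)

for cell in cells:
    print(" / ".join(cell))
print(f"{len(cells)} subgroups, {minimum}-{upper} participants in total")
```

Adding a third binary criterion (e.g., mono- vs. bilingual speakers) would double the number of cells and hence the required sample, which is why the set of criteria is usually kept small.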
In any case, empirical success hinges on gaining access to sociocultural groups and their speakers or even being accepted as a member. Therefore, it is necessary to respect group-internal structures and hierarchies and to know local authority figures (e.g., the school director, the village chief, the spokesperson or ringleader of a group) to get their permission and support. Some authorities and structures may be obvious, but others only become apparent in the course of fieldwork. Sometimes jealousy among group members, caused by working with some but not all, may be unavoidable, but knowing group-internal structures certainly will assist in minimizing such problems. Furthermore, it is important to consider that making fixed appointments is not always common practice in some cultural contexts. This means that people may never show up or at least not at the arranged time, or they may send somebody else as a representative. In these cultural settings, it is reasonable to work directly/spontaneously with people whom you meet, who comply with the research criteria, and who are willing to participate.
6.3.4
Data Analysis
The kinds of data analysis in sociolinguistics and anthropological linguistics are as diverse as the research questions and the data collection methods. Thus, we will only discuss selected main types of analysis together with some discipline-specific analytical methods. All socio- and anthropological-linguistic studies generally include some kind of comparative analysis – comparing linguistic parameters (languages, varieties, language/variety-internal variations) and sociocultural parameters (sociocultural groups/sub-groups or contexts, cultural ideas & practices). Depending on the linguistic area of research, distinctive linguistic features can be determined by component analysis, distribution analysis, feature analysis, frequency analysis, etc. (cf. Section 1.2.8). The data basis for linguistic analysis is either natural language data (corpus analysis of accessible or self-recorded texts, cf. Chapter 5) or systematically elicited language data; in the latter case, data collection is already guided by analytical considerations. A basic distinction can be made between qualitative and quantitative studies (cf. Section 1.1.4). Qualitative studies provide descriptions pertaining to sociocultural aspects and language behaviour in their complexity. In doing so, they aim to identify the fundamental parameters, underlying concepts and relational
patterns between socio-cultural and linguistic phenomena, such as in ethnographies of speaking and conversation analyses. This micro-level research generally involves long-term participant observation of a limited number of participants. Conversely, quantitative studies inform about how often linguistic features occur in distinct sociocultural groups or contexts, such as in correlation or network analyses. Such macro-level investigations generally involve surveys that focus on a limited number of pre-determined parameters, because the more standardised the data, the better their comparability is. Furthermore, these studies consider a statistically representative sample of speakers. Some typical analytical methods of sociolinguistics in particular are:
• Quantitative network analysis is a method to determine the structure of a group of interrelated speakers, a social network (cf. Section 6.3.3), from a predominantly emic perspective. Milroy and Milroy (1985) are primarily interested in the intensity and distribution of relationships and thus the network density, proceeding from the assumption that networks with a high density and strong ties are based on more shared knowledge, which can be taken for granted, than loose networks with weak ties. Consequently, dense networks and strong ties are expected to be unfavourable for innovation, including language change, while loose networks and weak ties are expected to be favourable for innovations. The data needed for social network analysis take the form of a matrix indicating for each actor the presence (+) or absence (–) of a relationship to any other actor of the network. The social network or segments of it can then be visualised in a sociogram (nodes representing actors and connecting lines representing relationships between actors) by use of software programs (e.g., UCINet or SocNetV). An analysis of more differentiated and/or emic data concerning social relationships (e.g., distinct kinds of social interaction as considered in communities of practice) is generally more qualitative than quantitative in nature.
• Correlation analysis as used in variationist sociolinguistics, for instance, generally includes two parameters: a (usually etic) social parameter (e.g., gender, age, or any other parameter that is expected to have an impact on language use) which is the independent variable and a linguistic parameter (linguistic variants, e.g., the pronunciation [-in] vs. [-iŋ] of the suffix ‘-ing’ as in ‘going’, Labov 1972) which is the dependent variable. In order to make sure that the impact of a single social parameter is measured, the groups of research participants must differ in this respect (e.g., male vs. female) while all other potential parameters need to be balanced (i.e., to occur equally within each group, e.g., older & younger, urban & rural, and more & less educated) or need to be kept homogeneous (i.e., participants are as similar as possible and only differ with respect to the parameter of interest, e.g., gender). Statistical evaluation then reveals whether the linguistic variation correlates with the values of the social parameter (e.g., x% of male speakers pronounce the suffix [-in] and y% of female speakers pronounce it [-iŋ]). If so, the social feature constrains the distribution of language-internal variation. This is basically a corpus-based analysis, taking annotated social speaker characteristics into account (cf. Chapter 5). Variable rules analysis (also called VARBRUL analysis after the computational software tool), for instance, is a statistical analysis method which describes the patterns of variation between alternative linguistic variants as conditioned by social or contextual factors (Tagliamonte 2006).
• Conversation and discourse analysis (cf. Section 1.2.8; Sidnell & Stivers 2012; Tannen, Hamilton & Schiffrin 2015²), in particular, is used to identify contextual and interactional patterns and their meaning in corpora of natural speech. Conversation in specific sociocultural contexts (e.g., in films, in court, in group therapy, or in emergency calls) can be analysed in terms of its temporal and sequential organisation, such as turn-taking, adjacency pairs (e.g., question-answer or greeting-greeting back), and repair. Furthermore, the production of meaning in discourse (e.g., rejection, requests, apologies, and other speech acts) can be analysed, i.e., which linguistic forms are used to express a certain meaning. With regard to the analysis of the conversation of bi-/multilingual speakers, a topic of particular interest is code-switching, and critical discourse analysis focuses on the identification of discourse structures that can be associated with social power. Overall, conversation and discourse analysis can be more exploratory or more systematic. While exploratory studies generally analyse single cases in detail (i.e., considering multiple aspects), more systematic studies focus on a specific phenomenon that they investigate across many cases.
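The logic of the correlation analysis described above can be illustrated with a small cross-tabulation. The following is a minimal sketch, not the VARBRUL procedure: it computes the share of the [-in] variant per speaker group and a Pearson chi-square statistic for a 2×2 table. All token counts are invented for illustration, the function name `chi_square_2x2` is hypothetical, and the test treats every token as an independent observation, which is a simplification when several tokens come from the same speaker.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 table of counts.

    No continuity correction is applied; rows are groups, columns are variants.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Invented token counts per group: (tokens with [-in], tokens with [-iŋ]).
counts = {"male": (60, 40), "female": (35, 65)}

for group, (n_in, n_ing) in counts.items():
    share = 100 * n_in / (n_in + n_ing)
    print(f"{group}: {share:.0f}% [-in]")

stat = chi_square_2x2([counts["male"], counts["female"]])
CRITICAL_05_DF1 = 3.841  # chi-square critical value for p = .05 with df = 1
print(f"chi-square = {stat:.2f}; exceeds critical value: {stat > CRITICAL_05_DF1}")
```

With these invented counts the statistic exceeds the critical value, so the distribution of the two variants would be taken to differ between the groups. Variable rules analysis goes further by modelling several social and contextual factors at once.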
The study of language in unfamiliar socio-cultural contexts carries a great risk of the misinterpretation of data. Therefore, it is essential to learn about culture- or community-specific ideas and behaviours (particularly conversational practices), in order to interpret the data and the findings appropriately in their socio-cultural and situational context.
6.4
Basic Research Findings
Basically, research findings in sociolinguistics and anthropological linguistics can be categorised in terms of the statement level:
• micro-level: statements about single speakers or small-scale units (e.g., case studies of a particular group of speakers)
• macro-level: statements about large-scale units (e.g., a speech community)
• a more general level: cross-linguistic generalisations regarding language and society/culture in general (e.g., universal patterns) ➔ However, broadly generalising outcomes are relatively rare due to the great effort required to compare different languages in their diverse sociocultural contexts.
In terms of content, the outcome of socio- and anthropological-linguistic research describes distinct kinds of relationships between language and its sociocultural context. The specific results, however, are as manifold as the research questions and the methodological approaches. In the following, we present some basic categories of findings, including some basic terms and definitions that are relevant in this context:
• language-internal variation based on geographic parameters: dialects (i.e., vernaculars associated with a particular region) ➔ linguistic maps/atlases (e.g., Kurath 1949; Schmidt & Herrgen 2001–2009; Labov, Ash & Boberg 2006) indicating the geographic distribution of dialects and individual linguistic features (e.g., the pronunciation [kønik] vs. [køniç] in German)
• language-internal variation based on social parameters: sociolects (i.e., vernaculars associated with a particular social group, e.g., language of men vs. women, or youth languages) ➔ social markers (e.g., gender markers), i.e., linguistic features that are statistically representative for speakers of distinct social groups
• language-internal variation based on communicative situation or purpose: registers (i.e., vernaculars associated with a particular social context, e.g., ritual language, avoidance style, or respectful registers)
• cross-linguistic variation reflecting cultural meaning as encoded in linguistic forms (lexical concepts and formal categories): cultural conceptualisations (cf. also Section 7.3.7) – i.e., culture-specific ideas (e.g., on personhood, social status, object classification, possession, causality, emotion, time and space) encoded in linguistic forms (e.g., pronominal systems, kinship terminologies, classifiers, colour terminologies, honorifics, tense systems, metaphors and idioms) ➔ in contrast to social markers (see above), which are only statistically associated with sociocultural parameters, deictic forms (e.g., social deixis/honorifics or person/gender deixis) are mandatory encodings of these parameters (namely, social status or gender)
• cross-linguistic/-cultural and language-internal variation in communicative practices: ethnographies of speaking/communication – i.e., descriptions of communicative practices as they are characteristic of a particular group (e.g., a culture, a speech community, or a particular situation/setting); linguistic practices (and forms) as used in initiation circles, in spirit possession rituals, etc. are particularly interesting instances which demonstrate how deeply language is embedded in its socio-cultural context
• sociocultural factors of language change (i.e., language change as the result of social interaction):
a. in monolingual contexts, e.g., the impact of network structures: Social networks with a high density and strong ties (cf. Section 6.3.3) are unfavourable for innovation (i.e., vernacular norms are maintained), while weak network ties are more favourable for innovation (Milroy 1987²; Milroy 1992).
b. in situations of language & culture contact, e.g., the factors leading to the emergence of pidgin & creole languages, underlying code-switching behaviour in multilingual communication, or leading to the transfer of linguistic features (such as lexical borrowing)
Overall, languages or language varieties that are less accepted or prestigious are more likely to be subject to change (e.g., Labov 1972; Seifart 2000 – cf. Section 3.3.1: language extinction). ➔ acrolect (language or language variety of high prestige) vs. basilect (language or language variety of low prestige) Whenever speakers choose between languages or language varieties, this choice can be conscious or unconscious.
• cross-linguistic differences in the process of language socialisation (i.e., the co-acquisition of language and culture)
Overall, language is certainly influenced by the sociocultural parameters of its speakers, but language can also be used to maintain or negotiate sociocultural ideas and practices. Language itself is a sociocultural instrument and language use a sociocultural practice. These relationships between language and society/culture become clear in a multitude of research results. Conversely, a study may show that the suspected link/correlation does not exist. Thus, it is difficult to postulate cross-linguistic ‘if-then’ statements between linguistic and sociocultural parameters. The relationship is manifold and complex overall, as the diversity of results shows, and multiple socio- and anthropological-linguistic studies include some level of interpretation – i.e., the pure results of analysis are regarded as an indication of the relationship.
6.5
Summary
The basic aim of socio- and anthropological-linguistic research is the investigation of relationships between language variation (language-internally and cross-linguistically) in linguistic forms and practices and sociocultural
parameters and concepts. This broad and heterogeneous research field comprises multiple aspects which are approached empirically in various ways and by use of diverse methods. A main method for studying language in its sociocultural context is ethnographic fieldwork with participant observation. Field sites can range from rural areas of non-Western small-scale societies (traditional research locations of anthropological linguistics) to specific urban settings in Western societies such as high schools or city districts (traditional sociolinguistic research contexts). This time-consuming and emotionally demanding research procedure aims at gaining an inside perspective (emic view) of a community’s ideas and practices by becoming a competent member of the sociocultural community. The initial stage of exploratory research can be followed by more problem-oriented field research such as systematic observation, focused surveys, and even experimental tasks. The majority of studies investigating language-internal variation focus on language use/performance, and therefore record (quasi-)natural language data gained by observation or sociolinguistic interviews (in an informal conversation-like format). In contrast, studies with a cross-linguistic perspective are primarily based on competence data gained by direct inquiry (e.g., elicitation). Socio-cultural parameters that are associated with language variation are gender, age, level of education, occupation, religious persuasion, social class or status, social context of interaction, social environment or community, ethnic background, and culturally important concepts. With regard to data collection, etic and emic parameters and values can be distinguished. While predetermined etic categories (e.g., gender/sex: male vs. female) can be gained by direct inquiry, emic categories, due to their relevance to the speakers themselves, require participant observation.
In sum, socio- and anthropological-linguistic research always involves a type of comparative analysis such as the analysis of language as used in a particular sociocultural context as compared to other contexts, or the analysis of language-specific patterns as compared to culture-specific ideas and practices.
6.6
Exercises and Assignments
Exercises for students that can be included during a session on sociolinguistics and anthropological linguistics or as part of project work:
6.1 Gather some practical experience in ethnographic fieldwork and participant observation in a ‘sub-culture’ of your own society with which you are unfamiliar, e.g., an allotment garden community, the banking industry, or refugee homes.
6.2 Record a speech sequence in which multiple speakers are interacting, and transcribe it orthographically – what kind of challenges or difficulties do you experience?
6.3 Develop a short questionnaire to collect detailed data on a person’s network or community of practice (e.g., ‘Who do you regularly meet, and for which reason/purpose?’, ‘Do you consider these people to be a friend/acquaintance/kin/colleague/...?’).
6.4 Discuss what type of data is needed to answer specific research questions as shown in Table 6.1. And what techniques of data collection and analysis are best suited to collect this data?
6.5 Develop your own socio- or anthropological-linguistic research question that focuses on a specific aspect of language in its sociocultural context.
For students, it is generally not possible to conduct long-term research in the field (particularly other countries). However, less time-consuming and cost-intensive research projects or extensive exercises are nevertheless still feasible:
6.6 Replicate a quantitative sociolinguistic study that analyses correlation patterns of linguistic variation and social parameters (e.g., age or gender) of the speakers, or develop and conduct your own variationist sociolinguistic research project.
6.7 Compare two or more international English varieties using ready-made corpora such as the ICE corpus (http://ice-corpora.net/ice/) or the GloWbE corpus (www.english-corpora.org/glowbe/). Are there divergences that can be related to culture-specific features? Are they more marked in male vs. female or younger vs. older speakers?
6.8 Conduct a qualitative study based on participant observation of a small number of research participants in the banking industry or primary schools. Does the language used in arguments by boys differ from that of girls at the age of seven, or how does the language of bank employees differ when speaking to colleagues vis-à-vis when speaking to customers?
6.9 Collect and analyse ‘greetings’ in different cultures and/or contexts. What is the culture-specific or situational meaning of these interpersonal speech acts?
6.10 Check whether the same dialectal features as described in the literature (based on natural speech analysis) occur in published versions of this dialect, e.g., ‘Asterix’ – comparing the Standard German and the Hessian version (‘Asterix auf Hessisch’), or ‘The little prince’ – comparing the Standard German and the Swabian versions. Are the features used excessively compared to natural speech?
6.11 Investigate motives for code-switching of bilingual speakers. Record data and then ask the speakers for their motives in the different instances of switching from one language to the other. Are there particular topics, addressees, or diverging concepts encoded in each language that trigger code-switching?
6.7
Further Reading
Sociolinguistics
Introductory textbooks to sociolinguistics are Hudson 1996², Trudgill 2000⁴, Romaine 2000², Eckert & McConnell-Ginet 2003 (language and gender), Meyerhoff 2006, Wardhaugh 2006⁵, Holmes 2013⁴, Merrison et al. 2014² (language in use), and Bell 2014, while Coupland & Jaworski 1997, Paulston & Tucker 2003, and Meyerhoff & Schleef 2010 are supplementary readers. Friginal & Hardy 2014 provide a students’ guide for corpus-based sociolinguistics in particular. Handbooks to sociolinguistics include Coulmas 1998, Ammon et al. 2002/2005/2006, Ball 2010, Mesthrie 2011, Hernández-Campoy & Conde-Silvestre 2012 (historical sociolinguistics), Bayley, Cameron & Lucas 2013, and García, Flores & Spotti 2016. Major sociolinguistic journals are ‘Language in Society’ (Cambridge University Press) and ‘Journal of Sociolinguistics’ (Blackwell). Alongside these encompassing journals, there are multiple journals with a narrower thematic focus, e.g., ‘Language, variation and change’ (Cambridge University Press), ‘Journal of historical sociolinguistics’ (de Gruyter), ‘Journal of linguistic geography’ (Cambridge University Press), ‘Dialectologia et Geolinguistica’ (de Gruyter), or ‘World Englishes’ (Wiley-Blackwell). Finally, thematic journals like the ‘Journal of Politeness: Language, Behaviour, Culture’ (de Gruyter) or the ‘Journal of Language Contact’ (Brill) publish articles on topics which are studied by sociolinguists as well as by anthropological linguists. Book series with a sociolinguistic focus are ‘Language in society’ (Blackwell), ‘Routledge studies in sociolinguistics’ (Routledge), and ‘IMPACT: Studies in language, culture and society’ (Benjamins). In the book series ‘Language in society’ (Blackwell) several introductory textbooks to multiple topics of sociolinguistics and anthropological linguistics have appeared, e.g., on language contact (Winford 2003).
Anthropological Linguistics
Textbooks introducing anthropological linguistics are Salzmann, Stanlaw & Adachi 1993/2018⁷, Duranti 1997, Foley 1997, and Ahearn 2012, while Duranti 2004 and 2009² are readers with contributions by crucial representatives of the field and Duranti 2001 provides a glossary of key terms and concepts. Handbooks to anthropological linguistics are provided by Enfield, Kockelman & Sidnell 2014, Bonvillain 2015, and Sharifian 2015. Journals with a focus on anthropological-linguistic topics are ‘Anthropological Linguistics’ (Indiana University), ‘Journal of Linguistic Anthropology’ (Wiley), which is the journal of the Society for Linguistic Anthropology (SLA), ‘International Journal of Language and Culture’ (Benjamins), and ‘Language, Culture and Society’ (Benjamins). Book series publishing anthropological-linguistic studies are ‘Culture and language use’
(Benjamins), ‘Studies in linguistic anthropology’ (Berghahn), and ‘Oxford studies in the anthropology of language’ (Oxford University Press). Finally, the book series ‘Language, culture and cognition’ (Cambridge University Press), ‘Brill’s studies in language, cognition and culture’ (Brill), and ‘Cognitive linguistic studies in cultural contexts’ (Benjamins) cover topics at the interface between anthropological linguistics and cognitive linguistics (cf. Chapter 7).
Johnstone 1999 (qualitative methods), Milroy & Gordon 2003², Hanneman & Riddle 2005 (network methods), Bernard 2011⁵ (methods in anthropology), Krug & Schlüter 2013, Hernández-Campoy 2014, Holmes & Hazen 2014, Heller, Pietikäinen & Pujolar 2017, and Drager 2018 (experimental methods) focus on different research methods as used in sociolinguistics and/or anthropological linguistics. Various aspects of ethnographic fieldwork, in particular, are addressed in Eckert 2000, Atkinson et al. 2001, Newman & Ratliff 2001, Fischer 2002², Vaux & Cooper 2003, Gumperz & Jacquemet 2006, Beer 2008², Bowern 2008, DeWalt & DeWalt 2011², Sakel & Everett 2012, Thieberger 2012, Schilling-Estes 2013, and von Poser & von Poser 2017. Tusting 2020 is about the ethnography of communication. Different kinds of analysis are described in Scott 2000² (network analysis), Tagliamonte 2006, Sidnell & Stivers 2012 (conversation analysis), and Tannen, Hamilton & Schiffrin 2015² (discourse analysis).
7
Cognitive Linguistics and Psycholinguistics
Cognitive linguistics and psycholinguistics comprise a broad range of approaches and theories within the interdisciplinary field of linguistics, cognitive science, and psychology. Their shared interest lies in the investigation of language and cognition, as language is regarded as a mental phenomenon or, more precisely, as ‘an instrument for organising, processing, and conveying information’ (Geeraerts & Cuyckens 2007: 3). In comparison to other linguistic subdisciplines, cognitive linguistics and psycholinguistics came into being only relatively recently. In general, no sharp distinction is made between cognitive linguistics and psycholinguistics, so that the terms are often used interchangeably. In this chapter, we will first outline fundamental research aims and questions (Section 7.1) and distinct cognitive approaches to language (Section 7.2). Furthermore, we will address specific methodological issues (Section 7.3), including the kinds of research participants, an overview of data types and techniques of data acquisition and analysis. Subsequently, we will focus on experimental research on mental representations and processes (in language comprehension and production) and on the relationship between language and thought (linguistic relativity). Finally, the chapter provides information on key findings (Section 7.4), followed by a summary (Section 7.5), methodological exercises and ideas for your own research projects (Section 7.6), and suggestions for further in-depth readings on multiple issues in cognitive linguistics and psycholinguistics (Section 7.7).
7.1
Research Aims and Questions
Cognitive linguists and psycholinguists study language from a cognitive perspective. The overall goal is to discover the largely unconscious mental representations of linguistic knowledge, their impact on thought and vice versa, and the mental mechanisms underlying language use. This goal can be broken down into the following basic research questions:
• What mental representations or cognitive conceptualisations are reflected in language, or more precisely, how is knowledge organised in the speakers’ mind (mental lexicon, mental grammar)? What are meaningful units – i.e., basic concepts and categories according to which our knowledge is structured in language? Depending on the researchers’ area of interest and their theoretical background, this includes contentful units of the lexical and semantic subsystem (e.g., meaning primitives) and structural units of the grammatical subsystem (e.g., phonemes, morphemes, syntactic functions, or theory-laden units such as trace or movement operations).
• What mental processes are active during language production and language comprehension, i.e., how and when are mental representations accessed or become relevant? How do we assign meaning and hierarchical structure to sounds, signs or letters and their combinations during language comprehension and how is non-verbal thought transformed into a spoken, signed or written utterance (bidirectional form-to-meaning mapping)? And what happens when we are confronted with suboptimal informativity (due to ambiguity, complexity, and markedness)? Are there measurable processing signatures (called processing cost or cognitive effort)?
• Which are the mental representations and processes particular to language and which belong to general cognitive capacities? How do those specific to language interact with other cognitive functions such as memory, perception, or attention, thereby yielding certain specific linguistic behaviours (e.g., how is linguistic knowledge used during processing)?
• To what extent are these mental representations and processes subject to developmental changes (language acquisition in typically vs. atypically developing children) or life-span changes (interactions between language and aging, language loss due to acquired disorders)?
• How do human language and cognition differ from cognitive capacities and communication systems of other species? Why are humans the (hitherto) only species attested to have a communication system as complex as language and how did they come to develop it?
• What is the relationship between language and thought (linguistic relativity)? Does language have an impact on thought, and if so, in what way (e.g., language determines thought, thinking for speaking)?
• Which mental representations and processes are cross-linguistically universal and which are language-specific, i.e., in what way do they differ across the languages of the world?
• How do children acquire language (first language acquisition)? And, what happens in the mind of speakers who learn a second language (second or foreign language acquisition)?
As there is significant overlap between the research fields of cognitive linguistics and psycholinguistics, it is difficult to make a clear distinction between the two subdisciplines. However, there are different traditional foci. Studies in cognitive linguistics basically investigate mental representations as reflected in language (i.e., semantics, grammar, phonology and pragmatics). The core fields of research are:
• cognitive semantics, studying the conceptual content and its structure as encoded in language. The main topics are conceptual metaphors and metonymies, the lexical categorisation of semantic fields (e.g., componential features, taxonomies, prototypicality), and lexical ambiguity of polysemous and homonymous word forms.
• cognitive approaches to grammar (i.e., Cognitive Grammar and Construction Grammar), focusing on schematic meaning and cognitive principles structuring grammar. Topics studied here are word classes, construction types, constituency order, and grammaticalisation.
These topics are approached predominantly from a cross-linguistic perspective with a focus on language variation and linguistic particularities (cf. Chapter 4 on typology). Furthermore, the research generally includes non-linguistic ethnographic investigations to learn more about culture-specific ideas underlying language particularities and other sociocultural practices. Hence, cognitive linguistics is closely related to cognitive and linguistic anthropology (cf. Chapter 6 on anthropological linguistics), as they are all interested in the relationship between language, culture and cognition, or more precisely, to what extent conceptual organisation as reflected in linguistic forms and practices differs across languages and reflects culture-specific ideas. With regard to this interface, anthropological linguists and cognitive linguists study primarily semantic fields or conceptual domains such as space, time, kinship, emotion, quantity, colour, animals, or plants. While research on linguistic relativity investigates the relationship between language diversity and thought, studies in cultural linguistics are predominantly interested in culture-specific cognitive conceptualisations as reflected in individual languages. Given the need for anthropological data and cognitive data of native speakers in their natural environment, cognitive research is generally conducted in the field.
Psycholinguistic research is primarily interested in the mental representations and processes that speakers use when they process language in real time, focusing on the interaction of linguistic and non-linguistic cognition. Typically, this includes the following areas:
• language production studies the mental mechanisms and their temporal order in accessing and retrieving linguistic information for speaking, signing, and writing, as well as speech errors in the language of healthy (e.g., slip-of-the-tongue phenomena) and impaired people (e.g., with different kinds of aphasia, acquired or developmental dysgraphia).
• language comprehension focuses on the mechanisms of understanding speech (e.g., sounds and phonemes for spoken language) and language (meaningful units from the word up to the text level), and the temporal order in which different types of information are processed to compute sentence or text meaning.
• language acquisition studies the ability to learn a language, the process and the preconditions of language acquisition.
Traditionally, psycholinguistics focuses predominantly on universal parameters (i.e., mental mechanisms and processes underlying human language independent from cross-linguistic variation). However, there is increasing appreciation of the issue that cross-linguistic and individual variation in language processing needs to be addressed systematically in order to find cognitive universals. Given the close vicinity to experimental and cognitive psychology, psycholinguistic experiments generally aim to conform to laboratory standards. This approach offers a link for researchers from neuroscience who aim for the complementary use of cognitive and neuroscientific methods when explaining language processing (cf. Chapter 8). Table 7.1 contains specific examples of research questions targeting cognitive linguistic and psycholinguistic issues from each linguistic area.
Table 7.1 Example research questions in cognitive linguistics and psycholinguistics
Linguistic domains (phonetics & phonology; morphology & syntax; lexicon & semantics; pragmatics & discourse; cross-domain):
1. What phonological knowledge is used during reading across varying writing systems?
2. How do humans discriminate sounds and phonemes? Does sound discrimination interact with other types of information (e.g., visual content, predictions of upcoming words)?
3. Are morphological forms (inflected forms) stored as whole units in the mental lexicon or are they composed via rule-based mechanisms?
4. How do humans deal with syntactic ambiguity in sentence comprehension?
5. Are languages with a consistent head-dependent order easier to process? (cf. Section 4.5)
6. What is the conceptual structure of (numeral or possessive) classifiers – in a particular language or cross-linguistically?
7. Is gender-marking of non-animates in German acquired faster/better by speakers with a gender-indifferent L1 or by speakers with a gender-differentiating but deviating system?
8. Is there a relationship between the dominant spatial frame of reference in linguistic encoding and non-linguistic spatial thinking?
9. What are the basic building blocks of text and discourse structure to which speakers adhere (e.g., a consistent topic-focus order in languages without topic and/or focus marker)?
10. How does the language of people with different kinds of impairment differ from the language of unimpaired speakers?
11. Is syntactic parsing guided by lexical information or is it independent of lexical-semantics? (syntax-semantics interface)
Cross-disciplinary fields:
Language acquisition
12. Can newborns discriminate their mother tongue from other languages with distinct intonation patterns?
13. At what age do young children develop a theory of mind? What are the linguistic indicators?
14. Are auxiliary or simplified languages (e.g., Basic English, Esperanto, Interlingua) easier to acquire, produce and/or comprehend than their natural counterparts?
Language contact
15. How do bilingual speakers access word meaning in their dominant vs. non-dominant language?
16. In which of their languages do multilinguals calculate?
Language change
17. How does language exposure across one’s life span influence the speed and accuracy of language comprehension?
7.2 Cognitive Approaches in Linguistics
Cognitive linguistics and psycholinguistics comprise a vast field of distinct approaches studying language as a mental phenomenon, starting from different theoretical assumptions and commitments. Basically, there are two fundamentally distinct cognitive approaches to language:
a. The first is the generative approach (evolving out of the Chomskyan tradition), postulating innate cognitive principles of language acquisition, a Universal Grammar (UG), based on the fact that children learn a mother tongue despite the deficiency of the input they receive. Some scholars even surmise a language-particular acquisition device in the human brain. Thus, mental representations and language processes are considered as strictly language-internal. The respective research is theory-driven and proceeds in a deductive manner (top-down) as empirical investigation is used to verify/falsify hypotheses derived from linguistic theory. Seeking to find universal mental principles and parameters, theoretical assumptions regarding linguistic competence are deduced from and checked via grammatical structures as they occur in distinct languages. Apart from the aim of describing UG, some researchers have also tried to localise a language acquisition device in the human brain.
b. The second branch of research, in contrast, views our linguistic abilities as being embedded in and not separable from humans’ overall cognitive capacities. In this sense, cognitive research in the tradition of Lakoff, Langacker and Talmy (originated in the late 1970s and early 1980s) is based on the following two commitments: first, the generalisation commitment, seeking general rules and principles responsible for all aspects of human language; and second, the cognitive commitment, specifying that the general rules and principles of language are to be consistent with the knowledge regarding the human mind and the brain. Emphasising the interdependence of language and other cognitive abilities means that mental representations and processes can be both particular to language and general to human cognition (i.e., established in non-linguistic cognitive domains and subsequently applied to linguistic processing). Similarly, language acquisition is explained by the involvement of general learning mechanisms not specific to language, just as the articulation organs do not only fulfil language-related functions (e.g., Tomasello 2003). This approach is also top-down, aiming to falsify a priori hypotheses, but these hypotheses stem from cognitive rather than from linguistic theories. They are tested in experimental settings or deduced from natural language behaviour. However, investigating general cognitive principles does not mean that there is no linguistic diversity (Evans & Levinson 2009). Instead of proceeding from universals in formal linguistics (such as UG), this approach searches for mental representations and processes underlying typological universals and linguistic diversity (i.e., data-driven findings based on cross-linguistic comparison) (cf. Chapter 4).
Explanations offered for cognitive universals are processing preferences, embodied cognition (i.e., similar experience based on human body-specifics such as the visual system), shared environment (e.g., gravity-based distinction in spatial terms), collective experience, and basic structural and functional parameters of language (e.g., iconicity or prototypicality). For differences in cognition, research on linguistic relativity investigates to what extent language shapes thought. In sum, research in cognitive linguistics and psycholinguistics has been significantly theory-driven, consisting of theoretical considerations based on introspection and rational reasoning. Nevertheless, there is a clear and steady concern for the systematic empirical examination of theoretical claims. Therefore, testable hypotheses need to be formulated which can be operationalised and studied via empirical methods.
7.3 Methodology
The broad range of distinct methods and empirical procedures used to study language from a cognitive perspective can best be classified according to distinct research areas, as presented in Section 7.1. After some remarks on different kinds of research participants (Section 7.3.1), we will first provide a general overview of the data types and techniques of data collection and analysis used in cognitive linguistics and psycholinguistics (Section 7.3.2). Subsequently, we will focus on experimental research. After an introduction of key components in experimental research (Section 7.3.3), we will present experimental designs for investigating language comprehension (Section 7.3.4), language production (Section 7.3.5), language acquisition (Section 7.3.6), and finally the relationship between language and thought (Section 7.3.7).
7.3.1 Research Participants
Depending on the specific area and topic of research, studies on language and cognition are based on data gathered from various kinds of research participants:
• participants of distinct age groups (e.g., infants/children of different language-acquisition stages, and adults vs. elders in research on developmental and life-span changes)
• speakers of typologically distinct languages, particularly in studies on linguistic relativity
• monolingual vs. bi-/multilingual speakers
• healthy vs. cognitively impaired speakers
• humans vs. other species (particularly great apes) in studies on cognitive capacities for language or communication
Sampling procedures for research on language cognition often entail a struggle with biases regarding sample size and selection criteria, impacting the potential generalisation and replication of the research findings. Many psycholinguistic experiments build their samples with research participants taken from the local university student population, when the research question is not concerned with language in special populations such as children or older adults. These age and education biases go hand in hand with a linguistic bias towards better studied major languages (especially English) in countries promoting language sciences and providing the necessary laboratory equipment. This overrepresented group of research participants is called WEIRD (Western, educated, industrialized, rich, and democratic) people (Henrich, Heine & Norenzayan 2010). The more complex the required technical equipment (e.g., in neurolinguistic studies), the more problematic or even impossible is its use for speakers who cannot be dislocated from their natural surroundings. There is, however, an increased effort to make at least some experimental methods viable for field research and to include more samples from non-Western or non-student populations. Working with small and special populations bears the risk of creating a bias with regard to sample size. Furthermore, in studies of special populations (e.g., of impaired people or speakers of endangered languages), it is difficult to guarantee confidentiality or anonymity.
7.3.2 Data Types and Techniques of Data Collection and Analysis
Introspection by the researchers themselves is important as guidance but not as a research method on its own. Therefore, systematic empirical research is needed. As cognition is in and of itself not observable and speakers cannot be asked directly for unconscious knowledge of mental representations and processes underlying language and language processing, the research in this field is primarily based on two empirical methods of data collection (see Chapter 2):
• experiments, and/or
• observation and analyses of (natural) language behaviour.
The resulting data can be classified as language data (i.e., linguistic responses or natural speech), language-related data (i.e., behavioural reactions in response to linguistic input), or language-independent data (i.e., behavioural reactions in non-linguistic experiments). Mental representations and processes can only be inferred indirectly from the data. Thus, there is always an interpretive component in cognitive research. Indicators for cognitive phenomena are, for instance, longer reaction times indicating increased processing effort, eye-movement patterns indicating attention, and the frequency of use in natural language indicating cognitive entrenchment.
Direct inquiry can be used as a supplementary method, but it has to be conducted after experimental and/or observational data collection, so that research participants cannot adapt their behaviour due to possible hints and chances for reflection on the research topic which a survey could reveal. After they have performed an experimental task (e.g., categorising or memorising words) or used certain expressions in natural speech, asking them for the reason for their behaviour (‘Why did you choose this expression/categorisation?’) or inquiring into their approach to a task (‘How did you memorise the items?’) may provide valuable supplementary information.
Depending on the basic research question, cognitive-linguistic and psycholinguistic studies work with different kinds of data, methods of data collection and analysis. The study of mental representations as they are encoded in language can be based on the analysis of language data produced in natural situations (observed and recorded by the researcher for the purpose at hand or via access to pre-existing corpora), elicited language data (retrieved from grammars and dictionaries or self-collected using elicitation-based experimental tasks, cf. Table 7.2) and/or language-related data (collected using, for example, decision-related experimental tasks, cf. Table 7.2). Natural language data differs from elicited language data in terms of whether the researcher has an impact on the data’s existence. Quasi-natural language data (e.g., triggered monological descriptions or dialogical interactions) can be found in the middle of the continuum of researcher impact as the researchers initially provoke the participants’ production of the language data, but during data collection, they interfere as little as possible. While the collection of experimental data is already guided by analytical considerations, this is not the case with natural language data. In search of cognitive patterns underlying a particular language, different kinds of analysis are used such as component or feature analysis (cf. Section 1.8), conceptual, discourse or corpus analysis (cf. Chapters 5 and 6). Obviously, the methods of analysis used here are in close proximity to corpus studies and research in other linguistic subdisciplines. The key distinction, however, is that they focus upon cognitive explanations of the data. Working with natural language data is essential for discovering information about frequency and co-text/context of use, speech errors and miscommunication, and complexity (e.g., information density). Studies based on grammars or dictionaries proceed from the assumption that the data reflect the most prominent and/or most frequent structures, while still leaving room for language-internal variation. Synchronic and diachronic language data and their analysis provide insights into cross-linguistic similarities and variation as well as linguistic changes over the course of time.
The methodological procedure for research on mental representations in the context of linguistic processing strongly differs according to the mode of processing, i.e., production vs. comprehension. Investigations on language production are based on the analysis of observational data (i.e., spontaneously produced natural language data, available in pre-existing corpora or recorded for the purpose at hand) and/or experimental data (cf. Section 7.3.5). Units of analysis in natural language studies are deviations from fluent speech or writing (errors or disfluencies such as slips of the tongue or slips of the pen) and distributional patterns (frequency of occurrence, co-text/context of chosen linguistic variations, or linguistic complexity).
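To make the notion of a distributional pattern more tangible, the following minimal Python sketch counts how often competing variants occur per 1,000 tokens of a text. It is an illustration under stated assumptions rather than an established procedure: the function name `relative_frequencies`, the crude tokenisation (lower-cased sequences of the letters a to z), and the mini-corpus are all invented for this example, and a real study would work with a properly tokenised and annotated corpus (cf. Chapter 5).

```python
import re
from collections import Counter


def relative_frequencies(text, variants, per=1000):
    """Return the number of occurrences of each variant per `per` tokens."""
    # Very rough tokenisation (lower-cased a-z sequences); an assumption made
    # for brevity, not a recommendation for real corpus work.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {variant: 0.0 for variant in variants}
    counts = Counter(tokens)
    return {variant: counts[variant] * per / len(tokens) for variant in variants}


# Invented mini-corpus, for illustration only.
corpus = "she began to sing and he started to hum and then she began again"
print(relative_frequencies(corpus, ["began", "started"]))
```

Normalising by text length is what makes counts comparable across corpora of different sizes. Whether a higher relative frequency can be read as stronger cognitive entrenchment remains an interpretive step that the counts themselves cannot settle.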
Studies on language comprehension, in contrast, are primarily based on the analysis of language-related data resulting from experimental tasks with linguistic stimuli (cf. Section 7.3.4). Additionally, language data produced in natural situations (e.g., miscommunication and understanding as retrieved from linguistic reactions) can be studied. Finally, investigations on the relationship between language and thought go beyond the previously described research on mental representations. In order to study whether and, if so, in which way language has an impact on nonlinguistic thinking, thought needs to be studied independently from language. This means that language-independent data indicating cognitive patterns underlying non-linguistic mental activities are indispensable for comparison with mental representations as encoded in language. Such data can be gained by non-linguistic experiments (cf. Section 7.3.7) as well as by ethnographic research on culture-specific ideas and practices. On the whole, research topics in cognitive linguistics and psycholinguistics are generally studied using multiple methods providing complementary information. Experiments are subject to a strongly reductionist approach and a single experiment is rarely sufficient to test a research topic comprehensively. Furthermore, the interpretation of data in terms of cognitive patterns can be discussed controversially. Thus, research outcomes are strengthened, if data of
different kinds and methods indicate the same cognitive pattern or process. Converging evidence (cf. Section 2.5) may result from:
• the analysis of experimental data and of (natural) language data,
• distinct kinds of linguistic experiments, and/or
• the analysis of different linguistic modes (e.g., speech and co-speech gesture).
In the following, we will focus on experimental research in language production, language comprehension, language acquisition, and on the relationship between language and thought. For this purpose, we will address some basic information on key components of experimental design first.
7.3.3 Key Components of Experimental Research
Typically, experiments measure one dependent variable reflecting cognitive processes or mental representations in one or more of the following three domains:
• representational access: This refers to absolute access in terms of which mental representation is present and relative access in terms of processing accuracy – i.e., whether the hypothesised mental representation is active. In production studies, this is measured as correct vs. erroneous utterances (rate of speech errors or disfluencies) – while in comprehension studies it is usually measured via a participant’s response to a comprehension question.
• processing speed: This refers to the time participants need to access a mental representation and, thus, to plan, produce, or comprehend an utterance. In production studies, this is measured as the onset regarding when a participant responds to a task (e.g., picture naming latency), whereas in comprehension studies the processing speed corresponds to the time taken to read a word or sentence.
• ontological type of processing output: This is not a domain in its own right, but it correlates strongly with the type of data being collected and the specific experimental method. In general, it distinguishes between data from behavioural vs. neurocognitive methods (cf. Table 7.3) and between the overt production of language data vs. language-related data in comprehension.
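The first two domains can be illustrated with a small Python sketch that summarises accuracy (as a proxy for representational access) and mean reaction time (processing speed) per condition. This is only a sketch under assumptions: the trial records, the condition labels, and the function name `summarise` are invented, and restricting reaction times to correctly answered trials is a common convention that we assume here, not a requirement stated above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trial records: (condition, response_correct, reaction_time_ms).
# All values are invented for illustration only.
trials = [
    ("control", True, 520),
    ("control", True, 480),
    ("control", False, 610),
    ("experimental", True, 640),
    ("experimental", False, 700),
    ("experimental", True, 660),
]


def summarise(trials):
    """Return {condition: (accuracy, mean RT in ms over correct trials)}."""
    by_condition = defaultdict(list)
    for condition, correct, rt in trials:
        by_condition[condition].append((correct, rt))

    summary = {}
    for condition, rows in by_condition.items():
        # Accuracy: proportion of correct responses in this condition.
        accuracy = sum(1 for correct, _ in rows if correct) / len(rows)
        # Assumption: only correct trials enter the reaction time average.
        correct_rts = [rt for correct, rt in rows if correct]
        mean_rt = mean(correct_rts) if correct_rts else None
        summary[condition] = (accuracy, mean_rt)
    return summary


summary = summarise(trials)
for condition, (accuracy, mean_rt) in summary.items():
    print(condition, round(accuracy, 2), mean_rt)
```

Keeping both values side by side per condition also makes it easier to spot cases in which faster responses coincide with lower accuracy.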
Experiments investigating real-time processing necessarily have to measure processing speed along with representational access. When real-time processing is not an issue, processing speed is not a relevant variable. The key components of experiments (see below) should be chosen in accordance with the variable domain(s) of interest to be measured. Regarding independent variables, as a rule of thumb, a limited number (typically about two or three) is investigated in one experiment. Unless the variables pertain to extralinguistic factors, they are implemented in the stimuli
Cognitive Linguistics and Psycholinguistics
to yield minimal pairs with categorical distinctions (e.g., grammatical item and ungrammatical counterpart in violation paradigms; cf. Section 2.4.3) or with parametric variations in terms of frequency, complexity, or predictability in a cotext/context. Stimuli, for which the researcher postulates a deviant processing profile in terms of additional processing effort or facilitation, are sometimes subsumed under the term of the experimental condition, whereas non-deviant stimuli are called the control/baseline condition. There are at least three further restrictions on stimulus construction: a. the use of filler or distractor items to draw participant’s attention away from critical manipulations, b. the use of different orders in stimulus presentation to mitigate carryover effects (cf. Section 2.4.3) between stimuli, and c. the implementation of several lexical instantiations per condition to ensure that, whatever the experimental outcome, it is not bound to a particular combination of words or phrases (see Jegerski 2014; Keating & Jegerski 2015 for details). The construction of stimuli also depends on your research question, participant sample and experimental design in terms of stimulus presentation, method of data collection and experimental task. We will now turn our attention to the latter. In the following, we will introduce the three key components of experiments in psycholinguistics and cognitive linguistics: experimental task, stimulus presentation mode, and experimental measures. Although they are combined in experimental studies creating an overall experimental design or paradigm, we recommend considering each of them carefully when designing an experiment: a.
Experimental task: Participants are asked to perform tasks in response to stimuli (cf. Section 2.4.2). In order to study how people think, perceive, decide, remember or imagine things, these tasks need to include such cognitive activities. Thus, according to the overarching activity, experimental tasks can be grouped into different types, such as decision-related or memory-based tasks (cf. Table 7.2). Some experimental designs include more than one of these tasks. In director/matcher language games, for instance, a description task is combined with a recognition task. The ‘director’ (participant A) gives a description of the stimulus to the ‘matcher’ (participant B) who selects or reconstructs the stimulus based on this description without having seen it initially. If more than one experiment is conducted with the same research participants, it is important to consider the order of tasks carefully in order to avoid interactive effects between tasks. Free listing, for instance, needs to be done before pile-sorting. Otherwise, the order and occurrence of items in free listing is influenced by the items of pile-sorting in an unwanted way. Experimental tasks can induce the same kind of confounds as tasks in other experimental sciences (cf. Section 2.4.3). Participants may weigh speed and accuracy in responding differently depending on task instruction: A speedy response comes at the expense of response accuracy and vice versa (speed-accuracy trade-off). A task may be
205
206
s p e c i fi c r e s e a r ch ap p r o a c h e s o f li n g u i s t i c s u b d i s c i p l i n e s
Table 7.2 List of common experimental task types in cognitive linguistic research Task
Definition
Decision-related tasks: - categorisation - discrimination - judgement - evaluation - verification
Participants are asked to group linguistic stimuli according to selfdetermined parameters of similarity (e.g., in pile-sorting or triad tests) or to decide whether linguistic or non-linguistic stimuli belong to categories as defined by the researcher, i.e., they discriminate stimuli according to at least two categories (e.g., word vs. non-word in a lexical decision task, phonemes in a phonological awareness test, linguistic well-formedness in acceptability judgements, meaning match in picture-sentence verification, or associated attributes in semantic differentials).
Memory-based tasks: - recall - recognition
Participants memorise linguistic stimuli in an initial learning phase and are prompted to recall or recognise them in a subsequent test phase (e.g., which content units re-occur in story recounts, or whether words have occurred in a previous text). A memory-based task without a learning stage is called free listing (e.g., of all items belonging to a semantic category), which can also be regarded as an elicitation-based task.
Elicitation-based tasks: - naming - description - completion - contextualisation - association - reading aloud
Participants are prompted to produce language datai.e., they name or describe non-linguistic stimuli such as pictures (e.g., simple naming tasks, picture-prompted story-telling), they complete linguistic fragments (e.g., sentences missing the final word or words missing the final letters), they provide co-text/context for a linguistic fragment (e.g., a sentence containing a certain word) or associative meanings, or they read words/texts aloud.
Comprehension task
Participants read or listen to linguistic stimuli for comprehension.
too difficult or too easy for participants (causing ceiling effects or floor effects, cf. Section 2.4.2). And, finally, it can evoke task-specific patterns in the data that conflict with your research question. As you can see in Table 7.2, for instance, the task types involve varying degrees of naturalness (ecological validity) – i.e., they differ with regard to how commonly and consciously they are performed in everyday life. Judgement tasks, for instance, evoke a form of metalinguistic or normative awareness in participants that is not part of unconscious language processing. Thus, they are less natural than (e.g., free comprehension tasks, which limits the former’s value when you are focussing on naturalistic processing patterns). All of these factors can bias your data, regardless of the method used for data collection. b.
Stimulus presentation mode: This experimental component refers to the way in which participants are presented with stimuli. It pertains to the quantity, quality, and temporal duration of the stimulus being
Cognitive Linguistics and Psycholinguistics
presented at a given time and to the presentation order of the entire set of experimental stimuli. The following parameters can be manipulated: • modality of stimulus presentation: auditory vs. visual vs. multisensory • stimulus presentation quality: optimal vs. adverse conditions (lowquality, e.g., an auditory or visual noise) yielding easier vs. more difficult stimulus perception • number of simultaneously presented stimuli: single vs. multiple stimuli yielding conflicting vs. converging stimulus information (such as in interference paradigms or the stroop task, cf. Section 7.3.5) • quantity of a stimulus’ information presented at once: stimulus presentation in its entirety vs. in successive fragments (such as in the gating paradigm, Grosjean 1980) • order of stimuli presentation: random vs. functional order (such as in the priming paradigm – i.e., processing of the target stimulus is facilitated/inhibited when it follows a prime stimulus with (dis) similar properties, cf. Section 7.3.5) • control over onset and/or duration of stimulus presentation time: fixed (i.e., determined by the researcher) vs. self-paced (i.e., under participant control). There are stimulus presentation modes which make use of specific combinations of several parameters. Rapid serial visual presentation (RSVP), for instance, is the most common fixed-paced method, in which written stimuli are presented on screen one after the other in rapid succession. For sentence stimuli, this leads to fragmented presentation. With regard to self-paced presentation, a special option is the behaviour-contingent presentation mode, in which participants control the duration of stimulus presentation time, but the researcher determines the quantity of a stimulus’ information presented at once. There are three subtypes: In the moving window mode, participants perceive a stimulus only until they wish to proceed to the next one (cf. Section 7.3.4 on self-paced reading for an example). 
In the moving mask mode, some parts of a displayed stimulus are overlaid with a mask such as an array of random strings. Finally, in the gaze-contingent boundary paradigm (specific to eye-tracking studies, cf. Section 7.3.4), the visual stimulus currently displayed on the screen is replaced in part with another stimulus as soon as the participant’s eyes traverse an invisible boundary.
c. Experimental measures: This experimental component describes the nature of the measurement used to collect data and can be classified along three dimensions, as you can see in Table 7.3:
Table 7.3 List of common experimental methods for data collection
Behavioural methods
  offline: metalinguistic judgements and other kinds of elicitation; reaction time measures; self-paced reading
  online: eye-tracking; speed-accuracy trade-off (SAT) measures
Neuro-cognitive methods
  online, temporal dimension: electroencephalography (EEG), especially event-related potentials (ERPs); magnetoencephalography (MEG)
  online, spatial dimension: functional magnetic resonance imaging (fMRI); functional near-infrared spectroscopy (fNIRS)
• temporal dimension: offline methods (measuring the final outcome of cognitive processes) vs. online methods (measuring the time-course of cognitive processing, i.e., as it happens);
• dimension referring to the ontological type of data: behavioural methods (measuring behaviour in terms of speed and accuracy) vs. neurocognitive methods (measuring when and where in the brain processing signatures change), of which the latter are ‘borrowed’ from neuroscience to investigate cognitive research questions;
• informativity dimension: unidimensional methods (providing data that can only be analysed from one perspective, e.g., reaction time measures provide information only about how quickly participants respond to a stimulus) vs. multidimensional methods (providing data that allow for investigations from several perspectives, e.g., fixations as well as eye movements are measured with eye-tracking, both of which can be analysed separately and provide partly different information on processing).
Offline measures are unidimensional (except for self-paced reading, cf. Section 7.3.3.1), while online measures are generally multidimensional. Most behavioural methods are offline methods, as they measure processing effort only after a participant has completely processed or produced a linguistic stimulus. With a comparatively high temporal resolution and multidimensional data, behavioural methods such as eye-tracking (measuring eye movements on a millisecond-by-millisecond basis) or the speed-accuracy trade-off (SAT) method (measuring acceptability judgements under very high time pressure) also qualify as online methods. Neurocognitive methods, by contrast, overlap entirely with online methods, even though they also show variation with respect to whether they offer a high temporal or spatial resolution of online processing (cf. Chapter 8).
From a practical point of view, behavioural offline measures are easier and less expensive to use because they do not require (advanced knowledge of) highly
specialised technical equipment unlike many online methods. Moreover, they can be used more readily in field experiments or for short-term projects. In principle, almost all experimental tasks, stimulus presentation modes, and experimental measures can be combined freely, but there are default combinations of the three. While all experimental measures are feasible in language comprehension studies, their use is limited by the particular kind of dependent variable in language production (overt response), the particular sample in language acquisition studies, or the required equipment in field studies, as we shall see in the next sections. In what follows, we will introduce the main behavioural paradigms used in experiments to investigate language comprehension and production in healthy adults, in language acquisition research, and in research on language and thought. Neurocognitive measures will be presented in Chapter 8.
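The moving window mode described under stimulus presentation can be illustrated with a short sketch. It is only an illustration, not the procedure of any particular experiment software: the function names, the underscore mask, and the convention that the first button press reveals the first word are assumptions made for this example.

```python
def moving_window_frames(sentence, mask_char="_"):
    """Return one display string per word for non-cumulative presentation.

    In each frame exactly one word is visible; all other words are replaced
    by a mask of the same length (hypothetical convention: underscores).
    """
    words = sentence.split()
    frames = []
    for visible in range(len(words)):
        shown = [
            word if idx == visible else mask_char * len(word)
            for idx, word in enumerate(words)
        ]
        frames.append(" ".join(shown))
    return frames


def reading_times(press_times_ms):
    """Convert button-press timestamps into per-word reading times.

    Assumption: the first press reveals the first word, and every later
    press hides the current word and reveals the next one (or ends the trial).
    """
    return [later - earlier
            for earlier, later in zip(press_times_ms, press_times_ms[1:])]


if __name__ == "__main__":
    for frame in moving_window_frames("The cat sat"):
        print(frame)
    print(reading_times([0, 400, 750, 1300]))
```

Each frame corresponds to one self-paced step, and the differences between successive button presses give the series of reaction times that the participant produces for one sentence.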
7.3.4 Experimental Research on Language Comprehension
Studies on language comprehension make use of nearly all of the experimental tasks and measures introduced above. In the following, we will consider behavioural measures with regard to language comprehension studies in more detail:
• metalinguistic judgements: Psycholinguists collect formal judgements from linguistic laypeople on linguistic stimuli from any domain (e.g., syntax, semantics). This is a unidimensional offline method. The simplest approach is to run a questionnaire in which participants rate stimuli in experimental and control conditions according to their acceptability or naturalness, or they categorise them in terms of (dis)similarity with regard to a comparison condition. A widely used example of the latter case is the lexical decision task, where participants are asked to decide whether a given stimulus is a word or an instance of a non-word (an illegal string such as ‘gkxl’ in English) or pseudoword (a pronounceable string without semantic meaning such as ‘glab’ in English), with the critical variable often being the similarity between non-words/pseudowords and words. The major advantage of judgement data is their relatively easy and inexpensive collection, so they are a good starting point to test for the existence of mental representations and processes. The main disadvantage, however, lies in the complexity of data interpretation. Factors beyond linguistic processing can affect judgements, as they may reflect not only linguistic knowledge but also memory limitations (e.g., when fully grammatical sentences are classified as unacceptable because their structure is so complex that their processing exceeds memory capacity). In addition, complementary methods for data collection may be necessary when processing speed is of interest (as in speeded acceptability judgements combining reaction time and judgement measures) or when the unidimensional judgement data are insufficient to distinguish between different kinds of representations and/or processes.
• reaction time (RT) measures: If your research question focuses on the speed with which language is processed, the simplest method to use is standard reaction time collection. It is a unidimensional offline method and must at least be combined with a comprehension task that, in turn, can be used to simultaneously infer processing accuracy. In standard RT experiments, the participants’ task is to respond to a stimulus with just one response (e.g., a button press or a spoken response) after they have completely processed the stimulus. The basic assumption behind RTs is that the time participants need to give their response reflects differences in cognitive processing following a subtraction logic: RTs are shorter when fewer processing steps are required or less effort is needed to process a stimulus (Donders 1969). Although standard RT measures are nowadays considered largely uninformative for questions regarding online processing, they are an inexpensive and easy method to provide initial evidence for or against a hypothesis regarding processing difficulties. When you are interested in the point during sentence reading at which processing difficulties first emerge, you can use self-paced reading, which is an RT method allowing participants themselves to determine the presentation rate of stimuli. Self-paced reading is an offline measure as it collects a series of reaction times, but it offers a higher spatial resolution by revealing at which sentence position processing difficulties exist. Hence, it provides multidimensional data and is, therefore, closer to online methods than standard RT measures. There are two presentation types for self-paced reading experiments: cumulative presentation (words are successively revealed upon button presses by the participant and remain on screen until the entire sentence has been displayed) vs.
non-cumulative presentation (words are successively revealed upon button presses by the participant and are then replaced by dashes or underscores once the participant presses a button to proceed to the next word; cf. Section 7.3.3: moving-window presentation mode). Most current experiments use non-cumulative presentation to circumvent the issue that participants may begin reading the sentence only after all the words have been displayed, a strategy reducing the better spatial resolution of self-paced reading compared to standard RT measures. As with standard RT measures, self-paced reading capitalises on processing speed, and processing accuracy has to be collected via additional comprehension tasks (cf. Jegerski 2014). The speed-accuracy trade-off (SAT) technique is an online RT measure collecting both speed and accuracy data during reading. In this kind of experiment, participants are trained to respond to a sequence of short beep tones presented after the final word of a sentence. For each tone, participants must press one of two buttons
to indicate whether they find the sentence acceptable or not and must do so under great time pressure (within hundreds of milliseconds after tone onset). This procedure provides a continuous measure of how sentence interpretation changes over time. Thus, processing speed and accuracy are directly linked and analysed accordingly, providing multidimensional data. This method also addresses the inherent ambiguity of many RT measures with regard to whether processing disruption is caused by insufficient quality of a mental representation or a slowed down mental process to access memory information (see Martin & McElree 2008 for further discussion). A disadvantage of the SAT technique is that it is not feasible for all participant populations.
• eye-tracking: With this behavioural online method, you can monitor participants’ eye movements during language comprehension. It offers multidimensional data because different aspects of the eye movement recording can be investigated, such as temporal information (mainly provided by fixations – i.e., short intervals of eye immobility for information uptake) and spatial information (provided by saccades – i.e., short jumps transitioning the eyes from one object or word to another). In language research, there are different eye-tracking paradigms: the visual world paradigm, eye-tracking during reading, and pupillometry. While all three paradigms provide data on processing speed, processing accuracy must be investigated via additional explicit comprehension tasks. The visual world paradigm is used to study spoken language comprehension and its interaction with visual information. Participants hear sentences or words while observing pictures on a display at the same time. The underlying assumption is that visual attention and linguistic processing are tightly linked – i.e., participants look at those objects that they are currently attending to in the auditory stream.
Hence, the point in time when participants fixate specific objects is assumed to be indicative of when particular linguistic information is being processed. This paradigm is very useful if you are interested in when hearers begin to process a particular linguistic representation and to what extent they employ visual information when understanding speech. A major advantage of this paradigm is its high ecological and external validity because it does not necessarily involve an explicit task, so there are no restrictions in terms of minimum requirements of participants’ linguistic knowledge (e.g., literacy). A disadvantage is that its temporal resolution is somewhat limited compared with eye-tracking during reading. Eye-tracking during reading is used to study cognitive processes specific to reading or to study language processing in general when the research question does not posit processing differences for spoken and written language. Stimuli can be presented in different modes: natural text presentation, moving window or moving mask presentation, or gaze-contingent boundary paradigm presentation (cf. stimulus presentation mode in Section 7.3.3 above). A major
advantage of eye-tracking during reading is its high temporal resolution (on a scale of hundreds of milliseconds) coupled with multidimensionality and high ecological validity. A disadvantage is that it is restricted to literate populations. Pupillometry is a unidimensional measure tracking changes in the size of the pupil following stimulus presentation or task execution. The opening of the pupil (pupil dilation) is more sensitive to cognitive variables than its constriction, which is more related to luminance variation. Researchers can measure the peak of dilation (i.e., the maximal opening) and/or dilation changes over time. A major advantage of pupillometry is its sensitivity to cognitive processes without requiring explicit tasks other than attentive comprehension, making it particularly easy to use in studies with special populations such as young children. Furthermore, it is nearly impossible for humans to control changes in pupil size, which makes this paradigm comparatively robust with regard to undesirable task effects. A disadvantage is its temporal resolution, as significant changes in pupil size occur seconds after stimulus presentation. It is therefore better suited to investigate qualitative differences between stimuli, rather than the time-course of language processing.
7.3.5 Experimental Research on Language Production
Experimental paradigms in language production also make use of reaction time measures, eye-tracking, and neurocognitive methods, but their application is more restricted for the following reasons: Language production research aims at collecting overt linguistic responses in speaking, signing, or writing, some of which are difficult to elicit under experimental control (e.g., slips of the tongue, speech errors) or may lead to distortions during data recording (e.g., muscle activity of the articulators may interfere with neurocognitive recordings). Furthermore, experimental tasks in production research represent the most striking difference vis-à-vis paradigms in comprehension research. Tasks and paradigms have to ensure that the experiment in fact measures processes related to language production (and not concurrent comprehension as in reading aloud), and that non-verbal messages in the experimental conditions (which are to be encoded verbally) are comparable to each other (see Bock 1996, Schiller 2012). As a result of these limitations, research on language production uses a mixture of experimental, observational, and corpus-based methods, and includes a limited set of experimental tasks, yielding specific experimental paradigms. Experimental methods are primarily behavioural, measuring processing speed (e.g., speech onset latency), accuracy (e.g., correct pronunciation or word selection) and type of output (e.g., choice of syntactic structure). Non-experimental designs analyse natural speech in terms of errors or dysfluencies (e.g., slips of the tongue), distributional patterns (i.e., frequency and co-text/context of types) and units of meaning (e.g., metaphors). These experimental methods are offline, as they do not measure speech planning as it happens (i.e., the processes preceding
the speech onset). However, online measures can also be applied, for instance with one of the following two specific production tasks:
• combined picture description and eye-tracking: By using the visual world eye-tracking paradigm (cf. Section 7.3.4), researchers can investigate online processes in speech planning (including before speech onset). It has been found, for instance, that people’s eyes move towards an object on the display about a second before it is named, that they look at an object longer when its name is difficult to plan (Griffin 2004), and that the serial order of glances at an event picture can be indicative of the argument order of the sentence (Griffin & Bock 2000; for cross-linguistic differences see Norcliffe et al. 2015).
• EEG measures with covert or delayed tasks: Electroencephalogram (EEG) measures are used to track the time-course of neurocognitive processes before speech onset, providing multidimensional online data. However, a major problem with the use of neuroscientific methods in production research is their susceptibility to artefacts: muscle activity caused by overt speaking or signing produces undesirable data patterns that mask brain activity in response to linguistic processing. Researchers have, therefore, resorted to using covert production tasks (i.e., participants think of what they would say instead of overt production) or delayed production tasks (i.e., overt responses follow the stimulus with some delay).
In language production experiments, there are two major designs/paradigms which involve tasks combined with reaction time and accuracy measures, but which differ from each other in the complexity of stimulus information.
• simple naming-based designs: This group of tasks is based on a simple perception-production cycle – i.e., participants first perceive a (non-)linguistic stimulus which provides information on what is to be produced, either overtly or covertly. In naming tasks, participants are required to describe pictures of objects or events by responding with the word or sentence that best matches the respective item. Thus, this task is useful for research questions targeting word as well as sentence production. There are some further variants of the naming-based paradigm. In completion tasks, participants are presented with fragments (e.g., a sentence fragment like ‘I like my coffee with sugar and ___’) and are asked to find one or more words which would plausibly complete the fragment. Completion tasks can be used to study the likelihood of a word or lexical-semantic category occurring in a certain co-text/context. Conversely, in elicitation tasks participants are given a word or another item for which they should provide a co-text (i.e., they are prompted to produce a sentence, a word, or a narrative). This task provides information about prototypical use or cognitive prominence.
In contrast to completion tasks, it is important here not to provide any kind of contextual cues that may prime some responses more than others. Monitoring differs from these tasks in inducing a dual-task situation. Participants are instructed to covertly produce words or phrases in response to a picture or other stimuli. At the same time, they are asked to press a button when their covert speech unit contains a target element (e.g., a phoneme) or belongs to a target category (e.g., a nominal or a verb). Reaction times from the button presses are then used to infer when a type of linguistic information is accessed during speech planning.
• interference designs and structural priming: This group of tasks almost always includes a naming task but involves a more complex perception-production cycle. In interference designs, there is information from several stimuli used for speech planning which may interfere with each other (thus, affecting production), or there is one stimulus with conflicting properties. In structural priming designs, the properties of a prime sentence can influence the syntactic structure of the participants’ subsequent picture description. In Stroop tasks, a colour word is presented in written form, with the colouring either matching or mismatching the word meaning. For incongruent combinations (e.g., the word ‘blue’ written in red ink), participants take longer to name the written colour word and sometimes make mistakes by naming the colouring instead. Thus, distinct stimulus properties (information from the colouring and lexical information from the word) negatively interfere with each other. In the picture-word-interference design, participants see pictures of objects or actions with the general task of naming them, while a word is presented at the same time as the picture. Participants are instructed to ignore the word stimulus while they name the picture.
However, linguistic information still interferes with picture naming, as we cannot avoid processing language subconsciously even when we try not to. Naming accuracy and latency vary according to the (sub)lexical relation between picture and word. Alongside negative interference, there are also facilitation effects, namely, shorter onset latencies when, e.g., the word stimulus shares phonological features with the picture name (e.g., Schiller 2012). In the structural priming paradigm, participants describe a picture after they have read simple prime sentences with a specific syntactic structure (e.g., active vs. passive voice). These sentences can be read aloud or silently. In their picture descriptions, participants preferably choose a syntactic structure matching the prime sentence. This priming effect is driven by structural information and holds across languages and populations (cf. Pickering & Ferreira 2008). Furthermore, it occurs not only in experimental paradigms but also in naturally occurring dialogues.
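The interference and facilitation effects described for these designs are usually quantified as a difference between mean naming latencies in two conditions, in line with the subtraction logic behind RT measures. The following is a minimal sketch of that computation. The trial records, their field names, and all latency values are invented for illustration and do not come from any study.

```python
from statistics import mean

# Hypothetical naming latencies in milliseconds (invented for illustration).
trials = [
    {"condition": "congruent", "rt_ms": 600, "correct": True},
    {"condition": "congruent", "rt_ms": 620, "correct": True},
    {"condition": "congruent", "rt_ms": 640, "correct": True},
    {"condition": "incongruent", "rt_ms": 700, "correct": True},
    {"condition": "incongruent", "rt_ms": 740, "correct": True},
    {"condition": "incongruent", "rt_ms": 900, "correct": False},
]


def mean_rt_by_condition(trials):
    """Mean latency per condition, based on correct responses only."""
    by_condition = {}
    for trial in trials:
        if trial["correct"]:
            by_condition.setdefault(trial["condition"], []).append(trial["rt_ms"])
    return {condition: mean(rts) for condition, rts in by_condition.items()}


def interference_effect(trials, baseline="congruent", critical="incongruent"):
    """Critical minus baseline mean latency.

    A positive value indicates interference (slower responses in the
    critical condition); a negative value indicates facilitation.
    """
    means = mean_rt_by_condition(trials)
    return means[critical] - means[baseline]


if __name__ == "__main__":
    print(mean_rt_by_condition(trials))
    print(interference_effect(trials))
```

Excluding incorrect responses before averaging is a common but not universal choice; accuracy would then be analysed separately as a second dependent variable.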
Alongside elicitation data obtained in experiments, research on language production makes heavy use of non-experimental and multi-method approaches.
• observational methods: Analyses of natural spontaneous speech are an important means to study the variables in speech production which are difficult to elicit under experimental conditions. These are deviations from fluent speech (e.g., speech errors and their corrections, hesitations) or the frequency and distribution of types in well-formed linguistic constructions. In contrast to studies on language variation in corpus linguistics or sociolinguistics, the focus lies on cognitive explanations and the analyses are generally quantitative in nature.
• SLIP (spoonerisms of laboratory-induced predisposition) paradigm: This paradigm is used to induce speech errors during phonological encoding. Using serial visual presentation, participants are presented with short phrases, most of which have identical word onset patterns (e.g., ‘dove ball’, ‘deer back’, ‘dark bone’ in Nooteboom & Quené 2007). Some phrases deviate from this pattern in that word onsets are in inverse order (e.g., ‘barn door’, ‘bad game’). After some of the phrases, participants are prompted to read aloud the immediately preceding phrase. In the case of the deviating phrases, this may result in slips of the tongue involving word onsets (e.g., ‘barn door’ ➔ ‘darn bore’, ‘bad game’ ➔ ‘gad bame’). The SLIP paradigm has revealed that many speech errors generate words rather than non-words, thereby complementing findings from natural data analyses. While the latter can shed light on frequency and structure, experiments can be constructed to contrast competing explanations on why these structures occur.
7.3.6 Experimental Research on Language Acquisition
Research on language acquisition is concerned with what children know about language, how and when children acquire knowledge of one or more languages, and when a developing language system has fully matured. Because this requires the study of participants of different ages and cognitive abilities, researchers in this field make extensive use of the mixed-methods approach (cf. Section 2.5), including observational, corpus-based, and experimental methods for data collection (cf. Monaghan & Rowland 2016). Observational methods are the historically oldest methodological approach, going back to diary studies in the late nineteenth century, in which researchers documented their child’s cognitive and linguistic development in single-case studies. Nowadays, diary studies have largely been replaced by corpora of child language which include huge amounts of data from recordings in (semi-)naturalistic settings (e.g., the CHILDES database) and which can be searched automatically. Observational and corpus-based methods are complemented increasingly by experimental approaches for two main reasons: First, observational and corpus data build on a child’s ability to produce speech or language and, hence, are less suited for investigations of linguistic (and cognitive) development in early
infancy (up to the end of the first year of life), when children cannot speak yet. Second, corpora may exclude types of linguistic structures which are not (frequently) part of a child’s linguistic input or output but are still understandable. Experiments can fill these gaps by enabling researchers to scrutinise language comprehension, especially during preverbal infancy, and to focus on the processing of language in real time.
Regarding the key components of an experiment (see Section 7.3.3), both behavioural measures (e.g., specific types of reaction time measures, eye movement measures) and neurocognitive measures (e.g., EEG) are used in experiments on child language, but stimulus presentation and experimental task are more restricted as they depend greatly on the child’s linguistic and cognitive abilities. Stimulus presentation has to be adjusted so that stimulus properties and presentation parameters are age appropriate. Table 7.4 presents a list of tasks which have been specifically developed for children who do not possess all cognitive and linguistic skills that experimental tasks developed for adult speakers presuppose. These tasks are grouped with respect to the targeted sample (preverbal infants vs. young children) and the kind of language processing (comprehension vs. production).
Table 7.4 Tasks and paradigms in language acquisition research with children
Language comprehension
  Preverbal infants: habituation; head-turn preference; preferential looking
  Young children: truth value judgements; picture matching/sentence-picture matching; act-out task
Language production
  Preverbal infants: –
  Young children: elicitation-based tasks, e.g., picture naming, story narration; semi-structured elicitation
Preverbal infants have not yet fully discovered the conventionalised association between sound and meaning, cannot produce meaningful language, and also lack other advanced cognitive skills (e.g., fine-grained motor control) compared to adults. Therefore, paradigms for this participant group work on attention shifts and measures of body movements in response to language comprehension processes only:
• habituation paradigm: Infants get used (habituate) to hearing or seeing stimuli repeatedly presented during a habituation phase. In the test phase, they are presented with old and new stimuli. It is assumed that habituation is linked to the ability to discriminate between the two stimulus types based on linguistic knowledge and that infants allocate more attention to new stimuli. Attention shifts are indicated by a higher frequency of sucking (high amplitude sucking paradigm; Ingram 1989) or an increased looking time for the new stimulus (visual fixation paradigm; Johnson & Zamuner 2010).
• head-turn preference paradigm: Infants hear linguistic stimuli from speakers positioned in different corners of a room. The dependent variable is the time the infant attends to a stimulus, i.e., looks at a corner for more than about two seconds. Head-turn preference to a stimulus is indicated by a longer looking time (Hollich 2006).
• preferential looking paradigm (or looking-while-listening paradigm): This is a modified version of the visual world eye-tracking paradigm used with adults (see Section 7.3.4). A speaker attracts the child’s attention to a picture which includes a target object and one or more distractors. The dependent variable is when infants begin to look at the target object after it has been named by the speaker, and for how long they attend to it.
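The two dependent variables of the preferential looking paradigm, latency of the first look at the target and total looking time on the target, can be derived from a sequence of coded gaze samples. The sketch below assumes a hypothetical coding scheme in which each sample is a pair of a timestamp in milliseconds and a region label (‘target’, ‘distractor’, ‘away’), recorded at a fixed sampling interval. The labels, function names, and values are illustrative assumptions only.

```python
def first_target_look_latency(samples, naming_onset_ms, target="target"):
    """Time from naming onset to the first sample on the target.

    samples: list of (time_ms, region) pairs sorted by time.
    Returns None if the target is never looked at after naming onset.
    Simplification: a look already on the target at naming onset counts
    as latency 0; real analyses often exclude such trials.
    """
    for time_ms, region in samples:
        if time_ms >= naming_onset_ms and region == target:
            return time_ms - naming_onset_ms
    return None


def target_looking_time(samples, naming_onset_ms, sample_interval_ms,
                        target="target"):
    """Total time on the target after naming onset, assuming each sample
    stands for one sampling interval."""
    n_samples = sum(
        1 for time_ms, region in samples
        if time_ms >= naming_onset_ms and region == target
    )
    return n_samples * sample_interval_ms


if __name__ == "__main__":
    # Invented gaze record, one sample every 100 ms.
    gaze = [
        (0, "target"), (100, "distractor"), (200, "distractor"),
        (300, "distractor"), (400, "target"), (500, "target"),
        (600, "away"), (700, "target"),
    ]
    print(first_target_look_latency(gaze, naming_onset_ms=200))
    print(target_looking_time(gaze, naming_onset_ms=200, sample_interval_ms=100))
```

The same looking-time computation, applied to old vs. new stimuli instead of target vs. distractor, would give the looking-time comparison used in the visual fixation variant of the habituation paradigm.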
For young children such as pre-schoolers, there is a set of distinct production and comprehension tasks. For language comprehension, these include:
• truth value judgements: Children watch a video or a person doing a pantomime, or they look at a picture before they hear a short sentence. Their task is to decide whether the meaning of the sentence is true or not depending on the contents of the video, pantomime, or picture (see review in Schmitt & Miller 2010).
• picture matching or sentence-picture matching: Children are presented with visual and linguistic stimuli and are asked to integrate the meanings with each other. Often, this is combined with a forced-choice task, so that, e.g., children are asked to pick the picture that best matches the meaning of the sentence (for further variants see Schmitt & Miller 2010).
• act-out task: Children are given instructions which are manipulated to include particular linguistic forms of interest (e.g., Put some/all/most apples in the basket), and their non-linguistic reaction (e.g., moving a toy from one place to another, making particular moves or gestures) is taken as an indication of how they interpreted the instruction (cf. Schmitt & Miller 2010).
In production experiments, researchers mainly employ the following two types of elicitation-based tasks to investigate linguistic knowledge children have already acquired at the time of testing:
• elicitation-based tasks: Children are instructed to complete sentence fragments, to paraphrase target structures or to name objects. The wug test is a famous example of this type of task (Berko 1958). Children are shown pictures of an animate or inanimate object, each with its own pseudoword name (a pronounceable nonsense word). When they are shown the same pictures again, but this time with several instances of the object, they are supposed to use the pseudoword name and modify it to indicate plurality. The use of pseudoword names is critical to rule out that the children have heard the correct target form before; therefore, they have to actively rely on their grammatical knowledge to master the task.
• semi-structured elicitation: This paradigm works on the same kinds of elicitation-based tasks as exemplified above, but the researcher provides a more naturalistic context for data elicitation (Berman & Slobin 1994). Semi-structured elicitation is often implemented as a game or a form of narration task. In the latter case, children are presented with picture books or silent movies encouraging them to narrate the story.
In sum, experimental tasks for preverbal infants and young children are intended to engage them in behaviour that they would also exhibit outside of the lab. With infants, tasks are therefore based on attention shifts and, hence, coupled with methods sensitive to attentional processes (e.g., eye movements, EEG: Friederici & Thierry 2008; Sekerina, Fernández & Clahsen 2008; Oakes 2012), whereas young children may be prompted to produce linguistic utterances via description tasks or to judge whether a word or sentence has been used appropriately in a given situation, which indirectly tests their comprehension abilities. At a later age, children are tested with the sets of tasks and paradigms developed for adults, but additional tests on children’s verbal and cognitive abilities (e.g., vocabulary size) typically accompany these designs.
7.3.7 Research on the Relationship Between Language and Cognition
Studying the interplay of language and cognition is an essential research area in cognitive linguistics. The basic question is whether people who speak different languages think differently. The principle of linguistic relativity (also called the Sapir-Whorf hypothesis) predicts that speakers of different languages (and members of different cultures) perceive the world differently. This claim has been debated controversially, and multiple studies citing different examples have either supported or contradicted it. In the strong version of the hypothesis, language determines thinking (linguistic determinism) (i.e., there is a general structural correspondence between language and thought). In its weak version, language has an impact on thinking (i.e., language and thought can differ structurally). Radical opponents of linguistic relativity claim that all mental principles and categories are innate and, thus, shared by all humans no matter what language they speak. To avoid assuming an unwarranted effect of language on thought, systematic empirical research is required. In such studies, the two main variables, namely language and thought, must be investigated independently from each other before they can be related. This means that thought cannot be deduced from mental conceptualisations as reflected in language (i.e., language analysis). Instead, non-linguistic data are needed. Otherwise, circular reasoning
based on the postulation that thought is structurally identical to and cannot be separated from language impairs the research outcome (data validity). In this light, Brown and Levinson (1993: 1) propose the following procedure for investigations on the relationship between language and cognition: (a) first, pick a conceptual domain; (b) second, find two or more languages which contrast in the semantic treatment of that domain (i.e., where very different semantic parameters are employed); (c) third, develop non-linguistic tasks which will behaviourally reveal the conceptual parameters utilized to solve them; (d) compare the linguistic and non-linguistic representation systems as revealed by (b) and (c), and assess whether there is any correlation between linguistic and non-linguistic codings in the same domain.
Commonly researched conceptual domains are space, time, possession, colour, number and quantity, motion, and gender. Typological studies (cf. Chapter 4) reveal in what way languages show conceptual variation with regard to a specific domain. Consequently, there is always some kind of cross-linguistic perspective, no matter whether a single language is being studied or multiple languages comparatively. The linguistic encoding of ‘space’, for instance, can be based on different frames of reference: In Indo-European languages, spatial descriptions are predominantly made by the use of ‘left’/‘right’/‘front’/‘back’ (i.e., a relative frame of reference, in which the coordinate system is anchored in ego). Prior to broader cross-linguistic studies, it was assumed that this way of spatial reference was universal. However, speakers of Guugu Yimidhirr (an Australian language), for instance, do not have words for left and right. Instead, they use cardinal directions to express spatial reference, an absolute frame of reference in which the coordinate system is anchored in the geocentric system. If the typological information is not detailed enough, linguistic and language-related data on mental representations can be collected systematically. For better comparability across languages and speakers, a certain level of standardisation of data is recommended, as in picture-prompted storytelling or in other experimental designs with uniform stimuli. In studies on ‘space’, these could be, e.g., director/matcher language games for collecting standardised route descriptions on a map-like picture (Pederson et al. 1998), or naming tasks and acceptability ratings for collecting standardised descriptions of spatial constellations between figure and ground positioned differently on a grid (Carlson & Hill 2007). Non-linguistic data on cognition pertain to mental patterns underlying thinking, perceiving, deciding, remembering, acting, feeling, imagining, etc., independent of language.
Such information results from ethnographic research and/or field experiments (i.e., the participants are studied within their natural familiar environment as dislocating them might influence their behaviour). Thus, the experiments need to be viable within distinct field locations. This requires transportable equipment and an experimental design (particularly tasks and items) participants are familiar with. Without prior knowledge of culture-specific
ideas and behaviour patterns (e.g., the practice of mental state talk, the importance of individual attitudes and feelings), data collection and interpretation will easily become biased (cf. ethnographic fieldwork in Section 6.3.1). In cross-linguistic/-cultural studies, culture-specific adaptations of the experimental design come at the expense of comparability. The workgroup on space at the Max Planck Institute for Psycholinguistics in Nijmegen, the Netherlands (e.g., Senft 1994; Levinson 1996; Pederson et al. 1998), has developed several non-linguistic experiments to identify the frame of reference (absolute vs. relative) underlying spatial thinking irrespective of language. These experiments are based on the following design (see Figure 7.1, cf. Senft 1994: 421; Levinson 1996: 113), which can easily be conducted in distinct field contexts: Participants are asked to memorise a pattern (stimulus) shown on Table 1 in front of them. This pattern is asymmetric so that the sides of the pattern can be distinguished (e.g., an arrow with a tip and a blunt end on the edge). Afterwards, the participants have to turn around (rotation by 180°) in order to face Table 2. On this table, they are asked to select, reconstruct, or complete the pattern they saw previously on Table 1. If participants choose pattern A on Table 2, this indicates that they memorised the arrow on Table 1 as ‘pointing to the north’ (absolute orientation). By contrast, choosing pattern B on Table 2 means that they memorised the pattern on Table 1 as ‘pointing to their right’ (relative orientation). Further non-linguistic data on spatial cognition can be collected, e.g., by asking participants to indicate cardinal directions in different locations, or to draw mental maps of their environment. Finally, mental representations as encoded in language are compared with mental representations underlying language-independent thought in order to look for correlations.
Figure 7.1 Underlying design of the non-linguistic spatial experiment (Table 1 and Table 2 on either side of the participant, 180° rotation between them; response pattern A = absolute, pattern B = relative)
Investigations on space, for instance, have shown that the frame of reference used in linguistic encoding (absolute coordinates in Guugu Yimidhirr vs. a dominance of relative coordinates, particularly in small-scale descriptions, in most Indo-European languages) strongly correlates with the frame of reference underlying non-linguistic perception (the vast majority of Guugu Yimidhirr speakers choose pattern A on Table 2, while speakers of Indo-European languages predominantly choose pattern B). However, the great challenge of studies on linguistic relativity is to obtain data on mental activity that are actually independent of language. Such data have to be distinguished from ‘thinking for speaking’ (i.e., thinking that occurs immediately prior to language use) (Slobin 1996). In the latter case, a language forces its speakers to adopt a certain kind of mental representation as relevant for linguistic encoding, but this is not necessarily the dominant mental representation of non-linguistic mental activity. A more detailed classification of distinct kinds of relationships between language and cognition resulting from various investigations is presented in the following section.
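The scoring logic of the rotation design described above can be made concrete with a short sketch. It is only an illustration under stated assumptions: the compass-bearing convention, the function names, and the example trial are hypothetical and are not taken from Senft (1994) or Levinson (1996).

```python
from collections import Counter

# Compass bearings in degrees (an illustrative convention, not from the chapter):
# north = 0, east = 90, south = 180, west = 270. Only multiples of 90 are handled.
SIDES = {0: "front", 90: "right", 180: "back", 270: "left"}


def egocentric_side(arrow_bearing, facing_bearing):
    """Where the arrow points relative to the participant's body."""
    return SIDES[(arrow_bearing - facing_bearing) % 360]


def classify_response(stimulus_bearing, facing_at_table1, response_bearing):
    """Label one trial as 'absolute', 'relative', or 'other'."""
    facing_at_table2 = (facing_at_table1 + 180) % 360  # the 180-degree turn
    if response_bearing == stimulus_bearing:
        return "absolute"  # same compass direction preserved (pattern A)
    side_before = egocentric_side(stimulus_bearing, facing_at_table1)
    side_after = egocentric_side(response_bearing, facing_at_table2)
    if side_before == side_after:
        return "relative"  # same body-centred side preserved (pattern B)
    return "other"


def proportions(labels):
    """Share of each response type in a list of trial labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Invented trial: the participant faces west at Table 1, so an arrow pointing
# north points to their right. After the turn they face east.
label_a = classify_response(0, 270, 0)    # arrow still points north
label_b = classify_response(0, 270, 180)  # arrow points south, i.e. to their right again
```

Because the turn is exactly 180°, the two coding strategies always yield opposite arrow directions, which is what makes the responses diagnostic. The `proportions` helper could then summarise the labels per speaker group before they are compared with the linguistic data.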
7.4 Basic Research Findings
Psycholinguistics and cognitive linguistics offer a vast number of findings on mental representations and processes underlying language production, language comprehension, language acquisition, and language and thought. In this chapter, we cannot summarise all the pertinent findings in detail, but it is possible to identify some recurrent processing principles and constraining factors across research areas and their influence on mental representations, mental processes, and the time-course of language processing. Psycholinguistics focuses on how speakers process language in real-time (i.e., while it happens). Studies on language comprehension, production, and acquisition specifically aim at testing the psychological reality of the components of language grammar (defined broadly), the processes underlying their use and their interaction with other human cognitive abilities. Psycholinguists study these linguistic representations and processes across the human lifespan (i.e., their development from infancy to old age) and by comparing cognitively unimpaired with impaired speakers of a language (e.g., specific language impairments, language in speakers with psychological disorders). In general, deviant processing patterns are deduced from comparing processing patterns for different types of linguistic input using language data (e.g., speech errors) and language-related data (e.g., longer reaction time and higher error rate in response accuracy), and they are typically interpreted as reflecting processing difficulty. Numerous studies have revealed that language processing makes use of mental representations and regularities specified in a speaker’s mental grammar. These mental representations roughly correspond to the basic categories postulated in linguistics, for example, phonemes, words, or sentences, and are assembled from smaller to increasingly larger meaningful units. Interaction between different
linguistic representations as well as the interplay of language knowledge and other cognitive abilities (e.g., memory, perception, reasoning) are mainly responsible for variation in how and when linguistic representations are accessed during processing. In addition, psycholinguistics has revealed further factors that constrain processing efficiency:
• incrementality: Language processing proceeds incrementally – that is, in a piecemeal manner. In comprehension, newly perceived pieces of information are interpreted to the maximum degree possible to assign linguistic structure and meaning to the speech stream. In production, speakers use information for utterance planning as soon as it becomes available (i.e., at the moment it is mentally accessible).
• predictive processing: Predictability refers to the probability with which a linguistic unit (e.g., a word) occurs in a given context. Predicted units show processing facilitation in terms of reducing comprehension effort (e.g., shorter reading time) and speaker effort (e.g., deaccentuation in speech).
• revision and correction: Parsing mistakes in comprehension and production are revised by reanalysis (in comprehension) or self-correction (in production), meaning that processing requires more effort. A speaker’s processing system may not detect an anomaly in need of revision (interpretative failure in semantic or grammatical illusions) or it may even fail.
• frequency and language exposure/experience: Speakers are sensitive to how often they have encountered a linguistic element in general (e.g., lexical frequency), when they did so for the first time (e.g., age of acquisition), and in which contexts they encountered the linguistic element (e.g., contextual diversity, speaker familiarity). Higher frequency and exposure rate tend to reduce processing effort. Frequency estimates are typically taken from corpora (cf. Chapter 5) and there is a lively debate about which corpora provide adequate estimates for contemporary language use across modalities (e.g., Brysbaert et al. 2011).
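To illustrate how the frequency and predictability factors listed above can be turned into numbers, the following minimal sketch estimates both from a corpus. It rests on assumptions that go beyond the chapter: the toy token list is invented, frequency is expressed per million tokens, and predictability is operationalised as bigram surprisal (the negative log probability of a word given the preceding word), which is only one of several measures in use.

```python
import math
from collections import Counter

# Invented toy corpus; real estimates would come from a large corpus (cf. Chapter 5).
toy_tokens = "the dog chased the cat and the dog slept".split()


def per_million(tokens):
    """Relative frequency of each word, scaled to occurrences per million tokens."""
    counts = Counter(tokens)
    return {word: n * 1_000_000 / len(tokens) for word, n in counts.items()}


def bigram_surprisal(tokens, prev, word):
    """Surprisal in bits: -log2 P(word | prev), estimated by maximum likelihood."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    prev_total = sum(n for (first, _), n in bigrams.items() if first == prev)
    pair = bigrams[(prev, word)]
    if prev_total == 0 or pair == 0:
        return math.inf  # unseen continuation: no estimate without smoothing
    return -math.log2(pair / prev_total)


frequencies = per_million(toy_tokens)
dog_after_the = bigram_surprisal(toy_tokens, "the", "dog")
cat_after_the = bigram_surprisal(toy_tokens, "the", "cat")
```

In this toy corpus, ‘dog’ follows ‘the’ more often than ‘cat’ does, so it receives the lower surprisal value. A study would relate such values to a processing measure such as reading time rather than interpret them on their own.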
These factors strongly contribute to processing preferences – i.e., the fact that certain linguistic structures are easier to understand or to produce than others. They are assumed to hold across languages and speakers. There is an ongoing debate on the source of these processing preferences, especially regarding the status of frequency or experience effects in language processing. While it is indisputable that frequency plays an important role, it is still controversial whether it is a source of processing preferences or a modulating factor with principles of grammar being the actual source (cf., e.g., MacDonald et al. 1994; Bornkessel et al. 2002; Ellis 2002; Bornkessel-Schlesewsky & Schlesewsky 2009: 100–102; Seidenberg & MacDonald 2018). From a slightly different perspective, this is also a debate about the relationship between measures of
processing effects in psycholinguistics and frequency counts in corpus linguistics (cf. Chapter 5), which are both considered empirical correlates of speakers’ mental representations of language. Much research has been devoted to elucidating the relationship between various psycholinguistic (i.e., experimental) and corpus linguistic measures (e.g., Arppe & Järvikivi 2007; Divjak 2008; Ellis et al. 2014; McConnell & Blumenthal-Dramé 2019; cf. the review in Blumenthal-Dramé 2016), but the emerging cross-method pattern of findings eludes a clear explanation. This is because different measures are compared across studies and because the different types of data only partially converge and, hence, may reflect partly different mental representations (cf. Section 9.2). Nevertheless, the crosstalk between psycholinguistics and corpus linguistics illustrates the importance of multi-method combinations to examine the mental representation of language from different viewpoints (cf. Section 2.5.1). More recently, psycholinguistics has identified further variables which may interact with the above factors constraining processing efficiency. These variables require further consideration in future research:
• language processing model(s): Traditionally, language comprehension and production have been treated as strictly separate systems. However, natural dialogue requires people to switch rapidly between hearer and speaker perspectives, which suggests that the two systems may be inseparable (Pickering & Garrod 2013; Kittredge & Dell 2016).
• variation within and across samples: There are approximately 7,000 languages currently spoken in the world, but psycholinguistics – similar to other fields in psychology – exhibits a bias towards investigating a few similar (European) languages with good infrastructure for research (cf. Henze et al. 2010; Rad et al. 2018). At a rough estimate, research on language production is based on data from less than 1 per cent of the languages currently spoken, and research on language acquisition on only 1–2 per cent (Norcliffe, Harris & Jaeger 2015; Stoll 2015). So, to what extent do the identified milestones (i.e., developmental steps) and patterns in language acquisition hold from a typological point of view? To what extent do the abovementioned constraining factors for processing efficiency in fact apply cross-linguistically (cf. Bornkessel-Schlesewsky & Schlesewsky 2016)? Similarly, how much of the processing variation can be attributed to variation within individual speakers (intraindividual differences) or variation across speakers with (dis)similar backgrounds (interindividual differences)? The greatest challenge is to adjust the experimental key components such as stimuli and task instruction according to the particular requirements imposed by the distinct socio-cultural settings. It is an open question to what extent this is possible (cf. Hellwig 2019).
• reductionist vs. naturalistic approaches: Psycholinguistic studies almost exclusively rely on lab experiments with student samples. Future research will have to consider the extent to which variables affecting language processing in highly controlled lab settings affect processing in real-life situations as well (e.g., Speed, Wnuk & Majid 2018).
In sum, psycholinguistics has revealed that language processing in real-time is subject to a dynamic interplay of linguistic and general cognitive factors, which conjointly determine how speakers arrive at a particular meaning or utterance. Although psycholinguists acknowledge cross-linguistic and individual variation in language processing and acquisition, our understanding of this variation is still developing. Cognitive linguists focussing on the relationship between language and cognition use cross-linguistic/-cultural comparison to investigate whether or in what way the speakers’ language has an impact on their thinking. Overall, the research has shown that the relationship between language and thought is manifold and bidirectional. Our language not only influences our thinking, but our mental focus can also be gleaned from our choice of linguistic expressions. This does not, however, mean that there is necessarily a correspondence between mental representations as encoded in language and mental representations underlying thought. Language or thought can, for instance, change independently from each other, as Bartmiński’s example of ‘sun rises’/‘sun sets’ illustrates so well: The conceptualisation encoded in these linguistic expressions is still that of an earth at the centre of the universe with the sun rotating around it, although this idea has been disproven since the Copernican revolution. Basically, the following kinds of effects of language on thought are distinguished (e.g., in Wolff & Holmes 2011; Casasanto 2017):
• no structural correspondence between language and thought (i.e., language competes with or contradicts non-linguistic thought);
• an impact of language on thought immediately prior to language production (i.e., thinking for speaking); and
• language can prime or make properties salient in non-linguistic thought or even create mental concepts (thinking after language).
7.5 Summary
In this chapter, we presented empirical research procedures in cognitive linguistics and psycholinguistics. In order to study linguistic knowledge in terms of unconscious mental representations and mental processes during language use, generally, two basic research methods are used: observation and analysis of natural language data and a broad range of experiments.
The choice of experimental design depends first and foremost on the research area, namely, language comprehension, language production, language acquisition, or the relationship between language and thought. The research focus (e.g., cross-linguistic, developmental, or impairment studies) also has an impact on potential research participants (children, bilinguals, etc.) and the research location (laboratory vs. field experiments). As we have seen, there are multiple kinds of data (language, language-related or language-independent behaviour, and neuronal processing) that are collected in many different ways. Experimental designs are specific combinations of experimental tasks that participants are asked to perform, of stimulus presentation modes (i.e., ways in which participants are presented with stimuli), and of experimental measures (i.e., the nature of measurement for data collection). Mental representations and processes can only be inferred indirectly from the data, and generally multiple experiments and/or natural data analyses are necessary to answer a research question comprehensively. The findings from cognitive-linguistic and psycholinguistic studies have revealed that there are both cross-linguistic universals and differences with regard to mental representations and processes, and that there is a set of cognitive-linguistic processing principles that shape language comprehension and production across the life span. Because of their interdisciplinary origin, cognitive linguistics and psycholinguistics interact with various non-linguistic disciplines such as experimental psychology or cognitive science.
7.6 Exercises and Assignments
Due to the required technical equipment and application skills, online measures are time- and money-consuming so that beginners can only make limited use of them. Exercises for students which can be included during a session on cognitive linguistics and psycholinguistics or as part of project work:
7.1 Take one or more cognitive-linguistic and psycholinguistic studies investigating one linguistic domain (e.g., space or colour terminology) or one linguistic phenomenon (e.g., processing of preferred vs. less preferred structures) and work out the experimental components (variables, experimental tasks & measures). In what way do multiple experimental designs contribute to a more comprehensive investigation of the domain?
7.2 Discuss the advantages and disadvantages of the two main eye-tracking paradigms (visual world paradigm & eye-tracking in reading) with respect to different kinds of research participants, research issues, research locations, etc.
7.3 Explain the difference between categorisation based on necessary and sufficient features or on prototypicality for seating furniture. Provide examples for the vertical and horizontal dimension of semantic categorisation.
7.4 Discuss the naturalness of experimental tasks as shown in Table 7.2. Try to find a research question (cf. Section 7.1) and state which task may be a good fit and which may obscure the results.
7.5 Develop a research question on sentence or discourse processing and develop example stimuli based on minimal pairs (cf. Section 2.4.3). Discuss potential problems in minimal pair construction.
7.6 Find out what happens if you ask people to perform a task (e.g., naming or describing pictures) as fast as possible vs. as accurately as possible.
Also, here are some ideas for minor research projects:
7.7 What are mental representations underlying conceptual metaphors in sports? Are there similar source domains (e.g., SPORTS is BATTLE) across languages? Collect and analyse suitable language data.
7.8 How do languages differ in their formal classification with respect to object-related items (e.g., gender/noun classes, classifiers, or word taboos)? What are distinctive parameters (e.g., animacy, alienability, social status, or sacredness)? Collect and analyse suitable language data.
7.9 Investigate communication (e.g., event description, spatial language, or conceptualisation of motion events) by use of picture-prompted storytelling (the ‘frog stories’ picture book, Mayer 1969). This can be done cross-linguistically and/or in developmental research comparing children of different ages vs. adults. You can then analyse picture descriptions with respect to syntactic structure, nominal modification, number and status of referents mentioned.
7.10 Develop and conduct an experimental study or replicate an existing study with marginal changes regarding the group of participants, the specific topic, stimuli, or other research components. For instance, as the study of Gennari and MacDonald (2009) has shown, the sentence structures which speakers tend not to produce in elicitation experiments are also the hardest to understand. This could be replicated with native speakers of a language other than English and acceptability judgements instead of self-paced reading. Moreover, investigations using online measures can be planned in detail (hypothesis, variables, stimuli, participant sample, experimental measure, etc.) without finally conducting the experiment.
7.7 Further Reading
Cognitive Linguistics
Introductory textbooks to cognitive linguistics are Taylor 1989/2003³ & 2002, Ungerer & Schmid 1996/2006², Dirven & Verspoor 1998/2004², Lee 2001, Croft & Cruse 2004, and Evans & Green 2006. Cognitive grammar, in particular, is described in Taylor 2002 and Langacker 2008. Complementary readers with contributions by important representatives of the subdiscipline include Geeraerts 2006 and Evans, Bergen & Zinken 2007 as well as companion volumes focusing on more recent trends in cognitive linguistics by Ruiz de Mendoza & Peña Cervel 2005, Kristiansen et al. 2006, Evans & Pourcel 2009, Sandra & Östman 2009, and Littlemore & Taylor 2014. Evans 2007 provides a glossary of key terms and concepts in cognitive linguistics, an annotated reading list is published online (www.vyvevans.net), and an extended bibliography of the subdiscipline including monographs, articles of journals and book series, dissertations and MA theses as well as working papers and unpublished work is published by Wolf et al. 2006. Handbooks on cognitive linguistics include Cohen & Lefebvre 2005, Geeraerts & Cuyckens 2007, Robinson & Ellis 2008 (on second language acquisition), Dabrowska & Divjak 2015, and Dancygier 2017. Further books on cognitive linguistics that include the aspect of culture and/or society are, among others, Palmer 1996, Kövecses 2006, Kristiansen & Dirven 2007, and Sharifian 2017. Journals with a focus on topics in cognitive linguistics are ‘Cognitive Linguistics’ (de Gruyter), which is the official journal of the International Cognitive Linguistics Association (ICLA), the ‘Annual Review of Cognitive Linguistics’ (Benjamins), a journal published under the auspices of the Spanish branch of ICLA, ‘Language and Cognition’ (Cambridge University Press), and the ‘International Journal of Cognitive Linguistics’ (Nova Science Publishers).
Book series dedicated to the subdiscipline are ‘Cognitive Linguistics Research’ (Mouton de Gruyter), ‘Human Cognitive Processing’ (Benjamins), ‘Cognitive Linguistics in Practice’ (Benjamins), ‘Language, Context, and Cognition’ (Mouton de Gruyter), and ‘Advances in Cognitive Linguistics’ (Equinox). Furthermore, there are topics on the interface with anthropological linguistics that appear in journals and book series, such as ‘Cognitive linguistic studies in cultural contexts’ (Benjamins) or ‘Language, culture and cognition’ (Cambridge University Press) – cf. Chapter 6. A focus on empirical research in cognitive linguistics is provided by Gonzalez-Marquez et al. 2007. The more recent developments in cognitive linguistics presented in Kristiansen et al. 2006 already emphasize a growing interest in empirical methods. The latest handbook (Dancygier 2017) contains several chapters on methodological approaches (part V). For deeper insights into linguistic relativity, see Gumperz & Levinson 1996, Pütz & Verspoor 2000, and
Gentner & Goldin-Meadow 2003. Lakoff & Johnson 1980 and Kövecses 2010 are basic readings on metaphors, Lakoff & Johnson 1980 on metonyms, and Fauconnier [1985] 1994 on mental spaces.
Psycholinguistics
Introductory textbooks to psycholinguistics include Fernández & Cairns 2010, Höhle 2010/2012², Traxler 2012, Traxler & Gernsbacher 2006², Müller 2013, Sedivy 2014 and Dietrich & Gerwien 2017. Ingram 1989 and Clark 2009, 2016 are introductions to language acquisition. Historical and recent developments in psycholinguistics are described in Altmann 2001, Sanz, Laka & Tanenhaus 2013, and Levelt 2014. Psycholinguistic handbooks with more detailed information on language comprehension, production and acquisition are, e.g., Gernsbacher 1994, Rickheit, Herrmann & Deutsch 2003, Traxler & Gernsbacher 2006², Gaskell 2007, Spivey, McRae & Joanisse 2012, Brown & Hagoort 1999, and Fernández & Cairns 2018. Edited volumes specifically on language and speech production include Goldrick, Ferreira & Miozzo 2015 or Redford 2015. Major journals in this interdisciplinary field include, among others, the ‘Journal of Memory and Language’ (Elsevier), ‘Brain and Language’ (Elsevier), ‘Cognition’ (Elsevier), ‘Language, Cognition, and Neuroscience’ (Taylor & Francis), ‘Discourse Processes’ (Taylor & Francis), ‘Developmental Science’ (Wiley), ‘Developmental Psychology’ (journal of the American Psychological Association), ‘Cognitive Psychology’ (Elsevier), ‘Journal of Experimental Psychology’ with its specific sections (American Psychological Association), ‘Quarterly Journal of Experimental Psychology’ (Sage), and ‘Journal of Child Language’ (Cambridge University Press). Open access journals such as ‘Frontiers’, ‘PLOS ONE’, and ‘Glossa’ are having an increasing impact in the field. A comprehensive overview of research methods in psycho-/neurolinguistics is provided by De Groot & Hagoort (2018).
There are also methodological volumes with a focus on particular research fields such as Sekerina, Fernández & Clahsen 2008 (online methods in developmental psycholinguistics), Blom & Unsworth 2010 and Hoff 2012 (experimental methods in language acquisition research), Jegerski & VanPatten 2014 (research methods in second language psycholinguistics), Carreiras & Clifton 2004 (online methods in language comprehension research), Bock 1996 and Schiller 2012 (methods in production research), and Cowart 1997 and Schütze 1996/2016² (experimental syntax). Eisenbeiss 2010 describes production methods as used in language acquisition research. Eye-tracking paradigms are described in Henderson & Ferreira 2004 and Rayner et al. 2012². The volume by Podesva & Sharma (2013) includes chapters on psycholinguistic methods such as metalinguistic judgements, eye-tracking and EEG. A very useful website with extensive collections of references and materials to conduct psycholinguistic research inside and outside the lab is maintained at https://experimentalfieldlinguistics.wordpress.com.
8 Neurolinguistics
Neurolinguistics is a highly interdisciplinary field incorporating theories and research methods from, inter alia, linguistics, psychology, and neuroscience. Thus, neurolinguistics is a cover term for several subfields with partly different research foci. These are, however, unified by a shared interest in elucidating the relationship between the brain and language behaviour. Language behaviour is understood as language use in production, comprehension, and acquisition as well as the disorders that may affect successful language use. In this chapter, we will first outline the basic research aims and questions shared by all neurolinguistic approaches (Section 8.1) and then the neurolinguistic subfields that are distinguished by different research foci, research participants, and experimental approaches (Section 8.2). In Section 8.3, we will address the methodology of neurolinguistic research, concentrating on experimental research. This includes the specific groups of research participants, data types and techniques of data acquisition and analysis, and particular considerations when interpreting the wealth of data accumulated in neurolinguistic experiments. We will then present the basic research findings (Section 8.4) on the brain-language relationship, followed by a summary (Section 8.5), methodological exercises and ideas for your own research projects (Section 8.6), and suggestions for further in-depth reading (Section 8.7).
8.1 Research Aims and Questions
Neurolinguistics is probably the most interdisciplinary field in empirical linguistics research, encompassing research from areas as diverse as aphasiology and speech pathology, psychology and cognitive science, linguistics, neuroscience, and biology. Despite the sometimes profound differences between these scientific disciplines, neurolinguistic research is characterised by two general questions targeting the relationship between the human brain and language:
• How can the basic functions of the human brain explain language comprehension, production, and acquisition? Or, put differently, what is the causal relationship between brain mechanisms and language behaviour (i.e., language use in comprehension, production, and acquisition)?
• What does adequate language architecture in the brain look like (e.g., vis-à-vis other cognitive functions implemented in the brain)?
In other words, neurolinguistics is concerned with how language is implemented in the brain and how the brain enables seemingly effortless verbal communication among humans. The focus is not only on language in healthy individuals of all ages, but also on language pathology (i.e., the study of impaired speech or language in patients with neurological disorders). The field shares many basic assumptions (as well as experimental designs; cf. de Groot & Hagoort 2018) regarding the components of language comprehension, production, and acquisition with psycholinguistics (cf. Chapter 7), but there are two further distinct assumptions that shape research questions and projects:
• Language with its structure and meaning is a uniquely human cognitive ability which is unparalleled in the communication systems found in the animal kingdom.
• The brain exhibits properties specific to the human species, but it nonetheless shares a common biological ancestry with other nonhuman species in terms of structure, function, and genetic basis.
These basic assumptions lead to four more specific research questions pursued in neurolinguistics:
• Which brain areas are involved in generating language behaviour? What are the neural universals of language implementation in the brain in terms of temporal dynamics and spatial representation?
• Are the areas involved in generating language behaviour specific to language (i.e., this is their first and only function) or do they primarily serve other, ontogenetically older cognitive functions while finding additional application in language?
• Given the approximately 7,000 existing languages in the world, how do brain mechanisms relate to linguistic variability?
• Considering the partial overlap between human language and animal communication in terms of basic functioning, are the brain mechanisms that generate language particular to the human brain or are they shared with brain mechanisms in other species with different communication systems? In other words, what are the similarities and differences between human language and animal communication as they are implemented in the brain?
The first two of these four questions target whether there are mechanisms in the human brain applicable to all existing human languages regardless of their structural differences. Research in this field is mainly concerned with mapping language structure and function onto specific brain areas and processes and with finding an appropriate level of explanation for this mapping. One question in this regard relates to the temporal organisation of processing in the brain – i.e., which aspects of language are processed sequentially before (or after) others and which in
parallel? This question is mainly found in research on whether syntactic information (e.g., syntactic categories, phrases) is processed before semantic meaning is accessed or at the same time (cf. the summary in Bornkessel-Schlesewsky & Schlesewsky 2009: chapter 7). Another question relates to the spatial mapping of linguistic functions onto brain areas or functions. It seems likely that language is built up hierarchically, consisting of several independent components or modules (i.e., phonetics/phonology, morphology, syntax, semantics and pragmatics). Similarly, the brain is also hierarchically structured with evidence for neural areas performing distinct functions in the hierarchy from low-level perception, through higher-level cognition, up to action preparation and execution. For example, humans have a visual and an auditory cortex supporting visual and auditory perception respectively. These areas are actively engaged in the perception of images and sounds in our environment, which result in further processing up the hierarchy in order to determine an appropriate body response to the visual or auditory input. But neither of these cortical areas is critically involved in the preparation and execution of a possible body reaction itself, which is instead under the control of areas responsible for motor control. Crucially, whether such a functional segregation also exists for all structural components of language in the brain is still an unsettled issue. For instance, one ongoing debate revolves around whether there is a designated ‘visual word form area’ in the left occipito-temporal cortex that is retuned for the processing of orthographic input during early childhood and is therefore strongly tied to the cultural invention of reading (Cohen et al. 2000; Price & Devlin 2003, 2011; Dehaene et al. 2005).
Another prominent debate concerns whether Broca’s area (located in the left inferior frontal cortex) is specifically recruited for syntactic processing during language comprehension, for integrating multiple information sources with one another or for verbal working memory processes (cf. the reviews in Friederici 2011; Fedorenko & Thompson-Schill 2014; Hagoort 2019a). Although these debates concentrate on different parts of the brain, they have in common that they highlight a central, but still open question of data interpretation (cf. Section 8.3.3): Should an adequate explanation of language-related brain mechanisms include only language-specific functions or should it also include cognitive functions involved in both language and other cognitive domains? The final two research questions listed above refer to neural vs. behavioural variability within and across species. First, there is only one anatomy of the human brain, yet it has brought about thousands of languages of vastly different structure. If there is indeed a neural universal mechanism for processing language, how does it relate to linguistic diversity and variability? As a first example, consider language modality: languages can be perceived in spoken, signed, or written form. Perceptual processing is likely to differ – as a phoneme, a sign, and a letter each activate different perceptual brain areas – yet processing signatures for semantic composition remain quite stable across language modality. Second, languages also differ in their structural properties, allowing for varying word orders, phonological and morphological patterns etc., which may also interact in different ways from language to language. Nonetheless, the current data suggest that similar functions in the computation of meaning tend
to elicit similar brain signatures, even if conveyed by different forms across languages. Variability across species plays a role in neurolinguistics through comparison of human language with nonhuman communication systems, especially those of other primate species, much in the same way that cognitive neuroscience examines cognition more broadly across species. The focus here is, first, on spoken language and on language development (both across species and within individuals), because speech and communicative development exhibit several similar properties across species (e.g., vocal patterns to convey a particular meaning, pattern imitation in early stages of development). The second focus is on the genetic and neuroanatomical overlap of different communicative species. In stark contrast to other neuroscientific investigations involving cross-species comparisons, however, research in neurolinguistics cannot refer to a fully fledged animal model to study language. Animal models require that the animal species selected for research and humans exhibit highly similar biological or cognitive functions (e.g., vision) and neurobiological organisation (e.g., visual cortex) in order for an animal to be able to serve as a substitute participant in an experiment. The experimental outcome allows the researchers to draw inferences on how the human brain may perform an analogous function. These kinds of experiments are often associated with the use of invasive (but very precise) methods of data collection (cf. Section 8.3.2), which are not ethically possible with human participants. When language is the object of research, this procedure is additionally hampered, simply because no animal communication system possesses a complex hierarchical structure comparable to human language. However, it is possible to use animal models for studying specific communicative functions such as vocal learning in songbirds and human infants or speech processing in primates and humans. 
Finally, as in other scientific disciplines that are concerned with differentiating general (or universal) aspects from varying ones, neurolinguistics is home to a lively debate on what constitutes a real difference. Different brain mechanisms may lead to qualitatively different data patterns (e.g., different brain areas activating in response to different linguistic stimuli) or to quantitatively different data patterns (e.g., the same brain area activating to different degrees to different linguistic stimuli). Whether qualitative or quantitative differences prevail in the discussion on the relation between language and brain depends to some extent on the specific research question. For instance, potential differences between languages or linguistic domains (e.g., different processes related to semantics or syntax) tend to be discussed in terms of qualitative differences (e.g., different event-related potentials elicited in response to semantic vs. syntactic processing; cf. Section 8.3.2.1). Quantitative differences are often used to differentiate processing within a linguistic domain (e.g., the differential engagement of brain areas in processing syntactically complex vs. simple structures) or in research on differences between speakers (e.g., children vs. adults). As mentioned at the outset, neurolinguistics shares a number of assumptions and viewpoints on language with psycholinguistics, but puts greater emphasis on research questions that are neurobiologically adequate (i.e., they respect and
follow the working mechanisms of the brain). Therefore, neurolinguistics is not equivalent to pursuing a (psycho)linguistic research question with neurocognitive methods for data collection (cf. Duncan, Tune & Small 2016). Rather, neurolinguistics develops research questions that specifically target the relationship between language and the brain. As a consequence, while neurolinguistic data can reveal a lot about how language is implemented in the brain, they show considerable variation, which makes it unlikely that neurolinguistic evidence is as yet able to adjudicate between different linguistic theories or variants within a linguistic framework (Roberts et al. 2018). Table 8.1 gives example research questions per linguistic domain.

Table 8.1 Example research questions in neurolinguistics

Linguistic domains:

Phonetics, phonology
1. Which brain areas are responsible for the production of phonological units of speech?
2. Which brain areas are sensitive to the recognition of letters?

Morphology & syntax
3. Are different word categories (e.g., nouns vs. verbs, regular vs. irregular past tense forms, content words vs. function words) represented differently in the brain?
4. Is there a difference in the brain responses to syntactic complexity vs. syntactic ambiguity?

Lexicon & semantics
5. Do words for natural objects and man-made tools activate the same brain areas?
6. Which brain areas have to be intact for successful meaning retrieval during language production?

Pragmatics & discourse
7. How rapidly are words integrated into the evolving discourse model during story comprehension?
8. Are different discourse-pragmatic functions (e.g., drawing inferences, updating of the discourse model) associated with activation in the same or in different brain areas?

Cross-domain
9. How do both brain hemispheres interact to facilitate language comprehension?
10. Which brain signatures are language-driven and which ones result from other cognitive variables, such as preparation for or anticipation of the experimental task?

Cross-disciplinary fields:

Language acquisition
11. What are the neuronal sources of dyslexia?
12. Do children show the same brain signatures of sentence processing as adults?

Language contact
13. To what extent is semantic memory in bilinguals represented differently in the brain compared to monolinguals?

Language change
14. From an evolutionary perspective, what neuronal mechanisms that enable communication are shared between humans and non-human primates? Are there neural mechanisms exclusive to human language?
8.2 Neurolinguistic Approaches
We use neurolinguistics as an umbrella term for three subfields that are distinguishable from one another by their research focus, participant samples, and primary methods of data collection (Small & Hickok 2016):
• Neuropsychology/aphasiology: This subfield studies impaired or spared speech or language in patients with acquired brain lesions – i.e., brain damage as a result of ischemic stroke, traumatic brain injury, tumours, neurodegenerative or infectious disease (e.g., Alzheimer’s, epilepsy, encephalitis), or in patients with developmental neurological diseases (e.g., dyslexia, specific language impairment, autism spectrum disorder). Main research questions target the functions that specific brain areas fulfil in language comprehension and production. Measurements traditionally include behavioural testing of patients and post-mortem examination of brain lesions (cf. Section 8.3.2.3) but are now also often supplemented by neuroimaging methods that help localise brain lesions in vivo (i.e., in the living brain; cf. Bates et al. 2003).
• Cognitive neuroscience: This subfield studies all aspects of language processing primarily in healthy humans of all ages, thereby complementing neuropsychological studies of patients. The main research focus is on how linguistic (or generally cognitive) abilities are implemented in the living healthy brain, both in adulthood and during development. Measurements include online methods measuring either the temporal dynamics of language/cognitive processing in the brain or the spatial distribution of brain areas associated with language/cognitive processing (cf. Sections 8.3.2.1 and 8.3.2.2).
• Neurobiology of language: This subfield studies ‘the biological implementation and linking relations for representations and processes necessary and sufficient for production and understanding of speech and language in context’ (Small & Hickok 2016: 5), exhibiting the most interdisciplinary profile amongst the neurolinguistic approaches. It studies the linking between cognitive function and brain implementation in all kinds of human participants and in comparison to animal species, hence integrating knowledge from psychology, linguistics, neuroscience, and biology (Duncan, Tune & Small 2016). Alongside the online methods used in cognitive neuroscience, methods in the field also include neuroimaging methods that directly measure single neural cells as well as biological techniques for genetic sequencing or for pharmacological intervention.
Figure 8.1 The classical Wernicke-Geschwind model of language in the brain (lateral view of the left hemisphere of the human brain; labelled structures: frontal lobe, parietal lobe, temporal lobe, occipital lobe, cerebellum, Broca's area, Wernicke's area, arcuate fasciculus)

The three approaches can also be differentiated based on how they relate to the general research question of how language architecture in the brain can be
envisaged (cf. Section 8.1). Historically, neuropsychology is linked to the classical Wernicke-Geschwind model of language in the brain (see Figure 8.1) developed by Norman Geschwind, integrating findings from nineteenth-century neurologists Paul Broca and Carl Wernicke (cf. Geschwind 1970; Levelt 2014). According to this model, there are two designated language areas in the left hemisphere of the brain, specifically in the perisylvian cortex, each of which performs a separate function. Broca’s area, located in the left inferior frontal cortex, is assumed to support language production, because Broca’s patients with lesions in this area showed severe problems in the precise articulation of sounds and profound disruption in producing (syntactically) complex sentences. Wernicke’s area, located in the left temporal cortex, is assumed to be responsible for language comprehension, because patients with lesions in this area exhibited largely preserved production abilities, but severely impeded comprehension. Both areas are connected by a single fibre tract, the arcuate fasciculus, damage to which leads to conduction aphasia characterised by fluent production and intact comprehension abilities, but an impaired ability to repeat multiword sequences. The classical model was the first to specifically address which parts of the brain are linked to language behaviour, and as such was highly influential in
neurolinguistics and the classification of patients with aphasia – i.e., with an acquired language deficit due to brain injury (definition from Kemmerer 2015: 71; cf. Kemmerer 2015: chapter 3 for further details on classic aphasia and its modern classification). However, the classical model has been disputed by findings obtained by modern neuroimaging methods capturing brain processes in vivo suggesting that it is too simple (cf. Dronkers 2000; Tremblay & Dick 2016). Not only were the original loci of the two areas ill defined, there is now evidence that many more areas are involved in language processing with more connecting fibre tracts between them. Both cognitive neuroscience and the neurobiology of language share the assumption that there is a language network in the brain, encompassing diverse areas in the frontal, parietal, temporal, and occipital lobes of the cortex (see Figure 8.1 for lobe positions) as well as subcortical structures (e.g., hippocampus, thalamus) and the cerebellum. These language-related areas are not only connected to one another, but also to areas sensitive to cognitive operations that can become relevant for language processing as well, such as the attention network or motor cortex. Apart from this commonality, there is an ongoing debate on the linguistic and cognitive functions that the brain areas support during language processing (cf. Hickok & Poeppel 2007; Hickok 2009; Friederici 2011, 2012; Price 2012; Fedorenko & Thompson-Schill 2014; Bornkessel-Schlesewsky et al. 2015; Hagoort 2019a). For instance, while it is quite clear that Broca’s area is engaged in language production and comprehension, accounts differ with regard to the specific function they assign to it, ranging from purely linguistic (syntax) to general cognitive (information integration, conflict resolution). 
Importantly, in extending the classical model, modern neurolinguistics assigns the right hemisphere of the brain a pivotal role in language processing, particularly for information that is relevant for discourse-pragmatic aspects of language (e.g., prosody, theory of mind). Our knowledge about the widespread language network in the brain is a direct consequence of the advancement of methods in neuroscience. As will be discussed in Section 8.3.2, neurocognitive (or, more generally, neuroscientific) measurements of brain activity help to localise brain areas activated in response to language or they serve to examine the speed of processing.
8.3 Methodology
Neurolinguistics employs two research methods for data collection to study the relationship between language and the brain: observation and experiment. Systematic open observation (cf. Section 2.2.2) of patients with brain lesions was the prevailing method in the early stages of neurolinguistics in the nineteenth century and continues to be an important approach in current neuropsychology and aphasiology research. It can be combined with elicitation or other simple experimental tasks in single case studies or experiments with
groups of patients, when patients are asked to produce or comprehend language in the spoken or written modality. Historically, this methodological approach was due to the lack of methods to measure brain activity in vivo. Before the advent of brain imaging methods in the twentieth century, anatomical descriptions of the brain were entirely based on post-mortem examinations of patients’ brains, so any examination of a brain lesion was confined to observation in clinical settings, i.e., during doctor-patient interactions. Given the rarity of patients with comparable lesion anatomy, single case studies also remain an important method in current aphasiology (Caramazza 1986) in order to draw conclusions on normal brain function based on lesion data. However, it is acknowledged that drawing inferences from lesion data on normal, unimpaired language processes is difficult, because the non-damaged brain areas may take up different functions in a healthy brain compared with an abnormal brain where they may compensate for the functions of areas no longer working due to brain injury or disease.

Experimental studies of brain function became the dominant method of investigation with the emergence of neuroimaging methods of data collection in the twentieth century, i.e., methods that are able to inspect brain structure and brain activity non-invasively and in real time. This paved the way for investigations of the brain and language in healthy participants and also made it possible to localise lesions in living patients by means of neuroimaging methods (such as voxel-based lesion mapping, cf. Bates et al. 2003). Due to the time-consuming and expensive experimental setup, most neurolinguistic studies with neuroimaging methods are cross-sectional; longitudinal studies remain the exception despite increasing efforts in developmental neuroscience.
Another consequence of the technical requirements to conduct neurolinguistic experiments is that, typically, speakers of a few major languages from countries with highly developed research infrastructure are studied. Endeavours to study a more diverse population of languages are less common as they face several challenges, mainly because speakers of understudied languages often live in areas without the required research infrastructure. Experiments on understudied languages take place as field experiments (cf. Section 2.4.2) whenever it is impossible for speakers of understudied languages to travel to a university laboratory. This situation can lead to a number of practical issues in conducting neurolinguistic experiments in the field, such as:
• social interaction: The technical equipment for neurolinguistic experiments is often unknown to speakers living in remote areas. This will likely cause reciprocal effects during data recording (cf. Section 1.1.5) as participants may behave differently in the presence of unfamiliar technical equipment. This in turn may yield unpredictable influences on measurements due to, e.g., stress or fatigue.
• climate: Climate conditions such as heat, cold and frost or humidity can severely hinder accurate measurements as most of the current technology used in neurolinguistic experiments relies on mild temperature and low humidity in order to work properly. For example, an extremely hot and humid climate alters physical functions in terms of, e.g., increased perspiration. This makes measurements with some techniques that require dry skin (such as electrophysiological recordings, cf. Section 8.3.2.1) hard to obtain.
• infrastructure: Advanced technical equipment requires reliable electricity. When this is not possible via a local power supply system, additional power generators must be provided as part of the research equipment.
The experimental approach is the prevailing approach in current neurolinguistics. Accordingly, we will focus on it in the following sections. A typical neurolinguistic experiment comprises the same key components as in cognitive and psycholinguistics (cf. Section 7.3.3). Apart from the kind of research questions, the main differences from psycholinguistics pertain to the research participants sampled, the experimental measures used to collect data, and, hence, the focus on particular data types. We will discuss these aspects in Sections 8.3.1 and 8.3.2. Section 8.3.3 discusses some particular aspects of data interpretation that come with the multidimensionality of neuroimaging data.

8.3.1 Research Participants
Neurolinguists collect data from varied samples of participants, including healthy participants and participants with impaired speech or language (i.e., patients with disorders of neurological or psychological origin). The latter are often underrepresented in experimental research on cognitive processing in healthy participants (cf. Chapter 7) and their language is less likely to be compiled in corpora (cf. Chapter 5) because it is not representative of normal language use (i.e., impaired language users do not typically produce the kind of texts that are collected in large corpora). The type of research participant sample ultimately selected depends on the neurolinguistic approach and its research aims and questions pursued within a neurolinguistic experiment (cf. Section 8.2). In general, there are four groups of research participants:
• healthy adults and typically developing children: These participants have not been diagnosed with any neurological or psychological disorders up to the time of the experiment or test.
• participants with acquired disorders: These participants (or patients) had a typical brain development, but have acquired a neurological disease later in life, i.e., the disorder is mostly due to circumstances in their environment. Acquired disorders can result from occlusions of blood vessels (ischemic stroke), traumatic brain injury, tumour, or neurodegenerative or infectious diseases (e.g., Alzheimer’s, epilepsy, encephalitis). Participant age varies from childhood to adulthood, depending on what caused the disorder.
• participants with developmental disorders: These participants have neurological disorders that first emerged during childhood, i.e., the disorders most likely have a genetic origin and are a chronic condition throughout an individual’s life span (e.g., dyslexia, specific language impairment, autism spectrum disorder). The focus is on child participants, but young adults may also be sampled when long-term atypical behaviour or enduring effects of treatment outcome (e.g., reading intervention in dyslexics) are of interest.
• non-human primates and other vertebrate species: These participant groups are animal species with an advanced communication system. They are either chosen for their genetic similarity to humans (primates) or because they possess certain communicative functions present in children or adults (e.g., songbirds and their capacity for vocal learning such as to imitate parents’ songs).
Sampling procedures for the majority of neurolinguistic investigations are subject to the same sampling biases as in cognitive and psycholinguistics (cf. Section 7.3.1) and, thus, suffer from the same limitations when it comes to the generalisation and replication of research findings. In particular, studies focusing on how the ‘normal’ healthy brain processes language typically sample the local university student population. Thus, neurolinguistic experiments are prone to the same age and education biases that go hand in hand with a linguistic bias towards better-studied major languages (especially English) in countries promoting language sciences and providing the necessary laboratory equipment. This bias also carries over to cross-species comparisons involving non-human animals with communication systems. Although animals are tested in such studies, implicit comparison is (or can be) made to findings from the literature on healthy adults – who are predominantly young adult speakers of a few major languages with a university background.

Another selection bias in sampling pertains to participants’ handedness. Many of the current findings on brain structures involved in language are based on right-handers. Left-handed participants often show a slightly different brain organisation, leading to different results in terms of where language is processed in the brain (Knecht et al. 2000). Hence, left-handed participants are almost always excluded from participating in neurolinguistic experiments (unless their brain organisation is the object of research) and, as a consequence, neurolinguistic results do not, strictly speaking, generalise to about 5–10 per cent of the human population (the estimated percentage of left-handed people). Moreover, age bias also affects the generalisability of findings with regard to participants with acquired disorders.
Many diseases leading to acquired brain disorders emerge later in life (e.g., stroke); thus, patients with acquired disorders tend to be significantly older than the healthy adults tested in experiments. Participants with disorders (acquired or developmental) are also special for another important reason: it is usually more difficult to find sufficient numbers of participants who are comparable in personal characteristics (age, cognitive abilities other than language, etc.) – as would be the case in experiments with healthy participants – and who also present with comparable lesion sites. Therefore, experiments involving impaired groups often involve smaller group sizes, which are moreover characterised by greater within-group variation. This in turn may compromise research findings in terms of lower replicability and reliability, especially if it is not compensated for by experimental design or robust statistics.
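The effect of small and heterogeneous samples on the reliability of a group estimate can be made concrete with a short calculation. The following minimal sketch in Python compares the standard error of the mean for two groups. All values, group sizes, and names (healthy_group, patient_group, standard_error) are invented for illustration and are not taken from any study.

```python
import math
import statistics


def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Hypothetical response measures in arbitrary units (illustrative only).
# A larger, homogeneous group of healthy participants ...
healthy_group = [10.0, 11.0, 9.0, 10.0, 10.0, 11.0,
                 9.0, 10.0, 10.0, 10.0, 11.0, 9.0]
# ... and a smaller patient group with greater within-group variation.
patient_group = [6.0, 14.0, 9.0, 12.0, 4.0]

se_healthy = standard_error(healthy_group)
se_patient = standard_error(patient_group)

print(f"healthy: n={len(healthy_group)}, SE={se_healthy:.2f}")
print(f"patient: n={len(patient_group)}, SE={se_patient:.2f}")
```

Under these assumed numbers, the patient group's mean is estimated far less precisely than the healthy group's mean. This is one way in which small, variable samples can lower replicability unless the design or the statistics compensate for it.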
8.3.2 Data Types and Techniques of Data Collection and Analysis
As a consequence of working with experiments, observation and analyses of natural or elicited language behaviour, neurolinguistics also deals with the same three kinds of data as we introduced for cognitive and psycholinguistics (cf. Section 7.3.2), albeit with a slightly different specification:
• language data: linguistic responses (with elicitation task) or natural speech
• language-related data: neuronal or behavioural reactions in response to linguistic input
• language-independent data: neuronal or behavioural reactions in response to non-linguistic input
How language is implemented in the brain can only be inferred indirectly from data (it is not currently possible to directly examine the neural implementation of presumed linguistic structure), so the same interpretive component applies as is the case in other experimental research fields. Data interpretation is more complex compared with behavioural cognitive research because neurolinguists additionally have to pay attention to the level at which they map brain mechanisms to language functions (cf. Section 8.3.3), which directly hinges on the basic research approach and the kind of neuroscientific or behavioural measures/methods used to collect the data. As for the basic research approach, there are three approaches adopted in actual research using neuroimaging techniques (Rugg 1999: 16–18). They form a sequence of investigation from the localisation of a presumed or known cognitive/linguistic function, to the division of that function into smaller units and, finally, the determination of whether an assumed behavioural or perceptual difference evokes a corresponding difference in neural activity:
• functional localisation: With this approach, the researcher aims to identify a neural correlate of a well-defined linguistic (or cognitive) function. This approach extends knowledge from lesion data by not depending on chance (i.e., lesions occur randomly across patients and affect more than the areas of interest). It allows the investigation of a larger sample of (healthy) participants and also makes possible analyses focusing on individual differences in brain activity and functional localisations. With appropriate experimental design, it permits the more precise investigation of brain areas that are not damaged selectively. Neuroimaging techniques allow the investigation of the temporal dynamics of and interaction between activated brain regions.
• functional fractionation: The researcher focuses on the division of a presumed function into smaller units or the differentiation between linguistic or cognitive functions that a brain area may support. This can be achieved with particular experimental designs employing, for instance, experimental tasks targeting different cognitive functions or with measurements of better resolutions revealing more fine-grained data than other measurements.
• neural monitoring: With this approach, the researcher investigates whether particular experimental tasks or stimuli engage distinct cognitive functions and, thus, distinct brain mechanisms. Designs of this type do not require participants to produce a behavioural response to a stimulus (but see Krakauer et al. 2017 for criticism of omitting behaviour in neurocognitive investigations). However, this means that prior empirical knowledge about the relationship between the investigated cognitive function and brain mechanism is needed.
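Two ways of comparing conditions that are commonly combined with these approaches are subtraction (experimental minus control condition) and correlation (brain activity tracked against graded task demands; Kemmerer 2015: 49–54). The following minimal Python sketch illustrates only the arithmetic behind the two logics. The activity values, the task-demand levels, and the function names are hypothetical, and no inferential statistics are included.

```python
import statistics


def subtraction_contrast(experimental, control):
    """Per-participant residual activity: experimental minus control."""
    return [e - c for e, c in zip(experimental, control)]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Subtraction logic: hypothetical activity (arbitrary units) of four
# participants in a control and an experimental condition.
control = [1.0, 1.2, 0.9, 1.1]
experimental = [1.5, 1.6, 1.5, 1.4]
differences = subtraction_contrast(experimental, control)
mean_difference = statistics.fmean(differences)

# Correlation logic: hypothetical activity measured at four increasing
# levels of task demand (e.g., increasingly complex sentences).
task_demand = [1, 2, 3, 4]
activity = [0.8, 1.1, 1.3, 1.7]
r = pearson_r(task_demand, activity)

print(f"mean residual activity: {mean_difference:.2f}")
print(f"correlation with task demand: {r:.2f}")
```

In an actual study, the residual activity or the correlation would be evaluated with inferential statistics. The sketch only shows which quantity each design logic asks about.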
For each of the above research approaches, researchers can choose between experimental designs following the logic of subtraction or the logic of correlation (Kemmerer 2015: 49–54). The subtraction design is most commonly used in neurolinguistic experiments. This type of design is based on the comparison of the control condition with the experimental condition, with the latter including some stimulus that is thought to engage a different cognitive mechanism than the control condition. During analysis with inferential statistics, researchers then infer whether activity in the experimental condition is different from activity in the control condition by subtracting activity in the control condition from activity in the experimental condition. Residual activity in the experimental condition is assumed to be genuinely triggered by the experimental manipulation of interest. With the correlation design it is assumed that differences between conditions are more quantitative in that they reveal activation to different degrees. In this design, conditions are associated with ‘tasks that recruit the ability of interest to different extents’ (Kemmerer 2015: 51), and statistical analysis is used to find out whether brain activity changes in a manner similar to the variation in task demands. Thus, in data interpretation, brain-language mapping is constrained by the specific research and experimental design – each of which may lead to different choices in measurements and statistical analyses. The application of neurocognitive or behavioural measures/methods largely depends on the subfield (cf. Section 8.2), although there is a clear focus on neurocognitive methods as they can be used to shed light on brain organisation and functioning in real time and in vivo. In Section 7.3.3, we introduced behavioural and some neurocognitive measures (cf. Table 7.3) spanning three
classification dimensions which we will now refine to capture the difference between neurocognitive and behavioural methods more precisely:
• temporal dimension: Neurocognitive methods are online methods as they measure brain activity as it happens; behavioural methods are, with a few exceptions, offline methods.
• dimension referring to the ontological type of data: Neurocognitive methods measure when and where in the brain activity changes; behavioural methods measure associated processing speed or accuracy of a behavioural reaction.
• informativity dimension: Neurocognitive methods are multidimensional, providing multiple types of information per measurement, and, hence, allowing the analysis of multiple variables in the same dataset.
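Returning briefly to the two design logics introduced above (subtraction and correlation), the following minimal sketch may help to make the difference concrete. All numbers are hypothetical activity values in arbitrary units, and the function names are our own illustrative choices; an actual analysis would rely on inferential statistics rather than on the raw difference or the raw correlation coefficient alone.

```python
from statistics import mean

def subtraction_contrast(experimental, control):
    """Subtraction logic: mean activity in the experimental condition
    minus mean activity in the control condition (the 'residual' activity)."""
    return mean(experimental) - mean(control)

def pearson_r(xs, ys):
    """Correlation logic: does activity change in step with graded task demands?"""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical activity values, one per participant (arbitrary units).
control = [1.0, 1.2, 0.9, 1.1]
experimental = [1.6, 1.9, 1.4, 1.7]
residual = subtraction_contrast(experimental, control)

# Hypothetical graded design: four levels of task demand and the
# mean activity observed at each level (arbitrary units).
demand = [1, 2, 3, 4]
activity = [0.8, 1.1, 1.5, 1.8]
r = pearson_r(demand, activity)

print(f"subtraction: residual activity = {residual:.2f}")
print(f"correlation: r between demand and activity = {r:.2f}")
```

In the subtraction case, a residual clearly different from zero would be attributed to the experimental manipulation; in the correlation case, a strong association would suggest that activity varies with the degree to which the ability of interest is recruited.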
As the above dimensions mainly serve to distinguish behavioural from neurocognitive methods, we need three further dimensions of neurocognitive data collection in order to distinguish between the different types of neurocognitive methods:
• temporal vs. spatial resolution: This dimension relates to the ontological type of data, explicitly distinguishing between temporal and spatial measures of brain activity. This is in fact the dominant dimension for methodological choice, while the other two dimensions differentiate subtypes.
• invasiveness: This dimension captures the fact that some methods leave the human body intact while others require the application of (potentially harmful) substances or operations, e.g., radioactive tracers or opening the skull.
• practical aspects: This dimension relates to the fact that many methods, especially those with a high spatial resolution, require advanced technical equipment that is costly in acquisition and/or experiment realisation (e.g., participant reimbursement, maintenance costs for the equipment, researcher training).
Figure 8.2 (adapted from figure 1.2 in Rösler 2011: 8) provides an overview of the main methods of data collection in neurolinguistics and depicts measures along the first dimension of data collection. Spatial resolution is represented by the y-axis in Figure 8.2, showing units of analysis from the single neuronal cell up to the entire body. The temporal dimension is represented by the x-axis in Figure 8.2, showing units of analysis from milliseconds to units longer than minutes. Higher resolution on either axis corresponds to the smaller units of analysis, and, hence, more precise tracking of brain activity (or behaviour).
[Figure 8.2 Neurocognitive and behavioural methods ordered by their spatial and temporal resolution. Y-axis: spatial resolution, from low (body) through brain, brain area, cell assembly and neuron to high (intracellular). X-axis: temporal resolution, from high (milliseconds, ms) through seconds (s) to low (> minutes, m). Methods shown: behavioural methods (eyetracking; reaction time and accuracy), EEG, MEG, fNIRS, PET, TMS, fMRI, naturally occurring lesions, ECoG, and biological, pharmacological and genetic techniques.]
Temporal vs. spatial resolution refers to whether a method of data collection captures when processing mechanisms change (temporal resolution) or where in the brain processing signatures change (spatial resolution) in response to a stimulus. Measurements with high temporal resolution are able to examine processing with millisecond precision. Measures with high spatial resolution capture changes in processing at the scale of millimetres or even smaller for single cells. There is a trade-off between time and space: methods with a high temporal resolution often have rather poor spatial resolution and vice versa. For instance, electrophysiological measures (electroencephalogram, EEG) offer high temporal resolution (milliseconds), but perform poorly in spatial terms: they are typically collected via electrodes on the scalp, which limits them owing to signal mixing between the sensor and the neural source (cf. Section 8.3.2.1). Spatial resolution can be enhanced, however, when electrodes are placed directly on the surface of the cortex, as is the case for intracranial EEG (iEEG, also called electrocorticogram, ECoG, for cortical measurements). In contrast, measures collected using MRI offer high spatial resolution (millimetres), but their temporal resolution is fairly low (seconds; cf. Section 8.3.2.2). A notable exception to the trade-off is MEG (magnetoencephalogram), which offers both high temporal and high spatial resolution simultaneously, but it is less often available for research purposes as it requires special infrastructure (i.e., its practical cost is fairly high) and is rarely used in clinical practice (so infrastructure is not shared between researchers and clinicians).
Importantly, spatial or temporal resolution coincides with the dimension of invasiveness in a number of cases. Invasiveness refers to whether a method is applied from the outside, leaving the human body intact and its normal, spontaneous function unaltered, or whether it requires direct contact with the brain or neuronal cell structures or the introduction of foreign matter. For instance, both PET (positron emission tomography) and fMRI (functional magnetic resonance imaging) measure blood flow in the brain, but PET can only meaningfully do so when participants have taken a radioactive contrast agent, while fMRI does not require contrast agents. Thus, invasiveness for PET is higher as participants are required to take potentially harmful substances. Electrophysiological measures can also vary regarding their invasiveness: Scalp-recorded EEG places electrodes on the scalp, while intracranial EEG works with electrodes directly located on the brain areas of interest. As a consequence, the latter is nearly exclusively performed with patients before they undergo a neurosurgical intervention, rather than with healthy participants. There is a tendency for more invasive methods to exhibit higher temporal or spatial resolution. All invasive applications require special approval by an ethics board and careful debriefing of participants that may go beyond the necessary ethical approval for non-invasive methods.
Finally, although practical aspects of data acquisition do not affect the kind of data collected per se, they still influence methodological choices. Methods with high spatial resolution require advanced technological equipment that has both high fixed and operational costs. Consequently, these methods are often found in medical environments where the technical equipment is also used for regular, non-scientific medical diagnosis.
For example, both fNIRS (functional near infrared spectroscopy) and TMS (transcranial magnetic stimulation) provide good spatial resolution but require less expensive equipment than MRI, so both methods are used more often in research departments independent of medical science. In the following sections, we will introduce the most common types of neurocognitive methods of studying the brain-language relationship in healthy participants, grouping them according to the temporal (Section 8.3.2.1) and the spatial domain (Section 8.3.2.2). More detailed and technical introductions to these and further neuroscientific methods (including biological, pharmacological, and genetic techniques) can be found in Gazzaniga, Ivry and Mangun (2014⁴: 71–119) and in individual chapters in de Groot and Hagoort (2018). Further experimental tasks and behavioural measurements employed in experimental settings with patients are briefly covered in Section 8.3.2.3.
8.3.2.1 Measures with High Temporal Resolution
The scalp-recorded electroencephalogram (EEG) and the magnetoencephalogram (MEG) are the most commonly used online measures to investigate the temporal dynamics of processing in the brain, while the electrocorticogram (ECoG) as an intracranial measure is mainly applied to patients before they undergo necessary neurological surgery. The basic procedure is that several electrodes (also known as sensors) are placed on the surface of the scalp (or the surface of the brain region of interest for intracranial measures). The signal measured is the electrical voltage caused by groups of neurons firing in synchrony and, therefore, electrophysiological methods are direct measures of brain activity. Broadly speaking, the resulting data can be analysed in terms of when activity changes (latency), the strength of the signal (power or amplitude, suggestive of qualitative or quantitative processing differences), and where on the scalp processing differences manifest (topographical distribution), which is suggestive of different neural generators.

Table 8.2 Temporal measures in neurolinguistics
Method  Measurement type              Temporal resolution  Spatial resolution  Invasiveness  Practical aspects
EEG     electrophysiological, direct  very high (ms)       low                 low           low
MEG     electrophysiological, direct  very high (ms)       medium              low           high
ECoG    electrophysiological, direct  very high (ms)       high                high          high

Table 8.2 gives an overview of what each method’s type of measurement is and how it fares on each of the three dimensions of data collection, with temporal and spatial resolution listed separately. All measures have a very high temporal resolution as they track changes in brain activity on a millisecond-by-millisecond basis. EEG directly picks up voltage changes, with electrical activity stemming mainly from neural sources oriented perpendicularly to the surface of the scalp (cf. Luck 2014²). MEG is sensitive to changes in the magnetic fields resulting from changes in the electrical activity of neural sources that are oriented in parallel to the surface of the scalp. These specific measurement details have two important consequences. First, both methods are subject to the selectivity problem (Rugg 1999: 25), meaning that they do not measure activity from all neural sources engaged in a cognitive task, but only from those near the scalp and with a particular orientation. Second, EEG and MEG are both electrophysiological measures, but measure electrical activity from different neural populations.
Another difference is that MEG measurements exhibit better spatial resolution than scalp-recorded EEG measurements because magnetic fields are less distorted by the skull and the tissues within it than electric fields are. Nevertheless, the spatial resolution thus attained is still not as precise as the resolution obtained by spatial measures (cf. Section 8.3.2.2) because the strength of magnetic fields drops rapidly with increasing distance from the magnetic source so that activity in deeper structures is less well detected. Recorded on the surface of the scalp and limited in spatial resolution, both EEG and MEG suffer from the inverse problem (Rugg 1999: 25–26). That is, when a change in brain activity is recorded at a particular electrode site on the scalp vs. another site, it suggests that different neural sources likely generated the surface patterns. The problem, however, is that a surface pattern can by no means be taken as evidence that the neural source directly beneath that surface position
is responsible for generating the effect. The reason for this is that as the signal travels through the brain tissue, activity from different neural populations may interact with each other (e.g., cancelling one another out) and, at the end of the process, the skull ultimately distorts the brain signal in complicated ways. In other words, every pattern observed on the scalp is a mixture of all the underlying signals, and multiple generators in the brain may be responsible for producing such a pattern. So, an effect at a given MEG/EEG sensor may reflect something directly beneath that sensor, but it may also reflect the mixing of effects from elsewhere.
EEG studies on language processing outnumber MEG studies by far. Part of the reason for this is that the equipment for data collection is less costly for EEG and, more importantly, that a particular analysis type, event-related potentials (ERPs), has proven itself to be very fruitful for the investigation of language, with a tradition going back to before the existence of MEG. ERPs are potential changes that are time-locked to a sensory or cognitive event that may be induced by a stimulus presented during the experiment (stimulus-locked ERP) or a behavioural response (response-locked ERP). Figure 8.3 shows two hypothetical (stimulus-locked) ERP waveforms, one for a control condition and one for an experimental condition (i.e., one for which a deviant processing pattern has been hypothesised). A waveform is composed of peaks (‘hills’ and ‘valleys’), each of which may constitute a functionally distinct ERP component. Negative-going voltage deflections (‘hills’) are plotted upwards, positive-going voltage deflections (‘valleys’) are plotted downwards. The beginning (onset) of the sensory or cognitive event is represented by zero on the x-axis (see Luck 2014² for an in-depth introduction to ERPs).
[Figure 8.3 Hypothetical ERP waveforms for the experimental condition and the control condition. X-axis: time (100, 500, 1000 ms). Y-axis: voltage from -4 µV (plotted upwards) to +4 µV (plotted downwards). Labelled components: N100, N200, N400, P200, P300, P600.]
ERP components are classified based on their latency, polarity, and topographical distribution across the scalp:
• Latency can be the onset at which changes are observed relative to the beginning of the sensory or cognitive event or the onset of the behavioural response. It can also refer to the peak of the component amplitude.
• Polarity refers to whether the ERP amplitude signals a positive or negative voltage deflection.
• Topography refers to the strength of the effect across the surface of the scalp.
Latency and polarity are typically used to name a component. The ‘P600’, for instance, is a positive-going ERP component peaking at about 600 ms after the onset of the sensory or cognitive event; the ‘N400’ is a negative-going ERP component peaking at about 400 ms post event onset. Topographical distribution is used to dissociate ERP components that would be indistinguishable based on latency and polarity, such as the N400 (with a centro-parietal distribution) and the left-anterior negativity (LAN, a negative-going ERP with similar latency but a left-anterior distribution on the scalp). Lastly, amplitude is the critical criterion to determine whether the experimental and the control condition differ from one another and, thus, whether the manipulation implemented in the experimental condition is having an effect. In other words, data interpretation is inherently relative, with ERPs in two (or more) conditions being compared to one another. A typical assumption is that amplitude increases (in microvolts) correspond to more or different neural activity. As one can readily ascertain from Figure 8.3, amplitude differences can be rather subtle, so advanced inferential statistics are necessary to determine whether there is a reliable difference (cf. Kretzschmar & Alday submitted). ERPs nicely highlight the multidimensionality of neurocognitive data, as ERPs of different latency and polarity are assumed to reflect qualitatively different cognitive mechanisms. In the early days of ERP research on language, researchers hoped that different ERPs would map to different linguistic functions. A case in point is the distinction between the processing of semantics and syntax. The N400 ERP component was found to be related to semantic processing in language in the early 1980s (Kutas & Hillyard 1980). About a decade later another study reported that a late positive component, the P600, exclusively reacted to syntactic processing (Hagoort, Brown & Groothusen 1993).
However, both of these interpretations of the components’ functional significance have been refuted during the past decades. The N400 is sensitive
to meaningfulness in any cognitive domain (e.g., language, music, or mathematics), while the P600 is also sensitive to linguistic domains other than syntax and, in fact, may be a particular instance of another domain-general ERP component, the P300, which is a cognitive marker for decision-making or conflict monitoring (cf. Kutas & Federmeier 2011; Leckey & Federmeier 2020; Alday & Kretzschmar 2019). Thus, there is neither a 1-to-1 relationship between qualitatively distinct ERP components and linguistic domains, nor are these ERP components confined to language. We will return to this mapping problem between neurocognitive data and cognitive or linguistic functions in Section 8.3.3.
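To illustrate how an ERP is derived and then described in terms of latency and polarity, consider the following minimal sketch. It is a toy simulation under stated assumptions, not an analysis pipeline: the single-trial data are simulated (a negative-going deflection centred on 400 ms plus random noise), the sampling rate is assumed to be one sample per millisecond, and all function names and parameter values are hypothetical. Real ERP analyses additionally involve filtering, baseline correction, artefact rejection and inferential statistics.

```python
import math
import random

def simulate_trial(effect_uv, rng, n_samples=1000, noise_sd=2.0):
    """One hypothetical single-trial epoch (1 sample per ms, in microvolts):
    a negative-going deflection centred on 400 ms plus random noise."""
    return [
        -effect_uv * math.exp(-((t - 400) ** 2) / (2 * 50 ** 2)) + rng.gauss(0, noise_sd)
        for t in range(n_samples)
    ]

def average_erp(trials):
    """Average the time-locked epochs sample by sample to obtain the ERP."""
    n = len(trials)
    return [sum(samples) / n for samples in zip(*trials)]

def describe_peak(difference_wave):
    """Label the largest deflection by polarity and by peak latency
    rounded to the nearest 100 ms (e.g. 'N400' or 'P600')."""
    latency = max(range(len(difference_wave)), key=lambda t: abs(difference_wave[t]))
    polarity = "N" if difference_wave[latency] < 0 else "P"
    return f"{polarity}{round(latency, -2)}", latency

rng = random.Random(1)  # fixed seed so that the simulation is repeatable
control = average_erp([simulate_trial(1.0, rng) for _ in range(200)])
experimental = average_erp([simulate_trial(4.0, rng) for _ in range(200)])

# Interpretation is relative: the effect is the difference between conditions.
difference = [e - c for e, c in zip(experimental, control)]
label, latency = describe_peak(difference)
print(label, latency)
```

Averaging many time-locked epochs reduces the random noise in the single trials, while the label is derived only from the difference between conditions. Note that nothing in this procedure says anything about topography or about the functional interpretation of the component.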
8.3.2.2 Measures with High Spatial Resolution
Functional magnetic resonance imaging (fMRI) is by far the most prevalent spatial measurement in cognitive neuroscience to investigate diverse language or other cognitive functions in the brain. As the brain cannot store any form of energy, the oxygen that active neurons consume must always be delivered on demand, carried by haemoglobin in the natural blood flow. For active brain regions this means that there is a change in the ratio of oxygenated haemoglobin and deoxygenated haemoglobin as the blood supply changes to meet demands. This is the value that fMRI detectors measure, as they are sensitive to the paramagnetic property of deoxygenated haemoglobin (as opposed to the diamagnetism of oxygenated haemoglobin). This value is referred to as the blood oxygen level-dependent (BOLD) response; it is interpreted as relative increases in activation or as de-activation. As changes in blood flow take a while to manifest, the temporal resolution of the BOLD response is on the scale of seconds (typically peaking 4–6 s after the onset of a stimulus). Positron emission tomography (PET) is the historical precursor to MRI methods, but has become less important because it is an invasive method where participants have to take radioactive contrast agents so that the blood concentration in the brain can be made visible. PET measures metabolic brain activity by monitoring the distribution of that radioactive agent as transported with the blood flow (Gazzaniga, Ivry & Mangun 2014⁴: 110). Functional near infrared spectroscopy (fNIRS) is a relatively new method in the field used mainly in developmental studies. It also measures oxygen levels in the blood; however, this is achieved via optical imaging. The transcranial magnetic stimulation (TMS) method uses focused magnetic fields to stimulate selected brain areas, resulting in the enhancement of brain activity or its suppression. Lesion studies either use post-mortem examination of lesioned brains or use MRI methods to map lesions in vivo. Table 8.3 gives an overview of what each method’s type of measurement is and how it fares on each of the three dimensions, with temporal and spatial resolution listed separately.

Table 8.3 Spatial measures in neurolinguistics
Method          Measurement type                           Temporal resolution  Spatial resolution  Invasiveness  Practical aspects
fNIRS           haemodynamic, indirect                     medium-low (ms, s)   medium (cm)         low           low
PET             haemodynamic, indirect                     low (s)              high (mm)           high          high
fMRI            haemodynamic, indirect                     medium-low (s)       very high (mm)      low           high
Lesion studies  post-mortem or voxel-based (haemodynamic)  low                  high (mm)           medium-low    high
TMS             magnetic field stimulation, direct         high                 high                low           medium

The majority of the spatial measures in Table 8.3 are haemodynamic methods. These are indirect measures of neural activity because they measure the consequences of neural activity by being sensitive to the increased blood flow and oxygen consumption that follow and, thus, correlate with neural activity. Because the vascular response is much slower (on the order of seconds) than the neural activity itself, these methods are also fundamentally limited by biology and not technology in their temporal resolution. Next, we will briefly introduce advantages and disadvantages of fMRI and fNIRS as complementary examples of haemodynamic methods, before turning to TMS. fMRI has several practical advantages over its precursor PET: it is less expensive with regard to its equipment and maintenance and does not require the use of radioactive tracers. The spatial resolution of MRI techniques is also superior to PET’s. MRI is unmatched by any current non-invasive technique in spatial resolution, enabling researchers to precisely map brain structure to cognitive functions, which is its greatest advantage. However, fMRI data are also highly multidimensional in the sense that any given task will activate many more regions than may be predicted or that may be critically involved in performing the task. This makes it difficult to infer the exact contribution a brain area has for a given task, which in turn weakens claims regarding the relationship between brain activation and behavioural performance (Gazzaniga, Ivry & Mangun 2014⁴: 110). Moreover, the fMRI technique is further limited as it imposes a number of restrictions on who can take part in an experiment. The MRI recording environment does not permit participants with any metal (e.g., tattoos, prostheses, some contraceptive coils) in their bodies. It is also very loud and requires participants to remain very still for about half an hour in order to avoid motion confounds that would make data analysis and interpretation nearly impossible. The latter limitation has important consequences for studies on brain development during infancy: it is nearly impossible to use fMRI with infants or toddlers.
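The sluggishness of the haemodynamic response can be illustrated with a simple model of the BOLD response to a brief stimulus. The sketch below assumes a so-called double-gamma response function, that is, an early positive response minus a smaller, later undershoot. The shape parameters (6 and 16) and the ratio of 1/6 are widely used default values that we assume here purely for illustration; they are not taken from the sources cited in this chapter, and actual responses vary across brain regions and individuals.

```python
import math

def gamma_pdf(t, shape, scale=1.0):
    """Gamma probability density, used here as a building block of the response."""
    if t <= 0:
        return 0.0
    return (t ** (shape - 1) * math.exp(-t / scale)) / (math.gamma(shape) * scale ** shape)

def hrf(t):
    """Assumed double-gamma haemodynamic response at t seconds after stimulus onset:
    an early positive response minus a smaller, later undershoot."""
    return gamma_pdf(t, 6) - gamma_pdf(t, 16) / 6

# Evaluate the modelled response from 0 to 30 s in steps of 0.1 s.
times = [i / 10 for i in range(0, 301)]
response = [hrf(t) for t in times]
peak_time = times[response.index(max(response))]

print(f"modelled BOLD response peaks about {peak_time:.1f} s after stimulus onset")
```

Under these assumptions, the modelled response reaches its maximum only several seconds after the stimulus, which is why haemodynamic measures cannot resolve processing steps that unfold within a few hundred milliseconds.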
In the last decade, fNIRS has been established as a viable alternative to fMRI in developmental cognitive neuroscience (Lloyd-Fox, Blasi & Elwell 2010), because it is not subject to the above limitations. fNIRS measures the haemodynamic response
via the light absorption properties of the blood. Specifically, it is measured by means of light emitters and detectors (optodes) mounted on a cap that is placed on the participant’s head. Light beams pass through the skull and neural tissue; light absorption in regions with greater neural activity should be larger than in regions with less activity. Hence, the haemodynamic response is a function of changes in the absorption properties of oxygenated haemoglobin (cf. Willems & Cristia 2018). The fNIRS technique has a number of advantages, especially in comparison to fMRI. It is less sensitive to motion artefacts during data recording and also has a somewhat higher temporal resolution than fMRI. Technical preparation is less time-consuming (and less expensive) and the recording environment produces much less sound. In fact, fNIRS can be conducted outside of the lab, which makes it attractive for studying language in its natural setting (cf. Willems & Cristia 2018 for discussion). Its major disadvantages are that neither temporal nor spatial resolution is optimal, lagging behind EEG and fMRI measures, respectively. Moreover, the precision of the method is limited by how far the near-infrared light can penetrate, so the method is far more applicable to infants than to adults, who have larger brains and thicker skulls. Also, as an extracranial measurement, it does not allow direct inferences about the underlying neural generators from the scalp-mounted optodes (i.e., it suffers from the same inverse problem as EEG measures, cf. Section 8.3.2.1), and so additional measures for anatomical reference may be necessary. Finally, as with MRI measures, it provides multidimensional data and, therefore, data interpretation is more complex. What the above measures have in common is that they provide correlative data for the brain-language relationship.
Besides lesion mapping, TMS is the only neurocognitive measure that can test for a causal relationship between brain and language behaviour. TMS directly stimulates selected parts of the brain via brief magnetic impulses in order to rapidly change electrophysiological activity in those parts. There are two protocols of how the method can be applied: a single pulse results in a short-lasting effect, while the application of a series of pulses (repetitive TMS, or rTMS) can have effects beyond stimulation time. For rTMS, the time elapsed between pulses determines whether the effect is inhibitory or facilitatory/excitatory. With rapid successive pulses, inhibition of spontaneous neural activity creates what is called a virtual lesion (Coslett 2016). As a result, the participant will not be able to perform a behavioural task accurately or with normal speed. When several pulses are applied with sufficient time in between, it results in an excitatory or facilitatory effect in the target region (i.e., processing and ensuing behavioural effect are optimised). Pulses are very short and spatially focal, so both temporal and spatial resolution of the method are high. TMS pulses can be applied while the participant is performing a behavioural task or before task performance. In most cases behavioural consequences are changes in reaction time (measured by button presses, speech onset or eye tracking), but changes in response accuracy have also been reported (Kemmerer 2015). Thus, in order to infer the functional role of the target area for language behaviour, TMS must be accompanied by appropriate behavioural measures and experimental tasks to
show the resulting effect on language behaviour. While this allows for better inferences on causality, it is important to note that the effect of the magnetic field may also propagate to further remote regions, thereby weakening causal claims. TMS is a promising method, which may also be effective in the long-term treatment of aphasic syndromes (Coslett 2016). However, its weaknesses are that it is restricted to brain areas near the skull and that it carries the risk of inducing seizures during application (cf. Kemmerer 2015).
8.3.2.3 Further Behavioural Measures and Tasks
Behavioural measures and specific experimental tasks are often used when groups of patients with acquired or developmental disorders of language or speech are studied. The purpose of using these measures and tasks is twofold. First, they are used to uncover individual differences within participant groups. This is most often the case for patients with acquired disorders, where experimental tasks are used to differentiate subgroups of patients (e.g., patients with different types of aphasia). Second, they are used to define markers of developmental disorders that distinguish the delayed acquisition of cognitive abilities for an entire group of patients from the typical development of healthy controls. For the classification of patients with acquired disorders, for example, a range of experimental elicitation-based tasks from psycholinguistics (cf. Section 7.3.5 and Table 7.2) are commonly used. These are mainly simple naming-based designs where patients are required to name pictures of objects or describe scenes depicted in pictures (naming task, picture description task), but sentence or story completion tasks are also frequently used. An advantage of these tasks is that they directly test for lexical knowledge of word meaning, syntactic knowledge for sentence composition as well as phonological knowledge and motor abilities for producing speech. The main rationale, shared with psycholinguistic research, is that different tasks or different combinations of one experimental task with systematically varying linguistic stimuli are associated with different cognitive operations. In neuropsychological research, evidence for the involvement of functionally distinct cognitive mechanisms comes from single dissociation or double dissociation (Kemmerer 2015: 30–33). Single dissociation is considered the weaker form of evidence and means that a patient presents worse behavioural performance on a given task a vs. another task b (or combination of task and stimulus). Double dissociation is the stronger form of evidence and results from two patients showing reverse behavioural response patterns: whereas patient x performs worse on task a (vs. task b), patient y performs worse on task b (vs. task a). In general, detailed analysis of results from the tasks can be used for a detailed description of language performance associated with certain brain lesions and is, therefore, informative also in the absence of neural data.
For characterising groups of participants as a whole, measurements to probe comprehension abilities include mainly eye-tracking in the visual world paradigm and eye-tracking during reading (cf. Section 7.3.4 and Table 7.3). An advantage of both paradigms is that they do not require secondary tasks that could require the participant to make more effort. They are, therefore, especially used for child participants with developmental disorders or for participants with dramatic levels of impairment. As with the production tasks employed to classify subgroups of patients, these eye-tracking measures can be used to identify behaviour that is specific to a particular participant group, but it is currently less common to use them to describe individual differences within participant groups.
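The logic of single and double dissociation can be spelled out in a few lines of code. The following sketch is purely illustrative: the accuracy scores, the task labels a and b, and the margin of 0.2 used to decide that performance on one task is 'worse' are all hypothetical choices of ours, not clinical criteria. In actual neuropsychological research, such decisions rest on standardised tests and statistical comparison with control groups.

```python
def performs_worse(scores, task, other, margin=0.2):
    """True if accuracy on `task` is lower than on `other` by at least `margin`.
    The margin is an arbitrary illustrative threshold."""
    return scores[other] - scores[task] >= margin

def classify_dissociation(patient_x, patient_y):
    """Classify the pattern shown by two patients on tasks 'a' and 'b'."""
    x_a = performs_worse(patient_x, "a", "b")
    x_b = performs_worse(patient_x, "b", "a")
    y_a = performs_worse(patient_y, "a", "b")
    y_b = performs_worse(patient_y, "b", "a")
    if (x_a and y_b) or (x_b and y_a):
        return "double dissociation"   # reverse patterns across the two patients
    if x_a or x_b or y_a or y_b:
        return "single dissociation"   # only one patient shows a task difference
    return "no dissociation"

# Hypothetical proportions of correct responses per task.
patient_x = {"a": 0.45, "b": 0.95}
patient_y = {"a": 0.90, "b": 0.50}
patient_z = {"a": 0.92, "b": 0.94}

print(classify_dissociation(patient_x, patient_y))
print(classify_dissociation(patient_x, patient_z))
```

The sketch makes explicit why double dissociation counts as the stronger form of evidence: it requires two complementary patterns, so that the result cannot be explained by one task simply being more difficult than the other.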
8.3.3 Data Interpretation
Broadly speaking, data interpretation is concerned with the description and explanation of how a particular linguistic function is mapped onto a particular brain area or brain activity. A key assumption here is the invariance of mapping – i.e., that the relationship between function and brain structure does not change in the (healthy) brain. A particular language function x is assumed to be implemented in the brain in only one way and this differs qualitatively from the way a language function y is implemented in the brain (Rugg 1999: 29–30). This, in turn, has important repercussions for models of language architecture in the brain that aim to describe and explain the neural underpinnings of language. Currently, explanations in neurolinguistics are predominantly based on qualitative differences between language functions and associated brain mechanisms, while quantitative differences (i.e., matters of degrees of activation rather than distinct activation patterns) only play a minor role in modelling language in the brain. In interpreting their experimental findings, researchers also have to address the following three aspects that restrict the inferences that can be drawn from any dataset:
• causality vs. correlation: Neuroimaging data are largely correlational in nature as we saw in Sections 8.3.2.1 and 8.3.2.2: they show only that the brain response covaries with certain linguistic or cognitive functions. Whether these brain correlates play a causal role in generating a particular language behaviour can only be directly tested with difficulty. Lesion data allow inferences on which brain regions are causally involved in language impairment, but suffer from the fact that lesioned areas cannot be systematically generated and that they can also alter the function of intact neural structures. Spatial measures such as TMS (cf. Section 8.3.2.2) can be applied to directly alter neural activity in circumscribed regions, thus allowing for causal interpretations, but there is also some chance that further regions are stimulated.
• macroscopic vs. microscopic structure: Data interpretation can be based on macroscopic structures (i.e., brain areas) or on functionally or structurally segregated subareas and cell assemblies at a microscopic level. This is a question of both the degree of resolution of the selected technique and the quality of data recording.
• temporal vs. spatial dimension: Inferences on brain-language mapping are limited by the trade-off between temporal and spatial resolution in neuroimaging techniques (cf. Section 8.3.2). Functional explanations in the temporal domain may not generalise to the spatial domain, and vice versa. In most cases data patterns in both dimensions cannot be easily integrated with each other unless data from temporal and spatial measures are co-registered in one and the same experiment.
There are several problems and fallacies when interpreting the link between neural data and language. One of the most challenging is known as the mapping problem (Bornkessel-Schlesewsky & Schlesewsky 2009: 22–24), which was discussed above regarding electrophysiological measures (cf. Section 8.3.2.1) but actually applies to many neurocognitive measures. The mapping problem arises from the long-standing assumption that language and the brain both have a hierarchical architecture with functions clearly delineated from each other. The language hierarchy ranges from smaller units in phonology to semantic and syntactic units up to the largest possible units (propositions) at the discourse level. The brain shows a hierarchy from processing simple perceptual information units in posterior areas (occipital and parietal lobes) to the processing of increasingly more complex information units at more anterior sites (temporal and frontal lobes). This led to the assumption that the functional segregation of language may have a corresponding functional segregation in the brain, resulting in proposals for brain areas serving language-specific functions. This assumption seems uncontroversial for some phenomena. Reading, for example, involves the recognition of visual shapes which are then integrated into meaningful orthographic units in order to compute word or sentence meaning. Shape perception takes place in functionally distinct areas in the visual cortex (located in the occipital lobe), and this function is transferred from a non-linguistic cognitive task to the processing of visual language. Orthographic and semantic units, which are more complex compared to visual shapes, are then processed in brain regions in the temporal cortex. This, admittedly simple, example seems to straightforwardly align linguistic and brain hierarchies with one another. However, the assumption that the two hierarchies align across the board faces two challenges.
First, in many cases at the level of sentence composition and beyond, it is far from clear which linguistic function should be mapped to a given brain signature. In Section 8.3.2.1, we briefly discussed the example of how syntax and lexical semantics may be represented in qualitatively different ERP components. The same example also relates to the second challenge that neurocognitive processing correlates for language need not be explained with linguistic variables but are also compatible with cognitive variables present in language
and other cognitive domains. The latter question of whether language-specific or domain-general cognitive explanations fare better for neurocognitive results remains one of the ongoing debates in the field and concerns data from all neurocognitive measures (cf. Hasson et al. 2018; Hagoort 2019a).
Another group of challenges in data interpretation originates from the multidimensional data obtained with neurolinguistic experiments. Of course, multidimensionality itself is advantageous as it provides a wealth of data which permits the researcher to analyse a phenomenon from different perspectives. However, such a wealth of data also comes at a price, namely possible under- or overinformativity (Rugg 1999: 29–32):
• null results: The absence of a statistically reliable difference, be it qualitative or quantitative, in the processing of different information units does not represent strong evidence for the true absence of a difference. As multidimensional data are influenced by several variables, some of which are outside of the experimenter’s control, a difference of interest may have been masked by confounding variables. As a consequence, null results are rarely functionally interpreted.
• functional dissociation: Dissociating different linguistic or cognitive functions rests on the assumption of invariant function-to-brain mapping and emphasises qualitative differences (e.g., different ERP components, different activation patterns in fMRI data) vis-à-vis quantitative differences between experimental conditions. However, both the multidimensionality of the data and the still limited crosslinguistic replication of major research findings undermine this assumption. Hence, data from diverse experimental designs and languages are necessary to fully delineate different functions.
• functional significance: This is strongly related to the question of whether there are correlational or causal relationships between a linguistic or cognitive function and brain activity. For example, it is more the rule than the exception that in fMRI experiments more brain regions show significant (de-)activation in response to a stimulus than has been hypothesised. Here the challenge is to dissociate systematically those (unpredicted) areas which are responsive to the linguistic stimulus from those that are sensitive to environmental noise during recording. This may be achieved by systematic variation of experimental designs or the combination with other measures that can target circumscribed regions (such as TMS).
8.4 Basic Research Findings
Neurolinguistics is concerned with the relationship between the brain and language and aims to elucidate the neural underpinnings of language
processing, mainly with an experimental approach. Knowledge in the field is rapidly evolving due to the continuous advancement of neurocognitive methods for data collection that make it possible to study the brain correlates of language processing in real time and in vivo. While early work on the language behaviour of aphasic patients suggested that only two brain regions, namely Broca’s area and Wernicke’s area, are predominantly responsible for language production and comprehension, modern neuroimaging techniques have contributed to a better, more precise understanding of the neural correlates of language, now rejecting this classical view of language in the brain (Dronkers 2000; Poeppel 2014; Tremblay & Dick 2016). Instead, there is an extended language network in the brain (very likely composed of several sub-networks, cf. e.g. Vigneau et al. 2006), including several cortical and subcortical areas subserving language function and several fiber tracts supporting crosstalk amongst the areas (Friederici 2011; Fedorenko & Thompson-Schill 2014; Hagoort 2019a). There is converging evidence that, although the areas’ primary or only function is not language processing, functional dissociations corresponding to different linguistic domains are present. For example, semantic information is processed primarily in areas located in the temporal lobe, while phonological processing during speech recognition takes place in parietal and temporal areas. Also, there is good evidence that language processing proceeds along two streams, beginning in parietal areas and progressing through the frontal cortex. While the specific functions that are assigned to the streams differ in the dual-stream models proposed to date (e.g., Hickok & Poeppel 2007; Friederici 2012; Bornkessel-Schlesewsky et al. 2015), it is undisputed that the dorsal stream and the ventral stream perform distinct operations.
Moreover, the data suggest a special hemispheric division of labour in healthy participants in that, at least in the right-handed population, the core domains of language structure (segmental phonology, semantics, and syntax) are predominantly (but not exclusively) processed in the left hemisphere of the brain, while especially information types related to the pragmatic use of language (e.g., prosody, theory of mind) tend to engage the right hemisphere. Finally, the current view that there is a language network in the brain suggests that language is not implemented in a modular fashion in the brain and that language-related areas also perform other cognitive operations. A second important strand of research is neuropsychology focusing on causal relationships between brain structure and cognitive function. Neuropsychological research has traditionally depended heavily on lesion studies, investigating which deviant behaviour can be associated with a dysfunctional brain region. By relying on deficits, lesion studies are among the few approaches where causal relationships (such as ‘necessity’ and ‘sufficiency’) between brain and cognitive functions can be directly tested. Language (and other cognitive abilities) can be impaired either following acute injuries and diseases that are known to cause classic aphasia syndromes or as a function of progressive neurodegenerative diseases that are known to cause primary progressive aphasia syndromes. The resulting
syndromes are associated with distinct behavioural abnormalities as well as with partly distinct lesion sites in the brain. There are a number of current debates and open questions that will likely shape neurolinguistic research in the future:
• The status of animal models: There is no fully fledged animal model to study language in the brain because human language in its complexity is unparalleled vis-à-vis animal communication. However, animal models of communicative functions such as vocal learning in songbirds, or primate models of neural structure related to sensory processing, may contribute to our understanding of neural pathways for language and speech comprehension as well as the connection between genes and brain structure in developing communicative functions leading up to human language (e.g., Bornkessel-Schlesewsky et al. 2015).
• Reductionist vs. naturalistic designs: Most neurocognitive research is based on highly reductionist experiments in the lab (cf. Chapters 2 and 7), but over the past decade especially research in the neurobiology of language has stressed that language must be investigated in its natural communicative situation (e.g., Andric & Small 2015). Thus, the field is witnessing a steadily increasing number of neurocognitive investigations of language with naturalistic designs (cf. Brennan 2016; Hamilton & Huth 2018; Alday 2019). This endeavour goes hand in hand with the view that the links between neurolinguistics and neighbouring disciplines such as psychology, cognitive science, and linguistics must be strengthened in a unified interdisciplinary research program in order to fully understand humans’ language capacity (e.g., Embick & Poeppel 2015).
• Linguistic variability and diversity: The preceding aspects have mainly been studied with a focus on contrasting the mature vs. developing brain and the healthy vs. lesioned brain in a few major languages. Future research will have to incorporate linguistic diversity amongst healthy speakers (cf. Section 7.4) in order to further constrain models of language in the brain in such a manner that cross-linguistically plausible assumptions about language are incorporated. This requires more empirical data on brain correlates of language processing in typologically different languages (cf. Bornkessel-Schlesewsky & Schlesewsky 2008, 2016).
8.5 Summary
Neurolinguistics studies the relationship between the brain and language, asking what neural mechanisms are responsible for successful language behaviour and, thus, for successful communication. Two basic research
methods are used to study the brain-language relationship: systematic open observation of patients with various forms of brain damage and experiments with neurocognitive measures. The choice of experimental design and methods for data collection depends on the research area, namely, language comprehension, language production, and language acquisition, and on the group of research participants with their specific linguistic abilities (healthy vs. impaired speakers, adults vs. children, humans vs. other species). While neurolinguistic experiments largely include the same key components as psycholinguistic experiments (experimental tasks, stimulus presentation modes, and behavioural experimental measures), they differ methodologically from psycholinguistics in the extensive use of neurocognitive measures for data collection. Neurocognitive measures provide multidimensional data in the temporal or spatial domain of processing in the brain, posing additional challenges for data interpretation. The neural underpinnings of linguistic or other cognitive representations and processes can only be inferred indirectly from the data and, thus, multiple experiments or analyses with diverging methods are necessary to arrive at a full understanding of the brain-behaviour relationship for various cognitive domains. Very few neurocognitive measures allow causal inferences on the brain-behaviour relationship, so neurolinguistic studies mainly provide correlative evidence. The findings from experiments in all subfields of neurolinguistics have revealed that there is an extended language network, covering brain areas in all lobes and cortical and subcortical structures alike. These areas are not specific to language – i.e., they also serve other ontogenetically older cognitive functions and have been recruited for language processing when a linguistic function shares some properties with such an older cognitive function (e.g., vision and reading).
8.6 Exercises and Assignments
Neurolinguistic experiments require special technical equipment or access to patients with language(-related) disorders, so student projects may not be feasible in every case. The following exercises for students serve to familiarise them with the basic aspects of neurolinguistic research and can be included during a session on neurolinguistics or as part of project work:
8.1 Take one or more neurolinguistic studies (if more than one, all from the same neurolinguistic subfield) investigating the same or similar linguistic phenomena (e.g., processing of preferred vs. less preferred structures, semantic access) and work out the experimental components (variables, experimental tasks & measures). In what way do multiple experimental designs contribute to a more comprehensive investigation of the phenomenon?
8.2 Find a research question for a neurolinguistic experiment and discuss whether measures with temporal or spatial resolution are suited to investigate that research question.
8.3 The neural representation of verbs and nouns is taken as evidence for the special representation of syntactic categories and the associated object vs. action concepts (cf. Kemmerer 2015, chapters 10 and 11). Discuss the generalisability of the findings and conclusions in light of the cross-linguistic differences in differentiating between nouns and verbs.
Here are some ideas for minor research projects:
8.4 The CHILDES database for child language includes corpora with data from language-impaired children, some of them morphosyntactically annotated (https://childes.talkbank.org/access/Clinical-MOR/ and https://childes.talkbank.org/access/Clinical/). Use the datasets for two children with the same native language with different cognitive impairments and work out the specifics of their language deficits using a single-case study approach (Caramazza 1984, 1986). To what extent do the deficits differ from one another? Are your results in line with findings from the literature on the typical symptoms of the impairment under study?
8.5 Alternatively, use datasets from children with different native languages but the same type of impairment. To what extent do the symptoms differ as a function of language? How can this be accounted for based on linguistic features or other variables (recording situation etc.)?
8.6 Write a short review paper on a linguistic function of your choice. Find the first neurolinguistic publication on that linguistic function and then sample about five research articles from each decade up to the year of writing your review. (Your review should include three to four decades, so you may skip some decades for phenomena with a long history in neurolinguistic research.) When writing the review paper, pay particular attention to how the choice of methods for data collection and analysis affected research findings and the quality of research in general.
8.7 Develop a neurolinguistic research plan from hypothesis generation to operationalisation (variables, stimuli, measurements, etc.) so that it is ready to be implemented in a laboratory. You can use previously published studies as a starting point to develop your research plan.
Further Reading
A very detailed introduction to cognitive neuroscience is Gazzaniga, Ivry & Mangun 2014 (4th edition), which is published in updated editions on a regular basis. The most recent edition by Poeppel, Mangun & Gazzaniga 2020 (6th edition) features several new chapters on language in the brain. Developmental cognitive neuroscience is introduced in Johnson & de Haan 2015 (4th edition). Introductions to neurolinguistics or the cognitive neuroscience of language are Ahlsén 2006, Ingram 2007, Friederici 2017, and Kemmerer 2015. Caplan 1987 is devoted to neurolinguistics and aphasiology. Neurolinguistic handbooks and chapters in handbooks on psychology include Arbib 2015, Blumenstein 1995, Blumenstein & Myers 1993, Blanken, Dittmann & Grimm 1993, Brown & Hagoort 1999, Faust 2012, Hagoort 2019b, Stemmer & Whitaker 2008 and de Zubicaray & Schiller 2019. Hickok & Small 2016 is a handbook specifically devoted to the neurobiology of language and covers a very broad range of topics. There are also several edited volumes on neurolinguistic research methods. The volume edited by De Groot and Hagoort 2018 includes chapters on various experimental paradigms and methods for data collection. Luck 2014 (2nd edition) is an introduction to the event-related potentials method; Rugg & Coles 1995 and Luck & Kappenman 2012 are handbooks on electrophysiological methods in neurolinguistics. Levelt 2014 reviews the history of neurolinguistics beginning from the pre-Chomskyan era. As neurolinguistic research is very rapidly evolving due to the advancement of neuroscientific methods and experimental designs, handbooks or textbooks often run the risk of becoming quickly outdated. This is possibly a reason why there are not as many of them compared with other linguistic subdisciplines. Therefore, the interested reader is strongly encouraged to become familiar with current research articles published in scientific journals.
Major neuroscientific and psychological journals that publish neurolinguistic findings include, inter alia, ‘Neurobiology of Language’ (MIT Press), ‘Brain and Language’ (Elsevier), ‘Brain Research’ (Elsevier), ‘Nature Neuroscience’ (Springer Nature), ‘Nature Reviews Neuroscience’ (Springer Nature), ‘Neuropsychologia’ (Elsevier), ‘Journal of Cognitive Neuroscience’ (MIT Press), ‘Cortex’ (Elsevier), ‘Cerebral Cortex’ (Oxford University Press), ‘NeuroImage’ (Elsevier), ‘NeuroReport’ (Wolters Kluwer), ‘Neuroscience Letters’ (Elsevier), ‘Journal of Neurolinguistics’ (Elsevier), ‘Psychophysiology’ (Wiley) and ‘Science’ (American Association for the Advancement of Science). Neurolinguistic research is increasingly published in open access journals such as ‘eNeuro’ (Society for Neuroscience), ‘Frontiers’ or different journals published by ‘Plos’. Finally, there are useful online introductions to some neurocognitive methods:
• www.newbi4fmri.com includes an introduction and hands-on tutorials to the fMRI method.
• https://erpinfo.org includes useful information and further reading on experiments with event-related potentials (ERP).
PART III
Linguistic Research across the Discipline
9 Insights from Linguistic Research
The aim of this book was to provide an overview of linguistic research in its broad range of approaches and varieties. This includes an understanding of how different subdisciplines contribute and interact to foster a deeper understanding of language in its many facets. There is no single empirical approach or method that is per se better than any other – they all have their pros and cons. Rather, each of them is designed and suited to study different aspects and thus to provide answers to different research questions. In this regard, findings in one area support research in others. Therefore, in this book we strove neither to focus on a single empirical approach nor to argue in favour of one methodological option over another. Instead, our aim was to provide a comprehensive overview of the entire field of linguistic research and to raise awareness for components and decisions in the research process as well as for consequences that come with empirical decisions. Hence, Chapters 1 and 2 give an overview of empirical research (basic components, considerations & methods of data collection), and Chapters 3 to 8 provide more detailed descriptions of empirical research (basic questions, methodology, and findings) in various subdisciplines, namely documentary and descriptive linguistics, language typology, corpus linguistics, sociolinguistics and anthropological linguistics, cognitive linguistics and psycholinguistics, and neurolinguistics. 
Admittedly, most linguists specialise in one or, at most, two sub-fields (as you will probably experience yourself), but we nevertheless feel it is a worthwhile undertaking to become familiar with all subdisciplines at a basic level (as we hope is achieved by this book) – on the one hand, in order to know the range of empirical options for choosing your own field of empirical interest and, on the other, in order to become aware of where your research is situated within linguistics and in which way your own research area interacts with other sub-fields. The latter helps to know on which findings you can build, for whom your findings might be interesting, and how you can combine different methods or develop innovative research approaches. In this chapter, we will provide you first with a summary of the basic subdiscipline-specific research questions, methodologies, and findings addressed in this book (Section 9.1), including a discussion of methodological strengths and weaknesses of the distinct approaches. On this basis, we will then discuss how the subdisciplines are related and complement each other in exploring the phenomenon of language in its complexity (Section 9.2). Such reflections on the
interfaces and (potential or existing) research co-operation within linguistics finally lead us to considerations regarding interdisciplinary research in linguistics (Section 9.3). Ultimately, we will discuss some trends in linguistics, how they are influenced by technological progress, and what the associated data management means for research (Section 9.4). We then close with some final remarks on the aim of the book (Section 9.4).
9.1 Summary of the Subdiscipline-Specific Research
The various linguistic subdisciplines focus on different aspects of language and have developed a broad range of distinct methodological approaches to investigate them empirically. The range of research results is equally diverse, each subdiscipline contributing their specific findings. Thus, we think it is extremely profitable to bring together the knowledge from various subdisciplines in order to gain more comprehensive insights on language. As we are aware of the basic research questions, methods and results of all subdisciplines, we can now identify points of reference for collaboration and work on closing knowledge gaps. In this sense, Table 9.1 provides a summary of the subdiscipline-specific fundamental research aims, methods, and findings. The subdisciplines differ with regard to fundamental research aspects such as the object of research (types of languages), the main research location, the research data, and the research approach. Even though subdiscipline-specific research is in itself often quite diverse and the types of research present opposite poles on a spectrum with fluid transitions in between, there are clear tendencies with regard to fundamental types of research per subdiscipline. a.
Depending on the research objective and the state of research, they focus on different (kinds of ) languages in their investigations: • well-studied major languages vs. understudied minor languages: The well-studied major languages are primarily Indo-European languages such as English, French, Spanish, German, Russian. They generally have a greater number of speakers as compared to the less studied languages. single languages vs. multiple languages: • In contrast to studies on a single language, research on multiple languages includes investigations on the interaction of these languages (as in language contact) and comparative investigations of two or more languages. • languages vs. language varieties: In contrast to research on languages, there are studies that focus on certain varieties of a language, such as dialects, sociolects, and pluricentric language varieties. While documentary and descriptive linguistics focus their research on understudied minor languages, most corpus linguists work with the major languages for which large amounts of data are already available in ready-made corpora. Based on the existing knowledge about single
Insights from Linguistic Research
265
Table 9.1 Summary of the basic research aims, methods, and findings per subdiscipline
Language documentation
Descriptive linguistics
Language typology
Corpus linguistics
Antrhopological linguistics
Research aim/topic
Research method
Research results
documentary collection of data prior to language extinction structural description of previously unstudied languages commonalities & differences in the languages of the world patterns of language use (primarily in major languages) relationship between language & culture
recording & editing of natural language data
language corpora
collection and analysis of systematic survey (elicitation)
grammars & dictionaries
cross-linguistic comparison based on published data (grammars) analysis of natural language data
universals & typologies
Sociolingusitics
correlation between language & social features of its speakers
Cognitive linguistics
relationship between language & thought mental concepts & processes in language production & comprehension operations of language processing in the brain
Psycholinguistics
Neurolinguistics
participant observation & survey for data collection, analysis of language data in its cultural context correlation analysis (features in natural language representative of distinct social subgroups) language analysis (regarding mental concepts) & experimental tasks language experiments (behavioural offline and online measures)
language experiments (neuroscientific online measures)
frequencies of occurrence, concordances & collocations cultural conceptualisations, ethnographies of speaking social markers & sociolects, dialects & atlases
impact of language on thought (incl. ‘thinking for speaking’) temporal and conceptual organisation of language processing
neurocognitive brain systems and temporal & spatial patterns of brain (de)activation
266
l i n g u i st i c r e se a r c h a c ros s t h e di s c i p l i n e
languages, typologists study the languages of the world crosslinguistically (current and past languages). Sociolinguists, instead, work on a micro-level. They focus on language-internal variation and provide insights into language varieties. As in psycho- and neurolinguistics, there is a research bias towards the major languages, which are not only better studied but usually also more easily accessible in terms of infrastructure for research. Anthropological linguists and cognitive linguists, in contrast, are predominantly interested in the less studied minor languages. Besides research on natural human languages, there are also studies on other linguistic or communicative systems. In search of natural language characteristics, for instance, natural languages can be typologically compared to artificial or constructed languages (such as the Star Trek language Klingon or the planned language Esperanto). Likewise, psycho- and neurolinguists compare human language and linguistic capacities (at different ages) to the communicative systems and capacities of other species either genetically related to humans (great apes) or not (e.g., songbirds) in order to identify characteristics and prerequisites for language. b.
Depending on the language(s) to be studied and, hence, the availability of data, researchers of different disciplines conduct their studies in different research environments/locations: • field vs. office vs. laboratory: In cases in which a language is studied in its natural environment and/or no language data are available prior to the research project, fieldwork is necessary. If research can build on existing data, it can generally be analysed without leaving the office. Research in a laboratory environment serves to examine language under controllable conditions. Furthermore, the use of immobile technical devices requires that research participants come to the researcher’s laboratory, in contrast to field research where the researcher goes to the research participants. While documentary and descriptive linguists conduct field research primarily in order to get unlimited access to native speakers, anthropological linguists and sociolinguists are also interested in the field (i.e., the sociocultural environment of the speakers) as an object of research. While the field sites of documentary, descriptive and anthropological linguistics are often non-Western locations, sociolinguists work primarily in specific Western places. Neuro- and several psycholinguists conduct their experimental research in the laboratory. This environment is necessary due to having access to technical devices and in order to keep confounding factors constant. More often than not, laboratories are situated in locations with advanced socioeconomic standards. Cognitive linguists, in contrast, work primarily with field experiments. Most typological and corpus studies
Insights from Linguistic Research
are based on pre-existing data and, therefore, neither field nor laboratory research is necessary. Researchers of these subdisciplines often do not leave the office environment, which includes working in libraries, archives, etc.

c. Depending on the research objective and partly also the required research effort, the various subdisciplines work with distinct kinds of research data:
• primary vs. secondary or tertiary language data: Primary data are collected by the researchers themselves, whereas researchers who work with secondary or tertiary data have not collected the data themselves. For their analyses, they rely on data that others (primarily other linguists) have compiled or even pre-processed (cf. Section 1.1.4).
• natural language data vs. language data that is generated explicitly by/for the research: While natural language data are produced in natural contexts without the impact of the researcher, non-natural language data are generated within the research process on the initiative of the researcher (i.e., elicitation), such as task-driven data. The transitions between the two kinds of data are often fluid (cf. Section 1.1.4). Besides non-natural language data, data from language-related tasks (such as eye movements or judgements) are usually also created in a controlled context.
• written vs. spoken or signed language data: Natural as well as research-generated language data can be spoken or signed and/or written. Collecting data (including documentation and editing, cf. Section 1.2) is considerably more elaborate when working with spoken or signed than with written text genres.
The collection of natural language data (predominantly oral genres) is the major goal of documentary linguistics. In contrast, most corpus linguists analyse such primary data without necessarily having compiled the natural language data themselves. In this case, the use of ready-made corpora is research based on secondary data. A period of language data collection is also part of most studies in descriptive linguistics. However, the collected data are first and foremost non-natural language data which are elicited systematically to get the needed information.
Some descriptive work (e.g., corpus-based grammars) is also based on the analysis of natural language data. Typological studies rely predominantly on tertiary data as published in grammars (i.e., language data that has already been analysed by descriptive linguists). They analyse the data further in terms of cross-linguistic similarities and variation. In the case that the required linguistic data are not available, typologists also collect the data themselves, generally by use of questionnaires (i.e., systematic elicitation). Socio- and anthropological-linguistic studies generally include a period of data collection (linguistic and/or sociocultural data) but it is also
possible to work with secondary or tertiary data to the extent that the required information is available (e.g., tertiary language data in grammars, secondary language data and social data in corpora, or cultural data in ethnographies). A period of data collection is also part of research in cognitive linguistics as well as psycho- and neurolinguistics. The collected data are generally language-related data triggered by linguistic items in studies on language perception and natural or task-controlled language data in studies on language production.

d. Depending on the research objective and the basic research question, the subdiscipline-specific research approaches differ in various ways. Fundamental distinctions can be made between:
• explorative vs. hypothesis-testing approaches: While explorative studies aim at getting a preliminary overview of a relatively unstudied topic, problem-oriented or hypothesis-testing studies have a specific research focus based on previous knowledge on a topic (cf. Section 1.2.1). Thus, descriptive research (describing a phenomenon) is generally more explorative and explanatory research (finding an explanation for a phenomenon) more problem-oriented.
• quantitative vs. qualitative approaches: Explorative studies are predominantly qualitative, i.e., they aim at textual descriptions of a research issue. Conversely, hypothesis-testing studies are often quantitative, i.e., they aim at numeric presentations of a research issue (how often a particular phenomenon occurs cross-linguistically, in a certain co-text or context, by a particular group of speakers, etc., cf. Section 1.1.4).
Documentary and descriptive linguistics, corpus linguistics, and language typology are mainly descriptive in the sense that they try to find answers to what is the case – language phenomena in a single language, in natural language use or cross-linguistically. Contrastingly, socio- and anthropological-linguistics, cognitive linguistics, psycho- and neurolinguistics are more interested in finding answers to why we find these language phenomena instead of others, i.e., they follow an explanatory approach searching for (neuro-)cognitive and sociocultural reasons. While documentary, descriptive and anthropological linguistics pursue a more qualitative approach, the main approach of corpus linguistics, variationist sociolinguistics, psycho- and neurolinguistics is more quantitative.
In language typology, the search for (statistical) universals requires quantitative analyses, whereas the search for typologies (i.e., the spectrum of variation) is a more qualitative kind of research.
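The difference between these two kinds of typological question can be illustrated with a minimal Python sketch. The language labels and word-order values below are hypothetical placeholders, not data from any real sample or database. Listing which types are attested corresponds to the more qualitative search for a typology, whereas counting how often each type occurs corresponds to the kind of quantitative analysis that statistical universals require.

```python
from collections import Counter

# Hypothetical toy sample: language label -> basic word order.
# Labels and values are invented for illustration only.
SAMPLE = {
    "Language A": "SOV",
    "Language B": "SVO",
    "Language C": "SOV",
    "Language D": "VSO",
    "Language E": "SVO",
    "Language F": "SOV",
}

# Qualitative question: which types are attested at all (the spectrum of variation)?
attested_types = sorted(set(SAMPLE.values()))

# Quantitative question: how often does each type occur in the sample?
counts = Counter(SAMPLE.values())
shares = {order: n / len(SAMPLE) for order, n in counts.items()}

print("Attested types:", attested_types)
for order, n in counts.most_common():
    print(f"{order}: {n} languages ({shares[order]:.0%})")
```

In a real study, such shares would only be meaningful for a representative and balanced language sample, which a toy list like this one does not provide.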
Overall, each methodological approach or procedure has its strengths and weaknesses. Starting from the research subject and the state of research, each subdiscipline has developed an optimal research design to empirically investigate the central research question(s) – weighing up the pros and cons and considering
the research effort. In the following, we will present some fundamental considerations in favour of each methodological procedure:
• the collection and editing of natural language data are optimal to save linguistic knowledge as fast as possible prior to its extinction, as it can also be carried out by trained non-linguists
• working with standardised language/linguistic data is better for cross-linguistic comparability
• working with systematic elicitation is better suited for capturing language phenomena that rarely occur in natural language corpora
• the analysis of (representative) natural language corpora is the best or even only way to get information on unconscious language behaviour (such as slips of the tongue) and on actual occurrence patterns (such as frequency)
• the observation of native speakers in their natural sociocultural environment is the best option for studying the relationship between language and society/culture and for detecting relevant (emic) parameters
• working under controlled conditions allows for the systematic examination of the influence of a manipulated variable, excluding the influence of other variables
Quite often, the decision for one methodological procedure over another is also led by quite practical considerations regarding research effort. The broader the research issue, the less detailed the analysis on a micro-level is. Otherwise, projects would not be feasible within a realistic timeframe and with realistic resources. Typological studies based on natural language data, for instance, call for comparable well-balanced corpora in languages of all language families and sub-families. So far, the available data generally only allow for the analysis of parallel texts such as the Bible and brochures of international institutions such as the UN – an admittedly very limited set of text types. Even the collection of systematically elicited language data via questionnaires in a representative sample of languages is hardly manageable for a single person. Similarly, anthropological- and sociolinguistic studies which require a high degree of effort to collect detailed sociocultural information and/or natural language data as used on an interactional micro level (i.e., by distinct speakers in different situations and contexts and towards different speech act participants) of a single language or even language variety are generally too labour-intensive to be investigated cross-linguistically.

The decision for a methodological approach is also accompanied by disadvantages, which often lead to critical remarks. Some fundamental points of critique regarding individual methodological procedures are:
• bias towards the major languages in corpus linguistics and variationist sociolinguistics and often an overrepresentation of written data in large corpora
• the lack of (sufficiently detailed) contextual information (sociocultural data on the speakers/authors, data on the social context of production, etc.) on the language data and its consideration in studies that rely on secondary data (e.g., ready-made corpora)
• a comparatively (too) small corpus size in documentary linguistics as compared to the size of ready-made corpora of the major languages and, hence, the risk of non-representativeness
• the insufficient consideration of language-internal variation (as it occurs in natural language) in descriptive studies which are based on systematically elicited (idealised) language data
• the risk of working with bad data in studies that rely on previous work done by others (i.e., secondary or tertiary data), particularly if the quality of this data is not reflected upon (e.g., how representative is an existing corpus) or can be checked only with difficulty or not at all
• working with an unbalanced sample or the overrepresentation of better described languages in typological research for reasons of convenience
• the extreme simplification of linguistic complexity as it occurs in the individual languages for classification purposes
• the rather low generalisability of findings in anthropological linguistics and sociolinguistics due to research in very specific situations/contexts with many interacting parameters which are hardly or not comparable across languages/varieties and language communities
• the risk of generating Western-centric findings in psycho- and neurolinguistics (together with age and education biases) due to the overrepresentation of research participants from so-called WEIRD (Western, Educated, Industrialised, Rich, and Democratic) backgrounds (cf. Section 7.3.1) and studies on the major Indo-European languages
• the issue of whether laboratory results reflect language in natural contexts
Since no single approach is suitable for investigating all aspects of language, we have presented the variety of methodological approaches with all their strengths and weaknesses. Instead of simply doling out criticism in order to justify one’s own empirical procedure, the aim here was to create a useful overview to keep the big picture in mind. For a comprehensive understanding of language in its complexity, each subdiscipline contributes its part. It is crucial to know which subdiscipline is studying what, how, and why (see above), in order to identify interfaces for cooperation (see Section 9.2) and ways to fill research gaps. In this context, the points of criticism can provide a valuable starting point.
9.2 Interfaces of the Subdisciplines
Research does not start in a vacuum, but builds on previous experience and findings (cf. Section 1.1). In this sense, the research of the various
subdisciplines is also interlinked, i.e., the findings of individual subdisciplines provide important input for research in other areas. Figure 9.1 gives you an overview of the interfaces between the various subdisciplines – which knowledge is useful or necessary for which research. In the following, we will identify some connections between the subdisciplines and discuss a few examples. On a general level, approaches searching for answers that explain why one finds certain linguistic structures or phenomena (instead of others) build upon research describing what linguistic structures/phenomena are present or how languages look. In Figure 9.1, this is indicated by solid arrows. Documentary and descriptive linguistics, for instance, produce corpora and grammars on individual languages. The corpora allow for further analysis with regard to language-internal variation (such as in corpus linguistics) and the grammars can further be analysed in terms of cross-linguistic variation, as done in language typology. The findings of typological research, namely crosslinguistic universals and variation, can again provide input for further studies.

Figure 9.1 Interfaces of the subdisciplines. Boxes: language documentation & descriptive linguistics (what/how?); language typology (what/how?); corpus linguistics (what/how?); cognitive linguistics (how/why?); sociolinguistics & anthropological linguistics (how/why?); psycho- & neurolinguistics (how/why?). Labels: grammars; corpora; crosslinguistic variation; universals; language-internal variation; cognitive representations; socio-cultural impact; neuro-cognitive representations & processes.

While psycho- and neurolinguistics is primarily interested in finding explanations for cognitive-linguistic universals (e.g., cognitive processing mechanisms
in the human brain), anthropological and cognitive linguists work first and foremost on the other side of the spectrum. They are interested in the cultural impact on cross-linguistic variation and the linguistic impact on differences in thinking. In contrast, the reasons for language-internal variation are studied in sociolinguistics. This kind of research is often based on findings resulting from corpus analysis as in variationist sociolinguistics – extending the purely linguistic analysis to a correlation analysis in relation to social characteristics of the speakers. Finally, findings can be related to each other. The outcome of anthropological- and cognitive-linguistic research gives us information about the interaction of language, culture and cognition, and the findings of cognitive linguistics and psycho‑/neurolinguistics can be compared in terms of their correspondence to general cognitive principles. However, the impact between the subdisciplines is not unidirectional. Instead, the approaches that provide basic information on individual languages also benefit from further research based on them. In Figure 9.1, this is indicated by dotted arrows. Typological knowledge, for instance, is useful in descriptive linguistics. The more we know about the range of cross-linguistic variation or individual languages of a linguistic area and language family in particular, the better we are prepared to search systematically for linguistic phenomena (e.g., by elicitation) in an unstudied language. Likewise, a deeper understanding of (neuro-)cognitive processes in the human mind or brain does not only provide explanations for language universals but also allows for predictions about the stability of rare phenomena such as linguistic developments in situations of language contact (cf. Section 4.5), or language-internal evolution (crosslinguistically universal processes of grammaticalisation). 
Furthermore, findings in corpus analyses of individual languages (i.e., the range of language-internal variation) can be used for a more differentiated elaboration of grammars (i.e., corpus-based grammars), and anthropological-linguistic knowledge may be helpful in determining emic genres for documentary research. Furthermore, the methodological procedure in one subdiscipline may open up new possibilities if it can be transferred to another. The experience in corpus linguistics studying linguistic variation in individual languages, for instance, can be used for a cross-linguistic corpus-based study such as in the analysis of parallel texts. Moreover, there are linguists who try to make laboratory methods as used in psycholinguistics or even neurolinguistics suitable for the field (e.g., by use of portable technical equipment such as mobile eyetracking devices) in order to overcome the bias in favour of studies in Western contexts. Finally, in sociolinguistics and anthropological linguistics as well as in psycho- and neurolinguistics, there are also multiple methodological influences from other disciplines (cf. Section 9.3 on interdisciplinarity). However, transferring a method can be difficult or even impossible. The use of advanced neuroimaging methods such as fMRI in field research, for instance, is not feasible, first, because it is impossible to transport the equipment to different places, and second, because there are too many confounding factors that dilute
the data. Regarding the second reason, it is generally extremely challenging to apply experimental methods that are designed to test the correlation between a limited number of parameters in field contexts, and this problem is exacerbated the more advanced the technical equipment is. Moreover, research on individual languages and language varieties is often very detailed and requires the adaptation of empirical procedures to the very specific research situation and (field) context. Such adaptations, however, come at the expense of cross-linguistic comparability (Völkel 2017). Non-standardised methods and findings based on different kinds of data usually lack a basis of comparison across different settings and studies. Generally speaking, the larger the unit of comparison (e.g., on a group level or across languages), the less detail can be considered on an individual level. Thus, the application of empirical procedures as used in descriptive or explorative research is not suitable or too labour-intensive for crosslinguistic research.

All in all, collaboration across subdisciplines is very important for the ultimate goal of linguistic research – understanding language in all its facets. This collaboration can take place in several basic ways:
• first of all, by reflecting on the significance and the kind of contribution of one’s own research within the whole field of empirical linguistics (this is possible without direct interaction with linguists of other subdisciplines, and such reflections contribute to research quality, cf. Sections 1.1.5 and 1.2.9),
• second, by considering methodological procedures of other subdisciplines for the research in one’s core area (this is possible without direct interaction with linguists of other subdisciplines, and is comparable to interdisciplinarity in person, cf. Section 9.3),
• third, by providing data or further research outcomes specifically for other subdisciplines or by building upon research findings of other subdisciplines (this is possible without direct interaction with linguists of other subdisciplines and is comparable to multidisciplinarity, cf. Section 9.3),
• fourth, by developing joint research projects across subdisciplines (this means direct cooperation with linguists of other subdisciplines, comparable to interdisciplinarity in teams, cf. Section 9.3).
Cross-disciplinary fields such as language acquisition, language contact, or language change are good examples to illustrate how different empirical approaches contribute to a more comprehensive understanding of the respective phenomenon, as can be seen in the broad spectrum of research questions (cf. Tables 3.1–8.1). First language acquisition, for instance, is investigated in several ways, pursuing different questions and using different methods: First of all, the natural communicative behaviour of children at different ages is observed and compiled in corpora of child language and child-directed speech (e.g., by documentary
linguists). Corpus linguists then analyse this data in search of developmental milestones. Psycho- and neurolinguists use additional experimental methods (cf. Section 7.3.6) in order to find out more about the cognitive and neuronal underpinnings of language acquisition. The cross-linguistic focus in language typology, instead, reveals which aspects of the acquisition process are universal and which ones show cultural variation. Finally, language acquisition is studied in the context of the socialisation process in anthropological linguistics and sociolinguistics (e.g., how and by whom children are socialised and how this relates to language acquisition). In language acquisition studies in a multilingual environment, knowledge of the individual contact languages as well as the specific contact situation must be taken into account, which makes the research process even more complex. This includes, for instance, considerations on the typological relatedness of the contact languages and consequences for cognitive representations as they become visible in natural language use, language ideologies related to the languages in contact and their respective social function for the speakers.

Finally, when empirical evidence from multiple methods is taken into account to answer a research question (cf. multi-methods combinations vs. mixed-methods designs in Section 2.5.1), it is especially important that researchers decide a priori about the significance of each method. This includes considerations about what each method adds to our understanding of a particular aspect of language grammar or use, and how the findings relate to each other. Generally speaking, data collected with different methods can relate to one another in one of two ways:
• converging evidence: Results of one method are congruent with results of the other method. A well-known example is the relation between frequencies derived from corpus counts and acceptability judgements from psycholinguistic experiments. Phenomena that occur frequently in a corpus are typically judged as more acceptable in comparison to phenomena that are less frequent (Arppe & Järvikivi 2007).
• diverging evidence: One possibility is that the results of one method show contrary effects to the results of the other method. Or, in other words, results of one method do not predict the kind of findings obtained with another (e.g., McConnell & Blumenthal-Dramé 2019, Dabrowska 2014). For example, rare phenomena in corpora are not necessarily judged to be less acceptable in experiments collecting acceptability judgements, as they may also achieve a relatively high level of acceptability (Arppe & Järvikivi 2007). A second possibility is that only one method shows a clear effect for a variable, whereas the other method reveals no effect at all (a null effect). This is the case, for example, for word frequency, which has clearly measurable behavioural effects (e.g., longer reading times for low-frequency words vs. high-frequency words), but which yields null results with some neurocognitive methods (especially EEG; cf. Kretzschmar et al. 2015).
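The relation between corpus frequencies and acceptability judgements described above can be made concrete with a small calculation. The following Python sketch uses invented toy values (the construction labels, frequencies, and ratings are hypothetical and not taken from any of the studies cited) to compute a Spearman rank correlation between corpus frequency and mean acceptability rating. A value close to 1 would point towards converging evidence, whereas a value near 0 or below would point towards diverging evidence. It is an illustration under these assumptions, not the procedure used in the cited work.

```python
from math import sqrt

# Hypothetical toy data: (construction label, corpus frequency, mean rating on a 1-7 scale).
# All values are invented for illustration only.
DATA = [
    ("construction_a", 950, 6.5),
    ("construction_b", 400, 6.1),
    ("construction_c", 120, 5.8),
    ("construction_d", 15, 3.2),
    ("construction_e", 3, 4.9),  # rare in the corpus, yet rated fairly acceptable
]


def ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = mean_rank
        pos = end + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


frequencies = [freq for _, freq, _ in DATA]
ratings = [rating for _, _, rating in DATA]
rho = spearman(frequencies, ratings)
print(f"Spearman's rho = {rho:.2f}")  # prints: Spearman's rho = 0.90
```

In this toy sample the two measures largely converge, but the last item shows how a rare construction can still receive a comparatively high rating, which is the kind of local divergence mentioned above.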
Many empirical studies report converging evidence, while diverging evidence tends to occur less often. One reason for this is that the lack of evidence due to null effects is difficult or even impossible to interpret (especially when language-related data are collected). Another reason is that each method (observation and corpus, survey, experiment) potentially represents different aspects of the use and mental representation of language, so relating findings from different methods to one another in data interpretation is far from trivial. The question of how to deal with converging and diverging evidence in data interpretation is of course also discussed within the linguistic subdisciplines (e.g., the question of which experimental tasks or stimuli work adequately for a group of participants). However, the answer to this question becomes much more difficult when data from different methods are to be integrated with each other. Although most linguistic subdisciplines have developed complex theories and models of the human language architecture, there is no model that integrates evidence from all different methods and makes predictions about their weighting relative to one another – which is not surprising given the complexity of such a model. The challenge for future empirical research in linguistics will therefore be to systematically collect further multi-methods or mixed-methods data and to develop precise predictions as to which method or combination of methods speaks for or against which assumption about language.
9.3 Inter- and Multidisciplinarity
Inter- and multidisciplinarity describe the collaboration or cooperation of different academic disciplines such as linguistics and cultural anthropology. This definition poses two main problems: First, the demarcation of what are different disciplines may be controversial; usually, it is rooted in historical developments. Second, the terms ‘interdisciplinarity’ and ‘multidisciplinarity’ are often used interchangeably throughout the literature and/or the definitions are not precise or they vary slightly. In order to account for the different types of collaboration and cooperation, as we have seen in the previous section, we propose to make the following terminological distinction (as for instance made by Keck 2008): Interdisciplinarity describes a collaboration across two or more disciplines that includes a joint research process. This means that researchers deal with the methods, procedures, approaches, and ideas of the different disciplines and different research approaches (cf. also Section 2.5 on mixed-methods designs). In multidisciplinarity, in contrast, different disciplines only meet to consolidate their respective findings concerning a special topic but there is no empirical collaboration in terms of a joint research process. Particularly in the past 20 years, interdisciplinarity has become a fashionable buzzword in academia, primarily in research but also in teaching. From this development one expects to gain new, interlinked knowledge and scientific
enrichment for each individual discipline and for science as a whole. However, interdisciplinarity not only involves advantages but also challenges such as negotiations in larger research groups and extra work. What makes interdisciplinary research comparatively laborious is the joint research process. This is exactly the phase in which the majority of empirical challenges evolve. In interdisciplinary projects, methodological and basic content knowledge from more than one discipline is required – either in person (i.e., an academic with competences in several disciplines) or in teams (i.e., collaborating academics from different disciplines). While the first kind of interdisciplinarity is more elaborate in education (as the researcher needs to acquire knowledge in two or more disciplines equally) but easier regarding negotiations (as the researcher can negotiate controversial aspects with themselves), the opposite is the case for the second kind of interdisciplinarity. In linguistics, there are several interdisciplinary subdisciplines. As the names suggest, sociolinguistics is a collaboration or even a merger of linguistics with sociology, anthropological linguistics with cultural/social anthropology, psycholinguistics with psychology, and neurolinguistics with neuroscience. In psycho- and neurolinguistics, there are also interfaces with biology (anatomy of humans vs. animals). Furthermore, cognitive linguistics, psycho- and neurolinguistics are part of the cognitive sciences, a complex interdisciplinary network of sciences with a focus on cognition, namely neuroscience, artificial intelligence, psychology, philosophy, anthropology, and linguistics. As we have seen in Chapters 6–8, all these interdisciplinary linguistic subdisciplines have developed their own empirical procedures adopting and adapting research approaches, methods, and components from non-linguistic disciplines.
Sociolinguists, for instance, work with several empirical methods and procedures as known from sociology such as the quantitative approach or network analysis (cf. Section 6). Anthropological linguists have adopted the core method of cultural anthropology, namely fieldwork with participant observation, together with a predominantly qualitative approach (cf. Section 6). Psycholinguists use experimental designs similar to psychological research and statistics to evaluate the data quantitatively (cf. Section 7). Finally, the core methods of neurolinguistics (such as fMRI and EEG) originally stem from neuroscience and experimental designs from psychology and cognitive sciences (cf. Sections 7 and 8). Overall, new approaches are often inspired not only from interdisciplinary work but also from a mutual understanding of the linguistic subdisciplines and from thinking outside of the box.
9.4 Current Trends in Linguistics: Technological Impact & Data Management
As in numerous other disciplines, the latest trends in linguistics are strongly affected by general technological progress taking place in recent decades. On the one hand, the (further) development of measurement techniques
or instruments leads to new or better methods and possibilities in data collection and, on the other, IT developments overall (e.g., increased computer performance) allow for the analysis of larger amounts of data by use of new or improved software tools as well as for vast data storage capacities. In linguistics, this development is most evident in psycho- and neurolinguistics as well as in corpus linguistics. The improved possibility to study language by use of neurocognitive methods (cf. Section 7.3.3) and the comparatively low level of knowledge in this area so far result in an increased interest in psycho- and neurolinguistics, which is apparent in the increasing number of academic institutes with this research and teaching focus. In addition, there is a relatively large number of corpus studies, i.e., a trend to work with natural language data (particularly in the major linguistic disciplines such as English linguistics), which is driven by the improved possibilities in processing large datasets and, last but not least, the availability of the internet as a data source on its own and as a facilitator of accessibility to otherwise remote data. In applied linguistics, this technological trend can be seen in the improvement of electronic language tools such as software tools for translation and for language learners, voice recognition or auto-correction/-completion software. In linguistic research, the trend to work with large datasets leads to considerations regarding data management. This means the optimal handling of data as a valuable resource to ensure their maximum benefit, comprising the aspects of data formats, data storage, data governance, data access, data quality, data security and rights, and so on. Data management is not only important for further data use within a subdiscipline but also for collaborations across linguistic subdisciplines (as described in Section 9.2).
So far, data management in linguistics is primarily required for corpus data, but it gains importance in psycho-/neurolinguistics and other subdisciplines as well in order to allow for the reproducibility of research findings. In documentary linguistics, data management relates to natural language data, which are compiled in edited text corpora. In Chapter 3, we have discussed in detail the kind of data that is required and how it is structured (cf. Section 3.3.3.1), as well as the corpus requirements, including data-management related aspects such as portability, expandability, preservability, and ethics/rights (cf. Section 3.4). Although corpus data management is a prudent requirement ensuring that the data can be used for (further) analyses of various kinds (comparable to the ready-made corpora of the major languages, cf. Chapter 5), the mentioned requirements do pose challenges. Expandability means that the data structure is clear and that the data platform allows for the addition of data by different researchers while not permitting access to change existing data. Preservability requires that the data platform exists over time, or if it is not maintained anymore, that the data is transferred to a new platform. The aspect of data rights and ethics means that the permission of the authors (speakers, writers, publishers, etc.) is needed not only for the collector’s research purpose, but also for making the data available to a wider academic community. This leads to the question of to whom access is
granted and how this access is controlled. Documentary corpus archives are primarily used by descriptive linguists for further analysis, but altogether large amounts of data still remain unused as compared to the corpora of the major languages. In addition to corpus databases, larger research groups have increasingly worked in recent decades on bringing together other kinds of datasets to make them more visible and usable for further research purposes. Examples from language typology (cf. Section 4.4) are the Universals Archive (an extensive collection of all kinds of universals that have been postulated so far in various publications) and the World Atlas of Language Structures or Glottobank (large databases of structural properties of numerous languages across the globe gathered from descriptive materials), which are similar to linguistic maps/atlases in dialectology (cf. Section 6.4). While the first data source makes it easier to locate typological findings throughout the literature and provides a general overview, the latter (WALS) allows for typological searches (typologies, distribution of features, correlations of features, etc.). In sum, such databases are not only useful for typologists, but also for researchers of other subdisciplines that build upon this knowledge – i.e., a database of findings in one subdiscipline may serve as a source database for another subdiscipline (cf. Section 9.2). There are also similar initiatives in psycho- and neurolinguistics to make large databases with either behavioural (e.g., aphasia.talkbank.org) or neuroimaging data (e.g., openneuro.org) available, similar to the typological databases. These databases collect raw data from different types of experiments, allowing researchers to conduct analyses of individual datasets other than the ones used for the original publication or to run meta-analyses across several experiments. However, not all kinds of data allow for data management in the same way.
Particularly, qualitative primary data as collected in anthropological linguistics and interactional sociolinguistics are extremely problematic in this respect, mainly due to ethical concerns. Digital data archiving and open or even controlled data access must be weighed against the importance of building research participants’ confidence for data collection in a personal context. As qualitative research generally builds upon data of single or a few research participants, it is often impossible to guarantee confidentiality in the case of primary data publication (confidentiality vs. anonymity, cf. Section 1.1.4). Therefore, the principle applies that primary research has priority over research data management, just as is the case in cultural anthropology and other social sciences. Furthermore, the collected data can generally not be understood without socio-cultural and/or situational contextualization and embodied knowledge of the researcher. And finally, it is very challenging to establish database conventions that account for the great heterogeneity of qualitative research data. Despite the trends in linguistics that take advantage of the possibilities of technological developments and generate new knowledge about language, one should not neglect the other subdisciplines. As we have shown, no single approach or subdiscipline is equally capable of answering all research questions
and illuminating all aspects of language. Thus, good collaboration (incl. data management) across the linguistic subdisciplines is important, bringing together findings from qualitative and quantitative research, explorative and hypothesis-testing research, field, office and laboratory research, research on language-internal and cross-linguistic variation, etc. (cf. Section 9.1). Equally, a balanced interaction between theory (theoretical considerations) and empiricism (empirical evidence) is of great importance.
9.5 Concluding Remarks
Finally, we hope that this book has provided you with a comprehensive insight into empirical linguistics. Apart from the basics of empirical research, we have presented several approaches (without claiming completeness), which are methodologically very different. Maybe this book supported you in finding your own research area of interest, or you gained a better understanding of how your research is related to research in other subdisciplines, or it inspired you for new (possibly innovative) research projects – depending on your level of linguistic education and empirical experience. When all is said and done, empiricism cannot be learned theoretically. It is a learning process that requires practical experience. With each empirical project, you will gain more experience, which will again be helpful for further projects. Often, you actually learn the most by trial and error. So, this book cannot replace practical experience, but we hope that it is helpful in planning an empirical research project – to identify the research components and parameters, to consider at least some potential challenges, and to evaluate pros and cons of methodological options. With this book laying down the foundations of empirical linguistics, we strongly encourage you to design and conduct your own empirical research projects. Our recommendations for exercises and student projects may be a helpful starting point.
9.6 Exercises and Assignments
Exercises for students which can be included during a concluding session on linguistic research across the discipline or as part of project work:
9.1 Reflect upon your own research project and how it relates to the various subdisciplines in linguistics. How do your findings contribute to answering the fundamental research questions in linguistics such as ‘How does language work’?
9.2 Discuss how linguistic research connects to research findings in other scientific fields. Can you find specific examples?
a. What and how could researchers from other linguistic subdisciplines and/or other scientific disciplines contribute to a specific research topic?
b. Find interdisciplinary research groups on a topic of your choice and analyse their collaboration in terms of their specific contributions.
c. Write a proposal of about three to five pages for an interdisciplinary research group on your topic.
References
Abbuhl, Rebekha; Susan Gass; Alison Mackey. 2013. ‘Experimental research design’. In Robert Podesva & Devyani Sharma (eds). Research Methods in Linguistics. Cambridge: Cambridge University Press. 116–134. Abeillé, Anne. (ed.). 2003. Treebanks: Building and Using Parsed Corpora. Dordrecht: Springer. Adolphs, Svenja; Dawn Knight. 2010. ‘Building a spoken corpus. What are the basics?’ In A. O’Keeffe & M. McCarthy (eds). The Routledge Handbook of Corpus Linguistics. London: Routledge. 38–52. Agouri, Jo. 2010. ‘Quantitative, qualitative or both? Combining methods in linguistic research’. In Lia Litosseliti (ed.). 2010. Research Methods in Linguistics. London: Continuum. 29–45. Aguado, Karin. 2014. ‘Triangulation’. In Julia Settinieri, Sevilen Demirkaya, Alexis Feldmeier, Nazan Gültekin-Karakoç & Claudia Riemer (eds). Empirische Forschungsmethoden für Deutsch als Fremd- und Zweitsprache. Eine Einführung. Paderborn: UTB. 47–56. Ahearn, Laura. 2012. Living language: An introduction to Linguistic Anthropology. Malden, MA: Wiley-Blackwell. Ahlsén, Elisabeth. 2006. Introduction to Neurolinguistics. Amsterdam: Benjamins. Aijmer, Karin; Christoph Rühlemann (eds). 2017. Corpus Pragmatics: A Handbook. Cambridge: Cambridge University Press. Aikhenvald, Alexandra. 2007. ‘Linguistic fieldwork. Setting the scene’. Sprachtypologie und Universalienforschung, 60(1). 3–11. Aikhenvald, Alexandra; R. M. W. Dixon (eds). 2001. Areal Diffusion and Genetic Inheritance: Problems in Comparative Linguistics. Oxford: Oxford University Press. Aikhenvald, Alexandra; R. M. W. Dixon (eds). 2006. Grammars in Contact: A CrossLinguistic Typology. Oxford: Oxford University Press. Aikhenvald, Alexandra; R. M. W. Dixon (eds). 2017. The Cambridge Handbook of Linguistic Typology. Cambridge: Cambridge University Press. Akinlabi, Akinbiyi; Bruce Connell. 2008. ‘The Interaction of linguistic theory, linguistic description and linguistic documentation’. In O.-M. Ndimele, I. Udoh, & O. Anyanwu (eds). 
Critical Issues in the Study of Linguistics, Languages & Literatures in Nigeria: A Festschrift for Conrad Max Benedict Brann. Port Harcourt: Grand Orbit Communications Ltd. & Emhai Press. 571–589. Albert, Ruth; Cor Koster. 2002. Empirie in Linguistik und Sprachlehrforschung. Ein methodologisches Arbeitsbuch. Tübingen: Narr.
281
282
references
Albert, Ruth; Nicole Marx. 2010. Empirisches Arbeiten in Linguistik und Sprachlehrforschung. Tübingen: Narr. Alday, Phillip. 2019. ‘M/EEG analysis of naturalistic stories. A review from speech to language processing’. Language, Cognition and Neuroscience, 34 (4). 457–473. Alday, Phillip; Franziska Kretzschmar. 2019. ‘Speed-accuracy tradeoffs in brain and behavior. Testing the independence of P300 and N400 related processes in behavioral responses to sentence categorization’. Frontiers in Human Neuroscience, 13. 285. von Alemann, Heine. 19842. Der Forschungsprozess: Eine Einführung in die Praxis der empirischen Sozialforschung. Stuttgart: Teubner. Allan, Keith (ed.). 2016. The Routledge Handbook of Linguistics. London: Routledge. Allwood, Jens. 2008. ‘Multimodal corpora’. In Anke Lüdeling & Merjy Kytö (eds). Corpus Linguistics: An International Handbook. (HSK, 29.1). Berlin: de Gruyter. 207–225. Altmann, Gerry. 2001. ‘The language machine. Psycholinguistics in review’. British Journal of Psychology, 92. 129–170. Ameka, Felix; Alan Dench; Nicholas Evans (eds). 2006. Catching Language: The Standing Challenge of Grammar Writing. Berlin: Mouton de Gruyter. Ammon, Ulrich; Norbert Dittmar; Klaus Mattheier; Peter Trudgill (eds). 2002/2005/2006. Sociolinguistics: An International Handbook of the Science of Language and Society. (HSK, 3.1–3). Berlin: de Gruyter. Andric, Michael; Steven Small. 2015. ‘fMRI methods for studying the neurobiology of language under naturalistic conditions’. In Roel Willems (ed.). Cognitive Neuroscience of Natural Language Use. Cambridge: Cambridge University Press. 8–28. Arbib, Michael. 2015. ‘Neurolinguistics. A Cooperative computation perspective’. In Bernd Heine & Heiko Narrog (eds). The Oxford Handbook of Linguistic Analysis 2 ed. Oxford: Oxford University Press. 639–669. Aronoff, Mark; Janie Rees-Miller (eds). 2003. The Handbook of Linguistics. Malden, MA: Blackwell. Arppe, Antti; Juhani Järvikivi. 2007. ‘Every method counts. 
Combining corpus-based and experimental evidence in the study of synonymy’. Corpus Linguistics and Linguistic Theory, 3(2), 131–159. Atkinson, Paul; Amanda Coffey; Sara Delamont; John Lofland; Lyn Lofland (eds). 2001. Handbook of Ethnography. Thousand Oaks, CA: Sage Publications. Austin, Peter. 2007. ‘Training for language documentation. Experiences at the School of Oriental and African Studies’. In Victoria Rau & Margaret Florey (eds). Documenting and Revitalizing Austronesian languages. Honolulu, HI: University of Hawai‘i Press. 25–41. Austin, Peter. 2010. ‘Current issues in language documentation’. In Peter Austin (ed.). Language Documentation and Description, vol. 7. London: SOAS. 12–33. Austin, Peter; Andrew Simpson (eds). 2007. Endangered Languages. Hamburg: Helmut Buske Verlag. Austin, Peter; Julia Sallabank (eds). 2011. The Cambridge Handbook of Endangered Languages. Cambridge: Cambridge University Press. Baayen, Harald. 2008. Analyzing Linguistic Data: A Practical Introduction to Statistics Using R. Oxford: Oxford University Press.
References
Bailey, Guy; Jan Tillery. 1999. ‘The Rutledge Effect: The Impact of Interviewers on Survey Results in Linguistics’. American Speech, 74(4). 389–402. Bailey, Guy; Jan Tillery. 2004. ‘Some sources of divergent data in sociolinguistics’. In Ronald Macaulay & Carmen Fought (eds). Sociolinguistic Variation: Critical Reflections. Oxford: Oxford University Press. 11–30. Bakeman, Roger; Vincenc Quera. 2011. Sequential Analysis and Observational Methods for the Behavioral Sciences. Cambridge: Cambridge University Press. Baker, Paul. 2006. Using Corpora in Discourse Analysis. London: Continuum. Baker, Paul (ed.). 2009. Contemporary Corpus Linguistics. London: Continuum. Baker, Paul. 2010a. ‘Corpus methods in linguistics’. In Lia Litosseliti (ed.). 2010. Research Methods in Linguistics. London: Continuum. 93–113. Baker, Paul. 2010b. Sociolinguistics and Corpus Linguistics. Edinburgh: Edinburgh University Press. Baker, Paul; Jesse Egbert (eds). 2016. Triangulating Methodical Approaches in Corpus-Linguistic Research. London: Routledge. Bakker, Dik; Anna Siewierska. 1991. A Database System for Language Typology. (Working paper, 3). Amsterdam: Department of Linguistics, University of Amsterdam. Bakker, Dik; Östen Dahl; Martin Haspelmath; Maria Koptjevskaja-Tamm; Christian Lehmann & Anna Siewierska. 1993. Eurotyp Guidelines. (Working paper, 3). Strasbourg: Program in Language Typology, European Science Foundation. Ball, Martin (ed.). 2010. The Routledge Handbook of Sociolinguistics around the World. London: Routledge. Bargh, John; Mark Chen; Lara Burrows. 1996. ‘Automaticity of social behavior. Direct effects of trait construct and stereotype activation on action’. Journal of Personality and Social Psychology, 71(2). 230–244. Barnett, Lincoln. 1948. The universe of Dr. Einstein. Mineola: Dover Publications. Bates, Elizabeth; Stephen Wilson; Ayse Pinar Saygin; Frederic Dick; Martin Sereno; Robert Knight; Nina Dronkers. 2003. ‘Voxel-based lesion–symptom mapping’. 
Nature Neuroscience, 6(5). 448–450. Bayley, Robert; Richard Cameron; Ceil Lucas (eds). 2013. The Oxford Handbook of Sociolinguistics. Oxford: Oxford University Press. Beer, Bettina. 2008. ‘Systematische Beobachtung’. In Bettina Beer (ed.). Methoden ethnologischer Feldforschung. Berlin: Reimer Verlag. 167–189. Beer, Bettina (ed.). 2008. Methoden ethnologischer Feldforschung. Berlin: Reimer Verlag. Beißwenger Michael; Angelika Storrer. 2008. ‘Corpora of computer-mediated communication’. In Anke Lüdeling & Merja Kytö (eds). Corpus Linguistics: An International Handbook. (HSK, 29.1). Berlin: Mouton de Gruyter. 292–308. Bell, Alan. 1978. ‘Language samples’. In Joseph Greenberg (ed.). Universals of Human Language, vol. 1. Stanford, CA: Stanford University Press. 123–156. Bell, Allan. 2014. The Guidebook to Sociolinguistics. Malden, MA: Wiley. Bergh, Gunnar; Eros Zanchetta. 2008. ‘Web linguistics’. In Anke Lüdeling & Merja Kytö (eds). Corpus Linguistics: An International Handbook. (HSK, 29.1). Berlin: Mouton de Gruyter. 309–327.
283
284
references
Bergqvist, Henrik. 2007. ‘The role of metadata for translation and pragmatics in language documentation’. In Peter Austin (ed.). Language Documentation and Description, vol. 4. London: SOAS. 163–173. Berko, Jean. 1958. ‘The child’s learning of English morphology’. Word, 14 (2–3). 150–177. Berlin, Brent; Paul Kay. 1969. Basic Color Terms. Berkeley, CA: University of California Press. Berman, Ruth; Dan Slobin 1994. Relating Events in Narrative. A Crosslinguistic Developmental Study. New York: Psychology Press. Bernard, Russel (ed.). 20115. Research Methods in Anthropology: Qualitative and Quantitative Approaches. Lanham, MD: Alta Mira Press. Bhat, D. N. S. 2004. Pronouns. Oxford: Oxford University Press. Biber, Douglas. 1986. ‘Spoken and written textual dimensions in English. Resolving the contra- dictory findings’. Language, 62(2). 384–414. Biber, Douglas. 1988. Variation Across Speech and Writing. Cambridge: Cambridge University Press. Biber, Douglas. 1990. ‘Methodological issues regarding corpus-based analyses of linguistic variation’. Literary and Linguistic Computing, 5 (4). 257–269. Biber, Douglas. 1993. ‘Representativeness in corpus design’. Literary and Linguistic Computing, 8(4). 243–257. Biber, Douglas. 1995. Dimensions of Register Variation: A Cross-Linguistic Comparison. Cambridge: Cambridge University Press. Biber, Douglas; Edward Finegan. 1989. ‘Drift and the evolution of English style. A history of three genres’. Language 65(3). 487–517. Biber, Douglas; James K. Jones. 2009. ‘Quantitative methods in corpus linguistics’. In Anke Lüdeling & Merja Kytö (eds). Corpus Linguistics: An International Handbook. (HSK, 29.2). Berlin: Mouton de Gruyter. 1268–1304. Biber, Douglas; Randi Reppen (eds). 2015. The Cambridge Handbook of English Corpus Linguistics. Cambridge: Cambridge University Press. Biber, Douglas; Susan Conrad; Randi Reppen. 1998. Corpus Linguistics: Investigating Language Structure and Use. Cambridge: Cambridge University Press. 
Biber, Douglas; Stig Johansson; Geoffrey Leech; Susan Conrad; Edward Finegan. 1999. The Longman Grammar of Spoken and Written English. London: Longman. Bickel, Balthasar. 2011. ‘Statistical modeling of language universals’. Linguistic Typology, 15. 401–414. Bickel, Balthasar. 2014a. ‘Sprachliche Vielfalt im Wechselspiel von Natur und Kultur’. In Elvira Glaser, Agnes Kolmer, Martin Meyer & Elisabeth Stark (eds). Sprache(n) verstehgen: Eine interdisziplinäre Vorlesungsreihe. Zürich: vdf Hochschulverlag. 101–126. Bickel, Balthasar. 2014b. ‘Linguistic diversity and universals’. In Nick Enfield, Paul Kockelmann & Jack Sidnell (eds). The Cambridge Handbook of Linguistic Anthropology. Cambridge: Cambridge University Press. 101–124. Bickel, Balthasar. 20152. ‘Distributional typology. Statistical inquiries into the dynamics of linguistic diversity’. In Bernd Heine & Heiko Narrog (eds). The Oxford Handbook of Linguistic Analysis. Oxford: Oxford University Press. 901–923. Biemer, Paul; Lars Lyberg. 2003. Introduction to Survey Quality. Hoboken, NJ: John Wiley & Sons.
References
Bird, Sonya; Bryan Gick. 2006. ‘Phonetics. Field methods’. In Keith Brown (ed.). The Encyclopedia of Language and Linguistics, vol. 9. Oxford: Elsevier. 463–467. Bisang, Walter. 2004. ‘Dialectology and typology – An integrative perspective’. In Bernd Kortmann (ed.). Dialectology Meets Typology: Dialect Grammar from a Cross-linguistic Perspective. Berlin: Mouton de Gruyter. 11–45. Blanken, Gerhard; Jürgen Dittmann; Hannelore Grimm (eds). 1993. Linguistic Disorders and Pathologies (Handbücher zur Sprach- und Kommunikationswissenschaft / Handbooks of Linguistics and Communication Science. Berlin: de Gruyter. Blom, Elma; Sharon Unsworth (eds). 2010. Experimental Methods in Language Acquisition Research. Amsterdam: Benjamins. Blumenthal-Dramé, Alice. 2016. ‘What corpus-based Cognitive Linguistics can and cannot expect from neurolinguistics’. Cognitive Linguistics, 27(4). 493–505. Blume, María; Barbara Lust. 2016. Research Methods in Language Acquisition. Berlin: de Gruyter. Blumstein, Sheila. 1995. ‘The neurobiology of language’. In Joanne Miller & Peter Eimas (eds). Speech, Language, and Communication. New York: Academic Press. 339–370. Blumstein, Sheila; Emily Myers. 2013. ‘Neural systems underlying speech perception’. In Kevin Ochsner & Stephen Kosslyn (eds). Oxford Handbook of Cognitive Neuroscience, vol. 1. New York: Oxford University Press. 507–523. Bock, Kathryn. 1996. ‘Language production. Methods and methodologies’. Psychonomic Bulletin & Review, 3(4). 395–421. Bondi, Marina; Mike Scott (eds). 2010. Keyness in Texts. Amsterdam: Benjamins. Bonvillain, Nancy (ed.). 2015. The Routledge Handbook of Linguistic Anthropology. London: Routledge. Bornkessel, Ina; Matthias Schlesewsky; Angela D. Friederici. 2002. ‘Grammar overrides frequency. Evidence from the online processing of flexible word order’. Cognition, 85(2). B21–B30. Bornkessel-Schlesewky, Ina; Andrej Malchukov; Marc Richards (eds). 2015. Scales and Hierarchies: A Cross-disciplinary Perspective. 
Berlin: de Gruyter. Bornkessel-Schlesewsky, Ina; Matthias Schlesewsky. 2008. ‘An alternative perspective on “semantic P600” effects in language comprehension’. Brain Research Reviews, 59(1). 55–73. Bornkessel-Schlesewsky, Ina; Matthias Schlesewsky. 2009. Processing Syntax and Morphology: A Neurocognitive Perspective. Oxford: Oxford University Press. Bornkessel-Schlesewsky, Ina; Matthias Schlesewsky. 2016. ‘The importance of linguistic typology for the neurobiology of language’. Linguistic Typology, 20(3). 615–621. Bornkessel-Schlesewsky, Ina; Matthias Schlesewsky; Steven Small; Josef Rauschecker. 2015. ‘Neurobiological roots of language in primate audition. Common computational properties’. Trends in Cognitive Sciences, 19(3). 142–150. Bortz, Jürgen; Christof Schuster. 2010. Statistik für Human- und Sozialwissenschaftler. Berlin: Springer. Bouquiaux, Luc; Jacqueline Thomas. 1992. Studying and Describing Unwritten Languages. Dallas: SIL.
285
286
references
Bowern, Claire. 2008. Linguistic Fieldwork: A Practical Guide. New York: Palgrave Macmillan. Brennan, Jonathan. 2016. ‘Naturalistic sentence comprehension in the brain’. Language and Linguistics Compass, 10(7). 299–313. Brenzinger, Matthias (ed.). 2007. Language Diversity Endangered. Berlin: Mouton de Gruyter. Brezina, Vaclav. 2018. Statistics in Corpus Linguistics: A Practical Guide. Cambridge: Cambridge University Press. Brown, Colin; Peter Hagoort (eds). 1999. The Neurocognition of Language. Oxford: Oxford University Press. Brown, Penelope; Stephen Levinson. 1993. Linguistic and Nonlinguistic Coding of Spatial Arrays: Explorations in Mayan Cognition. (Working paper, 14). Nijmegen: Cognitive Anthropology Research Group, Max Planck Institute for Psycholinguistics. Brysbaert, Marc; Matthias Buchmeier; Markus Conrad; Arthur M. Jacobs; Jens Bölte; Andrea Böhl. 2011. ‘The word frequency effect. A review of recent developments and implications for the choice of frequency estimates in German’. Experimental Psychology, 58(5). 412–424. Bybee, Joan. 2010. Language, Usage and Cognition. Cambridge: Cambridge University Press. Bybee, Joan. 2011. ‘Markedness. Iconocity, economy and frequency’. In Jae Jung Song (ed.). The Oxford Handbook of Linguistic Typology. Oxford: Oxford University Press. 131–147. Byrman, Alan. 2006. ‘Integrating quantitative and qualitative research. How is it done’. Qualitative Research, 6(1). 97–113. Campbell, Donald; Julian Stanley. 1963. ‘Experimental and quasi-experimental designs for research’. In Donald Campbell, Julian Stanley, & Nathaniel Gage (eds). Handbook of Research on Teaching. Chicago, IL: Rand McNally. 171–246. Campbell, George. 1995. Concise Compendium of the World’s Languages. London: Routledge. Caplan, David. 1987. Neurolinguistics and Linguistic Aphasiology. (Cambridge studies in speech science and communication). Cambridge: Cambridge University Press. Caramazza, Alfonso. 1984. 
‘The logic of neuropsychological research and the problem of patient classifications in aphasia’. Brain and Language, 21(1). 9–20. Caramazza, Alfonso. 1986. ‘On drawing inferences about the structure of normal cognitive systems from the analysis of impaired performance. The case for single patient studies’. Brain and Cognition, 5(1). 41–66. Carlson, Laura; Patrick Hill. 2007. ‘Experimental methods for studying language and space’. In Monica Gonzalez-Marquez, Irene Mittelberg, Seana Coulson, & Michael Spivey (eds). Methods in Cognitive Linguistics. Amsterdam: Benjamins. 250–276. Carreiras, Manuel; Charles Clifton Jr. 2004. The On-Line Study of Sentence Comprehension: Eyetracking, Erps and Beyond. New York: Psychology Press. Casasanto, Daniel. 2017. ‘Relationships between language and cognition’. In Barbara Dancygier (ed.). Handbook of Cognitive Linguistics. Cambridge: Cambridge University Press. 19–37.
References
Chelliah, Shobhana; Willem de Reuse. 2011. Handbook of Descriptive Linguistic Fieldwork. Dordrecht: Springer. Chomsky, Noam. 1986. Knowledge of Language: Its Nature, Origin, and Use. New York: Praeger. Clark, Eve. 2009. First Language Acquisition. Cambridge: Cambridge University Press. Clark, Eve. 2016. Language in Children. London: Routledge. Clark, Herbert. 1973. ‘The language-as-fixed-effect fallacy. A critique of language statistics in psychological research’. Journal of Verbal Learning and Verbal Behavior, 12. 335–359. Cohen, Henri; Claire Lefebvre (eds). 2005. Handbook of Categorization in Cognitive Science. Amsterdam: Elsevier. Cohen, Laurent; Stanislas Dehaene; Lionel Naccache; Stéphane Lehéricy; Ghislaine DehaeneLambertz; Marie-Anne Hénaff; François Michel. 2000. ‘The visual word form area. spatial and temporal characterization of an initial stage of reading in normal subjects and posterior split-brain patients’. Brain, 123(2). 291–307. Comrie, Bernard. 1981. Language Universals and Linguistic Typology. Oxford: Blackwell. Comrie, Bernard; Lucia Golluscio (eds). 2015. Language Contact and Documentation. Contacto lingüístico y documentación. Berlin: de Gruyter. Comrie, Bernard; Martin Haspelmath; Balthasar Bickel. 2015. Leipzig Glossing Rules: Conventions for Interlinear Morpheme-by-morpheme Glosses. [–www.eva .mpg.de/lingua/resources/glossing-rules.php]. Comrie, Bernard; Norval Smith. 1977. ‘Lingua descriptive studies’. Questionnaire: Lingua, 42(1). 11–71. Cordaro, Lucian; James Ison. 1963. ‘Psychology of the scientist: X. Observer bias in classical conditioning of the planarian’. Psychological Reports, 13. 787–789. Coslett, Branch. 2016. ‘Noninvasive brain stimulation in aphasia therapy. Lessons from TMS and tDCS’. In Gergory Hickok & Steven Small (eds). 2016. Neurobiology of Language. London: Academic Press. 1035–1154. Coulmas, Florian (ed.). 1998. Handbook of Sociolinguistics. Malden, MA: Blackwell. Coupland, Nikolas; Adam Jaworski (eds). 1997. 
Sociolinguistics: A Reader and Coursebook. New York: Palgrave Macmillan. Cowart, Wayne. 1997. Experimental Syntax: Applying Objective Methods to Sentence Judgements. Thousand Oaks, CA: Sage Publications. Creswell, John. 20093. Research Design: Qualitative, Quantitative and Mixed Methods. Thousand Oaks, CA: Sage Publications. Creswell, John; Vicki Plano Clark. 2017. Designing and Conducting Mixed Methods research. Thousand Oaks, CA: Sage Publications. Croft, William. 1990. Typology and Universals. Cambridge: Cambridge University Press. Croft, William; Alan Cruse. 2004. Cognitive Linguistics. Cambridge: Cambridge University Press. Croft, William; Keith Poole. 2008. ‘Inferring universals from grammatical variation. Multidimensional scaling for typological analysis’. Theoretical Linguistics, 34(1). 1–37. Crowley, Terry; Nick Thieberger. 2007. Field Linguistics: A Beginner’s Guide. Oxford: Oxford University Press.
287
288
references
Crystal, David. 2011. Internet Linguistics: A Student Guide. London: Routledge. Cysouw, Michael. 2003. The Paradigmatic Structure of Person Marking. Oxford: Oxford University Press. Cysouw, Michael. 2005. ‘Quantitative methods in typology’. In R. Köhler, G. Altmann & R. Piotrowski (eds). Quantitative Linguistics. (HSK, 27). Berlin: de Gruyter. 554–578. Cysouw, Michael; Bernhard Wälchli. 2007. ‘Parallel texts. Using translational equivalents in linguistic typology’. STUF – Sprachtypologie und Universalienforschung, 60(2). 95–99. Cysouw, Michael; Jan Wohlgemuth. 2010. ‘The other end of universals. Theory and typology of rara’. In Michael Cysouw & Jan Wohlgemuth (eds). Rethinking Universals. How Rarities Affect Linguistic Theory. Berlin: Mouton de Gruyter. 1–9. Daase, Andrea; Beatrix, Hinrichs; Julia, Settinieri. 2014. ‘Befragung’. In Julia Settinieri, Sevilen Demirkaya, Alexis Feldmeier, Nazan Gültekin-Karakoç, & Claudia Riemer (eds). Empirische Forschungsmethoden für Deutsch als Fremd- und Zweitsprache. Eine Einführung. Paderborn: UTB. 103–122. Dąbrowska, Ewa. 2014. ‘Words that go together: Measuring individual differences in native speakers’ knowledge of collocations’. The Mental Lexicon, 9(3). 401–418. Dabrowska, Ewa; Dagmar Divjak (eds). 2015. Handbook of Cognitive Linguistics. (HSK, 39) Berlin: Mouton de Gruyter. Dancygier, Barbara (ed.). 2017. The Cambridge Handbook of Cognitive Linguistics. Cambridge: Cambridge University Press. Davies, Mark; Dee Gardner. 2013. A Frequency Dictionary of Contemporary American English: Word Sketches, Collocates and Thematic Lists. London: Routledge. Davies, Martin. 2007. Doing A Successful Research Project: Using Qualitative or Quantitative Methods. Basingstoke: Palgrave Macmillan. Dehaene, Stansilas; Laurent Cohen; Mariano Sigman; Fabien Vinckier. 2005. ‘The neural code for written words. A proposal’. Trends in Cognitive Sciences, 9(7). 335–341. Denzin, Norman. 1970. The Research Act. Chicago, IL: Aldine Transaction. 
DeWalt, Kathleen; Billie DeWalt. 20112. Participant observation: A Guide for Fieldworkers. Lanham, MD: Alta Mira Press. Diemer, Stefan. 2011. ‘Corpus linguistics with Google?’. Proceedings of the ISLE 2 Boston 2008. Diessel, Holger. 2009. ‘Corpus linguistics and first language acquisition’ In In Anke Lüdeling & Merjy Kytö (eds). Corpus Linguistics. An International Handbook (HSK, 29.2). Berlin: de Gruyter. 1197–1211. Dietrich, Rainer; Johannes Gerwien. 2017. Psycholinguistik: Eine Einführung. Stuttgart: Springer. Dirven, René; Marjolijn Verspoor (eds). 1998/20042. Cognitive Exploration of Language and Linguistics. Amsterdam: Benjamins. Divjak, Dagmar. 2008. ‘On (in)frequency and (un)acceptability’. In Barbara Lewandowska-Tomaszczyk (ed.). Corpus Linguistics, Computer Tools and Applications – State of the Art. Frankfurt: Peter Lang. 213–233. Dixon, Robert. 1980. The Languages of Australia. Cambridge: Cambridge University Press.
References
Dixon, Robert. 2007. ‘Field linguistics. A minor manual’. Sprachtypologie und Universalienforschung, 60 (1). 12–31. Dixon, Robert. 2010 & 2012. Basic Linguistic Theory, vol. 1–3. Oxford: Oxford University Press. Donders, Franciscus. 1969. ‘On the speed of mental processes’. Acta Psychologica, 30. 412–431. Doyle, Louise; Anne-Marie Brady; Gobnait Byrne. 2009. ‘An overview of mixed methods research’. Journal of Research in Nursing, 14. 175–185. Drager, Katie. 2018. Experimental Research Methods in Sociolinguistics. London: Bloomsbury. Dronkers, Nina F. 2000. ‘The pursuit of brain–language relationships’. Brain and Language, 71 (1). 59–61. Dryer, Matthew. 1989. ‘Large linguistic areas and large sampling’. Studies in Language, 13. 257–292. Dryer, Matthew. 1992. ‘The Greenbergian word order correlations’. Language, 68. 81–138. Dryer, Matthew; Martin Haspelmath (eds). 2013. The World Atlas of Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology. [http://wals.info]. Duncan, Susan; Sarah Tune; Steven Small. 2016. ‘The neurobiology of language. Relevance to linguistics’. Yearbook of the Poznan Linguistic Meeting, 2 (1). 49–66. Duranti, Alessandro. 1997. Linguistic Anthropology. Cambridge: Cambridge University Press. Duranti, Alessandro (ed.). 2001. Key Terms in Language and Culture. Malden, MA: Blackwell. Duranti, Alessandro (ed.). 2004. A Companion to Linguistic Anthropology. Malden, MA: Blackwell. Duranti, Alessandro (ed.). 20092. Linguistic Anthropology. A Reader. Malden, MA: Wiley-Blackwell. Durkheim, Émile. 2006 (1897). On Suicide. (translated by Robin Buss). London: Penguin Books Ltd. Dürr, Michael; Peter Schlobinski. 20063. Deskriptive Linguistik. Grundlangen und Methoden. Göttingen: Vandenhoeck & Ruprecht. Eckert, Penelope. 2000. Linguistic Variation as Social Practice. Malden, MA: Blackwell. Eckert, Penelope. 2014. ‘Sociolinguistics. Making quantification meaningful’. In Nick Enfield, Paul Kockelman & Jack Sidnell (eds). 
The Cambridge Handbook of Linguistic Anthropology. Cambridge: Cambridge University Press. 644–660. Eckert, Penelope; Sally McConnell-Ginet. 2003. Language and Gender. Cambridge: Cambridge University Press. Eddington, David. 2015. Statistics for Linguists: A Step-by-step Guide for Novices. Newcastle upon Tyne, UK: Cambridge Scholars Publishing. Eisenbeiss, Sonja. 2010. ‘Production methods in language acquisition research’. In Elma Blom & Sharon Unsworth (eds). Experimental Methods in Language Acquisition Research, vol. 27. Amsterdam: Benjamins. 11–34. Ellis, Nick C. 2002. ‘Frequency effects in language processing. A review with implications for theories of implicit and explicit language acquisition’. Studies in Second Language Acquisition, 24(2). 143–188.
Ellis, Nick C.; Matthew Brook O’Donnell; Ute Römer. 2014. ‘The processing of verb-argument constructions is sensitive to form, function, frequency, contingency and prototypicality’. Cognitive Linguistics, 25(1). 55–98.
Embick, David; David Poeppel. 2015. ‘Towards a computational(ist) neurobiology of language. Correlational, integrated and explanatory neurolinguistics’. Language, Cognition and Neuroscience, 30(4). 357–366.
Emmorey, Karen. 2001. Language, Cognition, and the Brain. New York: Psychology Press.
Ender, Andrea; Adrian Leemann; Bernhard Wälchli (eds). 2012. Methods in Contemporary Linguistics. Berlin: Mouton de Gruyter.
Enfield, Nick; Paul Kockelman; Jack Sidnell (eds). 2014. The Cambridge Handbook of Linguistic Anthropology. Cambridge: Cambridge University Press.
Evans, Nicholas; Stephen Levinson. 2009. ‘The myth of language universals. Language diversity and its importance for cognitive science’. Behavioral and Brain Sciences, 32. 429–448.
Evans, Vyvyan. 2007. A Glossary of Cognitive Linguistics. Edinburgh: Edinburgh University Press.
Evans, Vyvyan; Benjamin Bergen; Jörg Zinken (eds). 2007. The Cognitive Linguistics Reader. London: Equinox.
Evans, Vyvyan; Melanie Green. 2006. Cognitive Linguistics. An Introduction. Edinburgh: Edinburgh University Press.
Evans, Vyvyan; Stéphanie Pourcel (eds). 2009. New Directions in Cognitive Linguistics. Amsterdam: Benjamins.
Everett, Daniel. 2001. ‘Monolingual field research’. In Paul Newman & Martha Ratliff (eds). Linguistic Fieldwork. Cambridge: Cambridge University Press. 166–188.
Everett, Daniel. 2013. Monolingual Fieldwork Demonstration. [www.youtube.com/watch?v=sYpWp7g7XWU].
Evison, Jane. 2010. ‘What are the basics of analyzing a corpus?’ In A. O’Keeffe & M. McCarthy (eds). The Routledge Handbook of Corpus Linguistics. London: Routledge. 122–135.
Faust, Miriam. 2012. The Handbook of the Neuropsychology of Language. Malden, MA: Wiley-Blackwell.
Fedorenko, Evelina; Sharon L. Thompson-Schill. 2014.
‘Reworking the language network’. Trends in Cognitive Sciences, 18(3). 120–126.
Fernald, Anne; Renate Zangl; Ana Portillo; Virginia Marchman. 2008. ‘Looking while listening. Using eye movements to monitor spoken language comprehension by infants and young children’. In Irina Sekerina, Eva Fernández & Harald Clahsen (eds). Developmental Psycholinguistics. On-line Methods in Children’s Language Processing. Amsterdam: Benjamins. 97–135.
Fernández, Eva; Helen Smith Cairns. 2010. Fundamentals of Psycholinguistics. Malden, MA: Wiley-Blackwell.
Fernández, Eva; Helen Smith Cairns (eds). 2018. The Handbook of Psycholinguistics. Malden, MA: Wiley-Blackwell.
Field, Andy. 2018⁵. Discovering Statistics Using SPSS. Thousand Oaks, CA: Sage Publications.
Field, Andy; Jeremy Miles; Zoe Field. 2012. Discovering Statistics Using R. Thousand Oaks, CA: Sage Publications.
Fischer, Hans (ed.). 2002². Feldforschungen: Erfahrungsberichte zur Einführung. Berlin: Reimer Verlag.
Fletcher, William. 2001. ‘Concordancing the web with KWIC finder’. Third North American Symposium on Corpus Linguistics and Language Teaching. Boston, MA.
Fletcher, William. 2004. ‘Making the web more useful as a source for linguistic corpora’. In Ulla Connor & Thomas Upton (eds). Applied Corpus Linguistics: A Multidimensional Perspective. Amsterdam: Brill Rodopi. 191–205.
Foley, William. 1997. Anthropological Linguistics. Malden, MA: Blackwell.
Foley, William. 2002. ‘Field methods’. In Kirsten Malmkjær (ed.). The Linguistics Encyclopedia. 131–137.
Friederici, Angela. 2011. ‘The brain basis of language processing. From structure to function’. Physiological Reviews, 91(4). 1357–1392.
Friederici, Angela. 2012. ‘The cortical language circuit. From auditory perception to sentence comprehension’. Trends in Cognitive Sciences, 16(5). 262–268.
Friederici, Angela. 2017. Language in Our Brain: The Origins of a Uniquely Human Capacity. Cambridge, MA: MIT Press.
Friederici, Angela; Guillaume Thierry (eds). 2008. Early Language Development. Bridging Brain and Behavior. Amsterdam: Benjamins.
Friginal, Eric; Jack Hardy. 2014. Corpus-Based Sociolinguistics: A Guide for Students. London: Routledge.
García, Ofelia; Nelson Flores; Massimiliano Spotti (eds). 2016. The Oxford Handbook of Language and Society. Oxford: Oxford University Press.
Gaskell, Gareth (ed.). 2007. The Oxford Handbook of Psycholinguistics. Oxford: Oxford University Press.
Gass, Susan. 2010. ‘Experimental research’. In Brian Paltridge & Aek Phakiti (eds). Continuum Companion to Research Methods in Applied Linguistics. London: Continuum. 7–21.
Gatto, Maristella. 2014. Web as Corpus: Theory and Practice. London: Bloomsbury.
Gazzaniga, Michael; Richard Ivry; George Mangun. 2014⁴. Cognitive Neuroscience: The Biology of the Mind. New York: Norton.
Geeraerts, Dirk (ed.). 2006. Cognitive Linguistics: Basic Readings.
Berlin: Mouton de Gruyter.
Geeraerts, Dirk; Hubert Cuyckens (eds). 2007. The Oxford Handbook of Cognitive Linguistics. New York: Oxford University Press.
Gellerstam, Martin. 1992. ‘Modern Swedish text corpora’. In Jan Svartvik (ed.). Directions in Corpus Linguistics: Proceedings of Nobel Symposium 82, Stockholm, 4–8 August 1991. Berlin: de Gruyter. 149–163.
Gennari, Silvia; Maryellen MacDonald. 2009. ‘Linking production and comprehension processes. The case of relative clauses’. Cognition, 111(1). 1–23.
Gentner, Dedre; Susan Goldin-Meadow (eds). 2003. Language in Mind: Advances in the Study of Language and Thought. Cambridge, MA: MIT Press.
Gernsbacher, Morton Ann (ed.). 1994. Handbook of Psycholinguistics. San Diego, CA: Academic Press.
Geschwind, Norman. 1970. ‘The organization of language and the brain. Language disorders after brain damage help in elucidating the neural basis of verbal behavior’. Science, 170. 940–944.
Gilquin, Gaëtanelle; Stefan Gries. 2009. ‘Corpora and experimental methods. A state-of-the-art review’. Corpus Linguistics and Linguistic Theory, 5(1). 1–26.
Gippert, Jost; Nikolaus Himmelmann; Ulrike Mosel (eds). 2006. Essentials of Language Documentation. Berlin: Mouton de Gruyter.
Gleason, Henry. 1961. An Introduction to Descriptive Linguistics. London: Holt, Rinehart & Winston.
Goldberg, Adele. 1995. Constructions. A Construction Grammar Approach to Argument Structure. Chicago, IL: University of Chicago Press.
Goldrick, Matthew; Victor Ferreira; Michele Miozzo (eds). 2015. The Oxford Handbook of Language Production. Oxford: Oxford University Press.
Gonzalez-Marquez, Monica; Irene Mittelberg; Seana Coulson; Michael Spivey (eds). 2007. Methods in Cognitive Linguistics. Amsterdam: Benjamins.
Granger, Sylviane. 2008. ‘Learner corpora’. In Anke Lüdeling & Merja Kytö (eds). Corpus Linguistics: An International Handbook. (HSK, 29.1). Berlin: Mouton de Gruyter. 259–275.
Grant, Tim. 2017. Quantitative Research Methods for Linguists: A Questions and Answers Approach for Students. London: Routledge.
Greenberg, Joseph. 1963 (1966). ‘Some universals of grammar with particular reference to the order of meaningful elements’. In Joseph Greenberg (ed.). Universals of Language. Cambridge, MA: MIT Press. 73–113.
Greene, Jennifer C.; Valerie J. Caracelli; Wendy F. Graham. 1989. ‘Toward a conceptual framework for mixed-method evaluation designs’. Educational Evaluation and Policy Analysis, 11(3). 255–274.
Grein, Marion. 2007. Kommunikative Grammatik im Sprachvergleich. Sprechaktsequenz Direktiv und Ablehnung im Deutschen und Japanischen. Tübingen: Niemeyer.
Grenoble, Lenore. 2007. ‘The importance and challenges of documenting pragmatics’. In Peter Austin (ed.). Language Documentation and Description, vol. 4. London: SOAS. 145–162.
Grenoble, Lenore; Lindsay Whaley. 2006. Saving Languages: An Introduction to Language Revitalization. Cambridge: Cambridge University Press.
Grenoble, Lenore; Louanna Furbee (eds). 2010. Language Documentation. Practice and Values. Amsterdam: Benjamins.
Gries, Stefan. 2009. Quantitative Corpus Linguistics with R: A Practical Introduction. London: Routledge.
Gries, Stefan. 2012. ‘Corpus linguistics, theoretical linguistics, and cognitive/psycholinguistics. Towards more and more fruitful exchanges’. In Joybrato Mukherjee & Magnus Huber (eds). Corpus Linguistics and Variation in English: Theory and Description. Amsterdam: Brill Rodopi. 41–63.
Gries, Stefan. 2013²a. Statistics for Linguists with R: A Practical Introduction. Berlin: Mouton de Gruyter.
Gries, Stefan. 2013b. ‘Corpus linguistics. Quantitative methods’. In Carol Chapelle (ed.). The Encyclopedia of Applied Linguistics. Malden, MA: Wiley-Blackwell. 1380–1385.
Gries, Stefan; Anatol Stefanowitsch. 2004a. ‘Extending collostructional analysis. A corpus-based perspective on “alternations”’. International Journal of Corpus Linguistics, 9(1). 97–129.
Gries, Stefan; Anatol Stefanowitsch. 2004b. ‘Co-varying collexemes in the into-causative’. In Michel Achard & Suzanne Kemmer (eds). Language, Culture, and Mind. Stanford, CA: CSLI. 225–236.
Griffin, Zenzi. 2004. ‘Why look? Reasons for eye movements related to language production’. In John Henderson & Fernanda Ferreira (eds). The Integration of Language, Vision, and Action: Eye Movements and the Visual World. New York: Taylor and Francis. 213–247.
Griffin, Zenzi; Kathryn Bock. 2000. ‘What the eyes say about speaking’. Psychological Science, 11. 274–279.
Grimes, Barbara (ed.). 1997. Ethnologue: Language Family Index. Dallas: Summer Institute of Linguistics. (see Simons & Fennig 2017²⁰ for online reference)
de Groot, Annette; Peter Hagoort (eds). 2018. Research Methods in Psycholinguistics and the Neurobiology of Language: A Practical Guide. Hoboken, NJ: John Wiley & Sons.
Gumperz, John; Marco Jacquemet. 2006. New Ethnographies of Communication. Malden, MA: Wiley-Blackwell.
Gumperz, John; Stephen Levinson (eds). 1996. Rethinking Linguistic Relativity. New York: Cambridge University Press.
Hagoort, Peter. 2019a. ‘The neurobiology of language beyond single-word processing’. Science, 366(6461). 55–58.
Hagoort, Peter (ed.). 2019b. Human Language: From Genes and Brain to Behavior. Cambridge, MA: MIT Press.
Hagoort, Peter; Colin Brown; Jolanda Groothusen. 1993. ‘The syntactic positive shift (SPS) as an ERP measure of syntactic processing’. Language and Cognitive Processes, 8(4). 439–483.
Haig, Geoffrey; Nicole Nau; Stefan Schnell; Claudia Wegener (eds). 2011. Documenting Endangered Languages. Berlin: Mouton de Gruyter.
Haig, Geoffrey; Stefan Schnell. 2011. Annotations Using GRAID. [www.isfas.uni-kiel.de/de/linguistik/forschung/uploads/graid-content/graid-manual-6.0].
Haiman, John. 1983. ‘Iconic and economic motivation’. Language, 59. 781–819.
Hamilton, Liberty; Alexander Huth. 2018. ‘The revolution will not be controlled. Natural stimuli in speech neuroscience’.
Language, Cognition and Neuroscience. 1–10.
Hanneman, Robert; Mark Riddle. 2005. Introduction to Social Network Methods. Riverside, CA: University of California.
Harrison, David; Greg Anderson. 2008. The Linguists. Garrison, NY: Ironbound Films.
Hashemi, Mohammad; Esmat Babaii. 2013. ‘Mixed methods research. Toward new research designs in applied linguistics’. The Modern Language Journal, 97 (4). 828–852.
Haspelmath, Martin. 2003. ‘The geometry of grammatical meaning. Semantic maps and crosslinguistic comparison’. In Michael Tomasello (ed.). The New Psychology of Language, vol. 2. New York: Erlbaum. 211–243.
Haspelmath, Martin; Ekkehard König; Wulf Oesterreicher; Wolfgang Raible (eds). 2001. Language Typology and Language Universals. An International Handbook. (HSK, 20.1–2). Berlin: de Gruyter.
Hasson, Uri; Giovanna Egidi; Marco Marelli; Roel M. Willems. 2018. ‘Grounding the neurobiology of language in first principles. The necessity of non-language-centric explanations for language comprehension’. Cognition, 180. 135–157.
Haviland, John. 1979. ‘Guugu-Yimidhirr brother-in-law language’. Language in Society, 8. 365–393.
Hawkins, John. 1983. Word Order Universals. New York: Academic Press.
Hawkins, John. 1999. ‘Processing complexity and filler-gap dependencies across grammars’. Language, 75 (2). 244–285.
Hawkins, John. 2011. ‘Processing efficiency and complexity in typological patterns’. In Jae Jung Song (ed.). The Oxford Handbook of Linguistic Typology. Oxford: Oxford University Press. 206–226.
Heine, Bernd; Heiko Narrog (eds). 2010. The Oxford Handbook of Linguistic Analysis. Oxford: Oxford University Press.
Heller, Monica; Sari Pietikäinen; Joan Pujolar. 2017. Critical Sociolinguistic Research Methods. London: Routledge.
Hellwig, Birgit. 2019. ‘Linguistic diversity, language documentation and psycholinguistics. The role of stimuli’. Language Documentation and Conservation, 16. 5–30.
Henderson, John; Fernanda Ferreira (eds). 2004. The Interface of Language, Vision, and Action. Eye Movements and the Visual World. New York: Psychology Press.
Henrich, Joseph; Steven Heine; Ara Norenzayan. 2010. ‘The Weirdest People in the World?’. Behavioral and Brain Sciences, 33 (2–3). 61–83.
Hernández-Campoy, Juan. 2014. ‘Research methods in sociolinguistics’. AILA Review, 27. 5–29.
Hernández-Campoy, Juan; Camilo Conde-Silvestre (eds). 2012. The Handbook of Historical Sociolinguistics. Malden, MA: Wiley-Blackwell.
Hickey, Raymond (ed.). 2017. The Cambridge Handbook of Areal Linguistics. Cambridge: Cambridge University Press.
Hickok, Gregory. 2009. ‘The functional neuroanatomy of language’. Physics of Life Reviews, 6(3). 121–143.
Hickok, Gregory; David Poeppel. 2007. ‘The cortical organization of speech processing’. Nature Reviews Neuroscience, 8(5). 393–402.
Hickok, Gregory; Steven Small (eds). 2016. Neurobiology of Language. London: Academic Press.
Himmelmann, Nikolaus. 1998. ‘Documentary and descriptive linguistics’. Linguistics, 36. 161–195.
Himmelmann, Nikolaus. 2006.
‘The challenges of segmenting spoken language’. In Jost Gippert, Nikolaus Himmelmann & Ulrike Mosel (eds). Essentials of Language Documentation. Berlin: Mouton de Gruyter. 253–274.
Himmelmann, Nikolaus. 2012. ‘Linguistic data types and the interface between language documentation and description’. Language Documentation & Conservation, 6. 187–207.
Hoff, Erika (ed.). 2012. Research Methods in Child Language: A Practical Guide. Malden, MA: Wiley-Blackwell.
Höhle, Barbara (ed.). 2010/2012². Psycholinguistik. Berlin: Akademie Verlag.
Holmes, Janet. 2013⁴. An Introduction to Sociolinguistics. London: Routledge.
Holmes, Janet; Kirk Hazen (eds). 2014. Research Methods in Sociolinguistics: A Practical Guide. Malden, MA: Wiley-Blackwell.
Hopper, Paul; Sandra Thompson. 1993. ‘Language universals, discourse pragmatics, and semantics’. Language Sciences, 15 (4). 357–376.
Hudson, Richard. 1996². Sociolinguistics. Cambridge: Cambridge University Press.
Humboldt, Wilhelm. 1836. Über die Verschiedenheit des menschlichen Sprachbaues und ihren Einfluss auf die geistige Entwickelung des Menschengeschlechts. Berlin: Dümmler.
Hundt, Marianne; Nadja Nesselhauf; Carolin Biewer (eds). 2007. Corpus Linguistics and the Web. (Language and Computers, 59). Amsterdam: Brill Rodopi.
Hunston, Susan. 2002. Corpora in Applied Linguistics. Cambridge: Cambridge University Press.
Hunston, Susan. 2008. ‘Collection strategies and design decisions’. In Anke Lüdeling & Merja Kytö (eds). Corpus Linguistics: An International Handbook. (HSK, 29.1). Berlin: Mouton de Gruyter. 154–168.
Hussy, Walter; Margit Schreier; Gerald Echterhoff. 2013. Forschungsmethoden in Psychologie und Sozialwissenschaften für Bachelor. Berlin: Springer.
Ingram, David. 1989. First Language Acquisition. Method, Description and Explanation. Cambridge: Cambridge University Press.
Ingram, John. 2007. Neurolinguistics. An Introduction to Spoken Language Processing and Its Disorders. New York: Cambridge University Press.
Ivankova, Nataliya; John Creswell. 2009. ‘Mixed methods’. In Juanita Heigham & Robert Croker (eds). Qualitative Research in Applied Linguistics: A Practical Introduction. Houndmill: Palgrave Macmillan. 135–163.
Jegerski, Jill. 2014. ‘Self-paced reading’. In Jill Jegerski & Bill VanPatten (eds). Research Methods in Second Language Psycholinguistics. London: Routledge. 20–49.
Jegerski, Jill; Bill VanPatten (eds). 2014. Research Methods in Second Language Psycholinguistics. London: Routledge.
Johnson, Elizabeth; Tania Zamuner. 2010. ‘Using infant and toddler testing methods in language acquisition research’. In Elma Blom & Sharon Unsworth (eds). Experimental Methods in Language Acquisition Research, vol. 27. Amsterdam: Benjamins. 73–94.
Johnson, Keith. 2008. Quantitative Methods in Linguistics. Malden, MA: Wiley-Blackwell.
Johnson, Mark; Michelle de Haan. 2015⁴.
Developmental Cognitive Neuroscience: An Introduction. London: John Wiley & Sons.
Johnstone, Barbara. 1999. Qualitative Methods in Sociolinguistics. Oxford: Oxford University Press.
Jones, Mari. 2019. Endangered Languages and New Technologies. Cambridge: Cambridge University Press.
Jones, Mari; Sarah Ogilvie (eds). 2013. Keeping Languages Alive: Documentation, Pedagogy and Revitalization. Cambridge: Cambridge University Press.
Kastenholz, Raimund. 2002. ‘Die monographische Feldforschung’. In Anne Storch & Rudolf Leger (eds). Die afrikanistische Feldforschung. Köln: Rüdiger Köppe Verlag. 57–75.
Kay, Paul; Chad McDaniel. 1978. ‘The linguistic significance of the meanings of basic color terms’. Language, 54. 610–646.
Keating, Gregory; Jill Jegerski. 2015. ‘Experimental designs in sentence processing research. A methodological review and user’s guide’. Studies in Second Language Acquisition, 37 (1). 1–32.
Keck, Verena. 2008. ‘Interdisziplinäre Projekte und Teamarbeit’. In Bettina Beer (ed.). Methoden ethnologischer Feldforschung. Berlin: Reimer Verlag. 255–275.
Kemmerer, David. 2015. Cognitive Neuroscience of Language. New York: Psychology Press.
Kennedy, Graeme. 1998. An Introduction to Corpus Linguistics. London: Longman.
Keppel, Geoffrey. 1991³. Design and Analysis: A Researcher’s Handbook. Englewood Cliffs: Prentice Hall.
Kerlinger, Fred N. 1986³. Foundations of Behavioral Research. New York: Holt, Rinehart and Winston.
Kilgarriff, Adam. 2001a. ‘Comparing corpora’. International Journal of Corpus Linguistics, 6 (1). 1–37.
Kilgarriff, Adam. 2001b. ‘Web as corpus’. Proceedings of the Corpus Linguistics Conference (CL 2001), University Centre for Computer Research on Language Technical Paper, vol. 13. Lancaster University. 342–344.
Kilgarriff, Adam; Gregory Grefenstette. 2003. ‘Introduction to the special issue on the web as corpus’. Computational Linguistics, 29 (3). 333–347.
Kirk, Roger E. 2003. ‘Experimental design’. In John Schinka & Wayne Velicer (eds). Handbook of Psychology, vol. 2 (Research methods in psychology). Hoboken, NJ: John Wiley & Sons. 3–32.
Kittredge, Audrey; Gary Dell. 2016. ‘Learning to speak by listening. Transfer of phonotactics from perception to production’. Journal of Memory and Language, 89. 8–22.
Klamer, Marian; Francesca Moro. 2020. ‘What is “natural” speech? Comparing free narratives and Frog stories in Indonesia’. Language Documentation and Conservation, 14. 238–313.
Knecht, Stefan; B. Dräger; M. Deppe; L. Bobe; H. Lohmann; A. Flöel; B. Ringelstein; H. Henningsen. 2000. ‘Handedness and hemispheric language dominance in healthy humans’. Brain, 123 (12). 2512–2518.
Köhler, Reinhard; Gabriel Altmann; Rajmund Piotrowski (eds). 2005. Quantitative Linguistik. (HSK, 27). Berlin: de Gruyter.
Kövecses, Zoltán. 2006. Language, Mind, and Culture: A Practical Introduction. Oxford: Oxford University Press.
Kövecses, Zoltán. 2010. Metaphor: A Practical Introduction. Oxford: Oxford University Press.
Krakauer, John; Asif Ghazanfar; Alex Gomez-Marin; Malcolm MacIver; David Poeppel. 2017.
‘Neuroscience needs behavior. Correcting a reductionist bias’. Neuron, 93 (3). 480–490.
Kretzschmar, Franziska; Matthias Schlesewsky; Adrian Staub. 2015. ‘Dissociating word frequency and predictability effects in reading. Evidence from coregistration of eye movements and EEG’. Journal of Experimental Psychology: Learning, Memory, and Cognition, 41(6). 1648–1662.
Kretzschmar, Franziska; Svenja Völkel. 2021. Teaching Materials for Introducing Linguistic Research. [https://doi.org/10.14618/ids-pub-10454].
Kretzschmar, Franziska; Phillip Alday. Submitted. ‘Principles of statistical analyses. Old and new tools’. In Mirko Grimaldi, Yuri Shtyrov & Elvira Brattico (eds). Language Electrified: Techniques, Methods, Applications and Future Perspectives in the Neurophysiological Investigation of Language. Berlin: Springer.
Kristiansen, Gitte; Michel Achard; René Dirven; Francisco Ruiz de Mendoza Ibáñez (eds). 2006. Cognitive Linguistics: Current Applications and Future Perspectives. Berlin: Mouton de Gruyter.
Kristiansen, Gitte; René Dirven (eds). 2007. Cognitive Sociolinguistics. Berlin: Mouton de Gruyter.
Krug, Manfred; Julia Schlüter (eds). 2013. Research Methods in Language Variation and Change. Cambridge: Cambridge University Press.
Krug, Manfred; Julia Schlüter; Annette Rosenbach. 2013. ‘Introduction. Investigating language variation and change’. In Manfred Krug & Julia Schlüter (eds). Research Methods in Language Variation and Change. Cambridge: Cambridge University Press. 1–13.
Kurath, Hans. 1949. Western Pennsylvania: A Word Geography of the Eastern United States. Ann Arbor, MI: University of Michigan Press.
Kutas, Marta; Kara D. Federmeier. 2011. ‘Thirty years and counting: finding meaning in the N400 component of the event-related brain potential (ERP)’. Annual Review of Psychology, 62. 621–647.
Kutas, Marta; Steven Hillyard. 1980. ‘Reading senseless sentences. Brain potentials reflect semantic incongruity’. Science, 207(4427). 203–205.
Labov, William. 1972. Sociolinguistic Patterns. Philadelphia: University of Pennsylvania Press.
Labov, William; Sharon Ash; Charles Boberg. 2006. The Atlas of North American English: Phonetics, Phonology and Sound Change. Berlin: Mouton de Gruyter.
Lahaussois, Aimée; Marine Vuillermet. 2019. Methodological Tools for Linguistic Description and Typology. (Language Documentation and Conservation, 16). Honolulu, HI: University of Hawai‘i Press.
Lakoff, George. 1987. Women, Fire, and Dangerous Things: What Categories Reveal about the Mind. Chicago, IL: University of Chicago Press.
Lakoff, George; Mark Johnson. 1980. Metaphors We Live By. Chicago, IL: University of Chicago Press.
Lang, Kurt; Gladys Engel Lang. 1953. ‘The unique perspective of television and its effect’. American Sociological Review, 18(1). 3–12.
Lang, Kurt; Gladys Engel Lang. 1973. ‘MacArthur Day in Chicago. Die Einseitigkeit des Fernsehens und ihre Wirkungen’. In Jörg Aufermann, Hans Bohrmann & Rolf Sülzer (eds). Gesellschaftliche Kommunikation und Information.
Frankfurt: Fischer. 498–525.
Langacker, Ronald. 2008. Cognitive Grammar: A Basic Introduction. Oxford: Oxford University Press.
Leckey, Michelle; Kara D. Federmeier. 2020. ‘The P3b and P600(s). Positive contributions to language comprehension’. Psychophysiology, 57(7). e13351.
Lee, David. 2001. Cognitive Linguistics: An Introduction. Oxford: Oxford University Press.
Lee, David. 2010. ‘What corpora are available?’ In Anne O’Keeffe & Michael McCarthy (eds). The Routledge Handbook of Corpus Linguistics. London: Routledge. 107–121.
Leech, Geoffrey. 1992. ‘Corpora and theories of linguistic performance’. In Jan Svartvik (ed.). Directions in Corpus Linguistics. Proceedings of Nobel Symposium 82, Stockholm, 4–8 August 1991. Berlin: de Gruyter. 105–122.
Leech, Geoffrey. 2007. ‘New resources or just better old ones? The Holy Grail of representativeness’. In Marianne Hundt, Nadja Nesselhauf & Carolin
Biewer (eds). Corpus Linguistics and the Web (Language and Computers, 59). Amsterdam: Brill Rodopi. 133–149.
Lehmann, Christian. 1980. ‘Aufbau einer Grammatik zwischen Sprachtypologie und Universalistik’. In Gunter Brettschneider & Christian Lehmann (eds). Wege zur Universalienforschung. Tübingen: Narr. 29–37.
Lehmann, Christian. 1982. ‘Directions for interlinear morphemic translations’. Folia Linguistica, 16. 199–224.
Lehmann, Christian. 1999. Documentation of Endangered Languages: A Priority Task for Linguistics. (Arbeitspapier, 1). Erfurt: Institut für Sprachwissenschaft.
Lehmann, Christian. 2001. ‘Language documentation. A program’. In Walter Bisang (ed.). Aspects of Typology and Universals. Berlin: Akademie Verlag. 83–97.
Lehmann, Christian. 2005. ‘Interlinear morphemic glossing’. In Geert Booij, Christian Lehmann, Joachim Mugdan & Stavros Skopeteas (eds). Morphology: An International Handbook on Inflection and Word Formation. (HSK, 17.2). Berlin: Mouton de Gruyter. 1834–1857.
Lemnitzer, Lothar; Heike Zinsmeister. 2015³. Korpuslinguistik. Eine Einführung. Tübingen: Narr.
Levelt, Willem. 2014. A History of Psycholinguistics. The pre-Chomskyan Era. Oxford: Oxford University Press.
Levinson, Stephen. 1983. Pragmatics. Cambridge: Cambridge University Press.
Levinson, Stephen. 1996. ‘Frames of reference and Molyneux’s question. Crosslinguistic evidence’. In Paul Bloom, Merrill Garrett, Lynn Nadel & Mary Peterson (eds). Language and Space. Cambridge, MA: MIT Press. 109–169.
Levshina, Natalia. 2015. How to do Linguistics with R. Data Exploration and Statistical Analysis. Amsterdam: Benjamins.
Lew, Robert. 2009. ‘The web as corpus versus traditional corpora’. In Paul Baker (ed.). Contemporary Corpus Linguistics. London: Continuum. 289–300.
Lieb, Hans-Heinrich; Sebastian Drude. 2000. Advanced Glossing: A Language Documentation Format. (DOBES working paper). [http://dobes.mpi.nl/documents/Advanced-Glossing1.pdf].
Lindlof, Thomas; Bryan Taylor. 2011³.
Qualitative Communication Research Methods. Thousand Oaks, CA: Sage Publications.
Litosseliti, Lia (ed.). 2010. Research Methods in Linguistics. London: Continuum.
Littlemore, Jeanette; John Taylor (eds). 2014. The Bloomsbury Companion to Cognitive Linguistics. London: Bloomsbury.
Lloyd-Fox, Sarah; Anna Blasi; C. E. Elwell. 2010. ‘Illuminating the developing brain. The past, present and future of functional near infrared spectroscopy’. Neuroscience & Biobehavioral Reviews, 34 (3). 269–284.
Louw, Bill. 1993. ‘Irony in the text or insincerity in the writer? The diagnostic potential of semantic prosodies’. In Mona Baker, Gill Francis & Elena Tognini-Bonelli (eds). Text and Technology: In Honour of John Sinclair. Amsterdam: Benjamins. 157–176.
Luck, Steven J. 2014². An Introduction to the Event-Related Potential Technique. Cambridge, MA: MIT Press.
Luck, Steven J.; Emily S. Kappenman (eds). 2012. The Oxford Handbook of Event-Related Potential Components. Oxford: Oxford University Press.
Lüdeling, Anke. 2007. ‘Das Zusammenspiel von qualitativen und quantitativen Methoden in der Korpuslinguistik’. In Gisela Zifonun & Werner Kallmeyer (eds). IDS-Jahrbuch 2006. Berlin: de Gruyter. 28–48.
Lüdeling, Anke. 2017. ‘Grammatische Variation. Empirische Zugänge und theoretische Modellierung’. In Marek Konopka & Angelika Wöllstein (eds). Jahrbuch des Instituts für Deutsche Sprache 2016. Berlin: de Gruyter. 129–144.
Lüdeling, Anke; Merja Kytö (eds). 2008. Corpus Linguistics: An International Handbook. (HSK, 29.1–2). Berlin: Mouton de Gruyter.
Lüpke, Frederike. 2005. ‘Small is beautiful. Contributions of field-based corpora to different linguistic disciplines, illustrated by Jalonke’. In Peter Austin (ed.). Language Documentation and Description, vol. 3. London: SOAS. 75–105.
MacDonald, Maryellen C.; Neil J. Pearlmutter; Mark S. Seidenberg. 1994. ‘The lexical nature of syntactic ambiguity resolution’. Psychological Review, 101(4). 676–703.
MacWhinney, Brian; Andrej Malchukov; Edith Moravcsik (eds). 2014. Competing Motivations in Grammar, Acquisition, and Usage. Oxford: Oxford University Press.
Maddieson, Ian. 2001. ‘Phonetic fieldwork’. In Paul Newman & Martha Ratliff (eds). Linguistic Fieldwork. Cambridge: Cambridge University Press. 211–229.
Mairal, Ricardo; Juana Gil (eds). 2006. Linguistic Universals. Cambridge: Cambridge University Press.
Majid, Asifa. 2012. ‘A guide to stimulus-based elicitation for semantic categories’. In Nicholas Thieberger (ed.). The Oxford Handbook of Linguistic Fieldwork. Oxford: Oxford University Press. 54–71.
Malinowski, Bronislaw. 1954. Magic, Science and Religion and Other Essays. New York: Doubleday.
Mallinson, Graham; Barry Blake (eds). 1981. Language Typology. Amsterdam: North Holland.
Malmkjær, Kirsten (ed.). 2002². The Linguistics Encyclopedia. London: Routledge.
Manning, Christopher; Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press.
Martin, Andrea; Brian McElree. 2008.
‘A content-addressable pointer mechanism underlies comprehension of verb-phrase ellipsis’. Journal of Memory and Language, 58. 879–906.
Mayer, Mercer. 1969. Frog, Where Are You? New York: Dial Press.
McConnell, Kyla; Alice Blumenthal-Dramé. 2019. ‘Effects of task and corpus-derived association scores on the online processing of collocations’. Corpus Linguistics and Linguistic Theory (ahead of print). [https://doi.org/10.1515/cllt-2018-0030].
McEnery, Tony; Andrew Hardie. 2012. Corpus Linguistics: Method, Theory and Practice. Cambridge: Cambridge University Press.
McEnery, Tony; Andrew Wilson. 1996/2001². Corpus Linguistics: An Introduction. Edinburgh: Edinburgh University Press.
McEnery, Tony; Richard Xiao; Yukio Tono. 2006. Corpus-Based Language Studies: An Advanced Resource Book. London: Routledge.
Meindl, Claudia. 2011. Methodik für Linguisten: Eine Einführung in Statistik und Versuchsplanung. Tübingen: Narr.
Merrison, Andrew; Aileen Bloomer; Patrick Griffiths; Christopher Hall. 2014². Introducing Language in Use: A Coursebook. London: Routledge.
Mesthrie, Rajend (ed.). 2011. The Cambridge Handbook of Sociolinguistics. Cambridge: Cambridge University Press.
Meyer, Charles. 2002. English Corpus Linguistics: An Introduction. Cambridge: Cambridge University Press.
Meyerhoff, Miriam. 2006. Introducing Sociolinguistics. London: Routledge.
Meyerhoff, Miriam; Erik Schleef (eds). 2010. The Routledge Sociolinguistics Reader. London: Routledge.
Miestamo, Matti; Dik Bakker; Antti Arppe. 2016. ‘Sampling for variety’. Linguistic Typology, 20 (2). 233–296.
Milroy, James. 1992. Linguistic Variation and Change. Oxford: Blackwell.
Milroy, James; Lesley Milroy. 1985. ‘Linguistic change, social network and speaker innovation’. Journal of Linguistics, 21. 339–384.
Milroy, Lesley. 1987². Language and Social Networks. Oxford: Blackwell.
Milroy, Lesley; Matthew Gordon. 2003². Sociolinguistics: Method and Interpretation. Malden, MA: Blackwell.
Miner, Horace. 1956. ‘Body ritual among the Nacirema’. American Anthropologist, 58. 503–507.
Monaghan, Padraic; Caroline Rowland. 2016. ‘Combining language corpora with experimental and computational approaches for language acquisition research’. Language Learning, 67(S1). 14–39.
Moosbrugger, Helfried; Augustin Kelava. 2012². Testtheorie und Fragebogenkonstruktion. Berlin: Springer.
Moravcsik, Edith. 2013. Introducing Language Typology. Cambridge: Cambridge University Press.
Morgan, Marcyliena. 2014. Speech Communities. Cambridge: Cambridge University Press.
Mosel, Ulrike. 1987. Inhalt und Aufbau deskriptiver Grammatiken (How to write a grammar). (Arbeitspapier, 4). Köln: Institut für Sprachwissenschaft.
Mosel, Ulrike. 2006. ‘Grammaticography. The art and craft of writing grammars’. In Felix Ameka, Alan Dench & Nicholas Evans (eds). Catching Language: The Standing Challenge of Grammar Writing. Berlin: Mouton de Gruyter. 41–68.
Mosel, Ulrike. 2014.
‘Corpus linguistic and documentary approaches in writing a grammar of a previously undescribed language’. Language Documentation & Conservation, 8. 135–157.
Moseley, Christopher. 2010. UNESCO Atlas of the World’s Languages in Danger. [www.unesco.org/culture/en/endangeredlanguages/atlas].
Müller, Horst. 2013. Psycholinguistik–Neurolinguistik. Paderborn: UTB.
Müller, Nicole; Martin Ball (eds). 2013. Research Methods in Clinical Linguistics and Phonetics: A Practical Guide. Malden, MA: Wiley-Blackwell.
Myers, Jerome; Arnold Well; Robert Lorch Jr. 2010. Research Design and Statistical Analysis. London: Routledge.
Nelson, Mike. 2010. ‘Building a Written Corpus. What Are the Basics?’ In A. O’Keeffe & M. McCarthy (eds). The Routledge Handbook of Corpus Linguistics. London: Routledge. 53–65.
Newman, Paul; Martha Ratliff (eds). 2001. Linguistic Fieldwork. Cambridge: Cambridge University Press.
Newmeyer, Frederick. 2005. Possible and Probable Languages: A Generative Perspective on Linguistic Typology. Oxford: Oxford University Press. Noonan, Michael. 2006. ‘Grammar writing for a grammar-reading audience’. Studies in Language, 30 (2). 351–365. Nooteboom, Sieb; Hugo Quené. 2007. ‘The SLIP technique as a window on the mental preparation of speech. Some methodological considerations’. In Maria-Josep Solé, Patrice Beddor & Manjari Ohala (eds). Experimental Approaches to Phonology. Oxford: Oxford University Press. 339–350. Norcliffe, Elisabeth; Alice Harris; Florian Jaeger. 2015. ‘Cross-linguistic psycholinguistics and its critical role in theory development. Early beginnings and recent advances’. Language, Cognition and Neuroscience, 30 (9). 1009–1032. Norcliffe, Elisabeth; Agnieszka Konopka; Penelope Brown; Stephen Levinson. 2015. ‘Word order affects the time course of sentence formulation in Tzeltal’. Language, Cognition and Neuroscience, 30 (9). 1187–1208. Oakes, Lisa. 2012. ‘Advances in eye tracking in infancy research’. Infancy, 17 (1). 1–8. O’Keeffe, Anne; Michael McCarthy. 2010. The Routledge Handbook of Corpus Linguistics. London: Routledge. O’Keeffe, Anne; Michael McCarthy; Ronald Carter. 2007. From Corpus to Classroom. Language Use and Language Teaching. Cambridge: Cambridge University Press. Orfanidou, Eleni; Bencie Woll; Gary Morgan (eds). 2015. Research Methods in Sign Language Studies: A Practical Guide. Malden, MA: Wiley-Blackwell. Ostler, Nicholas. 2008. ‘Corpora of less studied languages’. In Anke Lüdeling & Merja Kytö (eds). Corpus Linguistics: An International Handbook. (HSK, 29.1). Berlin: de Gruyter. 457–483. Östman, Jan-Ola. 1989. ‘Testing iconicity’. Belgian Journal of Linguistics, 4. 145–163. Palmer, Gary. 1996. Towards a Theory of Cultural Linguistics. Austin, TX: University of Texas Press. Paltridge, Brian; Aek Phakiti (eds). 2010. Continuum Companion to Research Methods in Applied Linguistics. London: Continuum.
Paulston, Christina; Richard Tucker (eds). 2003. Sociolinguistics: The Essential Readings. Malden, MA: Blackwell. Payne, Thomas. 1997. Describing Morphosyntax. A Guide for Field Linguists. Cambridge: Cambridge University Press. Payne, Thomas; David Weber (eds). 2007. Perspectives on Grammar Writing. Amsterdam: Benjamins. Pederson, Eric; Eve Danziger; David Wilkins; Stephen Levinson; Sotaro Kita; Gunter Senft. 1998. ‘Semantic typology and spatial conceptualization’. Language, 74. 557–589. Pereltsvaig, Asya. 2012. Languages of the World. An Introduction. Cambridge: Cambridge University Press. Perkins, Revere. 1989. ‘Statistical techniques for determining language sample size’. Studies in Language, 13 (2). 293–315. Pickering, Martin; Victor Ferreira. 2008. ‘Structural priming. A critical review’. Psychological Bulletin, 134 (3). 427–459. Pickering, Martin; Simon Garrod. 2013. ‘An integrated theory of language production and comprehension’. Behavioral and Brain Sciences, 36. 329–392.
Pike, Kenneth. 1967². Language in Relation to a Unified Theory of Structure of Human Behavior. Den Haag: Mouton. Plank, Frans. 1995. ‘(Re-)Introducing Suffixaufnahme’. In Frans Plank (ed.). Double Case: Agreement by Suffixaufnahme. New York: Oxford University Press. 3–110. Podesva, Robert; Devyani Sharma (eds). 2013. Research Methods in Linguistics. Cambridge: Cambridge University Press. Poeppel, David. 2014. ‘The neuroanatomic and neurophysiological infrastructure for speech and language’. Current Opinion in Neurobiology, 28. 142–149. Poeppel, David; George R. Mangun; Michael S. Gazzaniga. 2020⁶. The Cognitive Neurosciences. Cambridge, MA: MIT Press. Popper, Karl. 1963. Conjectures and Refutations: The Growth of Scientific Knowledge. London: Routledge & Kegan Paul. Popper, Karl. 1973. Objektive Erkenntnis: Ein evolutionärer Entwurf. Hamburg: Hoffmann und Campe. Porst, Rolf. 2014⁴. Fragebogen. Ein Arbeitsbuch. Berlin: Springer. von Poser, Alexis; Anita von Poser (eds). 2017. Facets of Fieldwork: Essays in Honor of Jürg Wassmann. Heidelberg: Winter. Price, Cathy. 2012. ‘A review and synthesis of the first 20 years of PET and fMRI studies of heard speech, spoken language and reading’. NeuroImage, 62 (2). 816–847. Price, Cathy; Joseph Devlin. 2003. ‘The myth of the visual word form area’. NeuroImage, 19(3). 473–481. Price, Cathy J.; Joseph T. Devlin. 2011. ‘The interactive account of ventral occipitotemporal contributions to reading’. Trends in Cognitive Sciences, 15(6). 246–253. Puglielli, Annarita; Mara Frascarelli. 2011. Linguistic Analysis. From Data to Theory. Berlin: Mouton de Gruyter. Pütz, Martin; Marjolijn Verspoor (eds). 2000. Explorations in Linguistic Relativity. Amsterdam: Benjamins. Rad, Mostafa Salari; Alison Martingano; Jeremy Ginges. 2018. ‘Toward a psychology of Homo Sapiens. Making psychological science more representative of the human population’. Proceedings of the National Academy of Sciences, 115 (45). 11401–11405. Ramat, Paolo. 1987.
Linguistic Typology. Berlin: Mouton de Gruyter. Rasinger, Sebastian. 2013². Quantitative Research in Linguistics: An Introduction. London: Bloomsbury. Rayner, Keith; Alexander Pollatsek; Jane Ashby; Charles Clifton Jr. 2012². The Psychology of Reading. New York: Psychology Press. Rayson, Paul. 2015. ‘Computational tools and methods for corpus compilation and analysis’. In Douglas Biber & Randi Reppen (eds). The Cambridge Handbook of English Corpus Linguistics. Cambridge: Cambridge University Press. 32–49. Rea, Louis; Richard Parker. 2014⁴. Designing and Conducting Survey Research. A Comprehensive Guide. San Francisco, CA: Jossey-Bass. Redford, Melissa (ed.). 2015. The Handbook of Speech Production. Malden, MA: Wiley-Blackwell. Reppen, Randi. 2010. ‘Building a corpus. What are the key considerations?’ In A. O’Keeffe & M. McCarthy (eds). The Routledge Handbook of Corpus Linguistics. London: Routledge. 31–37.
Ricart Brede, Julia. 2014. ‘Beobachtung’. In Julia Settinieri, Sevilen Demirkaya, Alexis Feldmeier, Nazan Gültekin-Karakoç & Claudia Riemer (eds). Empirische Forschungsmethoden für Deutsch als Fremd- und Zweitsprache: Eine Einführung. Paderborn: Schöningh. 59–76. Rickheit, Gert; Theo Herrmann; Werner Deutsch. 2003. Psycholinguistik. (HSK, 24). Berlin: Mouton de Gruyter. Rijkhoff, Jan; Dik Bakker. 1998. ‘Language sampling’. Linguistic Typology, 2. 263–314. Rijkhoff, Jan; Dik Bakker; Kees Hengeveld; Peter Kahrel. 1993. ‘A method of language sampling’. Studies in Language, 17 (1). 169–203. Roberts, Leah; Jorge González Alonso; Christos Pliatsikas; Jason Rothman. 2018. ‘Evidence from neurolinguistic methodologies. Can it actually inform linguistic/language acquisition theories and translate to evidence-based applications?’. Second Language Research, 34(1). 125–143. Robinson, Clinton; Karl Gadelii. 2003. Writing Unwritten Languages: A Guide to the Process. UNESCO Working Paper. Robinson, Peter; Nick Ellis (eds). 2008. Handbook of Cognitive Linguistics and Second Language Acquisition. London: Routledge. Romaine, Suzanne. 2000². Language in Society: An Introduction to Sociolinguistics. Oxford: Oxford University Press. Rösler, Frank. 2011. Psychophysiologie der Kognition. Eine Einführung in die Kognitive Neurowissenschaft. Heidelberg: Springer/Spektrum Akademischer Verlag. Rost, Detlef. 2007². Interpretation und Bewertung pädagogisch-psychologischer Studien. Weinheim: Beltz UTB. Rugg, Michael. 1999. ‘Functional neuroimaging in cognitive neuroscience’. In Colin Brown & Peter Hagoort (eds). 1999. The Neurocognition of Language. Oxford: Oxford University Press. 15–36. Rugg, Michael; Michael GH Coles (eds). 1995. Electrophysiology of Mind: Event-Related Brain Potentials and Cognition. Oxford: Oxford University Press. Ruhlen, Merritt. 1987. A Guide to the World’s Languages, vol. 1. (Classification). London: Edward Arnold. Ruiz de Mendoza, Francisco; Sandra Peña Cervel (eds). 2005.
Cognitive Linguistics. Internal Dynamics and Interdisciplinary Interaction. Berlin: Mouton de Gruyter. Sakel, Jeanette; Daniel Everett. 2012. Linguistic Fieldwork. Cambridge: Cambridge University Press. Saldaña, Johnny. 2013². The Coding Manual for Qualitative Researchers. Thousand Oaks, CA: Sage Publications. Salkind, Neil (ed.). 2010. Encyclopedia of Research Design. Thousand Oaks, CA: Sage Publications. Salzmann, Zdenek; James Stanlaw; Nobuko Adachi. 1993/2018⁷. Language, Culture, and Society: An Introduction to Linguistic Anthropology. London: Routledge. Sampson, Geoffrey. 2001. Empirical Linguistics. London: Continuum. Sandra, Dominiek; Jan-Ola Östman; Jef Verschueren (eds). 2009. Cognition and Pragmatics. Amsterdam: Benjamins. Sanz, Montserrat; Itziar Laka; Michael Tanenhaus (eds). 2013. Language Down the Garden Path: The Cognitive and Biological Basis for Linguistic Structures. Oxford: Oxford University Press.
Sassenhagen, Jona; Phillip Alday. 2016. ‘A common misapplication of statistical inference. Nuisance control with null-hypothesis significance tests’. Brain and Language, 162. 42–45. Scalise, Sergio; Elisabetta Magni; Antonietta Bisetto (eds). 2009. Universals of Language Today. Dordrecht: Springer. Schiller, Niels. 2012. ‘Experimental methods and designs to investigate phonological encoding of spoken language’. In Abigail Cohn, Cécile Fougeron & Marie Huffman (eds). The Oxford Handbook of Laboratory Phonology. Oxford: Oxford University Press. 562–572. Schilling-Estes, Natalie. 2013. Sociolinguistic Fieldwork. Cambridge: Cambridge University Press. Schlehe, Judith. 2008. ‘Qualitative ethnographische Interviewformen’. In Bettina Beer (ed.). Methoden ethnologischer Feldforschung. Berlin: Reimer Verlag. 71–93. Schlobinski, Peter. 1996. Empirische Sprachwissenschaft. Opladen: Westdeutscher Verlag. Schmidt, Jürgen; Joachim Herrgen (eds). 2001–2009. Digitaler Wenker-Atlas (DiWA). Marburg: Forschungszentrum Deutscher Sprachatlas. [www.regionalsprache.de/]. Schmidtke-Bode, Karsten; Natalia Levshina; Susanne Maria Michaelis; Ilja Seržant (eds). 2019. Explanation in Linguistic Typology: Diachronic Sources, Functional Motivations and the Nature of the Evidence. Berlin: Language Science Press. Schmitt, Cristina; Karen Miller. 2010. ‘Using comprehension methods in language acquisition research’. In Elma Blom & Sharon Unsworth (eds). Experimental Methods in Language Acquisition Research. Amsterdam: Benjamins. 35–56. de Schryver, Gilles-Maurice. 2002. ‘Web for/as corpus. A perspective for the African languages’. Nordic Journal of African Studies, 11(2). 266–282. Schultze-Berndt, Eva. 2006. ‘Linguistic annotation’. In Jost Gippert, Nikolaus Himmelmann & Ulrike Mosel (eds). Essentials of Language Documentation. Berlin: Mouton de Gruyter. 213–251. Schütze, Carson. 1996/2016². The Empirical Base of Linguistics: Grammaticality Judgments and Linguistic Methodology.
Chicago, IL: University of Chicago Press/Berlin: Language Science Press. Scott, John. 2000². Social Network Analysis. A Handbook. Thousand Oaks, CA: Sage Publications. Scott, Mike; Chris Tribble. 2006. Textual Patterns. Key Words and Corpus Analysis in Language Education. Amsterdam: Benjamins. Sedivy, Julie. 2014. Language in Mind. An Introduction to Psycholinguistics. Sunderland, MA: Sinauer Associates. Seidenberg, Mark S.; Maryellen C. MacDonald. 2018. ‘The impact of language experience on language and reading’. Topics in Language Disorders, 38(1). 66–83. Seifart, Frank. 2000. Grundfragen bei der Dokumentation bedrohter Sprachen. (Arbeitspapier, 36). Köln: Institut für Sprachwissenschaft. Sekerina, Irina; Eva Fernández; Harald Clahsen (eds). 2008. Developmental Psycholinguistics: On-line Methods in Children’s Language Processing. Amsterdam: Benjamins. Senft, Gunter. 1994. ‘Ein Vorschlag, wie man standardisiert Daten zum Thema “Sprache, Kognition und Konzepte des Raumes” in verschiedenen Kulturen erheben kann’. Linguistische Berichte, 154. 413–429.
Senft, Gunter. 2010. The Trobriand Islanders’ Ways of Speaking. (Trends in Linguistics, Documentation, 27). Berlin: Mouton de Gruyter. Senft, Gunter. 2014. Understanding Pragmatics. London: Routledge. Settinieri, Julia; Sevilen Demirkaya; Alexis Feldmeier; Nazan Gültekin-Karakoç; Claudia Riemer (eds). 2014. Empirische Forschungsmethoden für Deutsch als Fremd- und Zweitsprache. Paderborn: Schöningh UTB. Shadish, William; Thomas Cook; Donald Campbell. 2002. Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Boston, MA: Houghton Mifflin Company. Sharifian, Farzad (ed.). 2015. The Routledge Handbook of Language and Culture. London: Routledge. Sharifian, Farzad. 2017. Cultural Linguistics: Cultural Conceptualisations and Language. Amsterdam: Benjamins. Sharoff, Serge. 2006a. ‘Open-source corpora. Using the net to fish for linguistic data’. International Journal of Corpus Linguistics, 11(4). 435–462. Sharoff, Serge. 2006b. ‘Creating general-purpose corpora using automated search engine queries’. In Marco Baroni & Silvia Bernardini (eds). WaCky! Working Papers on the Web as Corpus. Bologna: Gedit. 63–98. Shibatani, Masayoshi; Theodora Bynon (eds). 1995. Approaches to Language Typology. Oxford: Clarendon Press. Shopen, Timothy (ed.). 1985. Language Typology and Syntactic Description, vol. 1–3. Cambridge: Cambridge University Press. Sidnell, Jack; Tanya Stivers (eds). 2012. The Handbook of Conversation Analysis. Malden, MA: Wiley-Blackwell. Siewierska, Anna. 2004. Person. Cambridge: Cambridge University Press. Simons, Gary; Charles Fennig (eds). 2017²⁰. Ethnologue. Languages of the World. Dallas, TX: SIL International. [http://ethnologue.com]. Sinclair, John. 1991. Corpus, Concordance, Collocation. Oxford: Oxford University Press. Sinclair, John. 1996. ‘Preliminary recommendations on corpus typology’. Technical Report, Expert Advisory Group on Language Engineering Standards (EAGLES). Sinclair, John. 2004. Trust the Text.
Language, Corpus and Discourse. London: Routledge. Slobin, Dan. 1996. ‘From “thought and language” to “thinking for speaking”’. In John Gumperz & Stephen Levinson (eds). Rethinking Linguistic Relativity. New York: Cambridge University Press. 70–96. Small, Steven L.; Gregory Hickok. 2016. ‘The neurobiology of language’. In Gregory Hickok & Steven L. Small (eds). Neurobiology of Language. Amsterdam: Elsevier. 3–9. Song, Jae Jung. 2001. Linguistic Typology: Morphology and Syntax. Harlow: Longman. Song, Jae Jung (ed.). 2011. The Oxford Handbook of Language Typology. Oxford: Oxford University Press. Speed, Laura; Ewelina Wnuk; Asifa Majid. 2018. ‘Studying psycholinguistics out of the lab’. In Annette de Groot & Peter Hagoort (eds). Research Methods in Psycholinguistics and the Neurobiology of Language: A Practical Guide. Malden, MA: Wiley-Blackwell. 190–207. Spivey, Michael; Ken McRae; Marc Joanisse (eds). 2012. The Cambridge Handbook of Psycholinguistics. Cambridge: Cambridge University Press.
Stefanowitsch, Anatol. 2020. Corpus Linguistics: A Guide to the Methodology. Berlin: Language Science Press. Stefanowitsch, Anatol; Stefan Gries. 2003. ‘Collostructions. Investigating the interaction between words and constructions’. International Journal of Corpus Linguistics, 8(2). 209–243. Stemmer, Brigitte; Harry Whitaker (eds). 1998. Handbook of Neurolinguistics. San Diego: Academic Press. Stoll, Sabine. 2015. ‘Cross-linguistic approaches to language acquisition’. In Edith Bavin & Letitia Naigles (eds). The Cambridge Handbook of Child Language. Cambridge: Cambridge University Press. 89–104. Stubbs, Michael. 1995. ‘Collocations and semantic profiles. On the cause of the trouble with quantitative methods’. Functions of Language, 2 (1). 1–33. Stubbs, Michael. 2001. Words and Phrases: Corpus Studies of Lexical Semantics. Oxford: Blackwell. Szmrecsanyi, Benedikt. 2017. ‘Variationist sociolinguistics and corpus-based variationist linguistics. Overlap and cross-pollination potential’. The Canadian Journal of Linguistics, 62(4). 685–701. Tagliamonte, Sali. 2006. Analysing Sociolinguistic Variation. Cambridge: Cambridge University Press. Tannen, Deborah; Heidi Hamilton; Deborah Schiffrin (eds). 2015². The Handbook of Discourse Analysis. Malden, MA: Wiley-Blackwell. Tanner, Kerry. 2002². ‘Experimental research designs’. In Kirsty Williamson (ed.). Research Methods for Students, Academics and Professionals: Information Management and Systems. Wagga Wagga: Centre for Information Studies. 125–146. Tashakkori, Abbas; John Creswell. 2007. ‘The new era of mixed methods’. Journal of Mixed Methods Research, 1(3). 3–7. Tashakkori, Abbas; Charles Teddlie. 1998. Mixed Methodology: Combining Qualitative and Quantitative Approaches. Thousand Oaks, CA: Sage Publications. Taylor, John. 1989/2003³. Linguistic Categorization: Prototypes in Linguistic Theory. Oxford: Clarendon Press. Teubert, Wolfgang. 2005. ‘My version of corpus linguistics’.
International Journal of Corpus Linguistics, 10(1). 1–13. Thieberger, Nicholas (ed.). 2012. The Oxford Handbook of Linguistic Fieldwork. Oxford: Oxford University Press. Thomason, Sarah. 2015. Endangered Languages. An Introduction. Cambridge: Cambridge University Press. Tognini-Bonelli, Elena. 2001. Corpus Linguistics at Work. Amsterdam: Benjamins. Tomasello, Michael. 2003. Constructing a Language: A Usage-Based Theory of Language Acquisition. Cambridge, MA: Harvard University Press. Tomlin, Russell. 1986. Basic Word Order: Functional Principles. London: Croom Helm. Tosco, Mauro. 1994. ‘The historical syntax of East Cushitic. A first sketch’. In Thomas Bearth, Wilhelm Möhlig, Beat Sottas & Edgar Suter (eds). Perspektiven afrikanischer Forschung. Köln: Köppe. 415–440. Traxler, Matthew. 2012. Introduction to Psycholinguistics: Understanding Language Science. Malden, MA: Wiley-Blackwell.
Traxler, Matthew; Morton Gernsbacher (eds). 2006². Handbook of Psycholinguistics. Amsterdam: Elsevier. Tremblay, Pascale; Anthony Steven Dick. 2016. ‘Broca and Wernicke are dead, or moving past the classic model of language neurobiology’. Brain and Language, 162. 60–71. Trudgill, Peter. 2000⁴. Sociolinguistics: An Introduction to Language and Society. London: Penguin. Tusting, Karin (ed.). 2020. The Routledge Handbook of Linguistic Ethnography. London: Routledge. Taylor, John. 2002. Cognitive Grammar. Oxford: Oxford University Press. Underhill, James. 2012. Ethnolinguistics and Cultural Concepts: Truth, Love, Hate and War. Cambridge: Cambridge University Press. Ungerer, Friedrich; Hans-Jörg Schmid. 1996/2006². An Introduction to Cognitive Linguistics. London: Pearson Longman. Vaux, Bert; Justin Cooper. 2003. Introduction to Linguistic Field Methods. München: LINCOM Europa. Velupillai, Viveka. 2012. An Introduction to Linguistic Typology. Amsterdam: Benjamins. Verhaar, John. 1995. Towards a Reference Grammar of Tok Pisin: An Experiment in Corpus Linguistics. Honolulu, HI: University of Hawai‘i Press. Vigneau, Mathieu; V. Beaucousin; P. Y. Hervé; H. Duffau; F. Crivello; O. Houdé; B. Mazoyer; N. Tzourio-Mazoyer. 2006. ‘Meta-analyzing left hemisphere language areas. Phonology, semantics, and sentence processing’. NeuroImage, 30 (4). 1414–1432. Visser, Penny; Jon Krosnick; Paul Lavrakas. 2000. ‘Survey Research’. In C. Judd & H. Reis (eds). Research Methods in Social Psychology. New York: Cambridge University Press. 223–252. Voegelin, Charles; Florence Voegelin. 1977. Classification and Index of the World’s Languages. New York: Elsevier. Völkel, Svenja. 2010. Social Structure, Space and Possession in Tongan Culture and Language: An Ethnolinguistic Study. (Culture and language use, 2). Amsterdam: Benjamins. Völkel, Svenja. 2016. ‘Tongan-English language contact and kinship terminology’. World Englishes, 35 (2). 242–258. Völkel, Svenja. 2017.
‘Challenges and profits of interdisciplinary fieldwork in linguistic and cognitive anthropology’. In Alexis von Poser & Anita von Poser (eds). Facets of Fieldwork. Heidelberg: Winter. Wagner, Elvis. 2010. ‘Survey research’. In Brian Paltridge & Aek Phakiti (eds). Continuum Companion to Research Methods in Applied Linguistics. London: Continuum. 22–38. Wardhaugh, Ronald. 2006⁵. An Introduction to Sociolinguistics. Malden, MA: Blackwell. Wei, Li; Melissa Moyer (eds). 2008. The Blackwell Guide to Research Methods in Bilingualism and Multilingualism. Malden, MA: Blackwell. Whaley, Lindsay. 1997. Introduction to Typology. The Unity and Diversity of Language. Thousand Oaks, CA: Sage Publications.
Wierzbicka, Anna. 2003. Cross-Cultural Pragmatics. The Semantics of Human Interaction. Berlin: Mouton de Gruyter. Willems, Roel M. (ed.). 2005. Cognitive Neuroscience of Natural Language Use. Cambridge: Cambridge University Press. Willems, Roel M.; Alejandrina Cristia. 2018. ‘Hemodynamic methods: fMRI and fNIRS’. In Annette de Groot & Peter Hagoort (eds). 2018. Research Methods in Psycholinguistics and the Neurobiology of Language: A Practical Guide. Hoboken, NJ: John Wiley & Sons. 266–287. Winford, Donald. 2003. An Introduction to Contact Linguistics. Malden, MA: Blackwell. Wolf, Hans-Georg; René Dirven; Rong Chen; Ning Yu; Birgit Smieja (eds). 2006. The Cognitive Linguistics Bibliography (CogBib). Berlin: Mouton de Gruyter. Wolff, Phillip; Kevin Holmes. 2011. ‘Linguistic relativity’. Cognitive Science, 2(3). 253–265. Woodbury, Anthony. 2011. ‘Language documentation’. In Peter Austin & Julia Sallabank (eds). The Cambridge Handbook of Endangered Languages. Cambridge: Cambridge University Press. 159–186. Woods, Anthony; Paul Fletcher; Arthur Hughes. 1986. Statistics in Language Studies. Cambridge: Cambridge University Press. Wray, Alison; Aileen Bloomer. 2013³. Projects in Linguistics and Language Studies. London: Routledge. Wynne, Martin (ed.). 2005. Developing Linguistic Corpora: A Guide to Good Practice. Oxford: Oxbow Books. Xiao, Richard. 2008. ‘Well-known and influential corpora’. In Anke Lüdeling & Merja Kytö (eds). Corpus Linguistics: An International Handbook. (HSK, 29.1). Berlin: Mouton de Gruyter. 383–457. Xiao, Richard. 2015. ‘Collocation’. In Douglas Biber & Randi Reppen (eds). The Cambridge Handbook of English Corpus Linguistics. Cambridge: Cambridge University Press. 106–124. de Zubicaray, Greig; Niels Schiller (eds). 2019. The Oxford Handbook of Neurolinguistics. New York: Oxford University Press.
Index
accessibility 26, 79, 116, 118, 129, 138, 144, 185, 277 analysis 7, 9–10, 13, 15, 17, 20, 24–25, 29, 32–37, 40, 42, 44–45, 46, 50, 52, 57, 60, 69–72, 75, 79, 82–83, 92–98, 101, 103, 127, 133, 138, 140, 144, 149, 154, 158, 161, 163, 166, 168, 171–172, 177, 183, 185–188, 191–192, 194–195, 202–204, 215, 218, 224, 229, 240–244, 246, 249, 251, 257–258, 267, 269, 272, 276, 278 corpus analysis See corpus qualitative analysis See qualitative research statistical analysis See statistics annotation 15, 30–32, 38, 40, 79–80, 82, 83, 93–94, 102–104, 137–141, 143–150, 158–159, 161–162, 164, 177, 179, 183 anonymity 14, 201, 278 anthropological linguistics 28, 48, 62, 71, 82, 129, 166–172, 183, 186, 188, 190–191, 193–194, 197, 227, 263, 266, 268, 270, 272, 274, 276, 278 aphasia 197, 233, 235–236, 255, 278 aphasiology 229, 234, 237, 258 applied linguistics 8, 71, 160–161, 164, 277 areal typology 107, 117, 121 authenticity 139, 156, 180 balance 23, 100, 129, 138–141, 144–145, 148, 156–158, 160, 162, 187, 279 bias 4, 12–13, 23–24, 30, 37, 47, 50–51, 57–60, 62, 65, 68, 113–118, 120, 156, 158–159, 174, 177–178, 201, 206, 220, 223, 237, 266, 269–270, 272 personal bias 4, 12, 27, 47, 50, 52 reciprocal effect 12 response bias 59 sampling bias 58, 239–240 bottom-up approach 137 carryover effects 58, 68, 205 causality 16, 33–34, 61, 73, 189, 229, 250–252, 254–257 coding scheme 47, 51, 72, 74, 148, 150
cognitive linguistics 62, 66, 71, 129, 159, 161, 194–200, 203, 205, 218, 224–225, 227, 268, 272, 276 cognitive neuroscience 232, 234, 236, 248–249, 258 collocation 133, 151–154, 160, 162 collostructional analysis 154 community of practice 184, 186–187, 192 computational linguistics 153, 160–161 conceptualisation 189, 195, 197, 218, 224, 226 concordancing 148–149, 153, 155 Key Word In Context (KWIC) concordancing 148 confidentiality 14, 201, 278 controllability 11, 62, 66–69 corpus 7, 24, 71, 82–83, 92–95, 97–98, 103, 122, 134–136, 138–145, 155–159, 161–164, 171, 179, 192, 203, 266, 270, 274–275, 277–278 analysis 83, 94, 97, 101, 136–138, 141, 143–145, 147–155, 159–161, 163–164, 170, 186, 188, 193, 212, 215, 267, 272 compilation 79, 133, 137–141, 148, 156, 158–159, 162, 164 editing See annotation, tagging, lemmatisation linguistics 28, 30, 133–137, 158–159, 161–164, 215, 223, 263, 268–269, 271–272, 277 retrieval 138, 140, 147–151 typology 138, 141, 159, 162 correlation 16, 33–35, 61–62, 65, 110–111, 117, 120, 123–124, 126, 166, 170–172, 190, 192, 204, 219–220, 223, 240–241, 249–250, 252–254, 257, 272–273, 278 correlation design 241 cross-sectional design 51 data types 7, 9–10, 27–28, 30, 33, 40, 43, 46, 53, 79, 82–83, 86–87, 91, 95–96, 99, 102, 121–122, 133–135, 140, 155, 159–160, 162, 171–172, 176–183, 186, 191, 202–204, 208, 210–211, 213, 215, 219, 221, 224, 226, 240, 250, 254, 257, 266–270, 275–279 databases 22, 24, 84, 86, 99, 104, 113, 116, 122, 131, 156, 215, 258, 278
deductive approach 150 descriptive linguistics 28, 53, 71, 80, 82–83, 85, 101–102, 105, 159, 171, 263–264, 267–268, 271–272 diachronic research 9 dialectology 8, 56, 86, 91, 98, 100, 112, 121, 134, 139, 161, 167, 180, 184, 189, 192–193, 264, 278 dictionary 83, 90, 92, 95, 98, 101, 151 disjunctness 19, 41, 51, 110 documentary linguistics See language documentation double dissociation 251
economy 128 ELAN 32, 94–95, 148 electroencephalogram (EEG) 208, 213, 216, 218, 228, 243–248, 245, 250, 274, 276 elicitation 27, 53, 71, 79, 82–83, 89, 92, 95–98, 103–105, 142, 158–160, 178, 180, 191, 202, 208, 213–214, 217–218, 226, 236, 240, 251, 267, 269, 272 emic perspective 48, 50–51, 89–90, 92, 110, 168, 171–172, 174, 178, 182–185, 187, 191, 269, 272 empiricism 3–5, 279 endangered languages 79–80, 84–87, 89, 91, 99, 102–104, 160, 168, 201 ethics 3, 9, 11, 13–15, 22, 37, 39, 46, 49, 51, 63, 82, 89–91, 99, 138, 144, 174, 232, 244, 277–278 ethnography of communication 194 Ethnologue 84, 86, 114 etic perspective 50, 89, 171, 182–185, 187, 191 event-related potentials (ERPs) 208, 232, 246–248, 253, 259 exhaustiveness 51, 150 experiment 12, 14, 27–29, 34, 40, 43–47, 46, 50, 52–53, 64–75, 162, 183, 195, 198, 201–203, 209–220, 224–226, 229, 232, 236–244, 246, 249, 254, 256–257, 259, 266, 274–275, 278 types of experiment 61–64 experimental task 50, 64, 66–67, 74, 159, 173, 178, 180, 191, 202–203, 205–206, 209, 212, 216, 218, 225, 236, 241, 244, 250, 257, 275 explanatory research 268 explorative research 43, 56, 73, 273 exposé See proposal eye-tracking 207–208, 211–212, 217, 225, 228, 251, 272
field research 7, 9, 11, 14, 20, 26–27, 37, 40, 42, 46, 50, 62, 64, 66, 71, 79, 83, 87–92, 88, 94–95, 99, 102, 105, 121, 150, 166, 171–176, 179, 181–183, 185, 191–192, 194, 197, 201, 209, 219–220, 225, 266–267, 272, 276, 279 frequency 32, 34, 37, 67, 71, 123, 129, 133–134, 136, 139, 143–144, 146, 149, 151–153, 157–158, 160–163, 180, 186, 202–203, 205, 212, 215, 222–223, 269, 274 functional magnetic resonance imaging (fMRI) 208, 244, 248–250, 249, 254, 259, 272, 276 functional near infrared spectroscopy (fNIRS) 208, 244, 248–250, 249
genetic relatedness See language family genre 88–89, 92–93, 98, 122, 139, 142, 144, 148–150, 152, 160–161, 163, 168, 177–178, 185, 267, 272 geographic proximity See language area Glottobank 122, 132, 278
historical linguistics 10, 129, 134, 159–160 hypothesis 4–5, 8, 10, 15, 18–20, 22, 25–26, 32, 38, 40, 43–45, 46, 50, 58, 60, 64, 66, 68, 70, 74, 136, 141, 150, 198, 200, 210, 226, 258, 279
iconicity 128, 200 inductive approach 107, 133, 150 interdisciplinarity 272–273, 275–276 interview 27, 29, 43–47, 53–56, 59, 70–74, 179, 181–182 sociolinguistic interview 166, 177, 179–180, 191 introspection 27, 49, 83, 91–92, 176, 200, 202 invasiveness 242, 244, 249 inverse problem 245, 250 item See stimulus judgements 4, 9, 96, 160, 178, 206, 208–209, 208, 217, 226, 228, 267, 274 keyword list 151, 154, 161, 163–164 laboratory research 10, 20, 42, 266–267, 279 language acquisition 8, 18, 20–21, 23, 33, 42, 71, 107, 112, 126–127, 132, 134, 141, 161, 163, 168, 196, 198, 209, 215–218, 221, 223, 225, 227–228, 257, 273–274 language area 114, 117, 129–130, 235 language change 18, 33, 80, 107, 113, 132, 159–160, 167, 187, 190, 273 language comprehension 196–197, 203, 209, 211, 216–217, 221, 223, 225, 228–229, 231, 234–235, 257 language contact 8, 18, 41, 85, 105, 112, 116, 127–129, 132, 193, 264, 272–273 language documentation 28, 47, 79, 82–85, 90, 94, 98, 101–102, 104–105, 138, 144, 159, 267, 270, 277 language family 23–24, 87, 88, 100, 105, 113–114, 116–121, 126, 129, 132, 269, 272 language processing 15, 66, 107, 127–128, 160–161, 164, 195–196, 198, 200, 202–204,
206–214, 216, 221–223, 225, 227, 230–232, 234, 236, 238, 242, 244–247, 250, 253, 255–257, 271, 277 language production 8, 72, 161, 181, 196–197, 203, 209, 212–215, 221, 223–225, 235–236, 255, 257, 268 language typology 9, 28, 82, 103, 106–107, 129, 131–132, 263, 268, 271, 274, 278 lemmatisation 146, 162 lesion 234–237, 240, 248, 250–252, 255–256 levels of measurement See scales linguistic relativity 195–197, 200–201, 218, 221, 227 longitudinal design 46, 46 magnetoencephalogram (MEG) 243–246, 245 mapping problem 248, 253 matched-guise task 180 measure 34–35, 47, 70, 90, 120, 141, 151–152, 194, 201, 204, 207–209, 213, 216, 222, 225–226, 228, 237–238, 242, 245, 252, 257, 273–274 behavioural 195, 204, 207–212, 208, 216, 240–241, 244, 250 neurocognitive 204, 207–209, 208, 212–213, 216, 229, 233–234, 240–244, 250, 253–254, 257, 277 offline 207–210, 212, 242 online 207–213, 225–226, 234, 242, 244 metadata 45, 80, 91, 94, 102, 105, 135, 138–140, 157, 161 metalanguage 32, 110, 129 method 4, 8, 11–12, 15, 22, 25, 27–29, 32–33, 35, 38, 41, 43–47, 49, 51–52, 58, 60, 68, 71, 73–75, 82, 89, 95, 105, 111, 121, 132–133, 147–148, 150, 154, 159–160, 162, 164, 166, 171, 173, 177–178, 181–183, 185–187, 191, 194, 200, 202–209, 212, 224, 227–228, 232, 234, 236–237, 241–245, 248, 250, 255, 257–259, 263, 272–276 experimental method See measure mixed-methods 28, 43–47, 69–75, 215, 274–275 multi-methods 71, 274–275 minimal pair 28, 65, 97, 178, 180, 205, 226 multi-dimensional (MD) analysis 133, 154–155 multidimensional scaling 111, 122 multidisciplinarity 273, 275–276 naming 96, 204, 213–214, 219, 226 naturalistic design 256 network See social network neurobiology of language 234, 236, 256, 259 neurolinguistics 28, 62, 65, 107, 129, 228–236, 238, 240, 242, 252, 254, 256–259, 263, 266, 268, 270–272, 276–278 neuropsychology 234, 236, 255
n-gram 151, 153 null results 254, 274 objectivity 11 observation 11–12, 24, 26–29, 40, 43–49, 46, 51–53, 63, 71, 73–74, 138, 153, 155–156, 173, 177–179, 183, 191, 202, 215, 224, 236–237, 240, 257, 269, 275 participant observation 48, 50, 166, 171–176, 178–179, 181–183, 185, 187, 191–192, 276 observer’s paradox 12, 47, 49–51, 59, 174, 177–178 operationalisation 15, 18, 22, 44, 59, 258 parameter See variable picture-prompted storytelling 226 pilot study 26, 28–29, 51, 58, 66 positron emission tomography (PET) 14, 244, 248, 249 poster 15, 37–39, 41 preferential looking paradigm 217 pre-test See pilot study priming 68, 207, 214 structural priming 214 processing See language processing proposal 15, 18, 24–25, 40, 138, 253, 280 psycholinguistics 28, 66, 71, 129, 133, 159–160, 195–200, 203, 205, 220, 222–223, 225, 228, 230, 232, 238–240, 251, 257, 263, 266, 268, 270–272, 276–278 pupillometry 212 qualitative research 10, 16, 42, 53–54, 58, 71, 136, 145, 149, 154, 161–162, 171, 186, 278 quality criteria 3, 11–13, 22, 37, 39, 61, 74, 176 quantitative research 10, 17, 32, 40, 42–43, 57, 60–61, 69–70, 72, 74, 133, 147, 149–150, 154, 158, 161–162, 173, 187, 268, 279 questionnaire 27–28, 43–47, 49, 53–56, 58–60, 70, 72–74, 96–97, 103, 121, 131, 181, 183, 192, 209, 267, 269 randomisation 62–64, 67 rara/rarissima 106–107, 123–124, 127, 129–130 reaction time 9, 37, 202, 208, 210–213, 216, 221, 250 reciprocal effects See bias reductionist design 43, 256 reference grammar 79, 82–83, 92, 95–98, 100–105, 121–122, 141, 160–161, 178, 202–203, 267–268, 271–272 register 8, 92, 130, 134, 138–139, 142, 148–150, 152, 155, 160, 163, 189 reliability 11, 44, 66, 70, 73–74, 99, 240 replication 11, 44, 63, 144, 159, 201, 239–240, 254
representativeness 12–13, 23, 35, 58, 99, 138–140, 143, 145, 156–158, 162, 270
reproducibility 156, 159, 277
research design 16–17, 20, 43–47, 69, 256, 268
research diary 25–27, 40, 45, 176
research process 3, 6–7, 13, 15, 17–18, 20, 25–26, 39–40, 45, 51, 59, 69, 79, 104, 106, 130, 148, 150, 156, 166, 177, 179, 184, 192, 195, 226, 229, 258, 263, 267, 273–274
research question 4, 8, 10–11, 15, 18, 20, 22–23, 26, 37–38, 41, 46, 58, 65–66, 68–70, 72–73, 80, 106, 117, 123, 129, 133–135, 138–140, 143, 145–147, 154, 156, 159, 167–168, 173, 183, 186, 189, 195, 198, 201–202, 205–206, 208, 210–211, 213, 225, 230–232, 234, 238, 263, 268–269, 273–274, 278
response format 52
sampling 15, 22–24, 35, 41, 44, 46, 47, 52, 56, 58, 63, 71, 106, 112–121, 129, 132, 138–144, 157, 162, 166, 185–186, 201, 239
sampling bias See bias
sampling error 56, 58
scales 21, 36, 44, 57
segmentation 32, 83, 93, 105
selectivity problem 245
self-paced reading 208, 210, 226
semantic map 111
sentence-picture matching 217
single dissociation 251
slip of the tongue 30, 197, 203, 212, 215, 269
social markers 189
social network 33, 166, 170–171, 184, 186, 190, 194, 276
sociolect 8, 150, 166, 189, 264
sociolinguistic interview See interview
sociolinguistics 28, 48, 62, 159, 166–168, 170–172, 180, 184, 186–188, 191, 215, 263, 270, 272, 274, 276, 278
  variationist sociolinguistics 71, 161, 167, 170, 177, 187, 192, 268–269, 272
sorting 205
speech community 82, 92, 166, 183–186, 188–189
speech pathology 229
speed-accuracy trade-off 66, 205, 208, 208, 210
statistics 33–35, 42, 46, 57, 64, 67, 132, 147, 150, 152, 155, 164, 188, 240–241, 247, 276
stimulus 14, 22, 27–28, 47, 49, 60, 62, 65–68, 75, 88, 96–97, 99, 103, 128, 135, 148, 154, 179–180, 203–205, 208–214, 216–217, 219–220, 223, 225–226, 232, 241–242, 246, 248, 251, 254, 258, 268, 275
stimulus presentation 68, 205–207, 209, 211–212, 216, 225, 257
Stroop task 207, 214
subtraction design 241
survey See interview, questionnaire
synchronic research 9, 40
tagging 30–32, 133, 145–148, 162, 164
task demand 66, 241
task effect 66, 212
tetrachoric tables 124
theory 3–5, 15
tokenisation 145–146
top-down approach 137, 198, 200
transcranial magnetic stimulation (TMS) 244, 248, 249, 250–252, 254
transcription 15, 30–32, 40, 60, 79–80, 82–83, 85, 93–94, 102–103, 105, 138, 140, 143–144, 146, 177, 179, 183
translation 6, 8, 15, 30–32, 40, 82, 83, 93–96, 101–102, 105, 143, 161, 179, 277
transliteration 30–32
triangulation 72
type-token ratio (TTR) 151, 163
typological map 107, 126
typological universals 125, 200
typology 123–124
unit of analysis 24, 32, 134, 139, 145, 149, 160, 203, 242
unit of measurement 24, 32, 47, 51, 138–139
unit of observation See unit of measurement
universals See typological universals
Universals Archive 124, 130, 278
validity 11–13, 37, 44, 47, 58–69, 73–74, 138, 140, 206, 211, 219
variable 9–12, 15–23, 27–28, 32, 34–35, 37, 40, 43–44, 46, 47, 51–53, 55, 61–66, 68–70, 72–74, 82, 91–92, 110–112, 116–117, 121, 123–124, 126, 129–130, 132, 149, 154, 162, 166–167, 170–171, 176, 178, 181–187, 189–192, 198, 200, 204–205, 207, 209, 212, 215–218, 223–226, 242, 253, 257–258, 269–270, 273–274
variationist sociolinguistics See sociolinguistics
visual world paradigm 211, 213, 225, 251
WALS (World Atlas of Language Structures) 84, 114, 122, 132, 278
wug test 217
Zipf’s Law 162