Research Methods for Understanding Child Second Language Development 2022009526, 9780367417024, 9780367815783, 9780367417017

Butler and Huang’s book is one of the first to focus on second language (L2) development research methods and techniques

English Pages 215 [217] Year 2022

Table of contents :
Cover
Endorsement
Half Title
Series Information
Title Page
Copyright Page
Dedication
Table of Contents
Contributors
Acknowledgements
1 Introduction: Researching Child Second Language Development
References
2 Observation and Ethnographic Methods for Researching Young Learners
Introduction
Observing Children’s Language Development
Participant and Non-Participant Observation
Participant Observation: Ethnographic Approaches
Recent Ethnographic and Observational Studies of Young L2 Learners
Study Design and Duration
Other Data Sources
Analysis
Theoretical Frameworks and Types of Research Questions Asked
Challenges and Consideration
Implications for Child SLA Researchers
Discussion Questions
References
3 Surveys and Questionnaires With Young Language Learners
Introduction
Questionnaires When Children Are Respondents: A Brief History
Building a Questionnaire for Children
Answering Questions Mirrors Cognitive Capacity and Processing
Stage One: Interpretation
Stage Two: Retrieval
Stage Three: Judgment
Stage Four: Response Selection
Cultural Sensitivity
Survey Research Overview: Our Recommended Procedures
Research Questions
Survey Gathering and Analysis
Theory and Prior Examples
Translation in All Its Forms
Sorting Constructs: The KJ Method
Implementation
Conclusion
Tools and Resources
Discussion Questions
References
4 Using Interviews With Children in L2 Research
Introduction
Interviews
Interviews With Children in SLA
Types of Studies: Examples
Challenges and Considerations
Power Gap, Rapport, and Language
L1 or L2 Use in Interviews
Creating Comfortable Environments
Individual/Pair Interviews, Group Interviews, or Peer Interviews
Some Ethical Issues
Conclusion
Discussion Questions
References
5 Verbal Reports as a Window for Understanding Mental Processes Among Young Learners
Introduction
General Description of Verbal Reports as an Introspective Method
Verbal Reports in Research Focusing on Children
Verbal Reports as a Method of Inquiry for L1 and L2/FL Development
Verbal Reports as a Method of Instruction
Challenges and Considerations for Implementing Verbal Reports With Young Language Learners
Veridicality and Reactivity: Conceptual Issues
Issues With Planning and Administering Verbal Reports
Issues With Analyzing and Interpreting Children’s Verbal Reports
Implications for Child SLD Researchers
Using Digital Technology
Taking a Child-Centered Approach
Discussion Questions
Notes
References
6 Research Methods for Evaluating Second Language Speech Production
Introduction
An Overview of L2 Speech Production Methods in the Literature
Standardized Norm-Referenced Assessments
Language Sampling Methods
The Use of Technology in L2 Speech Production Methods
Challenges and Considerations
Implications
Tools and Resources
Discussion Questions
References
7 Receptive Methods in Child Bilingualism and Second Language Acquisition
Introduction
Studying Morphosyntax
The Picture-Sentence Matching Task
Description
How the Method Is Used in Child SLA
Challenges and Recommendations
The Grammaticality Judgment Task
Description
How the Method Is Used in Child SLA
Challenges and Recommendations
Conclusion
Tools and Resources
Discussion Questions
References
8 Eye-Tracking Methods in Child SLA Research
Introduction
Using Eye Tracking to Study Reading Comprehension
Using Eye Tracking to Study Spoken Language Processing
Studies Illustrating the Use of Eye-Tracking Methodology With Child L2 Learners
Challenges and Considerations in Eye-Tracking Research With Child L2 Learners
Recruitment and Community Outreach
Materials and Procedures
Data Analysis
Concluding Remarks
Tools and Resources
Discussion Questions
Notes
References
9 Brain Imaging Methods
Introduction
Why Use Neuroimaging Tools to Understand Children’s Second Language Development?
Theoretical Considerations for the Neurobiology of Childhood Second Language Learning
Phonology and Second Language Learning During Childhood
Phonology and Literacy Development in School-Age Children
Overview of Developmental Cognitive Neuroimaging Tools
When Tools
Electroencephalogram (EEG) and Event-Related Potential (ERP)
ERP Use With Children
ERP Components in Language Research
Mismatch Negativity (MMN)
MMN in SLA Research
Rhyming Effect (RE)
RE in SLA Research
ERP Components in SLA Research
MEG: Magnetoencephalography
MEG Use With Children
Examples of MEG Use in SLA Research
MEG Use in Second Language Learning
When Tools Summary
Where Tools (Hemodynamics)
Using Where Tools (Neuroanatomical Principle)
Functional Magnetic Resonance Imaging (fMRI)
MRI-Based Anatomical Imaging
fMRI Use With Children
Using Where Tools (Neuroanatomical Principle)
Examples of fMRI Use in SLA Research
fNIRS—Functional Near-Infrared Spectroscopy
fNIRS Use With Children
Examples of fNIRS Use in SLA Research
Conclusion
Tools and Resources
Discussion Questions
Note
References
10 Research Methods for L2 Children With Special Needs
Introduction
Common Research Questions and Research Methods Used in Empirical Studies
Bi-DD and Risk Status
Language and Cognitive Profiles of Bi-DLD
Diagnostic Accuracy Studies
Intervention Studies
Challenges and Methodological Implications
Participant Selection
Comparison Group
Heterogeneity
Conclusion
Key Terms
Discussion Questions
Note
References
11 Considerations for Research Methods to Study Child Second Language Development
Introduction
Theories and Research Methods
Adapting Research Methods to Account for Developmental Factors
Cognitive and Metacognitive Factors
Affective Factors
Linguistic and Cultural Factors
Experiential Factors
Directions for the Future
How to Capture Dynamic and Fluid Conceptualization of Development
How to Take Child-Centered Approaches in Research
Conclusion
Note
References
Index

Author / Uploaded
Yuko Goto Butler
Becky H. Huang

“The potential of this book is enormous. No longer can L2 researchers claim ignorance to the nuances and challenges of working with younger learners, a population so often overlooked. Yuko Butler and Becky Huang’s volume, and the entire cast of contributors, deftly guides us on a full spectrum of methodological approaches and the myriad issues we are likely to encounter in conducting research with children. It’s now up to us, the L2 research community, to step up and to put into practice the time- and experience-tested techniques described herein. And I very much hope we will.”
Luke Plonsky, Northern Arizona University, USA

“This book is a gem for researchers of children’s L2 development! It offers wise guidance across a wealth of methods, from ethnography and interviews to eye-tracking and brain imaging, and a lot more. With its exciting selection of authors and its comprehensive coverage, this is the ideal textbook for a research methods course focusing on L2 child populations.”
Lourdes Ortega, Georgetown University, USA

“In an increasingly bilingual world with more and more children growing up exposed to an additional language either at birth or as young as age 2–5, this long-needed volume will make a very timely and notable contribution to the field of child L2 acquisition. With its different chapters discussing, in a highly accessible fashion, a variety of research methods in reference to the relevant theoretical framework behind them, the book will serve as an excellent resource for both novice and expert researchers interested in identifying intricate issues in child L2 development. I strongly recommend it.”
Ayşe Gürel, Boğaziçi University, Turkey

RESEARCH METHODS FOR UNDERSTANDING CHILD SECOND LANGUAGE DEVELOPMENT

Butler and Huang’s book is one of the first to focus on second language (L2) development research methods and techniques specifically targeted at children of primary and pre-primary years. The last decade has seen a growing number of L2 studies of children aged 4–12, a demographic with special developmental characteristics that confound research methods designed for studying adults. Written by experts from a variety of disciplines, this book covers major research methods and techniques in existing L2 development research, including observations, surveys, interviews, introspective methods, speech production methods, receptive methods, eye tracking, and brain imaging, as well as research methods specifically designed for L2 children with special educational needs. The book also discusses various age-related considerations and challenges that arise when these methods are employed with young L2 learners. This will be essential reading for SLA, child development, and TESOL researchers, and students in these courses will benefit particularly from pedagogical material such as further readings and discussion questions.

Yuko Goto Butler is Professor of Educational Linguistics in the Graduate School of Education at the University of Pennsylvania. She is also Director of the Teaching English to Speakers of Other Languages (TESOL) Program. Her research interests include language assessment and second and foreign language learning among children.

Becky H. Huang is Professor in the Department of Teaching and Learning at Ohio State University. Her research focuses on two interrelated areas that address the goal of promoting language and education outcomes for bilingual students: language/literacy development and assessment of bilingual students.

SECOND LANGUAGE ACQUISITION RESEARCH SERIES
Susan M. Gass and Alison Mackey, Series Editors
Kimberly L. Geeslin, Associate Editor

The Second Language Acquisition Research Series presents and explores issues bearing directly on theory construction and/or research methods in the study of second language acquisition. Its titles (both authored and edited volumes) provide thorough and timely overviews of high-interest topics, and include key discussions of existing research findings and their implications. A special emphasis of the series is reflected in the volumes dealing with specific data collection methods or instruments. Each of these volumes addresses the kinds of research questions for which the method/instrument is best suited, offers extended description of its use, and outlines the problems associated with its use. The volumes in this series will be invaluable to students and scholars alike, and perfect for use in courses on research methodology and in individual research.

Conducting Second-Language Reading Research: A Methodological Guide
Elizabeth B. Bernhardt and Michael L. Kamil

Research Methods for Understanding Child Second Language Development
Edited by Yuko Goto Butler and Becky H. Huang

Gesture and Multimodality in Second Language Acquisition: A Research Guide
Edited by Gale Stam and Kimberly (Buescher) Urbanski

For more information about this series, please visit: www.routledge.com/Second-Language-Acquisition-Research-Series/book-series/LEASLARS

RESEARCH METHODS FOR UNDERSTANDING CHILD SECOND LANGUAGE DEVELOPMENT Edited by Yuko Goto Butler and Becky H. Huang

Cover image: © Getty Images

First published 2023 by Routledge
605 Third Avenue, New York, NY 10158
and by Routledge
4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN

Routledge is an imprint of the Taylor & Francis Group, an informa business

© 2023 selection and editorial matter, Yuko Goto Butler and Becky H. Huang; individual chapters, the contributors

The right of Yuko Goto Butler and Becky H. Huang to be identified as the authors of the editorial material, and of the authors for their individual chapters, has been asserted in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data
Names: Butler, Yuko Goto, editor. | Huang, Becky H., editor.
Title: Research methods for understanding child second language development / edited by Yuko Goto Butler and Becky H. Huang.
Description: New York, NY : Routledge, 2023. | Includes bibliographical references and index.
Identifiers: LCCN 2022009526 | ISBN 9780367417024 (hbk) | ISBN 9780367815783 (ebk) | ISBN 9780367417017 (pbk)
Subjects: LCSH: Second language acquisition–Research–Methodology. | Bilingualism in children. | Children–Language.
Classification: LCC P118.2 .R47 2023 | DDC 401/.93072–dc23/eng/20220524
LC record available at https://lccn.loc.gov/2022009526

ISBN: 978-0-367-41702-4 (hbk)
ISBN: 978-0-367-41701-7 (pbk)
ISBN: 978-0-367-81578-3 (ebk)
DOI: 10.4324/9780367815783

Typeset in Bembo by Newgen Publishing UK

For Donald Butler, Xū-Róu Qiū, and Xīng-Zhǎn Huáng

CONTENTS

List of Contributors

Acknowledgements

1 Introduction: Researching Child Second Language Development
Becky H. Huang and Yuko Goto Butler

2 Observation and Ethnographic Methods for Researching Young Learners
Peter Sayer and Susan Ataei

3 Surveys and Questionnaires with Young Language Learners
Emiko Hirosawa and W. L. Quint Oga-Baldwin

4 Using Interviews with Children in L2 Research
Annamaria Pinter

5 Verbal Reports as a Window for Understanding Mental Processes among Young Learners
Yuko Goto Butler

6 Research Methods for Evaluating Second Language Speech Production
Becky H. Huang and Rica Ramírez

7 Receptive Methods in Child Bilingualism and Second Language Acquisition
Silvina Montrul, Alexandra Morales-Reyes, and Begoña Arechabaleta Regulez

8 Eye-Tracking Methods in Child SLA Research
Paola E. Dussias and Karen Miller

9 Brain Imaging Methods
Nia Nickerson and Ioulia Kovelman

10 Research Methods for L2 Children with Special Needs
Li Sheng and Sharon R. Hollenbach

11 Considerations for Research Methods to Study Child Second Language Development
Yuko Goto Butler

Index

CONTRIBUTORS

Begoña Arechabaleta Regulez is Assistant Instructional Professor at the University of Chicago. She received MA and PhD degrees in Spanish Linguistics from the University of Illinois at Urbana-Champaign. Her dissertation examined language variation from a psycholinguistic perspective. Begoña has also conducted research on language acquisition, especially by bilingual children and heritage speakers.

Susan Ataei is a doctoral candidate in the PhD program in Language, Education & Society at the Ohio State University. She is currently researching elementary newcomer refugee English language learners through the lens of asset-based pedagogies. Her research interests include emergent bilingualism, diversity and inclusion, and heritage language maintenance.

Yuko Goto Butler is Professor of Educational Linguistics in the Graduate School of Education at the University of Pennsylvania. She is also Director of the Teaching English to Speakers of Other Languages (TESOL) Program. Her research interests include language assessment and second and foreign language learning among children.

Paola E. Dussias is Professor of Spanish, Linguistics, and Psychology at Penn State University. Her research examines the interactions of a bilingual’s two languages during written and spoken language comprehension. To do this, she uses converging methodological tools from linguistics, experimental psycholinguistics, and second language acquisition.

Emiko Hirosawa teaches full-time at a private elementary school in Tokyo while also pursuing her doctorate at Waseda University. She researches elementary school English education and motivation and is a co-editor of the Smile textbook series, specifically for private elementary schools in Japan.

Sharon R. Hollenbach recently received her Communications and Information Sciences MA from the University of Alabama, focusing on how individuals’ religious beliefs and communities affect other aspects of their public life. She spent three years at the University of Delaware assisting in bilingualism research, and now conducts original language textual analysis of Hebrew and Aramaic texts.

Becky H. Huang is Professor in the Department of Teaching and Learning at Ohio State University. Her research focuses on two interrelated areas that address the goal of promoting language and education outcomes for bilingual students: language/literacy development and assessment of bilingual students.

Ioulia Kovelman is a faculty member at the University of Michigan. She is a developmental cognitive neuroscientist interested in how bilingualism influences children’s emerging neural architecture for literacy development. She studies young learners of different languages, with typical development and dyslexia, using a variety of neuroimaging methods.

Karen Miller is Associate Professor of Spanish and Linguistics at Penn State University. Her research in developmental sociolinguistics focuses on how variable input impacts children’s acquisition of language and how children acquire sociolinguistic variation across development.

Silvina Montrul is Professor in the Department of Spanish and Portuguese and a Professor in the Department of Linguistics at the University of Illinois at Urbana-Champaign. Her research focuses on linguistic and psycholinguistic approaches to second language acquisition and bilingualism, with particular emphasis on heritage speakers.

Alexandra Morales-Reyes is Associate Professor of Hispanic Linguistics at the University of Puerto Rico, Mayagüez. Her teaching and research are focused on bilingualism and first and second language acquisition in children.

Nia Nickerson is a developmental neuroscientist studying how different language experiences, such as bilingualism and bi-dialectal experiences, influence children’s language and literacy development and the emerging neural pathways for learning to read. She pursues this inquiry from educational, brain-development, cognitive and sociocultural perspectives.

W. L. Quint Oga-Baldwin is Professor of Elementary School English Education at Waseda University, where he researches motivation, language, and pedagogy. He has trained primary school teachers in Japan for over a decade.

Annamaria Pinter is Reader at the University of Warwick. She is internationally known for her research in the area of teaching English to young learners. She has published extensively in this field and her most recent interests include inclusive research methods with children and working with young learners as co-researchers.

Rica Ramírez is Assistant Professor at the University of Texas at San Antonio, where she researches early childhood development, specifically focusing on how external factors (i.e., home and school) influence the school readiness development of young Latino children, and how maternal responsiveness impacts young Latino children’s language development.

Peter Sayer is Associate Professor of Language Education Studies in the College of Education and Human Ecology at Ohio State University. He has conducted extensive research with children in the English program for public primary schools in Mexico.

Li Sheng is Professor and a faculty member of the Speech Therapy Unit at the Department of Chinese and Bilingual Studies of the Hong Kong Polytechnic University. Her research focuses on assessing the language development of heritage speakers of Mandarin in the US, comparative studies of Mandarin-English and Spanish-English dual language learners, and, more recently, multilingual children in Hong Kong.

ACKNOWLEDGEMENTS

This edited book would not have been possible without the support of many people. First of all, we thank all the contributors: Begoña Arechabaleta Regulez; Susan Ataei; Paola E. Dussias; Emiko Hirosawa; Sharon R. Hollenbach; Ioulia Kovelman; Karen Miller; Silvina Montrul; Alexandra Morales-Reyes; Nia Nickerson; W. L. Quint Oga-Baldwin; Annamaria Pinter; Rica Ramírez; Peter Sayer; and Li Sheng. We are fortunate to have such an amazing group of professionals, and we enjoyed working with them in every step of the production of this book. We also wish to express our gratitude toward the reviewers: Angelique Blackburn; Janet Enever; María del Pilar García Mayo; Annina Kristina Hessel; Rhonda Oliver; Elizabeth Peña; Sabrina He Sun; Veronika Timpe-Laughlin; and Mikyung Wolf. They took time out of their busy schedules and provided us with thorough, detailed, and constructive feedback. We are also grateful to the second language acquisition research series editors, Sue Gass, Alison Mackey, and Kimberly Geeslin, for their helpful comments on our initial proposal and chapter drafts. Finally, we wish to thank Ze’ev Sudry and Helena Parkinson, editors at Routledge, who have offered us guidance, technical support, and encouragement throughout the process. This book is truly a product of collaboration.

1 INTRODUCTION
Researching Child Second Language Development
Becky H. Huang and Yuko Goto Butler

DOI: 10.4324/9780367815783-1

Due to the ongoing effects of globalization and human migration, the number of young language learners learning additional language(s) as their second or foreign language (L2/FL) continues to grow worldwide in both immersive and instructional settings (Butler, 2015, 2017; Enever, 2018; Huang & Kuo, 2020; Huang et al., 2020a; Murphy, 2014; Nikolov, 2016). In the United States, for example, based on recent educational statistics (National Center for Education Statistics, 2020), there were an estimated 5 million (or 10%) young language learners in public schools who were learning English as an L2 and received an English learner designation. This estimate is conservative, as it did not include young learners who have met state-mandated English language proficiency standards. The actual number of young language learners in the US is thus likely to be higher. Many other immigrant-receiving countries such as the United Kingdom, Canada, Spain, and Germany have also witnessed a large influx of young language learners. On the other hand, the popularity of FL education, in particular English as an FL, has also accelerated globally (Butler, 2015). For example, the age of compulsory English FL education has been lowered to early primary grades in numerous countries (Butler, 2015; Huang, 2016).

The growing number of young L2/FL learners worldwide has led to a parallel, ever-increasing body of research targeting this group, either to address fundamental research questions about language acquisition or to inform L2/FL teaching pedagogy and assessment and language education policy (Philp et al., 2008). Traditionally, research on second language acquisition (SLA) has mainly focused on the adult population (Oliver & Azkarai, 2017; Philp et al., 2017). Child second language acquisition (“child SLA” hereafter) research is an emerging field whose origins may be traced back to seminal work on the order of L2 morpheme acquisition by Kenji Hakuta (1974, 1976) and Dulay and Burt (1974). Hakuta’s longitudinal, naturalistic case study of a five-year-old Japanese girl documented her acquisition of English L2 morphemes. Dulay and Burt (1974) also examined the acquisition order of English L2 morphemes by analyzing the naturalistic utterances of L2 children who spoke Spanish or Chinese at home. Research on child SLA has since grown and has drawn on a wider variety of methods beyond descriptive diary studies and speech analysis, such as grammaticality judgement tasks (Ionin & Wexler, 2002), survey methods (Oga-Baldwin et al., 2017), and stimulated recall/think-alouds (Butler, 2020). Advances in neuroimaging tools have also afforded us the opportunity to investigate child SLA beyond behavioral measures and to understand the brain mechanisms underlying L2 processes and products.

The majority of existing child SLA studies have pursued research questions and constructs similar to those in adult SLA research (Oliver & Azkarai, 2017; Philp et al., 2008), and many have also taken a generative approach (e.g., Rankin & Unsworth, 2016; Song & Schwartz, 2009; but see Paradis et al., 2011, who undertook a usage-based approach). In addition to the sequence of L2 acquisition, child SLA researchers have studied children’s developmental errors, the role of language input, the age-related effects on L2 learning outcomes, and the influence of first language (L1) on L2 acquisition (Dixon et al., 2012; Huang & Kuo, 2020). As mentioned earlier, parents, educators, and policymakers are also the main stakeholders who drive and inspire research on child SLA due to practical demands to inform pedagogy, assessment and education policy (Huang et al., 2020a).
These child SLA studies are generally conducted in classroom- or laboratory-based instructional settings, and the results have direct application to pedagogy or assessment (e.g., Butler, 2017, 2020; Enever & Lindgren, 2017; García Mayo, 2017; Huang et al., 2020a, 2020b; Mackey & Silver, 2005; Muñoz, 2006; Oliver & Nguyen, 2018; Wolf & Butler, 2017).

In recent years, child SLA researchers have expanded their focus to examine L2 children’s unique aspects and contexts of language learning. For example, Paradis and colleagues worked with Syrian refugee children recently arrived in Canada to understand their bilingual and biliteracy development (i.e., Arabic L1 and English L2) as well as their acculturation trajectories (Paradis, Chen, & Ramos, 2020; Paradis, Soto-Corominas, Chen, & Gottardo, 2020). The life experiences of recently arrived refugee children and youth differ from those of L1 children and other L2 children groups, as they may have experienced violence and war prior to migration as well as interrupted formal education. Some child SLA researchers have also challenged the traditional research methods with L2 children and these methods’ underlying philosophy, arguing for a shift from doing research on children to doing research with children (Pinter, 2014). This proposed new research approach, however, has received mixed reactions, particularly about its feasibility and desirability (Oliver & Azkarai, 2017).

Investigating child SLA may require careful consideration of methods due to children’s unique characteristics; they are still in the midst of cognitive, social, affective and L1 development (Butler, 2019; Oliver & Azkarai, 2017; Philp et al., 2008). In other words, commonly used methods and techniques for adults may not necessarily work for young learners, and, as a result, modifications to existing methods, or new methods, techniques, and instruments altogether, are needed to research this population. For example, speech elicitation tasks such as the read-aloud task and the monologue task are commonly used in adult L2 speech production research. However, these tasks may not be developmentally appropriate for young L2 children because of their emergent reading skills and/or because the rigid, prescribed nature of the tasks could induce children’s anxiety (Huang & Ramírez, Chapter 6 in this volume). Specific ethical considerations may also be necessary when working with this vulnerable population, particularly for children with special needs (Sheng & Hollenbach, Chapter 10 in this volume). However, to the best of our knowledge, there is no comprehensive book that provides child SLA researchers with useful information on how they can modify or develop age- and developmentally appropriate methods and instruments. This book is thus designed to serve this purpose.

In this book, we define young L2 learners as children who learn additional language(s) between 4 and 12 years of age. Because researchers do not have an agreed-upon definition for child L2 learners, we focus on “sequential bilingual” children aged 4–12 to make a distinction between child SLA and “simultaneous bilingual acquisition,” also known as bilingual first language acquisition (De Houwer, 2009). “Sequential bilinguals” distinguish themselves from “simultaneous bilinguals” because they learn the additional language(s) as an L2/L2s in childhood with a relatively solid foundation of their L1. It should be noted that we use the terms bilinguals and L2 learners to refer to multilingual learners as well (i.e., children who learn multiple languages, as the above definition indicates).
Although all chapters’ target age groups fall within the 4–12 range, some chapters (e.g., Huang & Ramirez in Chapter 6 and Montrul et al. in Chapter 7) focus on specific age groups within the defined age range (ages 4–12). As indicated below, this book covers major research methods and techniques commonly used in existing SLA research and discusses various possibilities and limitations, considerations and challenges when they are employed with young L2/ FL learners. Ethical considerations for working with children are also discussed. Specifically, each chapter includes the following information: (1) general descriptions of the given method/technique (including a brief historical background); (2) an overview of how the method/technique is used in child SLA; (3) challenges and considerations when employing them with L2/FL-learning children (including some concrete examples from the authors’ and/or other’s research); and (4) implications for child SLA researchers. The book includes a diverse variety of research methods, ranging from more qualitatively oriented observation and ethnographic methods (Chapter 2), interviews (Chapter 4), and verbal reports (Chapter 5) to more quantitively oriented survey methods (Chapter 3), psycholinguistic methods (specific eye- tracking) (Chapter 8), and brain-imaging techniques (Chapter 9).The book covers methods


for researching both productive language (Chapter 6 on speech production) and receptive language (Chapter 7 on receptive methods), and a chapter on research methods for L2 children with special needs (Chapter 10) is also included for readers who are interested in understanding the full spectrum of language outcomes in child SLA. Below, we provide a brief summary of each chapter. Chapter 2 by Peter Sayer and Susan Ataei discusses observation and ethnographic methods. Ethnography, which originated in cultural anthropology, entails a range of methodological tools that involve collecting data primarily through observing participants and taking fieldnotes. When ethnography is employed among young L2 learners, the main objective is to identify and examine the various social dimensions that influence children’s language development rather than to describe their linguistic changes per se. Sayer and Ataei review various types of observation studies ranging from non-participant to participant and argue that a key element of “ethnographic” research in the classroom is participant observation. Researchers in these studies are fully immersed in the learning environment and serve as co-participants. The chapter discusses various critical topics when ethnography is conducted among young learners. Such topics include issues related to research question formulation, study design and duration, data sources used, data analysis, and theoretical frameworks employed. This chapter emphasizes the importance of (a) making a good connection between verbal data and children’s actions in the given social contexts; (b) capturing children’s views; and (c) actively engaging with children. While ethnography is primarily a qualitative method, in Chapter 3, Emiko Hirosawa and Quint Oga-Baldwin focus on survey research using questionnaires, a popular method to obtain self-reported information on children’s attitudes, beliefs, motivation, and other social and affective information. 
The authors first review a brief history of survey methods and then provide a description of children’s cognitive development that incorporates Piaget’s theory. While paying particular attention to children’s age-related maturational levels, Hirosawa and Oga-Baldwin offer very comprehensive and practical tips for how to implement survey methods with children in a step-by-step fashion that covers developing valid items, implementation procedures, and scoring. As with any other method, self-report surveys have some limitations, but they can be used even among children as a powerful tool to uncover various latent variables that might not otherwise be directly observable. Chapter 4 by Annamaria Pinter discusses interviews in child L2 research. Interviews allow researchers to obtain rich data sets in a flexible manner. Interviews have been used as an established method that is welcomed in both adult L2 research and child development studies in general. However, according to Pinter, interviews have been relatively underused and often implemented on an ad hoc basis in child L2 research. After reviewing a number of leading existing studies involving interviews with children and identifying the primary principles of good practice, Pinter addresses both the challenges and potential of interview methods among children. The critical issues addressed in this chapter include: (a) the need


for sufficient consideration regarding the power gaps that exist between children and adult researchers; (b) the use of L1 or L2 in interviews and the consequences of doing so; (c) the importance of creating comfortable environments; (d) the pros and cons of individual and pair/group interview formats; and (e) ethical issues. The power imbalance and ethical issues addressed in this chapter are critically important, not only in interview methods but also in other types of research involving children. These issues are repeatedly addressed in other chapters as well. In Chapter 5, Yuko Butler focuses on verbal reports such as think-alouds and stimulated recall as a different method for eliciting verbal data from participants. In adult L2 development research, both think-alouds and stimulated recall have been used widely in order to access learners’ mental processes and reasoning while they engage in language-related tasks. Think-alouds and stimulated recall are also popular in child development and general education research. Notably, verbal reports, and think-alouds in particular, have been used not only as a research tool but also as a pedagogical tool for children. These tools have been used to assist children to improve their monitoring and other self-reflective skills when they develop reading and writing abilities. When it comes to child L2 research, however, verbal reports have been relatively underutilized as a research or pedagogical tool. Butler addresses a number of age-related considerations when implementing verbal reports as a research method among children. Important considerations include veridicality (i.e., the accuracy of children’s verbal responses) and reactivity (i.e., potential changes in children’s behaviors or performance due to the production of verbal reports through think-alouds).
Chapter 6 by Becky Huang and Rica Ramírez focuses on methods to evaluate the L2 speech production of children between the ages of 6 and 12, i.e., from kindergarten to primary/elementary school grades. L2 speech production serves as a cognitive tool for young learners to develop higher mental functions, is a critical precursor to literacy, and is also often used to distinguish language-learning difficulties from a true developmental language disorder and/or from lack of opportunities to learn. This chapter reviews two main categories of speech production methods: standardized norm-referenced assessments and language sampling. Both types of methods can be used to evaluate various components of speech production from phonology to the hierarchical organization of the discourse. The authors discuss the advantages and disadvantages and the use of technology in each method. The authors also make suggestions for future directions and implications, including exploring alternative L2 production methods such as dynamic assessments and translanguaging techniques, developing alternative methods of administration such as involving parents as instrument administrators, and including multiple measures to gain a comprehensive picture of children’s L2 production. Silvina Montrul, Alexandra Morales-Reyes, and Begoña Arechabaleta Regulez discuss receptive methods in studying the comprehension of syntax, semantics, and inflectional morphology in Chapter 7. The authors focus on two receptive


methods, the sentence-picture matching task and the grammaticality/acceptability judgment task, because these have been widely used with monolingual children in L1 acquisition, adult L2 learners, as well as L2 children aged 4 to 12. For each of the two methods, the authors provide a general description, its assumptions, rationale and procedure, as well as the common linguistic structures that have been tested using that method. They also use their own work with children learning Spanish and English as a second language during the elementary school period to illustrate the advantages and disadvantages of using these two methods with L2 children. Although productive methods are the most direct way to assess L2 children’s language proficiency, receptive methods help capture children’s knowledge of language that they do not yet produce or do not produce often. The authors suggest that using receptive methods in conjunction with production methods or other language measures would allow researchers to gain a more complete picture of L2 child learners’ linguistic knowledge. In Chapter 8, Paola Dussias and Karen Miller explore eye-tracking research methods, a powerful means of uncovering cognitive processes when learners engage in L2 tasks. One of the exciting techniques for studying L2-learning children is the visual world paradigm, which combines eye-tracking methods with spoken language comprehension. The visual world paradigm has opened doors for researchers to uncover real-time and cognitive processing during language comprehension even among children at a pre-literate stage. Thanks to advancements in technology, researchers can now use eye-tracking methods among children not only in a laboratory setting but also in more natural contexts such as in classrooms. For successful use among young learners, however, eye-tracking methods require researchers to make specific preparations and involve other considerations when recruiting child participants.
Considerations include making age-related adjustments to materials and procedures and accounting for the different types of data collected when conducting data analyses. In combination with offline data, eye-tracking data allows researchers to triangulate their data, which, in turn, has the potential to greatly increase the validity of research among children. Chapter 9 introduces the use of neuroimaging methods in studying child L2 acquisition, specifically focusing on research on phonology and literacy. Nia Nickerson and Ioulia Kovelman summarize the two major categories of neuroimaging tools: the “When” and “Where” tools. Whereas the “When” tools measure the timing of the brain’s rapid electrical activity, the “Where” tools capture the location of the blood flow or magnetic fields associated with this electrical activity. The timing information from “When” tools can inform the development of sound sensitivity in the L2 and how this sensitivity facilitates word learning. Popular “When” tools in L2 acquisition research include the electroencephalogram (EEG), event-related potential (ERP), and magnetoencephalography (MEG). Among these tools, MEG has better neuroanatomical resolution than EEG, but EEG and ERP are more child friendly due to their affordability, silence,


and motion tolerance. On the other hand, information from “Where” tools allows researchers to test neuroanatomically based hypotheses about language learning. Functional magnetic resonance imaging (fMRI) and functional near-infrared spectroscopy (fNIRS) systems are commonly used “Where” tools, and both measure the brain’s hemodynamic response. fMRI uses magnetic fields and fNIRS uses near-infrared light to detect the amount of oxygen in the blood and where the blood is flowing, thus providing information about which parts of the brain respond to certain types of tasks. Although fMRI has better spatial precision, fNIRS is portable, less noisy, and more cost effective and child friendly. The authors also review research that has successfully applied neuroimaging technologies to studying young L2 learners’ neural architecture of the brain and discuss how this body of research affords us a better understanding of the role of bilingual experiences in young L2 learners’ minds and brains. Instead of focusing on a specific method, Chapter 10 by Li Sheng and Sharon Hollenbach reviews common methods utilized in research on child L2/bilingual learners who have developmental disorders (DD), specifically developmental language disorder (DLD) and autism spectrum disorder (ASD). In the first part of the chapter, the authors examine common research questions on bilingual children with DD in both basic science and clinical studies. The four common questions derived from their review of the literature are: (1) Does exposure to two languages present an additional risk to language development in children with DD? (2) What are the language and cognitive profiles of bilingual children with DD in comparison to their typically developing bilingual peers? (3) What are the psychometric properties of diagnostic measures? and (4) What is the efficacy of interventions designed to improve the language ability of bilingual children with DD?
For each research question, the authors discuss the main research methods used to answer the question or present methodological standards that are used to guide the translational data for evidence-based clinical practice. In the second part of the chapter, the authors outline the challenges of studying bilingual learners with DD, discuss the methodological implications of these challenges, and make suggestions for future directions. Finally, in Chapter 11, Yuko Butler summarizes the major methodological issues and challenges when conducting research on young L2/FL development. Butler identifies the common age-related issues discussed in the previous chapters of the book. These include cognitive and metacognitive issues (memory, attention, processing speed and efficiency, metacognition, and meta-awareness), affective issues (anxiety and engagement), and linguistic and cultural issues (L1 development, the choice of using a child’s L1 and L2, and hidden cultural assumptions). Age is also often related to children’s experiential factors, such as their experience and familiarity with research. The chapter concludes with suggestions for future directions, including the need for both a dynamic and fluid conceptualization of children’s L2/FL development as well as a deeper consideration of the power imbalance between children and adult researchers.


To conclude, as the number of young language learners continues to grow, more and more researchers as well as practitioners are interested in working with this population, either to study fundamental questions about language learning or to inform pedagogy and education policy. Although child SLA is a relatively nascent field, it is interdisciplinary in nature and built on a wealth of existing research, such as adult SLA, L1 acquisition, child development, communication disorders, and education. The inclusion of diverse research methods with divergent research philosophies in this book reflects the interdisciplinary feature of child SLA research. As research methodologies are central to these pursuits for all researchers and practitioners across disciplines, the aim of this book is to provide its readers with the background information and procedural details of a diverse array of research methods commonly used in child SLA research. We intend the book to be useful not only to novice researchers and practitioners but also experienced researchers and practitioners who would like to refresh their knowledge of methods that they are familiar with and/or broaden their scope of techniques for child SLA research.

References
Butler, Y. G. (2015). English language education among young learners in East Asia: A review of current research. Language Teaching, 48(3), 303–342. https://doi.org/10.1017/s0261444815000105
Butler, Y. G. (2017). Instructed SLA in East Asian contexts. In S. Loewen & M. Sato (Eds.), The Routledge handbook of instructed second language acquisition (pp. 321–338). Routledge.
Butler, Y. G. (2019). Assessment of young English learners in instructional settings. In X. Gao (Ed.), Second handbook of English language teaching. Springer. https://doi.org/10.1007/978-3-030-02899-2_24
Butler, Y. G. (2020). The ability of young learners to construct word meaning in context. Studies in Second Language Learning and Teaching, 10(3), 549–580. https://doi.org/10.14746/ssllt.2020.10.3.7
De Houwer, A. (2009). Bilingual first language acquisition. Multilingual Matters.
Dixon, L. Q., Zhao, J., Shin, J. Y., Wu, S., Su, J. H., Burgess-Brigham, R., ... Snow, C. (2012). What we know about second language acquisition: A synthesis from four perspectives. Review of Educational Research, 82(1), 5–60. https://doi.org/10.3102/0034654311433587
Dulay, H., & Burt, M. (1974). A new perspective on the creative construction process in child second language acquisition. Language Learning, 24(2), 253–278. https://doi.org/10.1111/j.1467-1770.1974.tb00507.x
Enever, J. (2018). Policy and policies in global primary English. Oxford University Press.
Enever, J., & Lindgren, E. (Eds.). (2017). Early language learning: Complexity and mixed methods. Multilingual Matters.
García Mayo, M. P. (Ed.). (2017). Learning foreign languages in primary school: Research insights. Multilingual Matters.
Hakuta, K. (1974). Prefabricated patterns and the emergence of structure in second language acquisition. Language Learning, 24(2), 287–297. https://doi.org/10.1111/j.1467-1770.1974.tb00509.x


Hakuta, K. (1976). A case study of a Japanese child learning English as a second language 1, 2. Language Learning, 26(2), 321–351. https://doi.org/10.1111/j.1467-1770.1976.tb00280.x
Huang, B. H. (2016). A synthesis of empirical research on the linguistic outcomes of early foreign language instruction. International Journal of Multilingualism, 13(3), 257–273. https://doi.org/10.1080/14790718.2015.1066792
Huang, B. H., Bailey, A. L., Sass, D. A., & Shawn Chang, Y. H. (2020a). An investigation of the validity of a speaking assessment for adolescent English language learners. Language Testing. Advance online publication. https://doi.org/10.1177/0265532220925731
Huang, B. H., Bedore, L. M., Niu, L., Wang, Y., & Wicha, N. Y. (2020b). The contributions of oral language to English reading outcomes among young bilingual students in the United States. International Journal of Bilingualism. Advance online publication. https://doi.org/10.1177/1367006920938136
Huang, B. H., & Kuo, L. J. (2020). The role of input in bilingual children’s language and literacy development: Introduction to the special issue. International Journal of Bilingualism, 24(1), 3–7. https://doi.org/10.1177/1367006918768369
Ionin, T., & Wexler, K. (2002). Why is ‘is’ easier than ‘-s’?: Acquisition of tense/agreement morphology by child second language learners of English. Second Language Research, 18(2), 95–136. https://doi.org/10.1191/0267658302sr195oa
Mackey, A., & Silver, R. E. (2005). Interactional tasks and English L2 learning by immigrant children in Singapore. System, 33, 239–260. https://doi.org/10.1016/j.system.2005.01.005
Muñoz, C. (2006). The effects of age on foreign language learning: The BAF Project. In C. Muñoz (Ed.), Age and the rate of foreign language learning (pp. 1–40). Multilingual Matters.
Murphy, V. (2014). Second language learning in the early school years: Trends and contexts. Oxford University Press.
National Center for Education Statistics. (2020). English language learners in public schools. https://nces.ed.gov/programs/coe/indicator_cgf.asp
Nikolov, M. (2016). Trends, issues, and challenges in assessing young language learners. In M. Nikolov (Ed.), Assessing young learners of English: Global and local perspectives (pp. 1–17). Springer.
Oga-Baldwin, W. L. Q., Nakata, Y., Parker, P. D., & Ryan, R. M. (2017). Motivating young language learners: A longitudinal model of self-determined motivation in elementary school foreign language classes. Contemporary Educational Psychology, 49, 140–150. http://doi.org/10.1016/j.cedpsych.2017.01.010
Oliver, R., & Azkarai, A. (2017). Review of child second language acquisition (SLA): Examining theories and research. Annual Review of Applied Linguistics, 37, 62–76. https://doi.org/10.1017/s0267190517000058
Oliver, R., & Nguyen, B. (Eds.). (2018). Teaching young second language learners: Practices in different classroom contexts. Routledge.
Paradis, J., Chen, X., & Ramos, H. (2020). The language, literacy, and social integration of refugee children and youth. Applied Psycholinguistics, 41(6), 1251–1254. https://doi.org/10.1017/s0142716420000788
Paradis, J., Nicoladis, E., Crago, M., & Genesee, F. (2011). Bilingual children’s acquisition of the past tense: A usage-based approach. Journal of Child Language, 38(3), 554–578. https://doi.org/10.1017/s0305000910000218
Paradis, J., Soto-Corominas, A., Chen, X., & Gottardo, A. (2020). How language environment, age, and cognitive capacity support the bilingual development of Syrian refugee children recently arrived in Canada. Applied Psycholinguistics, 41(6), 1255–1281. https://doi.org/10.1017/s014271642000017x
Philp, J., Borowczyk, M., & Mackey, A. (2017). Exploring the uniqueness of child second language acquisition (SLA): Learning, teaching, assessment, and practice. Annual Review of Applied Linguistics, 37, 1–13. https://doi.org/10.1017/s0267190517000174
Philp, J., Oliver, R., & Mackey, A. (Eds.). (2008). Second language acquisition and the younger learner: Child’s play? (Vol. 23). John Benjamins Publishing.
Pinter, A. (2014). Child participant roles in applied linguistics research. Applied Linguistics, 35(2), 168–183. https://doi.org/10.1093/applin/amt008
Rankin, T., & Unsworth, S. (2016). Beyond poverty: Engaging with input in generative SLA. Second Language Research, 34, 563–572. https://doi.org/10.1177/0267658316648732
Song, H. S., & Schwartz, B. D. (2009). Testing the fundamental difference hypothesis: L2 adult, L2 child, and L1 child comparisons in the acquisition of Korean wh-constructions with negative polarity items. Studies in Second Language Acquisition, 323–361. https://doi.org/10.1017/s0272263109090329
Wolf, M. K., & Butler, Y. G. (Eds.). (2017). English language proficiency assessments for young learners. Routledge.


2 OBSERVATION AND ETHNOGRAPHIC METHODS FOR RESEARCHING YOUNG LEARNERS Peter Sayer and Susan Ataei
DOI: 10.4324/9780367815783-2

Introduction In this chapter, we discuss the use of ethnographic and observational methods to study young children’s additional language learning, which we will combine under the rubric of ethnographic observations. Ethnography as a method includes a range of research tools that involve direct observation of and interactions with research participants (see Chapter 4, this volume, for interview methods). While ethnography as an approach is rooted in the disciplinary traditions of cultural anthropology, here we will focus somewhat more specifically on aspects of ethnography that have been taken up by applied linguists in studying second language (L2) learning with younger learners, namely the process of collecting data through participant observation and the use of fieldnotes, and the analysis of these data from an ethnographic perspective. We begin with a general discussion of observation as a research technique in working with young children and then survey studies from the field that rely on ethnographic observations to study aspects of young learners’ L2 development. As we will see, the focus of researchers of young L2 learners employing ethnographic observations is often less on particular linguistic gains that children make and more on the various dimensions of the social processes that facilitate or constrain the children’s language acquisition.

Observing Children’s Language Development The observation of children to understand their learning has a long history in the social sciences. Developmental psychologists who examine the role of play and social interaction in learning rely on detailed systematic observations of children as their main source of data (Pellegrini, 2013). In linguistic anthropology, the direct


observation of children and caretakers in natural settings is the cornerstone of work on first language socialization (Schieffelin & Ochs, 1986). The main advantage of direct observation as a data collection technique in studying young children is the ability it gives researchers to capture the nonverbal and naturally occurring interactions that allow them to address the question of how children learn rather than merely what they learn. McKechnie (2000) argues that observation is particularly well-suited to the study of younger children (we are considering here up to age 12) because they are less self-conscious than adolescents and adults. As we grow, the presence of an observer is more likely to cause us to change our behavior to conform to (what we perceive as) accepted social norms. Young children have yet to master these social norms and are hence less prone to altering their behavior because of the presence of an observer. Through observation, a researcher spends time in the daily activities of a group of people (e.g., in a classroom, or following a focal student or group throughout the school day) and collects information while observing them. By doing this, the researcher gains access to firsthand information and the realities of life manifested publicly or implied through the actions of people in that setting. Observation is particularly useful in research with children who have not yet acquired productive skills in the L2 and are thus mostly silent in the classroom, or who naturally do not talk much in the classroom (McKechnie, 2000). Observation allows the researcher to document how a child responds nonverbally to input from an adult or the nonverbal interactions amongst peers. A clear example is the study of kindergarten ESL students in the US by DaSilva Iddings and Jang (2008).
One focal child, Juan, had recently arrived from Mexico and was in his silent period, the initial stage in second language acquisition where learners are nonverbal (Roberts, 2014). The observations revealed that, even though the focal student had not acquired the necessary linguistic base to communicate, he was actively engaged in the educational activities and was learning to create and express meaning in the L2. The observations also revealed that the focal student was concerned about being a competent member of the classroom community. For example, during a classroom routine where the teacher used a puppet, the researchers noted that:

Juan seemed to understand various communicative intentions signaled by the presence or absence of the puppet. He was observed to perform appropriate actions for the practice. During the Quiet Mouse, Juan was not only quiet but also pointed to himself as if to say, “Hey, pick me! I am quiet.” This action served as evidence that Juan understood another person’s intention in the specific practice and that the understanding of intention influenced his intentional state in ways relevant to the situation. (DaSilva Iddings & Jang, 2008, p. 578)

In the above-cited study, the researchers noted how Juan was keenly attuned to interactional routines in the classroom and quickly learned to pick up on key linguistic cues in English (“It’s time for the Quiet Mouse”) together with the


physical clues from the context (moving chairs, taking out the puppet). Interaction in educational settings is often a central focus of researchers using observation with young L2 learners (Philp et al., 2008), likely for two reasons. First, the quality and type of interactional opportunities afforded to L2 learners in the target language are seen as a critical feature of second language acquisition, and hence of effective L2 pedagogy (Nava & Pedrazzini, 2018). Second, observation is ideally suited to recording learners’ actual language practices rather than perceptions about practices through surveys (see Chapter 3 in this volume) or retrospections through interviews (Chapters 4 and 6). It also captures authentic or naturally occurring interactions rather than controlled or experimental methods where children are responding to something the researcher is guiding them to do (though observation may be used to investigate the effects of a pedagogical intervention). There is no set way of designing observational research. Spada (2019) characterizes classroom research as “broad based and open-ended, or narrowly focused and closed, structured or unstructured, objective or subjective, quantitative or qualitative” (p. 187). We would also add that observation may be used as a primary or secondary data source, and it may involve relatively little direct participation and time in the classroom, perhaps only capturing a snapshot of one lesson, or entail a researcher’s long-term engagement at a site over several years. Decisions about which approach to use in classroom observation largely depend on the aims and goals of the research and the nature of the research questions asked. In the following section, we provide a summary of observation research with young L2 learners.

Participant and Non-Participant Observation One key distinction in the use of observation as a research tool is whether or not the observer interacts directly with the research participants (Wragg, 2012). This is better understood as a continuum of non-participant to participant observation, as shown in Figure 2.1. At the non-participant end, the researcher adopts a “fly on the wall” approach, trying to minimize contact with participants and the effect of the researcher’s presence on the social scene. At the other end of the spectrum, the researcher is fully immersed and takes an active role as a co-participant. Participant observation is one of the hallmarks of an ethnographic approach to classroom research.

Non-Participant Observation: Interaction Analysis Schemes Non-participant observation tends to be more structured. The researcher often uses a checklist or observation protocol to focus the observation on aspects of the classroom that have been determined beforehand as relevant to the research aim. The research is most often concerned with measuring the frequency of certain behaviors or actions, or the presence or lack of certain characteristics. The advantage of this approach is that it is relatively objective, since the research is


[FIGURE 2.1 Continuum of observational approaches. One end: non-participant observation (more structured, use of observation checklist or protocol; quantitative or mixed methods). Other end: participant observation (ethnographic, open-ended, reliance on fieldnotes; qualitative, often case study).]

essentially counting occurrences of pre-identified categories. It also lends itself well to projects where researchers are working in a team and studying a phenomenon across many classrooms, since the researchers can establish inter-rater reliability, and the data collected lend themselves to quantitative analysis. In one study using this approach, Silverman et al. (2014) investigated the relationship between teachers’ instruction and students’ vocabulary and comprehension in grades 3–5, comparing native and L2 English speakers. They observed 33 classrooms three times during an academic year and audio-recorded a total of 274 students at the beginning and end of the year. Their secondary source of data was observations of reading/language arts. They reported that observation enabled them to provide context on classroom groupings, instructional materials, and nonverbal information such as a picture or graphic organizer presented to the class. The authors characterized this approach, based on a snapshot of the classroom that measured certain pre-defined features, as a “quantitative observational study.” As mentioned above, much of the research that uses observation as the data collection tool has examined classroom interaction. Spada (2019) gives the historical background on interaction analysis schemes in L2 research, starting with the Foreign Language Interaction system (FLINT; Moskowitz, 1976). She notes that many of these observation schemes, also called observation protocols, were originally developed to evaluate teacher effectiveness and were therefore focused on the teacher’s actions. Later, they came to be used to support the implementation of a particular teaching method or approach, such as the Communicative Orientation to Language Teaching (COLT) scheme (Allen et al., 1984). By the late 1980s, Chaudron (1988) had identified about 24 different schemes designed in this tradition. 
They all used a variation of a coding system, counting the presence/absence or frequency of predetermined events organized into different categories. Most also required the observer to make either low-inference observations of overt, easily observable behaviors, such as the number of questions the students asked, or high-inference observations, which entail the observer using her judgment, for instance, about


how motivated or engaged the students were in the lesson. Also, some observation schemes focused on larger units of analysis, such as a pedagogical activity or episode (e.g., the interactional structure and routines of “circle time”), or on smaller linguistic or functional units (e.g., the response to a specific type of question or corrective feedback) (Spada, 2019).

A recent example of this approach is the project undertaken by Dockrell and colleagues (Dockrell et al., 2012) to look at the extent to which UK classrooms were providing for L2 English learners at the Reception/Key Stage 1 levels. They carried out a comprehensive literature review of research on effective language learning in classrooms in order to “identif[y] features in the classroom and ways of talking with children which had been demonstrated to support the development of oral language skills” (p. 12). Based on this, they developed the Communication Supporting Classroom (CSC) Observation Tool, an instrument that consists of three parts: (1) an adequate language-learning environment (19 items), (2) language-learning opportunities (5 items), and (3) language-learning interactions (20 items). They employed the CSC to compare 101 classrooms in rural, suburban, and urban contexts. Items that looked at the environment included “Children’s own work is displayed and labeled appropriately.” The observation of language-learning opportunities included the category “Children have opportunities to engage in structured conversations with teachers and other adults.” Language-learning interactions included “Adult encourages children to use new words in their own talking.” It is important to note that, whereas the DaSilva Iddings and Jang (2008) study cited above focused on the language produced by the student, the focus of the CSC is on the conditions for L2 learning created by the teacher.
The CSC does measure aspects of the interactions (e.g., “encouraging turn-taking” and “extending children’s language”); however, the data are collected relative to the teacher’s actions vis-à-vis the children, not in how the children respond. One observation protocol that attempts to capture the complexity of children’s interactions in the classroom is the Scheme for Educational Dialogue Analysis (SEDA) developed by Hennessy and colleagues (Hennessy et al., 2016). The SEDA is “rooted in the growing body of literature on classroom talk and dialogic teaching-and-learning from a sociocultural perspective, which highlights the intrinsically social and communicative nature of human life” (p. 17). It builds on work in sociolinguistics on the ethnography of communication (Hymes, 1972; Saville-Troike, 2003) and is organized according to nested levels of analysis: The communicative situation (macro), the communicative events (meso), and the communicative acts (micro). This approach combines the ability to measure the frequency of relevant occurrences and do quantitative analysis with a qualitative, discourse analytic view of the interactions themselves. However, although the SEDA was developed in classrooms in the UK, Spain, and Mexico, it was not designed to look specifically at younger L2 learners.
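The coding logic shared by the observation schemes described above, counting predetermined events by category and checking that two observers code them consistently, can be sketched in a few lines of Python. This is a minimal illustration only: the category names and the two raters' codes are hypothetical and are not taken from FLINT, COLT, the CSC, or the SEDA, and Cohen's kappa is used here as one common index of inter-rater reliability, not necessarily the one used in the studies cited.

```python
from collections import Counter

# Hypothetical low-inference categories; a real scheme defines its own.
CATEGORIES = ["teacher_question", "student_question", "corrective_feedback"]


def tally_events(coded_events):
    """Count how often each predetermined category was observed."""
    counts = Counter(coded_events)
    return {category: counts.get(category, 0) for category in CATEGORIES}


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who coded the same sequence of events."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must code the same non-empty set of events.")
    n = len(rater_a)
    # Proportion of events on which the two raters chose the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category proportions.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned by two observers to the same six classroom events.
rater_a = ["teacher_question", "student_question", "teacher_question",
           "corrective_feedback", "teacher_question", "student_question"]
rater_b = ["teacher_question", "student_question", "corrective_feedback",
           "corrective_feedback", "teacher_question", "student_question"]

print(tally_events(rater_a))
print(round(cohens_kappa(rater_a, rater_b), 2))
```

A tally of this kind captures only the presence and frequency of the coded events; it says nothing about how the children responded, which is the limitation that schemes such as the SEDA try to address.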


16 Peter Sayer and Susan Ataei

One critique of non-participant observation is that, in some sense, as soon as the observer enters the classroom, she becomes a participant in that her presence will affect the dynamic of that social scene. This is what Labov (1972) described as the “Observer’s Paradox,” or what Hammersley (2007) termed “reactivity”: one must inevitably affect what one is trying to observe. In our own research in classrooms with young L2 learners in Mexico, the US, and Iran, we have experienced very different reactions to our presence when entering a new classroom and trying to be unobtrusive. In a few cases, the teacher has said “we have people coming in and out all the time, so they probably won’t even notice you.” More often, the students were very curious about what we were doing and would crane around in their desks (we usually try to position ourselves in the far back corner of the room to have the best vantage point) to look at us and, when given the chance, approach us to ask about what we were writing. We should note that children’s reactions to observers are strongly influenced by the cultural context and the expectations of appropriate behavior in different educational settings. In Mexico, for example, classrooms are considered by US-American standards to be quite loud, and it is perfectly normal for inquisitive youngsters to approach a new adult in the room and ask questions. In Iran, on the other hand, it would be rare for primary-aged students to talk to an adult unless specifically directed to by the teacher. When a video camera on a tripod is used, children are often even more inquisitive and distracted. Researchers often try to ameliorate this effect in different ways: by being as unobtrusive as possible, for example, or by not turning on the recording for the first few days until the students and teacher get used to the presence of the researcher and/or camera (called “dummy observations”).
In participant observation, however, the researcher takes the opposite stance. She embraces the fact that she is part of the classroom community and often embeds herself in a role that gives her the ability to interact directly with the students, such as a classroom aide. In doing research with children, this phenomenon may even be more evident in that children naturally become curious about the researcher or the equipment that she is using (acknowledging the cultural differences above) and therefore no classroom setting is entirely unaffected by the presence of the researcher even when her goal is to not intervene in the natural flow of events.

Participant Observation: Ethnographic Approaches

Since the 1990s, the use of classroom interaction analysis schemes for observational L2 research has given way to more open-ended, qualitative work, which we include here under the rubric of “ethnographic approaches.” This shift has likely been due to the influence of sociocultural, ecological, and post-structural perspectives in studying L2 learning that try to examine particular linguistic behaviors (such as classroom interaction) not as a set of predetermined categories such as the FLINT observation scheme but rather in relation to other


social categories or phenomena (such as identity formation) (see Hennessy et al., 2016). Some research on young L2 learners has followed this trend, and most of the recent work we review here on children’s L2 learning using observation methods approaches the research from an ethnographic perspective. On this end of the spectrum (Figure 2.1), the researcher takes a more participatory role. This can include partial participation, such as an observer who may circulate about the room to interact with children, asking questions or playing with the children, but whose main role is to observe and take notes. An example is Gort’s (2006) study of bilingual writing in two first-grade classes in a two-way bilingual education program in the US. She explains that “researchers took detailed field notes of participant activities and audiotaped participant, peer, and teacher conversations while avoiding potential interference with regular classroom instruction, normal patterns and routines, and general student participation in the [writing workshop]” (p. 332). At the far end of the continuum is full participant observation, where the observer has a defined role such as classroom aide or small-group tutor. This approach is illustrated by Worthy et al.’s (2013) study of bilingual students in a fifth-grade classroom in a transitional bilingual program in Texas. They explain that:

Monica (the teacher) sat with the students in a circle on the carpet while she read aloud at least once a day from novels or picture books, with each session lasting 20 to 30 minutes. The first author also sat on the carpet, listening to discussions, and occasionally reading for brief periods of time when Monica needed to be out of the room. (p. 316)

“Action research,” where the teacher is the researcher, is the ultimate form of participant observation (also called “practitioner research” or “practitioner inquiry”) and often utilizes observation as a primary data source.
Other affiliated methods, such as participatory action research (PAR) and collaborative research, emphasize the synergetic relationship between the researcher and the teacher. An example of this approach is Çamlibel and Garcia’s (2012) study of a Turkish immigrant first-grader, Zehra. They conducted a qualitative case study in Zehra’s ESL class, English-medium content classroom, and Turkish classroom for six months. The first researcher, who observed Zehra in the three classrooms, was also her Turkish class teacher. The difficulty with these forms of participant observation, of course, is that the observer is simultaneously fulfilling dual roles. Researchers immersed as participant observers are often not able to simultaneously document what they are observing, especially micro-level interactions. Often, a researcher who is embedded in a classroom as a teacher’s aide, for example, may be asked to work with the students at a small group table to lead a mini-lesson or suddenly called upon to help resolve


a behavior problem or dispute between the children, and thus cannot be taking detailed fieldnotes in real time (Emerson et al., 2011). Therefore, having time and a quiet space immediately after a lesson to record fieldnotes is important. Video recordings are also a valuable way of augmenting the observation, especially for language acquisition researchers who need to be able to capture the specific words and the quality of how those words were used by the children. However, researchers making any type of recordings of children must follow strict ethical standards to protect confidentiality and to obtain informed consent from the children’s parents or caretakers.

One approach developed by Roberts et al. (2001) is to employ “ethnography for language learning.” This approach leverages the inquiry stance of ethnographic research and uses it as a pedagogical tool. Examples of this hybrid research-pedagogy technique with children include work done with young English as a second language students in Canada, including Dagenais et al. (2009), Prasad (2013), and Smythe and Toohey (2009). In this work, the children take the role of “co-ethnographers,” and multiple techniques are used to transform traditional participant observation done by the adult researcher into ethnographic pedagogical activities carried out by the children. Prasad’s (2013) study included English learners in grades 4–6 introducing themselves by creating linguistic and cultural self-portraits using colored papers that students chose to associate with different languages, creating family language maps to illustrate patterns of language use among their family members at home, using digital cameras to map the linguistic landscapes of their school and home, and creating plurilingual and multimodal identity texts.

Recent Ethnographic and Observational Studies of Young L2 Learners

Observation studies in L2 learning have shifted from more structured, quantitative work using classroom interaction schemes to a more open-ended, interpretative perspective. This has also mirrored the shift from a focus on the teacher and the teaching method to the student and the social environment of the classroom, also called an “ecological approach” to researching language learning. Like other research on young L2 learners, work with observational methods with a specific focus on children’s additional language learning is a relatively new but expanding area of interest, and it is therefore relatively easy to compile a comprehensive list of work. Table 2.1 presents a synopsis of 15 recent studies of young L2 learners where observation was used as the primary method for collecting data. Most of the studies have appeared in the last five years, though we consider Willett’s (1995) study as a seminal study in children’s L2 socialization. In order to characterize this work, we note strong alignment in the approach used by the researchers in terms of study design, duration, other data sources, data analysis, and the theoretical framework and types of research questions the researchers posed.

TABLE 2.1 Studies of young L2 learners that used observation as a data collection tool

| Study | Study design | Study duration | Participant vs. non-participant | Other data collection methods used | Theoretical/conceptual framework | Analysis |
| --- | --- | --- | --- | --- | --- | --- |
| Bernstein (2018): The perks of being peripheral: English learning and participation in a preschool classroom network of practice | Ethnographic and discourse analytic study | 1 school year | Participant observation | Video, teacher interviews, student interviews, artifact collection | Communities of practice, language socialization, social network theories | Discourse analysis, analytic memos, close analysis of video transcripts, visual mappings, online lexical complexity analyzer |
| Bligh & Drury (2015): Perspectives on the “silent period” for emergent bilinguals in England | Multiple ethnographic case studies | Study 1: 2 years and 3 months; Study 2: 1 school year | Study 1: participant observation; Study 2: non-participant observation | Audio recording, teacher and staff interviews, informal conversations | Sociocultural perspective, legitimate peripheral participation | Thematic analysis, frequency distribution, central tendency of codes |
| Çamlibel & Garcia (2012): Zehra’s story: Becoming biliterate in Turkish and English | Qualitative case study | 6 months | Participant observation (teacher researcher) | Audio + video recording, interviews, artifact collection | Grounded theory, L2 reading | Constant-comparison method |
| Cho (2016): Formal and informal academic language socialization of a bilingual child | Ethnographic case study | 1 year | Non-participant observation | Audio + video recording, interviews | Language socialization, communities of practice | Coding and calculation of the frequency of events to find the representative events |
| Collett (2018): Constructing identities: How two emergent bilinguals create linguistic agency in elementary school | Qualitative case study | 18 months | Non-participant observation | Audio recording, student interviews, student artifacts | Identity, figured worlds, identities in school-based contexts | Discourse analysis |
| DaSilva Iddings & Jang (2008): The mediational role of classroom practices during the silent period: A new-immigrant student learning the English language in a mainstream classroom | Qualitative case study | 1 year | Non-participant observation | Video, interviews (teacher, students, staff), academic artifacts | Ecological approach to L2 learning, sociocultural theory | Inductive, purposeful episode selection that represented the classroom routine and the process of L2 learning |
| Fogle (2008): Home-school connections for international adoptees: Repetition in parent-child interaction | Multiple case study | 6 months | Non-participant observation | Audio recording, ethnographic interviews | Interactional perspective, discourse competence | Discourse analysis |
| Gort (2006): Strategic codeswitching, interliteracy, and other phenomena of emergent bilingual writing: Lessons from first grade dual language classrooms | Qualitative case study | 6 months | Non-participant observation | Audio recording, interviews, writing sample | Linguistic interdependence principle, sociocultural theory | Qualitative analysis, teacher member checking |
| Kenner (2004): Living in simultaneous worlds: Difference and integration in bilingual script-learning | Case study | 1 year | Non-participant observation | Video, parent and teacher interviews, artifacts | Social semiotic theory | Qualitative analysis, thematic construction |
| Kwon & Han (2008): Language transfer in child SLA: A longitudinal case study of a sequential bilingual | Longitudinal case study | 2+ years | Participant observation | Audio + video recordings, fieldnotes, informal interviews | Cognitive SLA, language transfer | Qualitative coding of utterance based on established categories |
| Martínez et al. (2017): Becoming “Spanish learners”: Identity and interaction among multilingual children in a Spanish-English dual language classroom | Ethnographic study | 2 school years | Non-participant observation | Video, interviews, artifact collection | Identity formation in social interaction | Ethnographic microanalysis of social interaction |
| Pinnow & Chval (2015): “How much You wanna bet?”: Examining the role of positioning in the development of L2 learner interactional competencies in the content classroom | Ethnographic study | 3 years | Non-participant observation | Video (head-mounted cameras), audio recording, parent and teacher interviews, writing samples | Identity, positioning theory | Microanalytic longitudinal mapping of the classroom interactional architecture, multimodal analysis of video |
| Wagner (2019): Connections between reading identities and social status in early childhood | Case study | 4 months | Non-participant observation | Audio, student and teacher interviews, family questionnaire | Identity perspective, communities of practice | Micro-ethnographic approach |
| Willett (1995): Becoming first graders in an L2: An ethnographic study of L2 socialization | Ethnographic case study | 1 year | Participant observation | Parent interviews, academic records, sociometric test | L2 socialization | Inductive qualitative analysis |
| Worthy et al. (2013): Spaces for dynamic bilingualism in read-aloud discussions: Developing & strengthening bilingual & academic skills | Ethnographic study | 8 months | Semi-participant observation | Interviews and informal conversations, artifact collection | Culturally relevant and responsive pedagogy, dynamic bilingualism | Inductive qualitative analysis |

Study Design and Duration

All of the studies we reviewed that use direct observation are qualitative, and most are called “ethnographic case studies.” The case usually focuses on one child, a small group of children, or one classroom, and may be drawn from a larger ethnographic study. Some authors use slightly different terms: for example, Bernstein (2018) refers to hers as an ethnographic and discourse analytic study, while Collett (2018) and several others term theirs a “qualitative case study.” The hallmark of ethnographic case studies, then, is that they include multiple observations of the same group of learners in the same setting over an extended period of time. The shortest duration (what in ethnography is called “time spent in the field”) of the studies reviewed was a full academic term (four months), while most spent a full school year or more.

Besides the duration of the study, researchers must decide on the intensity of the observations. One approach is to observe the same event at regular but less frequent intervals, for example observing an English as a foreign language class once a week, perhaps every Wednesday morning, over the entire school year. This has the advantage of consistently being able to see the same events, such as classroom routines, in the same setting, as well as enabling the researcher to build rapport and relationships with the children. However, where researchers are interested in seeing how the children engage across different subject areas or times or places throughout the day, they may choose shorter, coherent chunks of time. Martínez et al. (2017), for example, wanted to follow a group of focal students in a bilingual program as they moved across three different teachers:

We collected data five days per week for the entire school day (approximately seven hours per day) over a one-week period in January and then over an eight-week period in May–June during the first year of the study (in Ms. Birch’s classroom) and then over a six-week period in May–June during the second year of the study (in Ms. Cervantes’ classroom). Data sources included ethnographic fieldnotes, transcripts of video-recorded classroom observations, transcripts of video-recorded interviews, and student work samples. (p. 172)

Other Data Sources

Employing observations in combination with other data sources, as described above by Martínez et al. (2017), is common. In qualitative research, this is referred to as “triangulation,” where several different types of data are collected in order to obtain a holistic understanding of the phenomenon and as a strategy to show the robustness of the data collected. Generally, the data then need to be connected through the process of analysis. The researcher may arrange that each data type answers a specific research question, but more commonly the analysis attempts to layer the data sources and to show how data from multiple sources converge: a pattern that the observer has noted in her fieldnotes may be confirmed by looking at the transcripts of the recordings and connecting it to something that the teacher has said in an interview. In the studies we consider here, observation is used as a “primary data source,” while other secondary sources were used to support the researcher’s understanding of the observations, as shown in Table 2.2.

TABLE 2.2 Combining observation with other types of data

Primary data sources
• Fieldnotes taken during or immediately after direct observation
• Transcripts of video recordings

Secondary data sources
• Interviews with students, teachers, and/or parents (formal and transcribed or informal conversations)
• Samples of student work or other “artifacts”
• Other contextualizing information (demographic information about the school, achievement data such as test scores)

Analysis

Since the use of ethnographic observation is an inherently qualitative approach, the analysis of this type of data draws on the range of commonly used qualitative analytic techniques. One main qualitative approach is to code data to create categories. Thematic analysis organizes the data according to main and sub-themes (see Bligh & Drury, 2015, from Table 2.1). Ethnographic approaches are usually “inductive,” meaning that the researchers do not have pre-established or a priori categories for sorting the data, in contrast to observation schemes discussed previously that use pre-defined categories to guide the data collection and analysis. Inductive analysis is usually based on a grounded theory perspective, meaning that the researchers should generate, define, and refine the categories as they move through the analysis. This type of analysis is often referred to as constant comparative analysis (see Çamlibel & Garcia, 2012, from Table 2.1) and involves various iterations or rounds of analysis to both refine the categories and draw connections (triangulate) across the observational data and other sources discussed above. In the L2 studies included in Table 2.1, many of the researchers use some version of discourse analysis. Discourse analysis used in classrooms with L2 learners focuses on how specific language practices illuminate our understanding of a more general phenomenon (see Bernstein, 2018, in Table 2.1); for example, looking at various micro-level interactional sequences during cooperative play can allow researchers to see what is shaping L2 children’s ability to participate


in small group settings with native-speaking peers. Different types of qualitative analysis are often combined. Bernstein (2018) used discourse analysis combined with visual mappings of the children based on the video and applied a lexical complexity analyzer to examine the words her focal participants produced. Fogle (2008) combined audio recordings of adult-child mealtime interactions with ethnographic interviews. She used discourse analysis to analyze closely the functions of self- and other-repetitions in the development of children’s L2 discourse competence, while ethnographic interviews served to help contextualize the analysis. Because of the sheer amount and complexity of data generated through observations, whether as fieldnotes or as transcriptions of recordings, many researchers utilize qualitative research software (NVivo and ATLAS.ti are amongst the most widely used) to help organize and manage the analysis process.
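To give a sense of what a lexical measure computed over transcribed child speech can look like, the sketch below calculates a type-token ratio (the number of distinct words divided by the total number of words). This is an illustrative assumption on our part: the source does not say which measures Bernstein's (2018) lexical complexity analyzer reported, and the utterances, function names, and tokenization rule here are hypothetical.

```python
import re


def tokenize(utterance):
    """Lowercase an utterance and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", utterance.lower())


def type_token_ratio(utterances):
    """Distinct word types divided by total word tokens across all utterances."""
    tokens = [token for utterance in utterances for token in tokenize(utterance)]
    if not tokens:
        return 0.0  # no words transcribed, so no ratio can be computed
    return len(set(tokens)) / len(tokens)


# Hypothetical transcribed utterances from one focal child during play.
sample = ["I want the red one", "the red one is mine", "you want it"]

print(len([t for u in sample for t in tokenize(u)]))  # total tokens
print(round(type_token_ratio(sample), 2))             # share of distinct words
```

Because the type-token ratio falls as samples get longer, comparisons across children or across time points are only meaningful when the samples are of similar length.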

Theoretical Frameworks and Types of Research Questions Asked

Besides the common methodological approach, recent observation research with young L2 learners is also aligned in terms of the theoretical perspectives and types of questions the researchers are seeking to answer. The key shared feature of this work is a broad “sociocultural perspective” on children’s L2 learning. This includes studies framed as “L2 language socialization” (Bernstein, 2018; Cho, 2016; Willett, 1995), as well as studies that look at language learning as an aspect of the child’s identity development (Collett, 2018; Martínez et al., 2017; Pinnow & Chval, 2015; Wagner, 2019). Related frameworks included social semiotic theory (Kenner, 2004), an ecological approach to L2 learning (DaSilva Iddings & Jang, 2008), communities of practice and legitimate peripheral participation (Bligh & Drury, 2015), and culturally responsive pedagogy (Worthy et al., 2013). For these researchers, direct observation of children in L2 classrooms is a means of examining how children are able to use language (productively and receptively) to participate in the social life of the classroom. That is, researchers look for evidence over time of the child’s increased skill in successfully navigating the social and academic challenges of the classroom. For example, DaSilva Iddings and Jang (2008) noted how their focal boy, Juan, was keenly attuned to “interactional routines” in the classroom and quickly learned to pick up on key linguistic cues in English (“It’s time for the Quiet Mouse”) together with the physical clues from the context (moving chairs, taking out the puppet).
These types of observations allow us to identify the ways that the patterns of classroom interaction either constrain or provide affordances for children to engage, often referred to as the classroom “participation structures.” In Willett’s (1995) seminal study on the L2 socialization of four seven-year-old ESL students in a US first-grade classroom, she noted the ways that the school day was structured. The following extended


description of the morning events of the classroom gives the reader a good sense of how multiple observations compiled over a year from an ecological perspective are reported on:

Morning events were highly predictable in Room 17. The children would enter the classroom, taking a piece of lined paper on the way to their seat, and begin writing out their alphabet. When Mrs. Singer finished the roll call, she would begin phonics recitation, in which she would elicit from the children words that fit the particular sound/letter correspondence on which she wanted to focus. Finally, she would ask the children to work independently in their phonics workbook while she worked with one reading group at a time. […] This predictable sequence would become the dependable frame that they would use to figure out how to act, how to talk, and how to work in Room 17, and how to display their growing social and academic competence. These events contained much of the cultural information and language the children would need to gradually become competent members of the class. From the continual reenactment of the event, the children would learn to scrutinize text carefully and follow its directions, engage in the problem-solving logic demonstrated by adults, revise work checked by the aide, write neatly, display their competence, and work independently. […] Despite the teacher’s telling the class to “do their own work,” the girls would learn that it was really all right for children to help one another. The ESL children would notice that the useful words and phrases bandied about the room could be used to gain recognition from teachers and peers. They would soon realize that the words in the workbooks were the answers the teacher was looking for when she asked questions during phonics recitation, and they would be able to display their knowledge long before they knew much English. (p. 485)

The descriptive function of these observations provides the basis for understanding how the participation structure of the room supports and hinders the children’s socialization into a mainstream English-speaking classroom. Philp et al. (2008) point out that this type of “descriptive work has recognized and highlighted the consequences for L2 development of the interactional opportunities afforded or denied by the child’s peers” (p. 12). Over time, the researcher is able to build up a sophisticated understanding of the workings of the classroom. For instance, Willett (1995) noted that, amongst the four focal ESL students, the three girls who shared a common L1 and socioeconomic background were able to build social supports for one another that also facilitated their L2 English development. The lone ESL boy in the classroom, however, lacked these social supports and his pathway to becoming a competent English-speaking member of the classroom was much more difficult.


Challenges and Consideration

One of the main considerations for researchers conducting ethnographic observations is the time investment. Researchers are advised to go slow, be patient, and take sufficient time to both understand the social scene and get to know the students and teachers in the classroom. The presence of a stranger-researcher in the classroom might be stressful, intimidating, or simply new, affecting and potentially changing the ecology of the classroom. Thus, children may need some time before they get familiar with the researcher and become used to her presence. In this regard, McKechnie (2000) suggests giving ample opportunity for children to get comfortable with the researcher and the equipment that is going to be used during the research before the actual data collection sessions begin. Conducting multiple observations and discarding the data from the initial sessions would also help mitigate some of the effects that the observer might have on the children and on the overall context.

Another issue relates to the power distance between the researcher and the children. Often, researchers are adults, physically larger, whose language proficiency is much higher than that of the children and whom the children might view as evaluative and judgmental. This raises the issue of trust and comfort that children may or may not feel with and around the researcher. Corsaro and Molinari (2017) address this issue in their study of a pre-school class in Italy. They report that children viewed Bill (the observer) as an “incompetent” adult because of his lack of proficiency in speaking Italian. He was, in their eyes, an atypical adult and a “big kid” (they called him Big Bill) because he played with them like a child and got engaged and interested in their activities, laughed at their jokes, and did not discipline them as an adult would have.
There is an advantage in ethnographic research in the children seeing the adult researcher not as a regular adult but as an “incompetent” one who is relying on the children to learn, a stance referred to as “ethnographic naiveté” (Spotti, 2014). Corsaro and Molinari (2017) argue that a researcher observing children should adopt a “participant status” (p. 241), in part by bringing herself to the level of children, not only in terms of power relations but also sometimes physically, such as sitting on the ground and playing with the same objects as them. This aids the researcher in building rapport with the children and in positioning herself bodily at the child’s vantage point. Positioning the camera at children’s eye level is also effective in giving the researcher their perspective. For example, Pinnow and Chval (2015) used an innovative method of video recording. Because they were interested particularly in the language that L2 students developed in working on math problems, they had their focal participants wear head-mounted cameras so they could capture from a first-person perspective how they worked on the math problems.

The strength of ethnographic observations is that the repeated gaze on a single student or group of students allows the observer to see deeply into the ways that the children’s L2 learning is mediated by elements of the social scene that may be completely unanticipated. Key moments, called “critical incidents” or “rich points,” may help illuminate the researcher’s understanding of these mediational processes, but the mundane and everyday occurrences in the classroom are just as important. In fact, when presenting observational data to support her findings, it is important that the researcher not cherry-pick the most outstanding incidents. Instead, as the Willett (1995) excerpt above illustrates, the findings should be supported through multiple instances and not presented impressionistically based on a single moment. This is accomplished through careful organization of the data and thoughtful, painstaking, and repeated readings of the fieldnotes to ensure that the analysis is rigorous. Bligh and Drury (2015), for example, used inductive qualitative analysis to create their categories and then analyzed the frequency distribution and central tendency of their categories to establish which were more or less salient.
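As an illustration of the kind of tally that could follow qualitative coding, the minimal Python sketch below counts how often each code occurs and summarizes one code's per-session counts. The code labels, session numbers, and function names are hypothetical; they are not taken from Bligh and Drury (2015) or from any other study cited here.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical coded fieldnote segments: (observation session, code label).
coded_segments = [
    (1, "silent_watching"),
    (1, "peer_imitation"),
    (2, "silent_watching"),
    (2, "silent_watching"),
    (3, "peer_imitation"),
    (3, "private_speech"),
    (3, "silent_watching"),
]


def code_frequencies(segments):
    """Count how often each code occurs across all sessions."""
    return Counter(code for _, code in segments)


def per_session_counts(segments, code):
    """Return the number of occurrences of one code in each observed session."""
    sessions = sorted({session for session, _ in segments})
    return [
        sum(1 for s, c in segments if s == session and c == code)
        for session in sessions
    ]


freqs = code_frequencies(coded_segments)
watching = per_session_counts(coded_segments, "silent_watching")

# Frequency distribution of codes, most frequent first.
print(freqs.most_common())
# Central tendency of one code across sessions.
print(mean(watching), median(watching))
```

A tally like this only indicates which categories recur; deciding whether a category is salient still depends on the repeated readings of the fieldnotes described above.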

Implications for Child SLA Researchers

In this chapter, we have outlined how ethnographic observations can be used to research young L2 learners. Early approaches to observation in L2 classrooms focused on the frequency of different types of interaction, with a focus on the effectiveness of the teacher and the teaching method. More recently, observation methods have become more qualitative and geared towards gaining insights about the learners and how they negotiate the classroom learning process. We reviewed recent work in this area and noted that most work using ethnographic observations consists of case studies that view classrooms from an ecological perspective and L2 learning from a sociocultural perspective. These include studies of L2 socialization and identity formation. To summarize, the main features of ethnographic observations are:

1. Observations of naturally occurring social interactions. There is no intervention or manipulation of the interaction by the researcher. Rather, the researcher’s primary role is to document what the participants are doing and, to the extent possible, how they understand their actions and make sense of their world from their own point of view.
2. The starting point of observation is the detailed description of a phenomenon within the context of the social scene where it takes place. The interpretation and explanation of the phenomenon then proceed from the description.
3. The main data generated by observations are fieldnotes and transcriptions of video recordings. The analysis is usually qualitative, involves the construction of codes, and often combines observational data with other data sources.
4. Ethnographic observations tend to be long term. In order to fully understand the classroom as a social scene and document changes in the participants, the study should be longitudinal. Multiple observations of the same students in the same classroom should be made, usually for a minimum of six months.

When using observational methods with young L2 learners, researchers should keep the following in mind:

1. Observation connects the verbal data (“Don’t hit me!”) to the actions within the social setting (one child pushing another) to a larger question (how an L2 student uses her emerging linguistic repertoire to manage peer interactions).
2. If possible, the researcher should try to physically align herself (or the camera) to capture the children’s point of view.
3. Finally, participant observation involves active engagement with children, including verbal (e.g., answering a lot of questions about yourself) and other types of interactions (e.g., tying shoelaces). For long-term observational research, this allows the researcher to build rapport and relationships with participants in ways not often afforded by other methods.

Further Readings

Pellegrini, A. D. (2013). Observing children in their natural worlds: A methodological primer (3rd ed.). New York & London: Psychology Press.
While it does not consider language learning in the classroom per se, this is an excellent text that discusses the main methodological issues for doing observation research with children.

Spada, N. (2019). Classroom observation research. In J. W. Schwieter & A. Benati (Eds.), The Cambridge handbook of language learning (pp. 186–207). Cambridge University Press.
The author gives a good overview of the development of observation research in L2 classrooms.

Willett, J. (1995). Becoming first graders in an L2: An ethnographic study of L2 socialization. TESOL Quarterly, 29(3), 473–503.
This is an early seminal study of children’s L2 learning from a language socialization perspective.

Wragg, E. C. (2012). An introduction to classroom observation (Classic ed.). New York: Routledge.
While it does not focus on children or language learning, this is a comprehensive text for researchers doing classroom observation research.

Discussion Questions

1. What are the theoretical perspectives most often associated with ethnographic observations of young L2 learners?
2. What types of data sources can be combined with observations? How?
3. What are some ways of reducing the effect of the observer’s presence in the classroom?


References

Allen, J. P. B., Frohlich, M., & Spada, N. (1984). The communicative orientation of second language teaching: An observation scheme. In J. Handscombe, R. Orem & B. Taylor (Eds.), On TESOL ’83 (pp. 232–252). Washington, DC: TESOL Intl.
Bernstein, K. A. (2018). The perks of being peripheral: English learning and participation in a preschool classroom network of practice. TESOL Quarterly, 52(4), 798–844. https://doi.org/10.1002/tesq.428
Bligh, C., & Drury, R. (2015). Perspectives on the “silent period” for emergent bilinguals in England. Journal of Research in Childhood Education, 29(2), 259–274. https://doi.org/10.1080/02568543.2015.1009589
Çamlibel, Z., & Garcia, G. (2012). Zehra’s story: Becoming biliterate in Turkish and English. In E. B. Bauer & M. Gort (Eds.), Early biliteracy development (pp. 118–138). New York: Routledge.
Chaudron, C. (1988). Second language classrooms: Research on teaching and learning. New York: Cambridge University Press.
Cho, H. (2016). Formal and informal academic language socialization of a bilingual child. International Journal of Bilingual Education and Bilingualism, 19(4), 387–407. https://doi.org/10.1080/13670050.2014.993303
Collett, J. (2018). Constructing identities: How two emergent bilinguals create linguistic agency in elementary school. Bilingual Research Journal, 41(2), 133–149. https://doi.org/10.1080/15235882.2018.1451410
Corsaro, W. A., & Molinari, L. (2017). Entering and observing in children’s worlds: A reflection on a longitudinal ethnography of early education in Italy. In P. Christensen & A. James (Eds.), Research with children (pp. 11–30). Abingdon, UK: Routledge.
Dagenais, D., Moore, D., Sabatier, C., Lamarre, P., & Armand, F. (2009). Linguistic landscape and language awareness. In E. Shohamy & D. Gorter (Eds.), Linguistic landscape: Expanding the scenery (pp. 253–269). London: Routledge.
DaSilva Iddings, A. C., & Jang, E. Y. (2008). The mediational role of classroom practices during the silent period: A new-immigrant student learning the English language in a mainstream classroom. TESOL Quarterly, 42(4), 567–590. https://doi.org/10.1002/j.1545-7249.2008.tb00149.x
Dockrell, J. E., Bakopoulou, I., Law, J., Spencer, S., & Lindsay, G. (2012). Developing a communication supporting classrooms observation tool. London: Department for Education.
Emerson, R. M., Fretz, R. I., & Shaw, L. L. (2011). Writing ethnographic fieldnotes (2nd ed.). Chicago & London: University of Chicago Press.
Fogle, L. W. (2008). Home-school connections for international adoptees: Repetition in parent-child interaction. In J. Philp, R. Oliver & A. Mackey (Eds.), Second language acquisition and the younger learner: Child’s play? (pp. 279–302). Amsterdam & Philadelphia: John Benjamins.
Gort, M. (2006). Strategic codeswitching, interliteracy, and other phenomena of emergent bilingual writing: Lessons from first grade dual language classrooms. Journal of Early Childhood Literacy, 6(3), 323–354. https://doi.org/10.1177/1468798406069796
Hammersley, M. (2007). Observation, participant and non-participant. In G. Ritzer (Ed.), Blackwell encyclopedia of sociology (pp. 3236–3240). Blackwell Reference Online: Blackwell Publishing.
Hennessy, S., Rojas-Drummond, S., Higham, R., Márquez, A. M., Maine, F., Ríos, R. S., García-Carrión, R., Torreblanca, O., & Barrera, M. J. (2016). Developing a coding scheme for analyzing classroom dialogue across educational contexts. Learning, Culture & Social Interaction, 9, 16–44. https://doi.org/10.1016/j.lcsi.2015.12.001
Hymes, D. (1972). Models of interaction in language and social life. In J. Gumperz & D. Hymes (Eds.), Directions in sociolinguistics: The ethnography of communication (pp. 35–71). London: Basil Blackwell.
Kenner, C. (2004). Living in simultaneous worlds: Difference and integration in bilingual script-learning. International Journal of Bilingual Education and Bilingualism, 7(1), 43–61. https://doi.org/10.1080/13670050408667800
Kwon, E.-Y. & Han, Z. (2008). Language transfer in child SLA: A longitudinal case study of a sequential bilingual. In J. Philp, R. Oliver & A. Mackey (Eds.), Second language acquisition and the younger learner: Child’s play? (pp. 303–332). Amsterdam & Philadelphia: John Benjamins.
Labov, W. (1972). Sociolinguistic patterns. Philadelphia, PA: University of Pennsylvania Press.
Martínez, R. A., Durán, L., & Hikida, M. (2017). Becoming “Spanish learners”: Identity and interaction among multilingual children in a Spanish-English dual language classroom. International Multilingual Research Journal, 11(3), 167–183. https://doi.org/10.1080/19313152.2017.1330065
McKechnie, L. (2000). Ethnographic observation of preschool children. Library & Information Science Research, 22(1), 61–76. https://doi.org/10.1016/S0740-8188(99)00040-7
Moskowitz, G. (1976). The FLINT system: An observational tool for the foreign language classroom. In A. Simon & E. Boyer (Eds.), Mirrors for behavior: An anthology of classroom observation instruments (pp. 125–157). Philadelphia, PA: Center for the Study of Teaching, Temple University.
Nava, A. & Pedrazzini, L. (2018). Second language acquisition in action: From principles to practice. London: Bloomsbury.
Pellegrini, A. D. (2013). Observing children in their natural worlds: A methodological primer (3rd ed.). New York & London: Psychology Press.
Philp, J., Mackey, A. & Oliver, R. (2008). Child’s play? Second language acquisition and the younger learner in context. In J. Philp, R. Oliver & A. Mackey (Eds.), Second language acquisition and the younger learner: Child’s play? (pp. 3–26). Amsterdam & Philadelphia: John Benjamins.
Pinnow, R. J., & Chval, K. B. (2015). “How much you wanna bet?”: Examining the role of positioning in the development of L2 learner interactional competencies in the content classroom. Linguistics and Education, 30, 1–11. https://doi.org/10.1016/j.linged.2015.03.004
Prasad, G. (2013). Children as co-ethnographers of their plurilingual literacy practices: An exploratory case study. Language & Literacy, 15(3), 4–30. https://doi.org/10.20360/G2901N
Roberts, C., Byram, M., Barro, A., Jordan, S., & Street, B. (2001). Language learners as ethnographers. Clevedon: Multilingual Matters.
Roberts, T. A. (2014). Not so silent after all: Examination and analysis of the silent stage in childhood second language acquisition. Early Childhood Research Quarterly, 29(1), 22–40. https://doi.org/10.1016/j.ecresq.2013.09.001
Saville-Troike, M. (2003). The ethnography of communication: An introduction (3rd ed.). Oxford: Blackwell.
Schieffelin, B., & Ochs, E. (1986). Language socialization. Annual Review of Anthropology, 15, 163–191.
Silverman, R. D., Proctor, C. P., Harring, J. R., Doyle, B., Mitchell, M. A. & Meyer, A. G. (2014). Teachers’ instruction and students’ vocabulary and comprehension: An exploratory study with English monolingual and Spanish-English bilingual students in grades 3–5. Reading Research Quarterly, 49(1), 31–60. https://doi.org/10.1002/rrq.63
Smythe, S., & Toohey, K. (2009). Investigating sociohistorical contexts and practices through a community scan: A Canadian Punjabi-Sikh example. Language and Education, 23(1), 37–57. https://doi.org/10.1080/09500780802152887
Spada, N. (2019). Classroom observation research. In J. W. Schwieter & A. Benati (Eds.), The Cambridge handbook of language learning (pp. 186–207). Cambridge University Press.
Spotti, M. (2014). Voices in the classroom: On being caught between pupils’ inventiveness and ethnographic naivety. Ethnography and Education, 9(3), 359–372. https://doi.org/10.1080/17457823.2014.919601
Wagner, C. J. (2019). Connections between reading identities and social status in early childhood. TESOL Quarterly, 53(4), 1060–1082. https://doi.org/10.1002/tesq.529
Willett, J. (1995). Becoming first graders in an L2: An ethnographic study of L2 socialization. TESOL Quarterly, 29(3), 473–503. https://doi.org/10.2307/3588072
Worthy, J., Durán, L., Hikida, M., Pruitt, A., & Peterson, K. (2013). Spaces for dynamic bilingualism in read-aloud discussions: Developing and strengthening bilingual and academic skills. Bilingual Research Journal, 36(3), 311–328. https://doi.org/10.1080/15235882.2013.845622
Wragg, E. C. (2012). An introduction to classroom observation (Classic ed.). New York: Routledge.


3 SURVEYS AND QUESTIONNAIRES WITH YOUNG LANGUAGE LEARNERS

Emiko Hirosawa and W. L. Quint Oga-Baldwin

DOI: 10.4324/9780367815783-3

Introduction

This chapter is for researchers interested in conducting survey research using questionnaires with young learners. Adults are different from one another in their own ways, as are children, but children also differ from grown-ups in meaningful ways. Children cannot be treated the same as adult research subjects because children are considerably different from adults in their needs, capabilities, and perspectives (Thompson & Jackson, 1998), and researchers need to be aware of the consequences when appropriate measures are not taken. In the field of second language acquisition, research has centered on the language learning of adults rather than that of children and adolescents (Oliver & Azkarai, 2019; Butler, 2015; Paradis, 2007). As a consequence, much less is known about how children experience learning and how they develop skills compared to what is known about adults. This is unfortunate given the rapidly growing interest in the foreign language learning of young learners. This chapter aims to help stimulate research in this area by describing how age affects question answering and what precautions are needed to design a survey for children who study a new language.

In colloquial use, questionnaire and survey refer to the same thing (a set of questions used to find out information or the opinions of a large number of people), and they are used interchangeably. Even in the field of research, some handbooks on research methods do not give a clear definition of what a survey is (de Leeuw et al., 2008). Since this chapter discusses the design methods of both, it will be helpful to clarify the differences between the two.

A survey is a research strategy in which researchers ask questions of a sample of respondents from a large population in a systematic way to collect specific data. Surveys can be conducted in person, by phone, by (e-)mail, or via a website. The identifying characteristic is the use of a fixed set of questions: a questionnaire. A questionnaire consists of open-ended questions, closed-ended questions, or both. Closed-ended questions, in which respondents choose a response from a list provided, are used most frequently (Blair et al., 2014). A questionnaire is a measurement tool used during research for data collection, whereas a survey encompasses all of the steps of research, from sampling and measurement development to data gathering and analysis.

In many ways, questionnaires and surveys for young learners function similarly to other forms of assessment (McKay, 2006). Like formal assessments (tests), surveys and questionnaires are subject to practical, logistical, and statistical procedures to demonstrate their valid functioning (Kline, 2019). Unlike tests, this kind of assessment has much lower stakes, and the content often cannot be easily confirmed against external norms or criteria (Montrul et al., Chapter 7, this volume; Butler, 2019). At the same time, researchers still seek a concrete, empirical set of data on students’ (subjective) understanding of the world. Using many of the same procedures as formal assessments (e.g., interviews and paper-based questions), surveys can provide insight into the even less visible world of young language learners’ cognitions, beliefs, and attitudes (Djigunovic & Lopriore, 2011).

In this chapter, we will focus on how to design survey research with children of ages 4 to 12. The chapter will refer explicitly to survey design using questionnaires, covering the recommended steps and precautions. The first section aims to supply readers with a brief background on survey research concerning children and more specific details on how to tailor questionnaires for children. It includes details on what caution is needed, and why, to create effective questionnaires for children.
Primarily, the focus of the chapter will be on the empirical fundamentals of working with children and adolescents; young language learners are first and foremost children, and language learners secondarily—and then only in the best of cases. The second half of the chapter aims to help readers see the steps involved in survey research and the fundamental procedures to be taken in a rough chronological order.

Questionnaires when Children are Respondents: A Brief History

One of the important reasons for early surveys was for marketing industries and researchers to gain an understanding of social problems. In these early days, when surveys started being conducted, factual information was gathered with little attention given to the wording of questions. However, when interest started growing in measuring subjective states, researchers found that the wording of questions sometimes had large effects on the answers, especially for attitude questions (Groves et al., 2004).

Young learners have long been a neglected minority in research and surveys generally. In many original texts, children’s individual differences were ignored, despite explicit focus over 60 years ago in Maccoby and Maccoby’s (1954) article on interviewing children (cited in Borgers et al., 2000). Children were long seen as incapable of providing reliable and valid data due to their developmental attributions (Johnson & Foley, 1984), even for child development studies (Scott, 1997). Although there are some areas of children’s lives in which proxy reporting remains superior (e.g., collecting data about family or health-related issues from an informed parent or caretaker serving as a proxy for the child), children’s cognitive processes have been reliably tested to assess school readiness and general mental capacity for over a century (Spearman, 1904). Children are currently regarded as the best respondents for their own subjective feelings, knowledge, or opinions, and proxy reporting is no longer seen as a sufficient measure to reflect children’s subjective viewpoints (Scott, 1997). Many official government agencies have developed special surveys and academic institutes for surveying children, a recognition of the need for accurate data on children’s perspectives, actions, and attitudes. However, knowledge and methods on how to survey children have not been given the same attention as methods for surveying adults.

Building a Questionnaire for Children

A questionnaire that produces good results for children will follow the same procedures as a good questionnaire for adults, but it will also account for developmental differences. For any survey to work well, ample attention needs to be paid to wording. Dillman et al. (2014) offer key principles for creating questionnaires that will be cognitively easy for the respondents to comprehend:

• Choose the appropriate question format.
• Make sure the question applies to the respondent.
• Ask one question at a time.
• Make sure the question is technically accurate.
• Use simple and familiar words.
• Use specific and concrete words to specify the concepts clearly.
• Use as few words as possible to pose the question.
• Use complete sentences that take a question form, and use simple sentence structures.
• Make sure “yes” means yes and “no” means no (e.g., avoiding double-negatives).
• Organize questions in such a way as to make it easier for respondents to comprehend the response task.

(Dillman et al., 2014, p. 126)

Given these recommendations for creating questionnaires for adults, tailoring surveys to suit children requires taking these precautions to extremely refined levels, with additional care given to cognitive developmental features. Cognitive development is theorized to occur in an age-fixed sequence of stages according to Jean Piaget’s theory of cognitive growth (1960): sensory-motor intelligence (birth to age 2); preconceptual thought (age 2 to 4); intuitive thought (age 4 to 7–8); concrete operation (age 8 to 11); and formal thought (age 11 to 16). From 16, according to Piaget, cognitive capacities are fully developed. Although later work by his successors has indicated that the boundaries of the stages are fuzzy and frequently overlap depending on heredity, experience, learning, and social environment (Borgers et al., 2000), Piaget’s theory informs us about how children’s cognitive capabilities increase with age. Our population of interest is still in the midst of developing their cognitive abilities, on top of learning a new language with all of the complications that entails (McKay, 2006). Cognitive development affects children’s cognitive processing ability, which will largely influence the validity of their responses to a questionnaire.

Answering Questions Mirrors Cognitive Capacity and Processing

Tourangeau and Rasinski (1988) have detailed four stages of question answering that fully grown adults go through in order to respond to a survey question in which response options are prespecified: (a) interpretation, reading the written text and understanding what information the question requires to be retrieved in the coming step; (b) retrieval, searching in one’s mind for relevant information; (c) judgment, deciding on what information is most appropriate to answer the question; and (d) response selection, revisiting the questionnaire and considering which of the provided responses most closely resembles the information chosen in the judgment phase. When all four stages are successfully handled for most questions, a questionnaire is considered appropriately answered. Children are also assumed to go through the same cognitive process. However, since their mental faculties have not yet fully developed, they are more sensitive to cognitively demanding questions than adults. This will make them experience each of the four stages of question answering described above in a very different manner.

Stage One: Interpretation

In the interpretation stage, responders read and try to understand the intended meaning of the question and what information it asks for. To do this, one needs to be able to read the words written (literacy skills) and interpret the words as they are intended (comprehension skills). The concrete operational stage, age 7–11 (Piaget, 1960), is usually when language develops and reading skills are acquired. Although age does not strictly equate with development, it makes little sense to use self-administered paper-and-pencil questionnaires with pre-literate 4-year-olds. Many children under the age of 7 will not have sufficient cognitive skills to answer direct questionnaires and provide effective results (de Leeuw et al., 2011), so it is too naive to assume that a child understands the intent of a question just because they can sound out the words written on a questionnaire.

This is not to say that children below the age of 7 cannot be surveyed (Borgers et al., 2000). For children at this young stage, Borgers and colleagues advise that surveys should be either short, qualitative interviews or simple structured questionnaires, presented as a game or a form of play to account for their still emerging literacy skills and short attention span. An example of this in language research can be seen in Wu’s (2003) investigation of pre-school aged Chinese children. Children from age 8 to 11 can be surveyed through individual or group interviews and self-administered surveys; they can answer a range of different questions and a well-designed questionnaire with some consistency (de Leeuw & Otter, 1995).

De Leeuw et al. (2011) emphasize the following points of special care in wording for children. First, use simple words. This is essential because if the child does not know the words used, they will not understand the question. Questions must be written below children’s reading level. Second, they advise that sentences be made short. The longer the sentence, the higher the comprehension skills required. Shorter sentences are less cognitively taxing and thus easier for children to understand. Complex sentences are harder to read; breaking them down into shorter sentences will convey the same meaning and will likely produce better understanding. However, when an item gives too little information, it will be hard for children to comprehend what it requires (e.g., Age:______). In this situation, it is better to use complete sentences to facilitate understanding (e.g., Age:_____ → How old are you?). Conversely, there are times when longer sentences can increase clarity and meaning. In that case, longer sentences are preferred.

Last, it is essential to avoid vague quantifiers such as “sometimes.” Children have not yet developed the meta-view of themselves needed to generalize an action and apply it to the given quantifiers, and they have minimal tolerance for ambiguity (de Leeuw & Otter, 1995). Ambiguity should be eliminated from any questionnaire, especially those targeting children. Researchers should be continually on guard against potential ambiguities that could ruin the item. Also, they advise strongly that negatively formulated questions be avoided, as they make the intention of the question ambiguous. Children aged 7–10 are at higher risk of misunderstanding the question’s intention and tend to need clear definitions. It must be taken into account as well that children, especially at younger ages, will take a word at its literal meaning (e.g., the word “sex” may induce giggling). They also sometimes have difficulty comprehending generalized, depersonalized questions (e.g., children of my age; Borgers et al., 2000).
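Some of the wording advice in this stage can be screened for mechanically before items are piloted with children. The following Python sketch is only an illustration: the word lists, the 12-word threshold, and the function name `check_item` are assumptions made for this example and do not come from de Leeuw et al. (2011) or any published instrument. It flags long sentences, vague quantifiers, negative wording, and bare fragments such as “Age:_____”.

```python
import re

# Illustrative word lists; a real study would build its own for the
# target language and age group.
VAGUE_QUANTIFIERS = {"sometimes", "often", "usually", "rarely", "seldom"}
NEGATIONS = {"not", "never", "no"}
MAX_WORDS = 12  # hypothetical threshold, not a published cutoff


def check_item(item):
    """Return a list of wording warnings for one questionnaire item."""
    words = re.findall(r"[a-z']+", item.lower())
    warnings = []
    if len(words) > MAX_WORDS:
        warnings.append("long sentence")
    if VAGUE_QUANTIFIERS & set(words):
        warnings.append("vague quantifier")
    if NEGATIONS & set(words) or any(w.endswith("n't") for w in words):
        warnings.append("negative wording")
    # A label followed only by a blank (e.g. "Age:_____") is a fragment.
    if re.search(r":\s*_*\s*$", item):
        warnings.append("fragment, not a full sentence")
    return warnings


for item in ["Age:_____", "How old are you?",
             "I sometimes don't like English class."]:
    print(item, "->", check_item(item))
```

A screen of this kind cannot judge whether a word is familiar to a particular child or whether a longer sentence is actually clearer, so it would supplement, not replace, piloting the items with children.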

Stage Two: Retrieval

More often than not, surveys performed in the social sciences ask for information about a person’s past. After interpreting the question, respondents scan their memory for relevant information. This sounds very simple, but it is usually unlikely that all relevant information is available and accessed; retrieving unrelated information linked with the relevant information is also common. While this step is affected by personal beliefs and bias in adults (Tourangeau & Rasinski, 1988), children have their own problems due to memory capacity. Consistently, children’s spontaneous recall ability is weaker than that of adults. Further literature indicates individual differences within age groups in how accurately children recall past experiences (de Leeuw et al., 2011). Young children also have more difficulty differentiating between an actual perception of an event and what they imagined (Johnson & Foley, 1984). Taking this into account, questions for children are better framed in the here and now. The threshold for when children’s recall ability resembles that of adults seems to be age 11 (de Leeuw et al., 2011). That said, children still cannot freely and easily act and think as adults do.

Stage Three: Judgment

Eleven-year-olds will have roughly the same memory capacity as adults, but processing speed does not develop at the same rate. Even at age 12, children may require approximately 1.5 times the processing time of adults (de Leeuw et al., 2011). Without sufficient time to process what is being asked for, children might become stressed, resulting in satisficing: taking shortcuts during the response process, answering in the wrong format, making up answers, or skipping questions (Dillman et al., 2014). Since cognitive capacity and skills develop with age, the younger the respondent, the lower their processing speed and ability to cope with cognitive demands. This means that the length of the questionnaire needs to be carefully considered so that it will not stress the respondent.

There is another issue that must be addressed in this stage that may affect judgment. Children in middle childhood (age 7–12) have a strong bias and sensitivity toward social desirability. Socially desirable responses involve the tendency to answer a question so as to make the respondent look good in the eyes of others. Heavy bias toward social desirability negatively affects validity in that the respondent does not show their candid self (Lalwani et al., 2006). Social desirability bias is prevalent in adults, but children are incomparably more sensitive to the fear of doing something wrong and have a high tendency to please (de Leeuw et al., 2011), often presenting as right-side bias in survey responses (Oga-Baldwin & Nakata, 2017). Any sign of suggestion in tone, expression, or language from an adult will potentially alter the child’s response until middle to late childhood (age 10–12), when children become less suggestible (Lalwani et al., 2006).
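The processing-time point in this section lends itself to a rough length estimate when planning a questionnaire. The Python sketch below is a back-of-the-envelope illustration only: the 1.5 factor is the approximate figure given above for 12-year-olds, while the adult time per item and the session length are hypothetical inputs that a researcher would need to estimate from piloting. Younger children are likely to need more time than this factor suggests.

```python
import math

# Approximate ratio of child to adult processing time reported above
# for 12-year-olds; younger children may need a larger factor.
CHILD_TIME_FACTOR = 1.5


def estimated_child_minutes(n_items, adult_seconds_per_item):
    """Estimate completion time in minutes for child respondents."""
    return n_items * adult_seconds_per_item * CHILD_TIME_FACTOR / 60


def max_items(session_minutes, adult_seconds_per_item):
    """Largest number of items that fits in the session at the child's pace."""
    child_seconds = adult_seconds_per_item * CHILD_TIME_FACTOR
    return math.floor(session_minutes * 60 / child_seconds)


# Hypothetical planning figures: 30 items, 10 seconds per item for adults.
print(estimated_child_minutes(30, 10))  # minutes needed by children
print(max_items(10, 10))                # items that fit in a 10-minute session
```

With these assumed figures, a questionnaire that adults finish in five minutes would take children about seven and a half, which is one simple way to check whether a draft risks the satisficing described above.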

Stage Four: Response Selection After going through the three stages, respondents will choose a response that matches, or most closely resembles, their judgment and apply it as their response to the question. When a question provides too many choices, it will increase the complexity of the question. Thus, it is important to find a sweet spot that does not heavily tax cognition but still collects sufficient data. It has been demonstrated that the maximum number of choices that works well for adults is seven (Netemeyer et al., 2003). For ages 7–10, two to three response options are advised and, for ages 11–15, four to five (de Leeuw et al., 2011), though there are studies using 5-point rating systems with children aged 10–11 that have produced usable results (Oga-Baldwin et al., 2017). To further increase clarity, all response categories should be labeled (e.g., a 4-point Likert scale where (1) is labeled “not at all true for me,” (2) “not very true for me,” (3) “sort of true for me,” and (4) “very true for me”). Labeling can improve the reliability of answers in adults (Krosnick & Fabrigar, 1997) and may thus be even more crucial for children. Caution is needed when labeling, since ambiguous labeling will have a negative influence on data quality (de Leeuw & Otter, 1995). Graphical response options, similar to those often used in visual analogue scales (VAS) (Hayes & Patterson, 1921), might make answering surveys enjoyable, which can compensate for children’s short attention span and keep them interested in answering questions. This has been shown to be effective and enjoyable even for children at age 16 (Scott et al., 1995). Djigunovic and Lopriore (2011) have also reported that a three smiley-face scale for 7- to 9-year-olds and five choices for 10- to 11-year-olds worked well with a large and diverse sample of EU-based young learners.
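The age ranges above can be kept as a small lookup when drafting an instrument. The following is a minimal sketch, not part of any survey package: the function and constant names are hypothetical, and the ranges are guidance from the studies cited above rather than hard limits (as noted, 5-point scales have produced usable results with 10- to 11-year-olds).

```python
# Illustrative lookup of the response-option ranges cited in the text.
# All names here are hypothetical and chosen for this sketch only.

ADULT_MAX_OPTIONS = 7  # upper limit reported for adults (Netemeyer et al., 2003)


def advised_option_range(age):
    """Return the (min, max) number of response options advised for an age."""
    if 7 <= age <= 10:
        return (2, 3)
    if 11 <= age <= 15:
        return (4, 5)
    # The text gives no range outside ages 7-15, so the sketch does not guess.
    raise ValueError("No advised range is given in the text for this age")


# Fully labeled 4-point scale, taken from the example in the text.
FOUR_POINT_LABELS = {
    1: "not at all true for me",
    2: "not very true for me",
    3: "sort of true for me",
    4: "very true for me",
}


def scale_fits_age(labels, age):
    """Check whether the number of labeled options falls in the advised range."""
    low, high = advised_option_range(age)
    return low <= len(labels) <= high
```

A check such as `scale_fits_age(FOUR_POINT_LABELS, 8)` flags that the 4-point example is longer than the range advised for an 8-year-old, which is a prompt to reconsider the format rather than a rule.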

Cultural Sensitivity We should also take into account the effect of culture on cognitive growth. Culture impacts cognition (King & McInerney, 2014) perhaps as much as developmental stages. Culture affects how children react and attend to the environment, perceive others, memorize, and learn (Han & Northoff, 2009; Lalwani et al., 2006). Empirical research into cultural influences on cognition has progressed in the last few decades and provided robust informational background to question if measurement tools developed in one culture can be blindly applied to other cultures just by superficial/literal linguistic translation (Auer et al., 2000; Henrich et al., 2010). How to account for this issue is introduced in a later section. To summarize, people largely rely on their cognitive processing ability and capacity to answer questions appropriately. The fact that children are less developed in this area will make them more sensitive to problems with measurement tools, and errors are likely to be magnified. Questionnaires will need to be written and planned to match the intended children’s developmental stages, and this will, in turn, affect the quality of data attained. All the mentioned precautions for building a questionnaire for children apply for foreign language research, although one aspect that might be stressed is that, when surveying children, it should be in whatever language they feel comfortable in, which is usually not their L2.

40 Emiko Hirosawa and W. L. Quint Oga-Baldwin

Survey Research Overview: Our Recommended Procedures Working with children throughout the survey creation process to develop effective instruments will likely lead to the best and most worthwhile results. While there are numerous debates on the validity and applicable reach of self-report questionnaire instruments (see Fryer & Dinsmore, 2020), good surveys are nonetheless very carefully tested to ensure their basic statistical and theoretical validity. Surveys should be recognized as a different form of assessment, and item creation for surveys thus needs to undergo the same rigorous process as is often done for tests (see Montrul et al., Chapter 7, this volume; Butler, 2019). In using surveys to research children, it is crucial to balance both the qualitative theoretical and quantitative statistical aspects of survey content to develop the best representation of these latent constructs. Based on our prior experience (Fryer & Oga-Baldwin, 2019; Oga-Baldwin & Nakata, 2015, 2017; Oga-Baldwin et al., 2017, etc.), we suggest the following steps for developing useful surveys.

Research Questions While it should go without saying, research questions need to appropriately address what surveys can answer and need to be decided first. Surveys primarily capture self-report assessments of learners’ attitudes, emotions, thoughts, and behaviors that would be otherwise invisible without asking the person. Research questions must therefore deal explicitly with these constructs. Research might consist of multiple data gathering methods, thus data that can be accurately gathered from surveys are limited, which will, in turn, define the possible research questions that can be asked.

Survey Gathering and Analysis For any researcher interested in doing surveys, having a clear analysis plan from the beginning will define the success of the project. We strongly (emphasis intended) encourage all researchers who use surveys to measure young learners’ behaviors and attitudes to have a clear conception of how the analysis will be carried out. Will mean scores for the latent constructs (motivation, emotions, engagement, etc.) be sufficient? Is the sample size large enough for latent variable measurement? What kinds of relationships do the researchers hope to show? What would be sufficient evidence for this? All of these questions should be addressed before doing any survey research. On a practical-logistical level, how the data will be converted to digital form also needs to be addressed prior to the survey design. Given that the data will need to be presented in some sort of digital form for publication, preparing for this before designing your survey will save a great deal of headaches and heartache later. If using rating-scale questionnaires (e.g., Likert or VAS), access to a mark sheet reader and high-speed scanner can make data gathering significantly easier. Optical mark reader software that scans and digitizes the data is useful. Though expensive, the hours it saves make it well worth the investment (e.g., Remark Office OMR, https://remarksoftware.com/products/office-omr/, though other solutions exist). Though online survey software makes this step unnecessary, paper surveys carry many advantages over online ones, especially in schools that have not yet been digitized. When doing surveys with young language learners, a clear understanding of how the data will be digitized and analyzed is absolutely necessary. We are stressing this here before discussing any further steps as it is often forgotten by new researchers. The logistical and analytical elements can be just as important as the research questions and, in many cases, can define and limit the questions that can be answered.
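To make the planning questions above concrete, the sketch below shows one of the simplest possible analysis plans: mean scores per construct computed from digitized responses. It is only an illustration under stated assumptions. The column names, the construct-to-item mapping, and the sample rows are invented, and averaging over answered items when a response is blank is one possible convention, not a recommendation from the chapter.

```python
import csv
import io
from statistics import mean

# Hypothetical digitized responses: one row per learner, one column per item.
RAW = """id,enjoy1,enjoy2,enjoy3,effort1,effort2,effort3
s01,4,3,4,2,3,3
s02,2,2,1,4,4,3
s03,3,,4,1,2,2
"""

# Hypothetical mapping from each construct to the items written for it.
CONSTRUCTS = {
    "enjoyment": ["enjoy1", "enjoy2", "enjoy3"],
    "effort": ["effort1", "effort2", "effort3"],
}


def construct_means(row, constructs):
    """Average the answered items of each construct for one respondent."""
    scores = {}
    for name, items in constructs.items():
        # Assumption: blank cells are skipped rather than imputed.
        answered = [int(row[item]) for item in items if row[item] != ""]
        scores[name] = mean(answered) if answered else None
    return scores


def score_file(text, constructs):
    """Read CSV text and return {respondent id: {construct: mean score}}."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["id"]: construct_means(row, constructs) for row in reader}


scores = score_file(RAW, CONSTRUCTS)
```

Deciding on a layout like this before printing the questionnaire makes it easier to confirm that whatever scanner or software is used can export one column per item.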

Theory and Prior Examples A clear understanding of prior theories and the applicable instruments used in these theories is necessary for developing new survey instruments. While there are aspects of teaching young language learners that may indeed be new, we strongly encourage readers to look beyond the often-insular field of foreign language education to childhood development, education, psychology, and cultural studies to find examples of successful instruments. Though we do not expect new researchers to develop encyclopedic knowledge of all related fields, contact and familiarity with fields outside of language will reinforce good practices and help new researchers to triangulate practices for the most effective research. Truly new research topics that will involve self-report surveys are rare, and reinventing the wheel from nothing requires both extremely good justification and powerful data gathering methods. Thus, we strongly recommend that researchers draw on prior examples. This may involve the wholesale use of an existing survey, or borrowing elements of surveys, such as the response format, appearance, anchoring, or wording conventions from research involving similar populations. Starting with prior successful surveys involving young learners means that constructs (i.e., the abstract, theoretical ideas that the researcher wishes to study, such as motivation, emotion, or beliefs) have clear connections to the existing literature, and it will provide hints on how best to ask questions that children can answer.

Translation in All Its Forms Once question items and constructs in line with the project have been clearly chosen, the survey must be effectively translated. Asking students about complicated issues in their non-dominant language is a recipe for difficulty; we strongly recommend addressing students in their own language. If the original survey comes from a language other than the children’s own, this will involve lexical translation. If the survey has been used with older learners or adults in the children’s own language, it still sometimes must be translated into “childese,” that is, language that is familiar and comfortable for children. After translating the survey into a comprehensible language, the constructs need to be confirmed with the target population; beyond translating the items into a language that children in a specific culture can understand, the items need to clearly connect to the ideas as they are used in the target culture. To undertake these steps, it is recommended to involve potential stakeholders in the process. Stakeholders may include (but are not limited to) parents, teachers, school administrators, and the young learners themselves. Once items are translated into young learners’ own language, the stakeholders can provide clear information on the content of the items. For this purpose, we can convene preliminary focus groups involving the stakeholders. At least a sample of the learners themselves should be asked, and often the supervising parents, teachers, and administrators will accompany them in this process if the researchers are unfamiliar to the target population.

Sorting Constructs: The KJ Method Another method is using a modified version of the KJ method (Sculpin, 1997) to choose the best items to study a construct. In this method, the linguistic translations of individual items are written on index cards and presented to a focus group of learners and/or other stakeholders. Learners and other stakeholders will then be asked to rewrite or restate items in ways familiar to them. The researchers can explain the intended meaning of specific items and ask learners to elaborate and explain their understandings of items. When misunderstandings or differences in interpretation occur, the children themselves should rewrite or restate items for self-report; if focus group children are unable to state the items on their own terms, it is highly unlikely that a larger population will be able to respond in a consistent way. For open-ended questionnaires and interviews, the process may end here; for Likert and other rating scale-based questions targeting larger populations, further steps will be needed. This process also serves to limit the number of items, which ensures that learners will finish the survey and accounts for their short attention span and limited patience for long and complicated surveys (de Leeuw et al., 2011). This will naturally limit the number of constructs that can be studied, a further advantage in focusing research questions and potential hypotheses. Given that a minimum of three items is needed to appropriately identify a latent factor (Kline, 2010) and at least one item may need to be deleted in later analyses, we suggest four to five indicators per construct in these focus groups. One thing to keep in mind is that defining each construct takes time, and young learners may not be willing to participate in focus groups that last a long time. Recognizing that time taken doing surveys with young learners is time taken from other activities, asking the target population to take a revised version of the same survey again may not be feasible and can diminish trust in you as a researcher. Time invested in the credibility and validity of the instrument strongly increases the likelihood that surveys will yield worthwhile results. Providing focused wordings that describe a construct will help prevent measurement issues later, and it also limits the number of constructs that can be assessed.

Once question wordings are decided, learners and stakeholders sort the cards into piles according to their understanding of the similarities of the items. In many ways, this resembles a “qualitative factor analysis,” testing whether hypothesized items coalesce clearly. Please note that we recommend this method for use with students who have a firm grasp of concrete operations and can recognize and vocalize some abstract ideas (i.e., late childhood through adolescence, roughly ages 9–12). If the way the items are worded does not match the research objectives and theoretical constructs, these items need further work before they are used in a real survey. Once a reasonable consensus has been reached regarding placement, the items can be used. This might be assessed by unanimous consensus, where all participants agree with the assigned categories, or by asking each person to sort the items individually and then comparing agreement using inter-rater reliability. For some groups, providing the assumed number of constructs can be useful, selecting one of the cards to represent each construct, then asking learners to sort them into categories.
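The text does not say which inter-rater statistic to use when individual sorts are compared. As one illustration, the sketch below computes Cohen's kappa for two sorters, which corrects raw agreement for agreement expected by chance. It assumes that both people sorted the same cards into piles that share labels (for example, because one card was chosen to represent each construct); the card and pile names are invented.

```python
from collections import Counter


def cohens_kappa(sort_a, sort_b):
    """Cohen's kappa for two people's pile assignments of the same cards.

    Each argument maps a card (item) to the label of the pile it was put in.
    """
    if sort_a.keys() != sort_b.keys():
        raise ValueError("Both sorters must place the same set of cards")
    cards = list(sort_a)
    n = len(cards)
    # Proportion of cards that both sorters placed in the same pile.
    observed = sum(sort_a[c] == sort_b[c] for c in cards) / n
    # Agreement expected by chance, from each sorter's pile sizes.
    count_a = Counter(sort_a.values())
    count_b = Counter(sort_b.values())
    expected = sum(count_a[p] * count_b[p] for p in count_a) / (n * n)
    if expected == 1:
        return 1.0  # both sorters used a single identical pile
    return (observed - expected) / (1 - expected)


# Hypothetical sorts of six item cards into two construct piles.
learner_1 = {"i1": "enjoyment", "i2": "enjoyment", "i3": "enjoyment",
             "i4": "effort", "i5": "effort", "i6": "effort"}
learner_2 = {"i1": "enjoyment", "i2": "enjoyment", "i3": "effort",
             "i4": "effort", "i5": "effort", "i6": "effort"}

kappa = cohens_kappa(learner_1, learner_2)
```

With more than two sorters, kappa can be computed for each pair, or a multi-rater statistic can be substituted; what counts as "reasonable consensus" remains a judgment for the research team.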
Although these steps may appear laborious and time consuming to some researchers, following these or other qualitative grounding steps prior to survey implementation will save much of the time and resources that would otherwise be lost when items do not work as intended. Careful planning is required, and the above steps provide firm grounding for launching the study.
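The item counts suggested earlier in this section (at least three items per latent factor, four to five drafted per construct) can be turned into a quick budget for the length of a draft survey. This is simple arithmetic under the chapter's stated figures; the function names are hypothetical.

```python
MIN_ITEMS_PER_FACTOR = 3            # minimum to identify a latent factor (Kline, 2010)
DRAFT_ITEMS_PER_CONSTRUCT = (4, 5)  # indicators suggested per construct


def draft_item_count(n_constructs):
    """Return the (low, high) number of items to draft for n constructs."""
    low, high = DRAFT_ITEMS_PER_CONSTRUCT
    return (n_constructs * low, n_constructs * high)


def survives_deletion(items_drafted, items_deleted=1):
    """Check that a construct keeps enough items after some are dropped."""
    return items_drafted - items_deleted >= MIN_ITEMS_PER_FACTOR
```

For three constructs, `draft_item_count(3)` gives 12 to 15 items, and `survives_deletion(3)` is false, which shows why drafting only the minimum leaves no margin if an item fails.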

Implementation Once survey items have been selected and credibly piloted, ethical considerations must also be addressed. Parent or guardian buy-in to the research is necessary, as is compliance with local and national ethics requirements when publishing research. In working with parents, teachers, and schools, their cooperation and endorsement of the research goals will influence the success of your research. In complying with research ethics, it is also important to remind the young learners, in ways that are culturally appropriate, that their participation is voluntary. For this as well, we emphasize that classroom teachers from the local culture (in the event that they are not the researcher) can provide the best insight into how to maximize participation while emphasizing the fact that completing the survey is voluntary. Thus, their cooperation and understanding will be quite important for a fruitful research project. The timing of the survey will also be important. Previous research with young language learners has indicated that learners’ self-assessments are most valid and externally recognizable immediately after a class (Butler & Lee, 2006). Fleeting constructs (i.e., emotions, motivations, engagement, or perceptions of the learning environment) are best assessed immediately after classes, perhaps in the last minutes of class or immediately following the lesson. Given this, we re-emphasize that short, clear, easy-to-answer surveys on a limited number of appropriately measured constructs will lead to the best results. Assessment of regular habits, such as homework or number of books read per week, may be equally reliable when assessed outside of class. In closing our recommended procedures, we offer new researchers a checklist for ensuring that their surveys have the greatest likelihood of success, presented below.

Necessary Steps for Survey Creation from Conceptualization to Analysis
• Clarify survey-oriented research questions.
• Decide constructs.
• Plan analysis methods.
• Choose instruments.
• Coordinate with teachers, administrators and parents.
• Translate survey if necessary.
• Conduct student focus groups.
• Finalize item wordings.
• Create survey.
• Implement survey in classes.
• Convert responses into data.
• Conduct analyses.

Conclusion In this chapter, we have presented the theoretical and practical aspects of survey development for research with young language learners. As a final note, researchers seeking to use survey methods are advised to undertake these practices with a clear eye toward their goals. It is absolutely necessary to have clear research goals and a clear set of hypotheses prior to asking about students’ attitudes, motivations, affects, emotions, and/or perceptions. Surveys are often an easy way to gather data, but they are just as often an easy way to gather meaningless or uninterpretable data. We emphasize here the importance of asking clear questions throughout the entire process, both prior to implementing the survey and in the surveys themselves, in order to produce the most meaningful line of research.

Further Readings

Butler, Y. G. (2019). Assessment of young English learners in instructional settings. In X. Gao (Ed.), Second handbook of English language teaching (Vol. 74, pp. 1–20). Springer International. https://doi.org/10.1007/978-3-030-02899-2_24
We consider assessment the heart of survey research; this up-to-date reference serves as an approachable and coherent introduction to the topic.

Fowler, F. J. (2014). Survey research methods. Sage.
This generalized reference for surveys is essential reading for anyone who wishes to understand the diversity of survey methods.

Fryer, L. K., & Dinsmore, D. L. (2020). The promise and pitfalls of self-report. Frontline Learning Research, 8(3), 1–9. http://doi.org/10.14786/flr.v8i3.623
This commentary (and the special issue it introduces) offers a comprehensible overview of what self-report data can and cannot tell us, and how we can improve its use.

Kline, R. B. (2019). Becoming a behavioral science researcher (2nd ed.). Guilford Publications.
A crucial primer for all researchers interested in documenting human behavior, especially those who seek to understand concepts such as validity and reliability in clear, direct prose.

McKay, P. (2006). Assessing young language learners. Cambridge.
This in-depth text covers many important concepts and techniques for measuring young learners’ abilities that extend into the realm of surveys and questionnaires.

Sculpin, R. (1997). The KJ method: A technique for analyzing data derived from Japanese ethnology. Human Organization, 56(2), 233–237. http://doi.org/10.17730/humo.56.2.x335923511444655
This introduction to the KJ method provides a straightforward explanation of the approach and its history, as well as its potential applications in the social sciences.

Tools and Resources
• Remark Office OMR Software: Software for converting paper surveys to digital data. It can digitize responses from regular paper using optical mark reader technology. Available from: https://remarksoftware.com/products/office-omr/

Discussion Questions
1. What are your research questions? How would a survey help you answer these questions?
2. What constructs are represented in your survey?
3. How do you write items or questions that represent the constructs you intend to investigate? Do surveys exist already that represent this construct?
4. When you translate the construct across languages and cultures, how can you make it comprehensible to young language learners in the target culture?

References
Auer, S., Hampel, H., Möller, H. J., & Reisberg, B. (2000). Translations of measurements and scales: Opportunities and diversities. International Psychogeriatrics, 12(1), 391–394. https://doi.org/10.1017/S104161020000733X
Blair, J., Czaja, R., & Blair, E. A. (2014). Designing surveys: A guide to decisions and procedures. Sage.
Borgers, N., de Leeuw, E., & Hox, J. (2000). Children as respondents in survey research: Cognitive development and response quality. Bulletin de Methodologie Sociologique, 66, 60–75. https://doi.org/10.1177/075910630006600106
Butler, Y. (2015). English language education among young learners in East Asia: A review of current research (2004–2014). Language Teaching, 48(3), 303–342.
Butler, Y. G. (2019). Assessment of young English learners in instructional settings. In X. Gao (Ed.), Second handbook of English language teaching (Vol. 74, pp. 1–20). Springer International. https://doi.org/10.1007/978-3-030-02899-2_24
Butler, Y. G., & Lee, J. (2006). On-task versus off-task self-assessments among Korean elementary school students studying English. The Modern Language Journal, 90(4), 506–518. https://doi.org/10.1111/j.1540-4781.2006.00463.x
de Leeuw, E. D., & Otter, M. E. (1995). The reliability of children’s responses to questionnaire items: Question effects in children questionnaire data. In J. J. Hox, B. F. van der Meulen, J. M. A. M. Janssens, J. J. F. ter Laak & L. W. C. Tavecchio (Eds.), Advances in family research (pp. 251–258). Thesis Publishers.
de Leeuw, E. D., Borgers, N., & Hox, J. (2011). Improving data quality when surveying children and adolescents: Cognitive and social development and its role in questionnaire construction and pretesting. Report prepared for the Annual Meeting of the Academy of Finland: Research Programs Public Health Challenges and Health and Welfare of Children and Young People. Available from: www.aka.fi/globalassets/awanhat/documents/tiedostot/lapset/presentations-of-the-annual-seminar-10-12-may-2011/surveying-children-and-adolescents_de-leeuw.pdf
de Leeuw, E. D., Hox, J. J., & Dillman, D. A. (Eds.). (2008). International handbook of survey methodology. Lawrence Erlbaum Associates.
Dillman, D. A., Smyth, J. D., & Christian, L. M. (2014). Internet, phone, mail, and mixed-mode surveys: The tailored design method (4th ed.). John Wiley.
Djigunovic, J. M., & Lopriore, L. (2011). The learner: Do individual differences matter? In J. Enever (Ed.), Early language learning in Europe (pp. 43–60). British Council.
Fryer, L. K., & Dinsmore, D. L. (2020). The promise and pitfalls of self-report. Frontline Learning Research, 8(3), 1–9. http://doi.org/10.14786/flr.v8i3.623
Fryer, L. K., & Oga-Baldwin, W. L. Q. (2019). Succeeding at junior high school: Students’ reasons, their reach, and the teaching that h(inders)elps their grasp. Contemporary Educational Psychology, 59, 101778. http://doi.org/10.1016/j.cedpsych.2019.101778
Groves, R. M., Fowler, F. J., Jr., Couper, M. P., Lepkowski, J. M., Singer, E., & Tourangeau, R. (2004). Survey methodology (1st ed.). Wiley.
Han, S., & Northoff, G. (2009). Understanding the self: A cultural neuroscience approach. Progress in Brain Research, 178(C), 203–212. https://doi.org/10.1016/S0079-6123(09)17814-7
Hayes, M. H., & Patterson, D. (1921). Experimental development of the graphic rating method. Psychological Bulletin, 18, 98–107. https://doi.org/10.1037/h0064147
Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world. Behavioral and Brain Sciences, 33, 61–135. https://doi.org/10.1017/S0140525X0999152X

Johnson, M. K., & Foley, M. A. (1984). Differentiating fact from fantasy. The reliability of children’s memory. Journal of Social Issues, 40, 33–50. https://doi.org/10.1111/j.1540-4560.1984.tb01092.x
King, R. B., & McInerney, D. M. (2014). Culture’s consequences on student motivation: Capturing cross-cultural universality and variability through personal investment theory. Educational Psychologist, 49(3), 175–198. http://doi.org/10.1080/00461520.2014.926813
Kline, R. B. (2010). Principles and practice of structural equation modeling (3rd ed.). Guilford Press.
Kline, R. B. (2019). Becoming a behavioral science researcher (2nd ed.). Guilford Press.
Krosnick, J. A., & Fabrigar, L. R. (1997). Designing rating scales for effective measurement in surveys. In L. Lyberg, P. Biemer, M. Collins, L. Decker, E. D. de Leeuw, C. Dippo, N. Schwarz & D. Trewin (Eds.), Survey measurement and process quality (pp. 141–164). Wiley-Interscience. https://doi.org/10.1002/9781118490013.ch6
Lalwani, A. K., Shavitt, S., & Johnson, T. (2006). What is the relation between cultural orientation and socially desirable responding? Journal of Personality and Social Psychology, 90(1), 165–178. https://doi.org/10.1037/0022-3514.90.1.165
Maccoby, E. E., & Maccoby, N. (1954). The interview: A tool of social science. In G. Lindzey (Ed.), Handbook of social psychology (Vol. 1, pp. 449–487). Addison-Wesley.
McKay, P. (2006). Assessing young language learners. Cambridge.
Netemeyer, R. G., Bearden, W. O., & Sharma, S. (2003). Scaling procedures: Issues and applications. Sage.
Oga-Baldwin, W. L. Q., & Nakata, Y. (2015). Structure also supports autonomy: Measuring and defining autonomy-supportive teaching in Japanese elementary foreign language classes. Japanese Psychological Research, 57(3), 167–179. http://doi.org/10.1111/jpr.12077
Oga-Baldwin, W. L. Q., & Nakata, Y. (2017). Engagement, gender, and motivation: A predictive model for Japanese young language learners. System, 65, 151–163. http://doi.org/10.1016/j.system.2017.01.011
Oga-Baldwin, W. L. Q., Nakata, Y., Parker, P. D., & Ryan, R. M. (2017). Motivating young language learners: A longitudinal model of self-determined motivation in elementary school foreign language classes. Contemporary Educational Psychology, 49, 140–150. http://doi.org/10.1016/j.cedpsych.2017.01.010
Oliver, R., & Azkarai, A. (2019). Patterns of interaction and young ESL learners. Language Teaching for Young Learners, 1(1), 82–102. https://doi.org/10.1075/ltyl.00006.oli
Paradis, J. (2007). Second language acquisition in childhood. In E. Hoff & M. Shatz (Eds.), Blackwell handbook of language development (pp. 387–405). Blackwell Publishing. https://doi.org/10.1002/9780470757833.ch19
Piaget, J. (1960). The child’s conception of the world. Adams.
Scott, J. (1997). Children as respondents: Methods for improving data quality. In L. Lyberg, P. Biemer, M. Collins, E. de Leeuw, C. Dippo, N. Schwarz & D. Trewin (Eds.), Survey measurement and process quality (pp. 331–350). Wiley.
Scott, J., Brynin, M., & Smith, R. (1995). Interviewing children in the British household panel survey. In J. J. Hox, B. F. van der Meulen, J. M. A. M. Janssens, J. J. F. ter Laak & L. W. C. Tavecchio (Eds.), Advances in family research (pp. 259–266). Thesis.
Sculpin, R. (1997). The KJ method: A technique for analyzing data derived from Japanese ethnology. Human Organization, 56(2), 233–237. http://doi.org/10.17730/humo.56.2.x335923511444655
Spearman, C. (1904). “General intelligence”, objectively determined and measured. American Journal of Psychology, 15, 201–293. https://doi.org/10.2307/1412107

Thompson, R. A., & Jackson, S. (1998). Ethical dimensions of child memory research. Applied Cognitive Psychology, 12(3), 218–224. https://doi.org/10.1002/(sici)1099-0720(199806)12:33.3.co;2-d
Tourangeau, R., & Rasinski, K. A. (1988). Cognitive processes underlying context effects in attitude measurement. Psychological Bulletin, 103, 299–314. https://psycnet.apa.org/doi/10.1037/0033-2909.103.3.299
Wu, X. (2003). Intrinsic motivation and young language learners: The impact of the classroom environment. System, 31(4), 501–517. http://doi.org/10.1016/j.system.2003.04.001

4 USING INTERVIEWS WITH CHILDREN IN L2 RESEARCH Annamaria Pinter

Introduction Why are interviews important with children in second language acquisition (SLA) and second/foreign language education? In SLA, and more broadly in child second/foreign language education, there is a growing realization that it is important to complement primary linguistic data that captures second language (L2) use and development with children’s own insights and reflections in order to arrive at a more holistic understanding of children’s language learning processes and experiences. Overall, we know very little about children’s views and insights on their own second/foreign language learning processes, such as what sense they make of these experiences, what they enjoy and why, what motivates them to learn and participate in various language learning activities, and what they find meaningful or puzzling. Yet, such insights are crucial if our ultimate goal is to improve foreign/second language curricula, programs and materials for child L2 learners across different contexts worldwide. This chapter aims to summarize the potential advantages of using interviews with young learners in SLA/second language education. After considering the generic advantages of interviews, the main part of this chapter will address the ways in which research interviews may be adapted to be used with children across the age range of 4–12 years.

DOI: 10.4324/9780367815783-4

Interviews Interviews are arguably the most widely used tool in qualitative research, including research in second/foreign language education. The literature discussing types of interviews and issues relating to designing and conducting both individual and focus group interviews is incredibly rich (see, for example, Gubrium & Holstein, 2002); much has also been written about the challenges of interviewing different types of participants, including vulnerable participants such as children, in a range of fields of study such as health, education, law, and social care. In SLA and second/foreign language education research, the power of interviews lies in their potential to uncover participants’ experiences in relation to various language learning tasks or situations they have encountered. Thanks to their flexibility and because they can yield large amounts of rich data relatively quickly, interviews have been extremely popular in qualitative research studies as well as mixed methods studies of all kinds. Interviews allow the researcher to explore issues of interest in great depth or further probe into topics of specific interest that may arise over the course of a study. Although interviews can be conducted online, most commonly researchers opt for face-to-face interviews, capitalizing on the added advantage of being able to register both verbal and nonverbal cues. In any interview, in addition to the content of what is told by the participant, it is also important to consider how the content is told, i.e., with strong emotion, in a light-hearted manner, or perhaps hesitantly or tentatively. Given the private nature of the classic one-to-one interview that is also face-to-face rather than online, it is particularly suitable for tackling sensitive, more private issues that participants would not disclose otherwise. At the same time, some of these features also make interviews challenging from an ethical point of view. At first sight, interviews appear to be deceptively easy, and almost everyone can relate to the experience of being interviewed, yet becoming a skilled interviewer is a lengthy process that takes time and practice. In fact, skilled interviewing often requires specific training.
Interviewee-interviewer relationships are never neutral; instead, complex power relationships play out in unique ways between individuals of different status, background and motives. No matter how hard the researcher tries to give space to the views and opinions of the interviewee, an interview is always a joint construction of meaning (Richards, 2003; Mann, 2016).

Interviews with Children in SLA Interviews with adults are much more commonly used in SLA research than with children. One of the main reasons for this is the fact that the literature on developmental psychology, which has been most influential in shaping our understanding of childhood and children for the most part of the 20th century, has tended to focus on children’s weaknesses rather than their potential strengths when it comes to giving accounts of their experiences. Traditionally, child-focused research has been dominated by studies where children tended to play passive object roles (e.g., Piaget, 1955) rather than more active roles. This is explained by the underlying belief that children are not able to give accounts of their experiences; therefore, researchers have to rely on observations of children’s behaviors both in their natural environments and indeed in formal

Interviews with Children in L2 Research 51

experiments. Such observations are then complemented by the opinions and views of caregivers, such as parents or teachers, rather than the children’s views. Lewis (1992) summarizes some widely held beliefs about young learners as unreliable interview participants: They can be easily distracted, they may not consider their answers carefully (Hughes & Grieve, 1980), they can be susceptible to leading questions (Spencer & Flin, 1990), and they have language limitations when it comes to expressing their views. Accordingly, based on the widespread belief that children are unreliable as interview participants, their opinions and views have not been systematically sought (Woodhead & Faulkner, 2008). Yet, there is also the realization that research concerned with measurements of output alone, and based on adults’ views and opinions only, misses an important element. Whilst adult observations are important, studies that focus solely on quantifiable data (such as, for example, large scale studies that measure children’s uptake of error correction, their meaning negotiation strategies or their acquisition of certain grammatical features), and express their results as numbers and statistics, lose out on more holistic insights from the participants. Back in the 1970s, Donaldson (1978) and her colleagues wrote convincingly about the fact that even young children’s responses and comments were much more reliable than once thought, based on the large scale measurements of performances generated in certain age groups. Donaldson’s work indicated that, once the classic Piagetian tasks were contextualized and made familiar, children were able to give much more reliable responses and were able to give accounts and reflect on their experiences. Today, there is a consensus that if children are interested and can make sense of the purpose of an activity and the contexts in which an interview takes place, they can make useful contributions and provide meaningful insights.
Scott (2008) suggests that children can produce reliable responses if they are questioned about meaningful events in their lives. Rinaldi (2006) further argues that the presence of children and the accounts of their own experiences are essential in our quest for understanding their life worlds. Since children’s experiences and views matter a great deal, it could be argued that, in fact, it is the adults’ ethical duty to make an effort to understand children’s views from their own perspectives so that these views can ultimately be fed into policy and practice. In an attempt to critique the rather passively conceived role of children in research, sociologists (James & Prout, 1997; James et al., 1998) in the 1980s began to suggest that we must listen to children more carefully and take their views more seriously. Rather than treating children according to expectations associated simply with their chronological age, researchers began to consider children as individuals who have unique stories to tell and who are in fact, to some extent, experts of their own lives. Such a view of children as active and capable subjects rather than just passive objects was further strengthened by the political developments associated with the children’s rights movement after the declaration and ratification of the United Nations Convention on the Rights of the Child (UNCRC, 1989). Containing 54 articles relating to children’s rights, the UNCRC emphasizes that

52 Annamaria Pinter

children, like adults, must be treated as human beings with distinct rights rather than just passive objects of adult care and intervention. One of the most often cited articles of the UNCRC is Article 12, which suggests that children have the right to express their views, opinions and feelings about all important matters affecting them, and adults must take these views seriously and act on them. In this sense, some might say that seeking children’s views, understandings, feelings, and opinions about their own learning, including language learning, is an ethical and moral imperative in SLA as well. Based on the children’s rights movement, Bucknall (2014, p. 70) suggested that “it is intended that ... [children’s rights] should be exercised in all areas of children’s lives,” and this must include language learning-related contexts at school and outside school. Such an approach to listening to children also fits with the fundamental imperative to achieve social justice in education. A social justice approach promotes fair opportunities in participation (Fraser, 2013) regarding children’s linguistic and educational rights. As argued above, in the field of language education and child SLA, currently we know very little about children’s day-to-day experiences or their perspectives and views on their English language programs. Accordingly, it is essential to devote space and effort to understanding children’s views in order to complement other/adult knowledge. Interviewing children about topics relevant to their language education enables us to find out about their real life and specifically about their second/foreign language learning experiences. As Graue and Walsh (1998) point out, we need to find out about what children think,

…and to keep finding it out, because if we do not find it out, someone will make it up. In fact, someone probably has already made it up and what they make up affects children’s lives; it affects how children are viewed and what decisions are made about them. Finding it out challenges dominant images. Making it up maintains them.
Graue & Walsh, 1998, p. xvi

Whether any adult researcher working with children in the broad field of foreign/second language education might consider using interviews with children at all will of course depend on a range of issues. The most important issue is related to the type of research question one is seeking to answer. For example, researchers might be interested in eliciting previously taught vocabulary in a post-test following an intervention. Here, the research question is focused on whether children are able to recall a linguistic item or not, and interviewing the children would simply not target the research question. However, even in these kinds of studies, it might still be useful and possible to seek children’s views on their experiences. Following the tasks or tests completed, the children may be given a questionnaire or may be interviewed with a focus on their experience of the

research. Such insights from children have the potential to help interpret findings, explain any surface contradictions or puzzles in the primary data set, or just give the adult researcher a better idea of the children’s perspectives of the project as they experienced it. A further factor influencing the choice of interviews with children is related to what Alderson (2005, p. 30) refers to as the adult researcher’s “conception” of childhood/model of childhood/beliefs about childhood. All adult researchers bring these beliefs and conceptions to any research project and will be guided by these beliefs when conducting research. These beliefs and views are unique to each individual adult researcher, but they are nonetheless also based on shared conceptions that have been passed from generation to generation. For example, the beliefs that children are born innocent and vulnerable and they need control and discipline are widely held views often implicitly reflected in policy decisions and how children are treated in everyday life inside and outside school. For some adult researchers, beliefs and conceptions of childhood may stay constant, while some may change their views and beliefs over time after their experiences of working with children. If an adult researcher believes that children are full of potential because they are resourceful and competent, their views and insights can be trusted, and they may have something relevant to say, he or she is more likely than others to formulate research questions that will allow for the use of interviews or other tools to elicit children’s insights.

Types of Studies: Examples

In various studies in child L2 education, more recently, children have been interviewed and have taken up more consultative roles, such as in Kuchah and Pinter (2012). Children have also been asked to evaluate research tools for adults (Zandian, 2015). In Pinter and Zandian (2012), the participatory nature of the children’s involvement afforded a space for spontaneous, unsolicited insights and questions, which the adult facilitators encouraged and responded to. In Pinter et al. (2016), children undertook classroom investigations alongside their teachers to explore their own English learning in Indian primary English classrooms. Prasad (2013, 2014) worked with children as co-researchers or co-ethnographers in an attempt to make sense of the children’s multilingual identities using art-based creative methods and identity texts. Various types of interviews were used in these studies, including those where children interviewed their peers. In Lundy et al. (2011), the children contributed to the development of research questions and choice of methods, and they were also involved in the interpretation of the data and dissemination of the findings. Some of these studies indicate that children’s views are not just useful and valid; their input into any research project can go well beyond simply being data sources. Interviews may be used as the primary data gathering tool or they may be used alongside other tools and methods. Here, I will briefly discuss two examples in

more detail where interviews have been used to elicit children’s views in foreign/second language education research to illustrate some of the advantages that can be accrued. In the first study by Kuchah and Pinter (2012), interviews were used with children as the main data collection tool, and the aim was to gain insights from children to complement adult understandings of “good English teaching.” The insights offered by the children contributed to a better understanding of the research question and ultimately enriched the research with findings and insights that could not have been gained from the adults alone. The study was originally planned to explore teachers’ practice in primary schools in Cameroon with a focus on understanding how “good” practice was understood by local English teachers. To this end, a complicated selection process was introduced to find locally esteemed inspirational teachers. In all schools where inspirational teachers were identified, the children in their classrooms were also interviewed. In one of the schools, when the 10- and 11-year-old children were interviewed about the characteristics of good English language teachers, they insisted that their regular teacher (the one identified by the adults) was not their best English teacher. Instead, the children recommended another teacher in the school who was, in fact, a less established and less qualified teacher. The children justified their views and persuaded the researcher to include their teacher in the study. The adult researcher at this point could have simply acknowledged the children’s suggestions politely but without taking action. However, the researcher decided to take the children’s views seriously by observing and including the new teacher. This led to an understanding of important differences between children’s and adults’ views on good teaching, to an alternative research focus, and, indeed, to a new research question in the original study.
At the same time, the credibility of the adult researcher went up, as the children could see that their views were taken seriously. In this study, interviews were used to elicit children’s views, and their sophisticated insights led to the introduction of an additional research question to the study. In the second study, Prasad (2018) used a range of tools to access children’s views and perspectives, including observations and reflective activities as well as interviews that were mediated by artifacts (visual aids). Prasad was interested in exploring how 9- to 11-year-old children could make sense of their plurilingual lives. “How does it look and feel to be plurilingual?” was the core question of the research. Given that using and learning multiple languages is fast becoming the norm in primary schools in many contexts, the researcher wanted to understand how children themselves represented their multilingual selves. The more inclusive visual/creative method acted as a scaffold when it came to the children’s articulation of their ideas. These collages “served as evocative elicitation tools […] to access perspectives and feelings about plurilingualism” (Prasad, 2018, p. 8). Prasad argues that the collages helped the children verbalize their thoughts and reflect on their own understandings of themselves as plurilingual language learners and users. Children’s insights from these interviews were then combined with other

data sets such as adult observations to answer the overarching research questions of the study. Using artifacts such as visual aids or collages can be helpful when conducting interviews with children.

Challenges and Considerations

Power Gap, Rapport, and Language

In any interview situation, challenges arise out of the relationship between interviewer and interviewee due to the fact that the researcher always enjoys an elevated status and more power and authority than the interviewee. In the case of adults interviewing children, such power issues are even more complex given the status quo of the social order whereby adults firmly control all aspects of children’s lives. It is therefore natural for children to simply obey adults and respond in interviews with answers that are likely to please the adult even if explicitly told there is no right answer. Children suspect that underneath it all there must be a right answer. Therefore, one of the most important tasks of the adult interviewer is to create a friendly atmosphere where children are encouraged to participate and even ask questions rather than only answer them:

One of the key aspects of the interview approach recommended is flexibility. Although the researcher will have certain questions in mind to start, he or she must be willing to let the interview develop by allowing opportunities for new questions to emerge based on what is shared during the interview. These questions may arise from anyone, not just the researcher.
Eder & Fingerson, 2002, p. 185

The inevitable power gap between children and adults, of course, cannot be completely overcome (Mayall, 2008), but adult researchers need to think about the implications of this power and mitigate it by, for example, building relationships, organizing briefings about the research, and discussing with the children the purpose of the research interviews. Such negotiations and relationship building will take time and rarely can they be handled in a single conversation. It is often difficult to organize these meetings in busy schools and researchers may need to compromise, but the more time is invested at this stage, the better the quality of the data will be.
Adult researchers who may be presenting themselves as interested outsiders may also need to spend time negotiating and explaining their hybrid identities (e.g., Kuchah & Pinter, 2012) and encourage children to ask questions to help them make sense of the adult’s unique identity. For example, PhD students as researchers who present themselves as adults but at the same time as students often find it useful to open up a discussion about their various roles and identities. One strategy that seems to work well is to be open about the fact that, as an adult,

you are in need of information that only the children have, and you are therefore highly dependent on their help. Some argue that so-called participatory approaches, such as the use of drawings, drama, photos and other creative methods instead of or alongside more traditional interviews, can also facilitate the process of breaking down power barriers between adults and children because such methods can be more in line with children’s preferred ways of expressing themselves. For example, child-led photography is often used as a first step to help the children focus on something personally significant when talking to the adult researcher (e.g., Fassetta, 2016). Traditional question-answer interviews have the potential disadvantage that some children may find it hard to express themselves in a format that is truly reminiscent of school discourse with the expectation that adults know best and there is a right answer.

L1 or L2 Use in Interviews

Interviews are best conducted in the interviewees’ strongest language, which, in the case of bilingual or multilingual children, can sometimes be a language that is not shared with the adult. Careful thought must be given to how such challenges can be addressed in any one context. Given the reality of young learner classrooms around the world, where children speak a variety of first languages, the danger is that children whose strongest language is not shared by all (including the adult researcher) are often left out even though they might have a lot to contribute. Even if the adult interviewer and child interviewee share a common (first) language, breakdowns in communication can happen due to a lack of familiarity with each other’s communication styles (Kellett & Ding, 2004). This may be due to the fact that the adult researcher has not communicated the nature of the problem to the child well, or the child does not have the same understanding of the words being used. Adult researchers need to tune into the natural discourses of children. As Punch (2002, p. 328) suggests, language difficulties between adults and children are often “mutual.” Korteslouma et al. (2003) suggest that children might not always interpret questions accurately, and it is therefore the adults’ job to strategically work on preventing misunderstandings. This may mean listening to the children’s language and adjusting your own and repeating back answers to ensure understanding. It is also important to attend to children’s nonverbal cues. A brief period of observation before interviews begin can also be very useful if this is feasible. With regard to language use, Spyrou (2011, 2016) also draws our attention to the fact that children will say different things to different people and thus there is no such thing as children’s one true authentic voice.
Whatever children tell us in interviews will be the product of our relationship with them and the context we find ourselves in. Thus, we need to situate children’s voices and stories in interactional, institutional, and local discourse contexts. Deeper layers of understanding often represent complex, contradictory views, and uncovering views requires

sustained engagement. Komulainen (2007) reminds us further that interactions and communication with children can never be fully authentic, as their words are often reminiscent of adult discourses. Adult researchers always need to listen out for spontaneous comments, unsolicited remarks, and questions, as these are likely to be less adult mediated. Reflexive research accepts that children’s voices are messy and multi-layered and need to be analyzed within their micro contexts. The same child may tell an interviewer different things based on who they think the interviewer is, what their agenda might be, where the interview takes place, how they feel about the topic of the interview, and how comfortable they are with the tools/questions used, just to mention a few relevant issues to keep in mind. Adult researchers need to be careful in their interpretations of their findings based on these and other relevant factors in the given micro context. In longitudinal projects where the adult researcher can build lasting, trusting relationships with the children and has the opportunity to talk to children/interview them frequently, it becomes possible to track children’s developing views and understandings, which are always dynamic and multifaceted.

Creating Comfortable Environments

Tammivaara and Enright (1986) suggest that it is a good idea to embed interviews in familiar activities such as “show and tell” activities and “circle time,” as these help with relaxing the children. Right at the beginning of the interview, it is good practice to have an initial chat/small talk or share meaningful personal information and even some laughter. Depending on the physical environment, adult interviewers may find it more challenging to create a comfortable place. For example, a frequently quoted challenge with school-based research is that the strict hierarchical setup does not allow for too much informality. Robinson and Kellett (2004, p. 91) argue that the balance of power is “so heavily skewed towards adults” in schools that children’s responses will be inevitably impacted. Often, it is not possible to find a quiet neutral venue. Whether children are being interviewed in the headteacher’s office as opposed to a corner in the school library or a bench in the playground may have important consequences for the quality and nature of the resulting data. It is also important that interviews in quiet places should not be interrupted by anyone, but this is almost impossible in schools where staff have the overriding right to enter and supervise/safeguard children. The actual physical arrangement of the space is also important. Children may be asked to sit on the floor or the carpet, especially if this is a routine they recognize. In some contexts, children are not used to sitting on chairs behind desks with an adult on the other side and may find this arrangement too formal. Especially with younger children, such as preschoolers, Griffin (2019, p. 103) suggests taking great care with physical arrangements. She suggests using “shoulder-to-shoulder” or “walk-around” interviewing methods. During shoulder-to-shoulder interviews, the researcher and the child are usually sitting side by side on

the floor with their shoulders next to each other. Interestingly, Griffin argues that this helps with avoiding direct eye contact, which is believed to be off-putting. Griffin offers the following reflections in this regard:

I was speaking with my research partner […] about how my 9-year-old son seemed more willing to share things with me at night. She shared that she had similar experiences when she was in the car with her children. She noticed they seemed to share more when they were driving somewhere and they were sitting in the back seat. We realised that the common denominator seemed to be lack of eye contact, either because the lights were off or because the parent was driving and looking at the road and not the children. This seemed to create a level of comfort for the child not found when we attempted conversations where direct eye contact was maintained. Additionally, I wondered if having a book or something else to focus on would take pressure off the child being interviewed.
Griffin, 2019, p. 114

Similarly, a “walk-around” interview attempts to avoid formal setups by following the child around, moving with the child as he/she goes about his or her normal routine and comments on what is happening.

Individual/Pair Interviews, Group Interviews, or Peer Interviews

Children are more relaxed when interviewed in groups or pairs (Lewis, 1992; Eder, 1995; Fingerson, 1999; Pinter, 2007). When in the company of peers, children feel less intimidated and more confident, and they can spontaneously build on each other’s comments and points, potentially revealing more than would have been possible in individual interviews. It can be easier to express uncertainties or negative opinions when you are part of a group, and children may be more ready to question the interviewer. It is more likely that, in group interviews, children will offer stories and anecdotes or they may naturally direct the line of the conversation in ways that are meaningful to them. There is also less of a chance for the adult to impose their own meanings on the children. The discourse of the group talk is more likely to grow out of the children’s peer culture, giving the adult researcher unexpected insights into the way children naturally defend their points and agree or disagree with each other and compromise. The specific composition of groups matters. Children will open up more easily if they are comfortable, sitting next to their friends. Boys’ and girls’ groups tend to function differently because of different communication styles (Sigelman & Holtz, 2013), so it is important to consider single sex versus mixed groups. Children’s group dynamics are complex, including influences of gender, personality, age, ability, attractiveness, popularity within their peer group, and many other factors. Children, like adults, are heavily influenced by complex and dynamic

power relationships within their peer groups, and this further complicates power structures in any context where adults and children work together. It is always advisable to allow children to choose their own groups and/or rely on the advice of someone who knows the children well and to be ready to change the size and composition of groups in case of any difficulties. Group and pair interviews are not always appropriate. Indeed, if an individual child would like to disclose an issue that is more private in nature, groups and pairs would not allow for that.

Some Ethical Issues

With interview research, as with all other types of research, children’s consent or assent has to be sought. Given that children need to understand fully what they are agreeing to and what the consequences of their participation will mean, securing consent is usually a process that entails ongoing negotiations and checks throughout the study (Kuchah & Pinter, 2021). When children are promised confidentiality, it is important to make sure they understand that, in case a child discloses any sensitive information, the adult’s role is first of all to seek help to protect the child rather than keep the confidentiality. Many other ethical dilemmas arise such as who will benefit from the research. Eder and Fingerson (2002) suggest that ethical research with children must involve reciprocity, meaning that the children must get something back after the study is completed. With interviews, usually only a handful of children are selected, and more often than not these children are the ones who are more confident, with superior communication and social skills. A real challenge for adult researchers is therefore to consider who gets selected and, in turn, left out. Many groups of children are less visible and less represented in research because they belong to marginalized groups such as those with disabilities, chronic illnesses, in care, looked after children, minorities, or those at risk of exclusion. Any research where children’s views are interrogated needs to address the question about inclusion and exclusion. As adult researchers, when approaching data produced by children in interviews and in other types of research as well, a key question arises about the representation and analysis of the data. Adult researchers need to take care to acknowledge their own biases regarding their own context, race, class, gender, and voice (Fine, 1994) and consider to what extent they represent their own rather than the children’s voices.

Conclusion

Eliciting children’s views and opinions relating to any topic within SLA and L2 education for children is immensely useful. There are no easy recipes to follow when it comes to preparing for interviews with children, but adult researchers are encouraged to consider ways in which it may be possible to

overcome the power gap between children and the adult researcher, utilizing some of the ideas discussed in this chapter. The aim is to make the most of the dialogical space carefully created in conversations with children so that they feel empowered and ready to voice their views, confident that their views will make a difference.

Further Readings

Eder, D., & Fingerson, L. (2002). Interviewing children and adolescents. In J. F. Gubrium & J. A. Holstein (Eds.), Handbook of interview research (pp. 181–202). Sage Publications.
This chapter sits in the well-known handbook of interview research and it addresses the most important principles of undertaking interviews with children of all ages. It focuses on what is qualitatively different in the process of interviewing children and adolescents as compared to adults. The chapter is not focused on language education, but it contains some excellent generic advice backed up by research applicable to language education.

O’Reilly, M., & Dogra, N. (2017). Interviewing children and young people for research. Sage Publications.
This is a publication that is entirely focused on interviewing children and young people, and it is considered an authoritative guide in the field on all aspects of interviewing children. The book offers both theoretical and practical guidance for adult researchers, from the conceptualization of interview studies to considerations of ethics, methods, fieldwork, and data analysis.

Sargeant, J., & Harcourt, D. (2012). Doing ethical research with children. Open University Press.
This is a book-length publication that addresses principles relating to undertaking research with young children, with a focus on preschoolers. The main premise of the book is that all research with young children needs to be respectful of their rights as research participants. Useful examples of projects considered good practice by the authors are described and discussed. The book addresses various data elicitation techniques, including talking to children and interviewing them using different techniques and methods.

Discussion Questions

1. Working with younger children in preschool contexts (3–4 years of age), what tools could you use to help you elicit children’s views on their own language learning experiences?
2. Children sometimes stay silent or say very little in response to questions coming from an adult researcher they do not know. What possible reasons can you think of to explain the lack of response?
3. Imagine that, as part of your study, you plan to interview children about their classroom English learning experiences. You will be conducting group interviews with 5-year-olds, 8-year-olds, and 12-year-olds. How would you cater for these age groups in terms of your preparation?

References

Alderson, P. (2005). Designing ethical research with children. In A. Farrell (Ed.), Ethical research with children (pp. 27–36). Open University Press.
Bucknall, S. (2014). Doing qualitative research with children and young people. In A. Clark, R. Flewitt, M. Hammersley & M. Robb (Eds.), Understanding research with children and young people (pp. 69–84). Sage.
Donaldson, M. (1978). Children’s minds. Fontana Press.
Eder, D. (1995). School talk: Gender and adolescent culture. Rutgers University Press.
Eder, D., & Fingerson, L. (2002). Interviewing children and adolescents. In J. F. Gubrium & J. A. Holstein (Eds.), Handbook of interview research (pp. 181–202). Sage.
Fassetta, G. (2016). Using photography in research with young migrants: Addressing questions of visibility, movement and personal spaces. Children’s Geographies, 14(6), 701–715. https://doi.org/10.1080/14733285.2016.1190811
Fine, M. (1994). Working the hyphens: Reinventing self and other in qualitative research. In N. K. Denzin & Y. S. Lincoln (Eds.), Handbook of qualitative research. Sage.
Fingerson, L. (1999). Active viewing: Girls’ interpretation of family television programmes. Journal of Contemporary Ethnography, 28, 389–418. https://doi.org/10.1177%2F089124199129023497
Fraser, N. (2013). Scales of justice. Polity Press.
Graue, M. E., & Walsh, D. J. (1998). Studying children in context: Theories, methods and ethics. Sage.
Griffin, K. M. (2019). Participatory research interviewing practices. In A. Eckhoff (Ed.), Participatory research with young children (pp. 95–121). Springer.
Gubrium, J. F., & Holstein, J. A. (Eds.). (2002). Handbook of interview research. Sage.
Hughes, M., & Grieve, P. (1980). On asking children bizarre questions. First Language, 1, 149–160. https://doi.org/10.1177/014272378000100205
James, A., & Prout, A. (Eds.). (1990/1997). Constructing and re-constructing childhood. Falmer Press.
James, A., Jenks, C., & Prout, A. (1998). Theorising childhood. Polity Press.
Kellett, M., & Ding, S. (2004). Middle childhood. In S. Fraser, V. Lewis, S. Ding, M. Kellett & C. Robinson (Eds.), Doing research with children and young people (pp. 161–174). Sage.
Komulainen, S. (2007). The ambiguity of the child’s voice in social research. Childhood, 14(1), 11–28. https://doi.org/10.1177%2F0907568207068561
Korteslouma, R-L., Hentinen, M., & Nikkonen, M. (2003). Conducting a qualitative interview: Methodological considerations. Journal of Advanced Nursing, 42(5), 434–441. http://doi.org/10.1046/j.1365-2648.2003.02643.x
Kuchah, H. K., & Pinter, A. (2012). Was this an interview? Breaking the power barrier in adult-child interviews in an African context. Issues in Educational Research, 22(3), 283–297. www.iier.org.au/iier22/kuchah.html
Kuchah, H. K., & Pinter, A. (2021). Introduction. In A. Pinter & H. K. Kuchah (Eds.), Ethical and methodological issues in researching young learners in school contexts. Multilingual Matters.
Lewis, A. (1992). Group children interviews as a research tool. British Education Research Journal, 18(4), 413–421. https://doi.org/10.1080/0141192920180407
Lundy, L., McEvoy, L., & Byrne, B. (2011). Working with young children as co-researchers: An approach informed by the United Nations Convention on the Rights of the Child. Early Education and Development, 22(5), 714–736. https://doi.org/10.1080/10409289.2011.596463
Mann, S. (2016). The research interview: Reflective practice and reflexivity in research processes. Palgrave Macmillan.

62

62 Annamaria Pinter

Mayall, B. (2008). Conversation with children: Working with generational issues. In P. M. Christensen & A. James (Eds.), Research with children: Perspectives and practices (pp. 120– 135). Routledge Falmer Press. Piaget, J. (1955). The language and thought of the child. Routledge. Pinter, A. (2007). What children say: Benefits of task repetition. In K. Van den Branden, K. Van Gorp & M.Verhelst (Eds.), Task-based language education from classroom-based perspective (pp. 126–149). Cambridge Scholars Publishing. Pinter, A., Mathew, R., & Smith, R. (2016). Children and teachers as co-researchers in Indian primary English classrooms. ELT research paper 16.03. The British Council. Pinter, A., & Zandian, S. (2012). “I thought it would be tiny little one phrase that we said, in a huge big pile of papers”: Children’s reflections on their involvement in participatory research. Qualitative Research, 15(2), 235–250. https://doi.org/10.1177%2F14687 94112465637 Prasad, G. (2013). Children as co-ethnographers of their plurilingual literacy practices: An exploratory case study. Language and Literacy, 15(3), 4–30. https://doi.org/10.20360/ G2901N Prasad, G. (2014). Portraits of plurilingualism in a French international school in Toronto: Exploring the role of visual methods to access students’ representations of their linguistically diverse identities. Canadian Journal of Applied Linguistics, 17(1), 51–77. https://journals.lib.unb.ca/index.php/CJAL/article/view/22126 Prasad, G. (2018). How does it look and feel to be plurilingual? Analysing children’s representations of plurilingualism through collage. International Journal of Bilingual Education and Bilingualism, 23(8), 902–924. https://doi.org/10.1080/13670 050.2017.142003 Punch, S. (2002). Research with children: The same or different from research with adults? Childhood, 9(3), 321–341. https://doi.org/10.1177/0907568202009003005 Richards, K. (2003). Qualitative inquiry in TESOL. Palgrave Macmillan. Rinaldi, C. (2006). 
In dialogue with Reggio Emilia: Listening, researching and learning. Routledge. Robinson, C., & Kellett, M. (2004). Power. In S. Fraser, V. Lewis, S. Ding, M. Kellett & C. Robinson (Eds.), Doing research with children and young people (pp. 81–96). Sage. Scott, J. (2008). Children as respondents: The challenge for qualitative methods. In P. M. Christensen & A. James (Eds.), Research with children: Perspectives and practices (pp. 98–119). Routledge Falmer Press. Sigelman, C. K., & Holtz, K. D. (2013). Gender differences in preschool children’s commentary on self and other. Journal of Genetic Psychology, 174(2), 192–206. https://psycnet.apa. org/doi/10.1080/00221325.2012.662540 Spencer, J. R., & Flin, R. (1990). The evidence of children:The law and psychology. Blackstone. Spyrou, S. (2011). The limits of children’s voices: From authenticity to critical reflexive representation. Childhood, 18(2), 151–165. https://doi.org/10.1177%2F090756821 0387834 Spyrou, S. (2016). Researching children’s silences: Exploring the fullness of voice in childhood research. Childhood, 23(1), 7–21. https://doi.org/10.1177%2F090756821 5571618 Tammivaara. J., & Enright, D. S. (1986). On eliciting information: Dialogues with child informants. Anthropology and Education Quarterly, 17, 218–238. https://doi.org/10.1525/ aeq.1986.17.4.04x0616r

62

63

62

Interviews with Children in L2 Research 63

United Nations. (1989). United Nations conventions on the rights of the child. New York: United Nations. Woodhead, M., & Faulkner, D. (2008). Subjects, objects or participants: Dilemmas of psychological research with children. In A. James & P. Christensen (Eds.), Research with children: Perspectives and practices (pp. 10–39). Routledge. Zandian, S. (2015). Children’s perceptions of intercultural issues: An exploration into an Iranian context [Unpublished doctoral dissertation]. University of Warwick.


5 VERBAL REPORTS AS A WINDOW FOR UNDERSTANDING MENTAL PROCESSES AMONG YOUNG LEARNERS

Yuko Goto Butler

Introduction

In second language development (SLD) research, verbal reports, such as think-alouds (Bowles, 2010a, 2010b) and stimulated recall (Gass & Mackey, 2000, 2017), have been widely used as a means of understanding learners’ mental processes when engaging in language-related tasks and activities. Despite the popularity of verbal reports as a research tool, their application remains relatively limited in SLD research focusing on children. For verbal reports to be effective when used with children, numerous modifications may be necessary due to children’s specific developmental factors and life experiences. Drawing on research findings on child development and education as well as child SLD, this chapter discusses age-related challenges and various pitfalls and possibilities that can arise when using verbal reports to understand young learners’ second or foreign language (L2/FL) learning. In addition to addressing the use of verbal reports as a research tool, this chapter also considers them as a pedagogical tool for assisting children’s self-reflection and language learning.

DOI: 10.4324/9780367815783-5

General Description of Verbal Reports as an Introspective Method

Verbal reports are used as an introspective method to access people’s mental processes through their verbalizations either while they are engaging in tasks or after they have completed tasks. Because mental processes are not easily understood by simply observing behaviors, using verbal reports as a window to access mental processes is widely accepted in psychology, linguistics, education, medicine, business, and other fields (Bloom, 1953; Bowles, 2010a, 2010b; Ericsson & Simon, 1984, 1996; Gass & Mackey, 2000, 2017; Pressley & Afflerbach, 1995). Introspection, which can be broadly defined as “the looking into our own minds and reporting what we there discover” (James, 1890, p. 185), has a long history in scientific inquiry in the West, going at least as far back as Aristotle in the 4th century BCE. However, introspection as a research method lost popularity with the rise of behaviorism, in which human behaviors were believed to be understandable solely through directly observable and measurable means. During the heyday of behaviorism (early to mid-20th century), introspection was considered unscientific and was largely ignored. As mentalists gained power in the mid- to late 20th century, introspection and verbal reports regained popularity as research methods (Bowles, 2019; Ericsson & Simon, 1984, 1996; Fox et al., 2011; Gass & Mackey, 2000, 2017; Levy & Ransdell, 1995).1

There are multiple methods for collecting verbal reports. Among researchers of applied linguistics, think-alouds and stimulated recall are the best-known verbal report methods. Think-aloud is a concurrent method for accessing participants’ ongoing cognitive processes. According to Ericsson and Simon (1996), instead of simply vocalizing silent speech (referred to as a talk-aloud report), a think-aloud report asks participants to convert thoughts into a verbalizable form for vocalization. In other words, a think-aloud requires participants to go through an additional verbal encoding process while they maintain their attention on the information being verbalized, which, in turn, results in additional time to complete the given task. Think-alouds usually take an oral form.

Stimulated recall is a type of retrospective method. It is used as a prompt for assisting participants to recall thoughts that they had while they were engaging in tasks. Namely, it is a way to access participants’ memory structure. Stimulated recall is usually conducted immediately after the task is completed, and it mainly takes an oral format but can take a written form (e.g., Brown, 1993). Stimulated recall uses some artifact of the given task, such as video/audio recording or spoken/written products, as the stimulus. When this method is used for children, because they often need additional scaffolding, a stimulated recall interview is used as well (e.g., Winke et al., 2018). In this chapter, stimulated recall interviews are treated as a type of research method utilizing verbal reports and are included in the discussion.

Verbal reports have been used extensively as a research method in second language (L2) research primarily focusing on adults. Research on L2 development uses verbal reports to try to uncover (a) language-related declarative knowledge (i.e., linguistic rule-based knowledge that is often conscious) and how it is organized, (b) language-related procedural knowledge (i.e., cognitive processes such as searching, storing, and retrieving processes that are usually unconscious unless there is a breakdown in communication), and (c) various strategies when using language (Bowles, 2019; Gass & Mackey, 2000, 2017; Sanchez & Grimshaw, 2020; Zhang & Zhang, 2020). Numerous studies have been conducted on different aspects of language-related knowledge, processing, and strategies, including reading (Hosenfeld, 1977; Hu & Gao, 2017; Tode, 2012), writing (Bosher, 1998; Negretti & McGrath, 2018), listening (Goh, 1998), oral communication and discourse strategies (Hawkins, 1985; Swain & Lapkin, 1998), lexical access and structure (Deschambault, 2018; Nassaji, 2006), language awareness (Yoshida, 2008), learners’ perception of feedback (Mackey et al., 2000; Sachs & Polio, 2007), and teacher cognition (Polio et al., 2006). Verbal reports have also been used to verify tests and measurements (Barkaoui et al., 2013; Gass, 1994) and as an awareness-raising tool (Sheppard & Ellis, 2018). (For lists of relevant studies, refer to Bowles, 2010a, 2010b, 2019; Sheppard & Ellis, 2018; Gass & Mackey, 2000, 2017; Sanchez & Grimshaw, 2020; Zhang & Zhang, 2020.)

Despite the popularity of verbal reports as a research tool, two major validity concerns arise when using them as a window into internal mental processes; these two validity issues are referred to as reactivity and veridicality. Reactivity is defined as “the effect of verbalization on learners’ performance” (Egi, 2008, p. 212) or, more specifically, “either a positive or negative influence of verbalization during or after the task (i.e., concurrent or retrospective reports, respectively) on learners’ task performance and/or subsequent learning” (p. 213). Veridicality concerns the “accuracy or completeness of reports as a reflection of learners’ thought processes” (p. 213).

When it comes to think-alouds, both reactivity and veridicality can be a potential concern. Ericsson and Simon (1996) argued that, in principle, think-alouds do not change the nature of one’s cognitive processes except for extending processing time. However, according to Ericsson and Simon, because think-alouds require an additional verbal-encoding process during a task, this additional process might be an extra cognitive burden that could affect participants’ task performance (reactivity) and the accuracy of the verbalized data (veridicality). With respect to stimulated recall, although this method is free from concerns related to reactivity, participants may provide the reasoning for their cognitive processes, or post rationalizations, rather than simply exhibiting their mental process (veridicality).

The issue of reactivity in think-alouds is controversial (Hu & Gao, 2017; Leow & Morgan-Short, 2004; Pressley & Afflerbach, 1995; see also Ericsson & Fox, 2011; Fox et al., 2011; and Schooler, 2011, for a recent debate on this topic). Based on a meta-analysis, Bowles (2010b) argued that the task performance of think-aloud groups does not consistently differ from the performance of silent groups; however, a variety of factors may influence the reactivity of think-alouds. Such factors include task types, participants’ proficiency levels and familiarity with verbalization, directions and prompts provided to the participants (to help them keep talking), pre-training or modeling for verbalization, and so forth (Bowles, 2010b; see also Fox et al., 2011; Deschambault, 2018; Valfredini, 2015).

Similar complexity has been articulated with respect to veridicality. While veridicality can be a potential issue for both think-aloud and stimulated recall, one may assert that veridicality should be a more serious concern for stimulated recall because of its potential for memory decay. However, the time lag is not the only factor that influences the accuracy of the data; other potentially influential factors include unfamiliar tasks, unclear instruction, cultural mismatch in expectation between participants and experimenters, affective factors (e.g., motivation and anxiety), and the language used for verbalization (e.g., L1, L2, or a combination of the two). It is also important to note that veridicality has been studied far less than reactivity in L2 development research (Bowles, 2010b, 2019; Gass & Mackey, 2000, 2017; Hu & Gao, 2017; Zhang & Zhang, 2020).

Researchers who take a sociocultural approach, such as Valfredini (2015), have theoretically challenged Ericsson and Simon’s (1996) claim that think-alouds do not change one’s thought processes. Unlike cognitive information-processing frameworks, in which speech is considered an independent manifestation of one’s cognitive process, sociocultural frameworks view speech and thought as inseparable; accordingly, one’s speech does influence one’s thinking (Vygotsky, 1978). For example, Swain (2006) argued that verbalization is the very act of learning rather than a simple display of learned knowledge. Therefore, verbalization should alter the learning processes and thus should be treated as an object of investigation in and of itself instead of as a means of accessing participants’ thinking processes.

In sum, verbal reports, most commonly think-alouds and stimulated recall, have been used extensively as a window for understanding individuals’ knowledge and mental processing in adult L2 development studies. Despite the popularity of verbal reports as a research tool, two major validity concerns, namely reactivity and veridicality, have been articulated. Depending on epistemological traditions, researchers also have different conceptualizations of what verbal reports entail.

Verbal Reports in Research Focusing on Children

In addition to the extensive use of verbal reports (both think-alouds and stimulated recall) in adult L2 development studies, these methods are used widely in child development studies (cognitive development studies in particular) as well as in L1 literacy development studies. Jean Piaget, an influential 20th-century cognitive developmental psychologist, relied heavily on children’s verbal explanations for their responses to cognitive tasks as a means to understand their thought processes and reasoning (e.g., Piaget, 1952). Verbal reports have been used with children engaged in various problem-solving tasks across subject matters, including L1 reading and writing (e.g., Alvermann, 1984; Gordon, 1990; Kucan & Beck, 1997; Ness & Kenny, 2016), while their application to children’s L2/FL development research remains relatively limited (Sanchez & Grimshaw, 2020). Significantly, even though verbal reports originally gained attention as a method of inquiry in child research, they have become popular as a tool for instruction as well (Kucan & Beck, 1997; Palincsar & Brown, 1984). In the following subsections, we focus on verbal reports both as a method of inquiry and as a method of instruction in the contexts of L1 and L2/FL development among children.

Verbal Reports as a Method of Inquiry for L1 and L2/FL Development

Among both L1 and L2/FL children, verbal reports have been used primarily in studies of reading and writing. Verbal reports have also been used for validating measurements and assessments for young learners.

Treating reading as a problem-solving task, numerous studies used verbal reports to uncover children’s reading processes and to identify the strategies and metacognitive skills that they employed during reading. In L1 reading research, for example, Meyers et al. (1990) found that successful fourth- and fifth-grade readers verbalize more than their less-successful counterparts and use greater varieties of strategies and reasoning that are relevant to comprehension. Inference skills employed during L1 reading were different between children who had trouble with comprehension and those who did not (Kim & Choi, 2017). Coté et al. (1998) reported that think-alouds helped sixth graders with the quality of their text recall when reading informational texts with relatively unfamiliar content, but that think-alouds did not help fourth graders. The researchers speculated that the cognitive demands required for the task (i.e., think-alouds) might have hindered the fourth graders from making a coherent connection between their prior knowledge and unfamiliar knowledge. These results remind us of the importance of individual and age-related considerations when employing think-alouds; quality and quantity (the amount of verbalization) can differ greatly across children.

Of researchers using verbal reports to study reading among children in L2/FL contexts, Rao et al. (2007) and Zhang et al. (2008) examined reading strategies taken by English-learning bilingual children in primary school using think-alouds. The studies reported different uses of strategies, both in terms of frequencies and types of strategies, by the children’s proficiency levels and grades. Rao et al. (2007) found that higher proficiency readers used strategies that required deep-level processing, such as inferencing and making predictions, more frequently than less proficient readers who, in turn, tended to rely more on surface-level strategies such as paraphrasing and re-reading. Similarly, Zhang et al. (2008) reported a different use of strategies by children’s grade levels as well as proficiency levels. Butler (2020) used stimulated recall to examine fourth-grade L2 children’s skills and strategies for constructing the meaning of unknown words in both information-rich texts (texts that contain a great deal of contextual information that supports readers in inferring the meaning of the target words) and information-poor texts. Different strategies were found between strong and emergent readers depending on the richness of contextual information given in the texts. Jiménez et al. (1996) compared strategies used by Spanish–English bilingual readers and monolingual English readers. They identified some unique strategies that were employed by the proficient bilingual readers; namely, frequently transferring information across languages, translating, and actively accessing cognate vocabulary. In both Butler (2020) and Jiménez et al. (1996), the researchers stressed the importance of making sure that students are comfortable with the language used for verbalization.

Similarly, verbal reports have been used to uncover writing processes and strategies among children. For example, Davidson and Berninger (2016) used think-alouds with 10- to 12-year-old L1 students to identify and compare the quality of their idea generation and planning when writing essays. Age effects were found in the quality of idea generation but not planning. In addition, only idea-generating variables significantly predicted quality of writing. As suggested by Breetvelt et al. (1994) nearly three decades ago, the relationship between a writing product and cognitive activities during writing needs to be clearly formulated in order to establish a cause-and-effect relationship between writing products and processes. The accumulation of research looking into writing processes through think-alouds helps us better understand such cause-and-effect relations.

With advances in digital technology, researchers are also interested in using verbal reports to identify unique skills and strategies that are necessary when using language through smartphones, laptops, and other digital technology. Although this line of research among children is relatively unexplored, one can imagine that reading online, for example, is no longer limited to constructing meaning in a set body of given textual information but also requires new skills, including searching for multimodal information appropriately and efficiently from the web, determining the credibility of information that is almost endlessly available, synthesizing and organizing information, and so forth (Leu et al., 2004, pp. 1589–1590). Coiro and Dobler (2007), relying on think-alouds as well as stimulated recall with L1 sixth graders, discovered that skillful online readers used their prior knowledge, inferential reasoning strategies, and self-regulated processing. Jensen (2019) used a type of stimulated recall (what he referred to as descriptive ethnographic interviews) in which Danish primary school students (ages 7–11) were interviewed while being shown how to engage in games and other digital activities in their L2 (English). The study uncovered the children’s (a) motives for engaging in chatting and reading/listening to online content and (b) strategies for participating in digital activities in English. In the digital activities, the children showed engagement with English in a variety of ways and in a goal-oriented manner. The children who engaged heavily in digital activities for learning English tended to perceive the use of English in the language classroom as much less authentic than in the virtual space and were less motivated in class.

Taken together, research on verbal reports to study reading and writing among children in L1 and L2/FL contexts has revealed that children’s mental strategies and metacognitive skills involve complicated interactions among characteristics of the children themselves (e.g., age, abilities, and experiences), the texts they are reading and writing (e.g., information-rich vs. information-poor texts), and their learning environments (e.g., online vs. offline) (Bowles, 2010a; Coiro, 2011; Kucan & Beck, 1997; Meyers et al., 1990).

Verbal reports can also be used to validate children’s L1 and L2/FL assessments. Winke and her colleagues (2018), based on a stimulated recall interview along with a picture drawing task, reported that incorrect responses on a standardized English assessment were due to children’s lack of assessment literacy and age-related cognitive capacity rather than their English knowledge. This finding suggests the value of using interviews with children to check the validity of assessments, even assessments that are already psychometrically established. Similarly, in Johnstone et al. (2006), think-alouds were administered to children with disabilities and L2 learners in order to identify construct-irrelevant variables in a large-scale math test. Finally, Butler’s (2018a, 2018b) use of stimulated recall revealed the complicated process that FL-learning children used to respond to self-assessment items. She argued that self-assessment items might measure different abilities depending on children’s age and the degree of contextualization of item construction and administration (e.g., context-specific items administered immediately after the given activity versus general items not bound by specific context).

Verbal Reports as a Method of Instruction

In addition to using verbal reports as a data collection tool, scholars have embraced verbal reports, particularly think-alouds, as a pedagogical tool. Numerous studies have been conducted to understand how verbal reports can be used to enhance children’s monitoring and other metacognitive strategies in reading and writing as well as their effects (e.g., Kucan & Beck, 1997; Oster, 2001).

A well-known example of how think-alouds are used in L1 literacy instruction involves reciprocal teaching or teacher modeling through think-alouds. For example, when reading a text for comprehension, teachers often make their mental processing visible by verbalizing it and presenting it as a model for their students, with the goal of teaching the children how to comprehend texts (Dunn, 2011; Palincsar & Brown, 1984; Rosenshine & Meister, 1994). After observing the teachers’ modeling, children are invited to engage in the process of thinking aloud, sometimes working with partners first and then gradually internalizing the process individually. This pedagogy aligns well with Vygotsky’s sociocultural theory, which holds that learning starts with social interactions and leads to individual self-regulation. A great deal of practical information is available to help teachers incorporate think-alouds into their instruction (e.g., Baumann et al., 1993; Ness & Kenny, 2016; Oster, 2001), including think-aloud examples in digital formats (White, 2016).

Importantly, when think-alouds are used as an instructional tool, teachers are often encouraged to articulate their thoughts as a way to help their students make sense of texts; intentionality may play some role. As a result, teachers’ think-alouds in these situations may not directly reflect their reading processes and might differ from the think-alouds that would be obtained if the teachers were not using think-alouds for instructional purposes (Kucan & Beck, 1997).

Research indicates that think-alouds can indeed facilitate children’s comprehension-monitoring abilities (e.g., Baumann et al., 1993) and their acquisition of subject matter concepts (e.g., Ortlieb & Norris, 2012). Think-alouds seem to be effective for very young children as well as for children with disabilities and L2 readers (e.g., Migyanka et al., 2005). Most research into think-alouds as a pedagogical tool has focused on reading instruction; however, think-alouds have been applied to writing instruction as well (e.g., Fisher et al., 2008). Finally, think-alouds have also been used for diagnostic assessment, a type of formative assessment that helps teachers identify students’ strengths and weaknesses and assists their deeper engagement with texts through dialogue (Chun & Jang, 2012; Jang et al., 2013).

Challenges and Considerations for Implementing Verbal Reports with Young Language Learners

When using verbal reports as an inquiry method for children, it is critically important to consider children’s developmental factors. With regard to children’s developing cognitive capacity, researchers must consider the potential cognitive burdens associated with think-alouds (i.e., conducting verbalization and problem solving simultaneously) and stimulated recall (i.e., relying on memory). Given that the previous studies using verbal reports, and particularly think-alouds, tend to focus on relatively older children (children at middle- and upper-grade levels and beyond), the use of verbal reports with younger children seems to require extra consideration. It is important to note, however, that there are substantial individual differences in cognitive, social, and emotional development among children with the same chronological age. Researchers should also not forget that children are vulnerable to power imbalances between adults and children as well as potential negative emotions during experiments with verbal reports.

As mentioned already, there has not been a lot of empirical research using verbal reports in the field of child L2/FL acquisition, but many studies of verbal reports in child development and education can offer us valuable insights into using verbal reports as a research tool for child L2/FL research. This section discusses issues associated with (a) veridicality and reactivity; (b) planning and administering verbal reports; and (c) analyzing and interpreting children’s verbal reports.

Veridicality and Reactivity: Conceptual Issues

As discussed above, veridicality (accuracy of verbal responses) and reactivity (changes in performance/behavior as a result of producing verbal reports) are two major potential concerns with verbal reports as a method of inquiry. These concerns can be more serious when using verbal reports with children, for the reasons described below.


Child development research has found frequent discrepancies between children’s verbal responses and their behavioral responses. The research also suggests discrepancies between children’s verbal reports and their thinking (Siegler, 2000; Woolley, 2006). In fact, agreeing with Siegler (2000), Woolley stated that such discrepancies are “a norm rather than exception” (Woolley, 2006, p. 1549). For example, in a study on mathematical problem solving, Goldin-Meadow (1997) reported that children’s hand gestures showed an accurate understanding of a math concept while their verbal explanation did not. Similarly, Zelazo et al. (1996) found that children verbalized one rule for a game but played the game using a different rule. Such discrepancies might have been due to the fact that the cognitive capacity required for the task execution exceeded the children’s verbalizable knowledge. It is, therefore, reasonable to assume that there can be differences in what is reflected in verbal and behavioral responses.

Children might also change the nature of a task if they are asked to vocalize the process (Byrd et al., 2004). Woolley (2006) stated that verbal responses are viewed by children as goal-driven activities that are intended “to convey knowledge about their thought processes to the experimenter,” whereas behavioral responses are “an involuntary accompaniment to the verbal explanation” (p. 1545). Woolley also argued that verbal responses contain children’s judgment and explanation, whereas behavioral responses reflect their uncertain, automatic, unintentional, or spontaneous thought. In other words, verbal responses more or less reflect children’s explicit thoughts, whereas behavioral responses reflect their somewhat implicit thought. If this conceptualization is correct, it is important to remember that not all explicit thoughts can be elicited through verbal reports. In addition, verbal reports most likely fail to capture implicit thoughts, which may play a significant role in children’s language learning and use. During the course of development, children’s knowledge is largely unstable, uncertain, gradual, and transitional. Multiple levels of knowledge often coexist and thus can be hard to generalize through vocalization (Woolley, 2006).

Issues with Planning and Administering Verbal Reports

The possible gaps in children’s verbal and behavioral responses discussed in the preceding subsection seem to be induced by a number of factors. Such factors include children’s age in addition to the modality of the tasks, the type and difficulty level of the tasks, the types of questions asked, the amount of planning time, children’s experience with the task, and children’s relationship with the person administering the experiment (e.g., whether he/she is an acquaintance) (Woolley, 2006). Researchers, therefore, should pay sufficient attention to these factors when planning and administering verbal reports in their work.

First, when planning a task for verbal reports, the cognitive load required for completing the task, as well as any social and emotional elements associated with the task, should be carefully examined in relation to the child’s developmental levels and experiences. To manipulate cognitive demands and difficulty levels,

Verbal Reports and Language Learning 73

Robinson’s (2007) triadic componential framework for tasks would be helpful. In Robinson’s framework, tasks can be classified according to three dimensions: task complexity (cognitive demands); task condition (interactive factors); and task difficulty (learner factors) (p. 164). Based on this framework, when designing a writing task based on a series of wordless pictures for think-alouds, for example, researchers can manipulate cognitive demands by changing the number of pictures or protagonists appearing in the story, changing the order in which the pictures are presented, changing the type of story used (e.g., using one that requires reasoning and perspective taking instead of one that does not), and so forth. Researchers should keep in mind that verbalization during language tasks imposes additional cognitive demands.

Second, when offering instructions to children, it is important to provide them with sufficient preparation time for the tasks as well as for producing verbal reports (Greene et al., 2011). Warm-up sessions or training are desirable or even necessary, particularly for think-alouds. The instructions for conducting verbal reports should be comprehensible to children so that they can easily associate them with familiar experiences. Meyers and her colleagues (1990), for example, told children to be like a sports broadcaster and to try to report everything that occurred in their mind. Considering the potential benefits of using verbal reports as an instructional tool, the training itself can be an opportunity for learning. Depending on the research questions and types of data anticipated, researchers may prepare a structure for children to follow. Offering some modeling may work as well. However, caution is necessary; the training sessions and modeling may alter children’s thought processes to reflect what is expected by the researcher.

Third, when stimulated recall is conducted among children, efforts should be made to minimize their memory load. 
Thus, it is critical to keep the interval between the event/activity in question and the recall as short as possible (Lyle, 2003). It is well established that one’s working memory and information-processing speed increase with age up to around age 18 (Schneider, 2014); thus, minimizing participants’ memory load in stimulated recall is important for adult learners (Gass & Mackey, 2000, 2017) but is particularly crucial when the method is used with children. Stimulated recall is also usually accompanied by some sort of aid (e.g., pictures, videos, or verbal prompts) to assist with the recall. Indeed, there is evidence that incorporating videos and photographs can be beneficial (e.g., Cutter-Mackenzie et al., 2015; Hyvönen et al., 2014). Videos have also been used for triangulation purposes; specifically, researchers have analyzed video recordings of children’s behaviors in combination with children’s verbal reports in order to increase the validity of their examination or avoid bias in their interpretation (Theobald, 2008). For younger children (children up to lower primary school levels), researchers have used drawings in addition to stimulated recall (e.g., Gross & Hayne, 1999; Lee & Winke, 2018). Gross and Hayne (1999), for example, let children (5–6 years of age) draw pictures during the stimulated recall, which the authors referred to as a memory interview. Gross and Hayne found that drawing

74 Yuko Goto Butler

allowed the children to report more information. Moreover, the children’s delayed recall (administered 6 months later) showed the positive effects of drawing on children’s memory. Based on these results, Gross and Hayne suggested that drawing can facilitate children’s memory and their ability to verbalize their memory.

Fourth, in order to further ease any potential cognitive burden, verbal reports are often accompanied by some sort of prompt from examiners. It is important to keep in mind that, because children are inclined to be directed by adults (Greene & Hill, 2005), how prompts are offered can greatly influence children’s verbal responses. According to Meier and Vogt (2015), “what” questions “help to clarify understanding and fill in knowledge gaps,” whereas “why” questions “are more thought-provoking and may lead to higher-order thinking” (p. 48). But no matter which types of questions are used, they should be open-ended enough that they welcome any kind of response and avoid giving children the impression that they are being tested. Lyle (2003) also suggested that stimulated recall should have “consonance” with children’s cognitive organization (p. 861); the interview questions should be closely aligned with the mental processes at work in children’s behaviors. When a video or audio recording accompanies stimulated-recall activities, researchers must make a number of decisions. Verbal reports can be collected as the video/audio is being played back without interruption, or the reports can be collected after interrupting the playback. When the latter option is chosen, the researcher must decide who should interrupt the playback (i.e., the child or the researcher) and when/where it should be interrupted. If the researcher stops the video or pre-selects specific sequences from the video for stimulated recall, he/she can focus on a certain cognitive process efficiently. However, criteria should be laid out in advance to avoid bias (Meier & Vogt, 2015). 
Alternatively, the researcher can ask children to verbalize whenever they want to and to comment on whatever they want to. This approach allows the researcher to identify what children are interested in and focused on (Theobald, 2008). Moreover, researchers can also have children elaborate on their initial responses by engaging in dialogue with them (Hyvönen et al., 2014). In any event, all these decisions must be made based on the nature of the research questions and the type of information that the researcher expects to find in the data.

Finally, great care must be taken to ease children’s emotional distress, such as anxiety and demotivation. Verbal reports should be collected in a space that children find comfortable and in a manner that puts them at ease; ideally, the children should be familiar with the person collecting the reports (Lyle, 2003; Meier & Vogt, 2015). Adult participants often complete think-alouds without having an examiner present, an approach that may be difficult for children. The presence of a familiar examiner may serve as an emotional and motivational support, at least for some children, to complete the task, although the specific effects of the presence or absence of an examiner on children’s verbal reports require empirical investigation. To reduce children’s anxiety, some researchers

suggest using pair- or group-stimulated recall, with younger children in particular (e.g., DeWitt & Osborne, 2010; Meier & Vogt, 2015; Morgan, 2007). Stimulated recall in pair or group formats can facilitate children’s participation, and it allows the researcher to elicit comments that would not be possible in individually conducted recall (e.g., comments based on their shared knowledge) (Cutter-Mackenzie et al., 2015; Fleer, 2008). Verbal responses produced in pair and group formats may better reflect children’s thoughts if the researcher has less direct involvement in the recall processes. However, verbal reports produced in pair and group formats are at greater risk of being influenced by dynamic social relationships among children (Als et al., 2005; Meier & Vogt, 2015; see also Chapter 4 in this volume).
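
The planning decisions discussed in this subsection can be written down as a structured record before data collection begins, which makes it easier to check that nothing has been left implicit. The Python sketch below is only an illustration: the `RecallPlan` fields, the `check_plan` helper, and the 60-minute delay threshold are assumptions introduced here for the example and do not come from the studies cited in this chapter.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RecallPlan:
    """Hypothetical record of design decisions for one stimulated-recall session."""
    child_age: int                       # age in years
    delay_minutes: float                 # time between the activity and the recall
    aids: List[str] = field(default_factory=list)  # e.g., "video", "drawing"
    playback_stopped_by: str = "child"   # "child", "researcher", or "none"
    stop_criteria: str = ""              # criteria written in advance, if any
    grouping: str = "individual"         # "individual", "pair", or "group"
    examiner_familiar: bool = True       # does the child know the examiner?


def check_plan(plan: RecallPlan, max_delay_minutes: float = 60.0) -> List[str]:
    """Return reminders for decisions that may need a second look.

    The threshold is an arbitrary placeholder, not an empirical cutoff.
    """
    notes = []
    if plan.delay_minutes > max_delay_minutes:
        notes.append("Shorten the delay between the activity and the recall.")
    if not plan.aids:
        notes.append("Consider a recall aid such as video, photographs, or drawing.")
    if plan.playback_stopped_by == "researcher" and not plan.stop_criteria:
        notes.append("Lay out criteria for stopping the playback in advance.")
    if plan.grouping != "individual":
        notes.append("Watch for peer dynamics influencing the reports.")
    if not plan.examiner_familiar:
        notes.append("Arrange for a familiar person to collect the reports if possible.")
    return notes


# Example: a plan with a long delay, no aids, and unplanned researcher stops
plan = RecallPlan(child_age=6, delay_minutes=120, playback_stopped_by="researcher")
for note in check_plan(plan):
    print(note)
```

A record of this kind does not replace judgment about a particular child or setting; it simply keeps the decisions visible so that they can be reported alongside the findings.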

Issues with Analyzing and Interpreting Children’s Verbal Reports

When analyzing and interpreting verbal data elicited from children, it is important to remember that there are substantial individual differences in verbalization abilities (both in terms of volume and quality) among children (Meier & Vogt, 2015). Elicited data may not tell the researchers the whole story. When video or audio aids are used along with the verbalization, researchers should acknowledge that bringing a video or audio recorder into a learning context itself creates a risk of altering children’s behaviors (Fleer, 2008). Given that discrepancies between verbal and nonverbal behaviors among children are rather common, researchers may want to analyze nonverbal data (e.g., gestures, eye gazes, and eye movements) together with verbal data.
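
One simple way to examine verbal and nonverbal data together is to code each episode twice, once from what the child said and once from what the child did, and then locate the episodes where the two codes differ. The Python sketch below assumes such episode-level codes already exist; the function name `compare_codes` and the code labels are hypothetical and serve only to illustrate the idea.

```python
from typing import List, Tuple


def compare_codes(verbal: List[str], behavioral: List[str]) -> Tuple[float, List[int]]:
    """Return the proportion of matching episodes and the indices of mismatches."""
    if len(verbal) != len(behavioral):
        raise ValueError("Each episode needs one verbal and one behavioral code.")
    if not verbal:
        raise ValueError("No episodes to compare.")
    # An episode is a mismatch when the two codes are not identical.
    mismatches = [i for i, (v, b) in enumerate(zip(verbal, behavioral)) if v != b]
    agreement = 1 - len(mismatches) / len(verbal)
    return agreement, mismatches


# Hypothetical codes for four episodes of a rule-based game
verbal_codes = ["rule_A", "rule_A", "rule_B", "rule_A"]
behavioral_codes = ["rule_A", "rule_B", "rule_B", "rule_B"]

agreement, mismatches = compare_codes(verbal_codes, behavioral_codes)
print(agreement)   # 0.5
print(mismatches)  # [1, 3]
```

The mismatched episodes are often the most informative ones to revisit qualitatively, since they may mark knowledge that is in transition rather than simple reporting error.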

Implications for Child SLD Researchers

When considering the potential implications of using verbal reports for child SLD research, I focus on two areas of great interest to researchers and educators alike: using digital technology and taking a child-centered approach.

Using Digital Technology

Verbal reports represent nonautomatized processes and strategies. But automatized processes or implicit knowledge can play a significant role in children’s L2/FL learning (DeKeyser, 2000). Thus, it is advisable to combine verbal reports with data obtained from automatized or implicit sources. Advances in technology allow researchers to use physiological and neuroscientific measures, in addition to traditional observation-based behavioral data, together with verbal reports. Bell and her colleagues (2018) surveyed major physiological and neuroscientific measures that can be combined with verbal reports, including eye movements, pupil arousal (pupillometry), heart rate, skin conductance response (electrodermal activity,

EDA), electrical activity in the brain (electroencephalography, EEG), blood-oxygen signals in the brain (functional magnetic resonance imaging, fMRI), and so forth (see also Chapters 8 and 9 in this volume). With the exception of eye movement (e.g., Lee & Winke, 2018), these measures have rarely been used in child L2/FL studies, but they have the potential to deepen our understanding by accessing unconscious mechanisms that verbal data alone cannot reveal. Studies combining eye-tracking data with think-alouds, for example, open a door to better understanding the underlying mechanisms during pauses in think-alouds, the relationship between eye movements and cognitive processing, and so forth (e.g., Oh et al., 2013).

Advances in digital technology also expand possible topics for future studies. Because many children grow up with digital technology, it would be fruitful to investigate how and to what extent digital technology influences the way that children comprehend and produce language online, and how their mental processes for reading and writing online are different from or similar to paper-based reading and writing. Because online reading and writing often involve the use of multimodal information as well as frequent multitasking, it is reasonable to assume that children’s reading and writing processing have undergone a great deal of change (Coiro, 2011). Research on this topic would have significant practical implications, and verbal reports can be a promising way to investigate such issues.

Thanks to technological advances, data collection and analyses are also increasingly efficient and diversified. For instance, data can be digitized and shared widely among teachers and learners to instruct or develop strategies for reading and writing. There are also online sites where children can upload digital think-aloud files for others to share (e.g., White, 2016; see Note 2).
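
To make the idea of combining eye-tracking data with think-alouds more concrete, the following Python sketch shows one way a pause in a think-aloud could be matched with the fixations recorded during it. It is a minimal sketch under stated assumptions: both data streams are assumed to share a single clock measured in seconds from task onset, and the `Interval` structure, the `fixations_during` helper, and the area-of-interest labels are hypothetical rather than part of any particular eye-tracking package.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Interval:
    start: float  # seconds from task onset
    end: float    # seconds from task onset
    label: str    # area of interest for a fixation, or "pause" for a silence


def fixations_during(pause: Interval, fixations: List[Interval]) -> List[Interval]:
    """Return the fixations that overlap the pause in time."""
    # Two intervals overlap when each one starts before the other ends.
    return [f for f in fixations if f.start < pause.end and f.end > pause.start]


# Hypothetical data: one silent pause and three fixations
pause = Interval(4.0, 6.5, "pause")
fixations = [
    Interval(3.0, 4.2, "picture_1"),
    Interval(4.2, 5.0, "picture_2"),
    Interval(6.5, 7.0, "text"),
]

for f in fixations_during(pause, fixations):
    print(f.label)  # prints picture_1, then picture_2
```

In practice, synchronizing the audio recording with the eye tracker is the harder step, and how a pause is defined (for instance, its minimum length) is an analytic decision that should be reported.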

Taking a Child-Centered Approach

As with other child development research, it is advisable to seek a child-centered approach in research using verbal reports. In child psychology, the notion of researching with children, as opposed to researching on children, has gained substantial attention (Christensen & James, 2017). Traditionally, researchers, including those using verbal reports, have treated children as research objects (e.g., adults giving children some measures in a controlled setting) or research subjects (e.g., adults observing children and interpreting their behaviors). The researching with children approach, however, considers children’s agency in their learning and their experience to be critical and central to the project. Children are treated as social actors and sometimes even invited to participate as quasi- or co-researchers in the studies (Charters, 2003). In research with verbal reports, there is room for researchers to endorse the idea that “children are experts in their own rights” (Dahl, 2014, p. 595) and to grant them greater autonomy. For example, empowering children to control when and how to verbalize their thoughts can give them more autonomy. Researchers can even invite children to analyze and interpret the data. The digital think-aloud site that I mentioned at the end of the preceding section

(White, 2016) could be a step toward building a child-led community of mutual learning.

In conclusion, verbal reports have been used as a research tool to better understand children’s knowledge, cognitive processes, and strategies related to various language activities. When verbal reports are implemented with children, researchers need to consider issues related to children’s age and experience. Verbal reports have also been used as an effective pedagogical tool to support children’s language development. For the future, researchers are encouraged to explore additional angles of using think-alouds with children, to take advantage of advancing technology to triangulate the data, and to adopt a more child-centered approach when implementing verbal reports.

Further Readings

Bowles, M. A. (2010a). The think-aloud controversy in second language research. Routledge.
Gass, S. M., & Mackey, A. (2000). Stimulated recall methodology in second language research. Lawrence Erlbaum.
Gass, S. M., & Mackey, A. (2017). Stimulated recall methodology in applied linguistics and L2 research. Routledge.

Gass and Mackey (2000) is a pioneering work on stimulated recall in SLD research, and Gass and Mackey (2017) is an updated account of the use of the method in the field. Bowles (2010a) focuses on the think-aloud method in SLD. For a general, comprehensive introduction to stimulated recall and think-alouds, these books are highly recommended even though they are not meant specifically for children.

Discussion Questions

1. Identify the strengths and weaknesses of think-aloud and stimulated-recall methods. Come up with a series of potential research questions for each method.
2. Develop a set of reading and writing think-aloud tasks for L2/FL-learning primary school students. Identify potential cognitive, social, and affective loads for each task.
3. Discuss ethical considerations when using think-alouds and stimulated recall with children.

Notes

1 Behaviorism and mentalism are major theoretical approaches to learning. Behaviorism emphasizes the role of environmental factors in learning. By exclusively focusing on observable behaviors, behaviorism explains behaviors as a result of external stimulus and response mechanisms, that is, through interaction with the external environment. In contrast, mentalism emphasizes the role of internal innate factors in learning and views learning as primarily biologically determined.
2 An example of such sites is tinyurl.com/screencast-clip-B, discussed in White (2016).

References

Als, B. S., Jensen, J. J., & Skov, M. B. (2005). Comparison of think-aloud and constructive interaction in usability testing with children. In Proceedings of the 4th International Conference for Interaction Design and Children. University of Colorado.
Alvermann, D. E. (1984). Second graders’ strategic preferences while reading basal stories. Journal of Educational Research, 77(3), 184–189. https://doi.org/10.1080/00220671.1984.10885521
Barkaoui, K., Brooks, L., Swain, M., & Lapkin, S. (2013). Test-takers’ strategic behaviors in independent and integrated speaking tasks. Applied Linguistics, 34, 304–324. https://doi.org/10.1093/applin/ams046
Baumann, J. F., Jones, L. A., & Seifert-Kessell, N. (1993). Using think alouds to enhance children’s comprehension monitoring abilities. The Reading Teacher, 47(3), 184–193. https://www.jstor.org/stable/20201231
Bell, L., Vogt, J., Willemse, C., Routledge, T., Butler, L. T., & Sakai, M. (2018). Beyond self-report: A review of physiological and neuroscientific methods to investigate consumer behavior. Frontiers in Psychology, 9(1655). https://doi.org/10.3389/fpsyg.2018.01655
Bloom, B. (1953). Thought-processes in lectures and discussions. In S. J. French (Ed.), Accent on teaching: Experiments in general education (pp. 23–46). Harper.
Bosher, S. (1998). The composing processes of three Southeast Asian writers at the post-secondary level: An exploratory study. Journal of Second Language Writing, 7, 205–241. https://doi.org/10.1016/s1060-3743(98)90013-3
Bowles, M. A. (2010a). The think-aloud controversy in second language research. Routledge.
Bowles, M. A. (2010b). Concurrent verbal reports in second language acquisition research. Annual Review of Applied Linguistics, 30, 111–127. https://doi.org/10.1017/s0267190510000036
Bowles, M. A. (2019). Verbal reports in instructed SLA. In R. P. Leow (Ed.), The Routledge handbook of second language research in classroom learning (pp. 32–43). Routledge.
Breetvelt, I., van den Bergh, H., & Rijlaarsdam, G. (1994). Relations between writing processes and text quality: When and how? Cognition and Instruction, 12(2), 103–123. https://doi.org/10.1207/s1532690xci1202_2
Brown, A. (1993). The role of test taker feedback in the test development process: Test taker’s reactions to a tape-mediated test of proficiency in spoken Japanese. Language Testing, 10(3), 277–301. https://doi.org/10.1177/026553229301000305
Butler, Y. G. (2018a). The role of context in young learners’ processes for responding to self-assessment items. The Modern Language Journal, 102(1), 242–261. https://doi.org/10.1111/modl.12459
Butler, Y. G. (2018b). Young learners’ processes and rationales for responding to self-assessment items: Cases of generic can-do and five-point Likert-type formats. In J. Davis et al. (Eds.), Useful assessment and evaluation in language education (pp. 21–39). Georgetown University Press.
Butler, Y. G. (2020). The ability of young learners to construct word meaning in context. Studies in Second Language Learning and Teaching, 10(3), 549–580. https://orcid.org/0000-0002-9531-3469
Byrd, D. L., van der Veen, T. K., McNamara, J. P. H., & Berg, K. (2004). Preschoolers don’t practice what they preach: Preschoolers’ planning performances with manual and spoken response requirements. Journal of Cognition and Development, 5, 427–449. https://doi.org/10.1207/s15327647jcd0504_2

Charters, E. (2003). The use of think-aloud methods in qualitative research: An introduction to think-aloud methods. Brock Education, 12(2), 68–82. https://doi.org/10.26522/brocked.v12i2.38
Christensen, P., & James, A. (Eds.). (2017). Research with children: Perspectives and practices (3rd ed.). Routledge.
Chun, C. W., & Jang, E. E. (2012). Dialogic encounters with early readers through mediated think-alouds: Constructing the transactional zone. Language and Literacy, 14(3), 61–82. https://doi.org/10.20360/g2d01k
Coiro, J. (2011). Talking about reading as thinking: Modelling the hidden complexities of online reading comprehension. Theory Into Practice, 50, 107–115. https://doi.org/10.1080/00405841.2011.558435
Coiro, J., & Dobler, E. (2007). Exploring online reading comprehension strategies used by skillful sixth grade readers to search for and locate information on the Internet. Reading Research Quarterly, 42(2), 214–257. https://doi.org/10.1598/rrq.42.2.2
Coté, N., Goldman, S. R., & Saul, E. U. (1998). Students making sense of informational text: Relations between processing and representation. Discourse Processes, 25(1), 1–53. https://doi.org/10.1080/01638539809545019
Cutter-Mackenzie, A., Edwards, S., & Quinton, H. W. (2015). Child-framed video research methodologies: Issues, possibilities and challenges for researching with children. Children’s Geographies, 13(3), 343–356. https://doi.org/10.1080/14733285.2013.848598
Dahl, T. I. (2014). Children as researchers: We have a lot to learn. In G. B. Melton, A. Ben-Arieh, J. Cashmore, G. S. Goodman & N. K. Worley (Eds.), The SAGE handbook of child research (pp. 593–618). Sage.
Davidson, M., & Berninger, V. (2016). Thinking aloud during idea generating and planning before written translation: Developmental changes from ages 10 to 12 in expressing and defending opinions. Cogent Psychology, 3(1). https://doi.org/10.1080/23311908.2016.1276514
DeKeyser, R. (2000). The robustness of critical period effects in second language acquisition. Studies in Second Language Acquisition, 22(4), 499–533. https://doi.org/10.1017/s0272263100004022
Deschambault, R. (2018). Activity managed products: Think-aloud data and methods in applied linguistics research. Applied Linguistics Review, 9(4), 539–562. https://doi.org/10.1515/applirev-2017-0028
DeWitt, J., & Osborne, J. (2010). Recollections of exhibits: Stimulated-recall interviews with primary school children about science centre visits. International Journal of Science Education, 32, 1365–1388. https://doi.org/10.1080/09500690903085664
Dunn, M. W. (2011). Writing-skills instruction: Teachers’ perspectives about effective practices. Journal of Reading Education, 37(1), 18–25.
Egi, T. (2008). Investigating stimulated recall as a cognitive measure: Reactivity and verbal reports in SLA research methodology. Language Awareness, 17(3), 212–228. https://doi.org/10.1080/09658410802146859
Ericsson, K. A., & Fox, M. C. (2011). Thinking aloud is not a form of introspection but a qualitatively different methodology: Reply to Schooler (2011). Psychological Bulletin, 137(2), 351–354. https://doi.org/10.1037/a0022388
Ericsson, K., & Simon, H. (1984). Protocol analysis: Verbal reports as data. MIT Press.
Ericsson, K., & Simon, H. (1996). Protocol analysis: Verbal reports as data (3rd ed.). MIT Press.

Fisher, D., Frey, N., & Lapp, D. (2008). Shared readings: Modeling comprehension, vocabulary, text structures, and text features for older readers. The Reading Teacher, 61(7), 548–557. https://doi.org/10.1598/rt.61.7.4
Fleer, M. (2008). Using digital video observations and computer technologies in a cultural-historical approach. In M. Hedegaard & M. Fleer (Eds.), Studying children: A cultural-historical approach (pp. 111–117). McGraw Hill, Open University Press.
Fox, M. C., Ericsson, K. A., & Best, R. (2011). Do procedures for verbal reporting of thinking have to be reactive? A meta-analysis and recommendations for best reporting methods. Psychological Bulletin, 137, 316–344. https://doi.org/10.1037/a0021663
Gass, S. (1994). The reliability of second-language grammaticality judgements. In E. Tarone, S. Gass & A. Cohen (Eds.), Research methodology in second-language acquisition (pp. 303–322). Lawrence Erlbaum.
Gass, S. M., & Mackey, A. (2000). Stimulated recall methodology in second language research. Lawrence Erlbaum.
Gass, S. M., & Mackey, A. (2017). Stimulated recall methodology in applied linguistics and L2 research. Routledge.
Goh, C. C. (1998). How ESL learners with different listening abilities use comprehension strategies and tactics. Language Teaching Research, 2(2), 124–147. https://doi.org/10.1191/136216898667461574
Goldin-Meadow, S. (1997). When gestures and words speak differently. Current Directions in Psychological Science, 6, 138–143. https://doi.org/10.1111/1467-8721.ep10772905
Gordon, C. J. (1990). Modeling an expository text structure strategy in think alouds. Reading Horizons, 31, 149–167. https://scholarworks.wmich.edu/reading_horizons/vol31/iss2/6
Greene, J. A., Robertson, J., & Costa, L. J. (2011). Assessing self-regulated learning using think-aloud methods. In B. J. Zimmerman & D. H. Schunk (Eds.), Handbook of self-regulation of learning and performance (pp. 313–328). Routledge.
Greene, S., & Hill, M. (2005). Conceptual, methodological and ethical issues in researching children’s experiences. In S. Greene & D. Hogan (Eds.), Researching children’s experience: Methods and approaches (pp. 1–21). Sage.
Gross, J., & Hayne, H. (1999). Drawing facilitates children’s verbal reports after long delays. Journal of Experimental Psychology: Applied, 5(3), 265–283. https://doi.org/10.1037/1076-898x.5.3.265
Hawkins, B. (1985). Is the appropriate response always so appropriate? In S. Gass & C. Madden (Eds.), Input in second language acquisition (pp. 162–178). Newbury House.
Hosenfeld, C. (1977). A preliminary investigation of the reading strategies of successful and nonsuccessful second language learners. System, 5(2), 110–123. https://doi.org/10.1016/0346-251x(77)90087-2
Hu, J., & Gao, X. A. (2017). Using think-aloud protocol in self-regulated reading research. Educational Research Review, 22, 181–193. https://doi.org/10.1016/j.edurev.2017.09.004
Hyvönen, P., Kronqvist, E., Järvelä, S., Määttä, E., Mykkänen, A., & Kurki, K. (2014). Interactive and child-centred research methods for investigating efficacious agency of children. Journal of Early Childhood Education Research, 3(1), 82–107. https://jecer.org/fi/issues/jecer-31-2014/
James, W. (1890). The principles of psychology (Vol. 1). Henry Holt.
Jang, E. E., Dunlop, M., Wagner, M., Kim, Y.-H., & Gu, Z. (2013). Elementary school ELLs’ reading skill profiles using cognitive diagnosis modeling: Roles of length of residence and home language environment. Language Learning, 63(3), 400–436. https://doi.org/10.1111/lang.12016

Jensen, S. H. (2019). Language learning in the wild: A young user perspective. Language Learning & Technology, 23(1), 72–86. https://doi.org/10125/44673
Jiménez, R. T., García, G. E., & Pearson, P. D. (1996). The reading strategies of bilingual Latina/o students who are successful English readers: Opportunities and obstacles. Reading Research Quarterly, 31(1), 90–112. https://doi.org/10.1598/rrq.31.1.5
Johnstone, C. J., Bottsford-Miller, N. A., & Thompson, S. J. (2006). Using the think aloud method (cognitive labs) to evaluate test design for students with disabilities and English language learners (Technical Report 44). University of Minnesota, National Center on Educational Outcomes.
Kim, H. I., & Choi, S.-Y. (2017). Inferential characteristics of poor comprehenders and typically developing children using the think-aloud method. Communication Sciences & Disorders, 22(4), 669–680. https://doi.org/10.12963/csd.17428
Kucan, L., & Beck, I. L. (1997). Thinking aloud and reading comprehension research: Inquiry, instruction, and social interaction. Review of Educational Research, 67(3), 271–299. https://doi.org/10.3102/00346543067003271
Lee, S., & Winke, P. (2018). Young learners’ response processes when taking computerized tasks for speaking assessment. Language Testing, 35(2), 239–269. https://doi.org/10.1177/0265532217704009
Leow, R. P., & Morgan-Short, K. (2004). To think aloud or not to think aloud: The issue of reactivity in SLA research methodology. Studies in Second Language Acquisition, 26(1), 35–57. https://doi.org/10.1017/s0272263104026129
Leu, D. J., Jr., Kinzer, C. K., Coiro, J., & Cammack, D. (2004). Toward a theory of new literacies emerging from the Internet and other information and communication technologies. In R. B. Ruddell & N. Unrau (Eds.), Theoretical models and processes of reading (5th ed.) (pp. 1570–1613). International Reading Association.
Levy, C. M., & Ransdell, S. (1995). Is writing as difficult as it seems? Memory and Cognition, 23, 767–779. https://doi.org/10.3758/bf03200928
Lyle, J. (2003). Stimulated recall: A report on its use in naturalistic research. British Educational Research Journal, 29(6), 861–878. https://doi.org/10.1080/0141192032000137349
Mackey, A., Gass, S., & McDonough, K. (2000). How do learners perceive interactional feedback? Studies in Second Language Acquisition, 22(4), 471–497. https://doi.org/10.1017/s0272263100004010
Meier, A. M., & Vogt, F. (2015). The potential of stimulated recall for investigating self-regulation processes in inquiry learning with primary school students. Perspectives in Science, 5, 45–53. https://doi.org/10.1016/j.pisc.2015.08.001
Meyers, J., Lytle, S., Palladino, D., Devenpeck, G., & Green, M. (1990). Think-aloud protocol analysis: Investigation of reading comprehension strategies in fourth- and fifth-grade students. Journal of Psychoeducational Assessment, 8, 112–127. https://doi.org/10.1177/073428299000800201
Migyanka, J. M., Policastro, C., & Lui, G. (2005). Using a think-aloud with diverse students: Three primary grade students with diverse experience. Early Childhood Education Journal, 33(3), 171–177. https://doi.org/10.1007/s10643-005-0045-z
Morgan, A. (2007). Using video-stimulated recall to understand young children’s perceptions of classroom settings. European Early Childhood Education Research Journal, 15(2), 213–226. https://doi.org/10.1080/13502930701320933
Nassaji, H. (2006). The relationship between depth of vocabulary knowledge and L2 learners’ lexical inferencing strategy use and success. The Modern Language Journal, 90(3), 387–401. https://doi.org/10.1111/j.1540-4781.2006.00431.x

Negretti, R., & McGrath, L. (2018). Scaffolding genre knowledge and metacognition: Insights from an L2 doctoral research writing course. Journal of Second Language Writing, 40, 12–31. https://doi.org/10.1016/j.jslw.2017.12.002
Ness, M., & Kenny, M. (2016). Improving the quality of think-alouds. The Reading Teacher, 69(4), 453–460. https://doi.org/10.1002/trtr.1397
Oh, K., Almarode, J. T., & Tai, R. T. (2013). An exploration of think-aloud protocols linked with eye-gaze tracking: Are they talking about what they are looking at. Procedia – Social and Behavioral Sciences, 93, 184–189. https://doi.org/10.1016/j.sbspro.2013.09.175
Ortlieb, E., & Norris, M. (2012). Using the think-aloud strategy to bolster reading comprehension of science concepts. Current Issues in Education, 15(1). http://cie.asu.edu/ojs/index.php/cieatasu/article/view/890
Oster, L. (2001). Using the think-aloud for reading instruction. The Reading Teacher, 55(1), 64–69. www.jstor.org/stable/20205012
Palincsar, A. S., & Brown, A. L. (1984). Reciprocal teaching of comprehension-fostering and comprehension-monitoring activities. Cognition and Instruction, 1(2), 117–175. https://doi.org/10.1207/s1532690xci0102_1
Piaget, J. (1952). Origins of intelligence in the child (A. Cook, Trans.). International Universities Press. (Original work published 1936)
Polio, C., Gass, S. M., & Chapin, L. (2006). Using stimulated recall to investigate native speaker perceptions in native-nonnative speaker interaction. Studies in Second Language Acquisition, 28(2), 237–267. https://doi.org/10.1017/s0272263106060116
Pressley, M., & Afflerbach, P. (1995). Verbal protocols of reading: The nature of constructively responsive reading. Erlbaum.
Rao, Z., Gu, P. Y., Zhang, L. J., & Hu, G. (2007). Reading strategies and approaches to learning of bilingual primary school pupils. Language Awareness, 16(4), 243–262. https://doi.org/10.2167/la423.0
Robinson, P. (2007). Task complexity, the cognition hypothesis and second language learning and performance. International Review of Applied Linguistics in Language Teaching, 45, 161–176. https://doi.org/10.1515/IRAL.2007.007
Rosenshine, B., & Meister, C. (1994). Reciprocal teaching: A review of the research. Review of Educational Research, 64(4), 479–530. www.jstor.org/stable/1170585
Sachs, R., & Polio, C. (2007). Learners’ uses of two types of written feedback on an L2 writing revision task. Studies in Second Language Acquisition, 29(1), 67–100. https://doi.org/10.1017/s0272263107070039
Sanchez, H. S., & Grimshaw, T. (2020). Stimulated recall. In J. McKinley & H. Rose (Eds.), The Routledge handbook of research methods in applied linguistics (pp. 312–323). Routledge.
Schneider, W. (2014). Memory developments in childhood. In U. Goswami (Ed.), The Wiley-Blackwell handbook of childhood cognitive development (pp. 347–376). Wiley Blackwell.
Schooler, J. W. (2011). Introspecting in the spirit of William James: Comment on Fox, Ericsson, and Best (2011). Psychological Bulletin, 137(2), 345–350. https://doi.org/10.1037/a0022390
Sheppard, C., & Ellis, R. (2018). The effects of awareness-raising through stimulated recall on the repeated performance of the same task and on a new task of the same type. In M. Bygate (Ed.), Learning language through task repetition (pp. 171–192). John Benjamins.
Siegler, R. S. (2000). The rebirth of children’s learning. Child Development, 71, 26–35. https://doi.org/10.1111/1467-8624.00115
Swain, M. (2006). Verbal protocols: What does it mean for research to use speaking as a data collection tool? In M. Chalhoub-Deville, C. A. Chapelle & P. Duff (Eds.), Inference

82

83

82

Verbal Reports and Language Learning 83

and generalizability in applied linguistics: Multiple research perspectives (pp. 97–113). John Benjamins. Swain, M., & Lapkin, S. (1998). Interaction and second language learning: Two adolescent French immersion students working together. Modern Language Journal, 82, 320–337. https://doi.org/10.1111/j.1540-4781.1998.tb01209.x Theobald, M. A. (2008). Methodological issues arising from video stimulated recall with young children. In Australian Association of Research in Education (AARE) Conference 2008, 30th November–4th December, 2008, Brisbane. Retrieved from https://eprints.qut.edu.au/ 17817/ Tode, T. (2012). Schematization and sentence processing by foreign language learners: A reading-time experiment and a stimulated-recall analysis. IRAL, 50, 161–187. https:// doi.org/10.1515/iral-2012-0007 Valfredini, A. (2015). Studying the process of writing in a foreign language: An overview of the methods. Journal of Language Teaching and Research, 6(5), 907–912. https://doi.org/ 10.17507/jltr.0605.01 Vygotsky, L. S. (1978). Thought and language. (E. Hanfmann & G. Vaker, Eds., Trans.). MIT Press. (Original work published 1934) White, A. (2016). Using digital think-alouds to build comprehension of online informational texts. The Reading Teacher, 69(4), 421–425. https://doi.org/10.1002/trtr.1438 Winke, P., Lee, S., Ahn, J. I., Choi, I., Cui,Y., & Yoon, H-J. (2018). The cognitive validity of child English language tests: What young language learners and their native-speaking peers can reveal. TESOL Quarterly, 52(2), 274–303. https://doi.org/10.1002/tesq.396 Woolley, J. D. (2006). Verbal-behavioral dissociations in development. Child Development, 77(6), 1539–1553. https://doi.org/10.1111/j.1467-8624.2006.00956.x Yoshida, R. (2008). Learners’ perceptions of corrective feedback in pair work. Foreign Language Annals, 41(3), 525–541. https://doi.org/10.1111/j.1944-9720.2008.tb03310.x Zelazo, P. D., Frye, D., & Rapus, T. (1996). 
An age-related dissociation between knowing rules and using them. Cognitive Development, 11, 37–63. https://doi.org/10.1016/ s0885-2014(96)90027-1 Zhang, L. J., Gu, P. Y., & Hu, G. (2008). A cognitive perspective on Singaporean primary school pupils’ use of reading strategies in learning to read in English. British Journal of Educational Psychology, 78(2), 245–271. https://doi.org/10.1348/000709907x218179 Zhang, L. J., & Zhang, D. (2020). Think-aloud protocols. In J. McKinley & H. Rose (Eds.), The Routledge handbook of research methods in applied linguistics (pp. 302–311). Routledge.


6 RESEARCH METHODS FOR EVALUATING SECOND LANGUAGE SPEECH PRODUCTION

Becky H. Huang and Rica Ramírez

Introduction

The number of children growing up learning more than one language has increased dramatically in the past few decades due to globalization and immigration trends (Baker & Wright, 2017). Young learners may learn a second language (L2) in a majority/societal language immersion context (e.g., learning English in the United States) or in a minority/foreign language instructed learning context (e.g., learning English in Mexico) (Huang & Kuo, 2020). The growth of this population brings a need to study their L2 production in order to build a fundamental understanding of L2 development. Evaluation of L2 production is also conducted to make educational decisions, such as young learners’ eligibility to receive or exit L2 service programs, and/or to meet the needs of stakeholders such as the government or parents (Chik & Besser, 2011; Huang, 2016).

Young L2 learners mainly rely on oral language for communication. L2 speech production serves as a cognitive tool for young learners to develop higher mental functions (Vygotsky, 1986) and is also a critical precursor to literacy (August & Shanahan, 2006). Additionally, L2 speech production is used to distinguish language-learning difficulties from a true developmental language disorder and/or from a lack of opportunities to learn the L2 (Westby & Hwa-Froelich, 2010; see also Chapter 10 in this volume). Given the rapidly growing young L2 learner population and the important role of oral language in their language development, this chapter focuses on L2 speech production methods for young learners/L2 children between the ages of 6 and 12, i.e., from kindergarten to primary/elementary school grades.

DOI: 10.4324/9780367815783-6



An Overview of L2 Speech Production Methods in the Literature

Speech production is generally defined as the oral production of language, which includes components of phonology, vocabulary, morphology, grammar, and discourse (De Jong et al., 2012). One conceptual model of speech production consists of three main constructs: complexity, accuracy, and fluency (CAF) (Ellis, 2009). In the CAF framework, accuracy is operationalized as the target-likeness of pronunciation (vowels and consonants), word stress, grammar, and so forth, whereas complexity generally refers to the complexity of sentences and grammatical structures. Fluency may be indexed by speech rate, articulation rate, and the number of pauses, repetitions, and repairs (Huang et al., 2018a, 2018b; Huang et al., 2017).

Researchers have also proposed a distinction between accentedness, intelligibility, and comprehensibility (Derwing & Munro, 1997). Intelligibility was defined as the extent to which listeners understand the intended message at the word and utterance levels, whereas comprehensibility referred to listeners’ perception of intelligibility at a higher level of understanding. Accentedness was generally measured by listeners’ perceptions of the degree of non-native accent in the L2 speech. Although related to intelligibility, accentedness could be a distinct dimension, as a strong accent does not necessarily impede intelligibility. Some researchers have argued that comprehensible rather than unaccented speech should be the L2 learning goal (Saito et al., 2016).

Techniques for eliciting young learners’ L2 speech production mainly fall into two categories: standardized norm-referenced assessments and language sampling. Both types of methods can be used to evaluate various components of speech production, from phonology to the hierarchical organization of the discourse. The following sections will provide more details about these methods.
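The two rate-based fluency indices named above can be illustrated with a short calculation. The sketch below is only one common way of operationalizing them and is an assumption here, since the chapter does not give formulas: speech rate is taken as syllables per second of total speaking time (pauses included), and articulation rate as syllables per second of phonation time (pauses excluded). The `SpeechSample` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SpeechSample:
    """Hypothetical summary of one child's recording (illustrative fields)."""
    syllables: int        # total syllables produced
    total_seconds: float  # total speaking time, including silent pauses
    pause_seconds: float  # summed duration of silent pauses


def speech_rate(sample: SpeechSample) -> float:
    """Syllables per second over the whole sample, pauses included."""
    return sample.syllables / sample.total_seconds


def articulation_rate(sample: SpeechSample) -> float:
    """Syllables per second of phonation time, pauses excluded."""
    return sample.syllables / (sample.total_seconds - sample.pause_seconds)


sample = SpeechSample(syllables=120, total_seconds=60.0, pause_seconds=20.0)
print(speech_rate(sample))        # 2.0
print(articulation_rate(sample))  # 3.0
```

Because articulation rate removes pause time from the denominator, it is never lower than speech rate for the same sample. A large gap between the two suggests that pausing, rather than slow articulation, is what lowers the overall rate.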

Standardized Norm-Referenced Assessments

The majority of commercially available norm-referenced assessments are designed for evaluating English speech production. From the structural linguistics perspective, language is a symbolic system consisting of discrete and hierarchical components such as phonetics (sounds), semantics (words), and grammar (sentences) (Chomsky, 1986), and norm-referenced assessments are available for all of these language components. Widely used instruments include the Diagnostic Evaluation of Articulation and Phonology (DEAP) (Dodd et al., 2002) and the Goldman-Fristoe Test of Articulation 2 (GFTA-2) (Goldman & Fristoe, 2000) for phonology; the Expressive Vocabulary Test (EVT) (Williams, 1997, 2007) and the Expressive One-Word Picture Vocabulary Test (EOWPVT-4) (Martin & Brownell, 2011) for vocabulary; and subtests in the Clinical Evaluation of Language Fundamentals (CELF) (Wiig et al., 2013), the Bilingual Syntax Measure (Burt et al., 1973), the Bilingual English-Spanish Assessment (BESA) (Peña et al., 2018), and the Bilingual English-Spanish Assessment-Middle Elementary (BESA-ME) for the evaluation of syntax/grammar. For the assessment of discourse or narrative production, the Test of Narrative Language (TNL) (Gillam & Pearson, 2004) and the Edmonton Narrative Norms Instrument (ENNI) (Schneider & Hayward, 2005) are popular. In addition, the Multilingual Assessment Instrument for Narratives (MAIN), developed by a group of researchers from Germany, was designed to assess the comprehension and production of narratives in children from 3 to 9 years of age who acquire one or more languages (Gagarina et al., 2012).

There are also a variety of norm-referenced assessments for evaluating young learners’ overall L2 speech production proficiency, also known as speaking proficiency. They are developed either by testing companies to meet assessment demands or by government entities for education accountability purposes. For example, a well-known commercially available test, Cambridge English: Young Learners, targets English learners between the ages of 6 and 12. The test was developed by Cambridge English Language Assessment in the early 1990s and is currently used worldwide. Other popular English proficiency assessments for young learners that were developed by testing companies include the Pearson Test of English Young Learners (PTE), LAS Links®, and IPT I Oral English. In response to the increasing assessment demands for young L2 learners, Educational Testing Service (ETS) has also developed TOEFL Primary®, which targets young English learners aged 8 and above (Cho et al., 2016). All of these commercially available English proficiency tests include a speaking subtest. Some of these English language assessments, such as Cambridge English: Young Learners, were developed and/or used mainly in English as a foreign language (EFL) contexts, whereas others, such as LAS Links® and IPT, target English as a second language (ESL) learners. Although they all measure English language proficiency, the purposes, constructs, and uses of ESL and EFL assessments vary (Wolf & Faulkner-Bond, 2016). For more details about these assessments, see the Tools and Resources section at the end of this chapter.

To meet education accountability requirements in migrant-receiving countries such as the United States and Canada, governments or educational organizations develop language proficiency assessments to evaluate young immigrant children’s L2 proficiency, and these assessments generally include a speech production/speaking component. To illustrate, the most widely adopted large-scale assessment for measuring the English proficiency of school-age L2 learners in the US is ACCESS for ELLs, which was developed by the World-class Instructional Design and Assessment (WIDA) consortium and is currently used in 40 different states. The other popular assessment is the English Language Proficiency Assessment for the 21st Century (ELPA21), which was developed by a consortium of states and was used in eight states as of 2018 (Huang & Flores, 2018). In contrast to the English language assessments developed by testing companies, such as LAS Links® and IPT, ACCESS for ELLs and ELPA21 both received funding support from the US Department of Education and were developed by consortia of states.


Although norm-referenced assessments are psychometrically ideal, the prescribed and rigid setting may induce anxiety in children, particularly younger children such as kindergarteners and children in early elementary grades. These assessments should also be used with caution when evaluating culturally and linguistically diverse children (Heilmann et al., 2010a). For example, although labeling is a popular task in English vocabulary measures, some cultures may de-emphasize labeling, and children may thus be inadvertently identified as being at risk of language or learning problems due to their low performance on the language tests (Strauss et al., 2006).

Language Sampling Methods

Language sampling methods use samples of children’s speech to evaluate their language proficiency. These methods vary in the sampling procedures, the genres of the elicited samples (e.g., fictional, conversational), and the analysis of samples. Sampling procedures can be characterized as either naturalistic sampling or structured elicitation.

Naturalistic sampling involves recording children’s L2 production and interaction as it occurs using audio or video recorders. To gain a comprehensive picture of child language production, researchers generally sample across different contexts, such as book reading, free play, and meal times. This sampling method is less intrusive and may reduce performance and test anxiety (see also Chapter 2 in this volume). Some technological resources are available for naturalistic sampling. For example, Language ENvironment Analysis (LENA) is a commercial product that allows for the audio-recording of a child’s language environment one day at a time (www.lena.org/technology/). LENA uses a small battery-operated wearable device that can capture a full day of a child’s language input and production. The audio files are then transferred to cloud-based software that uses algorithms to distinguish between adult speech, child speech, and noise. The software then generates analysis and feedback reports on the quantity and quality of talk in the child’s environment based on complex algorithms and child language research. LENA is normed and validated in five different languages: English, Spanish, European French, the Shanghai dialect of Mandarin Chinese, and Swedish. Although it is designed and validated for young children between the ages of 0 and 4, it has the potential to be expanded for use with older children (Jones et al., 2019).

In contrast to naturalistic sampling, a structured sampling task involves providing a specific prompt to the child to elicit L2 production. If the goal is to capture infrequent features of speech production, such as certain morphological errors, naturalistic sampling may not be the best method, as it would require a large amount of recording to collect enough data. The target features of structured sampling tasks may range from sounds (e.g., specific vowels and consonants), single words, phrases, and sentences to discourse. Task types generally fall into one of the following categories: imitation/repetition, read-aloud, and the elicitation of personal narratives or fictional stories. Imitation/repetition and read-aloud tasks are commonly used to evaluate phonological production, such as the degree of non-native accent in L2 speech production. Imitation/repetition tasks are also common for evaluating morphological and grammatical knowledge (Peña et al., 2018). To assess vocabulary, grammar, discourse organization, and communicative performance, the elicitation of extended discourse/narratives is popular among researchers and practitioners (Soodla & Kikas, 2010). Personal narratives can be elicited using techniques such as the Conversational Map Procedure proposed by McCabe and Rollins (1994), though this technique has mostly been applied to preschool-aged children. To use this procedure, the elicitor first describes a brief personal experience and then uses prompts to ask the child to recount a similar experience. Fictional narratives can be elicited using wordless picture books. For example, the frog picture book series by Mayer (e.g., 1967, 1969, 1971, 1974) is very popular for story telling/re-telling tasks for children in kindergarten and early primary/elementary school grades. For older children in upper primary/elementary grades and even adolescents, Doctor DeSoto (Steig, 2013) and the Renfrew Bus Story Test (Glasgow & Cowley, 1994) are widely adopted.

After language samples are collected, depending on the type and length of the samples and the research questions, they may either be scored by human judges or machines (i.e., automated scoring) or be transcribed and formatted for linguistic analysis. For example, for questions about phonological production such as specific vowels or consonants, perceived degrees of non-native accent, or overall speaking proficiency, human judgments and annotations are commonly used to determine accuracy or proficiency outcomes. To use this scoring method, it is important to include a clear rubric and to provide rater training and calibration to ensure high intra- and inter-rater reliability. Some studies have found that, without training, human raters’ characteristics such as their own L2 proficiency and familiarity with non-native accents may influence their judgments of L2 speech production (Huang et al., 2016; Winke & Gass, 2013; Winke et al., 2013). Rater training has been shown to result in improved rater reliability. Given that there is larger variation in child speech than in adult speech, rater training that addresses the characteristics of child speech would help ensure high reliability.

For an analysis of vocabulary and grammar, such as lexical diversity and sentence complexity, language samples are generally fully transcribed before the analysis. The transcribers should be native or advanced speakers of the target language, and a second transcriber is recommended to check the accuracy of transcription. Extended language samples, in particular story narratives, may also be coded or analyzed for the assessment of macrostructure features that relate to the hierarchical organization of the language samples. Macrostructure features can be independent of language proficiency and include both episodic structure and story grammar components such as goals, attempts, and outcomes (Heilmann et al., 2010a). Goals are generally defined as the character’s reaction to events, and attempts are the character’s effort to accomplish a goal. Outcomes relate to whether the character reaches the goal. Inter-rater reliability should be reported for evaluations that involve human judgments.

Specialized software programs are available for analyzing different components of transcribed language samples. For the analysis of acoustic/phonetic and phonological production, PHON (Rose et al., 2007) and PRAAT (Boersma & Weenink, 2019) are widely used free software. For vocabulary, morphology, and syntax, Child Language Analysis (CLAN) (MacWhinney & Snow, 1990), Systematic Analysis of Language Transcripts (SALT) (Miller & Iglesias, 2012), and Sampling Utterances and Grammatical Analysis Revised (SUGAR; Pavelko & Owens, 2019) are three popular options. These programs may have their own technical and formatting requirements. For example, to use CLAN, all transcripts should be converted into the Codes for the Human Analysis of Transcripts (CHAT) format, a standard transcription system (https://talkbank.org/manuals/CHAT.html). In addition, ELAN (EUDICO Linguistic Annotator) (Sloetjes & Wittenburg, 2008) is an annotation tool developed by the Max Planck Institute in the Netherlands. This tool allows researchers to edit and search annotations for both video and audio data. These programs for analyzing language samples are great resources for researchers and practitioners.

The recommended length of language samples varies depending on the sampling procedure, the type of analysis, the child’s age, and the desired reliability index. Sample length can be defined in number of minutes or number of utterances. Historically, researchers have recommended at least 50 utterances (which take typically developing children about 4–5 minutes to produce) to achieve reliable and valid measures of vocabulary and grammar skills (Heilmann et al., 2010b). However, Guo and Eisenberg (2015) found that at least 7 minutes of language samples are needed to obtain reliable measures of vocabulary and grammar for 3-year-olds.

In contrast to norm-referenced tasks, language sampling methods are less rigid and more sensitive to L2 children from culturally, linguistically, and socio-economically diverse backgrounds (Alt et al., 2016). However, they are not as widely used as norm-referenced tasks in practice because of the lack of standardized procedures, the time demands for transcription and analysis, and the limited resources for data comparison. To overcome these limitations, Miller and Iglesias (2012) developed standardized elicitation protocols and materials as well as a reference database of typically developing L2 children between the ages of 5 and 10. These resources are available on their SALT software website (www.saltsoftware.com/resources/databases?SID=1m452ebt7h6fnvf0j9o0657iq2). A new reference database from standardized narrative measures such as TNL and ENNI has also been added to the website.

In a study by Huang et al. (2017), the authors used both a standardized norm-referenced assessment and a structured language sampling task to evaluate early adolescents’ English language production skills. The goal was to compare the English language outcomes of two groups of early adolescents in grades 5–7 (average age = 11 years old): monolingual English speakers and English L2 learners. The authors used a grammar subtest in the Clinical Evaluation of Language Fundamentals 5th Edition (CELF-5) to measure productive grammar and a structured language sampling task in which participants were asked to tell a story based on pictures. Speech samples were transcribed and analyzed in the CLAN program. The authors derived five variables from the speech sample analysis based on the CAF model previously discussed: lexical diversity, words per unit, and clauses per unit (Complexity); percent of grammatical errors (Accuracy); and articulation rate (Fluency). Results from the study revealed no significant differences between the two groups in any of the English language measures, suggesting that simply speaking a home language that is different from English does not diminish English L2 learners’ English language proficiency.
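To make the transcript-based CAF variables more concrete, the following minimal sketch computes four of them from a hand-coded toy transcript. It does not reproduce the CLAN analysis or the exact measures used in the study. Several choices are assumptions made purely for illustration: lexical diversity is approximated by a simple type-token ratio, the accuracy measure is taken as the percentage of units that contain at least one grammatical error, and the `Unit` class, its fields, and the example sentences are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Unit:
    """One segmented unit of a transcript, coded by hand (illustrative)."""
    words: List[str]  # word tokens in the unit
    clauses: int      # number of clauses coded in the unit
    has_error: bool   # True if a grammatical error was coded


def caf_summary(units: List[Unit]) -> Dict[str, float]:
    """Compute simple complexity and accuracy indices over all units."""
    all_words = [w.lower() for u in units for w in u.words]
    n_units = len(units)
    return {
        # Assumption: type-token ratio stands in for lexical diversity.
        "lexical_diversity": len(set(all_words)) / len(all_words),
        "words_per_unit": len(all_words) / n_units,
        "clauses_per_unit": sum(u.clauses for u in units) / n_units,
        # Assumption: accuracy is counted per unit, not per word.
        "percent_units_with_error": 100 * sum(u.has_error for u in units) / n_units,
    }


units = [
    Unit("the frog jumped out".split(), clauses=1, has_error=False),
    Unit("the boy looked because the frog was gone".split(), clauses=2, has_error=False),
    Unit("he go home".split(), clauses=1, has_error=True),
    Unit("the dog barked".split(), clauses=1, has_error=False),
]
summary = caf_summary(units)
for name, value in summary.items():
    print(name, round(value, 2))
```

A type-token ratio is sensitive to sample length, which is one reason published studies often prefer length-corrected diversity measures. The ratio is used here only because it is the simplest index to show.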

The Use of Technology in L2 Speech Production Methods

The ever-increasing advances in technology have had an important impact on L2 speech production methods. Many of the norm-referenced assessments are either completely computerized and delivered via a web-based interface or include a portion of digital items. To increase scoring efficiency and reduce human rater bias (Evanini et al., 2017), researchers have also been working on developing the automated/machine scoring of L2 speech production. Automated scoring involves processing the L2 speech sample and assigning a speaking proficiency score. Most automated scoring systems include a speech recognition system and a scoring model that generates a score with a machine learning paradigm that uses human raters’ scores as the gold standard to train the system (Chen et al., 2018). The large variation in L2 speech, such as variations in L2 proficiency and in the influence of L2 learners’ native language, as well as in the technical quality of the L2 audio samples, may result in high error rates in speech recognition systems. It is also very challenging to capture the content and organization features of L2 speech using automated scoring systems, and these features are highly valued by human raters (Xi, 2010). Compared to the error rates in using automated scoring for adult speech, the error rates are even higher for child speech because of the larger variation in the acoustic properties of child speech and the lack of a child speech database (Yeung & Alwan, 2018).

Automated scoring has been used in the evaluation of highly predictable speech, such as read-aloud speech with highly constrained vocabulary and grammatical structures (Cincarek et al., 2009). For example, the testing company Pearson has recently developed a high-accuracy speech recognition system for the automated scoring of L2 speech (Cheng et al., 2015). The new system has been applied in the Arizona English Language Learner Assessment (AZELLA), the domestic English language proficiency assessment for K-12 English learner students in Arizona. Scoring spontaneous speech without a known text has proven to be more complex and challenging despite automated scoring’s rapidly evolving capabilities. These automated scoring efforts have mainly focused on speech features that can be extracted with current speech recognition technologies. For example, the testing company Educational Testing Service developed SpeechRater® to score spontaneous L2 speech, and its scores have been in operational use since 2006 (Zechner et al., 2009). SpeechRater® evaluates multiple components of L2 speech, including fluency features as well as pronunciation, vocabulary, grammar, and, most recently, content and discourse coherence features (Chen et al., 2018). Given the limitations of current speech recognition technologies and scoring models, researchers caution against using automated scoring for high-stakes decisions (Xi, 2010). However, technological advances in the future may improve the accuracy of automated scoring systems and make them feasible and valid for high-stakes assessments.
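The general idea of a scoring model trained on human raters’ scores can be illustrated with a deliberately simplified sketch. It is not how SpeechRater® or Pearson’s system works: operational systems combine speech recognition with many features and more sophisticated models. Here a single hypothetical feature (articulation rate) is mapped to human scores with ordinary least squares, and agreement with human raters on held-out samples is summarized with a Pearson correlation. All data values and names are invented for illustration.

```python
from statistics import mean
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Invented training data: one fluency feature and human scores (1-4 scale).
train_rate = [2.0, 2.5, 3.0, 3.5, 4.0]
train_human = [1, 2, 2, 3, 4]
slope, intercept = fit_line(train_rate, train_human)

# Invented held-out samples: compare machine scores with human scores.
heldout_rate = [2.2, 3.2, 3.8]
heldout_human = [2, 2, 4]
predicted = [slope * r + intercept for r in heldout_rate]
agreement = pearson(predicted, heldout_human)
print(round(slope, 2), round(intercept, 2), round(agreement, 2))
```

Evaluating agreement on samples that were not used for training matters because a model can reproduce its training scores well and still generalize poorly, for example to child speech that differs acoustically from the training data.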

Challenges and Considerations According to the National Association for the Education of Young Children’s (NAEYC) position statement on assessments, researchers and practitioners should use evidence-based methods that are developmentally, culturally, and linguistically appropriate. In addition, assessments should be connected to specific purposes that are beneficial to the L2 learner, such as making valid and reasoned decisions about teaching and learning, identifying significant concerns for possible intervention, and assisting programs to improve their educational and developmental instruction and intervention (NAEYC, 2009). This section discusses some of the challenges and considerations of assessing young L2 learners’ speech production followed by recommendations and resources to guide the different methods of assessment for this unique and ever-g rowing young L2 learner population. First, the purpose of the assessment should match the intended use or goal of the assessment (Peña & Halle, 2011). For instance, if the goal is to measure children’s growth in their L2 production proficiency, then using an L2 production assessment at both pre-and post-test to show growth would be appropriate. If the goal is to examine L2 children’s overall production skills, it would be essential to assess them in both of their L1 and L2 to capture their language competencies. This can be difficult to achieve because of the lack of standardized assessments of children’s native/home language development. For example, some English language proficiency assessments have a parallel version in Spanish, such as the Expressive One-Word Picture Vocabulary Test (EOWPVT-4), the Clinical Evaluation of Language Fundamentals (CELF), and the Bilingual English Spanish Assessment (BESA) mentioned earlier that can be used to evaluate the L1 and L2 proficiencies of English L2 children from Spanish-speaking homes or vice versa. 
However, most English language proficiency assessments do not have a parallel

92

92 Becky H. Huang and Rica Ramírez

version in a language other than Spanish, making it challenging to evaluate the L1 of English L2 children from other L1 backgrounds. A method’s validity is also threatened when the child’s cultural experiences do not match the measure’s expectations, or when the assessment items are not presented in a way that allows the young L2 learners to demonstrate competence (Peña & Halle, 2011). For example, there can be potential biases in language sampling methods that use storybooks as stimuli due to varied content, thus resulting in substantial differences in children’s productive language. In one of the frog storybooks, Frog Goes to Dinner (Mayer, 1974), the author describes a dining experience at a fancy restaurant. The theme could be potentially biased against L2 children from low socioeconomic status (SES) backgrounds as they may not have any experience of a fancy restaurant. Young learners’ L2 pronunciation variations and dialects should also be taken into consideration.The dialects spoken by L2 learners vary by the family’s country of origin, the geographic region in which these families reside, and their SES background (Basterra et al., 2011). For example, Sandilos and colleagues (2015) investigated the way in which items on the Spanish version of the Woodcock- Muñoz Language Survey Revised (WMLS-R) function for bilingual children from different ethnic subgroups who speak different dialects of Spanish. There are differences in the dialects of Spanish among Mexican, Cuban, and Puerto Rican in terms of pronunciation, grammar, and vocabulary (Silva-Corvalán et al., 2004). Therefore, these differences in dialect may affect the performance of L2 children on the Spanish subtests of the WMLS-R. The authors of this study examined WMLS- R items and identified several dialectal differences across Mexican, Cuban, and Puerto Rican Spanish (Sandilos et al., 2015). 
For instance, there are many variations of the word “baby”: it may be referred to as niño, chiquito, and infante in Mexican Spanish; bebé in Cuban Spanish; and nené in Puerto Rican Spanish. Another example is the word anteojos, which is the formal way of saying “eyeglasses,” but it could also be referred to as antiparras or gafas in Mexican Spanish. Therefore, administrators of this test should be mindful of children’s dialects in order to avoid underestimating young learners’ language competence. There is a growing trend of incorporating technology in L2 production methods and assessments.Technology may help improve validity and increase administration and scoring efficiency to save valuable time for instruction and learning (Bailey, 2017). Using technology for standard administration can help reduce assessment errors and biases in administration and scoring due to a test administrator’s unfamiliarity with children’s L2 production. For example, a human administrator who is unfamiliar with children’s L2 production may mistake an acceptable L2 production variation for an error. Human administrators’ preferences for or biases against certain non-L2 accents could also lead to construct-irrelevant variance in their scoring (see Huang et al., 2016;Winke et al., 2013). Embedding assistive technology such as hyperlinks in glossaries or pop-up visual aids in computer-based L2 production methods may also help provide accessibility and accommodations for

92

93

92

Evaluating L2 Speech Production 93

L2 learners (Kopriva, 2011). For example, assistive technology allows L2 learners to click on a novel word in an online test to look up its definition. The definition, if applicable, may also be provided as an image or picture rather than words to help L2 learners better understand it. However, technology is a double-edged sword: it can also introduce bias and unfairness if the L2 children are unfamiliar with it. For example, young immigrant children who have limited familiarity with and exposure to technology prior to their arrival in the host country, and young children who are still developing motor skills and learning to use digital devices, may have difficulties completing some computer-based tasks. In a recent study by Ramírez (2017), a touchscreen device was used to assess language skills among young L2 children from very low socioeconomic homes, which resulted in floor effects because these particular children had no experience with touchscreen devices. Despite practice, these children still struggled to master the assessment, thus compromising its validity. Therefore, caution should be exercised when implementing technology in the assessment of L2 children.

Finally, an often overlooked consideration is the child’s personality, particularly shyness. Children who are shy and reserved tend to be more difficult to assess, especially with respect to their productive language knowledge. Norm-referenced standardized assessments tend to be more rigid than the language sampling method and can potentially bias the results when assessing shy children. Children who are more gregarious may perform better on standardized language measures as they may produce more language (Hutchins et al., 2005). Therefore, it is important to establish rapport and provide opportunities for shy children to warm up before taking part in the assessment in order to avoid underestimating their skills (Harbaugh et al., 2018).
To illustrate, Harbaugh and colleagues presented a case study on David, a 5-year-old Spanish-English bilingual child from a Spanish-speaking home. David was described as shy by his parents and teacher. His teacher also reported him as being non-verbal in both languages in school, and a formal evaluation was requested for David. The bilingual speech language therapist who worked with David started with classroom observation and incorporated techniques such as creative and structured play in their one-on-one sessions to engage and encourage David to speak. Over a 4-day period, David went from whispering to producing verbal speech with the therapist, and the Preschool Language Scales, Fifth Edition Spanish (PLS-5 Spanish) was administered to him at the end. Although his test results were below average compared to his age-matched bilingual peers, his standard scores approximated the average range.

Implications

In this chapter, we have reviewed two main methods (standardized norm-referenced vs. language sampling) for researching young L2 learners’ speech production, presented example measures in each category, and examined the advantages and


disadvantages of the two methods. We have also discussed the challenges and considerations for evaluating L2 children’s speech production skills. Several implications can be drawn from the work reviewed in this chapter.

The first implication is to explore and experiment with alternative L2 production methods, such as dynamic assessment (DA). DA is rooted in Vygotsky’s theory of the “zone of proximal development” (ZPD; Vygotsky, 1986). Vygotsky argued that, in the development of higher mental functioning, children benefit from instructional and/or social interaction with more experienced others. Resting upon this premise, DA emphasizes identifying learners’ current skills as well as their ability to learn those skills after explicit instruction. Researchers have argued that DA prevents many of the sources of bias associated with norm-referenced assessments (Petersen et al., 2017). When using DA, the researcher does not necessarily assess what the L2 learner already knows; rather, the researcher assesses how the learner learns. This circumvents the problem seen for so many young L2 learners: a lack of prior knowledge of items presented on standardized tests (Roseberry-McKibbin & O’Hanlon, 2005). Test-Teach-Test is one of the most popular forms of DA (Petersen et al., 2017). The purpose of the first test phase is to obtain an initial measure of the L2 learner’s independent ability. In the teaching phase, the researcher provides a brief period of instruction on the relevant content. The teaching should address both the target language skills and the associated learning behaviors. The L2 learner is then retested using the same or an alternate form of the initial test. DA can bypass many differences in L2 learners’ prior knowledge, prior language, and cultural backgrounds because it examines the process of learning rather than the product of prior learning.
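The Test-Teach-Test sequence described above can be summarized numerically as the change between the two test phases. The following is a minimal sketch in Python, not a procedure taken from the studies cited: the function names are hypothetical, the scores are invented, and expressing gain as a share of the room left for improvement is only one illustrative convention among several.

```python
def raw_gain(pretest, posttest):
    """Change in score from the first test phase to the retest."""
    return posttest - pretest


def proportion_of_possible_gain(pretest, posttest, max_score):
    """Gain expressed as a share of the room left for improvement at pretest.

    Illustrative convention only (an assumption, not from the chapter).
    Returns 0.0 when the pretest score is already at the maximum.
    """
    headroom = max_score - pretest
    if headroom <= 0:
        return 0.0
    return raw_gain(pretest, posttest) / headroom


if __name__ == "__main__":
    # Invented scores for two children on a 16-point task; not study data.
    children = {
        "child_A": (4, 10),
        "child_B": (4, 5),
    }
    for name, (pre, post) in children.items():
        gain = raw_gain(pre, post)
        share = proportion_of_possible_gain(pre, post, max_score=16)
        print(name, gain, round(share, 2))
```

In this sketch the two children start from the same pretest score but differ in how much they improve after the teaching phase, which is the kind of contrast DA is meant to reveal.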
Examining the learning process is particularly important when working with young L2 learners, in light of the large variation in their L2 performances. Although DA does not yield a conventional score as norm-referenced or language sample methods do, researchers and practitioners can gauge young learners’ learning potential from their change or growth in performance from pre- to post-test.

Another implication is to develop alternative measures or methods of administration or test presentation, such as using parent report or involving parents or teachers as the assessment administrator. Results from parent report measures of children’s language proficiency, such as the widely used MacArthur Communicative Development Inventories (CDI), have been shown to correlate strongly with results from direct language assessments (Marchman & Martínez-Sussmann, 2002; Sachse & Von Suchodoletz, 2008). Furthermore, since young L2 learners, particularly the shy ones, may not feel comfortable interacting with a researcher whom they do not know, parent- or teacher-administered measures that do not require young learners to interact with unfamiliar adults may yield results that better reflect young learners’ true productive skills (Harbaugh et al., 2018). For example, Klein et al. (2013) had parents trained to administer standardized assessments, such as the Test of Narrative


Language (TNL)-Narrative Comprehension and TNL-Oral Narration (TNL-C and TNL-O; Gillam & Pearson, 2004) and a receptive vocabulary assessment, the Peabody Picture Vocabulary Test-4 (PPVT; Dunn & Dunn, 2007), to their children with selective mutism, who were between the ages of 5 and 12. Results from parent-administered tests were comparable to those from professional-administered tests. Despite these promising results, the potential disadvantages of these alternative methods are worth noting. For example, parent report may be less sensitive to the effect of context or task than child language assessments. Some parents may have limited proficiency in the L2 and thus have difficulties providing a valid report. Furthermore, although the child may feel more comfortable with a parent or teacher as the test administrator, the parent or teacher may bring their own bias about the child to the assessment and thus jeopardize the validity of the inferences to be drawn from it.

Finally, given the complexity of the construct of L2 production skills, it is important to use multiple measures and/or multiple methods to gain a comprehensive picture of L2 children’s full capacities in speech production (Brookhart, 2009). For example, using multiple reports/surveys of language skills (e.g., both parent and teacher reports) can give a clearer picture of L2 capabilities than a single report. Researchers can also combine standardized assessment methods and language sampling methods in a single study, as illustrated in some existing studies. Both standardized assessments and language sampling methods have their own limitations and disadvantages, as do the alternative assessments discussed in this chapter. Using multiple assessments/methods can thus help improve the validity of the inferences we make about young learners’ L2 production proficiency from their performances on these assessments.
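When two measures are collected from the same children, for instance a parent report and a direct assessment, one simple way to check how closely they agree is a correlation coefficient. The sketch below computes Pearson’s r in plain Python. It is offered only as an illustration: the function name and the scores are invented and do not come from any of the studies cited in this chapter.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares around each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a measure has no variance")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented scores for five children; not data from any study.
    parent_report = [12, 25, 31, 40, 18]
    direct_assessment = [10, 22, 35, 38, 20]
    print(round(pearson_r(parent_report, direct_assessment), 2))
```

A high coefficient shows only that the two measures rank the children similarly. It does not show that either measure is free of the biases discussed above.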

Further Readings

Armon-Lotem, S., de Jong, J., & Meir, N. (Eds.). (2015). Assessing multilingual children: Disentangling bilingualism from language impairment. Multilingual Matters. This book provides information on language assessments and measures that can be used to differentiate between the effects of bilingualism and those of language impairment in bilingual children.

De Groot, A. M., & Hagoort, P. (Eds.). (2017). Research methods in psycholinguistics and the neurobiology of language: A practical guide (Vol. 9). John Wiley. This book presents information on the psycholinguistic and neurobiological methods and technologies used in language acquisition and processing research. Although it does not have a young learner or speech production focus, some chapters in the book introduce methods that can be adapted for assessing young learners.

Nikolov, M. (Ed.). (2016). Assessing young learners of English: Global and local perspectives. Berlin: Springer. This book presents information on the trends and challenges in assessing the English language proficiency of young learners learning EFL in various geographical and educational contexts.


Wolf, M. K., & Butler, Y. G. (Eds.). (2017). English language proficiency assessments for young learners. Routledge. This book provides information on developing and validating English language proficiency assessments for young learners between 5 and 13 years of age learning English as a foreign language (EFL) or second language (ESL).

Tools and Resources

1. Commercially available standardized speech production/speaking assessments:
• The ACTFL Assessment of Performance toward Proficiency in Languages (AAPPL) (Grades K-12): www.languagetesting.com/lti-for-organizations/k-12-aappl
• preLAS: The English Language Proficiency Assessment for Early Learners (ages 3–first grade): https://laslinks.com/prelas/
• LAS Links® (for PreK-3 to 12th grade): https://laslinks.com/
• IPT I Oral English for Grades K-6: www.ballard-tighe.com/ipt/about/ipt-oral-english/ipt-i/
• Cambridge English: Young Learners (includes a speaking subtest) (ages 6–12): www.cambridgeenglish.org/in/exams-and-tests/young-learners-english/
• TOEFL Primary® (ages 8+) (includes a speaking subtest): www.ets.org/toefl_primary
• Pearson Test of English Young Learners (PTE Young Learners) (includes a speaking subtest) (ages 6–13): https://qualifications.pearson.com/en/qualifications/international-certificate/young-learners.html
• Expressive One-Word Picture Vocabulary Test (EOWPVT-4) (ages 2:6–90+): www.proedinc.com/Products/13692/eowpvt4-expressive-oneword-picture-vocabulary-testfourth-edition.aspx
• Clinical Evaluation of Language Fundamentals 5th Edition (CELF-5) (ages 5:0–21:11): www.pearsonassessments.com/store/usassessments/en/Store/Professional-Assessments/Speech-%26-Language/Clinical-Evaluation-of-Language-Fundamentals-%7C-Fifth-Edition/p/100000705.html
• Computer Articulation Instrument (CAI) (for Dutch children ages 2–7): https://pubs.asha.org/doi/full/10.1044/2018_JSLHR-S-18-0274

2. Commercially available parent and teacher report measures:
• ITALK (to be administered by the examiner as a parent and teacher interview) (ages 4–6): www.northernspeech.com/bilingual-culturally-diverse-cld/besa-forms-italk
• Student Oral Language Observation Matrix (SOLOM) (to be administered by a teacher) (used with multiple age ranges): www.cal.org/twi/EvalToolkit/appendix/solom.pdf
• Children’s Communication Checklist-2 (CCC-2) (a parent or caregiver rating scale) (ages 4:0–16:11): www.pearsonassessments.com/store/usassessments/en/Store/Professional-Assessments/Speech-%26-Language/Children%27s-Communication-Checklist-2-%7C-U-S-Edition/p/100000193.html

3. Corpus of child language samples:
• Child Language Data Exchange System (CHILDES): https://childes.talkbank.org/


4. Software programs for collecting and analyzing language samples:
• Language ENvironment Analysis (LENA): www.lena.org/about/
• For phonetics and phonological patterns:
  i. Phon: www.phon.ca/phon-manual/misc/Welcome.html
  ii. Praat: www.fon.hum.uva.nl/praat/
• For morphology, lexicon, syntax, and code-switching:
  i. VocabProfile: www.lextutor.ca/vp/eng/
  ii. Computerized Language Analysis (CLAN): http://dali.talkbank.org/clan/
  iii. Systematic Analysis of Language Transcripts (SALT): www.saltsoftware.com/
  iv. Sampling Utterances and Grammatical Analysis Revised (SUGAR): www.sugarlanguage.org/

Discussion Questions

1. Think about an L2 child who you have worked with or know personally. He/she could be of any age between 6 and 12 and from any native language background. Which L2 speech production method would you use to evaluate the child’s L2 speech production? Justify your choice.
2. Using your own experiences, give an example of challenges in assessing young L2 learners’ speech production.
3. Find other technological resources for assessing young L2 learners’ speech production.

References

Alt, M., Arizmendi, G. D., & DiLallo, J. N. (2016). The role of socioeconomic status in the narrative story retells of school-aged English language learners. Language, Speech, and Hearing Services in Schools, 47(4), 313–323. https://doi.org/10.1044/2016_LSHSS-15-0036
August, D., & Shanahan, T. (2006). Synthesis: Instruction and professional development. In D. August & T. Shanahan (Eds.), Developing literacy in second-language learners (pp. 351–364). Erlbaum.
Bailey, A. L. (2017). Theoretical and developmental issues to consider in the assessment of young students’ English language proficiency. In M. K. Wolf & Y. G. Butler (Eds.), English language proficiency assessments for young learners (1st ed., pp. 25–40). Routledge.
Baker, C., & Wright, W. E. (2017). Foundations of bilingual education and bilingualism. Multilingual Matters.
Basterra, M., Trumbull, E., & Solano-Flores, G. (Eds.). (2011). Cultural validity in assessment: Addressing linguistic and cultural diversity. Routledge.
Boersma, P., & Weenink, D. (2019). Praat: Doing phonetics by computer [Computer program]. Version 6.1.04. www.praat.org/
Brookhart, S. M. (2009). The many meanings of “multiple measures”. Educational Leadership, 67(3), 6–12.
Burt, M. K., Dulay, H. C., & Hernandez-Chavez, E. (1973). Bilingual syntax measure. Harcourt Brace Jovanovich.
Chen, L., Zechner, K., Yoon, S. Y., Evanini, K., Wang, X., Loukina, A., Tao, J., Davis, L., Lee, C. M., Ma, M., Mundkowsky, R., Lu, C., Leong, C. W., & Gyawali, B. (2018). Automated scoring of nonnative speech using the SpeechRater SM v. 5.0 engine (Research Report No. RR-18-10). Educational Testing Service. https://doi.org/10.1002/ets2.12198
Cheng, J., Chen, X., & Metallinou, A. (2015). Deep neural network acoustic models for spoken assessment applications. Speech Communication, 73, 14–27. https://doi.org/10.1016/j.specom.2015.07.006
Chik, A., & Besser, S. (2011). International language test taking among young learners: A Hong Kong case study. Language Assessment Quarterly, 8(1), 73–91. https://doi.org/10.1080/15434303.2010.537417
Cho, Y., Ginsburgh, M., Morgan, R., Moulder, B., Xi, X., & Hauck, M. C. (2016). Designing the TOEFL® Primary™ tests (Research Memorandum No. RM-16-02). Educational Testing Service.
Chomsky, N. (1986). Knowledge of language: Its nature, origin, and use. Greenwood Publishing Group.
Cincarek, T., Gruhn, R., Hacker, C., Nöth, E., & Nakamura, S. (2009). Automatic pronunciation scoring of words and sentences independent from the non-native’s first language. Computer Speech & Language, 23(1), 65–88. https://doi.org/10.1016/j.csl.2008.03.001
De Jong, N. H., Steinel, M. P., Florijn, A. F., Schoonen, R., & Hulstijn, J. H. (2012). Facets of speaking proficiency. Studies in Second Language Acquisition, 34(1), 5–34. https://doi.org/10.1017/S0272263111000489
Derwing, T. M., & Munro, M. J. (1997). Accent, intelligibility, and comprehensibility: Evidence from four L1s. Studies in Second Language Acquisition, 19(1), 1–16.
Dodd, B., Zhu, H., Crosbie, S., Holm, A., & Ozanne, A. (2002). Diagnostic evaluation of articulation and phonology (DEAP). Psychology Corporation.
Dunn, L. M., & Dunn, D. M. (2007). Peabody picture vocabulary test (4th ed.). Minneapolis, MN: Pearson Assessments.
Ellis, R. (2009). The differential effects of three types of task planning on the fluency, complexity, and accuracy in L2 oral production. Applied Linguistics, 30(4), 474–509.
Evanini, K., Hauck, M. C., & Hakuta, K. (2017). Approaches to automated scoring of speaking for K–12 English language proficiency assessments. ETS Research Report Series (ETS RR-17-18), 1–11. https://doi.org/10.1002/ets2.12147
Gagarina, N., Klop, D., Kunnari, S., Tantele, K., Välimaa, T., Balčiūnienė, I., Bohnacker, U., & Walters, J. (2012). MAIN: Multilingual assessment instrument for narratives. ZAS Papers in Linguistics, 56, 1–140.
Gillam, R. B., & Pearson, N. A. (2004). TNL: Test of narrative language. LinguiSystems.
Glasgow, C., & Cowley, J. (1994). Renfrew bus story test-North American edition. Centreville School.
Goldman, R., & Fristoe, M. (2000). Goldman-Fristoe test of articulation 2. Pearson.
Guo, L. Y., & Eisenberg, S. (2015). Sample length affects the reliability of language sample measures in 3-year-olds: Evidence from parent-elicited conversational samples. Language, Speech, and Hearing Services in Schools, 46(2), 141–153.
Harbaugh, E., Prezas, R. F., & Edge, R. L. (2018). Selective stimulability in the speech and language assessment of bilingual children with selective mutism. Journal of Human Services: Training, Research, and Practice, 3(2). https://scholarworks.sfasu.edu/jhstrp/vol3/iss2/5
Heilmann, J., Miller, J. F., & Nockerts, A. (2010a). Sensitivity of narrative organization measures using narrative retells produced by young school-age children. Language Testing, 27(4), 603–626. https://doi.org/10.1177/0265532209355669
Heilmann, J., Nockerts, A., & Miller, J. F. (2010b). Language sampling: Does the length of the transcript matter? Language, Speech, and Hearing Services in Schools, 41(4), 393–404. https://doi.org/10.1044/0161-1461(2009/09-0023)
Huang, B. H. (2016). A synthesis of empirical research on the linguistic outcomes of early foreign language instruction. International Journal of Multilingualism, 13(3), 257–273. https://doi.org/10.1080/14790718.2015.1066792
Huang, B. H., Alegre, A., & Eisenberg, A. (2016). A cross-linguistic investigation of the effect of raters’ accent familiarity on speaking assessment. Language Assessment Quarterly, 13(1), 25–41. https://doi.org/10.1080/15434303.2015.1134540
Huang, B., Chang, Y. H. S., Niu, L., & Zhi, M. (2018a). Examining the effects of socio-economic status and language input on adolescent English learners’ speech production outcomes. System, 73, 27–36. https://doi.org/10.1016/j.system.2017.07.004
Huang, B. H., Chang, Y. H. S., Zhi, M., & Niu, L. (2018b). The effect of input on bilingual adolescents’ long-term language outcomes in a foreign language instruction context. International Journal of Bilingualism, 24(1), 8–25. https://doi.org/10.1177/1367006918768311
Huang, B. H., Davis, D. S., & Ngamsomjan, J. R. (2017). Keeping up and forging ahead: English language outcomes of proficient bilingual adolescents in the United States. System, 67, 12–24. https://doi.org/10.1016/j.system.2017.04.002
Huang, B. H., & Flores, B. B. (2018). The English language proficiency assessment for the 21st century (ELPA21). Language Assessment Quarterly, 15(4), 433–442. https://doi.org/10.1080/15434303.2018.1549241
Huang, B. H., & Kuo, L.-J. (2020). The role of input in bilingual children’s language and literacy development: Introduction to the Special Issue. International Journal of Bilingualism, 24(1), 3–7. https://doi.org/10.1177/1367006918768369
Hutchins, T. L., Brannick, M., Bryant, J. B., & Silliman, E. R. (2005). Methods for controlling amount of talk: Difficulties, considerations and recommendations. First Language, 25(3), 347–363. https://doi.org/10.1177/0142723705056376
Jones, R. M., Plesa Skwerer, D., Pawar, R., Hamo, A., Carberry, C., Ajodan, E. L., Caulley, D., Silverman, M., McAdoo, S., & Yoder, A. (2019). How effective is LENA in detecting speech vocalizations and language produced by children and adolescents with ASD in different contexts? Autism Research, 12(4), 628–635. https://doi.org/10.1002/aur.2071
Klein, E. R., Armstrong, S. L., & Shipon-Blum, E. (2013). Assessing spoken language competence in children with selective mutism: Using parents as test presenters. Communication Disorders Quarterly, 34(3), 184–195. https://doi.org/10.1177/1525740112455053
Kopriva, R. (2011). Improving testing for English language learners. Routledge.
MacWhinney, B., & Snow, C. (1990). The child language data exchange system: An update. Journal of Child Language, 17(2), 457–472. https://doi.org/10.1017/S0305000900013866
Marchman, V. A., & Martínez-Sussmann, C. (2002). Concurrent validity of caregiver/parent report measures of language for children who are learning both English and Spanish. Journal of Speech, Language, and Hearing Research, 45, 983–97.
Martin, N. A., & Brownell, R. (2011). Expressive one-word picture vocabulary test-4 (EOWPVT-4). Academic Therapy Publications.
Mayer, M. (1967). A boy, a frog, and a dog. New York, NY: Dial Press.
Mayer, M. (1969). Frog, where are you? New York, NY: Dial Press.
Mayer, M. (1971). A boy, a dog, a frog, and a friend. New York, NY: Dial Press.
Mayer, M. (1974). Frog goes to dinner. New York, NY: Dial Press.
McCabe, A., & Rollins, P. R. (1994). Assessment of preschool narrative skills. American Journal of Speech-Language Pathology, 3(1), 45–56.
Miller, J., & Iglesias, A. (2012). SALT: Systematic analysis of language transcripts. Software for the analysis of oral language. SALT Software.
NAEYC. (2009). Developmentally appropriate practice in early childhood programs serving children from birth through age 8. [Position Statement]. www.naeyc.org/resources/position-statements/dap
Pavelko, S. L., & Owens, R. E. (2019). Diagnostic accuracy of the Sampling Utterances and Grammatical Analysis Revised (SUGAR) measures for identifying children with language impairment. Language, Speech, and Hearing Services in Schools, 50(2), 211–223. https://doi.org/10.1044/2018_LSHSS-18-0050
Peña, E. D., Gutiérrez-Clellen, V. F., Iglesias, A., Goldstein, B. A., & Bedore, L. M. (2018). Bilingual English Spanish Assessment (BESA). Brookes Publishing.
Peña, E. D., & Halle, T. G. (2011). Assessing preschool dual language learners: Traveling a multiforked road. Child Development Perspectives, 5(1), 28–32. https://doi.org/10.1111/j.1750-8606.2010.00143.x
Petersen, D. B., Chanthongthip, H., Ukrainetz, T. A., Spencer, T. D., & Steeve, R. W. (2017). Dynamic assessment of narratives: Efficient, accurate identification of language impairment in bilingual students. Journal of Speech, Language, and Hearing Research, 60(4), 983–998. https://doi.org/10.1044/2016_JSLHR-L-15-0426
Ramírez, R. (2017). Latino mothers’ responsiveness and bilingual language development in young children from 24 months to 36 months. (Doctoral dissertation). Scholar Commons. http://scholarcommons.usf.edu/etd/6935
Rose, Y., Hedlund, G. J., Byrne, R., Wareham, T., & MacWhinney, B. (2007). Phon 1.2: A computational basis for phonological database elaboration and model testing. In Proceedings of the workshop on cognitive aspects of computational language acquisition (pp. 17–24). Association for Computational Linguistics. www.aclweb.org/anthology/W07-0603.pdf
Roseberry-McKibbin, C., & O’Hanlon, L. (2005). Nonbiased assessment of English language learners: A tutorial. Communication Disorders Quarterly, 26(3), 178–185. https://doi.org/10.1177/15257401050260030601
Sachse, S., & Von Suchodoletz, W. (2008). Early identification of language delay by direct language assessment or parent report? Journal of Developmental & Behavioral Pediatrics, 29(1), 34–41.
Saito, K., Trofimovich, P., & Isaacs, T. (2016). Second language speech production: Investigating linguistic correlates of comprehensibility and accentedness for learners at different ability levels. Applied Psycholinguistics, 37(2), 217–240. https://doi.org/10.1017/S0142716414000502
Sandilos, L. E., Lewis, K., Komaroff, E., Hammer, C. S., Scarpino, S. E., Lopez, L., Rodriguez, B., & Goldstein, B. (2015). Analysis of bilingual children’s performance on the English and Spanish versions of the Woodcock-Muñoz Language Survey-R (WMLS-R). Language Assessment Quarterly, 12(4), 386–408. https://doi.org/10.1080/15434303.2015.1100198
Schneider, P., Dubé, R. V., & Hayward, D. (2005). The Edmonton Narrative Norms Instrument. www.rehabresearch.ualberta.ca/enni/
Silva-Corvalán, C., Finegan, E., & Rickford, J. R. (2004). Language in the USA: Themes for the twenty-first century. Cambridge University Press.
Sloetjes, H., & Wittenburg, P. (2008). Annotation by category – ELAN and ISO DCR. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008).
Soodla, P., & Kikas, E. (2010). Macrostructure in the narratives of Estonian children with typical development and language impairment. Journal of Speech, Language, and Hearing Research, 53(5), 1321–1333. https://doi.org/10.1044/1092-4388(2010/08-0113)
Steig, W. (2013). Doctor De Soto. Farrar, Straus and Giroux (BYR).
Strauss, E., Sherman, E. M., & Spreen, O. (2006). A compendium of neuropsychological tests: Administration, norms, and commentary. American Chemical Society.
Vygotsky, L. S. (1986). Thought and language. Cambridge, MA: MIT Press.
Westby, C., & Hwa-Froelich, D. (2010). Difficulty, delay, or disorder: What makes English hard for English language learners. In M. Schatz & L. C. Wilkinson (Eds.), The education of English language learners: Research to practice (pp. 198–221). The Guilford Press.
Wiig, E. H., Secord, W. A., & Semel, E. (2013). Clinical evaluation of language fundamentals: CELF-5. Pearson.
Williams, K. T. (1997). Expressive vocabulary test second edition (EVT™ 2). J. Am. Acad. Child Adolesc. Psychiatry, 42, 864–872.
Williams, K. T. (2007). Expressive vocabulary test (2nd ed.). Minneapolis, MN: Pearson Assessments.
Winke, P., & Gass, S. (2013). The influence of second language experience and accent familiarity on oral proficiency rating: A qualitative investigation. TESOL Quarterly, 47(4), 762–789.
Winke, P., Gass, S., & Myford, C. (2013). Raters’ L2 background as a potential source of bias in rating oral performance. Language Testing, 30(2), 231–252.
Wolf, M. K., & Faulkner-Bond, M. (2016). Validating English language proficiency assessment uses for English learners: Academic language proficiency and content assessment performance. Educational Measurement: Issues and Practice, 35(2), 6–18.
Xi, X. (2010). Automated scoring and feedback systems: Where are we and where are we heading? Language Testing, 27(3), 291–300. https://journals.sagepub.com/doi/pdf/10.1177/0265532210364643
Yeung, G., & Alwan, A. (2018). On the difficulties of automatic speech recognition for kindergarten-aged children. In Proceedings of Interspeech, 1661–1665.
Zechner, K., Higgins, D., Xi, X., & Williamson, D. M. (2009). Automatic scoring of non-native spontaneous speech in tests of spoken English. Speech Communication, 51(10), 883–895. https://doi.org/10.1016/j.specom.2009.04.009


7 RECEPTIVE METHODS IN CHILD BILINGUALISM AND SECOND LANGUAGE ACQUISITION

Silvina Montrul, Alexandra Morales-Reyes, and Begoña Arechabaleta Regulez

DOI: 10.4324/9780367815783-7

Introduction

In this chapter, we discuss the picture-sentence matching task and the grammaticality judgment task as they have been used to investigate comprehension of morphosyntax in children learning Spanish and English as a second language during the elementary school period. We provide a general description of each method, its assumptions, rationale, and procedure, and the types of structures that we have tested using these methods in different languages. We offer concrete examples of the advantages and challenges of using these methods based on our work and discuss implications for the field of child SLA.

Studying Morphosyntax

One of the main goals of linguistic theory is to understand the knowledge and use of language in mature adult native speakers; a related goal is to find out how linguistic knowledge is acquired or develops in young children (Chomsky, 1986). The same questions have guided much linguistic research in child and adult second language acquisition (White, 1989, 2003). Linguistic knowledge is represented in the mind and is only observable and measurable through language behavior during production and comprehension. With their emphasis on linguistic competence, Chomsky’s theories of language and linguistic knowledge have been extremely influential in advancing research on the acquisition of syntax (sentence structure), semantics (meaning of words and sentences), and morphology (the structure of words and morphemes), especially to understand unconscious knowledge of language that goes beyond what speakers say. Morphosyntax, the focus of this chapter, is the study of morphemes (smaller parts of words) that perform sentence-level


functions, such as inflection for subject-verb agreement, or case, which marks the function of noun phrases in a sentence (subject, object, etc.). The development of research methodologies to achieve these goals has been critically important to the field, especially because methodology must be tightly linked to our research questions, the hypotheses driving our studies, and the conclusions drawn from the data collected. Many of the methodologies used with adult native speakers and second language speakers can be adapted for children ages 4 to 14, taking into account the nature of the phenomenon that is the focus of the study, the cognitive demands of the tasks, the age of the children, and their experience with literacy. Language production is perhaps the most direct way to test knowledge of language, especially with very young children: if a learner produces a particular linguistic expression (such as a word, a phrase, or a sentence), we assume that the expression is part of the learner’s linguistic knowledge. However, children and adult speakers have knowledge of linguistic expressions that they may not produce themselves but that they have heard or read (in the case of older children). Conversely, the production of certain frequent formulaic expressions may reflect rote memorization rather than linguistic knowledge. For these reasons (among others), knowledge of language is also examined through receptive methods that rely on comprehension. In this chapter, we discuss two receptive methods that have been widely used with school-age monolingual and bilingual children to study knowledge and comprehension of syntax, semantics, and inflectional morphology: the picture-sentence matching task and the grammaticality judgment task. 
In comprehension-based experiments of this sort, as in any type of experiment, linguistic behavior can be influenced by the task itself, the experimental materials, the context and background information, and the order of presentation of the linguistic stimuli (Schmitt & Miller, 2010). For each of the two methods, we give a brief description, illustrate its use in several studies with child L2 learners, and discuss its advantages and challenges based on our own experience.

The Picture-Sentence Matching Task

Description

The picture-sentence matching task is a comprehension-based procedure in which children hear and/or read a sentence and are presented with two or more pictures. They need to choose the picture that best describes the sentence. In one of our studies currently in progress (Montrul, 2020–2023), whose goal is to investigate the later language development of complex syntactic structures, we are using the picture-sentence matching task to test knowledge and comprehension of pronouns and word order in Spanish, passive and active sentences, and three types of relative clauses in Spanish and in English in 9-year-old monolingual English, monolingual Spanish, and bilingual Spanish-English children in the


104 Silvina Montrul et al.

FIGURE 7.1a,b Prompt: “The man was bitten by the squirrel”

United States. Example 7.1 (see Figure 7.1a,b) is a stimulus from one of our tasks. (Squirrels are common in the area where these children are tested.) The picture-sentence matching task has been widely used in first language acquisition studies with both typically developing children and clinical populations to assess semantic interpretations of morphosyntactic contrasts, such as affirmative and negative sentences, subjects and objects, present progressive and future tense, singular vs. plural morphology, present progressive and past tense, mass and count nouns, passives and actives, direct and indirect objects, pronouns and anaphors, and many more (Gerken & Shady, 1996; Schmitt & Miller, 2010). This receptive task has been used to assess whether children have knowledge of structures they do not yet produce or do not produce often enough in order to determine developmental sequence or timing of acquisition of certain structures (e.g., active vs. passive), to compare children’s production and comprehension of the same forms, and to infer the nature of children’s morphosyntactic representations by examining the systematicity of comprehension errors they make.

An example of the picture selection task used to assess monolingual children’s comprehension of plurality in English comes from a study by Johnson et al. (2005), who investigated whether 62 mainstream English-speaking children between the ages of 3 and 6 were able to understand that the 3rd person singular –s marks number. The task presented two pictures and a sentence in oral form. One picture had a single subject (singular) and the other had two or more subjects performing the same action (plural). The linguistic stimuli were designed to mask plurality on the noun by using verbs that began with s-, as in (1). This was done to ensure that the only cue children had to identify number in the subject of the sentence was the verb ending (pictures from Johnson et al., 2005, p. 322).


Child Bilingualism and L2 Acquisition 105

FIGURE 7.2

Source: Johnson et al., 2005

(1) a. The duck swims on the pond. (unmasked plurality)
    b. The ducks swim on the pond. (masked plurality)

The results of the masked plurality items revealed that although English-speaking children produce third person singular –s in their naturalistic speech, they do not accurately comprehend its meaning until around age 5; the 3- and 4-year-olds performed no better than chance. Johnson et al. (2005) suggested that difficulties in comprehension are due to the child’s inability to access the functional features of agreement in comprehension at this early age, thus making direct inferences about the nature of the children’s grammatical knowledge from their behavior in this task.


How the Method is Used in Child SLA

The picture-sentence matching task has also been used with school-age bilingual children (ages 6 and above) and L2 learners from a variety of first language backgrounds to test different morphosyntactic and semantic properties. For example, Serratrice (2007) used a picture-sentence matching task (adapted from a version developed for adults by Tsimpli et al., 2004) to investigate the interpretation of null and overt subject pronouns in Italian in Italian-English bilingual children with a mean age of 8:6 (6:11–9:11). In Italian and Spanish, the verb carries agreement morphology that differs by person, so the subject of the sentence can be identified from the verb alone. For this reason, overt subject pronouns can be omitted in favor of null subject pronouns (Canto “I sing” instead of Io canto “I sing”). Italian and Spanish are null subject languages because the subject noun phrase or pronoun can be elided. English is not a null subject language: a sentence must always have an overt subject. A sentence like Sing, meaning I sing, is not possible because English does not have sufficient morphological information in the verb to tell us what the subject is. Serratrice (2007) investigated the interpretation of subject pronouns in adverbial subordinate clauses in anaphoric contexts, as in (2a), and in cataphoric contexts, as in (2b), where the adverbial subordinate clause precedes the main clause.


(2) a. Il nonno parla al nipote, mentre lui legge un libro. (anaphoric)
       “The granddad talks to the grandson while he reads a book.”
    b. Mentre lui legge un libro, il nonno parla al nipote. (cataphoric)
       “While he reads a book the granddad talks to the grandson.”

With this test, Serratrice wanted to know if children interpret overt subjects (for example, lui, the subject of the subordinate clause that refers to nipote, the object of the main clause) like adults, regardless of whether the subordinate clause with the pronoun precedes or follows the main clause containing the referent of the pronoun. In null subject contexts, where the subject of the subordinate clause is a null subject, the preferred interpretation by adult native speakers is that the referent is the nonno, the subject of the main clause, again regardless of the position of the subordinate clause. The experiment included four different conditions manipulating the type of subject pronoun (null/overt) and the position of the subordinate clause (anaphoric vs. cataphoric context) with five experimental items each, for a total of 20 items. Each child was tested individually, and, before the actual experiment, there was a two-item practice/trial to make sure the children understood the task. Serratrice found that the children performed like the adults with the null subject condition but unlike the adults with the overt subject conditions. The cataphoric context (2b) proved more cognitively and linguistically difficult than the anaphoric condition (2a) for the children, since the former required the children to keep the pronoun in memory until they processed the referent later in the sentence. Etxebarria (2021) used a version of this task to investigate the interpretation of null and overt subjects in Spanish and Basque in monolingual Spanish and bilingual Basque-Spanish school-age children.
Through extensive piloting with 6-to 14-year-old children in the Basque Country, Etxebarria found that the cataphoric condition was very difficult for the children, especially the younger ones, and had to be dropped from the experiment. Serratrice’s results suggest that the children’s performance on the cataphoric condition should be taken

FIGURE 7.3

Source: Serratrice, 2007


with caution since it was not replicated in another context, unlike the anaphoric condition.

Following Pérez-Leroux (2005) and Herschensohn et al. (2005), Morales-Reyes (2014) and Morales-Reyes and Montrul (2020) used the picture-sentence matching task to investigate verbal plurality in Spanish and verbal singularity in English in a study of child L2 learners of Spanish and child L2 learners of English. Following Johnson et al. (2005), Morales-Reyes compared the production and comprehension of 3rd person singular –s in English and third person plural –n in Spanish present tense. The English and the Spanish versions of the tasks were administered to 32 English-speaking children (ages 7:7–9:9, mean 8:9) learning Spanish as L2 in a full immersion school in the United States and 32 Spanish-speaking children (ages 7:5–10:11, mean 9:1) learning English as L2 in a full immersion school in Puerto Rico. The main goal of this study was to test whether acquiring verbal agreement morphology in Spanish is easier than in English due to the morphological properties of the languages (sparse morphology in English, rich morphology in Spanish). Stimulus sentences evaluated the L2 children’s ability to use the verbal inflection as the sole reliable cue to identify the number of the subject noun phrase. In both the English and Spanish comprehension tasks, pictures were presented in sets of three (i.e., one single object picture, one plural object picture, and a distractor) along with a pre-recorded stimulus sentence, as shown in Figures 7.4 and 7.5. Both tasks included three practice items, 10 target items, six control items, and three filler items.

(3) a. Ella salta la cuerda. “She jumps/skips rope.”
    b. Ellas saltan la cuerda. “They jump/skip rope.”
(4) a. The elephant spills the paint.
    b. The elephants spill the paint.

The pictures were presented along with the singular or the plural stimulus, which the children could hear.
The three practice items, as well as three of the fillers from the experimental section, were null subject sentences (e.g., Ø Ve televisión, “He watches television”) in the Spanish task and present progressive sentences (e.g., The girl is taking a nap) in the English task. The control items included in the experimental sections were sentences with overt subjects, none of which masked the plurality of the noun phrase (Spanish task: Ella corre en las mañanas, “She runs in the mornings”; English task: The boy writes a letter). To randomize the items and limit the number of items the children would see, two experimental lists of each task were created (Form A and Form B), so that each child would see half of the total items in the test. The target (masked) and the control (non-masked) sentences were counterbalanced, and each task form consisted of one practice section with two singular stimuli and one plural


FIGURE 7.4


FIGURE 7.5

stimulus. The sentences included in both tasks were recorded by either a Spanish native speaker or an English native speaker. The speakers were instructed to read the sentences at a natural speech rate and not to pause between the subject and the verb of the sentence. In natural connected speech, speakers would not pause between the noun phrase and the verb, and in sentences such as the ones included


in the experiments, both the –s of the noun phrase and the –s of the verb would merge into one sound, resulting in no clear cues for word boundaries. In this way, participants were forced to rely exclusively on the verbal inflection –n or –s to correctly identify the number of the subject of the sentence. The task was first piloted and normed with adult Spanish and English speakers. Accuracy in Spanish was higher than in English, suggesting that learning inflection in English is more difficult than in Spanish. The results of the children showed higher accuracy with agreement comprehension in the children learning Spanish than in the children learning English as L2, suggesting that learning Spanish agreement morphology is easier than learning English agreement morphology.
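The two-list design described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the study: the item identifiers, the `build_forms` helper, and the rule that each form receives the singular version of half of the items and the plural version of the other half are all hypothetical.

```python
import random

def build_forms(item_ids, seed=0):
    """Split singular/plural item pairs into two counterbalanced lists.

    Assumption (not from the study): every item has a singular and a
    plural version, and each form shows exactly one version per item.
    """
    form_a, form_b = [], []
    for index, item_id in enumerate(item_ids):
        # Alternate which form receives the singular version.
        if index % 2 == 0:
            form_a.append((item_id, "singular"))
            form_b.append((item_id, "plural"))
        else:
            form_a.append((item_id, "plural"))
            form_b.append((item_id, "singular"))
    # Shuffle presentation order within each form, reproducibly.
    rng = random.Random(seed)
    rng.shuffle(form_a)
    rng.shuffle(form_b)
    return form_a, form_b

# Hypothetical identifiers for ten target items.
targets = [f"target_{n:02d}" for n in range(1, 11)]
form_a, form_b = build_forms(targets)
```

With ten items, each child assigned to one form hears ten sentences rather than twenty, and every sentence version is still heard by the children assigned to the other form.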

Challenges and Recommendations

In general, the picture-sentence matching task can be very useful and successful in obtaining information about L2 children’s comprehension of morphosyntactic and semantic aspects of their L2 or one of the languages of developing bilinguals. The main challenge of this method is in designing the stimuli and the pictures. In our experience, it is important to choose pictures that are clear and that correctly capture what one wants to test. The pictures should be similar in design, either drawn by the same artist or constructed using the same drawing program, to avoid pictures being too different from each other or some pictures being more salient (more colorful, busier, etc.) than other pictures. Picture quality can affect the responses. It is also a good idea to norm the pictures by themselves, without sentences, with adult native speakers to make sure the pictures clearly depict what they were designed to depict and that they are culturally appropriate for the children tested. Today, this can easily be done with Amazon Mechanical Turk, a crowdsourcing website through which remotely located “crowdworkers” can be hired to perform tasks such as these. It is operated under Amazon Web Services and owned by Amazon. The next step is to run a pilot test with adult native speakers to make sure that the sentences work as intended with the pictures. If the pictures are unclear or too visually busy, they can prove cognitively challenging to children and may fail to test what we really want to test. For example, in a current study testing the interpretation of accusative case marking with specific and nonspecific direct objects (Coşkun & Montrul, 2021) in monolingual and bilingual Turkish-speaking children and adults, we found that depicting “specificity” (specific vs.
nonspecific noun phrases) is quite challenging to draw or to convey with just a picture, and the responses on nonspecific items are quite variable so far, even for the adult native speakers. Depending on the age of the child, it is also important to include filler and control items so that children do not develop a strategy and figure out what the task is about. Similarly, counterbalancing the order of pictures and sentences and possibly creating two experimental lists is always good experimental practice, and


we would recommend following those practices when working with children. The setting of the testing is also important. First, the test administrator must establish a good rapport with the child, perhaps by engaging in some child-friendly conversation or a short game, and thus gain the child’s trust. In our experience, it is better to conduct the test in an environment with few distractions (such as an office with little furniture, a lab room, or a room in a library) because children are easily distracted, especially those under age 9. This is difficult to accomplish if the testing must take place in a school setting. In a school setting, many interruptions can occur during test administration (people walking by or entering the testing room), and this can affect the number of pauses and breaks the administrator must make during testing. During administration, it is advisable to have a couple of trials and practice items with the children to make sure they understand the task. All these stages add time to the test administration.

Data analysis is straightforward, as responses are coded as correct or incorrect. A problem may arise if a child chooses both pictures, and a decision needs to be made as to how to code such a response. In general, this type of problem has not arisen in our experience because we found that children tend to be more categorical and choose one or the other picture rather than choosing both. If there are only two pictures, chance performance (the accuracy expected from guessing alone or from always choosing the same picture based on position, for example) is 50%; if there are three pictures, chance performance is 33.3%, and if there are four pictures, chance performance is 25%. Results need to be analyzed statistically against chance, and obtaining an adjusted measure of sensitivity (A’) is also a good way to check for children’s response bias.
Finally, with online methods, it is possible to collect reaction time if the task is performed through software or programs that measure reaction times (see Morales-Reyes & Arechabaleta Regulez, 2017). Reaction time refers to the time it takes participants to process the linguistic stimuli during the task and answer each question. With the advent of eye-tracking technology, the picture-sentence matching task is ideal to test comprehension in a less cognitively demanding way if the child is just asked to look at the pictures without having to make a choice. We have also adapted the picture selection task to investigate knowledge of gender in nouns in children acquiring Spanish as an L2 (Arechabaleta Regulez, 2016), and it has proved very successful and fun for the children tested. An advantage of these tasks is that we can use grammatical sentences that the children have encountered in their linguistic environment, making this task more ecologically valid.
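The chance levels given above, and the idea of testing a score against chance, can be made concrete with a short Python sketch. It is a minimal illustration, not the analysis used in any of the studies cited: it assumes that every response is independent and that a guessing child picks each picture with equal probability. The function names and the example score are hypothetical.

```python
from math import comb

def chance_level(n_pictures):
    """Expected accuracy from pure guessing among n_pictures options."""
    return 1.0 / n_pictures

def p_at_or_above(n_correct, n_items, chance):
    """Exact one-sided binomial probability of scoring at least n_correct
    out of n_items if every response were a guess with success rate chance."""
    return sum(
        comb(n_items, k) * chance**k * (1 - chance) ** (n_items - k)
        for k in range(n_correct, n_items + 1)
    )

# Hypothetical child: 9 of 10 target items correct in a two-picture task.
p_value = p_at_or_above(9, 10, chance_level(2))
```

In this hypothetical case the probability of reaching 9 or more correct by guessing is 11/1024, or about 0.011, so the score would be unlikely under pure guessing. Group-level comparisons would normally call for a fuller statistical model than this per-child check.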

The Grammaticality Judgment Task

Description

In the generative grammar tradition, syntactic data have mostly consisted of native speaker judgments of sentences. The grammaticality judgment task presents a


participant with a list of grammatical and ungrammatical sentences and the participant judges whether each sentence is acceptable or unacceptable or correct or incorrect based on their intuitions. Although the terms grammaticality judgment task (GJT) and acceptability judgment task (AJT) are often used interchangeably, Ionin and Zyzik (2014), citing Cowart (1997), consider that AJT should be used instead of GJT because experimental tasks can test the acceptability of a given sentence by speakers, and the speakers’ judgments can consequently be used to make inferences about grammaticality, which is an abstract notion. Closely related to AJTs are preference tasks, which ask participants to choose between two or more words, phrases, or sentences. The difference between the two is that AJTs ask for a judgment of a given linguistic form, whereas preference tasks ask participants to choose the best form for a given context. In all versions of an AJT, the participants’ responses inform the researchers about the status of particular linguistic forms in the learners’ grammar. In some versions of the task, the sentences are presented in isolation either in written form or in auditory form (depending on the age and level of literacy of the participants). Table 7.1 shows some of the ungrammatical experimental sentences used by McDonald (2008). In other versions of the task, the stimulus sentences can be presented with a context, either verbal (another sentence or a short story) or visual (a picture or a sequence of pictures). The types of responses can be binary (correct, incorrect), or on an acceptability Likert scale (1 = completely unacceptable, 5 = perfectly acceptable) depending on the linguistic phenomenon. In this chapter we discuss these tasks as used with children. For a discussion of the types of AJTs used with literate adults and second language learners, see Ionin and Zyzik (2014).
TABLE 7.1 Examples of ungrammatical stimuli from a grammaticality judgment task

Construction                   Example
Word order                     *The teacher the test grades.
Yes/No questions               *Drives the teacher a really fancy red car?
Article                        *The lady drove same car for the last twenty years.
Wh-question                    *What you think about the new coach?
Regular plural                 *There are twenty flute in our marching band.
Present progressive            *The little girl is play with the dolls.
Regular past                   *Last night my friend walk home after dark.
3rd person singular present    *The boy jump whenever he is startled.
Irregular plural               *Several of the mans decided not to go to the football game.
Irregular past tense           *Last week the pilot flied to Paris.

Source: McDonald, 2008, p. 254.

Acceptability judgment tasks differ from other comprehension-based tasks in that they usually rely on metalinguistic awareness, requiring the child to use his/her grammatical knowledge to assess the well-formedness rather than the meaning of a construction. Depending on the age and level of literacy of the child, the


task can also measure the degree to which knowledge of language is more or less explicit. So, in a sense, AJTs are more cognitively demanding than comprehension tasks like the picture-sentence matching task. Metalinguistic awareness develops in children around age 4 (Doherty & Perner, 1998), and we see that the earliest grammaticality judgment tasks have been used with children starting at about 4 years of age. Cairns et al. (2006), for example, tested 77 4-, 5-, and 6-year-old English-speaking children. The children were presented with well-formed and ill-formed versions of 10 different sentence types. They were asked to judge the grammaticality of the sentences and correct the ill-formed ones. The sentences were presented in an interview format developed by McDaniel and Cairns (1990). Cairns et al. (2006) found that acceptability judgment and correction ability improved with age and concluded that the ability to make acceptability judgments and to correct ill-formed sentences reflects the child’s developing ability to access syntactic knowledge consciously and to employ that knowledge in the processing of sentences.

How the Method is Used in Child SLA

The acceptability judgment task in general has been used substantially more with adults than with child L2 learners. However, there are a few studies that have used this methodology successfully with children. For example, Sorace et al. (2009) tested bilingual English-Italian children on the interpretation of generic and specific definite noun phrases in both English and Italian using a context-based AJT. The test sentences were presented aurally and tested four main conditions, as shown in (5).

(5) a. Here, strawberries are red.
    b. Here, the strawberries are red.
    c. In general, sharks are dangerous.
    d. In general, the sharks are dangerous.

The adverbial here sets up a specific interpretation, while in general sets up a generic interpretation, and the sentences were accompanied by pictures of prototypical objects (red strawberries or dangerous sharks) to set up a context. The children had to say whether the sentence sounded acceptable in English or in Italian. It is worth noting that, in this study, the English and Italian versions of the task were not fully comparable. In Italian, the sentences with bare plurals ([5a] and [5c]) are ungrammatical and the sentences with definite plurals ([5b] and [5d]) are grammatical; in order to provide a response, participants had to simply note whether the plural noun phrase was preceded by a definite article. In the English version, however, participants needed to pay attention to the adverb as well as the article: for the specific interpretation, established by here, the definite plural is required ([5b] is acceptable, [5a] is not); for the generic interpretation, established


by in general, the bare plural is required ([5c] is acceptable, [5d] is not). Sorace et al. (2009) found that, in English, the children, and to a lesser extent even the control adult participants, did not always take the adverbial into consideration when giving their responses. So, this study suggests that an AJT may not be the best way to test the interpretation of definite articles, probably because AJTs are better suited to testing aspects of grammar that are strictly grammatical or ungrammatical. Testing aspects of syntax or morphology that depend on semantics is challenging with this type of task.

Argyri and Sorace (2007) used an acceptability judgment task with video contexts to test 32 English-Greek 8-year-old children’s knowledge of subject-verb inversion in embedded interrogatives, null/overt subjects, and object pronouns in Greek. The study included three judgment tasks, one for each structure. Each task consisted of nine items (six stimuli and three fillers). The children watched a short video with a person and two hand puppets. The person spoke either English or Greek and asked a question. Each puppet uttered a response sentence (one a grammatical sentence, the other an ungrammatical or pragmatically infelicitous sentence) related to the main grammatical structures tested in the experiment and the children had to say which puppet said it better. So, in essence, this was a preference task. This same methodology was used by Shin and Cairns (2012) in their study of null and overt subject pronouns in monolingual Spanish-speaking children from Mexico.
Coming back to Argyri and Sorace’s (2007) study, the likely reason they ran three separate AJTs, one for each structure, is that the embedded interrogative and object pronoun items manipulated word order and were either grammatical or ungrammatical. The testing of null and overt subjects required a different setup, with two puppets being asked a question about the person (rather than the person asking the question) and the children were supposed to answer using a sentence with a null or an overt subject. The null subject sentence is the correct and preferred response in Greek; the overt subject response is still grammatically correct but pragmatically odd. In English, only the overt subject response is grammatical. Null subject responses are ungrammatical in English. So, once again, the two versions of the test were not strictly comparable. The bilingual children performed like monolingual controls in the object pronoun test and significantly differently with subject-verb inversion and null and overt subjects. The children were also tested on the same structures in elicited production; the results of subject-verb inversion and overt subjects were similar in the two tasks, suggesting that the grammaticality judgment task was successful overall at investigating the grammatical knowledge of these children.

Another example of the successful elicitation of grammaticality judgments with child L2 learners is Paradis (2010), who tested 43 French-English bilingual children in first grade in Alberta, Canada, on their knowledge of English inflectional morphology (3rd person singular agreement, past tense, copula be). Paradis used the TEGI (Test of Early Grammatical Impairment; Rice & Wexler, 2001), which includes elicited production and grammaticality judgment probes. The grammatical and ungrammatical sentences were presented by the


experimenter, who acted out a scenario with two robot toys that were learning English and asked the children to say if the sentences uttered were “said right” or were “not so good” in English. For scoring, Paradis calculated A’-scores, a measure of sensitivity to correct and incorrect responses. This measure takes into account the proportion of children’s correct rejections (of ungrammatical sentences), false alarms (incorrect acceptance of ungrammatical items), misses (incorrect rejection of grammatical items), and hits (correct acceptance of grammatical items). The results of hits and false alarms were entered into a formula to calculate an A’-score, which is designed to correct for “yes” biases. When the results of the production and the acceptability judgment parts of the test were compared for the bilingual children, more children reached the criterion score (i.e., the lowest score a child could obtain within the norms for his or her age group) for each morpheme tested in the acceptability judgment task than in the elicited production task. When the results of the elicited production and acceptability judgment were correlated, the correlation was significant. Paradis concluded that the French-dominant bilingual children were drawing on similar sources of knowledge to perform on both production and acceptability judgment tasks on verbal morphology in English.

The AJT has also proved appropriate for comparing linguistic knowledge in L2 children and L2 adults. Song and Schwartz (2009) tested the acquisition of Korean wh-constructions with negative polarity items by monolingual Korean children and by child and adult L2 learners of Korean with English as a first language. The L2 children were between the ages of 6 and 13. The canonical word order in Korean is subject (S)-object (O)-verb (V). Korean is also a wh-in-situ language, where question words do not move to the beginning of the sentence as in English.
Scrambling of the object to presubject position (i.e., movement that results in an OSV word order) is generally optional; however, in the context of negative questions with a negative pronoun subject (e.g., amwuto “anyone”), (a) object wh-phrases must scramble on the wh-question reading and (b) the nonscrambled order has a yes/no question reading. Song and Schwartz argue that these properties of Korean wh-constructions with negative pronouns are very problematic for L1 English speakers learning L2 Korean (as well as for native Korean-acquiring children). Participants completed an elicited production task, an acceptability judgment task, and an interpretation verification task. The acceptability judgment task consisted of a short story presented in English, a picture, and the Korean sentence to be judged as acceptable or unacceptable. The experimenter asked a children’s character, Bbung Byung, a question using four Korean word or phrase cards. Participants were to decide whether the sentence was okay in the context provided. They were given three choices: “yes,” “no,” and “I don’t know.” The task had 16 test items and 16 fillers in four experimental conditions. The children and the adults were successful at performing this task. The results showed that high proficiency (adult and child) L2 learners performed like the native adult


controls on all three tasks. The adult and child L2 learners followed the same (inferred) route to convergence, which was different from the route manifested by the Korean L1 children. In all the examples given above, the children’s linguistic knowledge was examined with multiple measures: an acceptability judgment task and an oral production task or an interpretation task. It was reassuring to see that the results of the tasks converged in all cases, suggesting that the acceptability judgment task was an appropriate methodology to evaluate children’s knowledge of morphosyntax and inflectional morphology.
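The A’ score mentioned above in connection with Paradis (2010) can be computed from a child’s hit rate and false-alarm rate. The sketch below uses the nonparametric A’ formula associated with Grier (1971); the chapter does not say which computing formula was used in the study, so this choice is an assumption. Here the false-alarm rate is taken to be the proportion of ungrammatical sentences that the child accepts, which is the definition relevant to a “yes” bias. The function name and the example rates are hypothetical.

```python
def a_prime(hit_rate, false_alarm_rate):
    """Nonparametric sensitivity index A' (0.5 = chance, 1.0 = perfect).

    hit_rate: proportion of grammatical sentences accepted.
    false_alarm_rate: proportion of ungrammatical sentences accepted.
    """
    h, f = hit_rate, false_alarm_rate
    if h == f:
        # No discrimination between sentence types, e.g. an always-"yes" child.
        return 0.5
    if h > f:
        return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))
    # Below-chance case: mirror image of the formula above.
    return 0.5 - ((f - h) * (1 + f - h)) / (4 * f * (1 - h))

# Hypothetical child who accepts 90% of grammatical and 60% of ungrammatical items.
score = a_prime(0.9, 0.6)
```

For this hypothetical child, A’ is about 0.77. A child who says “yes” to every sentence accepts all grammatical items but receives an A’ of only 0.5, which is how the measure guards against mistaking a response bias for grammatical knowledge.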

Challenges and Recommendations

In general, studies of L1 and L2 acquisition have found the grammaticality judgment task and versions of it to be quite effective with children older than 4 years of age. One of the primary considerations is the nature of the linguistic phenomenon that one wishes to target. Specifically, the researcher must determine whether the linguistic contrast can be evaluated out of context. If so, then a traditional AJT with a binary response scale may be appropriate. Traditional AJTs are typically limited to the study of morphosyntax but can be used to address very different research questions, both questions that focus on the status of a particular construction in the learners’ interlanguage and questions that address broader issues of proficiency and ultimate attainment. The main challenge of the AJT is the construction of the stimuli and the stimuli presentation so that the children understand what they are being asked to do. In our experience, it is very important to include a training stage, before administering the actual stimuli, to make sure the child understands what he or she has to do. As in other experiments, the number of items and conditions will depend on the age of the children, but there should be at least four items per condition testing the same structure, plus some fillers or distractors if possible, in order to make sure that children have productive knowledge of the structure and that it is not linked to one or two specific lexical items learned by rote. One of the problems with AJTs is that the training session often includes very simple and straightforward items and the stimuli sentences may be more complex, so it is advisable to choose clear items and some complex items for the training session as well.
AJTs have proven difficult to interpret when the grammaticality of the items or structures being tested depends on semantic or contextual factors that are difficult to control in isolated sentences, such as issues related to definiteness and specificity, if one is testing knowledge of definite articles, or case marking that is contextually determined. Semantic and pragmatic phenomena require context: the same grammatical sentence may be appropriate and felicitous in one context but inappropriate or infelicitous in another. The inclusion of context makes for another level of decision-making, including the type of context (acting out, sentential, paragraph,
or picture, as seen in the examples given) that will clearly depict the intended meaning. Moreover, the language of the context (L1 vs. L2) is an important factor that must be considered in light of the proficiency level of the learners. Piloting the task several times with a small group of children before actual administration is also critical so that the researcher makes sure the children can actually perform the task and that the task is eliciting what it is supposed to elicit. Something that works with adults but may not be appropriate for children is to ask them to correct responses they consider incorrect, which adds yet another layer of metalinguistic awareness that may not be developed at a young age. The study by Cairns et al. (2006) with monolingual children was able to elicit corrections. For sentence correction, we have also piloted the sentence repetition task with ungrammatical sentences (Lucy tiene un perrito negro vs. *Lucy tiene un negro perrito, “Lucy has a black doggie”), and we found that the children corrected the sentences with the wrong noun-adjective order without having to identify the ungrammaticality explicitly. Although the AJT has been successfully used to compare the linguistic knowledge of children and adults, there are some differences in the administration of AJTs in children and adults that must also be taken into account when planning such studies. With adults, it is possible to use rating scales to gauge acceptability judgments; with children, responses are binary (yes, no). Data analysis of children’s responses usually takes into account correct and incorrect responses as well as measures of sensitivity to rule out a “yes” bias. When testing children, a “yes” bias (acceptance of incorrect sentences as grammatical) is more likely in this task than in the sentence-picture matching task (Rice et al., 1999). 
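Sensitivity measures of this kind are commonly computed within signal detection theory. The sketch below is a minimal illustration, not the analysis used in the studies cited. It assumes that a “yes” to a grammatical sentence counts as a hit and a “yes” to an ungrammatical sentence as a false alarm, and it applies a log-linear correction (adding 0.5 to each count) so that rates of exactly 0 or 1 do not produce infinite values. The function name and the response counts are hypothetical.

```python
from statistics import NormalDist


def sensitivity(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion) for one child's binary judgments.

    Assumed coding (illustrative, not from the cited studies):
      hit               = "yes" to a grammatical sentence
      miss              = "no"  to a grammatical sentence
      false alarm       = "yes" to an ungrammatical sentence
      correct rejection = "no"  to an ungrammatical sentence
    """
    # Log-linear correction keeps both rates strictly between 0 and 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)

    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)            # discrimination
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # negative = "yes" bias
    return d_prime, criterion


# Hypothetical child who accepts every sentence (8 grammatical, 8 ungrammatical).
yes_sayer = sensitivity(hits=8, misses=0, false_alarms=8, correct_rejections=0)

# Hypothetical child who discriminates well.
discriminator = sensitivity(hits=7, misses=1, false_alarms=1, correct_rejections=7)

print(yes_sayer, discriminator)
```

Under these assumptions, a d′ near zero together with a negative criterion would suggest that a child is accepting most sentences regardless of their grammaticality, whereas a clearly positive d′ indicates that grammatical and ungrammatical items are being told apart.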
An important difference between AJTs administered to children and adults is that the stimuli are presented orally or auditorily with children, whereas they are very often presented in written form with adults. If the experimenter utters the sentences while conducting the test in an interactive manner, it should be done in a manner that does not affect the pronunciation or emphasis of different sentences. A different emphasis could affect the way children respond to different items. For example, Rice et al. (1999) describe how examiners must be trained to administer the task in a manner that will minimize giving out prosodic cues (intonational prominence or stress) for different items. A potential challenge of administering oral acceptability judgment tasks with children concerns the length and complexity of the sentences. When children are presented the stimuli orally, they must hear the sentence and comprehend it before they can assign a judgment. Therefore, it becomes crucial to also take into account issues related to working memory and processing. McDonald (2008) tested how monolingual English-speaking children’s ability to judge grammatical and ungrammatical sentences related to measures of working memory span and phonological ability. She found that grammaticality judgments on word order, –ing, regular and irregular past tense, and 3rd person singular –s were predicted by
working memory capacity. Phonological ability only predicted performance on items with plurals and 3rd person singular –s. While these measures have been correlated with judgments in the L1, it would be appropriate to consider how working memory and phonological ability may also apply to children assessed in their L2.

Conclusion

We have given a brief overview of two receptive measures used in child L2 acquisition: the picture-sentence matching task and the grammaticality, or acceptability judgment, task. These tasks have been widely used in L1 acquisition and with adult L2 learners, as well as with child L2 learners. Together, the two tasks provide a valuable set of methodological options for SLA researchers assessing receptive knowledge of language in school-age children and working with various theoretical frameworks. Although the GJT, or AJT, has been associated with the generative linguistic tradition, it is not a measure of actual competence, but another measure of linguistic performance that requires intuitive knowledge of one’s language and some metalinguistic awareness. Both the picture-sentence matching task and the AJT are also very appropriate for making comparisons between child and adult populations, especially because these tasks have been designed for adults and adapted for children. Finally, we have shown that many of these tasks can be used in combination with other tasks to yield a more complete and more insightful picture of learners’ knowledge. All types of judgment and interpretation tasks can be used in conjunction with production measures, allowing researchers to examine whether the same or different factors are at work in learners’ production and comprehension. As online measures of language processing become more common and accessible, future studies could use these to make the tasks less cognitively demanding for children and to obtain more sensitive processing data. How the responses to these tasks correlate with other cognitive or metalinguistic measures in child L2 acquisition and bilingualism remains to be investigated.

Further Readings

Blom, E., & Unsworth, S. (Eds.). (2010). Experimental methods in language acquisition research. Amsterdam: John Benjamins. [This volume describes the dos and don’ts of different experimental methods of production, judgment, and comprehension used in first, second, and bilingual language acquisition by children and adults.]
Kail, M. (2004). On-line grammaticality judgments in French children and adults: A crosslinguistic perspective. Journal of Child Language, 31(3), 713–737. https://doi.org/10.1017/S030500090400649X [This is an example of how the same research tool can be adapted for use with children and adults.]
McDaniel, D., McKee, C., & Cairns, H. (Eds.). (1996). Methods for assessing children’s syntax. Cambridge, MA: The MIT Press. [This volume brings together researchers who have pioneered and used receptive and productive tasks to assess syntactic knowledge in young children. Each chapter discusses a methodology (the truth-value judgment task, sentence repetition, etc.) in depth.]
Rice, M., & Wexler, K. (2001). The test of early grammatical impairment. San Antonio, TX: The Psychological Corporation. [This is a standardized measure of knowledge of tense in English used with typically developing English-acquiring children and children with language impairment. The manual describes the elicitation techniques used in great detail as well as instructions for data analysis and interpretation.]

Tools and Resources

• IRIS database: A digital repository of instruments and materials for research into second languages. www.iris-database.org/iris/app/home/index;jsessionid=FBEEC97F6746E7C939D88B5843B85ABE
• Pixton.com: This is a computer program for educators, parents, and students to create comics and pictures. It is ideal for creating different versions of the same picture. The picture in Example 1 was drawn using Pixton.
• PsyScope: This software allows the user to design and run psycholinguistic experiments without the need for programming. It measures reaction time responses to various stimuli and tasks performed on a computer. However, the software only runs on Mac.
• PsychoPy: This software is similar to PsyScope, as it allows the user to design and run psycholinguistic experiments. It also has the added benefit of running on Mac, Windows, and Linux. However, it demands more programming knowledge, as experiments are generated by writing Python scripts.

Discussion Questions

1. Explain why it is important to test not only children’s comprehension but also their production.
2. Describe difficulties that researchers may encounter when using the same experimental tasks to test children and adults together.
3. A researcher has the following research question: “Do four-year-old children comprehend the grammatical meaning of the English plural morpheme –s, regardless of its pronunciation?” For example, will the children be equally successful in comprehending the plural marker [–s] in bats, the plural marker [–z] in bugs, and the plural marker [–ɪz] in buses? Design a task that the researcher could use to address this research question.

References

Arechabaleta Regulez, B. (2016, September 22–25). Processing of Spanish grammatical gender at early stages: A comparison between child-L2 and adult-L2 learners [Paper presentation]. 35th Second Language Research Forum, Columbia University, New York, NY, United States. www.tc.columbia.edu/slrf2016/program/
Argyri, E., & Sorace, A. (2007). Crosslinguistic influence and language dominance in older bilingual children. Bilingualism: Language and Cognition, 10(1), 79–99.
Cairns, H., Schlisselberg, G., Waltzman, D., & McDaniel, D. (2006). Development of a metalinguistic skill: Judging the grammaticality of sentences. Communication Disorders Quarterly, 27(4), 213–220.
Chomsky, N. (1986). Knowledge of language: Its nature, origin and use. New York: Praeger.
Coşkun, A., & Montrul, S. (2021). Sources of variability in the acquisition of differential object marking by Turkish heritage language children in the United States. Bilingualism: Language and Cognition, 1–14. https://doi.org/10.1017/S1366728921001000
Cowart, W. (1997). Experimental syntax. Sage.
Doherty, M., & Perner, J. (1998). Metalinguistic awareness and theory of mind: Just two names for the same thing? Cognitive Development, 13(3), 279–305.
Etxebarria, E. (2021). Subject expression in Spanish in contact with Basque and in Spanish-Basque bilingualism [Unpublished doctoral dissertation]. Department of Spanish and Portuguese, University of Illinois at Urbana-Champaign.
Gerken, L. A., & Shady, M. (1996). The picture selection task. In D. McDaniel, C. McKee, & H. Cairns (Eds.), Methods for assessing children’s syntax (pp. 125–146). The MIT Press.
Herschensohn, J., Stevenson, J., & Waltmunson, J. (2005). Children’s acquisition of L2 Spanish morphosyntax in an immersion setting. IRAL: International Review of Applied Linguistics in Language Teaching, 43(3), 193–217.
Ionin, T., & Zyzik, E. (2014). Judgment and interpretation tasks in second language research. Annual Review of Applied Linguistics, 34, 37–64.
Johnson, V., de Villiers, J., & Seymour, H. N. (2005). Agreement without understanding? The case of third person singular /s/. First Language, 25(3), 317–330.
McDaniel, D., & Cairns, H. S. (1990). The child as informant: Eliciting intuitions from young children. Journal of Psycholinguistic Research, 19(5), 331–344.
McDonald, J. (2008). Grammaticality judgments in children: The role of age, working memory and phonological ability. First Language, 35(2), 247–268.
Montrul, S. (Principal investigator). (2020–2023). Validating new measures of later language development with Spanish and English monolinguals and bilinguals (Project No R03HD09949) [Grant]. Eunice Kennedy Shriver National Institute of Child Health and Human Development. https://projectreporter.nih.gov/reporter_searchresults.cfm
Morales-Reyes, A. (2014). Production and comprehension of verb agreement affixes by Spanish and English Child L2 learners [Unpublished doctoral dissertation]. University of Illinois at Urbana-Champaign.
Morales-Reyes, A., & Arechabaleta Regulez, B. (2017). Are lions green? Child L2 learners’ interpretation of English generics and definite determiners. Languages, 2(4), 22. https://doi.org/10.3390/languages2040022
Morales-Reyes, A., & Montrul, S. (2020). Variational learning in child L2 acquisition. Language Learning and Development, 16(3), 211–230.
Paradis, J. (2010). Bilingual children’s acquisition of English verb morphology: Effects of language exposure, structure complexity and task type. Language Learning, 60(3), 651–680.
Pérez-Leroux, A. T. (2005). Number problems in children. In C. Gurski (Ed.), Proceedings of the 2005 Canadian Linguistic Association Annual Conference. Available online at http://ling.uwo.ca/publications/CLA-ACL/CLA-ACL2005.htm.
Rice, M., & Wexler, K. (2001). The test of early grammatical impairment. Examiner’s manual. The Psychological Corporation.
Rice, M., Wexler, K., & Redmond, S. (1999). Grammaticality judgments of an extended optional infinitive grammar: Evidence from English-speaking children with SLI. Journal of Speech, Language and Hearing Research, 42(4), 943–961.
Schmitt, C., & Miller, K. (2010). Using comprehension methods in language acquisition research. In E. Blom & S. Unsworth (Eds.), Experimental methods in language acquisition research (pp. 35–56). John Benjamins.
Serratrice, L. (2007). Cross-linguistic influence in the interpretation of anaphoric and cataphoric pronouns in English–Italian bilingual children. Bilingualism: Language and Cognition, 10(3), 225–238.
Shin, N. L., & Cairns, H. S. (2012). Subject pronouns in child Spanish and continuity of reference. In Selected proceedings of the 11th Hispanic Linguistics Symposium (pp. 155–164). Cascadilla Proceedings.
Song, H., & Schwartz, B. (2009). Testing the fundamental differences hypothesis. L2 adult, L2 child and L1 child comparisons in the acquisition of Korean wh-construction with negative polarity items. Studies in Second Language Acquisition, 31(2), 323–361.
Sorace, A., Serratrice, L., Filiaci, F., & Baldo, M. (2009). Discourse conditions on subject pronoun realization: Testing the linguistic intuitions of older bilingual children. Lingua, 113(3), 460–477.
Tsimpli, I., Sorace, A., Heycock, C., & Filiaci, F. (2004). First language attrition and syntactic subjects: A study of Greek and Italian near-native speakers of English. International Journal of Bilingualism, 8(3), 257–277.
White, L. (1989). Universal grammar and second language acquisition. John Benjamins.
White, L. (2003). Second language acquisition and universal grammar. Cambridge University Press.


8 EYE-TRACKING METHODS IN CHILD SLA RESEARCH
Paola E. Dussias and Karen Miller

Introduction

This chapter presents an overview of the advantages and challenges of eye-tracking research methods with young second language (L2) learners (ages 4–12). We first review studies that use eye tracking during reading and studies that use the visual world paradigm to study spoken language comprehension. We then discuss how these methods can be paired with production data (e.g., child-caregiver corpus studies, elicitation tasks; see also Chapter 6 in the present volume) to inform our understanding of the L2 input to children as well as children’s knowledge of language across development. We also describe a few illustrative studies showing how each technique has been employed to study child L2 learners.

DOI: 10.4324/9780367815783-8

Using Eye Tracking to Study Reading Comprehension

The recording of eye movements to study reading comprehension is a popular method, partly because several decades of eye-movement research have generated detailed information about how visual information is processed while our eyes move across a line of text (Ehrlich & Rayner, 1981; Rayner, 1983; for reviews of the method, see Conklin et al., 2020; Dussias, 2010; Godfroid, 2020; Huettig et al., 2011). For example, we know that, when we read, our eyes do not move smoothly across lines of text as our experience as readers might suggest. Instead, our eyes make small jumps called saccades to move through text. College-age readers normally make about three to four saccadic movements per second, each lasting between 20 and 40 milliseconds. Some saccadic movements are forward (or rightward) movements and others are regressive (or leftward). The average length of a saccade in languages such as English is approximately eight letter spaces, although it can range between 2 and 18 letter spaces, with long saccades typically occurring following a regression. In alphabetic languages with left-to-right script direction (e.g., Dutch, English, Spanish), approximately 10% of the time, readers perform regressive saccadic movements to go back to material that has already been read. Saccades are separated by moments during which the eyes remain relatively still. These are called fixations and allow readers to extract information about the text. An average fixation lasts approximately 200–250 milliseconds, although a fixation can range from a little under 100 milliseconds to more than 500 milliseconds. The variability in fixation duration and saccadic length is thought to be associated with the cognitive processes related to the ease or difficulty of comprehending text (Rayner, 1998). An important question in reading research concerns the amount of information that readers acquire at each fixation. Experiments using different paradigms have been instrumental in providing answers. For example, in alphabetic languages, we know that the perceptual span (the region from which readers are able to acquire useful information) is asymmetric in size. For readers of left-to-right languages, the span extends from not more than three or four letter spaces to the left of a fixation to approximately 14–15 letter spaces to the right of the fixation. For readers of right-to-left languages (e.g., Hebrew), the span is also asymmetric but in the opposite direction; it is larger to the left of the fixation than to the right of it (Pollatsek et al., 1981). Interestingly, saccade size, fixation duration, and perceptual span have all been found to be smaller for more densely packed writing systems like Japanese. Mean saccadic size for kanji-based text in Japanese is smaller than in English, averaging about 5.8 characters.
Mean fixation duration in kanji is also shorter than in English, averaging 168 ms, and the perceptual span for readers of Japanese is smaller as well, at about 6 character spaces to the right of a fixation and 7 character spaces to the left of a fixation (Ikeda & Saida, 1978). For combined kana-kanji script, which is frequently encountered in Japanese text, the span extends 6 character spaces to the right of a fixation (Osaka, 1992). Several decades of eye-movement research have shown that there are systematic relations between fixation durations and word frequency, word length (defined in terms of number of letters), and contextual predictability (Rayner & McConkie, 1976; Rayner et al., 2007; Godfroid, 2020). Readers spend more time fixating on low-frequency or less commonly used words than on words that occur more frequently in daily speech or printed material. To illustrate this point, fixation duration for a high-frequency word like rain in “The heavy rain damaged the crops” is shorter than for a lower-frequency word like hail, which is matched for length, number of syllables, meaning, and sentence frame (O’Regan & Lévy-Schoen, 1987; Rayner & Pollatsek, 1987). We also know that the amount of time spent fixating a word increases as word length increases (see table 1 in Rayner et al., 1996, for an example). Longer words are also more likely to be re-fixated than shorter words, and words that are likely to be skipped are short function words (such as on, the, and at). Eye movements are also influenced by variations in the content of the sentence. In this respect, predictable words (i.e., words that
can be anticipated with high accuracy given the preceding context) have immediate effects on fixation duration. Ehrlich and Rayner (1981) showed that readers tend to look at predictable words (e.g., shark in “The coastguard had warned that someone had seen a shark…”) less than unpredictable words (e.g., shark in “The zookeeper explained that the life span of a shark…”) and that readers skip over predictable words more frequently than unpredictable words.

What dependent variables are available to the investigator when collecting eye-movement records? For any critical region or regions of interest, a number of measurements can be distinguished. The earliest measure is first fixation duration, defined as the duration of the first fixation that lands on a region (whether a single word or a string of words). This measure appears to be sensitive to lexical information such as word frequency. The next measure is first pass time, which refers to the sum of all fixations in a region, from first entering it until the eyes first exit to the left or right of the region. On regions with only one word, first pass time equals gaze duration (e.g., Rayner & Duffy, 1986). First pass time has been found to be most informative in revealing detections of syntactic anomalies. Regression path time (the sum of all temporally contiguous fixations from the time the reader first enters the region of interest until advancing to the right beyond that region) has also been interpreted as a sensitive measure of immediate anomaly detection, given that readers often respond to processing difficulties by regressing to earlier portions of the sentence (e.g., Liversedge et al., 1998; Wilson & Garnsey, 2009). Another commonly used measure is second pass time, which refers to the time spent rereading a region after having left it (in other words, excluding first pass time). Finally, total time is the sum of all fixations in a region (effectively, the sum of first pass time and second pass time).
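To make the relationship among these measures concrete, the following minimal sketch computes them from a toy fixation record. It assumes that each fixation has already been assigned to a numbered region (regions increase from left to right) and that the region of interest was not skipped on first pass; the function name, data layout, and durations are hypothetical and are not taken from any particular eye-tracking package.

```python
def reading_measures(fixations, region):
    """Compute standard reading measures for one region of interest.

    fixations: list of (region_index, duration_ms) in temporal order.
    region:    index of the region of interest.
    """
    zero = {"first_fixation": 0, "first_pass": 0,
            "regression_path": 0, "second_pass": 0, "total": 0}

    # Position of the first fixation that lands in the region, if any.
    first_entry = next(
        (i for i, (r, _) in enumerate(fixations) if r == region), None)
    if first_entry is None:
        return zero  # the region was never fixated

    first_fixation = fixations[first_entry][1]

    # First pass: fixations from first entry until the eyes leave the
    # region in either direction.
    first_pass = 0
    for r, dur in fixations[first_entry:]:
        if r != region:
            break
        first_pass += dur

    # Regression path: every fixation from first entry until the eyes
    # move to the right of the region (regressions are included).
    regression_path = 0
    for r, dur in fixations[first_entry:]:
        if r > region:
            break
        regression_path += dur

    total = sum(dur for r, dur in fixations if r == region)
    return {"first_fixation": first_fixation,
            "first_pass": first_pass,
            "regression_path": regression_path,
            "second_pass": total - first_pass,
            "total": total}


# Toy record: the reader fixates region 2 twice, regresses to region 1,
# returns to region 2, and then moves on to region 3.
record = [(0, 200), (1, 220), (2, 250), (2, 180), (1, 210), (2, 190), (3, 230)]
print(reading_measures(record, region=2))
# first_fixation=250, first_pass=430, regression_path=830,
# second_pass=190, total=620
```

In this toy record, the regression to region 1 adds to regression path time but not to first pass time, which is why the two measures can dissociate when readers respond to difficulty by rereading earlier material.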
A major advantage of the eye-movement recording technique is that it allows researchers to obtain evidence about what is happening during the comprehension of a sentence moment by moment, as processing unfolds, without significantly altering the normal characteristics of either the task or the presentation of the stimuli. Eye movements are a normal characteristic of reading and, while eye-movement records are collected, participants are free to move their eyes along the printed line of text. Recent advancements in eye-movement technology also make available eye-tracking equipment that is extremely versatile and replaces traditional, fixed eye-tracking systems with more flexible head-mounted systems, or remote systems that do not require the use of a headband or head (i.e., chin or forehead) support. In addition, to obtain the dependent measure, participants are not required to perform a secondary task (such as a button or pedal press) that might disrupt the normal comprehension process. Furthermore, thanks to several decades of eye-movement research during reading, we have a very good understanding of the amount of visual information processed while our eyes fixate on text. Many recent studies using eye tracking during reading have contributed to a nuanced understanding of how the languages of L2 speakers are processed in an

integrated language system in which there is extensive interaction. The method has been used more frequently in adult than in child second language acquisition (SLA) studies, largely because of the literacy requirement when conducting reading studies. In Section 3 we will present several studies of how the method has been used with children.

Using Eye Tracking to Study Spoken Language Processing

One experimental eye-tracking methodology that has had great success in research examining spoken language processing is the visual world paradigm (e.g., Allopenna et al., 1998). The visual world combines experimental designs typically employed in eye tracking with spoken language comprehension studies. The paradigm has successfully been used to answer research questions related to virtually any area of spoken language comprehension in adult language processing (see Tanenhaus & Trueswell, 2006). The paradigm has also been used in research with monolingual children (e.g., Trueswell et al., 1999) as well as in studies involving child L2 speakers (e.g., Lemmerth & Hopp, 2019). Because the method is so versatile for studying child language processing, we will describe it in some detail below. Cooper (1974) is widely cited as the first scholar to have used the visual world paradigm to study real-time perceptual and cognitive processes. However, it was not until Tanenhaus et al. (1995) that researchers took notice of the strong link between eye movements and comprehension. In that study, Tanenhaus et al. presented adult participants with one of two visual scenes containing four objects: for example, a towel, a towel with an apple on it, a pencil, and an empty box. Participants were asked to carry out a simple task by auditory instruction, such as “Put the apple on the towel in the box.” The auditory instruction is syntactically ambiguous at the point when participants hear “towel” because it could be interpreted as the goal (i.e., move the apple to the empty towel) or as a modifier (i.e., move the apple that is on the towel to a to-be-named location).
The ambiguity is resolved at the moment when participants hear the actual goal, “box.” While listening to experimental instructions, participants’ eye movements were recorded by way of a head-mounted eye tracker (i.e., a light-weight wearable device placed over the participant’s head). Tanenhaus et al. found that, upon encountering the region of ambiguity, participants’ eye movements toward the incorrect target location (e.g., the empty towel) increased. In other words, participants’ eye movements reflected the local syntactic ambiguity, thereby confirming that eye movements are closely time locked to the unfolding auditory signal. Researchers using the visual world paradigm look for the presence of competitor and anticipatory effects relative to a neutral baseline. Competitor effects occur when multiple candidate representations are activated and compete for selection, whereas anticipatory effects are observed when upcoming linguistic material is predicted before it is heard or read. Competitor effects are taken to reflect delayed

processing, whereas anticipatory effects are interpreted as indexing facilitated processing. Typically, the proportion of fixations to pre-defined interest areas is calculated over trials and over participants and plotted on a millisecond timescale. This method of data visualization permits a closer inspection of how processing is affected in real time. A good illustration of how the competitor effect is employed to examine spoken language comprehension is found in a study by Allopenna et al. (1998). In the version of the paradigm used in that study, adult participants were presented with a visual scene of four line drawings while they listened to instructions to click on one of the objects. In critical trials, target items were paired with phonological competitors. For example, if participants heard “Click on the beaker,” a line drawing of a beaker was presented alongside a picture of a “beetle,” a “speaker,” and a “carriage.” The non-target candidate “beetle” is a phonological cohort due to the phonological overlap in the first syllable [bi]. Results plotting the proportion of fixations over time showed an initial overlap in looks to the target item (e.g., beaker) and to the phonological cohort (e.g., beetle), indicating that participants considered both items in real-time processing. However, after around 400 ms, looks to target items began to increase and looks to the phonological cohort diminished, indicating the participants’ convergence on the target item. The momentary overlap between the target item and the phonological cohort is the classic competitor effect. The effect reflects the additional auditory input that participants require to correctly identify the target item because of the existence of the initial phonological overlap between two items on the visual scene.
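The fixation-proportion curves described above can be derived from raw gaze samples with a few lines of code. The sketch below is a simplified illustration under stated assumptions: each trial is a list of (time in ms, interest area) samples time-locked to the onset of the critical word, samples are pooled across trials within fixed time bins, and the 100 ms bin width, function name, and toy data are hypothetical rather than drawn from Allopenna et al. (1998).

```python
from collections import Counter, defaultdict


def fixation_proportions(trials, bin_ms=100):
    """Proportion of gaze samples on each interest area per time bin.

    trials: list of trials; each trial is a list of
            (time_ms, interest_area) samples, with time measured
            from the onset of the critical word.
    Returns {bin_start_ms: {interest_area: proportion}}.
    """
    counts = defaultdict(Counter)
    for samples in trials:
        for time_ms, area in samples:
            bin_start = (time_ms // bin_ms) * bin_ms
            counts[bin_start][area] += 1

    return {
        bin_start: {area: n / sum(c.values()) for area, n in c.items()}
        for bin_start, c in sorted(counts.items())
    }


# Two toy trials sampled every 50 ms.
toy_trials = [
    [(0, "cohort"), (50, "cohort"), (100, "target"), (150, "target")],
    [(0, "target"), (50, "cohort"), (100, "target"), (150, "target")],
]
for bin_start, props in fixation_proportions(toy_trials).items():
    print(bin_start, props)
```

In the toy data, looks are split between the cohort and the target in the first bin and converge on the target in the second, which is the qualitative pattern of a competitor effect. A real analysis would normally aggregate by participant and condition before plotting, a step omitted here for brevity.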
A study by Lew-Williams and Fernald (2007) makes use of the anticipatory effect to ask whether young Spanish-speaking children and native Spanish-speaking adults use gender information encoded in prenominal modifiers (e.g., articles) to facilitate the processing of an upcoming noun. Using the looking-while-listening technique, an eye-tracking measure of real-time language processing, they presented participants with two-picture visual scenes in which the pictured objects either matched in gender (e.g., “pelota”/ball [fem.] displayed alongside “galleta”/cookie [fem.]) or differed in gender (e.g., “pelota”/ball [fem.] displayed with “carro”/car [masc.]), and asked participants to follow simple instructions in Spanish that named one of two pictured objects (e.g., “encuentra la pelota”/find the ball; “¿dónde está la pelota?”/where is the ball?). The task was to click on the named object. The findings revealed that adults were able to orient their eyes towards target objects more quickly on different gender trials (i.e., when the gender information encoded in the article was informative) than on same gender trials, eliciting an anticipatory effect. In other words, when adult speakers heard “encuentra la [fem.]” in a trial where they were presented with a picture of “galleta [fem.]” displayed next to the picture of “carro [masc.],” the feminine gender encoded in the article guided them to look towards the picture of “galleta,” even before the onset of the noun [ga] was heard. In addition, the children identified the referent of a noun more quickly
when the preceding article was informative (i.e., when it provided a cue to the correct referent). Although it is tempting to view competitor and anticipatory effects as opposite effects, in fact they are not. They are effects that can only be determined relative to a neutral baseline. For example, in the case of the Spanish grammatical gender findings reported in Lew-Williams and Fernald (2007), if the gender information encoded in the article is not informative, the basic task is one of target word identification. This is precisely what goes on in the same-gender trials (e.g., “pelota”/ball [fem.] displayed alongside “galleta”/cookie [fem.]); therefore, these trials constitute the neutral baseline. The effect of interest is whether the gender information present on the definite article will affect the time course of a target relative to the neutral baseline. As described above, in the case of Spanish, it does, and it does so by speeding up the time course, hence the presence of an anticipatory effect.
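The logic of measuring an effect against a neutral baseline can be expressed as a simple difference score. The sketch below is purely illustrative: it assumes that the dependent measure is the latency (in ms from article onset) of the first look to the target picture, treats same-gender trials as the baseline, and uses invented latencies, not data from Lew-Williams and Fernald (2007).

```python
from statistics import mean


def anticipatory_effect(latencies_ms):
    """Baseline-relative facilitation in ms.

    latencies_ms: dict with two lists of first-look latencies,
        "same_gender"      (article uninformative; neutral baseline)
        "different_gender" (article informative)
    A positive value means looks to the target were faster when the
    article was informative, i.e., an anticipatory effect.
    """
    baseline = mean(latencies_ms["same_gender"])
    informative = mean(latencies_ms["different_gender"])
    return baseline - informative


# Invented latencies for one hypothetical participant.
toy = {
    "same_gender": [800, 760, 840],
    "different_gender": [700, 680, 720],
}
print(anticipatory_effect(toy))  # 100
```

A difference near zero would indicate that the article did not change the time course of target identification relative to the baseline, whereas a negative value would indicate slower identification on informative trials.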

Studies Illustrating the Use of Eye-Tracking Methodology with Child L2 Learners

Relative to studies examining children’s online processing in their first language, there is a paucity of research investigating child language processing in their second language. Despite this, the versatility of eye-tracking methodology has allowed L2 scholars to ask questions at different levels of language comprehension. We provide a few illustrations below. In a study examining syntactic processing, Cristante (2017) asked whether 7-year-old and 10-year-old child L2 learners of German (L1 Turkish; onset of L2 acquisition 3 to 4 years of age) processed sentences with non-canonical (or less typically encountered) word order similarly to an age-matched control group of native German-speaking children. The central question was whether children would use an agent-first strategy (i.e., assigning the agent role to the first-encountered noun phrase [NP] in the sentence; see Aschermann et al., 2004; Dittmar et al., 2014) to determine thematic roles when processing sentences with non-canonical word order. Although Cristante (2017) examined a variety of structures, for illustrative purposes we will only describe the experiment investigating the processing of German active and passive sentences. German passives are interesting for two reasons. First, they are acquired late by native German-speaking children, making their text comprehension challenging (see Cristante, 2017, and references therein). Second, processing passives in an adult native-like manner requires that children suppress reliance on the agent-first strategy and process the morphosyntactic information encoded in the verb to assign the correct thematic roles.
For the purposes of triangulation (see below for the importance of triangulation when conducting research with L2 children), Cristante measured offline comprehension using a picture-matching task and online processing using a visual word experiment.The results for the 10-year-old native-speaking children
and L2 child learners showed that they were highly accurate in the offline task and also behaved similarly on the eye-tracking experiment, assigning the role of agent to the first noun and later revising their initial interpretation over similar temporal windows. Although the two 7-year-old groups were also accurate on the picture-matching task, the eye-tracking results revealed that the native German-speaking children revised their initial (but incorrect) agent-first interpretation earlier in the sentence than the L2 children, indicating that the L2 7-year-old children had difficulty suppressing their initial preference. The 7-year-old L2 children had been exposed to the target language for a shorter time than the L1 children and may still have been in the process of acquiring nativelike cue weights. As noted by the author, this finding would have been missed had the data been collected using only offline measurements.

Other studies have employed eye-tracking methodology to examine morphosyntactic processing in L2-speaking children, focusing mainly on grammatical gender processing (e.g., Lew-Williams & Fernald, 2007). Past studies with adult L2 speakers have shown that the processing of grammatical gender can pose challenges for L2 learners; whereas some adult L2 speakers are indistinguishable from native speakers in their ability to use grammatical gender cues encoded in prenominal modifiers to anticipate an upcoming noun (e.g., Dussias et al., 2013; Hopp, 2013), other L2 speakers do not engage in anticipatory processing. Given that proficiency level, immersion experience, and the nature of the L1 have been implicated in grammatical gender processing in adult L2 speakers, studies that have examined morphosyntactic processing in children have also investigated the role of these variables. For example, Lew-Williams (2017) asked whether immersion in an L2 environment conferred an advantage in language processing relative to typical L2 classroom environments.
In three eye-tracking experiments, 6- and 10-year-old children who were enrolled in Spanish immersion elementary schools listened to sentences with articles that conveyed information about the grammatical gender and the biological gender of an ensuing noun. Analyses of the time course of looking to the target on different-gender and same-gender trials showed that native Spanish-speaking children used articles to guide their attention to target referents in informative contexts (i.e., different-gender trials) for grammatical and biological gender. The Spanish L2 children did not take advantage of the gender encoded in the articles as cues to grammatical gender but succeeded in doing so for biological gender.

A recent study by Lemmerth and Hopp (2019) extended the grammatical gender processing research by examining the degree to which Russian gender influenced German production and comprehension in a group of simultaneous and early successive bilingual Russian-German children aged 8–9 years. An elicited production task in German showed that children assigned target gender to nouns irrespective of whether the nouns belonged to the same or different gender class in German and Russian. A visual-world eye-tracking task demonstrated that the simultaneous bilingual children and monolingual children alike made predictive
use of gender irrespective of gender congruency. However, the L2 children showed predictive gender processing only for lexically congruent nouns, suggesting that gender information in L2 child learners is accessed through the L1 lexicon.

Finally, Pellicer-Sánchez et al. (2020) used eye tracking to examine the processing of multimodal input and its impact on reading comprehension in 11- to 12-year-old learners who had received five years of instruction in English as a foreign language. Prior research has demonstrated that, when instruction includes words (written or spoken) and graphics in reading comprehension activities, learning outcomes are superior (e.g., better retention and better application of newly learned knowledge) relative to presenting written text alone (Mason et al., 2015). The goal in Pellicer-Sánchez et al.’s study was to investigate whether young L2 learners formed better referential connections when written text and visual materials were presented with or without accompanying auditory input. Children’s eye movements were recorded while they read a text for comprehension. Children were assigned either to a reading-only condition or to a reading-while-listening condition. In both cases, the text was supported with pictures. Analyses on several dependent measures showed that the L2 children assigned to the reading-while-listening condition spent proportionally more time and had more fixations on the pictures than children assigned to the reading-only condition. This suggests more allocation of attention and better integration of verbal and pictorial sources of information when the text was accompanied by auditory input. On the other hand, the children in the reading-only condition spent proportionally more time and had a higher percentage of fixations on the text than the children in the reading-while-listening condition.
In addition, the children in the reading-while-listening condition comprehended the text as competently as the children in the reading-only condition, a finding that does not support prior studies arguing for a detrimental effect of dual modality presentations during reading comprehension tasks.

Other eye-tracking studies investigate children’s processing of language by having them read sentences on a display. This allows researchers to determine how children interpret different types of dependencies within a sentence, such as pronoun-antecedent dependencies or agreement dependencies. Eilers et al. (2018) tracked the gaze durations of 9-year-old German-speaking children as they read sentences that required anaphora resolution. Sentences were of the type “Leon [masc.]/Lisa [fem.] shooed away the sparrow [masc. in German]/the seagull [fem. in German] and then he [masc.] ate the tasty sandwich.” In German, sparrow takes a masculine determiner while seagull takes a feminine one. In their first experiment, they found no qualitative differences between children’s and adults’ processing of the pronouns. Both groups showed longer gaze durations when the subject and the pronoun mismatched in gender than when they matched in gender. They also found no evidence of interference from a gender-matching object. In other words, the masculine or feminine gender of the object (e.g., sparrow vs. seagull) did not impact their reading of these sentences. In their second experiment,
however, they found that children who did not detect gender mismatches in their offline task also showed no impact of gender mismatching in the eye-tracking task, while children who reported detection of the mismatch showed a reading pattern more similar to that of the adults. The pattern of results was similar to a prior study on children’s detection of semantic anomalies (Connor et al., 2014).

In this section, we have shown how the recording of eye movements has become a widely used response measure for addressing many classical questions in child L2 acquisition and, in particular, how the visual world paradigm has opened up relatively uncharted territory in the study of language comprehension in pre-literate L2 acquirers. In the next section, we focus on the challenges and considerations that need to be addressed when collecting eye movement data with L2 children.

Challenges and Considerations in Eye-Tracking Research with Child L2 Learners

Although eye tracking is an excellent tool for studying language acquisition in young children, it can present many challenges. Because our own research focuses on bilingual and bidialectal development in young children, we employ eye-tracking methods not only in lab settings but also in the field. Much of our data collection is carried out abroad and often in remote locations, where the community has very little experience participating in experimental research. In such contexts, the recruitment of children for studies involving the use of an eye tracker can pose additional challenges, as caregivers may initially feel uncertain about the safety of such methods. As such, additional considerations must be made at each level of data collection, from recruitment to final data reporting. Here, we provide a brief overview of some of these challenges and discuss considerations that can be taken into account when employing eye-tracking methods in developmental linguistic research.

Recruitment and Community Outreach

Setting up research studies with young children involves a substantial amount of advance preparation, and this is especially true when working in field settings. In university settings, child recruitment databases and university research websites are often available for recruiting young children and their caregivers. These tools reduce some of the initial groundwork required for later recruitment. When carrying out research with children in the field (e.g., abroad, in schools, in remote locations), these tools are generally not available, but there are other ways to initiate contact with families. In our own work, we follow a three-step process. First, we work directly with a native-speaking research assistant from the same local area as the children. This person (e.g., an undergraduate/graduate student or local resident with teaching or research experience) is trained to make initial in-person contact with local schools and community leaders and provide them with an
overview of the study, including recruitment letters and consent forms. The sole purpose of this first meeting is to determine whether the school (or community) is interested in participating in the study. The second step involves working with the school or community leaders to inform the community about the study. This is generally done by presenting the basic procedures of the experimental tasks at community-wide meetings (e.g., parent-teacher conferences, school festivals, teacher/staff meetings). This can be done one-on-one with caregivers, in small groups, or in large groups; it is important that parents, teachers, and children are informed about the study procedures and meet the experimenters who will be working with their children. Often, we will volunteer in the children’s classrooms both before and during the study to create connections with the families and teachers. Related to this latter point, the third step involves providing a service to the schools/communities that we work with. Teaching (or co-teaching) an activity in the classroom, presenting at a teacher in-service meeting, or creating relevant teaching materials are ways that we have translated our research to the local community and established relationships for future research.

Materials and Procedures

An important matter with any experimental method testing young children’s language development is maintaining the child’s attention throughout the entirety of the task. Offline tasks often have an advantage in that they are similar to the types of activities that children are already familiar with from educational experiences. Object naming, picture description, and choosing between two pictures, for example, are methods often used in school settings and allow the experimenter to engage the child more directly and maintain their attention for up to 20–30 minutes at a time (see also Chapters 5 and 6 in this volume). These tasks are simple and often supply the data necessary for assessing children’s language development. The use of additional eye-tracking equipment can often result in fewer interactions between experimenter and child throughout the task, which can impact children’s ability to stay focused. If the child turns their head away from the screen (or blinks), the eye tracker cannot record their gaze location, resulting in data loss. We have found that, if children grow tired during an experimental task, they will tend to move their head (e.g., to look for their caregiver or to look around the room). In offline tasks, this has less of an impact because the experimenter interacts more directly with the child, thereby maintaining their attention, and head movement generally does not impact offline data collection. Indeed, the loss of data in eye-tracking tasks with young children is often higher than the loss of data in offline tasks. To address issues related to attention in eye-tracking studies with children, both in the lab and in field settings, it is important to implement procedures that will reduce children’s head movements and, at the same time, engage children more directly throughout the entirety of the task so that data loss is minimized. Procedures related to the lab setup (e.g., seating,
eye-tracker enclosure, positioning of child and caregiver), calibration, and task design can be optimized in a variety of ways.

Working in child L2 learner settings requires us to collect data in a university lab setting and also in schools and homes. The setup of the experimental space, whether in a university campus lab or in the field, can be managed to protect against data loss. Small rooms are generally preferred and, in cases where there may be outside distractions (e.g., if using a larger room, or a space inside the child’s school or home), an enclosure that fits around the eye tracker and computer monitor can be useful. Enclosures can be portable and made of a variety of materials (e.g., cardboard, cloth), and we have found that they are particularly useful for concealing distractions, not only during eye-tracking tasks but also when collecting offline data. In our own studies, we have used an enclosure made of cloth material and held together with PVC pipe. During the course of the experiment, the caregiver can sit outside the enclosure, allowing her/him to respond verbally to the child if needed and, at the same time, minimizing visual distractions.

Seating practices differ depending on the age of the child. While infants are often positioned on the caregiver’s lap or in a car seat (Gredebäck et al., 2010; Wass et al., 2014), older children (4 years of age and above) can be positioned in a high chair or an office chair (Lukyanenko & Miller, 2018, 2019), with or without a booster seat, depending on the stature of the child, with the goal of positioning the child’s head parallel to the computer screen on the table. The best option is a chair that does not swivel or roll and that has a back, arm rests, and also an adjustable footrest in the event that children’s feet do not reach the floor.
Arm rests tend to decrease arm movements; footrests discourage children from swinging their legs back and forth and accidentally kicking and jiggling the table on which the eye tracker is placed. Chairs can also be height adjustable, which is especially useful when the available tables in the field cannot be adjusted.

Calibration refers to the process of estimating the geometric characteristics of the participant’s eyes so that the eye tracker can integrate that information when calculating the participant’s gaze data during the experimental task. Visual markers are displayed at known locations on the screen and the participant’s gaze location is calculated as s/he fixates those targets. In most studies, including our own, a 5-point calibration procedure is common with toddlers, while a 9-point calibration procedure is appropriate for children 4 years of age and older. A successful calibration ensures the accuracy and precision of the eye-tracking data collected during the experimental task, and some eye trackers, like the SR Research EyeLink (SR Research Ltd.), provide a quantitative assessment of the quality of the calibration. For studies with children, calibration procedures are especially important because calibration data from toddlers and school-aged children have been found to be less accurate and precise than the advertised eye-tracker specifications (Dalrymple et al., 2018). In one study, Dalrymple et al. (2018) compared the calibration data of adults with that of three groups of children: 18-month-olds, 30-month-olds, and
9- to 11-year-olds. They found that precision and accuracy were much more variable in the youngest children, with several toddlers rejected for poor calibration accuracy. Older children (9–11-year-olds) on average showed lower precision and accuracy than expected based on the eye-tracker’s specifications, but some individual children showed calibration values within the advertised range. In another study, Van der Stigchel et al. (2017) found that children’s capacity to disengage visual attention (in order to look from one stimulus to another) increases with age, which can lead to timing differences in looking data across groups. This could have implications for determining which time window is analyzed when comparing the data of children from different age groups (see also Wass et al., 2014, for age effects on eye-tracking data quality). There is also an indication that ethnicity can impact data quality even in adults, a finding which should be taken into consideration when working with bidialectal and bilingual children (Blignaut & Wium, 2014). Dalrymple et al. (2018) note that, if not carefully assessed, systematic differences in data quality could be misinterpreted as differences between different participant groups or between different experimental conditions, and they recommend that researchers who implement eye-tracking studies with children and toddlers perform independent verification of calibration accuracy and precision on a participant-by-participant basis and report these measures in all eye-tracking studies (see Dalrymple et al., 2018, for their procedures, materials, and user manual, which are freely available to researchers; and Oakes, 2010, for a series of eight guidelines for reporting eye-tracking data with young children). Additional methods can be implemented in the design of the experiment to deal with data loss in child studies.
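Before turning to those design choices, the participant-by-participant check recommended above can be made concrete with a minimal sketch. It is not the procedure distributed by Dalrymple et al. (2018). It assumes one common operationalization: accuracy as the mean offset between recorded gaze and a known validation target, precision as the root mean square of sample-to-sample distances, and data loss as the proportion of missing samples. The function names, the sample values, and the assumption that coordinates are already expressed in degrees of visual angle are all hypothetical.

```python
import math


def accuracy_deg(samples, target):
    """Mean offset between valid gaze samples and the validation target.

    samples: list of (x, y) tuples, or None where the tracker lost the eye.
    Coordinates are assumed (hypothetically) to be in degrees of visual angle.
    """
    valid = [s for s in samples if s is not None]
    if not valid:
        raise ValueError("no valid samples for this target")
    return sum(math.dist(s, target) for s in valid) / len(valid)


def precision_rms_deg(samples):
    """Root mean square of distances between successive valid samples."""
    pairs = [
        (a, b)
        for a, b in zip(samples, samples[1:])
        if a is not None and b is not None
    ]
    if not pairs:
        raise ValueError("no successive valid samples")
    return math.sqrt(sum(math.dist(a, b) ** 2 for a, b in pairs) / len(pairs))


def data_loss(samples):
    """Proportion of samples in which no gaze position was recorded."""
    return sum(s is None for s in samples) / len(samples)


# Invented validation data for one participant looking at one target.
target = (0.0, 0.0)
samples = [(0.3, 0.4), (0.0, 0.0), None, (0.6, 0.8), (0.0, 0.0)]

report = {
    "accuracy_deg": accuracy_deg(samples, target),
    "precision_rms_deg": precision_rms_deg(samples),
    "data_loss": data_loss(samples),
}
print(report)
```

Computing and reporting such values for every child makes it possible to see whether groups or conditions differ in data quality before differences in looking behavior are interpreted.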
Designing visual world experiments such that the stimuli are placed far apart from each other on the screen allows one to more readily determine which stimulus is being fixated. In most studies with 2- to 6-year-old children, experimental displays are composed of only a few stimuli that are placed several inches apart (see Borovsky & Creel, 2014; Brandt-Kobele & Hohle, 2010; Lemmerth & Hopp, 2019; Lew-Williams, 2017; Lukyanenko & Miller, 2018, 2019; among many others). For example, in a study of how dialect impacts 4- to 6-year-old children’s interpretation of morphology, Lukyanenko and Miller (2019) asked Chilean children to view displays that contained two pictures, each about 5.5 inches square and approximately 10 inches apart, which resulted in only about 10% data loss (due mostly to looking away, blinking, and fidgeting). In reading experiments that involve textual stimuli, sufficient line spacing and font size are important. Most studies use double spacing and Consolas or Courier New fonts (Allopenna et al., 1998). Texts are generally centered on the display and usually do not involve more than 6 lines of text.

In addition to the setup of the experimental space and the calibration procedures for determining the precision and accuracy of the eye-tracking data, task design can also be controlled to minimize data loss. The overall structure of the experiment and careful control of experimental prompts and visual displays
are components of the design to be considered. Experimental studies with school-aged children (ages 4–12) are generally structured to imitate everyday school and play activities. Experimental tasks have included the use of puppets, acting out stories with toys or retelling stories, and choosing between pictures that best represent a spoken utterance, among many other tasks. The more interesting the task is, the easier it is for children to stay engaged. For example, a common method used by Crain and colleagues, the truth-value judgment task (see Thornton, 2017, and references therein), involves feeding puppets various items (e.g., rags, vegetables, desserts) to indicate comprehension. If the puppet describes a context accurately, he is fed something nice (e.g., a cookie); if not, he is fed something funny (e.g., a rag or a dirty sock). The task works extremely well with children; we often hear them laughing throughout the experimental session. It can also be run on a computer by having children click on items (e.g., using a button press or computer mouse) to feed a character in the display. In our own version of this task, the character (a monster) would munch loudly on the items and then eructate, which children enjoyed (Miller & Schmitt, 2004). This forced-choice task, as well as the others described above, can be easily implemented in eye-tracking experiments and, with school-aged children, can be more engaging than passive looking. For example, Brandt-Kobele and Hohle (2010) combined a preferential looking task with a forced-choice task when investigating 3- to 4-year-old children’s comprehension of verbal inflection. Trueswell et al. (1999), among many others, tracked 5-year-old children’s looking behavior while they participated in an act-out task that assessed their processing of garden-path sentences.2 Another factor to consider is the length of the task itself; shorter tasks or block designs are preferable with younger school-aged children.
In our own work, we often build the experiment around a board game. After every few trials of the experiment, children place a sticker or stamp on a board game path (made on a sheet of paper) until they reach the end of their “journey” (i.e., the experimental task). This keeps children engaged and gives them a sense, throughout the experiment, of task length.

There are several factors to keep in mind when creating experimental prompts and visual displays. Auditory stimuli should be recorded by a native speaker of the language community that the children belong to. Previous studies indicate that children are sensitive to unfamiliar accents in experimental stimuli, which can impact their behavior in a task (see Harte et al., 2016, for a review). In one study, Nathan et al. (1998) found that 7- to 8-year-old children modify or “correct” experimental stimuli produced in an unfamiliar accent. Bent (2014) found that stimuli presented in a non-native accent were significantly more difficult to recognize than stimuli presented in the native accent. And, in eye-tracking research, Mulak et al. (2013) reported that, while 15-month-olds showed differences in fixation rates on named vs. non-named objects when objects were named in their native accent, they did not do so when the objects were named
in a non-native accent. A non-native accent in experimental tasks can impact children’s behavior differently, depending on the child’s age and familiarity with multiple accents in their day-to-day linguistic experiences. Van Heugten and Johnson (2017) note that children who hear multiple accents process language differently from those who hear a single accent, and this factor should be kept in mind, especially when testing children from diverse language backgrounds. To record the auditory stimuli, we generally recruit an adult speaker from the same local community as the participants and provide training to ensure stimuli are natural sounding. Although child-directed speech (i.e., motherese) is used in experiments with infants and toddlers, this is generally not preferred for school-aged children and can be somewhat distracting for older children, resulting in data loss.

As with auditory stimuli, special care must be taken when creating visual displays in eye-tracking studies with children. Children’s looks to objects in a visual display can be influenced by a number of factors, including the object’s shape, size, and color, whether children recognize the object or not, and whether the object is animate vs. inanimate, among other factors. Lemmerth and Hopp’s (2019) recent paper on L2 children’s processing of grammatical gender provides a good model for ensuring that visual materials are controlled for. In their study, Lemmerth and Hopp (2019) pretested their pictures with both L2 and monolingual children to see whether they would recognize the objects and know their labels. Pictures that were not named accurately were removed from the study, thereby creating a subset of words that children clearly recognized. When testing children who speak different dialects of the same language, similar methods are required.
As an example, in our own work with children acquiring different dialects of Spanish, words that we often use in experimental tasks with children, such as “monkey,” “piglet,” or “marbles,” have different forms depending on the dialect; for example, “monkey” is monito in Chile but chango in Mexico and the Dominican Republic. These differences are relevant, especially when certain phonetic properties of words need to be controlled for.

It is also possible to use pictures that have already been normed on children. Although originally normed on adults, the International Picture Naming Project (IPNP) provides a subset of pictures that were also normed on children, and subsequent studies have used these pictures in child development research. For example, Sheng and McGregor (2010) used 120 pictures from the IPNP to examine the accuracy, latency, and errors of picture naming in children with and without specific language impairment (SLI). The MacArthur-Bates Communicative Development Inventories provide information about children’s knowledge of words across development. Cycowicz et al. (1997) provide normative measures for 400 line drawings viewed by 5- and 6-year-olds, where children were asked to name the picture, rate the complexity of the pictures, and rate their familiarity with the concept depicted by each picture (see also Berman et al., 1989, for a similar study with 8- to 10-year-old children). While studies such as these
provide a starting point for creating visual stimuli, less data is available comparing children from different cultural and linguistic backgrounds.

Careful measures must also be taken when preparing materials in reading studies; length of the text must be age-appropriate, and word difficulty and frequency need to be assessed through child-appropriate corpora and acquisition norms (Gagl et al., 2015; Huestegge et al., 2009; Joseph et al., 2013). Past work has shown that reading is more effortful for younger children; their ability to decode words is less automatized than it is for adults and older children. For example, Huestegge et al. (2009) asked children to read target words that were embedded in active declarative sentences. In this reading task, they found that gaze durations on target words were impacted by word frequency and word length such that frequent words and short words were read with shorter gaze durations than infrequent and long words. In another study, Gagl et al. (2015) investigated the extent to which letter, phoneme length, and consonant clusters contribute to word-length effects in early readers and found that words with consonant clusters had the largest impact on the word-length effect.

It is also important to keep in mind that certain techniques used to investigate reading in adults do not always transfer well to children. One example is the boundary paradigm to investigate parafoveal processing (Schotter et al., 2012); there have been fewer attempts to carry out such studies with children (see Blythe & Joseph, 2011, for a review). To overcome some of the difficulty of carrying out this technique with children, Marx et al. (2015) used a novel incremental priming technique that manipulated the salience of the previews by varying their perceptibility.
In particular, the salience of their parafoveal previews was manipulated by gradually reducing the visual integrity of the preview—by randomly exchanging certain numbers of the black pixels from the bitmap of the preview with white pixels. They then examined whether first fixation duration and gaze duration were impacted by the salience levels of the previews to address the question of whether they induced facilitation or interference in young readers.
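The degradation step described above can be illustrated with a short sketch. This is one reading of the description, not the stimulus-generation code of Marx et al. (2015): it assumes the preview is stored as a grid of 1s (black) and 0s (white), and it turns a chosen proportion of the black pixels white at random. The function name, the `proportion` and `seed` parameters, and the toy bitmap are hypothetical.

```python
import random


def degrade_preview(bitmap, proportion, seed=None):
    """Return a copy of the bitmap with some black pixels turned white.

    bitmap: list of rows, each a list of 1 (black) or 0 (white).
    proportion: share of black pixels to replace, between 0.0 and 1.0.
    seed: optional seed so that a degraded preview can be reproduced.
    """
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be between 0.0 and 1.0")
    rng = random.Random(seed)
    black = [
        (r, c)
        for r, row in enumerate(bitmap)
        for c, pixel in enumerate(row)
        if pixel == 1
    ]
    n_replace = round(proportion * len(black))
    degraded = [row[:] for row in bitmap]  # leave the original untouched
    for r, c in rng.sample(black, n_replace):
        degraded[r][c] = 0
    return degraded


# Toy 4 x 4 bitmap with eight black pixels (invented for illustration).
bitmap = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]

# Three hypothetical salience levels: intact, half degraded, fully degraded.
levels = {p: degrade_preview(bitmap, p, seed=1) for p in (0.0, 0.5, 1.0)}
for p, image in levels.items():
    print(p, sum(map(sum, image)), "black pixels remain")
```

Generating each salience level from the same source bitmap keeps letter identity constant, so that only perceptibility varies across preview conditions.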

Data Analysis

One of the most difficult issues that we have found in child language acquisition research is integrating the results of various studies into a coherent account of child language development. Asymmetries in children’s production and comprehension of linguistic forms are often reported (Ercenur & Papafragou, 2016; Hendriks & Koster, 2010; van Hout et al., 2010; Verhagen & Blom, 2014), and different methodologies used across different experiments sometimes provide what may seem to be contradictory results. Eye-tracking studies, for example, sometimes reveal early knowledge of a linguistic form, knowledge that was not apparent in offline tasks. Are differences in results due to methodology, whereby certain methods are better at assessing child knowledge than others? Or do they
represent fundamentally different types of knowledge, such as specific information about children’s implicit vs. explicit knowledge of particular linguistic forms (Koder & Falkum, 2020; Lukyanenko & Miller, 2019; Nilsen et al., 2008; Ruffman et al., 2001, among many others)? The best solution for addressing these issues is triangulation. Studies that triangulate data from offline and online measures, as well as from corpus studies of child and caregiver spontaneous conversational speech, can offer a more complete, holistic picture of children’s developing knowledge of language (see also Chapter 5 in this volume).

In our own studies, we frequently triangulate our data by assessing children’s knowledge of a linguistic form through a variety of methods. For example, in a series of studies on Chilean Spanish-speaking children’s acquisition of plural morphology, we carried out an eye-tracking task that examined children’s looks to plural vs. singular images upon hearing plural and singular noun phrases, we collected data on the children’s comprehension of plural markings in act-out and picture-matching tasks, and we examined children’s and their caregivers’ production of the plural marker in elicitation tasks and spontaneous conversational speech (Lukyanenko & Miller, 2019; Miller, 2013, 2014, 2019; Miller & Schmitt, 2012). Act-out and picture-matching tasks involve providing children with toys to act out their interpretation of sentences and choosing the picture, out of a set of pictures, that best represents an experimental prompt (see Schmitt & Miller, 2010, for more details). Spontaneous conversational data showed that children closely parallel their caregivers in their omissions of the plural marker, both in terms of the overall rates of omissions and the various linguistic and extralinguistic factors constraining omissions (Miller, 2013).
What’s more, children who omitted the plural marker more often in their speech showed more difficulty in associating the plural marker with a more-than-one interpretation in offline tasks of comprehension; however, this correlation between comprehension and production was only found for sentence repetition tasks in which children produced complex plural noun phrases embedded in longer sentences. No correlation was found between comprehension and single-word naming tasks (Miller, 2014). This highlights the importance of triangulating production data. In addition, our eye-tracking data did not directly correlate with offline tasks assessing comprehension of the plural (Lukyanenko & Miller, 2019). Instead, in the eye-tracking task, children distinguished plural from singular more often than they did in the offline task. We suggest that the eye-tracking task, an implicit measure, may represent children’s earlier knowledge of the plural marker, while the offline task, which is an explicit measure, may represent later, more entrenched knowledge. An effect of age was found in the data such that older children outperformed younger children, which we took as support for this explanation. Taken together, triangulated data can provide a more detailed picture of the development of production and comprehension of linguistic forms, including, perhaps, different levels of knowledge of language and use in different types of linguistic contexts.
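To make the eye-tracking measure in this kind of triangulation concrete, the following is a minimal sketch of how the proportion of looks to a plural image might be computed for each condition. It is not the analysis used in the studies cited above: the record layout, the region labels, the 200–2000 ms analysis window, and all data values are assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical gaze records, one per eye-tracker sample:
# (condition, time in ms from noun onset, region being looked at).
# Region labels and all values are invented for this sketch.
SAMPLES = [
    ("plural", 250, "plural_image"),
    ("plural", 300, "plural_image"),
    ("plural", 350, "singular_image"),
    ("plural", 400, "plural_image"),
    ("plural", 450, "elsewhere"),
    ("singular", 250, "singular_image"),
    ("singular", 300, "plural_image"),
    ("singular", 350, "singular_image"),
    ("singular", 400, "singular_image"),
    ("singular", 2500, "plural_image"),  # falls outside the analysis window
]


def proportion_plural_looks(samples, window_ms=(200, 2000)):
    """Return, per condition, the share of on-image samples on the plural image."""
    start, end = window_ms
    plural = defaultdict(int)
    on_image = defaultdict(int)
    for condition, time_ms, region in samples:
        if not (start <= time_ms <= end):
            continue  # ignore samples outside the (assumed) analysis window
        if region == "elsewhere":
            continue  # ignore looks away from both images
        on_image[condition] += 1
        if region == "plural_image":
            plural[condition] += 1
    return {c: plural[c] / on_image[c] for c in on_image}


proportions = proportion_plural_looks(SAMPLES)
print(proportions)
```

A larger proportion of plural-image looks after plural noun phrases than after singular ones would be the kind of implicit sensitivity that can then be compared with accuracy in the offline act-out and picture-matching tasks.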

Eye-Tracking Methods in Child SLA Research 137

Concluding Remarks

In the past several years, there has been an increase in the number of studies that employ eye-tracking methods to study child L2 acquirers. The number of findings involving different types of linguistic structures is rapidly growing, and explanations are beginning to emerge that characterize the information that child L2 learners use when processing their second language. The framing questions underlying much of the child L2 processing research are similar to those guiding adult L2 processing studies: What is the role of the L1? Are there differences between L1 and L2 child acquirers? Which learner characteristics interact with linguistic aspects of child L2 language learning and processing? Given the nature of the research questions, the need for sophisticated online behavioral measures becomes central. The recording of eye movements brings with it a set of exciting possibilities to study real-time child L2 language acquisition and comprehension. Here, we have presented a glimpse of what is possible and of the considerations that need to be taken into account when using eye-movement methods. The advantage of the method is the range of theoretical questions in child L2 acquisition that can be asked, but only when the data collected follow careful quality assurance metrics.

Further Readings

Godfroid, A. (2020). Eye tracking in second language acquisition and bilingualism: A research synthesis and methodological guide. Routledge.
This nine-chapter volume provides an authoritative look into the issues to be considered when conducting eye-tracking research to study language comprehension in speakers of more than one language. It is a must read for everyone new to the use of eye-tracking methodology in SLA and bilingualism, as well as an excellent go-to reference for those already familiar with the method.

Joseph, H., & Nation, K. (2018). Examining incidental word learning during reading in children: The role of context. Journal of Experimental Child Psychology, 166, 190–211. https://doi.org/10.1016/j.jecp.2017.08.010
This study examines whether encountering novel words in semantically diverse contexts leads to better learning in children aged 10 and 11.

Tanenhaus, M. K., & Trueswell, J. C. (2006). Eye movements and spoken language comprehension. In M. J. Traxler & M. A. Gernsbacher (Eds.), The handbook of psycholinguistics (pp. 863–900). Elsevier. https://doi.org/10.1016/B978-012369374-7/50023-7
This chapter presents an overview of research in which eye movements are monitored as speakers follow spoken instructions to move objects or pictures in a visual workspace.

Tools and Resources

• SR Research: www.sr-research.com
• PyGaze: www.pygaze.org

Discussion Questions

1. What factors might be responsible for the processing differences that are found between children and adults? And what would the theoretical implications of those differences be?
2. What role does children’s ability to integrate contextual and discourse cues play in their behavior in eye-tracking tasks?
3. What are the important methodological concerns to be addressed when comparing adult behavior to child behavior?

Notes

1 Paola E. Dussias was partially supported by NSF OISE 1545900. We would like to thank the anonymous reviewer and the editors for their helpful and insightful comments. All errors are our own.
2 A garden-path sentence is a grammatically correct sentence that initially leads the reader toward a parse that later proves untenable. The reader is “led down a garden path,” meaning they are initially deceived or tricked into misparsing the sentence. An example of a garden-path sentence is “As the mother bathed the baby cried.” In this sentence, readers often initially parse “the baby” as the direct object of “bathed.” However, upon reaching the verb “cried,” the reader must reparse the sentence.

References

Allopenna, P., Magnuson, J. S., & Tanenhaus, M. K. (1998). Tracking the time course of spoken word recognition using eye-movements: Evidence for continuous mapping models. Journal of Memory and Language, 38, 419–439. https://doi.org/10.1006/jmla.1997.2558
Aschermann, E., Gülzow, I., & Wendt, D. (2004). Differences in the comprehension of passive voice in German- and English-speaking children. Swiss Journal of Psychology, 63, 235–245. https://doi.org/10.1024/1421-0185.63.4.235
Bent, T. (2014). Children’s perception of foreign-accented words. Journal of Child Language, 41, 1334–1355. https://doi.org/10.1017/S0305000913000457
Berman, S., Friedman, D., Hamberger, M., & Snodgrass, J. G. (1989). Developmental picture norms: Relationships between name agreement, familiarity, and visual complexity for child and adult ratings of two sets of line drawings. Behavior Research Methods, Instruments, & Computers, 21, 371–382. https://doi.org/10.3758/BF03202800
Blignaut, P., & Wium, D. (2014). Eye-tracking data quality as affected by ethnicity and experimental design. Behavior Research Methods, 46, 67–80. https://doi.org/10.3758/s13428-013-0343-0
Blythe, H., & Joseph, H. (2011). Children’s eye movements during reading. In S. P. Liversedge, I. D. Gilchrist & S. Everling (Eds.), The Oxford handbook of eye movements (pp. 643–662). Oxford University Press.
Borovsky, A., & Creel, S. (2014). Children and adults integrate talker and verb information in online processing. Developmental Psychology, 50, 1600–1613. https://doi.org/10.1037/a0035591

Brandt-Kobele, O.-C., & Höhle, B. (2010). What asymmetries within comprehension reveal about asymmetries between comprehension and production: The case of verb inflection in language acquisition. Lingua, 120, 1910–1925. https://doi.org/10.1016/j.lingua.2010.02.008
Conklin, K., Alotaibi, S., Pellicer-Sánchez, A., & Vilkaite-Lozdiene, L. (2020). What eye-tracking tells us about reading-only and reading-while-listening in a first and second language. Second Language Research, 36, 257–276. https://doi.org/10.1177/0267658320921496
Connor, C. M. D., Radach, R., Vorstius, C., Day, S. L., McLean, L., & Morrison, F. J. (2014). Individual differences in fifth graders’ reading and language predict their comprehension monitoring development: An eye-movement study. Scientific Studies of Reading, 19, 114–134. https://doi.org/10.1080/10888438.2014.943905
Cooper, R. M. (1974). The control of eye fixation by the meaning of spoken language: A new methodology for the real-time investigation of speech perception, memory, and language processing. Cognitive Psychology, 6, 84–107. https://doi.org/10.1016/0010-0285(74)90005-X
Cristante, V. (2017). The processing of non-canonical sentences in children with German as a first or second language and German adults: Evidence from an eye-tracking study [Doctoral dissertation, Westfälischen Wilhelms-Universität]. http://nbn-resolving.de/urn:nbn:de:hbz:6-11269715348
Cycowicz, Y. M., Friedman, D., Rothstein, M., & Snodgrass, J. G. (1997). Picture naming by young children: Norms for name agreement, familiarity, and visual complexity. Journal of Experimental Child Psychology, 65, 171–237. https://doi.org/10.1006/jecp.1996.2356
Dalrymple, K., Manner, M., Harmelink, K., Teska, E., & Elison, J. (2018). An examination of recording accuracy and precision from eye tracking data from toddlerhood to adulthood. Frontiers in Psychology, 9, 1–12. https://doi.org/10.3389/fpsyg.2018.00803
Dittmar, M., Abbot-Smith, K., Lieven, E., & Tomasello, M. (2014). Familiar verbs are not always easier than novel verbs: How German pre-school children comprehend active and passive sentences. Cognitive Science, 38, 128–151.
Dussias, P. E. (2010). Uses of eyetracking data in second language sentence processing research. Annual Review of Applied Linguistics, 30, 149–166. https://doi.org/10.1017/S026719051000005X
Dussias, P. E., Valdés Kroff, J. R., Guzzardo, R. E., & Gerfen, C. (2013). When gender and looking go hand in hand. Studies in Second Language Acquisition, 35, 353–387. https://doi.org/10.1017/S0272263112000915
Ehrlich, S. F., & Rayner, K. (1981). Contextual effects on word perception and eye movements during reading. Journal of Verbal Learning & Verbal Behavior, 20, 641–655. https://doi.org/10.1016/S0022-5371(81)90220-6
Eilers, S., Tiffin-Richards, S. P., & Schroeder, S. (2018). Individual differences in children’s pronoun processing during reading: Detection of incongruence is associated with higher reading fluency and more regressions. Journal of Experimental Child Psychology, 173, 250–267. https://doi.org/10.1016/j.jecp.2018.04.005
Ercenur, U., & Papafragou, A. (2016). Production-comprehension asymmetries and the acquisition of evidential morphology. Journal of Memory and Language, 89, 179–199. https://doi.org/10.1016/j.jml.2015.12.001
Gagl, B., Hawelka, S., & Wimmer, H. (2015). On sources of the word length effect in young readers. Scientific Studies of Reading, 19, 289–306. https://doi.org/10.1080/10888438.2015.1026969

Godfroid, A. (2020). Eye tracking in second language acquisition and bilingualism: A research synthesis and methodological guide. Routledge.
Gredebäck, G., Johnson, S., & von Hofsten, C. (2010). Eye tracking in infancy research. Developmental Neuropsychology, 35, 1–19. https://doi.org/10.1080/87565640903325758
Harte, J., Oliviera, A., Frizelle, P., & Gibbon, F. (2016). Children’s comprehension of an unfamiliar speaker accent: A review. International Journal of Language & Communication Disorders, 51, 221–235. https://doi.org/10.1111/1460-6984.12211
Hendriks, P., & Koster, C. (2010). Production/comprehension asymmetries in language acquisition. Lingua, 120(8), 1887–1897. https://doi.org/10.1016/j.lingua.2010.02.002
Hopp, H. (2013). Grammatical gender in adult L2 acquisition: Relations between lexical and syntactic variability. Second Language Research, 29, 33–56. https://doi.org/10.1177/0267658312461803
Huestegge, L., Radach, R., Corbic, D., & Huestegge, S. (2009). Oculomotor and linguistic determinants of reading development: A longitudinal study. Vision Research, 49, 2948–2959. https://doi.org/10.1016/j.visres.2009.09.012
Huettig, F., Rommers, J., & Meyer, A. (2011). Using the visual world paradigm to study language processing: A review and critical evaluation. Acta Psychologica, 137, 151–171. https://doi.org/10.1016/j.actpsy.2010.11.003
Ikeda, M., & Saida, S. (1978). Span of recognition in reading. Vision Research, 18, 83–88. https://doi.org/10.1016/0042-6989(78)90080-9
Joseph, H., Nation, K., & Liversedge, S. (2013). Using eye movements to investigate word frequency effects in children’s sentence reading. School Psychology Review, 42, 207–222. https://doi.org/10.1080/02796015.2013.12087485
Koder, F., & Falkum, I. L. (2020). Children’s metonymy comprehension: Evidence from eyetracking and picture selection. Journal of Pragmatics, 156, 191–205. https://doi.org/10.1016/j.pragma.2019.07.007
Lemmerth, N., & Hopp, H. (2019). Gender processing in simultaneous and successive bilingual children: Cross-linguistic lexical and syntactic influences. Language Acquisition: A Journal of Developmental Linguistics, 26, 21–45. https://doi.org/10.1080/10489223.2017.1391815
Lew-Williams, C. (2017). Specific referential contexts shape efficiency in second language processing: Three eye-tracking experiments with 6- and 10-year-old children in Spanish immersion schools. Annual Review of Applied Linguistics, 37, 128–147. https://doi.org/10.1017/S0267190517000101
Lew-Williams, C., & Fernald, A. (2007). Young children learning Spanish make rapid use of grammatical gender in spoken word recognition. Psychological Science, 18, 193–198. https://doi.org/10.1111/j.1467-9280.2007.01871.x
Liversedge, S. P., Paterson, K. B., & Pickering, M. (1998). Eye movements and measures of reading time. In G. Underwood (Ed.), Eye guidance in reading and scene perception (pp. 55–75). Elsevier Science Ltd. https://doi.org/10.1016/B978-008043361-5/50004-3
Lukyanenko, C., & Miller, K. (2018). Children’s and adults’ processing of variable agreement patterns: Agreement neutralization in English. In A. B. Bertolini & M. J. Kaplan (Eds.), Proceedings of the 42nd Annual Boston University Conference on Language Development (Vol. 2, pp. 479–492). Cascadilla Press.
Lukyanenko, C., & Miller, K. (2019). Learning the plural from variable input: An eye-tracking study of Chilean children’s plural comprehension. Journal of Monolingual and Bilingual Speech, 1(2), 248–279. https://doi.org/10.1558/jmbs.v1i2.11788

Marx, C., Hawelka, S., Schuster, S., & Hutzler, F. (2015). An incremental boundary study on parafoveal preprocessing in children reading aloud: Parafoveal masks overestimate the preview benefit. Journal of Cognitive Psychology, 27(5), 549–561. https://doi.org/10.1080/20445911.2015.1008494
Mason, L., Tornatora, M. C., & Pluchino, P. (2015). Integrative processing of verbal and graphical information during re-reading predicts learning from illustrated text: An eye movement study. Reading and Writing, 28, 851–872. https://doi.org/10.1007/s11145-015-9552-5
Miller, K. (2013). Acquisition of variable rules: /S/-lenition in the speech of Chilean Spanish-speaking children and their caregivers. Language Variation and Change, 25, 311–340. https://doi.org/10.1017/S095439451300015X
Miller, K. (2014). Assessing plural morphology in children acquiring /s/-leniting dialects of Spanish. Language, Speech, and Hearing Services in Schools, 45, 173–184. https://doi.org/10.1044/2014_LSHSS-13-0032
Miller, K. (2019). Children’s acquisition of sociolinguistic variation. In T. Ionin & M. Rispoli (Eds.), Three streams of generative language acquisition research. Selected papers from the 7th Meeting of Generative Approaches to Language Acquisition – North America, University of Illinois at Urbana-Champaign (pp. 35–58). John Benjamins.
Miller, K., & Schmitt, C. (2004). Wide-scope indefinites in English child language. In Proceedings of generative approaches to language acquisition 2003 (GALA) (pp. 317–328). LOT. http://dspace.library.uu.nl/handle/1874/296242
Miller, K., & Schmitt, C. (2012). Variable input and the acquisition of plural morphology. Language Acquisition: A Journal of Developmental Linguistics, 19, 223–261. https://doi.org/10.1080/10489223.2012.685026
Mulak, K., Best, C., Tyler, M., Kitamura, C., & Irwin, J. (2013). Development of phonological constancy: 19-month-olds, but not 15-month-olds, identify words in a non-native regional accent. Child Development, 84(6), 2064–2078. https://doi.org/10.1111/cdev.12087
Nathan, L., Wells, B., & Donlan, C. (1998). Children’s comprehension of unfamiliar regional accents: A preliminary investigation. Journal of Child Language, 25(2), 343–365. https://doi.org/10.1017/S0305000998003444
Nilsen, E., Graham, S., Smith, S., & Chambers, C. (2008). Preschoolers’ sensitivity to referential ambiguity: Evidence for a dissociation between implicit understanding and explicit behavior. Developmental Science, 11, 556–562. https://doi.org/10.1111/j.1467-7687.2008.00701.x
O’Regan, J. K., & Lévy-Schoen, A. (1987). Eye-movement strategy and tactics in word recognition and reading. In M. Coltheart (Ed.), Attention and performance 12: The psychology of reading (pp. 363–383). Lawrence Erlbaum Associates, Inc.
Oakes, L. (2010). Infancy guidelines for publishing eye-tracking data. Infancy, 15(1), 1–5.
Osaka, N. (1992). Size of saccade and fixation duration of eye movements during reading: Psychophysics of Japanese text processing. Journal of the Optical Society of America, A, Optics, Image & Science, 9(1), 5–13.
Pellicer-Sánchez, A., Tragant, E., Conklin, K., Rogers, M., Serrano, R., & Llanes, Á. (2020). Young learners’ processing of multimodal input and its impact on reading comprehension: An eye-tracking study. Studies in Second Language Acquisition, 42, 577–598. https://discovery.ucl.ac.uk/id/eprint/10091384
Pollatsek, A., Bolozky, S., Well, A. D., & Rayner, K. (1981). Asymmetries in the perceptual span for Israeli readers. Brain and Language, 14, 174–180. https://doi.org/10.1016/0093-934X(81)90073-0

Rayner, K. (1983). The perceptual span and eye movement control during reading. In K. Rayner (Ed.), Eye movements in reading: Perceptual and language processes (pp. 97–120). Academic Press.
Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124, 372–422. https://doi.org/10.1037/0033-2909.124.3.372
Rayner, K., & Duffy, S. A. (1986). Lexical complexity and fixation times in reading: Effects of word frequency, verb complexity, and lexical ambiguity. Memory & Cognition, 14, 191–201. https://doi.org/10.3758/BF03197692
Rayner, K., & McConkie, G. W. (1976). What guides a reader’s eye movements? Vision Research, 16(8), 829–837. https://doi.org/10.1016/0042-6989(76)90143-7
Rayner, K., & Pollatsek, A. (1987). Eye movements in reading: A tutorial review. In M. Coltheart (Ed.), Attention and performance 12: The psychology of reading (pp. 327–362). Lawrence Erlbaum Associates, Inc.
Rayner, K., Pollatsek, A., Drieghe, D., Slattery, T. J., & Reichle, E. D. (2007). Tracking the mind during reading via eye movements: Comments on Kliegl, Nuthmann, and Engbert. Journal of Experimental Psychology: General, 136, 520–529. https://doi.org/10.1037/0096-3445.136.3.520
Rayner, K., Sereno, S. C., & Raney, G. E. (1996). Eye movement control in reading: A comparison of two types of models. Journal of Experimental Psychology: Human Perception and Performance, 22, 1188–1200. https://doi.org/10.1037//0096-1523.22.5.1188
Ruffman, T., Garnham, W., Import, A., & Connolly, D. (2001). Does eye gaze indicate implicit knowledge of false belief? Charting transitions in knowledge. Journal of Experimental Child Psychology, 80, 201–224. https://doi.org/10.1006/jecp.2001.2633
Schmitt, C., & Miller, K. (2010). Using comprehension methods in language acquisition research. In E. Blom & S. Unsworth (Eds.), Experimental methods in language acquisition research (pp. 35–56). John Benjamins.
Schotter, E. R., Angele, B., & Rayner, K. (2012). Parafoveal processing in reading. Attention, Perception, & Psychophysics, 74, 5–35. https://doi.org/10.3758/s13414-011-0219-2
Sheng, L., & McGregor, K. K. (2010). Object and action naming in children with specific language impairment. Journal of Speech, Language, and Hearing Research, 53, 1704–1719. https://doi.org/10.1044/1092-4388(2010/09-0180)
Tanenhaus, M. K., Spivey-Knowlton, M. J., Eberhard, K. M., & Sedivy, J. C. (1995). Integration of visual and linguistic information in spoken language comprehension. Science, 268, 1632–1634. https://doi.org/10.1126/science.7777863
Tanenhaus, M. K., & Trueswell, J. C. (2006). Eye movements and spoken language comprehension. In M. J. Traxler & M. A. Gernsbacher (Eds.), Handbook of psycholinguistics (2nd ed., pp. 863–900). Elsevier Press.
Thornton, R. (2017). The truth value judgment task. In M. Nakayama, Y. Su & A. Huang (Eds.), Studies in Chinese and Japanese language acquisition: In honor of Stephen Crain (pp. 13–39). John Benjamins.
Trueswell, J. C., Sekerina, I., Hill, N. M., & Logrip, M. L. (1999). The kindergarten-path effect: Studying on-line sentence processing in young children. Cognition, 73, 89–134. https://doi.org/10.1016/s0010-0277(99)00032-3
Van der Stigchel, S., Hessels, R. S., van Elst, J. C., & Kemner, C. (2017). The disengagement of visual attention in the gap paradigm across adolescence. Experimental Brain Research, 235, 3585–3592. https://doi.org/10.1007/s00221-017-5085-2
van Heugten, M., & Johnson, E. K. (2017). Input matters: Multi-accent language exposure affects word form recognition in infancy. The Journal of the Acoustical Society of America, 142, 2. https://doi.org/10.1121/1.4997604

van Hout, A., Harrigan, K., & de Villiers, J. (2010). Asymmetries in the acquisition of definite and indefinite NPs. Lingua, 120(8), 1973–1990. https://doi.org/10.1016/j.lingua.2010.02.006
Verhagen, J., & Blom, E. (2014). Asymmetries in the acquisition of subject-verb agreement in Dutch: Evidence from comprehension and production. First Language, 34, 315–335. https://doi.org/10.1177/0142723714544412
Wass, S., Forssman, L., & Leppanen, J. (2014). Robustness and precision: How data quality may influence key dependent variables in infant eye-tracker analyses. Infancy, 19(5), 427–460. https://doi.org/10.1111/infa.12055
Wilson, M., & Garnsey, S. (2009). Making simple sentences hard: Verb bias effects in simple direct object sentences. Journal of Memory and Language, 60, 368–392. https://doi.org/10.1016/j.jml.2008.09.005

9 BRAIN IMAGING METHODS
Nia Nickerson and Ioulia Kovelman1

DOI: 10.4324/9780367815783-9

Introduction

Learning a new language changes the mind and brain. Developmental cognitive neuroscience, a discipline that bridges the science of child cognitive and brain development (Johnson, 2005), offers us theoretical frameworks and neuroimaging tools to examine the mental and neural processes of child language development.

Why Use Neuroimaging Tools to Understand Children’s Second Language Development?

There is substantial variation in how children learn to speak and read in a new or second language. In the United States, a teacher may ask if a bilingual child is a typically developing learner of English as a second language (ESL), or if the child is an at-risk learner and needs targeted language and literacy intervention. Cross-linguistic research with young monolinguals suggests a link between children’s phonological processing, literacy success, and potential impairments. As detailed below, neuroimaging helps shed light on phonological development and other related aspects of bilinguals’ language and literacy development, thereby informing theories of bilingualism, sources of cross-linguistic variation, and instructional and clinical practice for young bilingual learners. This chapter is consistent with the book’s objective to understand the language and literacy experiences of sequential bilingual children, those introduced to a second language after acquiring their heritage language. Yet, please note that there is substantial variation in the definition of bilingualism or second language learning in developmental neuroscience. For instance, researchers interested in considering the influence of age on brain development may make group distinctions based

on age of new language acquisition (early vs. late bilingual acquisition). However, researchers interested in comparing the monolingual to the bilingual brain may not make such detailed distinctions and may instead consider any child with exposure to more than one language before age 5 to be a simultaneous bilingual. This is further complicated by the observation that, in adults who were exposed to a new language during childhood, the development of the second language often outpaces that of the first language (Lynch, 2017). As neuroimaging can reveal the neural bases of efficiency in language processing, the neuroimaging lens is often deployed to understand the types of second language learning experiences that foster optimal dual language proficiency outcomes.

Theoretical Considerations for the Neurobiology of Childhood Second Language Learning

Words are made of sounds, or phonemes, the building blocks of language. In early life, infants are sensitive to all language sounds, even those not used in their native language. For instance, 4-month-olds only exposed to English can hear subtle phonemic differences that are present in Hindi but absent in English (Peña et al., 2010). These infants are considered “universal” learners, ready to learn any language. As infants age, their ability to discriminate between non-native sounds subsides while their proficiency with native sounds improves. During this period, the developing brain becomes fine-tuned or “neurally committed” to the native language(s) (Kuhl, 2010; Petitto et al., 2012). Thus, a school-aged child may find it challenging to learn a new language because the brain has become preferentially organized towards the processing of the child’s first language and potentially less plastic towards new language experiences. This is an important consideration for second language development in school-age children, as literacy, the gateway to academic development, builds upon children’s ability to match language sounds with orthographic representations.

Phonology and Second Language Learning During Childhood

A theoretical explanation for later-age difficulties in learning new languages is thought to lie in sensitive periods of brain development crucial for acquiring new phonological regularities. In infancy, the brain, especially the left temporal regions, is thought to have maximal sensitivity to language sounds (Werker & Hensch, 2015). As children age, the neural tissue’s sensitivity to non-native phonological information may subside to allow for improved proficiency and neural fine-tuning in the native language. This is logical: to learn words and sentence structure, a child must establish a stable phonological repertoire rather than an infinitely flexible learning system that may confuse further language development. Werker and Hensch (2015) suggest the existence of a genetically guided age window for maximal sensitivity to language sounds that lays the foundation

for successful language development. Thus, in theory, children exposed to a new language at an increasingly later age should have greater difficulty in mastering the sounds of that new language, as their brain is no longer in the sensitive or optimal window for acquiring language sounds. Yet, the temporal boundaries of this sensitive neural period remain unclear. Importantly, children’s experiences with each of their languages appear to play an equally critical role in shaping their dual language proficiency. For instance, Sheng and colleagues (2013) have found that in young Spanish-English bilinguals who were exposed to their two languages before age 7, experience with those languages played a greater role in their proficiency than age of exposure. In this chapter, we explore the relationship between second language learners’ language experiences and the neurobiological factors in phonological development.

Phonology and Literacy Development in School-Age Children

Learning to read requires children to master the relation between word sounds and their orthographic forms. Phonological awareness, or children’s active sensitivity to the sound units of language, such as knowing that the words “cat” and “hat” rhyme, supports children’s ability to associate word sounds and orthographic forms. In learning to read, young bilinguals are often tasked with mastering literacy in their second language, inevitably facing the challenge of acquiring phonology, phonological awareness, and literacy in a new language all at the same time. Not surprisingly, researchers often find lower phonological awareness and literacy abilities in children who were recently exposed to a new language (Kovelman et al., 2008). Unfortunately, phonological deficits are also a common characteristic of dyslexia, a life-long impairment with written language that often makes successful reading a very effortful act (Snowling et al., 2020). In order to distinguish typical and atypical learners, it is important to consider the impact of bilingualism and second language exposure on brain development for phonological and related literacy processes (see also Chapter 10 in this volume). In the methodological review that follows, we discuss how neuroimaging can be applied to shed light on phonological development in bilingualism and second language learning in young, school-aged children.

Overview of Developmental Cognitive Neuroimaging Tools

Within the brain’s basic physiology, communication between neurons occurs electro-chemically and rapidly. At the same time, neurons do not store energy; therefore, when a brain region is active, it draws greater blood flow carrying oxygen and other nutrients. Neuroimaging methods thus subdivide into two major categories of “When” and “Where” tools. The “When” tools aim to measure the timing of the brain’s rapid electrical activity. The “Where” tools aim to measure the location of the blood flow or the brain’s hemodynamic response. Let us now

consider how researchers leverage each of these neuroimaging modalities to answer questions about first and second language acquisition in childhood.

When Tools

The brain’s electrochemical response is extremely rapid. For instance, the sounds [b] and [d] differ in less than 10 ms worth of information, yet our brain can tell the difference between the two as we hear them. The electroencephalogram (EEG) and event-related potential (ERP) capture the electrical signals generated by neurons’ electrochemical activity, while magnetoencephalography (MEG) captures the magnetic fields generated by that same activity.

Electroencephalogram (EEG) and Event-Related Potential (ERP)

One of the most commonly used “When” neuroimaging tools is the electroencephalogram (EEG). EEG captures the electrical signals that the brain generates continually. It records the brain’s ongoing neural activity as well as the electrical signals associated with a particular event, such as an unexpected sound or word; the latter is referred to as an event-related potential (ERP; Luck, 2005). In other words, an ERP is the recorded neural response to a specific stimulus, while EEG simply refers to the ongoing waveforms depicting electrical activity within the brain. In this chapter, we focus on ERPs as an indicator of neural activity for specific language skills.

ERP Use with Children

The brain’s ERP signal is captured with a set of electrodes arranged within a net or cap that is placed onto the scalp. Researchers typically use a mapping convention, called the 10/20 convention, that dictates electrode placement in relation to an individual’s anatomical landmarks, such as the nose and the ears. The 10/20 convention ensures that electrode placement is common across research labs to facilitate data comparison and interpretation. The use of different-size nets or caps provides researchers with flexibility in testing subjects of different ages. Electrode placement for each cap is adjusted to the 10/20 convention as cap sizes change. Correct electrode placement is also ensured through precise measurements of each subject’s head. It may take up to 30 minutes to place the cap and ensure signal quality, so it is a good idea to have a cartoon ready to entertain children and divert their attention during the setup. The computer tasks can be long and tiring, and a child may need a break from the task and the screen while staying seated, so it is also a good idea to have some entertainment, such as a card game, readily available during breaks. A few key advantages of the ERP/EEG in child language research include their affordability, silence, and motion tolerance, making them child friendly and ideal

418

148 Nia Nickerson and Ioulia Kovelman

for studying language development. The ERP’s primary advantage is the high temporal resolution with which it captures the brain’s electrochemical response as it happens in real time. Yet, it is difficult to use an EEG to locate which brain region(s) generated the neural signal subsequently recognized at the surface of the head. One reason for this is that the skull impedes the transmission of electric signals. Conductive gels are often used on participant’s heads to improve signal quality. Another downside of using EEGs is that resulting data can be affected by muscle movements, such as blinking, and the presence of electromagnetic devices.

ERP Components in Language Research

ERP components are the neural responses to a specific stimulus that appear within the ongoing waveforms of the EEG signal. For instance, language researchers interested in phonological processing will attend to a mismatch negativity (MMN) response, a type of ERP component. Different ERP components reflect the neural activity resulting from different processes, so the components of interest depend on the research topic. Within language research, relevant components include the MMN and the rhyming effect (RE). During data acquisition, researchers typically present participants with several trials of, for instance, phonetic contrasts. During data analyses, researchers average the participants’ signal across these trials, thus averaging the waveforms containing the ERP components related to the language skill of interest. The latency and amplitude of waveforms are crucial aspects for describing and understanding ERPs. Latency refers to the time delay between the presentation of a stimulus and the subsequent response. Amplitude describes the strength of the waves. ERP responses peaking prior to ~100–200 ms after stimulus onset are often linked to initial sensory experiences, and later ERP responses are often linked to higher-order cognitive processes (Luck, 2005). Let us now consider ERP components commonly linked to language processes.
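To make the averaging step concrete, the following minimal Python sketch averages simulated single-trial waveforms and then reads off the latency and amplitude of the largest negative deflection in a time window. Everything in it is an illustrative assumption rather than a procedure taken from the studies reviewed here: the 4 ms sampling step, the simulated deflection centred on 300 ms, the noise level, the 200–400 ms search window, and the function names are all hypothetical.

```python
import random

def average_trials(trials):
    """Average equally long single-trial waveforms sample by sample."""
    n = len(trials)
    return [sum(samples) / n for samples in zip(*trials)]

def peak_in_window(erp, times_ms, start_ms, end_ms, polarity=-1):
    """Return (latency_ms, amplitude) of the largest deflection of the given
    polarity (-1 = negative-going, +1 = positive-going) inside a time window."""
    candidates = [(polarity * value, t, value)
                  for value, t in zip(erp, times_ms)
                  if start_ms <= t <= end_ms]
    _, latency, amplitude = max(candidates)
    return latency, amplitude

# Hypothetical recording: one sample every 4 ms, from 0 to 596 ms after onset.
times_ms = list(range(0, 600, 4))

def simulated_trial(rng):
    # A negative deflection centred on 300 ms, buried in trial-by-trial noise.
    return [-5.0 * max(0.0, 1 - abs(t - 300) / 100) + rng.gauss(0, 5)
            for t in times_ms]

rng = random.Random(0)
trials = [simulated_trial(rng) for _ in range(200)]

erp = average_trials(trials)  # noise shrinks as more trials are averaged
latency, amplitude = peak_in_window(erp, times_ms, 200, 400, polarity=-1)
print(f"peak latency: {latency} ms, peak amplitude: {amplitude:.2f} (arbitrary units)")
```

In a single simulated trial the deflection is hard to see because the noise is as large as the signal; after averaging 200 trials the noise is reduced roughly by the square root of the number of trials, which is why ERP studies present many repetitions of each stimulus.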

Mismatch Negativity (MMN)

Those using an EEG/ERP to study children’s early phonological development often utilize the mismatch paradigm. The mismatch negativity (MMN) response is a neural index of an individual’s ability to distinguish between phonetic contrasts, such as [ba] and [da]. In such experiments, participants listen to many repetitions of one “standard” syllable (e.g., [ba]) interspersed with an infrequently occurring “deviant” or “odd-ball” syllable (e.g., [da]). MMN is often used in infant studies to understand sensitivity to native versus non-native speech sounds. In developmental research, it is important to consider developmental differences in neural responses reflected by the ERPs, including time of onset, polarity, scalp distribution, and duration of the response. For instance, recent infant research indicates that neural responses to deviant phonetic changes emerge as a positive wave that eventually matures into an adult-like negative response, thus prompting a relabeling of the MMN as a mismatch response or MMR (Garcia-Sierra et al., 2011). Further evidence suggests that a positive MMR reflects less mature processing compared to the MMN associated with adult neural activity (Morr et al., 2002).
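A mismatch response is commonly quantified as a difference wave, that is, the averaged response to the deviant minus the averaged response to the standard. The sketch below illustrates this with made-up, already averaged waveforms; the 150–250 ms analysis window, the 0.5 threshold, the amplitudes, and the function names are hypothetical choices for illustration only and are not values from the studies cited above.

```python
def difference_wave(deviant_erp, standard_erp):
    """Subtract the averaged standard response from the averaged deviant response."""
    return [d - s for d, s in zip(deviant_erp, standard_erp)]

def mean_amplitude(wave, times_ms, start_ms, end_ms):
    """Mean value of a waveform inside a time window (inclusive bounds)."""
    values = [v for v, t in zip(wave, times_ms) if start_ms <= t <= end_ms]
    return sum(values) / len(values)

def describe_polarity(amplitude, threshold=0.5):
    """Label the sign of a mismatch response; the threshold is an arbitrary cutoff."""
    if amplitude <= -threshold:
        return "negative (adult-like MMN)"
    if amplitude >= threshold:
        return "positive (MMR)"
    return "no clear mismatch response"

# Hypothetical averaged waveforms, one sample every 10 ms.
times_ms = list(range(0, 500, 10))

def boxcar(level, start_ms, end_ms):
    """A flat waveform that takes the value `level` between start_ms and end_ms."""
    return [level if start_ms <= t <= end_ms else 0.0 for t in times_ms]

standard = [0.0 for _ in times_ms]
adult_deviant = boxcar(-2.0, 150, 250)   # invented adult-like negativity
infant_deviant = boxcar(2.0, 150, 250)   # invented infant-like positivity

for name, deviant in [("adult", adult_deviant), ("infant", infant_deviant)]:
    diff = difference_wave(deviant, standard)
    amp = mean_amplitude(diff, times_ms, 150, 250)
    print(name, describe_polarity(amp))
```

Reporting the sign of the difference wave, and not only its size, is what allows developmental studies to distinguish a positive MMR from a mature negative MMN.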

MMN in SLA Research

Garcia-Sierra et al. (2011) asked if infants exposed to two languages develop an MMN response similar to the typical MMN displayed by 6- to 9-month-old monolingual infants in response to phonological contrasts in their native language. Bilingual infants did not show neural discrimination of either language at 6–9 months, but, by 10–12 months of age, a response was observed for both phonological contrasts. Furthermore, there was a linear relationship between the strength of infants’ MMN response in a given language and their amount of exposure to that language between the ages of 6 and 12 months. These findings suggest that dual language experience may extend the period of infants’ sensitivity to variations in language input, thus extending the formation of stable phonetic categories and the neural dedication to each of their languages. A complementary explanation is that bilingual infants’ development is maximally monolingual-like in the language to which they have the most exposure.

Rhyming Effect (RE)

With regard to the phonological skills involved in learning to read, such as phonological awareness and rhyme judgment ability, researchers may consider the rhyming effect (RE) component. A rhyme judgment task typically asks children to listen to or read two words, such as “cat” and “hat,” and decide if they rhyme. The RE is another ERP component, typically detected at about 250–300 ms after the onset of the target word (“hat”) and peaking at 400–450 ms, with a fairly large negative deflection for non-rhyming versus rhyming target words (Coch et al., 2005). Developmentally, rhyme effects may differ in amplitude and latency, potentially reflecting differences in processing efficiency. In support of this notion, Coch et al. (2005) reported that the amplitude of school-aged children’s waveforms appeared larger than that of adult waveforms during an experimental task of identifying rhyming words and nonwords.

RE in SLA Research

It is surprising that, in light of the importance of phonology and phonological awareness in language and literacy development, there is so little neuroimaging work with young second language learners. To the best of our knowledge, the only study that uses an ERP to investigate the neural basis of phonological awareness in young second language learners is a dissertation study by Andersson (2012). The study asked English-Spanish bilinguals and English monolinguals, ages 6–8, to complete an auditory nonword rhyme judgment task in English that was modeled after Coch et al. (2005). As a group, the monolinguals showed a previously documented RE component, similar to that of adults in latency and amplitude. Among the bilinguals, only the high-proficiency bilinguals showed an RE-like component, with a later onset and a longer latency than in monolinguals. The data further revealed that proficiency/exposure, but not the age of acquisition (AoA), was associated with the quality of their RE component.

ERP Components in SLA Research

N400 responses can tell us about how the brain processes meaning and sentence structure. Specifically, N400 is a negative-going wave that peaks between 200 and 500 ms post stimulus and arises in response to semantic anomalies such as in the sentence “He is baking a house” (Kaan, 2007). P600 and LAN/ELAN are additional ERP components that help us to understand syntax processing. P600 is a positive-going wave that peaks between 500 and 800 ms, while LAN is a negative-going wave that peaks between 300 and 400 ms in response to syntactic anomalies such as in the sentence “Yesterday he bake a house” (Kaan, 2007). LAN and especially early LAN (or ELAN) are often interpreted as reflecting automated sentence structure processing and the noting of its violation, whereas P600 is often interpreted as reflecting the process of sentence repair and the effort to understand an ungrammatical sentence (Kaan, 2007). Second language researchers can use indexes of neural activity associated with detecting a missing grammatical morpheme (P600) or an incorrect word use (N400) to shed light on children’s emerging proficiency with new language structures, words, and sentence processing more generally (Swaab et al., 2012). There are, of course, many other ways in which one can consider the linguistic information and the concomitant neural responses, especially in the context of continuous and naturalistic speech (Brennan et al., 2019). Moving beyond the individual ERP components, researchers can also use the EEG, as well as the MEG method that we review below, to consider variation in neural activity during linguistic events by analyzing EEG rhythms and oscillations (Lajiness-O’Neill et al., 2017).

MEG: Magnetoencephalography

Similar to ERPs, magnetoencephalography (MEG) is also a “when” method that tracks neural activity across time. An EEG captures the brain’s electrical activity, whereas MEG captures the magnetic fields associated with this electrical activity. MEG has better neuroanatomical resolution than an EEG because the skull is less of an impediment for the magnetic signal detected through MEG than for the electric signal detected through an EEG (Dale & Halgren, 2001). The localization of magnetic signals nonetheless requires complex source modeling (Gross et al., 2013). Given that MEG and the EEG measure a similar type of neural activity, both can detect the above-mentioned MMN, RE, N400, etc. Yet, the MEG community often uses slightly modified nomenclature to denote the use of MEG, such as MMNm, the magnetically assessed equivalent of the mismatch negativity response.

MEG Use with Children

With MEG, participants wear an MEG helmet embedded with superconducting quantum interference devices, or SQUIDs, that detect and amplify the brain’s magnetic signal. Yet, the magnetic field produced by the brain’s neurons is quite small relative to the magnetic field of the earth and all the electromagnetic devices that surround us. For that reason, MEG utilizes a large device within a magnetically shielded room in order to detect the magnetic fields emitted by the brain. Most MEG devices have helmets suited to fit an adult head and thus work well for children around age 8 and up. MEG devices suited for infants and younger children are not easily found, but a few do exist; they often utilize specialized helmets better suited for smaller heads. The key advantage of MEG use with children is that this method is silent and does not require confinement to a narrow tube, allowing children to freely move everything but their heads. Another key advantage is its ability to offer both temporal and spatial information, showing what is occurring within the brain in real time and indicating where that activity is occurring. However, MEG involves being in a fixed position with a helmet in a large device in a specialized room that blocks extraneous electromagnetic activity, which makes it cumbersome and the most expensive of all neuroimaging methods.

Examples of MEG Use in SLA Research

Theoretical perspectives on early neural sensitivity for language sounds suggest some flexibility. For instance, one hypothesis is that exposure to two languages may prolong the period during which the brain establishes neural commitment to language, as it must consider a greater number of phonological units (Werker & Hensch, 2015).

MEG Use in Second Language Learning

Ferjan Ramírez et al. (2017) used MEG to investigate this hypothesis. Two groups of 11-month-old infants, exposed either to monolingual English contexts or to bilingual Spanish-English contexts within the US, underwent MEG neuroimaging during a phoneme discrimination paradigm developed by Garcia-Sierra et al. (2011). A within-group comparison revealed that monolinguals showed a difference in neural response to Spanish and English during the later time window (260–460 ms) associated with phonological analyses but not during the earlier time window (100–260 ms) associated with acoustic features of language sound. The reverse was true of bilinguals, with significant language differences in the early, but not the late, time window. Nevertheless, between-group comparisons showed similarities in bilinguals’ and monolinguals’ response to English but not to Spanish, as would be logical for a comparison of two groups who share only one of the two languages. The authors interpreted the findings to suggest that, at the age of 11 months, bilinguals’ response to language is guided by acoustic features. Finally, the authors found that bilinguals showed a significantly stronger neural response to Spanish in the bilateral prefrontal regions often associated with bilingual language switching and executive function.

Theoretical perspectives on first and second language learning also suggest that children make use of naturally occurring regularities in the continuous linguistic stream to acquire the building blocks of language such as phonemes and words (Aslin et al., 1998). This often includes attending to frequently co-occurring syllables. To uncover the neural bases of language learning efficiency, Wagley et al. (2020) used MEG with 8- to 12-year-old monolingual children, who listened to auditory passages with co-occurring syllables in Italian as a new language. The findings revealed that only the children who showed behavioral evidence of having learned the new Italian words also showed a neural index of learning in their N400 response in the frontal lobe. In contrast, all children (those who appeared to learn the words and those who failed to do so) showed a neural index of learning in the temporal lobe. In other words, the findings can help guide teachers towards language exposure protocols that can engage different types of second language learners in learning words efficiently when listening to a new language.
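One common way to quantify how reliably syllables co-occur is the transitional probability, the proportion of times a given syllable is followed by a particular next syllable. The chapter does not specify which statistic the cited studies used, so the sketch below is only an illustration of the general idea; the three made-up words, their syllables, and the word order are hypothetical and are not the stimuli of any study reviewed here.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """Estimate P(next | current) for every adjacent syllable pair in a stream."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    # Count each syllable only where it is followed by another syllable.
    first_counts = Counter(syllables[:-1])
    return {pair: count / first_counts[pair[0]]
            for pair, count in pair_counts.items()}

# Hypothetical mini-language of three invented trisyllabic words.
words = {
    "A": ["tu", "pi", "ro"],
    "B": ["go", "la", "bu"],
    "C": ["da", "ko", "ti"],
}
# Hypothetical word order with no immediate repetitions and no pauses.
order = "ABCACBABCBACBCA"
stream = [syllable for word in order for syllable in words[word]]

tp = transitional_probabilities(stream)
print("within word  tu -> pi:", tp[("tu", "pi")])
print("across words ro -> go:", tp[("ro", "go")])
```

Within a word the next syllable is fully predictable, whereas at a word boundary several continuations are possible, so the transitional probability dips. A learner who tracks these dips can, in principle, find word boundaries in continuous speech without any pauses to mark them.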

When Tools Summary

In sum, the EEG and MEG methods offer excellent information about the timing of neural activity in second language learners. This timing information can inform researchers about the development of sound sensitivity within children’s second language (MMN) and how this sensitivity guides them in learning new words (N400). Such insight can, in turn, inform second language instruction by pointing teachers to effective practices for exposing children to new language sounds and for addressing individual differences, such as differences in attention, during word learning.

Where Tools (Hemodynamics)

When a certain brain region processes language, it needs oxygen to flow to that region, carried by hemoglobin molecules within the bloodstream; as it uses up the oxygen as “fuel,” the amount of oxygen in the blood is depleted. This change in blood supply is known as a hemodynamic response. Unlike the rapid and transient electric neural response, the hemodynamic response is slow and sustained, rising about 5 s after the onset of a stimulation and decaying for about 10–15 s after the stimulation ends (Lindquist, 2015). In other words, the hemodynamic response stays in place long enough to allow researchers to capture its neuroanatomical location, alongside some temporal information, through neuroimaging modalities such as functional magnetic resonance imaging (fMRI) or functional near-infrared spectroscopy (fNIRS) systems.
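The slow time course of the hemodynamic response can be illustrated with a simple mathematical model. The sketch below uses a “double-gamma” function, a shape that is widely used as a default in fMRI analysis software; this model and its parameter values (shape parameters of 6 and 16 and an undershoot ratio of 1/6) are assumptions introduced here for illustration and are not taken from the sources cited in this chapter. The function describes the response to a single brief stimulus.

```python
import math

def gamma_pdf(t, shape):
    """Gamma probability density with unit scale; zero at or before t = 0."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)

def canonical_hrf(t, peak_shape=6.0, undershoot_shape=16.0, undershoot_ratio=1 / 6):
    """Double-gamma sketch of the response to a brief stimulus at t = 0 seconds:
    an early positive lobe minus a smaller, later undershoot."""
    return gamma_pdf(t, peak_shape) - undershoot_ratio * gamma_pdf(t, undershoot_shape)

# Sample the modeled response every 0.1 s for 30 s after stimulus onset.
times_s = [i * 0.1 for i in range(301)]
hrf = [canonical_hrf(t) for t in times_s]

peak_time = times_s[hrf.index(max(hrf))]
print(f"modeled response peaks about {peak_time:.1f} s after onset")
print(f"value at 15 s (undershoot if negative): {canonical_hrf(15.0):.4f}")
```

Because the modeled response unfolds over many seconds, hemodynamic methods cannot resolve the millisecond-scale events captured by ERPs; their strength lies in showing where the response occurs.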

Using Where Tools (Neuroanatomical Principle)

Language development researchers using “where” methodologies often do so by posing neuroanatomically based hypotheses. Auditory language perception is supported by the bilateral auditory cortex, whereas print perception is supported by occipital, inferior temporal, and fusiform regions. Reading in particular entails the emergence of print-specific regions such as the visual word form area (VWFA) (Kovelman et al., 2012). Phonological processing typically engages the posterior aspect of the left superior temporal gyrus (pSTG) and the posterior aspect of the left inferior frontal gyrus, which comprise the phonological “dorsal” language network; these regions are connected by the arcuate fasciculus white matter tract that facilitates their communication (Skeide & Friederici, 2016). Thus, phonological awareness tasks such as visual rhyme judgment typically engage the “dorsal” network as well as parietal and VWFA regions that help children associate language sounds with print. Multiple aspects of the “dorsal” network, including the anterior aspect of the left STG, also participate in syntactic processing and its development (Skeide & Friederici, 2016). Lexico-semantic processing typically engages the left middle temporal gyrus (MTG) and the ventral aspect of the left inferior frontal gyrus (vIFG), which comprise the semantic or “ventral” language network. Many language tasks require memory and attention, thereby engaging the left middle frontal gyrus (MFG) and other aspects of the frontal and parietal regions that help support attention and verbal working memory (Hagoort & Indefrey, 2014).

Researchers interested in using “where” tools to study the impact of bilingualism have a variety of analytic approaches available to them. First, they can take a “whole-brain” approach and examine the functionality of all possible active regions during a given task. Second, they can take a “region of interest” approach, focusing on specific regions through a guided hypothesis. One can, for instance, compare left STG activation during a visual versus an auditory rhyme judgment task, or between speakers of varying proficiency. Researchers can also consider network-type analyses. For instance, psychophysiological interaction (PPI) analyses allow one to identify brain regions that work together or in synchrony during a particular task. Multi-voxel pattern analysis (MVPA) is another approach to analyzing brain activity in terms of network dynamics and involves searching for highly reproducible spatial patterns of activity that differentiate across experimental conditions, such as, again, a visual versus an auditory rhyme judgment task (Poldrack et al., 2011). Note that network-type analyses can also be used with the “when” methods.
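The logic of MVPA can be sketched with a toy example: if the spatial pattern of activity across voxels reliably differs between two conditions, a classifier trained on some trials should be able to label held-out trials at above-chance accuracy. The sketch below uses a simple nearest-centroid classifier with leave-one-out cross-validation on simulated data. The classifier choice, the 20 simulated voxels, the condition patterns, and the noise level are all assumptions made for illustration; published MVPA studies typically use dedicated software and more sophisticated classifiers.

```python
import random

def centroid(patterns):
    """Voxel-by-voxel mean of a list of activity patterns."""
    n = len(patterns)
    return [sum(values) / n for values in zip(*patterns)]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def leave_one_out_accuracy(patterns, labels):
    """Hold out each trial in turn, build one centroid per condition from the
    remaining trials, and assign the held-out trial to the nearest centroid."""
    correct = 0
    for i, (test_pattern, test_label) in enumerate(zip(patterns, labels)):
        train = [(p, l) for j, (p, l) in enumerate(zip(patterns, labels)) if j != i]
        centroids = {
            label: centroid([p for p, l in train if l == label])
            for label in set(labels)
        }
        predicted = min(
            centroids,
            key=lambda label: squared_distance(test_pattern, centroids[label]),
        )
        correct += predicted == test_label
    return correct / len(patterns)

# Simulated data: 20 voxels, 15 trials per condition, invented condition patterns.
rng = random.Random(1)
n_voxels = 20
templates = {
    "visual rhyme": [1.0 if v % 2 == 0 else 0.0 for v in range(n_voxels)],
    "auditory rhyme": [0.0 if v % 2 == 0 else 1.0 for v in range(n_voxels)],
}
patterns, labels = [], []
for label, template in templates.items():
    for _ in range(15):
        patterns.append([value + rng.gauss(0, 1) for value in template])
        labels.append(label)

accuracy = leave_one_out_accuracy(patterns, labels)
print(f"leave-one-out classification accuracy: {accuracy:.2f} (chance = 0.50)")
```

Note that no single simulated voxel separates the two conditions cleanly, since the noise is as large as the difference between conditions at each voxel; it is the pattern across voxels that carries the information, which is the central idea behind multi-voxel approaches.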

Functional Magnetic Resonance Imaging (fMRI)

fMRI measures the brain’s hemodynamic response. The brain’s reliance on oxygen to function alters the ratio of oxygenated to deoxygenated hemoglobin within active regions. In order to uncover active regions, or regions receiving increased blood flow, blood-oxygen-level-dependent (BOLD) response levels are recorded. MRI machines use magnetic fields to detect the amount of oxygen in the blood. In particular, they are well suited to showing where the blood is flowing, giving us information about which specific regions of the brain are responsive to certain types of stimuli or tasks.

MRI-Based Anatomical Imaging

As a giant magnet, an MRI machine is capable of detecting the magnetic properties of the water molecules in the brain. This provides information about different types of brain tissue. For instance, grey matter has more water molecules than the white matter, or the fiber tracts that connect different regions of the brain to each other (Zatorre et al., 2012). Because of this difference in tissue make-up, an MRI provides precise details about neuroanatomy: where regions are, how big they are, how they are connected to each other, and how strong those connections are (for more information, see Yousaf et al., 2018).

fMRI Use with Children

In fMRI, children are asked to lie still on their backs in the narrow tube of an MRI machine with their head confined in an even narrower head coil. The coil is often equipped with a mirror or specialized glasses that allow a child to see the experimental stimuli. In order to prepare children for the large size of the machine and for actual testing session circumstances, researchers often run through experimental protocol-like scenarios within a “mock scanner” that resembles an MRI machine. MRI machines are notoriously loud (~100–110 dB), requiring children to wear noise-cancelling headphones and padding. To prepare children for the noxious noise, researchers may share an MRI noise file with parents to play to their children in advance of their visit to the MRI facility. fMRI can thus be a rather cumbersome approach for studying young and awake children, and especially for studying auditory language. The MRI’s magnetic field is unsafe for anyone with metal implants or braces, making it inaccessible to certain populations (e.g., children with cochlear implants). Finally, similar to MEG, fMRI also requires a specialized room, but this time to protect the surroundings from its loud noise. In sum, the key advantage of fMRI is its excellent neuroanatomical resolution. Key disadvantages include the expense, noise, and awkwardness associated with operating a large and loud magnet. Nevertheless, MRI is an increasingly popular method as it gives much detail about brain anatomy and function.

Examples of fMRI Use in SLA Research

Unlike adults, children are often introduced to a new or second language while they are still acquiring their first language. Over the course of development, sociocultural or schooling factors may further prompt these children to switch dominance or become more proficient in their second than in their first language. Below, we consider how MRI has helped us gain a better understanding of how changes in learning environments influence dual language development.

Young second language learners often enter school with greater L1 than L2 language aptitude. Yet, this difference is likely to change direction when L2 is the dominant language of schooling. To illuminate mechanisms supporting this dynamic change in proficiency, Archila-Suerte et al. (2015) asked Spanish-English speaking children and adults in the US to participate in a single-word reading task in both of their languages while undergoing fMRI neuroimaging. The authors were particularly interested in noting differences that may be associated with the aforementioned “language dominance shift” during English/L2 schooling. Behaviorally, children and adults showed a similar language proficiency for Spanish, while adults outperformed children in English, as would be consistent with English-dominant schooling for this population of Spanish-speaking immigrants in the US. Neuroimaging findings revealed that both adults and children recruited similar regions within the known language network of the brain for both of their languages. Yet, adults showed more brain activity in MTG. In English, adults showed stronger MTG activation in the left hemisphere, where it is associated with meaning processing, as well as in the right hemisphere, where it is associated with more general audio-visual word processing. In contrast, in Spanish, the adults only showed stronger activation in the right hemisphere MTG region. Because children and adults had the same level of Spanish proficiency, the stronger right MTG activation in adults was thus thought to reflect maturational changes in brain development and the efficiency of word processing in general. In contrast, because the adults had better English proficiency than the children, the stronger left MTG activation in adults was thought to suggest developmental improvements in word meaning recognition in English orthography. In other words, as child second language learners develop and learn, their language proficiency changes but so does their overall cognitive and brain development. Neuroimaging can help us differentiate developmental changes in general cognitive/maturational processes from those of second language processing.

Another relevant consideration that may impact children’s language performance and processing is the age of acquisition (AoA) for their second language. How might this AoA influence the later processing of sounds within their second language? To answer this question, Archila-Suerte et al. (2015) asked English monolinguals, early-exposed Spanish-English bilinguals (exposure to English prior to age 9), and later-exposed Spanish-English bilinguals to complete an English phonological task during fMRI neuroimaging. The findings revealed that, on average, English monolinguals had better English proficiency than both of the bilingual groups, whereas early-exposed bilinguals outperformed the later-exposed group. Neuroimaging findings revealed that later-exposed bilinguals showed the strongest increase in activation in bilateral frontal and parietal regions, whereas monolinguals had the least amount of activation in the parietal region and a similarly low level of activation to the early-exposed bilinguals in the bilateral frontal lobes. These findings suggest that later-exposed bilinguals more actively recruit language regions when processing their second language compared to monolinguals and early-exposed bilinguals. Intriguingly, early bilinguals showed the least amount of activation in the left temporal regions relative to the other groups who were monolingual at an early age. The findings suggest that early exposure to two languages may influence the efficiency and manner in which bilinguals process the sounds of language.

Is it possible that, in addition to functional brain changes, early bilingual exposure also changes brain structures? A study by Archila-Suerte et al. (2018) suggests that the answer is “yes.” In particular, the study investigated children with varying “bilingual balance,” or the degree of similarity between their English and Spanish proficiencies. Remarkably, it turned out that, while the left STG, IFG and MFG cortical regions of interest were thinner in balanced than unbalanced bilinguals, their subcortical putamen region was thicker. The putamen is the part of the brain associated with the control of articulatory movement. In other words, it is possible that children who are maximally balanced in their two languages are the most efficient in speaking them—they may have a more efficient articulatory system and require less reliance on higher-order cognitive skills. In sum, early and systematic exposure to two languages can improve the brain’s efficiency in both hearing and producing language sounds, as evidenced through functional and anatomical imaging studies with bilinguals (Archila-Suerte et al., 2018). The use of fMRI thus deepens our understanding of how age of exposure as well as proficiency shape the structure and the function of the bilingual brain.

fNIRS: Functional Near-Infrared Spectroscopy

In fNIRS neuroimaging, participants wear a cap or headband with optodes that capture subtle changes in the color of the bloodstream as an index of hemodynamic response. The optodes, a light emitter and a light detector, are usually positioned 3 cm apart from each other, thus creating a data channel with peak signal midway between the two optodes and about 3 cm under the surface. Thankfully, many of the language regions are evolutionarily new and lie within the 3 cm cortical layers at the surface of the brain, making fNIRS well-suited for language research. In sum, while fNIRS lacks the spatial precision of fMRI, it is more cost effective, silent, portable, and child friendly.

fNIRS Use with Children

fNIRS is an incredibly child-friendly neuroimaging methodology. Similar to an EEG, during fNIRS children typically wear a cap while completing a computer-based language task. The general setup and system use with children is similar to the one described for an EEG, minus the conductive gel, which makes the setup procedure less messy. Also similar to an EEG, the size of the system allows for portability, and the lack of system noise provides opportunities for use in school as well as laboratory settings for ecologically valid spoken/auditory second language studies.

Examples of fNIRS Use in SLA Research

Orthographic systems vary in how language sounds map onto print. How might learning to read in different orthographies influence children’s emerging neural architecture for literacy? To answer this question, Jasińska et al. (2017) asked young English monolinguals as well as Spanish-English and French-English bilinguals to read English words during fNIRS neuroimaging. Of the three languages, Spanish is the most transparent, that is, it has the most predictable sound-to-letter mapping, while English is the least transparent and French is somewhere in between. The findings revealed that, relative to the monolinguals, Spanish bilinguals showed greater activation in the left STG region, the region thought to support automated connections between sound and print. In contrast, English monolinguals showed greater activation in the left IFG region thought to support analytically complex connections between sound and print. French bilinguals were “in between” the other two groups, with stronger left STG activation relative to English monolinguals but stronger left IFG activation relative to Spanish bilinguals. The findings suggest that orthographic experiences with another language can have a language-specific impact on bilinguals’ phonological literacy networks.

How does bilingualism influence the development of literacy over time? Jasińska and Petitto (2014) used fNIRS to investigate bilingual children and adults, who read regularly and irregularly spelled words. First, the study revealed the effect of age: the youngest participants showed robust STG recruitment across tasks, as is consistent with the idea that early readers place heavier reliance on sound-to-letter mappings. In contrast, older children and adults showed more task-specific patterns of brain activity. Second, the findings revealed that bilinguals showed stronger activation in the left STG and IFG language regions, as well as the MFG regions associated with attention and working memory, when compared to monolinguals. This discovery exemplifies the common finding across all the studies discussed here: bilingual experiences change how individuals learn to speak and to read, making the bilingual brain and cognitive processes qualitatively different from those of monolinguals, yielding a “neural signature” of bilingualism.

Conclusion

Neuroimaging tools have transformed our understanding of second language acquisition beyond the insight provided by behavioral results. Looking beneath the performance differences displayed between monolingual and bilingual children, we now know that crucial differences exist within the structure and function of the unique bilingual brain. With a more complete understanding of its complexity, parents and educators will be better equipped to support the language development of young second language learners.

Theories of literacy development place children’s ability to process language sounds and map those sounds onto print at the forefront of reading acquisition. As literacy is the foremost educational task in early grades, phonological processes and their role in learning to read have also been at the forefront of second language development research in young children. Contexts of second language and literacy development are complex: children vary in the age of second language exposure and the types of experiences they have with the second language, as well as in first language typology and whether or not they are also learning to read in the first language. Neuroimaging has been effectively deployed to offer complementary evidence on how these and many other factors influence children’s emerging second language literacy aptitude.

The emerging evidence reviewed in this chapter offers three primary themes. First, neuroimaging helps us better understand the mechanisms that support developmental improvements in second language aptitude, such as the differential transfer of L1 literacy skills (Jasińska et al., 2017) and the general reduction in cognitive load and improvement in the automaticity of phonological processing (Archila-Suerte et al., 2018). Second, neuroimaging can help us disentangle different factors in children’s second language improvement, such as the effects of language learning experiences versus general cognitive maturation (Archila-Suerte et al., 2015) or individual differences in learning aptitude (Wagley et al., 2020). Finally, and most importantly, neuroimaging reveals that learning a new language can change brain function and anatomy, leaving an enduring mark on the language and cognitive function of children’s brains and their cognitive development relative to monolinguals (Archila-Suerte et al., 2015; Jasińska & Petitto, 2014). As a result, in building theories of second language learning, one must consider not only what we know about children’s first and second language development but also how children’s language and literacy may change overall as a result of children’s expanding linguistic and cross-linguistic experiences.
While prior work has placed much emphasis on maturational and otherwise biological factors in how the brain reorganizes itself for multiple language acquisition, moving forward, researchers may also consider second language development from a sociocultural perspective, investigating the supportive role of culture on linguistic identity, which may, in turn, be reflected in performance and brain organization for language.

Further Readings

ERP
Luck, S. J. (2014). An introduction to the event-related potential technique. MIT Press.
Introductory reading for using the ERP and EEG techniques. Provides a guide for designing, conducting, and analyzing data from ERP experiments.
Payne, B. R., Ng, S., Shantz, K., & Federmeier, K. D. (2020). Event-related brain potentials in language processing: The N’s and the P’s. Psychology of Learning and Motivation.
An in-depth review of frequently used components within ERP research, including information associated with positive- and negative-going waveforms.

MEG
Hansen, P., Kringelbach, M., & Salmelin, R. (Eds.). (2010). MEG: An introduction to methods. Oxford University Press.
An introductory reading into the use of MEG methodology.
Hari, R., & Puce, A. (2017). MEG-EEG primer. Oxford University Press.
An in-depth guide to understanding the use of both MEG and EEGs, including the practicality and function of both systems, in addition to data collection, processing, and response inference information.

fMRI
Buxton, R. B. (2009). Introduction to functional magnetic resonance imaging: Principles and techniques. Cambridge: Cambridge University Press.
Huettel, S. A., Song, A. W., & McCarthy, G. (2004). Functional magnetic resonance imaging (Vol. 1). Sunderland, MA: Sinauer Associates.
Two thorough readings introducing fMRI methods.

Examples of Studies in Language Research
Sulpizio, S., Del Maschio, N., Fedeli, D., & Abutalebi, J. (2020). Bilingual language processing: A meta-analysis of functional neuroimaging studies. Neuroscience & Biobehavioral Reviews, 108, 834–853.
Enge, A., Friederici, A. D., & Skeide, M. A. (2020). A meta-analysis of fMRI studies of language comprehension in children. NeuroImage, 116858.

fNIRS
Bunce, S. C., Izzetoglu, M., Izzetoglu, K., Onaral, B., & Pourrezaei, K. (2006). Functional near-infrared spectroscopy. IEEE Engineering in Medicine and Biology Magazine, 25(4), 54–62.
Pinti, P., Tachtsidis, I., Hamilton, A., Hirsch, J., Aichelburg, C., Gilbert, S., & Burgess, P. W. (2020). The present and future use of functional near-infrared spectroscopy (fNIRS) for cognitive neuroscience. Annals of the New York Academy of Sciences, 1464(1), 5.
Two published readings introducing the use of fNIRS methods.

Tools and Resources

1. ERP/EEG
• Erpinfo.org/resources provides links to information on best practices and procedures to employ when conducting ERP research. This includes Boot Camp slides, practice data processing, and other helpful information. The website was developed by Steve Luck and Emily Kappenman.
2. MEG
• For more information on MEG refer to: https://ilabs.washington.edu/meg-brain-imaging
• A helpful video depicting MEG use with children can be found at: https://cookchildrens.org/neurology/advanced-technology/Pages/magnetoencephalography.aspx
3. fMRI
• First, there are repositories of fMRI data (e.g., https://openneuro.org) that may prove helpful for researchers looking to become more familiar with analyzing fMRI data.
• An additional resource, www.jove.com/t/1309/making-mr-imaging-child-s-play-pediatric-neuroimaging-protocol, provides information on using fMRI with children.
4. Videos of fMRI use
• www.youtube.com/watch?v=ow3Z3KYUOgE
• www.jove.com/v/5212/fmri-functional-magnetic-resonance-imaging (subscription needed to view)
5. fNIRS
• The Society for fNIRS has a website, https://fnirs.org/, that provides access to helpful resources including training courses, system information, software packages, and much more.

Discussion Questions

1. How might educators use neuroscientific findings on children’s phonological development to build better literacy instruction for bilingual learners?
2. Which system would you use if you were interested in understanding how second language learning influences the anatomy of the developing brain?
3. Which system would you use if you wanted to find out if there were timing differences in children’s processing of words in their first vs. second languages?

Note

1. The authors would like to thank the members of the Language and Literacy Laboratory at the University of Michigan for their help in writing and editing the chapter, as well as the National Institutes of Health for supporting this work (NIH R01 092498).

References

Andersson, A. (2012). Second language acquisition in 6- to 8-year-old native Spanish-speaking children: ERP studies of phonological awareness, semantics, and syntax [ProQuest Information & Learning]. Dissertation Abstracts International: Section B: The Sciences and Engineering, 74(1-B(E)).
Archila-Suerte, P., Woods, E. A., Chiarello, C., & Hernandez, A. E. (2018). Neuroanatomical profiles of bilingual children. Developmental Science, 21(5), 1–14.
Archila-Suerte, P., Zevin, J., & Hernandez, A. E. (2015). The effect of age of acquisition, socioeducational status, and proficiency on the neural processing of second language speech sounds. Brain and Language, 141, 35–49.
Aslin, R. N., Saffran, J. R., & Newport, E. L. (1998). Computation of conditional probability statistics by 8-month-old infants. Psychological Science, 9(4), 321–324.
Brennan, J. R., Lajiness-O’Neill, R., Bowyer, S. M., Kovelman, I., & Hale, J. T. (2019). Predictive sentence comprehension during story-listening in autism spectrum disorder. Language, Cognition and Neuroscience, 34(4), 428–439.
Coch, D., Grossi, G., Skendzel, W., & Neville, H. (2005). ERP nonword rhyming effects in children and adults. Journal of Cognitive Neuroscience, 17(1), 168–182.
Dale, A. M., & Halgren, E. (2001). Spatiotemporal mapping of brain activity by integration of multiple imaging modalities. Current Opinion in Neurobiology, 11(2), 202–208.
Ferjan Ramírez, N., Ramírez, R. R., Clarke, M., Taulu, S., & Kuhl, P. K. (2017). Speech discrimination in 11-month-old bilingual and monolingual infants: A magnetoencephalography study. Developmental Science, 20(1), e12427.
Garcia-Sierra, A., Rivera-Gaxiola, M., Percaccio, C. R., Conboy, B. T., Romo, H., Klarman, L., Ortiz, S., & Kuhl, P. K. (2011). Bilingual language learning: An ERP study relating early brain responses to speech, language input, and later word production. Journal of Phonetics, 39(4), 546–557.
Gross, J., Baillet, S., Barnes, G. R., Henson, R. N., Hillebrand, A., Jensen, O., Jerbi, K., Litvak, V., Maess, B., Oostenveld, R., Parkkonen, L., Taylor, J. R., van Wassenhove, V., Wibral, M., & Schoffelen, J.-M. (2013). Good practice for conducting and reporting MEG research. NeuroImage, 65, 349–363.
Hagoort, P., & Indefrey, P. (2014). The neurobiology of language beyond single words. Annual Review of Neuroscience, 37, 347–362.
Jasińska, K. K., Berens, M. S., Kovelman, I., & Petitto, L. A. (2017). Bilingualism yields language-specific plasticity in left hemisphere’s circuitry for learning to read in young children. Neuropsychologia, 98, 34–45.
Jasińska, K. K., & Petitto, L. A. (2014). Development of neural systems for reading in the monolingual and bilingual brain: New insights from functional near infrared spectroscopy neuroimaging. Developmental Neuropsychology, 39(6), 421–439.
Johnson, M. H. (2005). Developmental cognitive neuroscience: An introduction (2nd ed.). Blackwell Publishing.
Kaan, E. (2007). Event-related potentials and language processing: A brief overview. Language and Linguistics Compass, 1(6), 571–591.
Kovelman, I., Baker, S. A., & Petitto, L. A. (2008). Age of first bilingual language exposure as a new window into bilingual reading development. Bilingualism: Language & Cognition, 11(2), 203–223.
Kovelman, I., Norton, E. S., Christodoulou, J. A., Gaab, N., Lieberman, D. A., Triantafyllou, C., Wolf, M., Whitfield-Gabrieli, S., & Gabrieli, J. D. E. (2012). Brain basis of phonological awareness for spoken language in children and its disruption in dyslexia. Cerebral Cortex, 22(4), 754–764.
Kuhl, P. K. (2010). Brain mechanisms in early language acquisition. Neuron, 67(5), 713–727.
Lajiness-O’Neill, R., Brennan, J. R., Moran, J. E., Richard, A. E., Flores, A.-M., Swick, C., Goodcase, R., Andersen, T., McFarlane, K., Rusiniak, K., Kovelman, I., Wagley, N., Ugolini, M., Albright, J., & Bowyer, S. M. (2017). Patterns of altered neural synchrony in the default mode network in autism spectrum disorder revealed with magnetoencephalography (MEG): Relationship to clinical symptomatology. Autism Research, 11(3), 434–449.
Lindquist, M. (2015). Functional magnetic resonance imagery, Analysis of. In International encyclopedia of the social & behavioral sciences (2nd ed., pp. 525–531). Elsevier Inc.
Luck, S. (2005). An introduction to the event-related potential technique. Cambridge: MIT Press.
Lynch, A. (2017). Bilingualism and second language acquisition. Second and foreign language education. Encyclopedia of Language and Education, 4, 43–55.
Morr, M. L., Shafer, V. L., Kreuzer, J. A., & Kurtzberg, D. (2002). Maturation of mismatch negativity in typically developing infants and preschool children. Ear and Hearing, 23, 118–136.
Nobre, A. C., Allison, T., & McCarthy, G. (1994). Word recognition in the human inferior temporal lobe. Nature, 372(6503), 260–263.
Peña, M., Pittaluga, E., & Mehler, J. (2010). Language acquisition in premature and full-term infants. Proceedings of the National Academy of Sciences, 107(8), 3823–3828.
Petitto, L. A., Berens, M. S., Kovelman, I., Dubins, M., Jasinska, K., & Shalinsky, M. H. (2012). The “Perceptual Wedge Hypothesis” as the basis for bilingual babies’ phonetic processing advantage: New insights from fNIRS brain imaging. Brain & Language, 121(2), 130–143.
Poldrack, R. A., Mumford, J. A., & Nichols, T. E. (2011). Handbook of functional MRI data analysis. New York, NY: Cambridge University Press.
Sheng, L., Bedore, L. M., Peña, E. D., & Fiestas, C. (2013). Semantic development in Spanish-English bilingual children: Effects of age and language experience. Child Development, 84(3), 1034–1045.
Skeide, M. A., & Friederici, A. D. (2016). The ontogeny of the cortical language network. Nature Reviews Neuroscience, 17(5), 323–332.
Snowling, M. J., Hulme, C., & Nation, K. (2020). Defining and understanding dyslexia: Past, present and future. Oxford Review of Education, 46(4), 501–513.
Swaab, T. Y., Ledoux, K., Camblin, C. C., & Boudewyn, M. A. (2012). Language-related ERP components. In S. J. Luck & E. S. Kappenman (Eds.), The Oxford handbook of event-related potential components (pp. 397–439). Oxford University Press.
Wagley, N. (2020). Language and literacy development as revealed through the bilingual brain [ProQuest Information & Learning]. Dissertation Abstracts International: Section B: The Sciences and Engineering, 81(8-B).
Wagley, N., Lajiness-O’Neill, R., Hay, J. S., Ugolini, M., Bowyer, S. M., Kovelman, I., & Brennan, J. R. (2020). Predictive processing during a naturalistic statistical learning task in ASD. Eneuro, 7(6).
Werker, J. F., & Hensch, T. K. (2015). Critical periods in speech perception: New directions. Annual Review of Psychology, 66(1), 173–196.
Yousaf, T., Dervenoulas, G., & Politis, M. (2018). Advances in MRI methodology. International Review of Neurobiology, 141, 31–76.
Zatorre, R. J., Fields, R. D., & Johansen-Berg, H. (2012). Plasticity in gray and white: Neuroimaging changes in brain structure during learning. Nature Neuroscience, 15(4), 528–536.

10 RESEARCH METHODS FOR L2 CHILDREN WITH SPECIAL NEEDS Li Sheng and Sharon R. Hollenbach

Introduction

This chapter examines methodological approaches for studying children between the ages of 4 and 12 who are exposed to multiple languages and have special needs. As speech-language pathology professionals, we focus our discussion of special needs on developmental disorders (DDs) that negatively impact oral communication, i.e., comprehension and expression of spoken language. Subtypes of DD that have received the most attention in the bilingualism literature are developmental language disorder (DLD) and autism spectrum disorder (ASD). Readers who are interested in the interface between L2 literacy development and specific learning difficulties may peruse works by Kormos (2017a, 2017b). The study of bilingual children¹ with a DD (Bi-DD) utilizes a wide variety of research methods, and we are unable to give close attention to each of them in this chapter. At the same time, the study of Bi-DD attempts to answer a set of research questions that are uniquely motivated by the needs of this population. Different questions necessitate different research designs and methods. Therefore, we adopt a different organizational structure for this chapter. In the first section, we summarize common basic science and clinical research questions in the study of Bi-DD. Under each research question, we highlight relevant research methods used to answer the question or present methodological standards that guide the generation of high-quality translational data to establish the evidence base for clinical practice. In the second part of the chapter, we outline the challenges that come with studying Bi-DD and discuss the methodological implications of these challenges.

DOI: 10.4324/9780367815783-10

Common Research Questions and Research Methods Used in Empirical Studies

Bi-DD and Risk Status

One of the most frequently encountered questions in the study of Bi-DD is: Does exposure to two languages present an additional risk for language acquisition in children with a DD? Even in typically developing children, in spite of mounting evidence that the human language capacity can accommodate two or even more linguistic systems, the decision to raise a child bilingually is not easy and is often met with conflicting advice from professionals and family members. Children with a DD usually have less efficient language learning capacity and lag behind typical age peers on acquiring their native language. Would the demand of acquiring two languages overburden the already hindered system and lead to further delay and extraordinary difficulties with both languages? To answer this question, researchers often pit Bi-DD participants against a comparison group of monolingual children with the same diagnosis. Studies of this nature have included various disorder types (e.g., ASD, DLD, and specific learning disabilities such as dyslexia), a range of geographic locations (e.g., Canada, China, Italy, and the United States), multiple language combinations, and outcome measures across language domains. For example, Petersen et al. (2012) used standardized tests such as the Peabody Picture Vocabulary Test (Dunn, 2007) to measure receptive vocabulary and the Preschool Language Scale (Zimmerman et al., 2011) to measure the language comprehension and production of the Bi-ASD children and the monolingual ASD control group. Paradis et al. (2003) coded the use of grammatical morphemes in spontaneous language samples produced by bilingual and monolingual children with DLD. Vender et al. (2018) designed a cloze task that assessed the ability to generate plural noun inflections of nonwords in bilingual and monolingual children with dyslexia. 
The main finding is that bilingual children with a DD usually performed comparably to monolinguals with a DD, when the stronger language or both languages of bilinguals were considered. Extensions of this line of work have included testing the bilinguals in both languages and comparing them to two monolingual groups with the same diagnosis (e.g., Paradis et al., 2003), four-way comparisons that fully cross diagnostic status (DD vs. typical) and bilingual status (bilingual vs. monolingual) (e.g., Gonzalez-Barrero & Nadig, 2019), and comparing two DD groups who were sequentially bilingual and sequentially trilingual, respectively (e.g., To et al., 2012). These studies further buttress the conclusion that children with significant language learning impairments are able to become bilingual or even multilingual.

Language and Cognitive Profiles of Bi-DLD

To pave the way for effective assessment and treatment, one must have good descriptive data about the nature and extent of deficits in Bi-DD populations. Within this line of research, the Bi-DLD population has been studied more than the Bi-ASD population. These studies on Bi-DLD aim to delineate the dual language profiles of Bi-DLD in comparison to typically developing bilingual peers (Bi-TD) in all domains of language: phonological memory (repetition of nonsense words), lexical development (using standardized tests of receptive and expressive vocabulary), semantic development (using semantic fluency and word association tasks to examine the relationships among words), morphosyntactic ability (using spontaneous language samples to measure utterance length and complexity), and overall quality of discourse (eliciting story samples to examine narrative macrostructure and microstructure). All of the methods for assessing language outcomes discussed in other chapters of this book should, in principle, be applicable to the study of Bi-DD populations. A method that merits special attention is narrative sampling, one of the most frequently used and arguably the most established method for studying expressive language in Bi-DD because it can be readily adapted across languages, ages, and diagnoses. Narrative sampling involves eliciting speech via wordless picture books or specific prompts. Several standardized protocols exist (see Table 10.1 for a summary of narrative sampling protocols). Among them, the Edmonton Narrative Norms Instrument (ENNI; Schneider et al., 2005) and the Multilingual Assessment Instrument for Narratives (MAIN; Gagarina et al., 2012, 2019) are freely available to researchers. The frog narrative (e.g., Frog, Where Are You?; Mayer, 1969) elicitation protocols can be purchased at a low price at the Systematic Analysis of Language Transcripts (SALT; Miller & Iglesias, 2017) software website; or, alternatively, researchers can create their own protocols using the frog storybooks. 
The SALT software also provides access to a database that contains normative samples from English monolingual children for the ENNI, the frog stories, and the Test of Narrative Language (TNL; Gillam & Pearson, 2017) and normative samples from Spanish-English bilingual participants for the frog stories. These are useful reference data when trying to determine if a bilingual child meets the criterion for having a language disorder.

There are a number of factors that can make one narrative task more appropriate than others when testing bilingual populations. Most of these narrative tasks utilize wordless picture sequences, making them accessible to all populations. Despite this neutral format, some of the images or scenes may be culture specific or not equally familiar to all individuals, causing unintentional bias. The MAIN is an example of a relatively culturally fair task given the careful consideration of cultural factors in the creation of the pictorial materials. Additionally, task materials, including story scripts, comprehension questions, and scoring protocols, may only be available in English or a handful of additional languages and would require additional ad hoc translation before the task becomes viable for other language speakers. The MAIN task materials are available in many languages. According to the test developers, the MAIN empirical database now consists of more than 2,500 narratives, which bodes well for researchers who need norm-referenced scores on this instrument. By comparison, the frog stories’ task materials are available in fewer languages but large-scale normative data exist in English and Spanish, making it possible to compare a particular child’s performance to others from a similar background. Finally, if testing is planned for bilinguals’ two languages, it is also necessary to select a narrative task with multiple stories that closely parallel each other (i.e., MAIN, the frog stories) to reduce practice effects.

To generate good descriptive data, one could also profile language growth over time given that different rates of L1 and L2 growth across domains of language are well documented for bilingual learners (Ebert & Kohnert, 2016). This research goal requires a longitudinal design that assesses learners over multiple time points. As with any population, conducting longitudinal studies is more challenging than cross-sectional studies.

Although one should strive for assessing Bi-DLD children in both languages, oftentimes this is simply not achievable because of the lack of tools in many languages and the lack of linguistic expertise among researchers and practitioners (Sheng, 2019). A substantial line of research takes this reality into consideration by asking: How do TD sequential bilinguals compare to monolingual peers with and without DLD in single-language assessment? Because sequential bilinguals have had less exposure to the L2, their performance on L2 language measures is often indistinguishable from that of monolinguals with DLD. The goal of these studies is to identify potential fault lines that could separate TD bilinguals from monolinguals with DLD by scrutinizing performance on a range of linguistic and nonlinguistic skills in three groups of children: TD sequential bilinguals, TD monolinguals, and monolinguals with DLD. Testing is conducted in the monolinguals’ only language and the bilinguals’ L2. Measures that show clear separation between the two TD groups and the DLD group are ideal because they are minimally affected by differences in language experience while at the same time sensitive to the integrity of the language learning system. Measures that yield an indistinguishable performance between TD sequential bilinguals and monolinguals with DLD are to be avoided in non-biased assessment. This line of work has pointed to certain nonlinguistic skills (e.g., reaction time in shape detection, Kohnert & Windsor, 2004), clausal embedding (i.e., frequency of producing embedded clauses in spontaneous language samples, Scheidnes & Tuller, 2019), and error types (e.g., TD sequential bilinguals were more likely to make substitution errors whereas monolinguals with DLD were more likely to make omission errors in the production of inflections and prepositions, Armon-Lotem, 2014) as potential candidates that can be used to rule out DLD in sequential bilinguals.

TABLE 10.1 Narrative tasks

Edmonton Narrative Norms Instrument (ENNI)
Task type: Tell (story generation) task
Additional details: Standardized, normed measure, for ages 4–9 years
Available languages: Materials in English and French, can be conducted in any language
Citation: Schneider, P., Dubé, R. V., & Hayward, D. (2005). The Edmonton Narrative Norms Instrument. Retrieved from University of Alberta Faculty of Rehabilitation Medicine website: www.rehabresearch.ualberta.ca/enni. Govindarajan, K., & Paradis, J. (2019). Narrative abilities of bilingual children with and without Developmental Language Disorder (SLI): Differentiation and the role of age and input factors. Journal of Communication Disorders, 77, 1–16.

Frog narratives
Task type: Tell, retell, and comprehension tasks
Additional details: Standardized measure, retell normed for monolingual Spanish (5:10–10:7) and bilingual Spanish-English (5:0–9:9), tell normed for bilingual Spanish-English (5:0–9:7)
Available languages: Can be conducted in any language
Citation: Mayer, M. (1969). Frog, where are you? New York: Dial Press. Scripts can be found at www.saltsoftware.com/resources/databases. Gutiérrez-Clellen, V. F., Simon-Cereijido, G., & Wagner, C. (2008). Bilingual children with language impairment: A comparison with monolinguals and second language learners. Applied Psycholinguistics, 29(1), 3–19.

Multilingual Assessment Instrument for Narratives (MAIN)
Task type: Tell, retell, and comprehension
Additional details: Standardized measure
Available languages: Materials available in more than 27 languages, including Estonian, Lithuanian, Vietnamese, and Welsh
Citation: Gagarina, N., Klop, D., Kunnari, S., Tantele, K., Välimaa, T., Balčiūnienė, I., Bohnacker, U., & Walters, J. (2012). MAIN: Multilingual Assessment Instrument for Narratives. ZAS Papers in Linguistics, 56. Gagarina, N., Klop, D., Kunnari, S., Tantele, K., Välimaa, T., Bohnacker, U., & Walters, J. (2019). MAIN: Multilingual Assessment Instrument for Narratives – Revised. ZAS Papers in Linguistics, 63. Tsimpli, I. M., Peristeri, E., & Andreou, M. (2016). Narrative production in monolingual and bilingual children with specific language impairment. Applied Psycholinguistics, 37, 195–216.

Renfrew Bus Story
Task type: Retell task
Additional details: Standardized measure, for ages 3–6:11
Available languages: English
Citation: Renfrew, C. E. (1969). The bus story: A test of continuous speech. North Place, Old Headington: Oxford. Rezzonico, S., Chen, X., Cleave, P. L., Greenberg, J., Hipfner-Boucher, K., … Girolametto, L. (2015). Oral narratives in monolingual and bilingual preschoolers with SLI. International Journal of Language & Communication Disorders, 50(6), 830–841.

Test of Narrative Language (TNL)
Task type: Narrative tell, retell, and comprehension
Additional details: Standardized, normed measure, for ages 4:0–15:11
Available languages: English, Brazilian Portuguese, experimental Spanish version
Citation: Gillam, R., & Pearson, N. (2017). TNL-2: Test of Narrative Language (2nd ed.). Austin, Texas: Pro-Ed. Rossi, N. F., Lindau, T. A., Gillam, R. B., & Giacheti, C. M. (2016). Cultural adaptation of the Test of Narrative Language (TNL) into Brazilian Portuguese. CoDAS, 28(5), 507–516. Perme, A. L. (2014). Measures of narrative performance in Spanish-speaking children on the Test of Narrative Language-Spanish [Unpublished master’s thesis]. University of Texas at Austin, Austin, TX. Squires, K. E., Lugo-Neris, M. J., Peña, E. D., Bedore, L. M., Bohman, T. M., & Gillam, R. B. (2014). Story retelling by bilingual children with language impairments and typically developing controls. International Journal of Language & Communication Disorders, 49(1), 60–74.

Note: Under Citation, the first citation listed is the task itself and the second citation is an example article that uses the tool.
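
Utterance length, mentioned in this section as a morphosyntactic measure drawn from language samples, is commonly summarized as mean length of utterance (MLU). The following is a minimal sketch only, assuming a transcript already segmented into utterances and counting whitespace-separated words. MLU is often computed in morphemes, which requires language-specific segmentation that is not attempted here. The function name and the sample utterances are hypothetical and are not taken from SALT or any protocol in Table 10.1.

```python
def mean_length_of_utterance(utterances):
    """Return MLU in words: total words divided by the number of non-empty utterances.

    Assumption: each string is one utterance and words are separated by whitespace.
    """
    lengths = [len(u.split()) for u in utterances if u.strip()]
    if not lengths:
        raise ValueError("no utterances to analyse")
    return sum(lengths) / len(lengths)


# Hypothetical utterances from a frog-story retell (illustration only)
sample = ["the frog jumped out", "he is looking", "where did the frog go"]
mlu = mean_length_of_utterance(sample)
print(f"MLU in words: {mlu:.2f}")
```

For a bilingual child, the same calculation would be run separately on the sample from each language, bearing in mind that word-based MLU values are not directly comparable across typologically different languages.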

Diagnostic Accuracy Studies

Studies that delineate dual language profiles are clinically useful because they inform us about weaknesses in Bi-DLD at a group level. Studies with the goal of identifying fault lines between TD sequential bilinguals and monolingual DLD are also useful because they tell us what not to use in diagnostic testing and what measures are good at ruling out DLD. However, neither type of study can tell a clinician whether or not a client with a certain combination of scores is affected or typical. To exert a more direct practical impact, diagnostic accuracy studies ask these questions: What are the psychometric properties of the proposed measure? Specifically, what are the sensitivity, specificity, positive likelihood ratio, and negative likelihood ratio of the index test (i.e., the measure under scrutiny) when evaluated against a reference standard (i.e., a widely accepted approach to classify individuals into categories)? Studies of this nature have evaluated a broad range of potential measures, including nonlinguistic processing tasks (e.g., processing speed, Ebert & Pham, 2019), clinical markers of DLD such as morphosyntactic composite, nonword repetition, and sentence repetition (Girbau & Schwartz, 2008; Gutiérrez-Clellen et al., 2008; Thordardottir & Brandeker, 2013), dynamic assessment tasks (Orellana et al., 2019), parent report of bilingual children’s first language development (Paradis et al., 2010), English standardized test scores (Gillam et al., 2013), and scores on a bilingual screener (Lugo-Neris, Peña et al., 2015). While a number of these measures are promising, the methodological quality is variable across studies (Dollaghan & Horner, 2011; Orellana et al., 2019).

The ultimate charge for the researcher who studies clinical populations is to generate a high-quality evidence base to support effective clinical practice. High-quality translational research is not only governed by its own set of methodological standards but should also follow all familiar standards of scientific inquiries. There has been a concerted effort among the scientific community to develop standards and procedures to increase the quality of clinical research. 
The EQUATOR network (Enhancing the Quality and Transparency of Health Research) is a multinational initiative dedicated to promoting the use of comprehensive reporting guidelines that facilitate not only accurate and transparent reporting but also the planning and implementation of health research. The network offers a free online library of reporting guidelines for various study types. For instance, the Standards for Reporting Diagnostic accuracy studies (STARD; Bossuyt et al., 2015) is a 30-item checklist of requirements for the title, abstract, introduction, methods, results, discussion, and other relevant information (e.g., funding source) sections of a paper. Readers of a diagnostic research paper can use this checklist to judge the potential bias, relevance, and validity of study findings, whereas researchers can use the checklist for the design, conduct, and reporting of diagnostic research.
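
The four diagnostic accuracy indices named in this section follow directly from a 2×2 table that crosses index test results with the reference standard. The sketch below illustrates the arithmetic under stated assumptions: the class and function names are hypothetical, and the counts are invented for illustration rather than drawn from any study cited here.

```python
from dataclasses import dataclass


@dataclass
class DiagnosticCounts:
    """2x2 counts of index test results against the reference standard."""
    true_pos: int   # index test positive, reference standard says disorder
    false_neg: int  # index test negative, reference standard says disorder
    false_pos: int  # index test positive, reference standard says typical
    true_neg: int   # index test negative, reference standard says typical


def diagnostic_accuracy(c: DiagnosticCounts) -> dict:
    """Compute sensitivity, specificity, and positive/negative likelihood ratios."""
    affected = c.true_pos + c.false_neg
    typical = c.true_neg + c.false_pos
    if affected == 0 or typical == 0:
        raise ValueError("both reference-standard groups must be non-empty")
    sensitivity = c.true_pos / affected
    specificity = c.true_neg / typical
    # LR+ = sensitivity / (1 - specificity); unbounded when there are no false positives
    lr_pos = sensitivity / (1 - specificity) if c.false_pos > 0 else float("inf")
    # LR- = (1 - sensitivity) / specificity; unbounded when there are no true negatives
    lr_neg = (1 - sensitivity) / specificity if c.true_neg > 0 else float("inf")
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
    }


# Hypothetical counts: 20 children with DLD and 40 typical children per the reference standard
counts = DiagnosticCounts(true_pos=18, false_neg=2, false_pos=4, true_neg=36)
result = diagnostic_accuracy(counts)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

A larger positive likelihood ratio means a positive index test result is more informative for ruling DLD in, and a smaller negative likelihood ratio means a negative result is more informative for ruling it out. Any cut-off for what counts as adequate should come from the relevant methodological literature, not from this sketch.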

Intervention Studies

An important goal of studying Bi-DD is to design effective intervention to improve the quality of life of affected individuals. All the questions pertaining to intervention for monolinguals apply to bilinguals. Among the questions unique to bilinguals, the most common is: What should be the language of intervention for bilinguals? Under this broad question, more specific questions include: How does bilingual intervention compare to L2-only intervention (Restrepo et al., 2013)? Would time spent providing intervention in the minority language lead to smaller gains in majority language skills compared to an L2-only intervention (Restrepo et al., 2013)? Could intervention delivered in one language lead to gains in the other language (Petersen et al., 2016)? To date, research evidence indicates that bilingual intervention results in as much gain in the majority language as L2-only intervention, with some added benefit of L1 gains. Therefore, to the extent possible, intervention provided in both languages of the bilinguals should be encouraged. Under the bilingual intervention condition, the following questions have been raised: Is there an optimal order of initial instructional language (L1 first or L2 first) (Lugo-Neris, Bedore et al., 2015)? Given the frequent mismatch in clinician-client languages, could caregivers be trained to deliver effective intervention in the home language (Pedero et al., 2018)? Could intervention targeting nonlinguistic cognitive processing lead to cross-domain gains in both of the bilinguals’ languages (Ebert et al., 2014)? Studies attempting to answer these questions are beginning to emerge, but considerable gaps are present for all of them.

Intervention studies require the measurement of participants’ language skills before and after intervention. Depending on the goal of the intervention, researchers may use standardized tests, language sampling, and researcher-designed probes to establish baseline performance and to evaluate change in a specific area (e.g., tense morphology) or more broadly (e.g., increase in mean length of utterance or in standardized test scores). For intervention research, the gold standard is randomized controlled trials (RCT), which measure the effectiveness of an intervention by randomly assigning participants to either the intervention or the comparison group. 
Again, readers can turn to a guideline from the EQUATOR network, the Consolidated Standards of Reporting Trials (CONSORT; Schulz et al., 2010), a 25-item checklist, to appraise the quality of a published RCT or to plan a new study. Single-case designs are also appropriate in intervention studies targeting bilingual populations with a language learning impairment. These designs sample a few participants’ responses to an intervention multiple times over a period of time. The single-case reporting guidelines in behavioral interventions (SCRIBE; Tate et al., 2016), a 26-item checklist, can be used for the planning, conduct, and evaluation of single-case research. In the realm of educational research, the What Works Clearinghouse (WWC), an initiative of the US Department of Education’s Institute of Education Sciences, has published handbooks of the standards and procedures it uses to review and appraise the quality of education intervention studies. Now in its fourth version, the Standards Handbook (What Works Clearinghouse, 2020) describes in detail the standards for four types of intervention research designs: RCT, quasi-experimental design, regression discontinuity design, and single-case design. Researchers developing interventions for Bi-DD populations should be cognizant of these guidelines and standards and ensure adherence to the standards in their respective field.
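The random assignment at the heart of an RCT can be sketched in a few lines. This is a minimal illustration of simple (unstratified) randomization only. The function name, the participant codes, and the seed are hypothetical. Actual trials may use blocked or stratified schemes and conceal the allocation sequence, none of which is shown here.

```python
import random


def randomize(participant_ids, seed=2022):
    """Shuffle participants and split them into two arms of near-equal size."""
    rng = random.Random(seed)  # fixed seed makes the allocation reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "intervention": sorted(shuffled[:half]),
        "comparison": sorted(shuffled[half:]),
    }


# Hypothetical participant codes P01 ... P11.
arms = randomize(["P%02d" % i for i in range(1, 12)])
```

Because assignment depends only on the shuffled order, no characteristic of a child (for example, language dominance or severity) can systematically influence which arm the child ends up in.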

172 Li Sheng and Sharon R. Hollenbach

In summary, research questions posed by the study of Bi-DD are of interest to both basic and clinical sciences. They offer insights into the process of language acquisition and can inform the interrelations between language, cognition, and experience. Well-designed treatment studies are particularly suitable to test hypotheses about the nature of underlying learning and processing deficits because they are better equipped for drawing causal relationships.

Challenges and Methodological Implications

Answering any of the questions outlined in the previous section presupposes that one has a method for selecting the population of interest, for identifying the appropriate matching comparisons, for measuring the linguistic construct of interest, and for removing or controlling confounds that could threaten the validity of the method. When studying young L2 learners, these methodological requirements present a number of challenges due to the scarcity of participants and the increased number of potential confounds that are inherent in a highly heterogeneous population.

Participant Selection

Readers of the Bi-DD literature would quickly notice that the participant section is quite elaborate because thorough descriptions of the bilingual status and the disorder status of the participants are in order. Every researcher who studies the Bi-DD population should already have a detailed background questionnaire in their methodological toolkit (see Table 10.2 for a summary of questionnaires). These tools rely on a report by the primary caregiver, typically administered in a face-to-face interview to increase reliability of reporting. They allow the researcher to quantify the current level of use and lifetime cumulative use of each language and document the daily function of each language across various settings and interlocutors. Researchers may choose to set a certain threshold of language use and/or language proficiency to include or exclude individuals. For instance, Gonzalez-Barrero and Nadig (2019) used a combination of four indices to determine the bilingual status of their ASD participants: (1) > 20% of lifetime exposure to each language according to parent report; (2) the ability to complete the testing protocol in both languages; (3) a score of > 3 on a 4-point proficiency scale in each language as rated by parents; and (4) mean ratings of > 2 on a 4-point proficiency scale from three external raters’ assessment of language use based on videos of the testing sessions. Others may choose to use > 20% current language use rather than lifetime exposure and still others may use a different cut-off criterion (e.g., < 65% English; Ebert et al., 2019). There is no consensus on the definition of bilingual. Thus, the main guidance is to choose a logically sound criterion that helps one fulfil the aim of the project. Procedures for determining or verifying disorder status are specific to each disorder. ASD is diagnosed based on the distinct behavioral profile demonstrated by affected individuals. Participant recruitment is typically through community referrals and research registries. Researchers then either request health/educational records from participants or administer additional tests in the laboratory to document the severity of the disorder.

TABLE 10.2 Language use and experience questionnaires

Name: The Alberta Language Environment Questionnaire (ALEQ)
Focus: Language history and present use, behavior, and family history
Additional details: Standardized measure, normed using children ages 5–7 years with DLD
Languages: Materials in English but content not language specific
Citations:
  Paradis, J. (2011). Individual differences in child English second language acquisition: Comparing child-internal and child-external factors. Linguistic Approaches to Bilingualism, 1(3).
  Reetzke, R., Zou, X., Sheng, L., & Katsos, N. (2015). Communicative development in bilingually exposed Chinese children with autism spectrum disorders. Journal of Speech, Language, and Hearing Research, 58, 813–825.

Name: Bilingual Input-Output Survey, part of Bilingual English-Spanish Assessment (BESA)
Focus: Parent/teacher assessment of language use and exposure
Additional details: Standardized questionnaire, for ages 4–6 years
Languages: Available in English and Spanish
Citations:
  Peña, E., Gutierrez-Clellen, V., Iglesias, A., Goldstein, B., & Bedore, L. (2018). BESA: Bilingual English-Spanish Assessment. Baltimore, MD: Brookes Publishing.
  Grasso, S. M., Peña, E. D., Bedore, L. M., Hixon, J. G., & Griffin, Z. M. (2018). Cross-linguistic cognate production in Spanish-English bilingual children with and without specific language impairment. Journal of Speech, Language, and Hearing Research, 61, 619–633.

Name: Bilingual Language Experience Calculator (BiLEC)
Focus: Current year’s input and output, lifetime input and output
Additional details: Standardized measure
Languages: Intended for bilinguals of English and any other language
Citations:
  Unsworth, S. (2013). Assessing the role of current and cumulative exposure in simultaneous bilingual acquisition: The case of Dutch gender. Bilingualism: Language and Cognition, 16, 86–110. https://doi.org/10.1017/S1366728912000284
  Vender, M., Hu, S., Mantione, F., Savazzi, S., Delfitto, D., & Melloni, C. (2018). Inflectional morphology: Evidence for an advantage of bilingualism in dyslexia. International Journal of Bilingual Education and Bilingualism, 24(2), 155–172. https://doi.org/10.1080/13670050.2018.1450355

Name: Language Experience and Proficiency Questionnaire (LEAP-Q)
Focus: Current & history use and exposure
Additional details: Standardized measure
Languages: Available in 24 languages, including Arabic, Russian, Spanish, and Thai
Citations:
  Marian, V., Blumenfeld, H. K., & Kaushanskaya, M. (2007). The Language Experience and Proficiency Questionnaire (LEAP-Q): Assessing language profiles in bilinguals and multilinguals. Journal of Speech, Language, and Hearing Research, 50(4), 940–967. https://bilingualism.northwestern.edu/leapq/
  Mor, B., Yitzhaki-Amsalem, S., & Prior, A. (2014). The joint effect of bilingualism and ADHD on executive function. Journal of Attention Disorders, 19(6), 1–15.

Note: In the citation column, the first citation listed is the task itself and the second citation is an example article that uses the tool.

Diagnosing DLD, even in monolinguals, is not a cut-and-dried process. For bilinguals, the problem becomes more complex due to the overlap in linguistic performance between typical sequential bilinguals and monolinguals with DLD, the shortage of psychometrically sound tools, and the lack of bilingual expertise in the professional workforce. To ensure accurate participant selection, researchers administer confirmatory testing to verify the diagnostic status of the children recruited through community referrals. In Sheng et al. (2012), to be included in the DLD group, not only were the Spanish-English bilinguals enrolled in therapy at school, but they also demonstrated (1) low proficiency ratings (more than 1 SD below the group mean in a pool of 280 children) in both languages reported by parents and teachers; (2) valid concerns expressed by teachers and parents about their language ability; (3) clinician concern on the basis of difficulties at the time of testing; and (4) low grammaticality in narrative production in both languages. The convergent sources of information guard against errors of over-, under-, and misdiagnosis of DLD frequently reported in bilingual populations. Convergent information from both subjective ratings and objective performance measures is a viable solution to diagnosing DLD when norm-referenced tests are unavailable (see Table 10.3 for a list of standardized language tests in languages other than English). When such tests are available, it is customary to use 1 to 1.5 standard deviations below the mean on omnibus L1 and L2 proficiency tests to select individuals with DLD (e.g., Russian-Hebrew: Fichman & Altman, 2019; Spanish-English: Grasso et al., 2018).
The use of 1–1.5 standard deviations below the mean, however, is not universal. Further discussions on language test score criteria can be found in Plante (1998). The ideal norm should consist of bilingual children with similar demographic characteristics and comparable language experience, but this is rarely the case given the challenges in recruiting large bilingual samples. IQ testing is almost always required in studies of special populations. IQ test scores are used to document the cognitive functioning of the participants and to select appropriately matched controls (e.g., monolinguals with the same diagnosis and similar IQ scores or younger typically developing children with comparable raw IQ scores). In the case of DLD, a cut-off score of 70 on nonverbal intelligence tests is commonly used to exclude individuals whose language deficits are caused by deficits in intellectual ability. Determination of bilingual status and DD status is not trivial. Both involve a combination of subjective judgment from stakeholders (i.e., parents, teachers, trained professionals) and objective performance measures. Such painstaking details are critical to ensuring confidence in the participants’ status and finding the right matching group to answer key research questions.
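The selection rules described in this section lend themselves to explicit, auditable checks. The sketch below encodes the four bilingual-status indices attributed above to Gonzalez-Barrero and Nadig (2019) and a score-based DLD screen using the 1 SD and nonverbal IQ 70 cut-offs mentioned in the text. All function and parameter names are hypothetical. Applying each index separately per language, requiring low scores in both languages, and treating an IQ of exactly 70 as eligible are assumptions made for illustration, not procedures confirmed by the cited studies.

```python
LANGS = ("L1", "L2")


def is_bilingual(exposure, completed, parent_rating, rater_mean,
                 min_exposure=0.20, min_parent=3, min_rater=2):
    """Apply four inclusion indices; each argument maps language -> value."""
    return all(
        exposure[lang] > min_exposure         # (1) share of lifetime exposure
        and completed[lang]                   # (2) finished the testing protocol
        and parent_rating[lang] > min_parent  # (3) parent rating, 4-point scale
        and rater_mean[lang] > min_rater      # (4) mean rating of external raters
        for lang in LANGS
    )


def meets_dld_cutoff(z_scores, nonverbal_iq, sd_cutoff=-1.0, iq_floor=70):
    """Low omnibus scores in both languages, with nonverbal IQ at or above the floor."""
    low_in_both = all(z_scores[lang] <= sd_cutoff for lang in LANGS)
    return low_in_both and nonverbal_iq >= iq_floor


# Hypothetical child: meets the bilingual criteria and the DLD screen.
child_is_bilingual = is_bilingual(
    exposure={"L1": 0.60, "L2": 0.40},
    completed={"L1": True, "L2": True},
    parent_rating={"L1": 4, "L2": 4},
    rater_mean={"L1": 3.3, "L2": 2.7},
)
child_has_dld_profile = meets_dld_cutoff({"L1": -1.3, "L2": -1.6}, nonverbal_iq=95)
```

Keeping the thresholds as parameters makes it easy to rerun the selection with a different criterion (for example, a cut-off of 1.5 SD), which matters given the lack of consensus noted above.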

TABLE 10.3 Language measures for groups other than English-speaking monolinguals

Name: Batteria per la Valutazione della Dislessia e della Disortografia Evolutiva–2 [Battery for the assessment of developmental dyslexia and dysorthographia-2] (DDE-2)
Focus: Word and nonword reading and writing tasks, homophones
Additional details: Standardized, normed measure
Available languages: Italian
Citations:
  Sartori, G., Job, R., & Tressoldi, P. E. (2007). DDE-2. Batteria per la valutazione della dislessia e della disortografia evolutiva [Battery for the assessment of developmental dyslexia and dysorthographia]. Firenze: Giunti OS.
  Vender, M., Hu, S., Mantione, F., Savazzi, S., Delfitto, D., & Melloni, C. (2018). Inflectional morphology: Evidence for an advantage of bilingualism in dyslexia. International Journal of Bilingual Education and Bilingualism, 24(2), 155–172. https://doi.org/10.1080/13670050.2018.1450355

Name: Bilans Informatisés du Langage Oral [Computerized schedule for oral language] (BILO-3C)
Focus: Expressive and receptive, morphosyntax, sentence completion, phonology
Additional details: Standardized measure, for infants through adolescents
Available languages: French
Citations:
  Khomsi, A., Khomsi, J., Parabeau-Guéno, A., & Pasquet, F. (2007). Bilans Informatisés du Langage Oral (BILO-3C) [Computerized schedule for oral language]. Paris, France: Editions du CPA.
  Scheidnes, M., & Tuller, L. (2019). Using clausal embedding to identify language impairment in sequential bilinguals. Bilingualism: Language and Cognition, 22(5), 949–967.

Name: Bilingual English Spanish Assessment (BESA)
Focus: Morpho/syntax, semantics, phonology, pragmatics, questionnaires
Additional details: Standardized and normed measure, for ages 4–6 years
Available languages: English and Spanish
Citations:
  Peña, E., Gutierrez-Clellen, V., Iglesias, A., Goldstein, B., & Bedore, L. (2014). BESA: Bilingual English-Spanish Assessment Manual. San Rafael, CA: AR-Clinical Publications.
  Squires, K. E., Lugo-Neris, M. J., Peña, E. D., Bedore, L. M., Bohman, T. M., & Gillam, R. B. (2014). Story retelling by bilingual children with language impairments and typically developing controls. International Journal of Language & Communication Disorders, 49(1), 60–74.

Name: Clinical Evaluation of Language Fundamentals (CELF), -S, -NL
Focus: Receptive and expressive language, written language, social skills
Additional details: Standardized, normed measure, for ages 5–21 years
Available languages: English, Spanish, Dutch
Citations:
  Semel, E., Wiig, E., & Secord, W. A. (2013). Clinical Evaluation of Language Fundamentals (5th ed.). San Antonio, TX: Pearson.
  Semel, E., Wiig, E. H., & Secord, W. A. (2003). Clinical Evaluation of Language Fundamentals (4th ed.) [CELF-4 Spanish]. San Antonio, TX: PsychCorp.
  Altman, C., Armon-Lotem, S., Fichman, S., & Walters, J. (2016). Macrostructure, microstructure, and mental state terms in the narratives of English-Hebrew bilingual preschool children with and without specific language impairment. Applied Psycholinguistics, 37, 165–193.

Name: Évaluation du Langage Oral de l’enfant Aphasique [Oral language evaluation of aphasic children] (ELOLA)
Focus: Language production (originally intended for children with aphasia)
Additional details: Standardized measure, for ages 4–12 years
Available languages: French
Citations:
  De Agostini, M., Metz-Lutz, M.-N., Van Hout, A., Chavance, M., Deloche, G., Pavao-Martins, I., & Dellatolas, G. (1998). Batterie d’évaluation du langage oral de l’enfant aphasique (ELOLA): standardisation française (4–12 ans) [Oral language evaluation battery of aphasic children: A French standardization]. Revue de Neuropsychologie, 8(3), 319–367.
  Scheidnes, M., & Tuller, L. (2019). Using clausal embedding to identify language impairment in sequential bilinguals. Bilingualism: Language and Cognition, 22(5), 949–967.

Name: Goralnik Screening Test for Hebrew
Focus: Sentence repetition, comprehension, expression, pronunciation, vocabulary, and storytelling
Additional details: Standardized, normed measure
Available languages: Hebrew
Citations:
  Goralnik, E. (1995). Goralnik Screening Test for Hebrew. Even Yehuda: Matan.
  Fichman, S., Altman, C., Voloskovich, A., Armon-Lotem, S., & Walters, J. (2017). Story grammar elements and causal relations in the narratives of Russian-Hebrew bilingual children with SLI and typical language development. Journal of Communication Disorders, 69, 72–93.

Name: Inventory to Assess Language Knowledge (iTALK), part of Bilingual English-Spanish Assessment (BESA)
Focus: Five areas of language development (vocabulary, grammar, sentence production, comprehension, and phonology)
Additional details: Standardized measure, for ages 4–6 years
Available languages: English, Spanish
Citations:
  Peña, E., Gutierrez-Clellen, V., Iglesias, A., Goldstein, B., & Bedore, L. (2014). BESA: Bilingual English-Spanish Assessment Manual. San Rafael, CA: AR-Clinical Publications.
  Grasso, S. M., Peña, E. D., Bedore, L. M., Hixon, J. G., & Griffin, Z. M. (2018). Cross-linguistic cognate production in Spanish-English bilingual children with and without specific language impairment. Journal of Speech, Language, and Hearing Research, 61(3), 619–633.

Name: MacArthur-Bates Communicative Development Inventories (MB-CDIs); CDI; Preschool CDI (PCDI); Chinese CDI (CCDI)
Focus: Early language including vocabulary comprehension and production, gestures, and grammar
Additional details: Standardized, normed measure, for ages 8–30 months
Available languages: Used across 29 languages, including Norwegian, Danish, Portuguese, and Turkish
Citations:
  Fenson, L. (2007). MacArthur-Bates communicative development inventories. Baltimore, MD: Paul H. Brookes Publishing Company. http://wordbank.stanford.edu/
  Petersen, J. M., Marinova-Todd, S. H., & Mirenda, P. (2012). Brief report: An exploratory study of lexical skills in bilingual children with autism spectrum disorder. Journal of Autism and Developmental Disorders, 42, 1499–1503.

Name: Preschool Boehm Test of Basic Concepts
Focus: Basic language and cognitive development
Additional details: Standardized, normed measure, for ages 3:0–5:11
Available languages: English, Spanish
Citations:
  Boehm, A. E. (1971). Boehm Test of Basic Concepts. New York: The Psychological Corporation.
  Thordardottir, E. T., Weismer, S. E., & Smith, M. E. (1997). Vocabulary learning in bilingual and monolingual clinical intervention. Child Language Teaching and Therapy, 13(3), 215–227.

Name: Preschool Language Scales (PLS) -3, -4, -5, -Spanish
Focus: Receptive and expressive language from pre-verbal to early literacy
Additional details: Standardized, normed measure, for ages birth–7:11
Available languages: English, Spanish
Citations:
  Zimmerman, I. L., Steiner, V. G., & Pond, R. E. (2011). Preschool language scales (5th ed.). San Antonio, TX: Pearson.
  Ohashi, J. K., Mirenda, P., Marinova-Todd, S., Hambly, C., Fombonne, E., … the Pathways in ASD Study Team (2012). Comparing early language development in monolingual- and bilingual-exposed young children with autism spectrum disorder. Research in Autism Spectrum Disorders, 6(2), 890–897.

Name: Receptive and Expressive One-Word Picture Vocabulary Test (ROW/ROWPVT & EOW/EOWPVT); Spanish-Bilingual Edition (EOWPVT-3: SBE)
Focus: Vocabulary
Additional details: Standardized, normed measure, for ages 4–70+ years
Available languages: English, Spanish, bilingual edition
Citations:
  Brownell, R. (Ed.). (2000). Expressive one-word picture vocabulary test: Manual. Academic Therapy Publications.
  Grasso, S. M., Peña, E. D., Bedore, L. M., Hixon, J. G., & Griffin, Z. M. (2018). Cross-linguistic cognate production in Spanish-English bilingual children with and without specific language impairment. Journal of Speech, Language, and Hearing Research, 61(3), 619–633.

Name: Russian Language Proficiency Test for Multilingual Children
Focus: Production and receptive language
Additional details: Standardized and preliminarily normed measure, for ages 3–6:11
Available languages: Russian, preliminary bilingual norms for Russian-Hebrew bilinguals
Citations:
  Gagarina, N., Klassert, A., & Topaj, N. (2010). Russian language proficiency test for multilingual children. ZAS Papers in Linguistics, 54.
  Fichman, S., & Altman, C. (2019). Referential cohesion in the narratives of bilingual and monolingual children with typically developing language and with specific language impairment. Journal of Speech, Language, and Hearing Research, 62(1), 123–142.

Name: Schlichting test voor taalproductie [Schlichting test for language production]; -2
Focus: Productive semantics, syntax, and pragmatics
Additional details: Standardized, normed measure, for ages 1:2–6:3, version 2 for ages 3:9–7:0
Available languages: Dutch
Citations:
  Schlichting, J., van Eldik, M., lutje Spelberg, H., van der Meulen, S., & van der Meulen, B. (2003). Schlichting test voor taalproductie [Schlichting test for language production]. Lisse, The Netherlands: Swets & Zeitlinger.

Name: Taaltoets Alle Kinderen [The language proficiency test for all children] (TAK-R)*
Focus: Receptive and productive language, semantics, morphosyntax
Additional details: Standardized, normed measure
Available languages: Dutch
Citations:
  Verhoeven, L., & Vermeer, A. (2001). Taaltoets alle kinderen [Dutch language test for children]. Arnhem: The Netherlands Cito Group.
  Verhoeven, L., Steenge, J., & van Balkom, H. (2012). Linguistic transfer in bilingual children with specific language impairment. International Journal of Language & Communication Disorders, 42(2), 176–183.

Note: In the citation column, the first citation listed is the task itself and the second citation is an example article that uses the tool.

Comparison Group

As illustrated in the research question section, the appropriate comparison group is dictated by the question. At a minimum, the comparison group should be of a similar age, socioeconomic status, gender, and geographic region to the group of interest. In studies of monolingual children with DLD, researchers often utilize another type of comparison, namely language-matched peers, to examine attainment in one aspect of language relative to another. For instance, English-speaking children with DLD are repeatedly found to score significantly worse on grammatical morphology than younger peers matched on mean length of utterance, hence the conclusion that extraordinary difficulty with grammatical morphology is a core characteristic of English DLD (Leonard, 2014). Language matching is unattested in Bi-DLD for obvious reasons: Most bilinguals do not have balanced skills in both languages. Language matching could result in large differences in chronological age between the L1 and L2 language-matched peers, making the comparisons unfair and invalid for this population.
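The minimum matching requirement described above (similar age, socioeconomic status, gender, and geographic region) can be made concrete with a small pairing routine. This is a sketch of one possible approach, greedy one-to-one matching, and not a procedure taken from the studies cited here. The function name, the dictionary keys, and the three-month age tolerance are all hypothetical.

```python
def match_controls(targets, controls, max_age_gap=3):
    """Greedy one-to-one matching.

    A control qualifies if it shares the target child's SES band, gender, and
    region and is within max_age_gap months of age; the closest in age wins.
    """
    available = list(controls)
    pairs = []
    for child in targets:
        candidates = [
            c for c in available
            if c["ses"] == child["ses"]
            and c["gender"] == child["gender"]
            and c["region"] == child["region"]
            and abs(c["age_months"] - child["age_months"]) <= max_age_gap
        ]
        if not candidates:
            continue  # no acceptable match; this child stays unmatched
        best = min(candidates, key=lambda c: abs(c["age_months"] - child["age_months"]))
        available.remove(best)  # each control is used at most once
        pairs.append((child["id"], best["id"]))
    return pairs


# Hypothetical records for illustration only.
targets = [
    {"id": "T1", "age_months": 62, "ses": "low", "gender": "F", "region": "north"},
    {"id": "T2", "age_months": 70, "ses": "mid", "gender": "M", "region": "north"},
]
controls = [
    {"id": "C1", "age_months": 63, "ses": "low", "gender": "F", "region": "north"},
    {"id": "C2", "age_months": 75, "ses": "mid", "gender": "M", "region": "north"},
    {"id": "C3", "age_months": 60, "ses": "low", "gender": "F", "region": "north"},
    {"id": "C4", "age_months": 71, "ses": "mid", "gender": "M", "region": "north"},
]
matched = match_controls(targets, controls)
```

Greedy matching is simple to audit but depends on the order in which target children are processed, so researchers may prefer to report the matching rule alongside the resulting group characteristics.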

Heterogeneity

Much of child language research emphasizes the need for homogeneous groups of participants for the purpose of experimental control. When homogeneity proves difficult to attain in special populations, researchers turn to grouping techniques (e.g., grouping by disorder subtype or severity) or statistical techniques to analyze the effect of individual variation or factor out undesirable differences. Anyone who has conducted research on either bilinguals or individuals with a DD already knows that participants are in short supply. When the target population has to meet both criteria, the number of eligible participants shrinks sharply. Further complicating the matter, both bilinguals and individuals with a DD are known for their heterogeneity. When striving for homogeneous participant pools, Bi-DD researchers may control for participants’ language type and exposure level and limit participants’ age range. However, these constraints further limit participant availability. Depending on the research question, more inclusive approaches to participant selection can be used to expand the participant pool without jeopardizing study validity. One approach is to broaden the language requirement by accepting participants exposed to any pairing or grouping of languages into the “bilingual” group of a study. This should be done when differences between languages or language pairs are irrelevant to the goals of the study or when researchers want language-specific differences to average out, allowing results to generalize across multiple language populations. Questions of this nature often focus on the general cognitive effects of bilingual exposure or examine whether assessing only one language (i.e., the majority language) or assessing nonlinguistic cognitive skills can adequately separate individuals with DLD from TD individuals.

When a research question requires specific language pairs, the amount of language exposure per participant is another variable that can be expanded. Including participants with a wide range of bilingual exposure is well suited for answering questions regarding the effect of exposure on attainment. Examples of this type of question can be found in Bohman et al.’s (2010) large-scale investigation of the language input effect on TD Spanish-English bilingual children’s language performance and in Gonzalez-Barrero and Nadig’s (2018) study on the effect of current language exposure on vocabulary and morphological skills in bilingual school-age children with ASD.
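When exposure is allowed to vary widely, it can be treated as a continuous predictor rather than a grouping variable. The sketch below fits an ordinary least-squares line and a Pearson correlation relating percent exposure to a language score. It is a deliberately simple illustration, not the analysis used in the studies cited above. The function name and the data are hypothetical.

```python
from math import sqrt


def exposure_effect(exposure, scores):
    """Return (slope, intercept, Pearson r) for scores regressed on exposure."""
    n = len(exposure)
    mean_x = sum(exposure) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in exposure)
    syy = sum((y - mean_y) ** 2 for y in scores)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(exposure, scores))
    slope = sxy / sxx                     # score change per point of exposure
    intercept = mean_y - slope * mean_x   # predicted score at zero exposure
    r = sxy / sqrt(sxx * syy)             # strength of the linear relation
    return slope, intercept, r


# Invented data: percent exposure to the target language and a vocabulary score.
slope, intercept, r = exposure_effect([10, 30, 50, 70, 90], [45, 55, 65, 75, 85])
```

In this invented, perfectly linear data set, each additional percentage point of exposure corresponds to half a point of vocabulary score. Real samples would show scatter around the line, and a full analysis would typically add covariates such as age.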

Conclusion

Studying bilingual children with a developmental disability affords many opportunities for high-stakes research questions. We have tried to illustrate some of the research questions uniquely motivated by this population. At the same time, this line of work poses many challenges because of the complexity and heterogeneity of the population, and we have described some of the innovative solutions to overcome these challenges. As this field of study advances, the research questions will become more nuanced and sophisticated, and so must our research methods. Equally important, future studies need to meet the highest methodological standards to translate research evidence into practice.

Key Terms

Autism spectrum disorder is a neurodevelopmental disorder manifested on a spectrum of severity in the areas of social interaction, communication, restricted and repetitive behaviors, and sensory interests or responses.

Basic science addresses questions about the core of how and why things work the way they do; its findings often require translation in order to be applicable.

Clinical science tests the efficacy, benefits, and accuracy of treatments, medication, and diagnostic techniques.

Developmental language disorder is a disorder that negatively affects a person’s ability to acquire their native (and subsequent) language(s) in the absence of sensory, neurological, intellectual, and social-emotional impairment.

Dynamic assessment is a flexible method of evaluating a child’s capacity for learning through skills such as attention, memory, and cognitive flexibility. Dynamic assessment procedures include testing, teaching, and retesting phases, which are analyzed by establishing how much a child has improved, how much support and modification the child needs, or some combination thereof. Dynamic assessment is believed to help separate children whose language lags behind peers because of general learning skills from those who lag behind because of lower exposure.

Index test is the test whose scoring or diagnostic accuracy is being examined.

Negative likelihood ratio is the probability of a negative test result in individuals who have a given diagnosis divided by the probability of a negative result in individuals who do not, that is, (1 − sensitivity) / specificity. It indicates how much a negative result lowers the odds that an individual has the diagnosis.

Positive likelihood ratio is the probability of a positive test result in individuals who have a given diagnosis divided by the probability of a positive result in individuals who do not, that is, sensitivity / (1 − specificity). It indicates how much a positive result raises the odds that an individual has the diagnosis.

Reference standard refers to the accepted clinical diagnosis. The index test is evaluated against it, and if the index test is accurate, the two classifications align.

Screener is a brief measure of language ability used to detect individuals who may be at risk of having a language disorder. Individuals who fail a screening do not necessarily have a disorder but should undergo comprehensive testing or close monitoring.

Sensitivity refers to a test’s ability to correctly identify individuals who have a given diagnosis, calculated as the number of true positives divided by the sum of true positives and false negatives.

Specificity refers to a test’s ability to correctly identify individuals who do not have a given diagnosis, calculated as the number of true negatives divided by the sum of true negatives and false positives.
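A likelihood ratio becomes clinically interpretable once it is combined with the odds of the diagnosis before testing. The following worked equations sketch that step. The symbols TP, FN, TN, and FP denote true positives, false negatives, true negatives, and false positives, and the numerical values are invented for illustration.

```latex
\begin{align*}
LR^{+} &= \frac{\text{sensitivity}}{1 - \text{specificity}}
        = \frac{TP/(TP+FN)}{FP/(FP+TN)}, \\[4pt]
\text{post-test odds} &= \text{pre-test odds} \times LR^{+}. \\[4pt]
\intertext{Hypothetical example: sensitivity $= .90$, specificity $= .80$, pre-test probability $= .10$.}
LR^{+} &= \frac{.90}{1 - .80} = 4.5, \qquad
\text{pre-test odds} = \frac{.10}{.90} = \frac{1}{9}, \\[4pt]
\text{post-test odds} &= \frac{1}{9} \times 4.5 = 0.5, \qquad
\text{post-test probability} = \frac{0.5}{1 + 0.5} = \frac{1}{3}.
\end{align*}
```

In this invented case, a positive result on the index test raises the probability that the child has the diagnosis from 10% to about 33%, which is why a positive result on a measure with a modest likelihood ratio still calls for comprehensive follow-up testing.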

Further Readings

Abbeduto, L., Kover, S. T., & McDuffie, A. (2012). Studying the language development of children with intellectual disabilities. In E. Hoff (Ed.), Research methods in child language: A practical guide (pp. 330–346). Hoboken: Blackwell Publishing Ltd. https://doi.org/10.1002/9781444344035.ch22
This chapter describes challenges in assessing language in individuals with intellectual disabilities and some of the methods that can be used to deal with these challenges.

Ebert, K. D., & Kohnert, K. (2016). Language learning impairment in sequential bilingual children. Language Teaching, 49(3), 301–338. https://doi.org/10.1017/s0261444816000070
This review focuses on the evidence regarding theoretical and pedagogical issues for children who are sequential bilinguals and have been diagnosed with language impairments.

Kay-Raining Bird, E., Genesee, F., & Verhoeven, L. (2016). Bilingualism in children with developmental disorders: A narrative review. Journal of Communication Disorders, 63, 1–14. https://doi.org/10.1016/j.jcomdis.2016.07.003
This article reviews the published evidence regarding developmental differences between simultaneous and sequential bilinguals with a DD, and how language intervention influences bilingual children with a DD.

McGregor, K. K. (2012). Studying children with language impairment. In E. Hoff (Ed.), Research methods in child language: A practical guide (pp. 317–329). Hoboken: Blackwell Publishing Ltd.
This chapter describes methods of studying children with LI, including the selection of participants, comparison groups, and tasks. In addition, it provides guidance on how to make such research high quality and translational to serve evidence-based intervention practices.

Discussion Questions

1. Consider how the language evaluation of a child with a DD should be altered when that child is also bilingual. How should those alterations change depending on the specific DD diagnosis?
2. What types of measures are available for Bi-DD children ages 4–12 and what is lacking? How does this affect their chances of an accurate diagnosis?
3. Provide some examples of how various DDs can affect bilingual language acquisition. Are there differences between the impact of DDs on bilingual versus monolingual language acquisition?

Note

1. Because this literature typically labels participants as “bilingual,” here we use the term “bilingual” interchangeably with “L2 children.”


11 CONSIDERATIONS FOR RESEARCH METHODS TO STUDY CHILD SECOND LANGUAGE DEVELOPMENT

Yuko Goto Butler

DOI: 10.4324/9780367815783-11

Introduction

Children’s second or foreign language (L2/FL) development is extremely challenging to pin down empirically. This is in large part because research methods must be adjusted to account for the fact that children, far more than adults, are in the midst of developing in a number of domains. These developmental domains include cognitive and metacognitive factors (e.g., memory, attention, processing speed, and metacognition), affective factors (e.g., anxiety and engagement), and linguistic and cultural factors (e.g., first language [L1] development, the choice of L1 or L2, and hidden cultural assumptions). Age is also often correlated with experiential factors (e.g., experience with participating in research and verbalizing thoughts). Researchers must take these factors into consideration when designing research methods for children (Philp et al., 2008).

This chapter lays out how these developmental factors influence the various methods for studying child second language development discussed in this volume. Because research methods are inseparable from the theories underpinning them, I first consider how the two major theories of child development, Piaget’s theory of cognitive development and Vygotsky’s sociocultural theory, shape the design and execution of child second language studies. I then provide a synthesis of the different research methods covered in the preceding chapters and discuss major age-related considerations for those approaches.

In the second half of the chapter, I address two issues concerning research methods that merit more scholarly attention in the field of child second language development. The first issue concerns how to adapt research methods to reflect new understandings of children’s development as dynamic and fluid. Asserting simple cause-and-effect relationships among variables may no longer be sufficient and might even be misleading given the variable nature of children’s performance (Butler, 2017). The second issue concerns developing child-centered approaches to our research that reflect the growing interest in research with children as opposed to research on children in child development studies. Such approaches must go beyond simply obtaining children’s assent and require rethinking established research approaches in the field (e.g., Christensen & James, 2017).

Theories and Research Methods

Research methods are grounded in epistemological and theoretical traditions. The theory to which researchers subscribe influences how they formulate research questions, which research methods they use, how they design the study, and how they interpret the results.

Child L2/FL research has been influenced greatly by theories and methods that have been used in both child language acquisition and adult L2 acquisition research. Adopting nativist accounts in theoretical linguistics (Chomsky, 1965), early modern inquiries into L2 acquisition sought the universal and invariable path of grammar construction in learners’ language (e.g., Dulay & Burt, 1973). Since then, the field has broadened its scope significantly and has yielded a number of theories, including skill acquisition theory, input processing theory, processability theory, usage-based theory, sociocultural theory, dynamic complexity theory, and many others (VanPatten & Williams, 2015). These theories often borrowed ideas from other disciplines, applying them to understand the mechanisms of language development (Ellis, 2020).

When it comes to child L2/FL development, as with first language acquisition research, children’s linguistic development is intertwined with their cognitive and social-affective development. Consequently, the two major theories in child development research have had an enormous influence on understanding child L2/FL development: Jean Piaget’s (1896–1980) theory of cognitive development and Lev S. Vygotsky’s (1896–1934) sociocultural theory (Pinter, 2011). The theories of Piaget and his successors have had varying degrees of impact on researchers interested in cognitive-interactionist bases of child language development, while Vygotsky’s theory has constituted a branch of research that investigates sociocultural bases of language development.
Piaget’s (1952) theory of cognitive development primarily concerns the internal mental processes of individual children based on the premise that they are adaptive to the external environment. Piaget believed that children are active thinkers who seek and adapt information about the world around them. Piaget argued that children go through stages, each of which is characterized by distinct and qualitatively different thinking, and that this process is universal and invariant. In other words, he thought that all children go through the same stages in the same order without skipping any. According to Piaget, during the preschool and early primary school years (ages 2–7, the preoperational period), children start constructing action-based concepts while using symbols such as words and engaging in symbolic play. Children at this stage have a hard time thinking logically and take egocentric views in their thinking. During middle childhood (ages 7–11, the concrete operational stage), children begin to develop logical thinking, although their thinking is still largely concrete. They become less egocentric (i.e., they can take others’ perspectives) and can pay attention to multiple aspects of a problem or task. After children reach the formal operational period (ages 11–15), their mental operations go beyond concrete thinking and increasingly show more sophisticated logical and abstract thinking (Miller, 2014).

Piaget’s theory has had a tremendous influence on subsequent research on child cognitive development. In this volume, for example, Hirosawa and Oga-Baldwin (survey methods, Chapter 3) suggest that the item construction of surveys should consider the developmental stages that Piaget laid out. (For details, refer to Chapter 3.) Piaget relied on observations and interviews as research methods. This approach has long influenced subsequent research methods in child cognitive development as well as first language development. However, his heavy reliance on children’s verbal responses has been criticized. Indeed, Butler (verbal reports, Chapter 5 in this volume) reminds us of the frequent discrepancies between what children say and do. Sayer and Ataei (Chapter 2 on observation and ethnographic methods) and Pinter (Chapter 4 on interviews) also address the importance of incorporating other data sources (e.g., photos, drawings) in addition to children’s verbal data, or connecting their verbal data with their actions in order to improve the validity of the data.

Vygotsky (1986) criticized Piaget’s theory for relying too heavily on maturational factors and argued that it underestimated the role of culture in children’s development. Instead, he argued that culture should play a central role in sociocultural theory (Gajdamaschko, 2015).
Culture creates symbolic artifacts such as music, the arts, physical tools, and, most importantly, language. Children use these cultural artifacts to mediate and regulate their relationship with the world. By internalizing (i.e., acquiring) symbolic artifacts while interacting with others (e.g., parents, teachers, and capable peers), children can develop higher psychological functions, including memory and attention. For Vygotsky, thinking and speaking are highly interdependent and constantly changing (Lantolf, 2000). While Piaget considered children’s private speech (i.e., speech not directed to others) to be a reflection of their egocentric thought, Vygotsky viewed children’s private speech as a cultural tool to self-regulate their behaviors and facilitate their verbal thinking. Accordingly, researchers’ interpretation of children’s verbal data will differ depending on the theory to which they subscribe.

Vygotsky and his followers proposed a model of the periods of child development. At each period, certain culturally specific activities are central for driving children’s emotional and cognitive development. Moreover, some periods are more emotional-interpersonal in focus while others are more cognitive. Importantly, in sociocultural theory, such periods are developmentally determined rather than determined by age; some children may stay longer at a certain period but make a quick transition to another period depending on their social and cultural contexts of development (Kozulin, 2015). Therefore, researchers want to design tasks that correspond well with cultural activities for their child participants.

Adapting Research Methods to Account for Developmental Factors

As the chapters in this book make apparent, every methodological approach requires specific considerations and adjustments when applied to young learners. Table 11.1 lists major factors for consideration in child SLA research and indicates the extent to which each factor may potentially influence data collection and interpretation for the methods discussed in this book. In reality, the influence of these factors depends greatly on the specific context/condition in which the given method is introduced; it is not as straightforward as this table suggests. However, the table provides a general picture of which factors researchers need to pay attention to when implementing the given method in their research concerning young learners. In the following sections, I briefly discuss the major categories of factors.

Cognitive and Metacognitive Factors

During the preschool and primary school years, children undergo drastic changes in memory capacity, attention span, information-processing speed, and metacognitive ability. These are all important factors for researchers to consider when designing studies for children. Indeed, many chapters in this volume offer recommendations for reducing the cognitive burden of various tasks. Such recommendations include using shorter and fewer items while focusing on the here-and-now in surveys (Hirosawa & Oga-Baldwin, Chapter 3); using videos and drawings to assist children’s memory in stimulated recall (Butler, Chapter 5); and selecting picture books that are not too “busy” for picture-matching tasks and reducing the complexity and length of sentences in grammaticality judgment tasks (Montrul et al., Chapter 7). Moreover, to keep children’s attention during tasks, the authors of the chapters on eye tracking (Dussias & Miller, Chapter 8), brain imaging (Nickerson & Kovelman, Chapter 9), and children with special needs (Sheng & Hollenbach, Chapter 10) all recommend having fewer stimuli, using attractive objects, and shortening the duration of the tasks.

Memory comprises different domain-specific types that follow different developmental patterns. For example, implicit memory (i.e., memory without awareness) is present from birth and shows little change over time. In contrast, explicit memory (i.e., conscious memory such as episodic and event memory) improves during the ages of 6 to 12. Children can remember specific events for a longer time from a young age (younger than 3), but their age affects the amount and quality of the events that they can remember (Schneider, 2010). This has

TABLE 11.1 Factors influencing different research methods for young learners

Factor | Observation/ethnography | Survey | Interviews | Speech elicitation: Standardized tasks | Speech elicitation: Naturalistic sampling | Speech elicitation: Structured sampling | Verbal reports: Think-aloud | Verbal reports: Stimulated recall
Memory | . | + | + | ++ | + | ++ | + | ++
Attention | . | + | + | ++ | + | ++ | ++ | ++
Processing speed/efficiency | . | + | . | ++ | + | ++ | ++ | .
Metacognition/meta-awareness | . | + | + | ++ | . | ++ | ++ | ++
Anxiety | . | +/++ | + | ++ | . | ++ | ++ | ++
Engagement | . | ++ | + | ++ | . | ++ | ++ | ++
L1 development (understanding items and procedures, verbalizing own thoughts)(a) | . | ++ | ++ | +/++ | . | + | ++ | ++
Choice of L1/L2 | . | ++ | ++ | . | . | . | ++ | ++
Cultural assumptions | ++ | + | + | ++ | - | +/++ | . | .
Reactivity | +/++ | +/++ | +/++ | . | +/++ | . | ++ | .
Familiarity(b) (trainings are most likely required) | . | + | + | +/++ | . | ++ | ++ | ++

TABLE 11.1 (continued)

Factor | Receptive methods: Sentence-picture matching task | Receptive methods: Grammaticality judgment task | Eye tracking | Brain imaging | For children with special needs(c): Narrative sampling
Memory | +/++ | ++ | +/++ | +/++ | +
Attention | ++ | ++ | ++ | ++ | +
Processing speed/efficiency | ++ | ++ | ++ | +/++ | +
Metacognition/meta-awareness | +/++ | +/++ | . | . | +
Anxiety | ++ | ++ | ++ | ++ | ++
Engagement | ++ | ++ | ++ | ++ | ++
L1 development (understanding items and procedures, verbalizing own thoughts)(a) | + | +/++ | ++ | +/++ | +
Choice of L1/L2 | . | . | . | . | .
Cultural assumptions | ++ | + | +/++ | +/++ | .
Reactivity | . | . | +/++ | +/++ | .
Familiarity(b) (trainings are most likely required) | +/++ | ++ | ++ | ++ | +

Key:
++ Greatly influential or almost always influential
+ Somewhat influential or potentially influential
+/++ Depending on specific tasks used in the given method
. Minimal influence in most cases

Notes:
(a) This may be carried out through children’s L1 or other nontarget languages among multilingual children.
(b) Familiarity with tasks and procedures so that children can maximize their performance.
(c) Chapter 10 (“Research methods for L2 children with special needs”) is different from the other chapters in that it does not focus on a particular research method but discusses a number of different methods. Narrative sampling, a type of elicitation method, is listed here as one of the popular research methods used for children with special needs.

useful implications for interviews and other methods that elicit children’s memory and thinking through verbalization.

In experimental research designs, children’s short-term memory or working memory is a concern when designing and implementing tasks and interpreting the results. Short-term memory is generally considered to be the capacity to hold a limited amount of information temporarily. Working memory, although sometimes used almost as a synonym for short-term memory, is often thought of as the capacity to both hold and process information during a given task, such as reading.¹ There is convincing evidence showing that short-term memory or working memory increases steadily over time during childhood (until around the age of 18) (Schneider, 2010). This increase is primarily due to the increase in information-processing speed, which shows a drastic improvement during early childhood. Information-processing speed itself is influenced not only by maturational factors but also by experiential factors, such as greater use of strategies and increasing familiarity with the material to be remembered (Schneider, 2010). Working memory also involves attention. Attention is composed of multiple aspects including sustained attention, focused attention, and shifting of attention, all of which generally show rapid changes from the ages of 8 to 10, after which changes slow down (Klenberg et al., 2001). Depending on children’s age, therefore, researchers must carefully attend to the kinds and amount of memory load required to complete a given task.

Metacognition, traditionally defined as “cognition that reflects on, monitors, or regulates first-order cognition,” has been studied extensively, in part to better understand “how and why cognitive development both occurs and fails to occur” (Kuhn, 2000, p. 178). Researchers used to believe that metacognitive abilities did not emerge until children reached 8–10 years old. More recently, however, due to the reconceptualization of metacognition (from focusing on metamemory to covering much broader contexts) and methodological refinements (introducing more age-appropriate tasks and incorporating observations instead of heavily relying on children’s verbal responses), metacognitive abilities are now thought to emerge earlier, by the age of 4 if not before (Whitebread et al., 2010). Considering children’s metacognitive abilities is important when designing research because metacognition certainly impacts children’s task performance. Moreover, the impact of metacognition also depends on children’s affective responses to the given task, such as their beliefs about the value of the task and their feelings about its difficulty (Whitebread et al., 2010).

Affective Factors

Given that cognition and affects are not separable (Butler, 2019; Calkins & Bell, 2010), sufficient consideration should be given to research participants’ affective factors. This is particularly critical when the target participants are children because they are vulnerable to negative emotional and/or unsuccessful experiences, more vulnerable than was previously thought (e.g., Carless & Lam, 2014). In contrast, positive affects increase children’s performance in various tasks and vice versa (e.g., Butler, 2017; Rader & Hughes, 2005).

Several authors in this volume address the importance of creating comfortable environments for child participants. Pinter (interview, Chapter 4) suggests that interviews should be embedded in children’s familiar activities as opposed to treated as an independent special activity. As for tasks and activities employed for research purposes, children should, ideally, already be familiar with them. When novel tasks and activities are introduced, it is highly advisable to provide sufficient warm-up time (Huang & Ramírez, speech elicitation methods, Chapter 6) and/or pre-task trials (Montrul et al., receptive methods, Chapter 7) to reduce children’s anxiety as well as to make sure that they understand the task procedure. Providing a physically pleasant space, such as comfortable seating (Dussias & Miller, Chapter 8, eye tracking), or choosing child-friendly neuroimaging tools (Nickerson & Kovelman, Chapter 9, brain imaging) is also critical in experimental studies.

Ideally, whoever administers the interviews, tasks, and experiments should be someone the participating children know, such as their teacher. If the children do not already know the individual, researchers may want to allow time for the children to become acquainted with them (Sayer & Ataei, ethnographic methods, Chapter 2) or have somebody already familiar to them (e.g., parents and teachers) assist the children so that they are more at ease during the task procedure (Nickerson & Kovelman, Chapter 9, brain imaging). In some research contexts, pair or group administration may be preferable to individual administration, as Pinter (interview, Chapter 4) and Butler (verbal reports, Chapter 5) suggest.
Indeed, pair or group administration may reduce children’s anxiety while also avoiding child–adult power imbalances (addressed below). Pair or group administration of interviews and tasks may allow researchers to elicit authentic information (e.g., information shared among children that would not be revealed in child–adult interactions) (Butler, Chapter 5). Care needs to be taken, however, when pairing and grouping children and designing tasks and activities. Depending on social dynamics among children, certain emotions (e.g., peer pressure and overexcitement) may lead to undesirable outcomes. Lastly, researchers have put a lot of thought into maintaining children’s interest and engagement in tasks. Materials should be age appropriate and attractive to children (Dussias & Miller, eye tracking, Chapter 8). For brain-imaging methods, Nickerson and Kovelman (Chapter 9) discuss using cartoons to keep children motivated while the researchers set up the equipment.

Linguistic and Cultural Factors

It is critical to make sure that children understand task instructions and interview/survey questions. The wording should be accessible to the children based on where they are in their L1 and L2 development. For example, drawing from Dillman et al. (2014), Hirosawa and Oga-Baldwin (surveys, Chapter 3) list concrete tips for constructing survey items that are easier for children to comprehend. Their suggestions include using simpler and familiar words, constructing complete sentences with fewer words and simpler structures, and avoiding double negatives. These suggestions are likely generally applicable to other methods as well. Researchers might also find it necessary to provide visual aids (e.g., pictures) along with verbal instruction.

Depending on children’s L1 and L2 proficiencies, the choice of L1 or L2 as a medium of data collection can greatly influence both the quality and quantity of the verbal data that researchers elicit. Researchers who contributed chapters on survey methods (Hirosawa & Oga-Baldwin, Chapter 3), interview methods (Pinter, Chapter 4), and verbal reports (Butler, Chapter 5) all suggest conducting studies in the children’s L1 or the language that they are most comfortable using. The choice of language also potentially influences children’s affective states. Butler (verbal reports, Chapter 5), however, cautions that, in stimulated recall, asking children to verbalize their thoughts in their L1 while working on L2 tasks may create a complicated interplay between L1 and L2, which, in turn, may create reactivity, a phenomenon in which participants’ engagement in research alters the nature of their behaviors and performance. Reactivity can be induced not only by the interaction between L1 and L2 but also by other factors such as the presence of researchers, the way that survey/interview questions are formed, and children’s experience with research, as discussed more below.

Just as cognition and affects are inseparable, language and culture are inseparable as well. Research on language development is embedded in specific cultural contexts.
Researchers often have to deal with various cultural assumptions (e.g., certain types of research activities are more acceptable in some cultures than in others, as discussed by Dussias and Miller in Chapter 8 on eye tracking). Tasks and activities introduced in research must be culturally appropriate. The interpretation of the data must be situated in a given cultural context as well. However, how the role of culture is conceptualized depends on the theories, research questions, and methods used. Cultural variation has often been treated as something that should be controlled in cognitive-based experimental studies. For example, Montrul et al. (Chapter 7 on receptive methods) state that pictures used in picture-matching tasks should be culturally appropriate. In socially oriented research, by contrast, culture is part of the research validity (e.g., culture itself is a target of research in understanding children’s language development), as in Sayer and Ataei (Chapter 2 on observation and ethnographic methods).

Experiential Factors

As children grow older, they have more accumulated experiences; their age and experiences are usually correlated. Children’s experiences also influence research


Considerations for Child SLA Research 195

designs and data collection and interpretation. Children’s experience of participating in research activities, or their familiarity with research tasks, can greatly influence their performance. For example, children who are used to being videotaped may have minimal or no reactivity; in other words, videotaping may not alter their behavior or performance. Children who are used to verbalizing their thoughts may also have minimal reactivity in think-aloud tasks (Butler, verbal reports, Chapter 5). Since reactivity is a potential threat to research validity, researchers should be mindful of its possibility. Conversely, if researchers ask children to do novel or unfamiliar tasks and activities, training or trials are often necessary. If any training or trials are offered, however, researchers have to make sure that the training and trials themselves do not influence subsequent performance.

Directions for the Future

Because the body of research on child L2 development is far less extensive than the body of research on adult L2 development (Oliver & Azkarai, 2017; Philp et al., 2008), research methods developed largely for adult learners are frequently applied to children. When this happens, a number of age-related considerations are necessary. The chapters in this book identify several such considerations and adaptations; however, to advance our understanding of child L2 development, even further refinement and adjustment are needed. In the rest of this chapter, I address two major methodological issues that warrant more attention, namely, how to cope with (a) dynamic and fluid conceptualizations of development and (b) power imbalances between children and adults.

How to Capture Dynamic and Fluid Conceptualization of Development

In recent years, researchers of developmental studies have increasingly taken a view that various developmental factors (cognitive, metacognitive, social, affective, and linguistic) are integrated or interrelated for understanding children’s development (Butler, 2017, 2019). For example, Ellis (2019) characterized cognition as “embodied, environmentally embedded, autopoietically enacted, and socially encultured and distributed” (p. 39). Calkins and Bell (2010), representing the view that cognition and affects are “inseparable,” stated that “cognitive processes of thinking, learning, and action can be viewed as regulators of a child’s emotion behaviors. Likewise, emotions can be understood as organizers of behaviors, essentially modifying a child’s thinking, learning and action” (p. 4). Affects, in turn, are increasingly accepted as social and interpersonal phenomena as opposed to the property of individuals (Oatley et al., 2011). Increasingly, researchers in language development (both L1 and L2 development) also view such developmental factors as interrelated, as one can see in major theories in language development.

196 Yuko Goto Butler

For example, in “interactionist” or “emergentist” theoretical approaches, language development is seen as a process of interplay between human internal elements and external environmental elements. Similarly, in Vygotsky’s sociocultural theory, as noted above, development is viewed as occurring through meaning-making processes in social interaction with capable others. Usage-based approaches to language development do not accept any prewired language faculty and argue instead that linguistic structures emerge by using language in context (Tomasello, 2003). And according to complex dynamic systems theory, all variables are interconnected; therefore, any change in a variable will impact all the other variables connected as part of the given system (de Bot et al., 2007). Given the interrelated and integrated nature of multiple variables, it is no surprise that there is substantial variability in human development, including language development. Rather than treating variability as noise, some researchers see it as a window for better understanding the complex mechanism of development. Variabilities in performance are observed not only among children who belong to the same age group (i.e., interindividual variability) but also within a single child over time, across tasks at a single point in time, or even within a single task (i.e., intraindividual variability) (Alibali & Sidney, 2015). Indeed, intraindividual variability across time is a central concern in research on child development (although, when it comes to L2 development research, more longitudinal studies, rather than cross-sectional studies, are needed). Variability in performance across tasks or even in a single task at a single point in development makes researchers aware of the importance of employing multiple tasks or multiple trials among children in order to improve research validity.
However, if one adopts a dynamic conceptualization of language development, such as complex dynamic systems theory, variability within a single task or across multiple tasks, which is a state of instability, can be regarded as a moment of change from one knowledge system to another—namely, a moment of learning (Lewis, 2000). As researchers increasingly subscribe to such interrelated, dynamic, and fluid conceptualizations of language development, no single method can uncover the mechanism of language development. Integrating various methods appears to be indispensable. Indeed, a number of authors in this volume addressed the importance of using multiple measures in order to increase the research validity (e.g., Butler in Chapter 5; Huang & Ramirez in Chapter 6; Montrul et al. in Chapter 7; Sheng & Hellenbach in Chapter 10). Recent advances in digital technology also make it easier for researchers to combine nonverbal measures with more traditional, verbal-based measures, as exemplified in eye-tracking and brain-imaging methods. Recent developments in digital technology also give researchers options for taking bottom-up approaches (e.g., analyzing big data) as well as top-down, theory-driven approaches (Butler, 2019). Roy (2009), for example, analyzed a large data set of video-recordings by employing pattern recognition algorithms and data visualization techniques in order to understand infants’ vocabulary


learning mechanism. Such hybrid approaches should be encouraged not only for triangulating data but also for understanding the complex interplay of language and other factors in development. As Ellis (2019) suggested, research on language development now “calls for greater transdisciplinarity, diversity, and collaboration” (pp. 39–40).

How to Take Child-Centered Approaches in Research

Another challenge is how to take more child-centered approaches to the study of child L2 development. As mentioned already, L2 development research has largely been conducted among adult learners. Most child L2 research has been derived from the studies and methodologies used in adult L2 development research (Oliver & Azkarai, 2017). Various modifications and adjustments have been made to make methods more appropriate for child participants. However, we may need more fundamentally unique approaches to child L2 studies, not simply modifications and adjustments of adult-based models. In recent years, child development researchers have advocated for research with children as opposed to research on children (Christensen & James, 2017). Research on children treats children as an object of research (e.g., give children some measures and examine their reactions) or as a subject of research (e.g., observe children’s behaviors). In these approaches, even though age-appropriate materials and tasks are given to children, children are still under adult control with little agency. After all, “age-appropriateness” is determined by adults in such research. Even if children’s perspectives are recognized, such as in observational studies, they are still positioned as subjects of study, and interpretations are made from the adults’ point of view. In research on children, a power imbalance between adults and children is inevitable. A number of scholars have discussed how to conduct research with children, that is, research that can respect and value children’s rights, agency, and autonomy (e.g., Christensen & James, 2017; Kellett, 2010). In practice, depending on the research questions and children’s characteristics (e.g., age and experience with research), research with children will need to take different forms.
Children’s level of engagement might vary depending on the research phase, from planning the study to interpreting and reporting the results. A minimal level of engagement could entail consulting children on the “age-appropriateness” of the materials and tasks. At a far greater level of engagement, children could be co-researchers or even major players in their own “action research.” Importantly, however, it is not the research method per se that makes the research child-centered but rather the extent to which researchers are aware of the inherent power imbalance between adults and children in child development research and the extent to which researchers can critically reflect and take steps to minimize the gaps in perspectives between adults and children (Alderson, 2005). Ultimately, the goal of research with children is to assist children in becoming autonomous learners by creating opportunities for


them to meaningfully participate in research. In this process, adult researchers need to provide varying types and degrees of assistance to children; however, developing such approaches has been a serious research challenge. In child L2 development studies, research on children remains dominant, and the notion of research with children is not yet sufficiently recognized as a valid approach (Pinter, 2014). This may, in part, be due to the theoretical and logistical complexities of inviting children to research as autonomous actors. Research with children requires rethinking established research methods and practices in L2 development. It would be very fruitful to have more discussions among child L2/FL development researchers to better incorporate children’s views and voices in their studies.

Conclusion

Despite the growth in research on children who learn L2/additional languages, we urgently need more high-quality research on children’s L2 development. Studying child L2 development requires a number of specific considerations. This concluding chapter has synthesized those considerations as they apply to the different methods covered in this book. I have also addressed two major issues for future work. This is an exciting time for child L2 development researchers. As we adopt more dynamic, interrelated, and fluid conceptualizations of language development, the need for hybrid and collaborative efforts will most certainly increase. And as advances in digital technology make it possible to develop more innovative tools and methods for capturing and analyzing dynamic and interrelated phenomena, we will need field-tested approaches for maximizing the value and effectiveness of these tools and methods in order to reflect children’s inner voices, thoughts, and feelings represented in and via their language.

Note

1 Researchers differ in their conceptualization of short-term memory and working memory; see Cowan (2008) for a detailed discussion.

References

Alderson, P. (2005). Designing ethical research with children. In A. Farrell (Ed.), Ethical research with children (pp. 27–36). Open University Press.
Alibali, M. W., & Sidney, P. G. (2015). The role of intraindividual variability in learning and cognitive development. In M. Diehl, K. Hooker & M. J. Sliwinski (Eds.), Handbook of intraindividual variability across the life span (pp. 84–102). Routledge.
Butler, Y. G. (2017). The role of affect in intraindividual variability in task performance for young learners. TESOL Quarterly, 51(3), 728–737. https://doi.org/10.1002/tesq.385


Butler, Y. G. (2019). Linking noncognitive factors back to second language learning: New theoretical directions. System, 86, 102–127. https://doi.org/10.1016/j.system.2019.102127
Calkins, S. D., & Bell, M. A. (Eds.). (2010). Human brain development. Child development at the intersection of emotion and cognition. American Psychological Association. https://doi.org/10.1037/12059-000
Carless, D., & Lam, R. (2014). The examined life: Perspectives of lower primary school students in Hong Kong. Education 3–13, 42(3), 313–329. https://doi.org/10.1080/03004279.2012.689988
Chomsky, N. (1965). Aspects of the theory of syntax. MIT Press.
Christensen, P., & James, A. (2017). Research with children: Perspectives and practice. Routledge.
Cowan, N. (2008). What are the differences between long-term, short-term, and working memory? Progress in Brain Research, 169, 323–338. https://doi.org/10.1016/S0079-6123(07)00020-9
de Bot, K., Lowie, W., & Verspoor, M. (2007). A dynamic systems theory approach to second language acquisition. Bilingualism: Language and Cognition, 10(1), 7–21. https://doi.org/10.1017/s1366728906002732
Dillman, D. A., Smyth, J. D., & Christian, L. M. (2014). Internet, phone, mail, and mixed-mode surveys: The tailored design method. John Wiley.
Dulay, H., & Burt, M. (1973). Should we teach children syntax? Language Learning, 23, 245–258. https://doi.org/10.1111/j.1467-1770.1973.tb00659.x
Ellis, N. (2019). Essentials of a theory of language cognition. The Modern Language Journal, 103, 39–60. https://doi.org/10.1111/modl.12532
Ellis, R. (2020). A short history of SLA: Where have we come from and where are we going? Language Teaching, 54(2), 190–205. https://doi.org/10.1017/S0261444820000038
Gajdamaschko, N. (2015). Vygotsky’s sociocultural theory. In J. D. Wright (Ed.), International encyclopedia of the social & behavioral sciences (2nd ed., pp. 329–334). https://doi.org/10.1016/B978-0-08-097086-8.23203-0
Kellett, M. (2010). Rethinking children and research: Attitudes in contemporary society. Continuum.
Klenberg, L., Korkman, M., & Lahti-Nuuttila, P. (2001). Differential development of attention and executive functions in 3- to 12-year-old Finnish children. Developmental Neuropsychology, 20(1), 407–428.
Kozulin, A. (2015). Vygotsky’s theory of cognitive development. In J. D. Wright (Ed.), International encyclopedia of the social & behavioral sciences (2nd ed., pp. 322–328). www.sciencedirect.com/science/article/pii/B9780080970868230948
Kuhn, D. (2000). Metacognitive development. Current Directions in Psychological Science, 9(5), 178–181. https://doi.org/10.1111/1467-8721.00088
Lantolf, J. P. (2000). Introducing sociocultural theory. In J. P. Lantolf (Ed.), Sociocultural theory and second language learning (pp. 1–26). Oxford University Press.
Lewis, M. D. (2000). The promise of dynamic systems approaches for an integrated account of human development. Child Development, 71, 36–43. https://doi.org/10.1111/1467-8624.00116
Miller, P. H. (2014). Piaget’s theory: Past, present, and future. In U. Goswami (Ed.), The Wiley-Blackwell handbook of childhood cognitive development (pp. 649–672). Malden, MA: John Wiley.
Oatley, K., Parrott, W. G., Smith, C., & Watts, F. (2011). Cognition and emotion over twenty-five years. Cognition and Emotion, 25, 1341–1348. https://doi.org/10.1080/02699931.2011.622949


Oliver, R., & Azkarai, A. (2017). Review of child second language acquisition (SLA): Examining theories and research. Annual Review of Applied Linguistics, 37, 62–76. https://doi.org/10.1017/s0267190517000058
Philp, J., Oliver, R., & Mackey, A. (2008). Child’s play? Second language acquisition and the younger learner in context. In J. Philp, R. Oliver & A. Mackey (Eds.), Second language acquisition and the younger learner (pp. 3–23). John Benjamins.
Piaget, J. (1952). Origins of intelligence in the child (A. Cook, Trans.). International Universities Press. (Original work published 1936)
Pinter, A. (2011). Children learning second languages. Palgrave Macmillan.
Pinter, A. (2014). Child participant roles in applied linguistics research. Applied Linguistics, 35(2), 168–183. https://doi.org/10.1093/applin/amt008
Rader, N., & Hughes, E. (2005). The influence of affective state on the performance of a block design task in 6- and 7-year-old children. Cognition and Emotion, 19, 143–150. https://doi.org/10.1080/02699930441000049
Roy, D. (2009). New horizons in the study of child language acquisition. www-prod.media.mit.edu/publications/new-horizons-in-the-study-of-child-language-acquisition/
Schneider, W. (2010). Memory development in childhood. In U. Goswami (Ed.), The Wiley-Blackwell handbook of child cognitive development (pp. 347–376). John Wiley.
Tomasello, M. (2003). Constructing a language: A usage-based approach to child language acquisition. Harvard University Press.
VanPatten, B., & Williams, J. (Eds.). (2015). Theories in second language acquisition: An introduction. Taylor & Francis.
Vygotsky, L. (1986). Thought and language (A. Kozulin, Ed. & Trans.). MIT Press. (Original work published 1934)
Whitebread, D., Almeqdad, Q., Bryce, D., Demetriou, D., Grau, V., & Sangster, C. (2010). Metacognition in young children: Current methodological and theoretical developments. In A. Efklides & P. Misailidi (Eds.), Trends and prospects in metacognitive research (pp. 233–258). Springer.
