Research Methods in Applied Linguistics

Volume 3: Instructed Second Language Acquisition Research Methods

Edited by Laura Gurzynski-Weiss and YouJin Kim

John Benjamins Publishing Company


Research Methods in Applied Linguistics (RMAL)
ISSN 2590-096X

The Research Methods in Applied Linguistics (RMAL) series publishes authoritative general guides and in-depth explorations of central research methodology concerns in the entire field of Applied Linguistics. The hallmark of the series is its contribution to stimulating and advancing professional methodological debates in the domain. Books published in the series (both authored and edited volumes) will be key resources for applied linguists (including established researchers and newcomers to the field) and an invaluable source for research methodology courses. Main directions for the volumes in the series include (but are not limited to):

- Comprehensive introductions to research methods in Applied Linguistics (authoritative introductions to non-domain-specific methodologies);
- In-depth explorations of central methodological considerations and developments in specific areas of Applied Linguistics (authoritative treatments of domain-specific methodologies);
- Critical analyses that develop, expand, or challenge existing and/or novel methodological frameworks;
- In-depth reflections on central considerations in employing specific methodologies and/or addressing specific questions and problems in Applied Linguistics research;
- Authoritative accounts that foster improved understandings of the behind-the-scenes, inside story of the research process in Applied Linguistics.

For an overview of all books published in this series, please see benjamins.com/catalog/rmal

Editor: Rosa M. Manchón, University of Murcia

Editorial Board:
David Britain, University of Bern
Gloria Corpas Pastor, University of Malaga
Marta González-Lloret, University of Hawai'i
Laura Gurzynski-Weiss, Indiana University Bloomington
Juan Manuel Hernández-Campoy, University of Murcia
Ute Knoch, University of Melbourne
Anthony J. Liddicoat, University of Warwick
Brian Paltridge, University of Sydney
Diane Pecorari, City University of Hong Kong
Luke Plonsky, Northern Arizona University
Li Wei, University College London


Instructed Second Language Acquisition Research Methods

Edited by

Laura Gurzynski-Weiss, Indiana University
YouJin Kim, Georgia State University

John Benjamins Publishing Company
Amsterdam / Philadelphia


The paper used in this publication meets the minimum requirements of the American National Standard for Information Sciences – Permanence of Paper for Printed Library Materials, ANSI Z39.48-1984.

DOI 10.1075/rmal.3

Cataloging-in-Publication Data available from Library of Congress:
LCCN 2022041953 (print) / 2022041954 (e-book)

ISBN 978 90 272 1267 2 (Hb)
ISBN 978 90 272 1268 9 (Pb)
ISBN 978 90 272 5697 3 (e-book)

© 2022 – John Benjamins B.V. No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publisher. John Benjamins Publishing Company · https://benjamins.com

For our collaborators in love and life: Nick & Hyung Ju

Table of contents

Acknowledgements  ix
Introduction to the volume  xi
List of contributors  xix

Section 1. Introduction

Chapter 1. Getting started: Key considerations
Laura Gurzynski-Weiss and YouJin Kim  3

Section 2. Identifying your research approach

Chapter 2. Quantitative research methods in ISLA
Shaofeng Li  31
Chapter 3. Qualitative ISLA research methodologies and methods
Peter I. De Costa, Robert A. Randez, Carlo Cinaglia, and D. Philip Montgomery  55
Chapter 4. Mixed methods research in ISLA
Masatoshi Sato  79
Chapter 5. Replication research in instructed SLA
Kevin McManus  103

Section 3. ISLA research across methodological approaches

Chapter 6. Unique considerations for ISLA research across approaches
Laura Gurzynski-Weiss and YouJin Kim  125

Section 4. Designing instructional interventions for specific skills and competencies

Chapter 7. Pragmatics: Assessing learning outcomes in instructional studies
Naoko Taguchi and Soo Jung Youn  149
Chapter 8. Vocabulary: A guide to researching instructed second language vocabulary acquisition
Emi Iwaizumi and Stuart Webb  181
Chapter 9. Grammar: Documenting growth in L2 classrooms
Paul D. Toth  207
Chapter 10. Pronunciation: What to research and how to research in instructed second language pronunciation
Andrew H. Lee and Ron I. Thomson  233
Chapter 11. Listening: Exploring the underlying processes
Ruslan Suvorov  257
Chapter 12. Reading: Adopting interdisciplinary paradigms in ISLA reading research
Irina Elgort  281
Chapter 13. Writing: Researching L2 writing as a site for learning in instructed settings
Ronald P. Leow, Rosa M. Manchón, and Charlene Polio  305
Chapter 14. Speaking: Complexity, accuracy, fluency, and functional adequacy (CAFFA)
Folkert Kuiken and Ineke Vedder  329

Section 5. Sharing your research

Chapter 15. Contributing to the advancement of the field: Collaboration and dissemination in ISLA
YouJin Kim and Laura Gurzynski-Weiss  355

Index  379

Acknowledgements

This volume was possible due to the generosity of energy, time, feedback, and support of numerous colleagues. First and foremost, we would like to thank Rosa M. Manchón (Editor of the Research Methods in Applied Linguistics book series), Kees Vaes (Acquisitions Editor at John Benjamins), and the team at John Benjamins for the invitation to create this project, and for their impactful guidance throughout. In particular, Rosa's detailed comments at various steps of this project have greatly improved the volume. Additionally, we are wholeheartedly grateful for our colleagues who contributed. Thank you for sharing your genius in such an accessible way.

We would like to acknowledge the reviewers, who provided comprehensive and useful comments that greatly strengthened the volume: Rebekha Abbuhl (California State University Long Beach, United States), Rosa Alonso Alonso (University of Vigo, Spain), Jennifer Behney (Youngstown State University, United States), Alessandro Benati (Anaheim University, United States), Frank Boers (Western University, Canada), Carly Carver (Augusta University, United States), Tracey Derwing (University of Alberta, Canada), Patricia A. Duff (The University of British Columbia, Canada), Martin East (University of Auckland, New Zealand), Marta González-Lloret (University of Hawai'i at Mānoa, United States), Suzanne Graham (University of Reading, United Kingdom), Mark Johnson (East Carolina University, United States), Sara Kennedy (Concordia University, Canada), Avizia Long (San José State University, United States), Josh Matthews (University of New England, Australia), Marije Michel (University of Groningen, Netherlands), Ryan Miller (Kent State University, United States), Rosamond Mitchell (University of Southampton, United Kingdom), Atsushi Mizumoto (Kansai University, Japan), Mirosław Pawlak (Adam Mickiewicz University, Poland), Ana Pellicer Sánchez (University College London, United Kingdom), Luke Plonsky (Northern Arizona University, United States), Graeme Porte (University of Granada, Spain), Andrea Révész (University of London, United Kingdom), Ellen J. Serafini (George Mason University, United States), Rachel Shively (Illinois State University, United States), Julio R. Torres (University of California Irvine, United States), Nicole Tracy-Ventura (West Virginia University, United States), Pavel Trofimovich (Concordia University, Canada), Paula Winke (Michigan State University, United States), and Haomin (Stanley) Zhang (East China Normal University, China). We would also like to express our gratitude to colleagues Andrea Révész (University of London, United Kingdom) and Caroline Payant (Université du Québec à Montréal, Canada) for valuable additional feedback on Chapters 1, 6, and 15.

We are indebted especially to colleagues from Indiana University (United States): Megan DiBartolomeo provided outstanding research assistance throughout this project (made possible in part by a generous grant from the Department of Spanish and Portuguese), and Megan Solon provided exceptional consultation and editing in the final stages. Their attention to detail, careful reading, and generosity of time truly made the difference.

We are also grateful for the wonderful graduate students we have worked with throughout the years at Indiana University and Georgia State University, and especially those who piloted the volume in an Instructed Second Language Acquisition Research Methods seminar taught at IU during the spring of 2022: Marcela de Oliveira e Silva Lemos, Nick Blumenau, Katie Lindley, Rachel Garza, Estefany Sosa, Christine Song, and Mike Uribe.

We would like to acknowledge several frolleagues (friends/colleagues) who provided support and encouragement throughout our COVID-impacted writing: Melissa Baralt, Ellen J. Serafini, Julio R. Torres, Luke Plonsky, Kim Geeslin, Caroline Payant, Nicole Tracy-Ventura, Ute Römer, Andrea Révész, and Manuel Díaz-Campos: we appreciate you more than we can put into words. Thank you for the always on-point memes and GIFs that powered us through when needed, for being a sounding board to process our ideas, and for your ánimo, understanding, and humor when it was needed the most.

Finally, we would like to thank our collaborators in non-academic life.

From Laura – Heartfelt thanks to Nick, Felix, and Vesper Weiss for their love and contagious motivation to live each day as an adventure that begins and ends with snuggles and koala hugs. I could not be more grateful. Thank you, too, to my parents, Ted and Kathy Gurzynski, and to our babysitters, Annie Tuszynski and Maggie Bott, who provided much-needed backup when we needed to constantly pivot. Finally, a massive thank you to my co-editor YouJin: it was an honor to create this volume with you and have an excuse for weekly meetings. You are an inspirational colleague and an even more special friend.

From YouJin – Special thanks to my parents, Bokhee Song and Deokgon Kim, who always believe in me and motivate me to see outside the box and to think big. Also, I want to extend my gratitude to Hyung Ju Pak for all of his support and love during the long journey. You inspire me every day and remind me of why I love what I do. Additionally, I cannot thank my co-editor Laura enough: working with you over the last two years on this special volume has been an incredible experience that I cherish so much!

Introduction to the volume

Instructed Second Language Acquisition Research Methods is a stand-alone "how to" research methods guide written from an Instructed Second Language Acquisition (ISLA) lens. The volume consists of five sections and 15 chapters: (1) introduction to ISLA (Chapter 1), (2) identifying your research approach (Chapters 2 through 5), (3) ISLA research across methodological approaches (Chapter 6), (4) designing instructional interventions for specific skills and competencies (Chapters 7 through 14), and (5) sharing your research (Chapter 15). We introduce readers to the field of ISLA, outline the basics of research design, and provide concrete guidance on how to come up with research questions, how to identify the right methodology and method(s), how to adapt an existing instrument or determine the need to create your own, how to carry out a study, how to analyze and interpret your data, and how to decide how/where/when to share your work. These questions are answered with a focus on four skill areas (listening, speaking, reading, writing) as well as four major linguistic features (grammar, vocabulary, pronunciation, pragmatics). By introducing different research methods and then focusing on how they fit in the research domain of learning each language skill/feature, we actively guide novice and experienced readers alike in developing their knowledge of research methods with each skill or feature context in mind. The volume additionally provides chapters that address common inquiries in conducting ISLA research, particularly in classroom contexts (e.g., obtaining IRB approval, collaborating with classroom teachers, working with small sample sizes, considering individual differences), and that suggest how to maximize ISLA research findings to contribute to language pedagogy.

Intended audience

The intended audience of this research guide is novice and junior ISLA researchers and graduate and advanced undergraduate students who are interested in conducting ISLA research. By novice we mean novice to ISLA, to ISLA research methodology, or novice to one of the methods or competencies of focus within the volume.

This volume can be used as a textbook in a research methods course, particularly one focusing on ISLA, in topics courses on ISLA, and in courses on L2 teaching methods. It is also useful in a professionalization course for graduate students in applied linguistics, Teaching English to Speakers of Other Languages (TESOL), and world language departments.

Common topics across chapters

Common topics are included to facilitate usability of the chapters, which are each written by a leading expert on the topic at hand. For example, chapters providing step-by-step guidance on specific types of ISLA research methodologies (Chapter 2 on quantitative, Chapter 3 on qualitative, Chapter 4 on mixed methods, and Chapter 5 on replication studies) each address the following topics:

1. What is X research methodology and why is it important for ISLA?
2. What typical research questions are targeted by X methodology?
3. Common options for investigation using X methodology, with step-by-step guidelines and example studies
4. Advice for future X methodology researchers
5. Troubleshooting X research methodology
6. Conclusion
7. Further reading and additional resources
8. References

Chapters that guide the reader through instructional interventions for specific skills and competencies (Chapter 7 on pragmatics, Chapter 8 on vocabulary, Chapter 9 on grammar, Chapter 10 on pronunciation, Chapter 11 on listening, Chapter 12 on reading, Chapter 13 on writing, and Chapter 14 on speaking) include the following:

1. What is X competency/skill and why is it important in ISLA?
2. What do we know and what do we need to know about X competency/skill in ISLA?
3. Data elicitation and interpretation options, step-by-step guidelines, and example studies for research on X
4. Advice to future X researchers
5. Troubleshooting X research
6. Conclusions
7. Further reading and additional resources
8. References


Section 1. Defining the domain of ISLA

Chapter 1: Getting started: Key considerations

In Section 1, Chapter 1 introduces the volume by familiarizing the reader with ISLA and research methodology and methods. Laura Gurzynski-Weiss and YouJin Kim begin the chapter by operationalizing ISLA, outlining its overarching goals, and presenting key concepts across theories in ISLA. They discuss the aims of scientific research and research design in general, providing questions to guide the novice researcher through study design and implementation. Lastly, they discuss the ethics of research with human subjects and the different types of studies (e.g., exempt, expedited) one might find in ISLA research. The chapter concludes with suggestions for further reading about ISLA theory, as well as ethics in applied linguistics research.

Section 2. Identifying your research approach

Chapter 2: Quantitative research methods in ISLA
Chapter 3: Qualitative ISLA research methodologies and methods
Chapter 4: Mixed methods research in ISLA
Chapter 5: Replication research in instructed SLA

Section 2 explores the different methodologies used in ISLA research. In Chapter 2, Shaofeng Li provides an overview of quantitative research methodology and the most commonly used quantitative methods in ISLA research. Beginning with a discussion of key concepts in quantitative research (e.g., sample size, variables), he follows by enumerating typical research questions that are answered using quantitative methodology. Li then discusses the principles of study quality and examines the nuances of experimental, correlational, and observational research. Finally, Li concludes with guidance for study design and reporting quantitative data, for conducting statistical analyses in ISLA research, and for addressing challenges in data analysis. Li's guide to quantitative methods is followed by Peter I. De Costa, Robert A. Randez, Carlo Cinaglia, and D. Philip Montgomery's detailed discussion of qualitative methods in Chapter 3. The authors begin with an introduction to the theories used in qualitative research (language socialization, identity and agency, sociocultural theory, emotion, and motivation and investment) and provide example research questions explored using each theory. De Costa et al. then discuss common qualitative methodologies, focusing specifically on case studies, ethnographies, and conversation analysis. They conclude their chapter with advice to future qualitative researchers and for troubleshooting qualitative methods. Chapter 4 by Masatoshi Sato explores the increasingly popular mixed methods research (MMR) in ISLA. He begins the chapter with a discussion of how MMR can remedy methodological challenges found in using either quantitative or qualitative methods alone. Next, Sato provides readers with typical research questions asked in MMR and an overview of three common options for MMR: convergent, explanatory sequential, and exploratory sequential designs. Sato's chapter concludes with step-by-step guidelines for how to conduct MMR projects and advice for future mixed methods researchers. Concluding the section with Chapter 5, Kevin McManus examines replication research in ISLA. He describes what this research entails and the questions it may answer, the most common types of replication research, and why it is important to conduct replication studies. McManus subsequently provides an overview of replication research in ISLA. Finally, the chapter gives guidelines for carrying out replication studies, advice for troubleshooting issues that may arise in this type of work, and recommendations for future replication research in ISLA.

Section 3. ISLA research across methodological approaches

Chapter 6: Unique considerations for ISLA research across approaches

In Section 3, Chapter 6 by Laura Gurzynski-Weiss and YouJin Kim addresses the unique considerations that may affect ISLA research across the aforementioned methodological approaches from Chapters 2, 3, 4, and 5. Chapter foci include small sample sizes, the use of intact classes, using one's own students as participants, how to measure and account for learner and instructor IDs, the risk and measurement of outside exposure to the target structure, and how to put the bi/multilingual turn into practice in ISLA research. Gurzynski-Weiss and Kim conclude the chapter with an analysis of a sample study addressing each of the aforementioned considerations, as well as suggestions for future reading and useful resources for novice and junior ISLA researchers.

Section 4. Designing instructional interventions for specific skills and competencies

Chapter 7: Pragmatics: Assessing learning outcomes in instructional studies
Chapter 8: Vocabulary: A guide to researching instructed L2 vocabulary acquisition
Chapter 9: Grammar: Documenting growth in L2 classrooms
Chapter 10: Pronunciation: What to research and how to research in instructed second language pronunciation
Chapter 11: Listening: Exploring the underlying processes
Chapter 12: Reading: Adopting interdisciplinary paradigms in ISLA reading research
Chapter 13: Writing: Researching L2 writing as a site for learning in instructed settings
Chapter 14: Speaking: Complexity, accuracy, fluency, and functional adequacy (CAFFA)

Section 4 explores how to research the principal skills/competencies within the field of ISLA. The section begins with Naoko Taguchi and Soo Jung Youn's Chapter 7, which focuses on pragmatics. They first introduce the field of L2 pragmatics and why it is important to ISLA, as well as what we know and need to know about instructed pragmatics. Next, Taguchi and Youn describe common learning outcome measures used in instructional studies, discourse completion tasks (DCTs) and role plays in particular, providing in-depth examples of each method. Finally, the chapter suggests improvements and troubleshooting when using DCTs and role plays, and how to determine the appropriate assessment criteria.

Next, Emi Iwaizumi and Stuart Webb provide guidelines for research in instructed L2 vocabulary acquisition in Chapter 8. The authors begin by describing key concepts in the field and giving an overview of instructed L2 vocabulary research. Then, Iwaizumi and Webb explore different measures of L2 vocabulary knowledge and provide advice for data interpretation, critiquing sample studies as examples. The chapter also outlines steps to carry out a study on instructed L2 vocabulary acquisition. The chapter closes with advice for future L2 vocabulary researchers and how to address the challenges that may arise in these studies.

Chapter 9 by Paul Toth addresses instructed L2 grammar acquisition. Toth first gives an overview of grammatical knowledge and what we know so far about grammatical development in ISLA. He then discusses data elicitation, interpretation, and study design through the lens of his own work on explicit instruction in secondary school L2 Spanish classrooms. The chapter concludes with advice for future grammar researchers and for troubleshooting issues in grammar research.

Chapter 10 by Andrew H. Lee and Ron Thomson focuses on pronunciation, a growing research area in ISLA. They begin by introducing key terms and critical topics within the field of L2 pronunciation, including accentedness, comprehensibility, and intelligibility. Next, Lee and Thomson elaborate on data elicitation options such as picture-description and scalar judgment tasks, and provide a step-by-step guide for pronunciation research in ISLA, followed by advice for future pronunciation researchers. They end the chapter with advice for task design, data collection, and data analysis.


Chapter 11 by Ruslan Suvorov introduces the under-researched but essential topic of L2 listening in ISLA. Suvorov first describes the three main models that explain the process of listening, as well as current trends in L2 listening research. Next, he elaborates on data elicitation and interpretation methods, including survey research, verbal reports, and behavioral and neuroimaging methods, and gives guidelines for how to incorporate eye-tracking into an L2 listening study. Finally, Suvorov gives advice for study design, validity, data collection, and analysis.

Irina Elgort subsequently explores L2 reading in Chapter 12. She first explores the place of reading in SLA and provides an overview of reading research, specifically focusing on what should be taught in instructed L2 reading. Next, Elgort describes data elicitation methods used to study reading development and provides critical analyses of sample studies in L2 reading. The chapter closes with directions for future research.

Following this chapter, Chapter 13 by Ronald P. Leow, Rosa M. Manchón, and Charlene Polio focuses on L2 writing. They begin with an overview of the current state of L2 writing research with a focus on three main areas: language learning associated with instructional interventions via the manipulation of tasks, language learning outcomes of classroom L2 learners' processing of written corrective feedback, and written language development in instructed settings. Leow et al. then discuss important methodological considerations within each area and conclude their chapter with suggestions for future researchers regarding how to address methodological issues that may arise in L2 writing research.

Section 4 concludes with Chapter 14 on L2 speaking by Folkert Kuiken and Ineke Vedder. The authors begin by giving an overview and review of the literature surrounding the two components that are taken into consideration in L2 speaking research: complexity, accuracy, and fluency (CAF) and functional adequacy (FA) (henceforth, CAFFA). Next, Kuiken and Vedder provide a guide to data elicitation and interpretation for L2 speaking, followed by advice to future CAFFA researchers and how to troubleshoot issues in researching CAFFA, such as having native and non-native speakers in the same study.

Section 5. Sharing your research

Chapter 15: Contributing to the advancement of the field: Collaboration and dissemination in ISLA

In Section 5, YouJin Kim and Laura Gurzynski-Weiss close the volume with Chapter 15, highlighting advice on how to share one's research with academic and non-academic stakeholders, especially teachers, teacher trainers, and language policy makers. The authors advocate for collaboration between ISLA researchers and educators, citing its many benefits. Kim and Gurzynski-Weiss also provide detailed guidance on how to draft ISLA manuscripts and discuss how to decide which venue would be best for one's work. Finally, they conclude by elaborating future directions for ISLA research and ways to maximize the impact of this research on L2 pedagogy.

Volume reference matrix

To facilitate reader reference and consultation, we provide a volume reference matrix. The individual chapters (beyond the first introductory and the final concluding chapters) are seen across the top columns, with each key question in ISLA (outlined in detail in Chapter 1) provided in the left-most column, followed by research methods, and finally data types. While this matrix is not exhaustive, we hope it provides guidance when looking for specific resources.

[The matrix marks which of Chapters 2-14 addresses each of the items below; the chapter-by-chapter markings are not recoverable in this copy.]

Key questions in ISLA:
- How are L2s learned in instructed contexts?
- What is the nature of the L2 knowledge gained in instructed contexts?
- How do variables related to instructional context (broadly defined) influence L2 learning?
- How do individual differences play a role in instructed L2 learning?
- What do SLA theories and research say about the effectiveness of L2 instruction?

Research methodologies highlighted: quantitative; qualitative; mixed methods; replication

Methods of data elicitation:
- Processing data: eye-tracking; neuroimaging methods
- Learning outcome data: discourse completion task; role play; acceptability judgment task; scalar judgment task; picture description task; recognition test; recall test; writing task; questionnaire; forced-choice identification task
- Verbal reports data: think-aloud; stimulated recall; interview

Methods of data coding/analysis: complexity; accuracy; fluency; intelligibility; receptive knowledge gain measures; productive knowledge gain measures; explicit knowledge gain measures; implicit knowledge gain measures; quality of oral language; quality of written language

List of contributors

Carlo Cinaglia is a doctoral student in the Second Language Studies program at Michigan State University. His current research draws on narrative inquiry and discourse-analytic approaches to examine student identity and investment in language learning as well as pre-service language teacher identity and agency within teacher education programs. He is also interested in qualitative SLA research methodology and ethics in applied linguistics scholarship more broadly. Carlo has taught undergraduate courses in Linguistics, TESOL, Spanish and ESL, and he currently mentors graduate student language teachers completing their practicum.

Peter I. De Costa is Associate Professor in the Department of Linguistics, Languages & Cultures and the Department of Teacher Education at Michigan State University. His research areas include emotions, identity, ideology, and ethics in educational linguistics. He also studies social (in)justice issues. He is the co-editor of TESOL Quarterly and the First Vice-President of the American Association for Applied Linguistics.

Irina Elgort is Associate Professor in Higher Education at Victoria University of Wellington. Her research interests include explicit and implicit vocabulary learning and processing; bilingual lexical and semantic representations and their development; and reading. She uses research paradigms from applied linguistics, cognitive psychology, and education to better understand, predict, and influence learning. Irina's research has been published in Applied Linguistics; Bilingualism: Language and Cognition; Language, Cognition and Neuroscience; Language Learning; and Studies in Second Language Acquisition.

Laura Gurzynski-Weiss is Professor in the Department of Spanish and Portuguese at Indiana University, where she researches instructed second language acquisition, task-based language teaching, and individual differences. She is the editor of Expanding the interaction approach: Investigating learners, instructors, and other interlocutors (2017, John Benjamins) and Cross-theoretical explorations of interlocutors and their individual differences (2020, John Benjamins). She directs the TBLT Language Learning Task Bank, is on the Board of the International Association for Task-Based Language Teaching, is AILA Secretary General, and is co-founder/director of AILA Ibero-America.


Emi Iwaizumi is a PhD candidate in Applied Linguistics at the University of Western Ontario, Canada. She has taught English as a Foreign Language in Japan, and both English for Academic Purposes and a graduate course in second language vocabulary teaching in Canada. She is interested in exploring theories that underlie first and second language vocabulary acquisition and examining the effects of different instructional approaches to developing vocabulary knowledge. She is also interested in the use of technology in teaching, analyzing, and assessing vocabulary and pronunciation.

YouJin Kim is Professor and Director of Graduate Studies in the Department of Applied Linguistics and ESL at Georgia State University. Her research interests involve second language acquisition and task-based language teaching and assessment, targeting both English and Korean learners. In particular, she has conducted a number of classroom-based studies which examine task design, task implementation variables, and learner variables in diverse instructional contexts. She is a co-author of Pedagogical Grammar (with Casey Keck, 2014, John Benjamins) and co-edited Task-Based Approaches to Teaching and Assessing Pragmatics (with Naoko Taguchi, 2018, John Benjamins). She is currently an Associate Editor of Journal of Second Language Writing.

Folkert Kuiken is Professor Emeritus of Dutch as a Second Language and Multilingualism at the University of Amsterdam, and Academic Director of the Institute for Dutch Language Education at that same university. His research interests include the effect of task complexity and interaction on SLA, Focus on Form, and the relationship between linguistic complexity and functional adequacy. He (co)authored and (co)edited various books and special issues, including Dimensions of L2 performance and proficiency (Housen, Kuiken, & Vedder, 2012, John Benjamins).

Andrew H. Lee is Assistant Professor of Applied Linguistics at Brock University, where his course topics span instructed second language (L2) acquisition, applied phonetics, and bilingualism. His research interests focus on various instructional techniques in L2 classroom settings, L2 pronunciation instruction and its effects on the acquisition of L2 phonological, lexical, and morphological targets, and the impact of individual differences such as age and executive function skills on L2 acquisition. Dr. Lee has published his work in numerous academic journals, including Studies in Second Language Acquisition, Language Learning, Language Teaching Research, and Applied Psycholinguistics.




Ronald P. Leow is Professor of Applied Linguistics and Director of Spanish Language Instruction in the Department of Spanish and Portuguese at Georgetown University. His areas of expertise include language curriculum development, teacher education, instructed language learning, cognitive processes, CALL, and written corrective feedback. Professor Leow has published extensively and has co-edited several books together with his single-authored 2015 book, Explicit learning in the L2 Classroom: A student-centered approach (Routledge), and his edited Routledge handbook of second language research in classroom learning (2019). His Feedback Processing Framework, based on his 2015 Model of the L2 Learning Process in ISLA, appeared in 2020.

Shaofeng Li is Associate Professor of Second and Foreign Language Education at Florida State University. His main research interests include language aptitude, working memory, form-focused instruction, task-based language teaching and learning, corrective feedback, and research methods. His research has primarily focused on the joint effects of learner-external and learner-internal factors on second language learning outcomes. His publications have appeared in Annual Review of Applied Linguistics, Applied Linguistics, Applied Psycholinguistics, Language Learning, Language Teaching Research, Modern Language Journal, Studies in Second Language Acquisition, and System, among others. He is the editor-in-chief of Research Methods in Applied Linguistics (Elsevier) and the book review editor of TESOL Quarterly (Blackwell).

Rosa M. Manchón is Professor of Applied Linguistics in the Department of English, University of Murcia, Spain. Her research explores L2 writing processes and strategies and has appeared as articles in prestigious journals, book chapters, and several edited books. In addition to her research activities, she has served the profession in various capacities, including work for professional associations and different editorial positions: she was AILA Publications Coordinator, is past Editor of the Journal of Second Language Writing, and is Chief Editor of the book series "Research Methods in Applied Linguistics" (John Benjamins).

Kevin McManus is Associate Professor in the Department of Applied Linguistics at Penn State University, USA, where he is also director of the Center for Language Acquisition. His research specializations include (instructed) second language learning, crosslinguistic influence, and replication research. His most recent books include Crosslinguistic Influence and Second Language Learning (2022, Routledge) and Doing Replication Research in Applied Linguistics (2019, Routledge). His work has appeared in scholarly journals such as Applied Linguistics, Studies in Second Language Acquisition, and Modern Language Journal.


D. Philip Montgomery is a doctoral student in the Second Language Studies program at Michigan State University. His research focuses on educational linguistics, language policy and planning, and sociocultural approaches to understanding language ideology. He has taught ESL, Spanish as a foreign language, and Academic English at secondary and tertiary levels in the US and Kazakhstan. Philip has published on adaptive transfer of genre knowledge in multilingual contexts and is currently the Graduate Assistant Director of the Writing Center at Michigan State University.

Charlene Polio is Professor in the Department of Linguistics, Languages, and Cultures at Michigan State University, where she teaches in the applied linguistics program. Her main area of research is second language (L2) writing, particularly the various research methods and measures used, the interface between the fields of L2 writing and second language acquisition, and applications of corpus-based methods. She is the co-editor of TESOL Quarterly and a consulting editor for Research Methods in Applied Linguistics. She is the co-editor, with Rosa M. Manchón, of the Routledge Handbook of Second Language Acquisition and Writing.

Robert A. Randez is a PhD candidate in Second Language Studies at Michigan State University. He is an educational linguist interested in the language learning experiences of neurodivergent individuals. Having received his BA and MA from the University of Texas at San Antonio, he has taught learners of a variety of ages in a variety of contexts and is currently a member of Michigan's Seal of Biliteracy Council.

Masatoshi Sato is Professor at Universidad Andrés Bello, Chile. His research agenda is to conduct theoretical and applied research in order to facilitate the dialogue between practitioners and researchers. In addition to his publications in international journals, he has co-edited volumes from John Benjamins (2016: Peer Interaction and Second Language Learning), Routledge (2017: The Routledge Handbook of ISLA; 2019: Evidence-Based Second Language Pedagogy), Language Teaching Research (2021: Learner Psychology and ISLA), and the Modern Language Journal (2022: The Research-Practice Relationship). He is the recipient of the 2014 ACTFL/MLJ Paul Pimsleur Award. He is the Editor of Language Awareness.




Naoko Taguchi is Professor of Applied Linguistics at Northern Arizona University, where she teaches courses on TESOL, linguistics, and SLA. Her research interests include second language pragmatics, technology-assisted teaching, intercultural learning, and English-medium education. She is currently co-editing the Encyclopedia of Applied Linguistics Pragmatics Volume. She is a co-editor of Applied Pragmatics.

Ron Thomson is Professor of Applied Linguistics at Brock University. He teaches undergraduate courses in phonetics and phonology and a graduate seminar in teaching second language oral/aural skills. His research interests focus on the development of L2 oral fluency and pronunciation skills. He is also known for his work in High Variability Phonetic Training, which has resulted in a free online perceptual training platform, www.englishaccentcoach.com.

Paul Toth is Associate Professor of Spanish applied linguistics at Temple University, where he teaches graduate and undergraduate courses and mentors doctoral students. Since 1991, he has also taught high school and university-level Spanish, coordinated language curricula, and supervised pre-service teachers. He has published 27 research articles and book chapters on instructed second language learning and is the editor of a 2022 special issue of Language Learning that brings together various perspectives on this topic. He was twice awarded the Paul Pimsleur Award for research excellence from the American Council on the Teaching of Foreign Languages.

Ineke Vedder is a researcher at the University of Amsterdam. Her research interests include Instructed Second Language Acquisition (particularly Italian), academic writing in L2 and L1, L2 pragmatics, and assessment of functional adequacy in L2 performance in relation to linguistic complexity. Her publications have appeared in various edited books and journals, comprising two special issues, together with Housen, De Clercq, and Kuiken, on syntactic complexity in SLA research (2019).

Stuart Webb is Professor of Applied Linguistics at the University of Western Ontario. Before teaching applied linguistics, he taught English as a foreign language in Japan and China. His research interests include vocabulary studies, second language acquisition, and extensive reading, listening, and viewing. His latest books are How Vocabulary is Learned (with Paul Nation) and The Routledge Handbook of Vocabulary Studies.


Soo Jung Youn is Assistant Professor of English Education at Daegu National University of Education, Korea. Prior to joining DNUE, she worked as Associate Professor of Applied Linguistics at Northern Arizona University, USA. Her research interests include language testing and assessment, second language pragmatics, interactional competence, task-based language teaching, and mixed methods research. Her research has been published in numerous academic journals, including TESOL Quarterly, Language Testing, System, Applied Linguistics Review, Intercultural Pragmatics, and Journal of English for Academic Purposes, and in edited books.

Section 1

Introduction

Chapter 1

Getting started: Key considerations

Laura Gurzynski-Weiss and YouJin Kim
Indiana University / Georgia State University

This introductory chapter orients the reader to both instructed second language acquisition (ISLA) and research methods in general. We begin with an operationalization of ISLA, followed by an outline of the main research questions and overarching goals within ISLA, and a specification of the differences between an applied orientation to research and simply drawing pedagogical implications. In the second half of the chapter, we move into the nature and ultimate aims of research (i.e., theoretical, empirical, applied), in general and for the specific field of ISLA. We highlight the importance of connecting to and building off prior work, and how research contributions must be connected to disciplinary discussions and previous scholarly work as well as being ethically sound. Finally, we outline the basics of research decision-making and walk the reader through each of the principal considerations during the design phase of a study.

Keywords: instructed second language acquisition, research methodology, research methods, L2 learning/development

1. Introducing instructed second language acquisition: Defining the domain

1.1 What is ISLA? How does it differ from SLA and applied linguistics?

Before we begin, we will operationalize – provide our working definitions of – terms used throughout the volume. Broadly, instructed second language acquisition (ISLA) is a field of empirical inquiry that examines how additional languages (L2s; i.e., those learned after one's native language(s) [L1]) are learned in instructed settings and what can be done to optimize this learning (see Loewen, 2020). It is one of the fastest-growing sub-fields of second language acquisition (SLA).

Our use of the term instructed settings includes any context where there is an intentional pursuit of learning the L2 and – in our case as researchers, teachers, and teacher trainers – an overall intent to optimize this learning. Instructed settings include in-person, online, or in-person/online hybrid classrooms, as well as intensive immersion camps or short-term programs, technology-mediated language learning applications (e.g., language apps, games), or online or in-person learning environments such as L2 learning tables, conversation partners, etc.

Finally, in this volume we use the term L2 learning or development rather than acquisition. Instructional settings most often incorporate opportunities for explicit learning – intentional learning with clear learning objectives, where learners are aware of and trying to learn (and which may later become proceduralized and more implicit; see DeKeyser, 2003; Ellis, 2009; Loewen, 2020) – more so than implicit learning, which takes place without intention or conscious attention, resulting in incidental learning of language (Leow, 2018; Loewen, 2020); this latter category of learning is also commonly referred to as acquisition (Ellis, 1994).

ISLA falls within the larger domain of SLA, which refers to the scientific study of the development of an L2 without necessarily an intent to manipulate or improve the learning conditions. Within SLA there is a critical distinction between acquisition (less intentional/more incidental, less explicit/more implicit) and learning (more intentional/less incidental, more explicit/less implicit), and L2 development may occur with or without instruction or intent to manipulate or increase conditions favorable for learning. Some distinguish between SLA as focusing on "natural" contexts of language learning as compared to the "instructed" contexts of ISLA, though we would argue that learning intentionally, including in classrooms, is a natural part of life for many.

The overarching questions of the field of SLA are to examine how L2s are acquired (and, inherently, how this acquisition differs from L1 acquisition), as well as to explain why there is such variation in the success of L2 acquisition (again, inherently as compared to L1 acquisition). For example, a frequently researched question in SLA may be whether there is a developmental sequence for a specific linguistic structure, such as the copula verbs ser and estar in Spanish or questions in English (spoiler alert: there is! See work by Geeslin, 2000, 2003; Geeslin & Long, 2015; Kim, 2012; Ryan & Lafford, 1992; VanPatten, 1987), whereas an ISLA focus in that area may be how to manipulate instruction or learners' attention through specific tasks within instruction to move more quickly through the developmental sequence (Cheng, 2002). Both SLA and ISLA fall within the broader umbrella of applied linguistics, which is an interdisciplinary field of inquiry that attempts to provide explanation of "what we know about (a) language, (b) how it is learned and (c) how it is used, in order to achieve some purpose or solve some problem in the real world" (Schmitt & Celce-Murcia, 2020, p. 1).



1.2 Overarching goals of ISLA

Moving into the domain of ISLA, the overarching goals are to: (1) examine how languages are learned in instructed contexts, where learners are intentionally learning the L2 and teachers and/or researchers are manipulating conditions to encourage/improve their learning; (2) research whether a pedagogical intervention improves L2 learning opportunities, via measurements and analysis of the aforementioned conditions and resulting learning; (3) examine the influence of myriad variables, both those of the learning and instructional context and those pertaining to the individual differences (IDs) of all involved, as well as the impact of surrounding communities; and (4) interpret how learning opportunities need to be adjusted based on research, as well as collaborate with practitioners to find out how we need to adjust research according to instructed L2 contexts.

Following the tradition of SLA research, ISLA researchers often focus on specific target linguistic features such as a set of pronunciation features, vocabulary items, morphemes, syntactic structures, and pragmatic features. And although L2 pedagogy often focuses on four skill areas (listening, speaking, reading, and writing), ISLA research rarely explores the development of these skills. However, if we want to connect ISLA research and classroom instruction, a broader perspective on ISLA research is warranted. Therefore, in the current volume, we also organize the target research areas based on the skill areas, which is a unique approach to ISLA research methods. We also include pragmatic competence, and divide spoken production into two areas: pronunciation (Chapter 10) and speaking performance in terms of complexity, accuracy, fluency, and functional adequacy (Chapter 14).

1.3 Common assumptions across theories of ISLA

All ISLA research decisions must be grounded in a given theoretical framework, and we begin as such in this chapter. In this respect, the terms theory, model, and hypothesis differ considerably, as outlined by VanPatten and Williams (2015). A theory refers to "a set of statements about natural phenomena that explains why these phenomena occur the way they do" (p. 1). A model "describes processes or sets of processes of a phenomenon" (p. 4). And while a theory may attempt to account for many phenomena together, a hypothesis focuses on a single phenomenon that can be tested.

Several assumptions are held across theoretical frameworks for ISLA. Both Interactionist Approaches to SLA (Mackey & Gass, 2014; Long, 1996) and Sociocultural Theory (Lantolf, 2020) offer useful learning mechanisms that can be applied to instruction development. First, input – the L2 data to which learners are exposed in aural and/or written modalities – is critically important for ISLA to take place. Without exposure to the L2, learning simply will not occur. Theories of ISLA also maintain that learners must attend, at some level, to the L2 in order to learn; this is a fundamental difference from the larger field of SLA, which maintains that the L2 may also be learned incidentally, without conscious attention. Given the limited opportunities for L2 learning in instructed contexts, there is almost always an explicit component at play. Third, for L2 learning to occur, learners must interact with and try out producing the L2. They need to do this in order to test out their internal hypotheses about how the L2 works, receive feedback (either positive, in support of their hypotheses, or negative, providing important information about what is or is not possible or preferable in the L2), and try out producing the L2 once again (Swain, 2005). In order for this iterative cycle of interaction and feedback to occur, learners need interlocutors, or communicative partners, who are at least sometimes more advanced than they are (Gurzynski-Weiss, 2020). As the purpose of language is communication, or attempts at mutual understanding between two or more individuals (also referred to as negotiation for meaning), a learner needs someone with whom they can interact, whether in person or online, or even artificial, as in a language learning app. From a sociocultural perspective, language is considered an important mediating tool, and through interaction learners also experience a higher level of mediation, which results in learning (Payant & Kim, 2019). Ideally, the aforementioned L2 opportunities also occur in meaningful and varied contexts that approximate as much as possible the variation found in L1 learning.

2. Global issues in ISLA research

2.1 Key questions and considerations in ISLA research

In this section, we highlight an intentional selection of the principal research questions in ISLA, corresponding findings, and some of the (many!) areas in need of additional empirical inquiry.

2.1.1 How are L2s learned in instructed contexts?

Predicated in this question are, of course, the questions: Is instruction beneficial for L2 learning? Does learning happen in instructed contexts? (Long, 1983). Loewen (2020) defines these as the primary questions of ISLA research, and VanPatten and Benati (2010) maintain they are central questions within the larger field of SLA. In this volume we assert that L2s can be learned in instructed contexts, and we focus on how learning opportunities can be adjusted for L2 learning. At the same time, it is important to state that L2 instruction, or any type of intentional manipulation/treatment of variables or learning conditions, does not change the route of L2 development – though it can influence the rate, the type(s) of learning that takes place, and even the amount of learning (Mackey, 1999). By route we refer to the different trajectories of a given structure, sequence, or competence; these trajectories may include developmental sequences or stages (e.g., the acquisition of ser and estar in Spanish over five developmental stages), U-shaped curves (correct use of an irregular verb ending, overgeneralization, and then a return to the correct initial use of irregular morphology, such as went, *goed, went in English), more simplistic initial learning of chunks (such as Me llamo Laura in Spanish, later unpacked to understand reflexive verbs), or 1:1 translation ('hospital' in Portuguese to the same in English). In other words, there is nothing unique about the mechanisms of learning, whether in an instructed context or a non-instructed, incidental, and more naturalistic setting. The speed of learning and the depth and breadth of proficiency in each competency, however, are influenced greatly by the variables in each instructed setting and within each learner. And as we will see in the next section and throughout the volume, individual ISLA theories determine the variables of interest in studying how L2s are learned in instructed contexts, and the methodologies (Chapters 2, 3, 4, and 5) dictate how each variable will be approached. That being said, overall, there is agreement that L2 input in instructed settings must be present, varied, meaningful, and authentic, that learners must attend to the input in some way, and that they must have opportunities to engage with the input through practice (production).

2.1.2 What is the nature of the L2 knowledge gained in instructed contexts?

A second central question in the field of ISLA concerns the nature of the L2 knowledge that can be gained in instructed settings. Given our earlier definitions of instructed contexts as places of largely intentional learning, and of explicit learning as learning with attention (see also Leow, 2015; Loewen, 2020), it comes as little surprise that explicit knowledge is the primary type of knowledge obtained in instructed contexts. Specifically, explicit instruction tends to result in explicit knowledge, though the nuances in the types of explicit instructional manipulations and the ways explicit knowledge is measured are many (see Ellis et al., 2009; Leow, 2015). For example, explicit instruction may be operationalized as a rule-based presentation of grammar, such as the WEIRDO (wish, emotion, impersonal, request, desire, ojalá) contexts in which the Spanish subjunctive is most often used, or it may be an explicit description of how the morphological -ed marker in English is used to communicate past tense (with exceptions requiring memorization) regardless of subject. Among other options, explicit knowledge may be tested as the requirement to choose the correct conjugation of a particular verb (subjunctive or indicative mood) in a sentence, or it could be more nuanced, asking participants to correct any incorrect sentences and explain why the original was incorrect. Importantly, and as stated earlier, explicit instruction that results in explicit knowledge does not necessarily result in the ability to spontaneously produce the L2, one of the most important considerations for ISLA competencies (see Chapters 7, 8, 9, 10, 11, 12, 13, and 14).



Chapter 1.  Getting started

in instructed settings: younger L2 learners usually have considerably more oppor­ tunities for implicit L2 learning while adults, in contrast, are more accustomed to and primed to learn explicitly. Indeed, Loewen (2020) even points out that in instructed settings adult L2 learners resist more implicit techniques, requesting explicit grammar instruction and feeling unsatisfied when not given rules to apply. Given the (comparative) success of younger learners compared to older, and the lasting effects of implicit learning, we echo the call for more research on implicit learning in the field of ISLA.

2.1.3 How do variables related to instructional contexts influence the L2 learning process?

Instructional contexts have been noticeably expanded in the ISLA literature over the last few decades. Historically, one of the most popular distinctions among instructional contexts was that between foreign language learning (those learning the target language in their home countries, or where it is not an official language of the country where they study the target language) and second language learning (those learning the target language where it is used as an official language). However, this distinction is largely disappearing to reflect the bi/multilingual reality that is the majority worldwide, with researchers, including in this volume, denoting additional languages as L2s. Heritage language learning (also referred to as heritage language acquisition) has also been increasingly examined and is differentiated from foreign and/or second language learning. According to Valdés (2000), heritage speakers are those who were raised in a home where a language other than the dominant community language was spoken. Thus heritage speakers often have close cultural bonds with speakers of their heritage language, such as family members. Heritage language learners vary in the amount of exposure they have had to the heritage language, but their connection to the language makes the language learning process different from that of non-heritage language learners. Heritage language learners may have some similarities to other native speakers of that language, such as exposure to the language at an early age in a naturalistic setting. But in other ways, heritage language learners are similar to adult L2 learners of the language – for example, they may have limited access to input or may exhibit transfer errors from their dominant language. Because of heritage language learners' unique language repertoire and background, researchers have begun to compare bilingual heritage language and L2 learners in classrooms (e.g., Bowles et al., 2014), and book-length discussions of heritage language education are available (e.g., Kagan et al., 2017).
Two additional contexts have become central in L2 research. As college study abroad programs have been on the rise, the research domain of study abroad has grown noticeably, as seen in the recent development of new journals (e.g., Study Abroad Research in Second Language Acquisition and International Education) and a new strand on study abroad at professional conferences such as the American Association for Applied Linguistics. Additionally, mobile applications such as Duolingo (Loewen et al., 2019), Babbel (Loewen et al., 2020), and TalkAbroad (Kessler et al., 2020) are becoming a common context of ISLA studies.

2.1.4 How do individual differences play a role in instructed L2 learning?

IDs are the characteristics that all humans possess, and which we use to differentiate and compare across individuals and groups; we will unpack IDs in detail in Chapter 6. While, at the beginning of the larger field of SLA, researchers focused on which characteristics "good" language learners possessed (Rubin, 1975), we have thankfully evolved, recognizing that no single ID or set of IDs is necessarily more or less predictive of L2 learning success. Rather, IDs are data points that assist researchers (and ideally also learners and teachers) in understanding how L2 learning occurs and what can be done to maximize L2 learning opportunities. They also assist us in understanding that certain actions may be differentially beneficial for specific learners or groups. Traditionally, research has focused on learner IDs (Dörnyei & Ryan, 2015) and often on a single ID at a given time (e.g., L2 motivation, L2 anxiety, or working memory). More recently, however, in keeping with the dynamic turn in ISLA (de Bot et al., 2007), the IDs of interlocutors including teachers (Gurzynski-Weiss, 2017; Long & Geeslin, 2020; Nakatsukasa, 2017; Ziegler & Smith, 2017), host families (Serafini, 2020), cultural tutors (Serafini, 2020), and non-present interlocutors (Back, 2020) have been considered. There is also convincing evidence that many IDs are dynamic – changing over time – and that IDs interact with each other (see volume-length discussion in Gurzynski-Weiss, 2020). For example, there is evidence that working memory plays less of a role in L2 learning as proficiency increases (Serafini & Sanz, 2016). Further complicating the picture is the fact that IDs often interact with context-related variables. Take, for example, the variable of age of onset. If we compare L2 learners whose age of onset was early (e.g., preschool context) to those whose age of onset was within adulthood (e.g., secondary school or college context), it is likely that, in addition to age of onset, context-related variables such as amount of interaction, opportunities for explicit versus implicit meaningful presentation and practice, and exposure to rich and varied input differ greatly. Finally, and related to this interaction of IDs with context, the target L2 and the larger societal value and role of that language – including language policies and potential marginalization of the language and/or L1 speakers of the language – outside of the classroom can impact learners' current L2 identity and ideal self.



2.1.5 What do SLA theories and research say about the effectiveness of L2 instruction? What instructional techniques are most likely to facilitate ISLA?

VanPatten et al. (2020) introduce 10 common observations that need to be explained by theories in SLA, and one that is closely related to ISLA is observation 10: "There are limits on the effects of instruction on L2 acquisition" (p. 12). A total of 10 theories are introduced in the edited volume, and each theory presents a different perspective on the role of instruction. Some theories do not support the effectiveness of instruction and take a pessimistic view of instruction in L2 acquisition. For instance, writing on usage-based approaches to L2 acquisition, Ellis and Wulff (2020) state that attention to language through explicit teaching may facilitate learners' noticing of language forms; however, they claim that "explicit knowledge about language is of a different stuff from that of the implicit representational systems" (p. 77) and that instruction does not always guarantee language acquisition. On the other hand, there are theories that support the benefits of instruction, such as the Interaction Approach, Sociocultural Theory, Skill Acquisition Theory, and Input Processing Theory (Loewen, 2020). For instance, from interactionist perspectives, numerous research studies have suggested that different aspects of interactional features (oral corrective feedback, peer interaction, alignment; see Gass & Mackey, 2020, for review) are beneficial for language learning. Students' mediation processes, whether individual (private speech, inner speech) or during interaction (collaborative dialogue), have been found to facilitate L2 learning as well (Kim, 2008).
Considering that L2 instruction is often conducted in classroom contexts, one can assume that instruction is expected to have some impact on L2 learning. What instructional techniques, then, are most effective for facilitating ISLA? Over the last few decades, a large number of empirical studies have been conducted with the goal of identifying optimal instructional methods to facilitate L2 learning. Meta-analyses of such research have found that instruction can result in positive learning outcomes. In particular, Norris and Ortega's (2000) meta-analysis of 49 ISLA research studies found that explicit instruction was effective and that the learning was sustainable. Taguchi's (2015) review paper on pragmatics instruction also found benefits for explicit instruction. Shintani et al. (2013) compared comprehension-based and production-based grammar instruction in their meta-analysis of 35 studies. They found that both types of instruction had benefits for both receptive and productive knowledge. Such meta-analyses have offered insightful syntheses of previous ISLA research. Individual studies, however, remain important for considering the interactions among context, IDs, and linguistic features (to name a few) that also mediate the effectiveness of certain instructional techniques.

2.2 Ultimate aims of scientific research in general and in ISLA

There are three ultimate aims of scientific research in general, and ISLA is no exception. You will notice below, when we delineate each part of scientific study, that all three components – theory, research, and applications – are present in the basic outline of all ISLA studies.

2.2.1 Theoretical aims

First, scientific research is grounded in theory. Whether it be a theory, model, framework, or approach (Mitchell et al., 2019; VanPatten & Williams, 2015), all decisions made in ISLA research (design, execution, analysis, interpretation, etc.) must be based on sound theoretical assumptions. For more quantitative approaches (see Chapter 2), decisions in methodological design are also motivated by in-progress hypothesis testing. In more qualitative approaches (see Chapter 3), choices are motivated by both theoretical and contextual relevance – what is important for the particular situation at hand. In other words, all decisions in empirical research tie back to what we believe (and what has been empirically found) to impact ISLA. In utilizing quantitative research methods, we strive to provide supportive or contrary evidence to theoretical ideas and existing research on how L2s are learned. So, for example, we would research the type of input that is more effective for drawing learners' attention to a specific feature of language, as both input and attention are believed to be critically necessary for L2 learning to take place; we would not conduct a research study comparing the efficacy of two different font colors in a given task, as that does not have any theoretical tie-back (at least not yet; one never knows!). We conduct qualitative research in order to understand the multifaceted nature of how L2 learning occurs, taking into account micro (e.g., individual learner or specific classroom), meso (e.g., school), and macro (e.g., larger community) factors (Douglas Fir Group, 2016, as cited in De Costa et al., 2019); we would not study a learner's classroom interaction patterns without also considering (at minimum) their IDs and the immediate learning context.

2.2.2 Empirical aims

Second, ISLA seeks to provide a robust body of empirical evidence that increases our understanding of how L2 development occurs. For quantitative research, empirical studies test the theories of how L2s are acquired in instructed contexts and what manipulations are most effective and meaningful for the specific context and learners at hand. By robust we mean that the strongest, most validated, and most appropriate steps have been taken to conduct research whose results are meaningful for the specific study and have the potential to inform and impact the larger field. Most quantitative research is conducted top-down, meaning that we start from a hypothesis and make decisions accordingly based on the existing literature. Most qualitative research, in contrast, is realized in a bottom-up way, meaning that researchers start from the data and create coding schemata, and at times even research questions, based on what the data reveal. For qualitative research, empirical studies seek to provide exceptional detail of a specific phenomenon or phenomena in a given context. This second component, empirical research, is where we will concentrate much of this volume.

2.2.3 Applied aims

Finally, ISLA, as with all fields of scientific inquiry, seeks to be useful in application beyond the study itself; specifically, ISLA seeks to be useful for L2 pedagogy. The extent to which research reflects real-life situations is called ecological validity; the goal in ISLA research is that our work is as ecologically valid for L2 instructional contexts as possible. As our goals are to uncover how L2 learning takes place in an instructed context and what can be done to maximize this learning, the logical next step is to share this information with L2 teachers and teacher trainers. The extent to which we hope a specific study is generalizable, or able to inform contexts beyond that of the current study, depends on the type of methodology and the study design. In general, quantitative studies are more concerned with generalizability of the findings; here, we will find attempts to choose representative participants and randomized samples, etc. (see Chapters 2 and 4). In contrast, qualitative studies are focused within the study, on accurately and robustly examining the specific context at hand in much more detail (see Chapters 3 and 4). Regardless of the methodological approach taken, all ISLA research seeks to be applicable to pedagogical contexts. Current practices in the field, and ideas for how such practices can be improved, most importantly by collaborating with L2 practitioners, are detailed in Chapter 15.

2.3 State of the field in ISLA research methods

The majority of ISLA research methods are adopted from research in second language acquisition studies at large (Mackey & Gass, 2022). Traditionally, ISLA research has adopted a quasi-experimental pretest-posttest research design because it is mainly concerned with the L2 learning outcomes of different instructional conditions. However, such a research design has been criticized as being a "black box" study because how learners engage with given instructional materials is not known. Therefore, over the last two decades, studies which examine both process and learning outcome have been on the rise (e.g., Kim et al., in press). In addition to learning outcome data, process elicitation procedures such as think-aloud protocols, stimulated recall, and eye-tracking have been increasingly used to gain insights into learning processes. ISLA researchers are interested in examining learners' processing of the L2 in real time (via methods such as eye-tracking, neuroimaging data, think-aloud protocols, interaction data, etc.) as well as eliciting production data that can give us insights into where learners are in their L2 development (with respect to a specific competency or skill, for example) at a specific moment in time.
Another significant development in the field is measuring learning from multi-dimensional perspectives. As discussed above, various knowledge types have been examined in ISLA research (e.g., declarative knowledge, procedural knowledge, explicit knowledge, implicit knowledge), and researchers have developed different types of measures accordingly. Beyond testing outcome data, other indications of learning, such as noticing, collaborative dialogue, and awareness, provide further insights. In the current volume, several chapters discuss measurement (see the volume matrix, p. xvii).
This volume highlights current trends in ISLA research methods. Throughout, there is an emphasis on methods that permit and uncover the potential dynamicity of variables (for quantitatively-focused studies) and the greater picture within and beyond the classroom (for qualitatively-focused work). Over the last two decades, mixed methods have been increasingly adopted. Furthermore, replication studies are widely encouraged. In the second section of this volume, each chapter introduces a different research approach.

3. Conceptualizing your study

3.1 Connecting to and building off prior research

To maximize the potential impact of your study, it is important to connect to and build off prior research. First, your work must be grounded in disciplinary discussions, both theoretical and methodological. As mentioned earlier, although research ideas are often motivated by pedagogy- or practice-oriented observations, a project also needs to be motivated by current ISLA theory. For example, let's say that a writing teacher is interested in learning about different strategies for offering effective written corrective feedback. The strategies researchers have adopted in prior ISLA research, and how their effectiveness differs, need to be reviewed and explored before designing a new empirical study.
In addition to grounding your study in current ISLA theory and corresponding research methods (see all remaining chapters in this volume), you must also investigate all, or at least a robust representative sample, of empirical work relating to your research interest. For more quantitative approaches, meta-analyses, which statistically examine trends in a given research area (such as the effectiveness of oral feedback; see Plonsky & Brown, 2015), are particularly helpful. For ISLA research whose findings vary depending on settings, meta-analysis results may provide an overall picture based on the systematic synthesis of previous research. In more qualitative approaches, narrative syntheses, which compare and discuss trends in topics (see, for example, Gurzynski-Weiss & Plonsky, 2017, which details which interlocutors and which individual differences have been investigated thus far in the cognitive-interactionist approach) or methods (see De Costa et al., 2019, which reviews classroom-based qualitative research methods), provide a starting point. Individual empirical studies, especially replication studies that have zeroed in on improving specific measures, treatments, or analyses, or expanding them to a different context or population of participants (see Chapter 5), are also necessary to understand what has been discovered and what the motivated next steps in a certain area of research are. It is much more impactful for your study to take one small, well-motivated step forward than to start from an untethered space where the potential contribution is unclear or limited at best. We especially encourage novice researchers to replicate previous research.
Your study must also be grounded in the relevant ISLA research methodology literature (see all remaining chapters in this volume). We need to utilize instruments that have been found to be both valid (accurately measuring what we hope to measure) and reliable (consistently measuring the variable in the same way across participants and contexts) and that are also the most robust options for the theoretical framework in which the study is conducted. For example, if we are conducting a study examining how learners are influenced by interlocutors within a classroom from the perspective of Sociocultural Theory, we would need to consider the potential influence of all individuals in relation to a given learner – prior, imagined, or future, within and beyond the classroom (Back, 2020; Lantolf, 2020) – and how these individuals ultimately influence that learner's L2 developmental trajectory, including their identity in the L2.
The sample size of your participant population is another central concern, especially when designing a novel quantitative empirical study. Plonsky (2015) and Larson-Hall (2015) suggest conducting an a priori power analysis to determine the sample size needed. There are several options online for calculating the number of participants needed to detect a certain effect size, which Plonsky (2015) recommends taking from recent meta-analyses or similar studies on the same topic. When reaching the recommended number of participants in a quantitative study is not possible, it is best to eliminate the quest for statistical significance and instead look to descriptives and effect sizes to tell the story based on your data (see Plonsky, 2015, and Larson-Hall, 2015, as well as Chapter 2 of the current volume, for more on this). Additional steps to increase the strength of a study include reducing the number of divisions (i.e., comparing two instead of four groups) and bootstrapping (Plonsky et al., 2015).
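To make the sample-size step concrete, the sketch below shows an a priori power analysis and a simple bootstrap in Python (our illustration language here; R or the online calculators mentioned above work equally well). It is a minimal sketch, assuming the statsmodels and numpy libraries are installed, and every number in it is a hypothetical placeholder: the effect size in particular should come from recent meta-analyses or similar studies on your own topic, as recommended above.

# A priori power analysis for a two-group comparison,
# simplified here to an independent-samples t-test.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.5,         # Cohen's d; replace with a meta-analytic estimate
    alpha=0.05,              # conventional Type I error rate
    power=0.80,              # conventional target power
    alternative="two-sided",
)
print(f"Participants needed per group: {n_per_group:.1f}")  # about 64

# If that target is unreachable, report descriptives and effect sizes, and
# consider bootstrapping the group difference (Plonsky et al., 2015):
import numpy as np

rng = np.random.default_rng(seed=1)
treatment = np.array([78, 85, 69, 91, 74, 88, 80, 67, 83, 76])   # hypothetical scores
comparison = np.array([70, 72, 65, 81, 68, 74, 77, 63, 71, 69])  # hypothetical scores

# Resample each group with replacement and recompute the mean difference
boot_diffs = [
    rng.choice(treatment, treatment.size).mean()
    - rng.choice(comparison, comparison.size).mean()
    for _ in range(5000)
]
low, high = np.percentile(boot_diffs, [2.5, 97.5])
print(f"Mean difference: {treatment.mean() - comparison.mean():.1f}, "
      f"95% bootstrap CI: [{low:.1f}, {high:.1f}]")

Under these conventional settings (d = 0.5, alpha = .05, power = .80, two-tailed), roughly 64 participants per group are needed; the bootstrap portion illustrates the kind of descriptive, effect-size-oriented reporting recommended when that recruitment target is out of reach.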

In addition to the size of your sample, you also must consider whether your sample is representative of the population of interest. Or, perhaps more appropriately said for ISLA researchers, to which educational context(s) and population(s) can your results, given an appropriate sample size for your quantitative method, be generalized? For qualitative studies, researchers must justify how participants were selected and describe in detail who the participants are and what the context of instruction and study is, to allow for any potential interpretations for the current, or expanded, populations.
We also encourage you to think about how the project at hand fits into the bigger picture of who you are as a researcher. In other words, how does this project advance you as a scholar? As each research study requires significant time and energy, you want to select a topic that is personally motivating to you and that either adds to your skill set (providing you with the opportunity to learn or solidify grounded theory, for example) or to your research agenda (such as providing an applicable branch to your otherwise theoretical agenda); ideally, each project does both. Even in graduate seminars that require original research projects, we encourage our students to choose a topic that is meaningful to their professional trajectory in the immediate (meeting the course requirements, providing the opportunity to learn something new or improve) and long-term (enhancing one's research agenda and marketability) future.
Finally, we encourage you to pilot a reduced version of your study before collecting the data you plan to use for your full study. A pilot is a practice round and is used to try out and ensure that what you have planned for your empirical study will in fact produce viable data. In a pilot, you will want to try out as close a version of the planned study as possible to identify as many issues as you can.

Chapter 1.  Getting started 17



revise them as needed for the full study. We also encourage you to consider the audience of your research and to have a representative colleague or two weigh-in on your design and the data you will be sharing with the field at this earlier stage. To summarize, when designing a research study: (1) Find a topic that is theoret­ ically and methodologically grounded in current trends; (2) read as much as you can to find out what a logical next step is and replicate or extend the scholarship accordingly; and (3) pilot prior to the full study. How does one build this level of awareness, identify trends and gaps, and ­expand ISLA knowledge in modest, measurable, and potentially impactful direc­ tions? The answer is engagement. Not just a catchy term in all areas of education and L2 learning (see Hiver et al., 2020); it applies to your efforts as a burgeoning scholar, as well. We recommend that you: – Subscribe to journals’ Table of Content alerts, set alerts on GoogleScholar for specific authors and/or keywords, and/or follow researchers’ work on ResearchGate, Academia, and professional social media – Attend conferences and webinars that relate to your research interests (see Chapter 15) – Document your research ideas and follow-up questions that come from your reading in a way that honors your personality and way of processing information 3.2

Designing your study: Questions to ask yourself

As you develop the design for your study, here is a list of questions to consider and discuss prior to seeking permission, piloting, and conducting the full study.

(1) Authorship: Are you designing and conducting this study on your own? Under the guidance of a faculty advisor? With a fellow teacher? Or in collaboration with another person or several people? What role will each person play in the design, execution, and write-up of this study? What can each person uniquely contribute and get out of the experience of this study? What are the specific tasks each person is responsible for? What are the deadlines of each task? What is the expected authorship, and how will this change if tasks are not completed by the explicit deadline set by the research team?
(2) Theoretical and methodological grounding: How is your study informed by current theoretical and methodological discussions in the field? Are all decisions (participants, instruments, coding schemes or anticipated approaches to coding and analyses, etc.) theoretically, empirically, and methodologically motivated?
(3) Relation with prior research: How does your study relate to and expand prior research within and outside of ISLA?
(4) Research questions: How can you craft research questions that are specific, measurable, and can offer novel insight into a particular branch or between branches of ISLA research? How are you operationalizing all terms in your research questions, and what data will you elicit to answer these questions? Alternatively, if you plan to have research questions emerge from your dataset, how will you go about this process?
(5) Participants: Who will your participant(s) be? Why are they the most appropriate participants for this study? How can you measure all potentially relevant background variables1 of all participating in the study, including you as researcher and, when applicable, participants in the surrounding context(s)? How will you recruit your participants? What does your participant pool mean for potential generalizability (if a goal for your study)? How many participants do you need? If it's a quantitative study where statistics are needed, does your study have enough power to detect the effect you anticipate? What sampling procedures are you going to employ? Are they appropriate considering your study context and the research questions you are investigating?
(6) Data collection: What data do you need to answer your research questions? How will you collect the data? Are you going to collect the data as a part of regular courses? Who will do this and where will it happen? Will you use prior instruments as-is, edit existing instruments,2 or create your own? How will you validate your instrument(s) or report prior validation in your context? If you are employing any kind of measurement instruments in your study, will they produce reliable results? What reliability estimate should you report (see the internal-consistency sketch following this list)? How can you reduce or at least account for your presence as a researcher in these data collection procedures?
(7) Saving and anonymizing data: How will you assign participants code numbers or pseudonyms and save anonymized data separately from identifiable information (see the pseudonymization sketch following this list)? Who will have access to the anonymized data and/or to the identifiable information?
(8) Data analysis: How will you analyze the data you collect? Will you have predetermined coding, or will your coding schemata arise from the data itself? What resources can you enlist for guidance (statistical centers on campus, books, websites, online groups)? Who else will code your data? How will you compare your coding and ensure it is similar enough (see the agreement sketch following this list)? If using statistical analysis, what assumptions are there for the test(s)? What adjustments (alternative statistics if you violate assumptions, or normative adjustments to the dataset) will need to be made based on the sample you expect? How will you make these adjustments? How will you deal with missing data (whether random, systematic, or planned)?



(9) Pilot: What are the most novel parts of your study in need of the most trial? What do you need to learn from your pilot study? How can you design a pilot that provides the closest version of your full study without influencing or reducing the full study (particularly in terms of eligible participants)?
(10) Potential impact of your study: In an ideal scenario, what will your study offer the field? If you do not find what you anticipate, how does your data contribute to the field? Have you consulted with all relevant stakeholders to ensure the study as designed has the potential for impact in the ways you are intending?
(11) Limitations: What potential limitations are there in your design? How can you eliminate as many as possible? Are the limitations that remain necessary and/or the lesser of the limitations for collecting this data?
(12) Project timeline: What is your timeline for each step of this project? Have you factored into the timeline the pilot study as well as the submission and waiting process for the approval from your Ethics Committee? What is your end goal for sharing this work? Does this align with your timeline?
(13) Record of decision-making: Where will you keep a record of all of your decisions made in the study so you have a coherent narrative to look back on when writing it up to share?

1. Individual differences (IDs) are discussed in detail in Chapter 6.
2. We recommend researchers visit IRIS-database.org, which houses instruments published in SLA studies, for inspiration. See also the Task Bank, tblt.indiana.edu, for L2 tasks used in classrooms and task-based research.
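To illustrate the reliability question in (6): internal consistency of a multi-item instrument (e.g., an anxiety questionnaire) is commonly reported as Cronbach's alpha. The following is a minimal sketch rather than a full reliability analysis; Python with numpy is assumed, and the Likert-scale responses are invented for illustration.

# Cronbach's alpha for a multi-item questionnaire.
# Rows = participants, columns = items; all data are hypothetical.
import numpy as np

responses = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [4, 4, 4, 5],
])

k = responses.shape[1]                         # number of items
item_vars = responses.var(axis=0, ddof=1)      # variance of each item
total_var = responses.sum(axis=1).var(ddof=1)  # variance of participants' total scores

alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha: {alpha:.2f}")  # about .94 for these invented data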
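For question (7), one common pattern for separating identifiable information from research data is to assign each participant an arbitrary code, keep the key linking codes to names in a separate, access-restricted location, and store only coded records with the research data. The sketch below uses only the Python standard library; the file names, fields, and roster are hypothetical assumptions, and your Ethics Committee's data-management requirements take precedence.

# Minimal pseudonymization sketch (Python standard library only).
import csv
import secrets

participants = ["Ana García", "Jin Park", "Maya Osei"]  # hypothetical roster

# 1. Assign each participant an arbitrary, non-guessable code.
key = {name: f"P{secrets.token_hex(3)}" for name in participants}

# 2. The key file lives in restricted storage, apart from analysis data.
with open("identifiable_key_RESTRICTED.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "code"])
    writer.writerows(key.items())

# 3. Only coded records are saved with the research data.
scores = {"Ana García": 42, "Jin Park": 37, "Maya Osei": 45}  # hypothetical
with open("anonymized_scores.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["code", "score"])
    for name, score in scores.items():
        writer.writerow([key[name], score])

Keeping the key and the coded data in separate, differently permissioned locations means that a breach of the analysis files alone does not expose participant identities.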
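And for question (8), when a second researcher codes a sample of your data, agreement between coders is typically reported as simple percent agreement and/or Cohen's kappa, which corrects for chance agreement. A minimal sketch follows; the feedback-type labels and codings are invented for illustration.

# Inter-coder agreement: percent agreement and Cohen's kappa.
from collections import Counter

coder1 = ["recast", "recast", "clarification", "explicit", "recast",
          "explicit", "clarification", "recast", "explicit", "recast"]
coder2 = ["recast", "explicit", "clarification", "explicit", "recast",
          "explicit", "recast", "recast", "explicit", "recast"]

n = len(coder1)
observed = sum(a == b for a, b in zip(coder1, coder2)) / n

# Expected chance agreement from each coder's marginal proportions
c1, c2 = Counter(coder1), Counter(coder2)
expected = sum((c1[cat] / n) * (c2[cat] / n) for cat in set(coder1) | set(coder2))

kappa = (observed - expected) / (1 - expected)
print(f"Percent agreement: {observed:.0%}, Cohen's kappa: {kappa:.2f}")

For these invented codes, observed agreement is 80% and kappa is approximately .67; what counts as "similar enough" is a judgment call that you should make, and justify, in advance.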

4. Research with human subjects

If your study involves people, and most ISLA studies do, you will need to seek approval from your institution's Ethics Committee.3 The Ethics Committee is a governing body that oversees that any research involving human participants is conducted ethically, responsibly, with minimal and explicitly expressed risk to participants, and with full participant consent. The committee is also responsible for ensuring that researchers follow up on studies as they progress, report any adverse and unforeseen risks, etc. At your institution, you will need to find the Ethics Committee, go through training on conducting research with human subjects,4 identify the correct person to go to with questions (while there is general contact information, it is common practice for a given department or branch of research to have an ethics contact with whom they work closely; ask your colleagues), and then begin working on submitting your protocol, or outline of your proposed study. You will need to have ethics approval before you recruit participants for your pilot and full study. The two types of ethics committee protocols you will usually choose between for ISLA research are exempt and expedited studies, which we detail in the next sections.

3. In the United States this is commonly referred to as the Institutional Review Board.
4. You will most likely also complete something akin to a conflict of interest statement with your institution, declaring any additional consulting or employment outside of your regular teaching or research contracts.

with human subjects,4 identify the correct person to go to with questions (while there is a general contact information, it is common practice for a given department or branch of research to have an ethics contact with whom they work closely; ask your colleagues), and then begin working on submitting your protocol, or outline of your proposed study. You will need to have ethics approval before you recruit participants for your pilot and full study. The two types of ethics committee proto­ cols you will usually choose between for ISLA research are exempt and expedited studies, which we detail in the next sections. 4.1

Exempt research studies

Exempt research studies are reviewed by one Human Subjects Office or ethics staff member. For a study to be considered exempt, the proposal must contain no more than minimal risk for participants and fall into one or more of specific exempt categories. We selected the most common exempt categories for ISLA research (Indiana University Human Resource Protection Program, 2021a): – Category 1: Research conducted in established or commonly accepted educa­ tional settings – Category 2: Research that involves observations, tests, or surveys – Category 3: Research using brief behavioral interventions that do not cause harm, embarrassment, or offense to participants An example study in Category 1 would be an ISLA study that compares two types of instructional interventions, such as implicit and explicit instruction. A study falling under Category 2 could include a design that has learners completing an anxiety questionnaire before, halfway through, and after completing an information-gap task with an interlocutor. And finally, in Category 3 we could find a study that ­requires learners to repeat the feedback given to them orally while interacting with an interlocutor in an online chat. 4.2

Expedited research studies

Expedited studies are reviewed by one or more ethics reviewers, have minimal risk for participants, and fall into one or more specific categories (again, we r­ eproduced those most common for ISLA research; there are others) (Indiana University Human Resource Protection Program, 2021b): 4. You will most likely also complete something akin to a conflict of interest statement with your institution, declaring any additional consulting or employment outside of your regular teaching or research contracts.

Chapter 1.  Getting started 21



– Category 5: Research using materials (documents, data, records, etc.) that have been or will be collected only for non-research purposes – Category 6: Data from voice, video, digital or image recordings made for re­ search purposes – Category 7: Research involving individual or group characteristics (­including research on perception, cognition, motivation, language, communication, cultural beliefs or practices, and social behavior). Research using surveys, ­interviews, oral history, focus groups, program or human factors evaluation, or quality assurance methodologies. A study classified as Category 5 could be reanalyzing anonymized student work collected earlier. For Category 6, we could expect to see a study that has instructors complete a task on a computer while wearing eye-tracking equipment and thinking out loud and later completing an electronic survey. And finally, a study that would be classified as Category 7 could include a longitudinal analysis of university-level learners’ IDs, including L2 motivation, attitudes towards the L2, as well as online working memory tests, collected each semester over two years. 4.3

4.3 Preparing your protocol

You will need to have your study thoroughly designed and well thought out before completing the ethics protocol, especially with respect to participant eligibility, recruitment, participant compensation, data collection, data saving, coding, and who will have access to the participants and identifiable data. At minimum, the protocol will ask:
– what your study is about,
– what research questions your study hopes to answer, explained without jargon and with each term operationalized,
– what participants you will recruit,
– how participant consent will be obtained,
– what participants will be asked to do, including copies of these instruments,
– if, when, and how you will pay or otherwise compensate participants and what they need to do to be compensated,
– what the potential risks are of participating in the study and how risks will be minimized (a common risk in ISLA studies is the loss of confidentiality of data if someone hacks into your database, for example),
– what the potential benefits are of participating in your study, in both the short and long term,
– who the personnel will be for your study, what each person's role will be (particularly who will be interacting with the participants and/or have access to identifiable data), and assurance that they have been trained in human subject research as well as in the protocols within your study, and finally
– the anticipated timeline for completion of your study.

4.4 Research that supports DEIA principles

Finally, prior to submitting your study to the Ethics Committee, and most definitely before piloting or contacting potential participants, we encourage you to examine your design and adjust as needed to support the field's commitment to conducting research that increases diversity, equity, inclusion, and access (DEIA) to the potential benefits of your work. Just as the formerly inappropriate conduct of empirical research led to the creation of Ethics Committees, there similarly remains much work to be done to ensure that all research, studies within the field of ISLA included, has the potential to include and benefit all. We encourage you to thoroughly survey the field and not rely on others' potentially whitewashed syntheses of published material, to include researchers of diverse backgrounds, to ensure that your study participants represent all L2 learners, and to take efforts so that the potential positive impact of your study is accessible to all.

5. Conclusions

Although L2 pedagogy research has been widely conducted, the establishment of the ISLA research domain as a separate subfield of SLA research is rather new. In this chapter, we introduced the field of ISLA and the overarching goals of ISLA research, as well as key questions and considerations in ISLA research, particularly related to research with human subjects in both exempt and expedited studies. We discussed initial steps to follow when designing ISLA research and highlighted the importance of piloting and keeping an eye on ethics throughout research design and implementation. While we revisit some of these key considerations in Chapter 6 (i.e., individual differences) as well as Chapter 15 (disseminating ISLA research), this chapter has laid the foundation for the next chapters, which present the methodology options available to ISLA researchers: quantitative (Chapter 2), qualitative (Chapter 3), mixed methods (Chapter 4), and replication (Chapter 5).




6. Further reading

While this chapter provides a thorough overview of key considerations for getting started in conducting ISLA research, we encourage you to follow our advice and review additional literature on the topic. Below you will find several specific recommendations to start with in your further reading.

Introducing ISLA
Ellis, R. (2012). Language teaching research and language pedagogy. Wiley-Blackwell. https://doi.org/10.1002/9781118271643
Ellis, R., & Shintani, N. (2014). Exploring language pedagogy through second language acquisition research. Routledge.
Housen, A., & Pierrard, M. (Eds.). (2005). Investigations in instructed second language acquisition. Mouton de Gruyter. https://doi.org/10.1515/9783110197372
Leow, R. P. (2015). Explicit learning in the L2 classroom. Routledge. https://doi.org/10.4324/9781315887074
Leow, R. P., & Cerezo, L. (2016). Deconstructing the I and SLA in ISLA: One curricular approach. Studies in Second Language Learning and Teaching, 6, 43–63. https://doi.org/10.14746/ssllt.2016.6.1.3
Loewen, S. (2020). Introduction to instructed second language acquisition (2nd ed.). Routledge. https://doi.org/10.4324/9781315616797
Loewen, S., & Sato, M. (Eds.). (2017). The Routledge handbook of instructed second language acquisition. Routledge. https://doi.org/10.4324/9781315676968
Sok, S., Kang, E. Y., & Han, Z. (2019). Thirty-five years of ISLA on form-focused instruction: A methodological synthesis. Language Teaching Research, 23(4), 403–427.

Research methods in ISLA
Friedman, D. (in progress). Researching second language classrooms: Qualitative and mixed methods approaches. Routledge.
Mackey, A., & Gass, S. M. (2016). Second language research: Methodology and design. Routledge.
Mackey, A., & Gass, S. M. (Eds.). (2012). Research methods in second language acquisition: A practical guide. Wiley-Blackwell.
McKay, S. L. (2006). Researching second language classrooms. Lawrence Erlbaum. https://doi.org/10.4324/9781410617378
Plonsky, L. (2015). Advancing quantitative methods in second language research. Routledge. https://doi.org/10.4324/9781315870908
Riazi, A. M. (2017). Mixed methods research in language teaching and learning. Equinox.



Ethical considerations in ISLA research
Barnard, R., & Wang, Y. (Eds.). (2021). Research ethics in second language education: Universal principles, local practices. Routledge.
De Costa, P. (Ed.). (2016). Ethics in applied linguistics research: Language researcher narratives. Routledge.
De Costa, P., Rabie-Ahmed, A., & Cinaglia, C. (forthcoming). Ethical issues in applied linguistics scholarship. John Benjamins.
De Costa, P., Sterling, S., Lee, J., Li, W., & Rawal, H. (2021). Research tasks on ethics in applied linguistics. Language Teaching, 54(1), 58–70. https://doi.org/10.1017/S0261444820000257
Sterling, S., & De Costa, P. (2018). Ethical applied linguistics research. In A. Phakiti, P. De Costa, L. Plonsky, & S. Starfield (Eds.), The Palgrave handbook of applied linguistics research methodology (pp. 163–182). Palgrave Macmillan. https://doi.org/10.1057/978-1-137-59900-1_8
Sterling, S., & Gass, S. (2017). Exploring the boundaries of research ethics: Perceptions of ethics and ethical behaviors in applied linguistics research. System, 70, 50–62. https://doi.org/10.1016/j.system.2017.08.010

Acknowledgements

Our thanks to Andrea Révész for suggesting the second part of this question during the review process.

References Back, M. (2020). Interlocutor differences and the role of social others in a Spanish peer tutoring context. In L. Gurzynski-Weiss (Ed.), Cross-theoretical explorations of interlocutors and their individual differences (pp. 99–123). John Benjamins.  https://doi.org/10.1075/lllt.53.05bac Bowles, M., Adams, R., & Toth, P. (2014). A comparison of L2-L2 and L2-HL interactions in Spanish language classrooms. Modern Language Journal, 98(2), 497–517. https://doi.org/10.1111/modl.12086 Cheng, A. C. (2002). The effects of processing instruction on the acquisition of ser and estar. Hispania, 85(2), 308–323.  https://doi.org/10.2307/4141092 de Bot, K., Lowie, W., & Verspoor, M. (2007). A Dynamic Systems Theory approach to second language acquisition. Bilingualism Language and Cognition 10(1), 7–21. https://doi.org/10.1017/S1366728906002732 De Costa, P., Li, W., & Rawal, H. (2019). Qualitative classroom methods. In J. Schweiter & A. Benati (Eds.), The Cambridge handbook of language learning (pp. 113–136). Cambridge. https://doi.org/10.1017/9781108333603.006 DeKeyser, R. (2003). Implicit and explicit learning. In C. Doughty & M. Long (Eds.), The handbook of second language acquisition (pp. 313–348). Wiley-Blackwell. https://doi.org/10.1002/9780470756492.ch11 Douglas Fir Group. (2016). A transdisciplinary framework for SLA in a multilingual world. Modern Language Journal, 100(Supplement 2016), 19–47.  https://doi.org/10.1111/modl.12301



Chapter 1.  Getting started 25

Dörnyei, Z., & Ryan, S. (2015). The psychology of the language learner revisited. Routledge. https://doi.org/10.4324/9781315779553 Ellis, R. (1994). The study of second language acquisition. Oxford University Press. Ellis, R. (2009). Implicit and explicit learning, knowledge, and instruction. In R. Ellis, S. Loewen, C. Elder, R. Erlam, J. Philp, & H. Reinders (Eds.), Implicit and explicit knowledge in second language learning, testing, and teaching (pp. 3–25). Multilingual Matters. https://doi.org/10.21832/9781847691767-003 Ellis, R., Loewen, S., & Erlam, R. (2006). Implicit and explicit corrective feedback and the acqui­ sition of L2 grammar. Studies in Second Language Acquisition, 28(2), 339–368. https://doi.org/10.1017/S0272263106060141 Ellis, R., Loewen, S., Elder, C., Erlam, R., Philp, J., & Reinders, H. (Eds., 2009), Implicit and explicit knowledge in second language learning, testing, and teaching. Multilingual Matters. Ellis, N. C., & Wulff, S. (2020). Usage-based approaches to L2 acquisition. In B. VanPatten & J. Williams (Eds.), Theories in second language acquisition: An introduction (2nd ed.) (pp. 63–85). Routledge.  https://doi.org/10.4324/9780429503986-4 Mackey, A., & Gass, S. (2014). Interaction approaches. In B. VanPatten & J. Williams (Eds.), Theories in second language acquisition. Routledge. Gass, S. M., & Mackey, A. (2020). Input, interaction, and output in L2 acquisition. In B. VanPatten, G. D. Keating, & S. Wulff (Eds.), Theories in second language acquisition (pp. 192–222). Routledge.  https://doi.org/10.4324/9780429503986-9 Geeslin, K. (2000). A new approach to the study of the SLA of copula choice. In R. Leow & C. Sanz (Eds.), Spanish applied linguistics at the turn of the millennium (pp. 50–66). Cascadilla Press. Geeslin, K. (2003). A comparison of copula choice in advanced and native Spanish. Language Learning, 53(4), 703–764.  https://doi.org/10.1046/j.1467-9922.2003.00240.x Geeslin, K., & Long, A. Y. (2015). The development and use of the Spanish copula with adjectives by Korean-speaking learners. In I. Pérez-Jiménez, M. Leonetti, & S. Gumiel-Molina (Eds.), New perspectives on the study of ser and estar (pp. 293–324). John Benjamins. https://doi.org/10.1075/ihll.5.11gee Gurzynski-Weiss, L. (Ed.). (2017). Expanding individual difference research in the interaction approach: Investigating learners, instructors, and other interlocutors. John Benjamins. https://doi.org/10.1075/aals.16 Gurzynski-Weiss, L. (Ed.). (2020). Cross-theoretical explorations of interlocutors and their individual differences. John Benjamins.  https://doi.org/10.1075/lllt.53 Gurzynski-Weiss, L., & Plonsky, L. (2017). Look who’s interacting: A scoping review of research involving non-teacher/non-peer interlocutors. In L. Gurzynski-Weiss (Ed.), Expanding individual difference research in the interaction approach (pp. 306–324). John Benjamins. https://doi.org/10.1075/aals.16.13gur Hiver, P., Al-Hoorie, A. H., & Mercer, S. (Eds.). (2020). Student engagement in the language classroom. Multilingual Matters.  https://doi.org/10.21832/HIVER3606 IU Human Resource Protection Program. (2021a, January 19). HRPP Policy- Exempt Research. Indiana University Research. Retrieved on 2 June 2022 from https://research.iu.edu/policies/ human-subjects-irb/exempt-research.html IU Human Resource Protection Program. (2021b, January 19). HRPP Policy- IRB Review Process. Indiana University Research. Retrieved on 2 June 2022 from https://research.iu.edu/policies/ human-subjects-irb/irb-review-process.html

26 Laura Gurzynski-Weiss and YouJin Kim

Kagan, O., Carreira, M., & Chik, C. (2017). The Routledge handbook of heritage language education: From innovation to program building. Routledge.  https://doi.org/10.4324/9781315727974 Kessler, M., Loewen, S., & Trego, D. (2020). Synchronous VCMC with TalkAbroad: Exploring noticing, transcription, and learner perceptions in Spanish foreign-language pedagogy. Language Teaching Research.  https://doi.org/10.1177/1362168820954456 Kim, Y. (2008). The contribution of collaborative and individual tasks to the acquisition of L2 vocabulary. Modern Language Journal, 92(1), 114–130. https://doi.org/10.1111/j.1540-4781.2008.00690.x Kim, Y. (2012). Task complexity, learning opportunities, and Korean EFL learners’ question de­ velopment. Studies in Second Language Acquisition, 34(4), 627–658. https://doi.org/10.1017/S0272263112000368 Kim, Y., Choi, B., Yun, H., Kim, B., & Choi, S. (in press). Task repetition, synchronous writ­ ten corrective feedback, and the learning of Korean grammar: A classroom-based study. Language Teaching Research.  https://doi.org/10.1177/1362168820912354 Lantolf, J. P. (2020). I~ You> I~ Me. In L. Gurzynski-Weiss (Ed.), Cross-theoretical explorations of interlocutors and their individual differences (pp. 79–97). John Benjamins. https://doi.org/10.1075/lllt.53.04lan Larson-Hall, J. (2015). A guide to doing statistics in second language research using SPSS and R. Routledge.  https://doi.org/10.4324/9781315775661 Leow, R. P. (2000). A study of the role of awareness in foreign language behavior: Aware versus unaware learners. Studies in Second Language Acquisition, 22(4), 557–584. https://doi.org/10.1017/S0272263100004046 Leow, R. P. (2015). Explicit learning in the L2 classroom: A student-centered approach. Routledge. https://doi.org/10.4324/9781315887074 Leow, R. P. (2018). ISLA: How implicit or how explicit should it be? Theoretical, empirical, and pedagogical/curricular issues. Language Teaching Research, 23(4), 1–18. https://doi.org/10.1177/1362168818776674 Loewen, S. (2020). Introduction to instructed second language acquisition (2nd ed.). Routledge. https://doi.org/10.4324/9781315616797 Loewen, S., Crowther, D., Isbell, D. R., Kim, K. M., Maloney, J., Miller, Z. F., & Rawal, H. (2019). Mobile-assisted language learning: A Duolingo case study. ReCALL, 31(3), 293–311. https://doi.org/10.1017/S0958344019000065 Loewen, S., Isbell, D. R., & Sporn, Z. (2020). The effectiveness of app-based language instruc­ tion for developing receptive linguistic knowledge and oral communicative ability. Foreign Language Annals, 53(2), 209–233.  https://doi.org/10.1111/flan.12454 Long, M. H. (1983). Native speaker/non-native speaker conversation and the negotiation of com­ prehensible input. Applied Linguistics, 4(2), 126–141.  https://doi.org/10.1093/applin/4.2.126 Long, M. H. (1996). The role of the linguistic environment in second language acquisition. In W. C. Ritchie & T. K. Bhatia (Eds.), Handbook of second language acquisition (pp. 413–468). Academic Press. Long, A. Y., & Geeslin, K. L. (2020). Examining the role of instructor first language in class­ room-based oral input. In L. Gurzynski-Weiss (Ed.), Cross-theoretical explorations of interlocutors and their individual differences (pp. 127–157). John Benjamins. https://doi.org/10.1075/lllt.53.07lon Mackey, A. (1999). Input, interaction, and second language development: An empirical study of question formation in ESL. Studies in Second Language Acquisition, 21(4), 557–587. https://doi.org/10.1017/S0272263199004027



Chapter 1.  Getting started 27

Mackey, A., & Gass, S. M. (2022). Second language research: Methodology and design (3rd ed.). Routledge. Mitchell, R., Myles, F., & Marsden, E. (2019). Second language learning theories. Routledge. https://doi.org/10.4324/9781315617046 Nakatsukasa, K. (2017). Gender and recasts. In L. Gurzynski-Weiss (Ed.), Expanding individual difference research in the interaction approach: Investigating learners, instructors, and other interlocutors (pp. 100–119). John Benjamins.  https://doi.org/10.1075/aals.16.05nak Norris, J. M., & Ortega, L. (2000). Effectiveness of L2 instruction: A research synthesis and quan­ titative meta-analysis. Language Learning, 50(3), 417–528. https://doi.org/10.1111/0023-8333.00136 Payant, C. & Kim, Y. (2019). Impact of task modality on collaborative dialogue among plurilin­ gual learners: A classroom-based study. International Journal of Bilingual Education and Bilingualism, 22(5), 614–627.  https://doi.org/10.1080/13670050.2017.1292999 Plonsky, L. (Ed.). (2015). Advancing quantitative methods in second language research. Routledge.  https://doi.org/10.4324/9781315870908 Plonsky, L., & Brown, D. (2015). Domain definition and search techniques in meta-analyses of L2 research (Or why 18 meta-analyses of feedback have different results). Second Language Research, 31(2), 267–278.  https://doi.org/10.1177/0267658314536436 Plonsky, L., Eggbert, J., & LaFlair, G. (2015). Bootstrapping in applied linguistics: Assessing its potential using shared data. Applied Linguistics, 36(5), 591–610. Ruben, J. (1975). What the ‘good language learner’ can teach us. TESOL Quarterly, 9, 41–51. https://doi.org/10.2307/3586011 Ryan, J., & B. Lafford. (1992). Acquisition of lexical meaning in a study abroad environment: Ser and estar and the Granada experience. Hispania, 75(3), 714–722. https://doi.org/10.2307/344152 Schmitt, N., & Celce-Murcia, M. (2020). An overview of applied linguistics. In N. Schmitt & M. P. H. Rodgers (Eds.), An introduction to applied linguistics (3rd ed.) (pp. 1–15). Routledge. Serafini, E. J. (2020). The impact of learner perceptions of interlocutor IDs on learner possible selves during a short-term experience abroad. In L. Gurzynski-Weiss (Ed.), Cross-theoretical explorations of interlocutors and their individual differences (pp. 210–243). John Benjamins. https://doi.org/10.1075/lllt.53.09ser Serafini, E. J., & Sanz, C. (2016). Evidence for the decreasing impact of cognitive ability on sec­ ond language development as proficiency increases. Studies in Second Language Acquisition, 38(4), 607–646.  https://doi.org/10.1017/S0272263115000327 Shintani, N., Li, S., & Ellis, R. (2013). Comprehension-based versus production-based instruc­ tion: A meta-analysis of comparative studies. Language Learning, 63(2), 296–329. https://doi.org/10.1111/lang.12001 Sonbul, S., & Schmitt, N. (2013). Explicit and implicit lexical knowledge: Acquisition of colloca­ tions under different input conditions. Language Learning, 63(1), 121–159. https://doi.org/10.1111/j.1467-9922.2012.00730.x Spada, N., & Tomita, Y. (2010). Interactions between type of instruction and type of language feature: A meta-analysis. Language Learning, 60(2), 263–308. https://doi.org/10.1111/j.1467-9922.2010.00562.x Swain, M. (2005). The output hypothesis: Theory and research. In E. Hinkel (Ed.), Handbook of research in second language teaching and learning (pp. 495–508). Routledge.


Taguchi, N. (2015). Instructed pragmatics at a glance: Where instructional studies were, are, and should be going. Language Teaching, 48(1), 1–50. https://doi.org/10.1017/S0261444814000263
Toomer, M., & Elgort, I. (2019). The development of implicit and explicit knowledge of collocations: A conceptual replication and extension of Sonbul and Schmitt (2013). Language Learning, 69(2), 405–439. https://doi.org/10.1111/lang.12335
Valdés, G. (2000). Introduction. In L. Sandstedt (Ed.), Spanish for native speakers. Harcourt College.
VanPatten, B. (1987). Classroom learners' acquisition of ser and estar: Accounting for developmental patterns. In B. VanPatten, T. Dvorak, & J. F. Lee (Eds.), Foreign language learning: A research perspective (pp. 61–75). Newbury House.
VanPatten, B., & Benati, A. G. (2010). Key terms in second language acquisition. Bloomsbury.
VanPatten, B., & Williams, J. (2015). Introduction: The nature of theories. In B. VanPatten & J. Williams (Eds.), Theories in second language acquisition: An introduction (2nd ed., pp. 1–16). Routledge.
VanPatten, B., Williams, J., Keating, G. D., & Wulff, S. (2020). Introduction: The nature of theories. In B. VanPatten, G. D. Keating, & S. Wulff (Eds.), Theories in second language acquisition (pp. 1–18). Routledge. https://doi.org/10.4324/9780429503986-1
Williams, J. N. (2004). Implicit learning of form-meaning connections. In B. VanPatten, J. Williams, S. Rott, & M. Overstreet (Eds.), Form-meaning connections in second language acquisition (pp. 203–218). Lawrence Erlbaum Associates.
Williams, J. N. (2005). Learning without awareness. Studies in Second Language Acquisition, 27(2), 269–304. https://doi.org/10.1017/S0272263105050138
Ziegler, N., & Smith, G. (2017). Teachers' provision of feedback in L2 text-chat. In L. Gurzynski-Weiss (Ed.), Expanding individual difference research in the interaction approach: Investigating learners, instructors, and other interlocutors (pp. 255–279). John Benjamins. https://doi.org/10.1075/aals.16.11zie

Section 2

Identifying your research approach

Chapter 2

Quantitative research methods in ISLA

Shaofeng Li

Florida State University

This chapter provides a comprehensive, in-depth discussion of quantitative research methods in ISLA. It starts by informing the reader of the basic elements and principles of quantitative ISLA research and the major research questions examined. It then identifies three categories of ISLA research: experimental, correlational, and observational, followed by a discussion of the methods for evaluating the quality of quantitative research from the perspectives of construct validity, internal validity, external validity, and statistical conclusion validity. The chapter proceeds to offer advice for future researchers on how to carry out original and robust quantitative research and to propose strategies to address common methodological issues. A conclusion is offered at the end.

Keywords: quantitative research methods, instructed second language acquisition, experimental research, correlational research, observational research

1. The fundamentals of quantitative ISLA research

1.1 Nature

Quantitative ISLA research draws on numeric data to examine the effectiveness of instructional interventions, the relationships between phenomena, and the occurrence of events or behaviors in ISLA contexts. Based on this conceptualization, quantitative research has the following characteristics. First, behaviors, processes, phenomena, events, and outcomes are measured, quantified, and reduced to numbers. Therefore, measurement is a central element of this research paradigm, and the validity of measurement tools, instruments, and procedures is essential. Second, a major objective is to make inferences about the whole population based on the current sample and to generalize the results to other settings and learners. Generalizability relies at least partly on sample size; a larger sample is a proxy for higher quality and is always strived for. Third, a defining feature is the use of statistical analysis, which provides the size of a difference, the strength of a correlation, and the probability of an event, and helps the researcher infer whether the finding is due to chance. Fourth, a quantitative study focuses on one or a small set of variables, and because the purpose is to observe a clear effect for the investigated variable(s), it is important to control or tease out the effects of other potentially confounding variables. Therefore, manipulation and control of variables are of central importance in quantitative research. Fifth, a major goal of quantitative research is to compare: individuals, groups, instruction types, learning conditions, etc. Therefore, accounting for variation and differences is key. Sixth, quantitative research aims to answer predetermined questions or test hypotheses derived from theories. Thus, the goal is straightforward and specific, and only data that are relevant to the research questions are collected, coded, and analyzed. Seventh, because the goal is to generalize the findings to other settings and samples, it is important to provide a clear conceptualization of the examined phenomenon and to implement or operationalize it with high fidelity and consistency within and beyond the context of the current study.

1.2 Variables

One fundamental concept of quantitative research is the variable, which refers to a behavior, trait, or phenomenon that varies, has values, and can be measured or operationalized. In terms of value, a variable can be categorical (e.g., simple vs. complex tasks) or continuous (e.g., learning gains represented by test scores) depending on how it varies. In terms of function in the research, a usual distinction is between independent and dependent variables, with the former referring to the causal factor whose variation is independent of other factors and the latter to the outcome that varies in response to the independent variable. In experimental ISLA research, the independent variable is normally categorical because the purpose is to examine treatment effects, which involves comparison between categories/groups/conditions/treatments. In correlational research, where causal relationships are irrelevant, the hypothetical "causal" variable in a correlation can be called the "predicting variable" (e.g., anxiety) and the hypothetical "outcome" variable the "criterion variable" or "response variable" (e.g., learning gains). They are hypothetical "causal" or "outcome" variables because a correlational design does not allow a causal relationship to be established. If no hypothetical causal relationship is assumed, such as the relationship between two individual difference variables (e.g., working memory and language aptitude) or two variables relating to L2 proficiency (e.g., grammar and vocabulary), then it is better not to employ the terms "predicting" and "criterion." In observational research, where the frequency of an event is of primary interest, researchers are often interested in whether the frequency differs between categories, in which case the categories constitute the independent variable and the frequency counts constitute the dependent variable.
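To make the distinction concrete, here is a minimal sketch (hypothetical learners and scores, not drawn from any study) of how such variables are typically laid out in a data set, with a categorical independent variable and a continuous dependent variable:

```python
# Hypothetical data layout: one row per learner, a categorical independent
# variable (task condition) and a continuous dependent variable (gain score).
import pandas as pd

data = pd.DataFrame({
    "learner":   ["L01", "L02", "L03", "L04", "L05", "L06"],
    "condition": ["simple", "simple", "simple",
                  "complex", "complex", "complex"],    # categorical IV
    "gain":      [12.5, 8.0, 10.5, 15.5, 11.0, 14.0],  # continuous DV
})

# Comparing the mean gain of each category is the starting point for the
# group comparisons discussed later in the chapter.
print(data.groupby("condition")["gain"].mean())
```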




1.3 Statistics

Statistical analysis is a defining feature of quantitative research. There are two types of statistics: descriptive and inferential. Descriptive statistics refer to indices of the sample of the current study, including measures of central tendency and measures of variability. Measures of central tendency are revealing of the commonalities of the sample and focus on the center of the data, including the mean (the average), mode (the most frequent data point), and median (the middle data point). Measures of variability refer to indices pertaining to the variation and distribution of data points, including the standard deviation (the average distance between all scores and the mean score), range (the difference between the highest and lowest scores), skewness (whether the distribution is symmetrical around the mean), and kurtosis (whether the peak is steep or flat). Inferential statistics aim to make inferences or draw conclusions about the population based on the data contributed by the current sample. Inferential statistics are primarily based on null hypothesis significance testing, which reveals whether the difference or relationship based on this sample is due to chance or whether it is statistically significant/meaningful. A null hypothesis states that there is no difference between the experimental and control groups in the population or that there is no correlation between the tested variables. A statistical test is then conducted that generates a p value that represents whether there is a difference or correlation in the population. A p value below .05 means that the probability that the result is due to chance is below five percent, and the conclusion is that the difference or correlation is significant. Inferential statistics are prone to two types of statistical errors: Type I and Type II. A Type I error occurs when a significant result is found but actually does not exist; alternatively, it means that the null hypothesis is rejected when it should not be. Type I errors may result from flawed research methods, such as when the observed effect is attributable to extraneous variables instead of the independent variable, or from repeated statistical analyses with the same sample, which increases the probability of obtaining a significant result by chance. Accordingly, to minimize Type I errors, one needs to increase methodological rigor and/or make adjustments to significance values by adopting more conservative significance values such as .01 instead of .05 or making statistical corrections such as using the Bonferroni test. Type II errors occur when one fails to find a significant result that actually exists or when one fails to reject a null hypothesis that should be rejected. The probability of making a Type II error is called beta (β), and the probability of not making a Type II error is called power, which is represented by 1 − β. The solution to Type II errors is to increase power, which in turn is enhanced by increasing the sample size (see later sections on power analysis).
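As a concrete illustration (entirely hypothetical scores; any statistics package would do), the following sketch computes the descriptive statistics listed above and runs an independent-samples t-test for null hypothesis significance testing; the last line shows the Bonferroni logic of adopting a more conservative per-test alpha when several comparisons are run on the same sample:

```python
# Descriptive and inferential statistics sketch (hypothetical test scores).
import numpy as np
from scipy import stats

experimental = np.array([78, 85, 69, 91, 74, 88, 80, 76, 83, 90])
control      = np.array([70, 72, 65, 80, 68, 74, 71, 66, 77, 73])

for name, s in [("Experimental", experimental), ("Control", control)]:
    print(f"{name}: M = {s.mean():.2f}, Mdn = {np.median(s):.1f}, "
          f"SD = {s.std(ddof=1):.2f}, range = {np.ptp(s)}, "
          f"skew = {stats.skew(s):.2f}, kurtosis = {stats.kurtosis(s):.2f}")

# Null hypothesis significance testing: is the group difference due to chance?
t, p = stats.ttest_ind(experimental, control)
print(f"t = {t:.2f}, p = {p:.4f}")   # p < .05 -> reject the null hypothesis

# Bonferroni logic for, say, three pairwise comparisons with the same sample:
print(f"Bonferroni-adjusted alpha: {.05 / 3:.4f}")
```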


1.4 Sampling

A sample is a subset of the population (i.e., all individuals with the examined characteristic), and sampling refers to the selection of a sample of the population who serve as participants of an empirical study and who contribute data that enable the researcher to answer the research questions. Sampling entails sample selection and sample size. Sample selection can be discussed along dimensions of age group, learning context, target language, and so on. Sample size is of central importance in quantitative ISLA research as it determines the generalizability of the results. The best way to determine the appropriate sample size is through power analysis, which can show the probability of finding a significant effect if it exists. For example, a power of .8 means that the probability of finding a significant effect is 80%. Power is the opposite of a Type II error, which is the failure to find a significant effect that exists. To conduct a power analysis for the purpose of determining the appropriate sample size, one needs the estimated effect size, the significance value for the effect, and the size of the requested power. The effect size (e.g., Cohen's d or r) should be based on previous research, preferably meta-analysis, which aggregates all empirical research on the topic and generates a mean effect size. If no research has been conducted on the topic, which is rare, then the researcher may refer to research on related topics in the field to get an estimated effect size for power analysis. Power analysis can also be conducted a posteriori, after the study has been conducted, in which case the purpose is to determine how much power the study has based on the obtained effect size and the current sample.
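As an illustration, an a priori power analysis can be run in a few lines. The sketch below (hypothetical values, assuming a two-group posttest comparison) uses the statsmodels power module to solve for the per-group sample size given an expected effect size of d = 0.5, alpha = .05, and power = .80, and then shows the a posteriori use of the same function:

```python
# Power analysis sketch (hypothetical values): how many learners per group
# are needed to detect d = 0.5 with alpha = .05 and power = .80?
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(f"Required sample size per group: {n_per_group:.0f}")  # roughly 64

# A posteriori: how much power did a completed study with 20 learners per
# group and an observed effect size of d = 0.5 actually have?
achieved = analysis.solve_power(effect_size=0.5, alpha=0.05, nobs1=20)
print(f"Achieved power: {achieved:.2f}")  # well below the conventional .80
```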

2. Typical research questions in quantitative ISLA studies

2.1 Which types of instructional intervention are effective or more effective in enhancing L2 learning gains, behaviors, and processes?

This category of research questions is examined by experimental studies, and the primary concern is whether a certain type of instruction is effective in enhancing L2 learning or leading to features/processes that contribute to L2 learning. L2 learning, the dependent variable, is mostly operationalized as achievements or gains measured via focused tests that target a particular linguistic structure learners received treatment on. What distinguishes this category of research questions from other categories is the systematic manipulation of the independent variable – L2 instruction type or learning condition. These studies can be further divided into micro and macro studies. Micro studies examine the impact of certain instructional features, techniques, or devices, while macro studies investigate the effects of instructional methods, which are instructional packages rather than discrete aspects or components of instruction. Some example instructional features examined in micro studies include different types of oral (recasts, metalinguistic, input-providing vs. output-prompting, etc.) and written (direct, indirect, and metalinguistic) corrective feedback; input manipulation, such as input processing, input enhancement, or input modification; modified output; and so on. Example instructional approaches or packages examined in macro studies include PPP (present, practice, production), task-based instruction, and audiolingual instruction. Both micro and macro studies have also examined the constraining factors for the (comparative) effectiveness of instructional treatments, including learner-internal factors such as working memory or language aptitude and learner-external factors such as proficiency level. Those factors can be called the moderators of treatment effects.

2.2 What teaching/learning behaviors or activities, or what relationships between teaching/learning behaviors/phenomena, can be observed?

This category of research questions concerns (1) the occurrence and variation of behaviors, events, and outcomes or (2) the relationships between them. The studies examining these research questions seek to describe the phenomena in their natural state without any instructional manipulation. The studies can be further divided into subcategories examining the following topics:

– Characteristics of classroom discourse or instructional approaches. Example topics include teacher questions, teachers' wait time before students' responses to questions, teachers' use of corrective feedback, students' noticing of corrective feedback, teachers' language use, the sequence organization of teacher-student interaction, teachers' and students' L1 use, and the focus or nature of classroom activities.
– Relationships between variables, behaviors, and phenomena, such as between L1 and L2 fluency; between teachers' stated beliefs about corrective feedback and their feedback-providing behaviors in the classroom; between teachers' use of motivating strategies and students' motivation; between vocabulary and grammar on the one hand and reading comprehension on the other; between different individual difference factors, such as between aptitude and anxiety; and between predicting and criterion variables, such as working memory and vocabulary size.
– The longitudinal (within-group) or cross-sectional (between-group) variability in L2 proficiency or other outcome measures. For example, one stream of L2 writing research is to track the trajectory of learners' writing development over a period of time. A large body of research has examined whether study-abroad learners show faster L2 development than at-home learners. In ID research, one research question is whether learners show a high or low level on a trait or disposition variable, such as whether learners have high or low writing motivation.


2.3 How can L2 proficiency, achievements, or learner traits be effectively assessed?

The third category of research questions concerns assessments of L2 proficiency, L2 achievements, or psychological constructs. Example questions relating to the assessment of L2 proficiency include how to measure implicit and explicit linguistic knowledge; what aspects of learners' speech performance determine raters' judgments of L2 speech quality; whether self-assessments are correlated with L2 proficiency; and to what extent the number of options in multiple-choice questions influences test validity. Assessments of achievements refer to tests of treatment effects, which are normally used in experimental ISLA research where learners are given a test before the treatment (pretest) and the same test after the treatment (posttest). Typically, the test measures learners' knowledge of a particular structure and includes a small number of items. Assessments of psychological constructs refer to tests or instruments (such as questionnaires) used to measure ID variables such as working memory or L2 anxiety, and ISLA research abounds in validation studies examining the validity of such tests or instruments.

3. Most common options for investigation using quantitative research methods

In general, quantitative ISLA research falls into three broad categories (Table 1): experimental, correlational, and observational. Experimental research seeks to examine causal relationships. It is characterized by consistent manipulation of one or more variables, and the purpose is to explore whether the manipulation leads to a change in learner behaviors, processes, and outcomes. Correlational research does not examine causal relationships; rather, it explores whether phenomena, behaviors, or traits are related to each other, and the most common statistical analysis is simple correlation or other analyses in the correlation family (e.g., multiple regression). Observational research describes the occurrence of events, phenomena, or behaviors, and the data typically take the form of frequency counts. It should be clarified that there are likely studies that fall outside the three categories, but these are typical designs in ISLA research, and the taxonomy provides a convenient framework to discuss ISLA research. In the following sections, I describe the three types of research in further detail.

3.1 Experimental research

3.1.1 Design

Experimental research examines causal relationships. Thus, the research questions and results can be described in causal terms such as "lead to," "influence," "result in," or "have differential effects." In a typical experimental study, learners take a pretest, receive a certain instructional treatment, and take a posttest, and then statistical analyses are conducted to determine whether the treatment led to significant learning gains. Two basic types of research design can be identified: between-group and within-group, although the two can be combined, in which case it can be called a mixed design. A between-group study involves minimally two groups or conditions, and the purpose is to explore whether there will be group differences resulting from the independent variable. A within-group study aims to examine the change that the same group of learners undergoes or the variation of the same group's performance/behaviors at different time points in response to a treatment. The within-group design is also called "repeated measures" because the same learners are tested or measured multiple times to determine whether the performance or outcome varies between the measures or tests. A mixed design study consists of multiple groups tested multiple times, and the purpose is to examine both differences between groups and differences within each group between different time points. In ISLA research, most experimental studies examining the effectiveness of instructional treatments are mixed design studies in the sense that they involve multiple treatment groups, and each group is tested before and after the treatments to determine whether the groups demonstrate different levels of improvement.

Table 1.  Taxonomy of quantitative ISLA research

Experimental
  Focus: Change
  Typical analyses: ANOVA/t-tests
  Design: Learners are divided into minimally two groups: experimental and control. They take pretests and posttests, and statistical analyses are conducted to determine whether the experimental group performs significantly better than the control group.
  Basis of analysis: Mean differences
  Example research question: Is input enhancement more effective for L2 development than no enhancement?

Correlational
  Focus: Relationship
  Typical analyses: Correlation
  Design: The same group of learners are tested on a predictor variable and a criterion variable, and correlation analysis is conducted to determine whether the two sets of scores are significantly correlated.
  Basis of analysis: Covariance
  Example research question: Is language aptitude correlated with grammar learning?

Observational
  Focus: Occurrence
  Typical analyses: Chi-square
  Design: A teaching or learning behavior, process, or phenomenon is observed in its natural state. The purpose is to see how frequently it occurs and/or whether its occurrence varies as a function of other factors.
  Basis of analysis: Frequency
  Example research question: What types of corrective feedback do teachers provide in an ESL class?

In a between-group study examining the effectiveness of an instructional treatment, the experimental group receives the treatment and the control group receives no treatment. The idea of a control group is from medical and psychological research, where patients in the experimental group receive a certain treatment (e.g., therapy or drug) while the control group do not receive the treatment or just receive a placebo treatment. The results will show whether the treatment is effective compared with no treatment. Alternatively, a study may compare the effects of different treatments, and a control group (no treatment) is normally included if it is uncertain whether the treatments are effective compared with no treatment. In ISLA research, a control group is included to ensure that the larger gains (if any) of the treatment group in comparison with the control group are due to the treatment rather than the repetition of the same test, because the control group also took the same test. However, to claim an effect for a treatment based on its comparison with a control group is often of less interest in ISLA than in medicine or psychology because spending more time studying a second language necessarily leads to a larger gain than doing nothing. Therefore, ISLA researchers need to include a comparison group that does not receive the manipulated instruction in addition to a control group, and it is necessary to name the comparison group properly and interpret the results accordingly. For example, in a study investigating the effectiveness of task-based corrective feedback, the researcher should include three groups: a group that receives feedback while performing communicative tasks, a group that only performs the tasks without receiving feedback (which can be labeled "task only" or "comparison"), and a control group that does not receive any treatment. The researcher can then compare the effects of feedback, no feedback (task only), and no treatment.

An important feature of experimental research is random group assignment, which means that each participant has an equal chance of being assigned to any participant group. The reason for random assignment is simple: we want any between-group differences to be attributable only to the independent variable, and random assignment makes it likely to achieve this. Because the assignment of group membership is random, it should be independent of the researcher's decision. It must be clarified that random assignment does not guarantee the lack of systematic differences between groups, and the researcher needs to take action if systematic differences exist, such as adjusting group membership so that the proficiency levels are comparable between groups. A study where random assignment is not conducted is called a quasi-experimental study. In this sense, classroom studies where groups were formed based on intact classes are quasi-experimental because group assignment is not random.
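As a simple illustration (hypothetical participant IDs and group labels, echoing the three-group feedback example above), random assignment can be scripted so that group membership is independent of the researcher's decision:

```python
# Random assignment sketch (hypothetical roster): shuffle participants and
# split them evenly into experimental, comparison, and control groups.
import random

random.seed(2022)  # fixed seed so the assignment is reproducible
participants = [f"L{i:02d}" for i in range(1, 31)]  # 30 hypothetical learners
random.shuffle(participants)

groups = {
    "experimental": participants[0:10],   # tasks plus corrective feedback
    "comparison":   participants[10:20],  # task only, no feedback
    "control":      participants[20:30],  # no treatment
}
for name, members in groups.items():
    print(name, members)
```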

3.1.2 Treatment

Treatment length
Treatment length has implications for treatment effects in that shorter treatments are less likely to lead to significant effects than longer treatments. There are no rules of thumb to follow for the length of treatment, and ISLA researchers rarely justify the length or duration of the treatment provided to learners (see Chapter 6). Researchers are encouraged to refer to previous research and consider factors that may affect the effects of treatments when deciding on treatment lengths, including the type and intensity of instruction and the nature of the linguistic structure.

Selection of the target structure
In a typical ISLA experimental study investigating treatment effects, a target structure is selected for the treatment. Therefore, the treatment is focused on one structure instead of multiple structures. Target structure selection involves several other decisions. One is whether to select an old structure that learners have previous knowledge about or a new structure. Targeting a new structure makes it easier to tease out treatment effects and to control ceiling effects, which refer to excessive previous knowledge learners have about the target structure that leaves little room for further improvement. A second consideration is whether learners are ready for the structure, as previous research has found that learners benefit more from L2 instruction when they are ready to learn the structure (Mackey & Philp, 1998). A third consideration is whether to select a simple or complex structure, which may have implications for which type of instruction is more effective. For example, simple structures may be amenable to explicit learning while complex structures benefit more from implicit learning (Li & DeKeyser, 2021).

Data elicitation
In studies involving L2 production (speaking or writing), eliciting the use of the target structure through production tasks may be challenging because learners may avoid the structure if they have insufficient knowledge. One way to address the issue is to use reproduction tasks, where learners perform a comprehension task (reading or listening) in which the target structure is embedded before performing a production task. If the influence of L2 input is a concern, the comprehension task can be performed in the learners' L1. Another consideration for data elicitation is the contexts of obligatory use of the target structure, which must be validated and cannot be assumed.

Instruction type
In a meta-analysis on the effectiveness of L2 instruction, Norris and Ortega (2000) distinguished three types of L2 instruction, into which most ISLA treatments fall: focus on formS, focus on form, and focus on meaning. In focus on formS, linguistic forms are addressed in isolation, and forms are not integrated with meaning; in focus on form, meaning is integrated with form; and in focus on meaning, linguistic forms are not addressed. The researchers also made a distinction between explicit and implicit instruction, depending on whether rules are taught and whether learners are encouraged to attend to linguistic forms. Nearly two decades later, Sok et al. (2019) conducted a replication analysis based on an updated database and found "a clear decrease in studies investigating FonFS (33%, compared to 65% in Norris and Ortega) and an increase in the number of studies investigating FonF instruction (68%, compared to 51% in Norris and Ortega)" (p. 424). The researchers argued that the trends are indicative of the evolution of ISLA research and the shift of focus of the field. Specific instruction types or features examined in ISLA research can be found in the section on common research questions.

3.1.3 Assessments

Experimental ISLA research aims to facilitate L2 learning, which must be measured using pretests and posttests. One essential principle of effective assessment is reliability, a measure of the consistency of learners' performance on a test. Reliability is of two types: internal reliability and test-retest reliability, with the former referring to the consistency of learners' responses to different items of the same test and the latter to the consistency of learners' performance on the same test administered on two different occasions. Reliability can be increased by including more test items, providing clear instructions, and conducting an item analysis to replace or remove items that do not seem to measure the right content, that show low discrimination (an item's correlation with total scores), or that do not have an appropriate difficulty level.

In addition to reliability, the researcher also needs to consider validity, which refers to whether a test measures what it is supposed to measure. First, the researcher needs to consider whether to use old items that appear in the treatment as test items (which measure learners' ability to recall learned knowledge); new items not involved in the treatment (which tap into the ability to apply knowledge in new contexts); or a mix of old and new items. Second, researchers are advised to use multiple test formats to capture treatment effects. For instance, explicit knowledge (which is conscious and verbalizable) is typically measured using untimed grammaticality judgment or metalinguistic knowledge tests, whereas implicit knowledge (which is intuitive and tacit) can be measured via elicited imitation, word monitoring, and self-paced reading. L2 development can also be operationalized as automatization of previous knowledge, in which case reaction time can be recorded.

3.1.4 Analysis

Because the primary goal of experimental ISLA research is to investigate differences in treatment effects, the most typical statistical analysis used is ANOVA – a statistical analysis designed to detect group differences. In a between-group design that involves more than two groups, an ANOVA can be conducted to determine whether there are significant differences between the participant groups, and if significant differences are found, pairwise comparisons are conducted using t-tests to determine which two groups are significantly different. Because multiple comparisons are conducted, which may cause Type I errors, statistical guides normally advise researchers to adjust the p value using a certain method such as Bonferroni or Tukey.




In a simple within-group design that involves one group measured multiple times, a repeated-measures ANOVA is conducted, and if significant differences between the different sets of scores are detected, paired-samples t-tests are conducted to locate the significance. In a mixed design that involves both between- and within-group comparisons, which is typical of ISLA research, normally an overall mixed ANOVA is performed, followed by ANOVAs and t-tests to locate significant differences if there are any. One common approach is to conduct a between-group ANOVA with learners' pretest scores to see whether there are significant differences in the groups' previous knowledge about the target structure. In the absence of significant differences in pretest scores, an ANOVA is conducted with learners' posttest scores, assuming that any difference in posttest scores is due to the treatment instead of differences in the groups' previous knowledge. In the case of significant group differences in pretest scores, normally gain scores (differences between pretest and posttest scores) are subjected to ANOVA to compare group means. However, difference scores lead to low reliability (Draheim et al., 2019), and a proper way to analyze the data is to use posttest scores as the outcome variable and include pretest scores as a covariate. Furthermore, pretest scores should be analyzed as a covariate regardless of group equivalency on the pretest.

A study involving more than one independent variable is called a factorial design, where "factors" refers to variables. A factorial design is described as A × B × C, where A, B, and C represent the number of levels or groups of each independent variable. Thus, a 2 × 2 × 3 design means that there are three independent variables, and that the first two variables each have two levels/conditions/groups and the third variable has three. A factorial ANOVA generates two results: main effects and interaction effects. A main effect means that the independent variable has a significant impact regardless of other variables, and an interaction effect means that the effect of one variable depends on another variable. For example, in a study examining whether the effect of input enhancement on grammar learning varies between different proficiency levels, there are two independent variables: with vs. without input enhancement and high vs. low proficiency. If a main effect is found for input enhancement, it means learners learn more with input enhancement regardless of proficiency. Likewise, if a main effect is found for proficiency, it means that learners at the higher (or lower) proficiency level learn more regardless of whether they received input enhancement. If an interaction effect is found, it means that whether input enhancement works depends on the learner's proficiency level. If a main effect is found without interaction, the main effect should be focused on. If an interaction effect is found, then main effects should be ignored, and further analysis should be focused on pairwise comparisons to locate the interaction. An interaction effect can have many scenarios, and it is up to the researcher to decide what further analysis is relevant. Figure 1 shows example scenarios of main effects and interaction.


[Figure 1 appears here: four schematic panels plotting learning outcomes for low-proficiency (LP) and high-proficiency (HP) learners with input enhancement (IE) and without (No IE); panels (a) and (b) show interaction patterns, and panels (c) and (d) show no interaction.]

Figure 1.  Interaction and Main Effects in a Factorial Design
Note. LP: low proficiency; HP: high proficiency; IE: input enhancement; No IE: no input enhancement
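To show how the analyses described above might be run in practice, here is a minimal sketch (entirely hypothetical data and column names, not from an actual study) that fits a 2 × 2 factorial ANOVA for the input enhancement example with statsmodels and then refits the model with pretest scores as a covariate, the approach recommended above in place of gain scores:

```python
# Factorial ANOVA sketch (hypothetical data): posttest scores by input
# enhancement (IE vs. No IE) and proficiency (LP vs. HP).
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "enhancement": np.repeat(["IE", "NoIE"], 40),
    "proficiency": np.tile(np.repeat(["LP", "HP"], 20), 2),
    "pretest":     rng.normal(50, 8, 80).round(),
})
df["posttest"] = df["pretest"] + rng.normal(10, 5, 80).round()

# Main effects and the interaction: C() marks the categorical factors.
model = ols("posttest ~ C(enhancement) * C(proficiency)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))

# Preferred handling of prior knowledge: pretest as a covariate (ANCOVA)
# rather than analyzing low-reliability gain scores.
ancova = ols("posttest ~ pretest + C(enhancement) * C(proficiency)",
             data=df).fit()
print(sm.stats.anova_lm(ancova, typ=2))
```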




3.2 Correlational research

Correlational research seeks to identify whether two variables are correlated; a significant correlation does not suggest a causal relationship. Three indices are important for correlation analysis:

– r, the correlation coefficient, ranges between −1 and +1 and represents the strength of a correlation
– p is the significance value, and it refers to the probability that a correlation is due to chance
– r², the squared correlation coefficient, refers to the percentage of overlap (shared variance) between the two variables

Correlations are the basis of a whole family of analyses including multiple regression, path analysis, factor analysis, and structural equation modeling. Statisticians sometimes refer to the relationship between a predictor and an outcome variable in multiple regression, path analysis, or structural equation modeling as "causal". However, it must be clarified that the relationship is not causal even though an increase in the value of a predictor may lead to an increase in the value of the outcome variable. To claim a causal relationship, the design of the study must be experimental. For example, if anxiety is found to be a significant, negative predictor of L2 proficiency, the only interpretation is that there is a negative relationship between anxiety and proficiency. To examine whether anxiety causes low proficiency, an experiment is necessary where two groups of learners are recruited who are comparable in all aspects such as learning experience, proficiency, anxiety, and so on. Then one group can be made anxious by being put on camera while receiving a certain instructional treatment, and the other group is not videotaped while receiving the same treatment (MacIntyre & Gardner, 1994).

To carry out a correlational study, one simply needs to measure the variables hypothesized to be correlated and perform correlation-type analyses to identify their relationships. Because of the uncontrolled nature of correlational research, the required sample size needs to be larger than in experimental research. The uncontrolled nature of such studies, and their freedom from the burden of implementing instructional treatments as would happen in experimental research, makes it necessary and possible to investigate more variables. In correlational research, if the sample size is large, even a weak correlation can be significant, in which case it would be more meaningful to examine the strengths rather than the significance of correlations. In a small-sample study where it is difficult to obtain significant p values, it is important to examine the strengths of correlations, and additionally it is important to observe the pattern of correlations, namely whether the correlations are in general strong or weak and negative or positive (Li & Fu, 2018).
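For illustration, the three indices above can be obtained in a few lines; the sketch below uses hypothetical anxiety and proficiency scores in which a negative association is deliberately built into the simulated data:

```python
# Correlation sketch (hypothetical data): anxiety and L2 proficiency scores.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
anxiety = rng.normal(30, 6, 60)
proficiency = 90 - 0.8 * anxiety + rng.normal(0, 5, 60)  # built-in negative link

r, p = stats.pearsonr(anxiety, proficiency)
print(f"r = {r:.2f}, p = {p:.4f}, r^2 = {r**2:.2f}")
# A significant negative r indicates a relationship only; it does not show
# that anxiety causes low proficiency. Causal claims require an experiment.
```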


One major stream of correlational ISLA research centers on ID factors in ISLA. The research foci include (1) the predictive power of ID factors for learning outcomes such as overall L2 proficiency and aspects of L2 proficiency such as L2 skills (reading, listening, speaking, and writing) and L2 knowledge (grammar, vocabulary, pronunciation, and pragmatics), (2) the relationships between ID factors, (3) the relationships between the components of the same ID factor, and (4) factors that cause variations in IDs, such as the variation of L2 motivation as a function of gender, age, instructional setting, or intervention. One stream of ID research that has gained momentum in recent ISLA research, and that spans both experimental and correlational research, is ATI (aptitude-treatment interaction) research (Li, 2017), which examines the interface between ID factors and instructional treatments. ATI research assumes that learners with different ID profiles benefit from instruction in different ways, and that the role of an ID variable depends on the cognitive demands the learning task imposes on the learner. In ATI research, learners are divided into groups and receive different types of treatments, and they are measured on certain ID variables. The data are analyzed to determine whether the same ID variable shows different relationships with different treatments or whether different ID variables show different relationships with the same treatment.

3.3 Observational research

Given the topic of the chapter, which is methods in instructed SLA, the discussion of observational research is restricted to studies focusing on observations of teaching and learning behaviors, classroom discourse, and instructional activities. One purpose of observational research is to show the status quo or pinpoint the nature of instruction, such as whether current language classes are dominated by traditional approaches or meaning-focused approaches such as communicative language teaching or task-based teaching. Another purpose is to examine the occurrence of certain instructional features, such as corrective feedback, teachers' questioning strategies, etc. A third purpose of observation is to monitor the fidelity of the instruction to the principles of the examined instructional approach or feature in experimental or correlational research. In these studies, observation is not a major focus; rather, it is used as a tool to ensure internal validity – a topic of the next section.
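Since observational data typically take the form of frequency counts, the matching analysis (per Table 1) is the chi-square test. A minimal sketch follows, using hypothetical counts of feedback types in two hypothetical class types rather than data from any published observation study:

```python
# Chi-square sketch (hypothetical counts): does the distribution of feedback
# types (recasts, prompts, explicit correction) differ between two classes?
import numpy as np
from scipy.stats import chi2_contingency

#                  recasts  prompts  explicit
counts = np.array([[46,      21,      13],    # communicative class
                   [25,      34,      28]])   # form-focused class

chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.4f}")
# A significant result suggests that feedback type varies as a function of
# class type in these (simulated) observations.
```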

3.4 Study quality

Study quality is of crucial importance for ISLA research, but criteria for study quality are rarely addressed systematically in methodological guides in ISLA. In this chapter, study quality is discussed by following a framework proposed by Cooper (2010), which consists of construct validity, internal validity, external validity, and statistical conclusion validity.

3.4.1 Construct validity

A construct refers to the concept underlying a certain phenomenon in L2 instruction or learning. A construct is represented by concrete behaviors, events, or procedures, which can serve as independent or dependent variables in an empirical study. The process of transforming a concept into its manifestations – behaviors, events, or procedures – is called operationalization. For example, in task-based research, the linguistic complexity of task performance is a concept that has been operationalized in many ways, such as subordination, sentence length, use of past forms, etc. Construct validity refers to the extent to which the operationalization of an investigated treatment, variable, or feature is supported by theory and/or evidence. To ensure construct validity, the construct must be defined and delineated with no ambiguity, and clear guidelines must be formulated to pinpoint the nature and scope of the concept. Unclear conceptualization leads to inconsistent operationalizations of the construct in the research, and this is evident in the research on form-focused instruction. Low construct validity may also result from unfaithful operationalization, in which case there is clear conceptualization but the way it is operationalized deviates from the conceptualization. In the above cases, construct validity is vetted via the degree of alignment between an operationalization and the relevant theory. In some cases, in addition to theoretical vetting, empirical evidence must be collected to verify construct validity. One example is task complexity, which can be operationalized as with or without contextual support, fewer or more elements, and fewer or more reasoning demands. In all these distinctions, the former represents a simple task and the latter a complex task. However, whether a simple task is indeed simple and a complex task is indeed complex cannot be assumed and must be verified empirically (Révész, 2014).

3.4.2 Internal validity

Internal validity refers to whether the outcome can be attributed to the treatment, intervention, or hypothesized causal factor, which can be collectively referred to as the independent variable of a study. In practical terms, this refers to whether group differences, if any, are due to the presence or absence of the treatment or manipulated feature. Internal validity is essentially a matter of methodological soundness; in other words, studies that use flawed methods normally have low internal validity. To ensure internal validity, it is crucial to maintain comparability between groups or conditions other than the independent variable and to control for or minimize the influence of other variables. Factors that influence internal validity are called extraneous variables, which are discussed as follows.


(1) Sample characteristics
a. Previous knowledge. In experimental research that examines the effects of an instructional treatment, learners' previous knowledge needs to be tested or controlled to ensure that treatment effects are due to the treatment rather than differences in learners' previous knowledge or initial proficiency.
b. Random assignment. One way to ensure comparability between groups is to assign learners randomly to groups so that the groups will not differ systematically. However, random assignment based on small samples is insufficient for controlling extraneous variables.
c. ID factors. To maximize the attributability of treatment effects to the treatment or independent variable, the researcher may measure learners' individual affective (e.g., anxiety), conative (e.g., motivation), cognitive (e.g., working memory), or sociodemographic differences and include them as covariates.
(2) Control group. Including a control or comparison group enables the researcher to attribute group differences to the treatment or independent variable instead of test-retest effects.
(3) Comparability of treatments. To make a claim about the comparative effectiveness of two instructional treatments, the researcher must ensure that the treatments differ only in the manipulated feature and are comparable otherwise, such as in the dose of treatment, the instructor, materials, or procedure.
(4) Construct-irrelevant variance. This refers to the influence of the logistic aspects of the study or unexpected learner behaviors, such as the failure of the equipment to record data accurately, participants' lack of motivation to contribute valid data (e.g., engaging in off-task behavior during the treatment), etc.
(5) Interaction between participants. This includes communication between learners under the same treatment condition and between different participant groups. Independence of data points is a fundamental assumption of experimental research and inferential statistics.
(6) Researcher and participant bias. The researcher may introduce a bias toward or against a certain treatment, so the researcher should not be involved in the treatment, or else there must be a protocol to prevent researcher bias. Participant bias refers to staged behavior or Hawthorne effects, namely that participants who are aware of the research goal may behave in a way that meets the expectations of the researcher or that makes themselves "look good."
(7) Reactivity. This refers to the influence of the research instrument on the results. For example, stimulated recall, where learners watch the recording of an instructional treatment and reflect on their mental behaviors during the treatment, has been used to investigate the noticing of corrective feedback. However, noticing may have resulted from the stimulated recall per se rather than the feedback provided during the instructional treatment.




(8) Measurement. Measurement is essential in ISLA research, and measurement errors weaken internal validity. If the tests used to measure the independent or dependent variable lack validity or reliability, the results are not due to the examined constructs but to the measurement – an extraneous variable.

3.4.3 External validity

External validity refers to the extent to which the findings of a study can be generalized to other learners and other contexts. Cooper (2010) divided external validity into two kinds: population validity and ecological validity, with the former referring to whether the results are generalizable to learners not included in the study in the same setting, and the latter to whether the results are generalizable to other settings. There are no clearly defined criteria to evaluate external validity, and external validity has been interpreted differently by methodologists. In ISLA, external validity is often referred to as ecological validity, and it refers to the extent to which research findings are applicable to real-world language teaching and learning settings. One issue that is often brought up is the applicability of lab-based findings to classroom settings, given that lab-based studies are typically conducted in highly controlled contexts that do not exist in the real world. In comparison with laboratory studies, classroom-based studies are more representative of the type of learning that occurs in ISLA, where most learners study a second language in the classroom. Therefore, lab-based research tends to have low external validity while classroom research tends to have low internal validity.

It must be pointed out that internal validity is a prerequisite for external validity, and a study with no internal validity cannot be generalized to other settings or other learners. Cooper (2010) stated that to evaluate external validity, one can assess "the representativeness of a study's participants, settings, treatments, and measurements" (p. 125). Although there are no specific guidelines for how to evaluate external validity, the following aspects and questions can be considered in consultation with Cooper's recommendations.

(1) Research topic, focus, and questions: Is the investigated topic of interest to practitioners? For example, teachers often provide corrective feedback in response to learners' oral or written errors. Therefore, the research on corrective feedback has practical significance and has the potential to be generalizable to language classes in the real world.
(2) Sampling
a. Sample selection: Are the learners randomly selected from the population? Random selection of participants increases a study's generalizability. Although it is usually unrealistic to randomly select L2 learners from the whole population, researchers should strive to practice random selection in the local setting, such as the institution where the learners are recruited.
b. Sample characteristics: Are the learners involved in the study typical of learners in ISLA in terms of background knowledge, learner traits, etc.?
c. Sample size: Is the sample size large enough to allow the results to generalize to the whole population? The larger the sample, the more generalizable the results are.
(3) Setting. Is the target language used outside of the classroom? Results of studies conducted in ISLA settings where the language is not used outside of class may not generalize to settings where learners are exposed to the target language in the surrounding community.
(4) Treatment
a. Are there similar instructional procedures or practices in the real world that the results apply to? For example, in experimental research on written corrective feedback, feedback is normally provided on a single structure, which is not typical of L2 writing instruction in the real world. Therefore, the results have been criticized for a lack of external validity.
b. If the treatment doesn't happen in the real world, can it be adapted, implemented, or incorporated in the L2 classroom? Because of the manipulation and control of extraneous variables, the treatment may not perfectly correspond to real-world practices, but the treatment may be adapted to fit real-world scenarios, in which case the study has a certain degree of external validity.
(5) Measurements: Are variables measured in ways that occur in the real world? For example, in ISLA research, oral and written proficiency are often measured via complexity, accuracy, and fluency, which are not typical of classroom assessments. Therefore, the results may not have direct relevance to classroom instruction.

3.4.4 Statistical conclusion validity

Statistical conclusion validity pertains to the appropriateness of the statistical analysis and the reporting and interpreting of statistical results. First, threats to the validity of statistical analysis include the statistical power of the analysis (which depends on the sample size); the normality of data distribution (which is important for parametric tests); independence of data points; and selection of the statistical analysis. Second, the results must be adequately reported, including both descriptive and inferential statistics, and both the results of null hypothesis significance testing (t, F, χ², p) and effect sizes (eta squared, Cohen's d, r, odds ratio). Third, statistical findings must also be interpreted appropriately. Irrespective of whether the p value is significant, the effect size, which represents the magnitude of an effect or the strength of a correlation, must be incorporated in the interpretation. For large-sample studies, where significant p values are easy to come by, more emphasis should be placed on the magnitude of effect sizes. For small-sample studies, which have low power and where it is difficult to obtain significant p values, the statistical power must be made prominent, and large effect sizes should be interpreted as suggestive of potential statistical significance given sufficient power.
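As a small illustration of reporting effect sizes alongside p values, Cohen's d for two independent groups can be computed directly; the sketch below uses hypothetical score vectors and the standard pooled-standard-deviation formula:

```python
# Effect size sketch (hypothetical data): Cohen's d from two group means and
# a pooled standard deviation, reported alongside the t-test p value.
import numpy as np
from scipy import stats

treatment = np.array([82, 75, 90, 68, 85, 79, 88, 73, 81, 77])
control   = np.array([70, 66, 78, 61, 74, 69, 72, 65, 71, 68])

def cohens_d(a, b):
    na, nb = len(a), len(b)
    pooled_sd = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                        / (na + nb - 2))
    return (a.mean() - b.mean()) / pooled_sd

t, p = stats.ttest_ind(treatment, control)
print(f"t = {t:.2f}, p = {p:.4f}, d = {cohens_d(treatment, control):.2f}")
```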




4. Advice for future researchers

All research starts with topic selection. In topic selection, novelty or originality is perhaps what most researchers prioritize, and novelty can be interpreted as new topics, new perspectives, or new methods. In ISLA, most experimental research has focused on the acquisition of L2 morphosyntax or grammar, and areas that require more experimental research include listening, reading, writing, vocabulary, pronunciation, and pragmatics. One area in need of further research is the role of ID factors in ISLA, especially in certain subdomains of ISLA, such as reading, listening, pragmatics, and writing, where ID factors have been extremely under-researched (Li et al., 2022). While originality is valued in empirical research, replication research is equally important (see Chapter 5).

In topic selection, research design, and data analysis, it is important to adopt an interactional approach. The essence of such an approach is that the effectiveness of a certain treatment, the relationship between variables, or the occurrence of an event is moderated by other variables. Thus, it is essential and fruitful to view ISLA phenomena from a dynamic and dialectical perspective, examining the impact of moderating variables as independent variables and/or conducting post hoc analyses of the data if the variables were not controlled in the original design. In research design, it is important to examine a small set of variables rather than a host of variables, which would make it difficult to have a clear overall picture of the examined relationships (Norris & Ortega, 2000). In a factorial design, investigating more than two independent variables would pose a great challenge for examining and interpreting interaction effects. However, using a simple design does not make the study less robust, and the best approach is to keep the design simple but use rigorous and sophisticated methods.

Validity is key to quantitative ISLA research. To achieve construct validity, it is important to define your construct clearly and operationalize it accordingly. This requires a thorough reading of the literature and an accurate understanding of the theory and research on the construct. To ensure the validity of research instruments, researchers are encouraged to use established or validated methods, but the fact that a certain method has been used previously is no guarantee of its validity. It is incumbent on the researcher to reexamine its validity or report data that confirm its validity. It is advisable to prioritize internal validity over external validity when there is a competition between them.


Measurement and assessment are key elements of quantitative research, and results based on invalid tests are misleading. Therefore, quantitative ISLA researchers should have basic knowledge about testing (test validity, reliability, etc.) even if they do not conduct research in testing. Finally, when writing a research report, clarity should be prioritized. A research project can be complex, and the details and intricacies make clear writing essential. A key to clarity is to elaborate concepts, terms, and specialized information and not to assume that the reader has background knowledge about the topic. The language of a research report should be precise, unambiguous, and straightforward.

5. Troubleshooting quantitative methods

5.1 Sampling

Although random sampling is ideal, it is difficult to achieve. Therefore, recruitment of participants is normally based on convenience sampling; that is, learners are selected as participants because they are available. One way to overcome the limitations of a small sample is to use a mixed design integrating quantitative and qualitative methods, which allows the researcher to get richer data and examine the research questions from multiple perspectives. Another strategy is to use a within-group design, giving different treatments to the same learners. In a within-group design, concerns over between-group variation are nonexistent, but the sequence of the different treatments may be an extraneous variable. The order of treatment tasks can be systematically varied to counter ordering effects; this strategy is called counterbalancing (see the sketch following this paragraph). In counterbalancing, half of the learners complete task A before task B and half complete the tasks in the reverse order. All possible orders should be applied, and the order of treatments or tasks can also be analyzed as a covariate. In sampling, as much as possible, selected learners should be homogeneous, as a heterogeneous sample may lead to unexpected results because of potential extraneous variables relating to learners' backgrounds. Another consideration is whether the selected sample fits the research design and research questions. For example, learners who are not used to a task-based approach may not be able to perform a communicative task, and learners in foreign language settings may not have implicit knowledge.
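Here is a minimal counterbalancing sketch (hypothetical tasks and roster): all possible task orders are generated, and learners are cycled through them so that each order is used equally often:

```python
# Counterbalancing sketch (hypothetical tasks): rotate learners through all
# possible task orders so that ordering effects are distributed evenly.
from itertools import permutations

tasks = ["task_A", "task_B", "task_C"]
orders = list(permutations(tasks))       # 3! = 6 possible orders

participants = [f"L{i:02d}" for i in range(1, 13)]  # 12 hypothetical learners
for i, learner in enumerate(participants):
    assigned = orders[i % len(orders)]   # cycle through the six orders
    print(learner, "->", " -> ".join(assigned))
```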

5.2 Assessments

All assessments must adhere to the principles of effective assessment. Tests must be reliable and valid. For internal reliability, alpha values of .7–.9 are adequate, and values below .5 are unacceptable (Wagner & Skowronski, 2019).
However, the standards for tests of implicit knowledge or implicit aptitude should be more lenient (Li & Zhao, 2021). One pitfall for tests of treatment effects is a small number of items, which leads to low reliability; a rule of thumb is that a minimum of 30 items should be included in a test to ensure reliability. In a test that measures multiple skills or components of a construct, reliability should be calculated separately for the different sections (a computational sketch appears at the end of this section). Other dimensions of validity include:

– content validity: the measured content must be relevant and comprehensive;
– divergent validity: measures of a construct should be separate from those of constructs hypothesized to be different;
– convergent validity: measures of a construct should be correlated with those of constructs in the same or a related paradigm; and
– predictive validity: measures of a construct should predict the relevant outcome.

Content validity applies to all tests, and the other three types of validity apply more to assessments of psychological constructs. For experimental ISLA research examining treatment effects, the same test items should be used for pretests and posttests, but distractors and the order of items may differ between the tests.
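
Internal reliability is simple to check once item-level data are available. Below is a minimal sketch (not from the chapter; the item-response matrix is fabricated, and Python with NumPy is assumed) that computes Cronbach's alpha from its standard formula; for a test with multiple sections, the same function would be run on each section's items separately.

```python
# Minimal sketch: Cronbach's alpha for a set of test items.
# Rows = test takers, columns = items (1 = correct, 0 = incorrect).
# The matrix is fabricated for illustration.
import numpy as np

items = np.array([
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0],
    [1, 1, 1, 0, 1],
    [0, 0, 0, 1, 0],
])

def cronbach_alpha(matrix):
    # alpha = k/(k-1) * (1 - sum of item variances / variance of totals)
    k = matrix.shape[1]
    item_variances = matrix.var(axis=0, ddof=1).sum()
    total_variance = matrix.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

print(f"alpha = {cronbach_alpha(items):.2f}")
```

Because alpha tends to rise with the number of items, the 30-item rule of thumb above also guards against underestimating a test's reliability.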

5.3 Data collection, coding, and analysis

For data collection, it is necessary to establish a detailed protocol that includes all instructions (not just an outline), steps, and procedures, and to implement it strictly. Faithful implementation is crucial because any deviation, even a minor change, may affect the way learners perform a task or respond to a test. During data collection, the researcher should closely monitor the whole process and log any noteworthy anomaly or event that may affect the validity of the data (e.g., off-task behaviors or technical failures) or that may help the researcher interpret the results at later stages (e.g., time on task). Technological advances and innovations make it possible to collect data in virtual settings and asynchronously; therefore, researchers should adapt their mindset to the ever-changing situation and find the best way to collect data.

Data coding is the next crucial step. Researchers should perform some trial coding and formulate a detailed, justified coding scheme. All coding needs to be double or even triple checked, and if possible, all data should be coded twice and by different coders. For subjective ratings, at least two raters are needed, and the scoring needs to be calibrated to reach an acceptable level of interrater reliability (a sketch of one common agreement index, Cohen's kappa, appears below). Evidence for intra-rater reliability, which refers to consistency of scoring by the same rater, is also required. Intra-rater reliability can be established by asking the same rater to rate the same samples twice to determine whether the two ratings are consistent. The objective of careful coding and inter- and intra-rater reliability is to ensure internal validity, not to meet journal requirements or please reviewers. Where necessary and possible, data should be coded in different ways, which may lead to different results.
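
The following minimal sketch (not from the chapter; the two raters' codes are invented feedback categories, and plain Python is assumed) computes Cohen's kappa, an agreement index that corrects raw interrater agreement for chance:

```python
# Minimal sketch: inter-rater agreement (Cohen's kappa) for two coders.
# The codes are fabricated labels applied to ten data segments.
from collections import Counter

rater1 = ["recast", "recast", "prompt", "prompt", "recast",
          "explicit", "prompt", "recast", "explicit", "prompt"]
rater2 = ["recast", "prompt", "prompt", "prompt", "recast",
          "explicit", "recast", "recast", "explicit", "prompt"]

n = len(rater1)
observed = sum(a == b for a, b in zip(rater1, rater2)) / n

# Expected chance agreement from each rater's marginal proportions
c1, c2 = Counter(rater1), Counter(rater2)
expected = sum((c1[label] / n) * (c2[label] / n)
               for label in set(rater1) | set(rater2))

kappa = (observed - expected) / (1 - expected)
print(f"observed agreement = {observed:.2f}, kappa = {kappa:.2f}")
```

As with alpha, what counts as an acceptable value depends on the construct and the coding scheme.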


Which statistical analysis to conduct depends on the research design and research questions, not on whether the analysis appears sophisticated or follows a fashion in the field. It is important to analyze the data from different perspectives, try different analyses, and present the results in a way that best represents the study and the data.

6. Conclusions

Learning how to conduct quantitative research seems a daunting task, but the difficulty should not be overstated. The first step is to find a topic that the researcher is interested in, and then to read widely and deeply to narrow down the topic, adjust the research questions, and learn the methods that have been used to examine the topic of interest. ISLA is an applied field, and its fundamental questions are what factors lead to L2 learning success and which instructional approaches/methods/techniques are effective or more effective than others. Guided by these two broad yet specific questions, it is not difficult to find topics for which we want to collect evidence. It is not necessary to wait until one has completed rigorous methodological training to conduct high-quality quantitative research. Often, the researcher needs to learn methods on the fly and resolve methodological issues as they arise. However, it is necessary to make an effort to learn the basic principles of rigorous research, including construct, internal, external, and statistical conclusion validity. These terms sound abstract, but they are simply abstractions of mundane problems and ideas about how to effectively answer the research questions of an empirical study and how to ensure that the findings are valid and the interpretations justifiable. I hope that this chapter has achieved the goal of providing a succinct summary of quantitative research methods in ISLA within the space of a single chapter.

7. Further reading

Brown, D. (2016). The type and linguistic foci of oral corrective feedback in the L2 classroom: A meta-analysis. Language Teaching Research, 20(4), 436–458. https://doi.org/10.1177/1362168814563200
Li, S. (2018). Data collection in the research on the effectiveness of corrective feedback: A synthetic and critical review. In A. Gudmestad & A. Edmonds (Eds.), Critical reflections on data in second language acquisition (pp. 33–61). John Benjamins. https://doi.org/10.1075/lllt.51.03li
Li, S., & Zhao, H. (2021). The methodology of the research on language aptitude: A synthetic review. Annual Review of Applied Linguistics, 41, 25–54. https://doi.org/10.1017/S0267190520000136
Loewen, S., & Hui, B. (2021). Small samples in instructed second language acquisition research. Modern Language Journal, 105(1), 187–193. https://doi.org/10.1111/modl.12700
Wagner, M., & Skowronski, J. (2019). Reliability and validity of measurement in the social and behavioral sciences. In J. Edlund & A. Nichols (Eds.), Advanced research methods for the social and behavioral sciences (pp. 21–37). Cambridge University Press.

References

Cooper, H. (2010). Research synthesis and meta-analysis: A step-by-step approach. Sage.
Draheim, C., Mashburn, C., Martin, J., & Engle, R. (2019). Reaction time in differential and developmental research: A review and commentary on the problems and alternatives. Psychological Bulletin, 145(5), 508–535. https://doi.org/10.1037/bul0000192
Li, S. (2017). Cognitive differences and ISLA. In S. Loewen & M. Sato (Eds.), The Routledge handbook of instructed second language acquisition (pp. 396–417). Routledge. https://doi.org/10.4324/9781315676968-22
Li, S., & DeKeyser, R. (2021). Implicit aptitude: Conceptualizing the construct, validating the measures, and examining the evidence. Studies in Second Language Acquisition, 43(3), 473–493. https://doi.org/10.1017/S0272263121000024
Li, S., & Fu, M. (2018). Strategic and unpressured within-task planning and their associations with working memory. Language Teaching Research, 22(2), 230–253. https://doi.org/10.1177/1362168816684367
Li, S., Hiver, P., & Papi, M. (2022). Individual differences in second language acquisition: Theory, research, and practice. In S. Li, P. Hiver, & M. Papi (Eds.), The Routledge handbook of SLA and individual differences (pp. 3–34). Routledge.
MacIntyre, P., & Gardner, R. (1994). The subtle effects of language anxiety on cognitive processing in the second language. Language Learning, 44(2), 283–305. https://doi.org/10.1111/j.1467-1770.1994.tb01103.x
Mackey, A., & Philp, J. (1998). Conversational interaction and second language development: Recasts, responses, and red herrings? Modern Language Journal, 82(3), 338–356. https://doi.org/10.1111/j.1540-4781.1998.tb01211.x
Norris, J., & Ortega, L. (2000). Effectiveness of L2 instruction: A research synthesis and quantitative meta-analysis. Language Learning, 50(3), 417–528. https://doi.org/10.1111/0023-8333.00136
Révész, A. (2014). Towards a fuller assessment of cognitive models of task-based learning: Investigating task-generated cognitive demands and processes. Applied Linguistics, 35(1), 87–92. https://doi.org/10.1093/applin/amt039
Sok, S., Kang, E. Y., & Han, Z. H. (2019). Thirty-five years of ISLA on form-focused instruction: A methodological synthesis. Language Teaching Research, 23(4), 403–427. https://doi.org/10.1177/1362168818776673

Chapter 3

Qualitative ISLA research methodologies and methods

Peter I. De Costa, Robert A. Randez, Carlo Cinaglia, and D. Philip Montgomery
Michigan State University

In this chapter, we introduce the principles and practices underlying qualitative methodologies (e.g., ethnography, case studies, action research) and qualitative methods (e.g., field observations, interviews) that are compatible with socioculturally-oriented SLA theories (e.g., language socialization, identity theory, Vygotskian sociocultural theory). We highlight the exploratory and interpretive nature of qualitative research in that it intends to explain phenomena through the experiences and perspectives of learners and teachers by providing rich descriptions of the learning and teaching contexts in which these learners and teachers are socially situated. Working on the premise that SLA theories need to be aligned with methodologies and research paradigms, we also explain and detail how these theories have been applied to better understand and conduct classroom-based research involving language learners and teachers. Further, we break down qualitative methodology into steps, outlining common methods and instruments used for collecting data, and highlighting ethical and procedural considerations associated with this research approach.

Keywords: case study, ethnography, grounded theory, conversation analysis, narrative inquiry

1. Introducing qualitative research and its importance

Qualitative research, according to educational scholars Wilson and Anagnostopoulos (2021), focuses on "the why and how of human interactions, people's perspectives, and their lives" (p. 654). Qualitative researchers, they add, explore "mechanisms, forces, factors, and structures that shape social interactions, always with attention to context" (p. 654). These same characteristics apply to qualitative research in ISLA, which seeks to problematize and understand L2 learning and teaching processes that take place within and beyond the classroom; this is a key contrastive feature between quantitative and qualitative ISLA research.


While the former often seeks to measure the effects of instructional interventions on learning outcomes, the latter seeks to better understand the contexts in which these interventions occur. Importantly, qualitative research is interpretive in nature. This also explains why qualitative research designs are generally more adaptable and flexible than quantitative designs, because the instruments used in the latter "are often carefully constructed according to existing theory" (Rose, 2017, p. 33). That is not to suggest that theory does not matter in qualitative ISLA research, however. In actuality, the importance of theory cannot be overstated, which is why we devote a sizable chunk of our chapter (Section 2) to describing two key theories and five constructs often informing and/or adopted in qualitative ISLA research. As stated in De Costa et al. (2019b), qualitative methodology always needs to be aligned with a qualitative researcher's paradigm (i.e., their epistemology and ontology) and the theory that guides a study. Due to space constraints, we will not delve into specific data collection methods (i.e., the actual instruments and procedures, such as interviews, observations, and artifacts) used in qualitative research. For that, we direct you to Rose et al. (2019), a book-length volume that explores applied linguistics data collection methods in an accessible manner. Our primary goal in this chapter is (1) to introduce you to six core methodologies (Section 3) that have been used to investigate phenomena in L2 learning and teaching, and, as stated, (2) to underscore the significance of theory and social context in the qualitative ISLA research enterprise.

2. Theories used in qualitative ISLA research

In this section, we (1) introduce two theories (language socialization and Vygotskian sociocultural theory) and five theoretical constructs (identity, agency, emotion, motivation, and investment) that are commonly used in qualitative ISLA research, and (2) suggest two research concerns that can be explored using each framework. We present these theories and theoretical constructs to highlight their history and associated characteristics, while acknowledging that overlaps exist between them. The theories and theoretical constructs are introduced because methodology, theory, and paradigm are inextricably linked in qualitative ISLA research (De Costa et al., 2017, 2019b).

2.1 Language socialization

Language socialization theory (LS) stems from anthropological studies that sought to understand how children learned to become “legitimate” users of language and other semiotic resources in the world around them (Ochs & Schieffelin, 2012, p. 6).


Since then, it has expanded to include wide-ranging investigations of how newcomers use language to gradually become members of, and then experts in, their communities, whether that is a classroom, a workplace, or a community center. LS draws parallels with situated learning theory (Lave & Wenger, 1991), whereby newcomers are socialized into communities of practice through explicit and implicit language practices across spoken, written, and non-verbal interactions (Friedman, 2019). These interactions are both the process and the goal of LS, as newcomers are socialized to use language through using language (Ochs & Schieffelin, 2012).

A recent edited volume entitled Language socialization in classrooms (Burdelski & Howard, 2020) demonstrates the wide-ranging applications of LS in classroom instruction. For example, in a study conducted in Sweden, Cekaite (2020) describes how children with immigrant backgrounds engage with their teacher's vocabulary-related explanations of Swedish terms. Using classroom observations and interviews with students, she found that students not only learned the language through these interactions but also actively negotiated, and sometimes resisted, the tacit cultural values, norms, and worldviews embedded within classroom interactions. In the closing chapter, Duff (2020b) summarizes that, by observing and documenting detailed real-time interactions of language practices over an extended time period, researchers can account for not only how language is learned but also how a community comes to share values, habits, norms, subjectivities, and moralities. LS has also evolved to acknowledge multidirectional socializing processes whereby novices socialize each other into communicative practices (Anderson, 2017) or, at times, play a role in socializing the more established members of a community (Goodman & Montgomery, 2020). Especially relevant to the field of ISLA is how LS has proven useful in understanding international teaching environments (Uzum, 2017).

Research concerns that can be explored using LS include:

– How are international students socialized into the linguistic practices that are valued in their host universities?
– How do classroom interactions socialize pre-service language teachers into acquiring the genre-appropriate discourse expected by their academic programs?

2.2 Vygotskian sociocultural theory

Vygotskian sociocultural theory (SCT), informed by the work of Russian psychologist Lev Vygotsky, is a framework that views social interaction as the foundation for knowledge development. Lantolf (2011) notes that while SCT was not developed exclusively for researching SLA, it does provide a useful lens for understanding the development of an additional language. In particular, SCT holds two important ideas in terms of SLA: that social interaction is necessary for L2 development to occur, and that L2 development must be examined in its social context in order to fully understand SLA as a dynamic and situated process.


A central concept in SCT that pertains to L2 development is mediation, or the process through which language is used to construct, appropriate, and regulate knowledge, usually through interaction with others. Another SCT concept relevant to SLA is the zone of proximal development (ZPD), or the space between what an individual L2 learner can accomplish unaided and what she can achieve with assistance. Related to the ZPD, a third SCT concept informing L2 development is scaffolding, which Saville-Troike and Barto (2017) define as "the verbal collaboration of peers to perform a task which would be too difficult for any one of them in individual performance" (p. 216). Importantly, Saville-Troike and Barto's definition reflects how scaffolding can be provided by experts as well as by peers, as illustrated in two example studies. Gagné and Parks (2013) examined scaffolding practices among young French-L1 learners of English in Quebec during cooperative learning tasks, observing the use of requests for assistance and other-correction, among other practices, in order to negotiate for meaning. In a similar context, van Compernolle (2019) analyzed student-teacher interaction among English-L1 university students learning French in the US to examine the development of sociolinguistic and pragmatic variation during a conversation task.

Research concerns that can be explored using SCT include:

– How do the tasks in the language classroom scaffold the students' acquisition of the target grammar item?
– How do online interactive resources mediate effective student learning?

2.3 Identity and agency

Examining student identity can provide insights into how students engage in the classroom or interact with instruction as a result of the identity positions (e.g., a "good" or "bad" language learner) they inhabit. ISLA research looks to identify effective or ineffective pedagogical strategies that impact L2 acquisition, and identity and agency frameworks complement these areas of inquiry by highlighting how individuals' perceptions of their environment, and how they position themselves and others within this environment, can influence their uptake of such pedagogical strategies. In addition to learner identity, ISLA researchers also need to consider the identity of the teacher. With the "I" in ISLA standing for instructed, one should focus not only on the students but also on the teachers, as both are integral participants in interactions that promote learning. In particular, ISLA researchers need to consider how teacher identity (e.g., teacher as an authoritative figure vs. teacher as a friend; "native" vs. "nonnative" speaker labels) influences the quality and texture of the instruction delivered in the classroom, because the quality of instruction impacts the language learning outcomes of students (Norton & De Costa, 2018).


Defined as "the socioculturally mediated capacity to act" (Ahearn, 2001, p. 112), agency has gained interest in language learning and teaching research. Early research on agency in language learning promoted the view that "learners are not simply passive or complicit participants in language learning and use, but can also make informed choices, exert influence, resist (e.g., remain silent, quit courses), or comply, although their social circumstances may constrain their choices" (Duff, 2012b, p. 413). Whether understood as residing within individuals or in the space between individuals and their surroundings at any given moment, the concept of agency has been applied to research on language teachers as well and has been theorized to include the choices teachers make in and out of their classrooms (Pappa et al., 2019). Whether looking at teacher or learner agency, researchers need to be aware that these two aspects of agency are interdependent, as both the teacher and the learner often occupy the same pedagogical space (Miller, 2012). Agency, therefore, has become a useful concept for describing and understanding the ways teachers and learners navigate language learning and teaching.

Research concerns that can be explored using the constructs of identity and agency include:

– In what ways are the identities of minoritized language students ratified or denigrated in the classroom?
– How do multilingual teachers negotiate a top-down monolingual language policy implemented at their school?

2.4 Emotion

The study of emotion has steadily increased in the field of SLA (Barcelos & Aragao, 2018). Research on emotions, which has provided great insight into the experiences of language learners, specifically learners who come from marginalized backgrounds, often highlights the reflexive nature of qualitative research; that is, as researchers learn about their participants' emotions, they do so in relation to their own (see Section 3.6).

The emotion-based ISLA research agenda has extended to language teachers in recent years. De Costa et al. (2019a) highlight how focusing on language teacher emotions can provide insight into how teachers process their classroom experiences and how these experiences, in turn, influence their instruction. For example, De Costa et al. (2018) used an emotion-based framework to understand how professional challenges impacted math teachers working in English as a medium of instruction (EMI) schools in China and Nepal. The authors found that a lack of support from the focal teachers' school administrations contributed to teacher burnout.


Consistent with De Costa et al. (2018), Gkonou and Miller (2021) demonstrated that attention to teacher emotions could have a favorable impact in the classroom. Based on their findings, they asserted that the more conscious teachers were of the impact their profession has on their emotions, the better they were able to cope with any work-related negativity they encountered.

Turning to language learners, emotion-framed research can also show how broader sociocultural influences aid, or impede, classroom language acquisition. Though applied linguists have examined the effects of policies on language education, few ISLA researchers have attempted to understand how such policies affect learners. One notable exception is Rawal and De Costa (2019). Focusing on the lasting effect of being identified as an English language learner (ELL), they interviewed two students who struggled with the perception of being labeled an 'ELL' even though they had graduated from ESL classes and joined mainstream classes.

Research concerns that can be explored using the construct of emotion include:

– To what extent do L2 teachers encounter emotional burnout?
– In what ways does the emotional trauma encountered by undocumented student immigrants affect their language development?

2.5 Motivation and investment

Motivation and investment in SLA are seen as complementary approaches to understanding an individual's engagement with and commitment to L2 learning. Whereas motivation developed as a psychological construct to examine learner differences in terms of cognitive and affective factors, investment has emerged as a sociological construct to explore learner experiences in light of social identities and relations of power. As Darvin and Norton (2021) note, the two constructs "hold up two different lenses to investigate the same reality: why learners choose to learn an additional language" (p. 1).

Earlier research on L2 motivation developed from social psychology and sought to answer questions such as whether and to what extent learners were motivated (Gardner, 1985). Much of the motivation research at the time adopted a quantitative approach to measuring L2 learning experiences in terms of relationships among fixed variables and categories. In contrast, as Darvin and Norton (2021) observe, investment was put forth as a way to better understand the complexity of the L2 learning experience by examining the learner in relation to larger social contexts. An example of this difference can be observed when comparing the complementary concepts of willingness to communicate (WTC) (MacIntyre et al., 1998) in motivation research and the right to speak (Norton, 2013) in investment research.




Whereas WTC is understood in relation to internal psychological variables, the right to speak is described as "a claim to one's legitimacy as an L2 speaker within contexts of power" (Darvin, 2019, p. 249).

In light of more recent L2 motivation research, greater convergences and parallels can be observed between the two constructs. For example, whereas Norton's (2013) concept of identity sought to expand on notions of fixed personality traits and view the complexity of the individual in relation to the world, Ushioda's (2009) person-in-context relational view of motivation begins to consider motivation as more of an external process than an internal state. Similarly, the concept of imagined communities (Kanno & Norton, 2003), involving real and potential identities as, and affiliations with, L2 users, shares similarities with Dörnyei's (2009) L2 Motivational Self System, which includes the ideal L2 self and the ought-to L2 self. Darvin and Norton (2021) emphasize that current L2 motivation research, such as a complex dynamic systems approach (Dörnyei et al., 2015), has sought to understand motivation less as an internal, stable trait and more as a contextual and transitory process.

Research concerns that can be explored using the constructs of motivation and investment include:

– How do students' ideal and ought-to selves shape their L2 motivation?
– What imagined communities does the language learner want to be part of, and how does their teacher facilitate the development of this imagined identity in the classroom?

3. Guidelines for common qualitative methodologies and methods

The theoretical frameworks described in the previous section serve as a foundation for much qualitative research in ISLA, but these frameworks do not often stand alone in a research project. Rather, and in keeping with De Costa et al.'s (2019b) observation that qualitative methodologies need to be aligned with a researcher's paradigm (e.g., postpositivist, postmodernist) and the theoretical framework(s) selected to guide a study, we describe in the sub-sections that follow six methodologies – case study, ethnography, action research, grounded theory, conversation analysis, and narrative inquiry – that can be used when research is informed by any of the previously mentioned theoretical frameworks. Each sub-section provides an overview of the methodological approach, a discussion of its principles and procedures, and a recent study that exemplifies the methodology in action. We also provide brief, personal comments on each exemplar study.


3.1 Case study

Case studies are conducted when a particular phenomenon within a specific social context is investigated. Duff (2020a) explains that case studies investigating language learning and teaching may focus on a family, a classroom, a program, an institution, a country, or any number of other entities that may be nested within one another. A researcher may choose to focus on a single case (e.g., one teacher or one classroom) or cast a wider net to include multiple cases (e.g., several teachers or classrooms). Even in multiple case studies, it is typical to include only a handful of participants due to the detailed focus on individual experiences in a given context. In this way, case study research aims to achieve depth – rather than breadth – of analysis (Duff, 2012a).

Case studies can be designed in a variety of ways (see Yazan, 2015, for an overview). While many case studies are rooted in an objectivist, positivist paradigm (Yin, 2018), qualitative case studies tend to adopt an interpretive lens that seeks to "make visible some of the complex dimensions of people's language-related and social engagements in events that resonate with others" (Duff, 2020a, p. 144). To carry out a case study research project, researchers should first select a case (or cases) that highlights the phenomenon of interest. For example, case studies in ISLA have explored the ways students reintegrate into their domestic L2 programs after study abroad (Lee & Kinginger, 2018) and how curricular reform impacts EFL teacher identity change (Jiang & Zhang, 2021). Merriam (2009) explains that a case can be selected because it is seen as typical, unique, successful, or simply "intrinsically interesting" (p. 42). Common data sources include semi-structured individual interviews, classroom observations, teaching artifacts like lesson plans or instructional materials, and participant journaling or project work. Where possible, case studies integrate multiple data sources to present a detailed description of the context and findings. Case studies also tend to be longitudinal in order to achieve an in-depth understanding of the participants and the phenomenon in question. In our exemplar study, Uzum (2017) applies a language socialization framework to examine how one Uzbek language teacher's early learning experiences, interpersonal relationships, and understanding of theory and practice mediated her socialization into her role as a Fulbright language teaching assistant in the U.S.

Exemplar study: Uzum (2017)
Research question: How are a language teacher's pedagogical beliefs and practices shaped by biographical, contextual, and dialogic factors?
Theoretical framework: Language socialization


Methods: Uzum documented the year-long professional socialization processes of Nargiz (a pseudonym), an Uzbek language teacher in the Fulbright language teaching assistants (FLTA) program. The researcher collected multiple forms of data, including interviews at the beginning, middle, and end of the program, audio- and video-recorded lesson observations with accompanying field notes, and a collection of teaching materials (e.g., lesson plans, syllabus).
Findings: The data analysis yielded three categories of factors that impacted Nargiz's language teacher socialization:
– biographical (previous life experiences, especially as a language learner),
– contextual (interactions with students and institutional resources), and
– dialogic (understanding and use of language learning theories in teacher practice).
Throughout her year-long experience, Nargiz transformed her beliefs about teaching languages, as evidenced by several discourse markers that denote her shifting beliefs.
Take-aways: Nargiz was selected as the focal participant of this single-case study not only for her unique experience teaching a language the author was familiar with but also because her experience likely represents that of other teachers of less commonly taught languages who may undergo similar socialization processes. The author then relied on a range of data sources to paint a vivid picture of Nargiz over one year, including her prior learning experiences, her interactions with teachers and students, and her changing beliefs about language teaching. The special attention given to defining the case, justifying its importance, and longitudinally documenting the participant's experiences and beliefs are hallmarks of a qualitative case study. The personal accounts generated in such a study are not intended to produce findings generalizable to all language teachers, but rather help to verify understandings of language learning and teaching as they relate to the varied socialization experiences that occur in different educational contexts.

3.2 Ethnography

A classroom-based ethnography involves a researcher who is embedded within a community for a considerable length of time and who seeks to "explore and track the dynamic and complex situated meanings and practices" of a community's members (Lillis, 2008, p. 355). This rich involvement in the lives of the research participants often affords the researcher an emic (insider) perspective on the cultural practices, beliefs, and values that influence the classroom community. Ethnographic studies can range in length from a few months to several years. Traditionally, ethnographies of language learning and teaching have consisted of a researcher regularly visiting a classroom to conduct observations, recording spontaneous interactions between teachers and students, and among students themselves.


Although ethnographic research does not follow a rigid set of protocols and procedures, ethnographies share a common investigative focus on the culture of the research site, understood through prolonged interaction (Copland & Creese, 2017). The researcher's observations, recordings, and interviews or focus groups with students, teachers, or perhaps administrators then form the basis of the data analysis. In this way, researchers hope to develop a multifaceted and nuanced understanding of classroom dynamics. Classroom ethnographies of this sort have been conducted to better understand immigrant student identity construction (De Costa, 2010) and the creation and revision of academic writing as a social process (Lillis, 2008). More recently, ethnographic methodologies have expanded to include several newer approaches. For instance, "netnography" has emerged to document the ways online communities establish and maintain cultural practices, as in Isbell's (2018) study of an online community dedicated to learning Korean. Autoethnography has also gained traction recently, as researchers turn their analytical lens onto themselves to interrogate their identities and practices as teachers or language learners (Mirhosseini, 2018). In our exemplar study, Ferrada et al. (2020) pair ethnographic inquiry with multimodal interactional analysis to reveal the central roles affect and emotion play as Latinx youth scrutinize the relations between language, race, identity, and power in their lives.

Exemplar study: Ferrada, Bucholtz, & Corella (2020)
Research question: How is affective agency enacted interactionally?
Theoretical framework: Agency and emotion
Methods: The researchers collected video-recorded interactions of 12 Latinx high-school students during a five-month-long after-school program aimed at raising critical awareness of, and agentive responses to, language practices and ideologies in the students' lives. Students participated in discussions about video clips and other media portraying racist and xenophobic stances toward language and completed a community awareness project to promote linguistic pluralism. The activities and interactions of one focal participant, Valeria, were analyzed to demonstrate how affective responses to linguistic racism represent a form of agency that can promote social justice.
Findings: Ferrada et al. found that affect played an important role in how Valeria moved from encountering to responding to linguistic racism in her community. They detail how embodied emotional responses can be leveraged into social action through the development of affective agency. For example, Valeria is moved to tears while watching an infamous clip of Newt Gingrich equating English with prosperity and Spanish with the ghetto. Throughout the program, Valeria's interactions with peers and workshop facilitators demonstrated how affective responses are challenged, negotiated, supported, and potentially transformed into agentive actions as a force for social change.


Take-aways: This study seeks to spotlight emotion in educational spaces, especially for racialized minorities. An ethnographic methodology makes this possible due to its aim to understand cultural practices, beliefs, and values from an insider's perspective. In order to argue that students can use affect as an agentive force to resist bigotry and racism, the authors first needed to get to know Valeria and her background, relationships, and values. Their extended observations of classroom interactions led them to understand Valeria from within her community, not as observers from the outside. An ethnographic approach thus enables researchers to examine such sensitive issues, in all their complexity and dynamism, both critically and ethically.

3.3 Action research

Action research is a useful methodology that allows language teacher-researchers to investigate their own pedagogical practice, either alone or in collaboration with researchers, in order to support student learning. In other words, language teachers act as both producers and consumers of professional knowledge. In addition to the benefits of immediate pedagogical application and recognition of teacher expertise, another benefit of action research is the capacity for collaboration between teachers. Avineri (2017) notes that this collaboration can occur both locally, as teacher-researchers engage in action research with other practitioners in their immediate instructional context, and globally, as teachers share their findings with the wider community of language educators and researchers.

Action research can be understood as a way of engaging in systematic reflection through pedagogical action. In the multi-authored TESOL Quarterly research guidelines article (Mahboob et al., 2016), Burns describes a cycle of phases that make up the action research process: planning, acting, observing, reflecting, and ongoing acting. The planning phase involves a teacher identifying an issue of concern in their instructional context or teaching practice and developing a pedagogical plan or modification to address it. The acting and observing phases consist of implementing the developed plan and gathering data as possible evidence of the plan's effectiveness. The reflecting phase requires the teacher to consider the usefulness of their plan in light of the data and any new insights into their teaching practice or their students' learning process, and this leads once again into an acting phase, thereby continuing the cycle. Teacher-researchers engaging in action research make use of various data sources that reflect the teaching and learning processes as well as the contexts and spaces in which they occur. Banegas and Consoli (2020) note that these data sources might include observations and recordings of teaching and learning practices, student and teacher artifacts (e.g., assignments and lesson materials), and students' and teachers' reflection journals about their experiences, along with traditional data sources such as interviews and questionnaires.


In our exemplar study, Jones and Mutumba (2019) focus on identity in their action research project to explore how the development of a student-focused curriculum might promote students' literacy development.

Exemplar study: Jones & Mutumba (2019)
Research question: How can teachers use mother tongue-based pedagogy and resources to support children's language and literacy development, and general learning, in the pre-primary classroom?
Theoretical framework: Identity and agency
Methods: This study consisted of a collaborative action research project involving one of the co-authors and two teachers at a Ugandan pre-school where English was the language of instruction. The teachers created storybooks in Luganda, the students' mother tongue (MT), reflecting students' lives and communities, and designed multimodal activities to support the students' literacy development and achieve the school's curriculum goals. The teachers collaborated in both designing and implementing the lessons in class. Afterward, the teachers reflected together on their pedagogical practices and students' engagement. Data sources included observations, interviews, field notes, lesson plans, and student-generated work. The data were analyzed to examine student language use and overall learning, as well as expressions of student identity related to MT use.
Findings: The study found that the MT-based practices and materials affirmed students' identities by creating a supportive learning environment where students could "understand, expand and contribute their ideas, and strengthen and build their social networks" (p. 217). The teacher-researchers observed (a) student engagement with the materials, which validated their identities, and with each other as a classroom community, (b) student agency in interacting with each other and with their teachers, and (c) student awareness of their developing translingual identities as users of both Luganda and English.
Take-aways: This research design exemplifies action research in two ways. First, the teacher-researchers engaged in a truly collaborative endeavor throughout the entire action research cycle, including preparing pedagogical materials, implementing activities, analyzing data, and reflecting on their teaching practices and student experiences. In addition, this study highlights the potential for action research to generate knowledge and increased awareness among teacher-researchers, as the teachers participating in this study benefited by developing their own MT-based pedagogical practice.



3.4 Grounded theory

Developed in the U.S. in the 1960s by the sociologists Glaser and Strauss (1967), grounded theory (GT) emerged as an alternative to positivist-oriented research. Reflective of the cultural revolution happening throughout the world at the time, which challenged the notion of inherent standards and norms, GT pushed qualitative research away from applying theoretical frameworks a priori, arguing instead that theory should be "grounded" in, or originate from, the data collected (Charmaz, 2014). Often, researchers adapt elements of GT and couple them with other methodologies. These types of studies, which we refer to as grounded theory-like, use an abridged version of the GT process. GT-influenced studies are too diverse to be summarized in one publication, but we will draw your attention to a common characteristic: how researchers have used GT's unique coding sequence to analyze their data.

Coding is a common action performed by researchers regardless of the type of research conducted (i.e., qualitative, quantitative, or mixed-methods). Coding in GT, however, follows a particular set of conventions, and the act of coding is separated into three levels, each with its own purpose. How these levels are labeled differs from researcher to researcher, but the concept remains the same.[1] We will use Hadley's (2017) labels of open, focused, and theoretical coding to explain the three levels. Open coding is the first step of analysis, in which the researcher engages with the data, identifying what is and is not present. Along with memo-writing, reflexivity is central in open coding, as explained by Hadley (2017), who adds that "coding will always be idiosyncratic. It reveals as much about your ontological and epistemological beliefs as it does about the material being coded" (p. 103). As the researcher extracts codes from the data, the second level of coding requires these codes to be grouped into categories. Focused coding groups individual codes together in categories based on commonalities. Throughout the focused coding process, the researcher revisits her data, memos, and initial coding notes to ensure that the focused codes are not omitting information or narrowing the focus of the analysis. As the focused coding process continues, the categories that are found begin to lay the foundation for the emergent theory, which is constructed at the theoretical coding level.

[1] In the GT literature, it is not uncommon to find the terms open coding, axial coding, and selective coding; these loosely correspond to Hadley's (2017) open, focused, and theoretical coding categories, respectively.
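
Grounded theory coding is interpretive work rather than a computation, but the relationship between the levels can be pictured as a simple data structure. The sketch below is purely illustrative (the open codes and focused categories are invented, and Python is assumed): open codes are grouped into focused categories, whose tallies then feed theoretical coding.

```python
# Purely illustrative sketch of GT's coding levels as a data structure.
# All codes and categories are invented.
from collections import Counter

# Level 1: open codes attached to segments of the data
open_codes = ["asks peer for help", "avoids eye contact", "rehearses silently",
              "asks peer for help", "jokes in L1", "rehearses silently"]

# Level 2: focused coding groups open codes into candidate categories
focused = {
    "seeking support": {"asks peer for help", "jokes in L1"},
    "managing anxiety": {"avoids eye contact", "rehearses silently"},
}

# Tally how often each focused category is instantiated in the data;
# relating these categories to one another is the theoretical coding level.
counts = Counter()
for code in open_codes:
    for category, members in focused.items():
        if code in members:
            counts[category] += 1

print(counts)
```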


In our exemplar study, Henry (2019) uses GT to analyze how motivation develops through online media creation among learners of English in Sweden.

Exemplar study: Henry (2019)
Research question: How can we understand the L2 motivation that arises when learners create online media?
Theoretical framework: Motivation and identity
Methods: Henry collected data in the form of field notes, interviews, observations, and student blog posts. From these data, he created his open codes, which were then compared with previous literature on the media practices of individuals the same age as his participants. From these back-and-forth analyses, three conceptual categories of motivational influences were identified: influences associated with (1) the blog artifacts, (2) the perception of the online media audience, and (3) the documentation of identities constructed by his participants. Henry then applied those conceptual categories to the data set to verify their relevance before arriving at a theoretical framework to guide his study.
Findings: He found that L2 motivation in online spaces is highly dependent on the relevance of the tasks being performed and thus called for digital L2 motivation to be treated as a construct in its own right. In addition, as technology use becomes more common in language classrooms, he asserted that teachers need to be conscious of the activities they implement and of the connection between in-class activities and the activities their students already perform outside the classroom.
Take-aways: Henry's study demonstrates the effectiveness of GT for exploring contexts that have received little or no prior research attention. This is important in ISLA research because instruction has branched beyond traditional classrooms. Non-traditional digital platforms such as the online media used by language learners require new methodologies of inquiry that can be constructed through GT or GT-like research. GT can be a useful methodology for ISLA researchers who hope to explore non-traditional classrooms, pedagogical practices and modalities, and other language learning contexts that have arisen from technological innovation and social advancement.

3.5 Conversation analysis

The usefulness of conversation analysis (CA) for ISLA research is reflected in Wong and Waring's (2021) observation that "conversation is the medium through which we do language learning" (p. 2). CA is ideal for observing how interactional competence develops and how opportunities for L2 learning are created in interactional contexts (Kasper & Wagner, 2011). By seeking to understand interactional competence in terms of language learners' communicative practices (e.g., turn-taking, sequencing, and repair), CA highlights the structural aspects as well as the social nature of L2 development. Importantly, CA is both a theory and a methodology.


CA principles can be understood in terms of three processes: data collection, transcription, and analysis. CA data must be collected from naturally occurring interaction, including classroom interaction, and must not be produced for or solicited by the analyst in any way. CA observes the development of L2 interactional competence within naturally occurring talk. To capture such data, audio or video recordings of interaction are necessary because they facilitate transcription and analysis by allowing for repeated listening and providing a multimodal record of the interaction.

The next step is transcribing the data so that they can be analyzed visually in written form. In addition to recording the content of talk-in-interaction, CA transcription practices have the power to capture paralinguistic features of talk (e.g., pauses, stress, and volume), which may also carry interactional meaning. Furthermore, researchers should transcribe interaction as they hear it, not making corrections to what is actually said. This maintains an authentic record of the interaction as the participants produced it. Finally, researchers should provide a transcription key to illustrate the symbols or notation used to signify linguistic and paralinguistic features of interaction.

The third step is analyzing the data. A central principle of analysis is CA's emic approach to understanding talk-in-interaction from participants' perspectives. Analyzing CA data begins with identifying interactional practices and routines, such as turn-taking, silence, or repetition. While some analysts may already have an interactional practice they plan to investigate, they should also engage in unmotivated looking for potential practices of relevance that emerge through inductive analysis. While this bottom-up tendency stems from CA's ethnomethodological perspective, which intentionally avoids starting with an externally imposed theory in conducting analysis, Kasper and Wagner (2011) note that "post-analytic connections to exogenous theory may be guided by the researcher's agenda" (p. 125). In our exemplar study, van Compernolle (2019) combines the methodological approach of CA with Vygotskian SCT to explore how an L2 learner develops a sociolinguistic repertoire through talk-in-interaction.

Exemplar study: van Compernolle (2019)
Research question: How is one student's L2 sociolinguistic repertoire mediated through social interaction with his tutor?
Theoretical framework: Vygotskian SCT
Methods: The data come from a larger study focusing on developing L2 sociolinguistic awareness among U.S. university students learning French.


Explicit instruction and communicative tasks focused on different language constructions, including the presence or absence of the particle ne, which is used or omitted for negation depending on different pragmatic scenarios or relationships. One-on-one tutor meetings between one focal student, Leon, and a teacher were audio- and video-recorded, and examples of Leon's negation were analyzed structurally (i.e., for the presence or absence of ne) as well as interactionally (i.e., in terms of how the construction developed amid the ongoing talk).
Findings: Analysis revealed the development of Leon's negation practices as shaped by interaction with his teacher and the context of the task scenario. Leon progressed from using only ne-present negation constructions to varying between ne-present and ne-absent constructions for different scenarios. Furthermore, he engaged in "self-repair," a practice whereby speakers make modifications to their own ongoing talk to deal with difficulties in speaking, hearing, or understanding. Leon's ability to adjust his construction use is viewed as "gaining regulatory control" over his L2 (p. 889).
Take-aways: van Compernolle's study exemplifies CA in that it traces the development of interactional competence by focusing on specific communicative practices observed by analyzing talk-in-interaction, reflecting a view of SLA as interactionally mediated. Specifically, L2 development is revealed to involve the use of lexicogrammatical structures (i.e., negation with or without the particle ne) in contextually sensitive ways, achieved via interactional practices such as self-repair. As illustrated in this study, CA has "the potential to provide insight into the microgenetic [real-time] developmental processes as they occur in usage" (p. 873).

3.6 Narrative inquiry

Narrative inquiry allows researchers to analyze the personal experiences of individuals with the aim of understanding how and why these individuals interact with a phenomenon in question. Traditionally, narratives have taken the form of written documents provided by the participant, whose response was elicited by the researcher (Vásquez, 2011). However, recent narrative inquiry research has expanded narratives to encompass oral interviews and/or journaling (Barkhuizen, 2016; Swain et al., 2015). Though the variety of modalities in which a narrative takes shape has grown, attention to the experience of the individual remains the central focus.

Researchers who intend to use narrative inquiry as a means of analysis should reflexively consider the relationship between the participant and the researcher. The narrative is constructed collaboratively as the participant shares their story, verbally or in writing, with the researcher, who contextualizes it within the broader social context in which the narrative is situated. Because narratives are (co)constructed by the researcher and their participants (Barkhuizen, 2016), researchers always need to be conscious that they are not projecting their own experiences onto their participants' narratives when presenting and analyzing the latter's stories.


How the teacher's identity, ideology, agency, and emotions interact with their pedagogical practices is a common object of investigation in narrative inquiry research. Barkhuizen (2016), who used narrative inquiry to study the imagined identity of a novice teacher, for example, found that the teacher's personal beliefs affected how she approached her role as a teacher. Equally common in narrative inquiry research is connecting the narratives of participants with broader social contexts. For example, Swain et al. (2015), who paired narrative inquiry with SCT, examined how their participants reflected on the ways their personal lives influenced how they learned or taught languages. The authors also took into consideration the impact of emotions on the language learning and teaching experiences of their participants and how these emotions were shaped by forces beyond the classroom. In our exemplar study, Prior (2016) uses narrative inquiry to study transnational identity construction and emotions.

Exemplar study: Prior (2016)
Research question: What are the sociolinguistic trajectories of adult working-class immigrants?
Theoretical framework: Identity and emotion
Methods: Prior's data set consisted of multiple interviews. The first participant in the study helped recruit others through their own social network. Though sexual orientation was not an initial element of inquiry, it became a commonality that bound Prior to his participants; their shared sexual orientation contributed to the participants' willingness to take part in the study.
Findings: Prior found that the experiences of his participants were often sources of trauma tied to their status as immigrants and their sexual orientation. These traumatic experiences resulted in participants forming close social groups with individuals who shared similar characteristics, which further segregated the participants and solidified their identities as immigrants or outsiders. Prior also found the research process emotionally charged for himself and the participants. As participants shared their trauma with Prior, his role shifted from researcher to confidant, and he was forced to recognize his own emotions in relation to his participants' stories.
Take-aways: Prior's study highlights (1) the reflexivity between the story owner and the storyteller and (2) the time and space of the narrative. As shown by the recruitment and data collection processes, Prior's reflexivity greatly influenced the participants' willingness to engage with the study and what they were willing to share during the interviews. This shaped the narratives in ways that did not reflect Prior's original intention. Regarding the second point, Prior shows how broader sociocultural forces shaped how the narratives were constructed. The participants' identities, tied to their immigration status and sexual orientation, were directly influenced by the perceptions of the majority population, which further segregated them.


4. Advice for future qualitative researchers

To reiterate, qualitative methodologies need to be aligned with the theories that guide the study. However, in tandem with the social orientation of such methodologies, ISLA researchers will need to expand their theoretical scope by learning about socially-oriented SLA theories. A good place to start is Atkinson's (2011) edited volume, Alternative approaches to second language acquisition. This edited volume is a nice complement to this chapter, as it maps onto the theoretical approaches we discuss, as well as other approaches not mentioned in this chapter.

Throughout this chapter, we have also emphasized the need to be reflexive about your role as a researcher. Such reflexivity, which entails providing detailed and transparent explanations of your rationale as a researcher for the choices made throughout your study, contributes significantly to the credibility of both your study and yourself as a researcher (Duff, 2012a). At the same time, because of the evolving nature of data and data collection processes, you need to be both theoretically and methodologically adaptable and flexible (Rose, 2017), a point to which we alluded at the outset of this chapter. In this vein, you will also need to be reflexive about unexpected dilemmas that occur during the data collection, analysis, and presentation phases of your study (for details, see De Costa et al., 2021). Flexibility is key here because researchers routinely must make just-in-time decisions to uphold the core ethical principles of respecting persons, maximizing benefits while minimizing harm, and preserving justice. This includes deciding where to present and publish your research so that your findings can be accessible to a wider audience (e.g., teachers, policymakers, parents) and not just an exclusive, academic audience.

While we did not address the meta-methodological aspects of qualitative ISLA research, we would like to direct you to the growing place and value of qualitative research synthesis (QRS). In contrast to a meta-analysis, which is quantitative in nature, QRS is "a useful method to aggregate qualitative findings of naturalistic classroom-based studies, which are often criticized because of their lack of generalizability" (Chong & Plonsky, 2021, p. 3). In fact, Chong and Plonsky discuss two benefits of QRS, namely, its potential (1) to offer a more holistic view of how specific pedagogical interventions are implemented and experienced and (2) to facilitate research-pedagogy dialogue by reaching audiences beyond academia.




5. Troubleshooting qualitative methods

Researchers are likely to encounter challenges when they begin to use qualitative methods in their work. With regard to study design, we strongly recommend that you immerse yourself in the social context (e.g., the physical or virtual classroom or the broader community in which the classroom or school is situated) that constitutes your research site. Prolonged engagement with the site will give you a better idea of which aspect of language learning or teaching (e.g., identity, emotion, agency) to focus on and, correspondingly, which methodology best aligns with the phenomenon in question. Due to the difficulty of gaining access and establishing trust with research participants, it may be a good idea to start with contexts and participants you already know well.

Another common challenge researchers face is choosing what data to collect and ensuring that the analytical method aligns with the theoretical framework being applied. You may start with a methodology that typically involves analysis of certain documents. However, your selection of methods will ultimately depend on what data you need and have access to. For example, a participant might grant you consent to audio but not video recording, which will determine the subsequent mode of data analysis. One quick way to acquaint yourself with the wide array of discourse analytic approaches would be to consult a handbook such as The Routledge handbook of discourse analysis (Gee & Handford, 2012) before exploring in greater depth a specific approach that you select. Keep in mind that you may need to be flexible as you adapt to the constraints of your available data. In the end, however, there needs to be congruence between your data, your analytic approach, and your theoretical framework. Put simply, there is no shortcut or silver bullet to resolving design, data collection, and data analysis concerns; you need to invest the time in learning about the available options – both theoretical and methodological – before making a decision.

6. Conclusions

In this chapter, we introduced you to six methodologies that have been used in qualitative ISLA research. But the menu we have offered you is by no means exhaustive. Given the transdisciplinary nature of our field, it would not be surprising if ISLA researchers decide to adopt methodologies such as phenomenology and historiography (Wilson & Anagnostopoulos, 2021) from other disciplines in the near future. In keeping with Dewaele's (2019) call for ontological, epistemological, and methodological diversity in applied linguistics, we argue that such diversity should also be applied to ISLA research. This step forward can take the form of embracing mixed methods research (Sato, this volume) as we think of new ways to extend and enhance the ISLA research agenda.



7. Further reading and additional resources

7.1 General overview of different methodologies

Phakiti, A., De Costa, P. I., Plonsky, L., & Starfield, S. (Eds.). (2018). The Palgrave handbook of applied linguistics research. Palgrave. https://doi.org/10.1057/978-1-137-59900-1
McKinley, J., & Rose, H. (Eds.). (2020). The Routledge handbook of research methods in applied linguistics. Routledge.

7.2 Specific methodologies

Case study: Yin, R. K. (2018). Case study research: Design and methods (6th ed.). Sage.
Ethnography: Copland, F., & Creese, A. (2017). Linguistic ethnography: Collecting, analyzing, and presenting data. Sage.
Grounded theory: Hadley, G. (2017). Grounded theory in applied linguistics: A practical guide. Routledge. https://doi.org/10.4324/9781315758671
Action research: Avineri, N. (2017). Research methods for language teaching. Palgrave. https://doi.org/10.1057/978-1-137-56343-9
Conversation analysis: Wong, J., & Waring, H. Z. (2021). Conversation analysis and second language pedagogy: A guide for ESL/EFL teachers (2nd ed.). Routledge.
Narrative inquiry: Barkhuizen, G., Benson, P., & Chik, A. (2014). Narrative inquiry in language teaching and learning research. Routledge.

References

Ahearn, L. M. (2001). Language and agency. Annual Review of Anthropology, 30(1), 109–137. https://doi.org/10.1146/annurev.anthro.30.1.109
Anderson, T. (2017). The doctoral gaze: Foreign PhD students' internal and external academic discourse socialization. Linguistics and Education, 37, 1–10. https://doi.org/10.1016/j.linged.2016.12.001
Atkinson, D. (Ed.). (2011). Alternative approaches to second language acquisition. Routledge. https://doi.org/10.4324/9780203830932
Banegas, D. L., & Consoli, S. (2020). Action research in language education. In J. McKinley & H. Rose (Eds.), The Routledge handbook of research methods in applied linguistics (pp. 176–187). Routledge.
Barcelos, A., & Aragao, R. (2018). Emotions in language teaching: A review of studies on teacher emotions in Brazil. Chinese Journal of Applied Linguistics, 41(4), 506–531. https://doi.org/10.1515/cjal-2018-0036
Barkhuizen, G. (2016). A short story approach to analyzing teacher (imagined) identities over time. TESOL Quarterly, 50(3), 655–683. https://doi.org/10.1002/tesq.311
Burdelski, M., & Howard, K. (Eds.). (2020). Language socialization in classrooms. Cambridge University Press. https://doi.org/10.1017/9781316946237




Cekaite, A. (2020). Teaching words, socializing affect, and social identities. In M. Burdelski & K. Howard (Eds.), Language socialization in classrooms (pp. 112–131). Cambridge University Press. https://doi.org/10.1017/9781316946237.008
Charmaz, K. (2014). Constructing grounded theory (2nd ed.). Sage.
Chong, S. W., & Plonsky, L. (2021). A primer on qualitative research synthesis in TESOL. TESOL Quarterly, 55(3), 1024–1034. https://doi.org/10.1002/tesq.3030
Darvin, R. (2019). L2 motivation and investment. In M. Lamb, K. Csizér, A. Henry, & S. Ryan (Eds.), The Palgrave handbook of motivation for language learning (pp. 245–264). Palgrave Macmillan. https://doi.org/10.1007/978-3-030-28380-3_12
Darvin, R., & Norton, B. (2021). Investment and motivation in language learning: What's the difference? Language Teaching. Advance online publication. https://doi.org/10.1017/S0261444821000057
De Costa, P. I. (2010). Language ideologies and standard English language policy in Singapore: Responses of a 'designer immigrant' student. Language Policy, 9(3), 217–239. https://doi.org/10.1007/s10993-010-9176-1
De Costa, P. I., Li, W., & Rawal, H. (2019a). Language teacher emotions. In M. A. Peters (Ed.), Springer encyclopedia of teacher education. Springer. https://doi.org/10.1007/978-981-13-1179-6_262-1
De Costa, P. I., Li, W., & Rawal, H. (2019b). Qualitative classroom methods. In J. W. Schwieter & A. Benati (Eds.), The Cambridge handbook of language learning (pp. 111–136). Cambridge University Press. https://doi.org/10.1017/9781108333603.006
De Costa, P. I., Rawal, H., & Li, W. (2018). Should I stay or leave? Exploring L2 teachers' profession from an emotionally inflected framework. In C. Gkonou, J. M. Dewaele, & J. King (Eds.), The emotional rollercoaster of language teaching (pp. 211–227). Multilingual Matters.
De Costa, P. I., Sterling, S., Lee, J., Li, W., & Rawal, H. (2021). Research tasks on ethics in applied linguistics. Language Teaching, 54(1), 58–70. https://doi.org/10.1017/S0261444820000257
De Costa, P. I., Valmori, L., & Choi, I. (2017). Qualitative research methods. In S. Loewen & M. Sato (Eds.), The Routledge handbook of instructed second language acquisition (pp. 522–540). Routledge. https://doi.org/10.4324/9781315676968-29
Dewaele, J.-M. (2019). The vital need for ontological, epistemological and methodological diversity in applied linguistics. In C. Wright, L. Harvey, & J. Simpson (Eds.), Voices and practices in applied linguistics: Diversifying a discipline (pp. 71–88). White Rose University Press. https://doi.org/10.22599/BAAL1.e
Dörnyei, Z. (2009). The L2 motivational self system. In Z. Dörnyei & E. Ushioda (Eds.), Motivation, language identity, and the L2 self (pp. 9–42). Multilingual Matters. https://doi.org/10.21832/9781847691293-003
Dörnyei, Z., MacIntyre, P., & Henry, A. (Eds.). (2015). Motivational dynamics in language learning. Multilingual Matters.
Duff, P. A. (2012a). How to carry out case study research. In A. Mackey & S. M. Gass (Eds.), Research methods in second language acquisition (pp. 95–116). Wiley-Blackwell.
Duff, P. A. (2012b). Identity, agency, and second language acquisition. In S. M. Gass & A. Mackey (Eds.), Handbook of second language acquisition (pp. 410–426). Routledge.
Duff, P. (2020a). Case study research: Making language learning complexities visible. In J. McKinley & H. Rose (Eds.), The Routledge handbook of research methods in applied linguistics (pp. 144–153). Routledge.



Duff, P. (2020b). Language socialization in classrooms: Findings, issues, and possibilities. In M. Burdelski & K. Howard (Eds.), Language socialization in classrooms (pp. 249–264). Cambridge University Press. https://doi.org/10.1017/9781316946237.016
Ferrada, J. S., Bucholtz, M., & Corella, M. (2020). "Respeta mi idioma": Latinx youth enacting affective agency. Journal of Language, Identity, and Education, 19(2), 79–94. https://doi.org/10.1080/15348458.2019.1647784
Friedman, D. (2019). Citation as a social practice in a TESOL graduate program: A language socialization approach. Journal of Second Language Writing, 44, 23–36. https://doi.org/10.1016/j.jslw.2019.01.004
Gagné, N., & Parks, S. (2013). Cooperative learning tasks in a grade 6 intensive ESL class: Role of scaffolding. Language Teaching Research, 17(2), 188–209. https://doi.org/10.1177/1362168812460818
Gardner, R. C. (1985). Social psychology and second language learning: The role of attitudes and motivation. Edward Arnold.
Gee, J. P., & Handford, M. (Eds.). (2012). The Routledge handbook of discourse analysis. Routledge.
Gkonou, C., & Miller, E. R. (2021). An exploration of language teacher reflection, emotion labor, and emotion capital. TESOL Quarterly, 55(1), 134–155. https://doi.org/10.1002/tesq.580
Glaser, B., & Strauss, A. (1967). The discovery of grounded theory: Strategies for qualitative research. Aldine.
Goodman, B., & Montgomery, D. P. (2020). "Now I always try to stick to the point:" Socialization to and from genre knowledge in an English-medium university in Kazakhstan. Journal of English for Academic Purposes, 48, 1–14. https://doi.org/10.1016/j.jeap.2020.100913
Henry, A. (2019). Online media creation and L2 motivation: A socially situated perspective. TESOL Quarterly, 53(2), 372–404. https://doi.org/10.1002/tesq.485
Isbell, D. R. (2018). Online informal language learning: Insights from a Korean learning community. Language Learning & Technology, 22(3), 82–102.
Jiang, A., & Zhang, L. (2021). Teacher learning as identity change: The case of EFL teachers in the context of curriculum reform. TESOL Quarterly, 55(1), 271–284. https://doi.org/10.1002/tesq.3017
Jones, S., & Mutumba, S. (2019). Intersections of mother tongue-based instruction, funds of knowledge, identity, and social capital in an Ugandan preschool classroom. Journal of Language, Identity, and Education, 18(4), 207–221. https://doi.org/10.1080/15348458.2019.1607349
Kanno, Y., & Norton, B. (Eds.). (2003). Imagined communities and educational possibilities [Special issue]. Journal of Language, Identity, and Education, 2(4).
Kasper, G., & Wagner, J. (2011). A conversation-analytic approach to second language acquisition. In D. Atkinson (Ed.), Alternative approaches to second language acquisition (pp. 117–142). Routledge.
Lantolf, J. P. (2011). The sociocultural approach to second language acquisition: Sociocultural theory, second language acquisition, and artificial L2 development. In D. Atkinson (Ed.), Alternative approaches to second language acquisition (pp. 24–47). Routledge.
Lave, J., & Wenger, E. (1991). Situated learning: Legitimate peripheral participation. Cambridge University Press. https://doi.org/10.1017/CBO9780511815355
Lee, S.-H., & Kinginger, C. (2018). Narrative remembering of intercultural encounters: A case study of language program reintegration after study abroad. Modern Language Journal, 102(3), 578–593. https://doi.org/10.1111/modl.12505




Lillis, T. (2008). Ethnography as method, methodology, and "deep theorizing:" Closing the gap between text and context in academic writing research. Written Communication, 25(3), 353–388. https://doi.org/10.1177/0741088308319229
MacIntyre, P. D., Clement, R., Dörnyei, Z., & Noels, K. A. (1998). Conceptualizing willingness to communicate in a L2: A situated model of confidence and affiliation. Modern Language Journal, 82(4), 545–562. https://doi.org/10.1111/j.1540-4781.1998.tb05543.x
Mahboob, A., Paltridge, B., Phakiti, A., Wagner, E., Starfield, S., Burns, A., Jones, R. H., & De Costa, P. I. (2016). TESOL Quarterly research guidelines. TESOL Quarterly, 50(1), 42–65. https://doi.org/10.1002/tesq.288
Merriam, S. B. (2009). Qualitative research: A guide to design and implementation. Jossey-Bass.
Miller, E. R. (2012). Agency, language learning, and multilingual spaces. Multilingua, 31, 441–468. https://doi.org/10.1515/multi-2012-0020
Mirhosseini, S.-A. (2018). An invitation to the less-treaded path of autoethnography in TESOL research. TESOL Journal, 9(1), 76–92. https://doi.org/10.1002/tesj.305
Norton, B. (2013). Identity and language learning: Extending the conversation. Multilingual Matters. https://doi.org/10.21832/9781783090563
Norton, B., & De Costa, P. I. (2018). Research tasks on identity and language education. Language Teaching, 51(1), 90–112. https://doi.org/10.1017/S0261444817000325
Ochs, E., & Schieffelin, B. B. (2012). The theory of language socialization. In A. Duranti, E. Ochs, & B. B. Schieffelin (Eds.), The handbook of language socialization (pp. 1–21). Wiley-Blackwell.
Pappa, S., Moate, J., Ruohotie-Lyhty, M., & Eteläpelto, A. (2019). Teacher agency within the Finnish CLIL context: Tensions and resources. International Journal of Bilingual Education and Bilingualism, 22(5), 593–613. https://doi.org/10.1080/13670050.2017.1286292
Prior, M. T. (2016). Emotion and discourse in L2 narrative research. Multilingual Matters.
Rawal, H., & De Costa, P. I. (2019). "You are different and not mainstream:" An emotion-based case study of two South Asian English language learners. International Multilingual Research Journal, 13(4), 209–221. https://doi.org/10.1080/19313152.2019.1590906
Rose, H. (2017). Responding to theoretical shifts in research design. In J. McKinley & H. Rose (Eds.), Doing research in applied linguistics: Realities, dilemmas, and solutions (pp. 27–36). Routledge.
Rose, H., McKinley, J., & Baffoe-Djan, J. B. (2019). Data collection research methods in applied linguistics. Bloomsbury. https://doi.org/10.5040/9781350025875
Saville-Troike, M., & Barto, K. (2017). Introducing second language acquisition (3rd ed.). Cambridge University Press.
Swain, M., Kinnear, P., & Steinman, L. (2015). Sociocultural theory in second language education: An introduction through narratives (2nd ed.). Multilingual Matters. https://doi.org/10.21832/9781783093182
Ushioda, E. (2009). A person-in-context relational view of emergent motivation, self and identity. In Z. Dörnyei & E. Ushioda (Eds.), Motivation, language identity, and the L2 self (pp. 215–228). Multilingual Matters. https://doi.org/10.21832/9781847691293-012
Uzum, B. (2017). Uncovering the layers of foreign language teacher socialization: A qualitative case study of Fulbright language teaching assistants. Language Teaching Research, 21(2), 241–257. https://doi.org/10.1177/1362168815614338
van Compernolle, R. A. (2019). Constructing a second language sociolinguistic repertoire: A sociocultural usage-based perspective. Applied Linguistics, 40(6), 871–893. https://doi.org/10.1093/applin/amy033



Vásquez, C. (2011). TESOL, teacher identity, and the need for "small story" research. TESOL Quarterly, 45(3), 535–545. https://doi.org/10.5054/tq.2011.256800
Wilson, S. M., & Anagnostopoulos, D. (2021). Methodological guidance paper: The craft of conducting a qualitative review. Review of Educational Research, 91(5), 651–670. https://doi.org/10.3102/00346543211012755
Yazan, B. (2015). Three approaches to case study methods in education: Yin, Merriam, and Stake. The Qualitative Report, 20(2), 134–152. https://doi.org/10.46743/2160-3715/2015.2102

Chapter 4

Mixed methods research in ISLA

Masatoshi Sato

Universidad Andrés Bello

This chapter explores the potential of mixed methods research (MMR) for conducting robust ISLA research to understand complex second language (L2) learning phenomena in instructed settings. MMR provides researchers with a pragmatic framework to answer research questions by mixing quantitative and qualitative approaches during data collection, analysis, and integration. In this chapter, I will first discuss how MMR can counter ISLA's unique methodological challenges by zeroing in on the importance of the relationship between research and practice. I will then examine the three core MMR designs proposed by Creswell and Plano Clark (2018): convergent, explanatory sequential, and exploratory sequential. As a key to successful MMR, I will focus on integration of quantitative and qualitative components in writing a manuscript for publication. In conclusion, I argue that a successful MMR ISLA study carefully balances internal and ecological validity to maintain its scientific rigor, while incorporating teachers' and students' voices and experiences into the study.

Keywords: mixed methods research, ecological validity, evidence-based practice, practice-based research

1. What is MMR and why is it important in ISLA research?

1.1 MMR as a research paradigm

MMR can be defined simply as a research methodology in which quantitative (QUAN) and qualitative (QUAL) methods are combined at the stages of data collection, analysis, and interpretation. MMR can also be seen as a methodological paradigm "with its own worldviews" (Tashakkori & Teddlie, 2003, p. x). This is because MMR incorporates different ontological (e.g., What is reality? What is truth?) and epistemological (e.g., How is knowledge constructed? How is knowledge reached?) beliefs into an empirical investigation (Johnson & Onwuegbuzie, 2004). In other words, because different worldviews (e.g., post-positivism and constructivism) are integrated into a single study, MMR is a third research paradigm, independent of the QUAN and QUAL research paradigms (see Hulstijn et al., 2014; Uprichard & Dawney, 2019). This methodological and philosophical debate continues to date (i.e., the paradigm war), especially in the journal dedicated to MMR (Journal of Mixed Methods Research: see Ghiara, 2020). Understanding this debate is important given that QUAN and QUAL methods can often answer the same research question, yet QUAN and QUAL findings are often shared and discussed separately in their respective research communities. For example, it is rare to see a research synthesis combining a meta-analysis and a narrative review (see Chapter 3 about qualitative research synthesis), potentially due to a researcher's philosophical beliefs regarding what counts as research evidence. Such a philosophical clash or incommensurability is ultimately unproductive if the collective goal of ISLA research is to discover L2 learning processes and ways in which those processes are facilitated through instruction.

Say a research objective is to understand L2 learners' perceptions of content-based instruction. This objective can be met with QUAN or QUAL methodology. A researcher can design a quasi-experimental study – a QUAN design in which the effectiveness of an intervention is tested with an experimental group (i.e., the group that receives the intervention) and a control group (i.e., the group that receives the same instruction minus the specific intervention). In this study, the experimental group receives content-based instruction and the students' perceptions collected via a questionnaire are compared with those of the control group (see Chapter 2; a minimal sketch of such a group comparison appears at the end of this section). Another researcher can interview a group of learners in a content-based class, asking about their perceptions in depth – a QUAL method (see Chapter 3). However, some researchers may see a quasi-experimental study as inappropriate because the unique experiences of individual learners are not considered. Alternatively, other researchers may perceive interviewing a small number of learners as an invalid method because the results cannot be generalized to other learners. The way in which a person perceives the world is strictly individualistic; there is no right or wrong ontology or epistemology.

To this end, MMR is a pragmatic approach to investigating complex issues of L2 teaching and learning (at least according to my personal ontology and epistemology) (Johnson et al., 2007). Pragmatism is a philosophy in its own right. Going back to the earlier example, while the research question related to the impact of content-based instruction can be answered by either a QUAN or a QUAL method, a pragmatic approach would ask: why not combine them if both can usefully answer the research question? It is true that questionnaire results may be generalizable, but the method may miss important information related to individual learners' experiences. It is also true that interview results may give us in-depth information about individual learners, but the method may not consider learners other than the study's participants, and the impact of content-based instruction is not experimentally teased apart. To this end, combining QUAL and QUAN approaches can provide nuanced findings by incorporating a balanced, pluralistic, problem-solving perspective into research. However, pragmatism does not mean that a researcher can do whatever they want without proper justification (see Denscombe, 2008). I will discuss this in more detail in the following section.

MMR has some additional benefits. First, MMR has the potential to facilitate communication between researchers who are believers in QUAN or QUAL approaches (Maxcy, 2003). MMR research can be consumed and taken up to advance research agendas by both QUAN-oriented and QUAL-oriented researchers. Second, in an effort to tackle complex educational issues, QUAN-oriented and QUAL-oriented researchers may collaborate to answer research questions together. Third, a pragmatic approach may well serve researchers in communicating with practitioners, who also have different epistemological beliefs; some teachers may be inclined to incorporate QUAN findings into their pedagogical decision making, yet others may appreciate QUAL findings more.
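To make the QUAN side of the content-based instruction example concrete, below is a minimal sketch (in Python) of the kind of between-groups comparison described above. The ratings, group sizes, and the choice of an independent-samples t-test are illustrative assumptions only, not details taken from any study discussed in this chapter.

# A minimal sketch (hypothetical data): comparing experimental and control
# groups' questionnaire ratings of content-based instruction (QUAN).
from scipy import stats

# Mean Likert ratings (1-5) per learner; all values are invented for illustration
experimental = [4.2, 3.8, 4.5, 4.0, 3.9, 4.4, 4.1, 3.7]  # received the intervention
control      = [3.1, 3.6, 2.9, 3.3, 3.0, 3.5, 3.2, 3.4]  # same instruction, no intervention

t, p = stats.ttest_ind(experimental, control)
print(f"t = {t:.2f}, p = {p:.3f}")

A full analysis would, of course, also check the test's statistical assumptions and report an effect size; the point here is simply what a QUAN group comparison minimally looks like, and what it leaves out (the learners' individual experiences).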

1.2 How MMR can address challenges of ISLA research

ISLA is an academic field that aims to understand L2 learning processes in instructed settings (see Chapter 1 of the current volume; Loewen, 2020; Sato, under review). One of ISLA's research objectives is to influence pedagogical practices so that L2 teaching becomes more efficient and effective (Ortega, 2012). To this end, ISLA researchers face challenges in conducting a scientifically rigorous study whose findings are practically relevant. Those include dual commitment, the complexity of teaching, and context. MMR can be a useful methodological approach to each of these challenges.

First, when an L2 researcher hopes to conduct a study resulting in pedagogical implications, they are committed to two objectives that often conflict with each other. The first is to advance scientific knowledge. In order to do this – usually via publications in research journals – a study needs to be theoretically compelling and methodologically rigorous. The second commitment is to influence pedagogical practices. The researcher may wish to produce and share research findings that are readily applicable to teaching. The two objectives are conflicting because, first, publications in research journals are behind a journal paywall requiring subscription and often physically inaccessible for practitioners. Second, academic publications may be conceptually inaccessible due to their philosophical and theoretical issues, jargon, technical language, and inferential statistics, all of which require specific research training to consume (Borg & Liu, 2013; Marsden & Kasprowicz, 2017; Sato & Loewen, 2022; Spada, 2015; see also Chapters 1 and 15 of the current volume). I argue that this "dual commitment to improving educational practices and furthering our understanding of learning processes" (Sandoval, 2014, p. 20) necessitates a nuanced methodological approach. While advancing our understanding of L2 learning in instructed settings via QUAN and QUAL methodologies, MMR can produce evidence that is more likely to be accepted by practitioners.

The second challenge for ISLA researchers in conducting a study with high practical relevance is the gap between the complexity of classroom teaching and the nature of research. Simply speaking, no single study can answer a pedagogical question that always involves multiple factors influencing the impact of teaching on student learning. These factors include classroom management (Egeberg et al., 2016), learner psychology (Sato & Csizér, 2021), and curricular objectives (Leite et al., 2020), to name just a few. However, educational research by nature is limited to explaining successful/unsuccessful instruction in a particular context with a particular group of learners. For example, when a researcher decides to conduct a classroom-based study in order to increase the study's ecological validity, small sample sizes seem a necessary evil, which sometimes results in "small-scale decontextualized experimental" research (Larsen-Freeman, 2015, p. 270; see also discussion in Chapter 6). Also, in order to advance theories of L2 learning, researchers need to focus on specific variables (e.g., linguistic targets, types of feedback, the degree of task complexity), which may result in limited applicability of the findings from a practitioner's perspective. McIntyre (2005, p. 360) argued that the "difference between the complexity of teaching and the simplicity of research findings" contributes to the gap between research and practice. Going back to the sample size issue as an example, while there are ways to deal with small sample sizes statistically (see Loewen & Hui, 2021), MMR can compensate for those often unavoidable methodological issues comparatively more effectively than can QUAN or QUAL methodology alone.

Yet another challenge for ISLA researchers is the mediating effect of teaching and learning contexts on L2 learning processes. Similar to the challenge related to the complexity of classroom teaching, the generalizability of any ISLA study is constrained by the fact that the data were collected in a particular teaching/learning context with a focus on specific variables. Consequently, claims based on classroom-based studies can be "mushy, highly contingent, and heavily qualified" (Levin, 2013, p. 14). One way to tackle this challenge is to include contextual information in research output so that practitioners are better equipped to adjust research findings to their respective teaching contexts. To do this, QUAL data are useful to supplement inferential statistics. For one, a researcher can add QUAL data on the context of the study (e.g., documents of governmental educational policies). For another, QUAL data (e.g., classroom observation) can add contextual information to the QUAN results (e.g., pre-post test results of a classroom intervention). As Cukurova et al. (2018) argued, the omission of contextual information in research articles "devalues the impact of this research on the practice of educators… [and] prevents the application of research findings to their own contexts, and therefore prevents the relatability of them" (p. 324; emphasis added).

1.3 Practice-based research (PBR)

In defining the scope of ISLA, Sato (under review) argued that a missing piece for the research field is investigations of the relationship between researchers and practitioners. This is because even if a researcher conducted a study with high practical relevance, its findings are unlikely to reach the classroom when practitioners do not see their relevance or categorically reject using the findings. To this end, in the field of general education, recent research frameworks emphasize collaborative partnerships between researchers and practitioners so that research becomes more useful for the complex realities of classroom teaching (see Coburn & Penuel, 2016; Farley-Ripple et al., 2018; Joyce & Cartwright, 2020; see also Chapter 15, this volume). Those initiatives value involvement of practitioners in designing, conducting, and evaluating a research study.

In the field of ISLA, Sato and Loewen (2022) proposed practice-based research (PBR) as a methodological framework designed to facilitate the dialogue between researchers and practitioners.1 In PBR – juxtaposed with evidence-based practice (EBP), in which knowledge is assumed to derive from researchers and practitioners are considered knowledge recipients – practitioners' knowledge and experiences are valued and incorporated into research. In PBR, research agendas are determined by researchers and teachers together (see Hargreaves, 2000). In ideal PBR, researchers and teachers equally contribute to research and mutually benefit from it. PBR is a cyclical model with three main components.

First, at the stage of selecting a topic and designing the study, the researcher consults with practitioners to discover a pedagogically-relevant issue. This way, the researcher can avoid investigating an issue that is not actually important for practitioners. QUAL methods (e.g., focus groups) may be used in this stage to understand practitioners' concerns, and/or a needs analysis with a questionnaire (QUAN) may ensure the chosen topic is indeed practically relevant. When designing the study (e.g., intervention materials) as well, the researcher collaborates with teachers to explore what appropriate material for their students would be, how to engage students with the material, whether the material is pedagogically sound, etc. Classroom observation and teacher interviews (QUAL) would be helpful to increase the intervention's ecological validity (QUAN). Second, during data collection, teachers are front and center in administering the intervention material so that the researcher can obtain data from an authentic learning environment. It is advisable to obtain both QUAN (e.g., pre-post tests examining the learning products; see Chapter 2, this volume) and QUAL (e.g., systematic classroom observations focusing on the learning processes) data so that the ecological validity of the study increases (see Sato & Loewen, 2019). Finally, the resulting pedagogical recommendations of the study are examined by teachers – ideally the teachers who were involved in the first cycle. It is likely that the teacher discovers a new pedagogical challenge during this process, which can again be taken up by the researcher. In this stage, QUAN (e.g., a questionnaire) and QUAL (e.g., interviews) can be used in examining the sustainability of the intervention in the real classroom and in developing a new research agenda (returning to the first stage).

1. While action research is an effective way to promote the research-practice relationship, it is not included in the current discussion. This is because action research is done by a specific group of teachers who are interested in conducting research themselves and who are afforded time to devote to conducting research; see Chapter 3 for information on action research.

2. Typical research questions in mixed methods ISLA research

Research questions in MMR can be posed as a combination of separate QUAN and QUAL questions or a single question incorporating QUAN and QUAL components (Tashakkori & Creswell, 2007). Regardless of how research questions are framed, they aim to arrive at "the most informative, complete, balanced, and useful research results" (Johnson et al., 2007, p. 129). Again, MMR is a pragmatic approach to answering research questions of any type. Below are some examples of how research questions are formulated in ISLA research.

Nakatsuhara et al. (2017) used the convergent design (specific designs will be explained in detail in the following section) to compare face-to-face and video-conference modes for testing L2 speaking skills. One of the research questions was: "Are there any differences in linguistic output, specifically types of language function, elicited from test takers?" To answer this question, the researchers analyzed test scores (QUAN) and transcripts of the recorded speeches (QUAL) in the two modes. Peltonen (2018) investigated differences between L1 and L2 fluency. In this explanatory sequential study, the researcher asked: "To what extent can measures of L2 fluency be predicted from L1 fluency measures?" For this question, the researcher analyzed temporal measures (e.g., lengths of pauses) (QUAN; see the sketch at the end of this section). The following research question was: "What kinds of individual differences can be found in the connections between L1 and L2 fluency with a qualitative analysis of learners' productions with high frequencies of stalling mechanisms?" For this question, the researcher conducted QUAL transcript analyses of focal participants chosen based on the extent of stalling mechanisms they used. Using the exploratory sequential design, Rahmati et al. (2019) explored EFL teachers' vision and motivation to teach. The first QUAL component was conducted via interviews, whose results were used to answer the following two research questions: "Do Iranian L2 teachers possess a vision of their future professional selves?" and "What are the components (ideal/ought-to/feared self) of Iranian L2 teachers' vision?" The coded data and motivation questionnaire scores were then submitted to statistical analyses (QUAN) to answer the final research question: "Is there any relationship between Iranian L2 teachers' vision and motivation?"
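As an illustration of the temporal measures mentioned in the Peltonen (2018) example, the sketch below derives three common fluency indices from a pause-annotated speech sample. The durations, syllable count, and annotation scheme are invented for illustration and are not Peltonen's actual data or full set of measures.

# A minimal sketch (hypothetical data): temporal fluency measures from a
# pause-annotated transcript. All durations are in seconds.
speech_runs = [2.4, 1.1, 3.0, 0.9, 2.2]   # stretches of speech between pauses
pauses      = [0.6, 1.4, 0.8, 1.1]        # silent pauses between the runs
syllables   = 112                         # syllables produced in the sample

total_time = sum(speech_runs) + sum(pauses)
mean_pause_length = sum(pauses) / len(pauses)
phonation_time_ratio = sum(speech_runs) / total_time
speech_rate = syllables / total_time      # syllables per second

print(f"mean pause length:    {mean_pause_length:.2f} s")
print(f"phonation-time ratio: {phonation_time_ratio:.2f}")
print(f"speech rate:          {speech_rate:.2f} syll/s")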


3. Common options for mixed methods research design

Designs are procedures for collecting, analyzing, and interpreting data in a study (see Chapter 1 of this volume). MMR designs have evolved over the years, and there are no correct MMR designs per se (Maxwell, 2016). Different designs have been proposed, with their variants within. Overall, the bottom line is that QUAN and QUAL approaches need to be "mixed in ways that offer the best opportunities for answering important research questions" (Johnson & Onwuegbuzie, 2004, p. 16). QUAN and QUAL components do not have to be given the same weight. In some studies, the QUAN component takes the primary role, and in others the QUAL component is more important. In the MMR literature, the difference in weight is sometimes indicated by capital and lowercase letters (QUAN vs. quan). In this chapter, I will explain the three core designs discussed by Creswell and Plano Clark (2018): convergent, explanatory sequential, and exploratory sequential. The three designs differ in terms of (a) the ordering of QUAN and QUAL components and (b) how QUAN and QUAL components are integrated. While in the convergent design QUAN and QUAL data are concurrently collected, in the explanatory and exploratory designs QUAN and QUAL data are collected at different time periods. The decision as to when and how the QUAN and QUAL components are integrated necessarily affects how the researcher interprets the results and arrives at the conclusion (Plano Clark, 2019). Consequently, I focus on how QUAN and QUAL components can be integrated and presented in writing a manuscript for publication.

Table 1 includes ISLA studies conducted with MMR designs. As many of those studies did not state which specific MMR design was used, I retrospectively categorized them into the three overall designs: convergent, explanatory sequential, and exploratory sequential. In the table, objectives are included for which QUAN and QUAL components were used. Data collection/analysis indicates the main instruments used for QUAN and QUAL components. Integration suggests the weight given to QUAN and QUAL, according to my reading, and how QUAN and QUAL components were integrated. The collection of studies is not meant to be comprehensive; rather, I chose different research topics with different MMR designs to show MMR's versatility. Due to the volume, MMR studies from the last few years (2017–) have been selected, also to underscore recent relevant topics in ISLA research.

3.1 Convergent design

The convergent design is useful when either a QUAN or QUAL method alone is insufficient for answering the research question(s). In this design, QUAN and QUAL components complement each other to meet the research objective. QUAN (e.g., eye tracking) and QUAL (e.g., verbal protocol during eye tracking) data are collected concurrently, but the datasets are usually independent of each other. However, QUAN and QUAL components can be included in a single data collection tool as well (e.g., a questionnaire with Likert-scale items and open-ended items). Then, the QUAN and QUAL results are compared in an integrative way (see Figure 1). Convergence can be done at the stage of data analysis as well. For instance, QUAL results from a thematic analysis of interview data can be submitted to inferential statistics, such as an exploratory factor analysis (QUAN), after tallying the frequencies of the coded themes (see the sketch below). Finally, the results are interpreted with a focus on similarities and differences between the QUAN and QUAL results.

Figure 1. Convergent design (QUAN + QUAL): QUAN and QUAL data collection/analysis proceed in parallel; the results are integrated and then interpreted. Note. Adapted from Creswell and Plano Clark (2018) and Plano Clark (2019)
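The sketch below illustrates the data-transformation step just described: coded interview themes (QUAL) are tallied per participant and then related to a quantitative measure. The participants, theme labels, and test scores are hypothetical, and a simple Pearson correlation stands in for the fuller inferential analyses (e.g., an exploratory factor analysis) that a real study might run on such tallies.

# A minimal sketch (hypothetical data): QUAL -> QUAN data transformation in
# a convergent design, followed by a correlational analysis.
from collections import Counter
from scipy import stats

coded_interviews = {
    "P1": ["anxiety", "peer support", "anxiety"],
    "P2": ["enjoyment", "peer support"],
    "P3": ["anxiety", "anxiety", "anxiety"],
    "P4": ["enjoyment", "enjoyment", "peer support"],
}
test_scores = {"P1": 62, "P2": 78, "P3": 55, "P4": 81}

# Tally how often each participant voiced the "anxiety" theme (QUAL -> QUAN)
anxiety_counts = [Counter(themes)["anxiety"] for themes in coded_interviews.values()]
scores = [test_scores[p] for p in coded_interviews]

r, p_value = stats.pearsonr(anxiety_counts, scores)
print(f"r = {r:.2f}, p = {p_value:.3f}")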




The convergent design is ideal for conducting classroom-based ISLA studies. QUAN or QUAL methods in isolation are often insufficient for investigating a problem in the complex learning environment of the classroom. For instance, a questionnaire (QUAN) may not tap into the dynamic nature of L2 motivation; classroom observation or interviews (QUAL) may provide the researcher with different or novel aspects of L2 motivation in the classroom. The impact of an intervention on the development of a grammatical structure may be adequately examined with an experimental design (QUAN). However, if the researcher is interested in the feasibility and sustainability of the intervention as part of "the impact of intervention," QUAL methods, such as interviewing the teacher and learners about their experiences with the intervention, become useful. When a researcher poses an exploratory question (e.g., What is engagement in the classroom?), classroom observation data including anything the researcher sees (QUAL) can be coded and tallied for correlational analyses of the relationships between the different classroom phenomena (QUAN).

The majority of ISLA MMR studies employ the convergent design in one of two ways: (A) a questionnaire with Likert-scale items and open-ended items, and (B) a questionnaire and interviews with focus participants. In Type A, the QUAL component serves as additional data to add meat on the bone (i.e., the QUAN results). For instance, Resnik and Dewaele (2020) examined additional language anxiety and enjoyment in L1 and LX (L2) classes with questionnaire responses from 768 learners of English. The details are summarized in the call-out box. In Type B, QUAN and QUAL components are used with relatively equal weights to scaffold information to answer a research question. For instance, Brutt-Griffler and Jang (2019) examined the impact of a newly designed English-Spanish dual language program in the United States. To evaluate the program, the following datasets were collected and analyzed separately: grades in English Language Arts and Math (QUAN), self-assessed proficiency (QUAN), a questionnaire of student engagement (QUAN), and semi-structured focus group interviews (QUAL). The QUAN and QUAL components were then integrated into a discussion of the impact of the dual language program.

Exemplar study (Convergent design): Resnik & Dewaele (2020)
Research topic
L2 enjoyment and anxiety
Research questions
RQ1: To what extent do enjoyment and anxiety ratings differ in and across L1 and LX classes?
RQ2: How strongly are enjoyment and anxiety ratings linked in each context and across contexts, and how strongly is trait emotional intelligence linked to both?
RQ3: What can a qualitative analysis reveal about the feelings and emotions that students experience in L1 and LX classes?




Methods
The researchers used a questionnaire with a large number of participants (768 secondary- and tertiary-level learners of English in European countries). The questionnaire included already-validated scales (for enjoyment and anxiety) as well as open-ended items.
Take-aways
While the rigor of the QUAL component in a convergent design tends to be weaker than that of the QUAN component in many studies, Resnik and Dewaele thoroughly analyzed the QUAL data by focusing on emotion-laden words. This was possible because of the number of participants producing a large amount of QUAL data, albeit with only two open-ended questions. The QUAN and QUAL results are given relatively comparable attention throughout the results and discussion sections in answering all RQs. The QUAN and QUAL visual aids (tables and figures) are noteworthy.

3.2 Explanatory sequential design

In an explanatory sequential design, QUAN data collection and analysis are conducted first. Then, a QUAL component is used to explain the results from the QUAN component (see Figure 2). Note that the use of QUAL components here differs from that in the convergent design; in the convergent design, QUAL data complement QUAN data, or vice versa, in order to investigate one research problem, whereas in the explanatory sequential design, QUAL data collection/analysis follows the QUAN results. Therefore, QUAL research questions and instruments are sometimes developed based on the (predicted) QUAN findings. For instance, interview participants can be purposefully selected from the QUAN database. The interview questions can be developed based on the patterns found in the QUAN results.

Figure 2. Explanatory sequential design (QUAN → QUAL): QUAN data collection/analysis → results connected to and explained by → QUAL data collection/analysis → interpretation (dive deeper). Note. Adapted from Creswell and Plano Clark (2018) and Plano Clark (2019)

In the explanatory sequential design, the QUAN component often takes the primary role and the QUAL component is used to dive deeper into the QUAN results. The design is useful when a researcher encounters QUAN results that are surprising or confusing. When this happens, QUAL methods may be used to re-evaluate the validity and quality of QUAN tools (e.g., a survey tool, an intervention material, testing materials). For instance, the Cronbach's alpha of a questionnaire may not reach the threshold level, or the structural results of a questionnaire (e.g., structural equation modelling) may contradict previous research (QUAN). Then, a researcher can design a follow-up QUAL method investigating the validity of the questionnaire, which may open up a new theoretical direction for the given construct.
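As a minimal illustration of the kind of reliability check mentioned above, the sketch below computes Cronbach's alpha for a small set of hypothetical Likert items. The responses and the commonly cited .70 threshold are illustrative assumptions only.

# A minimal sketch (hypothetical data): Cronbach's alpha for one
# questionnaire scale. Rows = respondents, columns = Likert items.
import numpy as np

items = np.array([
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 4, 5],
    [2, 3, 3, 2],
    [4, 4, 5, 4],
])

k = items.shape[1]                                  # number of items
item_variances = items.var(axis=0, ddof=1).sum()    # summed item variances
total_variance = items.sum(axis=1).var(ddof=1)      # variance of total scores
alpha = (k / (k - 1)) * (1 - item_variances / total_variance)
print(f"Cronbach's alpha = {alpha:.2f}")            # values below ~.70 often prompt scrutiny

An alpha well below the conventional threshold is exactly the kind of surprising QUAN result that could motivate a follow-up QUAL phase probing how respondents actually understood the items.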

The explanatory sequential design is compatible with PBR. For instance, when a newly developed pedagogical intervention did not work as predicted (QUAN), the researcher may go back to the original classroom and conduct classroom observation (QUAL) to qualitatively explore why it did not work. Interviewing students and the teacher about the intervention may shed light on issues related to the intervention. Furthermore, QUAL methods can be used to explain unique cases (i.e., outliers) in the QUAN results, which may lead to another MMR study.

In the field of ISLA, Andujar (2020) examined the effects of dynamic assessment (scaffolded feedback) provided via mobile instant messaging (WhatsApp). The first QUAN phase used an experimental design in which the experimental group received feedback from the teacher. Results from the pre- and post-tests were statistically analyzed. In the following QUAL phase, the researcher analyzed transcripts of teacher-student interactions on WhatsApp, which explained how feedback facilitated the L2 development found in the QUAN component. The sequential design can be complex. Bryfonski and Sanz (2018) investigated corrective feedback that L2 learners received during study abroad. QUAN and QUAL components were used in "multiple iterations to inform and follow up on one another" (p. 9). The details are provided in the call-out box.

Exemplar study (Explanatory sequential design): Bryfonski & Sanz (2018)
Research topic
Corrective feedback during study abroad
Research questions
RQ1: What is the relationship between the time spent abroad and the amount and type of corrective feedback provided in conversation groups?
RQ2: What differences exist between the corrective feedback provided by native speakers versus nonnative peer interlocutors during study abroad conversation groups?
RQ3: What is the role of the L1 in facilitating opportunities to engage with corrective feedback in these conversation groups?
Methods
The researchers collected data from US university-level L2-Spanish learners who were in a study abroad program in Spain. During and after study abroad, multiple data sets were collected: (a) audio-recordings during conversation groups; (b) semi-structured follow-up interviews; and (c) tailor-made L2 tests. First, corrective feedback episodes and L1 uses were tallied to answer RQs 1, 2, and 3. The frequencies of those indices were statistically analyzed (QUAN). The results informed the tailor-made tests. Subsequently, the interview data were submitted to thematic analysis (QUAL) in order to understand the QUAN findings.
Take-aways
What is unique in this MMR study is the sequence of QUAN and QUAL data collections at different time points during the study. While the effectiveness of feedback on L2 development had been shown by many previous studies, the QUAL component (learner perceptions of feedback) increased our understanding of how feedback works in more nuanced ways. It is also interesting that all RQs were answered using both QUAN and QUAL results.

3.3 Exploratory sequential design

The exploratory sequential design starts with a QUAL component. The QUAL results are then used to develop an approach or instrument used in the following QUAN component (see Figure 3). This way, the QUAN instrument will be "grounded in the views of participants" (Creswell & Plano Clark, 2018, p. 84). This design is particularly useful when a researcher wants to avoid using an already-existing QUAN tool (e.g., a questionnaire) that may be inappropriate for their study. Also, it is useful when the QUAN instruments necessary for answering the research question (e.g., testing tools, experimental activities, coding schemes) do not currently exist. More globally, an exploratory sequential design can be used when there is no guiding theoretical framework or set of variables with which to investigate the research problem. In general, in the exploratory sequential design, the QUAL component takes the more important role in the study, with a bottom-up approach.

Figure 3. Exploratory sequential design (QUAL → QUAN): QUAL data collection/analysis → results connected to and build to → QUAN data collection/analysis → interpretation (generalize). Note. Adapted from Creswell and Plano Clark (2018) and Plano Clark (2019)

The exploratory sequential design may be compatible with PBR and can increase the ecological validity of a study. For instance, a researcher can consult with teachers to develop intervention material (QUAL) before testing its impact on L2 learning (QUAN). A researcher can interview students (QUAL) in order to adjust and implement an already existing questionnaire before distributing it to a larger group of participants (QUAN). A researcher can also observe classes to explore context-specific variables (QUAL) that could be integrated into the development and testing of the intervention (QUAN). Furthermore, a researcher can ask teachers a global (yet important) question: What are the pedagogical issues you are currently facing? The researcher then analyzes the QUAL findings in order to explore a theoretical framework or to develop an intervention whose impact can be examined with QUAN methods.



ISLA studies with the exploratory sequential design have been scant. This might be due to researchers' tendency – ISLA or otherwise – to base research objectives on previous theories, methodologies, and data collection/analysis instruments, as opposed to exploring novel research objectives. One of the few studies is Rahmati et al. (2019), who used the exploratory sequential design for instrument development. First, to explore context-specific motivational variables of L2 teachers in the Iranian context, the researchers conducted semi-structured interviews (n = 10). Based on the QUAL results, a questionnaire was developed, whose results were compared with an already-existing questionnaire of teacher motivation (n = 211). Another exemplary study is Liao and Li (2020), who investigated teacher perceptions and practices of intercultural competence. The details are provided in the call-out box.

Exemplar study (Exploratory sequential design): Liao & Li (2020)
Research topic
Teacher perceptions and practices of intercultural competence teaching
Research questions
RQ1: How do the selected instructors define intercultural English teaching in practice? What is the prevalence of these perceptions?
RQ2: What are the characteristics of the intercultural pedagogical approaches adopted by the selected instructors? What is the prevalence of these characteristics?
RQ3: How are these pedagogical approaches received by the students enrolled in the selected English courses?
Methods
Because there were no existing observation tools of the teaching of intercultural competence, the researchers first conducted QUAL data collection and analysis entailing: teaching materials, interviews, class observations, and reflective journals (RQs 1 and 2). Ten teachers participated in this stage. The emerged themes from the QUAL data were then tallied and submitted to descriptive statistics that produced effect sizes (QUAN). In answering RQ3, students' individual and focus-group interview data were used (QUAL).
Take-aways
The use of the QUAL components in this study is exemplary not only because it produced rigorous data but also because it is methodologically justified throughout the manuscript. In the methods section, there is a subsection devoted to justifying the use of MMR in answering the RQs, along with a visual representation of the design. Although the QUAN component could have been more rigorous, its integration into the QUAL results is exceptional.
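To give a concrete sense of the QUAL-to-QUAN move in Liao and Li's design, the sketch below tallies coded themes into the kind of prevalence figures that can then be submitted to descriptive statistics. The teacher IDs and theme labels are invented for illustration and are not Liao and Li's (2020) actual codes.

# A minimal sketch (hypothetical data): computing the prevalence of themes
# that emerged from QUAL coding, as a first step toward descriptive statistics.
observed_themes = {
    "T1": {"culture-as-content", "reflective tasks"},
    "T2": {"culture-as-content"},
    "T3": {"reflective tasks"},
    "T4": {"culture-as-content", "reflective tasks"},
    "T5": {"culture-as-content"},
}

n = len(observed_themes)
for theme in ("culture-as-content", "reflective tasks"):
    prevalence = sum(theme in codes for codes in observed_themes.values()) / n
    print(f"{theme}: {prevalence:.0%} of teachers")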



3.4 Integrating results and writing up a manuscript

How QUAN and QUAL results are integrated is central to MMR and is "what separates a mixed methods study from a study that happens to include some quantitative information and some qualitative information" (Plano Clark, 2019, p. 108; emphasis added). There are two overall ways of integration. First, you can merge QUAN and QUAL results in order to answer the research question holistically and comprehensively. This way, you can answer the research question posed within the convergent design. Second, you can connect QUAN and QUAL results in a sequential design. Onwuegbuzie and Teddlie (2003) proposed various ways of integration. For instance, data transformation can be used to turn QUAL data (e.g., coded themes from group work transcripts) into QUAN data (e.g., frequency of themes submitted to correlation analyses). Data consolidation involves combining QUAN (e.g., a survey of teacher motivation) and QUAL data (e.g., interviews with teachers) to arrive at a new variable (e.g., teacher de-motivators; see Sato et al., 2022). Integration can be presented with a visual display as well, via tables, matrices, or figures depicting how QUAN and QUAL results were integrated. During integration, you make an important decision as to which component is given more weight.

After deciding how to integrate the QUAN and QUAL components, the final step is to write up the study. First, it is advisable to frame your study as an MMR study so that the reader can be ready to consume the background and progression of your study. You can do this in the title of your study, abstract, and/or the introductory part where you explain your research problem and objective. Framing your study as MMR may be important especially for exploratory sequential designs; when your study is unconventional and challenges a pre-existing theoretical and/or methodological framework, explaining MMR at the outset may prevent the reader's confusion. Explaining that you aimed to collect more comprehensive data (convergent) or follow-up data (explanatory) may be straightforward. However, again, explaining the innovative data collection of your study (exploratory) requires MMR as a backup. After setting up your study as MMR, the next MMR-specific writing appears in the methods section. Explaining your specific MMR design will help avoid the readers' impression that you simply cast a wide net without much precision or did not have a clear plan for your study. In other words, MMR provides a legitimate methodological framework for your pragmatic approach to the research problem.

The final section is where you discuss the QUAN and QUAL findings. For the convergent design, it is crucial that the QUAN and QUAL results are integrated in answering the research question, at least somewhere in the discussion section. Unintegrated discussion of the QUAN and QUAL components may appear as though they are independent research projects with different objectives.




For sequential designs as well, the QUAN and QUAL results need to be integrated; however, unlike the convergent design, it sometimes makes sense to discuss the QUAN and QUAL results separately because the data collections were conducted sequentially with different objectives.

Not surprisingly, the amount of data collected in MMR tends to be larger than that of QUAN or QUAL alone. Consequently, it is often difficult to include both QUAN and QUAL data in a single paper. It is increasingly common to report different portions of an MMR study in multiple publications. In this case, you can explain that the larger project used MMR and refer the reader to another publication in which the other component is reported.

Table 1. ISLA studies with MMR designs

Convergent (QUAN+QUAL)

Aloraini & Cardoso (2020)
  Objective: To examine L2 learners' perceptions of social media as a learning tool
  Data collection/analysis: QUAN: questionnaire. QUAL: interviews; open-ended items in the questionnaire
  Integration: (QUAN > QUAL): QUAL to support part of QUAN component

Kormos & Préfontaine (2017)
  Objective: To explore the relationship between learner affects and L2 fluency during a task
  Data collection/analysis: QUAN: temporal measures of L2 speech; questionnaire of learner affects. QUAL: interviews of learner affects
  Integration: (QUAN > QUAL): QUAN+QUAL to triangulate learner affects data

Li et al. (2021)
  Objective: To investigate teacher beliefs of learner gender and L2 learning
  Data collection/analysis: QUAN: questionnaire. QUAL: interviews
  Integration: (QUAN > QUAL): QUAL to add information to QUAN component

Nakatsuhara et al. (2017)
  Objective: To compare face-to-face vs. online speaking test modes
  Data collection/analysis: QUAN: test scores; performance during tests. QUAL: verbal reports; observation notes during speaking tests
  Integration: (QUAN ≈ QUAL): Both used in interpretation

Resnik & Dewaele (2020)
  Objective: To compare anxiety and enjoyment in L1 vs. L2 classes
  Data collection/analysis: QUAN: questionnaire. QUAL: open-ended items in the questionnaire
  Integration: (QUAN ≈ QUAL): Both used in interpretation

Révész et al. (2019)
  Objective: To examine the cognitive processes of pause and revision behaviors during L2 writing
  Data collection/analysis: QUAN: keystroke logging; eye-tracking during L2 writing. QUAL: stimulated recall interviews
  Integration: (QUAN ≈ QUAL): Both used in interpretation

Sánchez-Hernández (2018)
  Objective: To examine the effect of sociocultural adaptation on the development of pragmatic production during study abroad
  Data collection/analysis: QUAN: sociocultural adaptation scale; pragmatics test. QUAL: in-depth interviews after study abroad
  Integration: (QUAN > QUAL): QUAL to add information to QUAN component

Tsang (2020)
  Objective: To investigate learner perceptions of teachers' accents
  Data collection/analysis: QUAN: rating of different accents. QUAL: interviews of different accents
  Integration: (QUAN ≈ QUAL): Both used in interpretation

Explanatory sequential (QUAN→QUAL)

Andujar (2020)
  Objective: To examine the impact of mobile-mediated dynamic assessment
  Data collection/analysis: QUAN: developmental test; frequency of feedback. QUAL: interaction between the teacher and students on WhatsApp
  Integration: (QUAN > QUAL): QUAL to give additional interpretation of QUAN results

Bakla (2018)
  Objective: To explore learner-generated content in a flipped class
  Data collection/analysis: QUAN: grade scores; task completion scores; questionnaire. QUAL: interviews
  Integration: (QUAN ≈ QUAL): QUAL on participants chosen from QUAN results

Bryfonski & Sanz (2018)
  Objective: To investigate the opportunities and effects of corrective feedback during study abroad
  Data collection/analysis: QUAN: the number and types of feedback over time; tailor-made L2 test. QUAL: interviews of study abroad experience; stimulated recall of feedback episodes
  Integration: (QUAN ≈ QUAL): Both sequentially used to give a broader picture for the objective

Peltonen (2018)
  Objective: To investigate the effects of L1 fluency on L2 fluency
  Data collection/analysis: QUAN: temporal measures of L1 and L2 speech samples. QUAL: in-depth transcript analyses of learners with high frequencies of stalling mechanisms
  Integration: (QUAN > QUAL): QUAL to explain individual differences found in QUAN results

Mitchell et al. (2020)
  Objective: To explore L2 learners' engagement changes after study abroad
  Data collection/analysis: QUAN: questionnaire of language engagement. QUAL: questionnaire and in-depth interviews after study abroad
  Integration: (QUAN < QUAL): QUAN to give background information for QUAL component

Rahimi & Fathi (2021)
  Objective: To examine the impact of wiki-based collaborative writing on L2 writing
  Data collection/analysis: QUAN: experimental design. QUAL: transcripts of interaction; interviews
  Integration: (QUAN ≈ QUAL): QUAL to explain differences found in the experimental results

Sasaki et al. (2018)
  Objective: To trace the development of language learning strategies
  Data collection/analysis: QUAN: proficiency test; writing test; coded protocol data. QUAL: verbal protocol; interviews
  Integration: (QUAN > QUAL): QUAL to explain QUAN results

Teng et al. (2020)
  Objective: To examine the relationship between motivational regulation strategies and writing skills
  Data collection/analysis: QUAN: questionnaire of motivational regulation strategies. QUAL: interviews
  Integration: (QUAN > QUAL): QUAL on participants chosen from QUAN results

Exploratory sequential (QUAL→QUAN)

Doolan (2021)
  Objective: To compare how L1 and L2 writers use the original source text
  Data collection/analysis: QUAL: L2 texts. QUAN: reliability of the developed coding scheme; comparison between L1 and L2 writers
  Integration: (QUAN > QUAL): QUAL to develop reliable coding scheme

Liao & Li (2020)
  Objective: To explore pedagogical practices of intercultural competence
  Data collection/analysis: QUAL: teaching materials; interviews; class observations; reflective journals. QUAN: descriptive statistics on the emerged themes from QUAL
  Integration: (QUAN < QUAL): QUAL to arrive at themes to statistically compare different participants

Rahmati et al. (2019)
  Objective: To compare teacher vision and motivation
  Data collection/analysis: QUAL: interviews. QUAN: questionnaire developed from QUAL
  Integration: (QUAN ≈ QUAL): QUAL to inform QUAN instrument

Salvador-García et al. (2020)
  Objective: To explore the impact of CLIL in physical education classes
  Data collection/analysis: QUAL: interviews. QUAN: questionnaire; experimental design
  Integration: (QUAN ≈ QUAL): QUAL to inform QUAN design and instruments
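Returning to the data-transformation option mentioned earlier (Onwuegbuzie & Teddlie, 2003), the following minimal sketch in Python illustrates what quantitizing can look like in practice. All data, code labels, and score values here are invented for illustration and are not drawn from any of the studies in Table 1:

```python
from collections import Counter

# Hypothetical coded themes from group-work transcripts (QUAL data):
# one list of theme codes per learner.
coded_themes = {
    "learner_01": ["clarification", "recast", "clarification"],
    "learner_02": ["recast"],
    "learner_03": ["clarification", "recast", "recast", "recast"],
}

# Hypothetical QUAN outcome: posttest gain scores for the same learners.
gain_scores = {"learner_01": 12, "learner_02": 4, "learner_03": 15}

# Step 1 (data transformation): quantitize the QUAL codes into frequencies.
recast_freq = {lid: Counter(codes)["recast"] for lid, codes in coded_themes.items()}

# Step 2: correlate the quantitized QUAL variable with the QUAN variable.
# (Pearson's r computed by hand to keep the sketch dependency-free.)
def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

learners = sorted(coded_themes)
r = pearson_r([recast_freq[l] for l in learners],
              [gain_scores[l] for l in learners])
print(f"r between recast frequency and gain score: {r:.2f}")
```

The workflow, not the numbers, is the point: the QUAL codes become a numeric variable that can enter the same correlational analysis as any QUAN measure.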

4. Advice for future mixed methods researchers

4.1 Plan ahead and choose the right MMR design

As much as MMR is suited for investigating complex issues related to L2 teaching and learning, designing and conducting an MMR study usually takes longer than conducting a QUAN or QUAL study. Simply put, you need to prepare both QUAN and QUAL data collection/analysis instruments. Importantly, however, an MMR study can be either fixed or emergent. In the emergent approach, you can be reactive and


design a new phase of data collection (as in the explanatory and exploratory sequential designs). Alternatively, you may plan to use the convergent design. Either way, you must be cognizant of the abilities and limitations inherent in the research design, and actively work to minimize the limitations, whether in the initial planning stages or as a retrospective understanding of the results of your study. I list some guiding questions that may help the reader decide which MMR design to use:

– Do I want to collect more comprehensive data than QUAN or QUAL data alone (convergent)?
– Do I want to collect data that explains the statistical results (convergent or explanatory)?
– Is my statistical design exploratory in nature, and do I anticipate unexpected findings (explanatory)?
– Do I believe the current theoretical framework is adequate for investigating my research problem (exploratory)?
– Do I have appropriate data collection tools for answering my research question (exploratory)?

Depending on your answers, you can decide which MMR design is best for your study. More complex MMR designs with different combinations and integrations of QUAN and QUAL components can be found in Creswell and Plano Clark (2018). Importantly, both QUAN and QUAL components need to be sufficiently rigorous; as Creswell and Plano Clark (2018) cautioned, "[j]ust because a study is mixed methods does not mean that the researchers can relax the standards for qualitative and quantitative research" (p. 182).

4.2 Collaborate with other researchers

Although discussing the paradigmatic and philosophical issues surrounding MMR may not be central to publishing an MMR study (but see King & Mackey, 2016; Riazi, 2016), the skills needed to conduct MMR are. I have seen many manuscripts in which L2 learners' perceptions of a classroom intervention were explored. In those cases, the impact of the intervention was examined with a QUAN method, and learners' perceptions were explored with a QUAL method. However, in some studies, the QUAL method (e.g., the development of interview questions and analysis of the interview data) is insufficiently justified and explained. I have also seen manuscripts in which inferential statistics were poorly executed although the QUAL component was strongly developed and executed. When you – whether QUAN-oriented or QUAL-oriented – do not have adequate skills for executing both QUAN and QUAL components, the best solution is to collaborate with other researchers whose skill set complements yours. Not only does this type of collaboration lead to strong MMR research, but it also contributes to collective efforts in examining an ISLA issue that is typically addressed by QUAN and QUAL researchers separately.





4.3 Collaborate with practitioners

MMR is ideal for conducting studies with high practical relevance and incorporating the participation of individuals with diverse backgrounds and skill sets (see the section on PBR). This way, your study is more likely to meet the ethical standards recommended for ISLA research as well (see Chapter 6). As Ortega (2012) argued, "interrogating the moral ends of research is of the utmost importance to enable good research, in the dual sense of more ethically useful and more valid research" (p. 210).

4.4 Be open-minded, critical, and creative

MMR is useful when you encounter unexpected findings. For instance, in a convergent design, the results of the QUAN and QUAL components may diverge. Rather than considering this as null findings or a failed data collection, you could interpret it as new findings and report it as is. Subsequently, you can add a new MMR component with an explanatory sequential design. Alternatively, the new component can use an exploratory sequential design in which you go back to the drawing board and generate a new theoretical direction. (I)SLA is a relatively new field; there are many unexplored research issues. Such an open-minded and critical attitude counters the thinking that statistically non-significant findings are automatically unpublishable. Meanwhile, ISLA research is increasingly incorporating theoretical frameworks and data collection instruments from other fields such as educational psychology. Assuming that already-existing instruments are appropriate for ISLA research runs a risk of low internal validity. In this sense, a creative MMR design could either (dis)confirm the already-existing framework (via the explanatory sequential design) or pave a path for a new research direction specific to ISLA (via the exploratory sequential design).

5. Conclusions

In this chapter, I argued that MMR is a pragmatic approach to complex ISLA research problems. I also discussed how MMR can challenge researchers' theoretical and methodological assumptions and, as a result, cultivate their open-mindedness, critical thinking, and creativity. An additional benefit of MMR as a methodological approach is that it helps the cause of ISLA research shared by researchers and practitioners – to discover more effective and efficient pedagogical techniques and materials. In pursuing this goal, collaboration is key. Within


the researcher community, MMR provides a framework for a project in which QUAN-oriented and QUAL-oriented researchers work together. As Spada (2019) stated, what is important for facilitating the use of research in the classroom is that "researchers working in the different traditions make concerted efforts to be as methodologically rigorous and as ecologically valid as possible within the constraints of their respective methodologies" (p. 214). Between the researcher and practitioner communities, MMR can be used to incorporate practitioners' voices and experiences in designing and implementing an ISLA study so that "mutual respect and effective communication" (Goldstein et al., 2019, p. 47) can be achieved.

6. Further reading

Readers can find more MMR studies of specific topics such as pragmatics (Ross & Hong, 2019; Taguchi, 2018), language testing (Jang et al., 2014; Turner, 2014), and social interaction (Dewey, 2017). For general reviews of MMR studies in the field of applied linguistics, see Hashemi and Babaii (2013); Hashemi and Gohari Moghaddam (2019); Ivankova and Creswell (2009); and Mackey and Bryfonski (2018). Creswell and Plano Clark (2018) and Tashakkori and Teddlie (2003) are classic MMR resources.

Creswell, J., & Plano Clark, V. (2018). Designing and conducting mixed methods research (3rd ed.). Sage.
Dewey, D. P. (2017). Measuring social interaction during study abroad: Quantitative methods and challenges. System, 71, 49–59. https://doi.org/10.1016/j.system.2017.09.026
Hashemi, M. R., & Babaii, E. (2013). Mixed methods research: Toward new research designs in applied linguistics. Modern Language Journal, 97(4), 828–852. https://doi.org/10.1111/j.1540-4781.2013.12049.x
Hashemi, M. R., & Gohari Moghaddam, I. (2019). A mixed methods genre analysis of the discussion section of MMR articles in applied linguistics. Journal of Mixed Methods Research, 13(2), 242–260. https://doi.org/10.1177/1558689816674626
Ivankova, N. V., & Creswell, J. W. (2009). Mixed methods. In J. Heigham & R. A. Croker (Eds.), Qualitative research in applied linguistics: A practical introduction (pp. 135–161). Palgrave Macmillan.
Jang, E. E., Wagner, M., & Park, G. (2014). Mixed methods research in language testing and assessment. Annual Review of Applied Linguistics, 34, 123–153. https://doi.org/10.1017/S0267190514000063
Mackey, A., & Bryfonski, L. (2018). Mixed methodology. In A. Phakiti, P. De Costa, L. Plonsky, & S. Starfield (Eds.), The Palgrave handbook of applied linguistics research methodology (pp. 103–121). Springer. https://doi.org/10.1057/978-1-137-59900-1_5
Ross, S. J., & Hong, Y. (2019). Mixed methods in L2 pragmatics research. In N. Taguchi (Ed.), The Routledge handbook of second language acquisition and pragmatics (pp. 212–225). Routledge. https://doi.org/10.4324/9781351164085-14




Taguchi, N. (2018). Description and explanation of pragmatic development: Quantitative, qualitative, and mixed methods research. System, 75, 23–32. https://doi.org/10.1016/j.system.2018.03.010
Tashakkori, A., & Teddlie, C. (Eds.). (2003). Handbook of mixed methods in social & behavioral research. Sage.
Turner, C. E. (2014). Mixed methods research. In A. J. Kunnan (Ed.), The companion to language assessment (pp. 1403–1417). Wiley Blackwell. https://doi.org/10.1002/9781118411360.wbcla142

References

Aloraini, N., & Cardoso, W. (2020). Social media in language learning: A mixed-methods investigation of students' perceptions. Computer Assisted Language Learning. Advance online publication. https://doi.org/10.1080/09588221.2020.1830804
Andujar, A. (2020). Mobile-mediated dynamic assessment: A new perspective for second language development. ReCALL, 32(2), 178–194. https://doi.org/10.1017/S0958344019000247
Bakla, A. (2018). Learner-generated materials in a flipped pronunciation class: A sequential explanatory mixed-methods study. Computers & Education, 125, 14–38. https://doi.org/10.1016/j.compedu.2018.05.017
Borg, S., & Liu, Y. (2013). Chinese college English teachers' research engagement. TESOL Quarterly, 47(2), 270–299. https://doi.org/10.1002/tesq.56
Brutt-Griffler, J., & Jang, E. (2019). Dual language programs: An exploration of bilingual students' academic achievement, language proficiencies and engagement using a mixed methods approach. International Journal of Bilingual Education and Bilingualism, 1–22. https://doi.org/10.1080/13670050.2019.1616670
Bryfonski, L., & Sanz, C. (2018). Opportunities for corrective feedback during study abroad: A mixed methods approach. Annual Review of Applied Linguistics, 38, 1–32. https://doi.org/10.1017/S0267190518000016
Coburn, C. E., & Penuel, W. R. (2016). Research-practice partnerships in education: Outcomes, dynamics, and open questions. Educational Researcher, 45(1), 48–54. https://doi.org/10.3102/0013189X16631750
Cukurova, M., Luckin, R., & Baines, E. (2018). The significance of context for the emergence and implementation of research evidence: The case of collaborative problem-solving. Oxford Review of Education, 44(3), 322–337. https://doi.org/10.1080/03054985.2017.1389713
Denscombe, M. (2008). Communities of practice: A research paradigm for the mixed methods approach. Journal of Mixed Methods Research, 2(3), 270–283. https://doi.org/10.1177/1558689808316807
Doolan, S. M. (2021). An exploratory analysis of source integration in post-secondary L1 and L2 source-based writing. English for Specific Purposes, 62, 128–141. https://doi.org/10.1016/j.esp.2021.01.003
Egeberg, H. M., McConney, A., & Price, A. (2016). Classroom management and national professional standards for teachers: A review of the literature on theory and practice. Australian Journal of Teacher Education, 41(7), 1–18. https://doi.org/10.14221/ajte.2016v41n7.1
Farley-Ripple, E., May, H., Karpyn, A., Tilley, K., & McDonough, K. (2018). Rethinking connections between research and practice in education: A conceptual framework. Educational Researcher, 47(4), 235–245. https://doi.org/10.3102/0013189X18761042


Ghiara, V. (2020). Disambiguating the role of paradigms in mixed methods research. Journal of Mixed Methods Research, 14(1), 11–25. https://doi.org/10.1177/1558689818819928
Goldstein, H., McKenna, M., Barker, R. M., & Brown, T. H. (2019). Research-practice partnership: Application to implementation of multi-tiered system of supports in early childhood education. Perspectives of the ASHA Special Interest Groups, 4(1), 38–50. https://doi.org/10.1044/2018_PERS-ST-2018-0005
Hargreaves, D. (2000). Teaching as a research-based profession: Possibilities and prospects. In B. Moon, J. Butcher, & E. Bird (Eds.), Leading professional development in education (pp. 200–210). Routledge.
Hulstijn, J. H., Young, R. F., Ortega, L., Bigelow, M., DeKeyser, R., Ellis, N. C., Lantolf, J. P., Mackey, A., & Talmy, S. (2014). Bridging the gap: Cognitive and social approaches to research in second language learning and teaching. Studies in Second Language Acquisition, 36(3), 361–421. https://doi.org/10.1017/S0272263114000035
Johnson, R., & Onwuegbuzie, A. (2004). Mixed methods research: A research paradigm whose time has come. Educational Researcher, 33(7), 14–26. https://doi.org/10.3102/0013189X033007014
Johnson, R. B., Onwuegbuzie, A. J., & Turner, L. A. (2007). Toward a definition of mixed methods research. Journal of Mixed Methods Research, 1(2), 112–133. https://doi.org/10.1177/1558689806298224
Joyce, K. E., & Cartwright, N. (2020). Bridging the gap between research and practice: Predicting what will work locally. American Educational Research Journal, 57(3), 1045–1082. https://doi.org/10.3102/0002831219866687
King, K. A., & Mackey, A. (2016). Research methodology in second language studies: Trends, concerns, and new directions. Modern Language Journal, 100(S1), 209–227. https://doi.org/10.1111/modl.12309
Kormos, J., & Préfontaine, Y. (2017). Affective factors influencing fluent performance: French learners' appraisals of second language speech tasks. Language Teaching Research, 21(6), 699–716. https://doi.org/10.1177/1362168816683562
Larsen-Freeman, D. (2015). Research into practice: Grammar learning and teaching. Language Teaching, 48(2), 263–280. https://doi.org/10.1017/S0261444814000408
Leite, C., Fernandes, P., & Figueiredo, C. (2020). National curriculum vs. curricular contextualisation: Teachers' perspectives. Educational Studies, 46(3), 259–272. https://doi.org/10.1080/03055698.2019.1570083
Levin, B. (2013). To know is not enough: Research knowledge and its use. Review of Education, 1(1), 2–31. https://doi.org/10.1002/rev3.3001
Li, J., McLellan, R., & Forbes, K. (2021). Investigating EFL teachers' gender-stereotypical beliefs about learners: A mixed-methods study. Cambridge Journal of Education, 51(1), 19–44. https://doi.org/10.1080/0305764X.2020.1772720
Liao, H., & Li, Y. (2020). Intercultural teaching approaches and practices of Chinese teachers in English education: An exploratory mixed methods study. Language Teaching Research. Advance online publication. https://doi.org/10.1177/1362168820971467
Loewen, S. (2020). Introduction to instructed second language acquisition (2nd ed.). Routledge. https://doi.org/10.4324/9781315616797
Loewen, S., & Hui, B. (2021). Small samples in instructed second language acquisition research. Modern Language Journal, 105(1), 187–193. https://doi.org/10.1111/modl.12700




Marsden, E., & Kasprowicz, R. (2017). Foreign language educators' exposure to research: Reported experiences, exposure via citations, and a proposal for action. Modern Language Journal, 101(4), 613–642. https://doi.org/10.1111/modl.12426
Maxcy, S. J. (2003). Pragmatic threads in mixed methods research in the social sciences: The search for multiple modes of inquiry and the end of the philosophy of formalism. In A. Tashakkori & C. Teddlie (Eds.), Handbook of mixed methods in social & behavioral research (pp. 51–89). Sage.
Maxwell, J. A. (2016). Expanding the history and range of mixed methods research. Journal of Mixed Methods Research, 10(1), 12–27. https://doi.org/10.1177/1558689815571132
McIntyre, D. (2005). Bridging the gap between research and practice. Cambridge Journal of Education, 35(3), 357–382. https://doi.org/10.1080/03057640500319065
Mitchell, R., Tracy-Ventura, N., & Huensch, A. (2020). After study abroad: The maintenance of multilingual identity among anglophone languages graduates. Modern Language Journal, 104(2), 327–344. https://doi.org/10.1111/modl.12636
Nakatsuhara, F., Inoue, C., Berry, V., & Galaczi, E. (2017). Exploring the use of video-conferencing technology in the assessment of spoken language: A mixed-methods study. Language Assessment Quarterly, 14(1), 1–18. https://doi.org/10.1080/15434303.2016.1263637
Onwuegbuzie, A., & Teddlie, C. (2003). A framework for analyzing data in mixed methods research. In A. Tashakkori & C. Teddlie (Eds.), Handbook of mixed methods in social & behavioral research (pp. 351–383). Sage.
Ortega, L. (2012). Epistemological diversity and moral ends of research in instructed SLA. Language Teaching Research, 16(2), 206–226. https://doi.org/10.1177/0267658311431373
Peltonen, P. (2018). Exploring connections between first and second language fluency: A mixed methods approach. Modern Language Journal, 102(4), 676–692. https://doi.org/10.1111/modl.12516
Plano Clark, V. L. (2019). Meaningful integration within mixed methods studies: Identifying why, what, when, and how. Contemporary Educational Psychology, 57, 106–111. https://doi.org/10.1016/j.cedpsych.2019.01.007
Rahimi, M., & Fathi, J. (2021). Exploring the impact of wiki-mediated collaborative writing on EFL students' writing performance, writing self-regulation, and writing self-efficacy: A mixed methods study. Computer Assisted Language Learning. Advance online publication. https://doi.org/10.1080/09588221.2021.1888753
Rahmati, T., Sadeghi, K., & Ghaderi, F. (2019). English language teachers' vision and motivation: Possible selves and activity theory perspectives. RELC Journal, 50(3), 457–474. https://doi.org/10.1177/0033688218777321
Resnik, P., & Dewaele, J.-M. (2020). Trait emotional intelligence, positive and negative emotions in first and foreign language classes: A mixed-methods approach. System, 94, 102324. https://doi.org/10.1016/j.system.2020.102324
Révész, A., Michel, M., & Lee, M. (2019). Exploring second language writers' pausing and revision behaviors: A mixed-methods study. Studies in Second Language Acquisition, 41(3), 605–631. https://doi.org/10.1017/S027226311900024X
Riazi, A. M. (2016). Innovative mixed-methods research: Moving beyond design technicalities to epistemological and methodological realizations. Applied Linguistics, 37(1), 33–49. https://doi.org/10.1093/applin/amv064


Salvador-García, C., Capella-Peris, C., Chiva-Bartoll, O., & Ruiz-Montero, P. J. (2020). A mixed methods study to examine the influence of CLIL on physical education lessons: Analysis of social interactions and physical activity levels. Frontiers in Psychology, 11(578). https://doi.org/10.3389/fpsyg.2020.00578
Sánchez-Hernández, A. (2018). A mixed-methods study of the impact of sociocultural adaptation on the development of pragmatic production. System, 75, 93–105. https://doi.org/10.1016/j.system.2018.03.008
Sandoval, W. (2014). Conjecture mapping: An approach to systematic educational design research. Journal of the Learning Sciences, 23(1), 18–36. https://doi.org/10.1080/10508406.2013.778204
Sasaki, M., Mizumoto, A., & Murakami, A. (2018). Developmental trajectories in L2 writing strategy use: A self-regulation perspective. Modern Language Journal, 102(2), 292–309. https://doi.org/10.1111/modl.12469
Sato, M. (under review). Instructed second language acquisition. In B. Spolsky & F. M. Hult (Eds.), The handbook of educational linguistics. John Wiley & Sons.
Sato, M., & Csizér, K. (2021). Introduction to combining learner psychology and ISLA research: Intersections in the classroom. Language Teaching Research, 25(6), 839–855. https://doi.org/10.1177/13621688211044237
Sato, M., Fernández Castillo, F., & Oyanedel, J. C. (2022). Teacher motivation and burnout of EFL teachers: Do demotivators really demotivate them? Frontiers in Psychology, 27. https://doi.org/10.3389/fpsyg.2022.891452
Sato, M., & Loewen, S. (2019). Methodological strengths, challenges, and joys of classroom-based quasi-experimental research: Metacognitive instruction and corrective feedback. In R. DeKeyser & G. Prieto Botana (Eds.), Doing SLA research with implications for the classroom: Reconciling methodological demands and pedagogical applicability (pp. 31–54). John Benjamins. https://doi.org/10.1075/lllt.52.03sat
Sato, M., & Loewen, S. (2022). The research-practice dialogue in second language learning and teaching: Past, present, and future. Modern Language Journal, 106(3). https://doi.org/10.1111/modl.12791
Spada, N. (2015). SLA research and L2 pedagogy: Misapplications and questions of relevance. Language Teaching, 48(1), 69–81. https://doi.org/10.1017/S026144481200050X
Spada, N. (2019). Balancing methodological rigor and pedagogical relevance. In R. DeKeyser & G. P. Botana (Eds.), Doing SLA research with implications for the classroom: Reconciling methodological demands and pedagogical applicability (pp. 201–215). John Benjamins. https://doi.org/10.1075/lllt.52.10spa
Tashakkori, A., & Creswell, J. (2007). Editorial: Exploring the nature of research questions in mixed methods research. Journal of Mixed Methods Research, 1(3), 207–211. https://doi.org/10.1177/1558689807302814
Teng, L. S., Yuan, R. E., & Sun, P. P. (2020). A mixed-methods approach to investigating motivational regulation strategies and writing proficiency in English as a foreign language contexts. System, 88, 102182. https://doi.org/10.1016/j.system.2019.102182
Tsang, A. (2020). Why English accents and pronunciation 'still' matter for teachers nowadays: A mixed-methods study on learners' perceptions. Journal of Multilingual and Multicultural Development, 41(2), 140–156. https://doi.org/10.1080/01434632.2019.1600528
Uprichard, E., & Dawney, L. (2019). Data diffraction: Challenging data integration in mixed methods research. Journal of Mixed Methods Research, 13(1), 19–32. https://doi.org/10.1177/1558689816674650

Chapter 5

Replication research in instructed SLA

Kevin McManus

Pennsylvania State University

Replication is a research methodology designed to verify, consolidate, and advance knowledge and understanding within empirical fields of study. By repeating a study's methodology (with or without change), a replication aims to better understand the nature and generalizability of a previous study's findings. This chapter introduces readers to the replication research process, beginning with a description of what replication research is, what the most common types of replication research are, and why carrying out replication is important. Close attention is paid to the types of research questions that replication studies are designed to investigate. This is followed by an overview of replication in the field of ISLA, with links to studies and resources. In addition, specific guidelines are provided for carrying out and reporting replication studies. Recommendations for future replications in ISLA are suggested as well as ways in which researchers can integrate replication into future programs of research.

Keywords: replication, ISLA, methodology, design

1. What is replication research and why is it important?

There is growing interest, curiosity, and, to some extent, consensus about the role and place of replication research in ISLA. For example, some journals now include replication studies as a specific manuscript type (e.g., Applied Psycholinguistics, Language Teaching, Studies in Second Language Acquisition); workshops, summer schools, and book-length guides are helping researchers design and report replication studies (e.g., Porte, 2012; Porte & McManus, 2019); reviews of the field are synthesizing current practices (e.g., Marsden et al., 2018); funding bodies are beginning to invest in the replication of influential studies (e.g., Dutch Research Council, Institute of Education Sciences); and some professional organizations now include replication studies as highly-valued scholarly products in their guidelines for tenure and promotion (e.g., American Association for Applied Linguistics). Even



though these initiatives represent just a small number of developments in our field, they point to an increasing awareness about the need for and importance of replication studies in growing and strengthening a discipline (Gass et al., 2021; McManus, 2022).

One likely reason why the reproducibility and replicability of research represents a talking point in ISLA is that replication can move us towards a more nuanced, finer-grained understanding about the nature of a specific study's findings (Gould & Kolb, 1967; Porte & McManus, 2019; Schmidt, 2009). This is important given (i) the impact of "failures to replicate" in social psychology when numerous classic and contemporary findings could not be reproduced (Klein et al., 2014; Open Science Collaboration, 2015), and (ii) the observation that our field is built on a small number of influential studies (e.g., Bailey et al., 1974; for review, see Myles, 2010). In sum, ISLA researchers are beginning to recognize that no single study provides all the answers, and replication, along with meta-analysis, allows us to address this (Allen & Preiss, 1993; Plonsky, 2012).

In ISLA, replication studies are critical since this line of research seeks to understand "how the systematic manipulation of the mechanisms of learning and/or the conditions under which they occur enable or facilitate the acquisition of an additional language" (Loewen, 2015, p. 2). The key term from this definition is "systematic manipulation." This is because a replication revisits a previous study, systematically manipulates the variable(s) of interest, and then repeats the study while keeping as many aspects of its methodology as similar as possible. Systematic comparisons are made throughout. In this way, a replication study examines in what ways intentionally modifying an initial study's methodology altered the findings. An extension study, in contrast, has a fundamentally different aim because the motivation here comes from the findings of multiple studies to investigate new research questions, using new methodologies and/or new analyses. The key difference is this: A replication systematically repeats a previous study (with or without changes) to contribute a more nuanced understanding of that study and its findings, but an extension study builds on what we know about a particular phenomenon by taking an existing line of research in new directions, applying it in new contexts, etc.

Thus, replication is key to growing and strengthening a discipline precisely because it draws conclusions through systematic manipulation, repetition, and comparison with a previous study. A replication study's aim is to always focus back on the initial study through systematic comparison. Replication is therefore a research methodology involving repetition and systematic comparison with the explicit aim to better understand and/or establish a piece of knowledge.




In this chapter, my aim is to introduce readers to some of the key components of replication research in ISLA. I begin by reviewing what the typical research questions in replication research are. I then present common types of replication research in our field, with examples. I end this chapter by troubleshooting some common issues in replication research as well as offering points of advice for future replication researchers. To meet these objectives, I mostly limit myself to discussions of replication and quantitative research with brief discussion of qualitative research (see Markee, 2017; Porte & Richards, 2012).

2. Research questions in replication research

All empirical studies begin with research questions and/or hypotheses of some kind (Cumming & Calin-Jageman, 2017), and replication studies are no different. A research question is integral to designing, conducting, and reporting an empirical study because it narrows the focus of the research and makes carrying out the study manageable (e.g., selecting data collection materials, analyzing the data). Because a core aim of replication research is to systematically revisit and make comparisons with a previous study, the starting point is to use the same research question(s) and hypotheses from that previous study (Porte & McManus, 2019). We can add variations to that research question to reflect the changes made, but, on the whole, the research question should be the same or very similar at the least.

For example, McManus and Liu (2022) is a close replication of Wu and Ortega's (2013) Mandarin Chinese elicited imitation test (EIT) study. The replication study followed Wu and Ortega (2013) very closely and made the following changes: (i) it excluded heritage language speakers, (ii) it excluded graduate students from the sample, and (iii) it added a new group of beginner learners. All other aspects of Wu and Ortega's (2013) design were the same (e.g., data coding, analyses). These changes were made to better understand how sampling in the initial study influenced the findings as well as the extent to which the EIT could also distinguish between finer-grained language abilities (see Section 4.1 for more information). When any changes are made, it is important for the replication study to clearly report these. One way to achieve this is by listing the same research questions from the initial study with some type of mark-up and explanation. The following extract shows how McManus and Liu (2022, p. 119) highlighted differences between the initial and the replication studies:


The original study included four research questions (RQ). We used three of W&O's [Wu & Ortega's] research questions. W&O's RQ2 is not included because it investigated performance differences between foreign language learners and heritage speakers. RQ1 is the same as the original study except for a minor change that reflects our variable modification (underlining in RQ1 indicates our modification). The original study's RQ3 and RQ4 remain the same in this replication:

RQ1: To what extent do Chinese EIT scores distinguish between three institutionally-defined high, low, and beginner language ability groups?

RQ2: What is the relationship between participants' performance on the Chinese EIT and their performance on an oral narrative task, evaluated in terms of three CAF-related measures: average numbers of clauses, motion clauses, and motion verb types?

RQ3: What features of the items might influence the outcomes observed for the EIT and help explain sources of varying item difficulty?




3. Common options for replication research in instructed SLA

In a replication study, the aim is to design and report a piece of research that repeats a previous study and makes comparisons with it. The extent to which a replication study is different from the previous one must be clearly reported, including in the title, abstract, and main text (Appelbaum et al., 2018). Common labels used in replication research to indicate the amount of change include 'exact replication' (or direct), 'close replication' (or partial), 'approximate replication,' and 'conceptual replication'. We will briefly review these main types of replication studies before looking in detail at an example of a close replication study and an approximate replication study in the field of ISLA.

In an exact replication, the initial study's entire procedure is followed without alteration. Applied to ISLA, this means that the data sample (i.e., participants) is the same, the research questions and design are the same, and the coding and analyses are the same. Clearly, carrying out an exact replication is probably one of the most difficult types of replication study to do. This is because even though it might be possible to locate and use the initial study's materials and analyses, it is very likely that we would struggle to keep the data sample the same. Of course, the sample could be similar (e.g., ESL learners in Australia vs. United Kingdom), but a variety of factors will make the population different (e.g., context, time, individual participant backgrounds). This is one reason why exact replication in the social sciences is immensely difficult if not "an unachievable objective" (Porte & McManus, 2019, p. 72). Indeed, Nosek and Errington (2020, p. 3) claim that "there is no such thing as exact replication," because there are always differences between the initial study and the replication. One exception to this, however, can include corpus-based work since corpora are time-stamped, and the content included in a corpus does not change over time, contexts, etc. As long as the replication has access to the same corpus as used in the initial study, then an exact replication is at least possible in ISLA. Of course, using the same corpus as a previous study does not make it an exact replication unless all other aspects of the study are also repeated.

A close replication is the most achievable type of replication in ISLA that allows for the most comparison with the initial study. In a close replication study, only one major variable is modified and all other aspects of the initial study are kept as constant as possible. This means that a replication study could modify, for example, the instruction, the outcome measure, or the L1, to understand how manipulating that one variable influenced the study's outcomes. A close replication is perhaps the clearest way to advance knowledge and understanding in the field because it sets out to investigate how changing a single variable impacts the study's findings. Examples of self-identified close replications include Waring (1997), McManus and Liu (2022), and McManus and Marsden (2018).


An approximate replication is quite similar to a close replication but with more room for modification. Here, two variables are modified (Porte & McManus, 2019). Then, those changes are compared with the initial study, following the same line of comparison as discussed for close replications. This means that everything else remains the same as in the initial study. Again, the more that is changed, the more difficult comparisons become. Examples of self-identified approximate replications include Booth (2013), Crossley and McNamara (2008), and Johnson and Nicodemus (2016).

The last type of replication to be discussed here is conceptual replication. A conceptual replication allows for almost all aspects of the initial study to be changed and involves "repetition of a test of a hypothesis or a result of earlier research with different methods" (Schmidt, 2009, p. 91). In other words, a conceptual replication study usually asks the same general research question or tests the same hypothesis as the initial study but draws on a different research methodology (e.g., different sample, different methods, different analyses). Compared to close and approximate replications, conceptual replications can be difficult to compare with the previous study because many variables may have been changed. This is an important point because the reason for conducting a replication study is to draw comparisons, but the more that is changed the more difficult drawing comparisons becomes. In sum, a conceptual replication asks the same general question as the initial study but investigates it in a new way. Examples of self-identified conceptual replications include Hiver and Al-Hoorie (2020), Rott and Gavin (2015), and Kessler et al. (2021).

Before looking at examples of close and approximate replication in our field, we should briefly discuss replication and qualitative research (Casanave, 2012; Markee, 2017; Matsuda, 2012). Here, debate has focused on whether replication is a useful and/or relevant research methodology in qualitative research (for discussion, see Porte & Richards, 2012). Markee (2017, p. 367) has noted that "the standard reaction to this question is often a resounding 'No!'" but suggests that qualitative research is interested in questions about qualitative phenomena and how they obtain across contexts. Instead of replication, Markee proposed the term "comparative re-production," defined as "the empirical study of qualitative phenomena that occur in one context, which are then shown also to obtain in another" (p. 367). A recent example of comparative re-production includes East (2021), who modified a previously designed task-based language teaching course to understand how a series of curricular changes contributed to students' understanding of task-based language teaching. Indeed, as Porte and Richards (2012) argue, the principles of replication are relevant to empirical studies using all types of research methodologies.




In the rest of this section, we look at two replication studies in the field of ISLA to understand some of the ways in which replications have been carried out. We use McManus and Marsden (2018) as an example close replication study and Eckerth (2009) as an example approximate replication study. The aim is to review each approach to replication to understand (i) the rationale/motivation for replication and (ii) how replication was approached.

Exemplar close replication study: McManus and Marsden (2018)

Initial study: McManus and Marsden (2017)

Background: McManus and Marsden (2017) investigated the extent to which explicit instruction about L2 and L1 was beneficial for the L2 learning of the French imparfait, a grammatical aspect form well-documented to be late-acquired due to complex L1-L2 differences in terms of viewpoint aspect form-meaning mappings (Howard, 2005; Kihlstedt, 2015). McManus and Marsden (2017) hypothesized that increasing learners' sensitivity to L1-L2 differences could be one way to facilitate L2 learning (see also McManus, 2019, 2021a). The instruction included (i) explicit information (EI) about aspect forms and (ii) comprehension practice of aspect forms in sentences. Using this design, two instructional treatments were created. The "L2-only" treatment included EI about aspect forms used in French, followed by comprehension practice of these forms in sentences. The "L2+L1" treatment included the same French EI and practice, but with additional EI about English aspect forms and comprehension practice of English sentences. A comparison (or 'control') group completed outcome tests at pretest, posttest, and delayed posttest without receiving any explicit instruction. McManus and Marsden (2017) reported short-term improvement among learners in the "L2-only" group in both offline (judgment tasks in reading and listening) and online (self-paced reading) tasks. However, these gains disappeared six weeks later at the delayed posttest. In contrast, improvements in the L2+L1 group on the same tasks were maintained six weeks later (i.e., at delayed posttest). Taken together, these findings indicated that explicit instruction about L2 and L1 was beneficial for improving L2 learners' online and offline comprehension of the French imparfait, whereas explicit instruction about L2 only resulted in limited gains that were not sustained at delayed posttest.

Motivation for replication: McManus and Marsden's (2018) close replication set out to better understand these findings given that the initial study presented some inconsistencies with previous research (e.g., online benefits following explicit instruction, see Andringa & Curcic, 2015). However, most importantly, the replication sought to better understand the effectiveness of the "L2+L1" treatment by investigating what role the additional L1 practice played. Even though some previous research has provided EI about L1 with mixed results, no previous research has provided L1 practice as an intentional design feature of the instruction. Therefore, the purpose of McManus and Marsden's (2018) close replication study was to better understand how the instructional components about L1 contributed to L2 performance in online and offline tests.


Design of the replication: The close replication modified one variable and kept all other aspects of the initial study the same. Therefore, the replication study used the same data collection materials, recruited participants that matched the criteria of the initial study, and analyzed the data using the same coding and analysis protocols. The variable modification involved repeating the "L2+L1" treatment but without the EI about L1. To do this, a new instructional treatment was created, titled "L2+L1prac", which included the same EI about L2 plus the same practice in L2 and L1 as in the "L2+L1" treatment. As a result, the difference between the "L2+L1" and the "L2+L1prac" treatments was that the "L2+L1" group received EI about L2 and L1, but the "L2+L1prac" group received EI about L2 only. The close replication study retained the L1 practice component to understand how providing additional EI about L1 influenced the initial study's findings. If results in the "L2+L1prac" and "L2+L1" groups patterned similarly, it could be inferred that the L1 EI likely played a minimal role. But, if "L2+L1prac" performance was like that in the "L2-only" group, it could be inferred that EI about L1 likely played a larger role in explaining the initial study's findings. Thus, the replication study was designed to better understand how the L1 instructional components contributed to the learning gains reported in the initial study.

Findings: The results showed that performance in the "L2+L1prac" group patterned like performance in the "L2-only" group, showing short-term and limited improvement compared to the "L2+L1" group. That is, by removing the L1 EI component from the "L2+L1" treatment, the results showed that L1 practice without EI about L1 did not lead to the same learning gains as found for learners in the "L2+L1" group.

Take-away: The replication concluded that the additional L1 practice by itself did not explain the previously documented learning gains. Rather, it was the combination of additional L1 EI and L1 practice that likely led to these L2 learning gains (see also McManus, 2021b; McManus & Marsden, 2019a, 2019b). This conclusion furnished through replication is an important step forward in our understanding of the ways in which explicit instruction about L1 can facilitate L2 learning.


Exemplar approximate replication study: Eckerth (2009)

Initial study: Foster (1998)

Background: For a long time now, ISLA researchers have been interested in understanding the ways in which interaction, usage, and negotiation of meaning shape L2 learning (see Gass, 2017) as well as the ways in which opportunities for interaction and negotiation of meaning can be maximized. At the same time, reviews of the field have expressed concern that much of our understanding of these phenomena is based on research studies carried out in "laboratory conditions rather than in actual classrooms" (Ellis, 1997, p. 230). Contributing to this debate, Foster (1998) investigated interaction and negotiation for meaning in classroom contexts. Using a classroom observation design (see Mackey & Gass, 2016), Foster (1998) examined L2 performance in information exchange tasks among ESL learners in dyads and small groups. Overall, she concluded that L2 performance in information exchange tasks was not influenced by working in dyads or groups. Furthermore, it was suggested that the group-level analyses had skewed the interpretation of the results because of the "disproportionate influence of a small number of the students" (p. 1).




Motivation for replication: Eckerth (2009) revisited Foster's study in an approximate replication in order to (i) "confirm or otherwise the results" and (ii) "shed more light on the validity of Foster's interpretation" of her findings (pp. 111–112), specifically that "'negotiating for meaning' is not a strategy that language learners are predisposed to employ when they encounter gaps in their understanding" (Foster, 1998, p. 1).

Design of the replication: The replication followed the initial study's research procedures, including participant profiles, classroom setting, data collection procedures, coding, and analyses. The differences with the initial study were that the replication (i) modified the data sample from ESL learners to L2 German learners and (ii) added a stimulated recall method. This methodological addition sought to shed light on Foster's claim that speakers perform differently in classroom versus laboratory contexts because the classroom is a more relaxed environment, leading to a "let it pass" strategy. By adding stimulated recall, the replication explored students' perceptions of the classroom setting in order to separate potential confounds between (i) the classroom setting itself and (ii) students' perceptions of the classroom setting. In addition to the stimulated recall, Eckerth focused on performance in dyads only.

Findings: In terms of the findings, Eckerth (2009) found that all students participated in the information exchange activities. In the initial study, however, Foster (1998) reported that not all participants interacted to the same extent. Furthermore, Eckerth found that all learners participated in some form of negotiation for meaning (e.g., confirmation checks, clarification requests, comprehension checks), whereas Foster reported that many students did no negotiation for meaning at all.

Take-away: If we move beyond the specific findings of Eckerth's replication, it is possible to see how replication added more nuance to the initial study's conclusions. For example, Eckerth's findings that all participants were involved in language production and negotiation for meaning contrasted with Foster's findings. Also, Foster claimed that the learners in her study may have let communication issues pass because the tasks and the classroom context may have led them to see the classroom as being relaxed and relatively informal. As Eckerth noted, however, this interpretation of the contextual effects on task performance was relatively speculative. The replication study was able to directly explore this issue by adding a new research tool, stimulated recall.

4. Troubleshooting replication research in ISLA

Up until this point, we have considered (i) the types of research questions that a replication study can investigate and (ii) common options for replication studies in the field of ISLA. In this section, suggestions for troubleshooting replication research are discussed. I limit my focus to three issues: variable modification (Section 4.1), statistical options (Section 4.2), and interpreting a replication study's findings (Section 4.3). These three issues have been prioritized because they represent some


of the most common obstacles to carrying out and reporting replication studies in the field (see also Porte & McManus, 2019). For each issue, a description of why it can be problematic is provided, followed by suggestions for ways to address the issue.

4.1 Considerations for variable modification

In a close or approximate replication, it is important to reflect on how variable modification is approached. With these types of replications, the aim is to keep the amount of intentional change between the initial and the replication studies to the smallest it can be. We also want to proceed with degrees of change instead of substantial changes. This is because our aim is to compare the outcomes of the two studies in order to understand how the change influenced the study's conclusions, if at all. If the change is substantial (e.g., an entirely new set of tasks), making comparisons is difficult. Porte and McManus (2019) discuss common candidates for variable modification in our field (e.g., participant characteristics, group/context, treatment/instruction type, task variables). While this is not an exhaustive list, it can be used to get a sense of how and why a variable might be selected for modification.

An additional point related to variable modification is that a replication study needs a reason why and how changes were made. The motivation will almost always come from previous work in this area. In McManus and Liu's (2022) replication of the Mandarin EIT, for example, sampling in the initial study's high group was modified because that grouping contained three different language abilities recruited from three levels of instruction. Sampling in the high group was modified by recruiting learners from two instructional levels. This procedure excluded the graduate student level. The rationale for making this change was as follows: "The replication excluded graduate-level students from the high group to understand the extent to which including these potentially more experienced and more proficient users of L2 Chinese may have elevated mean scores in the high group" (McManus & Liu, 2022, p. 130).

One way to approach variable modification is to begin with the research question and then design the replication to address that specific research question. A useful tool to facilitate this process is the pre-registration template for replication studies (Brandt et al., 2014). This template includes a list of primer questions about "the nature of the effect," "designing the replication study," "documenting differences between the original and the replication study," and "analysis and replication evaluation". We will take two of the questions from Brandt et al.'s Replication recipe template to show how this template can facilitate replication planning (see Brandt et al., 2014 for more detail).




in McManus and Liu (2022), a close replication was carried out to verify Wu and Ortega's (2013) finding that the Chinese EIT reliably distinguished between different instructional levels. In a close or approximate replication, the aim is to replicate the same result or findings as the initial study.

A second question that is important to consider is as follows: What differences between the original study and your study might be expected to influence the size and/or direction of the effect? This question encourages the researcher to think about how the variable modification might influence the effect to be replicated. In short, this involves making a prediction based on what is currently known in the field. In McManus and Marsden (2018), the instruction was modified to understand how instruction that excluded additional EI about L1 contributed to the previously reported L2 learning outcomes from McManus and Marsden (2017). The predictions thus required an account for why removing L1 EI might influence the size and/or direction of the effect. To do this, the replication drew on previous research about the role of EI in L2 learning (e.g., Marsden & Chen, 2011; Sanz & Morgan-Short, 2004) and predicted that learners would not benefit from the additional EI about L1 because they would be able to induce the EI from the practice. Thus, it was anticipated that the L2+L1prac condition would pattern like the L2+L1 condition and that L1 EI was not needed (incidentally, this prediction was not supported).

In summary, planning is an integral component of the research process that applies to replication studies as much as it does to any piece of empirical research. Before carrying out a replication study, it is important to plan what variable(s) will be modified and why. Brandt et al.'s (2014) Replication recipe template can be a helpful tool for planning a replication study.
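As a loose illustration only (not the actual template, which Brandt et al., 2014 provide in full), the planning information could be drafted as a simple structured record. The following Python sketch organizes fields around the four question areas named above, populated with paraphrased details from the McManus and Liu (2022) example discussed in this section:

```python
# A loose sketch of a replication-planning record, organized around the four
# question areas of Brandt et al.'s (2014) Replication recipe. Field values
# paraphrase the McManus & Liu (2022) example; this is not the full template.
replication_plan = {
    "nature of the effect": (
        "Wu & Ortega (2013): Chinese EIT scores distinguish "
        "institutionally-defined language ability groups."
    ),
    "designing the replication study": (
        "Close replication: same materials, coding, and analyses; "
        "modify sampling only."
    ),
    "documented differences": [
        "exclude heritage language speakers",
        "exclude graduate students from the high group",
        "add a beginner group",
    ],
    "analysis and replication evaluation": (
        "Compare descriptive results and effect sizes against the "
        "initial study."
    ),
}

# Print the plan as a readable checklist.
for field, value in replication_plan.items():
    print(f"{field.upper()}:\n  {value}\n")
```

Writing the plan down in a fixed structure like this, before data collection, makes every intentional departure from the initial study explicit and therefore easier to report transparently later.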

4.2 What statistical options are available in replication studies?

In a replication study, it is important to review (i) what statistical procedures were performed in the initial study and (ii) to what extent those procedures can be changed in the replication. This decision is especially important if the review of the initial study revealed ambiguous or missing information about the statistical procedures.

It should also be kept in mind that the researcher should not knowingly replicate problematic or questionable practices. For example, if the initial study's sample size was small, the replication should address this; calculating and reporting statistical power is one way to do so (see Norouzian, 2020). Equally, if the statistical tests used were not appropriate, those procedures should not be repeated in full knowledge that there are problems. In cases like these, it is important to make our observations known to the reader. If a large number of methodological and statistical issues need to be addressed, a conceptual replication is likely a good first route; the conceptual replication can then be revisited with a series of close replications (see Porte & McManus, 2019).
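To make the power recommendation concrete, the sketch below shows a minimal a-priori power check in Python with the statsmodels package. The effect size and sample sizes are illustrative placeholders, not values from any study discussed in this chapter.

```python
# A minimal a-priori power check for a two-group replication design.
# d = 0.50 and n = 20 are illustrative placeholders.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Sample size per group needed to detect d = 0.50 at alpha = .05 with 80% power
n_per_group = analysis.solve_power(effect_size=0.50, alpha=0.05, power=0.80)
print(f"n per group needed: {n_per_group:.1f}")  # ~64

# Power actually achieved if, say, only 20 participants per group are available
achieved = analysis.solve_power(effect_size=0.50, alpha=0.05, nobs1=20)
print(f"power with n = 20 per group: {achieved:.2f}")  # ~0.34
```

A check like this makes it explicit, for the reader as well as the replicator, whether the replication is adequately powered to detect the effect reported in the initial study.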


In the remainder of this subsection, two ways to address statistical procedures in replication studies are highlighted: reporting both sets of tests (from the initial and the replication studies) and, when available and appropriate, drawing comparisons from descriptive results.

First, because not all statistical tests share the same assumptions, one way to facilitate comparability with the initial study is to present the results from both tests if the replication carries out different statistical procedures. For example, McManus and Marsden (2018) calculated within-group effect sizes in two different ways, one set with and one set without a correction for the dependence (correlation) between the two means (Morris & DeShon, 2002; for a review of effect sizes in SLA research, see Plonsky & Oswald, 2014). The corrected calculation accounts for the correlated nature of pretest-posttest data, which, when not corrected for, can lead to inflated effect size values. The two calculations were presented to (i) remain consistent with the initial study and (ii) provide greater transparency and accountability in the replication's data analysis. The new analyses were presented in the online supplementary data files (see Chapter 15).

Second, in addition to presenting statistical tests, the descriptive results should be included as both a point of comparison and a source of the analysis (Plonsky, 2015). Many studies include the descriptive results in a data table that can be used for comparison, for example via effect sizes (see McManus & Liu, 2022, for an example). Using the descriptive results as a point of comparison between the studies, in addition to the statistical tests, allows the reader to gauge the comparability of the two studies more fully (McManus & Liu, 2022; Porte & McManus, 2019).
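The two calculations can be illustrated with a short sketch. The Python example below follows one common reading of Morris and DeShon (2002) rather than reproducing McManus and Marsden's (2018) actual analysis; the summary statistics are invented, and the final raw-score conversion is exact only when the pretest and posttest standard deviations are similar.

```python
# Within-group pretest-posttest effect sizes, with and without a correction
# for the dependence (correlation) between the two means.
from math import sqrt

m_pre, m_post = 10.2, 14.8   # pretest / posttest means (invented)
sd_pre, sd_post = 3.1, 3.4   # pretest / posttest SDs (invented)
r = 0.70                     # pretest-posttest correlation (invented)

# Uncorrected: standardize the gain by the pooled raw-score SD, ignoring r
sd_pooled = sqrt((sd_pre**2 + sd_post**2) / 2)
d_uncorrected = (m_post - m_pre) / sd_pooled

# Corrected: standardize by the SD of the gain scores, which builds in r ...
sd_diff = sqrt(sd_pre**2 + sd_post**2 - 2 * r * sd_pre * sd_post)
d_change = (m_post - m_pre) / sd_diff

# ... then convert back to the raw-score metric so the value is comparable
# with between-group effect sizes (exact only when sd_pre and sd_post are close)
d_corrected = d_change * sqrt(2 * (1 - r))

print(round(d_uncorrected, 2), round(d_change, 2), round(d_corrected, 2))
```

Reporting both values, as McManus and Marsden (2018) did, lets readers see how much the correction matters for the data at hand.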

4.3 Interpreting a replication's findings

A concern in replication research is how to interpret the replication's findings in light of the initial study. Marsden et al.'s (2018) narrative and systematic reviews of replication studies in SLA research indicated that the most common forms of comparison are narrative comparison (93%), mentioning the findings of the initial study (90%), and dichotomous interpretation based on null hypothesis significance testing (84%). The limitation of interpreting a replication's findings using methods like narrative comparison is that there are no agreed conventions on how to do so. For example, in terms of null hypothesis significance testing, the replication would likely look for a p-value that trended in the same direction as the initial study.







Section 4

Designing instructional interventions for specific skills & competencies

Chapter 7

Pragmatics
Assessing learning outcomes in instructional studies

Naoko Taguchi and Soo Jung Youn

Northern Arizona University / Daegu National University of Education

The field of instructed SLA (ISLA) has grown rapidly in recent years to examine how systematic manipulations of instructional conditions can lead to the development of second language (L2) knowledge and use (Loewen & Sato, 2017). Following this trend, L2 pragmatics researchers have implemented various instructional methods and examined their effectiveness using experimental designs (Taguchi, 2015; Taguchi & Roever, 2017). The critical issue in this practice is how to assess learning outcomes in order to make the claim that certain instructional methods can produce robust pragmatic knowledge. To address this question, this chapter presents an overview of assessment methods used in analyzing pragmatics learning outcomes. The chapter surveys instructed L2 pragmatics studies published in the last four decades to identify common assessment methods (i.e., discourse completion tasks and role-play tasks). We provide step-by-step illustrations of exemplar studies in order to demonstrate how researchers design an assessment task, evaluate learning, and interpret results. The chapter concludes with a discussion of the challenges and limitations of current methods and provides directions for addressing such challenges in pragmatics research.

Keywords: pragmatics, instructional studies, learning outcomes, assessment, DCT, role-play

1. Introduction: What is pragmatics and why is it important?

With the recognition that pragmatics, just like grammar and vocabulary, is a critical area of second language (L2) learning, researchers and teachers have explored a variety of methods to examine, evaluate, and teach pragmatic knowledge in L2 settings. Pragmatics reflects a dynamic interaction among linguistic forms, the meanings that the forms designate, and the sociocultural context where form-meaning pairings are realized.



The central role of context in pragmatics provides a broader perspective on our language use. Our linguistic choices are fundamentally grounded in a given social circumstance and communicative goal – the communicative functions we want to achieve (e.g., apologizing, complaining, or sympathizing), the impressions we want to convey (e.g., friendly, open, or funny), and the kinds of personal relationships we want to cultivate (e.g., intimate, informal, or distant). Depending on the circumstance and communicative goal, we select certain linguistic forms to convey our intentions. For example, there are a variety of forms we can use to greet someone (e.g., "Hello", "What's going on?"). Among those forms, we select a certain form based on our understanding of context (i.e., whom we are greeting in what setting). When the context is a friends' get-together, a casual greeting like "Hey" or "What's up?" is appropriate. But when we greet a new colleague in a work situation, we often use more formal expressions such as "Nice to meet you" or "How are you?" Our linguistic choice is also guided by our agency: depending on what kind of impression we want to make (e.g., being friendly or serious), we use specific expressions strategically.

As Verschueren (1999) claims, pragmatics is a "general cognitive, social, and cultural perspective on linguistic phenomena in relation to their usage in forms of behavior" (p. 7). This claim implies that knowledge of pragmatics is not limited to a cognitive dimension alone (e.g., syntactic knowledge, inferencing). Knowledge of cultural conventions and norms of language use, as well as knowledge of how to interact with others to achieve a communicative goal, are all critical aspects of pragmatic knowledge. Given this broader scope of what pragmatics entails, a challenge for researchers and practitioners is how to teach pragmatics and how to assess the associated learning outcomes. The last few decades have seen an expansion of teaching methods and approaches in different domains of pragmatics (e.g., pragmatic knowledge, awareness, perception, and performance) (Plonsky & Zhuang, 2019; Taguchi, 2015; Taguchi & Roever, 2017). A variety of instructional materials have been developed to teach pragmatics and, at the same time, different ways of assessing learning outcomes have been proposed.

This chapter discusses current instructional studies in L2 pragmatics, focusing on the methods used to assess learning outcomes. We first present a synthesis of instructional studies published from the 1980s up to 2021. The purpose of this review is to demonstrate how pragmatics has been taught and how learning outcomes have been assessed. Based on the synthesis, we present in-depth discussions of the two most common methods of assessing learning outcomes – discourse completion tasks (DCTs) and role-plays. Following these discussions, we present advice and directions for future researchers and practitioners. We conclude the chapter with tips for troubleshooting research designs in instructed pragmatics research.




2. What we know and what we need to know about pragmatics in ISLA¹

This section situates the chapter within the broader scope of instructed L2 pragmatics research. We present a synthesis of 77 instructional studies published from 1989 to March 2021, including the instructional studies published between 1989 and 2015 in Taguchi's (2015) comprehensive review (see Appendix). Following the four criteria below, instructional intervention studies in pragmatics were identified using database searches (e.g., LLBA, ERIC):

(1) The study examined how systematic manipulations of instructional conditions can lead to improved pragmatic knowledge (Loewen & Sato, 2017) by using a pre-/posttest design with or without a control group.
(2) The study described the teaching methods in detail.
(3) The study provided sufficient information about participants (e.g., age, years of formal study) and instructional settings (e.g., college-level language classrooms) to contextualize the instruction.
(4) The study's research design allowed the outcomes to be interpreted as effects of the instruction.

Each study was coded for target language, target pragmatic features, sample size, participants' L1, instructional methods, methods for assessing learning outcomes, and findings (see Appendix). In the following sections, we highlight general trends in the current literature, focusing on how pragmatics has been taught and how learning outcomes have been assessed.

¹ This chapter focuses on quantitative studies that assessed learning outcomes after instructional intervention. Studies using qualitative data (e.g., observations, interviews) are beyond the scope of this chapter.
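To make the coding scheme concrete, each study's record might be structured as in the minimal Python sketch below. The field names are our illustration rather than the actual coding sheet behind the Appendix, and the sample entry paraphrases Taguchi and Kim (2016), discussed later in this chapter.

```python
# A hypothetical per-study record for a synthesis of instructional studies,
# mirroring the coding variables listed above.
from dataclasses import dataclass

@dataclass
class StudyRecord:
    citation: str
    target_language: str
    pragmatic_features: list[str]
    sample_size: int
    participant_l1: str
    instructional_method: str      # e.g., "explicit", "implicit", "task-based"
    outcome_measures: list[str]    # e.g., ["written DCT", "open role-play"]
    findings: str

record = StudyRecord(
    citation="Taguchi & Kim (2016)",
    target_language="English",
    pragmatic_features=["request head acts", "request modifications"],
    sample_size=74,
    participant_l1="Korean",
    instructional_method="task-based (collaborative vs. individual)",
    outcome_measures=["written DCT"],
    findings="collaborative > individual on head acts at immediate posttest",
)
```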

2.1 Current trends in instructed pragmatics

Parallel to the growth of instructed SLA, instructed pragmatics has grown rapidly as a field that investigates the effectiveness of instruction in promoting pragmatic knowledge. In designing studies and developing instructional materials, researchers have adopted a variety of SLA theories representing both cognitive and social camps, including the noticing hypothesis, Skill Acquisition Theory, and Sociocultural Theory (for a review, see Taguchi & Roever, 2017).

Adopting Schmidt's (1993) noticing hypothesis, which capitalizes on the role of consciousness and attention in learning, a number of studies have compared instructional effects between explicit and implicit teaching methods. The former


provides direct information about form-function-context mappings (e.g., information about which linguistic forms to use when refusing someone's invitation), while the latter promotes learners' discovery of pragmatic rules through input exposure and consciousness-raising (e.g., identifying refusal forms in input). In her 2015 review, Taguchi concluded that explicit instruction was generally superior to implicit instruction. At the same time, implicit instruction was reported to be equally beneficial when it used meaningful activities that targeted pragmatic features. In addition, the effect of instruction over non-instruction was confirmed in the original review. These conclusions have been confirmed again in the current review, in additional target languages such as Chinese (Taguchi et al., 2017) and Korean (Kim et al., 2018), and in more diverse age groups, including young learners (e.g., Alemi & Haeri, 2020).

Going beyond the implicit-explicit comparison, recent studies have explored the effects of additional instructional methods grounded in distinct learning theories. For example, Taguchi and Kim (2016) adopted a task-based approach to teaching requests. They examined the effects of collaborative dialogues based on Swain and Lapkin's (1998) theoretical concept of language-related episodes (see the exemplar study in the next section). Their study demonstrated that negotiation of pragmatic knowledge while completing the task jointly promoted a deeper level of cognitive processing, which in turn led to effective pragmatic learning. In another study, Takimoto (2020) examined the effect of cognitive linguistic approaches, which use mnemonics (e.g., metaphor, spatial concepts) to help learners internalize sociopragmatic and pragmalinguistic aspects of pragmatics, in teaching requests to learners of English as a foreign language (EFL). Focusing on request strategies that invoke different degrees of hypotheticality, Takimoto designed instructional materials that explained the spatial concepts illustrating levels of politeness.

Another noticeable trend is the increasing number of instructional studies that involve technology platforms, both as learning environments and in outcome assessment methods. For example, Eslami et al. (2015) investigated the effect of form-focused explicit instruction on requests through asynchronous computer-mediated communication. The Iranian EFL learners in the two treatment groups (explicit vs. implicit) met with telecollaborative tutors who were graduate students in the U.S., while the learners in the control group received traditional in-class activities on the four language skills. Both treatment groups outperformed the control group, with greater learning gains in the explicit group than in the implicit group. Alemi and Haeri (2020) examined the effect of robot-assisted instruction for teaching the speech acts of requesting and thanking to young EFL learners. They used humanoid robots as teaching assistants to show learners how to play games, repeat sentences, and interact with each other, which yielded positive learning outcomes. These recent studies reflect ongoing trends in technology-mediated pragmatics teaching and learning (Sykes & González-Lloret, 2020; Taguchi & Sykes, 2013).




Several recent studies have used technology to design outcome assessment methods. Halenko and Jones (2017) developed a computer-animated production test involving role-play interaction with virtual figures. Learners listened to a virtual interlocutor (e.g., a librarian) and responded to the interlocutor's pre-recorded turn. By having learners engage in virtual role-play interaction, the researchers aimed to enhance the authenticity of the assessment. In another study, Sydorenko et al. (2020) assessed learning outcomes using self-accessed, technology-enhanced simulations. After watching a short video illustrating a situation, learners recorded a spoken response. For example, after watching a video of an instructor typing on a computer in an office, learners greeted the professor by saying "Hi Professor. Can I come in?" They then selected the option that matched the spoken response they had just recorded, which determined the next video sequence. This stepwise movement of turns lasted until the end of the conversation. The computer simulations were able to elicit extended sequences, including a greeting and pre-request, even without human interlocutors. These examples illustrate attempts to modify the existing formats of role-plays and DCTs in order to address some of the shortcomings of each measure (discussed in the next section).

3. Data elicitation and interpretation options for pragmatics

The goal of this chapter is to guide researchers through study conceptualization, design, implementation, and data analysis in instructional intervention research on pragmatics learning. Specifically, as part of analysis, we focus on the most common methods for assessing learning outcomes in instructional studies in pragmatics. Our review of 77 studies revealed a variety of tasks for assessing learning outcomes in instructed pragmatics research (see Appendix). Common assessment tasks, with brief descriptions, are provided below (see Culpeper et al., 2018, for details).

(1) Discourse completion task (DCT): Participants read a scenario and either write down or say what they would say in the given situation.
(2) Role-play: Participants read a scenario and act out assigned roles with an interlocutor.
(3) Appropriateness/acceptability judgment task: Participants read a speech act utterance (e.g., a request-making utterance) and judge the degree of appropriateness of the utterance (e.g., politeness and directness) using a Likert scale.
(4) Multiple-choice questions (or comprehension/recognition test): Participants read a scenario and a list of speech act utterances for the scenario, and select the utterance that is most appropriate in the situation.


(5) Tasks eliciting extended spoken or written discourse (e.g., emails, essays, interviews, conversations, discussions, and narratives): Participants complete a less structured task eliciting an extended written or spoken text. Target pragmatic features appearing in the text are analyzed (e.g., opening and closing expressions in emails; discourse markers in essays; sentence-final particles in conversations; politeness modals in academic discussions).

Among the existing tasks, two stood out as common methods: the DCT, used in 50% of the studies (the most common measure), and the role-play, used in 24% of the studies (the second most common measure). Based on these findings, we focus on these two assessment tasks in this chapter. Although both methods have been used widely in instructed pragmatics, they elicit and measure different aspects of pragmatic competence, which are closely tied to instructional goals. Given that instructional effects vary considerably depending on the choice of assessment method (Taguchi, 2015), researchers need to choose a learning outcome measure carefully in light of the targeted pragmatic features. In the next sections, we discuss the characteristics of DCTs and role-plays and provide step-by-step illustrations of each method.

3.1 Discourse completion tasks (DCTs)

3.1.1 Characteristics of DCTs

DCTs have been used widely to elicit a variety of speech acts (e.g., requests, apologies). A typical DCT first presents a brief scenario, followed by an open slot that participants complete by filling in a speech act utterance. The following is an example of a DCT item eliciting the speech act of apology (Blum-Kulka et al., 1989, p. 14). Participants are asked to imagine the given situation and produce the targeted speech act as if they were actually in the situation performing the role.

Situation: A student has borrowed a book from her teacher, which she promised to return today. When meeting her teacher, however, she realizes she forgot to bring the book.
Teacher: Miriam, I hope you brought the book I lent you.
Miriam: __________________________________________________

As this example illustrates, the situational description usually presents information about the context, such as the purpose of the interaction (i.e., to apologize) and the interlocutor relationship (i.e., teacher and student). The degree of imposition in the speech act can be inferred from the setting (i.e., what Miriam is apologizing




for). By using scenarios with different contextual factors, researchers can elicit and examine participants' ability to adapt their linguistic behavior to various social situations.

Using DCTs, we can assess learners' knowledge of the linguistic forms and strategies associated with a speech act as instructional outcomes (e.g., use of modals like "could" when making a request; closing a request with "thank you"). We can also assess their knowledge of modification devices used to soften the imposition of a speech act (e.g., hedging and imposition minimizers, as in "If possible, could you please turn down the music a little bit?"), if those forms are taught during instruction.

Earlier versions of DCTs (1980s–1990s) were predominantly in the written mode: participants were typically asked to write down what they would say in a given situation. As such, written DCTs do not elicit features specific to speech, such as fluency and prosodic features. Cohen and Shively (2007) highlight this mismatch, stating that a written DCT is an "indirect means for assessing spoken language in the form of a written production measure" (p. 196). To compensate for the limitations of written DCTs, oral DCTs were developed in the 2000s. Oral DCTs are often administered via a computer: after reading a situational description, participants give a spoken response, which the computer records automatically. Oral DCTs have helped us go beyond the traditional analysis of the morphosyntactic aspects of speech acts, extending to features that are specific to speech, such as oral fluency. Fluency measures (e.g., speech rate, pause length) help us examine pragmatics learning from a performance perspective; they provide an indirect reflection of learners' ease or difficulty in processing pragmatic information. Hence, oral DCTs are useful when instructional goals involve fluency development.

For example, Li (2012) investigated instructional effects on accurate and fluent recognition and production of request-making in L2 Chinese. Participants were assigned to three conditions: an intensive training group, a regular training group, and a control group. The intensive and regular groups received metapragmatic information on request forms, followed by receptive skill-based practice; the intensive group practiced twice as much as the regular group. The learners' recognition and production of the request forms were assessed with an oral DCT and a listening judgment test. Results showed that a larger amount of receptive-skills practice led to more accurate and fluent recognition of the request forms but did not extend to fluent production of the forms, although the practice promoted accuracy of production. In other words, receptive-skill-based practice improved accuracy in both comprehension and production tasks, but the effect did not transfer across modalities at the level of fluency.
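For researchers computing such measures themselves, the sketch below illustrates two common oral-DCT fluency indices. It assumes the response has already been segmented into timed speech runs and silent pauses (e.g., in Praat); all names and numbers are hypothetical.

```python
# Speech rate, articulation rate, and mean pause length for one oral-DCT
# response, from hypothetical pause-annotated timing data.
syllable_count = 42                                    # syllables produced
speech_runs = [(0.0, 3.1), (3.9, 7.4), (8.6, 12.0)]    # (start, end) in seconds
pauses = [(3.1, 3.9), (7.4, 8.6)]                      # silent pauses >= 0.25 s

total_time = speech_runs[-1][1] - speech_runs[0][0]    # 12.0 s overall
phonation_time = sum(end - start for start, end in speech_runs)

speech_rate = syllable_count / total_time              # includes pause time
articulation_rate = syllable_count / phonation_time    # excludes pause time
mean_pause = sum(end - start for start, end in pauses) / len(pauses)

print(f"speech rate: {speech_rate:.2f} syll/s")              # 3.50
print(f"articulation rate: {articulation_rate:.2f} syll/s")  # 4.20
print(f"mean pause length: {mean_pause:.2f} s")              # 1.00
```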


Exemplar study: Taguchi and Kim (2016)

Research questions
1. What is the effect of task-based pragmatic instruction on L2 pragmatic development? Are there any differences in the learning of the request speech act between the collaborative and individual task groups?
2. Are there any differences in the frequency of pragmatic-related episodes and the quality of task performance during instruction between the collaborative and individual task groups?

Theoretical framework
Collaborative dialogue

Methods
Participants: 74 learners of English were divided into three groups. The "collaborative group" received direct metapragmatic information on requests and constructed a request-making dialogue in pairs. The "individual group" received the same direct information on requests but completed the same task individually. The control group did not receive instruction.

Instruction: Instruction involved two components: explicit metapragmatic information and DCT-based dialogue construction tasks.

Metapragmatic information: Participants received scenarios involving two types of requests: a low-imposition request directed to someone of equal status (e.g., asking a friend for a pen) and a high-imposition request directed to someone of higher status (e.g., asking a teacher for permission to miss a class). The scenarios were followed by sample dialogues and explanations.

Low-imposition request: When you ask for a small favor from someone of equal status, you can say "Can I + verb?", "Can you + verb?", or "Could you + verb?" The request should be short. You do not have to say or explain anything in great detail.

High-imposition request: When you ask for a big favor from someone superior to you, you say more.
a. You start with a "preparator" to prepare the person for the request ("Hello xxxxx, do you have time? May I ask you something?", "Hello xxxxx, I am here to talk about xxxx").
b. You give a reason or explanation to support your request ("I was accepted to the national dancing contest, and …"). The reason has to be clear and detailed.
c. You use a long form of request ("I was wondering if I could" + verb, or "Is there any way that I could" + verb).
d. As an option, you can use an "amplifier" to emphasize your feeling (e.g., really, very). You can also use "hedging" to soften your tone (e.g., possibly, maybe).

DCT-based dialogue construction task: Participants were presented with a scenario, accompanied by a picture of the people in the scenario, and created a dialogue involving a request. The collaborative group created the dialogue in pairs, while the individual group created the dialogue alone while verbalizing their thought processes. The sample task scenario is shown below; in the task it was accompanied by a picture of the scene.




Situation: Look at the picture which displays a TV scene. Jeonghyun is a student representative at Yongshin middle school. Many students complain about the school’s old computers in the lab. In this scene, you will see Jeonghyun visiting the principal’s office to discuss this issue. Jeonghyun makes a request to the school principal to politely ask whether the old computers could be replaced by newer computers next year.

Methods for assessing learning outcomes: After completing the dialogue construction tasks, participants completed a written DCT involving 15 items (8 target items and 7 distractors). Three parallel versions of the DCT were prepared for the pretest, immediate posttest, and delayed posttest. Participants read a scenario and wrote down what they would say in that situation, as in the sample item below.

Situation: You are a student representative for your school. The national Korean soccer team is playing a game with Japan during the World Cup. You want to ask the vice principal of your school if your class can watch the game during the self-study hour. What do you say? (You go into the vice principal's office. He is sitting at the desk.)
You: _____________________________________________________
Vice principal: OK, I will tell teachers about the program.

Findings: In the DCT responses, request head acts (e.g., "I was wondering if …") and modifications (e.g., justification for the request) were identified and scored. The collaborative group outperformed the individual group on production of the request head act at the immediate posttest but did not maintain its advantage at the delayed posttest. No group differences were found for request modifications. An analysis of the during-task interaction and think-aloud data showed that the collaborative group produced the target head acts more successfully than the individual group, but no significant group differences were found in the use of modifications.
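As an illustration of how such DCT responses might be screened before hand-coding, the sketch below flags the bi-clausal head acts taught in the study. The regular-expression patterns and the 0-2 scoring are our invention for illustration, not Taguchi and Kim's actual coding scheme.

```python
# A first-pass screen for target request head acts in written DCT responses;
# flagged responses would still be hand-checked by coders.
import re

TARGET_HEAD_ACTS = [
    r"\bI was wondering if I could\b",
    r"\bis there any way (that )?I could\b",
]

def score_head_act(response: str) -> int:
    """2 = taught bi-clausal form; 1 = conventional 'can/could' request; 0 = other."""
    if any(re.search(p, response, re.IGNORECASE) for p in TARGET_HEAD_ACTS):
        return 2
    if re.search(r"\b(can|could) (you|i)\b", response, re.IGNORECASE):
        return 1
    return 0

print(score_head_act("I was wondering if I could watch the game."))  # 2
print(score_head_act("Can you help me?"))                            # 1
```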


3.1.2 Pros and cons of DCTs

Practicality is the main advantage of DCTs. After instruction, teachers and researchers can administer a DCT to a large number of participants in a single session. Because the format and scenarios of DCTs are standardized, data are comparable between pre- and posttests for determining instructional effects. DCTs also enable us to manipulate social factors in a scenario (power, social distance, and degree of imposition), so we can teach and assess how different social factors affect people's linguistic performance.

The major limitation of DCTs is that they do not elicit face-to-face interactional data. Because DCTs elicit only one-directional speech, they neglect critical aspects of spoken interaction such as turn-taking and the sequential organization of conversation. In addition, prosodic features such as intonation have rarely been analyzed in DCT data. Hence, DCT-elicited data often reduce the scope of instruction and assessment to discrete aspects of morphosyntactic knowledge and do not extend to actual language use in naturalistic situations.

Another limitation of DCTs is a lack of authenticity in the task (see Golato, 2003, on the mismatch between authentic and DCT-elicited compliment responses). Participants are presented with a situational scenario and asked to put themselves into an imagined role. Although some studies have presented a visual representation of the situation (e.g., a picture), most studies have used only a written text as the prompt. In oral DCTs, after reading a prompt, participants are asked to speak to a blank computer screen rather than to an actual person in the situation. This format inevitably undermines the authenticity of real-world speech interaction because in real life we do not typically speak to a computer screen without the presence of an interlocutor. Hence, DCTs may not be suitable for assessing transfer of knowledge to real-life situations, that is, how learners apply knowledge learned during instruction to the outside world. Yet DCTs are useful for eliciting and assessing learners' knowledge and repertoire of pragmalinguistic forms (see Section 5 on when to use DCTs).

3.1.3 Step-by-step DCT illustration

When creating a DCT, two steps are critical: (1) identifying appropriate situations for the target population and (2) confirming the situational differences across DCT items. For the example study discussed above (Taguchi & Kim, 2016), we chose scenarios that satisfy these two steps and conducted a pilot study.

(1) A survey involving 14 request-making situations was created. Each situation was followed by two 5-point Likert scales: one asked participants to rate how difficult the request was to perform, and the other asked them to rate how common the situation is:




Situation: You forgot to bring a cell phone. You need to ask your classmate to lend her cell phone to you to call your mother.

How difficult is it to ask this?              1 (very easy)  2  3  4  5 (very difficult)
How common do you think this situation is?    1 (very rare)  2  3  4  5 (very common)

In addition to the Likert-scale items, open-ended questions were included so that participants could write down request situations they had encountered in real-world settings.

(2) Pilot participants (n = 34) rated the degree of difficulty of completing the request described in each scenario. They also rated the degree of commonality of each situation.
(3) Participants' responses were tallied. The four situations that received higher difficulty ratings were selected as high-imposition requests, while the four situations with lower ratings were selected as low-imposition requests.
(4) Commonality ratings were checked to avoid using situations that do not represent real life.
(5) Participants' responses to the open-ended question were used when writing the situations.
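Steps (3) and (4) amount to a simple tallying-and-selection procedure, sketched below. The scenario names, ratings, and commonality cut-off are hypothetical; in the actual pilot, four high- and four low-imposition situations were selected from the 14 piloted scenarios.

```python
# Tally hypothetical pilot ratings, screen out uncommon situations,
# then select the lowest- and highest-difficulty scenarios.
from statistics import mean

# scenario -> (difficulty ratings, commonality ratings), both on 1-5 scales
pilot = {
    "ask friend for a pen":             ([1, 1, 2, 1], [5, 5, 4, 5]),
    "borrow classmate's phone":         ([2, 1, 2, 2], [5, 4, 5, 4]),
    "ask neighbor to turn music down":  ([3, 3, 4, 3], [4, 3, 4, 4]),
    "ask teacher to miss class":        ([5, 4, 5, 4], [4, 4, 3, 4]),
    "ask stranger to swap plane seats": ([4, 5, 4, 4], [2, 1, 2, 2]),
}

MIN_COMMONALITY = 3.0  # screen out situations rated as unrealistic (step 4)
rated = {
    scenario: mean(difficulty)
    for scenario, (difficulty, commonality) in pilot.items()
    if mean(commonality) >= MIN_COMMONALITY
}

ranked = sorted(rated, key=rated.get)   # scenarios ordered by mean difficulty
low_imposition = ranked[:2]             # in the study: the four lowest
high_imposition = ranked[-2:]           # in the study: the four highest
print(low_imposition, high_imposition)
```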

3.2 Role-play tasks

3.2.1 Characteristics of role-play tasks

Role-plays are simulations of communicative encounters in which participants take on certain social roles and act upon predetermined scenarios. As an alternative to naturally occurring interactions, role-plays allow researchers to focus on specific pragmatic actions and to manipulate participants' social roles in a more controlled setting. Compared to DCTs, role-plays are less commonly used as methods for assessing learning outcomes in instructional studies. However, as the analysis of pragmatic interaction has gained ground, we now have a better understanding of the scope of pragmatic ability that can be taught and assessed using role-plays.

The discussion of role-plays as a data collection method for L2 pragmatics dates back to the early 1990s. Kasper and Dahl (1991) distinguished open role-plays from closed role-plays. In closed role-plays, participants respond to an interlocutor's scripted initiation with a single-turn response. Interactional outcomes are thus predetermined, and participants do not engage in an extended conversation. Essentially, a closed role-play is quite similar to a one-turn spoken DCT as described in the previous section.


Open role-plays, on the other hand, allow more spontaneous interactions among participants without predetermined interactional outcomes. Often, open role-plays present participants with the consequences of their interaction and contingent details that they need to manage during the course of the interaction. The following is a sample open role-play task from Grabowski (2009, pp. 105–106), in which participants make and respond to a complaint. Two separate prompts are given to the two participants; notably, each participant is given a different communicative goal to allow for spontaneous interaction and negotiation.

Prompt 1
Roles: You: a tired person. Your partner: your 25-year-old neighbor.
Situation: It is 1:30 am on a Wednesday night. You have been trying to fall asleep for some time now, but you can't because of all the noise coming from your neighbor's apartment. This has been an ongoing problem with your neighbor. You get out of bed, go over to your neighbor's apartment, and knock on the door. Your neighbor opens the door.
Your goal: During the conversation, make sure your partner knows that you are frustrated by the situation.

Prompt 2
Roles: You: a 25-year-old. Your partner: your annoying neighbor.
Situation: It is 1:30 am on a Wednesday night. You and some of your friends are having a good time in your apartment. You are all listening to music and laughing. There is a knock on the door, so you go to the door and answer it. It's one of your neighbors. You find this neighbor very annoying because s/he is always telling you to turn down your music.
Your goal: During the conversation, make sure your partner knows you don't think you are too loud.

When using a role-play as a learning outcome measure in instructional research, it is crucial to understand the characteristics of the pragmatic targets taught and assessed during the role-play interaction. Depending on their theoretical and analytical frameworks, researchers have examined different aspects of pragmatic ability in role-play




interactions. Influenced by the cross-cultural pragmatics tradition, one line of research has investigated the types of pragmalinguistic strategies used to express directness, politeness, and illocutionary force (e.g., modals used in request-making). For example, Félix-Brasdefer (2007) employed open role-plays in a cross-sectional study that examined the development of requests in Spanish as a second language. He categorized requests into three strategy types (direct, conventionally indirect, non-conventionally indirect) and examined the frequency of pragmatic strategies and paralinguistic resources observed during role-play performances among L2 Spanish learners of different proficiency levels. He reported that high-proficiency learners used conventionally indirect request strategies more frequently, drawing on a wider range of pragmalinguistic strategies and complex grammatical resources. The detailed pragmalinguistic strategies and discourse moves found in Félix-Brasdefer (2007) represent the nature of L2 learners' pragmatic competence; however, the analysis did not reveal how the learners engaged in the role-play interaction.

In another line of research, role-plays have been used to examine how learners accomplish pragmatic actions sequentially, turn by turn, in extended spoken discourse (Kasper, 2006; Taguchi & Roever, 2017). Conversation analysis (CA) has commonly been used to analyze the characteristics of simulated role-play interaction (e.g., Al-Gahtani & Roever, 2012, 2018; Huth, 2010; Kasper & Youn, 2018; Nguyen, 2018; Okada, 2010; Stokoe, 2013). CA's focus on the sequential and temporal phenomena of social actions allows us to examine pragmatic speaking ability in an extended scope. In particular, CA-informed research has contributed to explaining what a pragmatically appropriate conversation entails; such research has revealed how to contextualize an upcoming action and how to place context-appropriate information reflective of what an interlocutor says in the prior turn. For example, Youn's (2020) turn-by-turn analysis of refusal role-play interactions demonstrated that highly proficient L2 learners oriented to the face-threatening nature of refusal sequentially, using between-turn delays, discourse markers (e.g., well, you know), and hesitation markers. Furthermore, their subsequent turns were contingent upon the interlocutor's response to their initial refusal, indicating that their sensitivity to the context was displayed sequentially. For example, when an interlocutor's response to the initial refusal was not the expected response (e.g., an immediate confirmation of the refusal), the learners added the possibility of complying with the request as a post-expansion (e.g., "if not, I can try to change my schedule") to minimize the degree of disaffiliation (Pomerantz & Heritage, 2013). These findings illustrate that fundamental interactional organizations, such as adjacency pairs and normatively expected sequence organizations, are observed in simulated role-play interaction. In the exemplar study below, Barón et al. (2020) used a role-play as a learning outcome measure.


Exemplar study: Barón, Celaya, and Levkina (2020)

Goal of the study
To investigate the effect of teaching speech acts in pragmatic interaction (giving opinions, agreeing/disagreeing, interrupting, acknowledging) using a task-based teaching approach, comparing a group that received task-supported pragmatic teaching with a group that received deductive pragmatic instruction with no tasks.

Research question
Does pragmatic instruction through TSLT (task-supported language teaching) enhance L2 pragmatic learning?

Participants
50 Catalan/Spanish EFL learners at B2 CEFR proficiency level in three intact classes. The first group (G1) received task-supported pragmatic instruction; the second group (G2) was instructed in pragmatics using a traditional deductive approach with no tasks. The third group (G3), the control group, received no pragmatic instruction and used no tasks.

Instruction
Students in both G1 and G2 received multi-phased pragmatic input and instruction across five two-hour lesson plans, with each lesson devoted to teaching specific speech acts. The main difference was that G1 was instructed through a series of tasks that differed in task complexity; students prepared and completed tasks such as holding a debate and conducting interviews. G2 completed a series of activities, such as fill-in-the-gap exercises, designed to develop learners' pragmatic awareness; students were given a situation and asked to choose the most appropriate expression.

Methods for assessing learning outcomes
Two role-play situations were designed, one with the roles of classmates and another with the roles of friends. The classmate situation is summarized below.

Situation for Student A: Imagine that you and your classmates are preparing an upcoming field trip. Of the various options for trip destinations, you love visiting museums and cities full of culture. Try to convince your partner to agree with your choice.

Situation for Student B: Imagine that you and your classmates are preparing an upcoming field trip. Of the various options for trip destinations, you love adventure trips (e.g., mountains, surfing, rafting). Try to convince your partner to agree with your choice.

Findings
Both G1 and G2 improved, confirming the positive effect of metapragmatic explanation, but no noticeable difference between G1 and G2 was found. In terms of speech act type, only acknowledging the interlocutor showed an advantage for G1 over G2, indicating a positive effect of task-supported instruction on interactional skills. The learners displayed limited variety in giving opinions and agreeing/disagreeing. The researchers pointed to the participants' age (teenagers) and the need for additional meaningful tasks as possible explanations for these results.




3.2.2 Pros and cons of role-play tasks

The biggest advantage of role-plays is their capacity to elicit a wide range of pragmatic speaking abilities as learning outcomes (e.g., preparing an interlocutor for an upcoming action, sequencing information in a context-appropriate manner, producing a turn while maintaining the continuity of talk). Role-plays thus help elicit and examine the target pragmatic act in an extended conversation. Another benefit is that role-plays ensure some degree of standardization in a relatively controlled setting compared to naturally occurring interaction. Just like DCTs, role-plays allow teachers and researchers to compare performances from pre- to post-instruction.

With the increased use of role-plays, we have to critically examine the authenticity of simulated role-play interaction in terms of its interactional organization (e.g., Hassall, 2020; Stokoe, 2013). This issue is also closely tied to the generalizability of role-play interaction to real-life interaction. On this point, Gumperz and Cook-Gumperz (1982, p. 11, as cited in Kasper, 2008) contend that participants' experiences with varying authentic situations can still serve as the basis for "recreating socially realistic experimental conditions" in role-plays. As long as the situations in role-plays are skillfully constructed and reflect participants' previous familiar experiences, it is highly plausible to observe participants' unconscious use of discourse strategies in role-plays.

The aforementioned advantages are closely tied to the shortcomings of role-plays. Role-plays are not suitable for all instructed learners, especially those with limited target-language proficiency and processing capacities. In addition, given the wide scope of pragmatic ability elicited in role-plays, they may not be suitable for certain pragmatic targets, such as a specific pragmalinguistic strategy or a particular grammatical construction. When such narrowly focused pragmatic features are taught, a DCT might be a more effective measure of learning outcomes. When choosing scenarios for role-plays, learners' language use needs and familiarity with the targeted situations (e.g., academic, institutional, everyday conversations) should be carefully considered. To design role-plays that elicit authentic interaction, we can utilize various technological applications and ensure the consequentiality of the role-play interaction (see Section 5 for how to design role-plays). Lastly, coordinating the administration of role-plays and scoring role-play interactions present major practical challenges: teachers and researchers often need to arrange individual meetings with participants to administer role-plays, and they spend a considerable amount of time rating role-play performances.


3.2.3 Step-by-step role-play illustration

When creating their role-play tasks, Barón et al. (2020) considered (a) the degree of social distance; (b) ways to ensure authentic role-play interaction with consequential outcomes, by asking each student to defend their choices; and (c) familiar situations that reflect students' everyday interaction. The researchers transcribed and coded the audio-recorded role-play interactions. To develop a coding scheme, the data were analyzed to identify pragmatic strategies and moves, which were then assigned scores corresponding to the directness level of the expressions. Depending on the speech act type, the researchers specified concrete examples of direct and indirect expressions.

4. Advice to future pragmatics researchers in ISLA: How to improve assessment methods of learning outcomes

Based on the characteristics of DCTs and role-plays outlined above, in this section we discuss how these methods for assessing learning outcomes can be improved in future research.

4.1 Incorporating prosody as instructional targets and learning outcomes

Current practice predominantly focuses on linguistic forms and semantic moves when teaching speech acts and assessing learning outcomes; prosody has largely been neglected in instructional studies in pragmatics. Yet suprasegmental features like intonation, pitch, stress, and voice quality can contribute greatly to pragmatic meaning-making. These features work as a set of contextualization cues (Gumperz, 1982), signaling to the listener how to interpret the speaker's intention (e.g., being sincere vs. sarcastic). Hence, we recommend that the prosodic features that accompany speech acts become part of instructional materials. Teachers can emphasize how tone, pitch, and stress help convey intention, politeness, and directness by exposing learners to different speech samples in a variety of social situations. Correspondingly, the scope of analysis of oral-DCT and role-play data should go beyond the typical features of linguistic forms and fluency, extending to features of prosody. Computer programs such as PRAAT (Boersma & Weenink, 2016) can help with the analysis of speech properties using visualization techniques. The systematic and objective acoustic analyses made possible by such programs can reveal L2 learners' abilities to use prosody to express meaning (Kang et al., 2021; Taguchi et al., 2021); in turn, such abilities can be taught explicitly in instruction.
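PRAAT itself is scriptable, and researchers working in Python can access Praat's analyses through the third-party parselmouth library. The sketch below extracts a few crude pitch measures from a recorded response; the file name is a placeholder, and such measures would normally be interpreted alongside careful listening.

```python
# A first acoustic pass over one recorded response using parselmouth,
# a Python interface to Praat. "learner_response.wav" is a placeholder.
import parselmouth

snd = parselmouth.Sound("learner_response.wav")
pitch = snd.to_pitch()

f0 = pitch.selected_array["frequency"]
f0 = f0[f0 > 0]  # drop unvoiced frames, which Praat reports as 0 Hz

print(f"mean F0: {f0.mean():.1f} Hz")
print(f"F0 range: {f0.min():.1f}-{f0.max():.1f} Hz")
print(f"duration: {snd.get_total_duration():.2f} s")
```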




4.2 Improving authenticity of DCTs and role-plays via technology applications

Using technology, DCTs can be improved by incorporating features that reflect authentic, real-life interaction. Particularly relevant are the visual and auditory input that can be incorporated into DCT scenarios. Some recent efforts have been made in this area. For example, Halenko (2013) used an internet-based animated-movie website to develop animated scenarios for DCTs. Rockey et al. (2020) developed a task to examine nonverbal aspects of attention-getting behaviors in request-making among L2 Spanish learners, using FlipGrid to deliver a video prompt and video-record participants' responses. Hashimoto and Nelson (2020) created a platform simulating an online Q&A community (Yahoo! Answers) to elicit advice-giving. Taguchi (2021) used immersive virtual reality (VR) technology to create an oral DCT and compared participants' speech acts (requests, refusals, opinions) with those elicited via a traditional computer-based DCT.

Technology can also improve the format of role-plays. For example, Chen (2019) developed semi-structured role-plays administered online. In her setup, learners either produced the first turn to initiate an action or listened to their interlocutor's pre-recorded turn and provided a relevant response. This exchange lasted several turns, reflecting the nature of an extended conversation. In such technology-assisted role-plays, researchers can arrange for participants to interact with an interlocutor via a computer or have participants record their responses by themselves, which can lessen the logistical burden involved in role-play administration.

While a variety of technology-enhanced DCTs and role-plays exist, so far very few instructional studies have used them to assess learning outcomes (e.g., Halenko & Jones, 2017; Sydorenko et al., 2020), and thus additional research is needed. Such research can address how effective instruction is in improving pragmatic knowledge in an authentic task and how learners can transfer learned knowledge to real-life situations.

4.3 Enhancing consequentiality in role-play interaction

In order to strengthen the authenticity of simulated role-play interaction and to promote transfer of learned pragmatic knowledge to the real world, a role-play task needs to be designed to be consequential, both as an instructional task and as an assessment measure. Simply providing general role-play situations and stating participants' social roles is not sufficient for a simulated role-play. Researchers need to carefully consider how a role-play set-up allows participants to engage in simulated interaction as naturally as possible, and how contingencies impact the course of interaction. To make a conversation consequential, participants should be provided with different interactional goals via role-play cards or should be given some information that is not shared by both participants. When assigning roles, we recommend considering participants' previous experiences so that the simulated interaction is grounded in participants' social identity and real-life experiences. Researchers also need to specify the degree to which the reality and consequentiality of role-plays is represented in their research reports. For example, researchers can describe the degree to which interactional outcomes are imposed on participants and how participants manage contingencies during role-play interaction (e.g., a participant does not need to know unknown factors, such as a schedule constraint, in advance, because these factors influence the likelihood of a request's acceptability).

Finally, since several pragmatic acts are inevitably face-threatening (e.g., making a complaint, disagreeing with someone), as part of ethical considerations, researchers need to consider the psychological and physiological impact of performing those acts. Since the manner in which such acts are performed can be a direct representation of participants' identity and personality, data should be kept confidential.
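One way to build these features into materials systematically is to template the role-play cards, as in the minimal sketch below. The card fields and all content are hypothetical; the point is that each participant holds a private goal and a private contingency that the partner cannot anticipate.

```python
# A hypothetical role-play card pair with non-aligned goals and
# information that is not shared across participants.
from dataclasses import dataclass

@dataclass
class RolePlayCard:
    role: str
    shared_situation: str    # visible to both participants
    private_goal: str        # known only to this participant
    private_constraint: str  # contingency the partner cannot anticipate

card_a = RolePlayCard(
    role="student",
    shared_situation="End of semester; you meet your professor at office hours.",
    private_goal="Get a one-week extension on the final paper.",
    private_constraint="You already received one extension this term.",
)

card_b = RolePlayCard(
    role="professor",
    shared_situation="End of semester; a student visits your office hours.",
    private_goal="Hold the deadline unless there is a documented reason.",
    private_constraint="Grades are due to the registrar in five days.",
)
```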


5. Troubleshooting ISLA pragmatics research

Common question 1: How can we choose the most appropriate method to assess learning outcomes? Should we choose DCTs or role-plays?

The appropriacy of each measure in instructional studies depends on the goals of instruction, the targeted pragmatic features, and learner characteristics. If instruction targets specific offline pragmatic knowledge (e.g., bi-clausal request forms), DCTs designed to elicit request pragmalinguistic knowledge can be used. On the other hand, when assessing the effect of task-based instruction on interactive pragmatic strategies (e.g., effective turn-taking strategies in request interaction), role-plays can provide convincing evidence of the intended pragmatic instruction. In addition, learner characteristics need to be considered when selecting an appropriate outcome measure. Even if pragmatic targets involve interactive pragmatic performances, a role-play might not be appropriate for young learners or low-proficiency learners due to its high cognitive demands. Instead, technology-assisted methods that combine the characteristics of role-plays and DCTs (cf. Sydorenko et al., 2020) may be more suitable.




Beyond DCTs and role-plays, other assessment methods can be implemented depending on instructional targets (see Culpeper et al., 2018, for further reading). For example, when we teach pragmatic features that occur frequently in everyday conversations, such as discourse markers (e.g., "you know," "I mean"), a conversation task in which learners talk about routine topics can be an appropriate measure. When we teach discussion strategies (e.g., how to interrupt others, how to agree or disagree with others), a discussion task conducted face-to-face or via video conferencing (e.g., Zoom) can be useful.

Common question 2: How can we determine assessment criteria and score bands when evaluating learning outcomes?

Determining the appropriacy of elicited pragmatic performance is not straightforward, as pragmatic norms differ across contexts. We recommend that researchers and teachers adapt existing assessment criteria that are known to be valid and reliable. In doing so, the degree to which the chosen rating criteria reflect the intended pragmatic targets needs to be further examined for both DCT-elicited and role-play data. In terms of rating scales, a decision can be made as to whether fine discriminations among learners are necessary. Some instructional studies use rating criteria on a scale of 1 to 10 and others use a scale of 1 to 5. Often, a wide range of rating levels is preferred in a large-scale proficiency test to create quantitative variance that distinguishes among a large number of learners and their varying language proficiencies. However, in a classroom context with a relatively small number of learners, a wide range of score bands is not necessarily effective. Inconsistent rating decisions between adjacent scores (e.g., 8 and 9) on a wide range of score bands can result in low inter-rater reliability. Thus, these issues need to be considered when making a principled decision regarding assessment criteria and score bands.
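To make the score-band point concrete, the short sketch below compares two raters' agreement on the same performances under a wide (1–10) and a collapsed (1–5) band. It is a minimal illustration, not part of the original discussion: the rater scores are invented, and it assumes scikit-learn is available for computing weighted Cohen's kappa.

```python
# Minimal sketch: inter-rater agreement under a wide vs. a collapsed score band.
# All rater scores below are invented for illustration.
from sklearn.metrics import cohen_kappa_score

# Two raters scoring the same ten role-play performances on a 1-10 band;
# note the frequent adjacent-score disagreements (e.g., 8 vs. 9).
rater1 = [8, 9, 7, 6, 9, 5, 8, 7, 6, 9]
rater2 = [9, 8, 8, 6, 7, 5, 7, 8, 7, 8]

# The same scores collapsed onto a 1-5 band (each pair of adjacent levels merged).
rater1_narrow = [(s + 1) // 2 for s in rater1]
rater2_narrow = [(s + 1) // 2 for s in rater2]

# Quadratically weighted kappa penalizes large disagreements more than small ones.
print(f"1-10 band: {cohen_kappa_score(rater1, rater2, weights='quadratic'):.2f}")
print(f"1-5 band:  {cohen_kappa_score(rater1_narrow, rater2_narrow, weights='quadratic'):.2f}")
```

Running such a check on pilot ratings can indicate whether a finer band adds real discrimination or merely multiplies adjacent-score disagreement.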
6. Conclusions

This chapter described DCTs and role-plays as the two most common assessment methods in instructed pragmatics research. DCTs can assess learners' pragmatic knowledge of linguistic forms (e.g., syntactic and lexical mitigations) associated with specific communicative functions (e.g., the speech acts of request and refusal). Role-plays, on the other hand, can assess learners' pragmatic performance as they collaboratively construct a social action with their interlocutors via turn-taking. The choice between these methods depends on instructional goals and materials.
If the goal is to expand learners' pragmalinguistic repertoire, form-focused materials are typically used in instruction, and the resulting learning outcomes are assessed with DCTs. In contrast, if the goal is to develop learners' ability to use pragmatic knowledge to interact with others, performance-based tasks such as role-plays are often used for both instruction and assessment. An important consideration for teachers is to determine learning objectives based on their students' needs and to design instructional materials accordingly. Methods for assessing learning outcomes should be developed based on what is taught and how it is taught. Critically, teachers should aim at adapting (rather than adopting) existing DCTs and role-plays according to their local contexts and needs. When adapting existing methods, teachers can carefully examine the pros and cons of targeted methods, as well as ways to overcome any limitations of existing methods. We believe that the areas for improvement outlined in this chapter (e.g., eliciting and assessing often-neglected areas such as prosody and consequentiality, contextual enrichment via technology applications) are helpful for this purpose.

7. Further reading and additional resources

7.1 Suggested books

Culpeper, J., Mackey, A., & Taguchi, N. (2018). Second language pragmatics: From theory to research. Routledge. https://doi.org/10.4324/9781315692388
Ishihara, N., & Cohen, A. D. (2010). Teaching and learning pragmatics. Routledge.
Roever, C. (2021). Teaching and testing second language pragmatics and interaction: A practical guide. Routledge. https://doi.org/10.4324/9780429260766
Taguchi, N., & Kim, Y. (Eds.). (2018). Task-based approaches to teaching and assessing pragmatics. John Benjamins. https://doi.org/10.1075/tblt.10

7.2 Suggested journals, professional organizations, and websites

Applied Pragmatics. https://benjamins.com/catalog/ap
Journal of Pragmatics. https://www.journals.elsevier.com/journal-of-pragmatics/
Intercultural Pragmatics. https://www.degruyter.com/journal/key/IPRG/html
International Pragmatics Association. https://pragmatics.international
Center for Advanced Research on Language Acquisition: Pragmatics and speech acts. https://carla.umn.edu/speechacts/index.html




References

Alemi, M., & Haeri, N. S. (2020). Robot-assisted instruction of L2 pragmatics: Effects on young EFL learners' speech act performance. Language Learning & Technology, 24(2), 86–103.
Al-Gahtani, S., & Roever, C. (2012). Proficiency and sequential organization of L2 requests. Applied Linguistics, 33(1), 42–65. https://doi.org/10.1093/applin/amr031
Al-Gahtani, S., & Roever, C. (2018). Proficiency and preference organization in second language refusals. Journal of Pragmatics, 129, 140–153. https://doi.org/10.1016/j.pragma.2018.01.014
Barón, J., Celaya, M. L., & Levkina, M. (2020). Learning pragmatics through tasks: When interaction plays a role. Applied Pragmatics, 2(1), 1–25. https://doi.org/10.1075/ap.18010.bar
Blum-Kulka, S., House, J., & Kasper, G. (1989). Cross-cultural pragmatics: Requests and apologies. Ablex.
Boersma, P., & Weenink, D. (2016). Praat: Doing phonetics by computer (Version 6.0.11) [Software]. Available from http://www.fon.hum.uva.nl/paul/praat.html
Chen, S. (2019, October 4–5). Developing an L2 pragmatic speaking test using conversation analysis findings [Paper presentation]. Annual Midwest Association of Language Testers (MwALT) conference, Indiana University Bloomington.
Cohen, A. D., & Shively, R. L. (2007). Acquisition of requests and apologies in Spanish and French: Impact of study abroad and strategy-building intervention. Modern Language Journal, 91(2), 189–212. https://doi.org/10.1111/j.1540-4781.2007.00540.x
Eslami, Z. R., Mirzaei, A., & Dini, S. (2015). The role of asynchronous computer mediated communication in the instruction and development of EFL learners' pragmatic competence. System, 48, 99–111. https://doi.org/10.1016/j.system.2014.09.008
Félix-Brasdefer, J. C. (2007). Pragmatic development in the Spanish as a FL classroom: A cross-sectional study of learner requests. Intercultural Pragmatics, 4(2), 253–286. https://doi.org/10.1515/IP.2007.013
Golato, A. (2003). Studying compliment responses: A comparison of DCTs and recordings of naturally occurring talk. Applied Linguistics, 24(1), 90–121. https://doi.org/10.1093/applin/24.1.90
Grabowski, K. C. (2009). Investigating the construct validity of a test designed to measure grammatical and pragmatic knowledge in the context of speaking [Unpublished doctoral dissertation]. Columbia University.
Gumperz, J. J. (1982). Discourse strategies. Cambridge University Press. https://doi.org/10.1017/CBO9780511611834
Gumperz, J. J., & Cook-Gumperz, J. (1982). Introduction: Language and the communication of social identity. In J. J. Gumperz (Ed.), Language and social identity (pp. 1–21). Cambridge University Press.
Halenko, N. (2013). Using computer animation to assess and improve spoken language skills. In Conference proceedings: ICT for language learning (p. 286). Libreriauniversitaria.
Halenko, N., & Jones, C. (2017). Explicit instruction of spoken requests: An examination of pre-departure instruction and the study abroad environment. System, 68, 26–37. https://doi.org/10.1016/j.system.2017.06.011
Hashimoto, B. J., & Nelson, K. (2020). Using a corpus in creating and evaluating a DCT. Applied Pragmatics, 2(1), 80–120. https://doi.org/10.1075/ap.19009.has


Hassall, T. (2020). Preference structure in request sequences: What about role-play? Journal of Pragmatics, 155, 321–333. https://doi.org/10.1016/j.pragma.2019.02.020
Huth, T. (2010). Can talk be inconsequential? Social and interactional aspects of elicited second-language interaction. Modern Language Journal, 94(4), 537–553. https://doi.org/10.1111/j.1540-4781.2010.01092.x
Kang, O., Kermad, A., & Taguchi, N. (2021). The interplay of proficiency and study abroad experience on prosody of L2 speech acts. Journal of Second Language Pronunciation, 7(3), 343–369. https://doi.org/10.1075/jslp.20024.kan
Kasper, G. (2006). Beyond repair: Conversation analysis as an approach to SLA. AILA Review, 19(1), 83–99. https://doi.org/10.1075/aila.19.07kas
Kasper, G. (2008). Data collection in pragmatics research. In H. Spencer-Oatey (Ed.), Culturally speaking: Culture, communication and politeness theory (pp. 279–302). Continuum.
Kasper, G., & Dahl, M. (1991). Research methods in interlanguage pragmatics. Studies in Second Language Acquisition, 13(2), 215–247. https://doi.org/10.1017/S0272263100009955
Kasper, G., & Youn, S. J. (2018). Transforming instruction to activity: Roleplay in language assessment. Applied Linguistics Review, 9(4), 589–616. https://doi.org/10.1515/applirev-2017-0020
Kim, M., Lee, H., & Kim, Y. (2018). Learning of Korean honorifics through collaborative tasks: Comparing heritage and foreign language learners. In N. Taguchi & Y. Kim (Eds.), Task-based approaches to teaching and assessing pragmatics (pp. 27–54). John Benjamins. https://doi.org/10.1075/tblt.10.02kim
Li, S. (2012). The effects of input-based practice on pragmatic development of requests in L2 Chinese. Language Learning, 62(2), 403–438. https://doi.org/10.1111/j.1467-9922.2011.00629.x
Loewen, S., & Sato, M. (Eds.). (2017). The Routledge handbook of instructed second language acquisition. Routledge. https://doi.org/10.4324/9781315676968
Nguyen, H. T. (2018). Interactional practices across settings: From classroom role-plays to workplace patient consultations. Applied Linguistics, 39(2), 213–235.
Okada, Y. (2010). Role-play in oral proficiency interviews: Interactive footing and interactional competencies. Journal of Pragmatics, 42(6), 1647–1668. https://doi.org/10.1016/j.pragma.2009.11.002
Plonsky, L., & Zhuang, J. (2019). A meta-analysis of second language pragmatics instruction. In N. Taguchi (Ed.), Routledge handbook of SLA and pragmatics (pp. 287–307). Routledge.
Pomerantz, A., & Heritage, J. (2013). Preference. In J. Sidnell & T. Stivers (Eds.), The handbook of conversation analysis (pp. 210–228). Wiley-Blackwell.
Rockey, C., Tiegs, J., & Fernández, F. (2020). Mobile application use in technology-enhanced DCTs. CALICO Journal, 37(1), 85–108. https://doi.org/10.1558/cj.38773
Schmidt, R. (1993). Consciousness, learning and interlanguage pragmatics. In G. Kasper & S. Blum-Kulka (Eds.), Interlanguage pragmatics (pp. 21–42). Oxford University Press.
Stokoe, E. (2013). The (in)authenticity of simulated talk: Comparing role-played and actual interaction and the implications for communication training. Research on Language and Social Interaction, 46(2), 165–185. https://doi.org/10.1080/08351813.2013.780341
Swain, M., & Lapkin, S. (1998). Interaction and second language learning: Two adolescent French immersion students working together. Modern Language Journal, 82(1), 320–337. https://doi.org/10.1111/j.1540-4781.1998.tb01209.x
Sydorenko, T., Jones, Z. W., Daurio, P., & Thorne, S. L. (2020). Beyond the curriculum: Extended discourse practice through self-access pragmatics simulations. Language Learning & Technology, 24(2), 48–69.




Sykes, J. M., & González-Lloret, M. (2020). Exploring the interface of interlanguage (L2) pragmatics and digital spaces. CALICO Journal, 37(1), i–xv. https://doi.org/10.1558/cj.40433
Taguchi, N. (2015). Instructed pragmatics at a glance: Where instructional studies were, are, and should be going. Language Teaching, 48(1), 1–50. https://doi.org/10.1017/S0261444814000263
Taguchi, N. (2021). Application of immersive virtual reality (VR) to pragmatics data collection methods: Insights from interviews. CALICO Journal, 38(2), 181–201. https://doi.org/10.1558/cj.41136
Taguchi, N., Hirschi, K., & Kang, O. (2021). Longitudinal L2 development in the prosodic marking of pragmatic meaning: Prosodic changes in L2 speech acts and individual factors. Studies in Second Language Acquisition, 1–16. Advance online publication. https://doi.org/10.1017/S0272263121000486
Taguchi, N., & Kim, Y. (2016). Collaborative dialogue in learning pragmatics: Pragmatic-related episodes as an opportunity for learning request-making. Applied Linguistics, 37(3), 416–437. https://doi.org/10.1093/applin/amu039
Taguchi, N., Li, Q., & Tang, X. (2017). Learning Chinese formulaic expressions in a scenario-based interactive environment. Foreign Language Annals, 50(4), 641–660. https://doi.org/10.1111/flan.12292
Taguchi, N., & Roever, C. (2017). Second language pragmatics. Oxford University Press.
Taguchi, N., & Sykes, J. M. (Eds.). (2013). Technology in interlanguage pragmatics research and teaching. John Benjamins. https://doi.org/10.1075/lllt.36
Takimoto, M. (2020). Investigating the effects of cognitive linguistic approach in developing EFL learners' pragmatic proficiency. System, 89. https://doi.org/10.1016/j.system.2020.102213
Verschueren, J. (1999). Understanding pragmatics. Oxford University Press.
Youn, S. J. (2020). Pragmatic variables in role-play design for the context validity of assessing interactional competence. Papers in Language Testing and Assessment, 9(1), 95–127.


Appendix. Instructional pragmatics studies published from 1989 to 2021 (77 studies)

Note. MAQ = metapragmatic awareness questionnaire; AJ = appropriateness/acceptability judgment task; MCQ = multiple-choice questionnaire; DCT = discourse completion test; Explicit = instruction with metapragmatic information; Implicit = instruction without metapragmatic information; TG = treatment group; CG = control group.

[Table spanning several pages: the 77 instructional pragmatics studies (from Ajabshir, 2020, through Yoshimi, 2001) are listed with the following columns: Study; Design (pre-post or pre-post-delayed, with or without a control group); Participants (L1 and n); L2; Pragmatic target(s); Treatment type; Outcome measure(s); Results.]

Chapter 8

Vocabulary
A guide to researching instructed second language vocabulary acquisition

Emi Iwaizumi and Stuart Webb
University of Western Ontario

Acquiring vocabulary knowledge is a vital part of L2 learning because vocabulary plays a significant role in every mode of communication (reading, listening, writing, and speaking). For learners to become independent users of an L2, they must know many thousands of words and learn how to use them well in communication. For example, learners of English must acquire up to 9,000 words (e.g., happy) and their morphologically related forms (e.g., happiness, unhappy, happily) to comprehend spoken and written texts (e.g., conversation, television programs, films, novels, and newspapers) (Nation, 2006; Webb & Rodgers, 2009a, 2009b). Moreover, acquisition of L2 vocabulary entails learning different aspects of word knowledge such as word parts, collocations, and associations, not only form-meaning connections. Thus, the teaching, learning, and researching of L2 vocabulary can be highly complex. The purpose of this chapter is to provide a guide to researching instructed second language vocabulary acquisition. The chapter sets out to provide (1) an overview of key concepts in vocabulary research, (2) a brief overview of L2 vocabulary research focusing on intervention studies, (3) an overview of a frequently employed study design (the pretest-posttest design) and different measures for assessing L2 vocabulary knowledge, as well as options and cautions for interpreting data, (4) advice for future vocabulary researchers, and (5) tips to overcome potential challenges.

Keywords: intentional vocabulary learning, incidental vocabulary learning, testing vocabulary knowledge, receptive and productive vocabulary knowledge, single-word items, multiword items



1. Conceptualizing vocabulary knowledge and why vocabulary research is important

Second language vocabulary acquisition has garnered a great deal of attention in the field of ISLA in the last few decades. This is likely because researchers, teachers, and learners agree that knowledge of vocabulary provides the foundation to communicate in an L2. In fact, lexical knowledge has been shown to significantly correlate with listening (Stæhr, 2008; Wang & Treffers-Daller, 2017), reading (Stæhr, 2008; Qian, 2002; Qian & Schedl, 2004), speaking (Crossley et al., 2014; Saito et al., 2016; Uchihara & Clenton, 2018; Uchihara & Saito, 2019), and writing (Shi & Qian, 2012; Stæhr, 2008) proficiency. The central goal of studies of instructed L2 vocabulary is to investigate how instructional intervention promotes development of L2 vocabulary knowledge. Thus, it is important to first define what it means to know a word.

There are several ways to conceptualize what it means to know a word (see Nation, 2013; Yanagisawa & Webb, 2020 for a review). However, the vocabulary components approach, which conceptualizes vocabulary knowledge as multidimensional, is the most widely employed categorization of L2 vocabulary knowledge used in studies of instructed L2 vocabulary learning. The components approach conceptualizes learning a word as involving mastering different aspects of word knowledge, not only the form-meaning connections of words. Such a conceptualization of vocabulary knowledge is often referred to as depth of knowledge (Anderson & Freebody, 1981; Henriksen, 1999; Nation, 2013; Read, 2007; Richards, 1976; Webb, 2013), which is often contrasted with breadth of knowledge (i.e., the extensiveness of one's knowledge of the meanings of words).

While each operationalization of depth of vocabulary knowledge has contributed to understanding and assessing L2 vocabulary knowledge more comprehensively, Nation's (2013) framework of word knowledge is the most frequently used (González-Fernández, 2022; Yanagisawa & Webb, 2020). This description of word knowledge consists of three components: form, meaning, and use. Each of the components is further divided into three aspects, and each of these includes receptive and productive dimensions (Table 1). Receptive and productive knowledge have also been referred to as passive and active knowledge (Laufer & Goldstein, 2004). For more detailed descriptions of each aspect of vocabulary knowledge, readers are directed to Nation (2013).

Nation's (2013) description of word knowledge primarily targets single-word items (e.g., go, happy, day). However, vocabulary knowledge also involves knowledge of multiword units/items (Nation, 2013). Thus, Nation and Webb (2011) suggested that knowledge of multiword items could also be assessed using a similar framework with slight changes in the aspects of word parts, grammatical functions, and collocations; we combine the two in Table 1. It should be noted that the term




Table 1. Description of knowledge of single-word and multiword units (MWU)

Form
  Spoken
    R: What does the word/MWU sound like?
    P: How is the word/MWU pronounced?
  Written
    R: What does the word/MWU look like?
    P: How is the word/MWU written and spelled?
  Word parts
    R: What word parts/MWU parts are recognizable in this word/MWU?
    P: What word/MWU parts are needed to express the meaning?

Meaning
  Form and meaning
    R: What meaning does this word/MWU form signal?
    P: What word/MWU form can be used to express this meaning?
  Concept and referents
    R: What is included in the concept?
    P: What items can the concept refer to?
  Associations
    R: What other words/MWUs does this make us think of?
    P: What other words/MWUs could we use instead of this one?

Use
  Grammatical functions
    R: In what patterns does the word/MWU occur?
    P: In what patterns must we use this word/MWU?
  Collocations
    R: What words, MWUs, or types of words/MWUs occur with this one?
    P: What words, MWUs, or types of words/MWUs must we use with this one?
  Constraints on use
    R: Where, when, and how often would we expect to meet this word/MWU?
    P: Where, when, and how often can we use this word/MWU?

Note. R = Receptive knowledge, P = Productive knowledge. Source: Adapted from Nation (2013, p. 49) and Nation & Webb (2011, p. 226)

multiword items (units) has been used to encompass all kinds of sequences of words, which may differ in their properties (Nation & Webb, 2011). Furthermore, the terms formulaic language/sequences have also been widely used to express multiword items (Siyanova-Chanturia & Pellicer-Sánchez, 2019; Wood, 2015; Wray, 1999, 2002). Unlike the term multiword items, formulaic language/sequences places a greater emphasis on how multiword items may be stored in memory and retrieved as wholes at the time of use (Wray, 2002). In this chapter, multiword items will be used as an umbrella term to cover all types of multiword sequences to avoid confusion.

Classifying multiword items is important but challenging because of their diverse nature (Wray & Perkins, 2000). Wood (2020) suggests that multiword items can be classified in three ways. The first approach is to distinguish multiword items by their structural, semantic, and syntactic properties. This approach identifies six types of multiword items: collocations (e.g., "run an errand"), idioms (e.g., "the elephant in the room"), metaphors (e.g., "time is a healer"), similes (e.g., "life is like a box of chocolates"), proverbs (e.g., "let sleeping dogs lie"), compounds (e.g., "dog run"), and phrasal verbs (e.g., "run over"). The second approach to distinguishing multiword items is to look at how they are used pragmatically. This approach identifies two types of multiword items: lexical phrases (e.g., "so far so good")


and pragmatic formulas (e.g., "what's up?"). The third approach is to distinguish multiword items using corpus-based frequency data (e.g., n-gram frequency), that is, by examining how frequently a combination of two or more words occurs (e.g., Biber et al., 2004). This approach identifies two types of multiword items: lexical bundles (e.g., "on the other hand") and congrams, sequences of words that frequently occur together but are often interrupted by other words (e.g., "play…role"). A sketch of the frequency-based approach is given below.
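The following minimal sketch illustrates the frequency-based identification of contiguous lexical bundles; it is ours, not Wood's or Biber et al.'s procedure, and the toy corpus and frequency threshold are invented. Congrams would additionally require gappy (non-contiguous) matching.

```python
# Minimal sketch: counting contiguous 3-grams and keeping those above a
# frequency threshold as candidate lexical bundles. Corpus is invented.
from collections import Counter
import re

corpus = (
    "On the other hand, learners notice bundles. "
    "On the other hand, bundles recur in speech. "
    "Teachers, on the other hand, rarely teach them."
)

tokens = re.findall(r"[a-z']+", corpus.lower())
# Count every contiguous three-word sequence in the token stream.
trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))

THRESHOLD = 3  # minimum frequency to count as a candidate bundle
for gram, freq in trigrams.most_common():
    if freq >= THRESHOLD:
        print(" ".join(gram), freq)
```

On this toy corpus, only "on the other" and "the other hand" clear the threshold, which is exactly the behavior that makes frequency a workable operationalization of bundle-hood.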
Despite the complexity involved in knowing a word, developing L2 vocabulary knowledge is important for learners to become functional L2 users. Research has consistently shown that lexical knowledge strongly correlates with and predicts reading, listening, speaking, and writing proficiency (Stæhr, 2008). Furthermore, the importance of vocabulary knowledge has been underpinned by corpus-driven studies investigating the lexical profiles of a variety of English discourse types. Research has suggested that learners may need to acquire roughly 3,000 word families to understand everyday spoken discourse (Adolphs & Schmitt, 2003) and TV programs and movies (Webb & Rodgers, 2009a, 2009b), 4,000 word families to understand academic spoken English (Dang & Webb, 2014), and 8,000–9,000 word families to comprehend a variety of written texts such as novels or newspapers (Nation, 2006). It is worth noting that a word family consists of inflected and derived forms. For example, the word family for develop includes develops, developed, developing, development, developmental, developmentally, and so forth (Bauer & Nation, 1993). Taken together, lexical profiling research suggests that L2 vocabulary knowledge is fundamental to becoming an independent user of a language.
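As a concrete illustration of what such lexical profiling computes, the sketch below estimates the percentage of running words in a text that fall within a set of known word families. It is a simplified stand-in: real profiling uses full frequency-ranked family lists (e.g., the most frequent 3,000 families), whereas the two families and the sample sentence here are invented.

```python
# Minimal sketch of a lexical coverage calculation over word families.
import re

# Two toy word families, each grouping a headword with its inflected
# and derived forms (cf. Bauer & Nation, 1993).
word_families = {
    "develop": {"develop", "develops", "developed", "developing",
                "development", "developmental", "developmentally"},
    "happy": {"happy", "happiness", "unhappy", "happily"},
}
# Map every family member back to its headword for quick lookup.
member_to_family = {m: head for head, members in word_families.items() for m in members}

def coverage(text: str) -> float:
    """Percentage of running tokens covered by the known word families."""
    tokens = re.findall(r"[a-z]+", text.lower())
    known = sum(1 for t in tokens if t in member_to_family)
    return 100 * known / len(tokens)

sample = "Unhappy learners developed happily once development was happening"
print(f"{coverage(sample):.0f}% of tokens covered")
```

Scaled up to real family lists and real corpora, this token-coverage figure is what underlies the 3,000/4,000/8,000–9,000-family thresholds cited above.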
To help L2 learners acquire vocabulary knowledge effectively, vocabulary studies of ISLA often evaluate the efficacy of different vocabulary teaching approaches to reveal the degree to which different instructional interventions can promote L2 vocabulary learning. The following section provides an overview of such L2 vocabulary learning research.

2. What we know and what we need to know about vocabulary research in ISLA

Vocabulary can be learned in various ways, but approaches to investigating instructed L2 vocabulary learning can be categorized into two main categories: incidental and intentional learning. Incidental vocabulary learning refers to learning that occurs as a byproduct of exposure to spoken and written language (meaning-focused input; Nation, 2013; Webb & Nation, 2017). For example, studies of incidental vocabulary learning have investigated to what extent learners can acquire different aspects of word knowledge through reading-while-listening (Horst, 2005; Webb et al., 2013)

and reading texts (Pellicer-Sánchez & Schmitt, 2010), listening to short stories (van Zeeland & Schmitt, 2013a) or songs (Pavia et al., 2019), and watching television programs (Peters & Webb, 2018). In incidental vocabulary learning studies, participants are not told to learn vocabulary (cf. Godfroid et al., 2018; Pellicer-Sánchez, 2016; Webb, 2020). On the other hand, intentional vocabulary learning generally refers to learning words deliberately through completing exercises that are intended to develop learners' word knowledge. For instance, research has examined to what extent vocabulary can be learned from using word pairs (Webb, 2007a), flashcards (Nakata, 2008), reading glossed sentences (Webb, 2007a; Webb & Kagimoto, 2011), completing fill-in-the-blanks (Hulstijn & Laufer, 2001), and writing sentences (Barcroft, 2004; Webb, 2005). Other types of intentional vocabulary learning studies have examined the effects of word-focused instruction such as teaching vocabulary learning strategies (Wei, 2015) or directing learners' attention to specific linguistic features in target items (see Boers & Lindstromberg, 2008). Both incidental and intentional learning approaches have been found to contribute to lexical development to varying degrees (de Vos et al., 2018; Webb et al., 2020). While vocabulary gains can be larger when words are studied deliberately (e.g., Laufer, 2003), it is important to note that different intentional learning conditions yield varying vocabulary gains, at least in terms of form-meaning knowledge (Webb et al., 2020). Taken together, the available evidence suggests that both incidental and intentional interventions can contribute to the incremental development of different aspects of vocabulary knowledge.

There are several gaps in ISLA vocabulary research that future research could address. First, some aspects of word knowledge are much less researched. For example, few studies have examined the acquisition of spoken forms. Several studies have examined the degree to which receptive (Peters & Webb, 2018) and productive (Uchihara et al., 2022) knowledge of spoken forms of single-word items could be developed. Research investigating the acquisition of multiword items has also tended to focus on the acquisition of written forms (e.g., Boers et al., 2014; Peters & Pauwels, 2015; Webb & Kagimoto, 2011; Webb et al., 2013). Overall, there is a lack of ISLA vocabulary research investigating how different instructional interventions can promote learning of spoken forms. The reason that there are few intentional vocabulary learning studies investigating the acquisition of spoken forms might be that many word-focused activities present target items in their written forms (see Webb & Nation, 2017 for a comprehensive review of activities). However, because words are also learned in listening activities and research has often used aural input as the source of input (e.g., Webb et al., 2013), more research investigating the learning of spoken forms is clearly warranted. Second, while some researchers have long been interested in teaching and learning sequences of words (Wood, 2006, 2015; Wray, 1999), most incidental learning research has been conducted with


single-word items (Webb, 2020), and studies examining incidental learning of multiword items are limited (Pellicer-Sánchez, 2017). Although this trend is changing as a growing number of researchers recognize the importance of multiword items in L2 vocabulary development (see Siyanova-Chanturia & Pellicer-Sánchez, 2019 for a review), the acquisition of the spoken forms of multiword items has been largely unexplored.

3. Data elicitation and interpretation options for ISLA vocabulary research

Many studies of ISLA have examined the degree to which different intentional and incidental interventions contributed to gaining different aspects of vocabulary knowledge (Pigada & Schmitt, 2006; van Zeeland & Schmitt, 2013b; Webb, 2007a, 2007b). To measure vocabulary gains, ISLA vocabulary research has often used pretest-posttest (and delayed posttest) designs (Kremmel, 2020). This section provides an overview of the pretest-posttest design and measures that can be used to evaluate vocabulary learning. The brief overview of the pretest-posttest design and vocabulary measures is followed by a guide to interpreting vocabulary test results and a critique of incidental and intentional L2 vocabulary learning studies.

3.1 Data elicitation options for research involving incidental and intentional interventions

The goal of intervention studies is to examine the efficacy of an instructional intervention. Such studies typically include experimental groups, which receive treatments, and a control group, which does not receive any treatment. Studies examining intervention effects typically administer pretests and posttests (see also Nation & Webb, 2011). The pretest result can be useful in two ways. First, it can be used to measure pre-existing knowledge of the vocabulary items that are expected to be learned through the intervention. It is possible that learners are already familiar with the vocabulary items prior to receiving the treatment. Thus, the pretest result can help researchers determine whether vocabulary items are appropriate target items (e.g., Webb & Kagimoto, 2011). Second, the pretest result can be used as the baseline performance. Researchers can estimate learning gains by examining the degree to which scores on the pretest and posttest differ after the treatment (within-subject designs). The degree to which an instructional intervention contributed to vocabulary gains can also be measured by examining the difference in posttest scores between the experimental group(s) and the control group (between-subject designs).
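To make the two designs concrete, the sketch below computes a within-subject pretest-posttest gain and a between-group posttest contrast on invented scores. It is an illustration only, not an analysis from any of the studies cited here, and it assumes SciPy is available.

```python
# Minimal sketch (invented scores) of the two comparisons described above.
from scipy import stats

# Experimental group: pretest and immediate posttest scores for 8 learners.
pretest  = [3, 5, 2, 6, 4, 3, 5, 4]
posttest = [9, 11, 7, 12, 8, 9, 10, 9]

# Control group: posttest scores only (no treatment).
control_posttest = [4, 6, 3, 5, 4, 6, 5, 4]

# Within-subject design: did the same learners improve from pre- to posttest?
t_within, p_within = stats.ttest_rel(posttest, pretest)

# Between-subject design: does the treated group outscore the control group?
t_between, p_between = stats.ttest_ind(posttest, control_posttest)

gains = [post - pre for pre, post in zip(pretest, posttest)]
print(f"Mean gain: {sum(gains) / len(gains):.1f} items")
print(f"Within-subject:  t = {t_within:.2f}, p = {p_within:.4f}")
print(f"Between-subject: t = {t_between:.2f}, p = {p_between:.4f}")
```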




There are usually two posttest retention intervals: an immediate posttest and a delayed posttest. In intervention studies, immediate posttests are administered soon after the treatment is given. The immediate posttest results are useful when estimating the initial learning effect of the treatment. Delayed posttests are usually administered after a few days or weeks to measure whether vocabulary gains are retained in memory (Kremmel, 2020). Although immediate posttest results can indicate the effectiveness of the treatment to some degree, the result of an immediate posttest should be interpreted with caution because the vocabulary gains found in this test may only be temporary (Schmitt, 2010).

3.2 Measuring learning gains

The components approach operationalizes vocabulary knowledge as being multifaceted. When depth of vocabulary knowledge is conceptualized using Nation's (2013) framework, one should determine which aspect(s) of knowledge should be measured to assess learning gains from an intervention. Ideally, intervention studies should employ multiple measures to assess vocabulary gains through incidental and intentional learning because vocabulary knowledge is expected to develop incrementally (e.g., Nagy et al., 1985; Webb, 2007b). Using more than one measure allows researchers to assess the degree to which vocabulary learning occurred, and thus the effectiveness of incidental vocabulary learning can be estimated more comprehensively.

When selecting aspects to measure, researchers should consider which specific aspects of word knowledge might be developed through completing the treatment. For example, if research investigates incidental learning from listening, knowledge of spoken forms as well as knowledge of written forms may be gained. If research investigates intentional vocabulary learning through completing sentence production activities, then it is sensible to measure learning gains on tests measuring knowledge of written forms, collocation, and grammatical functions as well as form-meaning connections. It is also important to consider the input in which target words are encountered when deciding on test formats to measure learning gains. For instance, if learners were exposed to target words in aural input, it makes sense to measure knowledge of the spoken forms of target items in tests (e.g., learners listen to spoken forms of target items to recall the L2 meanings).

Once one decides which aspect(s) of vocabulary knowledge will be measured to examine learning gains from an intervention, one should choose the format of the tests eliciting vocabulary knowledge. Vocabulary test formats typically employ recognition or recall (Schmitt, 2010). Recognition tests present the target item and options from which test-takers must recognize the correct answer. Recall tests


present no options from which test-takers can choose the correct answer; participants must recall the answer. Incidental vocabulary learning studies have tended to assess learning gains using recognition tests because they are more sensitive to small gains in knowledge (Yanagisawa et al., 2020), whereas intentional vocabulary learning studies have more frequently employed recall tests than recognition tests to measure learning gains (Webb et al., 2020). However, it should be noted that a recall test format is typically more demanding than a recognition test (Laufer & Goldstein, 2004). Thus, measuring different aspects of word knowledge using both recognition and recall tests is useful in ISLA studies because different interventions (both incidental and intentional) may develop vocabulary knowledge to varying degrees (e.g., Webb, 2005).

ISLA research investigating incidental and intentional learning of multiword items can also measure different aspects of knowledge using recognition or recall test formats. Studies have used several types of recognition and recall test formats to measure knowledge of the form-meaning connections and written forms of multiword items (Webb & Kagimoto, 2011; Webb et al., 2013). For example, knowledge of form-meaning connections of multiword items can be measured on a test requiring participants to recall target multiword items when given their corresponding L1 translations (Webb & Kagimoto, 2011), and knowledge of written forms of multiword items can be measured on a test requiring participants to write a word for one component word from a collocation presented as a prompt (Webb et al., 2013). Additionally, knowledge of written forms of multiword items can be measured on a multiple-choice test which presents one of the component words from a collocation as a cue; participants must select the option that most frequently occurs with the prompt word from among several options (e.g., Nguyen & Webb, 2017). Another type of test assessing written forms of multiword items asks participants to select the correct combination of words from among several options. For example, a test may present several options such as clear talk, close talk, pretty talk, and small talk, and participants must choose the one that they encountered (e.g., Jin & Webb, 2020).
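As a concrete illustration of the two formats, the sketch below assembles a multiple-choice form-recognition item and a form-recall item for the same verb-noun collocation (compare Tests 1 and 2 in the exemplar study in Section 3.4). The items, distractors, and helper names are invented for illustration.

```python
# Minimal sketch: building a recognition item and a recall item for one
# collocation. Items and distractors are invented.
import random

def recognition_item(verb, noun, distractors):
    """Multiple-choice item: the correct collocate hidden among distractors."""
    options = distractors + [noun]
    random.shuffle(options)
    options.append("I don't know")  # conventionally offered as the last option
    lettered = "   ".join(f"{chr(97 + i)}) {opt}" for i, opt in enumerate(options))
    return f"{verb}   {lettered}"

def recall_item(verb):
    """Recall item: no options given; the collocate must be produced."""
    return f"{verb} ________"

random.seed(1)  # reproducible option order
print(recognition_item("meet", "demand", ["seat", "name", "question"]))
print(recall_item("meet"))
```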

3.3 Interpreting the results

When interpreting test results, it can be useful to employ multiple scoring systems with differing levels of sensitivity to assess partial knowledge of words (Rodgers & Webb, 2019; Webb et al., 2013). Because gaining vocabulary knowledge occurs incrementally, detecting partial learning gains can help to assess learning more accurately. Among the nine aspects of word knowledge in Nation's (2013) framework, the distinction between knowledge of form and form-meaning is worth noting.




For example, if knowledge of written form is the target construct measured by a test, researchers should place importance on the spelling of the target items. On the other hand, when learning of form-meaning connections is measured, researchers should focus on the degree to which test takers are able to link form and meaning rather than on the spellings of the target items produced. When measuring knowledge of form-meaning connections, employing a scoring method that is lenient about spelling is sensible, because the test is not intended to measure mastery of written forms (spelling). If knowledge of form-meaning connections is assessed on a spoken recall test, scoring should focus on the degree to which spoken forms and meanings are successfully linked rather than on the way words are pronounced (e.g., in a target-like manner). Knowledge of spoken forms could also be measured on this test, but the quality of spoken output should be measured separately. Thus, the test might be scored twice, once for form-meaning connection and once for spoken form.
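The sketch below illustrates this double-scoring idea on a single written response: one strict point for written form (exact spelling) and one lenient point for the form-meaning connection. The edit-distance heuristic, its threshold, and the example items are our own illustrative assumptions, not a rubric from the studies cited above.

```python
# Minimal sketch: scoring one response twice, strictly for written form and
# leniently for form-meaning connection. Thresholds are illustrative only.
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def score_response(response: str, target: str) -> dict:
    response, target = response.strip().lower(), target.strip().lower()
    form_point = int(response == target)                        # strict: exact spelling
    meaning_point = int(edit_distance(response, target) <= 2)   # lenient: near-miss accepted
    return {"written_form": form_point, "form_meaning": meaning_point}

print(score_response("fast trak", "fast track"))   # {'written_form': 0, 'form_meaning': 1}
print(score_response("fast track", "fast track"))  # {'written_form': 1, 'form_meaning': 1}
```

A misspelled but recognizable response thus earns the form-meaning point while failing the written-form point, capturing the partial knowledge the paragraph above describes.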

3.4 Example incidental learning study

Let us now review the studies by Webb et al. (2013) and Webb and Kagimoto (2011) to look at how studies investigating incidental and intentional vocabulary learning are designed. A growing number of studies have started to examine instructed learning of L2 multiword items (Dang et al., 2022; Vu & Peters, 2022), and this review of research follows suit.


Exemplar study: Webb, Newton, & Chang (2013)

Research questions
Can collocations be learned incidentally through reading while listening to a modified graded reader?
How many encounters with collocations are needed to incidentally learn the written form of the collocations when reading while listening to a modified graded reader?
How many encounters with collocations are needed to incidentally learn the form and meaning of the collocations when reading while listening to a modified graded reader?

Study design
Between-subject design with a pretest and posttests

Instructional intervention
Participants were randomly assigned to one of five groups (four experimental groups and one control group). Each experimental group received one of four versions of a graded reader in which the target items appeared 1 time, 5 times, 10 times, or 15 times. During the treatment, each group simultaneously read and listened to one of the versions of the graded reader.

Target items
Eighteen verb-noun collocations appearing in the graded reader were selected as the target items. The collocations were semantically opaque (i.e., the meaning of each combination of words could not be inferred from the meanings of the constituent words). Target collocations were not congruent for the Chinese-speaking participants (i.e., no direct L1 translations of the L2 collocations were available).

Procedure and data elicitation instruments
One week prior to the treatment, the Vocabulary Levels Test (Schmitt et al., 2001) and a pretest measuring receptive knowledge of the written forms of the target collocations (Test 1) were administered. After the treatment, four posttests measuring productive and receptive knowledge of written forms and form-meaning connections of collocations were administered in the following order: Test 2 → Test 1 → Test 3 → Test 4. Delayed posttests were administered, but the results were not reported because some participants indicated that they had deliberately studied the target items after the immediate posttests.

Test 1: Form recognition test assessing knowledge of written forms
meet   a) seat   b) demand   c) name   d) question   e) I don't know

Test 2: Form recall test assessing knowledge of written forms
meet ________ (answer: demand)

Test 3: Form recall test assessing knowledge of form-meaning connection
満足需要 ________ (answer: meet demand)

Test 4: Meaning recall test assessing knowledge of form-meaning connection
meet demand ________ (answer: 満足需要)

Scoring
The posttests that measured productive knowledge of written form were scored using two scoring methods: partial and full knowledge. The authors found no statistical difference between the scores obtained using these scoring methods; therefore, they only reported the scores obtained using the stricter scoring method (only correctly spelled collocations were awarded points).




Findings
Results indicated that collocations could be learned through encountering them repeatedly in context and that the more often learners encountered the collocations, the greater the vocabulary learning gains.

Take-aways
While Webb et al. comprehensively examined the learning of different aspects of vocabulary knowledge using multiple measures, learning effects from taking multiple tests may have been present. For example, the L1 cues presented in the form recall test measuring form-meaning connections could have informed participants of the answers for the receptive form-meaning test, which required participants to write the L1 translations of the target collocations. Controlling for possible test order effects, in which test takers gain knowledge through completing a test, can be difficult when using multiple tests. One way to overcome this limitation in future studies is to have different groups take different tests; this ensures that learning gains can be attributed to the treatment rather than to taking earlier tests. However, it does require larger numbers of participants. It should be noted that the delayed posttest results were not interpreted in this study because some participants studied the target items after the immediate posttest. This problem could have been avoided if pseudowords had been utilized, because participants will not encounter pseudowords outside of the treatment (Pellicer-Sánchez, 2017). In addition, although other variables that may potentially affect ease of learning (Puimège & Peters, 2020) can also be controlled with the use of pseudowords, this may lower the ecological validity and generalizability of research findings.
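A related standard remedy for the test order effects mentioned above is to counterbalance test order across groups with a Latin square, so that no test systematically benefits from being taken after another. The sketch below is an invented illustration of such an assignment, not a procedure from Webb et al. (2013); the test and participant names are placeholders.

```python
# Minimal sketch: counterbalancing test order with a 3 x 3 Latin square,
# so each test occupies each position in exactly one order. Names invented.
import random

tests = ["form_recognition", "form_recall", "meaning_recall"]

# Rotations of the test list give the Latin-square orders.
orders = [tests[i:] + tests[:i] for i in range(len(tests))]

participants = [f"P{n:02d}" for n in range(1, 13)]
random.seed(42)                 # reproducible random assignment
random.shuffle(participants)

# Deal shuffled participants evenly across the three order groups.
assignment = {p: orders[i % len(orders)] for i, p in enumerate(participants)}
for participant, order in sorted(assignment.items()):
    print(participant, "->", ", ".join(order))
```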

3.5 Example intentional learning study

Exemplar study: Webb & Kagimoto (2011)

Research questions
How does the number of collocates presented with node words influence learning collocations?
Does the position of the node word in a collocation have an effect on learning collocations?
Is it more or less effective to learn the collocations for synonyms together?

Study design
Within-subject design with a pretest and a posttest

Instructional intervention
A total of 60 English adjective-noun collocations were divided into five sets (see below) and studied through reading glossed sentences (L1 translations were presented with target collocations). Participants were told to learn each set of the target items in 3 minutes by reading the glossed sentences (Example 1).

192 Emi Iwaizumi and Stuart Webb

An overview of the word sets

Word set   Node words in the set   Collocates per node word   Position of node words    Synonymous node words in the set
Set 1      2                       6                          Before the collocates     No
Set 2      4                       3                          Before the collocates     No
Set 3      12                      1                          Before the collocates     No
Set 4      4                       3                          After the collocates      No
Set 5      12                      1                          Before the collocates     Yes

Example 1. An example of a glossed sentence
quick check ちょっと調べてみること
A quick check outside before you go would make me feel safer.

Target items
Sixty English adjective-noun collocations were selected as the target items. The target collocations were not congruent with the L1 of the Japanese-speaking participants (i.e., no exact equivalents exist in their L1).

Procedure and data elicitation instrument
Participants took a written form recall pretest one week before the learning session. Any participants who demonstrated some knowledge of the target collocations were excluded from the study (n = 66). The remaining 41 participants took part in the learning treatment. Immediately after each set of target collocations was studied, participants took the form recall test shown below. The posttest had the same format as the pretest.

Test: Form recall test assessing knowledge of form-meaning connections
ちょっと調べてみること _____________________ (answer: quick check)

Scoring
Responses including misspelled words and added inflectional suffixes (e.g., fast trak* or tracks* instead of fast track) were scored as correct because the test was intended to measure form-meaning knowledge rather than knowledge of written form.

Findings
Results indicated that increasing the number of collocates presented with the node words yielded larger learning gains, that the position of the node word did not affect learning, and that synonymous node words within a set of collocations negatively affected learning.

Take-aways
Although this study provided useful insight into the word-grouping factors that could influence the intentional learning of collocations, the difference in learning between the sets with and without synonymous node words could have been due to item difficulty. In particular, the semantic transparency (Gyllstad & Wolter, 2016) of the target collocations for the L1 Japanese-speaking learners was not controlled, and this may be a confounding variable that affected the results.




3.6 Characteristics of the studies involving incidental and intentional interventions

Researchers should be aware of the strengths and weaknesses of incidental and intentional study designs to develop sound research methodology. First, while incidental and intentional learning studies provide useful implications for pedagogy, it is important to acknowledge that the interventions employed in research may not fully mirror how learning typically occurs. For example, intentional learning research has often examined the degree to which completing one type of exercise can improve vocabulary knowledge (e.g., Webb & Kagimoto, 2011). However, in classrooms, teachers typically use several exercises to teach new vocabulary and further expand on knowledge through the completion of subsequent activities. Additionally, incidental learning research measuring the learning that occurs through engaging in a single meaning-focused input activity (e.g., Dang et al., 2022) may not mirror what is expected of incidental vocabulary learning: encountering new words in various contexts through exposure to a large amount of input (Nagy et al., 1985). Research investigating the degree to which words can be learned incidentally through engaging in multiple meaning-focused input activities is scarce. This may be because measuring vocabulary gains through such interventions can be challenging. Rodgers and Webb (2019) provide an example of how incidental learning gains occurring through multiple exposures to L2 input can be measured.

Second, while well-controlled studies can estimate learning gains that are attributable to the learning interventions, the high degree of control may threaten ecological validity. For example, although it is common to control the occurrences of target words in meaning-focused input by modifying the texts (e.g., Webb & Chang, 2022; Webb et al., 2013), this results in decreased ecological validity compared to research that uses unmodified input materials (Dang et al., 2022). Additionally, learners in classrooms typically have access to dictionaries to look up the meanings of unknown words that they encounter in meaning-focused input. However, in highly controlled incidental learning research, participants would not have access to supplemental resources to support comprehension despite being told to focus on understanding the meaning-focused input (e.g., Dang et al., 2022; Feng & Webb, 2020). Additionally, although pseudowords may be used to control for a potential learning effect from taking a pretest or encountering the target words outside of the treatment (Pellicer-Sánchez, 2017; Webb & Piasecki, 2018), ecological validity may be reduced.


4. Advice for future vocabulary researchers

Once research questions are developed, a study that can answer the questions should be carefully designed. At this stage, several methodological decisions must be made. The following questions can be considered to guide the process of designing incidental or intentional vocabulary learning studies.

(1) Does the study design require experimental and control groups to assess the effectiveness of the intervention?
(2) Does the study follow the pretest-posttest design? Can a delayed posttest(s) be included? How might the learning gains be measured?
(3) What target items do participants learn? How do participants encounter/study the target items? Are pseudowords more appropriate than real words to answer the research questions?
(4) Which aspect(s) of word knowledge does the intervention intend to develop? How many vocabulary tests will be administered and in what order?
(5) Can the vocabulary tests measure the target aspects of word knowledge appropriately? What potential responses might be expected in the tests, and how will these be scored? Is it appropriate to use lenient or strict scoring methods or both?

One of the most challenging steps in designing a vocabulary learning study is developing an intervention. In the following section, guidelines are provided for (1) designing incidental and intentional learning input material, (2) ethical considerations, (3) interpreting and analyzing data, and (4) reporting of vocabulary research methods.

4.1 Designing the input material employed in ISLA vocabulary research

When conducting vocabulary learning research, the material used as a source of input should be comprehensible to participants. This is important in both intentional and incidental vocabulary learning research. However, it is particularly important when meaning-focused input is the source of learning, because incidental learning is more likely to occur when the input is clearly understood. A frequently employed approach to determining whether the material is at an appropriate level for the target population is to examine the lexical profiles of texts using vocabulary profiling tools such as the Range program (Heatley et al., 2002) or Tom Cobb's Vocabulary Profilers (https://www.lextutor.ca/vp/) in relation to participants' Vocabulary Levels Test (VLT) scores (Webb & Nation, 2017). Vocabulary profiling software can classify the words encountered in input into different frequency levels according to frequency-based wordlists.




The VLT scores indicate the degree to which test takers can recognize the form-meaning connections of single-word items at different frequency levels (Schmitt et al., 2001; Webb et al., 2017). For example, the updated VLT (Webb et al., 2017) reveals the degree to which test takers have form-meaning knowledge of single words at the 1,000, 2,000, 3,000, 4,000, and 5,000 levels. Researchers often use the results of a lexical profiling analysis of texts together with the VLT to estimate lexical coverage (the percentage of running words in a text that learners know) for a target population, which can in turn indicate whether the learning materials are at an appropriate level for the participants (Laufer & Ravenhorst-Kalovski, 2010; Schmitt et al., 2011). For instance, the exemplar studies above (Webb & Kagimoto, 2011; Webb et al., 2013) explored the appropriateness of their input material using this approach.

It is important to note that learners' knowledge of the form-meaning connections of running words should not be interpreted as an indication of the degree to which they comprehend the materials, because comprehending written or spoken input involves more than knowing the meanings of words (Carney, 2021; Jeon & Yamashita, 2014; Vafaee & Suzuki, 2020). Although incidental learning research that estimated comprehensibility based on lexical coverage figures has revealed vocabulary gains (Dang et al., 2021; Feng & Webb, 2020), researchers are strongly encouraged to verify the appropriateness of the research materials using other methods as well. For example, Peters and Webb (2018) conducted a pilot study with a group of learners resembling the participants in their experiment to examine, through a questionnaire and an interview, whether the level of difficulty of the material (a full-length TV program) was appropriate. Additionally, Jin and Webb (2020) examined the rate of speech (i.e., the number of words spoken per minute) to determine whether the spoken input would be comprehensible to the participants. Finally, it may be useful to assess comprehension of the material separately and use this as a covariate to estimate the effectiveness of the treatment more accurately.
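To make the lexical profiling step concrete, the sketch below shows one way a coverage estimate could be computed. It is a minimal illustration under several assumptions introduced here, not a substitute for the Range program or Lextutor's profilers: the wordlist file names are hypothetical, the tokenizer is naive, word families and proper nouns are not handled, and a learner is simply assumed to know every item in the bands at or below their mastered VLT level.

```python
import re

def load_band(path):
    """Read one frequency-band wordlist (one headword per line) into a set."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}

def lexical_coverage(text, band_paths, known_bands):
    """Estimate the percentage of running words (tokens) in `text` that fall
    within the first `known_bands` frequency bands -- a rough proxy for the
    coverage enjoyed by a learner who has mastered those VLT levels."""
    bands = [load_band(p) for p in band_paths]        # e.g., 1K, 2K, 3K lists
    known_words = set().union(*bands[:known_bands])
    tokens = re.findall(r"[a-z']+", text.lower())     # naive tokenizer
    if not tokens:
        return 0.0
    return 100 * sum(t in known_words for t in tokens) / len(tokens)

# Hypothetical usage, for a learner who has mastered the 1K and 2K levels:
# text = open("graded_reader.txt", encoding="utf-8").read()
# print(f"{lexical_coverage(text, ['1k.txt', '2k.txt', '3k.txt'], 2):.1f}%")
```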

4.2 Ethical considerations and data collection

Once you have designed a study, you must consider potential ethical issues, given that instructed L2 vocabulary research involves human participants. A potential ethical issue in incidental vocabulary learning research is that the true objective of the research and the presence of upcoming vocabulary tests (immediate and delayed posttests) may not be disclosed; thus, participants will not be aware of the purpose of the research. Incidental learning should occur through exposure to meaning-focused input, and participants should therefore not be cognizant that the intervention was intended to promote vocabulary learning.


Being aware that there will be vocabulary tests may change participant behavior by directing attention to the vocabulary encountered during the intervention. While these types of manipulations are necessary to conduct incidental learning research, researchers who embark on such research must ensure that their study adheres to ethics protocols. As such, disclosing the true objective of the research at the end of the study and providing participants with an opportunity to withdraw from the study after data collection are necessary.

Another ethical issue that needs to be addressed when conducting ISLA vocabulary research is the inclusion of a control group. A control group may be necessary if researchers choose to estimate the treatment effect by comparing the degree to which experimental groups outperformed a control group. However, participants assigned to the control group may not learn the L2 through taking part in the study (e.g., control groups will sometimes simply take the pre- and posttests without receiving a treatment). To compensate the participants in control groups for this lack of learning benefits, researchers may offer them an opportunity to receive the instructional intervention or the list of target items taught in the intervention once the research is completed.

Lastly, vocabulary researchers should consider what might be the most appropriate target for learning. For example, if intervention studies measure the learning of spoken forms of words through exposure to various accents (e.g., British, American, Australian), it would be unethical to consider only one of the dialects that participants are exposed to as the learning target. To overcome such issues, variation across dialects should be accepted in correct responses. In addition, it may not be appropriate to score the gains only dichotomously, as learning can occur to varying degrees. This issue can be overcome by using measures that are sensitive to different degrees of gains (Watanabe, 1997; Webb et al., 2013).

4.3 Interpreting and analyzing data

When the vocabulary components approach is used to assess learning gains, it is important to ensure that each dependent measure assesses only one aspect of vocabulary knowledge. For example, Webb and Kagimoto (2011) used a scoring method that allowed them to disentangle knowledge of the written forms and the form-meaning connections of target items. Such an approach is useful when it comes to interpreting results. Additionally, caution should be taken when correct responses can vary. For instance, if the form-meaning connections of target items are measured with a meaning recall test (i.e., participants must write the L1 translations when prompted with the L2 forms), researchers should develop a clear scoring system and consistently apply it when evaluating responses. It is also recommended that at least two researchers score the tests and confirm interrater coefficients to ensure the reliability of the scoring procedure (Nation & Webb, 2011). Once the vocabulary tests are scored, it is important to confirm the degree to which the research instruments assess the target knowledge construct. To do this, researchers should check reliability coefficients such as omega (Dunn et al., 2014) or summability (Goeman & de Jong, 2018).
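As a concrete illustration of the interrater check mentioned above, the snippet below computes Cohen's kappa, one common agreement coefficient for dichotomously scored responses. This is only a minimal sketch with made-up scores; the omega and summability coefficients cited in this section concern the internal consistency of the test itself and would be computed separately with dedicated statistical software.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical dichotomous scores (1 = correct, 0 = incorrect) assigned by
# two raters to the same twelve test responses.
rater_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0]
rater_b = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0]

kappa = cohen_kappa_score(rater_a, rater_b)
print(f"Cohen's kappa: {kappa:.2f}")  # values near 1 indicate strong agreement
```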




4.4 Reporting the methodology with sufficient detail to allow replication

Research methodology should be reported in detail, as a lack of such information limits readers' ability to evaluate the findings. However, recent meta-analytic research (Yanagisawa et al., 2020) reported that studies often lacked clear descriptions of research instruments (e.g., materials, test formats, target items), treatments (e.g., what exactly participants did or did not do during the learning sessions), and participant characteristics (e.g., age, gender, proficiency, vocabulary levels). Clear reporting of these details is necessary because it provides greater options for meta-analytic approaches, allowing more accurate and robust estimations (de Vos et al., 2018; Webb et al., 2020), and because it encourages replication studies (Chapter 5, this volume; Marsden et al., 2018; Yanagisawa et al., 2020). Therefore, researchers are encouraged to report their studies as clearly as possible.

5. Troubleshooting ISLA vocabulary research

Common question 1: How should I select target vocabulary in my intervention study?

It is challenging yet critical to select target items that are similar in difficulty in order to truly investigate the effect of the intervention, unless word-related variables are also investigated in the study. Research investigating the acquisition of single-word items has identified a number of word-related variables that may affect the learnability of words, such as pronounceability (Ellis & Beaton, 1993), cognateness (formally and semantically related words in learners' L1 and L2; Peters & Webb, 2018; Rogers et al., 2015), orthographic and phonological similarity with other words (e.g., adopt/adapt, effect/affect; Laufer, 1997), and concreteness and imageability (De Groot & Keijzer, 2000; Paivio et al., 1994). Research investigating the acquisition of multiword items has shown that prosodic cues in speech (e.g., pauses before and after multiword items; Lin, 2018), L1-L2 congruency in collocations (i.e., whether L2 collocations can be translated directly into the learners' L1; Peters, 2016), and knowledge of the component words in collocations (Zhang, 2017) may affect learning. Researchers are advised to review such word-related variables, which may make learning inherently more difficult, when selecting appropriate target items to answer their research questions.


Common question 2: Word properties and variability among learners may have influenced the learning outcomes in my intervention study. Is there a way to explore these types of variables in my analysis?

Researchers embarking on an instructed L2 vocabulary acquisition study should consider the numerous variables that could be confounded in the research design. In particular, we have listed variables that are attributable to word properties and to variation within participants (e.g., proficiency levels, levels of comprehension of the input). Despite the complexity of researching L2 vocabulary acquisition, there are a number of approaches to data analysis that may provide greater clarity. One of these is mixed-effects modeling, which enables researchers to estimate item-level and participant-level variables simultaneously (Linck & Cunnings, 2015). Additionally, such an analysis can provide a more accurate estimation of the variable of interest (e.g., the effect of the treatment) while considering other relevant covariates (Baayen et al., 2008). Furthermore, mixed-effects models can be fitted to data containing missing values (Linck & Cunnings, 2015). The technique can also be useful when the obtained data are not normally distributed.
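As a sketch of what such an analysis might look like, the following code fits a model with crossed random intercepts for participants and items using Python's statsmodels, which expresses crossed effects as variance components within a single all-inclusive group. The file name and column names are hypothetical, and this linear specification suits continuous gain scores; dichotomous item scores would instead call for a logistic mixed model (e.g., lme4's glmer in R, the more common tool for such analyses).

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: one row per participant-by-item response,
# with columns: participant, item, encounters (e.g., 1/5/10/15), score.
df = pd.read_csv("posttest_long.csv")
df["all"] = 1  # a single group spanning the whole data set

model = smf.mixedlm(
    "score ~ encounters",          # fixed effect of interest (the treatment)
    data=df,
    groups="all",
    re_formula="0",                # no additional group-level random effects
    vc_formula={                   # crossed random intercepts expressed as
        "participant": "0 + C(participant)",  # variance components
        "item": "0 + C(item)",
    },
)
print(model.fit().summary())
```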
6. Conclusions

The aim of this chapter has been to offer a guide to conducting instructed L2 vocabulary acquisition research using incidental and intentional learning approaches. In particular, the chapter reviewed the vocabulary knowledge components approach as a way to conceptualize and operationalize vocabulary knowledge. In light of the relative lack of research into multiword items, the chapter also reviewed studies that investigated their acquisition. To conclude, the chapter provided guidance on how ISLA vocabulary research may be designed. Researchers embarking on a study investigating L2 vocabulary acquisition should also consult the additional resources listed below.



7. Further reading and additional resources

7.1 Additional readings

Nation, I. S. P., & Webb, S. (2011). Researching and analyzing vocabulary. Heinle.
This book discusses a wide range of topics related to vocabulary research methodology and issues related to designing incidental and intentional vocabulary research.

Nation, I. S. P. (2013). Learning vocabulary in another language (2nd ed.). Cambridge University Press. https://doi.org/10.1017/CBO9781139858656
This book provides a comprehensive review of what it means to know a word and how different aspects of vocabulary knowledge can be measured.

Webb, S., & Nation, I. S. P. (2017). How vocabulary is learned. Oxford University Press.
This book discusses key concepts related to teaching and learning vocabulary. It provides a comprehensive review of different vocabulary exercises, which should be useful for researchers designing research examining the efficacy of vocabulary exercises.

Webb, S. (Ed.). (2020). The Routledge handbook of vocabulary studies. Routledge.
This handbook is useful for novice researchers looking to learn more about incidental and intentional vocabulary learning studies (Chapters 15 & 16), potential variables affecting the learning of single-word and multiword items (Chapters 9 & 10), various approaches to measuring vocabulary gains (Chapters 24, 25, & 26), and tools to analyze word properties (Chapters 21, 22, & 29).

7.2 Conferences for L2 vocabulary researchers

"Vocab@" conferences
A series of "Vocab@" conferences has been held in recent years: in Wellington, New Zealand in 2013; Tokyo, Japan in 2016; and Leuven, Belgium in 2019. "Vocab@" conferences bring together researchers investigating vocabulary from a variety of perspectives (vocabulary learning, teaching, processing, and testing).

7.3 Resources for L2 vocabulary researchers

Cobb, T. (n.d.). Compleat Lexical Tutor v.8.3 [Computer software]. https://www.lextutor.ca
Tom Cobb's Lexical Tutor provides many useful tools for researchers to analyze vocabulary in various ways.

7.4 Recommended journals

ISLA vocabulary research has been frequently published in second language acquisition journals such as Studies in Second Language Acquisition, Language Learning, TESOL Quarterly, The Modern Language Journal, ITL – International Journal of Applied Linguistics, Language Teaching Research, and Applied Linguistics.


References

Adolphs, S., & Schmitt, N. (2003). Lexical coverage of spoken discourse. Applied Linguistics, 24(4), 425–438. https://doi.org/10.1093/applin/24.4.425
Anderson, R. C., & Freebody, P. (1981). Vocabulary knowledge. In J. T. Guthrie (Ed.), Comprehension and teaching: Research reviews (pp. 77–117). International Reading Association.
Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59(4), 390–412. https://doi.org/10.1016/j.jml.2007.12.005
Barcroft, J. (2004). Effects of sentence writing in second language lexical acquisition. Second Language Research, 20(4), 303–334. https://doi.org/10.1191/0267658304sr233oa
Bauer, L., & Nation, I. S. P. (1993). Word families. International Journal of Lexicography, 6(4), 253–279. https://doi.org/10.1093/ijl/6.4.253
Biber, D., Conrad, S., & Cortes, V. (2004). If you look at…: Lexical bundles in university teaching and textbooks. Applied Linguistics, 25(3), 371–405. https://doi.org/10.1093/applin/25.3.371
Boers, F., Demecheleer, M., Coxhead, A., & Webb, S. (2014). Gauging the effects of exercises on verb–noun collocations. Language Teaching Research, 18(1), 54–74. https://doi.org/10.1177/1362168813505389
Boers, F., & Lindstromberg, S. (2008). How cognitive linguistics can foster effective vocabulary teaching. In F. Boers & S. Lindstromberg (Eds.), Cognitive linguistic approaches to teaching vocabulary and phraseology (pp. 1–61). Mouton de Gruyter. https://doi.org/10.1515/9783110199161.1.1
Carney, N. (2021). Diagnosing L2 listeners' difficulty comprehending known lexis. TESOL Quarterly, 55(2), 536–567. https://doi.org/10.1002/tesq.3000
Crossley, S. A., Salsbury, T., & McNamara, D. S. (2014). Assessing lexical proficiency using analytic ratings: A case for collocation accuracy. Applied Linguistics, 36(5), 570–590. https://doi.org/10.1093/applin/amt056
Dang, T. N. Y., Lu, C., & Webb, S. (2022). Incidental learning of single words and collocations through viewing an academic lecture. Studies in Second Language Acquisition, 44(3), 708–736. https://doi.org/10.1017/S0272263121000474
Dang, T. N. Y., & Webb, S. (2014). The lexical profile of academic spoken English. English for Specific Purposes, 33, 66–76. https://doi.org/10.1016/j.esp.2013.08.001
De Groot, A. M. B., & Keijzer, R. (2000). What is hard to learn is easy to forget: The roles of word concreteness, cognate status, and word frequency in foreign-language vocabulary learning and forgetting. Language Learning, 50(1), 1–56. https://doi.org/10.1111/0023-8333.00110
De Vos, J. F., Schriefers, H., Nivard, M. G., & Lemhofer, K. (2018). A meta-analysis and meta-regression of incidental second language word learning from spoken input. Language Learning, 68(4), 906–941. https://doi.org/10.1111/lang.12296
Dunn, T. J., Baguley, T., & Brunsden, V. (2014). From alpha to omega: A practical solution to the pervasive problem of internal consistency estimation. British Journal of Psychology, 105(3), 399–412. https://doi.org/10.1111/bjop.12046
Ellis, N. C., & Beaton, A. (1993). Psycholinguistic determinants of foreign language vocabulary learning. Language Learning, 43(4), 559–617. https://doi.org/10.1111/j.1467-1770.1993.tb00627.x
Feng, Y., & Webb, S. (2020). Learning vocabulary through reading, listening, and viewing: Which mode of input is most effective? Studies in Second Language Acquisition, 42(3), 499–523. https://doi.org/10.1017/S0272263119000494
Godfroid, A., Ahn, J., Choi, I., Ballard, L., Cui, Y., Johnston, S., … Yoon, H. J. (2018). Incidental vocabulary learning in a natural reading context: An eye-tracking study. Bilingualism: Language and Cognition, 21(3), 563–584. https://doi.org/10.1017/S1366728917000219
Goeman, J. J., & de Jong, N. H. (2018). How well does the sum score summarize the test? Summability as a measure of internal consistency. Educational Measurement: Issues and Practice, 37(2), 54–63. https://doi.org/10.1111/emip.12181
González-Fernández, B. (2022). Conceptualizing L2 vocabulary knowledge: An empirical examination of the dimensionality of word knowledge. Studies in Second Language Acquisition. Advance online publication. https://doi.org/10.1017/S0272263121000930
Gyllstad, H., & Wolter, B. (2016). Collocational processing in light of the phraseological continuum model: Does semantic transparency matter? Language Learning, 66(2), 296–323. https://doi.org/10.1111/lang.12143
Heatley, A., Nation, I. S. P., & Coxhead, A. (2002). Range: A program for the analysis of vocabulary in texts [Computer software]. https://www.wgtn.ac.nz/lals/resources/paul-nations-resources/vocabulary-analysis-programs
Henriksen, B. (1999). Three dimensions of vocabulary development. Studies in Second Language Acquisition, 21(2), 303–317. https://doi.org/10.1017/S0272263199002089
Horst, M. (2005). Learning L2 vocabulary through extensive reading: A measurement study. The Canadian Modern Language Review, 61(3), 355–382. https://doi.org/10.3138/cmlr.61.3.355
Hulstijn, J. H., & Laufer, B. (2001). Some empirical evidence for the involvement load hypothesis in vocabulary acquisition. Language Learning, 51(3), 539–558. https://doi.org/10.1111/0023-8333.00164
Jeon, E. H., & Yamashita, J. (2014). L2 reading comprehension and its correlates: A meta-analysis. Language Learning, 64(1), 160–212. https://doi.org/10.1111/lang.12034
Jin, Z., & Webb, S. (2020). Incidental vocabulary learning through listening to teacher talk. Modern Language Journal, 104(3), 550–566. https://doi.org/10.1111/modl.12661
Kremmel, B. (2020). Measuring vocabulary learning progress. In S. Webb (Ed.), The Routledge handbook of vocabulary studies (pp. 81–96). Routledge. https://doi.org/10.4324/9780429291586
Laufer, B. (1997). What's in a word that makes it hard or easy? Intralexical factors affecting vocabulary acquisition. In N. Schmitt & M. McCarthy (Eds.), Vocabulary: Description, acquisition, and pedagogy (pp. 140–155). Cambridge University Press. https://doi.org/10.1017/S0272263100012171
Laufer, B. (2003). Vocabulary acquisition in a second language: Do learners really acquire most vocabulary by reading? Some empirical evidence. The Canadian Modern Language Review, 59(4), 567–587. https://doi.org/10.3138/cmlr.59.4.567
Laufer, B., & Goldstein, Z. (2004). Testing vocabulary knowledge: Size, strength, and computer adaptiveness. Language Learning, 54(3), 399–436. https://doi.org/10.1111/j.0023-8333.2004.00260.x
Laufer, B., & Ravenhorst-Kalovski, G. C. (2010). Lexical threshold revisited: Lexical text coverage, learners' vocabulary size and reading comprehension. Reading in a Foreign Language, 22, 15–30. http://hdl.handle.net/10125/66648
Lin, P. (2018). The prosody of formulaic sequences: A corpus and discourse approach. Bloomsbury.
Linck, J. A., & Cunnings, I. (2015). The utility and application of mixed-effects models in second language research. Language Learning, 65(1), 185–207. https://doi.org/10.1111/lang.12117
Marsden, E. J., Morgan-Short, K., Thompson, S., & Abugaber, D. (2018). Replication in second language research: Narrative and systematic reviews, and recommendations for the field. Language Learning, 68(2), 321–391. https://doi.org/10.1111/lang.12286
Nagy, W. E., Herman, P. A., & Anderson, R. C. (1985). Learning words from context. Reading Research Quarterly, 20(2), 232–253. https://doi.org/10.2307/747758
Nakata, T. (2008). English vocabulary learning with word lists, word cards and computers: Implications from cognitive psychology research for optimal spaced learning. ReCALL, 20(1), 3–20. https://doi.org/10.1017/S0958344008000219
Nation, I. S. P. (2006). How large a vocabulary is needed for reading and listening? The Canadian Modern Language Review, 63(1), 59–82. https://doi.org/10.3138/cmlr.63.1.59
Nguyen, T. M. H., & Webb, S. (2017). Examining second language receptive knowledge of collocation and factors that affect learning. Language Teaching Research, 21(3), 298–320. https://doi.org/10.1177/1362168816639619
Paivio, A., Walsh, M., & Bons, T. (1994). Concreteness effects on memory: When and why? Journal of Experimental Psychology: Learning, Memory, and Cognition, 20(5), 1196–1204. https://doi.org/10.1037/0278-7393.20.5.1196
Pavia, N., Webb, S., & Faez, F. (2019). Incidental vocabulary learning through listening to songs. Studies in Second Language Acquisition, 41(4), 745–768. https://doi.org/10.1017/S0272263119000020
Pellicer-Sánchez, A. (2016). Incidental L2 vocabulary acquisition from and while reading. Studies in Second Language Acquisition, 38(1), 97–130. https://doi.org/10.1017/S0272263115000224
Pellicer-Sánchez, A. (2017). Learning L2 collocations incidentally from reading. Language Teaching Research, 21(3), 381–402. https://doi.org/10.1177/1362168815618428
Pellicer-Sánchez, A., & Schmitt, N. (2010). Incidental vocabulary acquisition from an authentic novel: Do things fall apart? Reading in a Foreign Language, 22(1), 31–55. http://hdl.handle.net/10125/66652
Peters, E. (2016). The learning burden of collocations: The role of interlexical and intralexical factors. Language Teaching Research, 20(1), 113–138. https://doi.org/10.1177/1362168814568131
Peters, E., & Pauwels, P. (2015). Learning academic formulaic sequences. Journal of English for Academic Purposes, 20, 28–39. https://doi.org/10.1016/j.jeap.2015.04.002
Peters, E., & Webb, S. (2018). Incidental vocabulary acquisition through watching a single episode of L2 television. Studies in Second Language Acquisition, 40(3), 551–577. https://doi.org/10.1017/S0272263117000407
Pigada, M., & Schmitt, N. (2006). Vocabulary acquisition from extensive reading: A case study. Reading in a Foreign Language, 18(1), 1–28. http://hdl.handle.net/10125/66611
Puimège, E., & Peters, E. (2020). Learning formulaic sequences through viewing L2 television and factors that affect learning. Studies in Second Language Acquisition, 42(3), 525–549. https://doi.org/10.1017/S027226311900055X
Qian, D. D. (2002). Investigating the relationship between vocabulary knowledge and academic reading performance: An assessment perspective. Language Learning, 52(3), 513–536. https://doi.org/10.1111/1467-9922.00193
Qian, D. D., & Schedl, M. (2004). Evaluation of an in-depth vocabulary knowledge measure for assessing reading performance. Language Testing, 21(1), 28–52. https://doi.org/10.1191/0265532204lt273oa
Read, J. (2007). Second language vocabulary assessment: Current practices and new directions. International Journal of English Studies, 7(2), 105–126.
Richards, J. C. (1976). The role of vocabulary teaching. TESOL Quarterly, 10(1), 77–89. https://doi.org/10.2307/3585941
Rodgers, M. P. H., & Webb, S. (2019). Incidental vocabulary learning through viewing television. ITL – International Journal of Applied Linguistics, 171(2), 191–220. https://doi.org/10.1075/itl.18034.rod
Rogers, J., Webb, S., & Nakata, T. (2015). Do the cognacy characteristics of loanwords make them more easily learned than noncognates? Language Teaching Research, 19(1), 9–27. https://doi.org/10.1177/1362168814541752
Saito, K., Webb, S., Trofimovich, P., & Isaacs, T. (2016). Lexical profiles of comprehensible second language speech: The role of appropriateness, fluency, variation, sophistication, abstractness, and sense relations. Studies in Second Language Acquisition, 38(4), 677–701. https://doi.org/10.1017/S0272263115000297
Schmitt, N. (2010). Researching vocabulary: A vocabulary research manual. Palgrave Macmillan. https://doi.org/10.1057/9780230293977
Schmitt, N., Jiang, X., & Grabe, W. (2011). The percentage of words known in a text and reading comprehension. Modern Language Journal, 95(1), 26–43. https://doi.org/10.1111/j.1540-4781.2011.01146.x
Schmitt, N., Schmitt, D., & Clapham, C. (2001). Developing and exploring the behaviour of two new versions of the Vocabulary Levels Test. Language Testing, 18(1), 55–88. https://doi.org/10.1177/026553220101800103
Shi, L., & Qian, D. (2012). How does vocabulary knowledge affect Chinese EFL learners' writing quality in web-based settings? Evaluating the relationships among three dimensions of vocabulary knowledge and writing quality. Chinese Journal of Applied Linguistics, 35(1), 117–127. https://doi.org/10.1515/cjal-2012-0009
Siyanova-Chanturia, A., & Pellicer-Sánchez, A. (2019). Formulaic language. In A. Siyanova-Chanturia & A. Pellicer-Sánchez (Eds.), Understanding formulaic language: A second language acquisition perspective (pp. 1–15). Routledge. https://doi.org/10.4324/9781315206615
Stæhr, L. S. (2008). Vocabulary size and the skills of listening, reading and writing. Language Learning Journal, 36(2), 139–152. https://doi.org/10.1080/09571730802389975
Uchihara, T., & Clenton, J. (2018). Investigating the role of vocabulary size in second language speaking ability. Language Teaching Research, 24(4), 540–556. https://doi.org/10.1177/1362168818799371
Uchihara, T., & Saito, K. (2019). Exploring the relationship between productive vocabulary knowledge and second language oral ability. Language Learning Journal, 47(1), 64–75. https://doi.org/10.1080/09571736.2016.1191527
Uchihara, T., Webb, S., Saito, K., & Trofimovich, P. (2022). The effects of talker variability and frequency of exposure on the acquisition of spoken word knowledge. Studies in Second Language Acquisition, 44(2), 357–380. https://doi.org/10.1017/S0272263121000218
Vafaee, P., & Suzuki, Y. (2020). The relative significance of syntactic knowledge and vocabulary knowledge in second language listening ability. Studies in Second Language Acquisition, 42(2), 383–410. https://doi.org/10.1017/S0272263119000676
van Zeeland, H., & Schmitt, N. (2013a). Lexical coverage in L1 and L2 listening comprehension: The same or different from reading comprehension? Applied Linguistics, 34(4), 457–479. https://doi.org/10.1093/applin/ams074
van Zeeland, H., & Schmitt, N. (2013b). Incidental vocabulary acquisition through L2 listening: A dimensions approach. System, 41(3), 609–624. https://doi.org/10.1016/j.system.2013.07.012
Vu, D. V., & Peters, E. (2022). Incidental learning of collocations from meaningful input: A longitudinal study into three reading modes and factors that affect learning. Studies in Second Language Acquisition, 44(3), 685–707. https://doi.org/10.1017/S0272263121000462
Wang, Y., & Treffers-Daller, J. (2017). Explaining listening comprehension among L2 learners of English: The contribution of general language proficiency, vocabulary knowledge and metacognitive awareness. System, 65, 139–150. https://doi.org/10.1016/j.system.2016.12.013
Watanabe, Y. (1997). Input, intake, and retention. Studies in Second Language Acquisition, 19(3), 287–307. https://doi.org/10.1017/S027226319700301X
Webb, S. (2005). Receptive and productive vocabulary learning: The effects of reading and writing on word knowledge. Studies in Second Language Acquisition, 27(1), 33–52. https://doi.org/10.1017/S0272263105050023
Webb, S. (2007a). Learning word pairs and glossed sentences: The effects of a single context on vocabulary knowledge. Language Teaching Research, 11(1), 63–81. https://doi.org/10.1177/1362168806072463
Webb, S. (2007b). The effects of repetition on vocabulary knowledge. Applied Linguistics, 28(1), 46–65. https://doi.org/10.1093/applin/aml048
Webb, S. (2013). Depth of vocabulary knowledge. In C. Chappelle (Ed.), Encyclopedia of applied linguistics (pp. 1656–1663). Wiley-Blackwell. https://doi.org/10.1002/9781405198431.wbeal1325
Webb, S. (2020). Incidental vocabulary learning. In S. Webb (Ed.), The Routledge handbook of vocabulary studies (pp. 225–239). Routledge. https://doi.org/10.4324/9780429291586
Webb, S., & Chang, A. C.-S. (2022). How does mode of input affect the incidental learning of collocations? Studies in Second Language Acquisition, 44(1), 35–56. https://doi.org/10.1017/S0272263120000297
Webb, S., & Kagimoto, E. (2011). Learning collocations: Do the number of collocates, position of the node word, and synonymy affect learning? Applied Linguistics, 32(3), 259–276. https://doi.org/10.1093/applin/amq051
Webb, S., Newton, J., & Chang, A. (2013). Incidental learning of collocation. Language Learning, 63(1), 91–120. https://doi.org/10.1111/j.1467-9922.2012.00729.x
Webb, S., & Piasecki, A. (2018). Re-examining the effects of word writing on vocabulary learning. ITL – International Journal of Applied Linguistics, 169(1), 72–94. https://doi.org/10.1075/itl.00007.web
Webb, S., & Rodgers, M. P. H. (2009a). The lexical coverage of movies. Applied Linguistics, 30(3), 407–427. https://doi.org/10.1093/applin/amp010
Webb, S., & Rodgers, M. P. H. (2009b). The vocabulary demands of television programs. Language Learning, 59(2), 335–366. https://doi.org/10.1111/j.1467-9922.2009.00509.x
Webb, S., Sasao, Y., & Ballance, O. (2017). The updated Vocabulary Levels Test: Developing and validating two new forms of the VLT. ITL – International Journal of Applied Linguistics, 168(1), 33–69. https://doi.org/10.1075/itl.168.1.02web
Webb, S., Yanagisawa, A., & Uchihara, T. (2020). How effective are intentional vocabulary-learning activities? A meta-analysis. Modern Language Journal, 104(4), 715–738. https://doi.org/10.1111/modl.12671
Wei, Z. (2015). Does teaching mnemonics for vocabulary learning make a difference? Putting the keyword method and the word part technique to the test. Language Teaching Research, 19(1), 43–69. https://doi.org/10.1177/1362168814541734
Wood, D. (2006). Uses and functions of formulaic sequences in second language speech: An exploration of the foundations of fluency. The Canadian Modern Language Review, 63(1), 13–33. https://doi.org/10.3138/cmlr.63.1.13
Wood, D. (2015). Fundamentals of formulaic language: An introduction. Bloomsbury.
Wood, D. (2020). Classifying and identifying formulaic language. In S. Webb (Ed.), The Routledge handbook of vocabulary studies (pp. 30–45). Routledge. https://doi.org/10.4324/9780429291586
Wray, A. (1999). Formulaic language in learners and native speakers. Language Teaching, 32(4), 213–231. https://doi.org/10.1017/S0261444800014154
Wray, A. (2002). Formulaic language and the lexicon. Cambridge University Press. https://doi.org/10.1017/CBO9780511519772
Wray, A., & Perkins, M. R. (2000). The functions of formulaic language: An integrated model. Language and Communication, 20(1), 1–28. https://doi.org/10.1016/S0271-5309(99)00015-4
Yanagisawa, A., & Webb, S. (2020). Measuring depth of vocabulary knowledge. In S. Webb (Ed.), The Routledge handbook of vocabulary studies (pp. 81–96). Routledge. https://doi.org/10.4324/9780429291586
Yanagisawa, A., Webb, S., & Uchihara, T. (2020). How do different forms of glossing contribute to L2 vocabulary learning from reading? A meta-regression analysis. Studies in Second Language Acquisition, 42(2), 411–438. https://doi.org/10.1017/S0272263119000688
Zhang, X. (2017). Effects of receptive-productive integration tasks and prior knowledge of the component words on L2 collocation development. System, 66, 156–167. https://doi.org/10.1016/j.system.2017.03.019

Chapter 9

Grammar
Documenting growth in L2 classrooms

Paul D. Toth
Temple University

Meaningfully assessing the effect of instruction on second language (L2) grammatical development requires research that credibly relates observed learner behaviors to relevant theories of language and language learning. After defining the role of grammar in meaningful language use, this chapter compares cognitive and social views of grammatical development to identify key constructs for exploration in ISLA research. It illustrates the relationship between theory and L2 elicitation strategies by first outlining general principles and then describing the design of acceptability judgment tasks and picture description tasks in detail, along with a framework for working with classroom recordings. Specific examples are provided from the author's research on explicit grammar instruction in U.S. high school L2 Spanish classrooms. Advice to future researchers centers on empowering readers to find optimal research designs through awareness of the questions about teaching and learning they wish to explore.

Keywords: Spanish, picture description task, acceptability judgment task, production, interpretation, explicit instruction

1. What is grammar and why is it important?

In countless second language (L2) classrooms around the world, grammatical accuracy is treated as a fundamental building block for L2 communicative proficiency. Together with the lexicon (i.e., vocabulary; see Chapter 8), grammatical patterns and rules are often the organizing feature of both structure- and communicatively-focused L2 curricula. Nonetheless, current theories of language consider grammatical structures as co-essential vehicles for communication together with the phonological forms from which they are constructed, the meaningful distinctions they express, the lexical items they accompany, and the pragmatic expectations within which they are employed (Chomsky, 2005; Halliday & Matthiessen, 2014). Indeed, it is through the rapid synthesis of disparate features of form, meaning, and use that we organize our thinking, negotiate social relationships, and come to understand others (Larsen-Freeman, 2003).


However, L2 classroom experiences that center on reproducing verb conjugations and other grammatical forms misrepresent this complexity, often fostering unrealistic expectations that what appear to be simple forms should be quickly mastered. Many times, such exercises are accompanied by technical, metalinguistic explanations that explicitly describe how target structures should be used. Although the perceived value of such explanations may vary among practitioners, the consistently accurate use of targeted structures rarely emerges directly from even the most well-crafted explanations (Larsen-Freeman, 2003). This can lead to frustration and self-blame on the part of teachers and learners over the time and effort needed for L2 grammatical accuracy to develop, as well as to simplistic pedagogical proposals that either double down on extensive explanations and mechanical exercises or give up on attending to grammatical accuracy altogether (Larsen-Freeman & Tedick, 2016).

In this chapter, after first defining what grammar is, we will explore the current state of knowledge about how L2 grammatical development occurs and the role of instruction in the process. Then we will consider options for gathering and analyzing data, with examples coming from a large research project on instructed L2 grammatical development that I worked on with various colleagues. We will conclude with advice to future grammar researchers, strategies for troubleshooting data collection, and resources for further reading.

Although theoretical perspectives on the nature of grammatical knowledge vary, there is a broad consensus that, whereas lexical items depict the entities, events, and qualities of our world – as nouns (N), verbs (V), and adjectives/adverbs (Adj/Adv), respectively – myriad grammatical structures exist to systematically provide information about these items as accompanying inflectional morphemes (prefixes, suffixes, articles, pronouns, auxiliaries). Meanwhile, grammatical norms also establish appropriate syntactic sequences for words, phrases, and sentences (Yule, 2020). Thus, within linguistic terminology, "grammar" is often referred to as morphosyntax.

For example, consider that to describe the situation in Figure 1, an English speaker might notice a few salient objects and select the nouns chef, fish, and hat. To describe the events, the speaker might choose the verbs cook and wear. To characterize the chef, fish, or hat, the adjectives talented, big, and tall might come to mind, while adverbs such as carefully or often might describe the cooking and wearing events. Immediately upon selecting these lexical items, however, the grammatical domain of syntax becomes relevant as the speaker puts them in order to compose a coherent statement. The sequences Chef carefully cook big fish and Chef often wear tall hat (N Adv V Adj N) would likely be the most comprehensible to experienced English users, while alternatives might range from comprehensible but challenging (Big fish cook chef carefully) to incomprehensible (Cook big slowly fish chef).




Figure 1.  Visual eliciting use of lexical and grammatical resources for description

Even with optimal syntactic sequencing, however, an utterance with bare lexical items might be difficult to understand, and important information about the speaker's perspective on referenced objects and events would have to be inferred by other means. The inflectional morphology that systematically accompanies lexical items in any language would fill many of these gaps. In English, adding the suffix -ed to the verb cook ("cooked") would mark the event as completed in a past time frame, whereas adding the auxiliary is and the suffix -ing ("is cooking") would mark it as presently taking place. Meanwhile, preceding the nouns chef and fish with the definite article the would show that the speaker considers these referents as information already shared with the other person: "The chef cooked the fish." By contrast, substituting the indefinite article a before either noun would position the referent as new and direct the other's attention toward it: "The chef cooked a fish," or "A chef cooked the fish." Thus, as with many inflectional morphemes, a brief moment in the phonological sound stream conveys subtle meanings that have an important impact on communication.

These communicative impacts can in turn have social and cognitive consequences. In a context where participants are viewing Figure 1 together, using two indefinite articles to present both referents as new information, "A chef cooked a fish," would seem discursively odd, despite the lack of morphosyntactic errors. Likewise, whether essential for comprehension or not, the systematicity with which certain inflectional morphemes are supplied in any language creates expectations among users that, when violated, require more effort to process.


Such would be the case in English if a speaker used the lexical adverb yesterday while omitting the verbal suffix -ed: "Yesterday the chef cook a big fish." Although a past-time reference would still be conveyed through the adverb, the missing verb inflection could inhibit the other person's perception of seamless communication, with social consequences for the speaker.

2. What we know and what we need to know about grammatical development in ISLA

Current theories of L2 grammatical development recognize that there are often significant gaps between learners' first exposure to a particular structure, their metalinguistic understanding of its use, and their ability to accurately and consistently deploy the structure in real time (Mitchell et al., 2019). Whereas cognitive approaches to second language acquisition (SLA) aim to distinguish different components of grammatical development as universal mental processes, social approaches more holistically consider how development affects our capacity for meaningful activity within particular contexts.

2.1 Cognitive approaches to investigating grammar learning

Based on neuroimaging research, cognitive approaches differentiate between declarative knowledge of explicit, consciously-reportable facts and implicit, non-conscious procedural knowledge of how to execute complex mental and motor activities (Ullman & Lovelett, 2018). Led by selective attention to auditory, visual, or other sensory stimuli, relevant declarative and procedural knowledge are summoned from long-term memory to the metaphorical space of working memory to interpret the input and formulate a response (Baars & Franklin, 2003; Cowan, 2014). For example, when an L2 English learner attends to Figure 1 to describe the picture, the learner's summoned declarative knowledge might consist of remembered word meanings and explicit understandings of grammatical rules, as well as an awareness of task goals and procedures, and any personal reasons for completing it. In contrast, procedural knowledge would consist of the learner's real-time ability to integrate their intended response with the selection and sequencing of lexical and inflectional morphemes, and to coordinate the structures' expression through spoken or written gestures. Given the complexity of this process and the fact that much of it depends on a synchronization of implicit procedures, it is unsurprising that significant gaps should appear between our declarative understanding of how grammar works and our procedural ability to deploy L2 structures within fractions of a second.




Although current evidence suggests that the storage and activation of declarative and procedural knowledge are distinct cognitive processes, an emerging consensus holds that declarative knowledge, which can be quickly acquired through analytic reasoning, can catalyze growth in procedural knowledge, which occurs through the gradual accumulation of interrelated experiences from which generalizations are derived (Ullman & Lovelett, 2018). Consciously-accessible declarative knowledge is said to motivate how attention is directed to features of sensory input, which, in ISLA contexts, might include the use of difficult-to-perceive inflections or syntactic sequences in classroom interaction (N. C. Ellis, 2015). Such explicit declarative knowledge also lends coherence to episodes of implicit, procedural activity as the two are simultaneously activated in working memory. This coherence matters because it maximizes the efficiency of interpretive processing, gradually allowing the complexity of real-time activities to increase over time in a process called automatization (DeKeyser, 2017).

In L2 grammar research, cognitive approaches therefore prioritize evidence of growth in implicit, procedural abilities, with related interests in how explicit, declarative knowledge and directed attention might affect such growth. Most often, large-scale quantitative methods are employed so that group results can be generalized through inferential statistics to broader statements about how various instructional conditions affect L2 learner cognition (Plonsky, 2017). In ISLA, such work usually involves providing one or more instructional experiences that combine different types and amounts of explicit rules, information, and feedback with opportunities for interpretive or productive L2 use (Nassaji, 2017). Learning outcomes are then assessed through oral or written tasks that can be said to represent implicit versus explicit L2 knowledge to varying degrees (Loewen, 2018). Many studies also aim to characterize developmental processes through verbal self-reports or technologies such as eye-tracking, brain imaging, or reaction-time measures, which assess how participants' attention is directed and the degree to which implicit cognitive processes may underlie their behaviors (Bowles, 2018; Grey & Tagarelli, 2018).

2.2 Social approaches to investigating grammar

By contrast, social approaches to L2 grammatical development are motivated by holistic concerns over how instructional experiences develop learners' control over meaningful activity. Because what counts as "meaningful" to any learner is highly dependent on individual and contextual circumstances, social approaches prioritize a subjective, emic interpretation of data from the participant's perspective rather than preconceived, etic theoretical distinctions.


To develop emic hypotheses, descriptive qualitative methods are usually employed, involving painstaking analyses of multiple data sources (Lew et al., 2018). In ISLA, these may include classroom observations, video and audio recordings, individual and community demographics, or interviews and written accounts of participant experiences (De Costa et al., 2017). Given the large amounts of data that must be integrated, this work necessarily requires much smaller participant numbers than quantitative research, at times involving case studies of groups or individuals (Duff, 2014). Although such research operates outside statistical assumptions that generalize numerical trends to larger populations, the rich contextual details accompanying qualitative work provide finer-grained explanations for learner behavior than quantitative studies can, while still yielding implications for learning under similar circumstances (see Chapters 2 and 3, this volume).

Social approaches also view grammar as a linguistic resource for meaningful activity rather than a manifestation of universal mental processes (Halliday & Matthiessen, 2014). Capable others are said to mediate L2 grammatical development when they respond to emerging learner needs with optimally-tuned assistance (Lantolf & Poehner, 2014). Such mediation may consist of teacher or peer feedback, explicit grammatical information, or opportunities to "talk through" the language used in a task (Swain et al., 2015). In contrast with quantitative cognitive research, qualitative socially-oriented work aims to determine how such instructional activities can meet individuals' needs in a particular context rather than whether, when uniformly applied, they can produce positive results for learner groups on average.

Inspired by the writings of Vygotsky (1978, 1986), Sociocultural Theory further stipulates that optimal mediation should provide learners with conceptual and procedural tools that can extend the capacity for future activity beyond the immediate task at hand (Lantolf & Poehner, 2014). Thus, the value of a teacher's grammatical mediation, whether given as an explanation, assistive prompt, or corrective feedback, would depend on factors such as learners' perception of the teacher's assistive intention, their progress in being able to independently incorporate the structure into their L2 use, and their ability to link the present information to pertinent grammatical concepts (Swain et al., 2015). When assistance can be perceived as relevant to learners' present activity goals, it serves as a resource, or affordance, in the activity, and learners are said to demonstrate personal agency by appropriating it for their purposes (van Lier, 2008). When learners ultimately move from assisted to self-regulated activity, they are said to have internalized its mediational means. This would entail not only a more effective use of target grammatical constructions for future self-expression, but also a conceptual, metalinguistic understanding of how to make such activity possible (Poehner et al., 2019).




Within this framework, data collection usually involves qualitatively documenting developmental processes through recordings of L2 use, linguistic problem solving, and/or participant interviews (van Compernolle, 2019). This may be carried out longitudinally by gathering data at multiple timepoints or by capturing precise moments of microgenesis, when learners begin to transform their L2 behavior through mediation.

3. Data elicitation and interpretation options and step-by-step guidelines for grammar research

3.1 Operationalizing grammatical development

Data elicitation to assess L2 grammar development requires careful thinking about how changes in L2 grammatical knowledge can be operationalized as observable linguistic behavior. Within a cognitive framework that distinguishes explicit from implicit knowledge, data elicitation should consist of tasks that can convincingly be said to encourage or diminish opportunities for conscious reflection on L2 use (R. Ellis, 2009; Suzuki & DeKeyser, 2017). Many cognitively-informed studies therefore operationalize implicit L2 knowledge through tasks that require language interpretation or production under time pressure or that measure participants' attention or reaction times during such tasks. Test items may also involve subtle linguistic principles that would arguably lie beyond participants' conscious awareness, such as the syntactic or semantic properties that distinguish grammatical from ungrammatical sentences. Among common implicit knowledge assessments are acceptability judgment tasks (AJTs; traditionally known as grammaticality judgment tasks, GJTs), where participants must decide whether given sentences are possible in the L2, either as stand-alone items or as depictions of a picture or situation (Spinner & Gass, 2019). Aurally presented stimuli are also generally considered better depictions of implicit knowledge than written stimuli due to the reduced time for metalinguistic reflection (J. Kim & Nam, 2017). When computer laboratory space is available, researchers may use elicited imitation tasks, where participants hear recorded sentences and must repeat them from memory (Y. Kim et al., 2016), or processing tasks, where eye-tracking devices or reaction-time software capture the focus of participants' attention while they respond to on-screen stimuli (Godfroid, 2020; Suzuki, 2017). Meanwhile, explicit, declarative knowledge may be assessed through L2 use with more open-ended time demands, such as judgment, writing, or problem-solving tasks, often accompanied by requests that learners make corrections to ungrammatical items (Spinner & Gass, 2019) or that they "think aloud" about their L2 use, either concurrently or retrospectively (Bowles, 2018).
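To illustrate how a time-pressured judgment task might be operationalized, the console sketch below presents sentences, records yes/no judgments, and logs response latencies. It is a deliberately minimal stand-in with invented stimuli: input() cannot actually cut a trial off, so over-limit responses are only flagged after the fact, and dedicated experiment software (e.g., PsychoPy or E-Prime) would be needed for precise presentation and timing control.

```python
import random
import time

# Hypothetical stimuli: (sentence, acceptable_in_the_target_language)
ITEMS = [
    ("The chef cooked a big fish.", True),
    ("Yesterday the chef cook a big fish.", False),
]

def run_timed_ajt(items, time_limit=5.0):
    """Present each sentence in random order; record the judgment, its
    accuracy, and the response latency, flagging over-limit responses."""
    trials = list(items)
    random.shuffle(trials)
    results = []
    for sentence, acceptable in trials:
        print("\n" + sentence)
        start = time.monotonic()
        answer = input("Acceptable? (y/n): ").strip().lower() == "y"
        latency = time.monotonic() - start
        results.append({
            "sentence": sentence,
            "response": answer,
            "accurate": answer == acceptable,
            "rt_seconds": round(latency, 3),
            "over_limit": latency > time_limit,
        })
    return results

if __name__ == "__main__":
    for trial in run_timed_ajt(ITEMS):
        print(trial)
```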


Recently, laboratory-based L2 research has suggested that processing tasks may better operationalize implicit procedural knowledge than AJTs, given that scores on the two do not always correlate strongly (Maie & Godfroid, 2022; Suzuki, 2017). Instead, it is claimed that time-pressure judgments may better represent rapidly accessed, automatized explicit knowledge. Although this might seem to suggest that L2 classroom studies without access to laboratory equipment lack validity, the distinction is arguably less important for questions of pedagogical effectiveness than theories of L2 cognition, given that both implicit procedural knowledge and automatized explicit knowledge are believed to support fluent, accurate L2 use (Paradis, 2009; Spada, 2015). Indeed, Suzuki and DeKeyser (2017) conclude that automatized explicit knowledge plays a vital role in L2 development, given that, in their study, L2 learners' scores on a timed AJT correlated more strongly with improvements on L2 processing tasks than separate measures of aptitude for implicit learning. Thus, what is important for cognitively-oriented classroom research is that at least one assessment task requires participants to interpret or produce language without significant time for reflection, while another allows such time so that the two performances can be compared.

In contrast, because social theories view L2 development as a process by which learners gain control over meaningful activity without concern over distinctions between different kinds of knowledge, elicitation techniques that encourage the use of explicit, conceptual knowledge are of primary interest (Lantolf & Poehner, 2014; Poehner et al., 2019). Likewise, tasks that entail use of L2 grammatical structures in real-life classroom activities are valued over attempts to isolate target structures or distinguish receptive from productive modalities (van Compernolle, 2019). Indeed, within the qualitative studies that predominate in socially-oriented research, data elicitation often consists of audio- or video-recorded classroom activities that capture how learners utilize L2 structures within the affordances provided by teachers, peers, and instructional materials. For example, current sociocultural research on dynamic assessment has documented how learners respond to teacher mediation that systematically adapts to their in-the-moment needs during communicative L2 tasks (Davin, 2016; Poehner, 2018). Another research strand focuses on concept-based instruction (García, 2018; Lantolf & Zhang, 2017), where teachers package explicit grammar explanations within "scientific" grammatical concepts such as tense, aspect, mood, or voice, and learners then work on applying the concepts to their L2 use.

Regardless of theoretical perspective, however, most L2 grammar studies require participants to employ targeted grammar structures in tasks that involve interpretation, production, or some mix of the two, either in solo or interactive performances. Discerning grammatical development most often means assessing whether the participant's performance conforms to the norms of a standard, native-speaker variety of the language (Mackey & Gass, 2022). However, studies that take an emic view of development may simply aim to see whether participants demonstrate interactional competence by satisfactorily achieving the task goals regardless of any non-standard grammatical features (Huth, 2021). Other studies meanwhile focus on whether participants use L2 grammatical resources in sociolinguistically or pragmatically appropriate ways (Taguchi et al., 2016). Finally, ISLA grammar research may also focus on non-target-like features of learners' L2 use before and after instruction, given that these can shed light on cognitive and social developmental processes. L2 grammar researchers are often keenly interested in the extent of learners' overgeneralization of instructed forms to other linguistic contexts and whether instruction can counteract the effects of first-language (L1) transfer, where non-target-like L2 use reflects analogous grammatical constructions in learners' L1.
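Where research questions target these non-target-like patterns, one early analytical step is simply tallying coded errors by their hypothesized source. The following is a minimal Python sketch of such a tally; the error codes and data are invented for illustration and do not come from any study discussed in this chapter.

```python
from collections import Counter

# Hypothetical error codes assigned during transcript analysis.
coded_errors = [
    "overgeneralization", "l1_transfer", "l1_transfer",
    "overgeneralization", "other", "l1_transfer",
]

tally = Counter(coded_errors)
total = sum(tally.values())
for error_type, count in tally.most_common():
    print(f"{error_type}: {count} ({100 * count / total:.0f}% of errors)")
```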

3.2 Operationalizing grammatical knowledge

The design of elicitation instruments should validly reflect how target grammar structures represent interwoven components of form, meaning, and use. Traditional AJTs isolate target structures within a mix of acceptable and unacceptable sentences and simply ask participants to indicate whether they are possible in the language. Such a design emphasizes accuracy in form over meaning or use, given that participants do not have to interpret the items or associate them with any communicative purpose. A disproportionate focus on form alone is similarly evident in production tasks where participants fill blanks in isolated sentences with target forms.

To build meaning and use into grammar assessments, AJTs and other receptive tasks often accompany test items with pictures or text to provide a contextual basis for participant decisions. For example, to ensure that L2 learners of English understand the meaning of passives, the sentence John was hit by the ball might be accompanied by one picture depicting a ball landing on John's head and another showing John hitting the ball with a bat. Participants would then decide which of them accurately reflected the meaning of the sentence. Likewise, unacceptable sentences such as Sarah was arrived at the party might be included to assess whether learners understand how verb meanings determine whether passive constructions are possible. Meanwhile, assessments of acceptable use often provide visual or textual prompts that are more or less favorable contexts for target structures, with participants either rating the item or choosing the best among two or more items. With regard to English passives, such a prompt might explain that during a baseball game, one of the players hit a ball into the stands, followed by the question, "What happened to the ball after that?" Participants would then choose the pragmatically more appropriate active-voice sentence The ball hit John, which puts the ball as given information in subject position and John as new information in the predicate, or the less appropriate John was hit by the ball, which puts new and given information in the opposite positions.

In productive tasks, aspects of target structure form, meaning, and use are potentially available for assessment in every learner response. For example, if participants in the previous scenario were asked to write rather than select an answer to the question, "What happened to the ball after that?" researchers could focus on whether or not a passive was appropriately used, whether sentence meaning reflected the scenario provided, and whether morphosyntactic forms were accurately used.

Production task formats also vary widely in their solicitation of oral versus written responses and the length and degree of open-endedness allowed. Design choices in turn have consequences for the analysis strategies required and whether the data can validly answer the study's research questions. Because open-ended prompts for oral or written responses may allow participants to avoid target structures with grammatically acceptable alternatives, researchers must consider whether measuring the frequency of target structures within an open-ended task is a desired goal, or whether incorporating greater structure into the task would better suit their purpose. An assessment seeking greater use of passives than an open-ended prompt might therefore provide sentential subjects to use for each description of various situations. For example, a prompt asking participants to describe a picture showing John getting hit by a baseball might be followed by a fill-in-the-blank answer space, John ________. On the other hand, because such constraints are rarely found outside of institutional contexts, open-ended prompts are arguably more ecologically valid as assessments of participants' capacity for using target structures naturally.

Whereas researchers must usually transcribe oral production, written responses can often be worked with directly. In either case, analysis involves identifying response contents that address the research questions and then coding them according to a definition of that behavior, whether it be the use or omission of target structures, the creation of contexts where target structures might be used, or variation in the choice of target structures and accompanying contexts. Inevitably a raw frequency count of coded instances follows, which is often compared with accompanying counts of possible contexts of use from which percentages can be derived (Bernard, 2018). In controlled tasks where every item creates a favorable context for target structure use, percentage of use alone may sufficiently address the research questions. However, in more open-ended tasks, frequency counts of contexts created and percentages of use provide an importantly nuanced picture of participant behaviors.
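To make the frequency-count logic concrete, here is a minimal Python sketch of how coded responses might be tallied into a percentage-of-use score. The record structure and function name are hypothetical illustrations rather than part of any published coding scheme: each record notes whether a response created a context for the target structure and whether the structure was actually supplied.

```python
# Hypothetical coded data: one record per response, following the coding
# logic described above (context created? target structure supplied?).
coded_responses = [
    {"participant": "P01", "context_created": True,  "target_used": True},
    {"participant": "P01", "context_created": True,  "target_used": False},
    {"participant": "P02", "context_created": False, "target_used": False},
    {"participant": "P02", "context_created": True,  "target_used": True},
]

def percentage_of_use(responses):
    """Return (contexts created, target uses, percentage of use)."""
    contexts = sum(r["context_created"] for r in responses)
    uses = sum(r["target_used"] for r in responses if r["context_created"])
    pct = 100 * uses / contexts if contexts else 0.0
    return contexts, uses, pct

contexts, uses, pct = percentage_of_use(coded_responses)
print(f"{uses} target uses in {contexts} contexts created = {pct:.1f}% use")
```

Reporting both the raw counts and the derived percentage, as the function above does, preserves the nuance noted in the preceding paragraph for open-ended tasks.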




3.3 Data elicitation options

To illustrate how operationalization principles can be put into practice, we will focus specifically on the design of AJTs and picture description tasks (PDTs), two instruments that can be used in classrooms with or without supportive technology. We will then consider how recordings of classroom interactions can facilitate qualitative analyses of grammatical development before turning in Section 3.4 to concrete examples from an ISLA project that my colleagues and I conducted with high school L2 Spanish learners (Toth, 2022; Toth et al., 2013).

3.3.1 Acceptability judgment tasks (AJTs)
As indicated in Sections 3.1 and 3.2, the design of AJTs should address research questions within broader assumptions about the nature of grammatical knowledge and development. Since few people take tests like AJTs outside research contexts, their ecological validity is necessarily weaker than more open-ended communicative tasks. However, because AJTs allow for carefully fine-tuned grammatical assessments, they remain invaluable tools within this acknowledged limitation, as long as researchers transparently describe the considerations underlying their design (Plonsky et al., 2020).

One fundamental feature of AJT design is the kinds of items intended to be grammatically "acceptable" versus "unacceptable." Most often, acceptable items assess the impact of some instructional experience (e.g., explicit feedback, L2 immersion, study abroad) by testing learners before and one or more times after the experience. To assess the full extent of the impact, acceptable items may include both directly targeted L2 structures as well as related constructions that might be contexts for spillover effects. Likewise, AJTs should include unacceptable items that reflect hypotheses about any systematicity in learner errors. Often, these items represent a transfer of analogous L1 structures to the L2 or simplification strategies such as an overgeneralization of target structures to linguistically similar contexts. Other possibilities include items with simplified or missing grammatical features, such as infinitive verb forms, bare lexical items, or highly frequent but improper inflections, which learners often turn to when they cannot process more complex target structures.

Once the acceptable and unacceptable item categories have been determined, the items in each category must be sufficient in number to assure internal validity (i.e., a sufficiently similar range of responses; see Chapter 2) while also not resulting in an AJT that is so lengthy that it produces participant fatigue. Pilot testing prior to data collection is key to success here.

Finally, researchers must decide whether the AJT will be administered with or without time limitations for each item, and whether the stimuli will be presented auditorily, in written form, or both. As described in Section 3.1, these decisions will necessarily affect the kind of grammatical knowledge that the assessment can be said to validly represent (Plonsky et al., 2020). Nonetheless, real-world constraints on the physical context of data gathering must also be considered, including the nature of available technology and media resources, and the possibilities for testing participants individually or in intact classes. To illustrate, in a seminal study by Rod Ellis (2009), participants responded to written items with item-wise time gaps ranging from 1.8 to 6.2 seconds while working individually at computer stations. The gaps were determined by adding 20% to the average time taken for each item by a native speaker comparison group. Meanwhile, participants in the Toth (2022) project took a paper-and-pencil test in their regular classrooms with pre-recorded aural stimuli presented via low-tech stereo equipment. A fixed, four-second time gap was used for all items based on pilot testing with similar L2 Spanish learners who rated four seconds among other gaps as the one they found "challenging" but not "overwhelming" (p. 18).

3.3.2 Picture description tasks (PDTs)
Designing an effective PDT can be more complex than it appears because target structure production depends not only on participants' underlying lexical and grammatical capabilities but also on what they infer about the nature of relevant responses from the instructions and item prompts. In many ways, the challenge is similar to that of any teacher using picture prompts for L2 speaking or writing activities, except that ISLA researchers often require a more systematic approach to ensure elicitation in all linguistic contexts of interest. PDT designers must therefore try to anticipate how participants will "read" the intentions for L2 use in the task as the basis for evoking particular lexical and grammatical items. Section 1.1 illustrated this process regarding a description for Figure 1: The prominence of objects and activities in the visual prompt, together with a contextually-shaped understanding of the purpose of language use influences the lexical and grammatical items that come to mind. Understanding the interplay between intended L2 use and related forms and meanings can also inform decisions about how controlled or open-ended the task should be, as well as how lengthy participants should make their responses. As suggested in Section 3.2, more open-ended tasks are better at assessing the frequency with which particular structures come to mind for participants, while more strictly controlled tasks can shed light on structural contexts that learners might otherwise avoid. For PDTs with paragraph-level target responses, researchers can control for the variable length of participants' responses by focusing their analysis of target structure use within fixed word limits (Bernard, 2018).
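As one possible implementation of the fixed-word-limit strategy just described, the sketch below truncates each transcribed response to a fixed analysis window before counting coded target forms. The window size, tokenization, and target-form list are all illustrative assumptions, not prescriptions from the studies cited above.

```python
WINDOW = 100  # hypothetical fixed word limit for analysis

def truncate_to_window(tokens, window=WINDOW):
    """Keep only the first `window` words of a transcribed response."""
    return tokens[:window]

def count_target_forms(tokens, target_forms):
    """Count occurrences of coded target forms within the analysis window."""
    window_tokens = truncate_to_window(tokens)
    return sum(1 for t in window_tokens if t.lower() in target_forms)

# Hypothetical example: counting Spanish 'se' within the first 100 words.
response = "El vaso se rompió y después se lavaron los platos".split()
print(count_target_forms(response, target_forms={"se"}))  # -> 2
```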




3.3.3 Qualitative data collection
The possibilities for qualitative data elicitation are vast, and whereas most involve straightforwardly capturing L2 use within loosely-controlled conditions (e.g., recordings of classroom interactions, semi-structured interviews, written notes), the techniques required for analysis can be quite complex (see Chapter 3). As mentioned previously, qualitative ISLA grammar research most often focuses on either the context and frequency of target structure use, or participants' explicit understanding of their L2 use. Samples of learner language may be captured either concurrently during a language task (Bowles, 2018), or through retrospective reflection (e.g., Toth, 2004). The appropriate analysis depends on whether preconceived theoretical concepts motivate the study's research questions, or if the research questions call for an exploration of emergent patterns from which novel theoretical concepts might derive.

For example, based on both cognitive and sociocultural theoretical concepts, much qualitative ISLA research since the 1990s has sought to catalog language-related episodes (LREs) in L2 peer interaction as catalysts for L2 grammatical development. Given Swain and Lapkin's (1998) seminal work defining LREs as "any part of a dialogue where the students talk about the language they are producing, question their language use, or correct themselves or others" (p. 326), research since then has applied this definition to qualitative data to better understand the social and instructional conditions in which such deliberations occur and their impact on subsequent L2 use (Philp et al., 2014).

Meanwhile, other ISLA grammar studies have taken a more bottom-up approach to identifying the functions that certain L2 structures perform and the processes by which learners' grammatical repertoires expand. Eskildsen (2015), for example, documented two L2 English learners' use of interrogative constructions in a multi-year series of classroom recordings from which he hypothesized about stages in the development of these constructions. Similarly, in the study by Toth et al. (2013) featured in the second exemplar study box, recordings of how learners talked about grammar during an L2 problem-solving task were used to draw conclusions about the effectiveness of explicit grammar explanations by identifying different levels of abstraction in their analytic talk.

3.4 Examples of published research on instructed grammatical development

The goal of the Toth (2022) ISLA project was to document the effects of two forms of explicit instruction on the Spanish pronoun se in passive and inchoative (i.e., spontaneous changes of state) constructions. The instructional treatments were each implemented in different third-year L2 Spanish classes over three 90-minute lessons in three different U.S. public high schools (six classrooms total; two in each school). In addition to these two experimental groups, another two classes in one of the schools served as a control group for comparison. Through AJTs and PDTs administered immediately before (pretest), immediately following (posttest), and six weeks after the three lessons (delayed posttest), my colleagues and I aimed to identify changes in learners' accurate use of se and the appearance of overgeneralization and L1 transfer errors. Interactions during small-group and whole-class activities were also audio and video recorded to document classroom interaction.

One of the experimental groups experienced a top-down, deductive approach to explicit instruction, where each lesson began with a general rule for se followed by sentence-level examples. Communicative tasks then directed learners' attention to se in narrative texts and activities. The second experimental class experienced a guided, bottom-up, inductive approach to formulating grammatical rules, where learners first completed comprehension tasks using the same narrative texts as the deductive class and then identified contrasting examples of the use and omission of se in them. Next, during what are known as "co-construction" activities (Adair-Hauck & Donato, 2016), learners worked in small groups to hypothesize rules for se based on the texts. They then shared these in a whole-class follow-up while the teacher guided them toward acceptable generalizations. Communicative tasks identical to those of the deductive group then occupied the remainder of the lesson. Meanwhile, the control group received no explicit grammar instruction during the three lessons but rather worked on writing tasks. In this way, test results from the control classes would provide a comparative baseline for identifying the effect of explicit instruction in the experimental classes.

In the first exemplar study box, I provide an overview of the quantitative aspects of the Toth (2022) project, which appears in a special issue of the journal Language Learning. The full project features both quantitative and qualitative analyses of this large data set carried out by several authors using different cognitive and social methodological tools. Complete versions of the AJT and PDT, as well as qualitative transcripts of classroom interactions can be found online at the University of York's IRIS database (http://www.iris-database.org; see Toth, 2021a, 2021b, 2021c).

Exemplar study: Toth (2022)

Research question
How will explicit, deductive vs. guided inductive instruction on the pronoun se affect high school L2 Spanish learners' performance on an AJT and a PDT in the following areas?
– Acceptance of instructionally-targeted forms
– Overgeneralization to constructions with similar semantic properties
– Transfer of English-like constructions

Theoretical framework
Cognitive psycholinguistics




Methods
Participants in the deductive, guided inductive, and control groups completed three different versions of a written, sentence-level PDT and a picture-based, auditory timed AJT as a pretest immediately before instruction, a posttest immediately afterward, and a delayed posttest six weeks later. Statistical analyses focused on the items in each task that assessed target-like uses of se, overgeneralization, and L1 transfer. The three main factors in the analyses were: (1) participant group, (2) construction type, and (3) time of test (to assess changes over time).

Findings
Instructed learners outperformed controls on both tasks, and on both posttests, for target-like uses of se. However, among the instructed groups, deductive learners showed somewhat stronger improvements. To a lesser degree, both instructed groups overgeneralized se to semantically similar verbs on both tasks, but the control learners did not. Both the instructed and control groups continually used unacceptable English-like constructions for at least some items on both tasks, despite the instructed learners' simultaneous target-like improvements.

Take-aways
Given the time pressure on the AJT, the increased acceptance of target uses of se suggested that both types of explicit instruction influenced the development of implicit procedural and/or automatized explicit L2 grammatical knowledge. A selective overgeneralization to verbs with similar semantic properties to the instructional targets furthermore suggested that explicit instruction may engage with an implicit knowledge of verb meanings to affect linguistic behavior. Nonetheless, the persistence of L1 transfer errors across all learner groups suggested that instruction-influenced growth in L2 grammatical knowledge does not necessarily preempt reliance on well-established competing constructions from learners' L1. The somewhat stronger instructional effects in the deductive group, coupled with the qualitative data discussed in the next box, raised the possibility that greater clarity on how the target structure works was achieved via this technique.
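To illustrate how data from such a group × construction type × time design might be organized, the following sketch builds a hypothetical long-format table and computes descriptive cell means and pre-to-post gains with pandas. All values are invented, and the published analyses relied on inferential models not reproduced here.

```python
import pandas as pd

# Hypothetical long-format records: one row per participant x construction x time.
scores = pd.DataFrame(
    {
        "participant": ["P01", "P01", "P02", "P02", "P03", "P03"],
        "group": ["deductive", "deductive", "inductive", "inductive", "control", "control"],
        "construction": ["target_se"] * 6,
        "time": ["pretest", "posttest"] * 3,
        "score": [3.1, 4.8, 3.0, 4.4, 3.2, 3.3],
    }
)

# Descriptive cell means for each group x construction x time combination.
print(scores.groupby(["group", "construction", "time"])["score"].mean())

# Pre-to-post gain per group: a simple first look before inferential modeling.
wide = scores.pivot_table(index=["participant", "group"], columns="time", values="score")
wide["gain"] = wide["posttest"] - wide["pretest"]
print(wide.groupby("group")["gain"].mean())
```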

Figure 2 provides examples of four AJT items from Toth's (2022) project as they appeared in the audio recording, the test booklets, and the answer sheets. Participants rated the acceptability of each sentence as a depiction of an accompanying picture on a six-point Likert scale, which allowed their numerical responses to be used as scores within each category of acceptable and unacceptable items. The answer sheet also allowed learners to select "don't know" instead of rendering a judgment so that intermediate points on the scale would more likely indicate neutral acceptability than any comprehension difficulties (Spinner & Gass, 2019). Acceptable items directly assessed the passive and inchoative objects of instruction (Items 1 and 4), while one set of unacceptable items tested for overgeneralization by using se with verbs that do not accept the pronoun but bear semantic similarities to those that do (Item 2). L1 transfer was assessed by including inchoatives without the required se pronoun that resembled acceptable English equivalents (Item 3). In all, the AJT contained 88 items where se was used or omitted in a variety of acceptable and unacceptable contexts, including additional constructions not shown in Figure 2. Other items were distractors unrelated to se. Together, each item category included 3–4 target test items, which were randomly distributed throughout the AJT. The test took approximately 25 minutes of class time to complete.

Recorded sentences (English glosses were not included in the test):
1. "La puerta se abrió" – The door [se] opened – acceptable (inchoative se)
2. "Raúl se descansa en la cama" – Raúl [se] rests in bed – unacceptable (overgeneralization of se)
3. "El pescado cocinó" – The fish [Ø] cooked – unacceptable (L1 transfer of inchoative without se)
4. "Se lavaron los platos." – The plates [se] were washed – acceptable (passive se)

[Corresponding visuals for Items 1–4 appeared in the test booklet but are not reproduced here.]

Corresponding items on the answer sheet asked, "How well does each sentence match the picture?" on a six-point scale (1 = totally wrong, 2 = bad, 3 = poor, 4 = fair, 5 = good, 6 = excellent match), with a "don't know" option available for each item.

Figure 2.  Sample Acceptability Judgment Task items from the Toth (2022) project as they appeared in the test materials
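One practical question raised by this answer-sheet design is how "don't know" responses should enter the scoring. The sketch below assumes, purely for illustration, that such responses are excluded from a participant's category mean rather than imputed as a scale midpoint; this handling, like the item labels, is an assumption of the example and not a documented feature of the Toth (2022) analyses.

```python
# Hypothetical responses: item id -> Likert rating (1-6) or "DK" for "don't know".
responses = {"item01": 6, "item02": 2, "item03": "DK", "item04": 5}

# Hypothetical item key: which scoring category each item belongs to.
item_key = {
    "item01": "acceptable_target",
    "item02": "overgeneralization",
    "item03": "l1_transfer",
    "item04": "acceptable_target",
}

def category_means(responses, item_key):
    """Mean Likert rating per item category, skipping 'don't know' responses."""
    sums, counts = {}, {}
    for item, rating in responses.items():
        if rating == "DK":  # assumption: exclude rather than impute a midpoint
            continue
        cat = item_key[item]
        sums[cat] = sums.get(cat, 0) + rating
        counts[cat] = counts.get(cat, 0) + 1
    return {cat: sums[cat] / counts[cat] for cat in sums}

print(category_means(responses, item_key))
# -> {'acceptable_target': 5.5, 'overgeneralization': 2.0}
```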




Item prompts (nouns and verbs accompanying each picture), with English glosses and target responses (glosses were not on the test):
1. viajar las muchachas – to travel the women – target response: omission of se, "Las muchachas viajaron"
2. romper el vaso – to break the glass – target response: inchoative se, "El vaso se rompió"
3. pedir una comida deliciosa – to order a delicious meal – target response: passive se, "Se pidió una comida deliciosa."
4. nacer el bebé – to be born the baby – target response: omission of se, "Nació el bebé"

[The accompanying pictures are not reproduced here.]

Figure 3.  Sample Picture Description Task items as they appeared in the test materials

As shown in Figure 3, the PDT used in the Toth (2022) ISLA project involved a controlled, sentence-level, written design to focus participants' production on various target features of se constructions. In addition to ensuring greater systematicity in the assessment of linguistic contexts, the design also facilitated scoring procedures that could accommodate the large number of participants (N = 138). The PDT asked participants to describe situations depicted in 15 pictures using nouns and verbs accompanying each item. Three items represented five different contexts for accurately using or omitting se in an overall randomized order. Because participants would likely produce grammatically acceptable sentences that avoided the target se constructions, a binary, descriptive scoring scheme was devised that assigned one point for each use of se and no points for omission, regardless of whether the sentence was grammatically acceptable. This produced a percentage score for use of se that could be applied to each item category, whether it meant desirable target-like behavior or overgeneralization to semantically similar verbs.

Shortly after the ISLA project data were collected, Toth et al. (2013) conducted an early qualitative analysis of the whole-class and small-group activities in the guided induction group. We sought to document the social and discursive processes by which learners completed the co-construction activities. Given the somewhat stronger quantitative outcomes for deductive learners described in the summary of Toth (2022), our goal was to see whether our implementation of this bottom-up, dialogic technique for mediating L2 use had lived up to its sociocultural principles. As described in the second exemplar study box, our analysis focused on interactions in four focal learner groups in one class and the whole-class discussions that followed. We took an exploratory approach to identifying explicit grammatical analyses that ultimately uncovered four levels of analytical abstraction within the co-construction activities. Our results suggested that although the bottom-up analyses may have successfully supported rule formulation, difficulties in describing grammatical processes without greater instructional support may have weakened the utility of the resulting explicit knowledge.

Exemplar study: Toth et al. (2013)

Research question
What processes in explicit analytic reasoning emerge during the co-construction activities of guided induction, and how are they distributed among participants during small-group and whole-class interactions?

Theoretical framework
Sociocultural Theory

Methods
Data were gathered in an intact class of 17 third-year L2 Spanish students at a U.S. high school. Nine learners in four small groups were audio recorded, and whole-class follow-up discussions were video recorded. All recordings were manually transcribed, coded, and independently verified by two members of the research team. "Analytic talk" was identified as propositions that overtly mentioned L2 form and meaning while participants hypothesized rules for se. A reiterative review and thematic coding of transcripts focused on identifying different levels of abstraction in participants' analytic talk.

Findings
Through inter-rater reliability procedures, four non-overlapping, increasing levels of abstraction were verified:
– labeling, which made form or meaning references to specific items in the texts used as instructional sources
– categorizing, which identified commonalities in form or meaning among two or more text items
– patterning, which referred to "relationships between two categories of form, such as subject-verb agreement, or a relationship between a category of form and meaning, such as the appearance of se and the lack of a specific agent" (Toth et al., 2013, p. 287)
– rule formation, which extended observed patterns to generalizations about the Spanish language
Although analytic talk comprised only 17.5% of the available time in small-group interaction, compared to 57.6% in the whole-class follow-ups, more than 64% of all analytic talk occurred at the higher levels of patterning and rule formulation. Individuals played similar roles in more or less actively contributing to analytic talk during both interaction contexts. While negotiating their analytic proposals, learners often creatively extended the meaning of known grammatical concepts and informal terminology to cope with the unfamiliar target structure.

Take-aways
The proportion of time spent in higher levels of analytic talk suggests that co-construction activities achieved their objective of guiding learners' reasoning toward explicit target structure generalizations. Nonetheless, uneven group participation in this process, together with the need to extend and negotiate the meaning of needed terminology, suggest that greater instructional support might have better mediated learners' explicit knowledge development.
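The chapter does not specify which inter-rater reliability statistic was used, but a common choice for nominal codes like these is Cohen's kappa. The sketch below computes it from scratch for two raters' labels over the same set of analytic-talk episodes; the labels are invented for illustration.

```python
from collections import Counter

# Hypothetical codes assigned by two independent raters to the same episodes.
rater_a = ["labeling", "patterning", "rule", "categorizing", "patterning", "rule"]
rater_b = ["labeling", "patterning", "rule", "patterning", "patterning", "rule"]

def cohens_kappa(a, b):
    """Cohen's kappa for two raters' nominal codes on the same items."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    freq_a, freq_b = Counter(a), Counter(b)
    categories = set(a) | set(b)
    expected = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)
    return (observed - expected) / (1 - expected)

print(f"kappa = {cohens_kappa(rater_a, rater_b):.2f}")  # -> kappa = 0.76
```

Values near 1 indicate agreement well beyond chance; in practice, disagreements flagged this way would typically be discussed and resolved before finalizing the coding.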

4. Advice to future grammar researchers

Reflecting on the Toth (2022) project, a few takeaways emerge for ISLA grammar research. First, interpreting learner linguistic behaviors requires a thoughtful dialogue with theories of language as both the vehicle and object of instruction, and of language learning as its goal. Relevant cognitive and social theories informed the design of this study's materials and procedures so that it might contribute to conversations about why some learner behaviors readily reflect instructional influence while others require a more protracted effort.

Second, because grammatical development inevitably involves a complex interplay between the knowledge that we can consciously articulate about language and our ability to express intended L2 meanings in the moment, data collection should involve at least two or more performance measures, such as the AJT and PDT, to triangulate the documentation of learner behaviors. In the qualitative analyses, having a whole-class video recording to visually corroborate our audio recordings of small-group interactions greatly facilitated the work of transcription. In intact classrooms, it is also important to capture both the processes and the products of instruction, even if the project is primarily quantitative or qualitative in focus. Quantitative outcomes will often lack an explanatory grounding if there is insufficient documentation of how instructional strategies were implemented, and qualitative accounts of learning processes will lack a satisfactory understanding of where they lead if a valid documentation and assessment of learner behaviors beyond the immediate instructional experience is not included.

Third, researchers should remember that L2 classrooms are complex sociocognitive spaces where instructional design is but one among numerous factors that shape linguistic behavior (Toth & Davin, 2016; Toth & Moranski, 2018). It is therefore important to consider a broad range of potential influences on behaviors of interest when choosing theoretical and methodological research tools. Although the quantitative methods used in much cognitive research may achieve generalizability by isolating mental processes from contextual influences, the ecological validity of their findings will be limited by the fact that individuals only engage mental processes for purposeful activity within particular contexts. Moreover, in quantitative analyses, statistical reliability only indicates the degree of consistency or variation among data points, which may change with the number of factors and participants included. Thus, such broad findings may not be able to answer specific questions about what teachers or learners should do to respond to the instructional challenges in a particular situation. Likewise, although the qualitative methods in most socially-oriented research provide rich depictions of contextually-motivated behavior, the applicability of the findings elsewhere can only be inferred, and the question of how widespread the phenomena are remains open.

Finally, even when research procedures are theoretically and methodologically well-motivated, getting to know participants before data gathering builds a mutual understanding that strengthens project validity and allows researchers to adapt their work to real-world needs. Ensuring that teacher and learner participants maximally benefit from the research, given their investments of time and potential self-exposure, is indeed a core principle for conducting ethical research. Classroom research in particular requires relationships of trust between researchers, teachers, and learners, as project activities must be perceived as productive uses of instructional time. In addition to pilot testing, consulting with participants on the design of materials to address one's research questions can foster a mutually beneficial collaboration as participants explore their linguistic experiences more deeply and researchers come to understand the contextual influences on classroom behavior.

5. Troubleshooting grammar research

Even with an ideal research design, proceeding with data gathering presents opportunities for any number of unforeseen circumstances to require a change of plans. The following provides answers to some common questions about how to keep things on track:




Common question 1: What do I do if I lose a data source, either because a teacher backs out of the project or an instructional treatment doesn't go as planned?

Plan to gather more data than you think you will need. Regardless of your methodology, involving more than the minimum necessary classrooms, participants, recordings, or other data sources will allow you to adjust if anything goes wrong. On the other hand, if everything works out, you will have ample material for exploring numerous facets in your data and dividing the project into multiple research publications.

Common question 2: How do I avoid mishaps during data collection?

In addition to pilot testing well in advance, check all technology resources immediately before you need them to ensure they work properly for data gathering. Confirm with collaborators that they have everything they need to proceed. Also, be sure a knowledgeable member of the research team is present or immediately available to participants during data collection to address any uncertainties they may have.

Common question 3: What happens if I need more information about participant backgrounds to explain results that I've discovered in my data?

Accompany all data collection activities with an anonymous demographic questionnaire to identify key aspects of participant backgrounds that could affect their linguistic behavior and inform the interpretation of your observations.

6. Conclusions

Because grammar plays a fundamental role in human communication, L2 grammatical development will always be a topic of interest in applied linguistics. It is nonetheless important that ISLA grammar research be grounded within current theories of language and language learning so that it can offer meaningful conclusions about the role of instruction in this process. The complexity of successful research requires continual reflection about how the linguistic behavior documented through materials design and analysis might reflect both current L2 theories and the lived experiences of participants. Well-chosen conceptual and analytical tools in turn allow for interpretations of observed behaviors that not only contribute substantively to ongoing theoretical discussions but offer real-world implications that help teachers and learners better understand their experiences.


7. Further reading and additional resources

In addition to the readings cited in the references list, the following may be helpful for further reading and research examples. Although few sources in applied linguistics are devoted exclusively to both grammar and ISLA, the topic is pervasively covered in the following sources.

7.1 Suggested books

Ellis, R., Skehan, P., Li, S., Shintani, N., & Lambert, C. (2020). Task-based language teaching: Theory and practice. Cambridge University Press. https://doi.org/10.1017/9781108643689
Hall, J. K. (2019). Essentials of SLA for L2 teachers. Routledge.
Lantolf, J. P., Poehner, M. E., & Swain, M. (Eds.). (2018). The Routledge handbook of sociocultural theory and second language development. Routledge. https://doi.org/10.4324/9781315624747
Loewen, S., & Sato, M. (Eds.). (2017). The Routledge handbook of instructed second language acquisition. Routledge. https://doi.org/10.4324/9781315676968
VanPatten, B. (2017). While we're on the topic: BVP on language, acquisition, and classroom practice [Apple Books version]. American Council on the Teaching of Foreign Languages. https://my.actfl.org/portal/ItemDetail?iProductCode=BVP-LACP

7.2 Suggested journals

Foreign Language Annals: Official research journal of the American Council on the Teaching of Foreign Languages.
Language Awareness: A journal focused on the role of explicit knowledge in language development.
Language Teaching Research: A broad-based journal for research on teaching and learning any L2.
Modern Language Journal: A journal devoted to research on teaching and learning any L2, but with a particular commitment to "high quality work in non-English languages."
TESOL Quarterly: A broad-based journal for research on teaching and learning L2 English.

7.3 Suggested professional organizations and web resources

The American Council on the Teaching of Foreign Languages: https://www.actfl.org
IRIS digital repository of instruments and materials for research into L2s: https://www.iris-database.org/iris/app/home/index
International Association for Task-Based Language Teaching: https://www.iatblt.org
Task-Based Language Learning Task Bank (instructional materials): https://tblt.indiana.edu
TESOL International Association: https://www.tesol.org




References

Adair-Hauck, B., & Donato, R. (2016). PACE: A story-based approach for dialogic inquiry about form and meaning. In J. L. Shrum & E. W. Glisan (Eds.), Teacher's handbook: Contextualized language instruction (5th ed., pp. 206–230). Cengage Learning.
Baars, B. J., & Franklin, S. (2003). How conscious experience and working memory interact. Trends in Cognitive Sciences, 7(4), 166–172. https://doi.org/10.1016/S1364-6613(03)00056-1
Bernard, H. R. (2018). Research methods in anthropology: Qualitative and quantitative approaches (6th ed.). Rowman & Littlefield.
Bowles, M. A. (2018). Introspective verbal reports: Think-alouds and stimulated recall. In A. Phakiti, P. De Costa, L. Plonsky, & S. Starfield (Eds.), The Palgrave handbook of applied linguistics research methodology (pp. 339–357). Palgrave Macmillan. https://doi.org/10.1057/978-1-137-59900-1_16
Chomsky, N. (2005). Three factors in language design. Linguistic Inquiry, 36(1), 1–22. https://doi.org/10.1162/0024389052993655
Cowan, N. (2014). Working memory underpins cognitive development, learning, and education. Educational Psychology Review, 26(2), 197–223. https://doi.org/10.1007/s10648-013-9246-y
Davin, K. J. (2016). Classroom dynamic assessment: A critical examination of constructs and practices. Modern Language Journal, 100(4), 813–829. https://doi.org/10.1111/modl.12352
De Costa, P., Valmori, L., & Choi, I. (2017). Qualitative research methods. In S. Loewen & M. Sato (Eds.), The Routledge handbook of instructed second language acquisition (pp. 522–540). Routledge. https://doi.org/10.4324/9781315676968-29
DeKeyser, R. M. (2017). Knowledge and skill in ISLA. In S. Loewen & M. Sato (Eds.), The Routledge handbook of instructed second language acquisition (pp. 15–21). Routledge. https://doi.org/10.4324/9781315676968-2
Duff, P. A. (2014). Case study research on language learning and use. Annual Review of Applied Linguistics, 34, 233–255. https://doi.org/10.1017/S0267190514000051
Ellis, N. C. (2015). Implicit AND explicit language learning: Their dynamic interface and complexity. In P. Rebuschat (Ed.), Implicit and explicit learning of languages (pp. 3–23). John Benjamins. https://doi.org/10.1075/sibil.48.01ell
Ellis, R. (2009). Measuring implicit and explicit knowledge of a second language. In R. Ellis, S. Loewen, C. Elder, R. Erlam, J. Philp, & H. Reinders (Eds.), Implicit and explicit knowledge in second language learning, testing, and teaching (pp. 31–64). Multilingual Matters. https://doi.org/10.21832/9781847691767-004
Eskildsen, S. W. (2015). What counts as a developmental sequence? Exemplar-based L2 learning of English questions. Language Learning, 65(1), 33–62. https://doi.org/10.1111/lang.12090
García, P. N. (2018). Concept-based instruction: Investigating the role of conscious conceptual manipulation in L2 development. In J. P. Lantolf, M. E. Poehner, & M. Swain (Eds.), The Routledge handbook of sociocultural theory and second language development (pp. 181–196). Routledge. https://doi.org/10.4324/9781315624747-12
Godfroid, A. (2020). Eye tracking in second language acquisition and bilingualism: A research synthesis and methodological guide. Routledge. https://doi.org/10.4324/9781315775616
Grey, S., & Tagarelli, K. M. (2018). Psycholinguistic methods. In A. Phakiti, P. De Costa, L. Plonsky, & S. Starfield (Eds.), The Palgrave handbook of applied linguistics research methodology (pp. 287–312). Palgrave Macmillan. https://doi.org/10.1057/978-1-137-59900-1_14
Halliday, M. A. K., & Matthiessen, C. M. I. M. (2014). Halliday's introduction to functional grammar (4th ed.). Routledge. https://doi.org/10.4324/9780203783771
Huth, T. (2021). Conceptualizing interactional learning targets for the second language curriculum. In S. Kunitz, N. Markee, & O. Sert (Eds.), Classroom-based conversation analytic research: Theoretical and applied perspectives on pedagogy (pp. 359–381). Springer. https://doi.org/10.1007/978-3-030-52193-6_18
Kim, J., & Nam, H. (2017). Measures of implicit knowledge revisited: Processing modes, time pressure, and modality. Studies in Second Language Acquisition, 39(3), 431–457. https://doi.org/10.1017/S0272263115000510
Kim, Y., Tracy-Ventura, N., & Jung, Y. (2016). A measure of proficiency or short-term memory? Validation of an elicited imitation test for SLA research. Modern Language Journal, 100(3), 655–673. https://doi.org/10.1111/modl.12346
Lantolf, J. P., & Poehner, M. E. (2014). Sociocultural theory and the pedagogical imperative in L2 education: Vygotskian praxis and the research/praxis divide. Routledge. https://doi.org/10.4324/9780203813850
Lantolf, J. P., & Zhang, X. (2017). Concept-based language instruction. In S. Loewen & M. Sato (Eds.), The Routledge handbook of instructed second language acquisition (pp. 146–165). Routledge. https://doi.org/10.4324/9781315676968-9
Larsen-Freeman, D. E. (2003). Teaching language: From grammar to grammaring. Thomson Heinle.
Larsen-Freeman, D. E., & Tedick, D. J. (2016). Teaching world languages: Thinking differently. In D. H. Gitomer & C. A. Bell (Eds.), Handbook of research on teaching (5th ed., pp. 1335–1387). American Educational Research Association. https://doi.org/10.3102/978-0-935302-48-6_22
Lew, S., Yang, A. H., & Harklau, L. (2018). Qualitative methodology. In A. Phakiti, P. De Costa, L. Plonsky, & S. Starfield (Eds.), The Palgrave handbook of applied linguistics research methodology (pp. 79–101). Palgrave Macmillan. https://doi.org/10.1057/978-1-137-59900-1_4
Loewen, S. (2018). Instructed second language acquisition. In A. Phakiti, P. De Costa, L. Plonsky, & S. Starfield (Eds.), The Palgrave handbook of applied linguistics research methodology (pp. 663–680). Palgrave Macmillan. https://doi.org/10.1057/978-1-137-59900-1_29
Mackey, A., & Gass, S. M. (2022). Second language research: Methodology and design (3rd ed.). Routledge. https://doi.org/10.4324/9781003188414
Maie, R., & Godfroid, A. (2022). Controlled and automatic processing in the acceptability judgment task: An eye-tracking study. Language Learning, 72(1), 158–197. https://doi.org/10.1111/lang.12474
Mitchell, R., Myles, F., & Marsden, E. (2019). Second language learning theories (4th ed.). Routledge. https://doi.org/10.4324/9781315617046
Nassaji, H. (2017). Grammar acquisition. In S. Loewen & M. Sato (Eds.), The Routledge handbook of instructed second language acquisition (pp. 205–223). Routledge. https://doi.org/10.4324/9781315676968-12
Paradis, M. (2009). Declarative and procedural determinants of second languages. John Benjamins. https://doi.org/10.1075/sibil.40
Philp, J., Adams, R. J., & Iwashita, N. (2014). Peer interaction and second language learning. Routledge.
Plonsky, L. (2017). Quantitative research methods. In S. Loewen & M. Sato (Eds.), The Routledge handbook of instructed second language acquisition (pp. 505–521). Routledge. https://doi.org/10.4324/9781315676968-28
Plonsky, L., Marsden, E., Crowther, D., Gass, S. M., & Spinner, P. (2020). A methodological synthesis and meta-analysis of judgment tasks in second language research. Second Language Research, 36(4), 583–621. https://doi.org/10.1177/0267658319828413
Poehner, M. E. (2018). Probing and provoking L2 development: The object of mediation in dynamic assessment and mediated development. In J. P. Lantolf, M. E. Poehner, & M. Swain (Eds.), The Routledge handbook of sociocultural theory and second language development (pp. 249–265). Routledge. https://doi.org/10.4324/9781315624747-16
Poehner, M. E., van Compernolle, R. A., Esteve, O., & Lantolf, J. P. (2019). A Vygotskian developmental approach to second language education. Journal of Cognitive Education and Psychology, 17(3), 238–259. https://doi.org/10.1891/1945-8959.17.3.238
Spada, N. (2015). SLA research and L2 pedagogy: Misapplications and questions of relevance. Language Teaching, 48(1), 69–81. https://doi.org/10.1017/S026144481200050X
Spinner, P., & Gass, S. M. (2019). Using judgments in second language acquisition research. Routledge. https://doi.org/10.4324/9781315463377
Suzuki, Y. (2017). Validity of new measures of implicit knowledge: Distinguishing implicit knowledge from automatized explicit knowledge. Applied Psycholinguistics, 38(5), 1229–1261. https://doi.org/10.1017/S014271641700011X
Suzuki, Y., & DeKeyser, R. (2017). The interface of explicit and implicit knowledge in a second language: Insights from individual differences in cognitive aptitudes. Language Learning, 67(4), 747–790. https://doi.org/10.1111/lang.12241
Swain, M., Kinnear, P., & Steinman, L. (2015). Sociocultural theory in second language education: An introduction through narratives (2nd ed.). Multilingual Matters. https://doi.org/10.21832/9781783093182
Swain, M., & Lapkin, S. (1998). Interaction and second language learning: Two adolescent French immersion students working together. Modern Language Journal, 82(3), 320–337. https://doi.org/10.1111/j.1540-4781.1998.tb01209.x
Taguchi, N., Gomez-Laich, M. P., & Arrufat-Marques, M.-J. (2016). Comprehension of indirect meaning in Spanish as a foreign language. Foreign Language Annals, 49(4), 677–698. https://doi.org/10.1111/flan.12230
Toth, P. D. (2004). When grammar instruction undermines cohesion in L2 Spanish classroom discourse. Modern Language Journal, 88(1), 14–30. https://doi.org/10.1111/j.0026-7902.2004.00216.x
Toth, P. D. (2021a). Classroom transcripts. Materials from "What do the data show? Multiple perspectives on classroom L2 learning from a single data set" [Data: Transcriptions]. IRIS Database, University of York, UK. https://doi.org/10.48316/ytwx-jv53
Toth, P. D. (2021b). Test scores from the grammaticality judgment task. Materials from "What do the data show? Multiple perspectives on classroom L2 learning from a single data set" [Data: Scores on measures/tests]. IRIS Database, University of York, UK. https://doi.org/10.48316/jm90-7t67
Toth, P. D. (2021c). Test scores from the picture description task. Materials from "What do the data show? Multiple perspectives on classroom L2 learning from a single data set" [Data: Scores on measures/tests]. IRIS Database, University of York, UK. https://doi.org/10.48316/ms1w-fe12
Toth, P. D. (Ed.). (2022). What do the data show? Multiple perspectives on classroom L2 learning from a single data set [Special issue]. Language Learning, 72(S1).
Toth, P. D., & Davin, K. J. (2016). The sociocognitive imperative of L2 pedagogy. Modern Language Journal, 100(Supplement 1), 148–168. https://doi.org/10.1111/modl.12306
Toth, P. D., & Moranski, K. (2018). Why haven't we solved instructed SLA? A sociocognitive account. Foreign Language Annals, 51(1), 73–89. https://doi.org/10.1111/flan.12322
Toth, P. D., Wagner, E., & Moranski, K. (2013). 'Co-constructing' explicit L2 knowledge with high school Spanish learners through guided induction. Applied Linguistics, 34(3), 255–278. https://doi.org/10.1093/applin/ams049
Ullman, M. T., & Lovelett, J. T. (2018). Implications of the declarative/procedural model for improving second language learning: The role of memory enhancement techniques. Second Language Research, 34(1), 39–65. https://doi.org/10.1177/0267658316675195
van Compernolle, R. A. (2019). The qualitative science of Vygotskian sociocultural psychology and L2 development. In J. W. Schwieter & A. G. Benati (Eds.), The Cambridge handbook of language learning (pp. 62–83). Cambridge University Press. https://doi.org/10.1017/9781108333603.004
van Lier, L. (2008). Agency in the classroom. In J. P. Lantolf & M. E. Poehner (Eds.), Sociocultural theory and the teaching of second languages (pp. 163–186). Equinox.
Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes. Harvard University Press.
Vygotsky, L. S. (1986). Thought and language. The MIT Press.
Yule, G. (2020). The study of language (7th ed.). Cambridge University Press. https://doi.org/10.1017/9781108582889

Chapter 10

Pronunciation
What to research and how to research in instructed second language pronunciation

Andrew H. Lee and Ron I. Thomson
Brock University

This chapter discusses research methods in the domain of instructed second language (L2) pronunciation. After explaining what pronunciation is and why it is important, the chapter provides an overview of the current consensus surrounding instructed L2 pronunciation research, including major linguistic targets, speech modalities, and instructional techniques in both classroom- and laboratory-based studies, in addition to their effects on the acquisition of L2 pronunciation. It then introduces a range of speech elicitation instruments used in instructed L2 pronunciation research, followed by a discussion of approaches to L2 speech data analysis. The chapter also proposes future directions in instructed L2 pronunciation research and multiple venues in which L2 pronunciation researchers can disseminate their findings, while urging researcher-practitioner collaboration for evidence-based L2 pronunciation teaching. Finally, it concludes by shedding light on the importance of methodological rigor and offering troubleshooting strategies in instructed L2 pronunciation research.

Keywords: instructed L2 pronunciation, L2 pronunciation research methods, L2 speech instruments, L2 speech analysis, researcher-practitioner collaboration

1. What is pronunciation and why is it important? While many dialectal features of language can contribute to a speaker being per­ ceived as belonging (or not belonging) to a particular speech community, none are as salient as accent. The author of Judges, the seventh book of the Bible, recounts a story of conquest in which the Gileadites identified conquered Ephraimites by asking them to pronounce the word ‘Shibboleth’. Those who mispronounced it as ‘Sibboleth’ were immediately identified as enemies and killed. While the conse­ quences of speaking with an accent today are rarely so dire (whether nonnative https://doi.org/10.1075/rmal.3.10lee © 2022 John Benjamins Publishing Company

234 Andrew H. Lee and Ron I. Thomson

or simply a different dialect), there are still plenty of instances where nonnative-­ accented speech leads to miscommunication, or worse, negative evaluation of the speaker (Ennser-Kananen et al., 2021; Lippi-Green, 2012). In addition to provoking instances of overt discrimination, nonnative accents can impact the social integra­ tion of outsiders. Kogan et al. (2021), for example, found a relationship between nonnative-accented speech and the willingness of majority language speakers to form relationships with immigrants. In particular, their study found that speaking with a nonnative accent not only correlated with fewer friendships with native speakers, but even more so with the reluctance of native speakers to form partner­ ships with or marry speakers with nonnative accents. This is consistent with Scovel (1988), who theorized that accent is evolutionary in nature, having arisen as a means of preventing mating outside one’s own group. Kogan et al. (2021) argue that these social responses demonstrate the extent to which nonnative accents function as important social and cultural markers and the extent to which such reactions are innate. It is worth noting that in multilingual contexts, diverse accents may not as overtly signal outsider status since most people may speak a common language with an accent (e.g., Coetzee-Van Rooy, 2014; Prah, 2010). Accent functions as such a determinative marker of group identity, because learning to speak a new language without a detectable accent is impossible for most adult learners (Abrahamson & Hyltenstam, 2009; Flege, 1995; Flege & Bohn, 2021; Munro, 2021). This aspect makes L2 pronunciation unique among L2 skills, given that many adult L2 learners can develop a convincingly nativelike grasp of L2 vocabulary and grammar. While Birdsong (2018) has argued for nonnativeness in these other domains as well, detecting nonnative ability can require microscopic scrutiny. In contrast, detecting a nonnative accent is often effortless, especially in speakers whose native languages are phonologically distant from the L2. Given the apparent intractability of a nonnative accent, why should we be con­ cerned about trying to improve it? Some have emphasized that communication is a two-way street (Derwing & Munro, 2015; Levis, 2018), meaning that listeners have a role to play. Nonetheless, there is still much that adult L2 learners can do to increase the likelihood that listeners will understand their speech. This is important, since even the most patient and tolerant of listeners may have difficulty understanding some L2 speech. While we do not discount the important role of listeners, this chapter focuses on empowering L2 learners to achieve the goal of producing com­ fortably intelligible speech, rather than improving the listening ability of listeners. Abercrombie (1963) describes ‘comfortably intelligible speech’ as pronunciation that requires little to no conscious effort on the part of listeners. Further, Abercrombie (1963) is a realist, arguing that anything more than comfortably intelligible speech is a largely unattainable ideal (see also Abrahamsson & Hyltenstam, 2009). While interest in L2 pronunciation waned during the Communicative Language Teaching




While interest in L2 pronunciation waned during the Communicative Language Teaching era, by the turn of the 21st century the pendulum had begun to swing back. This was in no small part due to the influence of Munro and Derwing's (1995a, 1995b) seminal papers and subsequent studies, which examined the relationship between strength of accentedness (i.e., divergence from a target variety), comprehensibility (i.e., the amount of effort required on the listener's part), and intelligibility (i.e., how much is understood).

Munro and Derwing (1995a) demonstrated that accent is only partially correlated with comprehensibility and intelligibility. Specifically, it is possible to retain a strong nonnative accent and still be highly comprehensible and intelligible. The extent to which an accent influences comprehensibility and intelligibility depends upon the nature of pronunciation errors. Catford (1987) argues that some sound substitutions are more deleterious to communication than others because they have a higher 'functional load.' For example, substituting /l/ for /ɹ/ is predicted to lead to confusion, because this pair of phonemes distinguishes many words in English (e.g., lice vs. rice). Conversely, substituting /t/ for /θ/ will not lead to as much confusion across the language, since this is a low functional load pairing with fewer contrasts. There are also general hierarchies relating to the importance of sounds within words. For example, in English, errors in vowels impact intelligibility more than errors in consonants (Bent et al., 2007), and errors in onset consonants are more problematic than errors in coda consonants (Levis, 2018).

Research findings have increasingly led the field back to accepting the primacy of what Levis (2005, 2018) calls the 'intelligibility principle' over the 'nativeness principle'. In most contexts, the goal of being comfortably intelligible is both realistic and sufficient, while nativelike speech is both unrealistic and unnecessary. Comprehensibility is a higher standard but still achievable. It is most required in social and workplace contexts where interaction is more frequent and sustained, which increases processing demands on listeners (Thomson, 2018). For example, call centers expect nonnative-accented agents to exceed basic intelligibility, because calls often involve frustrated clients. A more controversial goal is 'acceptability,' which refers to listeners' subjective impressions of the degree of irritation they experience while listening to a speaker with a particular accent (Szpyra-Kozłowska, 2014). Even in contexts where acceptability is the goal, it is, like intelligibility and comprehensibility, only partially correlated with strength of accent (Zahro, 2019). Further, not all pronunciation errors have the same impact on acceptability ratings (Van den Doel, 2006). Tulaja (2020) found that a consonant substitution error triggered a lower acceptability judgment than a vowel substitution error. Whether the goal is intelligibility, comprehensibility, or acceptability, there is strong evidence that achieving any of these goals is possible without eliminating a nonnative accent.
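Functional load lends itself to a simple computational illustration. The following Python sketch is a toy under our own assumptions: a tiny hand-transcribed lexicon and single-symbol contrasts. It counts the minimal pairs a phoneme contrast distinguishes; serious functional-load estimates (e.g., those discussed by Munro & Derwing, 2006) are corpus-based and frequency-weighted.

```python
# Toy illustration of functional load: count the minimal pairs a phoneme
# contrast distinguishes in a tiny, made-up phonemically transcribed lexicon.
LEXICON = {
    "lice": "laɪs", "rice": "ɹaɪs",
    "lock": "lɑk",  "rock": "ɹɑk",
    "light": "laɪt", "right": "ɹaɪt",
    "thin": "θɪn",  "tin": "tɪn",
}

def minimal_pairs(contrast: tuple[str, str]) -> list[tuple[str, str]]:
    """Return word pairs whose transcriptions differ only in the given contrast."""
    a, b = contrast
    pairs = []
    words = list(LEXICON.items())
    for i, (w1, t1) in enumerate(words):
        for w2, t2 in words[i + 1:]:
            if len(t1) == len(t2):
                diffs = [(x, y) for x, y in zip(t1, t2) if x != y]
                if diffs == [(a, b)] or diffs == [(b, a)]:
                    pairs.append((w1, w2))
    return pairs

print(len(minimal_pairs(("l", "ɹ"))))  # 3 pairs -> higher functional load
print(len(minimal_pairs(("t", "θ"))))  # 1 pair  -> lower functional load
```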


Before considering the impact of instruction on pronunciation, it is critical to understand what is possible in naturalistic learning. The development of first language (L1) pronunciation unfolds in such a way that it will unavoidably interfere with L2 pronunciation learning later in life. Learning an L1 requires that infants establish a foundation of stable phonological categories upon which higher-level learning (e.g., vocabulary and grammar) can occur (Laing & Bergelson, 2020). Successful L1 acquisition requires creating efficiencies within the emerging phonological system. The highly variable input that infants are exposed to is rich in phonetic detail, much of which is irrelevant to the categorization of ambient language sounds. While infants initially pay attention to all phonetic information, within a few months they learn to direct their attention only to information that facilitates sound recognition. It is this L1 developmental outcome that conspires against adult L2 pronunciation learning, which would be far easier if the ability to attend to fine-grained phonetic distinctions were maintained (Munro, 2021).

Despite weakening, there is evidence that the mechanisms underlying L1 speech learning are retained and available across the lifespan (Flege & Bohn, 2021). This contradicts earlier notions of a putative biological critical period, after which L2 learning was believed to rely on different mechanisms (Scovel, 2000). Instead, there now seems to be consensus that the relationship between age and degree of nonnative accent is a linear one; that is, the older we are when we begin learning an additional language, the stronger, on average, our nonnative accents will be (Munro, 2021). The relationship between age and strength of accent is also mediated by experience: L2 learners who continue to use their L1 more tend to have stronger nonnative accents than those who use their L1 less (Flege et al., 1997). While L2 pronunciation development reflects some of the same patterns as L1 pronunciation development, in the absence of L2 instruction, adult L2 pronunciation development either plateaus or progresses so slowly as to be nearly impossible to measure. Research suggests that this happens somewhere within the first year or two after arrival in the L2 environment (Derwing & Munro, 2015). Fortunately, there is evidence that L2 instruction can reactivate learning, as described below.

2. What we know and what we need to know about pronunciation research in ISLA

This section presents critical topics in instructed L2 pronunciation research, with a brief introduction to the major linguistic targets (e.g., segmentals and suprasegmentals) and speech modalities, focusing on both speech perception and production. Several instructional techniques employed in classroom- and laboratory-based L2 pronunciation research, such as explicit phonetic instruction, practice opportunities, and corrective feedback, are discussed.




2.1 Linguistic targets

Instructed L2 pronunciation research focuses on the acquisition of L2 segmentals and suprasegmentals in various learning contexts. Segmentals refer to individual consonants and vowels, while suprasegmentals refer to features beyond individual consonants and vowels, such as stress, intonation, and rhythm. With respect to segmentals, numerous L2 consonants (e.g., the English /ɹ/ in Saito, 2013b and Saito & Lyster, 2012; the Spanish stops, approximants, and rhotic consonants in Kissling, 2013) as well as L2 vowels such as English vowels (e.g., Lee & Lyster, 2016a, 2017; Thomson, 2011) and French vowels (e.g., Inceoglu, 2016) have been targeted in pronunciation research. As for suprasegmentals, linguistic targets in the literature include Mandarin tones (e.g., Saito & Wu, 2014), English word stress (e.g., Sadat-Tehrani, 2017), and French prosody (e.g., Hardison, 2004).

Overall, previous research supports the effects of L2 pronunciation instruction on the acquisition of various L2 segmentals and suprasegmentals. Notably, meta-analyses by Lee et al. (2015) and Saito (2021) revealed that pronunciation instruction targeting stress or rhythm, as well as instruction targeting both segmentals and suprasegmentals, showed large effects. Their findings support previous studies (e.g., Derwing et al., 1998) that highlighted the effects of suprasegmental-based instruction. Further, suprasegmental-based instruction has been found to be twice as effective as segmental-based instruction (Gordon & Darcy, 2012; Yates, 2003). The effectiveness of suprasegmental-based instruction is at least partially owing to the ability of suprasegmentals to impact speech intelligibility (Derwing & Munro, 2005, 2009, 2015; Hahn, 2004).

Nevertheless, most studies in instructed L2 pronunciation research disproportionately target the acquisition of segmentals. This may be because instructors tend to find segmental-based instruction easier to teach (Couper, 2016; Huensch, 2019; Saito, 2014). For instance, articulatory descriptions of individual segmentals are more tangible and easier to implement than information on how to produce targetlike intonation patterns. In the domain of research methodology, reading words or passages aloud is frequently utilized owing to its convenience; however, this method tends to naturally orient learners' attention more to segmental features than to suprasegmental features. In addition, data analysis in segmental-based research is more straightforward than its suprasegmental counterpart: the acoustic characteristics of consonants and vowels are more evident than those of suprasegmental features, enabling researchers to conduct relatively straightforward acoustic analyses. Given the paucity of studies targeting suprasegmentals and, more importantly, the significant role of suprasegmentals in speech intelligibility, more research on suprasegmentals is needed.
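To make this analytical asymmetry concrete, here is a hedged sketch using the praat-parselmouth Python package (an interface to Praat's algorithms); the file name and the vowel midpoint time are placeholders. A segmental measure reduces to a pair of formant values, whereas a suprasegmental measure yields an entire F0 contour that still has to be interpreted before it can be scored.

```python
# Sketch: a simple segmental measurement (vowel formants at one time point)
# versus a suprasegmental one (an F0 contour over time).
import parselmouth  # pip install praat-parselmouth

snd = parselmouth.Sound("learner_vowel.wav")  # placeholder file name

# Segmental: F1/F2 at the vowel midpoint give a compact, analyzable measure.
formants = snd.to_formant_burg()
midpoint = 0.12  # placeholder vowel midpoint, in seconds
f1 = formants.get_value_at_time(1, midpoint)
f2 = formants.get_value_at_time(2, midpoint)
print(f"F1 = {f1:.0f} Hz, F2 = {f2:.0f} Hz")

# Suprasegmental: the pitch track is a whole contour; deciding what counts
# as a "targetlike" intonation pattern requires further modeling.
pitch = snd.to_pitch()
f0 = pitch.selected_array["frequency"]  # 0.0 where no pitch was detected
print(f"{(f0 > 0).sum()} voiced frames out of {len(f0)}")
```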


As addressed so far, the main interest in instructed L2 pronunciation research is the extent to which L2 learners acquire L2 segmentals and suprasegmentals in an intelligible manner. By acquire, we refer specifically to the degree to which L2 learners can perceive and produce linguistic targets. The following subsection discusses these two key modalities – speech perception and speech production – focusing on the link between the two.

2.2 Speech perception and production

L2 pronunciation research differentiates between speech perception and speech production. Speech perception, which mainly involves what L2 learners hear (i.e., input), focuses on their ability to identify and discriminate among individual sounds (both segmentals and suprasegmentals) in the L2 input. In contrast, speech production focuses on learners' ability to articulate those sounds in oral production. Most L2 speech researchers hold the view that speech perception and speech production are interconnected, with perception preceding production, whereas others contend that they are two independent modalities. Drawing on direct realism (Fowler, 1986), the Perceptual Assimilation Model (PAM; Best, 1995) posits that L2 speakers adopt articulatory gestures for speech production as the basis of speech perception and that the speech perception and production systems share representations based in the gestural system. In contrast, auditorists argue for the perception-first view, claiming that targetlike L2 speech perception is a prerequisite for targetlike L2 speech production (Flege, 1995; Flege & Bohn, 2021; Thomson, 2022).

Evidence for a link between L2 speech perception and production largely comes from training studies, which have repeatedly demonstrated that perception training improves L2 speech production accuracy (Hardison, 2003; Thomson, 2011; Wang et al., 2003). Similarly, Sakai and Moorman's (2018) meta-analysis found that instruction focused on L2 speech perception improves L2 speech production accuracy. Neurological evidence (Pulvermüller & Schumann, 1994) also corroborates such views, highlighting that neurons controlling muscles relevant for speech articulation are co-activated with the sensory neurons that control the auditory system. In a similar vein, Pulvermüller et al. (2006) found that

(…) during speech perception, specific motor circuits are recruited that reflect phonetic distinctive features of the speech sounds encountered, thus providing direct neuroimaging support for specific links between the phonological mechanisms for speech perception and production. (p. 7865)




Not all research supports the view that L2 speech perception and production are connected, with some arguing that the representations underlying speech perception and speech production are distinct (Huensch & Tremblay, 2015). De Jong et al. (2009) and Peperkamp and Bouchon (2011) failed to show a link between the two modalities. In this respect, Bradlow et al. (1997) reported that "[it] is not the case that improvement in perception and production proceeded in parallel within individual subjects" (p. 2307). Nagle (2021) also holds that the link between the two modalities is far from straightforward, having found that only between-subjects perception predictors (not within-subjects ones) were significant and that the strength of the link depended on tasks and target sounds.

In instructed L2 pronunciation research, both perception-based instruction and production-based instruction have been designed and tested. For example, Lee et al. (2020) revealed that instruction that includes more perception-oriented tasks may be more effective than instruction that includes more production-oriented tasks, supporting the perception-first view. More importantly, Lee and Lyster (2017) suggested that L2 pronunciation instruction should include perception-based instruction that raises learners' perceptual noticing and awareness of new L2 sounds, followed by production-based instruction that encourages the learners to practice producing the new L2 sounds with the aid of several instructional techniques, some of which are introduced next.

2.3 Instructional techniques

With the burgeoning interest in L2 research in instructed learning contexts, applied linguists have tested a number of instructional techniques, including input enhancement, metalinguistic explanation, practice opportunities, and corrective feedback. With this line of research having targeted mostly L2 grammar or lexical features, L2 pronunciation had not received much attention until the early 2010s. Saito (2011) championed the inclusion of pronunciation in the realm of instructed L2 acquisition research by testing the effects of explicit instruction and corrective feedback on the acquisition of the English phoneme /ɹ/. Since his initial work, several researchers (including the first author of this chapter) have further investigated the effects of instructional techniques on the acquisition of L2 pronunciation, focusing on various linguistic targets, languages, and instructional contexts. The following three techniques – explicit phonetic instruction, practice opportunities, and corrective feedback – have yielded particularly important implications for instructed L2 pronunciation research, and each is summarized in what follows.


Explicit phonetic instruction (comprising metalinguistic explanation) and practice opportunities are theoretically supported by skill acquisition theory (DeKeyser, 1998, 2001; Lyster & Sato, 2013), which stipulates two types of knowledge: declarative and procedural (see Chapter 1). Declarative knowledge includes metalinguistic information such as grammar rules and word definitions (i.e., knowledge about the target form), whereas procedural knowledge connotes the ability to access declarative knowledge and actually apply it to language production in a targetlike manner. In this sense, metalinguistic explanation in explicit phonetic instruction enables L2 learners to develop declarative knowledge. Practice opportunities in turn encourage L2 learners to proceduralize their declarative knowledge as they practice the linguistic targets.

Metalinguistic explanation can be provided as part of explicit phonetic instruction, which includes multiple exposures to a target sound accompanied by an explanation of the articulatory gestures for that sound (e.g., pronounce /i/ while rounding your lips for the French /y/). L2 learners pay primary attention to words and their prosodic patterns to derive their meaning from linguistic input (Cutler et al., 1997; Kuhl, 2000, 2004). Therefore, L2 learners initially focus on word-sized units of L2 phonological information, which are thus affected by lexical factors (Imai et al., 2005). After this stage, learners start to notice sound-sized units of L2 phonological information; that is, individual phonemes. Thus, explicit phonetic instruction serves as a catalyst for L2 learners to acquire sound-sized units of L2 phonological information, allowing them to generalize their phonemic knowledge to new lexical contexts. Saito (2013b) studied the effect of explicit phonetic instruction, demonstrating that explicit phonetic instruction integrated into a meaning- or communication-oriented lesson draws L2 learners' attention to sound-sized units of L2 phonological information, thus enabling them to acquire target phonemes.

Explicit phonetic instruction is often followed by several instructional activities that encourage L2 learners to practice the target sounds (e.g., Burri et al., 2017; Trofimovich & Gatbonton, 2006). Lee and Lyster (2017) highlight the importance of practice opportunities in L2 pronunciation instruction. Based on the perception-first view of L2 pronunciation development, their study hypothesized that higher L2 perception accuracy as a result of perception training would also result in higher L2 production accuracy. However, their results showed that targetlike perception did not guarantee targetlike production. More specifically, the study demonstrated that only L2 learners who physically articulated the target sounds during perception training improved their production accuracy. Consequently, Lee and Lyster (2017) argued that practice opportunities are essential for promoting proceduralization and for transferring knowledge to speech production.

In addition to the techniques mentioned above, corrective feedback has also been extensively discussed in instructed L2 pronunciation research, following Lyster and Ranta's (1997) seminal taxonomy.




For example, in response to an erroneous production (e.g., sushi [l]ice), reformulations provide L2 learners with the correct forms (e.g., [r]ice), while prompts withhold the correct forms and push the learners to produce targetlike output themselves (e.g., not lice, but…?), providing further practice opportunities. Research shows that L2 learners notice corrective feedback on pronunciation errors more readily than corrective feedback on other linguistic errors (e.g., Lyster, 2001; Mackey et al., 2000), and that reformulations (recasts in particular) are effective for improving L2 learners' speech production (e.g., Saito, 2013a; Saito & Lyster, 2012). Gooch et al. (2016) found that while recasts improve controlled production, prompts facilitate both controlled and spontaneous production. They argued that recasts enable L2 learners to refine their production accuracy by providing targetlike exemplars, while prompts push L2 learners to improve their speech intelligibility through adjustments to their interlanguage.

When it comes to corrective feedback on L2 speech, only a few studies target perception, while most of the research focuses on production. For instance, Lee and Lyster (2016a) investigated how Korean learners of English perceived specific nonnative phonemic contrasts and the extent to which they benefited from corrective feedback. Having demonstrated the importance of corrective feedback in L2 speech perception, Lee and Lyster (2016b) examined the effects of different types of corrective feedback on L2 speech perception. They showed that providing both a target form and a nontarget form (e.g., sheep, not ship) was most effective for improving L2 speech perception, by optimizing learners' awareness of the phonemic differences between the two forms.

This section introduced key issues in the field of L2 pronunciation research, focusing on what we know and what we need to know in the field. The following section describes relevant research techniques.

3. Data elicitation and analysis in instructed L2 pronunciation research

As one would expect, assessing L2 pronunciation is complex. It can include measurement of perception and/or production (see Thomson, 2022) and focus on discrete features of speech, including segmentals, suprasegmentals, or both. Certain instrumental techniques are utilized to examine these discrete features. For example, to measure perception, identification or discrimination tasks are typically used (see Lee & Lyster, 2016a, 2016b, 2017). Ultimately, the assessment techniques chosen depend on the goal of the assessment. Earlier in this chapter, we made distinctions among accentedness, comprehensibility, and intelligibility, which can be characterized as global pronunciation constructs.


Though they can be impacted by measurable changes at the segmental and suprasegmental levels, the three pronunciation constructs are listener-centered, because each one is measured via listener judgments. While more fine-grained measures of individual speech characteristics are helpful for understanding developmental patterns for new segmental and suprasegmental features (e.g., whether something is learned in a controlled perceptual or speaking context), here we focus on listener judgments, which are the ultimate endpoint of pronunciation learning and teaching. Without a change in how L2 speech is perceived by listeners, not much has been accomplished that will matter in the real world.

3.1 How to collect speech samples

Within the accentedness, comprehensibility, and intelligibility paradigms, research typically involves recording L2 learners' speech before and after instruction, and sometimes much later in a delayed posttest in order to measure retention (see Thomson & Derwing, 2015). These recordings serve as the basis for evaluation by human judges. The choice of speaking task depends on the goal. For instance, reading-aloud tasks such as reading the Stella passage (see the example below and also http://accent.gmu.edu/) and the Rainbow passage are used to measure L2 learners' accentedness, comprehensibility, and intelligibility in highly controlled settings.

While it is possible to use highly controlled tasks (e.g., reading) or partially con­ trolled tasks (e.g., elicitation of target words using picture prompts in a guided activity) to measure a range of abilities, L2 pronunciation research typically uses either monologic tasks or picture-description tasks for accentedness and comprehensibility, and sentence-reading or repetition tasks for intelligibility (see Thomson, 2018). Monologic or picture-description tasks are less controlled than other tasks and are therefore more likely to reflect what learners can do in the real world, where attention cannot be so explicitly directed towards pronunciation. A widely used picture-description task is Derwing et al.’s (2004) eight-frame story about two people mixing up identical suitcases after bumping into each other on a city street (see Figure 1). Short segments of the resulting recordings (usually under a minute), excluding any initial false starts, are then extracted for data analysis and interpretation.




Figure 1.  Picture-description task

3.2 How to analyze speech samples

To assess accentedness and comprehensibility, scalar judgments are typically used. For accentedness and comprehensibility ratings, raters are presented with randomized samples of recorded learner speech and are asked to evaluate them on a Likert-type scale with only the endpoints defined. For example, from Munro and Derwing (1995a):

Accentedness rating: 1 (no foreign accent) 2 3 4 5 6 7 8 9 (very strong foreign accent)

Comprehensibility rating: 1 (extremely easy to understand) 2 3 4 5 6 7 8 9 (impossible to understand)
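A minimal sketch of how such endpoint-labeled 9-point ratings might be collected is given below (Python). Everything here is illustrative: the sample file names are placeholders, audio playback is stubbed out, and a real study would typically use rating software or a web platform (see Nagle & Rehman, 2021).

```python
# Sketch: present speech samples in a fresh random order for each rater and
# collect 9-point accentedness ratings (endpoints defined only).
import csv
import random

samples = ["s01.wav", "s02.wav", "s03.wav"]  # pre/post recordings, mixed

def collect_ratings(rater_id: str) -> list[dict]:
    order = samples[:]
    random.shuffle(order)  # a new order per rater guards against order effects
    rows = []
    for wav in order:
        # play(wav)  # hook in your audio playback of choice here
        r = int(input(f"{wav}: accent rating, 1 = no foreign accent ... "
                      f"9 = very strong foreign accent: "))
        rows.append({"rater": rater_id, "sample": wav, "rating": r})
    return rows

with open("ratings.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["rater", "sample", "rating"])
    writer.writeheader()
    writer.writerows(collect_ratings("rater01"))
```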

If these particular distinctions are made, it is crucial that the understanding of these constructs aligns with the research program established over many years by Munro (2018), with its origins in Munro and Derwing (1995a). As Thomson (2018) points out, there is a history of frequent misapplication, which makes comparison of results across studies difficult or even impossible.


In terms of the judgment scales, while there is no consensus on what length is best, 9-point scales are simple to implement and are usually effective in identifying important patterns. Isaacs and Thomson (2013) argue in favor of 9-point scales over a shorter 5-point scale, although even the 5-point scale is quite reliable. While individual raters report a variety of factors that most influence their ratings, mean ratings across experts (experienced language teachers or researchers) and nonexperts (educated individuals with no background in language teaching or research) tend to be consistent, and when the number of raters exceeds 20, interrater reliability is usually quite high.

To evaluate intelligibility, a much wider variety of approaches has been used (see Thomson, 2018), depending on the level of analysis (i.e., individual sounds or complete sentences). If individual sounds are being assessed, a forced-choice identification task is used: raters are presented with randomized recordings of L2 speech production and asked to indicate what target sounds they hear with reference to phonetic symbols or to keywords including the target sounds. Forced-choice identification tasks are usually done via computers but can be administered as a pencil-and-paper task. It is critical that the recorded productions are presented in a random sequence, so as to avoid potential bias effects if raters know which recordings are from before or after training. Including samples collected from native speakers of the target language is also recommended to confirm the reliability of ratings (i.e., those samples would be expected to receive high rating scores). For sentence-level intelligibility, the most common approach is a transcription task: the percentage of words (or sounds) that are correctly transcribed by raters is a speaker's intelligibility score. When intelligibility is assessed at the sentence level, contextual cues often make otherwise unintelligible productions intelligible. For example, if a learner says, "I like to lun," the rater may transcribe it as "I like to run" based on the semantic context. This broad operationalization of intelligibility reflects listener understanding in the real world.

To ensure that there is no effect of order of presentation on listener judgments, several precautions are advised. First, listeners are shown the same picture prompts that were used to elicit speech from the recorded speakers; this means that they will be familiar with the content of the speech regardless of which speaker they hear first. Second, listeners are given several practice items so that they can familiarize themselves with the task and ask any questions of the researchers. Third, speech samples should be randomized across listeners (or at least across several groups of listeners) so that any possible order effects can be detected (i.e., early items being rated more harshly or leniently than later items).
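As an illustration of the transcription-based scoring just described, the sketch below computes a sentence-level intelligibility score as the percentage of target words a rater transcribed correctly. It is deliberately naive (whitespace tokenization, position-by-position matching); published studies define word matching more carefully.

```python
# Sketch: sentence-level intelligibility as the percentage of words a rater
# transcribed correctly, matching word-by-word in order.
def intelligibility_score(target: str, transcription: str) -> float:
    target_words = target.lower().split()
    heard_words = transcription.lower().split()
    correct = sum(t == h for t, h in zip(target_words, heard_words))
    return 100 * correct / len(target_words)

# If "I like to lun" is heard and transcribed as "I like to run", the
# semantic context restored the intended word, so the production scores
# as fully intelligible under this broad operationalization.
print(intelligibility_score("i like to run", "i like to run"))  # 100.0
print(intelligibility_score("i like to run", "i like to learn"))  # 75.0
```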




Once accentedness and comprehensibility ratings and intelligibility scores have been collected, they can be separated by data collection point to measure change over time, or to compare across different learner groups. Correlations between each construct, or with predictor variables (e.g., functional load scores, particular error types, learners' individual differences, etc.), can also be used to identify areas for further focus. Various statistical models (e.g., ANOVA, MANOVA, correlation, regression, mixed-effects models, and discriminant analysis) are then employed to analyze and interpret the quantitative data. The following study exemplifies data elicitation and analysis in L2 pronunciation research.

Exemplar study: Derwing & Munro (2013)

Main research question: To what extent do adult immigrant learners of English (Mandarin and Slavic language speakers) improve their speech comprehensibility, fluency, and accentedness over 7 years?

Methods: This study employed a picture-description task, during which participants produced narratives based on an 8-frame cartoon story about two people mixing up suitcases. Each participant's narrative was recorded at three points in time: 2 months, 2 years, and 7 years after enrollment in the study. The narratives were assessed by monolingual native speakers of English and by highly proficient L2 speakers of English, focusing on comprehensibility (from 1 = easy to understand to 9 = extremely difficult to understand), fluency (from 1 = extremely fluent to 9 = extremely dysfluent), and accentedness (from 1 = no accent to 9 = extremely strong accent). Individual information, such as years of English studied in the country of origin, age on arrival in Canada, and frequency of conversations in English, was also collected.

Findings: The Mandarin language speakers showed no change over time on any of the three speech constructs, whereas the Slavic language speakers showed improvement in comprehensibility and fluency. Improvement in accentedness was limited. The interaction among L1 background, age of arrival, the depth and breadth of the learners' conversations in English, and willingness to communicate contributed to these outcomes. In addition, there were strong similarities between the ratings of the native speakers and those of the L2 speakers.

Take-aways: This study exemplifies L2 pronunciation research in terms of data elicitation (picture-description task) and data analysis (comprehensibility, fluency, and accentedness ratings). In particular, the finding that native speakers and highly proficient L2 speakers gave similar ratings of the speech data carries a significant methodological implication for L2 pronunciation research: raters need not be restricted to one group or the other.
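Of the statistical models listed above, mixed-effects models are increasingly common because ratings are nested within raters and speakers. A minimal sketch with the statsmodels Python package follows; the CSV file and column names are hypothetical, and the model includes only random intercepts for raters, so fully crossed rater-by-speaker structures would call for a more flexible tool (e.g., lme4 in R).

```python
# Sketch: a linear mixed-effects model of comprehensibility ratings over
# time, with random intercepts for raters.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("ratings_long.csv")  # columns: rater, speaker, time, rating

model = smf.mixedlm("rating ~ time", df, groups=df["rater"])
result = model.fit()
print(result.summary())  # the `time` coefficient estimates change over time
```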


4. Advice to future pronunciation researchers

In this section, we invite our colleagues and future researchers to consider some key issues in conducting instructed L2 pronunciation research.

First, early research on L2 pronunciation was conducted primarily in laboratory settings in which L2 learners received phonetic training in a decontextualized, controlled setting. Although many applied linguists have conducted various types of instructed L2 pronunciation research in intact L2 classrooms, more classroom-based studies are needed for the sake of ecological validity. In contrast to laboratory-based studies, classroom-based studies pose a number of methodological challenges, the most significant being the design and implementation of research materials involving both researchers and practitioners. In most cases, researchers design treatment and measurement tasks for pretest-posttest designs and then ask practitioners to implement them in their classrooms. Classroom-based studies often include a teacher training session to ensure that the treatment designed by researchers is operationalized as intended. However, there have been multiple cases (e.g., Spada, 2005) in which practitioners were not fully aware of the nature of the research studies and their operationalization, which ended up affecting the validity of the studies. Conversely, researchers sometimes design theoretically sound yet pedagogically poor materials, which potentially limit the pedagogical implications of the studies. Hence, classroom-based studies require a great deal of collaboration between researchers and practitioners, which is often missing in the field of instructed L2 acquisition research in general. Based on our research experience, it is critical for researchers to involve practitioners (specifically, language teachers) at the initial stages of their studies, so that both parties can mutually benefit from their respective expertise in designing and operationalizing classroom-based studies. It is also important to encourage practitioners to engage in data analysis and dissemination, including co-presentations at academic and professional venues such as the meetings of Pronunciation in Second Language Learning and Teaching, English Pronunciation: Issues and Practices, and Accents. Submitting co-authored articles to academic and professional journals such as the Journal of Second Language Pronunciation is also recommended. Overall, such engagement would give practitioners a strong sense of contribution and ownership in the research, in addition to compelling them to recognize their own value, all of which maximizes the synergistic effects for both researchers and practitioners. Most importantly, meaningful collaboration would also result in evidence-based L2 pronunciation teaching, which in turn makes L2 pronunciation learning more efficient for learners. Chapter 4 of this book (Mixed methods research in ISLA) also introduces a useful methodological framework for practice-based research that promotes dialogue between researchers and practitioners in instructed L2 research.




Second, notwithstanding the benefits of both human-delivered instruction and computer-assisted language instruction in L2 pronunciation teaching and learning, Lee et al. (2015) found that studies employing computer-assisted instruction tend to produce smaller effects than those employing human-delivered instruction. Given the strengths of each approach, L2 learners are likely to benefit the most from a combination of both; yet, to our knowledge, no studies have investigated this question in detail. In addition, it would be beneficial to prioritize linguistic targets that significantly affect speech intelligibility and comprehensibility. One place to start is with more empirical studies on the contribution of the functional load of individual sounds: while functional load has been frequently mentioned in the literature, only two small-scale empirical studies exist (Kang & Moran, 2014; Munro & Derwing, 2006). It is also possible to investigate the relative impact of particular types of vowel and consonant errors (e.g., epenthesis, substitution, deletion, etc.) and the relative impact of suprasegmental errors (see Bergeron & Trofimovich, 2017; Isaacs & Thomson, 2020).

Third, it is evident that various types of pronunciation instruction help learners acquire diverse linguistic targets in L2 pronunciation (see Lee et al., 2015; Thomson & Derwing, 2015). One noteworthy question arising from previous studies is whether and how L2 pronunciation instruction and phonological knowledge affect the acquisition of other linguistic domains. For instance, hypothesizing that L2 phonological knowledge is a sine qua non for acquiring L2 grammatical targets (e.g., French determiners), Lee (2018) explored the sources that prevented L2 learners from producing the grammatical targets correctly; that is, whether the deficiency was due to a lack of L2 grammatical knowledge, phonological knowledge, or both. Implementing several experimental conditions, the study highlighted the importance of L2 phonological knowledge, and of L2 pronunciation instruction, in the acquisition of French grammar, as well as their interdependence with the lexical and morphological domains. Martin and Jackson (2016) also found that pronunciation training facilitated the acquisition of L2 grammatical structures, focusing on German separable- and inseparable-prefix verbs. We believe that this line of research is important for expanding horizons with regard to the roles attributed to L2 pronunciation instruction in L2 acquisition overall. Loewen (2019) agrees that L2 research needs to address the relationship between pronunciation and other linguistic domains. In this regard, we hope for future studies that can contribute to our understanding of the roles of phonological knowledge and pronunciation instruction in the acquisition of not only pronunciation but also other linguistic domains.

Finally, we stress that L2 pronunciation researchers must maintain methodological rigor in their work and make their research materials and data available for open access, for example via the Open Science Framework or the IRIS Repository. We also hope to vitalize a Research Methods strand at pronunciation-focused conferences to encourage the development of field-specific research tools and statistical models, as well as to further promote research methods and ethics standards in instructed L2 pronunciation research.


The following section discusses potential complications with the recommended approaches in instructed L2 pronunciation research, as well as how to address them.

5. Troubleshooting ISLA pronunciation research

Common question 1: How do I select the most appropriate task for my study?

Though effective and widely used, the tasks outlined in this chapter are not without drawbacks. Reading-aloud tasks, while easy to implement (and having thus dominated L2 pronunciation research), do not reflect what learners can do in the real world. Relative to reading-aloud tasks, monologic and picture-description tasks reflect more spontaneous language use (see Thomson & Derwing, 2015). At the same time, while monologic speech is more spontaneous, it is difficult to control its content beyond a general topic. Fortunately, extemporaneous picture-description tasks provide a middle ground and afford greater control over content through the pictures used. We caution that not all picture-description tasks are the same and that learners may have more or less difficulty with particular sequences of pictures, depending on their content and complexity. Rossiter et al. (2008) describe this in detail and offer helpful guidelines for selecting picture-description tasks. Finally, reusing established tasks across studies, as has been the case with Derwing et al.'s (2004) 'Suitcase story', allows for comparison across different populations. In addition to monologic tasks, dialogic tasks have also been used and are arguably more reflective of the dynamic nature of most listening in the real world (see Crowther, 2020; Trofimovich et al., 2020). However, while dialogic speaking tasks may be commonplace in the real world, an observer's judgment of how comprehensible they are does not reflect how comprehensible each participant is to the opposing interlocutor during real-time dialogue: to the observer-rater, comprehensibility is the sum of the contributions of both participants in a dialogue.

Common question 2: What are ways to encourage robust data collection and analysis?

The collection of speech for pronunciation assessment is labor-intensive. It requires individually recording each learner, editing sound files, and later randomizing them for presentation to raters. Because the data collected typically include both pretests and posttests, there is always the risk of attrition. To maintain motivation, participants are often given an honorarium for their participation. Another way to reduce attrition is to collect data during normal class time, rather than outside of class.




This requires buy-in from instructors, and appropriate ethics approval must be obtained to ensure that participants are adequately informed prior to consent and given an opportunity to withdraw without penalty. In our research experience, learners tend to be very happy to be pulled out of class to participate, since it gives them an opportunity to interact with a new interlocutor. The use of picture-description tasks takes very little time (usually less than 5 minutes) and can often be combined with other speaking tasks if researchers desire.

Collecting ratings from judges also typically requires offering a small honorarium. The fact that both experts and nonexperts tend to provide reliable ratings makes it easier to recruit raters. While many studies have conducted rating sessions in groups using a pencil-and-paper task, with the advent of cloud-based systems it is easier to obtain data at a place and time that accommodate the raters. The drawback here is that raters need to be given strict instructions to avoid distractions, which the researchers cannot directly observe (Nagle & Rehman, 2021). Most rating studies of L2 pronunciation use Cronbach's alpha to report interrater reliability. This statistic does not mean that raters agree on a specific speaker's absolute value on a rating scale, but that they consistently rank particular speakers higher and other speakers lower. The overall mean rating scores for particular speakers are not particularly important, other than to measure improvement over time or to determine who is more or less comprehensible and accented relative to other speakers within the sample. As noted earlier, with 20 or more raters, reliability is very high (typically above .90). While it is possible to obtain reasonable reliability scores with fewer than 20 raters, reliability will likely decrease with less skilled raters. Thus, having 20 or more raters is recommended for valid data analysis. It is important to note that, apart from a very brief familiarization phase prior to the rating tasks, the rating paradigm promoted here does not require that raters be formally trained. Rather, ratings simply reflect their immediate reaction to the stimuli and do not require any detailed analysis of the speech in the way that formal assessment does.

Previous suprasegmental-focused studies (e.g., Hardison, 2004) have shown that it is extremely difficult for raters to disregard segmental quality even when explicitly asked to pay attention to and assess suprasegmental quality (i.e., prosody); moreover, segmental quality often influences raters' assessments of suprasegmental quality. To avoid such effects, employing a digital filter (low pass, Blackman, 100th order), which renders the segmental content of speech unintelligible while retaining prosodic information, is worth considering when analyzing suprasegmental data. Duan (2017) offers more information on the use of this digital filter.
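Both of the computational steps just mentioned are easy to sketch in Python. First, the low-pass filtering for suprasegmental rating: the snippet below builds a 100th-order FIR filter with a Blackman window, matching the specification above. The 400 Hz cutoff and the file names are our assumptions (the cutoff simply needs to sit above the speakers' F0 range); consult Duan (2017) for the parameters actually used.

```python
# Sketch: low-pass filter a (mono) speech file so segmental content becomes
# unintelligible while prosody (F0, rhythm) survives.
import soundfile as sf
from scipy.signal import firwin, filtfilt

data, sr = sf.read("speaker07_excerpt.wav")
taps = firwin(101, cutoff=400, fs=sr, window="blackman")  # order 100 = 101 taps
filtered = filtfilt(taps, [1.0], data)  # applied forward and backward (zero phase)
sf.write("speaker07_prosody_only.wav", filtered, sr)
```

Second, Cronbach's alpha for interrater consistency can be computed directly from its definition, treating raters as 'items' and speakers as 'cases'; the simulated ratings below are purely illustrative.

```python
# Sketch: Cronbach's alpha over a speakers x raters matrix of ratings.
import numpy as np

def cronbach_alpha(ratings: np.ndarray) -> float:
    ratings = np.asarray(ratings, dtype=float)
    k = ratings.shape[1]                     # number of raters
    rater_vars = ratings.var(axis=0, ddof=1) # variance of each rater's scores
    total_var = ratings.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - rater_vars.sum() / total_var)

rng = np.random.default_rng(7)
base = rng.integers(1, 10, size=(30, 1))           # 30 speakers' "true" levels
noisy = base + rng.integers(-1, 2, size=(30, 25))  # 25 raters, small disagreements
print(round(cronbach_alpha(np.clip(noisy, 1, 9)), 2))  # high alpha expected
```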


Last but not least, ethical issues have been a growing discussion in the field of linguistics, particularly around speech (Müller & Ball, 2013). For instance, raters listen to speech samples collected from L2 learners in the course of data analysis. Though anonymity is a particularly important principle to uphold in research ethics, the confidentiality of participants' identity cannot be completely guaranteed owing to the recognizability of the human voice. Hence, it is highly important for researchers not only to state this issue in their consent forms but also to explain it clearly to their participants in order to prevent any ethical dilemmas.

6. Conclusions

The current chapter discussed the main constructs of pronunciation and the important role that pronunciation plays in L2 acquisition, as well as justifications and methods for research in the domain of instructed L2 pronunciation. After introducing segmentals, suprasegmentals, speech perception, and speech production, the chapter presented several instructional techniques commonly studied in the domain, such as explicit phonetic instruction, practice opportunities, and corrective feedback. It then introduced various speech instruments in addition to data analysis methods employed in instructed L2 pronunciation research. Along with future directions, troubleshooting strategies in task design, data collection, and analysis were also offered. In the end, we underscored the need for continued methodological rigor to move the field forward. Also emphasized was researcher-practitioner collaboration to increase the ecological validity and impact of instructed L2 pronunciation research, not only on the academic and professional communities of L2 teaching and learning but also on the success of L2 learners at large.

7. Further readings and additional resources

7.1 Suggested readings

Derwing, T. M., & Munro, M. J. (2015). Pronunciation fundamentals: Evidence-based perspectives for L2 teaching and research. John Benjamins. https://doi.org/10.1075/lllt.42
Derwing, T. M., Munro, M. J., & Thomson, R. I. (2022). The Routledge handbook of second language acquisition and speaking. Routledge. https://doi.org/10.4324/9781003022497
Kang, O., Thomson, R. I., & Murphy, J. M. (2018). The Routledge handbook of contemporary English pronunciation. Routledge.
Levis, J. M. (2018). Intelligibility, oral communication, and the teaching of pronunciation. Cambridge University Press. https://doi.org/10.1017/9781108241564
Levis, J. M., Derwing, T. M., & Munro, M. J. (2022). The evolution of pronunciation teaching and research. John Benjamins. https://doi.org/10.1075/bct.121
Levis, J. M., Derwing, T. M., & Sonsaat-Hegelheimer, S. (2022). Second language pronunciation: Bridging the gap between research and teaching. Wiley-Blackwell.
Munro, M. J. (2021). Applying phonetics: Speech science in everyday life. Wiley-Blackwell.




7.2 Suggested journal and conferences

Journal: Journal of Second Language Pronunciation. https://benjamins.com/catalog/jslp
Conferences: Pronunciation in Second Language Learning and Teaching, Accents, New Sounds, English Pronunciation: Issues & Practices

References

Abercrombie, D. (1963). Problems and principles in language study (2nd ed.). Longman.
Abrahamsson, N., & Hyltenstam, K. (2009). Age of onset and nativelikeness in a second language: Listener perception versus linguistic scrutiny. Language Learning, 59(2), 249–306. https://doi.org/10.1111/j.1467-9922.2009.00507.x
Bent, T., Bradlow, A. R., & Smith, B. L. (2007). Phonemic errors in different word positions and their effects on intelligibility of non-native speech: All's well that begins well. In O.-S. Bohn & M. J. Munro (Eds.), Language experience in second language speech learning: In honor of James Emil Flege (pp. 331–347). John Benjamins. https://doi.org/10.1075/lllt.17.28ben
Bergeron, A., & Trofimovich, P. (2017). Linguistic dimensions of accentedness and comprehensibility: Exploring task and listener effects in second language French. Foreign Language Annals, 50(3), 547–566. https://doi.org/10.1111/flan.12285
Best, C. T. (1995). A direct realist view of cross-language speech perception. In W. Strange (Ed.), Speech perception and linguistic experience: Issues in cross-language research (pp. 171–204). York Press.
Birdsong, D. (2018). Plasticity, variability and age in second language acquisition and bilingualism. Frontiers in Psychology, 9, 1–17. https://doi.org/10.3389/fpsyg.2018.00081
Bradlow, A. R., Pisoni, D. B., Akahane-Yamada, R., & Tohkura, Y. (1997). Training Japanese listeners to identify English /r/ and /l/: IV. Some effects of perceptual learning on speech production. Journal of the Acoustical Society of America, 101(4), 2299–2310. https://doi.org/10.1121/1.418276
Burri, M., Baker, A., & Chen, H. (2017). "I feel like having a nervous breakdown:" Pre-service and in-service teachers' developing beliefs and knowledge about pronunciation instruction. Journal of Second Language Pronunciation, 3(1), 109–135. https://doi.org/10.1075/jslp.3.1.05bur
Catford, J. C. (1987). Phonetics and the teaching of pronunciation: A systemic description of English phonology. In J. Morley (Ed.), Current perspectives on pronunciation: Practices anchored in theory (pp. 87–100). TESOL.
Coetzee-Van Rooy, S. (2014). Explaining the ordinary magic of stable African multilingualism in the Vaal Triangle region in South Africa. Journal of Multilingual and Multicultural Development, 35(2), 121–138. https://doi.org/10.1080/01434632.2013.818678
Couper, G. (2016). Teacher cognition of pronunciation teaching amongst English language teachers in Uruguay. Journal of Second Language Pronunciation, 2(1), 29–55. https://doi.org/10.1075/jslp.2.1.02cou
Crowther, D. (2020). Rating L2 speaker comprehensibility on monologic vs. interactive tasks: What is the effect of speaking task type? Journal of Second Language Pronunciation, 6(1), 96–121. https://doi.org/10.1075/jslp.19019.cro


Cutler, A., Dahan, D., & Van Donselaar, W. (1997). Prosody in the comprehension of spoken language: A literature review. Language and Speech, 40(2), 141–201. https://doi.org/10.1177/002383099704000203
de Jong, K., Hao, Y.-C., & Park, H. (2009). Evidence for featural units in the acquisition of speech production skills: Linguistic structure in foreign accent. Journal of Phonetics, 37(4), 357–373. https://doi.org/10.1016/j.wocn.2009.06.001
DeKeyser, R. M. (1998). Beyond focus on form: Cognitive perspectives on learning and practicing second language grammar. In C. J. Doughty & J. Williams (Eds.), Focus on form in classroom second language acquisition (pp. 42–63). Cambridge University Press.
DeKeyser, R. M. (2001). Automaticity and automatization. In P. Robinson (Ed.), Cognition and second language instruction (pp. 125–151). Cambridge University Press. https://doi.org/10.1017/CBO9781139524780.007
Derwing, T. M., & Munro, M. J. (2005). Second language accent and pronunciation teaching: A research-based approach. TESOL Quarterly, 39(3), 379–397. https://doi.org/10.2307/3588486
Derwing, T. M., & Munro, M. J. (2009). Putting accent in its place: Rethinking obstacles to communication. Language Teaching, 42(4), 476–490. https://doi.org/10.1017/S026144480800551X
Derwing, T. M., & Munro, M. J. (2013). The development of L2 oral language skills in two L1 groups: A 7-year study. Language Learning, 63(2), 163–185. https://doi.org/10.1111/lang.12000
Derwing, T. M., Munro, M. J., & Wiebe, G. (1998). Evidence in favor of a broad framework for pronunciation instruction. Language Learning, 48(3), 393–410. https://doi.org/10.1111/0023-8333.00047
Derwing, T. M., Rossiter, M. J., Munro, M. J., & Thomson, R. I. (2004). Second language fluency: Judgments on different tasks. Language Learning, 54(4), 655–679. https://doi.org/10.1111/j.1467-9922.2004.00282.x
Duan, W. (2017). Teaching French pronunciation to Chinese adult learners in communicative language classrooms: Examining the effectiveness of explicit phonetic instruction [Unpublished master's thesis]. McGill University.
Ennser-Kananen, J., Halonen, M., & Saarinen, T. (2021). "Come join us and lose your accent!:" Accent modification courses as hierarchization of international students. Journal of International Students, 11(2), 322–340. https://doi.org/10.32674/jis.v11i2.1640
Flege, J. E. (1995). Second-language speech learning: Theory, findings, and problems. In W. Strange (Ed.), Speech perception and linguistic experience: Issues in cross-language research (pp. 233–277). York Press.
Flege, J. E., & Bohn, O.-S. (2021). The revised speech learning model (SLM-r). In R. Wayland (Ed.), Second language speech learning: Theoretical and empirical progress (pp. 3–83). Cambridge University Press. https://doi.org/10.1017/9781108886901.002
Flege, J. E., Frieda, E. M., & Nozawa, T. (1997). Amount of native-language (L1) use affects the pronunciation of an L2. Journal of Phonetics, 25(2), 169–186. https://doi.org/10.1006/jpho.1996.0040
Fowler, C. A. (1986). An event approach to the study of speech perception from a direct-realist perspective. Journal of Phonetics, 14(1), 3–28. https://doi.org/10.1016/S0095-4470(19)30607-2
Gooch, R., Saito, K., & Lyster, R. (2016). Effects of recasts and prompts on L2 pronunciation development: Teaching English /ɹ/ to Korean adult EFL learners. System, 60, 117–127. https://doi.org/10.1016/j.system.2016.06.007
Gordon, J., & Darcy, I. (2012, March). The development of comprehensible speech in L2 learners: Effects of explicit pronunciation instruction on segmentals and suprasegmentals. Paper presented at the 2012 American Association for Applied Linguistics Conference, Boston, MA.



Chapter 10. Pronunciation 253

Hahn, L. D. (2004). Primary stress and intelligibility: Research to motivate the teaching of suprasegmentals. TESOL Quarterly, 38(2), 201–223. https://doi.org/10.2307/3588378
Hardison, D. M. (2003). Acquisition of second-language speech: Effects of visual cues, context, and talker variability. Applied Psycholinguistics, 24(4), 495–522. https://doi.org/10.1017/S0142716403000250
Hardison, D. M. (2004). Generalization of computer-assisted prosody training: Quantitative and qualitative findings. Language Learning & Technology, 8(1), 34–52. https://doi.org/10125/25228
Huensch, A. (2019). Pronunciation in foreign language classrooms: Instructors' training, classroom practices, and beliefs. Language Teaching Research, 23(6), 745–764. https://doi.org/10.1177/1362168818767182
Huensch, A., & Tremblay, A. (2015). Effects of perceptual phonetic training on the perception and production of second language syllable structure. Journal of Phonetics, 52, 105–120. https://doi.org/10.1016/j.wocn.2015.06.007
Imai, S., Walley, A. C., & Flege, J. E. (2005). Lexical frequency and neighborhood density effects on the recognition of native and Spanish-accented words by native English and Spanish listeners. The Journal of the Acoustical Society of America, 117(2), 896–907. https://doi.org/10.1121/1.1823291
Inceoglu, S. (2016). Effects of perceptual training on second language vowel perception and production. Applied Psycholinguistics, 37(5), 1175–1199. https://doi.org/10.1017/S0142716415000533
Isaacs, T., & Thomson, R. I. (2013). Rater experience, rating scale length, and judgments of L2 pronunciation: Revisiting research conventions. Language Assessment Quarterly, 10(2), 135–159. https://doi.org/10.1080/15434303.2013.769545
Isaacs, T., & Thomson, R. I. (2020). Reactions to second language speech: Influences of discrete speech characteristics, rater experience, and speaker first language background. Journal of Second Language Pronunciation, 6(3), 402–429. https://doi.org/10.1075/jslp.20018.isa
Kang, O., & Moran, M. (2014). Functional loads of pronunciation features in nonnative speakers' oral assessment. TESOL Quarterly, 48(1), 176–187. https://doi.org/10.1002/tesq.152
Kissling, E. M. (2013). Teaching pronunciation: Is explicit phonetics instruction beneficial for FL learners? Modern Language Journal, 97(3), 720–744. https://doi.org/10.1111/j.1540-4781.2013.12029.x
Kogan, I., Dollmann, J., & Weißmann, M. (2021). In the ear of the listener: The role of foreign accent in interethnic friendships and partnerships. International Migration Review, 55(3), 746–784. https://doi.org/10.1177/0197918320988835
Kuhl, P. K. (2000). A new view of language acquisition. Proceedings of the National Academy of Sciences of the United States of America, 97(22), 11850–11857. https://doi.org/10.1073/pnas.97.22.11850
Kuhl, P. K. (2004). Early language acquisition: Cracking the speech code. Nature Reviews Neuroscience, 5(11), 831–843. https://doi.org/10.1038/nrn1533
Laing, C., & Bergelson, E. (2020). From babble to words: Infants' early productions match words and objects in their environment. Cognitive Psychology, 122, 1–15. https://doi.org/10.1016/j.cogpsych.2020.101308
Lee, A. H. (2018). The effects of different instructional and cognitive variables on the acquisition of grammatical gender by second language learners of French [Unpublished doctoral dissertation]. McGill University.
Lee, A. H., & Lyster, R. (2016a). The effects of corrective feedback on instructed L2 speech perception. Studies in Second Language Acquisition, 38(1), 35–64. https://doi.org/10.1017/S0272263115000194


Lee, A. H., & Lyster, R. (2016b). Effects of different types of corrective feedback on receptive skills in a second language: A speech perception training study. Language Learning, 66(4), 809–833. https://doi.org/10.1111/lang.12167
Lee, A. H., & Lyster, R. (2017). Can corrective feedback on second language speech perception errors affect production accuracy? Applied Psycholinguistics, 38(2), 371–393. https://doi.org/10.1017/S0142716416000254
Lee, B., Plonsky, L., & Saito, K. (2020). The effects of perception- vs. production-based pronunciation instruction. System, 88, 1–13. https://doi.org/10.1016/j.system.2019.102185
Lee, J., Jang, J., & Plonsky, L. (2015). The effectiveness of second language pronunciation instruction: A meta-analysis. Applied Linguistics, 36(3), 345–366. https://doi.org/10.1093/applin/amu040
Levis, J. M. (2005). Changing contexts and shifting paradigms in pronunciation teaching. TESOL Quarterly, 39(3), 369–377. https://doi.org/10.2307/3588485
Lippi-Green, R. (2012). English with an accent: Language, ideology, and discrimination in the United States (2nd ed.). Routledge. https://doi.org/10.4324/9780203348802
Loewen, S. (2019, September). Instructed second language acquisition and pronunciation. Paper presented at the 11th Pronunciation in Second Language Learning and Teaching Conference, Flagstaff, AZ.
Lyster, R. (2001). Negotiation of form, recasts, and explicit correction in relation to error types and learner repair in immersion classrooms. Language Learning, 51(s1), 265–301. https://doi.org/10.1111/j.1467-1770.2001.tb00019.x
Lyster, R., & Ranta, L. (1997). Corrective feedback and learner uptake: Negotiation of form in communicative classrooms. Studies in Second Language Acquisition, 19(1), 37–66. https://doi.org/10.1017/S0272263197001034
Lyster, R., & Sato, M. (2013). Skill acquisition theory and the role of practice in L2 development. In P. García Mayo, M. Gutierrez-Mangado, & M. Martínez Adrián (Eds.), Contemporary approaches to second language acquisition (pp. 71–92). John Benjamins. https://doi.org/10.1075/aals.9.07ch4
Mackey, A., Gass, S., & McDonough, K. (2000). How do learners perceive interactional feedback? Studies in Second Language Acquisition, 22(4), 471–497. https://doi.org/10.1017/S0272263100004010
Martin, I. A., & Jackson, C. N. (2016). Pronunciation training facilitates the learning and retention of L2 grammatical structures. Foreign Language Annals, 49(4), 658–676. https://doi.org/10.1111/flan.12224
Müller, N., & Ball, M. J. (2013). Research methods in clinical linguistics and phonetics: A practical guide. John Wiley & Sons.
Munro, M. J. (2018). Dimensions of pronunciation. In O. Kang, R. I. Thomson, & J. M. Murphy (Eds.), The Routledge handbook of contemporary English pronunciation (pp. 413–431). Routledge.
Munro, M. J., & Derwing, T. M. (1995a). Foreign accent, comprehensibility, and intelligibility in the speech of second language learners. Language Learning, 45(1), 73–97. https://doi.org/10.1111/j.1467-1770.1995.tb00963.x
Munro, M. J., & Derwing, T. M. (1995b). Processing time, accent, and comprehensibility in the perception of native and foreign-accented speech. Language and Speech, 38(3), 289–306. https://doi.org/10.1177/002383099503800305
Munro, M. J., & Derwing, T. M. (2006). The functional load principle in ESL pronunciation instruction: An exploratory study. System, 34(4), 520–531. https://doi.org/10.1016/j.system.2006.09.004



Chapter 10. Pronunciation 255

Nagle, C. L. (2021). Revisiting perception-production relationships: Exploring a new approach to investigate perception as a time-varying predictor. Language Learning, 71(1), 243–279. https://doi.org/10.1111/lang.12431 Nagle, C. L., & Rehman, I. (2021). Doing L2 research online: Why and how to collect online ratings data. Studies in Second Language Acquisition, 43(4), 916–939. https://doi.org/10.1017/S0272263121000292 Peperkamp, S., & Bouchon, C. (2011). The relation between perception and production in L2 phonological processing. Proceedings of the 12th Annual Conference of the International Speech Communication Association, 1, 168–171.  https://doi.org/10.21437/Interspeech.2011-72 Prah, K. K. (2010). Multilingualism in urban Africa: Bane or blessing. Journal of Multicultural Discourses, 5(2), 169–182.  https://doi.org/10.1080/17447143.2010.491916 Pulvermüller, F., Huss, M., Kherif, F., del Prado Martin, F. M., Hauk, O., & Shtyrov, Y. (2006). Motor cortex maps articulatory features of speech sounds. Proceedings of the National Academy of Sciences of the United States of America, 103(20), 7865–7870. https://doi.org/10.1073/pnas.0509989103 Pulvermüller, F., & Schumann, J. H. (1994). Neurobiological mechanisms of language acquisi­ tion. Language Learning, 44(4), 681–734.  https://doi.org/10.1111/j.1467-1770.1994.tb00635.x Rossiter, M. J., Derwing, T. M., & Jones, V. M. L. O. (2008). Is a picture worth a thousand words? TESOL Quarterly, 42(2), 325–329.  https://doi.org/10.1002/j.1545-7249.2008.tb00127.x Sadat-Tehrani, N. (2017). Teaching English stress: A case study. TESOL Journal, 8(4), 943–968. https://doi.org/10.1002/tesj.332 Saito, K. (2011). Effects of form-focused instruction on L2 pronunciation development of /ɹ/ by Japanese learners of English [Unpublished doctoral dissertation]. McGill University. Saito, K. (2013a). The acquisitional value of recasts in instructed second language speech learn­ ing: Teaching the perception and production of English /ɹ/ to adult Japanese learners. Language Learning, 63(3), 499–529.  https://doi.org/10.1111/lang.12015 Saito, K. (2013b). Re-examining effects of form-focused instruction on L2 pronunciation devel­ opment: The role of explicit phonetic information. Studies in Second Language Acquisition, 35(1), 1–29.  https://doi.org/10.1017/S0272263112000666 Saito, K. (2014). Experienced teachers’ perspectives on priorities for improved intelligible pronunciation: The case of Japanese learners of English. International Journal of Applied Linguistics, 24(2), 250–277.  https://doi.org/10.1111/ijal.12026 Saito, K. (2021). What characterizes comprehensible and native-like pronunciation among English-as-a-second-language speakers? Meta-analyses of phonological, rater, and instruc­ tional factors. TESOL Quarterly, 55(3), 866–900.  https://doi.org/10.1002/tesq.3027 Saito, K., & Lyster, R. (2012). Effects of form-focused instruction and corrective feedback on L2 pronunciation development of /ɹ/ by Japanese learners of English. Language Learning, 62(2), 595–633.  https://doi.org/10.1111/j.1467-9922.2011.00639.x Saito, K., & Wu, X. (2014). Communicative focus on form and L2 suprasegmental learning: Teaching Cantonese learners to perceive Mandarin tones. Studies in Second Language Acquisition, 36(4), 647–680.  https://doi.org/10.1017/S0272263114000114 Sakai, M., & Moorman, C. (2018). Can perception training improve the production of second language phonemes? A meta-analytic review of 25 years of perception training research. Applied Psycholinguistics, 39(1), 187–224. 
 https://doi.org/10.1017/S0142716417000418 Scovel, T. (1988). A time to speak: A psycholinguistic inquiry into the critical period for human speech. Newbury House. Scovel, T. (2000). A critical review of the critical period research. Annual Review of Applied Linguistics, 20, 213–223.  https://doi.org/10.1017/S0267190500200135

256 Andrew H. Lee and Ron I. Thomson

Spada, N. (2005). Conditions and challenges in developing school-based SLA research programs. Modern Language Journal, 89(3), 328–338.  https://doi.org/10.1111/j.1540-4781.2005.00308.x Szpyra-Kozłowska, J. (2014). Pronunciation in EFL instruction. Multilingual Matters. https://doi.org/10.21832/9781783092628 Thomson, R. I. (2011). Computer assisted pronunciation training: Targeting second language vowel perception improves pronunciation. CALICO Journal, 28(3), 744–765. https://doi.org/10.11139/cj.28.3.744-765 Thomson, R. I. (2018). Measurement of accentedness, intelligibility, and comprehensibility. In O. Kang & A. Ginther (Eds.), Assessment in second language pronunciation (pp. 11–28). Routledge. Thomson, R. I. (2022). The relationship between L2 speech perception and production. In T. M. Derwing, M. J. Munro, & R. I. Thomson (Eds.), The Routledge handbook of second language acquisition and speaking (pp. 373–385). Routledge.  https://doi.org/10.4324/9781003022497-32 Thomson, R. I., & Derwing, T. M. (2015). The effectiveness of L2 pronunciation instruction: A narrative review. Applied Linguistics, 36(3), 326–344.  https://doi.org/10.1093/applin/amu076 Trofimovich, P., & Gatbonton, E. (2006). Repetition and focus on form in processing L2 Spanish words: Implications for pronunciation instruction. Modern Language Journal, 90(4), 519–535. https://doi.org/10.1111/j.1540-4781.2006.00464.x Trofimovich, P., Nagle, C. L., O’Brien, M. G., Kennedy, S., Reid, K. T., & Strachan, L. (2020). Second language comprehensibility as a dynamic construct. Journal of Second Language Pronunciation, 6(3), 430–457.  https://doi.org/10.1075/jslp.20003.tro Tulaja, L. (2020). Exploring acceptability: L1 judgements of L2 Danish learners’ errors. In O. Kang, S. Staples, K. Yaw, & K. Hirschi (Eds.), Proceedings of the 11th Pronunciation in Second Language Learning and Teaching Conference (pp. 197–206). Iowa State University. Van den Doel, R. (2006). How friendly are the natives? An evaluation of native-speaker judge­ ments of foreign-accented British and American English [Unpublished doctoral disserta­ tion]. Utrecht University. Wang, Y., Jongman, A., & Sereno, J. A. (2003). Acoustic and perceptual evaluation of Mandarin tone productions before and after perceptual training. Journal of the Acoustical Society of America, 113(2), 1033–1043.  https://doi.org/10.1121/1.1531176 Yates, K. (2003). Teaching linguistic mimicry to improve second language pronunciation [Unpublished master’s thesis]. University of North Texas. Zahro, S. K. (2019). Native and non-native listeners perceptual judgement of English accented­ ness, intelligibility, and acceptability of Indonesian speakers. Lingua Cultura, 13(1), 39–44. https://doi.org/10.21512/lc.v13i1.5362

Chapter 11

Listening
Exploring the underlying processes

Ruslan Suvorov

University of Western Ontario

Out of the four language skills, listening is generally deemed to be the least understood and most under-researched (Aryadoust, Kumaran et al., 2020), partly due to its complex and ephemeral nature. Given that neither the process nor the product of instructed L2 listening can be directly observed (Brown & Abeywickrama, 2019), understanding L2 listening development and learners' performance on listening tasks poses a number of challenges for language educators and researchers alike. Traditionally, listening has been measured indirectly by analyzing its products, such as responses to comprehension questions, whereas research on the processes underlying L2 listening has been lacking (Vandergrift, 2010). Aiming to encourage the use of process-oriented approaches to investigating L2 listening, this chapter summarizes key research trends that are of particular relevance for ISLA and discusses four main methods that can be utilized for data collection and analysis in process-oriented L2 listening studies. The chapter concludes with step-by-step guidelines for implementing eye tracking in L2 listening studies and some additional recommendations for researchers interested in embarking on this type of research.

Keywords: listening, technology, eye tracking, process-oriented approaches, methods

1. What is listening and why is it important in ISLA?

Defined as "the process of receiving, constructing meaning from, and responding to spoken and/or nonverbal messages" (ILA, 1995, p. 4), listening is covert and transient. Since it takes place in the listener's mind, listening cannot be directly observed, recorded, or measured (Brown & Abeywickrama, 2019), which poses considerable challenges both for researching and for teaching this skill. In 2002, Nunan referred to listening as "the Cinderella skill in second language learning" (p. 238), partly because it had traditionally been given secondary importance compared to other language skills. Almost two decades later, listening is still regarded by many researchers as the most under-investigated and the least understood skill (Aryadoust, Kumaran et al., 2020; Vandergrift & Goh, 2012).


Despite its arguably neglected status, listening plays a pivotal role in ISLA for several reasons. First, given that a primary focus of current ISLA research is to probe into L2 learners' processes underlying their interaction with the linguistic input and to understand how the systematic manipulation of these processes can facilitate L2 learning (Leow, 2019; Loewen, 2015), it is essential for ISLA researchers and practitioners to know how exactly L2 learners process the audio/visual stimuli that are oftentimes used as the main means of linguistic input in a language classroom (Brown, 2017). Second, with an estimated 66 percent of the total communication time in real-life contexts being dedicated to listening (Burgoon et al., 2010), listening appears to be more prevalent than other skills in L2 communication, a process deemed indispensable both for the development of communicative competence and for effective language acquisition (Littlewood, 2011). Finally, unraveling the processes and strategies that L2 learners use when engaging in listening can shed light on other aspects of L2 development, such as vocabulary and grammar, and inform the design of more effective instructional interventions aimed at facilitating L2 learning.

In light of the essential role that listening plays in language education, this chapter seeks to provide an overview of L2 listening research that is of relevance to ISLA and to highlight key research gaps that are in need of further investigation. Following this overview is a discussion of four main methods used for eliciting and analyzing data that represent evidence of L2 listening comprehension and development. In addition, the chapter offers step-by-step guidelines for implementing one of these methods – eye tracking – in the L2 listening research design. The chapter concludes with advice for scholars interested in conducting process-oriented L2 listening studies, followed by ideas for troubleshooting ISLA listening research.

2. What we know and what we need to know about listening research in ISLA

Listening is a complex, multidimensional cognitive construct that can be defined and operationalized in different ways. As posited by Flowerdew and Miller (2010), there are three main types of models that can explain the process of listening: bottom-up models, top-down models, and interactive models. Bottom-up models postulate that listening entails the ability first to process the smallest acoustic units (i.e., individual sounds) and then progressively to combine them into words, phrases, and sentences. In top-down models, listeners process the acoustic input primarily by relying on their prior knowledge rather than on the ability to construct a meaning from individual sounds. Finally, interactive models combine the bottom-up and top-down views to explain the process of listening comprehension.




In their cognitive model of listening comprehension, Vandergrift and Goh (2012) expanded the construct of listening by treating it not as a one-way process but as a two-way interaction that comprises both the processing of an acoustic signal and the formulation of utterances. A key component of this interactive model is metacognition, which enables listeners to plan, monitor, and control all the cognitive processes involved in listening comprehension and speech production. By elucidating the processes underlying the interaction between listening and speaking, Vandergrift and Goh's (2012) model emphasizes the importance of listening in communication. A similar model was proposed by Bejar et al. (2000), who espoused the idea that listening comprehension consists of two stages – namely, the listening stage and the response stage – and requires the use of situational knowledge, linguistic knowledge, and background knowledge.

What the existing models of listening comprehension suggest is that L2 listening in classroom settings can be affected by a multitude of variables or characteristics that can be grouped into four main categories (see also classifications by Bloomfield et al., 2010; Buck, 2001; and Rubin, 1994). The first category comprises listener characteristics such as first language (L1), overall proficiency in the target L2, L2 vocabulary knowledge, background knowledge, working memory capacity, use of metacognitive strategies, as well as affective factors such as motivation and anxiety. In addition, listening can be affected by speaker characteristics that include accent, speech rate, and the use of hesitations and pauses. Another group of characteristics affecting listening is related to the input, namely its length, lexical and syntactic complexity, content organization, and the use of visuals. The last category, task characteristics, includes types of tasks associated with the listening activity (e.g., multiple-choice questions, open-ended questions), response modality (e.g., written, oral, or visual), delivery format (i.e., paper-based or computer-based), time constraints, note-taking, and playback control.

This surfeit of factors capable of impacting listening comprehension suggests that research into L2 listening needs to be varied. Indeed, a look at the existing literature reveals a wide array of issues that have been explored in relation to this skill (see, for instance, Flowerdew & Miller, 2010). Among the existing studies, there are several key influential research trends that appear to be of particular relevance for ISLA: the use of multimedia in L2 listening, the use of listening strategies, interactive listening, and authenticity.

The first research trend is concerned with the use of multimedia, which can be defined as the presentation of two modalities: verbal modality (i.e., spoken or printed text) and visual or pictorial modality (e.g., video, images, illustrations, or animation; Mayer, 2005).


In the context of L2 listening, research on multimedia has traditionally examined four main topics: (a) the role and effect of visual cues on L2 listening comprehension, (b) the effect of multimedia-based instructional materials on the development of L2 listening skills, (c) the use of audio-visual texts in L2 listening assessments, and (d) the use of multimedia for scaffolding purposes (see also Suvorov, 2019). Studies investigating the first topic have revealed that visual cues, such as seeing the speaker's face and gestures, are perceived favorably by L2 learners and have a facilitative effect on listening comprehension (e.g., Batty, 2021; Dahl & Ludvigsen, 2014; Suvorov, 2018). Similarly, there is empirical evidence that multimedia materials integrated into listening instruction can promote the development of L2 listening skills (e.g., Becker & Sturm, 2017; Herron et al., 1995). The results of studies exploring the effect of audio-visual prompts on L2 listening test performance, however, are less conclusive. In particular, while some studies have found that the inclusion of visuals in L2 listening assessment instruments leads to higher scores (e.g., Wagner, 2010), other studies found no effect (e.g., Batty, 2015; Londe, 2009) or even a deleterious effect of visuals on test scores (e.g., Pusey & Lenz, 2014; Suvorov, 2009). Such inconclusive results are oftentimes attributed to the methodological differences among these studies, which include learner characteristics, test design characteristics, and test administration procedures (Suvorov & He, 2022). The last group of studies within this research trend has examined the use of multimedia in the form of captions, subtitles, transcripts, and annotations, with the results generally favoring the use of multimedia help options, which have been found to facilitate both the process of listening comprehension and vocabulary learning (Mohsen, 2016; Montero Perez et al., 2013). Hayati and Mohmedi (2011), for instance, found a positive effect of using subtitles in the target language on L2 listening comprehension. Similarly, in Çakmak and Erçetin's study (2018), L2 learners' vocabulary recognition was facilitated by multimedia glosses used in a mobile-assisted listening task. Overall, research evidence suggests that the extent to which multimedia can facilitate L2 listening comprehension and the development of listening skills, affect performance on L2 listening tests, and provide scaffolding depends both on listening task characteristics and on learners' individual differences such as prior knowledge, working memory, and strategy use (Vandergrift & Goh, 2012).

The second trend in L2 listening research that is of relevance for ISLA is the use of listening strategies, which can be defined as "[t]echniques or plans that contribute directly to the comprehension and recall of listening input" (Rost, 2011, p. 330) and comprise both cognitive strategies (e.g., inferencing and elaboration) and metacognitive strategies such as planning, monitoring, evaluating, and problem-solving (Cross & Vandergrift, 2018; Graham, 2017). There is a wide body of research on this topic, ranging from studies exploring what types of listening strategies are used by L2 learners and how those strategies affect their listening comprehension (e.g., Fung & Macaro, 2021; Graham et al., 2010, 2011) to studies examining the effect of teaching listening strategies on the development of L2 listening skills (e.g., Vandergrift & Tafaghodtari, 2010; Yeldham & Gao, 2020).




While Berne's (2004) review of listening comprehension strategies revealed differences in strategy use between less- and more-proficient L2 learners, the author was unable to draw any specific conclusions because of the descriptive nature of the studies and the lack of standardized measures of L2 listening proficiency. In a similar vein, other researchers (e.g., Field, 2019; Macaro et al., 2007) have also warned against drawing generalizable conclusions in this area because of the "fragmented nature" (Field, 2019, p. 296) of the studies, which used different research designs and applied different criteria to categorize strategies. Nonetheless, empirical evidence suggests that it is difficult for L2 learners to develop effective listening strategies on their own and that, therefore, explicit instruction focusing on the development of bottom-up strategies (e.g., attending to discourse markers and intonation patterns) and top-down strategies (e.g., using contextual cues to determine the meaning of the spoken input) is indispensable (Graham, 2017; Graham et al., 2011). Such instruction can take the form of strategy-based instruction, which aims at teaching specific cognitive strategies to L2 learners to help them improve their listening comprehension, or metacognitive instruction, which teaches L2 learners how to select and use strategies based on the requirements of a listening task (Vandergrift & Goh, 2012; Yeldham & Gao, 2020). A recent study by Yeldham and Gao (2020) demonstrated that L2 learners appear to benefit most from listening instruction methods that match their individual cognitive styles.

Another influential research trend relevant to ISLA comprises studies that investigate listening as part of interaction. Informed by theoretical views of listening as a key element of communicative competence (e.g., the cognitive model of listening comprehension, see Vandergrift & Goh, 2012), this line of research examines the development of listening skills in conjunction with speaking, underscoring the importance of the communicative approach to language teaching (Aryadoust, Kumaran et al., 2020). This research trend has two main foci: the integration of listening with speaking (e.g., Rukthong, 2021; Rukthong & Brunfaut, 2020) and authenticity (e.g., Emerick, 2019; Gilmore, 2007, 2011; Weyers, 1999). Also known as bi-directional listening (Vandergrift, 2007) or interactive listening (Huang, 2020), this type of listening plays a key role in interaction as it enables L2 learners to negotiate and co-construct meaning. While tasks that integrate listening and speaking do not appear to be widely incorporated in the typical language classroom, which is "known for its impoverished semi-artificial setting" (Leow, 2019, p. 485), the use of such tasks in language assessment contexts is becoming increasingly common (Brunfaut & Rukthong, 2018), as evidenced by the burgeoning body of research in this area (e.g., Rukthong, 2021; Rukthong & Brunfaut, 2020).


This increasing adoption can be partly attributed to the fact that in many high-stakes testing contexts – such as testing for matriculation, employment, and immigration purposes – language proficiency is conceptualized as a unitary construct rather than a set of multiple language skills. Among the studies that investigated the integration of listening and speaking, Rukthong (2021), for instance, explored the extent to which integrated listening-to-summarize tasks assess listening processes required in real-life communication. The findings revealed that, unlike discrete-point tasks such as multiple-choice questions, integrated listening tasks engaged L2 listeners in higher-level cognitive processing and elicited strategies that enabled them to fill in the gaps in their comprehension of the auditory input when producing oral summaries. In a different study, Rukthong and Brunfaut (2020) examined the forms of cognitive processing and strategy use among L2 learners while they were completing listening-to-summarize tasks. The authors found that, in order to process the aural input and complete oral summaries, L2 learners had to rely on both lower-level cognitive processes (e.g., acoustic-phonetic processing and word recognition) and higher-level cognitive processes (e.g., semantic and pragmatic processing), as well as on an array of cognitive strategies (e.g., inferencing) and metacognitive strategies (e.g., comprehension monitoring).

Related to research on interactive listening are studies that explore the issue of authenticity, which is frequently cited as one of the main advantages of using integrated rather than discrete skills tasks (Huang et al., 2018; Rukthong, 2021). While authenticity has been historically defined in various ways (see Gilmore, 2007, for a definitional overview; also Pinner, 2014), it typically refers to the use of authentic materials and authentic language, which "entails patterns of language and meaning that are recognizable within and across communities of speakers and that are appropriated as one's own" (van Compernolle & McGregor, 2016, p. 3). The use of authentic listening materials has been found to be beneficial for the development of communicative competence (Gilmore, 2011; Weyers, 1999). Weyers (1999) conducted an experiment in which he exposed his students in a Spanish course to two episodes of an authentic soap opera each week for a period of two months. A statistically significant increase in listening comprehension and vocabulary knowledge, evidenced by students' enhanced oral output, led the researcher to conclude that the use of authentic videos had a positive effect on students' communicative skills. In a more recent longitudinal study, Gilmore (2011) explored the potential of authentic materials selected from a variety of sources (e.g., films, web resources, songs, newspaper articles) for developing L2 learners' overall communicative competence in English. As indicated by the analysis of different measures of communicative competence, one of which was a listening test, learners in the experimental group that used authentic materials increased their communicative competence to a greater extent compared to learners in the control group who used standard textbook materials.




Overall, the findings in this research trend have important implications for ISLA, suggesting the benefits of exposing L2 learners to authentic language and materials that integrate listening and speaking skills. Such benefits include a larger degree of contextualization, access to paralinguistic cues, and increased opportunities for L2 listeners to use clarification strategies with an interlocutor (Vandergrift, 2007, 2010).

While most of the L2 listening studies within the above-discussed trends have been conducted in a language classroom – a "prototypical context for ISLA" (Loewen & Sato, 2017, p. 2) – a considerable disconnect is believed to exist between research and instructors' beliefs and practices in L2 listening classrooms (Emerick, 2019; Graham, 2017). Some reasons for this disconnect include the instructors' focus on testing rather than teaching L2 listening (Graham, 2017) and the existing gap between authentic language and materials and the language and materials used in textbooks (Gilmore, 2007). More importantly, the disconnect is also galvanized by the preponderance of the traditional approach to L2 listening instruction, which is based on providing learners with listening input followed by comprehension questions and which, as Goh and Taib (2006) aver, is "the modus operandi" (p. 225) of many language teachers because of the washback effect of public exams. Such a product-oriented approach has been criticized for its emphasis on the products of listening rather than on the underlying processes (Field, 2019). The reason for this criticism is that the products of listening (e.g., students' responses to comprehension questions) do not explain the processes that give rise to those products (that is, a response to a comprehension question does not provide insights into what a student has actually heard).

Similarly, a product-oriented approach has also been dominant in research on L2 listening (Vandergrift, 2007). Specifically, most of the existing studies investigating L2 listening have traditionally focused on exploring the product of listening using experimental and correlational designs (see Vandergrift, 2010), whereas research on the processes underlying L2 listening has been sparse. Taking into consideration that a process-oriented approach to examining listening can generate valuable insights into L2 learners' processes and strategies, researchers need to direct more attention to this under-investigated area. In fact, as contended by Leow (2019), engaging in process-oriented research should be one of the main directions for ISLA because understanding how L2 learners process the input can "assist in the creation of instructional interventions designed to encourage active use of students' mechanisms of learning while performing classroom-based tasks or activities" (p. 489). The following section offers an overview of the main methods that can be utilized for data collection and analysis in process-oriented L2 listening research.


3. Data elicitation and interpretation options

There are four main types of data collection methods that can be used in process-oriented ISLA listening studies: survey research, verbal reports, behavioral methods, and neuroimaging methods. To illustrate how these methods are commonly applied to elicit evidence of L2 listening comprehension and development, this section provides a brief summary of each method, discusses its advantages and limitations, and cites L2 listening studies that have used each research method. The section concludes with a more detailed example of an eye-tracking study and a set of step-by-step guidelines for implementing eye-tracking methodology in L2 listening research. Eye tracking has been selected as the focal method for this chapter because of the increasing affordability of eye-tracking technology and its potential to provide valuable process-oriented data about listening relevant for ISLA (such as the type and extent of L2 learners' viewing behavior during multimedia-enhanced listening tasks) that cannot be elicited through more traditional, more impressionistic methods such as surveys or verbal reports.

3.1 Survey research

The first method commonly used for gathering process-based data is survey research, which comprises questionnaires and interviews. It should be noted that while the terms "survey" and "questionnaire" are sometimes used interchangeably in applied linguistics and ISLA, they in fact represent two different concepts, the former being a type of research method and the latter a type of data collection instrument. Survey research has a number of advantages such as cost effectiveness and practicality, especially for questionnaires, which can be designed and administered fairly efficiently to large and diverse populations (Wagner, 2015). Meanwhile, this method has potential drawbacks that include overreliance on convenience sampling, which can limit the generalizability of the findings, and different forms of bias leading to responses that misrepresent what respondents really think or feel. Concerns have also been raised about the extent to which survey research methods are capable of eliciting valid data about unconscious behaviors (Wagner, 2015). Still, for ISLA researchers interested in studying L2 listening in classroom settings, this method is perhaps the most practical and the easiest to implement, and it appears to be commonly used in ISLA studies, as illustrated below.

Survey research on L2 listening has been used extensively to elicit perceptual data such as learners' attitudes, opinions, and beliefs, as well as information about the processes underlying learners' listening comprehension (e.g., learners' use of metacognitive strategies during listening tasks). It is customary for researchers to design their own questionnaires for each L2 listening study in order to garner participants' perceptions (e.g., Cubilo & Winke, 2013).




Some studies, however, have implemented specialized, validated instruments such as the Metacognitive Awareness Listening Questionnaire (MALQ), a self-assessment instrument comprising 21 Likert-scale items in five categories (i.e., planning and evaluation, directed attention, person knowledge, mental translation, and problem-solving) that inquire about language learners' awareness of listening processes and strategy use (Vandergrift et al., 2006). The MALQ has been used to explore, for instance, the potential of metacognitive strategies for improving the listening proficiency of L2 French learners (Becker, 2021), the relationship between metacognitive awareness and listening performance of ESL learners (Goh & Hu, 2014), and the effects of a metacognitive approach to teaching L2 listening in a 13-week French language course (Vandergrift & Tafaghodtari, 2010). Similarly, interviews have been utilized in a variety of contexts, for example, to examine the potential of authentic materials and associated tasks to develop communicative competence (Gilmore, 2011) and to probe into language teachers' use of listening materials and views on authenticity in listening instruction (Emerick, 2019). Cautioning that self-reported data elicited via questionnaires or interviews may not portray actual listening behavior accurately, Cross and Vandergrift (2015) recommend conducting survey research immediately after L2 learners complete a listening task or activity to avoid any memory decay.
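Scoring an instrument of this kind is easy to automate once the item key is known. The short Python sketch below illustrates the general pattern of averaging Likert responses by subscale; the item-to-subscale mapping and the set of reverse-coded items shown here are invented placeholders rather than the actual MALQ key, which researchers should take from Vandergrift et al. (2006).

```python
# Minimal sketch of Likert subscale scoring for a MALQ-style questionnaire.
# NOTE: the item-to-subscale mapping and the reverse-coded items below are
# hypothetical placeholders; consult Vandergrift et al. (2006) for the key.

from statistics import mean

SCALE_MAX = 6  # MALQ items are rated on a 6-point Likert scale

# Hypothetical mapping: subscale name -> item numbers (1-21)
SUBSCALES = {
    "planning_evaluation": [1, 5, 10, 14, 20],
    "directed_attention": [2, 6, 12, 16],
    "person_knowledge": [3, 8, 15],
    "mental_translation": [4, 11, 18],
    "problem_solving": [7, 9, 13, 17, 19, 21],
}

# Hypothetical set of negatively worded items that must be reverse-scored
REVERSE_CODED = {3, 8, 16}

def subscale_scores(responses: dict[int, int]) -> dict[str, float]:
    """Return the mean score per subscale for one participant.

    `responses` maps item number (1-21) to the raw Likert rating (1-6).
    """
    def adjust(item: int) -> float:
        raw = responses[item]
        # Reverse-coding flips the scale: 1 <-> 6, 2 <-> 5, and so on.
        return (SCALE_MAX + 1 - raw) if item in REVERSE_CODED else raw

    return {name: round(mean(adjust(i) for i in items), 2)
            for name, items in SUBSCALES.items()}

# Example: one participant's fabricated responses to all 21 items
participant = {i: (i % SCALE_MAX) + 1 for i in range(1, 22)}
print(subscale_scores(participant))
```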

3.2 Verbal report methods

Verbal report methods can be divided into two main groups: concurrent verbal reports such as think-alouds and retrospective verbal reports such as stimulated recalls (Bowles, 2019; this volume, Chapter 13). Concurrent verbal reports require L2 learners to verbalize their thoughts while completing a specific task, whereas in retrospective verbal reports verbalizations take place shortly after task completion. Arguably, verbal reports are a versatile method for data elicitation that can provide valuable insights into the processes and strategies that L2 learners deploy to complete L2 listening tasks. There are, however, two main validity concerns related to the use of verbal reports that are commonly cited in the literature (e.g., Bowles, 2010; Gass & Mackey, 2016): reactivity (i.e., altering of the original thoughts because of the cognitive demands of the verbalization process, which becomes an additional task) and non-veridicality (i.e., inaccurate reporting of the original thoughts due to memory decay). While reactivity appears to be the major validity threat for think-alouds, non-veridicality poses challenges to the validity of stimulated recalls, which are a special type of retrospective report that uses a stimulus (e.g., a video recording of the participants doing the task or a screen recording of the participants' eye movements during the task completion) to help participants recall their thought processes at the time of the task (Bowles, 2019).


One way to mitigate the reactivity risk in think-alouds is to provide clear guidelines, and even brief training, to research participants on how to express their thoughts as they are completing a listening task. Such guidelines, for example, may instruct learners when to pause the audio(visual) prompt (in case they are in control of the playback settings) and what types of information they are expected to report. Regarding the issue of non-veridicality, conducting retrospective verbal reports with L2 learners as soon as they have completed a listening task, as well as using a stimulus to guide the recall, can substantially reduce memory decay and increase the validity of verbalizations. Furthermore, researchers should consistently adhere to the same protocol when using this method across all participants and urge each participant to verbalize what they were thinking during the original listening task rather than what they might be thinking during the reporting (see Bowles, 2019, for a more detailed discussion).

While some experts (e.g., Ockey, 2007; Yeldham, 2017) do not consider think-alouds to be a viable method for listening research because of the challenges associated with listening and reporting one's thoughts at the same time, think-alouds are nevertheless utilized in L2 listening studies. Seo (2002), for instance, used think-aloud protocols to explore how the presence of visuals affected the participants' choice of listening strategies, whereas in Inceçay and Koçoğlu's (2017) study L2 listeners underwent this procedure to report their use and perceptions of visuals in different input delivery modes. Similar to think-alouds, stimulated recalls are also widely adopted in L2 listening research. Examples of studies that used stimulated recalls include Rukthong (2021), who analyzed the listening processes used by L2 learners to complete two types of tasks; Huang (2020), who investigated listeners' discursive practices during interactive listening; and Révész and Brunfaut (2013), who examined the listening text characteristics that made texts difficult for their participants to process.

ISLA researchers interested in using verbal reports for studying L2 listening processes should be aware that, unlike survey research methods, this method is less practical as it allows for gathering data from only one participant at a time. As a result, conducting verbal reports in a classroom is typically not very feasible and, instead, needs to be done in a lab or in outside-of-class settings. On the other hand, compared to survey research methods, verbal reports can furnish ISLA researchers with more rigorous and comprehensive data about L2 learners' emic perspectives.





3.3 Behavioral and neuroimaging methods

The last two groups of reportedly the most robust methods for procuring evidence of listening processes consist of (a) behavioral methods such as eye tracking and (b) neuroimaging methods such as electroencephalography (EEG) and functional magnetic resonance imaging (fMRI). Among these methods, eye tracking has spawned the most interest among (I)SLA researchers in recent years, as indicated by the considerable proliferation of studies that have leveraged this technology. Although eye tracking has been deployed primarily in reading research to examine the learning of vocabulary and grammatical structures, especially through textual enhancement (Godfroid, 2019b), the number of L2 listening studies that have exploited this method has been growing as well (e.g., Aryadoust, 2020; Batty, 2021; Holzknecht et al., 2017; Suvorov, 2015; Winke & Lim, 2014), but predominantly in language testing rather than in ISLA contexts.

Unlike eye tracking, neuroimaging, which refers to various techniques for visualizing the anatomical and functional elements of the nervous system including the brain, has not yet been widely adopted in L2 listening research. Nonetheless, some researchers have started experimenting with this method. Examples include Suvorov and Camp's (2017) exploratory study that used EEG, a non-invasive technique for recording the electrical activity of the brain using electrodes, and Aryadoust, Ng et al. (2020), who leveraged functional near-infrared spectroscopy (fNIRS) imaging, an optical brain imaging technique that uses near-infrared light to measure oxygen changes in blood flow. Such neuroimaging methods have the potential to record and measure the brain activity representing certain cognitive processes of L2 learners during a listening event; however, they appear to be the least feasible methods for ISLA researchers due to the high costs, limited access, steep learning curve, and low practicality associated with their use.

Eye tracking refers both to technology (an eye-tracker) that can record and measure a person's eye movements and to methodology that allows for analyzing and interpreting eye-tracking data. According to the eye-mind hypothesis postulated by Just and Carpenter (1980), there is a close link between the eye gaze (i.e., what a viewer looks at) and attention (i.e., what a viewer focuses on cognitively), and the existence of this link has been empirically tested and supported by a growing body of eye-tracking studies. Because an eye-tracker is capable of recording a vast amount of data every second, scholars who integrate this technology in their studies often face the challenge of selecting which data to analyze and which eye-tracking measures to use. The two most common eye-tracking measures are saccades and fixations. Saccades are rapid eye movements between fixations, whereas fixations are points of a relatively stationary eye gaze. Fixations can be spatial (i.e., representing the location of an eye gaze) and temporal (i.e., representing the duration of an eye gaze).
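To make the fixation/saccade distinction concrete, the sketch below implements a simple dispersion-based detection pass of the kind that eye-tracking analysis software normally performs automatically. The pixel and duration thresholds are illustrative assumptions, not values prescribed by any particular eye-tracker or study.

```python
# A minimal dispersion-based (I-DT style) fixation detection pass over raw
# gaze samples. Thresholds are illustrative defaults only.

from dataclasses import dataclass

@dataclass
class Fixation:
    start_ms: float  # temporal dimension: when the fixation begins/ends
    end_ms: float
    x: float         # spatial dimension: centroid of the grouped gaze points
    y: float

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms

def _emit(window, fixations, min_duration_ms):
    """Record the current window as a fixation if it lasted long enough."""
    if window and window[-1][0] - window[0][0] >= min_duration_ms:
        fixations.append(Fixation(
            start_ms=window[0][0],
            end_ms=window[-1][0],
            x=sum(s[1] for s in window) / len(window),
            y=sum(s[2] for s in window) / len(window),
        ))

def detect_fixations(samples, max_dispersion_px=35.0, min_duration_ms=100.0):
    """Group (t_ms, x, y) gaze samples into fixations.

    Consecutive samples belong to one fixation while the bounding box of the
    window (width + height, in pixels) stays within `max_dispersion_px`;
    everything between fixations is treated as saccadic movement.
    """
    fixations, window = [], []
    for sample in samples:
        window.append(sample)
        xs = [s[1] for s in window]
        ys = [s[2] for s in window]
        if (max(xs) - min(xs)) + (max(ys) - min(ys)) > max_dispersion_px:
            window.pop()          # the new sample broke the current window
            _emit(window, fixations, min_duration_ms)
            window = [sample]     # start a new candidate window
    _emit(window, fixations, min_duration_ms)
    return fixations

# Fabricated example: 150 ms of stable gaze, then a jump to a new location
samples = [(t, 400.0, 300.0) for t in range(0, 160, 10)] + \
          [(t, 900.0, 500.0) for t in range(160, 320, 10)]
for f in detect_fixations(samples):
    print(f"({f.x:.0f}, {f.y:.0f}) for {f.duration_ms:.0f} ms")
```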


Several recent publications offer an in-depth overview of eye-tracking measures and detailed procedures for their analysis (see, for instance, Conklin & Pellicer-Sánchez, 2016; Conklin et al., 2018; Godfroid, 2019a).

Given the richness and sensitivity of eye gaze data, eye tracking can be leveraged to obtain "a fine-grained, time-sensitive representation of the learning process" and shed light on "the cognitive processes and knowledge that participants use to accomplish a particular task or goal" (Godfroid, 2019b, p. 44), which indicates a significant potential of this research method for investigating L2 listening processes and processing in ISLA contexts. To demonstrate this potential, the following section provides an example of a language assessment study with a step-by-step illustration of how eye tracking was implemented to explore L2 learners' test-taking strategies during the completion of adapted items of the Michigan English Test.

Exemplar study: Suvorov (2018)

Research questions
RQ1: What test-taking strategies do test-takers employ when completing computer-delivered items adapted from the Michigan English Test (MET)?
RQ2: What differences in test-taking strategies do test-takers demonstrate when completing five different types of computer-delivered items adapted from the MET?
RQ3: To what extent do test-wiseness strategies (i.e., strategies such as random guessing that are irrelevant to the construct measured by the test) introduce construct-irrelevant variance and affect scores for computer-delivered items adapted from the MET?

Methods
In this mixed-methods study, the researcher used a data triangulation design to collect three types of data: eye-tracking data (i.e., the recordings of participants' eye movements during the MET), verbal report data (i.e., participants' verbalizations of their test-taking strategies), and test score data (i.e., participants' scores on test items from the MET). Fifteen L2 learners of English completed 58 computer-delivered multiple-choice items (including 24 listening items), with their eye movements recorded by an eye-tracker. Participants were subsequently shown the video-recording of their eye movements and asked to explain (a) the types of test-taking strategies they used for answering each item and (b) the reasons for choosing a specific answer. The eye-tracking data and verbal report data were converged during the qualitative analysis to determine the test-taking strategies used for answering each test item (RQ1 and RQ2), whereas all three types of data were analyzed quantitatively to measure the extent to which the participants' use of test-wiseness strategies influenced their test scores (RQ3).

Findings
The findings revealed an array of test-taking strategies that comprised 20 test-management strategies grouped in three categories and seven test-wiseness strategies (RQ1). Some strategies were found to be skill-specific and applied only to certain item types (RQ2). Finally, the researcher determined that the participants' use of test-wiseness strategies led to a statistically significant increase in their test scores (RQ3).




Take-aways
This study demonstrated that complementing verbal report data with eye-tracking data was more effective for research on test-taking strategies because the eye-tracking data provided evidence about what the participants looked at and how they visually interacted with the listening and reading items, whereas the verbal report data explained the underlying reasons for those interactions.
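As a minimal illustration of the kind of convergence step described in the take-aways, the sketch below pairs item-level eye-tracking summaries with coded verbal-report strategies so that each test item can be inspected with both evidence types side by side. All field names and records are fabricated for illustration and do not reproduce the coding scheme of the exemplar study.

```python
# Minimal sketch of triangulating item-level eye-tracking summaries with
# coded verbal-report data. All records below are fabricated examples.

# Dwell time (ms) per item and area of interest, e.g. from an eye-tracking export
gaze_summary = {
    ("P01", "item_03"): {"prompt_dwell_ms": 7400, "options_dwell_ms": 2100},
    ("P01", "item_04"): {"prompt_dwell_ms": 1300, "options_dwell_ms": 5600},
}

# Strategy codes assigned to each item during the stimulated recall
verbal_codes = {
    ("P01", "item_03"): ["keyword_matching"],
    ("P01", "item_04"): ["elimination", "guessing"],  # test-wiseness strategy
}

def triangulate(gaze, verbal):
    """Join the two data sources on (participant, item) keys."""
    for key in sorted(set(gaze) | set(verbal)):
        yield {
            "participant": key[0],
            "item": key[1],
            **gaze.get(key, {}),
            "strategies": verbal.get(key, ["<no recall data>"]),
        }

for row in triangulate(gaze_summary, verbal_codes):
    print(row)
```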

4. Step-by-step guidelines for implementing eye tracking in an L2 listening study

Below is a summary of key steps for integrating eye tracking in an L2 listening study, including suggestions for gathering, analyzing, and interpreting the data. These guidelines presuppose that a scholar interested in undertaking this endeavor has already completed the literature review, identified a research gap, and generated research questions for their study. Additional guidelines for designing and conducting L2 listening studies can be found in Cross and Vandergrift (2015).

Step 1. Select an appropriate eye-tracker

Eye-trackers come in all shapes and forms and vary with regard to the position of the device's camera (i.e., remote, head-mounted, and head-stabilized eye-trackers), the type of tracking (i.e., monocular or binocular eye tracking), and the data quality, which depends on the accuracy and precision of the eye-tracker and on its data sampling rate. The data sampling rate, for instance, can range from 30 Hz to 2000 Hz, which translates into 30 and 2000 data points recorded each second, respectively. Coupled with the affordances (and constraints) of eye-tracking software, all these characteristics differentially affect the types and amount of eye-tracking data that can be recorded and analyzed, as well as the types of interpretations and conclusions that can be made on the basis of such data analysis. For research on L2 listening in instructed settings, a remote eye-tracker, which is positioned near the monitor and allows for free head movement, might be a better choice than a head-mounted or even a head-stabilized eye-tracker, as it would result in higher ecological validity. Novice researchers are advised to familiarize themselves with the methodological considerations related to the choice of eye-tracking equipment and software by consulting Conklin et al. (2018) and Godfroid (2019a).
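The sampling rate figures above translate directly into data volume, which is worth estimating before committing to a device. The short sketch below does this arithmetic for a hypothetical study; the task length, participant count, and per-sample storage figure are illustrative assumptions rather than properties of any real eye-tracker.

```python
# Back-of-the-envelope data volume for an eye-tracking study.
# All constants are illustrative assumptions for planning purposes.

TASK_SECONDS = 12 * 60   # a hypothetical 12-minute listening task
PARTICIPANTS = 20
BYTES_PER_SAMPLE = 64    # rough guess: timestamp + x/y + pupil + validity

for rate_hz in (30, 60, 300, 1000, 2000):
    samples = rate_hz * TASK_SECONDS * PARTICIPANTS
    megabytes = samples * BYTES_PER_SAMPLE / 1_000_000
    print(f"{rate_hz:>5} Hz -> {samples:>12,} samples (~{megabytes:,.0f} MB raw)")
```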


Step 2. Design the experiment

After selecting and familiarizing themselves with the eye-tracking equipment and software, the researcher can start designing their listening-focused study. The design should be informed by the research questions guiding the study. In addition, several considerations need to be taken into account. First, the researcher needs to determine the exact purpose for which eye tracking will be used, such as to elicit evidence of the learner's eye movements during the listening task (represented by eye-tracking measures) or to cue stimulated recalls via the eye-movement recording. Second, the researcher needs to design or find appropriate stimuli for the experiment. In the context of L2 listening research, such stimuli can be audio-only or audiovisual. If a stimulus is particularly long, it should be divided into smaller segments to facilitate both the data collection and the subsequent data analysis. When designing each stimulus, it is important to specify the area(s) of interest within which eye-tracking data will be collected (Conklin & Pellicer-Sánchez, 2016). For example, if a stimulus within a listening task contains both a prompt (e.g., a video presentation) and a multiple-choice comprehension question presented simultaneously on a computer screen, the researcher needs to decide whether to collect the eye-tracking data associated only with the prompt or also with the comprehension question.

Next, depending on the research questions, the researcher needs to determine suitable eye-tracking measures, which can be a daunting task considering the existence of over 150 such measures (Holmqvist et al., 2011). When selecting eye-tracking measures, two fundamental assumptions underlying all eye-tracking research should be considered: (a) that the amount of time a learner spends fixating their eyes on a specific element is indicative of the amount of cognitive effort necessary for processing that element and (b) that the element a learner fixates their eyes on represents the focus of their attention (Conklin & Pellicer-Sánchez, 2016). These assumptions can help one decide on the types of data required for answering the research questions and the eye-tracking measures that need to be calculated.

Finally, the researcher should delineate selection criteria for participants and determine the number of participants required for the experiment. Depending on the eye-tracking equipment, selection criteria should specify, for example, what types of glasses or contact lenses are allowed and whether participants who have glaucoma, cataracts, or permanently dilated pupils are eligible to participate. It is likely that the data from some participants will have to be discarded because of poor quality or technical issues during data collection sessions, so the researcher should plan to recruit more participants to account for any potential data loss.
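The area-of-interest decision described above can be operationalized as simple screen rectangles against which each fixation is tested. The sketch below shows this under that assumption; the AOI coordinates and fixation records are invented for illustration, and in practice AOIs are typically drawn in the eye-tracker's own analysis software.

```python
# Minimal sketch of assigning fixations to rectangular areas of interest
# (AOIs) and totalling dwell time per AOI. All coordinates are invented.

AOIS = {
    # name: (left, top, right, bottom) in screen pixels
    "video_prompt": (0, 0, 1280, 540),
    "comprehension_question": (0, 560, 1280, 800),
}

def aoi_for(x, y):
    """Return the name of the AOI containing the point, or None."""
    for name, (left, top, right, bottom) in AOIS.items():
        if left <= x <= right and top <= y <= bottom:
            return name
    return None  # gaze fell outside all defined AOIs

def dwell_time_ms(fixations):
    """Total fixation duration per AOI; fixations are (x, y, duration_ms)."""
    totals = {name: 0.0 for name in AOIS}
    for x, y, duration in fixations:
        name = aoi_for(x, y)
        if name is not None:
            totals[name] += duration
    return totals

# Fabricated fixations: one on the video prompt, one on the question text
fixations = [(400.0, 300.0, 150.0), (640.0, 700.0, 420.0)]
print(dwell_time_ms(fixations))
# -> {'video_prompt': 150.0, 'comprehension_question': 420.0}
```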




Step 3. Conduct a pilot study

Because the process of eye-tracking data collection and analysis is generally time-consuming and labor-intensive, it is highly recommended to run a small pilot study before launching the actual study. The pilot study can help identify and rectify any potential issues with the experimental set-up and the data analysis procedure, which will save time and costs in the future.

Step 4. Collect the data

After conducting the pilot study, the researcher can proceed to the data collection stage. Each individual data collection session must start by calibrating the eye-tracker to ensure the accuracy and precision of the recorded data. The calibration procedure entails showing a series of dots across the computer screen that a participant must fixate on. Following the calibration procedure, the researcher may proceed to the actual experiment. During the experiment, the researcher must constantly monitor the eye-tracker and make adjustments in case of any complications with the data recording process (e.g., when the eye-tracker stops recognizing and recording the participant's eyes). Depending on the length of the listening task that the participant is expected to complete, it may be necessary to take breaks and recalibrate the eye-tracker to maintain the quality of the data being gathered. If the experiment presupposes the use of eye-movement recordings as stimuli for the participant's verbalizations of their thought processes during the listening task, it is recommended not just to audio-record their verbalizations, but to make a video-recording that combines the recording of the participant's eye movements during the initial task with the audio of the participant explaining their thought processes cued by the eye-movement recording. After the end of each experiment, the researcher needs to ensure that all the data are properly labeled and saved for subsequent analysis.

Step 5. Analyze the data and interpret the results

The last step entails checking, selecting, cleaning and, if necessary, transforming the data to prepare them for analysis. When interpreting the results of the data analysis, it is important to remember that eye tracking does not provide overt information about cognitive processes, but only reveals the learners' viewing behavior and interaction with a listening stimulus (i.e., what the learners look at while listening). To draw any reliable conclusions about L2 listeners' cognitive processes, the analysis of the eye-tracking data needs to be complemented with data from other sources such as verbal reports, which provide insights into the reasons underlying the learners' viewing behavior (i.e., why the learners look at certain elements of the stimulus while listening). Data triangulation is thus of paramount importance for valid and reliable interpretation of the findings (Godfroid, 2019b; Vandergrift, 2007, 2010).
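One routine part of the checking and cleaning described in Step 5 is screening each recording for tracking loss before any measures are computed. A minimal sketch follows, assuming that the proportion of invalid samples is an adequate quality proxy; the 25% cut-off is an illustrative assumption, not a field-wide standard, and whatever criterion is used should be reported.

```python
# Minimal sketch of a data-quality screen: discard recordings in which too
# many gaze samples are invalid. The 25% cut-off is illustrative only.

def usable(samples, max_invalid_ratio=0.25):
    """A sample is (t_ms, x, y), or None when the tracker lost the eyes."""
    invalid = sum(1 for s in samples if s is None)
    return invalid / len(samples) <= max_invalid_ratio

# Fabricated recordings: participant -> raw gaze samples
recordings = {
    "P01": [(0, 512.0, 384.0), None, (20, 514.0, 382.0), (30, 511.0, 385.0)],
    "P02": [None, None, None, (30, 640.0, 400.0)],
}
kept = {pid: data for pid, data in recordings.items() if usable(data)}
print(sorted(kept))  # -> ['P01']; P02 exceeds the invalid-sample cut-off
```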


5. Advice to future listening researchers

Researchers seeking to set up a study to investigate processes associated with L2 listening in ISLA contexts should keep in mind the following recommendations.

(1) Design your study carefully. Read the relevant literature and familiarize yourself with existing studies in this area. Select a research design and data elicitation methods that are most appropriate for your study goals and research questions but are also feasible and practical in classroom settings. Be mindful of informed consent rules, the factors that may affect learners' willingness to participate in or withdraw from the study, and, in the case of an intervention study, the impact of such a study on the class where it is conducted.

(2) Strive for authenticity. Utilize authentic listening materials in your research, although scripted or semi-scripted materials can also be appropriate in certain contexts (see Wagner et al., 2021). Besides investigating what happens in the L2 learners' mind during a listening activity (i.e., listening as an isolated skill), consider exploring how L2 learners process and utilize linguistic input as part of a communicative activity (i.e., interactive listening). The use of authentic materials is particularly relevant for intervention studies conducted in regular classes to ensure generalizable findings and produce a positive impact on the development of students' L2 listening skills.

(3) Ensure the validity of your results. Choose data elicitation methods that will yield results that are not only robust and ecologically valid, but also generalizable and applicable to classroom contexts. When using verbal reports and surveys, especially with less proficient L2 learners, conduct them in the learners' native language(s). Doing so will ensure both the depth and the breadth of learners' responses. Whenever possible and appropriate, triangulate the data.

(4) Consider implications for ISLA. Following Leow (2019), make sure your study objectives are aligned with curricular goals so that the expected outcomes will have the potential to inform pedagogical practices and the design of instructional interventions capable of promoting successful L2 listening development.




6. Troubleshooting ISLA listening research

Common question 1: Which variables should I include in the design of my L2 listening study?

With listening being a multi-dimensional skill, a wide range of variables can affect L2 learners' performance on a listening task. For instance, higher performance on a video-based listening activity can be falsely attributed to the effect of visuals, whereas in reality higher scores can be the result of L2 learners' use of test-wiseness strategies such as guessing. It is therefore essential to carefully design a study so that the variables that are not germane to the research questions are controlled to the maximum extent possible.

Common question 2: What types of data should I collect for my L2 listening study?

Learners vary in terms of how they process the input and what strategies they use to complete listening tasks. To avoid results that mask individual differences and fail to provide insights into the underlying reasons for these differences, a researcher should consider designing a mixed-methods study and gathering both quantitative and qualitative data when investigating a specific phenomenon.

Common question 3: How do I ensure that the results of my L2 listening study are valid?


Because listening cannot be directly measured and observed, a researcher must be careful when interpreting the analyzed data and drawing conclusions about listening. As mentioned in the step-by-step guidelines above, data triangulation is essential in process-oriented L2 listening research and can help avoid threats to the validity of the interpretations made on the basis of the findings.

7. Conclusions

The main goal of this chapter was to demonstrate the need for more research into the processes underlying L2 listening comprehension and development and to propose step-by-step guidelines for using eye tracking in process-oriented L2 listening studies. In doing so, the chapter underscored the essential role of listening in ISLA and discussed four influential research trends that are of particular relevance for ISLA: the use of multimedia in L2 listening, the use of listening strategies, research on interactive listening, and authenticity. Following this discussion, the chapter highlighted the lack of process-oriented L2 listening studies and introduced four research methods that can be utilized to elicit and analyze data about listening processes and processing. A specific emphasis in this chapter was placed on the use of eye tracking, as demonstrated by the inclusion of a sample eye-tracking study and step-by-step guidelines for implementing this methodology in L2 listening research.


The chapter concluded with advice for L2 listening researchers and suggestions for troubleshooting ISLA listening research. Scholars interested in this topic may seek additional information in the literature and explore the resources listed below.

8. Further reading and additional resources

8.1 Suggested books and book chapters

Conklin, K., Pellicer-Sánchez, A., & Carrol, G. (2018). Eye-tracking: A guide for applied linguistics research. Cambridge University Press. https://doi.org/10.1017/9781108233279
Field, J. (2019). Second language listening: Current ideas, current issues. In J. W. Schwieter & A. Benati (Eds.), The Cambridge handbook of language learning (pp. 283–319). Cambridge University Press. https://doi.org/10.1017/9781108333603.013
Godfroid, A. (2019a). Eye tracking in second language acquisition and bilingualism: A research synthesis and methodological guide. Routledge. https://doi.org/10.4324/9781315775616
Suvorov, R. (2022). Technology and listening in SLA. In N. Ziegler & M. González-Lloret (Eds.), Routledge handbook of second language acquisition and technology. Routledge. https://doi.org/10.4324/9781351117586-13
Vandergrift, L. (2015). Researching listening. In B. Paltridge & A. Phakiti (Eds.), Research methods in applied linguistics: A practical resource (pp. 299–314). Bloomsbury.

8.2 Suggested journals and professional organizations

International Journal of Listening. https://www.tandfonline.com/toc/hijl20/current
International Listening Association. https://www.listen.org/


Chapter 12

Reading
Adopting interdisciplinary paradigms in ISLA reading research

Irina Elgort

Victoria University of Wellington

What is involved in skilled reading and learning to read in a second language (L2), compared with the first language (L1)? Answers to this question can shape our thinking about what needs to be taught and investigated in instructed second language acquisition of reading by learners already literate in their L1s. Reading comprehension is a complex cognitive task that is notoriously difficult to define and measure (Nation & Waring, 2019), and building on relevant research paradigms and methods from adjacent fields offers a way to address this complexity. In this chapter, I consider interdisciplinary methodological approaches used to study reading that ISLAR researchers can adopt and adapt in investigating learning to read and reading to learn in an additional language. The chapter concludes with a detailed look at selected studies exemplifying the use of such research methods and advice for future L2 reading researchers.

Keywords: instructed second language acquisition of reading, component processes in reading, eye-tracking, self-paced reading, interdisciplinary research methods

In this chapter, I first consider the place of reading in second language acquisition and outline what is involved in skilled reading and learning to read in a second or an additional language (L2), comparing it with the first language (L1). I then consider what needs to be taught in instructed second language acquisition of reading (ISLAR) and present research approaches and paradigms used to study reading and reading development. The chapter concludes with a detailed look at individual studies that can be used to model ISLAR research and advice for future reading researchers.



1. What is reading and why is it important for ISLA?

Reading is an intrinsic part of modern-day life. We read for information and enjoyment. Reading influences our cognitive and emotional development. Reading is also critical in academic and professional study. Through different stages of education, from kindergarten, through primary and secondary schooling, to tertiary study, reading is used to access and share knowledge about the world and to understand the human condition.

Reading is a key source of target language input for L2 learners (VanPatten, 1996). While engaging with the text to derive and construct meaning, L2 learners encounter new words, phrases, and syntactic structures, which contribute to their target language proficiency (Bernhardt, 2011; Grabe, 2009; Han & Anderson, 2009). At intermediate and advanced stages of L2 acquisition, reading is a critical source of increasing vocabulary beyond the 5000–6000 most frequent word families, as lower-frequency words and word meanings are more likely to occur in written texts than in day-to-day spoken interactions (Nation, 2006).

Readers learn covertly (without intent or awareness) through exposure to the language, and overtly through discovering and noticing linguistic forms and features, deriving rules and conventions, and deliberately committing them to memory (Ellis, 2015; Rebuschat & Williams, 2012). Written language is well suited to overt and deliberate study because it is 'frozen in time,' affixed to the page; so, L2 learners are able to notice, analyze, and study language forms, make and test meaning inferences, verify hypotheses about morphosyntax, and use linguistic and metalinguistic strategies. Being more regular and standardized than connected speech, written text provides unique opportunities for inductive and discovery learning of the target language regularities. Grabe (1991, p. 375) described reading as "probably the most important skill for second language learners in academic contexts."

2. An overview of ISLA reading research: What we know and what we need to know

2.1 Reading as a complex process

Grabe (2014) called fluent reading a "miracle." An ability to fluently decode written symbols and create meaning from text comes later than speech in normal child development and requires considerable deliberate learning and instruction. Below I consider what we know about the component knowledge and processes underpinning the common "miracle" of fluent reading comprehension.




Reading involves sublexical, lexical, syntactic, and discourse-level processing, and engages multiple knowledge systems: orthographic, linguistic, and general knowledge (Perfetti & Stafura, 2014). Reading ability relies on the foundational processes of mapping visual perceptual input onto linguistic representations, recognizing word forms and accessing their contextually appropriate meanings, and parsing sentences into chunks for comprehension and encoding semantic propositions. Reading also engages higher-order processing skills of constructing text representation and situational models by progressively building meaning from successive sentences, "supplemented with inferences, when needed, to make the text coherent" (Verhoeven & Perfetti, 2017, p. 22; see also Graesser et al., 1997). Most researchers accept an interactive account of reading comprehension, where language (lexical and morphosyntactic) knowledge and word and sentence processing (real-time recognition of words and word parts, and syntactic parsing) interact with readers' knowledge about the world, topic familiarity, use of reading strategies, inferences, and predictions. Reading comprehension requires an ongoing interaction between (1) decoding written language, (2) real-time processing of new information from the input, and (3) integrating it with the information previously derived from the text and with the reader's existing knowledge about the world (Gerrig & McKoon, 1998; Kintsch, 1998).

Reading in an L2 has additional processes associated with cross-language influences (Koda, 2005) that may be facilitatory or inhibitory. For example, cognates (i.e., words that have an overlapping form and meaning in the two languages of a bilingual, e.g., ring or winter in English and Dutch) are recognized and processed faster than non-cognates in L2 reading (van Hell & Dijkstra, 2002), but interlingual homographs (words with similar spellings but different meanings, e.g., angel, which means 'sting' in Dutch) may be processed more slowly (van Heuven et al., 2008).

Although key component processes of skilled L1 and L2 reading are the same, their differences become apparent from a developmental perspective. While children learn to read in their L1 at pre-school and primary school, the ISLA reading research considered in this chapter most commonly involves adolescents and adults, who are already skilled L1 readers. Critically, learning an L2 and learning to read in that language often take place simultaneously. Beginner L1 readers already have considerable knowledge of their L1 (spoken vocabulary, grammar) and listening comprehension skills; this allows them to further fine-tune and extend their linguistic knowledge from reading. Beginner L2 readers, however, need to master the language while learning to read in it. Herein lies a paradox (Cobb, 2007): L2 learners need to bring sufficient target language knowledge to reading in order to achieve sufficient comprehension and gain new target language knowledge from reading. What we need to know is how to assist L2 learners in resolving this paradox.


To clarify where instructional interventions may be most effective, Bernhardt (2011) proposed a research-informed framework of L2 reading comprehension with three core dimensions: L2 proficiency, L1 reading skills, and other skills and knowledge including strategies, inferences, motivation, and general knowledge. The first dimension, L2 proficiency (including vocabulary, grammar, and syntax), is likely related to the third dimension, as lower-proficiency learners tend to use ineffective reading strategies, such as word-by-word reading and translating into L1 (Bernhardt & Kamil, 1995), and are less successful in inferring meaning from context (Elgort et al., 2020) and increasing their vocabulary from reading (Laufer, 2003, 2009). Because access to L2 knowledge is more effortful for lower-proficiency learners, they are less able to attend to both form and meaning in real time (VanPatten, 1996) or to engage in higher-order comprehension processes, such as inference-making, building a coherent representation of an extended discourse, and critically evaluating its truth value. The second dimension, L1 reading ability, has been conceptualized either as unconditionally transferable to L2 reading (Cummins, 1979) or as conditionally transferable, i.e., transferable once L2 proficiency has reached a minimum threshold and increasing its contribution as L2 proficiency develops (Alderson, 1984; Yamashita, 2002). In any case, L1 reading ability plays an important positive role in L2 reading comprehension. A recent multi-site L2 English reading (eye-tracking) study with participants from 12 L1s showed that L2 reading fluency was mostly predicted by L1 reading fluency, but that L2 reading comprehension accuracy was related to mastery of L2 reading component skills, such as spelling, vocabulary, and decoding (Kuperman et al., 2022). In summary, L2 reading development is affected positively by L1 literacy and negatively by (low) L2 proficiency, with the latter affecting a host of lower-order and higher-order processes required in fluent reading comprehension.

2.2 Reading development

An influential model of reading, the simple view of reading, which is widely adopted in L1 reading development research, posits that reading comprehension is a product of two latent variables (or skill-sets): word reading (decoding) and listening comprehension (Gough & Tunmer, 1986; Hoover & Gough, 1990; Verhoeven & Leeuwe, 2012). To investigate the relationships between reading comprehension, language and cognitive component skills, word reading, and listening comprehension, Kim (2017) administered a battery of tasks to 350 children in grade 2 (7–8-year-olds). Multiple measures and standardized tests normed for the target learner population were used where possible. Because the study tested theoretically motivated hypotheses about the causal relationships among the latent variables (i.e., listening comprehension, reading comprehension, and word reading) and observable variables derived from specific models of reading comprehension (i.e., the simple view of reading and the direct and indirect effect model of reading), the author used confirmatory factor analysis (CFA). Note that CFA is used to "test and evaluate the validity of specific, theoretically motivated predictions or theoretical frameworks that may generalize to other samples and populations" (Goring et al., 2021, p. 2). Kim found that word reading and listening comprehension explained virtually all the variance in reading comprehension (as predicted by the simple view of reading), and that they were, in turn, predicted by a constellation of language and cognitive skills. Importantly, the results showed that vocabulary and grammatical knowledge were necessary for higher-order cognitive skills such as inference making, perspective taking, and comprehension monitoring (Kim, 2017, p. 325). Language and cognitive component skills explained 66% of the variance in reading comprehension. Moreover, word reading (the third latent variable) was also predicted by foundational skills, including vocabulary and grammatical knowledge. In summary, what we know about reading development is that foundational language knowledge and skills underpin all latent components in reading comprehension. This conclusion underscores that poor knowledge of vocabulary and grammar (Bernhardt, 2011) is likely to be a bottleneck in L2 reading comprehension. Therefore, what we need to know is how teachers may create conditions where even low-proficiency L2 learners would be able to read in the target language with understanding.
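The simple view of reading that frames Kim's study is often summarized as a multiplicative relationship. A minimal formulation, following Gough and Tunmer (1986) and Hoover and Gough (1990), in LaTeX notation:

\[ R = D \times C \]

where R is reading comprehension, D is decoding (word reading), and C is linguistic (listening) comprehension. The multiplicative form captures the claim that comprehension fails when either component approaches zero: strong decoding cannot compensate for absent language knowledge, and vice versa.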


Children learning to read in the L1 are sophisticated L1 users who know thousands of words and have good mastery of the grammar needed for listening comprehension. For them, a key challenge is word reading (e.g., letter recognition, knowledge of grapheme-phoneme correspondence rules, spelling, sight word efficiency). For example, when English-speaking children learn to read, they already know the meaning of the (spoken) word cat, so their task is to associate the spoken form and meaning with its written form c-a-t, and to learn that the letter c in cat is pronounced /k/. Developing L1 readers benefit from explicit decoding instruction, such as phonics instruction in alphabetic L1s, whereby children are taught grapheme-phoneme relationships (Ehri et al., 2001; Torgerson et al., 2006) and practice written word recognition and decoding (Castles et al., 2018). As L1 readers improve their word reading skills – due to explicit instruction, practice, and experience – their reading comprehension also improves. Thus, word reading is no longer a strong predictor of individual differences in L1 reading comprehension in adolescents and adults (Ricketts et al., 2020); instead, these differences in comprehension are associated with individual readers' vocabulary size and quality of lexical knowledge, as well as higher-order skills (e.g., inference making, perspective taking, comprehension monitoring) and with their ability to flexibly select and use appropriate reading comprehension strategies.

For developing L2 readers, challenges extend far beyond word reading as experienced by children learning to read in the L1. When learners' L1 and L2 are orthographically distant, such as Chinese and English or English and Hebrew, decoding written L2 texts may be particularly challenging at early stages of learning to read (Brice et al., 2022; Hamada & Koda, 2008; Katz & Frost, 1992). Importantly, for developing L2 readers, key challenges are assumed to be primarily related to the foundational knowledge (i.e., vocabulary and grammar) and input processing skills in the target language that affect both listening and reading comprehension. These common L2 and language-specific challenges need to be addressed when adopting approaches to L2 reading instruction and developing learning materials.

2.3 What is worth teaching?

An important question that needs to be considered in ISLAR is: What is worth teaching? Nation (2007) proposed a "common-sense" balanced language learning curriculum approach; an L2 program of study should allocate approximately equal amounts of time to meaning-focused input, meaning-focused output, language-focused learning, and fluency development. Reading is included in at least three of these four strands. A key activity in the meaning-focused input strand is extensive reading – reading plenty of books at an appropriate difficulty level, with no more than 2–5% unfamiliar running words (Hu & Nation, 2000; Laufer, 1989). In the language-focused learning strand, vocabulary-focused activities, word reading, intensive reading, and text study are core. Other activities in this strand include practicing effective reading and learning strategies (e.g., guessing from contexts and using dictionaries; see also Grabe & Stoller, 2018, for instructional activities that fit into this strand). In the fluency development strand, learners also engage in meaning-focused reading activities, but the readings must be even easier than in the meaning-focused input strand, containing no unknown words or grammar structures and relying on familiar content. The goal is to read with understanding at a speed of around 200 words per minute. This strand is characterized by high input volumes and activities performed under time pressure, including speed reading and repeated reading.

The role of direct teaching in the four-strands curriculum is relatively small (Nation, 2012, p. 137): "[t]he teacher's main jobs are to plan a good course (the most important job), to organise learners' learning opportunities both in and outside the classroom, to train learners in language learning strategies …, to test learners to make sure that they are making progress and that they know how well they are doing, and finally, the least important but still important job, to teach." Explicit teaching of reading, therefore, is not the main job of a language teacher, whereas planning a course and creating learning opportunities should be at the forefront of ISLAR. This view of ISLAR is aligned with what Sato and Loewen (2019) refer to as evidence-based pedagogy, i.e., teachers making course and teaching decisions based on the findings of research in SLA and applied linguistics. Reading research is a case in point; since reading at an appropriate difficulty level has been shown to be the most effective way to improve L2 reading comprehension and fluency, teachers need to secure access to a large reading repository, collate and curate readings suitable for learners of different proficiency (including graded readers for beginner and early intermediate learners), covering a wide range of topics and genres that would appeal to the students, and create tasks and activities that necessitate reading in and out of class. Another important task of a language teacher is to use appropriate measures to monitor students' progress in reading and motivate their learning (for further information on assessing reading, see Alderson, 2000; Bernhardt, 2011; Chapter 6; Chapter 17; Grabe, 2009; Nation, 2009).

In summary, explicit L2 reading instruction should include teaching regularities in the L2 writing system and orthography-to-phonology mapping at early stages of L2 reading acquisition, helping students build vocabulary and grammar knowledge from reading (Elgort, 2020; VanPatten, 1996; see also Chapters 8 and 9 in this volume), teaching students how to select level-appropriate readings, and requiring and motivating them to read in and out of class (Nation & Waring, 2019).
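To make the lexical matching described above concrete, here is a minimal sketch of how the proportion of unfamiliar running words in a text can be computed against a list of words a learner is assumed to know. The tokenizer, the known_words set, and the threshold checks are illustrative simplifications, not the procedure of Hu and Nation (2000) or of any published lexical profiling tool.

import re

def unfamiliar_ratio(text: str, known_words: set[str]) -> float:
    """Proportion of running words (tokens) not in the learner's known-word list."""
    tokens = re.findall(r"[a-z']+", text.lower())  # deliberately naive tokenization
    if not tokens:
        return 0.0
    unknown = [t for t in tokens if t not in known_words]
    return len(unknown) / len(tokens)

# Hypothetical known-word list (in practice, derived from a vocabulary size test
# or a frequency-based word-family list).
known_words = {"the", "cat", "sat", "on", "a", "mat", "and", "looked", "at", "me"}

text = "The cat sat on a mat and gazed at me."
ratio = unfamiliar_ratio(text, known_words)
print(f"Unfamiliar running words: {ratio:.1%}")  # 10.0% in this toy example

# Applying the 2-5% guideline discussed above:
if ratio <= 0.02:
    print("Comfortable extensive-reading level")
elif ratio <= 0.05:
    print("Upper bound of the suggested range")
else:
    print("Likely too difficult; choose an easier (e.g., graded) text")

In a classroom application, the same logic would be run over whole graded readers rather than single sentences, and the known-word list would come from an estimate of the learner's vocabulary size.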

3. Methodological approaches to data elicitation and interpretation

Within the scope of this chapter, I focus on L2 reading studies with L1-literate learners, whose proficiency in the L2 is lower than in the L1 (i.e., unbalanced bilinguals). In this section, I highlight the interdisciplinary nature of reading research and the value of building on research paradigms and measures not only from ISLA and applied linguistics, but also from cognitive psychology and education. I first focus on reading and reading development studies that have used experimental research paradigms that afford a high degree of control over factors that affect learning and processing. I then consider in-situ (classroom and independent learning) methods and observational, exploratory studies that assess component knowledge and skills contributing to reading comprehension (and their interrelationships).

3.1 Experimental laboratory research methods

Experimental laboratory methods have been used in L2 reading research to investigate the effects of instructional interventions and learning approaches. These studies are conducted under highly controlled conditions that reduce the influence of potential confounds. Normally, researchers conduct such instructional and learning interventions individually or in small groups, closely monitoring learner behavior and task completion; or, learning may be conducted outside of the laboratory, but tested using highly controlled experimental tasks. Real-time (online) processing and offline measures are used to record and evaluate changes that arise during and from reading (Elgort et al., 2020; Elgort & Warren, 2014; Godfroid et al., 2018; Pellicer-Sánchez, 2016) and resulting from specific instructional interventions (Rastle et al., 2021).

Among offline pre-testing and outcome measures, researchers commonly use tests of vocabulary recognition and recall, reading aloud of words and nonwords, grammaticality judgements, and cloze and gap-fill tasks. Reading comprehension can be measured using true/false, multiple-choice, short answer, and open-ended questions, free recall, and information transfer tasks (Day & Park, 2005; Pulido, 2007; Vander Beken & Brysbaert, 2018; see also Grabe, 2009; Nation, 2009). Such offline tests are not conducted under time pressure (although time-on-task may be recorded and included in the interpretation of the results in some cases). These tests do not distinguish between strategic and automatic L2 reading processes.

Real-time measures, on the other hand, are used when researchers aim to reduce readers' use of meta-cognitive strategies, probing time-sensitive online processes characteristic of fluent reading. Experimental laboratory research methods offer insights into real-time processing not only post-intervention but also as learning unfolds. Response latency and accuracy measures for individual learning episodes can be used to measure incremental changes and to chart developmental trajectories over time. Reading times can be recorded and analyzed to track the dynamics of incremental change in learning from context (Dirix et al., 2020; Elgort et al., 2018; Godfroid et al., 2018). These measures reflect processes that take place during reading, such as word-to-text integration (Mulder et al., 2021; Perfetti & Stafura, 2014), ambiguity resolution (Frenck-Mestre & Pynte, 1997; Van der Schoot et al., 2009), and allocation of attention (Godfroid et al., 2013). Some of these tasks are only possible in the laboratory because they require specialized equipment and software, for example, masked priming and such neurophysiological research methods as recording event-related brain potentials (ERPs; see Elgort & Warren, in press, for an overview).

Eye-tracking is a research method that is increasingly being adopted in ISLAR research and, therefore, should be considered in detail. Originally a laboratory research method, it is now increasingly used on location (e.g., in schools or libraries). Eye-tracking provides a window into real-time reading comprehension noninvasively, as readers' eye movements are recorded moment by moment, under conditions close to natural reading. It is generally accepted that oculomotor behaviors index cognitive processes in reading comprehension (e.g., Kuperman & Van Dyke, 2011). Eye-movement research affords insights into the effects of instructional and learning approaches on L2 learners' online reading and learning behavior. Using eye-tracking, Dirix et al. (2020), for example, found a processing explanation for an earlier ISLAR study finding that performance on offline recall tests is significantly worse when the same texts are read in the L2 than in the L1 (Vander Beken & Brysbaert, 2018). Eye-movement measures used by Dirix et al. showed that the processing costs of reading expository texts in an L2 appear to be particularly high when students are instructed to study the text for a test, compared to informal reading.

Multiple eye-movement measures are usually recorded to establish a more detailed and accurate picture of the component processes associated with reading (Rayner, 2009). In reading research, it is common to distinguish between early measures (e.g., first-fixation duration, gaze duration, probability of word skipping), associated with word identification, and late measures (e.g., go-past time, total reading time, number of fixations, number of regressions from and to a word), associated with meaning integration and constructing a mental model of the text (Dirix et al., 2020; Elgort et al., 2018). Thus, recording eye movements during reading can show how different types of instruction affect individual component processes and the time-course of reading comprehension. Dirix et al. (2020) and Elgort et al. (2018) studied L2 reading as a source of content learning (i.e., extracting and retaining information from expository texts) and language learning (i.e., learning new words from reading), respectively. The use of eye-tracking enabled these researchers to investigate how L2 readers learn from reading, with practical implications for ISLA. Dirix et al. (2020) showed that when L2 is the medium of instruction and content learning is the goal, it takes learners longer to process and recall new information from reading, compared to reading in their L1; they concluded that supplementing academic L2 reading with L1 readings on the same topic would be beneficial for content learning. Elgort et al.'s (2018) study showed that unassisted incidental learning of L2 vocabulary from reading remains incomplete after tens of contextual encounters, suggesting that initial contextual word learning could be improved through the use of more deliberate learning techniques and strategies (Elgort, 2020).
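To make the early/late distinction concrete, the sketch below derives three common word-level measures from a simplified fixation record. The input format, a chronological list of (word index, fixation duration) pairs, and the function itself are hypothetical illustrations; real eye-tracking software computes these measures from much richer data.

# Each fixation: (index of fixated word, fixation duration in ms), in chronological order.
fixations = [(0, 210), (1, 180), (1, 95), (2, 250), (1, 130), (3, 220)]

def word_measures(fixations, target):
    """First-fixation duration and gaze duration (early measures), total reading
    time (late measure), and whether the word was regressed to after being left."""
    first_fix = None        # duration of the very first fixation on the word
    gaze = 0                # sum of fixations before the word is first left (first pass)
    total = 0               # sum of all fixations on the word
    in_first_pass = False
    first_pass_done = False
    regressed_to = False

    for idx, dur in fixations:
        if idx == target:
            total += dur
            if first_fix is None:
                first_fix = dur
                in_first_pass = True
            if in_first_pass and not first_pass_done:
                gaze += dur
            elif first_pass_done:
                regressed_to = True
        else:
            if in_first_pass:            # the eyes have left the target word
                first_pass_done = True
                in_first_pass = False
    return first_fix, gaze, total, regressed_to

print(word_measures(fixations, target=1))  # (180, 275, 405, True)

In this toy record, word 1 receives a first fixation of 180 ms, a gaze duration of 275 ms (two first-pass fixations), and a total reading time of 405 ms, including a 130 ms re-reading fixation after a regression, exactly the kind of late measure associated with meaning integration.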

3.2 In-situ research methods

L2 reading research outside the laboratory can be grouped into two main categories: classroom studies, where learner behavior is observed and data collected during classroom activities (e.g., Boutorwick et al., 2019), and independent reading research, where students may complete reading and associated tasks in their own time, often using digital tools and software. In-situ reading studies have been used to examine the effectiveness of different instructional approaches and strategies, the design of computer-assisted reading software, and the development of students' motivation to read.

In addition to the offline reading comprehension measures described in the laboratory research section above, in-situ studies use more naturalistic types of assessment, such as classroom observations, self-report, progress charts, activity records, and book club style activities, among others. For example, such qualitative techniques as think-aloud procedures or post-reading questionnaires may be used, in which readers reflect and report on their use of cognitive and metacognitive strategies (Ghavamnia et al., 2013; Lin & Yu, 2015).

An important area of investigation in in-situ reading research is extensive reading (Nation & Waring, 2019). One of the main topics investigated in extensive reading research is reader motivation and willingness to read (e.g., Hardy, 2016; Yamashita, 2013). Factors that may influence motivation include "the pleasure of reading, the reward of success in reading, the satisfaction of obvious progress, the virtuous feeling of doing something of value, and the power of independence and control" (Nation & Waring, 2019, p. 98). The effect of instructional approaches can be investigated using e-reading tools like Xreading (https://xreading.com) and MReader (https://mreader.org) that log student access and progress and require them to answer comprehension questions, providing data for ISLAR researchers (Huang, 2013; Nation & Waring, 2019). Another topic is the effect of extensive reading programs on reading comprehension, commonly measured using reading sections of standardized language proficiency tests (e.g., TOEFL, IELTS) or independently developed comprehension questions of the types described in the laboratory research methods section. The third important (but less researched) direction in extensive reading research is reading fluency improvement. When measuring reading speed gains, Kramer and McLean (2019) recommend not only considering word count but also calculating average word length in characters, i.e., estimating fluency gains using standard words per minute. Challenges in measuring reading fluency in extensive reading include choosing the right text length and difficulty for the target learner group (e.g., by matching learners' vocabulary size with the lexical profile of the text) and selecting and matching texts for pre- and post-tests (Nation & Waring, 2019, pp. 107–109). Nation and Waring recommend using longer texts (e.g., about 3 minutes instead of 1 minute long) and multiple measures, in order to increase reliability.
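The logic behind the standard-words-per-minute adjustment can be sketched as follows. The six-character standard word used here is an assumption for illustration (a convention in some reading-rate research), so readers should consult Kramer and McLean (2019) for the exact procedure.

def words_per_minute(word_count: int, seconds: float) -> float:
    """Raw reading rate: words read per minute, ignoring word length."""
    return word_count / (seconds / 60)

def standard_wpm(char_count: int, seconds: float, chars_per_standard_word: float = 6.0) -> float:
    """Length-adjusted rate: normalize by character count rather than raw word
    count, so that texts with longer words do not distort apparent gains.
    The six-character standard word is an illustrative convention, not
    necessarily Kramer and McLean's (2019) exact formula."""
    standard_words = char_count / chars_per_standard_word
    return standard_words / (seconds / 60)

# Two hypothetical 3-minute readings with the same raw word count (600 words)
# but different average word lengths:
print(words_per_minute(600, 180))     # 200.0 raw wpm for both texts
print(standard_wpm(600 * 4.2, 180))   # 140.0 standard wpm (short words, 4.2 chars/word)
print(standard_wpm(600 * 6.8, 180))   # ~226.7 standard wpm (longer words, 6.8 chars/word)

The comparison shows why the adjustment matters: two learners with identical raw words-per-minute scores can differ substantially once word length is taken into account.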




In the context of a large and fast-expanding volume of L2 reading research, meta-analyses offer a useful research paradigm that adds clarity to the findings across multiple individual studies (e.g., Hall et al., 2017; Klingner et al., 2006; Li et al., 2021). There are also growing calls for increased methodological rigor in meta-analysis research, which could include replications of previous meta-analyses (Boers et al., 2021; Jeon & Yamashita, 2022) and empirically testing their findings (Hamada, 2020). In a replication meta-analysis of extensive reading studies, Hamada (2020) first estimated the effect sizes applying strict study selection criteria to ensure equivalence between treatment and control groups. He found that the effect sizes reported in previous large-scale meta-analyses were likely overestimated. He then followed up with an empirical study with 224 Japanese learners of English, applying his proposed equivalency criterion (propensity score matching), and confirmed a smaller effect size of extensive reading on reading comprehension. Hamada's innovative research methodology highlights the importance of research design when conducting and interpreting meta-analyses – a critical lesson for future researchers.
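Propensity score matching, the equivalency technique Hamada applied, can be illustrated in a few lines: estimate each learner's probability of being in the treatment group from pre-treatment covariates, then compare only treated and control learners with similar estimated probabilities. The covariates, the simulated data, and the greedy one-to-one matching below are hypothetical simplifications, not Hamada's (2020) implementation.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
# Hypothetical pre-treatment covariates: proficiency score and pre-test reading score.
X = rng.normal(size=(n, 2))
# Hypothetical group labels (1 = extensive reading group, 0 = control),
# correlated with proficiency to mimic non-random group assignment.
treated = (X[:, 0] + rng.normal(scale=1.0, size=n) > 0).astype(int)

# 1. Estimate propensity scores: P(treatment | covariates).
ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

# 2. Greedy one-to-one nearest-neighbor matching on the propensity score.
treated_idx = np.where(treated == 1)[0]
control_idx = list(np.where(treated == 0)[0])
pairs = []
for i in treated_idx:
    j = min(control_idx, key=lambda c: abs(ps[i] - ps[c]))
    pairs.append((i, j))
    control_idx.remove(j)          # match each control learner at most once
    if not control_idx:
        break

# 3. Effect sizes are then computed on the matched sample only, so treatment
#    and control groups are comparable on the measured covariates.
print(f"{len(pairs)} matched pairs out of {treated.sum()} treated learners")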

3.3 Detailed examples from experts' research: Considerations for ISLAR

This section introduces two studies that illustrate how laboratory research design and methods have been realized in recent L2 reading studies. The focus of the detailed examples is on interdisciplinary research methods and approaches that may be less familiar to budding researchers but are gaining momentum in ISLAR research.

3.3.1 Learning to read words: An instructed artificial language study

Researchers in cognitive and developmental psychology and psycholinguistics often tackle topics that are also at the heart of SLA research, and reading is one of them. These reading studies offer useful experimental paradigms, statistical tools, and insights that can be adopted in ISLA research. Thus, it behooves L2 researchers to look beyond our own field and learn from research in adjacent disciplines.

Exemplar study: Rastle et al. (2021)

Goals and research questions
The question posed by Rastle and colleagues in this laboratory study is immediately relevant to ISLAR. The researchers compared outcomes of two learning approaches: (1) an instructed approach, in which explicit instructions about the novel writing system (i.e., spelling-to-sound and spelling-to-meaning regularities) were provided prior to experiential learning, and (2) an experience (or discovery) approach, in which no explicit instructions were provided.

Participants
48 adult native speakers of English learned to read words in two artificial languages with unfamiliar writing systems (regular and irregular).

Design and procedure
For one artificial language (24 words), the last (silent) symbol of each word was systematically associated with a semantic category; for the other language (24 words), this association was random. The learning method (explicit-instruction vs. discovery-learning) was manipulated between two groups of learners, while the semantic-marker factor was manipulated within participants.

The learning procedure, conducted over 10 days, was designed to ensure equivalency in the initial knowledge of the experimental (explicit-instruction) and control (discovery-learning) groups. In the pre-training phase, participants from both groups learned sound-meaning associations of the novel words. After this, the discovery-learning group completed nine orthography training sessions consisting of four tasks: reading the words aloud, saying the meaning of the visually presented words, orthographic search (picture-word matching), and meaning judgement (a multiple-choice task), in which participants selected the correct orthographic form for a given meaning description. For the explicit-instruction group, the first orthography training session was replaced by a session that included an explicit explanation of orthography-phonology mapping, phonics training (orthography-phonology mapping practice with feedback), and symbol-picture matching (orthography-meaning mapping practice with feedback). The eight remaining training sessions for this group were the same as for the discovery-learning group. This design ensured identical prior knowledge and training procedures for the two groups, bar the instructional intervention for the explicit-instruction group. Such tight control over the learning conditions is impossible in in-situ settings.

Measures
Six post-tests were conducted for both groups on day 10 in the following order: reading aloud, saying the meaning, nonword reading aloud (24 untrained novel words for each language), recognition memory, auditory-semantic matching, and semantic generalization (identifying the semantic category of untrained novel words, 24 in each language). The use of multiple outcome measures offered a more robust and fine-grained way of testing the knowledge gained from the learning phase. Both words encountered in the learning phase and novel (untrained) items from the same artificial languages were included in the post-tests. This allowed the researchers to test not only the retention of trained words but also learners' ability to generalize spelling-sound and spelling-meaning regularities to new items – a critical skill for language learners, who commonly encounter both familiar and unfamiliar words in L2 reading. Beyond response accuracy on post-tests, the researchers charted the learning trajectory for the two groups using accuracy of responses in each training session during the learning phase, revealing the time-course dynamics for the two instructional conditions and language types.

Data analysis
The use of generalized mixed-effect models (Jaeger, 2008) in the analysis of response accuracy data for each test accounted for the inherent variability of learning across individual participants (by including a random factor for participants in each model).

Results
The researchers found a clear advantage of the explicit-instruction-plus-experience approach over the discovery-learning approach on learners' ability to generalize underlying regularities, but this advantage was less prominent in the learning of individual words. Rastle et al. concluded that the study evidenced "the dramatic impact of teaching method on outcomes during reading acquisition" (p. 1) and recommended that explicit instruction on spelling-sound and spelling-meaning regularities be used "to ensure that all learners acquire knowledge of important underlying regularities within the writing system" (Rastle et al., 2021, p. 13).
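The generalized mixed-effects analysis described under Data analysis can be written out explicitly. One plausible specification for per-trial response accuracy (a simplified illustration, not necessarily Rastle et al.'s exact model) is a logistic mixed model with a by-participant random intercept:

\[ \mathrm{logit}\, P(\mathrm{correct}_{ij} = 1) = \beta_0 + \beta_1\,\mathrm{Instruction}_j + \beta_2\,\mathrm{Regularity}_i + \beta_3\,(\mathrm{Instruction}_j \times \mathrm{Regularity}_i) + u_j, \quad u_j \sim N(0, \sigma_u^2) \]

where i indexes items and j indexes participants. The random intercept u_j captures the learner-to-learner variability mentioned in the text; adding a by-item random intercept is a common extension of such models.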




This study is an excellent example of laboratory simulation learning research. The use of two artificial languages, designed to reflect two levels of regularity in form-meaning mapping (i.e., systematic/random), made it easier to investigate the effect of explicit instruction on these two L2 types. Using artificial languages also made it easier to control learners' prior knowledge of the language, as all participants were equally unfamiliar with the archaic scripts used to create the learning targets. In addition to controlling the participants' target language proficiency and item characteristics, the researchers controlled the number, nature, and regularity (lag) of the training sessions. These variables are often difficult (if not impossible) to perfectly match in natural languages and real learning settings. Thus, the study design afforded a high degree of confidence that the observed effects were due to the primary-interest predictors – learning treatment (with/without explicit instructions) and regularity of form-meaning mapping (systematic/random).

3.3.2 Learning to read sentences: A self-paced reading study

Next, I consider an L2 reading study that used a real-time processing measure from psycholinguistics that is increasingly adopted in ISLA research: self-paced reading (SPR).

Exemplar study: Mulder et al. (2021)

Goals and research questions
Two research questions were posed: (1) How do word-to-text integration (WTI) processes change over time, after controlling for word frequency and students' decoding fluency, gender, and age? (2) How does WTI relate to reading comprehension? WTI is concerned with retrieving contextually appropriate meanings of words, integrating them into the meaning of the text, and updating the situation model, in real time (Perfetti & Stafura, 2014), bridging word-level and higher-order processes in reading comprehension (Perfetti, 2007). L2 readers may have difficulties in WTI due to less-than-optimal quality of word knowledge, insufficient experience with L2 syntactic structures, and problems with suppressing or inhibiting irrelevant and out-of-date information when updating the mental model of the text.

Participants
7th grade Dutch learners of English (11–13-year-olds)

Design
Mulder et al. examined how three WTI text manipulations (anaphora resolution, argument overlap, and anomaly detection) in complex and simple (baseline) passages affected L2 reading. A longitudinal study design was adopted, with results from the Fall (T1) and Spring (T2) teaching terms compared to ascertain the effect of instruction on the development of WTI in L2 reading.


Measures
Reading times on the interest areas, measured in an SPR task, were used as the main outcome variable. SPR has been used to study real-time word and phrase integration into L2 sentence and text representation (see Marsden et al., 2018, for a methodological synthesis). In SPR, initially all words in a reading text are masked (presented as dashes); the first button-press by the reader displays the first word, the next button-press hides it and displays the next word, and so on. The duration of each display is recorded, affording a measure of reading (dis)fluency. Slower reading times are predicted for interest areas containing syntactic or semantic ambiguities or anomalies. Mulder et al. tested the effect of syntactic and semantic WTI manipulations on reading times on the target word and the spillover region (the two words following it). The researchers also tested whether reading comprehension (measured by a standardized test) could be predicted by WTI.

Results
Mulder et al. found that the average reading speed of the L2 readers improved in T2, compared to T1, suggesting an overall positive effect of secondary school English language instruction on reading fluency. Importantly, the researchers also found a relationship between reading times for two types of WTI manipulations and participants' reading comprehension. L2 readers who showed larger processing costs (longer reading times) for complex compared to simple argument overlap and anomaly passages achieved better reading comprehension.
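The moving-window procedure described under Measures can be approximated with a few lines of console code. This is a toy sketch of the paradigm's logic (word-by-word display with per-word reading times), not the software used by Mulder et al.; real SPR experiments are run in dedicated experiment software with precise timing and calibrated displays.

import time

def self_paced_reading(sentence: str) -> list[tuple[str, float]]:
    """Moving-window SPR: all words are masked as dashes; each key press
    reveals the next word, and the time spent on each word is recorded (ms)."""
    words = sentence.split()
    times = []
    for i, word in enumerate(words):
        # Display the current word in position, all other words masked.
        display = " ".join(w if j == i else "-" * len(w) for j, w in enumerate(words))
        start = time.monotonic()
        input(display + "  [press Enter]")   # button press advances the display
        times.append((word, (time.monotonic() - start) * 1000))
    return times

for word, rt in self_paced_reading("The bridge the engineer inspected had collapsed"):
    print(f"{word:>10}: {rt:6.0f} ms")

In an actual study, elevated times on a critical word and its spillover region (relative to a baseline condition) would be taken as evidence of processing difficulty, which is the logic behind Mulder et al.'s WTI manipulations.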

By selecting an appropriate data elicitation method, SPR, Mulder et al. (2021) constructed a nuanced representation of changes in L2 readers' sensitivity to different anomalies over time. The use of reading times as a measure of longitudinal changes in L2 WTI afforded a more detailed understanding of the intricate relationship between reading speed at points of semantic and syntactic ambiguity and reading comprehension, showing that faster is not always better.

4. Advice for future reading researchers

While acknowledging the important advantages of the innovative interdisciplinary methods described above, being aware of their limitations informs design decisions. Early-career ISLAR researchers can benefit from carefully considering the limitations of published studies because they provide excellent ideas for future research. Let us consider how such limitations can be used to this end.

Although the use of artificial languages has many advantages (see above), artificial languages are often created to exemplify a particular structure or relationship in a consistent way. Thus, using an artificial language is helpful in hypothesis testing, but direct generalizations from the findings of artificial language studies to learning to read in a natural L2 are not straightforward. Therefore, in-situ ISLAR studies are useful in extending and adding ecological validity to the findings of highly controlled experimental studies.

The importance of choosing appropriate measures to test the research hypotheses of interest cannot be overestimated. Every research method has pros and cons. Let us consider SPR. The SPR procedure is straightforward and easy to create and administer, even in a school computer lab. SPR participants do not need to make explicit decisions about target items (unlike most experimental response-time tasks), which reduces the likelihood of their adopting task-related strategies and leads to more automatic processing. However, SPR obscures some component processes in reading, including word skipping, previewing, and regressions to previous words, and there is no way of distinguishing between early and late reading processes. For some studies (such as Mulder et al., 2021) these limitations are less problematic, but for other studies (e.g., those investigating predictive processing in reading or global and local context effects on WTI), SPR data would not be a fine-grained enough measure, and other methods, such as eye-tracking or ERPs, need to be considered. In summary, an experimental paradigm is only as good as its affordances to generate the right type of evidence for the research question posed.

To obtain fine-grained insights into L2 reading, researchers increasingly turn to eye-tracking. Indeed, recording eye-movements during reading is one of the most time-course-sensitive techniques (another is recording neural activation during reading using ERPs, but it is not covered in this chapter). However, ISLAR researchers also need to be careful not to overinterpret results of oculomotor behavior, since there is little consistency in how eye-movement measures correlate with different standardized reading comprehension tests (e.g., Mézière et al., 2021). Also, although researchers agree that eye-movements reflect cognitive processes in reading, one-to-one mapping of eye-movement measures to specific component processes is not an exact science. Research aiming to clarify this relationship in L1 reading is only recently starting to appear (Southwell et al., 2020), while L2 research is yet to emerge. Southwell et al. (2020) were able to generate accurate predictions of L1 readers' comprehension scores (on multiple-choice comprehension questions) from global eye-movement features, which generalized across studies. This pioneering research holds promise that eye-movement data may be used alongside (or even instead of) traditional tests of reading comprehension in the future. SLA researchers can make an important contribution to this exciting line of research by recording eye-movements during the reading of texts of different lengths and genres by L2 readers of different proficiencies and L1s, and by making their data freely available via open research repositories, such as the Open Science Framework (see Kuperman et al., 2022).
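To make the logic of such prediction studies concrete, the sketch below simulates the approach with scikit-learn: global (text-level) eye-movement features are used to predict comprehension scores, and cross-validation stands in for the across-study generalization tests. This is a toy illustration in the spirit of Southwell et al. (2020), not their pipeline; the feature set, the simulated data, and the assumed negative link between regression rate and comprehension are all invented.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(seed=1)
n_readers = 60

# Hypothetical global eye-movement features, one row per reader:
# mean fixation duration (ms), fixations per word, regression rate,
# and mean saccade length (characters).
X = np.column_stack([
    rng.normal(220, 30, n_readers),
    rng.normal(1.2, 0.2, n_readers),
    rng.normal(0.15, 0.05, n_readers),
    rng.normal(7.5, 1.0, n_readers),
])

# Simulated comprehension scores (proportion correct), built on the toy
# assumption that more regressive readers comprehend less.
y = np.clip(0.9 - 1.5 * X[:, 2] + rng.normal(0, 0.05, n_readers), 0, 1)

model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
# Five-fold cross-validation approximates testing on unseen readers; a
# true replication would train on one study's data and test on another's.
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(f"Mean cross-validated R^2: {scores.mean():.2f}")
```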


5. ISLAR research methodological guide

In planning ISLAR studies, the following key methodological decisions need to be made:

(1) Type of study:
a. learning to read or reading to learn;
b. experimental laboratory studies: to test effects of primary-interest predictors on specific reading components, controlling for confounding effects of other variables (e.g., recording eye-movements to test the effect of global and local context on word-to-text integration in reading, or using priming to study the nature of L2 semantic or morphological processing);
c. quasi-experimental in-situ designs: when the goal is to investigate reading in ecologically accurate learning environments, either in the classroom or in independent learning (e.g., observational studies, post-reading interviews, think-aloud techniques, motivation surveys).

(2) Type of instruction:
a. explicit language-focused or skill-focused instruction (e.g., grapheme-phoneme correspondence rules; grammar rules; pre-teaching vocabulary; instruction on using dictionaries and vocabulary notebooks for contextual word learning; speed-reading);
b. meaning-focused reading instruction (e.g., pre- and during-reading strategies, such as activating background knowledge, using graphic organizers);
c. implicit reading instruction, developing tasks for discovery learning (e.g., increasing the frequency or salience of target linguistic features, words, phrases, and structures in reading materials).

(3) Materials:
a. real L2 items (cognates/non-cognates; high/low frequency vocabulary; single words/multi-word expressions; ambiguous/unambiguous language; anomalous/normal syntactic structures);
b. artificial language (e.g., acquisition of vocabulary and morphosyntax of an artificial language through reading);
c. matching reading materials to learner profiles (e.g., L2 proficiency, vocabulary size) and learning goals (e.g., extensive/intensive reading; reading fluency/comprehension accuracy).

(4) Measures:
a. standardized published tests (e.g., L2 reading and listening comprehension, word reading efficiency, vocabulary size);
b. assessment of cognitive and verbal skills using psychometric tests and surveys developed by educational psychologists and psycholinguists;
c. type of outcome variables (e.g., accuracy of comprehension; reading fluency);
d. incremental learning measures, post-tests, or both.




6. Troubleshooting ISLAR research

Common question 1: How do I ensure participants engage with a reading task in the same way?
Ensuring that participants understand and closely follow the learning procedure is a key requirement of ISLAR studies. This is easier to control in laboratory experiments than in in-situ studies, where data may be collected over multiple learning sessions and in varying contexts. Digital tools that automatically log students' time-on-task, actions, and progress are helpful in monitoring and fine-tuning learning behavior, thereby reducing participant and data loss. Alternatively, participants could be instructed to keep a reading/learning log and share it with the researcher.

Common question 2: How do I maintain the integrity of ISLAR experimental conditions?
When designing comparative ISLAR experiments (e.g., comparing different instructional approaches), it is important to maintain the integrity of the experimental conditions. If the goal is to compare the effectiveness of discovery learning and explicit instruction, neither condition should be privileged a priori in terms of learning opportunities or outcome measures. For example, discovery learning of spelling-meaning regularities should expose learners to novel word-forms in contexts where learning is supported through contextual inferences and co-occurrence with known words. In measuring the learning outcomes of explicit and implicit instruction, learners' ability to generalize newly acquired knowledge to unfamiliar items should be tested using both online and offline measures. This is because offline measures may privilege explicit instruction, as participants have time to access their declarative knowledge about phenomena that have been explicitly taught. Conversely, real-time reading comprehension measures may privilege implicit instruction, because it is more likely than explicit instruction to result in implicit, procedural knowledge.

Common question 3: How do I determine the effect of reading instruction?
When measuring the effect of reading instruction, researchers should consider using both immediate and delayed post-tests. This is because ISLA research has shown that immediate uptake, consolidation, and retention of knowledge and skills can be affected differently by explicit and implicit instruction. The effect of explicit instruction, for instance, may be short-lived compared to that of implicit instruction and discovery learning.

Common question 4: How can I best capture the development of reading knowledge and processing?
Finally, because reading in another language is a complex task involving a whole gamut of knowledge components, processes, and subskills that may be acquired to different extents by individual learners, using multiple measures of knowledge and processing offers a more complete and nuanced picture of the effects of instructional approaches and covariates on the outcome measures (Kuperman et al., 2022).

7. Conclusions

I started with an overview of reading as a complex multidimensional process that presents similar and distinct acquisition challenges to L1 and L2 learners. In light of empirical evidence that knowledge of regularities in the L2 writing system, orthography-to-phonology mapping, vocabulary, and grammar is necessary for higher-order cognitive processes of sense-making, it seems reasonable that ISLAR research should be concerned with devising effective instructional approaches that facilitate the development of this foundational knowledge and fluent access to it in reading. To this end, instruction on how to learn the target language from written L2 input needs to be continuously fine-tuned and tested.

A close look at data elicitation methods used in studies of learning to read and learning from reading showcased selected measures used to ascertain the effect of instructional approaches on outcome variables associated with component knowledge and processes in reading comprehension. Importantly, because reading is a critical source of L2 knowledge, time-course-sensitive measures that show learning in progress are critical to understanding not only whether, but how, instructional approaches affect learning.

I conclude with a further word of advice to future L2 reading researchers. When planning and designing ISLAR research, ask yourself: What component knowledge and skills are best taught directly, and what needs to be learned indirectly, in order to facilitate L2 reading development? Many interesting research questions can be formulated in both of these research directions. For instance, researchers may investigate the effect of teaching different aspects of vocabulary, grammar, and writing-system regularities on reading fluency. More research is needed on the effect of the ease/difficulty of reading materials on learning to read in an L2, and of different support materials (e.g., dictionaries, glosses, illustrations, note-taking tools) on L2 reading comprehension. Further research is also needed on how we learn from reading, using research approaches and methods that record knowledge gains and processing changes in real time, such as eye-tracking. These data should be shared, where possible, with the wider SLA research community, to support the development of models of L2 reading. Lessons from previous meta-analyses of L2 reading studies underscore the importance of well-defined inclusion criteria based on the quality of research design. Finally, ISLAR research would do well to maintain a balance between research that is immediately relevant and translatable into L2 teaching and research that uses experimental laboratory paradigms from cognitive psychology and psycholinguistics (e.g., priming, eye-movements, and ERPs) to study L2 reading comprehension, which contributes to L2 teaching in less direct but no less important ways.

8. Further reading and additional resources

8.1 Book-length volumes

Godfroid, A. (2020). Eye tracking in second language acquisition and bilingualism: A research synthesis and methodological guide. Routledge. https://doi.org/10.4324/9781315775616
A methodological guide for designing, conducting, and interpreting eye-tracking research in L2 studies.

Grabe, W. (2009). Reading in a second language. Cambridge University Press.
A detailed research-based overview of component knowledge and processes in L2 reading and their implications for ISLAR.

Nation, I. S. P., & Waring, R. (2019). Teaching extensive reading in another language. Routledge. https://doi.org/10.4324/9780367809256
A comprehensive review of research and practice of extensive reading.

8.2 Suggested journals, professional organizations, and conferences

Society for the Scientific Study of Reading (Annual Conference); Scientific Studies of Reading (Taylor & Francis Ltd.) https://www.tandfonline.com/journals/hssr20
Reading in a Foreign Language (an open access journal published by the National Foreign Language Resource Center at the University of Hawai'i at Mānoa) https://nflrc.hawaii.edu/rfl/
Reading and Writing: An Interdisciplinary Journal (Springer) https://www.springer.com/journal/11145/

References

Alderson, J. C. (1984). Reading in a foreign language: A reading problem or a language problem? In J. C. Alderson & A. H. Urquhart (Eds.), Reading in a foreign language (pp. 1–27). Longman.
Alderson, J. C. (2000). Assessing reading. Cambridge University Press. https://doi.org/10.1017/CBO9780511732935
Bernhardt, E. B. (2011). Understanding advanced second-language reading. Routledge.
Bernhardt, E. B., & Kamil, M. L. (1995). Interpreting relationships between L1 and L2 reading: Consolidating the linguistic threshold and the linguistic interdependence hypotheses. Applied Linguistics, 16(1), 15–34. https://doi.org/10.1093/applin/16.1.15


Boers, F., Bryfonski, L., Faez, F., & McKay, T. (2021). A call for cautious interpretation of meta-analytic reviews. Studies in Second Language Acquisition, 43(1), 2–24. https://doi.org/10.1017/S0272263120000327
Boutorwick, T. J., Macalister, J., & Elgort, I. (2019). Two approaches to extensive reading and their effects on L2 vocabulary development. Reading in a Foreign Language, 31(2), 150–172.
Brice, H., Siegelman, N., Van den Bunt, M., Frost, S., Rueckl, J., Pugh, K., & Frost, R. (2022). Individual differences in L2 literacy acquisition: Predicting reading skill from sensitivity to regularities between orthography, phonology, and semantics. Studies in Second Language Acquisition, 44(3), 737–758. https://doi.org/10.1017/S0272263121000528
Castles, A., Rastle, K., & Nation, K. (2018). Ending the reading wars: Reading acquisition from novice to expert. Psychological Science in the Public Interest, 19(1), 5–51. https://doi.org/10.1177/1529100618772271
Cobb, T. (2007). Computing the vocabulary demands of L2 reading. Language Learning & Technology, 11(3), 38–63.
Cummins, J. (1979). Linguistic interdependence and the educational development of bilingual children. Review of Educational Research, 49(2), 222–251. https://doi.org/10.3102/00346543049002222
Day, R., & Park, J. (2005). Developing reading comprehension questions. Reading in a Foreign Language, 17(1), 60–73.
Dirix, N., Vander Beken, H., De Bruyne, E., Brysbaert, M., & Duyck, W. (2020). Reading text when studying in a second language: An eye-tracking study. Reading Research Quarterly, 55(3), 371–397. https://doi.org/10.1002/rrq.277
Ehri, L. C., Nunes, S. R., Stahl, S. A., & Willows, D. M. (2001). Systematic phonics instruction helps students learn to read: Evidence from the National Reading Panel's meta-analysis. Review of Educational Research, 71(3), 393–447. https://doi.org/10.3102/00346543071003393
Elgort, I. (2020). Building vocabulary knowledge from and for reading – ways to improve lexical quality. In J. Clenton & P. Booth (Eds.), Vocabulary and the four skills (pp. 114–118). Routledge. https://doi.org/10.4324/9780429285400-12
Elgort, I., Beliaeva, N., & Boers, F. (2020). Trial-and-error and errorless treatments in contextual word learning. Studies in Second Language Acquisition, 42(1), 7–32. https://doi.org/10.1017/S0272263119000561
Elgort, I., Brysbaert, M., Stevens, M., & Van Assche, E. (2018). Contextual word learning during reading in a second language: An eye-movement study. Studies in Second Language Acquisition, 40(2), 341–366. https://doi.org/10.1017/S0272263117000109
Elgort, I., & Warren, P. (2014). L2 vocabulary learning from reading: Explicit and tacit lexical knowledge and the role of learner and item variables. Language Learning, 64(2), 365–414. https://doi.org/10.1111/lang.12052
Elgort, I., & Warren, P. (in press). Studying L2 comprehension. In A. Godfroid & H. Hopp (Eds.), The Routledge handbook of second language acquisition and psycholinguistics. Routledge.
Ellis, N. C. (2015). Implicit AND explicit learning: Their dynamic interface and complexity. In P. Rebuschat (Ed.), Implicit and explicit learning of languages (pp. 1–24). John Benjamins. https://doi.org/10.1075/sibil.48.01ell
Frenck-Mestre, C., & Pynte, J. (1997). Syntactic ambiguity resolution while reading in a second and native language. Quarterly Journal of Experimental Psychology A, 50(1), 119–148. https://doi.org/10.1080/027249897392251
Gerrig, R. J., & McKoon, G. (1998). The readiness is all: The functionality of memory-based text processing. Discourse Processes, 26(2–3), 67–86. https://doi.org/10.1080/01638539809545039




Ghavamnia, M., Ketabi, S., & Tavakoli, M. (2013). L2 reading strategies used by Iranian EFL learners: A think-aloud study. Reading Psychology, 34(4), 355–378. https://doi.org/10.1080/02702711.2011.640097
Godfroid, A., Ahn, J., Choi, I., Ballard, L., Cui, Y., Johnston, S., Lee, S., Sarkar, A., & Yoon, H.-J. (2018). Incremental vocabulary learning in a natural reading context: An eye-tracking study. Bilingualism: Language & Cognition, 21(3), 563–584. https://doi.org/10.1017/S1366728917000219
Godfroid, A., Boers, F., & Housen, A. (2013). An eye for words: Gauging the role of attention in incidental L2 vocabulary acquisition by means of eye tracking. Studies in Second Language Acquisition, 35(3), 483–517. https://doi.org/10.1017/S0272263113000119
Goring, S. A., Schmank, C. J., Kane, M. J., & Conway, A. R. (2021). Psychometric models of individual differences in reading comprehension: A reanalysis of Freed, Hamilton, and Long (2017). Journal of Memory and Language, 119, 104221. https://doi.org/10.1016/j.jml.2021.104221
Gough, P. B., & Tunmer, W. E. (1986). Decoding, reading, and reading disability. Remedial and Special Education, 7(1), 6–10. https://doi.org/10.1177/074193258600700104
Grabe, W. (1991). Current developments in second language reading research. TESOL Quarterly, 25(3), 375–406. https://doi.org/10.2307/3586977
Grabe, W. (2014). Key issues in L2 reading development. In X. Deng & R. Seow (Eds.), 4th CELC Symposium Proceedings (pp. 8–18). National University of Singapore.
Grabe, W., & Stoller, F. L. (2018). Building an effective reading curriculum: Guiding principles. In J. M. Newton, D. R. Ferris, C. M. Goh, W. Grabe, F. L. Stoller, & L. Vandergrift (Eds.), Teaching English to second language learners in academic contexts: Reading, writing, listening, and speaking (pp. 28–47). Routledge. https://doi.org/10.4324/9781315626949-4
Graesser, A. C., Millis, K. K., & Zwaan, R. A. (1997). Discourse comprehension. Annual Review of Psychology, 48(1), 163–189. https://doi.org/10.1146/annurev.psych.48.1.163
Hall, C., Roberts, G. J., Cho, E., McCulley, L. V., Carroll, M., & Vaughn, S. (2017). Reading instruction for English learners in the middle grades: A meta-analysis. Educational Psychology Review, 29(4), 763–794. https://doi.org/10.1007/s10648-016-9372-4
Hamada, A. (2020). Using meta-analysis and propensity score methods to assess treatment effects toward evidence-based practice in extensive reading. Frontiers in Psychology, 11, 617. https://doi.org/10.3389/fpsyg.2020.00617
Hamada, M., & Koda, K. (2008). Influence of first language orthographic experience on second language decoding and word learning. Language Learning, 58(1), 1–31. https://doi.org/10.1111/j.1467-9922.2007.00433.x
Han, Z., & Anderson, N. J. (2009). Second language reading research and instruction: Crossing the boundaries. University of Michigan Press.
Hardy, J. (2016). The effects of a short-term extensive reading course in Spanish. Journal of Extensive Reading, 4, 47–68.
Hoover, W. A., & Gough, P. B. (1990). The simple view of reading. Reading and Writing: An Interdisciplinary Journal, 2(2), 127–160. https://doi.org/10.1007/BF00401799
Hu, M., & Nation, I. S. P. (2000). Vocabulary density and reading comprehension. Reading in a Foreign Language, 23, 403–430.
Huang, H. (2013). E-reading and e-discussion: EFL learners' perceptions of an e-book reading program. Computer Assisted Language Learning, 26(3), 258–281. https://doi.org/10.1080/09588221.2012.656313


Jaeger, T. F. (2008). Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models. Journal of Memory and Language, 59(4), 434–446. https://doi.org/10.1016/j.jml.2007.11.007
Jeon, E. H., & Yamashita, J. (2022). L2 reading comprehension and its correlates: An updated meta-analysis. In E. H. Jeon & Y. In'nami (Eds.), Understanding L2 proficiency: Theoretical and meta-analytic investigations (pp. 29–86). John Benjamins. https://doi.org/10.1075/bpa.13
Katz, L., & Frost, R. (1992). Reading in different orthographies: The orthographic depth hypothesis. In R. Frost & L. Katz (Eds.), Orthography, phonology, morphology, and meaning (pp. 67–84). North Holland. https://doi.org/10.1016/S0166-4115(08)62789-2
Kim, Y. (2017). Why the simple view of reading is not simplistic: Unpacking component skills of reading using a direct and indirect model of reading (DIER). Scientific Studies of Reading, 21(4), 310–333. https://doi.org/10.1080/10888438.2017.1291643
Kintsch, W. (1998). Comprehension: A paradigm for cognition. Lawrence Erlbaum Associates.
Klingner, J. K., Artiles, A. J., & Barletta, L. M. (2006). English language learners who struggle with reading. Journal of Learning Disabilities, 39(2), 108–128. https://doi.org/10.1177/00222194060390020101
Koda, K. (2005). Insights into second language reading. Cambridge University Press. https://doi.org/10.1017/CBO9781139524841
Kramer, B., & McLean, S. (2019). L2 reading rate and word length: The necessity of character-based measurement. Reading in a Foreign Language, 31(2), 201–225.
Kuperman, V., Siegelman, N., Schroeder, S., Acartürk, C., Alexeeva, S., Amenta, S., Bertram, R., Bonandrini, R., Brysbaert, M., Chernova, D., Da Fonseca, S. M., Dirix, N., Duyck, W., Fella, A., Frost, R., Gattei, C., Kalaitzi, A., Lõo, K., Marelli, M., … Usal, K. A. (2022). Text reading in English as a second language: Evidence from the multilingual eye-movements corpus. Studies in Second Language Acquisition. Advance online publication. https://doi.org/10.1017/S0272263121000954
Kuperman, V., & Van Dyke, J. A. (2011). Effects of individual differences in verbal skills on eye-movement patterns during sentence reading. Journal of Memory and Language, 65(1), 42–73. https://doi.org/10.1016/j.jml.2011.03.002
Laufer, B. (1989). What percentage of text-lexis is essential for comprehension? In C. Laurén & M. Nordman (Eds.), Special language: From humans thinking to thinking machines (pp. 316–323). Multilingual Matters.
Laufer, B. (2003). Vocabulary acquisition in a second language: Do learners really acquire most vocabulary by reading? Some empirical evidence. The Canadian Modern Language Review, 59(4), 567–587. https://doi.org/10.3138/cmlr.59.4.567
Laufer, B. (2009). Second language vocabulary acquisition from language input and from form-focused activities. Language Teaching, 42(3), 341–354. https://doi.org/10.1017/S0261444809005771
Li, J.-T., Tong, F., Irby, B. J., Lara-Alecio, R., & Rivera, H. (2021). The effects of four instructional strategies on English learners' English reading comprehension: A meta-analysis. Language Teaching Research. Advance online publication. https://doi.org/10.1177/1362168821994133
Lin, L.-C., & Yu, W.-Y. (2015). A think-aloud study of strategy use by EFL college readers reading Chinese and English texts. Journal of Research in Reading, 38(3), 286–306. https://doi.org/10.1111/1467-9817.12012
Marsden, E., Thompson, S., & Plonsky, L. (2018). A methodological synthesis of self-paced reading in second language research. Applied Psycholinguistics, 39(5), 861–904. https://doi.org/10.1017/S0142716418000036




Mézière, D., Yu, L., Reichle, E., von der Malsburg, T., & McArthur, G. (2021). Using eye-movements to predict performance on reading comprehension tests. 34th Annual CUNY Conference on Human Sentence Processing. Retrieved on 2 June 2022 from https://www.cuny2021.io/wp-content/uploads/2021/02/CUNY_2021_abstract_92.pdf
Mulder, E., van de Ven, M., Segers, E., Krepel, A., de Bree, E. H., de Jong, P. F., & Verhoeven, L. (2021). Word-to-text integration in English as a second language reading comprehension. Reading & Writing, 34(4), 1049–1087. https://doi.org/10.1007/s11145-020-10097-3
Nation, P. (2006). How large a vocabulary is needed for reading and listening? The Canadian Modern Language Review, 63(1), 59–82. https://doi.org/10.3138/cmlr.63.1.59
Nation, P. (2007). The four strands. Innovation in Language Learning and Teaching, 1(1), 1–12. https://doi.org/10.2167/illt039.0
Nation, P. (2009). Teaching ESL/EFL reading and writing (Vol. 10). Routledge.
Nation, P. (2012). What does every ESL teacher need to know? Compass Media.
Pellicer-Sánchez, A. (2016). Incidental L2 vocabulary acquisition from and while reading: An eye-tracking study. Studies in Second Language Acquisition, 38(1), 97–130. https://doi.org/10.1017/S0272263115000224
Perfetti, C. A. (2007). Reading ability: Lexical quality to comprehension. Scientific Studies of Reading, 11(4), 357–383. https://doi.org/10.1080/10888430701530730
Perfetti, C., & Stafura, J. (2014). Word knowledge in a theory of reading comprehension. Scientific Studies of Reading, 18(1), 22–37. https://doi.org/10.1080/10888438.2013.827687
Pulido, D. (2007). The effects of topic familiarity and passage sight vocabulary on L2 lexical inferencing and retention through reading. Applied Linguistics, 28(1), 66–86. https://doi.org/10.1093/applin/aml049
Rastle, K., Lally, C., Davis, M. H., & Taylor, J. S. H. (2021). The dramatic impact of explicit instruction on learning to read in a new writing system. Psychological Science, 32(4), 471–484. https://doi.org/10.1177/0956797620968790
Rayner, K. (2009). Eye-movements and attention in reading, scene perception, and visual search. The Quarterly Journal of Experimental Psychology, 62(8), 1457–1506. https://doi.org/10.1080/17470210902816461
Rebuschat, P., & Williams, J. N. (Eds.). (2012). Statistical learning and language acquisition. Mouton de Gruyter.
Ricketts, J., Lervåg, A., Dawson, N., Taylor, L. A., & Hulme, C. (2020). Reading and oral vocabulary development in early adolescence. Scientific Studies of Reading, 24(5), 380–396. https://doi.org/10.1080/10888438.2019.1689244
Sato, M., & Loewen, S. (2019). Towards evidence-based second language pedagogy: Research proposals and pedagogical recommendations. In M. Sato & S. Loewen (Eds.), Evidence-based second language pedagogy: A collection of Instructed Second Language Acquisition studies (pp. 1–24). Routledge. https://doi.org/10.4324/9781351190558-1
Southwell, R., Gregg, J., Bixler, R., & D'Mello, S. K. (2020). What eye-movements reveal about later comprehension of long-connected texts. Cognitive Science, 44(10), e12905. https://doi.org/10.1111/cogs.12905
Torgerson, C., Brooks, G., & Hall, J. (2006). A systematic review of the research literature on the use of phonics in the teaching of reading and spelling (Research Report RR711). U.K. Department for Education and Skills. Retrieved on 2 June 2022 from http://wsassets.s3.amazonaws.com/ws/nso/pdf/f211f7a331f7a210fa7b64d0855f2afc.pdf


van Hell, J. G., & Dijkstra, T. (2002). Foreign language knowledge can influence native language performance in exclusively native contexts. Psychonomic Bulletin & Review, 9(4), 780–789. https://doi.org/10.3758/BF03196335
van Heuven, W. J. B., Schriefers, H., Dijkstra, T., & Hagoort, P. (2008). Language conflict in the bilingual brain. Cerebral Cortex, 18(11), 2706–2716. https://doi.org/10.1093/cercor/bhn030
Van der Schoot, M., Vasbinder, A. L., Horsley, T. M., Reijntjes, A., & van Lieshout, E. C. (2009). Lexical ambiguity resolution in good and poor comprehenders: An eye fixation and self-paced reading study in primary school children. Journal of Educational Psychology, 101(1), 21–36. https://doi.org/10.1037/a0013382
Vander Beken, H., & Brysbaert, M. (2018). Studying texts in a second language: The importance of test type. Bilingualism: Language and Cognition, 21(5), 1062–1074. https://doi.org/10.1017/S1366728917000189
VanPatten, B. (1996). Input processing and grammar instruction in second language acquisition. Ablex.
Verhoeven, L., & Perfetti, C. A. (Eds.). (2017). Learning to read across languages and writing systems. Cambridge University Press. https://doi.org/10.1017/9781316155752
Verhoeven, L., & van Leeuwe, J. (2012). The simple view of second language reading throughout the primary grades. Reading and Writing, 25(8), 1805–1818. https://doi.org/10.1007/s11145-011-9346-3
Yamashita, J. (2002). Mutual compensation between L1 reading ability and L2 language proficiency in L2 reading comprehension. Journal of Research in Reading, 25(1), 81–95. https://doi.org/10.1111/1467-9817.00160
Yamashita, J. (2013). Effects of extensive reading on reading attitudes in a foreign language. Reading in a Foreign Language, 25(2), 248–263.

Chapter 13

Writing
Researching L2 writing as a site for learning in instructed settings

Ronald P. Leow, Rosa M. Manchón, and Charlene Polio

Georgetown University / Universidad de Murcia / Michigan State University

This chapter first provides a brief overview of the current state of L2 writing research conducted in or relevant for L2 learning programs in three areas: (1) L2 learning processes and outcomes associated with instructional interventions via the manipulation of tasks; (2) L2 learning processes and outcomes of classroom learners' processing of written corrective feedback; and (3) written L2 development in instructed settings. We then analyze key methodological considerations associated with empirical research in these strands, together with a discussion of ways of assessing and interpreting the results from the perspective of language learning. In the final part we discuss methodological challenges facing future empirical investigations in each of the three areas identified and provide suggestions for addressing these challenges.

Keywords: L2 writing, L2 writing tasks, feedback, written corrective feedback, written language development, CAF measures

1. What is writing, and why is it important in ISLA?

L2 writing is one of the major components of most language curricula, given its potential for communicative language use and associated learning processes. Given this importance, a recent call has been made to conduct L2 writing research from an instructed second language acquisition (ISLA) perspective (Leow, 2020; Manchón & Leow, 2020). We have selected three areas that can be characterized as ISLA-oriented research because of their relevance to L2 instruction: (1) language learning processes and outcomes resulting from instructional interventions, specifically the manipulation of writing tasks; (2) language learning processes and outcomes of written corrective feedback (WCF) processing and use; and (3) lexical and grammatical development of writing in instructed settings. These areas illustrate the range of methodological choices involved in conducting cognitive ISLA-oriented research on L2 writing.

First, we report on the pedagogically-relevant knowledge gathered in the three cognitively-oriented research areas, along with empirical questions to be addressed in future work. We then discuss key methodological considerations associated with empirical research in these strands. Finally, we discuss methodological challenges facing empirical investigations in each of the three areas identified and provide suggestions for addressing these challenges.

2. What we know and what we need to know about L2 writing in ISLA

2.1 Learning processes and outcomes through the manipulation of writing tasks

2.1.1 What we know

This strand has addressed the L2 learning outcomes and (to a lesser extent) processes associated with the manipulation of task-related dimensions, including task implementation conditions (e.g., oral vs. written mode, individual vs. collaborative, accessibility of external sources), medium (paper-based or computer-based), and task complexity factors. From an ISLA perspective, research needs to shed light on outcomes related to L2 learning processes (such as noticing, focus on form, or hypothesis testing), the characteristics of the texts written under diverse task conditions, and the affordances of engaging in these diverse writing conditions in terms of either expansion or consolidation of L2 knowledge.

Task modality and task complexity studies have produced some findings with implications for L2 instruction. It has been found that the written mode generally brings about the use of more accurate and at times more complex language (see Manchón & Vasylets, 2019; Polio, 2022; Vasylets & Gilabert, 2022 for detailed discussions), as well as longer-lasting effects on the learning of new grammatical forms. Furthermore, research points to benefits for grammar learning in writing (e.g., Zalbidea, 2020a) and for vocabulary learning in speaking (e.g., Sánchez et al., 2020; Vasylets et al., 2020). Laufer and Hulstijn (2001), in contrast, suggested that writing should facilitate vocabulary acquisition better than speaking; here again, the picture is complex, as explained by Kyle (2022). Additionally, increased task complexity appears to induce attention to writing processes (especially formulation and monitoring; Johnson, 2017) and to enhance the complexity, accuracy, and fluency (CAF) of written performance.




These research findings should be viewed cautiously due to the limited research base and some important methodological considerations, including variation in how studies have operationally defined task complexity, the tasks used, participant characteristics, study designs, and the production measures used (Johnson, 2017; Manchón & Vasylets, 2019).

2.1.2 What we need to know

As previously mentioned, pedagogically-relevant knowledge in the domain relates to the dimensions of (1) the learning processes fostered or implemented in various task conditions; (2) the characteristics of the texts written under diverse task conditions; and (3) the more global affordances of engaging in diverse writing task conditions for expanding and/or consolidating L2 knowledge. Current knowledge on (1) and (3) is limited. Regarding (1), although the overall conclusion is that more complex tasks induce higher levels of processing, few studies have produced detailed analyses of these potential variations in processing (notable exceptions are Révész et al.'s (2017) study of online behavior and writing processes as mediated by task complexity factors, and Michel et al.'s (2020) study of writing processes in independent/integrated tasks). Therefore, more research in this domain is needed, including studies looking into correlations between processes and learning outcomes (see suggestions in Manchón & Leow, 2020). Future work should prioritize the identification and analysis of task complexity factors that are instructionally relevant (see Johnson, 2017, for a systematic analysis of task complexity dimensions) instead of the existing theory-testing aims (i.e., providing evidence for diverse theoretical positions on task complexity).

Furthermore, predictions regarding differential learning outcomes (grammar vs. lexis) across modalities should be put to the empirical test. Readers are referred to Polio (2020) and Schmitt (2020) for comprehensive, pedagogically-oriented, forward-looking research agendas on grammar and vocabulary learning through writing, and to Vasylets and Gilabert (2022) for recommendations regarding combining oral and writing components in task designs. Future studies should also look more closely at the interaction between task-related and learner-related variables in bringing about learning through writing. Such interaction has already been underscored with diverse populations (younger/older, more/less proficient) in task repetition (e.g., Sánchez et al., 2020), task complexity (Cho, 2018; Michel et al., 2019; Révész et al., 2017), and task modality (e.g., Zalbidea, 2017; Zalbidea & Sanz, 2020) studies.


2.2 Learning through written corrective feedback

2.2.1 What we know

Studies have approached the role of WCF from several perspectives (e.g., Ellis, 2009). For example, studies have compared types of WCF varying in directness (e.g., direct, indirect, metalinguistic) and in the amount or scope of feedback (e.g., focused, unfocused). While inconsistent findings have been reported for both types of WCF on subsequent L2 development, overall more positive results have been found for focused feedback (see Leow, 2020, for a review). Studies investigating WCF under collaborative writing conditions appear to indicate that collaboration produces better results (e.g., Kim & Emeliyanova, 2021).

Methodologically, while the majority of studies have adopted a product-oriented approach using written compositions (product) to address the effects of WCF, some recent studies have adopted a process-oriented approach (e.g., Caras, 2019; Manchón et al., 2020), employing instruments (e.g., think-aloud protocols, written languaging) to gather concurrent data on L2 writers' processing and processes (e.g., levels of awareness, depth of processing, strategies) as they interact with the WCF provided. There has also been an uptick in recent studies addressing the effectiveness of computer-mediated programs designed to provide concurrent WCF on L2 writers' texts. These studies have likewise followed product-oriented (e.g., Zhang, 2020) and/or process-oriented (e.g., Ranalli, 2021) perspectives, employing a combination of eye-tracking, screen capture, stimulated recalls, and interviews. The product-oriented digital studies have mainly focused on improvement in writing quality, while the process-oriented ones have focused on changes in drafts, revision strategies, language proficiency, individual differences (IDs), potential for learning, and trust in Automated Writing Evaluation. Recent studies have also situated their research designs within an existing language curriculum (e.g., Amelohina et al., 2020; Caras, 2019; Coyle et al., 2018; Leow et al., in press). Findings from these longitudinal and quasi-experimental studies likely have more robust pedagogical ramifications for the classroom than one-shot, controlled, lab-based WCF studies.

2.2.2 What we need to know

There are several areas in the WCF strand of research that warrant future investigation. First, the construct of L2 learning, in many cases associated with error correction or grammatical accuracy, is vague in its operationalization; a more precise operationalization clearly warrants serious consideration. Second, future classroom-oriented research should continue addressing the impact of WCF on different types of linguistic items in designs that do not remove the ecological validity of the writing task. Third, there is a clear need for more process-oriented studies to better understand how L2 writers process WCF during both the composing and revising stages of writing, and the relationship between such processes and subsequent language learning (Leow & Manchón, 2022). A triangulation of data elicitation procedures can address potential limitations of current approaches (Leow et al., 2014; Révész & Michel, 2019). Fourth, WCF studies also need to situate their designs within a language curriculum. This requisite for pedagogical ramifications falls in line with one of the two ISLA sub-strands proposed by Leow (2019), namely ISLA applied. ISLA applied refers to studies situated within the language curriculum that seek to inform pedagogical practice via pedagogical intervention, as opposed to applied ISLA, that is, studies that investigate the many variables in the instructed setting without any proposed pedagogical extrapolations or link to the language curriculum. Embedding the research design within the syllabus logically leads to a re-examination of the use of one-shot or laboratory-based designs in favor of longer-period quasi-experimental designs that would allow the investigation of the writing processes employed by L2 writers as they perform writing activities over different time spans (Leow, 2020; Leow et al., in press). Such a design (crucial for ISLA applied studies) allows pedagogical contributions to be aligned with the writing component of the language curriculum to derive maximum benefits from theory and research. Fifth, WCF studies should pursue further investigations into the roles of IDs (e.g., Li & Roshan, 2019), type of linguistic item (e.g., Caras, 2019), and level of proficiency (e.g., Park & Kim, 2019), variables that may more clearly illuminate potential relationships between L2 writers and the provision and use of WCF.

2.3 Written L2 development in instructed settings

2.3.1 What we know

A number of studies have tracked written L2 development in language or writing classes in an attempt to describe changes in the L2 (e.g., Menke & Strawbridge, 2019), to examine the influence of an instructional program (e.g., Yasuda, 2011), or to compare interventions (e.g., McDonough & De Vleeschauwer, 2019) or writing conditions or tasks (e.g., Yoon & Polio, 2017). While many studies focus on short-term learning, we consider here only longitudinal studies, specifically those that track students' progress over time, whether short (e.g., over a month) or long (e.g., over a year).

Given the variation in type of instruction, participant variables, length of study and, to a limited extent, target language, it is difficult to make generalizations, but Polio (2022) elaborates on the conclusions that can and cannot be drawn regarding written language development. First, although global aspects of writing as measured on rubrics often improve, accuracy, for example, tends to remain stable regardless of the length of instruction, the measures used, or the participants' L2 (e.g., Polio & Shea, 2014; Serrano, 2011; Yoon & Polio, 2017). However, certain participant and contextual variables, such as age or amount of input, may have a greater influence than instruction alone. For example, a few studies have shown some development for either children and adolescents (e.g., Pérez-Vidal & Roquet, 2015) or students studying abroad (e.g., Godfrey et al., 2014). Second, despite a lack of improvement on various complexity and accuracy measures, some studies have shown development of specific features in academic writing, such as noun phrase complexity (Mazgutova & Kormos, 2015) or noun and adjective that-clauses (Man & Chau, 2019).

2.3.2 What we need to know

Empirical studies focus on small interventions but rarely on the entire curriculum, and, hence, few directly consider the influence of the curriculum or instruction on language development (see Yasuda, 2011, for a notable exception). Therefore, larger-scale studies that include classroom observations are needed, as well as studies that focus on students' and teachers' roles in promoting written language learning. First, it is not clear how students' and teachers' goals affect written language development, but using the framework of goal theory for academic writing (Cumming, 2006), Zhou et al. (2014) found that students', ESL instructors', and university instructors' goals for language development did not match: Students focused on error-free writing, whereas instructors of ESL and university content courses emphasized grammatical complexity, stylistic appropriateness, and clear ideas. Lim et al. (2021) came to similar conclusions. Yet neither study was able to directly link the role of goals to written L2 development. Second, many studies on development imply that the classroom is a place for L2 learning but treat the curriculum (and the classroom) as a black box, meaning that there is no clear picture of what is happening inside. Yasuda (2011) and Harman (2013) are interesting models that could be used to study a wider range of L2 features. Finally, we need to better understand how written L2 develops under realistic writing conditions, and this will require specific data collection methods. For example, Khuder and Harwood (2015) studied writers' processes by giving them unlimited time and access to internet sources while using keystroke logging software to collect the data. Another example is Li and Schmitt's (2009) case study of the development of formulaic sequences, which used interviews to tap into one writer's longitudinal use of formulaic language.
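Keystroke logs of the kind used in such studies reduce to timestamped events, so the pause measures that writing-process research relies on are easy to derive once a log exists. The following hypothetical Python sketch computes common pause indices from a list of (timestamp, key) events; the sample log is invented, and the 2000 ms pause threshold is a frequently used convention rather than a fixed standard. Dedicated tools such as Inputlog capture far richer data (revisions, window switches, mouse events) than this illustration.

```python
from statistics import mean, median

def pause_metrics(events, threshold_ms=2000):
    """Compute pause-based fluency indices from a keystroke log.

    events: list of (timestamp_ms, key) tuples in chronological order.
    threshold_ms: inter-keystroke intervals at or above this value count
                  as pauses (2000 ms is a common convention).
    """
    timestamps = [t for t, _ in events]
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    pauses = [iki for iki in intervals if iki >= threshold_ms]
    return {
        "keystrokes": len(events),
        "mean_iki_ms": round(mean(intervals), 1),
        "median_iki_ms": median(intervals),
        "pause_count": len(pauses),
        "pause_time_ms": sum(pauses),
        # Share of writing time spent pausing: one window into (dis)fluency.
        "pause_proportion": round(sum(pauses) / (timestamps[-1] - timestamps[0]), 2),
    }

# Invented log fragment: a fluent burst, a long mid-clause pause, more typing.
log = [(0, "T"), (150, "h"), (290, "e"), (430, " "),
       (3600, "c"), (3750, "a"), (3900, "t"), (4050, " ")]
print(pause_metrics(log))
```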




3. Data elicitation and interpretation options with example studies

ISLA-oriented L2 writing studies include a variety of research designs. Table 1 includes representative examples from the three areas of research covered in this chapter. We do not have space to elaborate on each of the designs, but we note that there is some overlap among the categories.

Table 1. Designs in ISLA-oriented L2 writing studies, with representative examples

Pretest-posttest (delayed posttest) designs
  Tasks: Zalbidea (2020a, 2020b); Zalbidea & Sanz (2020)
  WCF: Caras (2019)
  Written language development: Yasuda (2011)

Repeated-measures (counterbalanced) designs
  Tasks: Cho (2018); Kang & Lee (2019); Vasylets et al. (2017, 2019); Zalbidea (2017)
  WCF: Sachs & Polio (2007)
  Written language development: Yoon & Polio (2017)

Correlational
  Tasks: Leijten et al. (2019); Michel et al. (2019)
  WCF: Manchón et al. (2020)
  Written language development: Wijers (2018)

Quasi- and true experimental
  Tasks: Ong (2014); Ong & Zhang (2010)
  WCF: Hartshorn et al. (2010)
  Written language development: Roquet & Pérez-Vidal (2017)

Descriptive/Observational
  Tasks: Azkarai & García-Mayo (2015); García-Mayo & Azkarai (2016); Knospe et al. (2019)
  WCF: Ferris et al. (2013)
  Written language development: Menke & Strawbridge (2019)

Case study
  WCF: Zheng & Yu (2018)
  Written language development: Chan et al. (2015); Li & Schmitt (2009)

Ethnography
  Tasks: Smith et al. (2017)
  Written language development: Harman (2013)

3.1 Data elicitation and data types

3.1.1 Process data

ISLA-oriented writing research has looked into diverse processing phenomena, including cognitive processes while writing and while processing feedback, the nature and resolution of language-related episodes (LREs) during task performance, and writers' online behaviors (especially pauses and fluency) and writing processes as a function of task-related factors. Additionally, in task complexity studies, there has been a gradual increase in interest in writers' own perceptions of the cognitive complexity of tasks, both as research in itself (e.g., Cho, 2018; Zalbidea, 2020b) and as a methodological procedure to measure or validate the task complexity manipulation (e.g., Révész et al., 2017). In the case of WCF processing, there are recent attempts to gather concurrent data on the cognitive processes employed while interacting with WCF, in addition to probing deeper into depths of processing to shed light on what and how L2 writers process during the revision phase (e.g., Caras, 2019; Leow et al., in press; Manchón et al., 2020).

The research instruments and techniques used to access this range of processing phenomena, shown in Table 2, include introspective techniques (e.g., think-aloud protocols, collaborative dialogue, written languaging) and retrospective techniques (e.g., interviews, questionnaires, stimulated recall). Procedures that capture minute-to-minute writing actions at the point of inscription include keystroke logging tools, screen capture technologies, and eye-tracking. We focus here on introspective verbal protocols (i.e., think alouds) and retrospective verbal protocols (i.e., stimulated recall).

Table 2. Data collection instruments in process studies

Think alouds
  WCF: Caras (2019); Park & Kim (2019)

Stimulated recall
  Tasks: Kang & Lee (2019); Zalbidea (2020b)
  WCF: Koltovskaia (2020)
  Development: Sasaki (2004); Sasaki et al. (2018)

Keystroke logging
  Tasks: Knospe et al. (2019); Michel et al. (2020)
  Development: Miller et al. (2011)

Eye-tracking
  Tasks: Michel et al. (2020); Révész et al. (2017)
  WCF: Ranalli (2021)

Screen capture
  Tasks: Knospe et al. (2019); Smith et al. (2017)
  WCF: Koltovskaia (2020); Ranalli (2021)

Collaborative dialogue
  Tasks: García-Mayo & Azkarai (2016); Kessler et al. (2020)
  WCF: Coyle et al. (2018)

Interviews & questionnaires
  Tasks: Cho (2018); Smith et al. (2017)
  WCF: Han & Hyland (2015); Zhang (2020)
  Development: Li & Schmitt (2009)

Video recordings
  Tasks: Smith et al. (2017)



Chapter 13. Writing 313

is done in the L2. Moreover, the amount and type of training given to participants may affect how and what they verbalize. Stimulated recalls (SR) are a type of retrospective verbal report that gathers concurrent data after participants have completed a task. To access such data, re­ searchers show participants some type of recording or artifact of task completion (e.g., screen recording, audio/video recording of the stimulus). Participants are then requested to report what they were thinking at that point in time during their interaction with the stimulus. Given the post-exposure stage at which these data are gathered, SR has been critiqued for its potential for veridicality (like the TA protocol) or memory decay and double exposure to the L2 input (Leow & Bowles, in press). SRs can provide insights into L2 writers’ noticing of target L2 data and potential strategies employed during composing or revising, text production pro­ cesses, their personal perspective toward the topic of their composition, and their overall writing experience. Although both TAs and SRs are performed orally, par­ ticipants in some WCF studies have also provided their recall comments in writing in the form of written languaging (e.g., Suzuki, 2012). As suggested by Leow and Bowles (in press), to minimize potential threats to the validity of TAs and SR, best practices for TAs, as described in Bowles (2010), and for SR, as described by Gass and Mackey (2016), should be closely followed. The issue of reactivity (TA) can be addressed by including in the research design a control (non-TA) group while participants should also be allowed to complete the task at their own pace. For SR, the issue of memory decay may be minimized by obtaining the data within a short time lapse after exposure to the stimulus. In addition, the stimulus should be as robust as possible with multimodal input (e.g., screen captures, videos, or keystroke logging tools) to remind participants of their processes while they were completing the task. 3.1.2 Product data Although product data is more straightforward than process data, there are many choices that researchers have to make when eliciting texts from learners. Writing prompts can vary according to task, topic, genre, intended audience, and modality, with several variations within each variable. For example, a prompt might be “write an email (modality) to your university president (audience) suggesting (genre) that winter break (topic) be extended.” Conditions of the data elicitation can also vary according to length of time-on-task, access to sources, and collaborative versus individual composition. Each of these choices depends on the context, design, and focus of the study. For example, in a study of how a writing task affects language use, the task would be dictated by the research questions (e.g., studies of task complexity), and it would be important to control as many variables as possible, so limiting time and access to resources might be important. The downside of this is that by controlling so many

314 Ronald P. Leow, Rosa M. Manchón, and Charlene Polio

factors, researchers are likely not replicating how students write under real-life conditions. Sometimes the focus of the study is the conditions and not the prompt, such as in studies of planning time or collaborative writing. In these cases, choosing a prompt is not straightforward, but one solution is to choose a type of writing that students might have to do in class or outside of the classroom. 3.2

Data analysis

Both process and product data can be coded and analyzed in a number of ways. We outline some approaches below and provide examples of studies that might be helpful in understanding the options. 3.2.1 Process data To code and operationalize learners’ processing, recent studies (e.g., López-Serrano et al., 2019; Park & Kim, 2019) have drawn from Leow’s (2015) definition of depth of processing (DoP) as “the relative amount of cognitive effort, level of analysis and elaboration of intake, together with usage of prior knowledge, hypothesis testing and rule formation employed in decoding and encoding some grammatical or lex­ ical item in the input” (p. 204) together with his three levels of DoP (low, medium, high) also aligned with levels of awareness (noticing with a low DoP, reporting with a medium DoP, and + understanding [the underlying rule] with a high DoP) and accompanied by a description of the levels and their corresponding descriptors. Park and Kim (2019) modified the original three-level coding scheme to separate DoP (minimum, low, high), cognitive effort (minimum, low, high), and level of awareness (+Understanding), -Understanding), while Cerezo et al. (2019) proposed five levels of DoP. Further validation of DoP is clearly warranted. Another development in process data coding and analysis is the identification and analysis of LREs, that is, “any segment of the protocol in which a learner either (1) spoke about a language problem they encountered while writing and solved it either correctly or incorrectly or left it unresolved or (2) simply solved it without having explicitly identified it as a problem” (Swain & Lapkin, 1995, p. 378). LREs are identified by segmenting introspection data (from think alouds or collaborative dialogues) and locating instances in which writers discuss language-related issues. Research on collaborative writing has focused primarily on the amount, linguistic focus, and resolution of LREs, in addition to issues pertaining to patterns of and language used in the interaction while dealing with LREs (e.g., García-Mayo & Azkarai, 2016). López-Serrano et al. (2019) constitutes the most detailed theoretically-motivated and empirically-based coding scheme of the language processing activity of L2 writ­ ers in individual writing conditions. Their detailed analysis of the methodological procedure followed to set up the coding scheme can be useful to future researchers




López-Serrano et al. (2019) constitutes the most detailed theoretically-motivated and empirically-based coding scheme of the language processing activity of L2 writers in individual writing conditions. Their detailed analysis of the methodological procedure followed to set up the coding scheme can be useful to future researchers embarking on the analysis of LREs in think-aloud protocols. These researchers added new dimensions to the analysis of LREs, as their coding identified the categories of linguistic focus, resolution categories, strategies used to solve the problems experienced, and the DoP and orientation of the L2 writers' strategic behavior.

3.2.2 Product data
Traditionally, measures of language development have included assessments of complexity (both syntactic and lexical), accuracy, and fluency, or the so-called CA(L)F measures (see also Chapter 14, this volume, for spoken language). They came under scrutiny in Wolfe-Quintero et al. (1998), who examined how well the measures were associated with change over time or with proficiency differences. More recently, Polio and Friedman (2017) summarized the various approaches to measuring textual features of learners' writing.
Some of the constructs, such as accuracy, are relatively easy to define (e.g., the absence of error) but are still challenging to measure, as discussed in Polio and Shea (2014). Lexical complexity has been approached from a variety of perspectives, including diversity and sophistication. Crossley and Kyle's linguistic analysis website (http://www.kristopherkyle.com/tools.html) includes tools for various lexical measures along with useful references. Syntactic complexity was originally measured by considering the length of sentences or T-units and subordination, but beginning with Lu (2011), who created the syntactic complexity analyzer, researchers began to use a wider range of measures, including those focusing on coordination and various phrasal measures. Kyle et al. (2021) have extended the construct of complexity by also considering the use of specific verbs in relation to the frequency of associated verb-argument constructions. Cohesion and specific grammatical constructions have also been examined in learners' written products, particularly if the focus of the study is on an intervention. Fluency has traditionally been measured by counting the number of words that learners write in a given time, but this construct has been re-evaluated, and fluency is now considered to be multi-dimensional (Van Waes & Leijten, 2015) and based on process-related measures associated with pausing and revision behaviors.
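To make these product measures concrete, the sketch below computes three frequently reported indices: fluency as words per minute, lexical diversity as the type-token ratio and Guiraud's index (types divided by the square root of tokens), and a global accuracy rate of errors per 100 words. This is a minimal illustration of the arithmetic only, assuming a naive tokenizer and a hand-coded error count; published studies rely on validated tools such as those linked above.

```python
import math
import re

def tokenize(text: str) -> list[str]:
    # Naive tokenizer for illustration only; real studies use validated tools
    return re.findall(r"[a-zA-Z']+", text.lower())

def type_token_ratio(text: str) -> float:
    toks = tokenize(text)
    return len(set(toks)) / len(toks)

def guiraud_index(text: str) -> float:
    # Types / sqrt(tokens): a length-corrected alternative to the raw TTR
    toks = tokenize(text)
    return len(set(toks)) / math.sqrt(len(toks))

def words_per_minute(text: str, minutes_on_task: float) -> float:
    # Traditional product-based fluency: words produced in a given time
    return len(tokenize(text)) / minutes_on_task

def errors_per_100_words(n_errors: int, text: str) -> float:
    # n_errors is assumed to come from hand-coded error identification
    return 100 * n_errors / len(tokenize(text))

sample = "The train it arrive at platform four because the train is late."
print(round(type_token_ratio(sample), 2))       # lexical diversity
print(round(guiraud_index(sample), 2))          # length-corrected diversity
print(round(words_per_minute(sample, 0.5), 1))  # fluency, assuming 30 seconds on task
print(round(errors_per_100_words(2, sample), 1))
```

The raw TTR is known to fall as texts get longer, which is why length-corrected indices such as Guiraud's index or D (see Table 1 in Chapter 14) are usually preferred when texts differ in length.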

3.3 Data interpretation

ISLA-oriented L2 writing research should be primarily concerned with the extent to which instructional manipulations of writing conditions (including the provision of WCF) lead to language learning gains. Therefore, a crucial methodological concern in this research is precisely how to operationalize and validate measures of learning. We next analyze some selected issues in current interpretations of what constitutes learning through writing.


3.3.1 Interpreting the data
The interpretation of WCF data in relation to what constitutes learning is quite challenging. This is due primarily to the operationalization of the construct of learning, together with the type of research designs employed in many previous WCF studies and the theoretical underpinning of WCF. Learning is typically subsumed in an overall report of several aspects of grammatical knowledge or a global accuracy rate, which is quite inadequate for pinpointing specifically what L2 writers learned through exposure to WCF. Grammatical accuracy of specific linguistic items is usually addressed in focused WCF designs that do not hold much ecological validity in a classroom setting.
Some researchers (e.g., Cerezo et al., 2019; Coyle et al., 2018; Manchón et al., 2020) have also viewed learning more as a developmental phenomenon that allows the construct of learning to subsume partial learning of target or non-target features of the L2 in subsequent production. This partial learning has in some cases been referred to as uptake, which is operationalized and defined broadly, following Lyster and Ranta (1997), as learner responses (usually some form of "modified" or "pushed" output) to oral recasts, which are implicit corrections that provide L2 learners with the correct target item in a reformulation of their production. With respect to type of research design (pretest – WCF – posttest – delayed posttest), learning via WCF can be viewed from two perspectives: Accuracy on immediate posttests may reflect non-systematized knowledge, while accuracy on delayed posttests may provide more robust evidence of changes in the L2 system, or systematized knowledge (see Leow, 2020, below).
A recent theoretical framework, Leow's (2020) feedback processing framework, provides a cognitive explanation for the role of WCF in L2 development in direct relation to how L2 writers process such feedback. Of importance is the postulation that learning, or what is internalized, may not necessarily be accurate, and accurate restructuring may co-exist with previous inaccurate knowledge. According to Leow, old (or inaccurate) output is interpreted to represent a potential absence or low depth of prior processing of the WCF provided, or not much confidence in the newly restructured knowledge if the feedback was indeed internalized. New or modified output is interpreted to represent the learner's production of the restructured L2 and is assumed to represent the L2 knowledge the learner has at that point in time in their internal system. However, to address whether a complete accurate restructuring did take place (as in system learning) or whether such restructuring was temporary, immediate, or reflective of item learning, delayed posttests or data on subsequent performances are strongly recommended.
One of the challenges in interpreting data from writers' texts is also understanding what counts as evidence of learning. For example, fluency, in terms of the number of words written in a given time, may improve without any other indices




improving (e.g., Yoon & Polio, 2017). It seems that this should be evidence of learning of some kind. Indeed, Norris and Ortega (2003) commented on emergentist and usage-based approaches to SLA, stating that "Acquired, for emergentists, means fast, accurate, effortless performance attained along attested learning curves that reflect nonlinear, exemplar-driven learning" (p. 728). Thus, if one can produce language more quickly, there is evidence of learning. Norris and Ortega also mention accuracy, which may seem uncontroversial, and although it tends to be of primary concern in WCF studies, many studies of development omit the construct and focus more on complexity. It is also important to note that many studies of writing, particularly collaborative writing, work within a sociocultural theory framework, where evidence of learning is often found in interactions with others as opposed to in writers' texts (e.g., Lantolf et al., 2015). As learning becomes internalized, students are able to perform a task with less scaffolding from others, so the presence or absence of a structure alone does not capture learning. Evidence of learning is viewed from an interactional perspective, thus requiring more than text data to observe learning.

4. Advice for future writing researchers

Although several suggestions have been put forward in previous sections, we next provide a selected analysis of what in our view constitute key methodological considerations in future research agendas in the three research domains reviewed above:

(1) Future work must move beyond theory-testing research and zoom in on task complexity considerations in pedagogically relevant ways: How writers conceptualize the task at hand may or may not correspond to researchers' operationalization of task complexity. This potential discrepancy has important methodological implications that have not received sufficient scholarly attention so far.
(2) Future work needs to investigate task engagement and task completion in the real time-distributed nature of writing within and across writing tasks (see Smith et al., 2017, for a representative example) and look into the effects of the instructional manipulation of tasks across the time-distributed nature of task-based language teaching-oriented curricular practices.
(3) The role of IDs has to be made more central. Importantly, the growing research on cognitive IDs (such as working memory or language aptitude) needs to be expanded with studies that focus on affective IDs (see Papi, 2022), including issues of anxiety and task motivation on the part of learners (and potential correlations with task engagement and resulting learning) and of teachers


(focusing on teachers' motivational strategies as they pertain to engagement with and completion of writing tasks). Multiple-case studies (e.g., Han & Hyland, 2015) have also reported differential L2 writers' attitudes, perceptions, and behavioral tendencies toward the WCF they received. This is particularly important in the case of heritage learners, who may be relying on oral skills as they write. Their response to feedback and developmental trajectories may differ from those of non-heritage learners (e.g., Henshaw, 2015). The implementation of more qualitative approaches may be fruitful within this approach.
(4) The context in which writing and WCF provision take place needs to be seriously considered (SLA vs. ISLA vs. ILL – Instructed Language Learning). Indeed, if we approach writing as a site for learning, the need to situate research within the language curriculum is of paramount importance. This adoption of a curricular approach leads to the consideration of the pedagogical robustness of empirical findings in relation to the learning outcomes of the language curriculum, which fits neatly with the notion of ISLA applied (Leow, 2019).
(5) The relative dearth of concurrent data on L2 writer processing and processes during exposure to WCF needs to be further investigated (product vs. process). There is a need for triangulation of data from different sources, but with one major caveat: The robustness of the data must be seriously considered if they are associated with the provision of WCF as a site for learning.
(6) While the type of linguistic item has been addressed by some WCF studies (e.g., Benson & DeKeyser, 2018; Caras, 2019), the scope of linguistic items remains under-investigated and warrants future probing.
(7) WCF writing conditions (collaborative vs. individual) have been investigated by relatively few studies and mostly from a sociocultural perspective (e.g., Wigglesworth & Storch, 2012). A fuller perspective on both writing conditions can provide insights into whether it is the condition or the cognitive processing and processes involved that contribute to subsequent L2 development.
(8) There is a paucity of longitudinal research (see Sasaki, 2004, for an exception) on how the writing process changes and how those processes affect linguistic changes. Furthermore, process data as well as classroom data (including observations, artifacts, and teacher interviews) should be triangulated with product data.
(9) Of the studies cited, most focus on English. In all three areas, research on languages other than English is needed, particularly those languages where there is some theoretically motivated reason to expect that a writing system might affect the results. For example, Kessler et al. (2020) showed that text chat (as opposed to oral discussion) may hinder learners of Chinese because of the arduous task of writing in Chinese.




5. Step-by-step guidelines and example studies

Caras (2019) is arguably the first study to attempt to address WCF not only from a process-oriented perspective (motivated by the predominant product-oriented research designs in this strand of research) but also from an ISLA applied one. Interested in how L2 writers concurrently process different types of WCF and the effects of linguistic items (Spanish copulas ser vs. estar and imperfect vs. preterit), Caras employed think alouds to gather data on participants' DoP while they interacted with carefully created prompts that elicited the target linguistic items. She addressed ISLA applied by situating her design within the existing language curriculum and syllabus of her population. While she followed the usual and ecologically valid unfocused feedback procedure of the writing component of the language curriculum to provide WCF to her participants, her study focused on feedback for two target linguistic items contained within these compositions. Sixty-one participants attended three sessions in the language laboratory during their normal class sessions. In Session 1 (pretest), they practiced thinking aloud and then wrote their compositions while recording their think alouds on Echo360 (average of 54 minutes, 46 seconds). They were then randomly assigned to one of four experimental WCF conditions (direct, indirect, metalinguistic, and control). In Session 2 (viewed as the immediate posttest), one week later, they returned to the laboratory to rewrite their compositions while addressing the type of WCF provided and thinking aloud (average of 30 minutes, 55 seconds). They also completed a language background questionnaire. In Session 3 (delayed posttest), three weeks after Session 1, participants were provided with their original compositions and asked to revise them without any feedback (average of 18 minutes, 35 seconds). Learning was measured in relation to the participants' performance on the two linguistic items across the three sessions.

Exemplar study: Caras (2019)

Research questions
1. How do adult L2 beginning Spanish learners process unfocused WCF on ser versus estar and the preterit versus imperfect past-tense aspects?
2. Does the type of unfocused WCF (direct, indirect, metalinguistic) have a differential effect on adult L2 beginning Spanish learners' subsequent written production accuracy of ser versus estar and the preterit versus imperfect past-tense aspects? If so, does the effect on accuracy of each respective target dichotomy last over two weeks?

Theoretical framework
Leow's (2015) Model of the L2 Learning Process in ISLA


Methods
Sixty-one English-speaking students in first-semester Spanish at a US university were randomly assigned to one of four experimental WCF conditions (direct, indirect, metalinguistic, and control). Think-aloud protocols were gathered to address how participants processed each type of WCF and linguistic item and were coded using Leow's (2015) DoP coding scheme. Learning was measured by performances on the three drafts of the composition at the pretest, immediate posttest, and delayed posttest stages.

Findings
Participants who received indirect WCF processed primarily at a low level, while those receiving unfocused direct or metalinguistic WCF processed at all levels (high, medium, and low). For the copula, at the time of Draft 2 (second session), participants who received direct WCF performed significantly better when compared to those in the other experimental conditions (metalinguistic, indirect, and control). The metalinguistic WCF group also outperformed the control group at this stage. However, these superior performances were not maintained two weeks later. For the preterit versus imperfect, type of WCF did not appear to play a significant role in subsequent performances on either the immediate or delayed posttests.

Take-aways
This study addressed recent calls for a process-oriented versus product-oriented approach to the WCF strand of research. While the study does have limitations (e.g., number of participants, limited number of linguistic items, potential ceiling effects due to participants' relatively high prior knowledge, the laboratory setting versus at-home assignments, same composition), it sets an example of a process-oriented ISLA applied design that warrants future studies to not only address the limitations of the original study but also expand the type of population and include other variables (e.g., IDs, language proficiency). The study also addresses the call for ISLA studies to situate designs within the language curriculum and syllabus, which may provide more robust and ecologically valid pedagogical and curricular ramifications for the classroom setting. Pedagogically, it appears that indirect WCF at this proficiency level may not lead to much cognitive engagement when compared to either direct or metalinguistic WCF.
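A design like the one in Caras (2019), with four between-participants feedback conditions crossed with repeated testing sessions, calls for an analysis that respects the repeated measures. The sketch below shows one common option using Python and statsmodels; the file name, column names, and model specification are our hypothetical illustration, not the analysis reported in the original study.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: one row per participant per session, with
# columns: participant, condition (direct/indirect/metalinguistic/control),
# session (pretest/posttest/delayed), accuracy (proportion correct, 0-1)
df = pd.read_csv("wcf_accuracy.csv")

# A random intercept per participant models the repeated-measures structure;
# the condition-by-session interaction asks whether the type of WCF changed
# the trajectory of accuracy from pretest to the (delayed) posttests.
model = smf.mixedlm(
    "accuracy ~ C(condition, Treatment('control')) * C(session, Treatment('pretest'))",
    data=df,
    groups=df["participant"],
)
print(model.fit().summary())
```

Whatever model is chosen, the substantive point from Section 3.3.1 stands: only the delayed-posttest terms speak to durable learning.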

6. Troubleshooting ISLA writing research

Common question 1: How do I avoid overinterpreting my results?
Viewed from an ISLA applied perspective, one pitfall may lie in too much extrapolation derived from the results if the study design does not support such extrapolations. It is important to consider that the writing component in an L2 classroom is carried out in many different contexts (e.g., in-class, at home, paper and pen, online) and conditions (e.g., individual, collaborative, free-style, prompted, different genres, testing). Studies need to carefully consider and report the context and conditions under which their design was created to avoid any misinterpretation or misunderstanding. Also, conclusions about learning should not be made categorically until delayed posttests are conducted to measure potential retention.




Common question 2: What types of data are important to collect in L2 writing research?

The writing process should not be conflated with the written product. L2 writers employ different strategies, cognitive processes, and amounts of time to produce written L2 data, especially during at-home assignments, and IDs may influence such production. To account for these differences, the collection of both process and product data, together with ID data, is of paramount importance. Especially relevant is consideration of task factors in helping or hindering learning through writing, hence the need to collect process and product data on the performance of more than one task.

Common question 3: How can I robustly analyze L2 writing data?
For studies addressing the construct of learning, it is highly recommended to analyze both process and product data. For example, in the WCF strand, do the corrections by L2 writers based on direct WCF indicate that WCF was successful in restructuring learners' prior inaccuracies, that is, that they understood and internalized the correct item? Or were these corrections a simple repetition of the feedback provided, processed at a low depth? The analysis of both quantitative and qualitative data can support claims made about the amount of learning assumed to have occurred.

7. Conclusions

ISLA-oriented L2 writing research has made important progress in the range of research problems investigated in the three domains covered in the chapter and in the range of methodological approaches employed to provide valid answers to the questions guiding research. Yet, as noted throughout the chapter, the major challenges facing future work entail the posing of empirical questions whose answers have real implications for the language classroom and less so for theory building. For progress in these pedagogically-relevant research agendas, the major methodological challenges facing future work entail, at a minimum: (1) the expansion of populations under study so that we can confidently assume that our research insights reflect a wider and fairer representation of L2 users; (2) opting for designs and research instruments that capture the complex interactions of variables that may bring about learning via writing (including IDs, variations in task engagement and resulting investment and processing activity, the nature of the tasks being performed and their implementation conditions, the nature of WCF and engagement with it, and the dimensions of performance under the spotlight); and, crucially, (3) making sure that studies are guided by theoretically-informed and pedagogically-valid operationalizations of learning through and by writing.


We hope the analysis presented in the chapter can assist future researchers interested in writing as a site for language learning to identify pedagogically-relevant questions to be asked and to be mindful of the range of methodological considerations to take into account when designing and conducting their studies.

8. Further reading and additional resources

8.1 Books

Leow, R. P. (Ed.). (2019). The Routledge handbook of second language research in classroom learning. Routledge. https://doi.org/10.4324/9781315165080
Manchón, R. M., & Polio, C. (Eds.). (2022). The Routledge handbook of second language acquisition and writing. Routledge.
Polio, C., & Friedman, D. (2017). Understanding, evaluating, and conducting second language writing research. Routledge.

8.2 Journals

Journal of Second Language Writing
Assessing Writing
Language Awareness
Research Methods in Applied Linguistics
Journal of Writing Research
College Composition and Communication
Composition Studies
Computers and Composition
Journal of Academic Writing
Written Communication

8.3 Websites

https://europeanwritingcenters.eu/
http://writinglabnewsletter.org/
http://english.ttu.edu/acw/
http://www.awpwriter.org/
https://www.linguisticanalysistools.org/

8.4 Conferences
Symposium on Second Language Writing: http://sslw.asu.edu/
College Composition and Communication: http://www.ncte.org/ccc




References

Amelohina, V., Manchón, R. M., & Nicolás-Conesa, F. (2020). Effects of task repetition with the aid of direct and indirect written corrective feedback: A longitudinal study. In R. M. Manchón (Ed.), Writing and language learning: Advancing research agendas (pp. 145–181). John Benjamins. https://doi.org/10.1075/lllt.56.07ame
Azkarai, A., & García-Mayo, P. (2015). Task-modality and L1 use in EFL oral interaction. Language Teaching Research, 19(5), 550–571. https://doi.org/10.1177/1362168814541717
Benson, S., & DeKeyser, R. (2018). Effects of written corrective feedback and language aptitude on verb tense accuracy. Language Teaching Research, 12(1), 13–42. https://doi.org/10.1177/1362168818770921
Bowles, M. (2010). The think-aloud controversy in second language research. Routledge. https://doi.org/10.4324/9780203856338
Caras, A. (2019). Written corrective feedback in compositions and the role of depth of processing. In R. P. Leow (Ed.), The Routledge handbook of second language research in classroom learning (pp. 188–200). Routledge. https://doi.org/10.4324/9781315165080-13
Cerezo, L., Manchón, R. M., & Nicolás-Conesa, F. (2019). What do learners notice while processing written corrective feedback? A look at depth of processing via written languaging. In R. P. Leow (Ed.), The Routledge handbook of second language research in classroom learning (pp. 173–187). Routledge. https://doi.org/10.4324/9781315165080-12
Chan, H. P., Verspoor, M., & Vahtrick, L. (2015). Dynamic development in speaking versus writing in identical twins. Language Learning, 65(2), 298–325. https://doi.org/10.1111/lang.12107
Cho, M. (2018). Task complexity and modality: Exploring learners' experience from the perspective of flow. Modern Language Journal, 102(1), 162–180. https://doi.org/10.1111/modl.12460
Coyle, Y., Cánovas-Guirao, J., & Roca de Larios, J. (2018). Identifying the trajectories of young EFL learners across multi-stage writing and feedback processing tasks with model texts. Journal of Second Language Writing, 42, 25–43. https://doi.org/10.1016/j.jslw.2018.09.002
Cumming, A. (2006). Goals for academic writing: ESL students and their instructors. John Benjamins. https://doi.org/10.1075/lllt.15
Ellis, R. (2009). A typology of written corrective feedback types. ELT Journal, 63(2), 97–107. https://doi.org/10.1093/elt/ccn023
Ferris, D., Liu, H., Sinha, A., & Senna, M. (2013). Written corrective feedback for individual L2 writers. Journal of Second Language Writing, 22(3), 307–329. https://doi.org/10.1016/j.jslw.2012.09.009
García-Mayo, P., & Azkarai, A. (2016). EFL task-based interaction. Does task modality impact on language-related episodes? In M. Sato, A. Bello, & S. Ballinger (Eds.), Peer interaction and second language learning. Pedagogical potential and research agenda (pp. 241–266). John Benjamins. https://doi.org/10.1075/lllt.45.10gar
Gass, S. M., & Mackey, A. (2016). Stimulated recall methodology in applied linguistics and L2 research (2nd ed.). Routledge. https://doi.org/10.4324/9781315813349
Godfrey, L., Treacy, C., & Tarone, E. (2014). Change in French second language writing in study abroad and domestic contexts. Foreign Language Annals, 47(1), 48–65. https://doi.org/10.1111/flan.12072
Han, Y., & Hyland, F. (2015). Exploring learner engagement with written corrective feedback in a Chinese tertiary EFL classroom. Journal of Second Language Writing, 30, 31–44. https://doi.org/10.1016/j.jslw.2015.08.002


Harman, R. (2013). Literary intertextuality in genre-based pedagogies: Building lexical cohesion in fifth-grade L2 writing. Journal of Second Language Writing, 22(2), 125–140. https://doi.org/10.1016/j.jslw.2013.03.006
Hartshorn, K. J., Evans, N. W., Merrill, P. F., Sudweeks, R. R., Strong-Krause, D., & Anderson, N. J. (2010). Effects of dynamic corrective feedback on ESL writing accuracy. TESOL Quarterly, 44(1), 84–109. https://doi.org/10.5054/tq.2010.213781
Henshaw, F. (2015). Learning outcomes of L2-heritage learner interaction: The proof is in the posttests. Heritage Language Journal, 12(3), 245–270. https://doi.org/10.46538/hlj.12.3.2
Johnson, M. (2017). Cognitive task complexity and L2 written syntactic complexity, accuracy, lexical complexity, and fluency: A research synthesis and meta-analysis. Journal of Second Language Writing, 37, 13–38. https://doi.org/10.1016/j.jslw.2017.06.001
Kang, S., & Lee, J.-H. (2019). Are two heads always better than one? The effects of collaborative planning on L2 writing in relation to task complexity. Journal of Second Language Writing, 45, 61–72. https://doi.org/10.1016/j.jslw.2019.08.001
Kessler, M., Polio, C., Xu, C., & Hao, X. (2020). The effects of oral discussion and text chat on L2 Chinese writing. Foreign Language Annals, 53(4), 666–685. https://doi.org/10.1111/flan.12491
Khuder, B., & Harwood, N. (2015). L2 writing in test and non-test situations: Process and product. Journal of Writing Research, 6(3), 233–278. https://doi.org/10.17239/jowr-2015.06.03.2
Kim, Y., & Emeliyanova, L. (2021). The effects of written corrective feedback on the accuracy of L2 writing: Comparing collaborative and individual revision behavior. Language Teaching Research, 25(2), 234–255. https://doi.org/10.1177/1362168819831406
Knospe, Y., Sullivan, K., Malmqvist, A., & Valfridsson, I. (2019). Observing writing and website browsing: Swedish students write L3 German. In E. Lindgren & K. Sullivan (Eds.), Observing writing: Insights from keystroke logging and handwriting (pp. 258–284). Brill.
Koltovskaia, S. (2020). Student engagement with automated written corrective feedback (AWCF) provided by Grammarly: A multiple case study. Assessing Writing, 44. https://doi.org/10.1016/j.asw.2020.100450
Kyle, K. (2022). Writing and vocabulary learning. In R. Manchón & C. Polio (Eds.), The Routledge handbook of second language acquisition and writing (pp. 183–198). Routledge.
Kyle, K., Crossley, S., & Verspoor, M. (2021). Measuring longitudinal writing development using indices of syntactic complexity and sophistication. Studies in Second Language Acquisition, 43(4), 781–812. https://doi.org/10.1017/S0272263120000546
Lantolf, J. P., Thorne, S. L., & Poehner, M. E. (2015). Sociocultural theory and second language development. In B. VanPatten & J. Williams (Eds.), Theories in second language acquisition (pp. 207–226). Routledge.
Laufer, B., & Hulstijn, J. (2001). Incidental vocabulary acquisition in a second language: The construct of task-induced involvement. Applied Linguistics, 22(1), 1–26. https://doi.org/10.1093/applin/22.1.1
Leijten, M., van Waes, L., Schrijver, I., Bernolet, S., & Vangehuchten, L. (2019). Mapping master's students' use of external sources in source-based writing in L1 and L2. Studies in Second Language Acquisition, 41(3), 555–582. https://doi.org/10.1017/S0272263119000251
Leow, R. P. (2015). Explicit learning in the classroom. A student-centered perspective. Routledge. https://doi.org/10.4324/9781315887074
Leow, R. P. (2019). From SLA > ISLA > ILL: A curricular perspective. In R. P. Leow (Ed.), The Routledge handbook of second language research in classroom learning (pp. 485–493). Routledge. https://doi.org/10.4324/9781315165080-33




Leow, R. P. (2020). L2 writing-to-learn: Theory, research, and a curricular approach. In R. M. Manchón (Ed.), Writing and language learning. Advancing research agendas (pp. 95–117). John Benjamins. https://doi.org/10.1075/lllt.56.05leo
Leow, R. P., & Bowles, M. A. (in press). Verbally mediated data: Concurrent/retrospective verbalizations via think-alouds and stimulated recalls. In R. M. Manchón & J. Roca de Larios (Eds.), Research methods in the study of writing processes. John Benjamins.
Leow, R. P., Grey, S., Marijuan, S., & Moorman, C. (2014). Concurrent data elicitation procedures, processes, and the early stages of L2 learning: A critical overview. Second Language Research, 30(2), 111–127. https://doi.org/10.1177/0267658313511979
Leow, R. P., & Manchón, R. M. (2022). Expanding research agendas: Directions for future research agendas on writing, language learning and ISLA. In R. M. Manchón & C. Polio (Eds.), Routledge handbook of second language acquisition and writing (pp. 299–311). Routledge.
Leow, R. P., Thinglum, A., & Leow, S. A. (in press). WCF processing in the L2 curriculum: A look at type of WCF and type of linguistic item. Studies in Second Language Learning and Teaching.
Li, J., & Schmitt, N. (2009). The acquisition of lexical phrases in academic writing: A longitudinal case study. Journal of Second Language Writing, 18(2), 85–102. https://doi.org/10.1016/j.jslw.2009.02.001
Li, S., & Roshan, S. (2019). The associations between working memory and the effects of four different types of written corrective feedback. Journal of Second Language Writing, 45, 1–15. https://doi.org/10.1016/j.jslw.2019.03.003
Lim, J., Tigchelaar, M., & Polio, C. (2021). Understanding written linguistic development through writing goals and writing behaviors. Language Awareness, 30(1), 117–136. https://doi.org/10.1080/09658416.2021.2002880
López-Serrano, S., Roca de Larios, J., & Manchón, R. M. (2019). Language reflection fostered by individual L2 writing tasks: Developing a theoretically-motivated and empirically-based coding system. Studies in Second Language Acquisition, 41(3), 503–527. https://doi.org/10.1017/S0272263119000275
Lu, X. (2011). A corpus-based evaluation of syntactic complexity measures as indices of college-level ESL writers' language development. TESOL Quarterly, 45(1), 36–62. https://doi.org/10.5054/tq.2011.240859
Lyster, R., & Ranta, L. (1997). Corrective feedback and learner uptake. Studies in Second Language Acquisition, 19(1), 37–66. https://doi.org/10.1017/S0272263197001034
Man, D., & Chau, M. H. (2019). Learning to evaluate through that-clauses: Evidence from a longitudinal learner corpus. Journal of English for Academic Purposes, 37, 22–33. https://doi.org/10.1016/j.jeap.2018.11.007
Manchón, R. M. (2014). The internal dimension of tasks: The interaction between task factors and learner factors in bringing about learning through writing. In H. Byrnes & R. Manchón (Eds.), Task-based language learning. Insights from and for L2 writing (pp. 27–52). John Benjamins. https://doi.org/10.1075/tblt.7.02man
Manchón, R. M., & Leow, R. P. (2020). An ISLA perspective on L2 learning through writing. Implications for future research agendas. In R. M. Manchón (Ed.), Writing and language learning. Advancing research agendas (pp. 335–355). John Benjamins. https://doi.org/10.1075/lllt.56.14man


Manchón, R. M., Nicolás-Conesa, F., Cerezo, L., & Criado, R. (2020). L2 writers' processing of written corrective feedback: Depth of processing via written languaging. In R. M. Manchón (Ed.), Writing and language learning. Advancing research agendas (pp. 241–265). John Benjamins. https://doi.org/10.1075/lllt.56
Manchón, R. M., & Vasylets, O. (2019). Language learning through writing: Theoretical perspectives and empirical evidence. In J. W. Schwieter & A. Benati (Eds.), The Cambridge handbook of language learning (pp. 341–362). Cambridge University Press. https://doi.org/10.1017/9781108333603.015
Mazgutova, D., & Kormos, J. (2015). Syntactic and lexical development in an intensive English for Academic Purposes programme. Journal of Second Language Writing, 29, 3–15. https://doi.org/10.1016/j.jslw.2015.06.004
McDonough, K., & De Vleeschauwer, J. (2019). Comparing the effect of collaborative and individual prewriting on EFL learners' writing development. Journal of Second Language Writing, 44, 123–130. https://doi.org/10.1016/j.jslw.2019.04.003
Menke, M., & Strawbridge, T. (2019). The writing of Spanish majors: A longitudinal analysis of syntactic complexity. Journal of Second Language Writing, 46, 100665. https://doi.org/10.1016/j.jslw.2019.100665
Michel, M., Kormos, J., Brunfaut, T., & Ratajczak, M. (2019). The role of working memory in young second language learners' written performances. Journal of Second Language Writing, 45, 31–45. https://doi.org/10.1016/j.jslw.2019.03.002
Michel, M., Révész, A., Lu, J., Kourtali, N.-E., Li, M., & Borges, M. (2020). Investigating L2 writing processes across independent and integrated tasks: A mixed-methods study. Second Language Research, 36(3), 277–304. https://doi.org/10.1177/0267658320915501
Miller, K. S., Lindgren, E., & Sullivan, K. P. H. (2011). The psycholinguistic dimension in second language writing: Opportunities for research and pedagogy using computer keystroke logging. TESOL Quarterly, 42(3), 433–454. https://doi.org/10.1002/j.1545-7249.2008.tb00140.x
Norris, J. M., & Ortega, L. (2003). Defining and measuring L2 acquisition. In C. Doughty & M. H. Long (Eds.), Handbook of second language acquisition (pp. 717–761). Blackwell. https://doi.org/10.1002/9780470756492.ch21
Ong, J. (2014). How do planning time and task conditions affect metacognitive processes of L2 writers? Journal of Second Language Writing, 23, 17–30. https://doi.org/10.1016/j.jslw.2013.10.002
Ong, J., & Zhang, L. (2010). Effects of task complexity on the fluency and lexical complexity in EFL students' argumentative writing. Journal of Second Language Writing, 19(4), 218–233. https://doi.org/10.1016/j.jslw.2010.10.003
Papi, M. (2022). The role of motivational and affective factors in learners' writing performance and engagement with written corrective feedback. In R. Manchón & C. Polio (Eds.), The Routledge handbook of second language acquisition and writing (pp. 152–165). Routledge.
Park, E. S., & Kim, O. Y. (2019). Learners' engagement with indirect written corrective feedback: Depth of processing and self-correction. In R. P. Leow (Ed.), The Routledge handbook of second language research in classroom learning (pp. 212–226). Routledge. https://doi.org/10.4324/9781315165080-15
Pérez-Vidal, C., & Roquet, H. (2015). CLIL in context: Profiling language abilities. In M. Juan-Garau & J. Salazar-Noguera (Eds.), Content-based language learning in multilingual educational environments (pp. 237–255). Springer. https://doi.org/10.1007/978-3-319-11496-5_14
Polio, C. (2020). Can writing facilitate grammatical development? Advancing research agendas. In R. M. Manchón (Ed.), The language learning potential of L2 writing: Moving forward in theory and research (pp. 381–420). John Benjamins. https://doi.org/10.1075/lllt.56.16pol




Polio, C. (2022). Writing and grammar development. In R. Manchón & C. Polio (Eds.), The Routledge handbook of second language acquisition and writing (pp. 169–182). Routledge.
Polio, C., & Friedman, D. (2017). Understanding, evaluating, and conducting second language writing research. Routledge.
Polio, C., & Shea, M. (2014). Another look at accuracy in second language writing development. Journal of Second Language Writing, 23(1), 10–27. https://doi.org/10.1016/j.jslw.2014.09.003
Ranalli, J. (2021). L2 student engagement with automated feedback on writing: Potential for learning and issues of trust. Journal of Second Language Writing, 52, 1–16. https://doi.org/10.1016/j.jslw.2021.100816
Révész, A., Kourtali, N. E., & Mazgutova, D. (2017). Effects of task complexity on L2 writing behaviors and linguistic complexity. Language Learning, 67(1), 208–241. https://doi.org/10.1111/lang.12205
Révész, A., & Michel, M. (Eds.). (2019). Methodological advances in investigating L2 writing processes [Special issue]. Studies in Second Language Acquisition, 41(3). https://doi.org/10.1017/S0272263119000329
Roquet, H., & Pérez-Vidal, C. (2017). Do productive skills improve in content and language integrated learning contexts? The case of writing. Applied Linguistics, 38(4), 489–511.
Sachs, R., & Polio, C. (2007). Learners' uses of two types of written feedback on an L2 writing revision task. Studies in Second Language Acquisition, 29(1), 67–100. https://doi.org/10.1017/S0272263107070039
Sánchez, A., Manchón, R. M., & Gilabert, R. (2020). The effects of task repetition across modalities and proficiency levels. In R. M. Manchón (Ed.), Writing and language learning: Advancing research agendas (pp. 121–143). John Benjamins. https://doi.org/10.1075/lllt.56.06san
Sasaki, M. (2004). A multiple-data analysis of the 3.5-year development of EFL student writers. Language Learning, 54(3), 525–582. https://doi.org/10.1111/j.0023-8333.2004.00264.x
Sasaki, M., Mizumoto, A., & Murakami, A. (2018). Developmental trajectories in L2 writing strategy use: A self-regulation perspective. Modern Language Journal, 102(2), 292–309. https://doi.org/10.1111/modl.12469
Schmitt, D. (2020). Can writing facilitate the development of a richer vocabulary? Advancing research agendas. In R. M. Manchón (Ed.), Writing and language learning. Advancing research agendas (pp. 357–380). John Benjamins. https://doi.org/10.1075/lllt.56.15sch
Serrano, R. (2011). The effect of program type and proficiency level on learners' written production. Revista Española de Lingüística Aplicada, 24, 211–226.
Smith, B. E., Pacheco, M. B., & De Almeida, C. R. (2017). Multimodal codemeshing: Bilingual adolescents' processes composing across models of language. Journal of Second Language Writing, 36, 6–22. https://doi.org/10.1016/j.jslw.2017.04.001
Suzuki, W. (2012). Written languaging, direct correction, and second language writing revision. Language Learning, 62(4), 1110–1133. https://doi.org/10.1111/j.1467-9922.2012.00720.x
Swain, M., & Lapkin, S. (1995). Problems in output and the cognitive processes they generate: A step towards second language learning. Applied Linguistics, 16(3), 371–391. https://doi.org/10.1093/applin/16.3.371
Van Waes, L., & Leijten, M. (2015). Fluency in writing: A multidimensional perspective on writing fluency applied to L1 and L2. Computers and Composition, 38, 79–95. https://doi.org/10.1016/j.compcom.2015.09.012
Vasylets, O., & Gilabert, R. (2022). Task effects across modalities. In R. M. Manchón & C. Polio (Eds.), The Routledge handbook of second language acquisition and writing (pp. 39–51). Routledge.


Vasylets, O., Gilabert, R., & Manchón, R. M. (2017). The effects of mode and task complexity on second language production. Language Learning, 67(2), 394–430. https://doi.org/10.1111/lang.12228
Vasylets, O., Gilabert, R., & Manchón, R. M. (2019). Differential contribution of oral and written modes to lexical, syntactic and propositional complexity in L2 performance in instructed contexts. Instructed Second Language Acquisition, 3(2), 206–227. https://doi.org/10.1558/isla.38289
Vasylets, O., Gilabert, R., & Manchón, R. M. (2020). Task modality, communicative adequacy and CAF measures. In R. M. Manchón (Ed.), Writing and language learning: Advancing research agendas (pp. 183–206). John Benjamins. https://doi.org/10.1075/lllt.56.08vas
Wigglesworth, G., & Storch, N. (2012). Feedback and writing development through collaboration: A socio-cultural approach. In R. M. Manchón (Ed.), L2 writing development: Multiple perspectives (pp. 69–101). De Gruyter Mouton. https://doi.org/10.1515/9781934078303.69
Wijers, M. (2018). The role of variation in L2 syntactic complexity: A case study on subordinate clauses in Swedish as a foreign language. Nordic Journal of Linguistics, 41, 75–116. https://doi.org/10.1017/S0332586517000233
Wolfe-Quintero, K., Inagaki, S., & Kim, H. Y. (1998). Second language development in writing: Measures of fluency, accuracy, & complexity. University of Hawai'i Press.
Yasuda, S. (2011). Genre-based tasks in foreign language writing: Developing writers' genre awareness, linguistic knowledge, and writing competence. Journal of Second Language Writing, 20(2), 111–133. https://doi.org/10.1016/j.jslw.2011.03.001
Yoon, H. J., & Polio, C. (2017). ESL students' linguistic development in two written genres. TESOL Quarterly, 51(2), 275–301. https://doi.org/10.1002/tesq.296
Zalbidea, J. (2017). 'One task fits all'? The roles of task complexity, modality, and working memory capacity in L2 performance. Modern Language Journal, 101(2), 335–352. https://doi.org/10.1111/modl.12389
Zalbidea, J. (2020a). A mixed-methods approach to exploring the L2 learning potential of writing versus speaking. In R. M. Manchón (Ed.), Writing and language learning. Advancing research agendas (pp. 207–230). John Benjamins. https://doi.org/10.1075/lllt.56.09zal
Zalbidea, J. (2020b). On the scope of output in SLA: Task modality, salience, L2 grammar noticing and development. Studies in Second Language Acquisition, 43(1), 50–82. https://doi.org/10.1017/S0272263120000261
Zalbidea, J., & Sanz, C. (2020). Does learner cognition count on modality? Working memory and L2 morphosyntactic achievement across oral and written tasks. Applied Psycholinguistics, 41(5), 1171–1196. https://doi.org/10.1017/S0142716420000442
Zhang, Z. (2020). Engaging with automated writing evaluation (AWE) feedback on L2 writing: Student perceptions and revisions. Assessing Writing, 43, 100439. https://doi.org/10.1016/j.asw.2019.100439
Zheng, Y., & Yu, S. (2018). Student engagement with teacher written corrective feedback in EFL writing: A case study of Chinese lower-proficiency students. Assessing Writing, 37, 13–24. https://doi.org/10.1016/j.asw.2018.03.001
Zhou, A. A., Busch, M., & Cumming, A. (2014). Do adult ESL learners' and their teachers' goals for improving grammar in writing correspond? Language Awareness, 23(3), 234–254. https://doi.org/10.1080/09658416.2012.758127

Chapter 14

Speaking
Complexity, accuracy, fluency, and functional adequacy (CAFFA)

Folkert Kuiken and Ineke Vedder
University of Amsterdam

This chapter focuses on the assessment of oral performance in a second language (L2), viewed from the perspective of task-based language assessment and instructed second language acquisition. The notion of L2 proficiency, as presented in the Common European Framework of Reference (Council of Europe, 2001), rests on two pillars that have to be taken into account when measuring L2 speaking: (1) the linguistic dimension, referring to the complexity, accuracy, and fluency (CAF) of the speaker's utterances; and (2) the communicative dimension, concerning the appropriateness and effectiveness of the message to be conveyed, labeled functional adequacy (FA). Whereas the majority of studies on the assessment of learner performance have focused on CAF, little attention has been devoted to the functional aspects of L2 performance. The rationale underlying the present chapter is that assessing L2 speaking is impossible without considering both CAF and FA (henceforth, CAFFA) and the mutual relationship between the two constructs. It is argued that L2 speaking should not only be assessed by measures along the CAF triad but also in terms of FA. The chapter discusses the theoretical underpinnings and applicability of a rating scale for FA (Kuiken & Vedder, 2017, 2018).

Keywords: L2 speaking, CAF (complexity, accuracy, fluency), FA (functional adequacy), CAFFA, rating scale

1. What is CAFFA and why is it important?

L2 speaking is a complex skill, consisting of different inter-related components, such as language knowledge (vocabulary, grammar), pronunciation skills (speech sounds, word stress, intonation), and language processing skills (reaction time) (Levelt, 1989). L2 speakers differ in oral proficiency: while the speech of some may be characterized by short sentences, hesitations, and self-corrections, others speak


fast and use long and flawless utterances. In general, L2 learners encounter more problems than L1 speakers in finding the right words, putting them in the right order, and articulating their utterances fluently and appropriately.
This chapter focuses on the assessment of oral performance by L2 speakers, viewed from the perspective of task-based language teaching (TBLT), task-based language assessment (TBLA), and instructed second language acquisition (ISLA). Oral proficiency rests on two pillars that have to be taken into account when measuring L2 speaking: the linguistic dimension, referring to the complexity, accuracy, and fluency (CAF) of a speaker's spoken output, and the communicative dimension, concerning the functional adequacy (FA) and effectiveness of the message to be conveyed (CEFR; Council of Europe, 2001). The rationale underlying this chapter is that assessing oral L2 performance is impossible without considering both CAF and FA (henceforth, CAFFA) and the mutual relationship and possible trade-offs between CAFFA dimensions (Kuiken & Vedder, 2017, 2018). If the primary goal of L2 speaking is to communicate successfully, L2 performance needs to be evaluated both with CAF indices and with measures of FA, to capture a wide array of learning outcomes associated with the accomplishment of real-world tasks.
Starting with Levelt's Model of Speech Production (1989), we first discuss the main (sub)components of speech production in L2, i.e., the process by means of which thoughts and concepts are translated into speech. In an overview of the literature on L2 speaking, we define the constructs of the CAF triad and the concept of FA, followed by a review of studies on the measurement of CAFFA in various settings. Next, recommendations for future CAFFA researchers are provided, followed by a discussion of future perspectives and methodological challenges for CAFFA research.

1.1 Levelt's model of speech production

The most widely accepted model in research on oral production is Levelt's Model of Speech Production (1989). Although the model was developed for the description of speech production of monolingual speakers in their L1, it has often been used (with some adaptations) to describe L2 speaking and/or speech production by multilingual speakers (De Bot, 1992; Izumi, 2003; Kormos, 2006; Yuan & Ellis, 2003). Levelt's Model of Speech Production aims at describing normal, spontaneous language production of adults in their L1. In the model the following components are distinguished (see Figure 1):
– A cognitive component (i.e., cognitive system), where general knowledge of the world (encyclopedic knowledge), together with more specific situational knowledge and verbal knowledge, is stored.




– A conceptualizer, where the selection and ordering of relevant information takes place and the communicative intentions of the speaker (preverbal messages) are pre-processed for being converted into language. For example, in an utterance like 'It is on platform four that the train from Amsterdam will arrive' (cf. De Bot, 1992, p. 5), preverbal knowledge (e.g., mental images of trains, railway stations, platforms, arrivals) is made ready to be passed to the formulator.
– A formulator, where the preverbal message is converted into a so-called speech plan. The lexical items needed in the utterance ('train,' 'platform,' 'Amsterdam,' etc.) are retrieved from the mental lexicon, together with the application of the grammatical and phonological rules (i.e., grammatical encoding and phonological encoding) connected to these items. The verb 'arrive,' for instance, requires a subject but no object, and adverbials of time and place are optional.
– An articulator, which transmutes the speech plan into actual speech (i.e., a phonetic plan) and makes sure that the utterance is actually pronounced by activating the speech mechanism, which leads to the production of the sentence 'It is on platform four that the train from Amsterdam will arrive.'

[Figure 1 is a flow diagram: the cognitive system exchanges information with the conceptualizer (message generation, monitoring); the conceptualizer outputs a preverbal message to the formulator (grammatical encoding, phonological encoding); the formulator outputs a phonetic plan to the articulator.]

Figure 1. Schematic representation of processes involved in speech output

Investigating the applicability of Levelt's model for L2 speaking, De Bot (1992) hypothesizes that the conceptualizer may not be completely language-specific, as assumed by Levelt, but only partly so. According to De Bot, the conceptualizer may also contain information about the specific language in which the utterance is to be produced. Through this information the relevant language-specific formulator is activated; the preverbal message is then converted into a speech plan and submitted to the articulator, which is thought to store the possible sounds and prosodic patterns of various languages.
While it can be argued that most aspects of L2 production can – with some adjustments – be explained by Levelt's Model of Speech Production, there are a number of particularities of L2 speaking that distinguish it from L1 speech: L2 vocabulary size may be less complete compared to L1, lexical retrieval time may be longer, and grammar and phonology may still be underdeveloped. In spite of these differences, Levelt's model is crucial for understanding speech production in both


L1 and L2. Examples of research employing this model are Yuan and Ellis (2003), who examined the effects of pre-task and online planning on L2 oral production, and Izumi (2003), who investigated the psycholinguistic mechanisms that underlie speech comprehension and production in L2.

2. What we know and what we need to know about CAFFA in ISLA

As mentioned before, in this chapter, assessment of oral performance in ISLA is considered from the perspective of TBLT and TBLA. TBLA highlights the assessment of L2 performance elicited by real-life speaking tasks as vehicles for authentic, goal-directed, and meaning-focused performance (Long, 2015): a phone call to a friend, a complaint about noise from the neighbors, a pitch presentation of a research project. In order to provide learners with the opportunity to engage in situated communicative interaction, assessment tasks need to be designed carefully in terms of discourse context, the relationship between addressee and listener, speech acts, etc. (González-Lloret, 2022).
Oral performance elicited by communicative language tasks has been frequently assessed in terms of CAF. Analyses have mostly focused on whether particular task characteristics, such as the number of participants, the amount of planning time, or the number of different elements to be taken into account, elicited longer or shorter clauses, more or less varied lexicon, or faster or slower speech (cf. Robinson, 2001). Typical research questions in this type of research concerned how different dimensions of the CAF triad evolved over time or varied across task conditions (Housen et al., 2012). As pointed out by Pallotti (2009), much less attention has been devoted to what extent communication was adequate. Questions such as 'Is the message adequate and effective?' and 'Does it achieve its goals?' have seldom been asked. More recently, however, researchers have begun to include FA in their investigations, and the importance of also assessing FA as an essential component of oral and written L2 performance has been emphasized by several authors (e.g., Kuiken et al., 2010; Pallotti, 2009; Révész et al., 2016).

2.1 CAFFA: Definition of constructs and operationalizations

The three CAF constructs (i.e., complexity, accuracy, fluency) are multilayered, multifaceted, and multidimensional, a fact that has sometimes not been sufficiently acknowledged in the CAF literature. Moreover, notwithstanding the widespread use of CAF measures in L2 research, across studies the terms complexity, accuracy, and fluency have often been used with different meanings.




As pointed out by Bulté and Housen (2012), linguistic complexity as a dimension of L2 complexity can be divided into lexical, morphological, syntactic, and phonological complexity. The most frequently and intensively measured component is syntactic complexity – and to a lesser degree lexical complexity – whereas morphological and phonological complexity have been investigated much less. Syntactic complexity, referring to the degree of elaboration, size, breadth, width, and richness of the learner's L2 system, has typically been assessed by means of length-based measures of overall complexity (e.g., mean sentence length) and measures of subordination (e.g., number of subordinate clauses). Criticisms were soon raised against this reductionist approach to L2 complexity (Bulté & Housen, 2012; Housen et al., 2012; Norris & Ortega, 2009; Pallotti, 2009, 2015). Instead of more global measures, the use of more fine-grained and sophisticated measures was proposed that address other syntactic levels (e.g., the phrasal level) and other domains, like morphological complexity (Brezina & Pallotti, 2019; De Clercq & Housen, 2019), propositional complexity (Vasylets et al., 2019), and phraseological complexity (Paquot, 2019).
Accuracy is probably the most straightforward component of the CAF triad, referring to the extent to which an L2 learner's performance deviates from a norm (i.e., the native speaker). These deviations with respect to the native norm are usually labeled errors (Wolfe-Quintero et al., 1998). Accuracy has often been measured by means of holistic scales, providing a general impression of accuracy, or by global measures, like the number of error-free clauses or the number of errors per 100 words. More specific accuracy measures have also been employed, which focus on a specific feature (e.g., inflection errors, agreement errors, wrong article use) or weigh the severity of the error (Foster & Wigglesworth, 2016).
Three subdimensions of fluency are often distinguished (cf. De Jong et al., 2012; Révész et al., 2016): speed fluency (referring to the rate and density of linguistic units produced), breakdown fluency (number, length, and location of pauses), and repair fluency (false starts, self-corrections). Thus defined, fluency is mainly a phonological phenomenon that belongs to the spoken language, in contrast to complexity and accuracy, which can manifest themselves at all levels (i.e., the phonological, lexical, morphological, syntactic, and socio-pragmatic levels; Housen et al., 2012; Michel, 2017). Fluency is generally assessed by listener judgments or by time-based measures calculating the ratio of syllables, or the number of silent and filled pauses or repairs, per second or per 100 words.
Contrary to CAF, FA has been investigated much less. Similarly to the CAF constructs, FA is a multidimensional and multilayered concept, which has been interpreted in various ways: as successful information transfer (Upshur & Turner, 1995), pragmatic appropriateness (McNamara & Roever, 2007), text coherence and cohesion (Knoch, 2009), discursive practice and adequacy in oral communication


(Ekiert et al., 2018; Révész et al., 2016), and successful task performance (De Jong et al., 2012). In line with the conceptualization of FA of De Jong et al. (2012), Kuiken and Vedder (2017, 2018) view FA as a task-related construct, considered within the framework of TBLT, TBLA, and ISLA, and defined in terms of successful task completion. In order to assess the FA of L2 performance, an FA rating scale was developed (Kuiken & Vedder, 2017, 2018) in which four dimensions of FA were distinguished. More information about the FA rating scale is provided in Section 3.2.
For an overview of the various CAF and FA constructs that have been discussed and examples of assessment measures that are often employed in language pedagogy and classroom practice research, we refer to Table 1. In Section 3.1 we will illustrate the use of these measures by means of some recent studies.

Table 1. Constructs and examples of operationalizations

Operationalization

Syntactic complexity

Overall complexity – Length based measures: Mean length of utterance, mean length of AS-unit – Ratios: S-nodes/AS-unit Complexity by coordination – Coordination index: Coordinate clauses/clauses Complexity by subordination – Subordination index: Subordinate clauses/AS-unit Phrasal complexity – Mean length of clause, number and length of postmodifying noun phrases Variety and sophistication of forms – Frequency of infinitival phrases, passive forms, conditionals

Morphological complexity

Inflectional complexity – Frequency of tensed forms, different verb forms, past tense forms Derivational complexity – Measure of affixation

Lexical complexity

Diversity – TTR (Type/Token Ratio), Guiraud Index, D Density – Lexical words/total words, lexical words/function words Richness – Number of unique words Sophistication – Less frequent words/total words

Chapter 14. Speaking 335



Table 1.  (continued) Construct

Operationalization

Accuracy

Overall accuracy – Error-free T-units, errors/100 words, error-free clauses/clauses Specific measures – Errors in subject-verb agreement, tense-aspect forms, modal verbs, connectors

Fluency

Speed fluency – Phonation time ratio, articulation rate, syllables/minute Breakdown fluency – Number of silent pauses/100 words, number of filled pauses/100 words Repair fluency – False starts/100 words, self-repairs/100 words, repetitions/100 words

Functional adequacy

Task requirements – Have the task requirements been fulfilled successfully (e.g., genre, task type, speech acts, register, addressee)? Content – Is the number of ideas provided in the text adequate and are they consistent with each other? Comprehensibility – How much effort is required to understand text purpose and ideas? Coherence and cohesion – Is the text coherent and cohesive (e.g., use of strategies for coherence, cohesive devices)?
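Many of the operationalizations in Table 1 are simple ratios over counts taken from a transcribed and annotated performance. As a concrete, hypothetical illustration – our own sketch, not code from any of the studies discussed in this chapter – the following Python fragment shows how three such measures might be computed; the clause and error counts are assumed to have been supplied by a human analyst or an annotation tool.

import math
import re

def guiraud_index(text: str) -> float:
    """Lexical diversity: types / sqrt(tokens) (Guiraud Index, Table 1)."""
    tokens = re.findall(r"[a-zA-Z']+", text.lower())
    return len(set(tokens)) / math.sqrt(len(tokens)) if tokens else 0.0

def subordination_index(subordinate_clauses: int, as_units: int) -> float:
    """Complexity by subordination: subordinate clauses per AS-unit."""
    return subordinate_clauses / as_units if as_units else 0.0

def errors_per_100_words(n_errors: int, n_words: int) -> float:
    """Global accuracy: errors per 100 words."""
    return 100 * n_errors / n_words if n_words else 0.0

# Hypothetical sample with analyst-supplied counts
transcript = ("well I think the man in the picture is trying to open "
              "the door because he lost his keys")
print(round(guiraud_index(transcript), 2))
print(subordination_index(subordinate_clauses=1, as_units=1))
print(errors_per_100_words(n_errors=2, n_words=90))

For larger datasets, validated automated tools such as those discussed in Section 4.3 are generally preferable to hand-rolled scripts of this kind.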

2.2 CAFFA: Studies on the assessment of oral performance

In this section we review a number of CAFFA studies on oral L2 performance. These studies illustrate the need to select the right assessment measures, depending on task modality, holistic versus fine-grained assessment, level of proficiency, longitudinal versus cross-sectional research, task type, and the relationship between CAF and FA.

Vasylets et al. (2019) conducted a study among 290 instructed L2 learners of English with Spanish and/or Catalan as their native language. The aim of the research was to investigate how the manifestation of syntactic, lexical, and propositional complexity was moderated by task modality (oral vs. written). The participants undertook an oral and a written narrative video-retelling task. Syntactic complexity was calculated both by length-based measures (i.e., mean length of AS-unit or T-unit) and by measures of subordination, coordination, and phrasal complexity. Lexical complexity was computed in terms of lexical density, diversity, richness, and sophistication. The analysis revealed that, compared to the written texts, the oral retellings showed lower scores for both syntactic and lexical complexity. Differences between the two modalities were also observed in the way speakers and writers conveyed the propositional content of the task, measured in terms of the occurrences of single and extended idea units. The oral retellings contained more extended idea units, but a higher number of words per idea unit. A possible explanation is that in speaking, online time pressure may induce speakers to use more loosely connected ideas, in contrast to writers, who can create more complex, informationally dense ideas.

De Clercq and Housen (2019) investigated the development of morphological complexity in oral narratives in L2 French and L2 English at four levels of L2 proficiency to examine how traditional global complexity measures could be supplemented by specific measures tapping into inflectional, derivational, and overall morphological complexity. The results indicated considerable increases in morphological complexity in L2 French, contrary to L2 English, where increases were found only at the two lower proficiency levels. These findings demonstrate the usefulness of fine-grained morphological complexity measures for assessing oral performance, especially in languages with a rich morphological system, such as French.

Lahmann et al. (2019) studied L1 German speakers who emigrated to English-speaking countries in 1938–1939, between the ages of seven and seventeen, to escape the Nazi regime. The participants, who were highly advanced speakers of L2 English, were interviewed either in German or in English. The interviews in German were used to investigate possible L1 attrition, while the interviews in English were employed to investigate L2 knowledge. The corpus consisted of 73 L1 German attriters and 102 L2 English learners. The authors advocate the necessity of a definition and operationalization of complexity that can also be applied to very advanced levels of spoken language production. They acknowledge the utility of holistic measures of syntactic and lexical complexity, but at the same time stress the importance of more fine-grained approaches to capture the syntactic and lexical characteristics of highly proficient L2 English speakers and L1 German attriters.

Vercellotti (2019) discusses the findings of a longitudinal study over three academic semesters in a pre-university intensive English program, conducted to examine the development of syntactic complexity in the speech of 66 L2 English learners (i.e., oral monologues produced during classroom-based activities). The author observed an overall growth of syntactic complexity over time, assessed by means of both commonly used overall measures of syntactic complexity (e.g., length of AS-unit, clause length, subordination index) and more specific syntactic measures (e.g., syntactic variety and weight of complexity scores).




In a cross-sectional study, Lambert and Nakamura (2019) investigated developmental variation in syntactic and lexical complexity by comparing the oral performance of 36 L2 learners of English at different proficiency levels (intermediate and advanced) in completing six picture-based information gap tasks. The authors found that the four types of clause combination strategies that were examined (coordination and nominal, adverbial, and relative subordination) varied with proficiency level, implying that decisions about the type of assessment measure should always be motivated by, for example, proficiency level and task type.

Révész et al. (2016) looked at whether the relationship between linguistic features and FA varied across different oral tasks. In the study, a task-dependent and a task-independent rating scale for FA were employed, together with a wide array of CAF measures. Eighty learners of English, divided over four equal groups of different proficiency levels, completed five speaking tasks (i.e., complaint, refusal, narration, advice, and summary). The authors found that fluency was the strongest predictor of FA, with other dimensions, such as lexical diversity, grammatical and connector accuracy, and syntactic complexity, also playing a role, albeit a smaller one. These effects were not moderated by the task variable, which means that the relationship between CAF and FA remains constant across tasks, at least with these general measures. However, in a follow-up study based on the same participants (Ekiert et al., 2018), in which the researchers focused on the complaint, refusal, and advice tasks, this earlier result (no effect of task type) was confirmed only for highly proficient learners, but not for less proficient learners, who appeared to struggle especially with the refusal task.

3. Data elicitation and interpretation

In this section we first present some guidelines for conducting CAFFA research (Section 3.1), followed by two exemplar studies: in Section 3.2 we present the WISP project, whereas in Section 3.3 we describe the rating scale for FA.

3.1 Guidelines for conducting CAFFA research

The guidelines for conducting CAFFA research, presented in Box 1, refer to (1) type of research and the need for developmental, longitudinal research in combination with cross-sectional studies; (2) the necessity of cross-linguistic research involving various target and source languages; (3) the selection of appropriate measures to assess CAFFA; (4) the possible impact of task type; and (5) the importance of rater instruction and rater training.


Box 1. Guidelines for conducting CAFFA research

Type of research
– Notice that individual learners may follow developmental paths that do not coincide with an observed group trend.
– Try to combine findings from cross-sectional studies with data from longitudinal studies to identify individual patterns of how L2 speaking proficiency develops.

Target and source languages
– Keep in mind that the speaking performance of learners with different source and target languages may vary.
– Use native speaker data as a benchmark by means of which L2 learners can be compared.

Measures
– Consider carefully which measures to include in the research design.
– Use valid, reliable, and non-redundant measures.
– Use measures that are appropriate with respect to the proficiency level of the participants.
– Use measures that are suitable to capture development over time.
– If possible, make use of automated tools for assessing speaking proficiency.

Task type
– Realize that task type and task complexity may affect learning outcomes.
– Carefully select language tasks and interpret outcomes in view of the cognitive complexity of the task.

Rater training
– When assessing oral proficiency, rater training is necessary.
– Provide feedback to raters during the various stages of training.

These guidelines are further elaborated in Section 4.

3.2 The WISP project

In what follows we present two studies (De Jong et al., 2012; Hulstijn et al., 2012) that resulted from the WISP project ('What Is Speaking Proficiency?'). The project was conducted at the University of Amsterdam with the aim of assessing the relationship between various CAFFA components and the influence of task complexity on L2 and L1 speaking, by employing theoretically motivated procedures, assessment measures, and tools for eliciting and analyzing oral data. A concise description of the project can be found in Box 2.




Box 2. The WISP project

Sample studies: De Jong et al. (2012); Hulstijn et al. (2012)

Participants (both studies): 208 NNS and 59 NS of Dutch.

Tasks (both studies): Four simple and four complex speaking tasks, involving role-play monologues addressed to the computer, which contrasted in complexity, formality, and discourse type. The task instructions specifically mentioned the audience that the participants should address in each task.

De Jong et al. (2012)
Goal: To investigate the effects of task complexity on the oral performance of native speakers (NS) and non-native speakers (NNS), in terms of fluency, lexical diversity, and FA.
Measures: Three types of fluency measures (breakdown, speed, and repair fluency) and lexical diversity; a task-related rating scale for FA of six levels, based on the CEFR, containing descriptors pertaining to the amount and detail of information conveyed, relevant to the topic, setting (formal/informal), and discourse type (descriptive/persuasive), and the intelligibility of the answer.
Results: (1) For NNS a significant difference was obtained in breakdown fluency (smaller phonation time ratio and more filled pauses in complex than in simple tasks) and in repair fluency (more repairs in complex tasks), while there was no significant difference for speed fluency (articulation rate). For NS no effect on silent pausing was found, but they used more filled pauses and had higher articulation rates in complex than in simple tasks. (2) Higher lexical diversity for both NS and NNS in complex speaking tasks compared to simple tasks.

Hulstijn et al. (2012)
Goal: To examine the relationship between CAF (i.e., grammatical accuracy, syntactic complexity, lexical diversity) and FA on the one hand and global CEFR levels B1 and B2 on the other hand.
Measures: Measures of declarative knowledge (productive vocabulary and grammar knowledge), speed of processing (speed of lexical retrieval, articulation, sentence building), and pronunciation (quality of vowels, diphthongs, consonants, intonation, word stress).
Results: (1) CAF measures of grammatical accuracy and lexical diversity correlate with the global CEFR levels B1 and B2. (2) Syntactic complexity plays a small role and only at higher proficiency levels.
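Several of the fluency measures in Box 2 (e.g., phonation time ratio, articulation rate, silent pausing) are derived from time-aligned annotations of the speech signal. The following Python sketch is a hypothetical illustration of how such measures might be computed once speech and pause intervals are available; the interval format and sample values are our assumptions, not data or code from the WISP project. In practice, such intervals would typically be obtained with annotation software such as Praat.

from dataclasses import dataclass

@dataclass
class Interval:
    start: float  # seconds
    end: float    # seconds
    kind: str     # "speech" or "silent_pause"

def fluency_measures(intervals, n_syllables: int):
    """Compute basic speed and breakdown fluency measures."""
    total_time = sum(i.end - i.start for i in intervals)
    phonation = sum(i.end - i.start for i in intervals if i.kind == "speech")
    pauses = [i for i in intervals if i.kind == "silent_pause"]
    return {
        "phonation_time_ratio": phonation / total_time,      # breakdown fluency
        "articulation_rate": n_syllables / phonation,        # speed fluency (syll/s)
        "n_silent_pauses": len(pauses),
        "mean_pause_duration": (sum(p.end - p.start for p in pauses) / len(pauses)
                                if pauses else 0.0),
    }

# Hypothetical annotated 10-second sample
sample = [Interval(0.0, 3.2, "speech"), Interval(3.2, 4.1, "silent_pause"),
          Interval(4.1, 8.0, "speech"), Interval(8.0, 8.6, "silent_pause"),
          Interval(8.6, 10.0, "speech")]
print(fluency_measures(sample, n_syllables=38))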


3.3 Development of a rating scale for FA

We will now describe the steps by which we developed a rating scale for FA. Following the recommendations by Pallotti (2009), Kuiken et al. (2010) included FA in their study on the relationship between CAF and FA for three different languages – Dutch L2 (N = 34), Italian L2 (N = 42), and Spanish L2 (N = 27) – in two written decision-making tasks. All texts were rated using two holistic scales, one for linguistic complexity and accuracy, the other for FA, inspired by descriptor scales derived from the CEFR. The tasks were judged by expert raters, who were native speakers of the target languages (Dutch: N = 4; Italian: N = 3; Spanish: N = 3). An important outcome of the study was that the CAF dimensions and FA in the three languages appeared to be connected. Particularly accuracy and lexical diversity were found to be strong predictors of adequacy ratings, while syntactic complexity had little or no effect.

On the basis of the findings of this preliminary study, Kuiken and Vedder (2017, 2018) presented a holistic six-point Likert rating scale for FA. The construct of FA is related to Grice's (1975) conversational maxims of quantity, relation, manner, and quality. FA is defined in terms of successful task fulfillment, focusing on the adequacy of L2 production in relation to a specific social context and target task, interlocutor, speech act, register, and task modality. In the rating scale, four dimensions of FA are distinguished: Task requirements; Content; Comprehensibility; and Coherence and cohesion:

– Task requirements: Have the requirements of the task been fulfilled successfully (e.g., genre, task type, speech acts, register, addressee)?
– Content: Is the number of ideas provided in the text adequate and are they consistent with each other?
– Comprehensibility: How much effort is required to understand text purpose and ideas?
– Coherence and cohesion: Is the text coherent and cohesive (e.g., use of strategies for coherence, cohesive devices)?

In order to test out the FA rating scale, the written data discussed above (Kuiken et al., 2010) were presented to non-expert raters (Kuiken & Vedder, 2017). All of them were university students of about the same age as the participants involved in the study (Dutch: N = 4, Italian: N = 4). Interrater agreement in terms of intraclass correlation coefficients varied from acceptable (0.73) to excellent (0.94).

The FA rating scale was also tested out for the assessment of oral data. The same decision-making tasks were presented in an oral task modality (Kuiken & Vedder, 2018) to learners of Dutch L2 (N = 22, level A2-B2) and Italian L2 (N = 26, level A2-C1). In the study on the speaking data, non-expert raters were asked to assess FA by means of the FA rating scale; these were all advanced university students and native speakers of the target language (Dutch: N = 4; Italian: N = 4). The rating scale had been slightly adapted for the assessment of the oral tasks: designations such as 'text', 'writer', and 'reader' used in the FA scale for writing were replaced by 'performance', 'speaker', and 'listener' in the scale for speaking. Intraclass correlations among the raters on the spoken texts varied for the two languages from good (0.86) to excellent (0.90). The two studies thus showed that the rating scale is a valid, reliable, and useful instrument for assessing oral and written L2 performance. For an overview of studies in which the FA rating scales have been employed, we refer to Kuiken and Vedder (2022).

A2-C1). In the study on the speaking data, non-expert raters were asked to assess FA by means of the FA rating scale; these were all advanced university students and native speakers of the target language (Dutch: N = 4; Italian: N = 4). The rating scale had been slightly adapted for the assessment of the oral tasks: designations such as ‘text’, ‘writer’, and ‘reader’ used in the FA scale for writing, were replaced by ‘per­ formance’, ‘speaker’, and ‘listener’ in the scale for speaking. Intraclass correlations among the raters on the spoken texts varied for the two languages from good (0.86) to excellent (0.90). The two studies thus showed that the rating scale proved to be a valid, reliable, and useful instrument for assessing oral and written L2 performance. For an overview of studies in which the FA rating scales have been employed, we refer to Kuiken and Vedder (2022). 4. Advice to future CAFFA researchers As demonstrated by the review of CAFFA studies in Section 2.2 and the two case studies in Section 3.2 and 3.3, researchers have to make a number of theoretically motivated decisions and may encounter possible obstacles, setbacks, and pitfalls. This section elaborates on the guidelines for conducting CAFFA research that have been presented in Box 1 (Section 3.1). 4.1

Type of research

The majority of studies on L2 speaking have used a cross-sectional design, which makes it difficult to detect individual and developmental patterns of oral proficiency. Moreover, research has shown that although the increase in oral proficiency at the group level may be fairly linear, at the level of individual speakers there is a high degree of variability: individual learners follow different developmental paths that often do not coincide with observed mean group trends (De Clercq & Housen, 2019; Lahmann et al., 2019). Referring to Wolfe-Quintero et al. (1998), who demonstrated that many – if not all – aspects of language development are non-linear, Larsen-Freeman (2009) also called for research in which difference and variation occupy a central role. For these reasons, more attention should be paid to inter-individual variation, with a stronger focus on the different developmental patterns of individual learners. This means that cross-sectional studies must be complemented by or combined with longitudinal studies to identify individual patterns of how L2 speaking proficiency develops.


4.2 Target and source languages

What is needed, next to developmental, longitudinal research (in combination with cross-sectional studies), is cross-linguistic research in which NS and NNS with various language backgrounds participate. NS data are needed as a benchmark against which the oral proficiency of L2 learners can be compared. Kuiken and Vedder (2017) observed in a study on written performance that L2 Italian participants received overall higher mean scores on FA than L2 Dutch learners. For speaking, on the other hand (Kuiken & Vedder, 2018), mean scores on all four dimensions of FA were more or less the same for both groups. Surprisingly, however, standard deviations in both writing and speaking were higher for the L2 Italian learners (all with Dutch as L1), who displayed more interindividual variation than the L2 Dutch group (with various language backgrounds). This shows that the speaking performance on the same tasks of learners with similar proficiency levels, but with different source and target languages, may vary.

Another interesting finding was reported by Ekiert et al. (2022). In a study on Japanese and Spanish learners of L2 English, Révész et al. (2016) had concluded that a higher frequency of filled pauses was associated with lower FA. However, this result was not replicated in the follow-up study (Ekiert et al., 2022), from which the Japanese learners were excluded. The authors attribute this difference to the fact that, in Japanese, filled pauses are far more dominant than in English, so the L1 Japanese speakers seemed to overuse them in English. In Spanish, on the other hand, the use of silent and filled pauses is much closer to that in English. Therefore, the L1 Spanish speakers might have used filled pauses in more target-like ways than their L1 Japanese counterparts.

These studies illustrate how cross-linguistic differences and similarities between different target and source languages may affect oral proficiency. It is, therefore, advisable to take this evidence into consideration when assessing speaking performance.

4.3 Measures

Assessing oral proficiency can be done in many different ways, by focusing on one or more CAFFA components. With regard to complexity, we have seen in Section 2.1 that early research was restricted to syntactic and lexical complexity, mainly relying on length-based measures of overall complexity, measures of subordination, and indices of lexical diversity. The later advent of automated complexity and natural language processing tools, like Coh-Metrix (Graesser et al., 2004), Lu's (2010) L2 Syntactical Complexity Analyzer, and Kyle's (2016) Tool for the Automatic Analysis of Syntactic Sophistication and Complexity (TAASSC), has led to a substantial growth in the number of complexity measures. These tools make it possible to assess the complexity of language performance in a relatively short time by means of sometimes more than 100 measures. What remains to be seen, however, is the extent to which these measures may be redundant (i.e., measuring the same construct) and to what extent they can function as an index of L2 development. As hypothesized by Ortega (2003), beginning and intermediate L2 learners may prefer complexity by coordination and subordination, while phrasal complexity may be favored at more advanced levels of L2 proficiency. In a similar vein, researchers have shown that some measures may level off at later stages, e.g., mean length of utterance and morphological complexity (Brezina & Pallotti, 2019). At these later stages (CEFR levels B2-C2), other measures may be more suited to describe L2 performance, e.g., phraseological sophistication (Paquot, 2019).

Compared to complexity, accuracy might seem easier to assess. There are, however, many occasions in which the judgment of whether a particular utterance should be rated as correct, less appropriate, or inaccurate is not that straightforward. L1 speakers, too, often do not agree on what is accurate or acceptable. Another problem is that it might be difficult to determine at which stage of the acquisition process an L2 learner should be able to show accurate use of a particular structure or master the right pronunciation and intonation.

Fluency, undoubtedly, also affects oral proficiency (see Section 2.1). What has to be sorted out is the role of the three main components (breakdown, speed, and repair fluency) and the various subcomponents (false starts, repetitions, reformulations, syllables per minute, filled and unfilled pauses). Concerning the relationship between fluency and FA, Révész et al. (2016) concluded that, depending on proficiency, only repair fluency made a differential impact on FA: higher scores on FA were associated with a lower incidence of false starts in advanced L2 users' speech. Based on a subset of these data, Ekiert et al. (2022) found that the fewer silent pauses L2 speakers produce between clauses, the more functionally adequate they are perceived to be (see also Section 4.2).

With this in mind, it is understandable that researchers may be overwhelmed by the plethora of measures that exist and that they struggle with the question of which to include in their study. It is not possible to give a specific answer to that question, but in general we would advise CAFFA researchers to carefully consider which measures they include in their design, by reflecting on the following questions:

– How valid and reliable are the measures employed in the study in relation to the construct(s) to be assessed?
– Is absence of redundancy among the selected measures guaranteed?
– Are the measures appropriate with respect to the proficiency level of the participants and/or are they suitable to capture development over time?
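With respect to the redundancy question, a practical first screening step is to correlate candidate measures with one another before fixing the final set. The Python sketch below is a hypothetical illustration; the data frame, measure names, scores, and threshold are all invented. A high correlation does not by itself prove redundancy, but it flags pairs of indices that may be tapping the same underlying construct.

import pandas as pd

# Invented scores for four candidate complexity measures across five speakers
scores = pd.DataFrame({
    "mean_length_AS_unit": [8.2, 10.1, 12.4, 9.3, 11.0],
    "clauses_per_AS_unit": [1.3, 1.6, 1.9, 1.4, 1.7],
    "mean_length_clause":  [6.3, 6.3, 6.5, 6.6, 6.5],
    "guiraud_index":       [5.1, 6.0, 6.8, 5.4, 6.2],
})

corr = scores.corr()  # pairwise Pearson correlations
threshold = 0.90      # arbitrary cut-off for flagging candidate redundancy
pairs = [(a, b, round(corr.loc[a, b], 2))
         for i, a in enumerate(corr.columns)
         for b in corr.columns[i + 1:]
         if abs(corr.loc[a, b]) > threshold]
print(pairs)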


4.4 Task type

L2 users have to perform many different language tasks, varying from filling out a simple form to filing a complaint against a rent increase. As demonstrated in Section 2.2, in L2 speaking research participants have been presented with a similar variety of oral tasks, including descriptive, narrative, and instructive tasks, next to more demanding assignments like refusals, opinion gap activities, and argumentative and problem-solving tasks. The question of whether type of task affects learning outcomes has been approached from different frameworks, in particular Skehan's (1996) Limited Attentional Capacity Model, or Trade-off Hypothesis, and Robinson's (2001) Triadic Componential Framework for examining task influences on L2 production, also known as the Cognition Hypothesis. Both have provided evidence for an influence of task type.

Foster and Skehan (1996) presented three different tasks (personal information exchange, narrative, and decision-making) to 32 pre-intermediate level students (18–30 years old) studying English as a foreign language under three different conditions (unplanned, detailed planning, planned but without detail). Interactions were found between task type and planning conditions, such that the effects of planning were greater with the narrative and decision-making tasks than with the personal information exchange task.

An example of a study lending (partial) support to the Cognition Hypothesis was carried out by Gilabert et al. (2011). They asked 42 Spanish/Catalan-speaking learners of L2 English to perform simple and complex versions of a narrative task, an instruction-giving map task, and a decision-making task. Task complexity had a different impact on each type of task, with learners being more accurate on the complex narrative task, and less fluent but more lexically rich and accurate on the complex instruction-giving task. Task complexity had little or no influence on the decision-making task. It also appeared that each task type triggered very different types of discourse.

These studies show that task type, in combination or not with other variables (like planning), may affect oral performance. Researchers are, therefore, recommended to carefully select the language task(s) in their study and to interpret the outcomes in view of the cognitive complexity of the task.

4.5 Rater training

When assessing oral proficiency, rater training is necessary. Rater training is a means of familiarizing raters with CAFFA measures, scale dimensions, levels, and descriptors; it leads to a more effective use of scales and rubrics and to higher reliability and validity (Pill & Smart, 2021; Rezaei & Lovorn, 2010). As claimed by Pill and Smart (2021, p. 136), "training is necessary to establish appropriate rater behavior and thereby develop rating expertise". Crucial in rater training sessions is that raters receive feedback during the various stages of training. This holds for both CAF and FA, but the assessment of FA has sometimes been found to be more difficult for raters. In a study on the communicative competence of German learners of English (Timpe, 2013; Timpe-Laughlin, 2018), it was found that while lexical and grammatical mistakes were easily identified by raters, they struggled with rating the appropriateness of L2 performance. The latter is in line with González-Lloret (2022), who signaled difficulties in assessing pragmatic features, as well as with Kuiken and Vedder (2022), who have recommended rater training when judging FA. Rater training does not have to be a lengthy process, as Kuiken and Vedder (2014, 2017) found that – for both expert and non-expert raters – two short training sessions sufficed to obtain acceptable to excellent interrater agreement scores on the FA rating scale.

5. Troubleshooting CAFFA research in ISLA

During CAFFA research some hurdles may arise. The first difficulty concerns the use of rating scales in studies that include both L1 and L2 speakers. Second, there is the question of to what extent rating oral proficiency depends on rater expertise and L1 speaker status. Third, the use of adapted rating scales may raise problems for construct validity.

Pill and Smart (2021, p. 136) “training is necessary to establish appropriate rater behavior and thereby develop rating expertise”. Crucial in rater training sessions is that raters receive feedback during the various stages of training. This holds for both CAF and FA, but assessment of FA has sometimes been found to be more difficult for raters. In a study on the communicative competence of German learners of English (Timpe, 2013; Timpe-Laughlin, 2018), it was found that while lexical and grammatical mistakes were easily determined by raters, they struggled with rating the appropriateness of L2 performance. The latter is in line with González-Lloret (2022), who signaled difficulties in assessing pragmatic features, as well as with Kuiken and Vedder (2022) who have recommended rater training when judging FA. Rater training does not have to be a lengthy process, as Kuiken and Vedder (2014, 2017) found that – for both expert and non-expert raters – two short train­ ing sessions fulfilled the aim to obtain acceptable or excellent interrater agreement scores on the FA rating scale. 5. Troubleshooting CAFFA research in ISLA During CAFFA research some hurdles may arise. The first difficulty concerns the use of rating scales in studies that include both L1 and L2 speakers. Secondly, there is the question to what extent rating oral proficiency depends on rater expertise and L1 speaker status. Third, the use of adapted rating scales may raise problems for construct validity. 5.1

L1 and L2 speakers in one and the same study

CAFFA measures can be used to assess the oral proficiency of NS as well as of NNS. As mentioned in Section 4.2, when assessing NNS, NS data are useful as they form a benchmark against which NNS results can be weighed. This is the approach that has been followed in, for example, the WISP project described in Section 3.2. Of the studies in which the FA rating scale has been used, some have included both NS and NNS, whereas others have focused on either NS or NNS (for an overview see Kuiken & Vedder, 2022).

In research on the use of the FA rating scale in which NS and NNS were included in one and the same study, some researchers have reported difficulties. Kuiken and Vedder (2014) noticed in a study on the written performance of Dutch (NS = 17, NNS = 32) and Italian (NS = 18, NNS = 39) that NS tended to score at the higher end of the scale. In general, however, raters reported that they attempted to use all levels of the FA rating scale, although they were more consistent in judging NNS performance than NS output. Nuzzo and Bove (2020) came to a similar conclusion in their study with 20 NS and 20 NNS of Italian. NS were scored at the higher scale levels, such that this range restriction resulted in lower interrater correlation and alpha values. However, in another study focusing on the use of the FA rating scale by exclusively NS of Italian (N = 30), the authors observed a wider range in the use of the scale levels, resulting in satisfactory levels of reliability (Nuzzo & Bove, 2022). As the three latter studies all concern written performance, it remains an open question whether what has been observed for L2 writing also holds for speaking. It is nonetheless recommended that researchers pay attention to this issue and take these results into consideration when assessing the oral proficiency of both NS and NNS in one and the same study.

5.2 Rater variables

In Section 4.5 the relevance of rater training was emphasized. Another often debated issue concerns raters' backgrounds: should they be experienced language teachers, or can laypersons be asked to do the job? And what about their language proficiency: should they be NS of the target language, or can NNS also assess L2 oral proficiency?

According to Pill and Smart (2021, p. 136), "an expert rater is assumed to do a better job than a non-expert rater, but there are different types of expertise to consider." They conclude that, in general, experienced raters are more consistent than novices, although some may become adept at rating through training and experience. However, next to expertise in the process of rating, expertise in the specific linguistic domain being rated is also helpful. Duijm et al. (2018) found that both trained professional raters and untrained non-professional raters recognized variation in fluency and accuracy across a set of spoken performances, with the trained raters focusing relatively more on accuracy and the untrained raters more on fluency. With regard to the language background of the raters, their L1 is generally viewed as irrelevant, but a sufficient level of proficiency in the L2 to be rated is needed. Zhang and Elder (2014) found that teacher raters with English as an L2 delivered ratings similar to those of teacher raters with English as an L1 (who underwent parallel training in using the rating scale), although the two groups approached the rating process somewhat differently.

In the studies in which the FA rating scale has been used, both expert and non-expert raters have been engaged, generally resulting in acceptable to excellent interrater reliability scores. In studies where low interrater agreement was reported, this was not related to (a lack of) rating experience, but to problems mentioned above, such as a lack of rater training or difficulties in assessing NS when combined with NNS in one and the same study. It should furthermore be noted that the non-expert raters who participated in these studies were mostly university students, often L2 learners of the target language under investigation, who could be assumed to have fairly good knowledge of that language but did not necessarily have experience in rating L2 performance. This, again, stresses the importance of rater training.

5.3 Scale adaptations

As described in Section 3.3, the FA rating scale developed by Kuiken and Vedder (2017, 2018) consists of four dimensions: Task requirements; Content; Comprehensibility; Coherence and cohesion. Some researchers have made adaptations, either by reducing the scale, e.g., Strobl and Baten (2022), who combined Task requirements and Content into one dimension, or by extending the scale, e.g., Herraiz Martínez (2018) and Herraiz Martínez and Alcón Soler (2019), who separated Coherence from Cohesion. Others have merged the outcomes on the subdimensions into a composite score for FA. There may be valid reasons for each of these adaptations, but the use of different dimensions, both separate and composite, is also hazardous. As Loewen (2022) points out, the use of different scales raises, first of all, questions about the construct(s) being measured. Every time an adaptation is made, the scale should be validated again. Besides that, it becomes difficult to compare results across studies when different instruments are used. This has also been demonstrated for CAF studies (Michel, 2017), and studies in which a (slightly) different FA rating scale is used run the same risk. Loewen therefore warns researchers to consider the broader impact of the changes they make to existing (FA) instruments.

6. Conclusions

Starting with Levelt's Model of Speech Production, we have discussed how oral proficiency in ISLA has been assessed along the lines of CAF and, more recently, also in terms of FA. By reviewing a number of studies on the assessment of speaking, including different languages, speaking tasks, and participants, we have discussed various theoretically motivated procedures, measures, and tools that have been employed for eliciting and analyzing oral data. On the basis of some example studies (the WISP project and our own research on FA) we have illustrated how to interpret test results to establish L2 learning. Advice to future CAFFA researchers was given with respect to the measures to be used, type of research, target and source languages, task types, and the importance of rater training. Cautions were formulated regarding the presence of NNS and NS in one and the same study, the participation of various types of raters (expert versus non-expert, NS and/or NNS), and the adaptation of rating scales. However, some challenges remain, like the question of how FA develops in individual learners and what the effect is of task type and task modality on (sub)dimensions of FA. It is also important to explore more thoroughly the potential role of FA in classroom and assessment practice and to examine the impact of different instructional treatments on the development of FA. These are research topics to be investigated further.

7. Further reading and additional resources

7.1 Books and journal articles

Culpeper, J., Mackey, A., & Taguchi, N. (2018). Second language pragmatics: From theory to research. Routledge. https://doi.org/10.4324/9781315692388
Housen, A., De Clercq, B., Kuiken, F., & Vedder, I. (Eds.). (2019). Special issue on linguistic complexity. Second Language Research, 35(1).
Ishihara, N., & Cohen, A. D. (2010). Teaching and learning pragmatics. Routledge.
Kuiken, F., Michel, M., & Vedder, I. (Eds.). (2019). Special issue on linguistic complexity and instruction in SLA. Instructed Second Language Acquisition, 3(2), 119–257.
Kuiken, F., & Vedder, I. (Eds.). (2022). Special issue on the assessment of functional adequacy in language performance. TASK, 2(1). https://doi.org/10.1075/task.21009.kui
Kuiken, F., Vedder, I., De Clercq, B., & Housen, A. (Eds.). (2019). Special issue on syntactic complexity. International Journal of Applied Linguistics, 29(2).
Roever, C. (2021). Teaching and testing second language pragmatics and interaction: A practical guide. Routledge. https://doi.org/10.4324/9780429260766
Taguchi, N., & Kim, Y. (Eds.). (2018). Task-based approaches to teaching and assessing pragmatics. John Benjamins. https://doi.org/10.1075/tblt.10

7.2 Journals, professional organizations, and websites

Applied Pragmatics. https://benjamins.com/catalog/ap
Journal of Pragmatics. https://www.journals.elsevier.com/journal-of-pragmatics/
Intercultural Pragmatics. https://www.degruyter.com/journal/key/IPRG/html
International Pragmatics Association. https://pragmatics.international
Center for Advanced Research on Language Acquisition: Pragmatics and speech acts. https://carla.umn.edu/speechacts/index.html
Lu, X. Software and corpus downloads. http://www.personal.psu.edu/xxl13/download.html
SALAT (Suite of Automatic Linguistic Analysis Tools). https://www.linguisticanalysistools.org/




References

Brezina, V., & Pallotti, G. (2019). Morphological complexity in written L2 texts. Second Language Research, 35(1), 99–120. https://doi.org/10.1177/0267658316643125
Bulté, B., & Housen, A. (2012). Defining and operationalising L2 complexity. In A. Housen, F. Kuiken, & I. Vedder (Eds.), Dimensions of L2 performance and proficiency: Complexity, accuracy and fluency in SLA (pp. 21–46). John Benjamins. https://doi.org/10.1075/lllt.32.02bul
Council of Europe (2001). Common European framework of reference for languages: Learning, teaching, assessment. Cambridge University Press.
De Bot, K. (1992). A bilingual production model: Levelt's 'speaking' model adapted. Applied Linguistics, 13(1), 1–24. https://doi.org/10.1093/applin/13.1
De Clercq, B., & Housen, A. (2019). The development of morphological complexity: A cross-linguistic study of L2 French and English. Second Language Research, 35(1), 71–98. https://doi.org/10.1177/0267658316674506
De Jong, N. H., Steinel, M. P., Florijn, A. F., Schoonen, R., & Hulstijn, J. H. (2012). The effect of task complexity on functional adequacy, fluency and lexical diversity in speaking performances of native and non-native speakers. In A. Housen, F. Kuiken, & I. Vedder (Eds.), Dimensions of L2 performance and proficiency: Complexity, accuracy and fluency in SLA (pp. 121–142). John Benjamins. https://doi.org/10.1075/lllt.32.06jon
Duijm, K., Schoonen, R., & Hulstijn, J. H. (2018). Professional and non-professional raters' responsiveness to fluency and accuracy in L2 speech: An experimental approach. Language Testing, 35(4), 501–527. https://doi.org/10.1177/0265532217712553
Ekiert, M., Lampropoulou, S., Révész, A., & Torgersen, E. (2018). The effects of task type and L2 proficiency on discourse appropriacy in oral task performance. In N. Taguchi & Y.-J. Kim (Eds.), Task-based approaches to teaching and assessing pragmatics (pp. 247–264). John Benjamins. https://doi.org/10.1075/tblt.10.10eki
Ekiert, M., Révész, A., Torgersen, E., & Moss, E. (2022). The role of pausing in L2 oral task performance: Toward a complete construct of functional adequacy. TASK, 2(1), 33–59. https://doi.org/10.1075/task.21013.eki
Foster, P., & Skehan, P. (1996). The influence of planning and task type on second language performance. Studies in Second Language Acquisition, 18(3), 299–323. https://doi.org/10.1017/S0272263100015047
Foster, P., & Wigglesworth, G. (2016). Capturing accuracy in second language performance: The case for a weighted clause ratio. Annual Review of Applied Linguistics, 36, 98–116. https://doi.org/10.1017/S0267190515000082
Gilabert, R., Barón Pares, J., & Levkina, M. (2011). Manipulating task complexity across task types and modes. In P. Robinson (Ed.), Second language task complexity: Researching the Cognition Hypothesis of language learning and performance (pp. 105–138). John Benjamins. https://doi.org/10.1075/tblt.2.10ch5
González-Lloret, M. (2022). The present and future of functional adequacy. TASK, 2(1), 146–157. https://doi.org/10.1075/task.21008.gon
Graesser, A. C., McNamara, D. S., Louwerse, M. M., & Cai, Z. (2004). Coh-Metrix: Analysis of text on cohesion and language. Behavior Research Methods, Instruments, & Computers, 36(2), 193–202. https://doi.org/10.3758/BF03195564
Grice, H. P. (1975). Logic and conversation. In P. Cole & J. L. Morgan (Eds.), Speech acts (pp. 41–58). Academic Press.


Herraiz Martínez, A. (2018). Functional adequacy: The influence of English-medium instruction, English proficiency, and previous language learning experiences [Unpublished doctoral dissertation]. Universitat Jaume I, Castellón de la Plana.
Herraiz Martínez, A., & Alcón Soler, E. (2019). Pragmatic outcomes in the English-medium instruction context. Applied Pragmatics, 1(1), 68–91. https://doi.org/10.1075/ap.00004.her
Housen, A., Kuiken, F., & Vedder, I. (Eds.). (2012). Dimensions of L2 performance and proficiency: Complexity, accuracy and fluency in SLA. John Benjamins. https://doi.org/10.1075/lllt.32
Hulstijn, J. H., Schoonen, R., De Jong, N. H., Steinel, M. P., & Florijn, A. F. (2012). Linguistic competence of learners of Dutch as a second language at the B1 and B2 levels of speaking proficiency of the Common European Framework of Reference for Languages (CEFR). Language Testing, 29(2), 203–221. https://doi.org/10.1177/0265532211419826
Izumi, S. (2003). Comprehension and production processes in second language learning: In search of the psycholinguistic rationale of the Output Hypothesis. Applied Linguistics, 24(2), 168–197. https://doi.org/10.1093/applin/24.2.168
Knoch, U. (2009). Diagnostic assessment of writing: A comparison of two rating scales. Language Testing, 26(2), 275–304. https://doi.org/10.1177/0265532208101008
Kormos, J. (2006). Speech production and second language acquisition. Psychology Press.
Kuiken, F., & Vedder, I. (2014). Rating written performance: What do raters do and why? Language Testing, 31(3), 329–348. https://doi.org/10.1177/0265532214526174
Kuiken, F., & Vedder, I. (2017). Functional adequacy in L2 writing: Towards a new rating scale. Language Testing, 34(3), 321–336. https://doi.org/10.1177/0265532216663991
Kuiken, F., & Vedder, I. (2018). Assessing functional adequacy of L2 performance in a task-based approach. In N. Taguchi & Y.-J. Kim (Eds.), Task-based approaches to teaching and assessing pragmatics (pp. 265–285). John Benjamins. https://doi.org/10.1075/tblt.10.11kui
Kuiken, F., & Vedder, I. (2022). Measurement of functional adequacy in different learning contexts: Rationale, key issues and future perspectives. TASK, 2(1), 8–32. https://doi.org/10.1075/task.00013.kui
Kuiken, F., Vedder, I., & Gilabert, R. (2010). Communicative adequacy and linguistic complexity in L2 writing. In I. Bartning, M. Martin, & I. Vedder (Eds.), Communicative proficiency and linguistic development: Intersections between SLA and language testing research (pp. 81–100). European Second Language Association.
Kyle, K. (2016). Measuring syntactic development in L2 writing: Fine grained indices of syntactic complexity and usage-based indices of syntactic sophistication [Doctoral dissertation, Georgia State University]. ScholarWorks. http://scholarworks.gsu.edu/alesl_diss/35
Lahmann, C., Steinkrauss, R., & Schmid, M. S. (2019). Measuring linguistic complexity in long-term L2 speakers of English and L1 attriters of German. International Journal of Applied Linguistics, 29(2), 173–191. https://doi.org/10.1111/ijal.12259
Lambert, C., & Nakamura, S. (2019). Proficiency-related variation in syntactic complexity: A study of English L1 and L2 oral descriptive discourse. International Journal of Applied Linguistics, 29(2), 248–264. https://doi.org/10.1111/ijal.12224
Larsen-Freeman, D. (2009). Adjusting expectations: The study of complexity, accuracy, and fluency in second language acquisition. Applied Linguistics, 30(4), 579–589. https://doi.org/10.1093/applin/amp043
Levelt, W. J. M. (1989). Speaking: From intention to articulation. The MIT Press.
Loewen, S. (2022). Functional adequacy: Task-based language teaching and instructed second language acquisition: A commentary. TASK, 2(1), 137–145. https://doi.org/10.1075/task.21007.loe




Long, M. (2015). Second language acquisition and task-based language teaching. Wiley Blackwell.
Lu, X. (2010). Automatic analysis of syntactic complexity in second language writing. International Journal of Corpus Linguistics, 15(4), 474–496. https://doi.org/10.1075/ijcl.15.4.02lu
McNamara, T., & Roever, C. (2007). Language testing: The social dimension. Blackwell. https://doi.org/10.1111/j.1473-4192.2006.00117.x
Michel, M. (2017). Complexity, accuracy, and fluency in L2 production. In S. Loewen & M. Sato (Eds.), The Routledge handbook of instructed second language acquisition (pp. 50–68). Routledge. https://doi.org/10.4324/9781315676968-4
Norris, J. M., & Ortega, L. (2009). Towards an organic approach to investigating CAF in instructed SLA: The case of complexity. Applied Linguistics, 30(4), 555–578. https://doi.org/10.1093/applin/amp044
Nuzzo, E., & Bove, G. (2020). Assessing functional adequacy across tasks: A comparison of learners' and speakers' written texts. E-JournALL, 7(2), 9–27. https://doi.org/10.21283/2376905X.12.175
Nuzzo, E., & Bove, G. (2022). Exploring the pedagogical use of the rating scale for functional adequacy in L1. TASK, 2(1), 115–136. https://doi.org/10.1075/task.21011.nuz
Ortega, L. (2003). Syntactic complexity measures and their relationship to L2 proficiency: A research synthesis of college-level L2 writing. Applied Linguistics, 24(4), 492–518. https://doi.org/10.1093/applin/24.4.492
Pallotti, G. (2009). CAF: Defining, refining and differentiating constructs. Applied Linguistics, 30(4), 590–601. https://doi.org/10.1093/applin/amp045
Pallotti, G. (2015). A simple view of linguistic complexity. Second Language Research, 31(1), 117–134. https://doi.org/10.1177/0267658314536435
Paquot, M. (2019). The phraseological dimension in interlanguage complexity research. Second Language Research, 35(1), 121–145. https://doi.org/10.1177/0267658317694221
Pill, J., & Smart, C. (2021). Rating: Behavior and training. In P. Winke & T. Brunfaut (Eds.), The Routledge handbook of second language acquisition and language testing (pp. 135–144). Routledge. https://doi.org/10.4324/9781351034784-15
Révész, A., Ekiert, M., & Torgersen, E. (2016). The effects of complexity, accuracy and fluency on communicative adequacy in oral task performance. Applied Linguistics, 37(6), 828–848. https://doi.org/10.1093/applin/amu069
Rezaei, A. R., & Lovorn, M. (2010). Reliability and validity of rubrics for assessment through writing. Assessing Writing, 15(1), 18–39. https://doi.org/10.1016/j.asw.2010.01.003
Robinson, P. (2001). Task complexity, cognitive resources and syllabus design: A triadic framework for examining task influences on SLA. In P. Robinson (Ed.), Cognition and second language instruction (pp. 185–316). Cambridge University Press. https://doi.org/10.1017/CBO9781139524780.012
Skehan, P. (1996). A framework for the implementation of task-based instruction. Applied Linguistics, 17(1), 38–62. https://doi.org/10.1093/applin/17.1.38
Strobl, C., & Baten, K. (2022). Assessing writing development during study abroad: The role of task and measures of linguistic and communicative performance. TASK, 2(1), 60–84. https://doi.org/10.1075/task.21010.str
Timpe, V. (2013). Assessing intercultural communicative competence: The dependence of receptive sociopragmatic competence and discourse competence on learning opportunities and input. Peter Lang.


Timpe-Laughlin, V. (2018). Pragmatics in task-based language assessment: Opportunities and challenges. In N. Taguchi & Y.-J. Kim (Eds.), Task-based approaches to teaching and assessing pragmatics (pp. 288–304). John Benjamins. https://doi.org/10.1075/tblt.10.12tim
Upshur, J. A., & Turner, C. E. (1995). Constructing rating scales for second language tests. ELT Journal, 49(1), 3–12. https://doi.org/10.1093/elt/49.1.3
Vasylets, O., Gilabert, R., & Manchón, R. M. (2019). Differential contribution of oral and written modes to lexical, syntactic and propositional complexity in L2 performance in instructed contexts. Instructed Second Language Acquisition, 3(2), 206–227. https://doi.org/10.1558/isla.38289
Vercellotti, M. A. (2019). Finding variation: Assessing the development of syntactic complexity in ESL speech. International Journal of Applied Linguistics, 29(2), 233–247. https://doi.org/10.1111/ijal.12225
Wolfe-Quintero, K., Inagaki, S., & Kim, H.-Y. (1998). Second language development in writing: Measures of fluency, accuracy, and complexity. University of Hawai'i Press.
Yuan, F., & Ellis, R. (2003). The effects of pre-task planning and on-line planning on fluency, complexity and accuracy in L2 monologic production. Applied Linguistics, 24(1), 1–27. https://doi.org/10.1093/applin/24.1.1
Zhang, Y., & Elder, C. (2014). Investigating native and non-native English-speaking teacher raters' judgements of oral proficiency in the College English Test-Spoken English Test (CET-SET). Assessment in Education: Principles, Policy & Practice, 21(3), 306–325. https://doi.org/10.1080/0969594X.2013.845547

Section 5

Sharing your research

Chapter 15

Contributing to the advancement of the field: Collaboration and dissemination in ISLA

YouJin Kim and Laura Gurzynski-Weiss
Georgia State University / Indiana University

In this final chapter, we share suggestions for developing ISLA research that informs pedagogy and vice-versa, particularly focusing on the importance of training ISLA researchers, promoting collaboration among various stakeholders, and disseminating ISLA research to a larger audience. First, we highlight the importance of connecting ISLA research and L2 pedagogy. We argue for the benefits of collaboration between ISLA researchers, program directors, teacher trainers, and classroom teachers, and share practical guidelines for promoting mutual benefits. Second, we share insights into how to disseminate ISLA research meaningfully both within and beyond academia. In terms of washback of ISLA research, we discuss answers to the "so what?" question, which include steps to follow after conducting ISLA research. We also provide detailed guidance with examples on how to write up ISLA research manuscripts and discuss various venues to share your research with the purpose of making an impact in the real world. We end the chapter and thus the volume with future directions for ISLA research in terms of different research methodologies and ways to maximize the robustness of your design and therefore the impact of your ISLA research on L2 pedagogy, our ultimate aim in the field.

Keywords: instructed second language acquisition, research methods, teacher-researcher collaborations, research report

1. Introduction

As we have seen throughout the volume, ISLA-related empirical questions may be answered through classroom-based and/or tightly controlled lab-based research; both types of studies can draw implications for L2 instruction. In this final chapter, we focus on two critical questions: how to facilitate collaboration with diverse stakeholders, and how ISLA researchers can disseminate their work to a larger audience who can apply the research findings to instructional contexts, promoting the transferability of research into the real world.


To date, there has been growing interest in ISLA research, with the translatability of research findings to classroom practices receiving increased attention. While the importance of evidence-driven practices has been emphasized (Sato & Loewen, 2020), there have been concerns regarding a lack of connection between research and classroom practices. For instance, Medgyes (2017) states, "In short, research findings have predominantly failed to find their way into the classroom… It looks as if they are mere extras in the language-teaching operation" (p. 494). Medgyes further argues that, "until proven otherwise, the pedagogical relevance of language-related academic research is of dubious value and the role researchers play may be considered parasitical" (p. 496). However, there has been noticeable improvement in ISLA research in terms of research methods and the range of pedagogically-motivated topics. Moreover, researchers have begun to make explicit efforts to reach diverse stakeholders. In this chapter, we use the term researcher to refer to those who work in a higher education institution, who have both research and teaching obligations, and who are not the instructors of the research site classrooms. We use teachers to refer to those who are instructors of the research site classrooms, who do not have research obligations in their position, and who may or may not have training in research. We use the term teacher-researcher to refer to those who both research and teach regularly and who conduct research in their own classrooms (see, for example, the section on action research). Finally, we recognize that there are many variations of such positions and do our best to address common realities and concerns affecting those involved in ISLA in these aforementioned capacities. The remaining sections of the current chapter are as follows: (1) suggestions for developing ISLA research that informs pedagogy and vice versa, (2) dissemination of ISLA research, and (3) future directions for ISLA research.

2. Suggestions for developing ISLA research that informs pedagogy and vice-versa

The overarching goal of ISLA research is to maximize the effects of instruction in various contexts. Therefore, researcher-teacher collaborations and training teachers to develop a deeper awareness of the value of evidence-based pedagogy are critical. In this section, we outline suggestions for promoting ISLA research that informs pedagogy and vice-versa in three ways: (1) how to promote collaboration among different stakeholders; (2) how teacher training programs can prepare teacher-researchers; and (3) how to conduct research on teachers' awareness of research-grounded ideas.




2.1 Facilitate collaboration among ISLA researchers, language program directors, and classroom teachers

The field of ISLA has recently experienced increasing attention to and recognition of the importance of collaboration among different stakeholders of language instruction in order to ensure that research can ultimately be useful for L2 instruction. While many champion this potential relationship (Ortega, 2005; Sato & Loewen, 2019), others provide a less optimistic viewpoint. For example, Medgyes (2017) claims that although researchers want to share their expertise about teaching, teachers normally do not need to depend on research to do their job. Medgyes introduces the term "teacher-inquirer" and encourages teachers to share their voice. Specifically, he claims that in order to enhance the mutual benefit of researcher-teacher collaborations, both groups' needs should be taken into consideration, and, we would argue, both groups should be shown the value of each other's perspective. If teachers believe that they can perform their job well without reading research, they probably do not see the need to read research articles or be involved in research projects. Or, if research is written in such a way that it remains inaccessible, teachers might not be interested in spending time learning novel ideas from it. As a result, dated or counterproductive instructional practices may be followed, with practices based more on habit than on what has been shown to be effective. From a researcher perspective, failing to involve a teacher perspective may result in investigating questions that are not relevant or useful to the realities of current classroom practices or possibilities, and/or in data that are not useful for a given context. Keeping ISLA research solely within the research community also risks not having any future impact or confirmation that each discovery does, in fact, translate to the L2 classroom. Without a clear dialogue between stakeholders, the full potential of ISLA is limited at best, artificial at worst.

That being said, the challenges associated with promoting such a dialogue must be identified before they can be addressed. From the teacher perspective, practitioners may not have access to research articles, professional development funds to attend conferences, or permission to take time off to participate in professional development occasions. They may also not be interested in reading articles that are full of jargon and that they may perceive as irrelevant to their own teaching. Furthermore, with a daily teaching schedule, engaging in research dialogue might not be a priority, or even a possibility, especially when it is not a required or compensated element of their job (Borg, 2010). From an epistemological perspective (Medgyes, 2017), teachers may not believe in the need, utility, or validity of the research itself.


From the researcher perspective, researchers have to juggle following robust research methods with the realities of the L2 classroom, as discussed in several chapters in this book. More specifically, classroom contexts are not comparable to lab-based settings in terms of controlling factors. Therefore, uncontrolled (and potentially confounding) factors may impact the study results. For ecological validity reasons (i.e., how well the study methods and findings reflect the real-life setting), classroom-based ISLA research should do its best to address uncontrolled factors (see Chapter 6) while explicitly recognizing the ones that cannot be controlled. Having confounding variables may negatively affect the quality of the research or the venues in which classroom research can be published but, on the flip side, it can represent the realities of the contexts where ISLA occurs. We therefore must balance these considerations to provide research that is both robust in terms of ISLA empirical research standards and relevant to the day-to-day practice of language teaching. Journals must also take into consideration the nature of the study at hand rather than simply requiring certain methodological characteristics for potential publication.

Below we list several ideas for promoting productive ISLA dialogue between researchers and teachers. First, we outline how dialogue between teachers and researchers can be facilitated via longitudinal projects.

2.1.1 Longitudinal collaboration example #1: Developing task-supported course curriculum in a Korean as an L2 context

YouJin Kim and her collaborators conducted a four-year longitudinal classroom-based project in a Korean language program at a private university in the US. The research team consisted of five people: a researcher specialized in SLA and TBLT, a Korean language program director, two classroom teachers, and an MA research assistant. When Kim initially approached the program director to conduct a small-scale classroom-based task research project, the program director was concerned about conducting classroom-based research in his Korean program because of the many restrictions a research design would place on the classes. The researcher and her research assistant created several tasks first, which were then modified based on the program director's and teachers' input. Once the planned classroom research project started, the program director saw the potential of developing a research-driven Korean task-supported curriculum, and they created a collaborative long-term plan. This longitudinal study of task-based curriculum development has been shared through various venues including conference presentations (Kim et al., 2017), journal manuscripts (Kim et al., in press; Kim et al., 2020), and a task-based Korean language textbook (Kim et al., 2021). While creating the entire course curriculum, different stakeholders' needs were discussed and addressed. Through numerous weekly meetings and classroom observations, convergence among different stakeholders' opinions and needs became possible. This concrete example demonstrates the potential when there is collaboration between researchers, language program directors, and classroom teachers. The key component of such a project is that each stakeholder needs to experience positive rewards of such collaboration while working on ISLA research (e.g., students' progress, their professional development).

2.1.2 Longitudinal collaboration example #2: Designing tasks for elementary-level learners

Another example of such a project is an ongoing collaboration between a research team (faculty member Gurzynski-Weiss, graduate students Wray and Coulter-Kern, and undergraduates Underhill and Williams), local elementary-level Spanish teachers, administrators, a community outreach liaison, and parents. The project began when Gurzynski-Weiss was mistakenly invited to a dual-language immersion meeting. When she asked about exposure-track Spanish and found out the school did not have a program planned, she presented the benefits of such a program and volunteered to write a grant, put a team together, and create one collaboratively. Gurzynski-Weiss et al. (in progress) conducted a needs analysis to investigate what L2 Spanish tasks 712 elementary/primary K-5 learners (ages 6–12) need to learn during their weekly Spanish exposure course. The research team then designed a tailor-made series of 34 tasks for an academic year, according to this information and in line with TBLT research. By involving all aforementioned stakeholders at each stage (before designing tasks, during the task design and piloting, meeting weekly during the program administration itself, and afterwards), the team ensured that the program is meaningfully situated within this unique community and highlights existing Spanish opportunities outside of the classroom, where more than 30% of the population speaks Spanish as an L1. The project is addressing the paucity of literature on exposure-track L2 development and demonstrating that it is possible for children to develop an L2 during minimal classroom time if the tasks are well designed and motivated by both the needs of the community and ISLA knowledge. Presentations and write-ups are being completed collaboratively, with all interested stakeholders playing a role to ensure the project is presented in meaningful venues.


2.2 Build in collaborations during teacher training programs

In order to promote such researcher-teacher collaborations and ensure that ISLA research meets the current needs in L2 contexts, it is also important to raise preservice and inservice teachers' awareness of the value of ISLA research and provide training to empower teachers to fully participate in ISLA research and even conduct their own studies. Below we suggest two options that can be included in teacher training courses.

2.2.1 Conduct a classroom replication study as part of a teacher training program

One way to motivate teachers to remain engaged with research and to continue to seek out a connection between research and teaching may be to incorporate ISLA research into teacher training program curricula so that students are exposed to ISLA research and learn how to incorporate the findings into their future teaching. A good model can be found in Vásquez and Harvey (2010), who examined graduate students' evolving perceptions of corrective feedback. Participants were enrolled in a semester-long SLA course in their applied linguistics graduate programs at the time of data collection. As part of a course requirement, they carried out a partial replication of Lyster and Ranta's (1997) study of corrective feedback in some of their own English as a second language (ESL) classes. The goal of the replication study was to discover to what extent conducting classroom-based research focusing on corrective feedback impacted participants' perceptions and beliefs about corrective feedback. Through such a replication study as a part of the coursework, participants became aware of the multiple dimensions of corrective feedback and gained different perspectives on error correction. For a novice researcher, carrying out an original empirical study can be daunting. However, closely following the research design of previously published research can offer hands-on, step-by-step training on how to conduct research. Thus, for novice researcher training purposes, conducting replication studies through class projects with other peers and the instructor can be an effective way to raise inservice teachers' awareness of the value of research (see Chapter 5 for further details on conducting replication research).

2.2.2 Conduct an action research project as part of a teacher training program

Action research is increasingly popular among ISLA practitioners as it immediately bridges the gap between research and pedagogy. Specifically, action research is conducted by teachers or teacher-researchers within their own classroom to better understand a specific question or issue at hand and with the goal of providing immediately usable information for a particular classroom (see Burns & Edwards, 2016).




In other words, action research allows teachers to obtain an objective answer to a question they have about their own classroom so that they can continue their teaching in line with what they find. Action research can be used for the teacher's purposes only, to make changes to their teaching based on what they find; as part of a teacher training course (more on that in the next section); or shared with a larger teaching and/or research audience in journals of the scholarship of teaching and learning (SOTL; e.g., Journal of the Scholarship of Teaching and Learning), in journals focused on education (e.g., Journal of Educational Change), in journals that publish ISLA studies (e.g., Language Teaching Research), or even in journals dedicated specifically to action research (e.g., The Journal of Teacher Action Research; Educational Action Research; Action Research; Canadian Journal of Action Research). Given the nature of this research, conducted in a single classroom or two (depending on the responsibilities of the teacher or teacher-researcher), action research is often more qualitative than quantitative and typically does not have the goal of generalizable knowledge beyond an individual context (Calvert & Sheen, 2015; Mack, 2012). That being said, action research is increasingly prevalent in publications because it is conducted within real, heterogeneous classrooms and thus provides concrete examples and useful suggestions for readers approaching it from both an ISLA research and an L2 classroom perspective.

2.3 Conduct research on teachers' awareness of and engagement with research-grounded ideas

ISLA empirical studies have been slow to include teacher perspectives for triangulating data and for more thoroughly understanding how L2 learning opportunities occur in a given language classroom, or how relevant (or not) a given research paradigm is. In the former area, Gurzynski-Weiss (2016) examined how 32 Spanish L2 teachers make moment-to-moment oral corrective feedback decisions during grammar-focused university-level language lessons. She explored how teacher IDs, including SLA research training, L1 (Spanish or English), and years of teaching experience, related to how they paid attention to learner errors, how they decided which errors to correct, and how they provided corrective feedback. While the recorded classes would have shown the oral corrective feedback given to students and how students responded, by conducting stimulated recalls with the teachers and giving them background questionnaires about their IDs, Gurzynski-Weiss was able to uncover that in-the-moment oral corrective feedback decisions are extremely complex and largely based on teachers' backgrounds (L1 and experience) more so than their SLA training.

In the latter area, examining research paradigms from teacher perspectives has been found to be particularly important in the area of TBLT.


For example, Révész and Gurzynski-Weiss (2016) examined how experienced university-level ESL teachers conceptualized task complexity. Specifically, they asked 16 teachers to examine a series of tasks and to analyze what level they would use each task for and how they would make each task more and less difficult. They further compared this information (collected via think-alouds and eye-tracking) to the three most utilized researcher-oriented paradigms of task complexity (Ellis, 2003; Robinson & Gilabert, 2007; Skehan, 1998) and found that the most researched paradigm (Robinson & Gilabert) was missing the most important factor manipulated by teachers to influence complexity: linguistic structure. In surveying teachers, a prime audience to benefit from task complexity-related research findings (at least in theory, pun intended!), Révész and Gurzynski-Weiss uncovered that teacher voices are critical to ensuring ISLA research – conducted to ultimately benefit L2 classrooms – does in fact fulfill its greatest aim.

3. Dissemination of ISLA research

3.1 Writing a research report: Parts of an empirical ISLA study report

In this section, we will walk you through the central components of an empirical study in the field of ISLA. As we discussed in Chapter 1, there are different research methodologies that require different types of data, and each (e.g., quantitative research, qualitative research, mixed methods, action research) requires a different write-up style (see example studies cited in Chapters 2, 3, 4, and 5). All of the components below will be needed when you share your research; some will be created during the study design prior to submitting to an ethics committee (see Chapter 6), and others will be completed after the data are collected and analyzed. We will take each component in turn.

Our outline is a somewhat standardized format of an ISLA study report in peer-reviewed journals. Again, depending on your research methods, the content of each section may differ, but we intend to describe the organization of empirical ISLA research reports in a general sense. Importantly, and as mentioned in Chapter 1, we encourage you to familiarize yourself with how studies on your research topic are written up.

(1) Abstract

The first part of a published research study, and what you will submit for conference presentations prior to publication, is an abstract.




In 150–300 words, an abstract provides a concise and coherent description of your study with, at minimum, the theoretical and/or methodological framework of your study (to provide the readers with a quick way of understanding what assumptions and operationalizations you are adhering to), the research questions and connection to prior research, the context/setting, participants, methods and instruments, results (including the analyses undertaken), and the impact of your findings for the field and relevant stakeholders. Oftentimes following an abstract you will find five to seven keywords that assist in categorizing your study for searching purposes. These keywords can indicate the general domain (instructed second language acquisition), specify the L2(s) of study (e.g., L2 Spanish), describe participants and/or the instructed context (e.g., university-level learners, Zoom classroom, domestic immersion), and often indicate one or two independent or dependent variables under study (e.g., task difficulty, willingness to communicate, engagement, functional adequacy; more on variables in the research question section below). For this chapter, we chose the following keywords/phrases to represent our work and facilitate locating it via search engines: instructed second language acquisition, research methods, teacher-researcher collaborations, and research report.

(2) Introduction

After the abstract comes a brief introduction to the study. The introduction situates your empirical study within the specific domain of ISLA both theoretically and methodologically. The introduction explains how your study connects to and expands on prior research, and it specifies the variables that one will expect to find in the next section – the review of the literature – and operationalizes those terms. It also provides motivation to keep reading by explicitly stating why the study matters in all areas: theory, methods, and potential applications. It is important to note that, while some information in the introduction is repeated from the abstract, it must be written in a novel way.

(3) Literature review

The literature review presents a detailed synthesis of the prior research conducted on your research focus and related variables. Importantly, as the author of the study, you are responsible for identifying the literature necessary to justify your study design, providing original synthesis and drawing connections between studies while highlighting research gaps; it is not enough to summarize each individual study that you have read or that relates to a specific variable. A simple list of study summaries would be similar to an annotated bibliography, which we encouraged you to keep for your own records and learning (see Chapter 1). Literature reviews are divided into sections with relevant variables, themes, or questions as the section headings to guide the reader in further understanding what synthesis they will find in each paragraph. The temptation for novice researchers is to include too much detail in summarizing the literature and too little original synthesis, at times resulting in too much of a paper being about what others have done.


The literature review motivates your study in a supporting role; it is not the main event.1 Some scholars provide explicit tie-ins to their own studies at the end of each sub-section of the literature review, stating how they are improving upon current research in specific ways. If you do not do this at the end of each section, you can add this information in a summarizing paragraph at the end of the literature review. Either way, upon finishing the literature review, the reader should have a clear idea of what your study is about, how each part of your research questions is motivated, and how you are improving upon limitations present in prior work. While specific journal recommendations vary regarding the length of the literature review permitted (see each journal's "Author Guidelines"), we recommend no more than 20–25% of the entire manuscript.

1. At least regarding empirical studies. For narrative syntheses, the literature review is the principal focus (see Gurzynski-Weiss & Plonsky, 2017).

(4) Research questions and hypotheses (explicit or unarticulated)

The literature review often ends with research questions, or they are presented in a separate section after the literature review where the target research gaps are described. You present your research questions concisely, either explicitly stating within the question how you are operationalizing key terms, or providing that information afterwards in an explanation with relevant citations. The most robust research questions are concise, specific, measurable, and straightforward. For example, a less-than-robust research question is as follows: "Does feedback timing matter for learning Spanish?" This question is less than ideal due to the vagueness of the terms feedback timing and learning, and the lack of information regarding the participants of the study and the tasks that they are completing. With a few adjustments, this research question can be made much stronger, as seen in Henderson (2020): "What is the effect of corrective feedback timing (immediate vs delayed after the task) on the L2 development of the Spanish past subjunctive?" (RQ1).

A hypothesis is a prediction of an anticipated outcome or relationship between variables based on previous work. Some researchers provide explicit hypotheses, while others leave their hypotheses unarticulated, though these are often interpretable from the study's design decisions and synthesis of prior literature. Research following quantitative methods often starts with hypotheses that are motivated by previous theory and empirical research (see Chapter 2 for more information about hypothesis testing).

(5) Methods

Following the explicit presentation of research questions and, if any, hypotheses, the empirical study then moves on to describe the methods in enough detail that the study can be fully understood and replicated.




In this subsection you will often find an overview of the methodology; the participants and context; any materials and instruments used; and the data collection and data analysis procedures. The reader may remember that when we outlined the need to stay current in theoretical and methodological discussions (Chapter 1), we were talking about the justification of the decisions being made. In the empirical study itself, the methodological discussion occurs in the literature review, and then, after a brief reminder of the methodology followed, the focus shifts to the specific methods within the study itself. In this brief reminder of the methodology, the study will often describe whether it fits within the overarching categorizations of quantitative (see also Chapter 2), qualitative (see also Chapter 3), mixed (see also Chapter 4), or replication methodology (see also Chapter 5).

a. Research design

In the research design section, two principal types of variables are introduced, particularly in quantitative research: independent (or predictor) variables, the variables the researcher(s) either set out to manipulate or research by holding constant (such as age), and dependent variables, the variables the researcher(s) measure to see if the manipulations of the independent variables have had the desired effect (see Chapter 2). For instance, in Henderson (2020), the independent variables are immediate or delayed feedback, and the dependent variable is L2 development (gains in target-like responses of the Spanish past subjunctive from pre- to post-test). For other research approaches such as case studies, researchers justify why such an approach was selected to answer the research questions.

b. Participants and context

In the description of participants and context, the author(s) must articulate all variables that are theoretically or methodologically relevant or may play a role – whether anticipated or not – in the empirical study. This is important not only to ensure that there are no unaccounted-for moderating or mediating variables during the analysis portion of the study, but also to provide a way of understanding how the results from the study may or may not extend to the reader's context during the discussion and implications section. Potential participant differences at the onset are particularly important when there is an intervention at play or a study that examines development; we want to be able to state with confidence that the observed effects are due to the intervention itself and not to unexpected or uncontrolled factors. While the ultimate goal in ISLA is to understand the nature of how L2s are learned in instructional contexts and what can be done to maximize this learning, the myriad variables at play in each real-life context render it impossible to control all variables.


Thus, to ensure potential generalizability (at best) or correct interpretation (minimally), we must report as much information about the participants and context as possible. At minimum, the participant information should include the number of participants; the proficiency level(s) and how this was determined (ideally with respect to both receptive and productive skills); L1s; and current use of and future plans with the language of study (in university-level settings this often results in stating their majors and/or minors). (Current) age and biological sex are often also provided, though age of L2 onset and information about L2 proficiency or competencies in specific areas may provide better insight (see Chapter 6).

In terms of context, the author(s) must at least provide the type of instructional context (e.g., two-year community college in a rural monolingual area, refugee center for recently arrived former political prisoners of a specific country or area, elementary school in a multilingual city, four-year public research university, face-to-face classroom, Zoom classroom, two-week full immersion study abroad program) as well as all other relevant details such as hours of instructional contact, information about the instructor(s) and other interlocutors, use of the target language outside of the classroom, type of pedagogical methodology followed or an example of a typical lesson, goals or objectives and how these are measured, etc. The type and depth of information provided on each of the aforementioned factors depends on the specific study at hand.

c. Materials/instruments

In this next section the author(s) outline the specific materials or research instruments (tools) used to collect the data needed to answer the research questions in their study. Materials may include pre- and post-tests, oral or writing tasks, interview protocols, or questionnaires, to name but a few. Most journals now encourage or require all instruments to be submitted to IRIS-database.org (a digital repository of instruments and materials from published L2 research) so that others looking to use the same type of instrument can immediately find similar materials; this also allows the authors to save space and word count by using IRIS to house the instrument rather than including it directly in the study as an appendix. If the materials are L2 tasks, they are often also shared on the TBLT Language Learning Task Bank (tblt.indiana.edu). Note that IRIS publishes instruments from empirical studies once a paper has been accepted or published; the Task Bank is open to all at any stage. Thus, both sites offer advance sharing of your work while you wait for the sometimes lengthy process of review and publication. Many individual journals also permit and encourage the storage of instruments in their online repositories (e.g., Studies in Second Language Acquisition) and, in some cases, raw anonymized data as well (e.g., System).




d. Procedures

In this section, the design details of the study and the procedures followed are discussed. This includes participant recruitment and inclusion/exclusion criteria, as well as all details regarding data collection, including where the data were collected within a given study space, the script read before data collection, versions of software used, time limits imposed (and justifications and/or tie-ins to prior research if applicable), etc. Enough detail must be provided that the reader knows exactly in what order what happened, where, for how long, with whom, and why the design and procedures were the most robust possible given prior studies and current trends in the field.

e. Data coding and analysis

The next section of the paper focuses on how data were coded and analyzed. Data coding is different from data analysis in that data coding often involves coding schemes (e.g., feedback types, pragmatic strategies, interview thematic coding), and data analysis follows after coding is completed. Much like all prior sections, complete detail must be provided to allow someone else to reach the same conclusion with your raw data, as well as to be able to analyze additional datasets in a reliable and valid way. Justification must be provided throughout this section regarding the specific analyses chosen and why they were the most robust for the dataset at hand. Researchers often adopt previous data coding schemes that have been tested for validity and make necessary modifications for their own data. Examples should be provided to demonstrate coding, whether it be qualitative or quantitative, and representative if not full presentation of the data is also required (see the final recommendation on supplemental files at the conclusion of this section). In this section you would also describe and preview any supplemental analyses that you performed, such as qualitative data that were quantified (e.g., describing that you counted the frequency of occurrences of certain coding) or quantitative data that were also qualitatively analyzed (e.g., describing that you qualitatively zoomed in on a certain part of the data to supplement quantitative findings) (see Chapter 4).

Measures of inter-rater reliability are needed if there is any interpretation of the data during coding and/or analysis; the rule of thumb is that a second person, trained in your coding protocol, codes a representative and randomized 30% of your data, and you compare the coding results. The specific calculations of inter-rater reliability depend on the specific type of analysis; see Larsson et al. (2020). During the inter-rater reliability calculation, you can determine whether the coding scheme is being used consistently. Coding schemes can be developed based on previous research and may be theory-driven, empirically driven, or a mixture of both.
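To make the inter-rater reliability step concrete, below is a minimal sketch of how such a check might be run in Python. The data, the category labels, and the 100-episode dataset size are all hypothetical, and Cohen's kappa (computed here with scikit-learn's cohen_kappa_score) is only one of several possible agreement statistics; see Larsson et al. (2020) for guidance on choosing among them.

```python
# A minimal, hypothetical sketch of a second-coder reliability check.
import random
from sklearn.metrics import cohen_kappa_score

# Suppose the full dataset contains 100 coded feedback episodes; draw a
# randomized 30% of episode IDs for the trained second coder to double-code.
episode_ids = list(range(100))
sample_ids = random.sample(episode_ids, k=round(len(episode_ids) * 0.30))

# Labels the two coders assigned to the SAME sampled episodes (truncated
# here for display; in practice, read them from your coding spreadsheets).
rater1 = ["recast", "prompt", "recast", "explicit", "prompt", "recast"]
rater2 = ["recast", "prompt", "prompt", "explicit", "prompt", "recast"]

# Simple percent agreement, plus Cohen's kappa, which corrects for the
# agreement expected by chance alone.
agreement = sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)
print(f"Percent agreement: {agreement:.2%}")
print(f"Cohen's kappa: {cohen_kappa_score(rater1, rater2):.2f}")
```

Disagreements surfaced by such a comparison can then be discussed and resolved, and the coding scheme refined, before the remainder of the data are coded.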


We also recommend that you mention how the data were stored and cleaned of identifying information, so that others may request the data from you for future use, and/or save the data in the same way for retroactive cross-study or meta-analysis (Gass et al., 2021).

(6) Results

Following the methods section, the results section presents what has been found in relation to the research questions. We recommend repeating each research question, either explicitly as the question or in thematic prose, and then providing detailed results in turn.

In the results section, you report the findings of the study objectively. We recommend that you save your own interpretation of the findings for the next section, the Discussion, or, if you decide to combine results and discussion in one single section, that you first present the results and then provide interpretation. The latter approach is often preferred in qualitative research. If your study is based on quantitative analysis, make sure to provide descriptive statistics and the appropriate statistical results. You must provide the data required by the style used by the publication venue (e.g., American Psychological Association; APA) and in the required format (tables and/or figures). For qualitative data, depending on the types of data, the way the results are reported could vary.
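To illustrate the kind of quantitative reporting described above, the following minimal Python sketch computes group descriptive statistics and one possible inferential test for a hypothetical two-group gain-score dataset. The group labels, the scores, and the choice of an independent-samples t-test are all illustrative assumptions, not a prescription for any particular design.

```python
# A hypothetical dataset of pre- to post-test gain scores for two groups.
import pandas as pd
from scipy.stats import ttest_ind

data = pd.DataFrame({
    "group": ["immediate"] * 5 + ["delayed"] * 5,
    "gain": [4, 6, 5, 7, 6, 3, 4, 2, 5, 3],
})

# Descriptive statistics (n, M, SD) by group, reported before any
# inferential statistics.
print(data.groupby("group")["gain"].agg(["count", "mean", "std"]).round(2))

# One possible inferential test; in a real study, check the test's
# assumptions first and report an effect size alongside the p value.
result = ttest_ind(data.loc[data["group"] == "immediate", "gain"],
                   data.loc[data["group"] == "delayed", "gain"])
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.3f}")
```

Whatever analysis you run, the resulting numbers would then be reported in APA-formatted prose and tables rather than as raw output.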
(7) Interpretation, discussion, and implications

While some authors choose to combine the results presentation with interpretation, discussion, and implications, many still separate these sections. If you choose the latter option, you then interpret the results that have been presented and discuss their meaning within the context of the prior research you outlined in the literature review. This is not the place to introduce numerous additional studies; instead, it is where you revisit what you have already shared with your readers and highlight how your work supports earlier findings or offer explanations of why and how your results differ from prior findings. If your findings diverge from prior patterns, you must justify why the results are different. You also need to elaborate on what the implications of your findings are, what new insight they offer, and what future studies should do in light of what you have found. For example, if your adjustment to a specific measure of listening competency found a more robust relationship between a treatment and an outcome measure compared to prior studies, the implications section is where you would mention this. More importantly, how the research findings inform teaching should be discussed in a teacher-relevant manner.



(8) Limitations and future directions

Every study has limitations, whether inherent to the design choices (e.g., using an intact class to increase ecological validity while rendering a randomized sample impossible), occurring unexpectedly during data collection, or discovered during results interpretation (including design flaws that you could have noticed and addressed earlier). As author, you must explicitly state that you are aware of the limitations, enumerate them, and suggest ways for future researchers to ameliorate the issues in future iterations. This section is also where you can suggest specific steps for others to take to advance the line of work one step further. You may also mention ongoing or soon-to-be-published work of your own that continues this branch of research.

(9) Conclusions

In the conclusions section, often 1–2 paragraphs, you provide a recap of the purpose of the study, making sure to situate and contextualize it within the current theoretical and methodological discussions of the field, state the unique contributions of the study, and often provide a final suggestion for the specific next step in the field. In other words, the conclusion answers the questions: What was this study about? How does it relate to prior work? How does it contribute something new? And, the very frank and central question: Why should I care about this study?

(10) References

With few exceptions, the majority of ISLA research is published in journals and presses that follow APA formatting.2 APA style dictates the punctuation, headings, and subheadings used, and guides how the references should be shared. Each and every study or prior work mentioned explicitly in the study must be properly cited in the references section; this is where your readers can find the information you shared and keep reading to inform their own work and more fully appreciate your contribution. One must ensure that every citation within the paper is necessary and serves a unique purpose, as the references section answers the questions: How does this study fit into the larger field of ISLA? What other resources can I read if I'm interested in this topic?

2. You can find the most recent APA manual in your school library or for purchase, or, for an abbreviated version, we encourage you to utilize Purdue University's Online Writing Lab, which has the information available for free online: https://owl.purdue.edu/owl/research_and_citation/apa_style/apa_style_introduction.html



(11) Appendices

After the references come the appendices, if applicable. Listed alphabetically (Appendix A, Appendix B, etc.), the appendices house additional information not included in the body of the study. If you have the space to include your instruments within your study, you would place them within the appendices. The appendices provide the answers to the questions: What instruments were used in this study? What else do I need to know or have at my disposal to replicate or expand on this work?

(12) Tables and figures

The tables and figures of your study present data in digestible and meaningful ways. While some journals require placing tables or figures within the body of the text (e.g., Second Language Research and Studies in Second Language Acquisition), others ask you to add a placeholder within the text along the lines of "[INSERT TABLE X ABOUT HERE]" and to add the tables and figures after the appendices (e.g., Foreign Language Annals and the Journal of Second Language Teaching and Research). Once the paper is accepted, all tables and figures are inserted at the right place during the typesetting stage. The tables and figures answer questions including: What data are important in this study? What are the findings of this study? Note that tables and figures are numbered in order of appearance within the body of the main text and that many journals count the main manuscript, the references, appendices, and tables and figures within their word count. APA formatting is very specific on what and how to report results in tables and figures and how to title them accordingly.

(13) Supplemental and online-only materials

Finally, some information may need to be cut from your write-up due to space limits. It is becoming increasingly common for appendices and/or supplemental materials (including downloadable copies of raw data) to be made available within the online version of journals (e.g., Modern Language Journal and Language Teaching Research).

3.2 Sharing ISLA research with diverse stakeholders

Echoing back to the beginning of the volume, we conduct research to advance our knowledge about a chosen topic or an issue within the field of ISLA. In order to make ISLA findings more meaningful, communicating the research findings to practitioners and offering hands-on support with instructional policy implementation or materials design are critical. We recommend that you survey the stakeholders who could be impacted by (or impact) your research study and make sure that you communicate your results and invite their continued participation.




In other words, the practical implications of the study should be discussed with the stakeholders who are directly associated with decision making – program directors, classroom teachers, materials publishers, administrative staff and, in the case of language programs involving minors, parents and guardians. Given this substantial and diverse list, how can we ensure that our research findings are disseminated to a larger community of practice-oriented stakeholders?

Within academia, we often share our work in conference presentations and publications. With few exceptions (COVID-19-related conference delays and postponements being one of them), studies are presented prior to being published (and in some cases, that is a requirement for the conference). When it comes to conferences, there are many types: smaller conferences (e.g., Georgetown University Round Table), research-interest groups (e.g., International Association for Task-Based Language Teaching), regional conferences (e.g., Northeast Conference on the Teaching of Foreign Languages, Southwest Conference on Language Teaching, Sunshine TESOL state conference), larger national conferences (e.g., American Association for Applied Linguistics, Canadian Association of Applied Linguistics), and even international conferences (e.g., AILA World Congress). We recommend that you choose the target conference for your study based on the study focus and design and the logistics of what is possible for your travel. In general, we recommend presenting locally and then, as you gain more experience or co-author with a more senior colleague or faculty member, attending and presenting at larger venues.

In part to reach teachers and go beyond academia when disseminating your research, there has been an increasing number of open-access journals or articles within journals such as Studies in Second Language Learning and Teaching, Language Learning and Technology, TESL Canada Journal, the Canadian Journal of Applied Linguistics, and TASK. Many of these journals publish articles that focus on the link between research and practice, and they can be useful resources for teachers. Furthermore, some of the journals that allow only restricted access to their content provide video abstracts (see TESOL Quarterly and Foreign Language Annals) and challenge statements (see Foreign Language Annals) at no cost. Such video abstracts and challenge statements (the motivation of the study) are required to be created using practitioner-friendly terms. Furthermore, ISLA researchers are also now able to participate in Open Accessible Summaries in Language Studies (OASIS), where one-page summaries of published research studies are written and shared in non-technical language. Professional and language-specific organization newsletters are another good example, such as the post-convention newsletter of TESOL's Applied Linguistics Interest Section (ALIS Newsletter, September 2021; commpartners.com). The authors are often researchers who conduct practice-oriented research, and this freely available online resource offers up-to-date information about connecting research and practice.


Nowadays, social media allows a direct communication channel between authors and audiences (e.g., "The research-practice gap" – Sandy Millin, wordpress.com). ISLA researchers may want to collaborate directly with teachers by publishing teaching resources online for easy accessibility. Recently, due to easy access to online resources, researchers have begun to transform their research articles into practitioner-friendly resources such as blogs or YouTube channels (see Kazuya Saito's YouTube channel "Kazuya L2 Lab" or Florencia Henshaw's "Unpacking Language Pedagogy"), podcasts (Inspired Proficiency; We Teach Languages; Radio Ambulante, etc.), or creating/participating in social media groups for collaborative dialogue between researchers and teachers (Facebook groups including Spanish Language Pedagogy in Higher Ed; Research in Language Education; Technology for Language Teaching and Learning).

Additionally, a growing interest in making research materials or instructional tasks available through open-access online platforms has been shared among ISLA researchers. For instance, the TBLT Language Learning Task Bank (https://tblt.indiana.edu/) shares a number of tasks that were used in previous research studies or that were reviewed by the Task Bank board members. Additionally, IRIS (IRIS-database.org), a digital repository of instruments and materials from published L2 research, shares research materials including (but not limited to) ISLA research and pedagogical interventions.

Finally, in the aforementioned ongoing elementary-level task-based program example from Gurzynski-Weiss et al., the research team is regularly updating parents, the community liaison, and administrators on the progress of the student work and teacher and student experiences, as well as sharing the upcoming task-based lessons' communicative goals and ways to practice in the community. This is being shared in regular meetings between the researchers, teacher, community liaison, and administrators; in regular email write-ups to the administrators; in social media posts on the parent/caregiver Facebook page and school website; and in flyers and similar materials shared at community festivals. Ongoing research sharing and continued collaborative conversation do not have to be a significant investment of time to have a significant impact.




4. Future directions for ISLA research

Over the last two decades, the field of ISLA has grown noticeably. Building on the current vibrant research in the field of ISLA, what are our next steps?

4.1 Conduct practice-based research

ISLA researchers advocate making connections between research and practice and particularly promote collaboration between researchers and practitioners. Sato and Loewen (2022) propose practice-based research (PBR) as an alternative to evidence-based practice (see also Chapter 4, this volume). According to Sato and Loewen, in PBR, research topics are chosen by researchers and practitioners together after reviewing the practical concerns in instructional settings. This process would allow researchers to choose topics that are in need of investigation not only from their own perspectives – which are often motivated by previous research – but from practitioners' perspectives as well. Once the topic is chosen, Sato and Loewen suggest that the development and implementation of the project could be done while taking practitioners' perspectives into consideration. As a result, the instructional intervention could align with the inquiries that the practitioners themselves have. This process would increase the ecological validity of the research. Finally, the findings of the study could be directly applied to classroom instruction. Sato and Loewen claim that practice-based research is cyclical.

4.2 Conduct a replication study

Recently, the demand for replication studies has been increasingly discussed across different scientific disciplines (see Chapter 5). The desire for replication is connected to the integral role such studies play in verifying existing trends and demonstrating the generalizability of previous findings. With increasing awareness of the potential contributions of replication studies to advancing the field, applied linguists have acknowledged the importance of these studies (e.g., Mackey & Porte, 2012; Porte & McManus, 2019). You will see that major applied linguistics journals such as Language Teaching and Studies in Second Language Acquisition have explicitly advocated for the value of replication studies and have increasingly published replicated empirical studies (e.g., Erlam & Ellis, 2019) and position papers that call for replication studies (e.g., Révész, 2019). As evidence-based suggestions might not be generalizable to different contexts, replicating the same or a similar research design in diverse contexts would increase the validity of findings in L2 instruction (see Chapter 5).


4.3 Expand our conceptualization of instructional contexts

As mentioned at the outset of the volume (see also Collins & Muñoz, 2016; Sato & Storch, 2020; Zou, 2020), like much of the world, the L2 classroom is going through transformative change, some proactive and some reactive to the changing post-pandemic realities of teaching and living in a global community; ISLA research must follow suit. Additionally, due to the increasing interest in the use of mobile or tablet applications, researchers have examined the effects of mobile-assisted language learning on language development as well as the factors contributing to the use of mobile learning applications (e.g., Loewen et al., 2019).

5. Conclusions

In this volume we walked you through how to conduct ISLA research step-by-step (Chapters 1 and 6) and introduced the different research methodologies in the field of ISLA (Chapters 2, 3, 4, and 5). Researchers specialized in each language skill shared their expertise and discussed representative data collection and analysis options for different language skills (Chapters 7–14). In this final chapter, we concluded the volume by reiterating why ISLA research is needed and how it can be disseminated to various stakeholders so that the research can impact real-world second language teaching and vice versa. It is our hope that each chapter offers helpful information on how to conduct ISLA research, whether you are new to ISLA research or research in general, a seasoned researcher looking for inspiration for a new area of work, or anywhere in between.

Finally, now more than ever, we must conduct research that is relevant and fair to those participating and those who would benefit from the findings (see Chapter 6). And it is important that we (re)examine in more nuanced detail how interaction can be optimally designed and facilitated for each of the myriad L2 learning contexts, especially those that have emerged in online and hybrid online/in-person settings. We hope that this volume inspires and guides novice and experienced researchers alike to conduct ISLA research that has a positive impact and creates knowledge that benefits all.

6. Further reading and additional resources

This chapter highlights the most important considerations for sharing your ISLA research in both academic and non-academic settings. Given this expansive topic, there is much additional reading available on each of the topics discussed here.



6.1 The research-pedagogy connection



Allwright, D., & Bailey, K. M. (1991). Focus on the language classroom: An introduction to classroom research for language teachers. Cambridge University Press.
Burns, A., & Edwards, E. (2016). Language teacher action research: Achieving sustainability. English Language Teaching Journal, 70(1), 6–15. https://doi.org/10.1093/elt/ccv060
McKay, S. L. (2006). Researching second language classrooms. Lawrence Erlbaum Associates. https://doi.org/10.4324/9781410617378
Ortega, L. (2005). For what and for whom is our research? The ethical as transformative lens in instructed SLA. Modern Language Journal, 89(3), 427–443. https://doi.org/10.1111/j.1540-4781.2005.00315.x
Rose, H., & McKinley, J. (2017). The prevalence of pedagogy-related research in applied linguistics: Extending the debate. Applied Linguistics, 38(4), 599–604. https://doi.org/10.1093/applin/amw042
Wallace, M. J. (1998). Action research for language teachers. Cambridge University Press.

6.2 Sharing your work within academia

Applied Linguistics Research Methods – Discussion, https://www.facebook.com/groups/appliedlinguisticsresearchmethods
IRIS-database.org
OASIS-database.org
The TBLT Task Bank, tblt.indiana.edu
Story Builder, http://www.story-builder.ca/
Tulquest, http://tulquest.huma-num.fr/
Foreign Language Annals, https://onlinelibrary.wiley.com/journal/19449720
Instructed Second Language Acquisition, https://journal.equinoxpub.com/ISLA
Language Learning and Technology, https://www.lltjournal.org/
Studies in Second Language Acquisition, https://www.cambridge.org/core/journals/studies-in-second-language-acquisition
Studies in Second Language Learning and Teaching, https://pressto.amu.edu.pl/index.php/ssllt
Second Language Research, https://journals.sagepub.com/home/slr
System, https://www.journals.elsevier.com/system
TESOL Quarterly, Wiley Online Library

6.3 Sharing your work beyond academia


L2/FL Pragmatics Wiki, http://wlpragmatics.pbworks.com/
Open Educational Resources Commons, https://www.oercommons.org/
MERLOT, https://www.merlot.org/merlot/
The National Foreign Language Resource Center, https://www.nflrc.org/
The TBLT Task Bank, tblt.indiana.edu
Technology for Language Teaching, https://www.facebook.com/groups/TechforLTL
TES, https://www.tes.com/en-us/teaching-resources
Kazuya Saito L2 Lab, https://www.youtube.com/channel/UCjv5y3IVuVQVgV-BSxVb5Wg


References

Borg, S. (2010). Language teacher research engagement. Language Teaching, 43(4), 391–429. https://doi.org/10.1017/S0261444810000170
Calvert, M., & Sheen, Y. (2015). Task-based language learning and teaching: An action-research study. Language Teaching Research, 19(2), 226–244. https://doi.org/10.1177/1362168814547037
Collins, L., & Muñoz, C. (2016). The foreign language classroom: Current perspectives and future considerations. Modern Language Journal, 100(S1), 133–147. https://doi.org/10.1111/modl.12305
Ellis, R. (2003). Task-based language learning and teaching. Oxford University Press.
Erlam, R., & Ellis, R. (2019). Input-based tasks for beginner-level learners: An approximate replication and extension of Erlam & Ellis (2018). Language Teaching, 52(4), 490–511. https://doi.org/10.1017/S0261444818000216
Gass, S., Loewen, S., & Plonsky, L. (2021). Coming of age: The past, present, and future of quantitative SLA research. Language Teaching, 54(2), 245–258. https://doi.org/10.1017/S0261444819000430
Gurzynski-Weiss, L. (2016). Factors influencing Spanish instructors' in-class feedback decisions. The Modern Language Journal, 100(1), 255–275. https://doi.org/10.1111/modl.12314
Gurzynski-Weiss, L., & Plonsky, L. (2017). Look who's interacting: A scoping review of research involving non-teacher/non-peer interlocutors. In L. Gurzynski-Weiss (Ed.), Expanding individual difference research in the interaction approach: Investigating learners, instructors, and other interlocutors (pp. 305–324). John Benjamins. https://doi.org/10.1075/aals.16.13gur
Gurzynski-Weiss, L., Wray, M., & Coulter-Kern, M. (in progress). Measuring L2 Spanish development over time in task-based elementary-level exposure settings.
Henderson, C. (2020). Perfect timing? Exploring the effects of immediate and delayed corrective feedback, communication mode, and working memory on the acquisition of Spanish as a foreign language [Doctoral dissertation, Indiana University]. ProQuest.
Kim, Y., Choi, B., Choi, S., Kang, S., Kim, B., & Yun, H. (2017, November). Task repetition, written corrective feedback, and the learning of Korean. Paper presented at the American Council on the Teaching of Foreign Languages (ACTFL), Nashville, Tennessee.
Kim, Y., Choi, B., Kang, S., Kim, B., & Yun, H. (2020). Comparing the effects of direct and indirect synchronous written corrective feedback: Learning outcomes and students' perceptions. Foreign Language Annals, 53(1), 176–199. https://doi.org/10.1111/flan.12443
Kim, Y., Choi, B., Kang, S., Yun, H., & Kim, B. (2017, April). TBLT might not work with low-level foreign language learners taught by novice teachers: Overcoming the challenges for the better. Paper presented at the International Conference on Task-Based Language Teaching, University of Barcelona, Barcelona, Spain.
Kim, Y., Choi, B., Yun, H., Kim, B., & Choi, S. (in press). Task repetition, synchronous written corrective feedback and the learning of Korean: A classroom-based study. Language Teaching Research. https://doi.org/10.1177/1362168820912354
Kim, Y., Choi, B., Yun, H., Kim, B., & Kang, S. (2021). Learning Korean through tasks: High beginner to low intermediate. Kong & Park Publishing.
Larsson, T., Paquot, M., & Plonsky, L. (2020). Inter-rater reliability in learner corpus research: Insights from a collaborative study on adverb placement. International Journal of Learner Corpus Research, 6(2), 237–251. https://doi.org/10.1075/ijlcr.20001.lar




Loewen, S., Crowther, D., Isbell, D. R., Kim, K. M., Maloney, J., Miller, Z. F., & Rawal, H. (2019). Mobile-assisted language learning: A Duolingo case study. ReCALL, 31(3), 293–311. https://doi.org/10.1017/S0958344019000065
Lyster, R., & Ranta, L. (1997). Corrective feedback and learner uptake. Studies in Second Language Acquisition, 19(1), 37–66. https://doi.org/10.1017/S0272263197001034
Mack, L. (2012). Does every student have a voice? Critical action research on equitable classroom participation practices. Language Teaching Research, 16(3), 417–434. https://doi.org/10.1177/1362168812436922
Mackey, A., & Porte, G. (2012). Why (or why not), when, and how to replicate research. In G. Porte, C. A. Chapelle, & S. Hunston (Eds.), Replication research in applied linguistics (pp. 21–46). Cambridge University Press.
Medgyes, P. (2017). The (ir)relevance of academic research for the language teacher. ELT Journal, 71(4), 491–498. https://doi.org/10.1093/elt/ccx034
Porte, G., & McManus, K. (2019). Doing replication research in applied linguistics. Routledge.
Révész, A. (2019). Replication in task-based language teaching research: Kim (2012) and Shintani (2012). Language Teaching, 52(3), 374–384. https://doi.org/10.1017/S0261444817000283
Révész, A., & Gurzynski-Weiss, L. (2016). Teachers' perspectives on second language task difficulty: Insights from think-alouds and eye tracking. Annual Review of Applied Linguistics, 36, 182–204. https://doi.org/10.1017/S0267190515000124
Robinson, P., & Gilabert, R. (2007). Task complexity, the Cognition Hypothesis and second language learning and performance. IRAL, 45, 161–176. https://doi.org/10.1515/iral.2007.007
Sato, M., & Loewen, S. (2019). Do teachers care about research? The research–pedagogy dialogue. ELT Journal, 73(1), 1–10. https://doi.org/10.1093/elt/ccy048
Sato, M., & Loewen, S. (Eds.) (2020). Evidence-based second language pedagogy: A collection of instructed second language acquisition studies. Routledge.
Sato, M., & Loewen, S. (2022). The research-practice dialogue in second language learning and teaching: Past, present, and future. Modern Language Journal, 106(3).
Sato, M., & Storch, N. (2020). Context matters: Learner beliefs and interactional behaviors in an EFL vs. ESL context. Language Teaching Research. https://doi.org/10.1177/1362168820923582
Skehan, P. (1998). Task-based instruction. Annual Review of Applied Linguistics, 18, 268–286. https://doi.org/10.1017/S0267190500003585
Vásquez, C., & Harvey, J. (2010). Raising teachers' awareness about corrective feedback through research replication. Language Teaching Research, 14(4), 421–443. https://doi.org/10.1177/1362168810375365
Zou, D. (2020). Gamified flipped EFL classroom for primary education: Student and teacher perceptions. Journal of Computers in Education, 7(2), 213–228. https://doi.org/10.1007/s40692-020-00153-w

Index

A

Abstract  362, 363
Accent/accentedness  233–236, 241–245, 259
Accentedness rating  243
Acceptable/Acceptability  153, 166, 172, 215–217, 220, 221, 224, 235
Acceptability judgment task (AJT)  207, 213, 217, 222
Accuracy  208, 215, 306, 308, 309, 315–317, 319, 330, 332, 333, 335, 337, 339, 340, 343, 346
Accuracy measures  288, 310
(L2) achievement  36
Acoustic-phonetic processing  262
Acquisition  4, 7
Action research  55, 61, 65, 66, 74, 356, 360–362
Affective  132, 134
Affordance  212, 214
Age  310
Age of arrival  245
Age of onset  10
Agency  56, 58, 59, 64, 66, 70, 73, 212
AILA World Congress  371
Allocation of attention  288
Ambiguity resolution  288
Ambiguous/unambiguous language  296
American Association for Applied Linguistics  371
Amount of input  310
Analytic talk  219, 224, 225
Anaphora resolution  293
Anomalous/normal syntactic structures  296
Anomaly detection  293
ANOVA  37, 40, 41, 128

Anxiety  87, 88, 93, 131, 132, 135, 138, 139, 141, 259, 317 Appendices 370 Applied ISLA  309 Approach  5, 8, 11–15, 17, 23, 24 Approximate replication  107–113, 116, 118 Aptitude 214 Argument overlap  293, 294 Articles  208, 209 Articulator 331 Artifacts  56, 62, 65, 68 Artificial languages  291–294 Aspect  104–110, 116, 214, 216, 220, 227 Assessments  36, 40, 48, 50, 51 Assistive prompt  212 Associations  181, 183 Asynchronous computer mediated communication 152 Attention  4, 6, 7, 11, 12, 132, 134, 209–211, 213, 220 Audio recordings  212, 225 Audio-visual prompts  260 Aurally-presented stimuli  213 Authenticity  153, 158, 163, 165, 259, 261, 262, 265, 272, 273 Autoethnography 64 Automated Writing Evaluation (AWE) 308 Automatization 211 Automatized explicit knowledge 214 Auxiliaries 208 B Background knowledge  259 Bare lexical items  209, 217 Behavioral  132, 134 Behavioral methods  264, 267 Behaviors of interest  226 Beliefs  136, 142

Between-group  35, 37, 38, 40, 41, 50 Between-subject designs  186 Bi/multilingual norm  125 Bidirectional listening  261 Binary 224 Bonferroni test  33 Bootstrapping 129 Bottom-up approach/models 219, 258 Brain imaging  211 Breadth of knowledge  182 Breakdown fluency  333, 335, 339 C CA(L)F measures  315 CAF  329, 330, 332–335, 337, 340, 345, 347 CAFFA  329, 330, 332, 335, 337, 338, 341, 343–345, 347 Calibration 271 Canadian Association of Applied Linguistics  371 Canadian Journal of Action Research 361 Canadian Journal of Applied Linguistics 371 Case study/studies  61–63, 74, 212, 310, 311 Categorical 32 Challenge statements  371 Chi-square 128 Clarification requests  111 Class observations  91, 95 Classroom interaction  211, 217, 219, 220 Classroom management  82 Classroom observations/observation design  110, 212, 310

Classroom teachers  355, 357–359, 371 Classroom(-based) studies  38, 47, 82, 246 CLIL 95 Close replication  105, 107–110, 113, 114, 116, 119 Closed role-plays  159 Co-constructed knowledge  131 “Co-construction” activities 220 Coda consonants  235 Coding  67, 216 Cognates/cognateness 197, 283, 296 Cognition Hypothesis  344 Cognitive approaches  210, 211 Cognitive strategies  Cognitive system  330 Cognitive-interactionist 138 Coherence/cohesion  211, 333, 335, 340, 347 Collaboration  128, 226 Collaborative dialogues  152, 314 Collaborative writing  94, 308, 314, 317 Collocations  181–183, 190–192, 197 Comfortably intelligible speech 234 Communicative competence 258, 261–263, 265 Communicative functions  150, 167 Communicative goal  150, 160 Communicative impacts  209 Communicative language use 305 Communicative proficiency 207 Communicative tasks  217, 220 Comparative re-production 108 Competence  5, 7 Complexity  138, 306, 310, 311, 315, 317, 329, 330, 332, 333, 336, 338, 339, 342–344 Complexity by coordination 334, 343 Complexity by subordination 334 Complexity of teaching  81, 82

Complexity, accuracy, fluency (CAF)  306, 329, 330, 332, 339 Complexity, accuracy, fluency (CAF) measures  332, 337, 339 Complexity, accuracy, fluency, functional adequacy (CAFFA) 329, 342 Component processes in reading  281, 295 Component words  188, 197 Comprehensibility 235, 241–243, 245, 247, 248, 335, 340, 347 Comprehensibility rating  243, 244 Comprehension accuracy  284, 296 Comprehension checks  111 Comprehension monitoring 262, 285 Comprehension practice  109 Comprehension task  39 Computer laboratory  213 Computer-animated production test 153 Computer-based 306 Concept-based instruction  214 Conceptual replication  107, 108, 114, 116 Conceptualizer 331 Conclusions 369 Concreteness 197 Concurrent data  308, 312, 313, 318 Concurrent verbal reports  265 Concurrently  213, 219 Conduct observations  130 Confirmation checks  111 Confirmatory factor analysis (CFA) 285 Confounding 358 Confounding variables  358 Congruency in collocations 197 Conscious 213 Constraints on use  183 Content  335, 336, 340, 347 Content organization  259 Content-based instruction  80, 81

Context  81–83, 87, 91, 357, 358, 361, 363, 365, 366, 368 Contexts of use  216 Continuous 32 Control group  33, 37, 38, 46, 151, 152, 155, 156, 162, 172, 220, 221 Controlled tasks  216, 218 Convergent design  84–88, 92, 93, 96, 97 Convergent validity  51 Conversation analysis (CA) 68, 161 Conversational maxims of quantity, relation, manner and quality  340 Corpus 107 Corrected effect size correlation 114 Corrective feedback  89, 94, 236, 239–241, 250 Correlational 311 Correlational research  31, 32, 36, 43, 44 Counterbalanced 138 Cross-linguistic  337, 342 Cross-sectional/cross-sectional research  126, 127, 335 Cultural conventions  150 Curricular objectives  82 D Data analysis  365, 367 Data coding  367 Data consolidation  92 Data transformation  92 Data visualization  129 Declarative knowledge  210, 211, 213, 240 Decode 282 Deductive  8, 220, 221, 224 Deductive approach  162 Definite article  209 Delayed posttest  109, 186, 187, 190, 191, 194, 195 Delivery format  259 Demographics 212 Dependent variable  32, 34, 45, 47

Depth of knowledge  182 Depth of processing (DoP)  314 Derivational complexity  334 Descriptive  212, 224, 311 Descriptive results  114, 115 Descriptive statistics  33 Designs  79, 84–86, 92, 93, 96 Development  4, 5, 9, 12, 14 Dialectal features  233 Dialogic 224 Dialogic speaking tasks  248 Dialogue  156, 171, 178 Dialogue construction tasks 156, 157 Dichotomous interpretation 114 Digital filter  249 Direct  308, 316, 319–321 Direct and indirect expressions 164 Directness  153, 161, 164 Discourse completion tasks (DCTs)  150, 154 Discourse level processing  283 Discovery-learning  291, 292, 297 Discursive processes  224 Discussion  365, 368, 369 Distractors 222 Divergent validity  51 Diversity, equity, inclusion and access (DEIA)  22 Dual commitment  81, 82 Dual language program  87 During-reading strategies  296 Dyadic task-based interaction 134 Dyads  110, 111 Dynamic  10, 130, 131, 135 Dynamic assessment  214 E Early measures  289 Ecological validity  13, 47, 79, 82, 84, 90, 217, 226, 308, 316, 358, 369, 373 Educational Action Research 361 Effect size  114, 115, 117 Electroencephalography (EEG) 267

Elicited imitation tasks  213 Elicited oral imitation test  8 Emergent 95 Emic  212, 215 Emic (insider) perspective  63 Emotion  56, 59, 60, 64, 65, 70, 71, 73 Emotional factors  135 Engagement  17, 131, 132, 134, 318, 320, 321 Enjoyment  87, 88, 93 Epistemology 80 Error correction  360 Errors  333, 335 Ethical research  226 Ethics Committee  19, 20, 22 Ethnography  55, 61, 63, 311 Etic 212 Event-related brain potentials (ERPs) 288 Evidence of learning  316, 317 Evidence-based L2 pronunciation teaching  233, 246 Evidence-based pedagogy  287 Evidence-based practice (EBP) 83, 373 Exact replication  107 Exempt research studies  20 Exit questionnaire  126 Expedited studies  20, 22 Experience (or discovery) approach 291 Experimental groups  220 Experimental laboratory methods 287 Experimental research  31, 36, 38, 43, 46, 48, 49 Expert raters  340 Expertise  131, 139 Explanations  208, 212, 214, 219 Explanatory sequential design 88, 89, 97 Explicit  207, 210–214, 217, 219–221, 224, 225, 228 Explicit information (EI)  109 Explicit instruction  152, 207, 219–221, 285, 291–293, 297 Explicit learning  4, 7, 8, 23 Explicit metapragmatic information 156

Explicit phonetic instruction 236, 240, 250 Exploratory sequential design 85, 89–91, 97 Extended text  154 Extension study  104 Extensive reading  286, 290, 291, 299 External validity  140 Eye tracking  211, 213, 257, 258, 264, 267–271, 273, 274, 308, 312, 362, 377 F FA rating scale  334, 340, 341, 345–347 Face-to-face mode  84, 93 Factorial design  41, 42, 49 Feedback  6, 11, 14, 15, 20, 211, 212, 217, 305, 308, 311, 316, 318, 319, 321 Feedback decisions  376 Feedback processing framework 316 Field notes  63, 66, 68, 130 Fine-grained measures  333, 335, 336 First language (L1)  259 First language (L1) transfer  215 First-fixation duration  289 Fixations 267 Fixed 95 Fluency  245, 306, 311, 315, 316, 329, 330, 332, 333, 335, 337, 339, 343, 346 Fluency development  286 Fluency measures  155 Fluent 214 Focus group interviews  87 Focus on form  39, 306 Focus on formS 39 Focus on meaning  39 Focused  308, 310, 314, 316, 319 Focused coding  67 Forced-choice identification task 244 Foreign Language Annals  370, 371, 375 Foreign language learning  9 Form 208

Form-focused explicit instruction 152 Form-meaning connections 138 Form-meaning mappings  109 Formulaic language/sequences 183, 310 Formulation 306 Formulator 331 Foundational knowledge  286, 298 Framework  5, 12, 15 Frequency  216, 218, 219 Frequent 217 Functional adequacy  329, 335, 348 Functional adequacy (FA)  329, 330 Functional load  235, 245, 247 Functional magnetic resonance imaging (fMRI)  267 Future directions  355, 356, 369, 373 Future interlocutors  136 G Gaze duration  289 General knowledge  283, 284 General measures  337 Generalizability  13, 18, 82, 163, 226 Georgetown University Round Table 371 Gestures 210 Global measures  333 Goal theory  310 Goals 310 Graded readers  287 Grammar  207, 208, 210–215, 219, 220, 225–228 Grammatical concepts  212, 225 Grammatical functions  182, 183, 187 Grammaticality judgment test (GJT)  8, 213 Grit 131 Grounded Theory (GT)  67, 74 Group-level analyses  110 Groups  106, 110, 115

H Head acts  157 Heritage language learning  9 Heritage language speakers  105 High/low frequency vocabulary 296 Higher-level cognitive processes 262 Higher-order comprehension processes 284 Histograms 129 Holistic measures  336 Homogeneity of variance  129 Hypothesis  5, 12, 13, 108, 364 Hypothesis testing  306, 314 I ID factors  44, 46, 49 Ideal L2 self  61, 133 Identity  55, 56, 58, 59, 61, 62, 64, 66, 68, 70, 71, 73, 135, 142 Idiodynamic method  132, 135 Illocutionary force  161 Imageability 197 Immediate posttest  187, 190, 191 Implications  355, 365, 368, 371 Implicit  210, 211, 213, 214, 221 Implicit instruction  152 Implicit learning  4, 8, 9 Impressions 150 In-class practices  136 In-situ studies  290, 297 Inanimate L2 sources  136 Inchoative  219, 221 Incidental learning  185–187, 189, 193–196 Incidental vocabulary learning 181, 184, 185, 187, 188, 193–195 Indefinite article  209 Independent samples t-test 128 Independent variable  32–34, 37, 38, 41, 45, 46, 49 Independently verified  224 Indirect  308, 319, 320 Individual differences (IDs)  5, 18, 285, 308 Individual interviews  133, 134 Individual vs. collaborative 306 Inductive  8, 220, 221 Inferences  282–284, 297

Inferencing  260, 262 Inferential statistics  33, 46, 48, 211 Infinitive verb forms  217 Inflectional complexity  334 Inflectional morphemes  208–210 Information exchange tasks 110 Initial study  104–114, 116, 117 Input  5, 7, 9–12, 210, 211, 258, 260–263, 266, 272, 273 Input processing skills  286 Inservice teachers  360 Insider 130 Instructed approach  291 Instructed second language (L2) pronunciation 233 Instructed second language acquisition (ISLA)  3, 355, 363, 375 Instructed second language acquisition of reading (ISLAR) 281 Instructed settings  3, 4, 7, 9 Instructional conditions  211, 219 Instructional contexts  355, 365, 374 Instructional intervention  182, 184–186, 190, 191, 196, 305 Instructional materials  214 Instructional methods  149, 151, 152 Instructional studies  150–153, 159, 164–167 Instructional support  224, 225 Instruments  363, 365, 366, 370, 372 Intact classes  125–128, 138, 140, 218 Integrated listening tasks  262 Intelligibility  235, 237, 241, 242, 244, 247 Intelligibility principle  235 Intensive reading  286, 296 Intentional learning  184, 185, 187, 188, 191–194, 198 Intentional vocabulary learning 181, 185, 187–189, 194, 199 Intentions  150, 218 Interaction 110

Interaction effects  41, 49 Interactional competence  215 Interactional outcomes  159, 160, 166 Interactive listening  259, 261, 262, 266, 272, 273 Interactive models  258, 259 Intercultural competence  91, 95 Interdisciplinary research methods  281, 291 Interlanguage 136 Interlingual homographs  283 Interlocutor individual differences 126 Interlocutors  6, 10, 15, 263 Internal reliability  40, 50 Internal validity  140, 217 Internalized 212 International Association for Task-Based Language Teaching 371 Interpretation  207, 212–214, 227, 365, 367–369 Interpretive  55, 56, 62, 211 Interrater coefficients  197 Interrater reliability  51, 224, 367 Interrogative constructions  219 Intervention studies  181, 186, 187, 196 Interviews  55–57, 62–64, 66, 68, 70, 71, 131, 134, 212, 213, 219, 264, 265, 278, 308, 310, 312, 318 Intra-rater reliability  51 Introduction  355, 363 Introspective techniques  312 Investment  56, 60, 61 IRIS database  220, 231, 366, 372 ISLA applied  309, 318–320 ISLA researchers  355, 357, 371–373 J Jargon 21 Journal of Educational Change 361 Journal of Mixed Methods Research 80

Journal of Second Language Teaching and Research  370 Journal of Teacher Action Research 361 Journal of the Scholarship of Teaching and Learning  361 Judgment scales  244 Judgment tasks  109 K Keystroke logging tools  310, 312, 313 Knowledge about the world 282, 283 Kolmogorov-Smirnov goodness-of-fit test  128 L L1 fluency  84, 94 L1 reading comprehension  285 L2 anti-ought-to self  133 L2 fluency  84, 85, 93, 94 L2 grit  132, 136, 141 L2 immersion  217 L2 learning processes  305, 306 L2 listening development  257, 272 (L2) Motivational Self System 61 L2 pragmatics  149–151, 159 L2 proficiency  32, 35, 36, 43, 44, 284, 296 L2 pronunciation research methods 233 L2 Spanish  207, 217–220, 224 L2 speaking  84 L2 speaking or writing activities 218 L2 speech (data) analysis  233 L2 speech instruments  233 L2 speech production  138 L2 vocabulary knowledge  259 L2 writers/writing  93–95, 305, 306, 308, 309, 311–316, 318, 319, 321 Lab-based studies  355 Labeling 224 Laboratory-based studies  233, 246 Laboratory research  287, 288, 290, 291

Language Learning and Technology  371, 375 Language learning strategies 95 Language program directors 357, 359 Language socialization  55–57, 62 Language socialization theory 56 Language Teaching Research 361, 370 Language testing  98 Language-focused learning 286 Language-related episodes  132, 138, 152, 219, 311 Large-scale quantitative methods 211 Larger-scale studies  310 Late measures  289 Learner affects  93 Learner gender  93 Learner individual differences (IDs) 126 Learner psychology  82 Learner traits  36, 48 Learner-generated content  94 Learning  3–7, 9–14, 17 Learning outcome measures 127 Learning outcomes  149–153, 163–168 Legitimacy 61 Length (of input)  259 Length-based measures  333–335, 342 Level of proficiency  309 Levene’s test  129 Lexical complexity  315, 333, 334, 336, 337, 342 Lexical density  336 Lexical development  185 Lexical diversity  337, 339, 340, 342 Lexical items  207–209 Lexical processing  283 Lexical profiles  184, 194 Lexical richness  336 Lexical sophistication  138 Likert scale  153, 158, 159, 221 Limitations  364, 369 Limited Attentional Capacity Model 344

Linguistic complexity  333, 340 Linguistic knowledge  283 Linguistic self-confidence  138, 139 Listener characteristics  259 Listener judgments  242, 244 Listening 257–274 Listening comprehension  258–262, 264, 273 Listening strategies  259–261, 266, 273 Literature review  363–365, 368 Longitudinal collaboration  358, 359 Longitudinal projects  358 Longitudinal research  337, 342 Longitudinal studies  126, 127, 309 Low-tech 218 Lower-level cognitive processes 262 M Macro studies  34, 35 Macro timescales  131 Main effects  41, 42 Manipulation of tasks  305, 317 Materials  365, 366, 370, 372 Meaning  208, 215, 216, 224, 225 Meaning-focused input  286 Meaning-focused output  286 Meaning-focused reading  286, 296 Meaningful activity  210–212, 214 Meaningful interaction  134 Measures  329, 330, 333–339, 342–345, 347 Mechanical exercises  208 Media resources  218 Mediating variables  139 Mediation 58 Mediational needs  212 Medium  306, 314, 320 Meta-analysis/analyses  80, 85, 87, 92, 94, 290, 291, 298 Metacognition 259 Metacognitive Awareness Listening Questionnaire (MALQ) 265

Metacognitive instruction  261 Metacognitive strategies  260, 262, 265 Metalinguistic  308, 312, 319, 320 Metalinguistic explanations 208 Metalinguistic understanding 210, 212 Methodology/methodologies 55, 56, 61, 64, 65, 67, 68, 72–74 Methods  55, 56, 61, 63, 64, 66, 68, 69, 71, 73–75, 257, 258, 261, 263–268, 272, 273, 358, 362–365 Methods for assessing learning outcomes  150, 151, 153, 157, 159, 162, 168 Micro studies  34, 35 Micro timescales  131, 133 Microgenesis 213 Missing data  129 Mixed design  37, 41, 50 Mixed effects modeling  198 Mixed effects models  198 Mixed methods research (MMR) 79, 85, 95 Mixed methods study  268, 273 Mobile applications  10 Mobile instant messaging  89 Mobile-mediated dynamic assessment 94 Model  5, 12 Model of Speech Production 330, 331, 347 Moderating factors  139 Moderating variables  131 Modern Language Journal  370 Modified graded reader  190 “Modified” or “pushed” output 316 Monitoring 306 Monologic tasks  242, 248 Mood 214 Morphological complexity  333, 334, 336, 343 Morphosyntax 208 Motivation  56, 60, 61, 67, 68, 127, 131–134, 136, 259 Motivation surveys  296 Motivational factors  135

Motivational regulation strategies 95 Multi-word expressions  296 Multimedia  259, 260, 273 Multimodal  64, 66, 69 Multiple choice task  292 Multiple regression  128 Multiple scoring systems  188 Multiple-choice questions  259, 262 Multiword items (units, MWU) 181–186, 188, 189, 197–199 N Narrative data  134 Narrative inquiry  55, 61, 70, 71, 74 Narrative review  80 Narrative synthesis  15 Narratives  132, 134 Native-speaker variety  215 Nativeness principle  235 Naturally-occurring interactions 159 Negotiating 225 Negotiation of/for meaning  6, 110, 111 Netnography 64 Neuroimaging methods  264, 267 Neuroimaging research  210 Neurological evidence  238 Non-cognates  283, 296 Non-conscious 210 Non-expert raters  340, 341, 345–347 Non-normality 129 Non-redundant measures  338 Non-standard grammatical features 215 Non-target-like L2 use  215 Non-university contexts  130 Non-veridicality  265, 266 Nonnative-accented speech 234 Nontarget form  241 Normality 129 Normality test of sphericity 129 Norms of language use  150 Northeast Conference on the Teaching of Foreign Languages 371

Note-taking 259 Noticing  306, 313, 314 Noticing hypothesis  151 Null hypothesis significance testing  33, 114 Number of fixations  289 Number of regressions from and to a word  289 O Objectivist, positivist paradigm 62 Observable linguistic behavior 213 Observational 311 Observational research  31, 32, 36, 44 Observational studies  296 Observations  55–57, 62–66, 68 Observer effect  125, 130, 138, 139 Offline measures  288, 297 Offline tests  109, 288 Online tests  109 Onset consonants  235 Ontology 80 Open Accessible Summaries in Language Studies (OASIS) 371 Open coding  67 Open role-plays  159–161 Open science  128 Open Science Framework  295 Open-ended  213, 216–218 Open-ended questions  259 Operationalize  3, 7, 21, 213, 214 Oral corrective feedback  361 Oral DCTs  155, 158 Oral tasks  337, 341, 344 Oral vs. written mode  306 Orthographic knowledge  283 Orthography-phonology mapping 292 Ought-to L2 self  61, 133 Outcomes  305–307, 318 Outliers 89 Outsider 130 Overall complexity  333, 334, 342 Overall proficiency in the target L2 259 Overtly 282

P Paper-and-pencil test  218 Paper-based 306 Paradigm  55, 56, 61 Paradigm war  80 Paragraph-level 218 Paralinguistic cues  263 Partial and full knowledge  190 Partial knowledge  188 Partial replication  360 Participant attrition  127, 128 Participant bias  46 Participant fatigue  217 Participant profiles  111 Participant’s L1  151 Participants  360, 363–366 Passion  131, 136, 141 Passives  215, 216 Past interlocutors  136 Pause length  155 Pearson correlation  128 Pedagogical effectiveness  214 Pedagogical implications  81 Pedagogically-relevant knowledge  306, 307 Peers 214 Percentages 216 Perception and beliefs about corrective feedback  360 Perception-first view  238–240 Perceptual Assimilation Model (PAM) 238 Perseverance  131, 136, 141 Person-in-context 61 Perspective taking  285 Phonetic instruction  239 Phonetic plan  331 Phonics training  292 Phonological complexity  333 Phrasal complexity  334, 336, 343 Phraseological complexity  333 Physical context  218 Pictorial modality  259 Picture description tasks (PDTs) 207, 217, 218, 223, 242, 243, 245, 248, 249 Picture word matching  292 Pilot  16, 17, 19, 20 Pilot testing  217, 218, 226, 227 Playback control  259

Politeness  152–154, 161, 164, 173 Population validity  47 Post-positivist (paradigm)  61 Post-reading questionnaire 290 Postmodernist (paradigm)  61 Posttest  126, 127 Power  33, 34, 44, 48, 49 Power analysis  33, 34 Practical relevance  82, 83, 97 Practice  239, 240, 246 Practice effect  138 Practice opportunities  236, 239–241, 250 Practice-based research (PBR) 83, 373 Practitioners  81–84, 97, 98, 357, 360, 370, 373 Pragmalinguistics  168, 179 Pragmatic knowledge  149–152, 165–168, 180 Pragmatic processing  262 Pragmatic-related episodes  156 Pragmatically 215 Pragmatics  94, 98 Pragmatism  80, 81 Pre-/posttest design  151 Pre-empt 221 Pre-reading strategies  296 Pre-registration 112 Pre-verbal message  331 Predictions  283, 285, 295 Prefixes 208 Pretest  36, 41, 109, 127, 157, 186 Pretest-posttest-(delayed posttest) designs  311 Primary-interest predictors 293, 296 Priming  288, 296, 299 Prior experience with the L2 131 Prior knowledge  125, 130 Prior student relationships  126 Probability of word skipping 289 Problem-solving tasks  213 Procedural knowledge  210, 211, 214, 240 Proceduralized 4 Procedures  365, 367 Process-oriented approaches 257, 308

Processes  211, 213, 215, 219, 224–226 Processing tasks  213, 214 Product-oriented approach 263, 308, 320 Production  207, 213–216, 218, 223 Productive  211, 216 Productive knowledge  182, 183, 190 Productive modalities  214 Productive vocabulary knowledge 181 Productive writing and speaking 133 Products 225 Professional development  357, 359 Proficiency  131–133, 135, 142 Prominence 218 Prompts  241, 242, 244 Pronounceability 197 Pronouns 208 Pronunciation 233–242, 245–256 Pronunciation errors  235, 241 Propensity score matching  291 Propositional complexity  333, 335 Prosody  164, 168 Protocol  13, 14, 20–22 Pseudowords  191, 193, 194 Psycholinguistics 220 Psychological and physiological impact 166 Psychological constructs  36, 51 Publication bias  128 Q Q-Q plots  129 Qualitative  55, 56, 61–63, 67, 72, 73 Qualitative research  55, 56, 59, 61, 67, 72 Qualitative research synthesis 72 Qualitative techniques  290 Quasi- and true experimental 311 Quasi-experimental in-situ design 296

Quasi-experimental study  38 Questionnaires  66, 80, 84–91, 93–95, 134, 264, 265, 312 R Random group assignment  38 Randomized 223 Rate 7 Rater training  337, 338, 344–347 Reaction time measures  211 Reactivity  46, 265, 266, 312, 313 Reader motivation  290 Reading 281–299 Reading comprehension  281–291, 293–295, 297–299 Reading development  281, 284, 285, 287, 298 Reading fluency  284, 290, 294, 296, 298 Reading tasks  248 Reading time  288, 294 Reading-aloud tasks  242 Real-time (online) processing 288 Real-time ability  210 Real-time measures  288 Recall tests  187, 188 Receptive listening and reading skills 133 Receptive modalities  214 Receptive vocabulary knowledge 181 Recognition tests  187, 188 Recruitment  129, 130 Redundant 343 References  369, 370 Reflective journals  91, 95 Reflexivity  67, 71, 72 Reformulations 241 Relevant responses  218 Reliability scores  249 Reliable/reliability/reliability measures  15, 18, 40, 41, 47, 50, 51, 197, 338, 341, 343, 344, 346 Repair fluency  333, 335, 339, 343 Repeated-measures (counterbalanced) designs 311

Replication Recipe template 112, 113 Replication research  103, 105, 107, 111, 114–116, 118, 360 Replication study  104–108, 110–113, 116–118 Reproduction tasks  39 Request strategies  152, 161, 172, 178, 179 Requests  152, 154, 156, 159, 161, 165, 166, 169, 170, 172, 177 Research design  358, 360, 365, 373 Research methods  355, 356, 358, 362, 363 Research questions  103–107, 111, 118, 362, 364–366, 368 Research report  355, 362, 363 Researcher bias  46 Researcher-practitioner collaboration  233, 250 Researcher-teacher collaboration  356, 357, 360 Response latency  288 Response modality  259 Restructuring  316, 321 Results  358, 363, 365–370, 372 Retrospective techniques  312 Retrospective verbal reports 265, 266 Retrospectively 213 Right to speak  60, 61 Robot-assisted instruction  152 Role-plays  150, 153, 154, 159, 161, 163–168, 175 Route 7 Rule formation  225 S Saccades 267 Sample  31–34, 43, 46–48, 50 Sample size  151 Sample size planning  128 Sampling  34, 47, 50 Scaffolded feedback  89 Scaffolding 58 Scalar judgments  243 Scatterplots 129 “Scientific” grammatical concepts 214 Scoring scheme  224

Screen-capture technologies 312 Second language acquisition (SLA) 3 Second language learning  9, 23 Second Language Research 370, 375 Segmentals  236–238, 241, 250 Self-assessed proficiency  87 Self-monitoring 135 Self-paced reading  109, 281, 293 Self-perceived communicative competence  138, 139 Self-regulated activity  212 Self-report questionnaire  138 Semantic  213, 220, 221 Semantic processing  262 Semi-structured  87, 89, 91 Shapiro-Wilk tests  129 Simple view of reading  284, 285 Simplification strategies  217 Single words  296 Single-word units  183 Skill acquisition theory  151, 240 Small sample sizes  125, 128, 129 Social  129, 132, 134, 137 Social approaches  210–212 Social context  56, 58, 60, 62, 70, 71, 73 Social distance  158, 164 Social factors  135 Social interaction  98 Social media  93 Sociocognitive 226 Sociocultural context  149 Sociocultural Theory  151 Sociolinguistically 215 Sociopragmatics 179 Sophistication  334, 336, 343 Southwest Conference on Language Teaching  371 Spanish  207, 225 Spanish pronoun se 219 Speaker characteristics  259 Specific measures  335, 336 Speech act  152–156, 162, 164, 165, 167, 169 Speech community  233

Speech modalities  233, 236 Speech perception  236, 238, 239, 241, 250 Speech plan  331 Speech production  238–241, 244, 250 Speech production accuracy 238 Speech rate  155, 259 Speed fluency  333, 335, 339, 343 Spillover effects  217 Spontaneous interactions  160 Stages 219 Standard variety  214 Statistical assumptions  128, 140 Statistical conclusion validity 31, 45, 48, 52 Stimulated recalls  111, 265, 266, 270, 308, 313, 361 Strategy-based instruction  261 Students’ and teachers’ roles 310 Studies in Second Language Acquisition  366, 370, 373, 375 Studies in Second Language Learning and Teaching  371, 375 Study abroad  9, 10, 89, 94, 98, 217, 310 Study report  362 Sublexical processing  283 Suffixes 208 Summability 197 Sunshine TESOL state conference 371 Suprasegmentals 236–238, 241, 250 Survey research  264–266 Symbol-picture matching  292 Syntactic and lexical mitigations 167 Syntactic complexity  138, 259, 315, 333–337, 339, 340, 342, 348 Syntactic processing  283 Syntax 208 Synthesis  150, 151 System (journal)  366, 375

T T-test  37, 40, 41 Tables and figures  370 Tailor-made instruments  133 Target features  127 Target form  240, 241 Target language  151, 152, 163 Target language input  282 Target pragmatic features  151, 154 Target structure  39, 41, 208, 214–219, 221, 225 TASK 371 Task characteristics  259, 260 Task complexity  45, 138–140, 306, 307, 311–313, 317, 362 Task complexity factors  306, 307 Task engagement  317, 321 Task implementation conditions 306 Task modality  306, 307, 335, 340, 348 Task motivation  317 Task requirements  335, 340, 347 Task type  335, 337, 338, 340, 344, 347, 348 Task-based approach  152, 168 Task-based language teaching (TBLT)  138, 330, 358 Task-specific motivation  132–134 TBLA  330, 332, 334 TBLT  332, 334, 359, 361, 372 TBLT Language Learning Task Bank (tblt.indiana.edu)  366 Teacher beliefs  93 Teacher emotions  59, 60 Teacher identity  78 Teacher individual differences 125, 126 Teacher training program  356, 360 Teacher vision and motivation 95 Teacher-researcher  356, 360, 361 Teacher-researcher collaborations 355, 363 Teacher’s motivational strategies 318

Teachers  208, 214, 226–228 Teachers’ accents  94 Teaching materials  91, 95 Technology  217, 218, 227, 257, 264, 267, 274 Technology-enhanced simulations 153 Technology-mediated pragmatics 152 Tense 214 TESL Canada Journal  371 TESOL Quarterly  371, 375 Test modes  93 Test-retest reliability  40 Test-taking strategies  268, 269 Testing vocabulary knowledge 181 Tests eliciting vocabulary knowledge 187 Thematic coding  224 Theoretical coding  67 Theory  5, 11, 12, 14–16 Think-alouds  213, 265, 266, 290, 296, 308, 312, 314, 315, 319, 320, 362 Time constraints  259 Time gaps  218 Time pressure  213, 221 Time-based measures  333 Top-down  8, 13 Top-down models  258 Topic familiarity  283 Topic selection  49 Total reading time  289 Trade-off hypothesis  344 Training  129, 136 Transcription task  244 Transcripts  220, 224 Translanguaging  137, 142 Triadic Componential Framework 344 Triangulating  133, 225, 268, 272, 273, 361 TSLT (task supported language teaching) 162 Type I and Type II  33 Type of linguistic item  309, 318 Types of tasks  259, 266

U Unacceptable  215, 217, 221 Unequal variances  129 Unfocused  308, 319, 320 Universal mental processes 210, 212 University bias  130 Uptake 316 Usage 110 Use  207–216, 218–220, 224 Use of hesitations and pauses 259 Use of metacognitive strategies 259, 264 Use of reading strategies  283 Use of visuals  259 V Valid measures  338, 343 Valid/validity  13, 15, 31, 36, 40, 44–52, 214, 226, 264–266, 269, 272, 273, 341, 344, 345, 347 Variable(s) 32 Verb meanings  215, 221 Verbal modality  259 Verbal report data  268, 269 Verbal reports  264–266, 271, 272 Verbal self reports  211 Veridicality  312, 313 Video abstracts  371 Video conference mode  84 Video recording  130, 225 Virtual interlocutor  153 Visual cues  260 Visual display  92 Vocabulary components approach  182, 196 Vocabulary knowledge  181, 182, 184–188, 191, 193, 196, 198, 199 Vocabulary Levels Test  190, 194 Vocabulary profiling software 194 Vocabulary size  285, 290, 296 Vocabulary test formats  187

Voice  214, 216 Vulnerable population  129 Vygotskian sociocultural theory (SCT) 57 W WCF processing  312 Willingness to communicate (WTC)  60, 132, 135, 245 Willingness to read  290 Within-group  35, 37, 41, 50 Within-group effect sizes  114 Within-subject designs  186 Word families  184 Word limits  218 Word parts  181–183 Word properties  198, 199 Word reading (decoding)  284 Word recognition  262 Word-to-text integration (WTI) 288, 293, 296 Working memory  210, 211 Working memory capacity  259 Writing processes  306, 307, 309, 311 Writing tasks  305, 306, 317, 318 Written autobiographical reflections 133 Written corrective feedback (WCF)  305, 308 Written DCTs  155 Written L2 development  305, 309, 310 Written languaging  308, 312, 313 Written stimuli  213 Written tasks  211 Z Zone of proximal development (ZPD) 58

This stand-alone ISLA research methods guide begins by laying the foundations of conducting ISLA research, followed by chapters organized around four skill areas (listening, speaking, reading, writing) and four major linguistic features (grammar, vocabulary, pronunciation, pragmatics). In each chapter, the authors define the target sub-domain of ISLA, outline the basics of research design, and provide concrete guidance on crafting robust research questions, identifying appropriate methodology and method(s), adapting an existing instrument or creating your own, carrying out a study, analyzing and interpreting data, and determining how, where, and when to share your work. The volume also dedicates chapters to common questions that arise in conducting ISLA research.

“Put simply, this volume is absolutely indispensable for anyone engaged in ISLA research. Two of the world’s most accomplished and renowned ISLA scholars, Laura Gurzynski-Weiss and YouJin Kim, and the incredible cast of contributing authors they’ve brought together are setting the new standard for empiricism in ISLA. Deftly aligning substantive and methodological considerations, this text answers every question you ever had about ISLA methods and every question you never thought to ask. Just one question remains: How did ISLA make it this far without this book?”
Luke Plonsky, Northern Arizona University, USA

“This comprehensive book eases the reader into the world of instructed second language acquisition (ISLA) research. The editors have brought together leading ISLA scholars, who have written accessible chapters that cover a wide range of topics and explore critical research-related concerns. Whether you’re new to ISLA in general or wish to branch out in a new direction, this book walks you through the information you need to conduct your next (or even first) ISLA research study.”
Shawn Loewen, Michigan State University, USA

isbn 978 90 272 1268 9

John Benjamins Publishing Company